Frequently Asked Questions
General questions regarding the Portal
The difference between the ODP and EDP is that ODP is the Open Data portal of the European Union containing datasets that are collected and published by the European Institutions. EDP is a European portal that harvests metadata from public sector portals throughout Europe. EDP therefore focuses on data made available by European countries. In addition, EDP also harvests metadata from ODP. The European Commission is currently exploring how to bring those two portals closer together.
The European Data Portal pilots MT@EC, a European Commission machine translation service that covers all of the EU's official languages. EDP harvests metadata in the original language of the source portal, which is then translated. These translations are produced automatically by machine and may therefore contain errors.
The European Union has adopted legislation to foster the re-use of Open (Government) Data. All the data are available for free and can be used for business creation.
The European Data Portal’s User Manual can be found here.
Harvesting of datasets
The European Data Portal's initial content was collected by harvesting national public data portals. The portal will progressively harvest additional data from regional, local and domain-specific portals.
Your portal must meet several technical requirements in order to be harvested by the European Data Portal. You can find a checklist of the expected technical features on this page.
If you need further details, you will find complete documentation of the process in this file.
License: an explicit and legally binding statement of rights, restrictions and obligations of recipients in relation to a specific dataset. Usually expressed through a written contract or through a unilateral statement from the rights holder(s), but it may also be expressed through legislation or other regulatory initiatives.
The European Data Portal harvests all datasets from national, regional and local portals without excluding certain datasets. That means we do not have any influence on the type of licence used, as the licence is provided by the source. However, we will continue to promote and recommend the use of open licences in the context of the European Data Portal project.
The European Data Portal harvests all datasets from national, regional and local portals without excluding certain datasets. That means that the portal does not have any influence on the file format used, as the file is provided by the source. However, we will continue to promote and recommend the use of open file formats in the context of the European Data Portal project.
The datasets stored in the portal need to be of an appropriate quality in terms of:
- DCAT-AP compliant mapping,
- Available distributions,
- Usage of machine readable distribution formats,
- Usage of known open source licenses.
The Metadata Quality Assurance (MQA) tool was developed to check the datasets against these quality indicators. The MQA runs as a periodic process in parallel to the harvesting. CKAN and Virtuoso are filled with metadata through the harvesting process. As CKAN cannot store DCAT-AP formatted datasets directly, the datasets are mapped into a JSON schema that is DCAT-AP compliant. The MQA uses this schema to check each dataset for DCAT-AP mapping compliance. If any compliance issue is detected, for instance a missing mandatory field, the dataset is considered not DCAT-AP compliant.
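As a rough illustration of the compliance check described above, the following Python sketch flags a dataset as non-compliant when a mandatory property is missing or empty. The field names (`title`, `description`) and the dictionary layout are assumptions made for this example; they are not the actual JSON schema used by the MQA.

```python
# Hypothetical set of mandatory properties; the real MQA schema may differ.
MANDATORY_FIELDS = ("title", "description")


def find_violations(dataset: dict) -> list[str]:
    """Return the mandatory fields that are missing or empty in a dataset record."""
    violations = []
    for field in MANDATORY_FIELDS:
        value = dataset.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            violations.append(field)
    return violations


def is_compliant(dataset: dict) -> bool:
    """A dataset counts as compliant only if no mandatory field is missing."""
    return not find_violations(dataset)


if __name__ == "__main__":
    sample = {"title": "Air quality measurements", "description": ""}
    print(find_violations(sample))
    print(is_compliant(sample))
```

Collecting the violated fields, rather than returning only a yes/no answer, is what would allow a statistic such as "Top Violation Occurrences" to be derived from the same pass.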
The MQA uses the CKAN API to collect information about all harvested catalogues. It runs through all CKAN catalogues in parallel while collecting the information required for the quality checks. During this process, several checks are performed for each dataset. The results are stored in the MQA database and published via the MQA page on the portal or as downloadable sheets and PDF documents. Downloadable MQA documents are only updated after an MQA run has finished. One run takes a couple of days because the MQA checks the availability of each distribution of each dataset. Checking a single distribution may take several seconds; with almost 800,000 datasets and 2 to 50 distributions per dataset, this adds up.
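A back-of-the-envelope estimate helps to see why a run takes days. The sketch below uses the figures from the text (about 800,000 datasets with 2 to 50 distributions each); the duration of a single check and the number of parallel workers are assumed values chosen for illustration, not measurements of the MQA.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_run_days(datasets: int, distributions_per_dataset: int,
                      seconds_per_check: float, parallel_workers: int) -> float:
    """Estimate the wall-clock days needed to check every distribution once."""
    total_checks = datasets * distributions_per_dataset
    total_seconds = total_checks * seconds_per_check / parallel_workers
    return total_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumed: 3 seconds per check and 50 parallel workers.
    for per_dataset in (2, 50):
        days = estimate_run_days(800_000, per_dataset, 3.0, 50)
        print(f"{per_dataset} distributions per dataset: {days:.1f} days")
```

Even at the lower bound of 2 distributions per dataset, 800,000 datasets amount to 1.6 million availability checks, so the total duration is dominated by the number of distributions rather than by any single slow check.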
The MQA presents its results in two views:
- The landing page called the "Global Dashboard". This view shows aggregated results for the entire EDP portal, i.e. showing the quality details for all catalogues.
- The second view "Catalogue Dashboard". This view allows you to select a specific catalogue for which you want to display the quality details.
The current quality indicators include the following:
- Distribution Statistics
  - Accessible Distributions
  - Error Status Codes
  - DownloadURL existence
  - Top 20 catalogues with most accessible distributions (*)
  - Ratio machine readable datasets
  - Most used distribution formats
  - Top 20 catalogues mostly using common machine-readable datasets (*)
- Dataset Compliance Statistics
  - Top Violation Occurrences
  - Compliant Datasets
  - Top 20 catalogues with most DCAT-AP compliant datasets (*)
- Dataset Licence Usage
  - Ratio known to unknown licences
  - Most used licences
  - Top 20 catalogues with most datasets of known licences
(*) The Top 20 indicators are only available for the Global Dashboard View.
Most results of the MQA are presented in charts (pie charts, bar charts). If you need further information about a chart, you can click on the "i" icon in the upper right corner of the chart for additional help. Some charts have the label "?" on the x-axis. This indicates an aggregation of unknown or unset entries in the data. For instance, this occurs when a chart shows the most used distribution formats and no format is provided for some distributions.
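The "?" label can be pictured as a catch-all bucket used when counting values. The following Python sketch counts distribution formats and groups missing or empty values under "?". The `format` key and the upper-casing of format names are assumptions made for this example, not details of the MQA implementation.

```python
from collections import Counter

UNKNOWN_LABEL = "?"


def count_formats(distributions: list[dict]) -> Counter:
    """Count distribution formats, grouping missing or empty values under '?'."""
    counts = Counter()
    for distribution in distributions:
        fmt = (distribution.get("format") or "").strip()
        counts[fmt.upper() if fmt else UNKNOWN_LABEL] += 1
    return counts


if __name__ == "__main__":
    sample = [
        {"format": "CSV"},
        {"format": "csv"},
        {"format": None},
        {},
        {"format": "JSON"},
    ]
    print(count_formats(sample).most_common())
```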
The visualization tool depends on the files provided by the source. It might happen that the format is not accepted or that the files are corrupted at the source. The European Data Portal has no influence on the datasets from the harvested portals.
The map search enables you to find datasets from a specific region. You only have to type in the region or draw a bounding box on the map. Results are only displayed for datasets that have geo information stored.
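The behaviour described above, where only datasets with stored geo information can match, can be sketched as a bounding-box intersection test. This is a simplified illustration and not the portal's implementation: the `extent` key, the `BoundingBox` type and the coordinate order are assumptions, and boxes crossing the antimeridian are not handled.

```python
from typing import NamedTuple, Optional


class BoundingBox(NamedTuple):
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def intersects(a: BoundingBox, b: BoundingBox) -> bool:
    """True if the two boxes overlap or touch (ignores antimeridian wrap-around)."""
    return (a.min_lon <= b.max_lon and b.min_lon <= a.max_lon
            and a.min_lat <= b.max_lat and b.min_lat <= a.max_lat)


def search_by_region(datasets: list[dict], region: BoundingBox) -> list[dict]:
    """Keep only datasets whose stored extent intersects the drawn region."""
    hits = []
    for dataset in datasets:
        extent: Optional[BoundingBox] = dataset.get("extent")
        # Datasets without geo information can never match a map search.
        if extent is not None and intersects(extent, region):
            hits.append(dataset)
    return hits
```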
API and integration
API access URLs can be found here:
CKAN: https://www.europeandataportal.eu/data/search/ (Note: Only 'Read-Only' actions are currently supported for this API)
Use Cases: https://www.europeandataportal.eu/en/export-use-cases
API documentation is available for the following systems:
Download MQA reports: https://www.europeandataportal.eu/api/mqa/reports/report/en/pdf
How metadata is checked by the MQA: https://www.europeandataportal.eu/mqa/methodology
MQA API: https://www.europeandataportal.eu/api/mqa/cache/
SHACL metadata validation: https://www.europeandataportal.eu/shacl/
Read access to triple store data content: https://www.europeandataportal.eu/data/api/
Integration on any external application with the European Data Portal can only happen at the dataset level by using the existing CKAN-API, via which you may "extract/query" datasets.
For example, a call to "https://www.europeandataportal.eu/data/search/ckan/package_search" returns the list of dataset categories in JSON format.
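A call of this kind could be made from Python as sketched below, using only the standard library. The `q` and `rows` parameters and the `success` / `result.results` response layout follow the standard CKAN action API; whether the portal's endpoint accepts exactly these parameters and returns exactly this layout is an assumption, so inspect a real response before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.europeandataportal.eu/data/search/ckan/package_search"


def build_search_url(query: str, rows: int = 10) -> str:
    """Build a package_search URL; `q` and `rows` are standard CKAN parameters."""
    return f"{BASE_URL}?{urlencode({'q': query, 'rows': rows})}"


def extract_titles(payload: dict) -> list:
    """Pull the `title` of each entry out of a CKAN-style response body."""
    if not payload.get("success"):
        return []
    return [item.get("title", "") for item in payload["result"]["results"]]


def search_titles(query: str, rows: int = 10) -> list:
    """Perform the HTTP request (requires network access; read-only call)."""
    with urlopen(build_search_url(query, rows), timeout=30) as response:
        return extract_titles(json.load(response))
```

Keeping the URL construction and the response parsing separate from the network call makes both parts easy to check offline, for example against a saved response.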
You can also use the SPARQL-Manager and run customized SPARQL queries against the Virtuoso RDF triple store that is synchronized with the CKAN repository.
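For programmatic access, a SPARQL query can also be sent over HTTP as defined by the SPARQL 1.1 Protocol. The sketch below is illustrative only: `SPARQL_ENDPOINT` is a placeholder and not the portal's real endpoint, and the query assumes that datasets are stored as `dcat:Dataset` resources with a `dct:title`, as DCAT-AP suggests.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder: substitute the endpoint documented by the portal.
SPARQL_ENDPOINT = "https://example.org/sparql"

QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?dataset ?title
WHERE { ?dataset a dcat:Dataset ; dct:title ?title . }
LIMIT 10
"""


def build_request(query: str, endpoint: str = SPARQL_ENDPOINT) -> Request:
    """Build a GET request that asks for SPARQL JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(payload: dict) -> list:
    """Flatten SPARQL JSON results into plain {variable: value} dictionaries."""
    return [{name: cell["value"] for name, cell in binding.items()}
            for binding in payload["results"]["bindings"]]


def run_query(query: str = QUERY) -> list:
    """Send the query (requires network access and a real endpoint)."""
    with urlopen(build_request(query), timeout=30) as response:
        return rows_from_results(json.load(response))
```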
Looking for information and data on the portal
Datasets can be exported to WMS, WFS, KML, HTML, Excel, PDF, XML, JSON, RSS, GML, SVG, SHP, PNG, JPEG, GIF, RDF-XML, RDF-Turtle, RDF-N3, OCTET STREAM, JSON-LD and Atom.
The European Data Portal contains different search engines which have different behaviors:
- In the page header, the 'Search' link and the 'Portal Search' box provide results only from editorial content (articles, main menu content etc.).
Example: If you are looking for articles or reports on the portal about the economic value of Open Data, you can search here for "economic value".
- In the middle of the homepage, the 'Search Datasets' box provides results from all the datasets. You can refine the results by applying faceted filters in the left sidebar.
Example: If you are looking for datasets about pollution, you can search here for "pollution".
- On the 'Datasets' page, the 'Search datasets' box provides results from all the datasets. You can refine the results by applying faceted filters in the left sidebar.
Example: If you are looking for datasets in a format suitable for data analysis, you can search here for "pollution" and then select the format "CSV" in the left sidebar.
- On the 'Catalogues' page, the 'Search catalogues' box provides only results related to catalogues (e.g. search for catalogues by country name, catalogue name etc.).
Example: If you are looking for datasets from Spain, you can search here for "spain".