European Data Portal

Frequently Asked Questions

General questions regarding the Portal

How can I contribute to the portal?

You can suggest functionalities you would like to see on the portal via this feedback form. You can also submit the URL of your public sector Open Data portal via this page so that we can harvest it. Feel free to share your story of how you use Open Data here.

What is the difference between the Open Data Europe Portal (ODP, open-data.europa.eu/fr/data) and the European Data Portal (EDP, europeandataportal.eu)?

The ODP is the Open Data portal of the European Union. It contains datasets that are collected and published by the European Institutions. The EDP is a European portal that harvests metadata from public sector portals throughout Europe, so it focuses on data made available by European countries. The EDP also harvests metadata from the ODP. The European Commission is currently exploring how to bring the two portals closer together.

Why do the translations of content contain errors?

The European Data Portal pilots MT@EC, a European Commission machine translation service that covers all of the EU's official languages. The EDP harvests metadata in the original language of the source portal and then translates it. These translations are produced automatically by machine and may therefore contain errors.

Is there any documentation in PDF format?

You can download reports and various materials from the Library.

Can I use the data for business?

The European Union has adopted legislation to foster the re-use of Open (Government) Data. All the data are available for free and can be used for business creation.

Where can I find the User Manual of the European Data Portal?

The European Data Portal’s User Manual can be found here.

Harvesting of datasets

How can my portal be harvested?

The European Data Portal's initial content was collected by harvesting national public data portals. The portal will progressively harvest additional data from regional, local and domain-specific portals.

Your portal must meet several technical requirements in order to be harvested by the European Data Portal. You can find a checklist of the expected technical features on this page.

If you need further details, you will find complete documentation of the process in this file.

What is a licence?

A licence is an explicit and legally binding statement of the rights, restrictions and obligations of recipients in relation to a specific dataset. It is usually expressed through a written contract or a unilateral statement from the rights holder(s), but it may also be expressed through legislation or other regulatory initiatives.

Why does the portal harvest datasets published under a non-commercial licence?

The European Data Portal harvests all datasets from national, regional and local portals without excluding any of them. That means we do not have any influence on the type of licence used, as the licence is provided by the source. However, we will continue to promote and recommend the use of open licences in the context of the European Data Portal project.

Why does the portal harvest datasets published in proprietary file formats?

The European Data Portal harvests all datasets from national, regional and local portals without excluding any of them. That means the portal does not have any influence on the file format used, as the file is provided by the source. However, we will continue to promote and recommend the use of open file formats in the context of the European Data Portal project.

What is analysed by the Metadata Quality Assurance (MQA) Tool?

The datasets stored in the portal need to be of an appropriate quality in terms of:

  • DCAT-AP compliant mapping,
  • Available distributions,
  • Usage of machine-readable distribution formats,
  • Usage of known open source licences.

The Metadata Quality Assurance (MQA) tool was developed to check the datasets against these quality indicators. The MQA runs as a periodic process in parallel to the harvesting. CKAN and Virtuoso are filled with metadata through the harvesting process. As CKAN cannot store DCAT-AP formatted datasets directly, the datasets are mapped into a JSON schema that is DCAT-AP compliant. The MQA uses this schema to check each dataset for DCAT-AP mapping compliance. If any compliance issue is detected, for instance a missing mandatory field, the dataset is considered not DCAT-AP compliant.
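The mandatory-field check described above can be sketched as follows. This is only an illustration, not the MQA's actual code: the field names in `MANDATORY_FIELDS` and the helper functions are hypothetical, and the real JSON schema used by the MQA may define different or additional mandatory fields.

```python
# Hypothetical list of mandatory dataset fields (an assumption for this sketch);
# the schema used by the MQA may differ.
MANDATORY_FIELDS = ("title", "description")


def find_violations(dataset: dict) -> list[str]:
    """Return the names of mandatory fields that are missing or empty."""
    return [
        field
        for field in MANDATORY_FIELDS
        if not str(dataset.get(field) or "").strip()
    ]


def is_compliant(dataset: dict) -> bool:
    """A dataset counts as compliant only if no violation is found."""
    return not find_violations(dataset)


# Two made-up dataset records: one complete, one without a description.
complete = {"title": "Air quality 2016", "description": "Hourly measurements."}
incomplete = {"title": "Air quality 2016"}
```

Here `is_compliant(incomplete)` is false because the description is missing, which mirrors the rule that a single missing mandatory field makes a dataset non-compliant.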

The MQA uses the CKAN API to collect information about all harvested catalogues. It runs through all CKAN catalogues in parallel, collecting the information required for the quality checks and performing several checks on each dataset. The results are stored in the MQA database and published on the MQA page of the portal, or as downloadable sheets and PDF documents. Downloadable MQA documents are only updated after an MQA run has finished. One run takes a couple of days because the MQA checks the availability of every distribution of every dataset. Checking a single distribution may take several seconds, and the portal holds almost 800,000 datasets with 2 to 50 distributions each.
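A rough back-of-the-envelope estimate shows why a run takes days. The function below is a sketch only: the dataset count and the range of distributions per dataset come from the text above, while the seconds per check and the number of parallel checks are assumed values chosen for illustration, not figures published for the MQA.

```python
SECONDS_PER_DAY = 86_400


def estimate_run_days(
    datasets: int,
    distributions_per_dataset: float,
    seconds_per_check: float,
    parallel_checks: int,
) -> float:
    """Estimate the duration of one availability run, in days."""
    total_checks = datasets * distributions_per_dataset
    total_seconds = total_checks * seconds_per_check / parallel_checks
    return total_seconds / SECONDS_PER_DAY


# Assumed inputs: 2 distributions per dataset (the lower bound from the text),
# 2 seconds per check and 20 checks running in parallel.
lower_bound_days = estimate_run_days(800_000, 2, 2.0, 20)
```

Under these assumptions the lower bound is a little under two days. Higher distribution counts or slower source servers increase the estimate proportionally.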

The MQA presents its results in two views:

  • The landing page, called the "Global Dashboard". This view shows aggregated results for the entire EDP portal, i.e. the quality details for all catalogues.
  • The second view, called the "Catalogue Dashboard". This view lets you select a specific catalogue and display its quality details.

The current quality indicators include the following:

  1. Distribution Statistics
    1. Accessible Distributions
    2. Error Status Codes
    3. DownloadURL existence
    4. Top 20 catalogues with most accessible distributions (*)
    5. Ratio machine readable datasets
    6. Most used distribution formats
    7. Top 20 catalogues mostly using common machine-readable datasets (*)
  2. Dataset Compliance Statistics
    1. Top Violation Occurrences
    2. Compliant Datasets
    3. Top 20 catalogues with most DCAT-AP compliant datasets (*)
  3. Dataset Licence Usage
    1. Ratio known to unknown licences
    2. Most used licences
    3. Top 20 catalogues with most datasets of known licences (*)

(*) The Top 20 indicators are only available for the Global Dashboard View.

 

Most results of the MQA are presented in charts (pie charts, bar charts). If you need further information about a chart, you can click on the "i" icon in the upper right corner of the chart for additional help. Some charts have the label "?" on the x-axis. This indicates an aggregation of unknown or unset entities in the data, for instance when a chart shows the most used distribution formats and no format is provided for some distributions.
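The "?" aggregation can be illustrated with a small sketch. It assumes each distribution is a dictionary with an optional `format` key; that record shape, the `count_formats` helper and the sample data are hypothetical and are not taken from the MQA itself.

```python
from collections import Counter

# Label used for distributions whose format is missing or empty.
UNKNOWN_LABEL = "?"


def count_formats(distributions: list[dict]) -> Counter:
    """Count distribution formats, grouping missing values under '?'."""
    counts = Counter()
    for dist in distributions:
        # Normalise case and whitespace so that "csv" and "CSV" are counted together.
        fmt = (dist.get("format") or "").strip().upper()
        counts[fmt or UNKNOWN_LABEL] += 1
    return counts


# Made-up sample: two CSV files, one JSON file and two without a format.
sample = [
    {"format": "CSV"},
    {"format": "csv"},
    {"format": "JSON"},
    {"format": ""},
    {},
]
format_counts = count_formats(sample)
```

In this sample the two distributions without a usable format end up in the "?" bucket, which is the bar a chart would show with that label.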

Dataset visualisation

Why do some datasets generate errors in the visualisation tool?

The visualisation tool depends on the files provided by the source. The format may not be supported, or the files may be corrupted at the source. The European Data Portal has no influence on the datasets from the harvested portals.

How can I find datasets that contain geo coordinates?

The map search lets you find datasets from a specific region. You only have to type in the region or draw a bounding box on the map. Results are only displayed for datasets that have geo information stored.

API and integration

Where can I find the API access URLs and documentation?

API access URLs can be found here:

CKAN: http://www.europeandataportal.eu/data/en/api/3
SPARQL: http://www.europeandataportal.eu/sparql

API documentation is available for the following systems:
CKAN: http://docs.ckan.org/en/ckan-2.5.2/api/
SPARQL: https://www.w3.org/TR/rdf-sparql-query/

Can I integrate the European Data Portal into an external application?

An external application can integrate with the European Data Portal only at the dataset level, by using the existing CKAN API to extract or query datasets.

For example, the API call "http://www.europeandataportal.eu/data/api/3/action/group_list" returns the list of dataset categories in JSON format.
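A minimal sketch of making this call from Python, using only the standard library. The helper names are illustrative. The response envelope with the `success` and `result` keys is the standard CKAN action API format, but whether the URL is still reachable is an assumption.

```python
import json
from urllib.request import urlopen

# Base URL taken from the example call above.
API_BASE = "http://www.europeandataportal.eu/data/api/3/action"


def action_url(action: str) -> str:
    """Build the URL of a CKAN action such as 'group_list'."""
    return f"{API_BASE}/{action}"


def parse_result(body: str):
    """Unwrap a CKAN action response and return its 'result' value."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN call failed: {payload.get('error')}")
    return payload["result"]


def fetch_groups(timeout: float = 10.0):
    """Fetch the list of dataset categories (performs a network request)."""
    with urlopen(action_url("group_list"), timeout=timeout) as response:
        return parse_result(response.read().decode("utf-8"))
```

Calling `fetch_groups()` performs the request. `parse_result` can also be used on its own with a response body saved earlier.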

You can also use the SPARQL-Manager and run customized SPARQL queries against the Virtuoso RDF triple store that is synchronized with the CKAN repository.
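As a sketch, a SPARQL query can also be sent to the endpoint over HTTP following the standard SPARQL protocol (a `query` parameter on a GET request). The example query assumes the triple store exposes datasets as `dcat:Dataset` resources with a `dct:title`; this is plausible for DCAT-AP metadata, but it is an assumption, not something this FAQ confirms.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint listed in the "API and integration" section.
SPARQL_ENDPOINT = "http://www.europeandataportal.eu/sparql"

# Example query (assumed data model): the titles of ten datasets.
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?title
WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
}
LIMIT 10
"""


def build_request(query: str) -> Request:
    """Build a GET request that asks for SPARQL results in JSON."""
    url = f"{SPARQL_ENDPOINT}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(QUERY)
```

Passing `request` to `urllib.request.urlopen` would execute the query. The block only builds the request so that it can be inspected without network access.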

Looking for information and data on the portal

In which format can I download the data?

Datasets can be exported to WMS, WFS, KML, HTML, Excel, PDF, XML, JSON, RSS, GML, SVG, SHP, PNG, JPEG, GIF, RDF-XML, RDF-Turtle, RDF-N3, OCTET STREAM, JSON-LD and Atom.

What is the difference between the search engines?

The European Data Portal contains different search engines which have different behaviors:

  • In the page header, the 'Search' link and the 'Portal Search' box provide results only from editorial content (articles, main menu content etc.).
    Example: If you are looking for articles or reports on the portal about the economic value of Open Data, you can search here for "economic value".
  • In the middle of the homepage, the 'Search Datasets' box provides results from all the datasets. You can refine the results by applying faceted filters in the left sidebar.
    Example: If you are looking for datasets about pollution, you can search here for "pollution".
  • On the 'Datasets' page, the 'Search datasets' box provides results from all the datasets. You can refine the results by applying faceted filters in the left sidebar.
    Example: If you are looking for datasets in a format suitable for data analysis, you can search here for "pollution" and then select the format "CSV" in the left sidebar.
  • On the 'Catalogues' page, the 'Search catalogues' provides only results related to catalogues (e.g. search for catalogues by country name, catalogue name etc.).
    Example: If you are looking for catalogues from Spain, you can search here for "spain".