Frequently Asked Questions

Using the portal

How can I contribute to the portal?

Give us feedback, e.g., suggest new functionalities you would like to see on the portal
Suggest a portal to be harvested by us
Share your story on how you make use of open data

Where could I find more information and documentation?

In the user manual
In the Studies section of the portal.

Can I use the data for business?

All data is available for free and can be used, for example, for business creation. For more details, see the legal notice.

Why do the translations of content contain errors?

The portal uses the European Commission machine translation service eTranslation. This service covers all the EU's official languages. Machine translations can contain errors. If you spot any, let us know.

More information about the service is available on the European Commission website.

Which format can I download the data in?

Datasets can be exported to WMS, WFS, KML, HTML, Excel, PDF, XML, JSON, RSS, GML, SVG, SHP, PNG, JPEG, GIF, RDF-XML, RDF-Turtle, RDF-N3, OCTET STREAM, JSON-LD, and Atom.

Why does the portal harvest datasets published in proprietary file formats?

The portal collects all datasets from the portals it harvests, without excluding any formats. Data is collected in the file format provided by the source.

What is a licence?

A licence is an explicit and legally binding statement of recipients’ rights, restrictions and obligations in relation to a specific dataset. Usually, it is expressed through a written contract or through a unilateral statement from the rights holder(s), but it may also be expressed through legislation or other regulatory initiatives.

Why does the portal harvest datasets published under a non-commercial licence?

Data.europa.eu collects all datasets from the portals it harvests, without excluding datasets under non-commercial licences. Data is collected with the type of licence provided by the source.

What is analysed by the Metadata Quality Assurance tool?

The datasets stored in the portal need to be of an appropriate quality in terms of:

DCAT-AP-compliant mapping
Available distributions
Usage of machine-readable distribution formats
Usage of known open-source licences.

To check the datasets for these quality indicators, the Metadata Quality Assurance (MQA) tool was developed. The MQA runs as a periodic process in parallel with the harvesting. CKAN and Virtuoso are filled with metadata through the harvesting process. As CKAN cannot store DCAT-AP-formatted datasets directly, the datasets are mapped into a JSON (JavaScript Object Notation) schema that is DCAT-AP compliant. The MQA uses this schema to check each dataset for DCAT-AP mapping compliance. If any compliance issue is detected, for instance a missing mandatory field, the dataset is considered not DCAT-AP compliant.
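
The mandatory-field check described above can be sketched as follows. This is only an illustration, not the MQA implementation: the key names `title` and `description` are hypothetical stand-ins for the mandatory DCAT-AP dataset properties (title and description), and the portal's actual JSON schema may name and nest them differently.

```python
# Hypothetical key names; the portal's real JSON schema is not shown in this FAQ.
MANDATORY_FIELDS = ("title", "description")


def find_missing_fields(dataset, mandatory=MANDATORY_FIELDS):
    """Return the mandatory fields that are absent or empty in a dataset record."""
    missing = []
    for field in mandatory:
        value = dataset.get(field)
        # Treat None, empty strings and empty containers as "missing".
        if value is None or value == "" or value == [] or value == {}:
            missing.append(field)
    return missing


def is_compliant(dataset, mandatory=MANDATORY_FIELDS):
    """A dataset counts as compliant only if no mandatory field is missing."""
    return not find_missing_fields(dataset, mandatory)
```

Under these assumed key names, a record with a title but no description would be reported as missing `description` and therefore counted as not compliant.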

The MQA presents its results in two views.

The landing page or ‘Global Dashboard’. This view shows aggregated results for the entire service, i.e. the quality details for all catalogues.
The second view or ‘Catalogue Dashboard’. This view allows you to select a specific catalogue for which you want to display the quality details.

The current quality indicators include the following.

Distribution statistics:
1. accessible distributions,
2. error status codes,
3. download URL existence,
4. top 20 catalogues with most accessible distributions,
5. ratio of machine-readable datasets,
6. most-used distribution formats,
7. top 20 catalogues mostly using common machine-readable datasets.
Dataset compliance statistics:
1. top violation occurrences,
2. compliant datasets,
3. top 20 catalogues with most DCAT-AP-compliant datasets.
Dataset licence usage:
1. ratio of known to unknown licences,
2. most-used licences,
3. top 20 catalogues with most datasets with known licences.

Why do some datasets generate an error in the visualisation tool?

The visualisation tool uses the files as provided by the source. It is possible that the tool does not accept the provided file format or that the files are corrupted at the source. The portal has no influence on the source files.

How can I find a dataset that contains geographic coordinates?

The map search enables users to find datasets containing geographic information for a specific region. To use it, either type in the name of the region or draw a bounding box on the map.

Collecting datasets

How can my portal be harvested?

You can find all the information on this page.

Reusing data

Which APIs are available and where can I find information about them?

API access URLs:

Search: https://data.europa.eu/api/hub/search/ (Note: Only 'Read-Only' actions are currently supported for this API)
SPARQL: https://data.europa.eu/sparql
Registry: https://data.europa.eu/api/hub/repo/
Use Cases: https://data.europa.eu/en/export-use-cases

API Documentation is available for the following systems:

Search: https://data.europa.eu/api/hub/search/
SPARQL: https://www.w3.org/TR/rdf-sparql-query/
Download MQA reports: https://data.europa.eu/api/mqa/reporter/index.html
How metadata is checked by the MQA: https://data.europa.eu/mqa/methodology
MQA API: https://data.europa.eu/api/mqa/cache/index.html
SHACL metadata validation: https://data.europa.eu/api/mqa/shacl/index.html
SHACL metadata validation UI: https://data.europa.eu/mqa/shacl-validator-ui/
Read access to triple store data content: https://data.europa.eu/api/hub/repo/index.html

Can I integrate an external application with the portal?

Integration of any external application with the portal is only possible at the dataset level, using the existing CKAN API, through which you can extract or query datasets.

For example, the API call "https://data.europa.eu/api/hub/search/#tag/Ckan" returns the list of datasets categorised in JSON format.
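
As a rough illustration of a read-only query from an external application, the sketch below builds a request URL for the Search API and fetches the JSON response. Only the base URL comes from this FAQ. The `search` path segment and the `q`, `filter` and `limit` parameters are assumptions and should be checked against the API documentation before use.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Base URL as listed under "API access URLs".
SEARCH_BASE = "https://data.europa.eu/api/hub/search/"


def build_search_url(query, limit=10, base=SEARCH_BASE):
    """Build a read-only search URL.

    The "search" path and the parameter names are assumptions, not confirmed
    by this FAQ.
    """
    params = urlencode({"q": query, "filter": "dataset", "limit": limit})
    return f"{base}search?{params}"


def fetch_json(url, timeout=30):
    """Send a GET request and decode the JSON body of the response."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

An application would call `fetch_json(build_search_url("air quality"))` and then read the dataset list from the returned dictionary. The structure of that dictionary is not described here, so inspect the response before relying on specific keys.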

You can also use the SPARQL-Manager and run customised SPARQL queries against the Virtuoso RDF triple store that is synchronised with the CKAN repository.
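
A customised query can also be sent to the SPARQL endpoint directly over HTTP. The sketch below assumes the endpoint follows the standard SPARQL protocol (a `query` parameter on a GET request, with results returned as SPARQL JSON). The example query, which lists a few datasets with their titles, is purely illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint as listed under "API access URLs".
SPARQL_ENDPOINT = "https://data.europa.eu/sparql"

# Illustrative query: five datasets and their titles.
EXAMPLE_QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
} LIMIT 5
"""


def build_sparql_request(query, endpoint=SPARQL_ENDPOINT):
    """Build a GET request that asks for results in SPARQL JSON format."""
    params = urlencode({"query": query})
    return Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


def run_query(query, endpoint=SPARQL_ENDPOINT, timeout=30):
    """Execute the query and return the list of result bindings."""
    with urlopen(build_sparql_request(query, endpoint), timeout=timeout) as response:
        results = json.load(response)
    return results["results"]["bindings"]
```

Calling `run_query(EXAMPLE_QUERY)` would return one dictionary per result row, keyed by the variable names `dataset` and `title`, provided the endpoint honours the requested result format.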