European Data Portal

Collecting data

With all preconditions in place, the data can be collected. Where should you start? What is relevant? This chapter goes into the details of identifying and collecting data. Collecting data can be approached from two angles: quick wins and thorough data management. Which approach fits best depends largely on the infrastructural choices within your organisation. Look at your strategy: where will the data be managed? Will it be managed centrally or processed in multiple units?

General collection process

Create a process for collecting data that suits your situation. The steps described below may help you design it: mapping the currently available datasets, prioritising the datasets, practising the process, choosing the types of data to publish and defining publishing categories.

Figure: Different steps of the collection process

 

Map the currently available datasets

Start your Open Data initiative by creating an overview of the data that is already available in your organisation. This is a quick win: the data is there and you will have a list of all data and where it is managed. Ask your data-managing colleagues to help you with this. 

Prioritise the datasets

Not all datasets are relevant to publish right away. To prioritise your list, look at the following criteria: 

  • Can it be published (legally, politically, and organisationally)?
  • Is it of sufficient quality to be published without thorough manipulation?
  • Does it still need cleaning, anonymising or conversion to a suitable format?
  • Does it belong to one of the high-value topics? 

The datasets that meet these requirements should be prioritised: these are your quick wins. With this list, you have a complete overview of the data and you have identified what can be published, what cannot and what should be published first. Later on, you can choose to prioritise by demand or other parameters.
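The criteria above can be sketched as a simple filter over the dataset overview. This is a minimal illustration only: the `DatasetEntry` record, its field names and the sample entries are hypothetical and do not come from any portal schema.

```python
from dataclasses import dataclass


# Hypothetical record for one entry in the dataset overview.
# Field names are illustrative assumptions, one per criterion.
@dataclass
class DatasetEntry:
    title: str
    publishable: bool        # cleared legally, politically and organisationally
    quality_ok: bool         # usable without thorough manipulation
    needs_preparation: bool  # still needs cleaning, anonymising or reformatting
    high_value: bool         # belongs to one of the high-value topics


def is_quick_win(entry: DatasetEntry) -> bool:
    """A quick win meets every criterion in the list."""
    return (
        entry.publishable
        and entry.quality_ok
        and not entry.needs_preparation
        and entry.high_value
    )


def prioritise(entries):
    """Split the overview into publish-first, publish-later and cannot-publish."""
    plan = {"publish_first": [], "publish_later": [], "cannot_publish": []}
    for entry in entries:
        if not entry.publishable:
            plan["cannot_publish"].append(entry.title)
        elif is_quick_win(entry):
            plan["publish_first"].append(entry.title)
        else:
            plan["publish_later"].append(entry.title)
    return plan


# Made-up sample entries, for illustration only.
overview = [
    DatasetEntry("Public transport timetables", True, True, False, True),
    DatasetEntry("Staff sick-leave records", False, True, True, False),
    DatasetEntry("Local budget", True, False, True, True),
]
plan = prioritise(overview)
print(plan)
```

Datasets in the "publish later" group can be re-ordered afterwards by demand or other parameters.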

Recommendation: Identify your quick wins and start with those. Practise your collection process on them first to become familiar with it. You will then be able to improve it and to answer questions about it.

Practice

Go through the collection process. What steps did you take? Who is responsible for the next part of the process? What is the standard process for collecting and prioritising data? What will happen if new data is created or a dataset is updated? Learn by doing and document the steps.

The Irish Best Practice Handbook describes a best practice for auditing your existing data and suggests how to become aware of the datasets that are currently available within the organisation. See the Best Practice Statement below.

  1. Each public body should carry out a data audit of the data they currently manage
  2. Information on each data set should be recorded according to the standard metadata format of the national Open Data portal. Information for each data set should include:
    1. Potential for release as Open Data (governed by an ‘Open by Default’ principle) 
    2. Legal information 
    3. Organisational information 
    4. Technical information 
    5. Value assessment: 
      1. Datasets recognised as ‘high-value’ datasets should be released proactively 
      2. Data audit results should be made available on the national Open Data portal to enable users to request the publication (demand-driven publication)
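One way to record the audit information listed above is a small record type with one field per item. The sketch below is an assumption for illustration: the actual metadata format of a national Open Data portal is not given here, so the `AuditRecord` class, its field names and the sample entries are hypothetical.

```python
from dataclasses import dataclass, fields


# Hypothetical audit record; one text field per item in the statement.
@dataclass
class AuditRecord:
    title: str
    release_potential: str = ""  # potential for release as Open Data
    legal: str = ""              # legal information
    organisational: str = ""     # organisational information
    technical: str = ""          # technical information
    high_value: bool = False     # outcome of the value assessment


def missing_information(record):
    """Return the names of text fields that are still empty."""
    return [f.name for f in fields(record) if getattr(record, f.name) == ""]


def proactive_release_candidates(records):
    """High-value datasets whose audit information is complete."""
    return [
        r.title
        for r in records
        if r.high_value and not missing_information(r)
    ]


# Made-up sample records, for illustration only.
audit = [
    AuditRecord("Election results", "open by default", "no restrictions",
                "Electoral office", "CSV, updated per election", True),
    AuditRecord("Road sensor readings", "open by default", "",
                "Traffic unit", "", True),
]
print(missing_information(audit[1]))
print(proactive_release_candidates(audit))
```

Incomplete records show at a glance which part of the audit still has to be filled in before a dataset can be listed on the portal.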

Types of data to publish: the G8 Open Data Charter

Data is created, stored and distributed across a large variety of topics and categories, but not all types of data are equally relevant. In 2013, the G8 came together to discuss governmental transparency, innovation and accountability. This discussion led to the “G8 Open Data Charter” (Cabinet Office, 2013): a set of principles for transparent government and for opening up data, covering both its quality and quantity.

Part of this charter contains useful guidelines on topics, data types and formats, and quality. It identifies the following 14 categories as high-value:

| Data category (alphabetical order) | Example datasets |
|---|---|
| Companies | Company/business register |
| Crime and Justice | Crime statistics, safety |
| Earth observation | Meteorological/weather, agriculture, forestry, fishing, and hunting |
| Education | List of schools; performance of schools, digital skills |
| Energy and Environment | Pollution levels, energy consumption |
| Finance and contracts | Transaction spend, contracts let, call for tender, future tenders, local budget, national budget (planned and spent) |
| Geospatial | Topography, postcodes, national maps, local maps |
| Global Development | Aid, food security, extractives, land |
| Government Accountability and Democracy | Government contact points, election results, legislation and statutes, salaries (pay scales), hospitality/gifts |
| Health | Prescription data, performance data |
| Science and Research | Genome data, research and educational activity, experiment results |
| Statistics | National Statistics, Census, infrastructure, wealth, skills |
| Social mobility and welfare | Housing, health insurance and unemployment benefits |
| Transport and Infrastructure | Public transport timetables, access points broadband penetration |

The G8 High Value categories of data

The purpose of this list of categories is to ensure that Data Holders focus on the release of the right and most relevant types of data. This does not mean that other categories of data cannot be published. The list above gives an indication of the topics that should have the highest priority, as these datasets are indicated as datasets with the highest potential value. 

Publishing categories

Besides the categories used for collecting data, there are publishing categories. You might want to publish your data under a different set of categories than the G8 list; other portals have created their own sets of categories as well. Consider your data: under which categories are you going to publish it?

To give you an idea of how to categorise your data, consider the example below. Look at the categorisation as a re-user would: imagine that you are looking for a single file and ask how you would navigate to it. Both a large and a small number of categories have pros and cons. Find out what suits your purpose best and which structure you, as a re-user, would find most logical. The one requirement is that the categorisation is automated through metadata. The figure below shows the categorisation used by the European Data Portal, which is linked to the DCAT Application Profile detailed in the next sections of this chapter.

Figure: DCAT icons for categories

Example from http://www.europeandataportal.eu/
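Automating categorisation through metadata can be as simple as grouping datasets by a category value stored in each metadata record. The sketch below assumes every record is a dictionary with hypothetical `title` and `theme` keys; the key names and sample records are illustrative, and the category labels are borrowed from the G8 list rather than from any portal's own scheme.

```python
from collections import defaultdict


def group_by_category(datasets, default="Uncategorised"):
    """Group dataset titles by the category named in their metadata."""
    groups = defaultdict(list)
    for record in datasets:
        # Records without a theme end up in a visible fallback group,
        # so missing metadata is easy to spot and repair.
        category = record.get("theme") or default
        groups[category].append(record["title"])
    return dict(groups)


# Made-up metadata records, for illustration only.
catalogue = [
    {"title": "Public transport timetables", "theme": "Transport and Infrastructure"},
    {"title": "Pollution levels", "theme": "Energy and Environment"},
    {"title": "National maps", "theme": "Geospatial"},
    {"title": "Energy consumption", "theme": "Energy and Environment"},
    {"title": "Untitled upload"},
]
navigation = group_by_category(catalogue)
for category, titles in sorted(navigation.items()):
    print(category, "->", titles)
```

Because the grouping is derived from the metadata alone, changing the set of publishing categories only requires updating the metadata, not the navigation itself.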

 
