European Data Portal

Cleaning Open Data

Session Overview: This module introduces why clean data matters for projects and innovation. It examines some of the most common errors found in open datasets and their effect on working with the data, and it introduces OpenRefine and how it can improve data management by publishers.

Session number: 11

Expected participants: Relevant to all, special relevance to those working on data projects

Type: Training

Length: 2-3 hours

Exercises: Yes

Web based exercises: Yes

What to bring: Slides, Web-Connected Laptop

Session Flow:

  1. Understanding data cleaning - The facilitator should outline the importance of cleaning Open Data, both for publishers preparing to open their data and for users accessing data for their projects. The facilitator should guide the participants through the main types of errors found in Open Data, such as mixed numerical scales, duplicated records, redundant data and spelling errors.
  2. Tools for data cleaning - The facilitator should introduce key tools for data cleaning, including OpenRefine, Excel and any others relevant to the audience. The facilitator should highlight the key features of each tool and explain to participants how to choose the right tool for the problem.
  3. Practical data cleaning - The participants will undertake a data cleaning exercise using OpenRefine (exercise and datasets below). The facilitator should first play the OpenRefine video (below), then support the participants as they work through the exercise steps.
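
To make the error types in step 1 concrete before the OpenRefine exercise, a facilitator might show a small sketch like the one below. It is only an illustration in plain Python, not part of the course material: the sample rows, the field names and the helper functions `parse_population` and `clean` are all hypothetical. It covers inconsistent spacing and capitalisation, mixed numerical scales and duplicated records. Genuine spelling variants usually need a clustering approach such as the one OpenRefine offers, and are not handled here.

```python
# Hypothetical sample rows; field names and values are invented for illustration.
rows = [
    {"city": "Paris ", "population": "2.1 million"},
    {"city": "paris", "population": "2100000"},
    {"city": "Lyon", "population": "516 thousand"},
    {"city": "Lyon", "population": "516 thousand"},
]

# Scale words assumed to appear in the data (an assumption for this sketch).
SCALES = {"thousand": 1_000, "million": 1_000_000}


def parse_population(text):
    """Convert values such as '2.1 million' or '2100000' to a plain integer."""
    parts = text.strip().lower().split()
    number = float(parts[0])
    scale = SCALES[parts[1]] if len(parts) > 1 else 1
    return round(number * scale)


def clean(records):
    """Normalise names and scales, then drop records that become identical."""
    seen = set()
    cleaned = []
    for row in records:
        # Trim whitespace and unify capitalisation so 'Paris ' and 'paris' match.
        key = (row["city"].strip().title(), parse_population(row["population"]))
        if key not in seen:
            seen.add(key)
            cleaned.append({"city": key[0], "population": key[1]})
    return cleaned


cleaned_rows = clean(rows)
```

Running the sketch reduces the four sample rows to one record each for Paris and Lyon, with both populations expressed on the same numerical scale. Normalising before removing duplicates matters: rows that differ only in formatting are not recognised as duplicates until they have been brought into the same form.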

Resources:

Companion eLearning Modules:

When running this session, we recommend that participants complete the following eLearning module before attending:

How to clean your data

Completing the module will help your learners develop a shared understanding of the material before the course and allow you to focus in greater depth on the topics of most interest to them.