Over the next few months, all six reports in the “Sustainability of (Open) Data Portals Infrastructure” series will be summarised using featured highlights. This article focuses on the third report: “Data Reuse: A Method for Transforming Principles into Practice”. The report presents a new approach to assessing the re-use of data automatically. It works through an example that guides portals through the crucial aspects of such an assessment and of increasing engagement from customers.
The Challenge of an Automated Assessment of Data Re-use
An increasing amount of data is published openly on the web with the aim of fostering re-use. Despite numerous efforts, portal owners and data publishers do not routinely measure re-use, even though re-usability is one of the four FAIR principles, a compilation of high-level best practices for making data findable, accessible, interoperable and re-usable. While the FAIR metrics provide exemplary metrics for the FAIR principles, measuring FAIRness is not an established practice. A variety of best practices and guidelines (thoroughly explained in the report) detail data sharing and re-use principles. However, the automated assessment of re-use remains a substantial challenge.
The first part of the report, “Measuring Use and Impacts of Portals”, suggests several solutions to track and assess data re-use automatically, including pixel tracking, dataset citations and enforcing log-ins. However, each of these methods has its own limitations. It is therefore vital to consider an alternative assessment approach that focuses more on the re-use side of open data than on the publishing side and that includes automation support. This third part of the report presents such an approach and introduces a method that helps portal owners understand what makes a dataset re-usable, using engagement data they can track themselves.
Method & Results
The method consists of the following steps, to be carried out by teams managing open data portals:
- Scope the assessment exercise.
- Define re-use metrics. These depend on the capabilities of your portal and the underlying technical infrastructure.
- Collect re-use metrics (or proxies). For this, you need technical capabilities, which may be built into the publishing software being used, or aggregated metrics derived from lower-level system logs.
- Define re-use indicators. These need to be measurable and will be used as features in the prediction model.
- Analyse their distribution for the top-reused group of datasets.
- Use a combination of those features to build a statistical model to predict re-usability.
- Derive recommendations for datasets and publishing processes.
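To make the middle steps of the list more concrete, the following Python sketch shows one possible way to combine engagement metrics into a re-use proxy, select the top-reused group and compare how often an indicator occurs in that group versus the rest. It is only an illustration under stated assumptions: the sample datasets, the field names (`downloads`, `views`, `has_readme`, `machine_readable`), the weights and the 50% cut-off are all hypothetical and are not taken from the report.

```python
from statistics import mean

# Hypothetical engagement metrics and dataset attributes (illustrative only).
datasets = [
    {"id": "a", "downloads": 120, "views": 900,  "has_readme": True,  "machine_readable": True},
    {"id": "b", "downloads": 5,   "views": 40,   "has_readme": False, "machine_readable": True},
    {"id": "c", "downloads": 80,  "views": 300,  "has_readme": True,  "machine_readable": False},
    {"id": "d", "downloads": 2,   "views": 15,   "has_readme": False, "machine_readable": False},
    {"id": "e", "downloads": 200, "views": 1500, "has_readme": True,  "machine_readable": True},
    {"id": "f", "downloads": 10,  "views": 60,   "has_readme": False, "machine_readable": True},
]

# Assumed weights for the re-use proxy; a real portal would choose its own.
WEIGHTS = {"downloads": 1.0, "views": 0.1}


def reuse_score(dataset, weights):
    """Combine the tracked engagement metrics into a single proxy score."""
    return sum(weight * dataset[metric] for metric, weight in weights.items())


def split_top_reused(items, weights, fraction):
    """Rank datasets by proxy score and split them into (top group, rest)."""
    ranked = sorted(items, key=lambda d: reuse_score(d, weights), reverse=True)
    cutoff = max(1, round(len(ranked) * fraction))
    return ranked[:cutoff], ranked[cutoff:]


def indicator_share(group, indicator):
    """Share of datasets in the group for which a boolean indicator holds."""
    return mean(1.0 if d[indicator] else 0.0 for d in group)


top, rest = split_top_reused(datasets, WEIGHTS, fraction=0.5)

for indicator in ("has_readme", "machine_readable"):
    print(indicator,
          "top:", round(indicator_share(top, indicator), 2),
          "rest:", round(indicator_share(rest, indicator), 2))
```

In this made-up sample, documentation (`has_readme`) is far more common in the top-reused group than in the rest, while machine readability is equally common in both. An indicator that separates the two groups in this way would be a candidate feature for the prediction step.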
In the report, an extensive example is provided on how to apply the method, showing that it is possible to identify a basket of engagement metrics and predict the re-usability of a dataset based on attributes such as its structure, the way it was published and its documentation. In addition to the example, the report provides recommendations for portal owners to augment their publishing and portal design practice to support and enhance those features of a dataset that are quantifiably linked to higher engagement from users.
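As a rough illustration of what predicting re-usability from dataset attributes could look like, the sketch below fits a small logistic regression with plain gradient descent. This summary does not state which statistical model the report uses, so logistic regression is an assumption made here for the example. The two features, the training rows and the labels are likewise hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Estimated probability that a dataset belongs to the top-reused group."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def fit_logistic(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = predict(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical features per dataset: [has_readme, machine_readable].
X = [[1, 1], [0, 1], [1, 0], [0, 0], [1, 1], [0, 1]]
# Hypothetical label: 1 if the dataset was in the top-reused group.
y = [1, 0, 1, 0, 1, 0]

weights, bias = fit_logistic(X, y)

documented = predict(weights, bias, [1, 1])
undocumented = predict(weights, bias, [0, 1])
print("documented:", round(documented, 2), "undocumented:", round(undocumented, 2))
```

With real portal data, the learned weights would give a first indication of which attributes are associated with higher engagement. A sample this small only demonstrates the mechanics and supports no conclusions.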
Even with current technologies, this approach can be valuable to inform:
- System designers on building functionalities to capture information automatically.
- Publishers in supplying certain information as metadata.
- User experience designers on how to build the interaction process between dataset re-users and the interface of a data portal.
- Portal owners on their portal development.
- Open data users in the wider ecosystem to help them identify the datasets that may be most useful to work with.
As stated, this article focused on a few key findings of the report. For more information on developing a method for the automated assessment of your data re-use, explore the full report “Data Re-use: A Method for Transforming Principles into Practice” on the EDP website. Moreover, keep an eye out for the EDP team’s fourth featured highlight on 30 September 2020, which will focus on “Funding Portals: A Business Case Approach to Funding Model Longevity”.
For more information or examples on open data, explore the European Data Portal’s (EDP) news archive and featured highlight section. Aware of open data examples or stories? Share them with us via mail, and follow us on Twitter, Facebook or LinkedIn to stay up to date!