AI and data: a crucial combination
Artificial Intelligence (AI) technology has the capacity to extract deeper insights from datasets than other techniques. Examples of AI are: speech recognition, natural language, processing, chatbots or voice bots. To get AI applications to work, big sets of high quality (open) data are necessary. But what requirements does this data have to meet? To answer this question, we need to look more closely at what requirements data need to have to enable successful AI applications.
Use of Open Data for AI
Access to (open) data can unlock the potential of AI applications. An example of an AI project worth noting shows how big amounts of Open Data are used at a police agency that focuses on crime prevention. Based on occurrences of crimes (data) and their frequency, the algorithm developed patterns to identify ‘hotspots’, areas where certain crimes are likely to occur in the future. These ‘hotspots’ are defined geographic areas within which the algorithm can make specific predictions as to the type of crime that could take place and when it is most likely to occur. Underlying these predictions are certain assumptions and patterns, such as that criminals often operate within the same area for long periods of time. This project was very successful, reducing burglaries by 33% and violent crimes by 21% in the city where it was applied.
The ingredients to let AI to function properly
AI systems need Open Data to function. To function properly, significant 1) data volume, 2) data variety and 3) data veracity (the truthfulness of data) are crucial. In traditional data analysis, when bad data is discovered, one can exclude the bad data and start over. However, this way of data cleansing cannot be done on a larger scale. Bad data cannot be detected or ‘pulled out’ of the system that easily. AI techniques draw conclusions from large masses of data and at some point, it becomes impossible to determine on which data elements these predictions are based. Often, when bad data is detected, the whole learning process starts again from the beginning, which is time- and cost-intensive.
For this reason, besides the V’s ‘volume’ and ‘variety’, ‘veracity’ is needed to detect whether the data is of good quality. Veracity refers to the truthfulness of the data. There are several questions to answer to find out whether the data is ‘truthful’:
Together with the three ‘V’s, Open Data can play a pivotal role in achieving benefits through the use of AI. Open Data is data that is made available by (public) organisations, businesses and individuals for anyone to access, use and share. When data is not open, data cannot be re-used for other purposes, such as AI. Access to open data will ‘unlock’ the potential of data-hungry machine AI applications.
Figure 1 Open Data and AI
How to achieve data quality?
Data quality requires an integral approach to make sure that people, processes and systems within organisations meet the demands to achieve continuous data quality. Three factors are important:
With the right software, people and rules in place, organisations can ensure that the organisational data system detects errors and delivers valuable new insights by making use of AI.
The future of AI?
Data is essential for AI as algorithms in these systems need large quantities of high-quality data to perform properly and to develop further by ‘learning’. The continuous promotion of data quality within organisations when using (open) data for AI is therefore essential to gain reliable insights. An important focus for (public) organisations is to make data openly available where possible to let other organisations (or people within their own organisation) use the data. Increasing access to high-quality data volumes is key to unleashing the potential of AI and to allow innovation to flourish.