“Data unification is a major challenge in any data analytics pipeline”, according to Tamr. However, a recent whitepaper from the company indicates that enterprises are now applying three generations of artificial intelligence (AI) in order to address the issue.
In the mid-to-late 1990s, enterprise adoption of data warehouses gave rise to a need for tools to load disparate data sources into them. As a result, “a generation of Extract, Transform and Load (ETL) tools came into existence.”
Early ETL tools focused on the process of data extraction from source systems. In addition to this, these tools helped to transform data elements and load this data into the warehouse’s predetermined data model.
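As a rough illustration of the extract, transform and load steps, here is a minimal Python sketch using only the standard library. The source records, field names and the `sales` table are hypothetical and are not taken from the whitepaper or from any particular ETL product.

```python
import sqlite3

# Hypothetical source records; in practice these would come from separate systems.
SOURCE_ROWS = [
    {"cust_name": " Acme Ltd ", "amount": "1,200.50"},
    {"cust_name": "globex", "amount": "300"},
]

def extract():
    """Extract: read raw records from a source system (here, an in-memory list)."""
    return list(SOURCE_ROWS)

def transform(row):
    """Transform: apply hand-written rules so the row fits the warehouse model."""
    name = row["cust_name"].strip().title()          # tidy whitespace and casing
    amount = float(row["amount"].replace(",", ""))   # "1,200.50" -> 1200.5
    return (name, amount)

def load(conn, rows):
    """Load: write transformed rows into a predetermined table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

def run_etl():
    """Run the three steps against an in-memory database and return its contents."""
    conn = sqlite3.connect(":memory:")
    load(conn, [transform(r) for r in extract()])
    return conn.execute(
        "SELECT customer, amount FROM sales ORDER BY customer"
    ).fetchall()
```

Every new source format needs another hand-written `transform` rule, which is the manual scripting burden that grows with data variety.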
According to Tamr, this rules-based approach was an early form of AI. However, as “the volume and variety of data continued to expand, time-consuming manual scripting began impeding organisations’ ability to quickly derive critical insights needed to drive business decisions.”
In Generation 2, ETL products evolved to include more sophisticated rules. While this was an advancement on Generation 1 capabilities, Generation 2 was still rudimentary by today’s AI standards.
In this period, organisations used AI-based rule technology to solve a variety of data unification problems. In fact, several vendors still provide Master Data Management (MDM) tools.
However, these “human-generated deterministic rules” are unable to scale to meet “enterprise data unification requirements” today. As a result, enterprises have turned to a Generation 3 unification product – Tamr Unify.
Tamr Unify uses machine learning (ML) to solve problems involving significant data variety, an area where rule-based systems fall short. In Tamr’s example, a customer applies its 500 manually created rules as training data to construct a classification model.
The model then classifies the remaining 18 million transactions. In effect, Tamr uses ML to solve all of the “problems attacked by 2nd generation systems with rules.”
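The idea of using rules as training data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it swaps in a small naive Bayes text classifier written from scratch, and the rules, categories and transactions are invented. It does not represent how Tamr Unify is implemented.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-written rules: keyword -> category.
RULES = {"laptop": "IT", "server": "IT", "flight": "Travel", "hotel": "Travel"}

def label_with_rules(text):
    """Return the category of the first matching rule, or None if no rule fires."""
    for word in text.lower().split():
        if word in RULES:
            return RULES[word]
    return None

def train(labelled):
    """Count classes and words per class for a multinomial naive Bayes model."""
    class_counts, word_counts, vocab = Counter(), defaultdict(Counter), set()
    for text, label in labelled:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def classify(model, text):
    """Pick the class with the highest log-probability (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical transactions; the last two match no rule.
TRANSACTIONS = [
    "dell laptop purchase",
    "rack server purchase dell",
    "flight to boston",
    "hotel in boston",
    "dell monitor",
    "taxi in boston",
]

def unify(transactions):
    """Label what the rules cover, then let the model classify the remainder."""
    labelled = [(t, label_with_rules(t)) for t in transactions]
    model = train([(t, l) for t, l in labelled if l is not None])
    return {t: (l if l is not None else classify(model, t)) for t, l in labelled}
```

Here the rules label four transactions directly, and the model trained on those four assigns categories to the two that no rule matched, based on words they share with the labelled records.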
An ML model produces good results if the training data is “reasonably accurate” and covers the data set appropriately. Moreover, every Tamr customer uses humans to check a sample of ML output to maximise accuracy.
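A human spot check of this kind might look like the following Python sketch. The function names, the fixed random seed and the data shapes are assumptions made for illustration; the whitepaper does not describe Tamr's review workflow at this level of detail.

```python
import random

def sample_for_review(predictions, sample_size, seed=0):
    """Pick a reproducible random sample of record ids for a human to check."""
    rng = random.Random(seed)
    ids = sorted(predictions)
    return rng.sample(ids, min(sample_size, len(ids)))

def estimated_accuracy(predictions, reviewed):
    """Share of reviewed records where the reviewer agreed with the model."""
    if not reviewed:
        raise ValueError("no reviewed records")
    agree = sum(1 for rid, label in reviewed.items() if predictions[rid] == label)
    return agree / len(reviewed)
```

Records where the reviewer disagrees with the model can be corrected and fed back as additional training data, which is one common way to improve coverage over time.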
While there are three distinct generations of AI, the future of data unification requires a scalable solution. As AI is “applied to a broader range of data unification needs”, Generation 3 systems will continue to grow stronger and more comprehensive.