In 1988, Sotheby’s sold a collection of 175 pottery cookie jars for $907,995. These included images of pigs, mice, goats, sheep, Humpty Dumpty, and a large panda. Estimated to sell for $75 each, the jars were eventually sold for over three times their evaluated worth. Ultimately, what valuators failed to consider in their initial estimations was that the collection belonged to famed artist Andy Warhol.

Provenance – the history of ownership and identity of an object – has a direct impact on an object’s value and authenticity. Items owned or used by important historical figures will, of course, carry more value at an auction than objects that do not have the same type of verification.

Though this example highlights the importance of authenticity, context, and accuracy in the art world, the lesson is also applicable to every other industry in the business landscape. For instance, a lack of visibility into the source or origin of data can result in poorly-informed business decisions, that in turn can escalate into costly rectification measures or legal disputes.

In the same way that provenance and context dictate the value of a piece of art, data lineage empowers organisations with a clear understanding of where their data comes from, who uses and accesses it, what it is being used for, and how it is defined. Each piece of the puzzle is solved through data lineage, which ultimately allows organisations to create valuable business insights. Without understanding any of these factors, businesses simply cannot trust their data. In the same way that the pottery cookie jar collection was incorrectly valued due to a lack of understanding around the collection’s provenance, businesses cannot generate valuable business insights without understanding their data’s origin and history.

Using data lineage to drive business insight

A recent survey conducted by O’Reilly on the state of data quality in 2020 has shown that a worryingly low 20% of organisations publish data provenance and data lineage policies in their internal guidelines. Of the remaining respondents who did not have these data lineage policies in place, a majority reported that they still had no plans to implement any data lineage practices. As more businesses become software-driven, and the amount of data collected continues to rise, this approach can result in both a chaotic business environment and a compliance nightmare.

Do you see an impact on recruitment in your company due to the Covid-19 pandemic?

View Results

Loading ... Loading ...

In the same research, the most common problem that organisations had with their data quality was that they had too many data sources and inconsistent data. As a result, technical teams are faced with the arduous and time-consuming challenge of manually defining, cataloguing, and mapping the relationships between their data. For instance, it may take up to an entire year for a developer to archive an organisation’s data flows, especially as those data flows are used, transferred, and are evolving in real-time. Consequently, this delays the business decision-making process, and means less time can be spent on strategic initiatives; creating inefficiency across the entire value chain.

Data lineage, however, helps alleviate any issues of data inconsistency by generating a graph that documents and traces the interdependencies of data held by the organisation; ultimately providing a map of data consistency, accuracy, and completeness. Additionally, the use of automation allows businesses to quickly extract data lineage information from multiple sources as it moves through their system in real-time; creating a live visualisation of how data is flowing throughout the organisation whilst simultaneously saving valuable time. This helps businesses overcome the challenge of centralising, and improving consistency across, their various data sources.

Leveraging automation in data lineage to bolster compliance

Data lineage not only improves efficiency and accelerates time to insight, it also helps organisations bolster their regulatory compliance. From GDPR to CCPA, data regulations are being passed by governments from countries across the world to safeguard both companies and customers from an increasingly data-driven world. Unfortunately, while businesses are making strides in their compliance efforts, they are still lagging behind in their regulatory compliance – in 2019 alone, major enterprises faced fines up to £183 million for mishandling customers’ data; harming not only their bottom lines, but also their corporate reputation and trustworthiness.

These examples of high-profile fines highlight the ramifications for even the most well-resourced, entrenched institutions that fail to provide a clear picture of their data and how they are working to alleviate any risks of data breaches. The larger a business grows, the more complex its data structures become – the amount of their data warehouses multiply, making it more challenging to contextualise and identify the right information required for data compliance audits.

Compliance auditing requires businesses to create various reports, where they must evaluate how data flows through their system, from source to destination. More importantly, the IT and business controls at each stage of the data flow must be made clear, to allow regulators to understand what measures are in place to secure data and determine whether it has been manipulated.

Automated data lineage allows businesses to continuously update their data flows, as well as rapidly extract the exact information needed for audits from various sources. This saves precious time and ensures teams are prepared with the most up-to-date, comprehensive information when data audits or reports are required.

Data lineage provides a roadmap to data consistency, completeness, timeliness, and conformity at every point in the data journey. Additionally, as the business landscape moves faster and regulatory requirements become increasingly demanding, integrating automation into data lineage is also a critical measure for responding to compliance audits in a timely manner. When businesses understand the origin, context, and definition of their data, they can trust their findings and confidently make informed business decisions. Equally importantly, it allows them to clearly present their decision-making logic and demonstrate control over their data when questioned by regulatory bodies.

3 Things That Will Change the World Today

Through strong data lineage policies and practices, businesses can take control of the proverbial auctioneer’s hammer, and, like a true art gallery owner, clearly define and explain the value of their collection of data.


Read more: The science of art: How cancer cells inspired researcher’s emotive paintings