Databricks has been granted a patent for a system that generates dataflow graphs for ETL data processing pipelines defined by SQL queries. The system determines the dependencies between the pipeline's operations, enforces expectations (conditions that input data must meet), validates the dataflow graph, and processes the operations accordingly, with the aim of making data processing efficient and accurate. GlobalData's report on Databricks gives a 360-degree view of the company, including its patenting strategy. Buy the report here.


According to GlobalData’s company profile on Databricks, was a key innovation area identified from patents. Databricks' grant share as of May 2024 was 37%. Grant share is the ratio of the number of grants to the total number of patents.

ETL data processing pipeline system with enforced expectations

Source: United States Patent and Trademark Office (USPTO). Credit: Databricks Inc

A newly granted patent (Publication Number: US12008040B2) describes a system comprising a computer processor and a non-transitory computer-readable storage medium storing instructions that, when executed, cause the processor to perform a series of actions. These include receiving instructions from a client device to generate an extract, transform, and load (ETL) data processing pipeline defined by a set of operations, creating an in-memory dataflow graph for the pipeline, testing the dataflow graph with sample data, enforcing expectations on the operations, and returning the processing results to the client device. The system orders the operations correctly according to their dependencies, validates the dataflow graph, and handles conditions on input datasets, for example by stopping processing or quarantining data depending on whether the data meets a condition.
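As an illustration of dependency ordering and graph validation, the following Python sketch builds a dataflow graph from a list of operations and checks it without reading any data. It is not Databricks' implementation: the `Operation` class, `build_dataflow_graph`, and `validate_and_order` are hypothetical names, and each operation's input datasets are declared by hand here, whereas the patented system determines dependencies itself.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class Operation:
    """Hypothetical pipeline step: a SQL query plus the datasets it reads."""
    name: str
    sql: str  # kept for illustration; this sketch does not parse it
    inputs: list[str] = field(default_factory=list)  # declared by hand (assumption)


def build_dataflow_graph(operations: list[Operation]) -> dict[str, set[str]]:
    """Map each operation name to the names of the datasets it depends on."""
    return {op.name: set(op.inputs) for op in operations}


def validate_and_order(graph: dict[str, set[str]], sources: set[str]) -> list[str]:
    """Check the graph using dataset names only, then return an execution order.

    Raises ValueError if an operation reads a dataset that is neither a known
    source nor produced by another operation, or if the graph contains a cycle.
    """
    known = sources | set(graph)
    for name, deps in graph.items():
        missing = deps - known
        if missing:
            raise ValueError(f"{name} reads undefined dataset(s): {sorted(missing)}")
    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # CycleError stores the offending cycle as its second argument.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc
    # Source datasets also appear in the order; keep only operations to run.
    return [name for name in order if name in graph]


ops = [
    Operation("clean_orders", "SELECT * FROM raw_orders WHERE id IS NOT NULL", ["raw_orders"]),
    Operation("daily_totals", "SELECT day, SUM(amount) AS total FROM clean_orders GROUP BY day", ["clean_orders"]),
]
print(validate_and_order(build_dataflow_graph(ops), sources={"raw_orders"}))
# ['clean_orders', 'daily_totals']
```

Using the standard library's `graphlib.TopologicalSorter` keeps the ordering logic short, and because the checks inspect only dataset names, an undefined input or a cycle is reported before any table is materialized.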

The patent also includes additional claims covering the insertion of expectation nodes into the dataflow graph, checking conditions on input datasets, counting records, quarantining data, and determining the percentage of records that do not meet a condition. The system can also verify the dataflow graph for errors without materializing the input and output datasets. Besides the system, the patent covers a corresponding method and a computer program product embodied in a non-transitory computer-readable medium, all aimed at managing ETL data processing pipelines efficiently, ensuring data quality, and returning accurate results to clients. Together, these features are intended to streamline data processing operations, strengthen data validation, and improve data management within cloud platforms.
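The expectation handling in these claims can be pictured as a node placed between an input dataset and the operation that consumes it. The Python sketch below is a hypothetical illustration, not the patented design: `expectation_node`, `ExpectationResult`, `ExpectationViolation`, and the `"stop"`/`"quarantine"` action names are assumptions, and records are plain in-memory dicts rather than distributed datasets. The node counts the records that fail a condition, reports the failing percentage, and either sets those records aside or halts the pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable


class ExpectationViolation(Exception):
    """Raised when an expectation configured to stop the pipeline is not met."""


@dataclass
class ExpectationResult:
    passed: list[dict] = field(default_factory=list)  # records sent downstream
    failed: list[dict] = field(default_factory=list)  # records set aside (quarantined)

    @property
    def total(self) -> int:
        return len(self.passed) + len(self.failed)

    @property
    def failed_pct(self) -> float:
        """Percentage of input records that did not meet the condition."""
        return 100.0 * len(self.failed) / self.total if self.total else 0.0


def expectation_node(
    records: list[dict],
    condition: Callable[[dict], bool],
    action: str = "quarantine",
) -> ExpectationResult:
    """Check `condition` on every record, then quarantine or stop on failures."""
    if action not in ("stop", "quarantine"):
        raise ValueError(f"unknown action: {action!r}")
    result = ExpectationResult()
    for record in records:
        # Route each record by whether it satisfies the expectation.
        (result.passed if condition(record) else result.failed).append(record)
    if action == "stop" and result.failed:
        raise ExpectationViolation(
            f"{len(result.failed)} of {result.total} records "
            f"({result.failed_pct:.1f}%) failed the expectation"
        )
    return result


def is_valid(row: dict) -> bool:
    """Hypothetical condition: an order needs an id and a non-negative amount."""
    return row["id"] is not None and row["amount"] >= 0


rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": -5},
    {"id": None, "amount": 3},
    {"id": 4, "amount": 7},
]
res = expectation_node(rows, is_valid)
print(len(res.passed), len(res.failed), res.failed_pct)  # 2 2 50.0
```

With `action="stop"`, the same input raises `ExpectationViolation` instead, because two of the four records fail the condition.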

To learn more about GlobalData’s detailed insights on Databricks, buy the report here.


GlobalData, the leading provider of industry intelligence, provided the underlying data, research, and analysis used to produce this article.

GlobalData Patent Analytics tracks bibliographic data, legal events data, point in time patent ownerships, and backward and forward citations from global patenting offices. Textual analysis and official patent classifications are used to group patents into key thematic areas and link them to specific companies across the world’s largest industries.