Wipro has been granted a patent for a method and system that extracts information from input documents containing multi-format information. The process involves creating and realigning an HTML document, classifying its content with a machine learning model, generating a hierarchy configuration file, and extracting the relevant data attributes. GlobalData's report on Wipro gives a 360-degree view of the company, including its patenting strategy.

According to GlobalData's company profile on Wipro, hybrid cloud management was a key innovation area identified from its patents. Wipro's grant share as of July 2024 was 61%. Grant share is the ratio of the number of granted patents to the total number of patents.

Method for extracting information from multi-format documents

Source: United States Patent and Trademark Office (USPTO). Credit: Wipro Ltd

The patent US12032651B2 outlines a method and system for extracting information from an input document that contains multi-format information. The input document consists of at least two merged documents in different formats, and the process begins by creating a Hypertext Markup Language (HTML) document from it. The HTML document is then realigned based on the spatial arrangement of its words, and a pretrained machine learning model determines a unique document identifier (ID) for each merged document. Next, a hierarchy configuration file is generated; it contains the free-flowing text and its associated document IDs, categorized into headings, sub-headings, and other text structures. The hierarchy file can be split at the document level, allowing targeted data extraction from each merged document.
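The realignment, categorization, and document-level splitting steps can be illustrated with a small Python sketch. It is not Wipro's implementation: it assumes each word arrives with hypothetical bounding-box coordinates (`x`, `y`), a font size, and a `doc_id` already assigned by some pretrained classifier, and it swaps in a simple font-size threshold for the patent's unspecified line-categorization logic. The `Word` class and the functions `realign`, `categorize`, `build_hierarchy`, and `split_by_document` are illustrative names only.

```python
import json
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float      # left edge of the word's bounding box (hypothetical input)
    y: float      # top edge of the word's bounding box
    size: float   # font size in points
    doc_id: str   # assumed to come from a pretrained document classifier


def realign(words, line_tol=2.0):
    """Rebuild reading order: sort words top-to-bottom, group words whose
    y coordinate lies within line_tol of the line's first word, then sort
    each line left-to-right."""
    ordered = sorted(words, key=lambda word: (word.y, word.x))
    lines, current = [], []
    for w in ordered:
        if current and abs(w.y - current[0].y) > line_tol:
            lines.append(sorted(current, key=lambda word: word.x))
            current = []
        current.append(w)
    if current:
        lines.append(sorted(current, key=lambda word: word.x))
    return lines


def categorize(line, heading_size=16.0, subheading_size=13.0):
    """Hypothetical rule: label a line by its largest font size."""
    size = max(w.size for w in line)
    if size >= heading_size:
        return "heading"
    if size >= subheading_size:
        return "sub-heading"
    return "text"


def build_hierarchy(words):
    """One entry per realigned line, tagged with its document ID and category."""
    return [
        {
            "doc_id": line[0].doc_id,
            "category": categorize(line),
            "text": " ".join(w.text for w in line),
        }
        for line in realign(words)
    ]


def split_by_document(entries):
    """Split the hierarchy at the document level: one entry list per doc ID."""
    by_doc = {}
    for entry in entries:
        by_doc.setdefault(entry["doc_id"], []).append(entry)
    return by_doc


# Two merged documents ("INV" and "CTR"); words are deliberately out of order.
words = [
    Word("Invoice", 10, 5, 18, "INV"),
    Word("$120", 60, 41, 10, "INV"),
    Word("Total:", 10, 40, 10, "INV"),
    Word("Contract", 10, 100, 18, "CTR"),
    Word("Terms", 10, 130, 14, "CTR"),
]
split = split_by_document(build_hierarchy(words))
print(json.dumps(split, indent=2))
```

Grouping words into lines with a vertical tolerance before sorting them horizontally is one common way to recover reading order from positioned words; the tolerance and font-size thresholds here are arbitrary and would need tuning on real documents.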

The system handles both text-based and scanned documents. For scanned documents, it converts each page into an image and performs pre-processing to validate the information. The number of columns in the HTML document is determined through a series of calculations and clustering techniques, and the document is realigned by sorting words according to their properties. The hierarchy configuration file is generated by analyzing text features and categorizing each line accordingly. Finally, information is extracted from the hierarchy file by orchestrating multiple data extractors, which may include machine learning-based and rule-based extractors, and the extracted data is compiled into a single output document.
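The final orchestration step can be sketched in the same spirit. This is a hypothetical illustration, not the patented system: `total_amount_rule` is a regex-based rule extractor, `title_from_heading` merely stands in for a machine learning-based extractor (a real one would call a trained model), and `orchestrate` runs whichever extractors are registered for each document ID and merges their results into one output document. The per-line `doc_id`/`category`/`text` dictionaries are an assumed layout for the hierarchy file.

```python
import json
import re
from typing import Callable, Dict, List

Entry = Dict[str, str]  # one hierarchy line: doc_id, category, text
Extractor = Callable[[List[Entry]], Dict[str, str]]


def total_amount_rule(entries: List[Entry]) -> Dict[str, str]:
    """Rule-based extractor: pull the amount from a 'Total: $<amount>' line."""
    for entry in entries:
        match = re.search(r"Total:\s*\$?([\d,.]+)", entry["text"])
        if match:
            return {"total": match.group(1)}
    return {}


def title_from_heading(entries: List[Entry]) -> Dict[str, str]:
    """Placeholder for an ML-based extractor; a real one would query a
    trained model. Here it just returns the first heading."""
    for entry in entries:
        if entry["category"] == "heading":
            return {"title": entry["text"]}
    return {}


def orchestrate(split_hierarchy: Dict[str, List[Entry]],
                extractors: Dict[str, List[Extractor]]) -> Dict[str, Dict[str, str]]:
    """Run the extractors registered for each document ID and merge
    their results into a single output document."""
    output: Dict[str, Dict[str, str]] = {}
    for doc_id, entries in split_hierarchy.items():
        record: Dict[str, str] = {}
        for extractor in extractors.get(doc_id, []):
            record.update(extractor(entries))
        output[doc_id] = record
    return output


split_hierarchy = {
    "INV": [
        {"doc_id": "INV", "category": "heading", "text": "Invoice"},
        {"doc_id": "INV", "category": "text", "text": "Total: $120"},
    ],
    "CTR": [
        {"doc_id": "CTR", "category": "heading", "text": "Contract"},
    ],
}
registry = {"INV": [title_from_heading, total_amount_rule], "CTR": [title_from_heading]}
result = orchestrate(split_hierarchy, registry)
print(json.dumps(result))
# {"INV": {"title": "Invoice", "total": "120"}, "CTR": {"title": "Contract"}}
```

Registering extractors per document ID is one way to exploit the document-level split: each merged document passes only through the extractors relevant to its type.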

GlobalData, the leading provider of industry intelligence, provided the underlying data, research, and analysis used to produce this article.

GlobalData Patent Analytics tracks bibliographic data, legal events data, point in time patent ownerships, and backward and forward citations from global patenting offices. Textual analysis and official patent classifications are used to group patents into key thematic areas and link them to specific companies across the world’s largest industries.