Wipro has been granted a patent for a method and system that extracts information from multi-format input documents. The process involves creating and realigning an HTML document, classifying content using a machine learning model, generating a hierarchy configuration file, and extracting relevant data attributes. GlobalData’s report on Wipro gives a 360-degree view of the company including its patenting strategy.
According to GlobalData’s company profile on Wipro, hybrid cloud management was a key innovation area identified from patents. Wipro's grant share as of July 2024 was 61%. Grant share is the ratio of the number of grants to the total number of patents.
Method for extracting information from multi-format documents
The patent US12032651B2 outlines a method and system for extracting information from input documents that contain multi-format information. The process begins with the creation of a Hypertext Markup Language (HTML) document from the input document, which consists of at least two merged documents in different formats. The method involves realigning the HTML document based on the spatial arrangement of words and determining a unique document identifier (ID) for each merged document using a pretrained machine learning model. A hierarchy configuration file is then generated, which includes free-flowing text and associated document IDs, categorized into headings, sub-headings, and other text structures. This hierarchy file can be split at the document level, allowing for targeted data extraction.
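As a rough illustration of the hierarchy file described above, the Python sketch below tags each line with a document ID and a category, splits the lines at the document level, and nests text under headings and sub-headings. The `Line` record, the category labels, and the nested dictionary layout are assumptions made for this example; the patent summary does not specify the file format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Line:
    doc_id: str  # document ID, assumed to come from the pretrained classifier
    kind: str    # assumed labels: "heading", "sub-heading", or "text"
    text: str


def split_by_document(lines):
    """Group lines by document ID, preserving their original order."""
    groups = defaultdict(list)
    for line in lines:
        groups[line.doc_id].append(line)
    return dict(groups)


def nest(lines):
    """Nest text under the most recent heading and sub-heading."""
    sections = []
    for line in lines:
        if line.kind == "heading" or not sections:
            is_heading = line.kind == "heading"
            sections.append({
                "heading": line.text if is_heading else None,
                "text": [],
                "sub_headings": [],
            })
            if is_heading:
                continue
        section = sections[-1]
        if line.kind == "sub-heading":
            section["sub_headings"].append({"title": line.text, "text": []})
        elif section["sub_headings"]:
            section["sub_headings"][-1]["text"].append(line.text)
        else:
            section["text"].append(line.text)
    return sections


# Hypothetical merged input: an invoice followed by a contract.
merged = [
    Line("invoice", "heading", "Invoice"),
    Line("invoice", "text", "Total due: 100"),
    Line("contract", "heading", "Agreement"),
    Line("contract", "sub-heading", "Terms"),
    Line("contract", "text", "Payment within 30 days."),
]
hierarchy = {doc: nest(ls) for doc, ls in split_by_document(merged).items()}
```

Splitting before nesting keeps each merged document's structure independent, so an extractor can be pointed at a single document's section of the hierarchy.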
Further, the system is designed to handle both text and scanned documents. For scanned documents, it converts pages into images and performs pre-processing to validate the information. The number of columns in the HTML document is determined through a series of calculations and clustering techniques, while the realignment of the document is achieved by sorting words based on their properties. The hierarchy configuration file is generated by analyzing text features and categorizing lines accordingly. Finally, the extraction of information from this hierarchy file is facilitated by orchestrating various data extractors, which may include machine learning-based and rule-based extractors, to compile the extracted data into a cohesive output document.
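The column detection and realignment steps can be sketched in Python as follows. The summary does not say which calculations or clustering technique the patent uses, so this example substitutes a simple one-dimensional gap clustering of word extents; the `Word` fields and the `min_gutter` threshold are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x0: float  # left edge of the word
    x1: float  # right edge of the word
    y: float   # top edge of the word's line


def find_columns(words, min_gutter=20.0):
    """Merge horizontal word extents; a gap of at least min_gutter starts a new column."""
    columns = []
    for x0, x1 in sorted((w.x0, w.x1) for w in words):
        if columns and x0 - columns[-1][1] < min_gutter:
            columns[-1][1] = max(columns[-1][1], x1)
        else:
            columns.append([x0, x1])
    return columns


def realign(words, min_gutter=20.0):
    """Sort words into reading order: by column, then top to bottom, then left to right."""
    columns = find_columns(words, min_gutter)

    def column_index(word):
        for i, (left, right) in enumerate(columns):
            if left <= word.x0 <= right:
                return i
        return len(columns)

    return sorted(words, key=lambda w: (column_index(w), w.y, w.x0))


# Hypothetical two-column page, with words supplied out of order.
page = [
    Word("D", 100, 130, 0),
    Word("A", 0, 30, 0),
    Word("E", 100, 150, 10),
    Word("C", 0, 40, 10),
    Word("B", 35, 60, 0),
]
ordered = [w.text for w in realign(page)]
```

Sorting by column first is what keeps a two-column page from being read straight across the gutter; a real layout would also need tolerance for words on the same line whose vertical positions differ slightly.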

