Genpact has been granted a patent for a method and system that trains a machine-learning (ML) model on labeled entities, uses it to predict labels for unlabeled entities, updates the labeled entity set based on confidence scores, and stores the resulting model. The method allows separate ML models to be trained for different document sources. GlobalData’s report on Genpact gives a 360-degree view of the company including its patenting strategy.

Machine-learning system training using labeled entities and clusters

A recently granted patent (Publication Number: US11886820B2) outlines a method for training a machine-learning (ML) system. A seed set of labeled entities drawn from a cluster of documents is used to train an ML model. The trained model then predicts labels for unlabeled entities, a subset of those predictions is selected based on confidence scores and added to the labeled entity set, and the process repeats until a termination condition is met. The method allows a separate ML model to be created for each cluster of documents, which is intended to improve the efficiency and accuracy of training.
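The loop described above resembles what is commonly called self-training. The following is a minimal sketch of that general pattern in Python, not the patented implementation. A toy nearest-centroid classifier stands in for the real model, and the function names (`train`, `predict`, `self_train`), the feature tuples, the 0.8 confidence threshold, and the iteration cap are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train(labeled):
    """Fit a toy nearest-centroid model: one mean feature vector per label."""
    sums, counts = {}, defaultdict(int)
    for x, y in labeled:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(model, x):
    """Return (label, confidence). Confidence is the winning label's share
    of exp(-distance) across all centroids (an assumed scoring rule)."""
    scores = {y: math.exp(-math.dist(x, c)) for y, c in model.items()}
    total = sum(scores.values())
    label = max(scores, key=scores.get)
    return label, scores[label] / total


def self_train(seed, unlabeled, threshold=0.8, max_iters=10):
    """Grow the labeled set with confident predictions and retrain each round."""
    labeled = list(seed)      # starts as the manually labeled seed set
    pool = list(unlabeled)    # entities still waiting for a label
    model = train(labeled)
    for _ in range(max_iters):
        if not pool:
            break
        preds = [(x, *predict(model, x)) for x in pool]
        confident = [(x, y) for x, y, conf in preds if conf >= threshold]
        if not confident:
            break             # nothing cleared the threshold, so stop early
        labeled.extend(confident)
        accepted = {x for x, _ in confident}
        pool = [x for x in pool if x not in accepted]
        model = train(labeled)
    return model, labeled, pool


# Hypothetical 2-D features for entities from one document cluster.
seed = [((0.0, 0.0), "A"), ((10.0, 10.0), "B")]
unlabeled = [(1.0, 1.0), (9.0, 9.0), (5.0, 5.0)]
model, labeled, remaining = self_train(seed, unlabeled)
```

In this toy run the two points near a seed are machine-labeled in the first round, while the point equidistant from both centroids never reaches the threshold and stays in the unlabeled pool. Training one model per document cluster would amount to calling `self_train` once per cluster with that cluster's seed set.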



Furthermore, the patent describes a system that implements the method, comprising a processor and memory storing instructions to carry out the training process. The system uses manually labeled entities from document clusters to train the ML module, predict labels for unlabeled entities, and iteratively update the labeled entity set. The module is described as a hybrid of a conditional random field (CRF) and a long short-term memory (LSTM) classifier, which is intended to improve prediction accuracy. The termination condition bounds the training process using criteria such as the number of iteration cycles, the size of the labeled entity set, and the size of the machine-labeled entity set. Overall, the patented method and system offer a structured approach to training ML systems for tasks involving document analysis and classification.
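To make the termination criteria concrete, here is a small Python sketch of how such a check might look. The class name `TerminationCriteria`, the function `should_stop`, the default limits, and the choice to stop as soon as any single limit is reached are all assumptions for illustration; the article does not say how the patent combines the criteria or what values it uses.

```python
from dataclasses import dataclass


@dataclass
class TerminationCriteria:
    """Hypothetical limits; the actual values are not given in the source."""
    max_iterations: int = 20
    max_labeled_size: int = 10_000
    max_machine_labeled_size: int = 8_000


def should_stop(criteria, iteration, labeled_size, machine_labeled_size):
    """Return True once any one limit is reached (an assumed combination rule)."""
    return (
        iteration >= criteria.max_iterations
        or labeled_size >= criteria.max_labeled_size
        or machine_labeled_size >= criteria.max_machine_labeled_size
    )


criteria = TerminationCriteria(max_iterations=5, max_labeled_size=100,
                               max_machine_labeled_size=80)
keep_going = not should_stop(criteria, iteration=2, labeled_size=40,
                             machine_labeled_size=30)
```

A check like this would typically be evaluated at the end of each training round, with the machine-labeled count tracked separately from the manually labeled seed entities.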

