Baidu has filed a patent for a data labeling method based on artificial intelligence. The method involves determining the samples involved in clustering, pre-clustering those samples to obtain class clusters, receiving labeling information for the class clusters, and determining a clustering result based on that labeling information. The method aims to improve data labeling efficiency in fields such as image recognition and natural language processing. GlobalData’s report on Baidu gives a 360-degree view of the company, including its patenting strategy.

According to GlobalData’s company profile on Baidu, behavioral analytics was a key innovation area identified from the company’s patents. Baidu's grant share as of September 2023 was 44%. Grant share is the ratio of the number of grants to the total number of patents.
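Expressed as a formula, the grant-share definition above reads as follows. The symbol names are our own shorthand, not GlobalData's notation:

```latex
\text{grant share} = \frac{N_{\text{granted}}}{N_{\text{total}}} \times 100\%
```

Under this definition, a 44% grant share means that about 44 of every 100 patents counted for Baidu had been granted as of September 2023.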

Data labeling method based on artificial intelligence

Source: United States Patent and Trademark Office (USPTO). Credit: Baidu Inc

A recently filed patent (Publication Number: US20230316709A1) describes a data labeling method based on artificial intelligence. The method involves determining a plurality of samples involved in clustering and performing iterative processing until a convergence condition is met or a specified number of iterations is reached.
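The overall control flow can be sketched in Python. This is a minimal sketch, not Baidu's implementation. The callback `run_iteration` (a hypothetical name) stands for one round of pre-clustering, labeling and sample re-determination. The convergence test used here, that the set of samples stops changing, is an assumption, because the publication summary does not say what the convergence condition is.

```python
from typing import Callable, List, Sequence


def iterative_labeling(
    samples: Sequence[str],
    run_iteration: Callable[[List[str]], List[str]],  # hypothetical: pre-cluster, label, re-determine
    max_iterations: int = 10,
) -> List[str]:
    """Repeat labeling rounds until convergence or until max_iterations is reached."""
    current = list(samples)
    for _ in range(max_iterations):
        next_samples = run_iteration(current)
        # Assumed convergence condition: the set of samples did not change this round.
        if set(next_samples) == set(current):
            return next_samples
        current = next_samples
    return current  # iteration cap reached without convergence


def keep_every_other(xs: List[str]) -> List[str]:
    """Toy stand-in for a labeling round: every other sample survives as a representative."""
    return xs[::2] if len(xs) > 1 else xs


print(iterative_labeling(["a", "b", "c", "d", "e"], keep_every_other))  # ['a']
```

Both stopping criteria from the patent summary appear here: the early return covers convergence, and the loop bound covers the specified number of iterations.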

During each iteration, the method pre-clusters the samples based on their vector representations to obtain class clusters, each containing at least one sample. It then receives labeling information for the class clusters and uses this information to re-determine the samples involved in clustering. Finally, a clustering result is determined based on the labeling information for the class clusters.
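The summary does not name the clustering algorithm used for pre-clustering. As an illustration only, the sketch below swaps in a simple greedy "leader" clustering over cosine similarity. The names `pre_cluster` and `cosine` and the 0.9 similarity threshold are hypothetical choices, not details from the patent.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vector representations (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pre_cluster(vectors: Dict[str, Sequence[float]], threshold: float = 0.9) -> List[List[str]]:
    """Greedy leader clustering: a sample joins the first class cluster whose first
    member (its "leader") is similar enough, otherwise it starts a new cluster."""
    clusters: List[List[str]] = []
    for sample_id, vec in vectors.items():
        for cluster in clusters:
            if cosine(vectors[cluster[0]], vec) >= threshold:
                cluster.append(sample_id)
                break
        else:
            clusters.append([sample_id])  # so every class cluster holds at least one sample
    return clusters


vectors = {"img1": [1.0, 0.0], "img2": [0.99, 0.1], "img3": [0.0, 1.0]}
print(pre_cluster(vectors))  # [['img1', 'img2'], ['img3']]
```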

The labeling information for each class cluster includes at least one sub-cluster and a representative sample for each sub-cluster. The method re-determines the samples involved in clustering by taking the representative samples of the sub-clusters as the new set of samples.
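The sketch below shows one way to represent this labeling information. It assumes each class cluster comes back from the annotator as a list of sub-clusters, each naming one of its members as the representative. The `SubCluster` type and the `redetermine_samples` function are hypothetical. The function returns the representatives (the samples for the next round) together with the samples each one stands for. Keeping that mapping is our assumption about how a final clustering result could be traced back to the original samples.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubCluster:
    members: List[str]
    representative: str  # one member chosen by the annotator


def redetermine_samples(labeled: List[List[SubCluster]]) -> Dict[str, List[str]]:
    """Return {representative: members}; the keys are the samples for the next round."""
    next_round: Dict[str, List[str]] = {}
    for class_cluster in labeled:
        for sub in class_cluster:
            if sub.representative not in sub.members:
                raise ValueError(f"{sub.representative!r} is not in its sub-cluster")
            next_round[sub.representative] = list(sub.members)
    return next_round


# Toy labeling: the annotator split the first class cluster into two sub-clusters.
labeled = [
    [SubCluster(["img1", "img2"], "img1"), SubCluster(["img5"], "img5")],
    [SubCluster(["img3", "img4"], "img3")],
]
print(list(redetermine_samples(labeled)))  # ['img1', 'img5', 'img3']
```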

The pre-clustering step combines a clustering algorithm with a restriction condition so that the resulting class clusters satisfy that condition. The restriction condition can include, for example, a maximum number of samples per class cluster, or a requirement that the samples in each class cluster belonged to different class clusters in the previous iteration.
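The sketch below enforces both example restrictions inside a greedy assignment loop. The similarity callback, the `max_size` and `threshold` parameters, and the greedy strategy itself are illustrative assumptions. The patent summary only says that a clustering algorithm is combined with a restriction condition.

```python
from typing import Callable, Dict, List, Sequence


def satisfies_restrictions(
    cluster: List[str], candidate: str, previous_cluster: Dict[str, int], max_size: int
) -> bool:
    """Check both example restrictions before adding `candidate` to `cluster`."""
    if len(cluster) >= max_size:  # restriction 1: maximum number of samples per cluster
        return False
    # Restriction 2: members must come from different clusters in the previous iteration.
    used = {previous_cluster[s] for s in cluster}
    return previous_cluster[candidate] not in used


def constrained_pre_cluster(
    samples: Sequence[str],
    similarity: Callable[[str, str], float],  # hypothetical similarity callback
    previous_cluster: Dict[str, int],  # sample -> its cluster index in the previous iteration
    max_size: int = 3,
    threshold: float = 0.5,
) -> List[List[str]]:
    clusters: List[List[str]] = []
    for s in samples:
        for cluster in clusters:
            if similarity(cluster[0], s) >= threshold and satisfies_restrictions(
                cluster, s, previous_cluster, max_size
            ):
                cluster.append(s)
                break
        else:
            clusters.append([s])  # no cluster accepted the sample: start a new one
    return clusters


def always_similar(x: str, y: str) -> float:
    """Toy similarity: every pair looks alike, so only the restrictions decide grouping."""
    return 1.0


previous = {"a": 0, "b": 0, "c": 1, "d": 2}
print(constrained_pre_cluster(["a", "b", "c", "d"], always_similar, previous, max_size=2))
# [['a', 'c'], ['b', 'd']]
```

In the toy run, "b" cannot join "a" because both came from previous cluster 0. "d" cannot join the full cluster of "a" and "c", so it joins "b" instead.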

The method also determines the density of each sample involved in clustering and uses that density to form class clusters. For each sample, it determines the neighboring samples and decides whether to add the sample to a class cluster based on several conditions. These are the density of the neighboring sample, whether the neighboring sample already belongs to a class cluster, the similarity between the neighboring sample and the sample in question, and the number of samples already in that class cluster.
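These conditions appear to resemble density-peak style clustering, in which each sample attaches to a denser neighbor. The sketch below is a hedged reading of the four listed conditions, not the patented procedure. Three choices in it are our own assumptions: density is defined as the mean cosine similarity to a sample's `k` most similar neighbors, and the `sim_threshold` and `max_size` values are illustrative.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def density_cluster(
    vectors: Dict[str, Sequence[float]],
    k: int = 2,
    sim_threshold: float = 0.8,
    max_size: int = 10,
) -> List[List[str]]:
    ids = list(vectors)
    sim = {(a, b): cosine(vectors[a], vectors[b]) for a in ids for b in ids if a != b}
    # Neighboring samples: the k most similar other samples, most similar first.
    neighbors = {
        a: sorted((b for b in ids if b != a), key=lambda other: sim[(a, other)], reverse=True)[:k]
        for a in ids
    }
    # Assumed density definition: mean similarity to those neighbors.
    density = {a: sum(sim[(a, b)] for b in neighbors[a]) / max(len(neighbors[a]), 1) for a in ids}

    clusters: List[List[str]] = []
    cluster_of: Dict[str, int] = {}
    for a in sorted(ids, key=lambda x: density[x], reverse=True):  # densest samples first
        for b in neighbors[a]:
            if (
                density[b] >= density[a]  # neighbor is at least as dense
                and b in cluster_of  # neighbor already belongs to a class cluster
                and sim[(a, b)] >= sim_threshold  # neighbor is similar enough
                and len(clusters[cluster_of[b]]) < max_size  # that cluster still has room
            ):
                cluster_of[a] = cluster_of[b]
                clusters[cluster_of[b]].append(a)
                break
        else:
            cluster_of[a] = len(clusters)  # no qualifying neighbor: start a new cluster
            clusters.append([a])
    return clusters


vectors = {"a": [1.0, 0.0], "b": [0.95, 0.05], "c": [0.9, 0.1], "d": [0.0, 1.0], "e": [0.1, 0.95]}
print(density_cluster(vectors))  # [['b', 'a', 'c'], ['e', 'd']]
```

Computing all pairwise similarities is quadratic in the number of samples. That is acceptable for a sketch, but a real system would likely use an approximate nearest-neighbor index instead.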

The patent also describes an electronic apparatus that includes a processor and memory. The memory stores instructions that enable the processor to execute the data labeling method described in the patent.

In summary, the patent describes an AI-based data labeling method built on iterative processing, pre-clustering, and re-determination of the samples involved in clustering. The method aims to improve the accuracy and efficiency of labeling data such as image or text samples. The patent also details the restriction conditions, the density-based operations, and an electronic apparatus that can execute the method.

GlobalData

GlobalData, the leading provider of industry intelligence, provided the underlying data, research, and analysis used to produce this article.

GlobalData Patent Analytics tracks bibliographic data, legal events data, point in time patent ownerships, and backward and forward citations from global patenting offices. Textual analysis and official patent classifications are used to group patents into key thematic areas and link them to specific companies across the world’s largest industries.