SAS Institute has filed a patent for a computer-implemented system that identifies hierarchical taxonomy categories, extracts taxonomy tokens, computes taxonomy vectors, clusters those vectors, and constructs a hierarchical taxonomy classifier. The system then uses the classifier to convert unlabeled datasets into labeled datasets. The claim details the process of tokenizing datasets, computing embedding values, classifying the datasets into taxonomy categories, and outputting the labeled datasets. GlobalData's report on SAS Institute gives a 360-degree view of the company, including its patenting strategy.
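
To make the vectors-to-clusters-to-classifier flow more concrete, here is a minimal sketch. It assumes each taxonomy category's example vectors form one cluster that is summarized by its centroid. A nearest-centroid rule using cosine similarity stands in for the hierarchical classifier. This is a swapped-in, illustrative technique, not the model described in the patent. The category paths (such as "Finance/Accounts"), the vectors, and the function names `build_centroids` and `label_datasets` are all hypothetical.

```python
import math
from collections import defaultdict

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_centroids(examples: list[tuple[str, Vector]]) -> dict[str, Vector]:
    """Average the vectors of each taxonomy path into one centroid (one cluster per category)."""
    groups: dict[str, list[Vector]] = defaultdict(list)
    for path, vec in examples:
        groups[path].append(vec)
    return {
        path: [sum(col) / len(vecs) for col in zip(*vecs)]
        for path, vecs in groups.items()
    }


def label_datasets(unlabeled: dict[str, Vector],
                   centroids: dict[str, Vector]) -> dict[str, tuple[str, ...]]:
    """Turn unlabeled datasets into labeled ones using the most similar centroid's path."""
    labeled = {}
    for name, vec in unlabeled.items():
        best = max(centroids, key=lambda path: cosine(vec, centroids[path]))
        labeled[name] = tuple(best.split("/"))  # hierarchy levels, top first
    return labeled


# Hypothetical labeled examples: "Parent/Child" taxonomy paths with toy vectors.
examples = [
    ("Finance/Accounts", [0.9, 0.1, 0.1]),
    ("Finance/Accounts", [0.8, 0.2, 0.0]),
    ("Healthcare/Patients", [0.1, 0.9, 0.2]),
]
centroids = build_centroids(examples)
print(label_datasets({"crm_table": [0.85, 0.15, 0.05]}, centroids))
# prints {'crm_table': ('Finance', 'Accounts')}
```

Returning the label as a tuple of levels keeps the hierarchy explicit, so a caller can report either the top-level category or the full path.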

According to GlobalData's company profile on SAS Institute, facial recognition AI was a key innovation area identified from patents. SAS Institute's grant share as of January 2024 was 81%. Grant share is the ratio of the number of grants to the total number of patents.
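
The grant share figure is a simple ratio expressed as a percentage. The sketch below uses hypothetical counts that happen to produce 81%. They are not SAS Institute's actual portfolio numbers.

```python
def grant_share(grants: int, total_patents: int) -> float:
    """Return the number of grants as a percentage of the total number of patents."""
    if total_patents <= 0:
        raise ValueError("total_patents must be positive")
    return 100.0 * grants / total_patents


# Hypothetical portfolio: 810 grants out of 1,000 patents in total.
print(f"{grant_share(810, 1000):.0f}%")  # prints 81%
```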

Hierarchical taxonomy classification system for structured datasets

Source: United States Patent and Trademark Office (USPTO). Credit: SAS Institute Inc

The patent application (Publication Number: US20240028621A1) describes a computer program product, stored in a non-transitory machine-readable storage medium, that performs operations for taxonomy labeling of structured datasets. The product tokenizes structured datasets, computes embedding values, and classifies the datasets into taxonomy categories. It includes a taxonomy classification model that computes taxonomy category labels based on the embedding values and distinct clusters. The product also features a reclassification workflow for datasets with low-confidence labels and an active learning-based labeling workflow for datasets with ambiguous taxonomy category labels.
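
The article does not say how a label is judged "low confidence" or "ambiguous". The sketch below assumes two simple rules. A label is low-confidence when the top category score falls below a threshold. A label is ambiguous when the top two scores lie within a small margin of each other. The threshold values, the `Routing` type, and the workflow names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the article does not specify any values.
LOW_CONFIDENCE = 0.5    # top score below this -> send to reclassification
AMBIGUITY_MARGIN = 0.1  # top two scores closer than this -> ask a human


@dataclass
class Routing:
    label: str
    workflow: str  # "accept", "reclassify", or "active_learning"


def route(scores: dict[str, float]) -> Routing:
    """Pick the top-scoring category and decide which workflow should handle it."""
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_label, top_score = ranked[0]
    if top_score < LOW_CONFIDENCE:
        return Routing(top_label, "reclassify")
    if len(ranked) > 1 and top_score - ranked[1][1] < AMBIGUITY_MARGIN:
        return Routing(top_label, "active_learning")
    return Routing(top_label, "accept")


print(route({"Finance": 0.9, "Healthcare": 0.05}).workflow)   # prints accept
print(route({"Finance": 0.4, "Healthcare": 0.35}).workflow)   # prints reclassify
print(route({"Finance": 0.55, "Healthcare": 0.5}).workflow)   # prints active_learning
```

In this sketch the low-confidence check runs first, so a dataset whose scores are both low and close together goes to reclassification rather than to a human labeler.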

Furthermore, the patent application details a computer-implemented method and system for taxonomy labeling of structured datasets. For each dataset, the method tokenizes the data, computes embedding values, and predicts a taxonomy category label. It includes reclassification workflows for datasets with low-confidence labels and active learning-based workflows for datasets with ambiguous labels. The system comprises processors, a database, and computer-readable instructions for performing the taxonomy labeling operations, and it uses a taxonomy classification model to compute category labels from the embedding values and distinct clusters. Its token vectorization model is based on pre-trained word embeddings trained for a target hierarchical taxonomy, which the application presents as a way to improve the accuracy and efficiency of taxonomy labeling.
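
The following sketch shows one common way to turn a structured dataset into a single embedding with pre-trained word vectors. It splits column headers into tokens, looks up each token's vector, and averages the results. It assumes tokens come from column names and that mean pooling combines them. The patent's actual tokenization and vectorization may differ. The tiny `EMBEDDINGS` table is a hypothetical stand-in for vectors trained for a target taxonomy.

```python
import re

# Hypothetical pre-trained word embeddings (3 dimensions for readability).
# A real system would load vectors trained for the target hierarchical taxonomy.
EMBEDDINGS = {
    "customer": [0.9, 0.1, 0.0],
    "id": [0.5, 0.5, 0.0],
    "account": [0.8, 0.2, 0.1],
    "balance": [0.7, 0.0, 0.6],
    "patient": [0.0, 0.9, 0.2],
}


def tokenize(column_name: str) -> list[str]:
    """Split a header such as 'customerID' or 'account_balance' into lowercase tokens."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", column_name)  # break camelCase
    return [t.lower() for t in re.split(r"[\s_\-]+", spaced) if t]


def embed_dataset(columns: list[str]) -> list[float]:
    """Mean-pool the embeddings of every known token across all column headers."""
    vectors = [EMBEDDINGS[tok] for col in columns for tok in tokenize(col)
               if tok in EMBEDDINGS]  # unknown tokens are skipped
    if not vectors:
        raise ValueError("no known tokens in dataset")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


print(tokenize("customerID"))  # prints ['customer', 'id']
print(embed_dataset(["customerID", "account_balance"]))
```

The resulting vector could then be fed to a classifier such as the nearest-centroid sketch earlier in the article. Skipping unknown tokens keeps the example simple. Production systems often fall back to subword or character-level vectors instead.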

GlobalData, the leading provider of industry intelligence, provided the underlying data, research, and analysis used to produce this article.

GlobalData Patent Analytics tracks bibliographic data, legal events data, point in time patent ownerships, and backward and forward citations from global patenting offices. Textual analysis and official patent classifications are used to group patents into key thematic areas and link them to specific companies across the world’s largest industries.