iFlytek has filed a patent for a speech recognition method and related products. The method involves acquiring a speech to be recognized and a hot word library, determining audio-related and hot word-related features based on the speech and library, and determining the recognition result of the speech. GlobalData’s report on iFlytek gives a 360-degree view of the company including its patenting strategy.
According to GlobalData’s company profile on iFlytek, intelligent chatbots were a key innovation area identified from its patents. iFlytek's grant share as of June 2023 was 1%. Grant share is the ratio of the number of grants to the total number of patents.
Speech recognition method using hot word library for decoding
A recently filed patent (Publication Number: US20230186912A1) describes a speech recognition method and device. The method involves acquiring a speech to be recognized and a hot word library. Based on the speech and the hot word library, the method determines an audio-related feature and a hot word-related feature at the current decoding time instant. These features are then used to determine the recognition result of the speech at that instant.
The method further includes acquiring information about the results decoded before the current decoding time instant and using this information, along with the hot word library, to determine the audio-related feature. The speech and the hot word library are processed by a pre-trained speech recognition model to obtain the recognition result.
The speech recognition model consists of an audio encoding module, a hot word encoding module, a joint attention module, a decoding module, and a classifying module. The audio encoding module encodes the speech to obtain an audio encoding result, while the hot word encoding module encodes each hot word in the library to obtain a hot word encoding result. The joint attention module processes the audio and hot word encoding results to obtain a combined feature used at the current decoding time instant, which includes both the audio-related and hot word-related features. The decoding module processes the combined feature to obtain an output feature, and the classifying module determines the recognition result based on this output feature.
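The order in which the five modules hand data to one another can be illustrated with a minimal Python sketch. It is not taken from the patent: the `recognize` function, the `modules` dictionary, the `<eos>` end token and the toy stand-in functions are all hypothetical names chosen for illustration, and real modules would be trained neural networks.

```python
def recognize(speech, hot_words, modules, max_steps=50, end_token="<eos>"):
    """Run one decoding loop over the five modules (illustrative wiring only)."""
    audio_enc = modules["audio_encoder"](speech)
    hotword_enc = [modules["hotword_encoder"](w) for w in hot_words]
    state, result = None, []
    for _ in range(max_steps):
        # Joint attention yields the combined feature for this decoding instant.
        combined = modules["joint_attention"](state, audio_enc, hotword_enc)
        # The decoder turns the combined feature into an output feature and a new state.
        output, state = modules["decoder"](combined, state)
        # The classifier maps the output feature to a recognition result.
        token = modules["classifier"](output, hot_words)
        if token == end_token:
            break
        result.append(token)
    return result


# Toy stand-ins (assumptions) so the sketch runs end to end.
def toy_decoder(combined, state):
    step = 0 if state is None else state + 1
    audio_enc, _hotword_enc = combined
    return (audio_enc, step), step


def toy_classifier(output, hot_words):
    audio_enc, step = output
    return audio_enc[step] if step < len(audio_enc) else "<eos>"


toy_modules = {
    "audio_encoder": lambda speech: list(speech),
    "hotword_encoder": lambda word: word.lower(),
    "joint_attention": lambda state, audio, hot: (audio, hot),
    "decoder": toy_decoder,
    "classifier": toy_classifier,
}

tokens = recognize(["hello", "iflytek"], ["iFlytek"], toy_modules)
```

The decoder state is what carries information about earlier results into each new decoding instant in this sketch.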
The joint attention module includes a first attention model and a second attention model. The first attention model determines the audio-related feature based on a state vector output by the decoding module and the hot word encoding result. The second attention model determines the hot word-related feature based on the audio-related feature. The two features are then combined into the combined feature used for recognition.
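A rough way to picture the two-stage attention is the following Python sketch. It rests on several assumptions that the summary above does not confirm: plain dot-product attention, a query formed by adding the decoder state to the mean of the hot word encodings, attention over audio frames in the first stage, and concatenation as the way the two features are combined. All function names are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys):
    """Dot-product attention: average `keys`, weighted by similarity to `query`."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(keys[0])
    context = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]
    return context, weights


def joint_attention(state, audio_enc, hotword_enc):
    dim = len(state)
    # Assumption: summarize the hot word encodings by their mean.
    hot_mean = [sum(h[i] for h in hotword_enc) / len(hotword_enc) for i in range(dim)]
    # First attention: decoder state plus hot word summary queries the audio frames.
    query = [s + h for s, h in zip(state, hot_mean)]
    audio_feature, _ = attend(query, audio_enc)
    # Second attention: the audio-related feature queries the hot word encodings.
    hotword_feature, hotword_weights = attend(audio_feature, hotword_enc)
    # Assumption: the combined feature is a simple concatenation.
    return audio_feature + hotword_feature, hotword_weights


combined, hw_weights = joint_attention(
    state=[1.0, 0.0],
    audio_enc=[[1.0, 0.0], [0.0, 1.0]],
    hotword_enc=[[2.0, 0.0], [0.0, 2.0]],
)
```

In this toy input the audio-related feature leans toward the first audio frame, so the first hot word receives the larger attention weight.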
The classifying module includes fixed commonly-used character nodes and dynamically expandable hot word nodes. It determines the recognition result by calculating probability scores for each node based on the output feature of the decoding module. The recognition result of the speech at the current decoding time instant is determined based on the probability scores of the commonly-used character nodes and the hot word nodes.
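The idea of a fixed output layer that grows by one node per hot word can be sketched in Python as follows. This is an illustration under assumptions, not the patent's implementation: each node is modeled as a label with a weight vector, scores are dot products with the output feature, and a single softmax over all nodes produces the probability scores. The `classify` function and the example labels are hypothetical.

```python
import math


def classify(output_feature, char_nodes, hotword_nodes):
    """Score fixed character nodes and per-request hot word nodes together."""
    # The hot word nodes are appended at request time; the character nodes stay fixed.
    nodes = list(char_nodes) + list(hotword_nodes)
    scores = [sum(f * w for f, w in zip(output_feature, weights)) for _, weights in nodes]
    # Assumption: one softmax over all nodes yields the probability scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(nodes)), key=probs.__getitem__)
    return nodes[best][0], {label: p for (label, _), p in zip(nodes, probs)}


char_nodes = [("a", [1.0, 0.0]), ("b", [0.0, 1.0])]   # fixed vocabulary
hotword_nodes = [("iFlytek", [2.0, 2.0])]             # added for this request

label, probs = classify([1.0, 1.0], char_nodes, hotword_nodes)
```

Because the hot word nodes are passed in as an argument, a different hot word library changes only the size of the node list and leaves the character nodes untouched.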
The patent also describes a speech recognition device that includes a memory and a processor, as well as a non-transitory machine-readable storage medium storing a computer program that implements the speech recognition method when executed by a processor.