The technology industry continues to be a hotbed of innovation. Activity is driven by increasing demand for natural, human-like speech synthesis in applications such as virtual assistants, accessibility tools, and audio content production, and by the growing importance of deep learning, neural networks, and related speech synthesis techniques, which improve speech quality and expressiveness and give users more customization options. In the last three years alone, over 3.6 million patents have been filed and granted in the technology industry, according to GlobalData’s report on Innovation in Artificial Intelligence: Text to speech systems.
However, not all innovations are equal, nor do they follow a constant upward trend. Instead, their evolution takes the form of an S-shaped curve that reflects their typical lifecycle, from early emergence to accelerating adoption, before finally stabilising and reaching maturity.
Identifying where a particular innovation sits on this journey, especially one in the emerging or accelerating stage, is essential for understanding its current level of adoption and its likely future trajectory and impact.
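An S-shaped adoption curve is often modelled with a logistic function, A(t) = K / (1 + exp(-r(t - t0))), where K is the saturation level, r the growth rate and t0 the inflection point. GlobalData's innovation intensity models are not described here, so the following is only a minimal Python sketch assuming a logistic curve; the stage cut-offs (20% and 80% of saturation) are hypothetical values chosen for illustration.

```python
import math


# Assumed model: a logistic S-curve. GlobalData's own models may differ.
def logistic(t: float, k: float = 1.0, r: float = 1.0, t0: float = 0.0) -> float:
    """Adoption level at time t for saturation k, growth rate r, midpoint t0."""
    return k / (1.0 + math.exp(-r * (t - t0)))


# Hypothetical stage boundaries, expressed as a share of the saturation level k.
EMERGING_MAX = 0.2
ACCELERATING_MAX = 0.8


def lifecycle_stage(t: float, k: float = 1.0, r: float = 1.0, t0: float = 0.0) -> str:
    """Label a point on the curve as emerging, accelerating or maturing."""
    share = logistic(t, k, r, t0) / k
    if share < EMERGING_MAX:
        return "emerging"
    if share < ACCELERATING_MAX:
        return "accelerating"
    return "maturing"


# Walk along the curve, two time units at a time, around the midpoint t0 = 0.
for t in range(-4, 5, 2):
    print(t, round(logistic(t), 3), lifecycle_stage(t))
```

Growth is fastest at t0, where adoption reaches half of K; before that point the curve bends upwards, and after it the curve flattens towards maturity.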
300+ innovations will shape the technology industry
According to GlobalData’s Technology Foresights, which plots the S-curve for the technology industry using innovation intensity models built on over 2.5 million patents, there are 300+ innovation areas that will shape the future of the industry.
Within the emerging innovation stage, finite element simulation, ML-enabled blockchain networks and generative adversarial networks (GANs) are disruptive technologies that are in the early stages of application and should be tracked closely. Demand forecasting applications, intelligent embedded systems, and deep reinforcement learning are some of the accelerating innovation areas, where adoption has been steadily increasing. Among maturing innovation areas are wearable physiological monitors and smart lighting, which are now well established in the industry.
Innovation S-curve for artificial intelligence in the technology industry
Text-to-speech systems are a key innovation area in artificial intelligence
Text-to-speech systems, also referred to as speech synthesis or text-to-voice systems, are computer-based tools that transform written text into spoken words. These systems aim to replicate human speech patterns and are utilized in diverse applications such as assisting individuals with disabilities, creating audio versions of written content, and delivering audio notifications for medical equipment.
GlobalData’s analysis also uncovers the companies at the forefront of each innovation area and assesses the potential reach and impact of their patenting activity across different applications and geographies. According to GlobalData, 60+ companies, spanning technology vendors, established technology companies, and up-and-coming start-ups, are engaged in the development and application of text-to-speech systems.
Key players in text-to-speech systems: a disruptive innovation in the technology industry
‘Application diversity’ measures the number of different applications identified for each relevant patent and broadly splits companies into either ‘niche’ or ‘diversified’ innovators.
‘Geographic reach’ refers to the number of different countries each relevant patent is registered in and reflects the breadth of geographic application intended, ranging from ‘global’ to ‘local’.
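The two metrics above can be made concrete with a short sketch. GlobalData does not state how it aggregates per-patent counts or where it draws the ‘niche’/‘diversified’ and ‘global’/‘local’ lines, so the averaging step, the `Patent` record and both thresholds below are hypothetical, and what the text describes as a range is reduced to a simple two-way split.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Patent:
    # Hypothetical record: the applications and countries linked to one patent.
    applications: set = field(default_factory=set)
    countries: set = field(default_factory=set)


def application_diversity(patents):
    """Average number of distinct applications identified per relevant patent."""
    return mean(len(p.applications) for p in patents)


def geographic_reach(patents):
    """Average number of distinct countries each relevant patent is registered in."""
    return mean(len(p.countries) for p in patents)


# Hypothetical cut-offs; the real thresholds are not published in the source.
DIVERSIFIED_MIN_APPS = 3
GLOBAL_MIN_COUNTRIES = 5


def classify(patents):
    """Return a (diversity label, reach label) pair for a company's portfolio."""
    diversity = "diversified" if application_diversity(patents) >= DIVERSIFIED_MIN_APPS else "niche"
    reach = "global" if geographic_reach(patents) >= GLOBAL_MIN_COUNTRIES else "local"
    return diversity, reach


# Illustrative two-patent portfolio (made-up data).
portfolio = [
    Patent({"voice assistants", "accessibility"}, {"US", "EP", "CN", "JP", "KR", "IN"}),
    Patent({"voice assistants", "language learning", "gaming", "customer service"},
           {"US", "EP", "CN", "JP"}),
]
print(classify(portfolio))
```

Here the portfolio averages three applications and five countries per patent, so both hypothetical thresholds are met exactly.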
Patent volumes related to text-to-speech systems
Source: GlobalData Patent Analytics
Microsoft is a leading patent filer in text-to-speech systems. One of the company’s patents covers multi-voice font interpolation, which creates computer-generated speech with diverse speaker characteristics and prosody by combining existing voice fonts. A prediction model in the interpolation engine estimates the parameters that govern speaker characteristics and prosody from the phoneme sequence derived from the text. The engine then generates additional parameter values through weighted interpolation, allowing voice fonts to be modified to change speech style and emotion while preserving the original voice's fundamental qualities. The technology can transplant speaker characteristics and prosody from one voice font to another, or generate entirely new attributes for an existing voice font.
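The weighted-interpolation step can be illustrated with a minimal sketch. This is not Microsoft's implementation: it assumes each voice font has already produced per-phoneme parameters (the prediction model is replaced by hard-coded, hypothetical pitch and duration values) and simply blends them with a normalised weighted average.

```python
from typing import Dict, List

# Hypothetical representation: per-phoneme parameters such as pitch (Hz)
# and duration (ms), as a prediction model might output for one voice font.
Params = Dict[str, float]


def interpolate(fonts: List[List[Params]], weights: List[float]) -> List[Params]:
    """Blend per-phoneme parameters from several voice fonts by weighted average.

    fonts[i][j] holds the parameters that font i predicts for phoneme j.
    Weights are normalised so that they sum to 1.
    """
    if not fonts or len(fonts) != len(weights):
        raise ValueError("need one weight per voice font")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]
    n_phonemes = len(fonts[0])
    if any(len(font) != n_phonemes for font in fonts):
        raise ValueError("all fonts must cover the same phoneme sequence")
    blended = []
    for j in range(n_phonemes):
        keys = fonts[0][j].keys()
        blended.append({k: sum(w * font[j][k] for w, font in zip(norm, fonts)) for k in keys})
    return blended


# Made-up parameters for the same two phonemes from two hypothetical fonts.
calm = [{"pitch": 100.0, "duration": 80.0}, {"pitch": 120.0, "duration": 150.0}]
excited = [{"pitch": 200.0, "duration": 60.0}, {"pitch": 240.0, "duration": 110.0}]

mix = interpolate([calm, excited], [0.75, 0.25])
print(mix)
```

Shifting the weights towards the second font raises pitch and shortens durations, a crude stand-in for the change in speech style the patent describes; setting one weight to 1 and the other to 0 reproduces that font unchanged.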
By geographic reach, Dolby Laboratories leads the pack, followed by Interactive Intelligence Group and 24/7 Customer. In terms of application diversity, Zya holds the top position, followed by Casio Computer and ROBLOX.
Text-to-speech systems play a crucial role in enhancing accessibility by providing audio representations of written content, benefiting individuals with visual impairments or reading difficulties. These systems also have a wide range of applications in areas such as voice assistants, language learning, entertainment, and automated customer service, where they improve user experiences and make information easier to consume. To further understand how artificial intelligence is disrupting the technology industry, access GlobalData’s latest thematic research report on Artificial Intelligence (AI) – Thematic Intelligence.