Google has unveiled VideoPrism, a single model capable of handling various video analysis tasks like classification, retrieval, captioning, and question answering.

VideoPrism is pre-trained on a dataset consisting of 36 million video-text pairs and an additional 582 million video clips.

In a demonstration video, Google explained that VideoPrism uses a two-stage training approach. First, it employs contrastive learning to match videos with their text descriptions, including imperfect ones.
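For readers unfamiliar with contrastive learning, the idea can be illustrated with a minimal sketch. This is not Google's code: the function names, the temperature value, and the toy two-dimensional embeddings are assumptions made for illustration. It shows a generic symmetric contrastive (InfoNCE-style) loss that is low when each video embedding is most similar to its own text embedding and high otherwise.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def contrastive_loss(video_embs, text_embs, temperature=0.1):
    """Symmetric InfoNCE-style loss; row i of each list is assumed to be a matching pair."""
    videos = [normalize(v) for v in video_embs]
    texts = [normalize(t) for t in text_embs]
    n = len(videos)

    # Cosine similarity of every video with every text, scaled by the temperature.
    logits = [[dot(v, t) / temperature for t in texts] for v in videos]

    def cross_entropy(rows):
        # The correct "class" for row i is column i (its own partner).
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum - row[i]
        return total / n

    # Average the video-to-text and text-to-video directions.
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(transposed))


# Toy example: two videos and two captions (hypothetical embeddings).
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

In a real system the embeddings would come from trained video and text encoders, and the loss would be minimised over large batches of pairs.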

Second, it applies a masked video modelling framework to videos without text descriptions, training the model to predict masked patches in a video.
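The masking step can be sketched in a few lines. The example below is a generic illustration of masked modelling, not VideoPrism's actual objective: the function names, the 80% mask ratio, and the toy patch values are all assumptions. It hides a random subset of patches and scores a prediction only on the hidden ones.

```python
import random


def mask_patches(num_patches, mask_ratio, rng):
    """Pick a random subset of patch indices to hide from the model."""
    num_masked = int(num_patches * mask_ratio)
    return set(rng.sample(range(num_patches), num_masked))


def masked_reconstruction_loss(targets, predictions, masked):
    """Mean squared error computed on the masked patches only."""
    total, count = 0.0, 0
    for i in masked:
        for t, p in zip(targets[i], predictions[i]):
            total += (t - p) ** 2
            count += 1
    return total / count if count else 0.0


# Toy "video": 10 patches, each a 2-value feature vector (hypothetical data).
rng = random.Random(0)
patches = [[float(i), float(i) + 0.5] for i in range(10)]
masked = mask_patches(len(patches), mask_ratio=0.8, rng=rng)

# A stand-in "model" that predicts zeros everywhere is penalised on hidden patches.
zero_prediction = [[0.0, 0.0] for _ in patches]
print(masked_reconstruction_loss(patches, zero_prediction, masked))
```

Because visible patches do not contribute to the loss, the model can only improve by inferring the hidden content from its surroundings.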

VideoPrism can be combined with large language models for various video-language tasks such as video-text retrieval, captioning, and question-answering.

After completing tests, Google said that VideoPrism achieves state-of-the-art performance on 30 out of 33 video understanding benchmarks.

VideoPrism was tested on datasets used in scientific domains like ethology, behavioural neuroscience, and ecology.

In a statement, Google said the encoder not only performed well but also surpassed models designed specifically for those tasks, indicating its potential for scientific analysis of video data.

With VideoPrism, Google joins several Big Tech companies applying AI to video summarisation and analysis.

OpenAI’s Sora, which was unveiled in February of this year, is a text-to-video model. However, the software is still a work in progress with multiple weaknesses.

OpenAI said that Sora struggles to accurately simulate the physics of a scene and may not understand specific instances of cause and effect.