ChatGPT maker OpenAI has unveiled a new flagship version of the technology that powers its AI chatbot, enabling it to interact with the world more capably using audio, images and text.
Called GPT-4o, the new model can better understand any input combination of audio, text and images, and respond in a similar fashion, while also responding faster and in a more human way, the firm said.
OpenAI said it was making the new, enhanced model available to all ChatGPT users, even those without a paid subscription, as part of its “mission” to ensure AI technology was “accessible and beneficial to everyone”.
The new model is “much better” than any existing model at understanding and discussing images users share, OpenAI said.
“For example, you can now take a picture of a menu in a different language and talk to GPT-4o to translate it, learn about the food’s history and significance, and get recommendations,” the company said.
“In the future, improvements will allow for more natural, real-time voice conversation and the ability to converse with ChatGPT via real-time video.
“For example, you could show ChatGPT a live sports game and ask it to explain the rules to you.”
During a live stream to announce the new model, the company showed the enhanced chatbot helping to teach maths, engage in casual conversations with a near-human response time and even harmonise and sing with a second device also running ChatGPT.
At a time when scrutiny of AI tools continues to grow, OpenAI said the new model had been extensively safety tested with independent experts and had “safety built-in by design”.
The timing of the unveiling also appeared to be a shot across the bow of OpenAI’s rivals – coming the day before tech giant Google is expected to discuss its own plans for generative AI when it opens its annual developer conference, Google I/O, on Tuesday evening.