With the new GPT-4o model, OpenAI takes ChatGPT to the next level

ChatGPT 4o

Pioneering AI firm OpenAI has launched the latest version of its LLM, GPT-4o. The flagship model is being made available to all ChatGPT users free of charge, although paying users will get faster access to it.

There is a lot to this update, but OpenAI highlights improvements to capabilities across text, voice and vision, as well as faster performance. Oh, and if you were curious, the "o" in GPT-4o stands for "omni".


The updates to the model open up a new world of possible uses, and OpenAI says that "GPT-4o is much better than any existing model at understanding and discussing the images you share". It is described as being a "step towards much more natural human-computer interaction -- it accepts as input any combination of text, audio, and image, and generates any combination of text, audio, and image outputs".
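For developers rather than ChatGPT users, the same multimodal capability is exposed through OpenAI's API. As a rough illustration (not taken from OpenAI's announcement), the Python sketch below sends a combined text-and-image request to GPT-4o via the Chat Completions endpoint; the prompt and image URL are placeholders.

# Minimal sketch: a mixed text + image request to GPT-4o through
# OpenAI's Chat Completions API. Prompt and image URL are placeholders.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What dish is shown in this photo, and what is its history?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/menu-photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)  # the model's text reply

Audio input and output are not shown here; as the quotes below make clear, OpenAI is rolling those modalities out more gradually.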

So far, so vague. But what does it mean in practice? The company offers up some potential usage scenarios:

You can now take a picture of a menu in a different language and talk to GPT-4o to translate it, learn about the food's history and significance, and get recommendations. In the future, improvements will allow for more natural, real-time voice conversation and the ability to converse with ChatGPT via real-time video. For example, you could show ChatGPT a live sports game and ask it to explain the rules to you. We plan to launch a new Voice Mode with these new capabilities in an alpha in the coming weeks, with early access for Plus users as we roll out more broadly.

OpenAI says GPT-4o can respond at speeds comparable to human conversational response times, and it also draws attention to major improvements in translation and in performance across non-English languages.

There have been safety concerns about artificial intelligence from the very beginning, and these are only growing as the technology becomes more powerful. Acknowledging this, OpenAI says:

GPT-4o has also undergone extensive external red teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities. We used these learnings to build out our safety interventions in order to improve the safety of interacting with GPT-4o. We will continue to mitigate new risks as they're discovered.

We recognize that GPT-4o's audio modalities present a variety of novel risks. Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we'll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities. For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies. We will share further details addressing the full range of GPT-4o's modalities in the forthcoming system card.

There is a wealth of additional information about GPT-4o available on OpenAI's website.
