OpenAI Announces ChatGPT 4o Omni

ChatGPT 4o Omni

OpenAI announced a new version of ChatGPT that can accept audio, image and text input and also generate audio, image and text output. OpenAI calls the new version GPT-4o, with the “o” standing for “omni,” a combining form meaning “all.”

ChatGPT 4o (Omni)

OpenAI described this new version of ChatGPT as a progression toward more natural human-machine interactions that respond to user input at the same speed as human-to-human conversations. The new version matches GPT-4 Turbo on English text and significantly outperforms it in other languages. API performance also improves significantly: it is faster and 50% cheaper to run.

The announcement explains:

“As measured by traditional benchmarks, GPT-4o achieves GPT-4 Turbo-level performance in text, reasoning, and coding intelligence, while setting new high-water marks in multilingual capabilities, audio and vision.”

Advanced speech processing

The previous method of communicating by voice chained together three separate models: one transcribed the voice input to text, a second (GPT-3.5 or GPT-4) processed that text and produced a text response, and a third converted the response back to audio. This pipeline is said to lose nuance at each conversion step.
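For illustration, here is a minimal Python sketch of what that kind of chained pipeline can look like using OpenAI’s public speech-to-text, chat, and text-to-speech endpoints. The model names, voice, and file paths are assumptions for the example, not the actual internals of ChatGPT’s earlier Voice Mode.

```python
# Illustrative sketch of the older three-model voice pipeline:
# 1) speech-to-text, 2) a text-only chat model, 3) text-to-speech.
# Model names and file paths are assumptions for this example.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 1: transcribe the user's audio to plain text
# (tone, multiple speakers, and background sound are lost here).
with open("user_question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# Step 2: a text-only model generates the reply; it never "hears" the audio.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
reply_text = reply.choices[0].message.content

# Step 3: convert the text reply back to speech with a separate TTS model.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply_text,
)
speech.write_to_file("assistant_reply.mp3")
```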

OpenAI outlined the drawbacks of the previous approach, which are (presumably) overcome by the new one:

“This process means that the main source of intelligence, GPT-4, loses a lot of information: it cannot directly observe pitch, various speakers or background noise, and it cannot emit laughter, sing or express emotions.”

The new version does not need three different models because all inputs and outputs are handled together in one model, end to end, including audio input and output. Interestingly, OpenAI says it has not yet fully explored the new model’s capabilities or fully understood its limitations.
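In practice, the unified model takes mixed inputs in a single request. Below is a minimal sketch using the OpenAI Python SDK with the text-and-image inputs the announcement says are releasing first; the image URL and prompt are placeholder assumptions, and audio input/output was not yet exposed in the API at the time of the announcement.

```python
# Minimal sketch of a single GPT-4o request mixing text and image input,
# rather than chaining separate models. The image URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```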

New guardrails and an iterative release

GPT-4o includes new guardrails and filters to keep it safe and to prevent unwanted voice output. However, today’s announcement says that at launch only text and image inputs and text outputs are being released, with audio output limited. GPT-4o is available in both free and paid tiers, and Plus users get 5x higher message limits.

Audio capabilities are planned for a limited alpha release to ChatGPT Plus and API users in a few weeks.

The announcement explained:

“We recognize that GPT-4o’s audio modalities present a variety of new risks. Today we’re releasing text and image inputs and text outputs. Over the coming weeks and months, we’ll be working on the technical infrastructure, usability through further training and the necessary security to release the other modes. For example, at launch, audio outputs will be limited to a selection of predefined voices and will comply with our existing security policies.”

Read the announcement:

Hello GPT-4o

Featured image by Shutterstock/Photo For Everything


