The OpenAI announcement will transform the way Mindset AI agents engage with users and knowledge
On Monday, 13th May 2024, OpenAI announced its latest large language model (LLM), GPT-4o, which can reason across audio, vision, and text in real time.
At Mindset AI, we want your users to feel like they are talking to a human, not a chatbot. We believe interactions with AI should feel natural, and everything we build aims to achieve this objective.
Our team constantly reviews the latest innovations and integrates valuable AI updates into our product as soon as they're available, ensuring your team and customers benefit immediately.
The good news is our team has already started incorporating OpenAI's latest capabilities into our platform for you to use.
Here is a breakdown of OpenAI's announcements:
Speed & intelligence combined
- GPT-4o is 2x faster than GPT-4 while matching its level of intelligence
- GPT-4o sets a new high score of 88.7% on MMLU, the industry's standard general knowledge benchmark (measured 0-shot with chain-of-thought prompting)
Improved vision
- GPT-4o has radically improved vision capabilities across the majority of tasks. Vision enables an LLM to understand data inside charts, graphs or images.
Improved non-English language capabilities
- GPT-4o has improved capabilities in 50 non-English languages.
Increased context window
- GPT-4o has a 128K-token context window and a knowledge cut-off date of October 2023. The context window determines how much information the model can take into account in a single request; a quick way to check whether a document fits is sketched below.
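As a rough illustration of what a 128K-token window means in practice, here is a minimal sketch that counts the tokens in a local text file. The file name and the tokeniser fallback are assumptions for illustration, not part of our platform:

```python
import tiktoken

# GPT-4o uses the o200k_base tokeniser; fall back to it directly if this
# version of tiktoken does not yet recognise the model name.
try:
    enc = tiktoken.encoding_for_model("gpt-4o")
except KeyError:
    enc = tiktoken.get_encoding("o200k_base")

# Hypothetical document: any plain-text export of your knowledge base
with open("knowledge_base_extract.txt", encoding="utf-8") as f:
    document = f.read()

tokens = len(enc.encode(document))
print(f"{tokens} tokens -> fits in a 128K window: {tokens <= 128_000}")
```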
Multi-modal level up (coming soon)
This is a major update. Previously, you could use ‘Voice Mode’ to talk to ChatGPT, but it had frustrating average delays of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4). This made conversations with AI feel unnatural, with constant pauses and confusion whenever you interrupted, making it obvious you were speaking to a computer.
Voice Mode worked by chaining three separate models: one to transcribe audio to text, GPT-3.5 or GPT-4 to process the text, and a third to convert the text back to audio.
This pipeline lost a lot of information along the way: the model couldn't directly understand tone, multiple speakers, or background noise, and it couldn't laugh, sing, or express emotion.
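For a sense of how that chained approach looks in practice, here is a minimal sketch of the general transcribe-then-respond-then-speak pattern using OpenAI's public Python SDK. The file names, voice, and model choices are illustrative assumptions, not Mindset's implementation:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# 1. Transcribe the user's audio to text (speech-to-text model)
with open("user_question.mp3", "rb") as audio_in:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_in)

# 2. Generate a text reply with the language model
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# 3. Convert the reply back to audio (text-to-speech model)
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply.choices[0].message.content,
)
with open("agent_answer.mp3", "wb") as audio_out:
    audio_out.write(speech.content)
```

Each hop adds latency and strips away everything that isn't text, which is exactly what the new end-to-end model removes.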
With the new omni-model Voice Mode, GPT-4o is the first end-to-end model to handle text, vision, and audio within a single neural network. This allows it to respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation.
Watch this demo for a sense of what these new capabilities will feel like:
What does all of this mean for Mindset customers?
This model will enable Mindset to transform how knowledge is acquired, bringing us closer to our goal of creating AI agents that feel human.
Mindset has already integrated GPT-4o into our platform, and you will be able to turn on GPT-4o from Tuesday, 21st May. We will continue to incorporate the latest voice capabilities as soon as they become available.
Here are some ways Mindset believes these new multi-modal capabilities will impact you:
Get answers quicker than before
From Tuesday, 21st May, you will immediately notice faster responses when GPT-4o is turned on. GPT-4o is OpenAI's fastest model to date, dramatically reducing lag time in responses.
Speak to agents
With our current work on 'Capabilities', agents can run learning scenarios, provide feedback using specific frameworks, act as ideation partners, and more. However, we understand that typing long messages can be frustrating when you are short on time.
With GPT-4o audio, this experience can become a natural conversation. Users will soon be able to talk to the agent, just as they do with Siri or Alexa, and run through scenarios without breaking the flow of learning. Agents will have full conversations with your users, providing an immersive learning experience.
More accurate search through conversation
Mindset has been transforming how your users search for knowledge. We have introduced ‘Chain of Thought’, allowing the AI to use logical reasoning to meet users' requests accurately. We have also added clarification steps, enabling agents to ask follow-up questions, and much more.
With OpenAI's audio capabilities, search and clarification will be possible through speech.
More accurate search across all your data sources
Thanks to a larger context window (128K tokens vs 32K with GPT-4), we can improve search accuracy even further. This allows us to analyse more data and make better connections within your knowledge base.
With upcoming integrations (GDrive, SharePoint, Slack, Teams), users will be able to search across multiple data sources simultaneously, making the process faster and more convenient.
Search for images and ask questions about them
LLMs have previously struggled to understand dense or detailed images, often missing small but critically important details. This changed with GPT-4o.
Mindset has been working on integrating these new vision capabilities into our content ingestion process. The new model will enable us to understand the charts, graphs, and images inside your PDFs and other documents.
As a result, your users can search and ask questions about images and receive answers that accurately describe the information within them.
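To make the idea concrete, here is a minimal, hypothetical sketch of asking GPT-4o a question about a chart image using OpenAI's public Python SDK. The image file and the question are assumptions for illustration; this is not Mindset's ingestion pipeline:

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Encode a local chart image so it can be sent inline as a data URL
with open("quarterly_revenue_chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Which quarter shows the highest revenue, and by how much?"},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```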
Next Steps
GPT-4o will be available for your agents in your admin console on Tuesday, 21st May.
This will give you the benefits of increased speed and a larger context window for better search accuracy. Next week, we will release new vision capabilities for ingesting charts and graphs, allowing users to ask questions about data that was previously invisible to agents.
Very soon, OpenAI will allow us to access the new voice mode capabilities. When that happens, you'll be the first to know...