18/05/2026
AI is erasing the divide between different types of data. New "native multimodal" models like Gemini 3.1 and GPT-5.4 process audio, video, and text together in a single step, rather than stitching together separate components for each type of input. This allows the AI to understand the world more like a human does.
This change can make tasks like medical diagnosis more accurate. A single model can look at a patient's records and X-ray at the same time and find patterns that are easy to miss when each is reviewed separately. The same architecture also lets users search through hours of video using simple text questions.
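
To make the video search idea concrete, here is a minimal sketch of one common way text-to-video search can work: the question and each video segment are mapped into the same vector space, and the closest segment wins. This is an illustration under stated assumptions, not a description of how Gemini 3.1 or GPT-5.4 work internally. The vectors below are made up by hand, and the names `segments`, `search`, and `cosine` are hypothetical; in a real system a multimodal model would produce the vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical, hand-made embeddings for three one-minute video segments.
# Each entry is (start time in seconds, embedding vector).
segments = [
    (0, [0.9, 0.1, 0.0]),
    (60, [0.1, 0.8, 0.1]),
    (120, [0.0, 0.2, 0.9]),
]

def search(query_vec, segments, top_k=1):
    """Return the start times of the top_k segments most similar to the query."""
    ranked = sorted(segments, key=lambda s: cosine(query_vec, s[1]), reverse=True)
    return [start for start, _ in ranked[:top_k]]

# Hypothetical embedding of the user's text question.
query = [0.05, 0.9, 0.05]
best = search(query, segments)  # start time of the best-matching segment
```

With these toy numbers, the question lands closest to the segment that starts at the 60-second mark, so that is where a viewer would be sent.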