**Microsoft’s New Small Language Model Can Analyze Images – Meet Phi-3 Vision**
*Illustration: The Verge*
—
In a significant stride forward, Microsoft has unveiled a new version of its small language model, Phi-3, which now has the impressive ability to analyze images and provide detailed descriptions of what’s in them.
Dubbed Phi-3-vision, this model is a multimodal system, meaning it can process and interpret both text and images, and it is designed to run on mobile devices. The model is currently available in a preview version, and Microsoft describes it as a 4.2-billion-parameter model. Put simply, the parameter count reflects the model's complexity and the depth of its training. This capability enables Phi-3-vision to perform general visual reasoning tasks, such as answering questions about charts or images.
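For developers who want to experiment, the preview checkpoint is published on Hugging Face as `microsoft/Phi-3-vision-128k-instruct`. Below is a minimal sketch of asking the model a question about a chart using the `transformers` library; the image URL is a placeholder, and the exact prompt formatting may differ from the final release.

```python
# Minimal sketch: querying Phi-3-vision about an image via Hugging Face transformers.
# Assumes the preview checkpoint "microsoft/Phi-3-vision-128k-instruct" and a CUDA GPU;
# the image URL below is a placeholder, not taken from the article.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# The <|image_1|> tag marks where the attached image belongs in the prompt.
messages = [{"role": "user", "content": "<|image_1|>\nWhat trend does this chart show?"}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)
inputs = processor(prompt, [image], return_tensors="pt").to("cuda")

output_ids = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens so only the model's answer is decoded.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```

Because the model fits in roughly 4.2 billion parameters, this kind of visual question answering is feasible on far more modest hardware than larger multimodal systems require.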
What sets Phi-3-vision apart, however, is its size. It is considerably smaller than other prominent image-focused AI models like OpenAI’s DALL-E or Stability AI’s Stable Diffusion. Unlike those larger models, which generate new images, Phi-3-vision’s primary capability is understanding and analyzing existing ones.
This development marks a significant milestone, particularly for applications on mobile devices, where resource efficiency is crucial. Phi-3-vision’s compact size, achieved without compromising capability, makes it a promising tool for a wide range of visual reasoning tasks.
As Microsoft continues to innovate and refine its language models, Phi-3-vision is poised to open up new possibilities for mobile device users and developers who require robust visual analysis capabilities wrapped in a compact, efficient package.
Stay tuned for more updates as we delve deeper into the capabilities and potential applications of Microsoft’s latest breakthrough.
—
source: https://www.theverge.com/2024/5/21/24159282/microsoft-ai-small-language-model-phi-3-vision