With 12 billion parameters, this advanced model is designed for tasks like image captioning and object counting. Pixtral 12B, built on Mistral’s earlier text model Nemo 12B, can handle an arbitrary number of images at arbitrary sizes, supplied either as URLs or as base64-encoded data, making it versatile for a wide range of applications in fields requiring visual data analysis, such as content moderation, healthcare, and more.
Although there are no working web demos yet, Mistral has plans to enable testing soon. Pixtral 12B will be accessible via the company's chatbot platform, Le Chat, and its API-serving platform, La Plateforme, in the near future. While not yet publicly tested, Pixtral 12B’s release marks an important step in Mistral’s efforts to compete in the growing field of multimodal AI models.
Mistral has made Pixtral 12B freely available under the permissive Apache 2.0 licence, allowing users to download, fine-tune, and deploy the model, including for commercial use. The model can be accessed through a torrent link on GitHub or via the AI development platform Hugging Face, making it easily available to developers and researchers.
It’s part of a growing trend of multimodal models, similar to OpenAI’s GPT-4 and Anthropic’s Claude family, that integrate both visual and textual understanding. At roughly 24GB in size, Pixtral 12B’s 12-billion-parameter architecture suggests strong problem-solving capabilities: as a rough rule, models with higher parameter counts tend to perform better, though training data and architecture matter just as much.