Cohere Releases Aya Vision AI Model

  • Aya Vision is a multimodal AI model that can perform tasks like image captioning and text translation
  • The model comes in two flavors: Aya Vision 32B and Aya Vision 8B
  • Aya Vision 32B outperforms models twice its size on certain visual understanding benchmarks
  • Aya Vision 8B scores better on some evaluations than models 10 times its size
  • The models are available on Hugging Face under a Creative Commons 4.0 license, for non-commercial use only
  • Aya Vision was trained using synthetic annotations
  • Cohere released a new benchmark suite called AyaVisionBench

Aya Vision AI Model

Cohere For AI, Cohere's nonprofit research lab, has released Aya Vision, a multimodal AI model that can write image captions, answer questions about photos, translate text, and generate summaries in 23 major languages.

Aya Vision comes in two flavors: Aya Vision 32B and Aya Vision 8B. The more sophisticated model, Aya Vision 32B, outperforms models twice its size, including Meta's Llama-3.2 90B Vision, on certain visual understanding benchmarks. Meanwhile, Aya Vision 8B scores better on some evaluations than models 10 times its size.
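To put the size comparisons in concrete terms, the short sketch below works out the ratios from the nominal sizes in the model names. It assumes, as is conventional, that "32B", "8B", and "90B" refer to parameter counts in billions; the dictionary and helper function are illustrative only and are not part of any Cohere or Meta tooling.

```python
# Nominal sizes in billions of parameters, read off the model names.
# Assumption: the "B" suffix is the parameter count in billions.
PARAMS_B = {
    "Aya Vision 8B": 8,
    "Aya Vision 32B": 32,
    "Llama-3.2 90B Vision": 90,
}


def size_ratio(larger: str, smaller: str) -> float:
    """Return how many times bigger `larger` is than `smaller`, by nominal size."""
    return PARAMS_B[larger] / PARAMS_B[smaller]


# Llama-3.2 90B Vision relative to Aya Vision 32B.
ratio_32b = size_ratio("Llama-3.2 90B Vision", "Aya Vision 32B")

# Size a model would need to be "10 times" Aya Vision 8B.
ten_x_8b = 10 * PARAMS_B["Aya Vision 8B"]

print(f"Llama-3.2 90B Vision is {ratio_32b:.2f}x the size of Aya Vision 32B")
print(f"A model 10x the size of Aya Vision 8B has about {ten_x_8b}B parameters")
```

By this rough measure, Llama-3.2 90B Vision is closer to three times the size of Aya Vision 32B than two, and "10 times" the 8B model corresponds to roughly 80 billion parameters.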

Both models are available on the AI dev platform Hugging Face under a Creative Commons 4.0 license with Cohere's acceptable use addendum, and they cannot be used for commercial applications. To train Aya Vision, Cohere started from a diverse pool of English datasets, which the lab translated and used to create synthetic annotations.

Cohere's use of synthetic annotations is in line with a broader trend in the AI industry. The lab claims that training Aya Vision on synthetic annotations let it use fewer resources while still achieving competitive performance. Cohere also says this efficiency helps it better support the research community, where access to compute resources is often limited.

Along with Aya Vision, Cohere released a new benchmark suite called AyaVisionBench, designed to probe a model's skills in vision-language tasks such as identifying differences between two images and converting screenshots to code. The AI industry is facing what has been described as an evaluation crisis, and Cohere asserts that AyaVisionBench is a step toward rectifying it.