Mistral Launches AI-Ready PDF Conversion API

- Mistral launches new API to convert PDF documents into AI-ready Markdown files
- Mistral OCR API uses optical character recognition (OCR) to convert PDFs into text files
- API creates bounding boxes around graphical elements and includes them in the output
- Output is formatted in Markdown, a syntax used by developers to add formatting elements to plain text files
- API is available on Mistral's API platform or through cloud partners, with on-premise deployment option
- Mistral claims API performs better than those from Google, Microsoft, and OpenAI
- Potential use cases include document review at law firms and RAG systems
Mistral OCR API
Mistral has launched a new API called Mistral OCR, which uses optical character recognition (OCR) to convert PDF documents into text files. This API is designed to make it easier for AI models to ingest and process complex documents, particularly those with illustrations and photos intertwined with text.
The Mistral OCR API creates bounding boxes around graphical elements and includes them in the output, which is formatted in Markdown. This formatting syntax is commonly used by developers to add links, headers, and other formatting elements to plain text files.
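To make that output shape concrete, here is a minimal sketch of how a consumer might model one page of such output: Markdown text plus a list of bounding boxes for the graphical elements it references. The `Page` and `BoundingBox` types, their field names, and the `img-0.jpeg` identifier are hypothetical and chosen for illustration; they are not Mistral's actual response schema.

```python
import re
from dataclasses import dataclass, field


# Hypothetical types for illustration only; not Mistral's actual response schema.
@dataclass
class BoundingBox:
    image_id: str  # identifier used in the Markdown image reference
    x0: int        # left edge, in page pixels
    y0: int        # top edge
    x1: int        # right edge
    y1: int        # bottom edge


@dataclass
class Page:
    markdown: str
    images: list[BoundingBox] = field(default_factory=list)


# Matches Markdown image references such as ![chart](img-0.jpeg)
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def referenced_images(page: Page) -> list[BoundingBox]:
    """Return the boxes whose ids appear in the page's Markdown, in order of appearance."""
    by_id = {box.image_id: box for box in page.images}
    return [by_id[ref] for ref in IMAGE_REF.findall(page.markdown) if ref in by_id]


# Example page: a heading, a sentence, and one extracted chart.
page = Page(
    markdown="# Report\n\nRevenue grew.\n\n![chart](img-0.jpeg)\n",
    images=[BoundingBox("img-0.jpeg", 40, 120, 560, 480)],
)
boxes = referenced_images(page)
```

Keeping the Markdown and the boxes side by side lets a downstream tool hand the text to a language model while still being able to crop or display the original figures.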
According to Mistral co-founder and chief science officer Guillaume Lample, the Mistral OCR API is a crucial step towards the widespread adoption of AI assistants in companies that need to simplify access to their vast internal documentation. The API is available on Mistral's own API platform or through its cloud partners, and it also offers on-premise deployment for companies working with classified or sensitive data.
Mistral has tested its OCR model on complex documents that include mathematical expressions, advanced layouts, or tables, and claims that it outperforms comparable APIs from Google, Microsoft, and OpenAI. The company also uses Mistral OCR in its own AI assistant, Le Chat, which calls the API in the background to understand what is in a document before processing the text.
The Mistral OCR API has many potential use cases. Law firms, for example, could use it to swiftly plough through huge volumes of documents. It is also a good fit for Retrieval-Augmented Generation (RAG) systems that take multimodal documents as input: the extracted Markdown can be indexed, and the most relevant passages retrieved and passed to an LLM as context.
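As a rough illustration of how Markdown output could feed a RAG pipeline, the sketch below splits a Markdown document into chunks at each heading, picks the chunk that best matches a question, and builds a prompt around it. Everything here is an assumption made for illustration: the function names are hypothetical, the sample contract text is invented, and the word-overlap scoring is a toy stand-in for the embedding-based retrieval a real system would more likely use.

```python
import re


def split_markdown(markdown: str) -> list[str]:
    """Split Markdown into chunks, starting a new chunk at each heading line."""
    chunks, current = [], []
    for line in markdown.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Rank chunks by word overlap with the query (toy stand-in for embeddings)."""
    q = tokens(query)
    ranked = sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the text that would be sent to an LLM."""
    joined = "\n\n".join(context)
    return f"Answer using only this context:\n\n{joined}\n\nQuestion: {query}"


# Invented sample text standing in for OCR output from a contract.
doc = (
    "# Termination\nEither party may terminate with 30 days notice.\n\n"
    "# Payment\nInvoices are due within 45 days.\n"
)
question = "When are invoices due?"
chunks = split_markdown(doc)
top = retrieve(question, chunks)
prompt = build_prompt(question, top)
```

Splitting on headings is one reason Markdown output is convenient here: the document's own structure gives natural chunk boundaries without extra layout analysis.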