Mistral releases OCR 4, a self-hostable document AI for data-sensitive work

Paris-based Mistral AI shipped OCR 4, a document-understanding model that extracts structured text in 170 languages and can run fully self-hosted — letting regulated European organisations process sensitive files without sending them to a US cloud API.

On 23 June 2026, Paris-based Mistral AI released OCR 4, a document-understanding model that turns PDFs, Word, PowerPoint and OpenDocument files into structured data. It extracts text across 170 languages, draws bounding boxes around each block, classifies elements such as titles, tables, equations and signatures, and attaches a confidence score to what it reads.

Mistral says the model tops the OlmOCRBench leaderboard with a score of 85.20 and reaches 93.07 on OmniDocBench, and that human annotators preferred its output to competing systems around 72% of the time. It is priced at $4 per 1,000 pages through the API, $2 in batch mode, with a higher-level Document AI tier at $5.
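To put the per-page pricing in context, here is a minimal sketch of the arithmetic in Python. It uses only the two rates quoted above with an explicit unit ($4 and $2 per 1,000 pages) and assumes billing scales linearly with page count, which Mistral's actual billing rules may not match. The function name and the monthly volume are hypothetical, chosen for illustration.

```python
# Rates quoted in the article, in US dollars per 1,000 pages.
API_RATE_PER_1000 = 4.00
BATCH_RATE_PER_1000 = 2.00


def estimate_cost(pages: int, rate_per_1000: float) -> float:
    """Estimate cost in dollars, assuming simple linear per-page billing."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * rate_per_1000 / 1000


# Hypothetical workload: 250,000 pages a month.
monthly_pages = 250_000
print(estimate_cost(monthly_pages, API_RATE_PER_1000))    # 1000.0
print(estimate_cost(monthly_pages, BATCH_RATE_PER_1000))  # 500.0
```

On those assumptions, batch mode halves the bill at any volume, so the choice between the two comes down to whether the workload can wait for batch processing.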

The headline feature for European buyers is how it deploys. OCR 4 is compact enough to run in a single container and can be hosted fully on a customer’s own infrastructure, so banks, hospitals, law firms and public bodies can process sensitive documents without routing them through a third-party US cloud service. With incumbents such as AWS Textract, Google Document AI and Azure Document Intelligence, the usual trade-off is that the data leaves the building.

Mistral, whose assistant Vibe and coding agent Vibe for code are already listed here as European alternatives to ChatGPT and GitHub Copilot, has made on-premise and EU-hosted deployment its consistent differentiator against larger US labs. OCR 4 carries that pitch from chat into the high-volume, unglamorous world of document processing, where data-residency rules often decide which vendor a regulated organisation is allowed to use at all.