OCR and VLM examples

Pass converted page images into OCR engines or multimodal SDKs.

The examples below render a page with convert, then hand the result to Tesseract.js for OCR and to a multimodal model through the AI SDK.

Choose a format for the downstream tool

PNG is the safest default. Use WebP or JPEG if you want smaller payloads and the downstream tool accepts those formats.
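
If you want to render to one of those formats directly, the choice would be made at convert time. The format option name in this sketch is hypothetical; check the library's convert options for the real field and its accepted values.

import { convert } from "@omsimos/pdf-raster";

// Assumption: "format" is a placeholder option name, not a confirmed
// @omsimos/pdf-raster API; verify against the library's convert options.
const [page] = await convert("./scan.pdf", {
  pages: [0],
  dpi: 300,
  format: "webp", // smaller payload than PNG, if the downstream tool accepts it
});

console.log(page.mimeType); // confirm what the downstream tool will receive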

Tesseract.js

import { convert } from "@omsimos/pdf-raster";
import { createWorker } from "tesseract.js";

const worker = await createWorker("eng");
const [page] = await convert("./scan.pdf", {
  pages: [0],
  dpi: 300,
});

const result = await worker.recognize(page.data);
console.log(result.data.text);
await worker.terminate();

page.data is already an encoded image buffer, so it can be passed directly into OCR libraries that accept image bytes.

AI SDK generateText

import { convert } from "@omsimos/pdf-raster";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const [page] = await convert("./invoice.pdf", {
  pages: [0],
  dpi: 300,
});

const { text } = await generateText({
  model: openai("gpt-4.1"),
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Extract all visible text from this page." },
        { type: "image", image: page.data },
      ],
    },
  ],
});

console.log(text);

The same pattern works for other image-capable SDKs: convert first, then pass page.data and page.mimeType into the next system.
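
For example, an SDK that expects image URLs can take the page as a base64 data URL built from those two fields. This is a sketch against the official openai package, assuming page.data is a Buffer or Uint8Array; it is not part of @omsimos/pdf-raster.

import { convert } from "@omsimos/pdf-raster";
import OpenAI from "openai";

const client = new OpenAI();

const [page] = await convert("./invoice.pdf", {
  pages: [0],
  dpi: 300,
});

// Build a data URL from the encoded image bytes and the reported MIME type.
const dataUrl = `data:${page.mimeType};base64,${Buffer.from(page.data).toString("base64")}`;

const response = await client.chat.completions.create({
  model: "gpt-4.1",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Extract all visible text from this page." },
        { type: "image_url", image_url: { url: dataUrl } },
      ],
    },
  ],
});

console.log(response.choices[0].message.content);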

Why the handoff is simple

The converted page result is already shaped for downstream use:

  • page.data is the image payload
  • page.mimeType tells the consumer what it is
  • page.width, page.height, and page.dpi help with validation and metrics
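
For instance, those fields can drive a quick sanity check before a page is handed to OCR or a model. The thresholds below are arbitrary illustrations, not values the library recommends.

import { convert } from "@omsimos/pdf-raster";

const [page] = await convert("./scan.pdf", { pages: [0], dpi: 300 });

// Reject renders that are unlikely to OCR well. The limits here are
// illustrative only, not defaults from @omsimos/pdf-raster.
if (page.dpi < 200 || page.width < 1000) {
  throw new Error(
    `Render too small for OCR: ${page.width}x${page.height} at ${page.dpi} DPI`
  );
}

console.log(`Sending ${page.mimeType} page (${page.width}x${page.height}) downstream`);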

Boundary to preserve

Keep responsibilities separated

@omsimos/pdf-raster handles the PDF-to-image step. OCR engines, VLM providers, and post-processing can be layered on top in app code or separate packages.
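
One way to keep that boundary visible is a pair of small functions in app code: one that only rasterizes, one that only recognizes. A rough sketch, reusing the Tesseract.js setup from above and assuming page.data is a Buffer:

import { convert } from "@omsimos/pdf-raster";
import { createWorker } from "tesseract.js";

// Rasterizing layer: only knows about PDFs and page images.
async function renderFirstPage(pdfPath: string) {
  const [page] = await convert(pdfPath, { pages: [0], dpi: 300 });
  return page;
}

// Recognition layer: only knows about image bytes, not PDFs.
async function ocrImage(image: Buffer) {
  const worker = await createWorker("eng");
  try {
    const result = await worker.recognize(image);
    return result.data.text;
  } finally {
    await worker.terminate();
  }
}

const page = await renderFirstPage("./scan.pdf");
console.log(await ocrImage(page.data));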
