OCR and VLM examples

Pass converted page images into OCR engines or multimodal SDKs.

The examples below render a page with convert, then hand the result to Tesseract.js for OCR and to a multimodal model through the AI SDK.

Choose a format for the downstream tool

PNG is the safest default. Use WebP or JPEG if you want smaller payloads and the downstream tool accepts those formats.
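
If you want to render to one of those formats directly, the choice would be made at convert time. The format option name in this sketch is hypothetical; check the library's convert options for the real field and its accepted values.

import { convert } from "@omsimos/pdf-raster";

// Assumption: "format" is a placeholder option name, not a confirmed
// @omsimos/pdf-raster API; verify against the library's convert options.
const [page] = await convert("./scan.pdf", {
  pages: [0],
  dpi: 300,
  format: "webp", // smaller payload than PNG, if the downstream tool accepts it
});

console.log(page.mimeType); // confirm what the downstream tool will receive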

Tesseract.js

import { convert } from "@omsimos/pdf-raster";
import { createWorker } from "tesseract.js";

const worker = await createWorker("eng");
const [page] = await convert("./scan.pdf", {
  pages: [0],
  dpi: 300,
});

const result = await worker.recognize(page.data);
console.log(result.data.text);
await worker.terminate();

page.data is already an encoded image buffer, so it can be passed directly into OCR libraries that accept image bytes.

AI SDK generateText

import { convert } from "@omsimos/pdf-raster";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const [page] = await convert("./invoice.pdf", {
  pages: [0],
  dpi: 300,
});

const { text } = await generateText({
  model: openai("gpt-4.1"),
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Extract all visible text from this page." },
        { type: "image", image: page.data },
      ],
    },
  ],
});

console.log(text);

The same pattern works for other image-capable SDKs: convert first, then pass page.data and page.mimeType into the next system.
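
For example, an SDK that expects image URLs can take the page as a base64 data URL built from those two fields. This is a sketch against the official openai package, assuming page.data is a Buffer or Uint8Array; it is not part of @omsimos/pdf-raster.

import { convert } from "@omsimos/pdf-raster";
import OpenAI from "openai";

const client = new OpenAI();

const [page] = await convert("./invoice.pdf", {
  pages: [0],
  dpi: 300,
});

// Build a data URL from the encoded image bytes and the reported MIME type.
const dataUrl = `data:${page.mimeType};base64,${Buffer.from(page.data).toString("base64")}`;

const response = await client.chat.completions.create({
  model: "gpt-4.1",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Extract all visible text from this page." },
        { type: "image_url", image_url: { url: dataUrl } },
      ],
    },
  ],
});

console.log(response.choices[0].message.content);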

Why the handoff is simple

The converted page result is already shaped for downstream use:

  • page.data is the image payload
  • page.mimeType tells the consumer what it is
  • page.width, page.height, and page.dpi help with validation and metrics
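
For instance, those fields can drive a quick sanity check before a page is handed to OCR or a model. The thresholds below are arbitrary illustrations, not values the library recommends.

import { convert } from "@omsimos/pdf-raster";

const [page] = await convert("./scan.pdf", { pages: [0], dpi: 300 });

// Reject renders that are unlikely to OCR well. The limits here are
// illustrative only, not defaults from @omsimos/pdf-raster.
if (page.dpi < 200 || page.width < 1000) {
  throw new Error(
    `Render too small for OCR: ${page.width}x${page.height} at ${page.dpi} DPI`
  );
}

console.log(`Sending ${page.mimeType} page (${page.width}x${page.height}) downstream`);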

Boundary to preserve

Keep responsibilities separated

@omsimos/pdf-raster handles the PDF-to-image step. OCR engines, VLM providers, and post-processing can be layered on top in app code or separate packages.
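
One way to keep that boundary visible is a pair of small functions in app code: one that only rasterizes, one that only recognizes. A rough sketch, reusing the Tesseract.js setup from above and assuming page.data is a Buffer:

import { convert } from "@omsimos/pdf-raster";
import { createWorker } from "tesseract.js";

// Rasterizing layer: only knows about PDFs and page images.
async function renderFirstPage(pdfPath: string) {
  const [page] = await convert(pdfPath, { pages: [0], dpi: 300 });
  return page;
}

// Recognition layer: only knows about image bytes, not PDFs.
async function ocrImage(image: Buffer) {
  const worker = await createWorker("eng");
  try {
    const result = await worker.recognize(image);
    return result.data.text;
  } finally {
    await worker.terminate();
  }
}

const page = await renderFirstPage("./scan.pdf");
console.log(await ocrImage(page.data));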
