OCR and VLM examples
Pass converted page images into OCR engines or multimodal SDKs.
Each example below renders a page with convert() first, then hands the encoded result to the downstream tool.
Choose a format for the downstream tool
PNG is the safest default. Use WebP or JPEG if you want smaller payloads and the downstream tool accepts those formats.
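As a sketch, format selection could look like the following. This assumes the conversion options expose a format field; the exact option name and accepted values should be checked against the library's API:

```ts
import { convert } from "@omsimos/pdf-raster";

// Assumed option: `format` selects the encoded output type.
// PNG remains the safe default if the downstream tool is unknown.
const [page] = await convert("./scan.pdf", {
  pages: [0],
  dpi: 300,
  format: "webp", // assumption — verify the option name in the library docs
});

console.log(page.mimeType); // expected to report the chosen encoding
```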
Tesseract.js
```ts
import { convert } from "@omsimos/pdf-raster";
import { createWorker } from "tesseract.js";

const worker = await createWorker("eng");

const [page] = await convert("./scan.pdf", {
  pages: [0],
  dpi: 300,
});

const result = await worker.recognize(page.data);
console.log(result.data.text);
```

`page.data` is already an encoded image buffer, so it can be passed directly into OCR libraries that accept image bytes.
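The same worker can also be reused across multiple converted pages. A minimal sketch (the page indices are illustrative):

```ts
import { convert } from "@omsimos/pdf-raster";
import { createWorker } from "tesseract.js";

const worker = await createWorker("eng");

// Convert the first three pages; indices are illustrative.
const pages = await convert("./scan.pdf", { pages: [0, 1, 2], dpi: 300 });

const texts: string[] = [];
for (const page of pages) {
  // Each page.data is an encoded image buffer Tesseract can read directly.
  const result = await worker.recognize(page.data);
  texts.push(result.data.text);
}

await worker.terminate();
console.log(texts.join("\n\n"));
```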
AI SDK generateText
```ts
import { convert } from "@omsimos/pdf-raster";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const [page] = await convert("./invoice.pdf", {
  pages: [0],
  dpi: 300,
});

const { text } = await generateText({
  model: openai("gpt-4.1"),
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Extract all visible text from this page." },
        { type: "image", image: page.data },
      ],
    },
  ],
});
```

The same pattern works for other image-capable SDKs: convert first, then pass `page.data` and `page.mimeType` into the next system.
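If a downstream SDK expects a data URL or a base64 string instead of raw bytes, `page.data` and `page.mimeType` are enough to build one. A sketch, assuming `page.data` is a Node Buffer or Uint8Array of encoded image bytes:

```ts
import { convert } from "@omsimos/pdf-raster";

const [page] = await convert("./invoice.pdf", { pages: [0], dpi: 300 });

// Build a data URL from the encoded bytes and the reported MIME type.
const dataUrl = `data:${page.mimeType};base64,${Buffer.from(page.data).toString("base64")}`;

// Pass dataUrl to SDKs that accept image URLs or data URLs.
```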
Why the handoff is simple
The converted page result is already shaped for downstream use:
- `page.data` is the image payload
- `page.mimeType` tells the consumer what it is
- `page.width`, `page.height`, and `page.dpi` help with validation and metrics (see the sketch below)
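For example, a quick sanity check before handing a page to OCR might look like this sketch (the thresholds are illustrative, not recommendations):

```ts
import { convert } from "@omsimos/pdf-raster";

const [page] = await convert("./scan.pdf", { pages: [0], dpi: 300 });

// Illustrative guardrails: reject pages that rendered too small
// or at a lower DPI than OCR usually needs.
if (page.width < 300 || page.height < 300) {
  throw new Error(`Page too small for OCR: ${page.width}x${page.height}`);
}
if (page.dpi < 150) {
  throw new Error(`DPI too low for OCR: ${page.dpi}`);
}

console.log(`OK: ${page.width}x${page.height} @ ${page.dpi} DPI (${page.mimeType})`);
```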
Boundary to preserve
Keep responsibilities separated
@omsimos/pdf-raster handles the PDF-to-image step. OCR engines, VLM
providers, and post-processing can be layered on top in app code or separate
packages.
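In app code, that boundary often ends up as a thin helper that composes the two steps. A sketch (the helper name is illustrative):

```ts
import { convert } from "@omsimos/pdf-raster";
import { createWorker } from "tesseract.js";

// Hypothetical app-level helper: rasterize with @omsimos/pdf-raster,
// then run OCR and post-processing in application code.
export async function extractPageText(pdfPath: string, pageIndex = 0): Promise<string> {
  const [page] = await convert(pdfPath, { pages: [pageIndex], dpi: 300 });

  const worker = await createWorker("eng");
  try {
    const result = await worker.recognize(page.data);
    return result.data.text.trim(); // post-processing belongs here, not in the raster layer
  } finally {
    await worker.terminate();
  }
}
```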