PDF OCR pipeline simplification — faster, cheaper, same API

· 2 min read
Chao Cheng
Software Engineer @ Cobbling AI

We just shipped a backend overhaul of PDF OCR. The user-facing API is unchanged — your existing integrations keep working — but everything behind it is leaner.

What changed (and what didn't)

Same

  • Endpoints, authentication, request and response shapes.
  • Webhook events and Svix signature verification.
  • Pricing and subscription tiers.
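Webhook verification in particular is untouched. For reference, Svix's documented `v1` scheme can be checked with nothing but the standard library. This is a sketch of the published scheme, not our server code; the official `svix` SDK remains the recommended path:

```python
import base64
import hashlib
import hmac


def verify_svix(secret: str, msg_id: str, timestamp: str,
                payload: bytes, signature_header: str) -> bool:
    """Check a Svix v1 signature: HMAC-SHA256 over
    "{msg_id}.{timestamp}.{payload}", keyed by the base64-decoded
    secret (the part after the "whsec_" prefix)."""
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed = f"{msg_id}.{timestamp}.".encode() + payload
    expected = base64.b64encode(
        hmac.new(key, signed, hashlib.sha256).digest()
    ).decode()
    # The svix-signature header can carry several space-separated
    # entries like "v1,<base64sig>"; accept if any v1 entry matches.
    return any(
        hmac.compare_digest(expected, entry.split(",", 1)[1])
        for entry in signature_header.split()
        if entry.startswith("v1,")
    )
```

The `msg_id` and `timestamp` come from the `svix-id` and `svix-timestamp` request headers.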

Faster behind the scenes

  • Pages now go straight from PDF to OCR. We dropped the intermediate PDF→PNG conversion that used to run in AWS Lambda for every page.
  • Worker count for the OCR pipeline went from five to two, which removes hops, queues, and retry surface area.
  • Per-page latency in our internal benchmarks improved noticeably, and tail latency on bursty workloads is much tighter now that fewer queues need to drain.

Why we did it

The old pipeline had three stages: split the PDF into per-page PDFs, convert each page to PNG (in Lambda), then OCR the PNG. The PNG conversion cost real money and added a queue-and-retry hop where a transient failure meant redoing a step that didn't need redoing.

The OCR model already accepts PDF inputs natively. Once we wired the page-OCR worker to send page PDFs directly, the PNG conversion step became dead weight. Removing it eliminated:

  • One Lambda function and one API Gateway.
  • Two Cloudflare Queues (and their dead-letter queues).
  • Three of the five OCR-pipeline workers.
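In code terms, the per-page path collapsed to "split, then OCR the page PDF directly." A toy sketch of that shape (the splitter and OCR call below are stubs for illustration; the real workers and OCR client aren't shown in this post):

```python
def split_pdf(pdf_bytes: bytes) -> list[bytes]:
    """Split a PDF into per-page PDFs. Stubbed: a real worker would
    use a PDF library; this stand-in just fakes a two-page document."""
    return [pdf_bytes + b"-page1", pdf_bytes + b"-page2"]


def ocr_pdf_page(page_pdf: bytes) -> str:
    """Send a page PDF straight to the OCR model -- no PNG conversion
    step in between. Stubbed for illustration."""
    return f"text for {len(page_pdf)} bytes"


def process(pdf_bytes: bytes) -> list[str]:
    # Old flow: split -> convert each page to PNG (Lambda) -> OCR the PNG.
    # New flow: split -> OCR the page PDF directly.
    return [ocr_pdf_page(page) for page in split_pdf(pdf_bytes)]
```

Fewer stages means fewer queues between them, which is where the retry surface area went.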

Do I need to change anything?

No. Same endpoints, same auth, same payloads.

If you want a slightly simpler upload path, we now also document the single-request multipart upload at https://api.commapdf.cobbling.ai/v1/parsing/upload alongside the existing presigned-URL flow at https://api.pdf-ocr.cobbling.ai/v1/upload/direct. Both still work; pick whichever fits your client.
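If you'd rather not pull in an HTTP client library, the multipart request can be assembled with the standard library alone. A sketch under assumptions — the `file` field name and bearer-token header are illustrative, so check the API reference for the exact contract:

```python
import io
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Build a multipart/form-data body by hand, returning the body
    and the Content-Type header (with its boundary)."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        (
            f'Content-Disposition: form-data; name="{field}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        ).encode()
    )
    body.write(data)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return body.getvalue(), f"multipart/form-data; boundary={boundary}"


body, content_type = build_multipart("file", "document.pdf", b"%PDF-...")
req = urllib.request.Request(
    "https://api.commapdf.cobbling.ai/v1/parsing/upload",
    data=body,
    headers={
        "Content-Type": content_type,
        # Auth scheme assumed for illustration; use your real credentials.
        "Authorization": "Bearer YOUR_API_KEY",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; left out here since this
# is a sketch, not a tested client.
```

The presigned-URL flow stays the right choice for large files or browser-side uploads.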

See the refreshed PDF OCR documentation for the full reference.

As always, ping us if anything looks off — we read every report.