PDF OCR Service

Cobbling AI's PDF OCR service delivers fast processing, high accuracy, and rich structured outputs for PDF documents.

Key Benefits

Enhanced recognition quality for complex layouts, tables, and mathematical notation
Faster turnaround times for large documents through an optimized processing pipeline
Real-time webhook support to keep your applications up to date

Features

High-accuracy OCR processing using advanced AI models
Automatic conversion of:
- Mathematical equations to LaTeX format
- Tables to Markdown format
Smart handling of:
- Headers and footers
- References and footnotes
- Natural handwriting

Subscription & Pricing

PDF OCR is available through Cobbling AI subscriptions. To get started:

Sign up for a subscription on the Cobbling AI website
Generate an API key in the account dashboard
Use the API key to authenticate requests to the PDF OCR service

Generating Your API Key

After subscribing to the service:

Log in to your Cobbling AI account
Create a project from your dashboard
Navigate to your project and click "Create new secret key"
Copy and securely store your API key
Use this key in your API requests

API Usage

The PDF OCR service is asynchronous: you upload a PDF file, then poll a status endpoint until processing finishes. Webhooks can notify you of status changes so that you don't have to poll.

Step 1: Upload PDF File

Choose one of the following upload methods based on your file size and requirements:

Option A: PutObject Upload (Recommended for files < 50MB)

Get Presigned URL

First, request a presigned URL for uploading your PDF:

POST https://api.pdf-ocr.cobbling.ai/v1/upload/direct
Authorization: Bearer <your_api_key>
Content-Type: application/json

{
  "fileName": "document.pdf",
  "fileType": "application/pdf"
}

Request Parameters

fileName (string, required): Name of the PDF file to upload (max 255 characters)
fileType (string, required): MIME type of the file (use application/pdf for PDF files)

Response Format

{
  "jobGuid": "unique_job_identifier",
  "url": "presigned_upload_url",
  "key": "upload_key"
}
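
For illustration, the presigned URL request above could be issued from Python's standard library roughly as follows. This is a minimal sketch, not an official client: the function names and the API_BASE constant are hypothetical, and only the endpoint, headers, and body fields come from this page.

```python
import json
import urllib.request

API_BASE = "https://api.pdf-ocr.cobbling.ai/v1"


def build_presign_request(api_key: str, file_name: str) -> urllib.request.Request:
    """Build the POST /upload/direct request (nothing is sent here)."""
    if not file_name or len(file_name) > 255:
        raise ValueError("fileName must be 1 to 255 characters long")
    body = json.dumps({"fileName": file_name, "fileType": "application/pdf"})
    return urllib.request.Request(
        f"{API_BASE}/upload/direct",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def request_presigned_url(api_key: str, file_name: str) -> dict:
    """Send the request and return the parsed JSON (jobGuid, url, key)."""
    with urllib.request.urlopen(build_presign_request(api_key, file_name)) as resp:
        return json.load(resp)
```

Keep the returned jobGuid: it is the identifier you pass to the status endpoint in Step 2.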

Upload File to Presigned URL

Then upload your PDF file to the presigned URL:

PUT <presigned_upload_url>
Content-Type: application/pdf

<pdf_file_binary_data>
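
The upload itself is a plain PUT of the raw file bytes. A possible Python sketch is shown below; the helper names are hypothetical, and it assumes the presigned URL needs no Authorization header, since none is shown in the request above.

```python
import urllib.request
from pathlib import Path


def build_upload_request(presigned_url: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build the PUT request that sends the raw PDF bytes to the presigned URL."""
    return urllib.request.Request(
        presigned_url,
        data=pdf_bytes,
        method="PUT",
        headers={"Content-Type": "application/pdf"},
    )


def upload_pdf(presigned_url: str, pdf_path: str) -> int:
    """Upload the file and return the HTTP status code of the storage response."""
    request = build_upload_request(presigned_url, Path(pdf_path).read_bytes())
    with urllib.request.urlopen(request) as resp:
        return resp.status
```

Reading the whole file into memory is acceptable here because this upload method is limited to 50MB.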

Option B: Multipart Upload (For larger files)

Coming soon – instructions for multipart uploads will be provided to support even larger documents with greater reliability and performance.

Supported File Formats

File Limitations

Maximum file size: 50MB (PutObject upload)
Maximum file size: TBD (Multipart upload)
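
A client can check the size limit before requesting a presigned URL. The sketch below is only an illustration: the names are hypothetical, and it assumes "50MB" means 50,000,000 bytes, the more conservative reading, because this page does not say whether the limit is decimal or binary.

```python
import os

# Assumption: 50MB is read as decimal megabytes (the stricter interpretation).
MAX_PUT_OBJECT_BYTES = 50 * 1000 * 1000


def fits_put_object_limit(size_bytes: int, limit: int = MAX_PUT_OBJECT_BYTES) -> bool:
    """Return True if a non-empty file of this size fits the PutObject limit."""
    return 0 < size_bytes <= limit


def check_pdf_size(path: str) -> int:
    """Return the file size in bytes, or raise if it cannot use PutObject upload."""
    size = os.path.getsize(path)
    if not fits_put_object_limit(size):
        raise ValueError(f"{path} is {size} bytes; PutObject upload allows 1 to {MAX_PUT_OBJECT_BYTES}")
    return size
```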

Step 2: Check Processing Status

GET https://api.pdf-ocr.cobbling.ai/v1/workflows/:instanceId/status
Authorization: Bearer <your_api_key>

Replace :instanceId with the jobGuid returned by the presigned URL request in Step 1.
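
As a rough Python equivalent of the request above, the following sketch substitutes the jobGuid into the path and sends the bearer token. The function names and API_BASE constant are hypothetical; the URL shape and header are taken from this page.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.pdf-ocr.cobbling.ai/v1"


def build_status_request(api_key: str, job_guid: str) -> urllib.request.Request:
    """Build GET /workflows/:instanceId/status with :instanceId set to the jobGuid."""
    instance_id = urllib.parse.quote(job_guid, safe="")  # keep the ID inside one path segment
    return urllib.request.Request(
        f"{API_BASE}/workflows/{instance_id}/status",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def get_status(api_key: str, job_guid: str) -> dict:
    """Send the status request and return the parsed JSON response."""
    with urllib.request.urlopen(build_status_request(api_key, job_guid)) as resp:
        return json.load(resp)
```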

Response Format

Success Response (200):

{
  "instanceId": "e8e44b74-a155-4458-b90f-e90fd836e15a",
  "status": "complete",
  "error": null,
  "output": {
    "totalPages": 78,
    "processedPages": 78,
    "status": "completed",
    "error": null,
    "createdAt": "2025-07-02T08:51:52.576Z",
    "updatedAt": "2025-07-02T08:54:03.242Z",
    "markdownFiles": [
      {
        "pageNumber": 1,
        "markdown": "# Document Title\n\nThis is the extracted text content from page 1..."
      },
      {
        "pageNumber": 2,
        "markdown": "## Section Header\n\nThis is the extracted text content from page 2..."
      }
    ]
  }
}
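
Because the extracted text arrives one page at a time, you will usually want to join the pages. A minimal sketch, assuming the response has the shape shown above (the function name is hypothetical):

```python
def combine_markdown(status_response: dict) -> str:
    """Join output.markdownFiles into one Markdown string, ordered by pageNumber."""
    output = status_response.get("output") or {}
    pages = sorted(output.get("markdownFiles") or [], key=lambda page: page["pageNumber"])
    return "\n\n".join(page["markdown"] for page in pages)
```

Sorting by pageNumber is a precaution; this page does not state whether the array is always returned in page order.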

Status Values

queued: Workflow is queued for processing
running: Workflow is currently being processed
paused: Workflow execution is paused
errored: Workflow encountered an error
terminated: Workflow was terminated
complete: Workflow completed successfully
waiting: Workflow is waiting
waitingForPause: Workflow is waiting to be paused
unknown: Status is unknown
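
When polling, a client needs to decide which of these statuses end the loop. The sketch below treats complete, errored, and terminated as final; that grouping is an assumption based on the descriptions above, and the function name, interval, and timeout are illustrative defaults rather than documented values.

```python
import time
from typing import Callable

# Assumption: these three statuses never change again once reached.
TERMINAL_STATUSES = {"complete", "errored", "terminated"}


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status repeatedly until a terminal status or the timeout is reached."""
    deadline = time.monotonic() + timeout_s
    while True:
        response = fetch_status()
        if response.get("status") in TERMINAL_STATUSES:
            return response
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(f"job still {response.get('status')!r} after {timeout_s} seconds")
        sleep(interval_s)
```

Here fetch_status is any zero-argument callable that returns the parsed status response, so the loop stays independent of the HTTP library you use.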

Error Responses

400 – Bad Request:

{
  "code": "number",
  "message": "Invalid instance ID or workflow not found"
}

401 – Unauthorized:

{
  "code": "number",
  "message": "Invalid or missing bearer token"
}

404 – Workflow Instance Not Found:

{
  "code": "number",
  "message": "Workflow instance not found"
}

500 – Internal Server Error:

{
  "code": "number",
  "message": "Internal server error"
}
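
One way to surface these error bodies in Python is to convert them into a single exception type. This is a sketch only: OcrApiError, parse_error_body, and send are hypothetical names, and the code assumes every error body carries the code and message fields shown above while tolerating bodies that do not.

```python
import json
import urllib.error
import urllib.request


class OcrApiError(Exception):
    """Error carrying the HTTP status plus the code and message fields."""

    def __init__(self, http_status: int, code, message: str):
        super().__init__(f"HTTP {http_status}: {message}")
        self.http_status = http_status
        self.code = code
        self.message = message


def parse_error_body(http_status: int, raw_body: bytes) -> OcrApiError:
    """Turn an error response body into an OcrApiError, tolerating non-JSON bodies."""
    try:
        payload = json.loads(raw_body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return OcrApiError(http_status, payload.get("code"), payload.get("message") or "unknown error")


def send(request: urllib.request.Request) -> dict:
    """Send a request and return parsed JSON, raising OcrApiError on 4xx and 5xx."""
    try:
        with urllib.request.urlopen(request) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as exc:
        raise parse_error_body(exc.code, exc.read()) from exc
```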

Webhook Integration

Use webhooks to receive automatic notifications when the status of a PDF OCR job changes.

Go to the dashboard to create, edit and delete webhooks.

How to Add an Endpoint

To start receiving messages, configure an endpoint: provide a URL that you control and select the event types you want to listen to. If you don't select any event types, the endpoint receives all events. That default is convenient for getting started and for testing, but we recommend narrowing it to a subset later to avoid extraneous messages.

If your endpoint isn't ready to receive events yet, press the "with Svix Play" button to generate a unique URL. You can then view and inspect the webhooks sent to that Svix Play URL.

How to Test Endpoints

Once you've added an endpoint, you'll want to make sure it's working. The "Testing" tab lets you send test events to your endpoint. After sending an example event, you can click into the message to view its payload, all of the delivery attempts, and whether each one succeeded or failed.

Signature Verification

To verify webhook signatures, see the Svix guide "How to Verify Webhooks with the Svix Libraries".
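
The official Svix libraries are the recommended route. Purely to show what verification involves, here is a standard-library sketch of the manual scheme Svix documents: an HMAC-SHA256 over "id.timestamp.body", keyed with the base64 part of the whsec_ secret. The function name and the 5-minute tolerance are assumptions of this example, and the three values are expected to come from the svix-id, svix-timestamp, and svix-signature request headers.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def verify_svix_signature(
    secret: str,
    msg_id: str,
    timestamp: str,
    signature_header: str,
    body: bytes,
    tolerance_s: int = 300,
    now: Optional[float] = None,
) -> bool:
    """Return True if any v1 signature in the header matches the raw request body."""
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if abs(current - sent_at) > tolerance_s:
        return False  # reject stale or future-dated messages (replay protection)
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed_content = f"{msg_id}.{timestamp}.".encode("utf-8") + body
    expected = base64.b64encode(hmac.new(key, signed_content, hashlib.sha256).digest())
    for entry in signature_header.split():  # header is a space-delimited list
        version, _, signature = entry.partition(",")
        if version == "v1" and hmac.compare_digest(signature.encode("utf-8"), expected):
            return True
    return False
```

Verify against the raw request bytes, before any JSON parsing, since re-serializing the payload can change the bytes that were signed.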

Fetching Results

After receiving a webhook, use the job_id from the webhook payload to fetch the result:

GET https://api.pdf-ocr.cobbling.ai/v1/jobs/:id
Authorization: Bearer <your_api_key>

Replace :id with the job_id from the webhook notification.
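
A webhook handler might build this request directly from the notification body. The sketch below assumes job_id is a top-level string field of the JSON payload, which this page does not specify; the function name and API_BASE constant are also hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.pdf-ocr.cobbling.ai/v1"


def build_job_request(api_key: str, webhook_body: bytes) -> urllib.request.Request:
    """Build GET /jobs/:id using the job_id found in a webhook payload."""
    payload = json.loads(webhook_body)
    # Assumption: job_id sits at the top level of the webhook payload.
    job_id = urllib.parse.quote(str(payload["job_id"]), safe="")
    return urllib.request.Request(
        f"{API_BASE}/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```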

Example Usage

# Step 1: Get presigned URL
curl -X POST "https://api.pdf-ocr.cobbling.ai/v1/upload/direct" \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "fileName": "document.pdf",
    "fileType": "application/pdf"
  }'

# Step 2: Upload PDF to presigned URL (use the URL from step 1 response)
curl -X PUT "https://presigned-upload-url.com/..." \
  -H "Content-Type: application/pdf" \
  --data-binary "@document.pdf"

# Step 3: Check processing status (using jobGuid from step 1 response)
curl -X GET "https://api.pdf-ocr.cobbling.ai/v1/workflows/12345678-1234-1234-1234-123456789abc/status" \
  -H "Authorization: Bearer your_api_key"

# Poll the status endpoint until status is "complete", "errored", or "terminated"
# When complete, the extracted Markdown for each page is in output.markdownFiles

Security

API key authentication required for all requests
Request size limited to 200MB
API key revocation available in your account settings
Usage tracking available in your dashboard
