PDF OCR Service
Cobbling AI's PDF OCR service delivers fast processing, high accuracy, and rich structured outputs for PDF documents.
Key Benefits
- Enhanced recognition quality for complex layouts, tables, and mathematical notation
- Faster turnaround times for large documents through an optimized processing pipeline
- Real-time webhook support to keep your applications up to date
Features
- High-accuracy OCR processing using advanced AI models
- Automatic conversion of:
  - Mathematical equations to LaTeX format
  - Tables to Markdown format
- Smart handling of:
  - Headers and footers
  - References and footnotes
  - Natural handwriting
Subscription & Pricing
PDF OCR is available through Cobbling AI subscriptions. To get started:
- Sign up for a subscription on the Cobbling AI website
- Generate an API key in the account dashboard
- Use the API key to authenticate requests to the PDF OCR service
Generating Your API Key
After subscribing to the service:
- Log in to your Cobbling AI account
- Create a project from your dashboard
- Navigate to your project and click "Create new secret key"
- Copy and securely store your API key
- Use this key in your API requests
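One common way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named COBBLING_API_KEY, which is a hypothetical name chosen for illustration and not something the service defines; the Bearer header format is the one shown in the requests in this document.

```python
import os

# Hypothetical variable name used for illustration; the service does not define it.
API_KEY_ENV_VAR = "COBBLING_API_KEY"


def load_api_key() -> str:
    """Read the API key from the environment and fail loudly if it is missing."""
    key = os.environ.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set {API_KEY_ENV_VAR} to your PDF OCR API key")
    return key


def auth_headers(api_key: str) -> dict:
    """Build the Authorization header used by every request in this document."""
    return {"Authorization": f"Bearer {api_key}"}
```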
API Usage
The PDF OCR service processes documents asynchronously: you upload a PDF file, then poll a status endpoint until processing finishes.
Step 1: Upload PDF File
Choose one of the following upload methods based on your file size and requirements:
Option A: PutObject Upload (Recommended for files < 50MB)
Get Presigned URL
First, request a presigned URL for uploading your PDF:
POST https://api.pdf-ocr.cobbling.ai/v1/upload/direct
Authorization: Bearer <your_api_key>
Content-Type: application/json
{
"fileName": "document.pdf",
"fileType": "application/pdf"
}
Request Parameters
- fileName (string, required): Name of the PDF file to upload (max 255 characters)
- fileType (string, required): MIME type of the file (use application/pdf for PDF files)
Response Format
{
"jobGuid": "unique_job_identifier",
"url": "presigned_upload_url",
"key": "upload_key"
}
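The request and response above could be handled from Python roughly as follows. This is a minimal sketch using only the standard library; the helper names build_presign_request and parse_presign_response are illustrative and not part of any official SDK. The request is only constructed here, so nothing is sent until it is passed to urllib.request.urlopen.

```python
import json
import urllib.request

# Base URL taken from the endpoints shown in this document.
API_BASE = "https://api.pdf-ocr.cobbling.ai/v1"


def build_presign_request(api_key: str, file_name: str) -> urllib.request.Request:
    """Build the POST /upload/direct request without sending it."""
    if not file_name or len(file_name) > 255:
        raise ValueError("fileName must be between 1 and 255 characters")
    body = json.dumps({"fileName": file_name, "fileType": "application/pdf"})
    return urllib.request.Request(
        f"{API_BASE}/upload/direct",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def parse_presign_response(raw_body: bytes) -> tuple:
    """Return (jobGuid, url) from the JSON response body."""
    data = json.loads(raw_body)
    return data["jobGuid"], data["url"]
```

To send the request, pass the result to urllib.request.urlopen and feed the response body to parse_presign_response. Keep the jobGuid, since the status endpoint needs it later.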
Upload File to Presigned URL
Then upload your PDF file to the presigned URL:
PUT <presigned_upload_url>
Content-Type: application/pdf
<pdf_file_binary_data>
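A sketch of the PUT step in Python, again with the standard library only. The helper name build_upload_request is illustrative. The size check treats "50MB" as 50 * 1024 * 1024 bytes, which is an assumption because the document does not say whether the limit is decimal or binary. Note that no Authorization header is added: the request shown above carries only Content-Type.

```python
import urllib.request

# Assumption: "50MB" is interpreted as 50 MiB; the document does not specify.
MAX_PUT_BYTES = 50 * 1024 * 1024


def build_upload_request(presigned_url: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build the PUT request that uploads the raw PDF bytes to the presigned URL."""
    if len(pdf_bytes) > MAX_PUT_BYTES:
        raise ValueError("file exceeds the 50MB limit for PutObject uploads")
    return urllib.request.Request(
        presigned_url,
        data=pdf_bytes,
        method="PUT",
        headers={"Content-Type": "application/pdf"},
    )
```

A caller would typically read the file with open(path, "rb").read(), build the request, and send it with urllib.request.urlopen.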
Option B: Multipart Upload (For larger files)
Coming soon – instructions for multipart uploads will be provided to support even larger documents with greater reliability and performance.
Supported File Formats
File Limitations
- Maximum file size: 50MB (PutObject upload)
- Maximum file size: TBD (Multipart upload)
Step 2: Check Processing Status
GET https://api.pdf-ocr.cobbling.ai/v1/workflows/:instanceId/status
Authorization: Bearer <your_api_key>
Replace :instanceId with the jobGuid received from the upload response.
Response Format
Success Response (200):
{
"instanceId": "e8e44b74-a155-4458-b90f-e90fd836e15a",
"status": "complete",
"error": null,
"output": {
"totalPages": 78,
"processedPages": 78,
"status": "completed",
"error": null,
"createdAt": "2025-07-02T08:51:52.576Z",
"updatedAt": "2025-07-02T08:54:03.242Z",
"markdownFiles": [
{
"pageNumber": 1,
"markdown": "# Document Title\n\nThis is the extracted text content from page 1..."
},
{
"pageNumber": 2,
"markdown": "## Section Header\n\nThis is the extracted text content from page 2..."
}
]
}
}
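Once a job is complete, the per-page entries in markdownFiles can be stitched into a single document. The sketch below assumes the response has already been decoded into a Python dict with the shape shown above. Sorting by pageNumber is a defensive choice, since the document does not state whether pages always arrive in order.

```python
def join_markdown_pages(status_response: dict, separator: str = "\n\n") -> str:
    """Concatenate the per-page Markdown from a completed status response."""
    pages = status_response["output"]["markdownFiles"]
    # Sort defensively; the ordering of markdownFiles is not documented.
    ordered = sorted(pages, key=lambda page: page["pageNumber"])
    return separator.join(page["markdown"] for page in ordered)
```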
Status Values
- queued: Workflow is queued for processing
- running: Workflow is currently being processed
- paused: Workflow execution is paused
- errored: Workflow encountered an error
- terminated: Workflow was terminated
- complete: Workflow completed successfully
- waiting: Workflow is waiting
- waitingForPause: Workflow is waiting to be paused
- unknown: Status is unknown
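A polling loop over these status values might look like the sketch below. It assumes that complete, errored, and terminated are the states in which polling should stop; treating terminated as final is an assumption based on its description, not something the document states. The fetch_status argument is any callable that returns the decoded status response, and the interval and timeout defaults are arbitrary illustrative values.

```python
import time

# Assumption: these are the states after which the status will not change.
TERMINAL_STATUSES = {"complete", "errored", "terminated"}


def poll_until_done(fetch_status, interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Call fetch_status() repeatedly until the workflow reaches a terminal status."""
    deadline = time.monotonic() + timeout_s
    while True:
        response = fetch_status()
        if response["status"] in TERMINAL_STATUSES:
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {response['status']!r} after {timeout_s}s")
        sleep(interval_s)
```

Passing sleep as a parameter keeps the loop easy to test without real delays. If you have registered a webhook, you may not need to poll at all.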
Error Responses
400 – Bad Request:
{
"code": "number",
"message": "Invalid instance ID or workflow not found"
}
401 – Unauthorized:
{
"code": "number",
"message": "Invalid or missing bearer token"
}
404 – Workflow Instance Not Found:
{
"code": "number",
"message": "Workflow instance not found"
}
500 – Internal Server Error:
{
"code": "number",
"message": "Internal server error"
}
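Client code can turn these error bodies into exceptions. The sketch below is one possible shape: OcrApiError, error_from_response, and is_retryable are illustrative names, and the retry policy (retry only on 5xx responses) is a common convention assumed here, not a rule stated by the service.

```python
import json


class OcrApiError(Exception):
    """Carries the HTTP status plus the code and message from the error body."""

    def __init__(self, http_status, code, message):
        super().__init__(f"HTTP {http_status}: {message}")
        self.http_status = http_status
        self.code = code
        self.message = message


def error_from_response(http_status: int, raw_body: bytes) -> OcrApiError:
    """Parse an error body, tolerating bodies that are not valid JSON objects."""
    try:
        data = json.loads(raw_body)
    except ValueError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return OcrApiError(http_status, data.get("code"), data.get("message", "unknown error"))


def is_retryable(http_status: int) -> bool:
    """Assumed policy: 5xx may be transient; 400, 401, and 404 need a fix first."""
    return http_status >= 500
```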
Webhook Integration
Use webhooks to receive automatic notifications when the status of a PDF OCR job changes.
Register a Webhook
POST https://api.pdf-ocr.cobbling.ai/v1/webhooks
Authorization: Bearer <your_api_key>
Content-Type: application/json
{
"url": "https://your-server.com/webhook"
}
Response:
{
"url": "https://your-server.com/webhook"
}
Webhook Notifications
Registered webhooks receive events in the following format:
{
"type": "job.completed", // or "job.created", "job.started", "job.failed"
"id": "event-id",
"payload": {
"job": {
"id": 12345,
"status": "completed",
"succeededPages": 10,
"totalPages": 10
}
},
"createdAt": 1677721600000
}
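On the receiving side, a handler needs to decode the event and pull out the job fields. The sketch below covers only the parsing step and leaves the HTTP server to your framework. It assumes that createdAt is a Unix timestamp in milliseconds, which matches the magnitude of the sample value but is not stated in the document, and that real payloads are plain JSON without the inline comment shown in the sample.

```python
import json
from datetime import datetime, timezone

# Event types listed in the sample above.
KNOWN_EVENT_TYPES = {"job.created", "job.started", "job.completed", "job.failed"}


def parse_webhook_event(raw_body: bytes) -> dict:
    """Decode a webhook body and flatten the fields a handler usually needs."""
    event = json.loads(raw_body)
    if event.get("type") not in KNOWN_EVENT_TYPES:
        raise ValueError(f"unexpected event type: {event.get('type')!r}")
    job = event["payload"]["job"]
    return {
        "event_id": event["id"],
        "type": event["type"],
        "job_id": job["id"],
        "status": job["status"],
        "succeeded_pages": job["succeededPages"],
        "total_pages": job["totalPages"],
        # Assumption: createdAt is milliseconds since the Unix epoch.
        "created_at": datetime.fromtimestamp(event["createdAt"] / 1000, tz=timezone.utc),
    }
```

Webhooks are often delivered more than once, so a handler may want to record event_id and skip events it has already processed.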
Example Usage
# Step 1: Get presigned URL
curl -X POST "https://api.pdf-ocr.cobbling.ai/v1/upload/direct" \
-H "Authorization: Bearer your_api_key" \
-H "Content-Type: application/json" \
-d '{
"fileName": "document.pdf",
"fileType": "application/pdf"
}'
# Step 2: Upload PDF to presigned URL (use the URL from step 1 response)
curl -X PUT "https://presigned-upload-url.com/..." \
-H "Content-Type: application/pdf" \
--data-binary "@document.pdf"
# Step 3: Check processing status (using jobGuid from step 1 response)
curl -X GET "https://api.pdf-ocr.cobbling.ai/v1/workflows/12345678-1234-1234-1234-123456789abc/status" \
-H "Authorization: Bearer your_api_key"
# Poll the status endpoint until status is "complete" or "errored"
# When complete, the response will include the extracted text in the output field
Security
- API key authentication required for all requests
- Request size limited to 200MB
- API key revocation available in your account settings
- Usage tracking available in your dashboard