PDF OCR Service
An intelligent PDF OCR (Optical Character Recognition) service powered by advanced AI technology, available through Cobbling AI. This service provides enhanced text extraction capabilities from PDF documents with special handling for mathematical equations, tables, and natural handwriting.
Features
- High-accuracy OCR processing using advanced AI models
- Automatic conversion of:
  - Mathematical equations to LaTeX format
  - Tables to Markdown format
- Smart handling of:
  - Headers and footers
  - References and footnotes
  - Natural handwriting
Subscription Required
The PDF OCR service is a premium feature that requires an active subscription. To use this service:
- Sign up for a subscription on the Cobbling AI website
- Generate your API key in your account dashboard
- Use the API key to access the OCR service
Generating Your API Key
After subscribing to the service:
- Log in to your Cobbling AI account
- First create a project from your dashboard
- Navigate to your project and click "Create new secret key"
- Copy and securely store your API key
- Use this key in your API requests
API Usage
The PDF OCR service uses an asynchronous workflow: you upload a PDF file, then poll a status endpoint until processing finishes.
Step 1: Upload PDF File
Get Presigned URL
First, request a presigned URL for uploading your PDF:
POST https://api.pdf-ocr.cobbling.ai/v1/upload/direct
Authorization: Bearer <your_api_key>
Content-Type: application/json
{
"fileName": "document.pdf",
"fileType": "application/pdf"
}
Request Parameters
- fileName (string, required): Name of the PDF file to upload (max 255 characters)
- fileType (string, required): MIME type of the file (use "application/pdf" for PDF files)
Response Format
{
"jobGuid": "unique_job_identifier",
"url": "presigned_upload_url",
"key": "upload_key"
}
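The request and response above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_presign_request` and `parse_presign_response` are hypothetical, and the 255-character check simply mirrors the limit stated in the parameter list.

```python
import json
import urllib.request

API_BASE = "https://api.pdf-ocr.cobbling.ai/v1"


def build_presign_request(api_key: str, file_name: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a presigned upload URL."""
    # Mirror the documented limit on fileName before contacting the server.
    if not file_name or len(file_name) > 255:
        raise ValueError("fileName must be between 1 and 255 characters")
    body = json.dumps({"fileName": file_name, "fileType": "application/pdf"})
    return urllib.request.Request(
        f"{API_BASE}/upload/direct",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def parse_presign_response(raw: bytes) -> tuple[str, str, str]:
    """Return (jobGuid, url, key) from the JSON response body."""
    data = json.loads(raw)
    return data["jobGuid"], data["url"], data["key"]
```

To send the request, pass it to `urllib.request.urlopen(...)` and feed the bytes from `response.read()` into `parse_presign_response`.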
Upload File to Presigned URL
Then upload your PDF file to the presigned URL:
PUT <presigned_upload_url>
Content-Type: application/pdf
<pdf_file_binary_data>
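As a rough Python equivalent of the PUT above, the sketch below builds the upload request with the standard library. The function name `build_upload_request` is hypothetical. No Authorization header is added, because the request shown above does not include one; whether the presigned URL accepts additional headers is not stated here.

```python
import urllib.request


def build_upload_request(presigned_url: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) the PUT request that uploads the PDF bytes."""
    return urllib.request.Request(
        presigned_url,
        data=pdf_bytes,
        method="PUT",
        # Only the header shown in the documented request is sent.
        headers={"Content-Type": "application/pdf"},
    )
```

A caller would read the file with `open(path, "rb").read()` and send the result with `urllib.request.urlopen(...)`.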
Supported File Formats
File Limitations
- Maximum file size: 50MB
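Checking the size locally avoids a wasted upload. The sketch below is one way to do that; `check_pdf_size` is a hypothetical helper, and reading "50MB" as 50,000,000 bytes is an assumption (the more conservative of the two common interpretations).

```python
# Assumption: "50MB" is read as decimal megabytes, the stricter interpretation.
MAX_PDF_BYTES = 50 * 1000 * 1000


def check_pdf_size(size_in_bytes: int) -> None:
    """Raise ValueError if the file is empty or exceeds the documented limit."""
    if size_in_bytes <= 0:
        raise ValueError("file is empty")
    if size_in_bytes > MAX_PDF_BYTES:
        raise ValueError(
            f"file is {size_in_bytes} bytes; the limit is {MAX_PDF_BYTES} bytes"
        )
```

In practice the size could come from `os.path.getsize(path)` before the file is read.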
Step 2: Check Processing Status
GET https://api.pdf-ocr.cobbling.ai/v1/workflows/:instanceId/status
Authorization: Bearer <your_api_key>
Replace :instanceId with the jobGuid received from the upload response.
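The substitution of the jobGuid into the status URL might look like the following in Python. `build_status_request` is a hypothetical helper name; percent-encoding the identifier is a defensive choice made here, not something the API is documented to require.

```python
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.pdf-ocr.cobbling.ai/v1"


def build_status_request(api_key: str, job_guid: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a job's processing status."""
    # The jobGuid takes the place of :instanceId in the path.
    url = f"{API_BASE}/workflows/{quote(job_guid, safe='')}/status"
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```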
Response Format
Success Response (200):
{
  "instanceId": "e8e44b74-a155-4458-b90f-e90fd836e15a",
  "status": "complete",
  "error": null,
  "output": {
    "totalPages": 78,
    "processedPages": 78,
    "status": "completed",
    "error": null,
    "createdAt": "2025-07-02T08:51:52.576Z",
    "updatedAt": "2025-07-02T08:54:03.242Z",
    "markdownFiles": [
      {
        "pageNumber": 1,
        "markdown": "# Document Title\n\nThis is the extracted text content from page 1..."
      },
      {
        "pageNumber": 2,
        "markdown": "## Section Header\n\nThis is the extracted text content from page 2..."
      }
    ]
  }
}
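Once a response like the one above has been parsed, the per-page Markdown can be stitched into a single document. The sketch below assumes the field names shown in the sample response; `extract_markdown` is a hypothetical helper, and sorting by pageNumber is a precaution in case pages arrive out of order.

```python
def extract_markdown(status_body: dict) -> str:
    """Join the per-page Markdown from a completed status response."""
    pages = status_body["output"]["markdownFiles"]
    # Sort defensively; the API is not documented to guarantee page order.
    ordered = sorted(pages, key=lambda page: page["pageNumber"])
    # Separate pages with a blank line so Markdown blocks do not merge.
    return "\n\n".join(page["markdown"] for page in ordered)
```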
Status Values
- queued: Workflow is queued for processing
- running: Workflow is currently being processed
- paused: Workflow execution is paused
- errored: Workflow encountered an error
- terminated: Workflow was terminated
- complete: Workflow completed successfully
- waiting: Workflow is waiting
- waitingForPause: Workflow is waiting to be paused
- unknown: Status is unknown
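Client code often benefits from treating these status strings as a closed set. The enum below is an illustrative sketch: the class name `WorkflowStatus` and its `parse` method are hypothetical, and mapping unrecognized strings to `UNKNOWN` is a design choice made here rather than documented API behavior.

```python
from enum import Enum


class WorkflowStatus(str, Enum):
    """The status strings listed above, as an enumeration."""

    QUEUED = "queued"
    RUNNING = "running"
    PAUSED = "paused"
    ERRORED = "errored"
    TERMINATED = "terminated"
    COMPLETE = "complete"
    WAITING = "waiting"
    WAITING_FOR_PAUSE = "waitingForPause"
    UNKNOWN = "unknown"

    @classmethod
    def parse(cls, value: object) -> "WorkflowStatus":
        """Map a raw status value to a member, falling back to UNKNOWN."""
        try:
            return cls(value)
        except ValueError:
            return cls.UNKNOWN
```

Falling back to `UNKNOWN` keeps a client from crashing if the service later introduces a status that is not in this list.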
Error Responses
400 - Bad Request:
{
"code": "number",
"message": "Invalid instance ID or workflow not found"
}
401 - Unauthorized:
{
"code": "number",
"message": "Invalid or missing bearer token"
}
404 - Workflow Instance Not Found:
{
"code": "number",
"message": "Workflow instance not found"
}
500 - Internal Server Error:
{
"code": "number",
"message": "Internal server error"
}
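The error bodies above share a `code` and `message` shape, so a client can turn them into a single exception type. The following is a sketch under that assumption; `OcrApiError` and `error_from_response` are hypothetical names, and the fallback message covers the case where the body is not the expected JSON object.

```python
import json


class OcrApiError(Exception):
    """An error response from the OCR API, with its HTTP status attached."""

    def __init__(self, http_status: int, code, message: str):
        super().__init__(f"HTTP {http_status}: {message}")
        self.http_status = http_status
        self.code = code
        self.message = message


def error_from_response(http_status: int, raw_body: bytes) -> OcrApiError:
    """Build an OcrApiError from an HTTP status and a raw response body."""
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return OcrApiError(
        http_status,
        payload.get("code"),
        payload.get("message", "unknown error"),
    )
```

With `urllib`, a failed call raises `urllib.error.HTTPError`; its `code` attribute and `read()` method supply the two arguments.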
Example Usage
# Step 1: Get presigned URL
curl -X POST "https://api.pdf-ocr.cobbling.ai/v1/upload/direct" \
-H "Authorization: Bearer your_api_key" \
-H "Content-Type: application/json" \
-d '{
"fileName": "document.pdf",
"fileType": "application/pdf"
}'
# Response will contain jobGuid, url, and key
# Example response:
# {
# "jobGuid": "12345678-1234-1234-1234-123456789abc",
# "url": "https://presigned-upload-url.com/...",
# "key": "upload-key"
# }
# Step 2: Upload PDF to presigned URL (use the URL from step 1 response)
curl -X PUT "https://presigned-upload-url.com/..." \
-H "Content-Type: application/pdf" \
--data-binary "@document.pdf"
# Step 3: Check processing status (using jobGuid from step 1 response)
curl -X GET "https://api.pdf-ocr.cobbling.ai/v1/workflows/12345678-1234-1234-1234-123456789abc/status" \
-H "Authorization: Bearer your_api_key"
# Poll the status endpoint until status is "complete" or "errored"
# When complete, the response will include the extracted text in the output field
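The polling step described in the comments above could be written in Python roughly as follows. This is a sketch, not an official client: `poll_until_done` is a hypothetical helper, the 5-second interval and 10-minute timeout are arbitrary assumptions, and `fetch_status` stands for any callable that performs the status GET and returns the parsed JSON body.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,    # assumed polling interval
    timeout_s: float = 600.0,   # assumed overall time budget
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status until status is "complete" or "errored"."""
    deadline = clock() + timeout_s
    while True:
        body = fetch_status()
        # Stop on the two statuses the example treats as final.
        if body.get("status") in ("complete", "errored"):
            return body
        if clock() >= deadline:
            raise TimeoutError("OCR job did not finish within the time budget")
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays. Callers may also want to stop on `terminated`, which this sketch does not treat as final.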
Security
- API key authentication required for all requests
- Request size limited to 50MB
- API key revocation available in your account settings
- Usage tracking available in your dashboard