Affordable LLM/VLM-powered document extraction. Extract structured data from PDFs, images, and text files with nested schemas. Get the best price-to-performance ratio for OCR. Define your schema, get nested JSON back instantly.
Affordable pricing meets enterprise-grade document extraction
Get the best price-to-performance ratio for OCR. Pay only for what you use with transparent, competitive pricing. No hidden fees or minimum commitments.
Extract from PDFs (up to 1,000 pages), images (JPEG, PNG, WebP, etc.), and text files. One API for all your document processing needs.
Define complex nested objects and arrays with unlimited depth. Extract structured data matching your exact data model - no post-processing needed.
Advanced AI models for rapid document processing. Get results in seconds, not minutes. Optimized for speed and accuracy.
Your data stays protected with enterprise-grade security, geo-blocking, and prompt injection protection. Files are processed in memory only.
Easy integration with any language. Two simple endpoints - upload files or provide URLs. Start extracting data in minutes.
Three simple steps to extract nested structured data
Create a JSON schema with objects, arrays, and nested structures. Supported types are string, number, boolean, array, and object, with unlimited nesting.
Upload PDFs, images, or text files directly, or provide a URL. Multi-page PDFs of up to 1,000 pages are supported, and files can be up to 10MB.
Receive perfectly structured nested JSON matching your schema. Arrays of objects, deeply nested properties - exactly as you defined.
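The schema format used in the examples on this page repeats each field's name as its "key" and nests children under "properties" (objects) or "items" (arrays). A minimal Python sketch of building such a schema programmatically follows; the helper names field, obj, arr, and schema are hypothetical conveniences for illustration, not part of the API.

```python
import json

# Hypothetical helpers (not part of the API) that build the
# key/description/type nodes used in the examples on this page.

def field(key, description, type_):
    """Leaf node: "string", "number", or "boolean"."""
    return {"key": key, "description": description, "type": type_}

def obj(key, description, *props):
    """Object node; children are stored under "properties" by their own key."""
    node = field(key, description, "object")
    node["properties"] = {p["key"]: p for p in props}
    return node

def arr(key, description, item):
    """Array node; `item` describes the shape of every element."""
    node = field(key, description, "array")
    node["items"] = item
    return node

def schema(*fields):
    """Top-level returnSchema: a mapping from field name to node."""
    return {f["key"]: f for f in fields}

invoice_schema = schema(
    field("invoice_number", "Invoice number", "string"),
    obj("customer", "Customer info",
        field("name", "Customer name", "string"),
        field("email", "Email address", "string")),
    arr("items", "Line items",
        obj("item", "Item details",
            field("name", "Item name", "string"),
            field("price", "Item price", "number"))),
)

print(json.dumps(invoice_schema, indent=2))
```

The printed JSON has the same structure as the "returnSchema" value in the invoice request shown on this page, so it can be pasted into a request body as-is.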
Explore comprehensive tutorials with live examples using real recipe images and PDF documents. Learn how to extract structured data from invoices, receipts, recipes, and more with step-by-step guides.
Extract complex nested data structures from PDFs, images, and text files. Both endpoints use the same schema format and return identical JSON structures.
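For callers who prefer code to curl, a request to the URL-based endpoint could be assembled roughly as follows. This is a sketch under assumptions: "https://rapidAPIURL" stands in for the real base URL (the same placeholder the curl examples use), and the function names build_json_request and extract are illustrative only. The headers, the /v1/json path, and the "url" / "returnSchema" body fields are taken from the examples on this page.

```python
import json
import urllib.request

# Placeholder base URL, mirroring "rapidAPIURL" in the curl examples.
BASE_URL = "https://rapidAPIURL"

def build_json_request(base_url, api_key, document_url, return_schema):
    """Prepare (but do not send) a POST to /v1/json."""
    body = json.dumps({"url": document_url, "returnSchema": return_schema})
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/json",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-RapidAPI-Key": api_key,
        },
        method="POST",
    )

def extract(request, timeout=60):
    """Send a prepared request and return the extracted data.

    Assumes the response looks like the examples on this page:
    {"message": "...", "returnSchema": {...}}.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["returnSchema"]

# Build a request for a one-field schema; nothing is sent here.
demo_schema = {
    "total": {"key": "total", "description": "Total amount", "type": "number"}
}
demo_request = build_json_request(
    BASE_URL, "your-api-key", "https://example.com/invoice.pdf", demo_schema
)
print(demo_request.get_method(), demo_request.full_url)
```

Separating request construction from sending makes it easy to inspect or log the payload before any billable call is made.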
Extract nested data from PDFs, images, or text files via URL
curl -X POST rapidAPIURL/v1/json \
-H "Content-Type: application/json" \
-H "X-RapidAPI-Key: your-api-key" \
-d '{
"url": "https://example.com/invoice.pdf",
"returnSchema": {
"invoice_number": {
"key": "invoice_number",
"description": "Invoice number",
"type": "string"
},
"customer": {
"key": "customer",
"description": "Customer info",
"type": "object",
"properties": {
"name": {
"key": "name",
"description": "Customer name",
"type": "string"
},
"email": {
"key": "email",
"description": "Email address",
"type": "string"
}
}
},
"items": {
"key": "items",
"description": "Line items",
"type": "array",
"items": {
"key": "item",
"description": "Item details",
"type": "object",
"properties": {
"name": {
"key": "name",
"description": "Item name",
"type": "string"
},
"price": {
"key": "price",
"description": "Item price",
"type": "number"
}
}
}
}
}
}'
Upload PDFs, images, or text files directly with nested schemas
curl -X POST rapidAPIURL/v1/form-data \
-H "X-RapidAPI-Key: your-api-key" \
-F "[email protected]" \
-F 'returnSchema={
"invoice_number": {
"key": "invoice_number",
"description": "Invoice number",
"type": "string"
},
"customer": {
"key": "customer",
"description": "Customer info",
"type": "object",
"properties": {
"name": {
"key": "name",
"description": "Customer name",
"type": "string"
},
"email": {
"key": "email",
"description": "Email address",
"type": "string"
}
}
},
"items": {
"key": "items",
"description": "Line items",
"type": "array",
"items": {
"key": "item",
"description": "Item details",
"type": "object",
"properties": {
"name": {
"key": "name",
"description": "Item name",
"type": "string"
},
"price": {
"key": "price",
"description": "Item price",
"type": "number"
}
}
}
}
}'
{
"message": "Success",
"returnSchema": {
"invoice_number": "INV-2025-001",
"customer": {
"name": "Acme Corp",
"email": "[email protected]"
},
"items": [
{
"name": "Web Development",
"price": 5000
},
{
"name": "Consulting",
"price": 2500
}
]
}
}
Images, PDFs (1,000 pages), Text Files
JPEG, PNG, GIF, WebP, AVIF, BMP
Maximum 10MB per file
String, Number, Boolean, Array, Object
Unlimited depth for objects & arrays
Cheapest LLM/VLM OCR - Pay per use
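The limits listed above can be checked locally before an upload is attempted. The sketch below is an assumption-laden illustration: the file extensions are guessed from the listed formats, "10MB" is read as 10 × 1024 × 1024 bytes, and check_upload is a hypothetical helper, not something the API provides.

```python
from pathlib import Path

# Assumption: "10MB" is interpreted as 10 MiB; the service may count differently.
MAX_BYTES = 10 * 1024 * 1024

# Assumption: extensions inferred from the formats listed on this page.
ALLOWED_SUFFIXES = {
    ".pdf", ".txt",
    ".jpg", ".jpeg", ".png", ".gif", ".webp", ".avif", ".bmp",
}

def check_upload(path, size_bytes=None):
    """Return a list of problems; an empty list means the file looks acceptable.

    `size_bytes` can be passed explicitly; otherwise the file is stat'ed.
    """
    p = Path(path)
    problems = []
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")
    size = p.stat().st_size if size_bytes is None else size_bytes
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, limit is {MAX_BYTES}")
    return problems

# Sizes are supplied here so the example runs without real files.
print(check_upload("invoice.pdf", size_bytes=250_000))
print(check_upload("scan.tiff", size_bytes=250_000))
```

The page-count limit for PDFs is not checked here, since doing so would require a PDF parser.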
Extract complex hierarchical data in a single API call - no post-processing needed
Extract repeating structures like invoice line items, recipe ingredients, or transaction lists - each with multiple properties and nested data.
items: [{ name: "...", price: 100, metadata: {...} }]
Nest objects within objects to any level. Perfect for addresses, organizational hierarchies, product categories, or complex document structures.
customer → address → shipping → coordinates
Combine strings, numbers, booleans, arrays, and objects in any structure. Get strongly-typed data matching your exact application schema.
{ name: string, count: number, active: bool, tags: [] }
Get data in the exact format your application needs. Skip parsing, transforming, and reshaping - use the response directly in your database or application.
API Response → Direct to Database
See complex data extraction in action
Extract complex nested data from multi-page PDF invoices. Get customer details, line items as arrays of objects, and deeply nested address information - all in one API call.
curl -X POST rapidAPIURL/v1/json \
-H "Content-Type: application/json" \
-H "X-RapidAPI-Key: your-api-key" \
-d '{
"url": "https://example.com/invoice.pdf",
"returnSchema": {
"invoice_number": {
"key": "invoice_number",
"description": "Invoice ID",
"type": "string"
},
"customer": {
"key": "customer",
"description": "Customer info",
"type": "object",
"properties": {
"name": {"key": "name", "description": "Customer name", "type": "string"},
"address": {
"key": "address",
"description": "Address",
"type": "object",
"properties": {
"street": {"key": "street", "description": "Street", "type": "string"},
"city": {"key": "city", "description": "City", "type": "string"}
}
}
}
},
"items": {
"key": "items",
"description": "Line items",
"type": "array",
"items": {
"key": "item",
"description": "Item",
"type": "object",
"properties": {
"description": {"key": "description", "description": "Item name", "type": "string"},
"quantity": {"key": "quantity", "description": "Qty", "type": "number"},
"price": {"key": "price", "description": "Unit price", "type": "number"}
}
}
},
"total": {"key": "total", "description": "Total amount", "type": "number"}
}
}'
{
"message": "Success",
"returnSchema": {
"invoice_number": "INV-2025-1234",
"customer": {
"name": "Acme Corporation",
"address": {
"street": "123 Business Ave",
"city": "San Francisco"
}
},
"items": [
{"description": "Web Development Services", "quantity": 40, "price": 150},
{"description": "Cloud Hosting (Annual)", "quantity": 1, "price": 1200},
{"description": "SSL Certificate", "quantity": 1, "price": 99}
],
"total": 7299
}
}
Digitize receipts with nested item arrays. Extract store info, line items with prices and quantities, and payment details - all structured perfectly.
curl -X POST rapidAPIURL/v1/json \
-H "Content-Type: application/json" \
-H "X-RapidAPI-Key: your-api-key" \
-d '{
"url": "https://example.com/receipt-grocery.jpg",
"returnSchema": {
"store_name": {
"key": "store_name",
"description": "Name of the store",
"type": "string"
},
"date": {
"key": "date",
"description": "Purchase date",
"type": "string"
},
"items": {
"key": "items",
"description": "Purchased items",
"type": "array",
"items": {
"key": "item",
"description": "Item details",
"type": "object",
"properties": {
"name": {
"key": "name",
"description": "Item name",
"type": "string"
},
"quantity": {
"key": "quantity",
"description": "Quantity",
"type": "number"
},
"price": {
"key": "price",
"description": "Item price",
"type": "number"
}
}
}
},
"total": {
"key": "total",
"description": "Total amount",
"type": "number"
}
}
}'
{
"message": "Success",
"returnSchema": {
"store_name": "Whole Foods Market",
"date": "10/28/2025",
"items": [
{"name": "Organic Bananas", "quantity": 2, "price": 3.98},
{"name": "Almond Milk", "quantity": 1, "price": 4.99},
{"name": "Spinach", "quantity": 1, "price": 3.49},
{"name": "Greek Yogurt", "quantity": 3, "price": 12.00},
{"name": "Chicken Breast", "quantity": 1.5, "price": 23.36}
],
"total": 47.82
}
}
Extract recipes with nested ingredients and instructions arrays. Get each ingredient with quantity and each step as a separate object - perfect for recipe apps.
curl -X POST rapidAPIURL/v1/json \
-H "Content-Type: application/json" \
-H "X-RapidAPI-Key: your-api-key" \
-d '{
"url": "https://example.com/recipe-card.jpg",
"returnSchema": {
"recipe_name": {
"key": "recipe_name",
"description": "Recipe title",
"type": "string"
},
"servings": {
"key": "servings",
"description": "Number of servings",
"type": "number"
},
"prep_time": {
"key": "prep_time",
"description": "Prep time in minutes",
"type": "number"
},
"ingredients": {
"key": "ingredients",
"description": "Recipe ingredients",
"type": "array",
"items": {
"key": "ingredient",
"description": "Ingredient details",
"type": "object",
"properties": {
"item": {
"key": "item",
"description": "Ingredient name",
"type": "string"
},
"amount": {
"key": "amount",
"description": "Amount needed",
"type": "string"
}
}
}
},
"instructions": {
"key": "instructions",
"description": "Cooking steps",
"type": "array",
"items": {
"key": "step",
"description": "Cooking step",
"type": "string"
}
}
}
}'
{
"message": "Success",
"returnSchema": {
"recipe_name": "Homemade Margherita Pizza",
"servings": 4,
"prep_time": 120,
"ingredients": [
{"item": "Bread flour", "amount": "500g"},
{"item": "Warm water", "amount": "325ml"},
{"item": "Active dry yeast", "amount": "7g"},
{"item": "Salt", "amount": "2 tsp"},
{"item": "Olive oil", "amount": "1 tbsp"},
{"item": "Crushed tomatoes", "amount": "400g"},
{"item": "Fresh mozzarella", "amount": "300g"},
{"item": "Fresh basil leaves", "amount": "handful"}
],
"instructions": [
"Mix warm water and yeast, let sit 5 minutes",
"Combine flour and salt, add yeast mixture and olive oil",
"Knead for 10 minutes until smooth",
"Let rise 1.5 hours",
"Divide into 4 balls, roll out thin",
"Top with tomato sauce, mozzarella, and basil",
"Bake at 475°F for 12-15 minutes until crust is golden"
]
}
}
Extract detailed product info with nested specs object and features array. Perfect for e-commerce catalogs, price comparison, and inventory systems.
curl -X POST rapidAPIURL/v1/json \
-H "Content-Type: application/json" \
-H "X-RapidAPI-Key: your-api-key" \
-d '{
"url": "https://example.com/product-laptop.jpg",
"returnSchema": {
"product_name": {
"key": "product_name",
"description": "Full product name",
"type": "string"
},
"brand": {
"key": "brand",
"description": "Brand name",
"type": "string"
},
"price": {
"key": "price",
"description": "Price in dollars",
"type": "number"
},
"specs": {
"key": "specs",
"description": "Technical specifications",
"type": "object",
"properties": {
"processor": {"key": "processor", "description": "CPU", "type": "string"},
"memory": {"key": "memory", "description": "RAM", "type": "string"},
"storage": {"key": "storage", "description": "Storage", "type": "string"},
"display": {"key": "display", "description": "Screen", "type": "string"}
}
},
"features": {
"key": "features",
"description": "Key features",
"type": "array",
"items": {
"key": "feature",
"description": "Feature",
"type": "string"
}
}
}
}'
{
"message": "Success",
"returnSchema": {
"product_name": "MacBook Pro 16-inch M3 Max",
"brand": "Apple",
"price": 3499.00,
"specs": {
"processor": "M3 Max chip",
"memory": "36GB unified memory",
"storage": "1TB SSD",
"display": "16.2-inch Liquid Retina XDR"
},
"features": [
"Up to 22 hours battery life",
"Three Thunderbolt 4 ports",
"HDMI port and SDXC card slot",
"MagSafe 3 charging",
"1080p FaceTime HD camera",
"Six-speaker sound system"
]
}
}
Analyze photos with nested scene details and detected objects arrays. Extract context, count items, and identify people or objects systematically.
curl -X POST rapidAPIURL/v1/json \
-H "Content-Type: application/json" \
-H "X-RapidAPI-Key: your-api-key" \
-d '{
"url": "https://example.com/event-photo.jpg",
"returnSchema": {
"scene": {
"key": "scene",
"description": "Scene details",
"type": "object",
"properties": {
"setting": {"key": "setting", "description": "Location type", "type": "string"},
"time_of_day": {"key": "time_of_day", "description": "Time period", "type": "string"},
"mood": {"key": "mood", "description": "Atmosphere", "type": "string"}
}
},
"people_count": {
"key": "people_count",
"description": "Number of people",
"type": "number"
},
"activities": {
"key": "activities",
"description": "What people are doing",
"type": "array",
"items": {
"key": "activity",
"description": "An activity",
"type": "string"
}
}
}
}'
{
"message": "Success",
"returnSchema": {
"scene": {
"setting": "Outdoor garden party venue",
"time_of_day": "Evening golden hour",
"mood": "Joyful and festive"
},
"people_count": 28,
"activities": [
"Socializing and mingling",
"Dancing",
"Dining at decorated tables",
"Taking photos"
]
}
}
Get the most out of your API calls with these proven best practices
The more specific your field descriptions, the better the results. Instead of "date", use "Invoice date in MM/DD/YYYY format" or "Payment due date".
Choose the right data type for each field. Use "number" for quantities and prices, "boolean" for yes/no values, and "string" for text. This ensures proper data formatting.
Group related fields into nested objects. For repeating data like line items or ingredients, use arrays of objects with consistent properties.
For best results with PDFs, ensure they're not password-protected and have clear text (not blurry scans). Native PDFs work better than scanned images.
When extracting lists (invoice items, ingredients, etc.), use the array type with object items so that each entry carries all of its properties in a consistent structure.
Start with a simple schema and gradually add complexity. Test with sample documents, then refine your descriptions and structure based on results.
Use clear, well-lit images with readable text. Higher-resolution images (within the 10MB file limit) produce better results. Avoid heavily compressed or distorted images.
Create reusable schemas for document types you process frequently. Same schema works for all invoices, all receipts, etc. Save schemas for consistent results.
Only request the fields you actually need. Smaller, more focused schemas process faster and use fewer tokens, reducing your costs while maintaining accuracy.
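When iterating on a schema as suggested above, it can help to check that each extracted value has the type the schema asked for. The following is a minimal Python sketch of such a check, assuming the schema and response shapes shown in the examples on this page; type_errors and validate are hypothetical helper names, and treating a missing field as an error is a choice made here, not documented API behavior.

```python
def type_errors(node, value, path="$"):
    """Recursively compare `value` with a schema node; return error strings."""
    expected = node["type"]
    if expected == "object":
        if not isinstance(value, dict):
            return [f"{path}: expected object, got {type(value).__name__}"]
        errors = []
        for name, child in node.get("properties", {}).items():
            if name not in value:
                errors.append(f"{path}.{name}: missing")
            else:
                errors += type_errors(child, value[name], f"{path}.{name}")
        return errors
    if expected == "array":
        if not isinstance(value, list):
            return [f"{path}: expected array, got {type(value).__name__}"]
        errors = []
        for i, element in enumerate(value):
            errors += type_errors(node["items"], element, f"{path}[{i}]")
        return errors
    if expected == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        ok = isinstance(value, (int, float)) and not isinstance(value, bool)
    elif expected == "boolean":
        ok = isinstance(value, bool)
    elif expected == "string":
        ok = isinstance(value, str)
    else:
        return [f"{path}: unknown schema type {expected!r}"]
    return [] if ok else [f"{path}: expected {expected}, got {type(value).__name__}"]

def validate(return_schema, extracted):
    """Check the "returnSchema" object of a response against the request schema."""
    return type_errors({"type": "object", "properties": return_schema}, extracted)

receipt_schema = {
    "store_name": {"key": "store_name", "description": "Name of the store", "type": "string"},
    "items": {
        "key": "items", "description": "Purchased items", "type": "array",
        "items": {
            "key": "item", "description": "Item details", "type": "object",
            "properties": {
                "name": {"key": "name", "description": "Item name", "type": "string"},
                "price": {"key": "price", "description": "Item price", "type": "number"},
            },
        },
    },
    "total": {"key": "total", "description": "Total amount", "type": "number"},
}

extracted = {
    "store_name": "Whole Foods Market",
    "items": [{"name": "Almond Milk", "price": 4.99}],
    "total": "47.82",  # a string where the schema asked for a number
}
for problem in validate(receipt_schema, extracted):
    print(problem)
```

Running the same check over a handful of sample documents gives a quick signal about which field descriptions need to be made more specific.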
Start extracting nested data from PDFs, images, and text files. Unbeatable pricing with pay-per-use billing.