
API Reference
Detailed information about JustExtract.it API endpoints, request/response formats, and authentication.
Introduction
The JustExtract.it API provides programmatic access to our data extraction capabilities. All API access is over HTTPS, and all data is sent and received as JSON.
The base URL for all API endpoints is: https://api.justextract.it
Authentication
API requests are authenticated using API keys. You must include your API key in the Authorization header of your requests as a Bearer token.
Authorization: Bearer YOUR_API_KEY
You can obtain an API key by following the steps in the Introduction guide.
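A minimal Python sketch of building that header without hard-coding the key in source. The helper name `auth_headers` and the environment variable name `JUSTEXTRACT_API_KEY` are assumptions made for illustration; they are not defined by the API.

```python
import os
from typing import Mapping


def auth_headers(
    environ: Mapping[str, str] = os.environ,
    var_name: str = "JUSTEXTRACT_API_KEY",  # hypothetical variable name
) -> dict:
    """Return the Authorization header, reading the key from the environment."""
    api_key = environ.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(f"set {var_name} to your API key")
    return {"Authorization": f"Bearer {api_key}"}
```

Reading the key from the environment keeps it out of version control; any secret store would serve the same purpose.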
Keep Your API Key Secure
Endpoints
GET
/api/file-url
Generates secure, pre-signed URLs for uploading a file to our system and for accessing it later during the extraction process.
Request
No request body is required for this endpoint.
curl -X GET https://api.justextract.it/api/file-url \
  -H "Authorization: Bearer YOUR_API_KEY"
Response 200 OK
{
  "put_url": "https://s3-presigned-url-for-put...",
  "get_url": "https://s3-presigned-url-for-get...",
  "file_key": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "expires_in": 3600
}
put_url (string): The URL to use for PUTting the file.
get_url (string): The URL that can be used to GET the file. This URL should be passed to the /api/extract endpoint.
file_key (string): A unique identifier for the file.
expires_in (integer): The duration in seconds for which the URLs are valid (e.g., 3600 for 1 hour).
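The upload flow implied by these fields can be sketched in Python with the standard library: request the pre-signed URLs, PUT the file bytes to put_url, then keep get_url for the extraction call. The function names are hypothetical, and the sketch assumes the pre-signed PUT accepts raw file bytes with no extra headers, which this reference does not state.

```python
import json
import urllib.request

BASE_URL = "https://api.justextract.it"


def file_url_request(api_key: str) -> urllib.request.Request:
    """Build the GET /api/file-url request (no body is required)."""
    return urllib.request.Request(
        f"{BASE_URL}/api/file-url",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def upload_request(put_url: str, data: bytes) -> urllib.request.Request:
    """Build the PUT request that sends the file bytes to the pre-signed URL.

    Assumption: no additional headers are needed for the upload.
    """
    return urllib.request.Request(put_url, data=data, method="PUT")


def upload_file(api_key: str, path: str) -> dict:
    """Fetch the pre-signed URLs, upload the file, and return the URL info."""
    with urllib.request.urlopen(file_url_request(api_key)) as resp:
        info = json.load(resp)
    with open(path, "rb") as fh:
        body = fh.read()
    with urllib.request.urlopen(upload_request(info["put_url"], body)):
        pass  # a non-2xx status raises urllib.error.HTTPError
    return info  # info["get_url"] is the value to pass to /api/extract
```

Because both URLs expire after expires_in seconds, the upload and the extraction request should be issued within that window.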
POST
/api/extract
Submits a document for data extraction. You can specify filters to refine which parts of the document are processed.
Request Body
curl -X POST https://api.justextract.it/api/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "YOUR_DOCUMENT_GET_URL_OR_PUBLIC_URL",
    "filters": [
      {
        "pages": [1, 2],
        "include": true
      },
      {
        "query": "Find summary sections",
        "include": true
      }
    ]
  }'
The filters array may contain further filter objects. JSON does not allow comments, so do not place any inside the request body.
url (string, required): The URL of the document to process. This should be the get_url obtained from the /api/file-url endpoint, or a publicly accessible URL to your document.
filters (array of Filter Objects, optional): An array of Filter Objects to apply to the document. See the Development guide for filter usage and cost implications.
Response 200 OK
Returns a task_id that can be used to track the extraction progress.
{
  "task_id": "yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy"
}
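As an illustration, the request body can be assembled and serialized in Python before sending. This is a sketch only; `extract_payload` is a hypothetical helper, and it leaves out the filters key when no filters are given, on the basis that the field is optional.

```python
import json
from typing import Optional


def extract_payload(url: str, filters: Optional[list] = None) -> str:
    """Serialize the JSON body for POST /api/extract."""
    if not url:
        raise ValueError("'url' is required")
    body: dict = {"url": url}
    if filters:
        # filters is optional, so the key is only sent when non-empty
        body["filters"] = filters
    return json.dumps(body)


payload = extract_payload(
    "https://example.com/report.pdf",  # placeholder document URL
    [{"pages": [1, 2], "include": True}],
)
```

Building the body with `json.dumps` rather than by string concatenation guarantees valid JSON, including the lowercase `true` and `false` the API expects.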
POST
/api/translate
Submits text for translation to a specified target language. The translation is processed asynchronously, and you can check the status using the task ID returned in the response.
Request Body
curl -X POST https://api.justextract.it/api/translate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, how are you?",
    "target_language": "Spanish"
  }'
text (string, required): The text to be translated.
target_language (string, required): The language to translate the text into. The language should be specified in English (e.g., "Spanish", "French", "German").
Response 200 OK
Returns a task_id that can be used to track the translation progress.
{
  "task_id": "yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy"
}
Translation Result
When checking the task status, a successful translation will return a result object with the following structure:
{
  "translation": "Hola, ¿cómo estás?",
  "source_language": "English",
  "target_language": "Spanish"
}
translation (string): The translated text.
source_language (string): The detected source language of the input text.
target_language (string): The language the text was translated into.
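A client may want to turn the result object into a typed value. The sketch below does so in Python; the class `TranslationResult` is a hypothetical client-side type, and it assumes the result object contains exactly the three fields listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationResult:
    translation: str
    source_language: str
    target_language: str

    @classmethod
    def from_dict(cls, data: dict) -> "TranslationResult":
        """Build a result from the JSON object; raises KeyError if a field is missing."""
        return cls(
            translation=data["translation"],
            source_language=data["source_language"],
            target_language=data["target_language"],
        )


result = TranslationResult.from_dict(
    {
        "translation": "Hola, ¿cómo estás?",
        "source_language": "English",
        "target_language": "Spanish",
    }
)
```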
Cost Information
GET
/api/task/{task_id}
Retrieves the status and results of an asynchronous extraction or translation task.
Path Parameters
task_id (string, required): The ID of the task, obtained from the /api/extract or /api/translate endpoint response.
curl -X GET https://api.justextract.it/api/task/YOUR_TASK_ID \
  -H "Authorization: Bearer YOUR_API_KEY"
Response 200 OK
Returns a Task Object detailing the job's status and results if completed.
{
  "id": "1cef1b65-7e6c-4326-8ae6-230a20220f30",
  "stage": 100,
  "status": "Extraction done.",
  "nodes": [
    {
      "output_data": [
        {
          "page": 1,
          "values": {
            "header": {
              "type": "object",
              "value": {
                "company_name": { "type": "string", "value": "Example Corp" },
                "document_title": { "type": "string", "value": "Invoice" }
              }
            },
            "totals": {
              "type": "object",
              "value": {
                "total": { "type": "number", "value": 100.00 },
                "currency": { "type": "string", "value": "USD" }
              }
            }
          }
        }
      ],
      "languages_detected": ["English"],
      "handwritten_detected": false
    }
  ]
}
If the task is not found, a 404 Not Found error is returned.
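Because tasks are asynchronous, a client typically polls this endpoint until the task finishes. The Python sketch below shows one way to do that. It assumes that a task is complete once stage reaches 100 and that the status "Task failed." marks failure; the reference lists these values but does not define them as the completion rule. `wait_for_task` and its `fetch_task` callback are hypothetical names; the callback stands in for whatever function performs the GET and returns the parsed Task Object.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_task: Callable[[str], dict],
    task_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task completes, fails, or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task(task_id)
        # Check failure first, since a failed task may also report stage 100.
        if task.get("status") == "Task failed.":
            raise RuntimeError(f"task {task_id} failed: {task.get('nodes')}")
        if task.get("stage", 0) >= 100:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout}s")
        sleep(interval)
```

Injecting `fetch_task` and `sleep` keeps the loop independent of any HTTP library and makes it easy to exercise with canned responses.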
GET
/ping
A simple health check endpoint to verify API availability. No authentication is required.
Response 200 OK
pong
Extraction Filter Objects
Filter objects are used in the filters array of the POST /api/extract request. The API infers the filter type from the unique keys provided. All filter objects must include an include: boolean field.
Cost Impact of Filters
Keyword Filter
Filters pages based on the presence or absence of specified keywords.
keywords (array of strings, required): Keywords to search for.
include (boolean, required): If true, include pages with keywords; if false, exclude pages with these keywords.
{
  "keywords": ["invoice", "summary"],
  "include": true
}
Page Number Filter
Filters pages based on a list of page numbers (1-indexed).
pages (array of integers, required): 1-indexed page numbers.
include (boolean, required): If true, include specified pages; if false, exclude them. In the example below, setting include to false would extract all pages except 1, 3, and 5.
{
  "pages": [1, 3, 5],
  "include": true
}
Content Type Filter
Filters pages based on the types of content they contain.
content_types (array of strings, required): Types like "table", "image", "text", "hyperlink".
include (boolean, required): If true, include pages with specified content types; if false, exclude them. In the example below, setting include to false would exclude pages with tables or images.
{
  "content_types": ["table", "image"],
  "include": true
}
Page Orientation Filter
Filters pages based on their orientation.
orientation (string, required): "portrait" or "landscape".
include (boolean, required): If true, include pages with specified orientation; if false, exclude them. In the example below, setting include to false would exclude landscape pages, leaving only portrait pages.
{
  "orientation": "landscape",
  "include": true
}
Custom Filter
Filters pages based on relevance to a natural language query.
query (string, required): The natural language query.
include (boolean, required): If true, include pages relevant to the query; if false, exclude.
{
  "query": "Find pages discussing financial projections for Q4.",
  "include": true
}
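Since the filter type is inferred from the keys, a client can catch malformed filters before sending them. The Python sketch below mirrors that inference on the client side. The type labels in `FILTER_KEYS` and the rule that a filter carries exactly one type-defining key are assumptions made for illustration; the API's own validation may differ.

```python
# Maps each type-defining key to a label; the labels are hypothetical names.
FILTER_KEYS = {
    "keywords": "keyword",
    "pages": "page_number",
    "content_types": "content_type",
    "orientation": "page_orientation",
    "query": "custom",
}


def filter_type(filter_obj: dict) -> str:
    """Return the inferred filter type, or raise ValueError if malformed."""
    if not isinstance(filter_obj.get("include"), bool):
        raise ValueError("filter must have a boolean 'include' field")
    present = [key for key in FILTER_KEYS if key in filter_obj]
    if len(present) != 1:
        raise ValueError(f"expected exactly one filter key, found {present}")
    return FILTER_KEYS[present[0]]


kinds = [
    filter_type({"pages": [1, 3, 5], "include": True}),
    filter_type({"query": "Find summary sections", "include": False}),
]
```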
Common Objects
Task Object
The Task object represents the state and result of an asynchronous job. It is returned by the GET /api/task/{task_id} endpoint.
{
  "id": "1cef1b65-7e6c-4326-8ae6-230a20220f30",
  "stage": 100,
  "status": "Extraction done.",
  "nodes": [
    {
      "output_data": [
        {
          "page": 1,
          "values": {
            "header": {
              "type": "object",
              "value": {
                "company_name": { "type": "string", "value": "Example Corp" },
                "document_title": { "type": "string", "value": "Invoice" }
              }
            },
            "totals": {
              "type": "object",
              "value": {
                "total": { "type": "number", "value": 100.00 },
                "currency": { "type": "string", "value": "USD" }
              }
            }
          }
        }
      ],
      "languages_detected": ["English"],
      "handwritten_detected": false
    }
  ]
}
id (string): Unique identifier for the task.
stage (number): Progress indicator for the task (0-100).
status (string): Current status message of the task. Examples:
- "Extraction done.": For successful data extraction
- "Translation done.": For successful translations
- "Task failed.": When an error occurs
nodes (array): Contains the output data or error information:
- For extraction tasks, contains output_data with extracted information, languages_detected, and handwritten_detected
- For translation tasks, contains output_data with the translation result
- For failed tasks, contains an error field with the error description
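In the extraction example, every extracted field is wrapped as an object with type and value keys. A client that only needs the values can strip those wrappers recursively, as in the Python sketch below. The helper names `unwrap` and `values_by_page` are hypothetical, and the sketch assumes every wrapper has exactly the keys type and value, as in the sample; it targets extraction tasks only.

```python
def unwrap(node):
    """Recursively replace {"type": ..., "value": ...} wrappers with the value."""
    if isinstance(node, dict) and set(node) == {"type", "value"}:
        return unwrap(node["value"])
    if isinstance(node, dict):
        return {key: unwrap(val) for key, val in node.items()}
    if isinstance(node, list):
        return [unwrap(item) for item in node]
    return node


def values_by_page(task: dict) -> dict:
    """Map each page number to its unwrapped values (later nodes overwrite earlier ones)."""
    pages = {}
    for node in task.get("nodes", []):
        for entry in node.get("output_data", []):
            pages[entry["page"]] = unwrap(entry["values"])
    return pages


sample_task = {
    "nodes": [{
        "output_data": [{
            "page": 1,
            "values": {
                "totals": {
                    "type": "object",
                    "value": {
                        "total": {"type": "number", "value": 100.00},
                        "currency": {"type": "string", "value": "USD"},
                    },
                },
            },
        }],
    }],
}
flat = values_by_page(sample_task)
```

A field whose real content happens to be an object with only the keys type and value would be unwrapped too; a stricter client could check the type string before descending.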
Error Handling
The JustExtract.it API uses standard HTTP status codes to indicate the success or failure of an API request.
2xx status codes indicate success.
4xx status codes indicate a client-side error (e.g., invalid parameters, authentication failure). The response body will typically contain a JSON object with a detail field explaining the error.
5xx status codes indicate a server-side error. These are rare, but if you encounter one, please wait a moment and try again. If the problem persists, contact support.
Example Error Response (400 Bad Request):
{
  "detail": "Validation Error: 'url' field is required."
}
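A client can map these status ranges onto its own handling logic. The Python sketch below classifies a status code and pulls the detail message out of an error body when one is present. Both helper names are hypothetical, and because the reference says the body "typically" contains a detail field, the sketch returns None rather than failing when it is missing or the body is not JSON.

```python
import json
from typing import Optional


def classify(status: int) -> str:
    """Map an HTTP status code to the categories described above."""
    if 200 <= status < 300:
        return "success"
    if 400 <= status < 500:
        return "client_error"  # fix the request; retrying unchanged will not help
    if 500 <= status < 600:
        return "server_error"  # wait a moment and try again
    return "other"


def error_detail(body: bytes) -> Optional[str]:
    """Return the 'detail' string from an error body, or None if unavailable."""
    try:
        data = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    if isinstance(data, dict) and isinstance(data.get("detail"), str):
        return data["detail"]
    return None


message = error_detail(b'{"detail": "Validation Error: \'url\' field is required."}')
```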