API Reference

Detailed information about JustExtract.it API endpoints, request/response formats, and authentication.

Introduction

The JustExtract.it API provides programmatic access to our data extraction capabilities. All API access is over HTTPS, and request and response bodies are JSON unless noted otherwise (for example, /ping returns plain text).

The base URL for all API endpoints is: https://api.justextract.it

Authentication

API requests are authenticated using API Keys. You must include your API key in the Authorization header of your requests as a Bearer token.

Authorization: Bearer YOUR_API_KEY

You can obtain an API key by following the steps in the Introduction guide.

Keep Your API Key Secure

Treat your API key like a password. Do not share it publicly or embed it directly in client-side code.
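
One common way to keep the key out of source code is to read it from the environment at runtime. The following is a minimal Python sketch of that idea; the variable name JUSTEXTRACT_API_KEY and the helper auth_headers are assumptions made for illustration and are not defined by the API.

```python
import os


def auth_headers(env_var: str = "JUSTEXTRACT_API_KEY") -> dict[str, str]:
    """Build the Authorization header from an environment variable.

    The variable name is an assumption for this sketch; use whatever
    secret store your deployment provides.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set {env_var} before calling the API")
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dictionary can be passed as the headers of any authenticated request.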

Endpoints

`GET` /api/file-url

Generates secure, pre-signed URLs for uploading a file to our system and for accessing it later during the extraction process.

Request

No request body is required for this endpoint.

curl -X GET https://api.justextract.it/api/file-url \
  -H "Authorization: Bearer YOUR_API_KEY"

Response `200 OK`

{
  "put_url": "https://s3-presigned-url-for-put...",
  "get_url": "https://s3-presigned-url-for-get...",
  "file_key": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "expires_in": 3600
}

put_url (string): The URL to use for PUTting the file.
get_url (string): The URL that can be used to GET the file. This URL should be passed to the /api/extract endpoint.
file_key (string): A unique identifier for the file.
expires_in (integer): The duration in seconds for which the URLs are valid (e.g., 3600 for 1 hour).
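
The upload flow implied by these fields (request the URLs, PUT the file to put_url, keep get_url for extraction) might look like the Python sketch below, using only the standard library. The function names are hypothetical. It is also an assumption that the pre-signed put_url needs no Authorization header and accepts the raw file bytes as the request body; this reference does not spell out the upload contract.

```python
import json
import urllib.request

BASE_URL = "https://api.justextract.it"


def file_url_request(api_key: str) -> urllib.request.Request:
    """Build the GET /api/file-url request (no body is needed)."""
    return urllib.request.Request(
        f"{BASE_URL}/api/file-url",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def upload_request(put_url: str, data: bytes) -> urllib.request.Request:
    """Build the PUT request that sends the file bytes to put_url.

    Assumption: the pre-signed URL carries its own credentials, so no
    Authorization header is attached. urllib adds a default Content-Type
    when data is set; whether a specific one is expected is not stated.
    """
    return urllib.request.Request(put_url, data=data, method="PUT")


def upload_file(api_key: str, path: str) -> str:
    """Request URLs, upload the file, and return get_url for /api/extract."""
    with urllib.request.urlopen(file_url_request(api_key)) as resp:
        urls = json.load(resp)
    with open(path, "rb") as fh:
        body = fh.read()
    with urllib.request.urlopen(upload_request(urls["put_url"], body)):
        pass  # urlopen raises HTTPError on a non-2xx response
    return urls["get_url"]
```

Because both URLs expire after expires_in seconds, it is safest to upload and submit the extraction shortly after requesting them.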

`POST` /api/extract

Submits a document for data extraction. You can specify filters to refine which parts of the document are processed.

Request Body

curl -X POST https://api.justextract.it/api/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "url": "YOUR_DOCUMENT_GET_URL_OR_PUBLIC_URL",
  "filters": [
    {
      "pages": [1, 2],
      "include": true
    },
    {
      "query": "Find summary sections",
      "include": true
    }
  ]
}'

url (string, required): The URL of the document to process. This should be the get_url obtained from the /api/file-url endpoint, or a publicly accessible URL to your document.
filters (array of Filter Objects, optional): An array of Filter Objects to apply to the document. See the Development guide for filter usage and cost implications.

Response `200 OK`

Returns a task_id that can be used to track the extraction progress.

{
  "task_id": "yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy"
}
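
The same submission can be sketched in Python with the standard library. The helper names extract_request and submit_extraction are hypothetical. The request is built separately from the network call so it can be inspected before it is sent.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.justextract.it"


def extract_request(
    api_key: str, document_url: str, filters: Optional[list] = None
) -> urllib.request.Request:
    """Build the POST /api/extract request described above."""
    payload = {"url": document_url}
    if filters:  # filters is optional, so omit the key when unused
        payload["filters"] = filters
    return urllib.request.Request(
        f"{BASE_URL}/api/extract",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_extraction(
    api_key: str, document_url: str, filters: Optional[list] = None
) -> str:
    """Send the request and return the task_id from the response."""
    req = extract_request(api_key, document_url, filters)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["task_id"]
```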

`POST` /api/translate

Submits text for translation to a specified target language. The translation is processed asynchronously, and you can check the status using the task ID returned in the response.

Request Body

curl -X POST https://api.justextract.it/api/translate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "text": "Hello, how are you?",
  "target_language": "Spanish"
}'

text (string, required): The text to be translated.
target_language (string, required): The language to translate the text into. The language should be specified in English (e.g., "Spanish", "French", "German").

Response `200 OK`

Returns a task_id that can be used to track the translation progress.

{
  "task_id": "yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy"
}

Translation Result

When checking the task status, a successful translation will return a result object with the following structure:

{
  "translation": "Hola, ¿cómo estás?",
  "source_language": "English",
  "target_language": "Spanish"
}

translation (string): The translated text.
source_language (string): The detected source language of the input text.
target_language (string): The language the text was translated into.
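
On the client side, a result object with these three fields can be given a typed shape. The Python sketch below is one hypothetical way to do that; the class name TranslationResult is an assumption, and the code expects to receive the result object itself, since where it is nested inside the task response is not detailed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationResult:
    translation: str
    source_language: str
    target_language: str

    @classmethod
    def from_dict(cls, result: dict) -> "TranslationResult":
        """Copy the three documented fields; a missing one raises KeyError."""
        return cls(
            translation=result["translation"],
            source_language=result["source_language"],
            target_language=result["target_language"],
        )
```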

Cost Information

Each translation request counts as 1 transformation. Make sure you have sufficient balance before making requests.

`GET` /api/task/{task_id}

Retrieves the status and results of an asynchronous extraction or translation task.

Path Parameters

task_id (string, required): The ID of the task, obtained from the /api/extract or /api/translate endpoint response.

curl -X GET https://api.justextract.it/api/task/YOUR_TASK_ID \
  -H "Authorization: Bearer YOUR_API_KEY"

Response `200 OK`

Returns a Task Object detailing the job's status and results if completed.

{
  "id": "1cef1b65-7e6c-4326-8ae6-230a20220f30",
  "stage": 100,
  "status": "Extraction done.",
  "nodes": [
    {
      "output_data": [
        {
          "page": 1,
          "values": {
            "header": {
              "type": "object",
              "value": {
                "company_name": { "type": "string", "value": "Example Corp" },
                "document_title": { "type": "string", "value": "Invoice" }
              }
            },
            "totals": {
              "type": "object",
              "value": {
                "total": { "type": "number", "value": 100.00 },
                "currency": { "type": "string", "value": "USD" }
              }
            }
          }
        }
      ],
      "languages_detected": ["English"],
      "handwritten_detected": false
    }
  ]
}

If the task is not found, a 404 Not Found error is returned.
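
Because tasks run asynchronously, a client usually polls this endpoint until the task finishes. The Python sketch below shows one possible loop. It assumes that stage 100 marks completion and that a status of "Task failed." is final; the polling interval and timeout values are arbitrary choices, not recommendations from this reference. fetch_task is a hypothetical callable that performs the GET request and returns the decoded JSON body.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_task: Callable[[str], dict],
    task_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task reports stage 100 or a failure status.

    Assumptions for this sketch: stage == 100 means the task finished,
    and a status of "Task failed." means it will not progress further.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task(task_id)
        if task.get("status") == "Task failed.":
            raise RuntimeError(f"Task {task_id} failed: {task.get('nodes')}")
        if task.get("stage") == 100:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task {task_id} still at stage {task.get('stage')}")
        sleep(interval)
```

Passing sleep as a parameter keeps the loop easy to test without real delays.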

`GET` /ping

A simple health check endpoint to verify API availability. No authentication is required.

Response `200 OK`

pong

Extraction Filter Objects

Filter objects are used in the filters array of the POST /api/extract request. The API infers the filter type from the unique keys provided. Every filter object must also include a boolean include field.
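
To illustrate the key-based inference, the Python sketch below classifies a filter object on the client side before it is sent. The type labels and the rule that exactly one type key must be present are assumptions of this sketch; how the server treats an object carrying two type keys is not stated here.

```python
# Maps each filter's distinguishing key to a label chosen for this sketch.
FILTER_KEYS = {
    "keywords": "keyword",
    "pages": "page_number",
    "content_types": "content_type",
    "orientation": "page_orientation",
    "query": "custom",
}


def filter_type(filter_obj: dict) -> str:
    """Return which documented filter a dict represents, or raise ValueError."""
    if not isinstance(filter_obj.get("include"), bool):
        raise ValueError("every filter needs a boolean 'include' field")
    matches = [name for key, name in FILTER_KEYS.items() if key in filter_obj]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one type key, found {matches}")
    return matches[0]
```

Running such a check locally catches malformed filters before they reach the API.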

Cost Impact of Filters

Applying multiple filters scales your cost linearly. Each filter applied to the document incurs a processing cost. Refer to the Development guide for more details.

Keyword Filter

Filters pages based on the presence or absence of specified keywords.

keywords (array of strings, required): Keywords to search for.
include (boolean, required): If true, include pages with keywords; if false, exclude.

{
  "keywords": ["invoice", "summary"],
  "include": true
}

Set include to false to exclude pages that contain these keywords.

Page Number Filter

Filters pages based on a list of page numbers (1-indexed).

pages (array of integers, required): 1-indexed page numbers.
include (boolean, required): If true, include specified pages; if false, exclude.

{
  "pages": [1, 3, 5],
  "include": true
}

Set include to false to extract all pages except 1, 3, and 5.

Content Type Filter

Filters pages based on the types of content they contain.

content_types (array of strings, required): Types like "table", "image", "text", "hyperlink".
include (boolean, required): If true, include pages with specified content types; if false, exclude.

{
  "content_types": ["table", "image"],
  "include": true
}

Set include to false to exclude pages that contain tables or images.

Page Orientation Filter

Filters pages based on their orientation.

orientation (string, required): "portrait" or "landscape".
include (boolean, required): If true, include pages with specified orientation; if false, exclude.

{
  "orientation": "landscape",
  "include": true
}

Set include to false to keep only portrait pages.

Custom Filter

Filters pages based on relevance to a natural language query.

query (string, required): The natural language query.
include (boolean, required): If true, include pages relevant to the query; if false, exclude.

{
  "query": "Find pages discussing financial projections for Q4.",
  "include": true
}

Common Objects

Task Object

The Task object represents the state and result of an asynchronous job. It is returned by the GET /api/task/{task_id} endpoint.

{
  "id": "1cef1b65-7e6c-4326-8ae6-230a20220f30",
  "stage": 100,
  "status": "Extraction done.",
  "nodes": [
    {
      "output_data": [
        {
          "page": 1,
          "values": {
            "header": {
              "type": "object",
              "value": {
                "company_name": { "type": "string", "value": "Example Corp" },
                "document_title": { "type": "string", "value": "Invoice" }
              }
            },
            "totals": {
              "type": "object",
              "value": {
                "total": { "type": "number", "value": 100.00 },
                "currency": { "type": "string", "value": "USD" }
              }
            }
          }
        }
      ],
      "languages_detected": ["English"],
      "handwritten_detected": false
    }
  ]
}

id (string): Unique identifier for the task.
stage (number): Progress indicator for the task (0-100).
status (string): Current status message of the task. Examples:
- "Extraction done.": returned after a successful data extraction
- "Translation done.": returned after a successful translation
- "Task failed.": returned when an error occurs
nodes (array): Contains the output data or error information:
- For extraction tasks, contains output_data with the extracted information, languages_detected, and handwritten_detected
- For translation tasks, contains output_data with the translation result
- For failed tasks, contains an error field with the error description
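
In the extraction example, every value is wrapped in a {"type", "value"} pair, with "object" values nesting further wrappers. The Python sketch below flattens that structure into plain dictionaries keyed by page number. It handles only the types shown in the example ("object", "string", "number") and returns any other type's value unchanged, which is an assumption. The function names are hypothetical, and the code targets extraction tasks only.

```python
def unwrap(typed: dict):
    """Convert one {"type": ..., "value": ...} wrapper to a plain value."""
    value = typed["value"]
    if typed["type"] == "object":
        # Object values map field names to further wrappers.
        return {name: unwrap(child) for name, child in value.items()}
    return value  # strings, numbers, and any undocumented types pass through


def pages_from_task(task: dict) -> dict:
    """Map page number -> plain extracted values for an extraction task."""
    pages = {}
    for node in task.get("nodes", []):
        for entry in node.get("output_data", []):
            pages[entry["page"]] = {
                name: unwrap(child) for name, child in entry["values"].items()
            }
    return pages
```

For the example Task Object, pages_from_task returns a single entry for page 1 whose "totals" field holds the plain total and currency.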

Error Handling

The JustExtract.it API uses standard HTTP status codes to indicate the success or failure of an API request.

2xx status codes indicate success.
4xx status codes indicate a client-side error (e.g., invalid parameters, authentication failure). The response body will typically contain a JSON object with a detail field explaining the error.
5xx status codes indicate a server-side error. These are rare, but if you encounter one, please wait a moment and try again. If the problem persists, contact support.

Example Error Response (400 Bad Request):

{
  "detail": "Validation Error: 'url' field is required."
}
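
A client can turn these conventions into a single check applied to every JSON response. The Python sketch below is one hypothetical approach. ApiError and check_response are names invented for illustration, and treating every 5xx code as retryable follows the "wait a moment and try again" advice above rather than any documented retry policy.

```python
import json


class ApiError(Exception):
    """Raised for a non-2xx response; retryable is True for 5xx codes."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail
        self.retryable = 500 <= status < 600


def check_response(status: int, body: bytes):
    """Return the decoded JSON body on 2xx, otherwise raise ApiError."""
    text = body.decode("utf-8", errors="replace")
    if 200 <= status < 300:
        return json.loads(text)
    try:
        parsed = json.loads(text)
        detail = parsed.get("detail", text) if isinstance(parsed, dict) else text
    except json.JSONDecodeError:
        detail = text  # the body "typically" holds JSON, so allow plain text
    raise ApiError(status, str(detail))
```

Callers can inspect the retryable attribute to decide whether to wait and resend the request or to surface the detail message to the user.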
