
Advanced Development
Explore advanced configurations, filtering capabilities, and other useful API features.
API Endpoints Recap
Here's a quick summary of the primary API endpoints you'll interact with:
GET /api/file-url: Generates pre-signed URLs for secure file uploads. This is the first step in getting your document into the system.
POST /api/extract: Submits a document for extraction. This is where you specify filters to refine the content processed.
GET /api/task/{task_id}: Retrieves the status and results of an extraction task.
GET /ping: A simple health check endpoint.
All endpoints except GET /ping require Bearer token authentication as described in the Quick Start guide.
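As a rough illustration of how these endpoints and the Bearer header fit together, here is a minimal Python sketch that only builds request objects and does not send them. The helper name build_request is hypothetical, the base URL is taken from the curl examples on this page, and the response formats are not covered here.

```python
import json
import urllib.request

# Base URL as it appears in the curl examples on this page.
BASE_URL = "https://api.justextract.it"


def build_request(method, path, api_key=None, payload=None):
    """Build (but do not send) a request for one of the endpoints above.

    Hypothetical helper: pass api_key for authenticated endpoints and
    payload for JSON request bodies such as POST /api/extract.
    """
    headers = {}
    if api_key is not None:
        headers["Authorization"] = f"Bearer {api_key}"
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# Example: a status request for a placeholder task id.
request = build_request("GET", "/api/task/YOUR_TASK_ID", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Passing the resulting object to urllib.request.urlopen would perform the call; that step is left out so the sketch runs without network access or a real API key.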
Advanced Filtering
JustExtract.it allows you to apply various filters to your documents before the main extraction process. This is useful for focusing the extraction on specific pages or types of content, which can improve accuracy and reduce processing time.
Filters are provided as an array of objects in the filters field of the POST /api/extract request body. The API infers the type of each filter from the parameters you provide in the filter object (e.g., providing a keywords array indicates a KeywordFilter). Each filter object must also include a boolean include field. The extraction process applies these filters to select relevant pages from your document before performing the deep content analysis.
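The inference rule described above can be mirrored on the client to catch malformed filter objects before a request is sent. This is a minimal sketch, not the API's actual implementation: the function name infer_filter_type and the returned labels are hypothetical, the key names come from the filter sections on this page, and the "exactly one parameter per filter" rule is an assumption.

```python
# Hypothetical client-side check; the API's real inference logic is not documented here.
FILTER_KEYS = {
    "keywords": "keyword",
    "pages": "page_number",
    "content_types": "content_type",
    "orientation": "orientation",
    "query": "custom",
}


def infer_filter_type(filter_obj: dict) -> str:
    """Return a label for the filter based on which parameter it carries."""
    if not isinstance(filter_obj.get("include"), bool):
        raise ValueError("each filter needs a boolean 'include' field")
    matches = [label for key, label in FILTER_KEYS.items() if key in filter_obj]
    # Assumption of this sketch: one filter object carries exactly one parameter.
    if len(matches) != 1:
        raise ValueError(f"expected exactly one filter parameter, found {len(matches)}")
    return matches[0]


print(infer_filter_type({"keywords": ["invoice"], "include": True}))  # keyword
```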
Keyword Filter
To use the Keyword Filter, provide a keywords array and an include flag. It allows you to include or exclude pages based on the presence of specific keywords.
keywords: An array of strings representing the keywords to search for.
include: A boolean. If true, only pages containing at least one of the keywords are processed. If false, pages containing any of the keywords are excluded.
    {
      "keywords": ["invoice", "summary"],
      "include": true
    }
Set include to false to exclude pages that contain these keywords.
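To make the include/exclude semantics concrete, the following sketch simulates a keyword filter locally on a toy document. It illustrates the documented behavior only: the function name apply_keyword_filter is hypothetical, and the case-insensitive substring matching is an assumption, since this page does not say how the API matches keywords.

```python
def apply_keyword_filter(pages: dict[int, str], keywords: list[str], include: bool) -> list[int]:
    """Return the page numbers that survive the filter.

    `pages` maps 1-indexed page numbers to page text. Matching is a
    case-insensitive substring test, which is an assumption of this sketch.
    """
    lowered = [keyword.lower() for keyword in keywords]
    kept = []
    for number, text in sorted(pages.items()):
        has_keyword = any(keyword in text.lower() for keyword in lowered)
        # include=True keeps matching pages; include=False keeps the rest.
        if has_keyword == include:
            kept.append(number)
    return kept


pages = {1: "Invoice #1001", 2: "Terms and conditions", 3: "Summary of charges"}
print(apply_keyword_filter(pages, ["invoice", "summary"], include=True))   # [1, 3]
print(apply_keyword_filter(pages, ["invoice", "summary"], include=False))  # [2]
```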
Page Number Filter
To use the Page Number Filter, provide a pages array (1-indexed) and an include flag. This selects or de-selects pages based on their page numbers.
pages: An array of integers representing the page numbers (1-indexed, e.g., the first page is 1).
include: A boolean. If true, only the specified pages are processed. If false, the specified pages are excluded from processing.
    {
      "pages": [1, 3, 5],
      "include": true
    }
Page numbers are 1-indexed. Set include to false to extract all pages except 1, 3, and 5.
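The effect of the include flag on a page list can be sketched as follows. The helper select_pages is hypothetical, and rejecting out-of-range page numbers is a choice made for this sketch; how the API handles them is not stated on this page.

```python
def select_pages(total_pages: int, pages: list[int], include: bool) -> list[int]:
    """Return the 1-indexed page numbers left after a page number filter."""
    listed = set(pages)
    out_of_range = sorted(p for p in listed if not 1 <= p <= total_pages)
    if out_of_range:
        # Assumption: this sketch rejects pages outside the document.
        raise ValueError(f"pages outside 1..{total_pages}: {out_of_range}")
    return [p for p in range(1, total_pages + 1) if (p in listed) == include]


print(select_pages(6, [1, 3, 5], include=True))   # [1, 3, 5]
print(select_pages(6, [1, 3, 5], include=False))  # [2, 4, 6]
```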
Content Type Filter
To use the Content Type Filter, provide a content_types array and an include flag. This allows you to filter pages based on the types of content they contain.
content_types: An array of strings specifying the content types to filter by. Supported types include table, image, text, hyperlink.
include: A boolean. If true, only pages containing at least one of the specified content types are processed. If false, pages containing any of the specified content types are excluded.
    {
      "content_types": ["table", "image"],
      "include": true
    }
Set include to false to exclude pages that contain tables or images.
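Because a misspelled content type is easy to miss, it can help to validate the array before building the filter object. The sketch below assumes the four types named on this page are the full set, which may not be the case given that the page says supported types "include" them; the helper name content_type_filter is hypothetical.

```python
# Types listed on this page; the real list may be longer (assumption).
SUPPORTED_CONTENT_TYPES = {"table", "image", "text", "hyperlink"}


def content_type_filter(content_types: list[str], include: bool = True) -> dict:
    """Build a content type filter object, rejecting unknown type names."""
    unknown = sorted(set(content_types) - SUPPORTED_CONTENT_TYPES)
    if unknown:
        raise ValueError(f"unsupported content types: {unknown}")
    return {"content_types": list(content_types), "include": include}


print(content_type_filter(["table", "image"]))
```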
Page Orientation Filter
To use the Page Orientation Filter, provide an orientation string and an include flag. This selects pages based on their orientation.
orientation: A string, either "portrait" or "landscape".
include: A boolean. If true, only pages with the specified orientation are processed. If false, pages with the specified orientation are excluded.
    {
      "orientation": "landscape",
      "include": true
    }
The orientation value is either "portrait" or "landscape". Set include to false to process only portrait pages.
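The orientation semantics can be simulated on page dimensions as a quick sanity check. This is a sketch under stated assumptions: both function names are hypothetical, a page counts as landscape when it is wider than it is tall, and treating square pages as portrait is a choice of this sketch rather than documented API behavior.

```python
def orientation_of(width: float, height: float) -> str:
    """Classify a page by its dimensions."""
    # Treating a square page as portrait is a choice made for this sketch.
    return "landscape" if width > height else "portrait"


def apply_orientation_filter(
    sizes: dict[int, tuple[float, float]], orientation: str, include: bool
) -> list[int]:
    """Return the page numbers that survive an orientation filter."""
    if orientation not in ("portrait", "landscape"):
        raise ValueError("orientation must be 'portrait' or 'landscape'")
    return [
        number
        for number, (width, height) in sorted(sizes.items())
        if (orientation_of(width, height) == orientation) == include
    ]


# Page sizes in points: pages 1 and 3 are upright, page 2 is rotated.
sizes = {1: (612, 792), 2: (792, 612), 3: (612, 792)}
print(apply_orientation_filter(sizes, "landscape", include=True))   # [2]
print(apply_orientation_filter(sizes, "landscape", include=False))  # [1, 3]
```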
Custom Filter
To use the Custom Filter, provide a query string and an include flag. This offers a flexible way to select pages based on relevance to a natural language query.
query: A string containing your natural language query (e.g., "Find pages related to financial statements").
include: A boolean. If true, only pages deemed relevant to the query are processed. If false, pages relevant to the query are excluded.
    {
      "query": "Find pages discussing financial projections for Q4.",
      "include": true
    }
Using Filters Example
You can combine multiple filters in the filters array to create highly specific selection criteria. Filters are generally applied in the order they appear, though the exact interaction may depend on the combination.
Cost Impact
Cost depends on the number of filters in the filters array. Applying multiple filters scales your cost linearly. For example, if you apply 3 filters, the cost will be 3x the cost of a single filter.
Here's an example of a POST /api/extract request that uses several filters:
    curl -X POST https://api.justextract.it/api/extract \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "url": "YOUR_DOCUMENT_GET_URL",
        "filters": [
          {
            "pages": [1, 2, 3, 4, 5],
            "include": true
          },
          {
            "keywords": ["confidential", "internal use only"],
            "include": false
          },
          {
            "content_types": ["table"],
            "include": true
          },
          {
            "query": "Extract sections related to project milestones.",
            "include": true
          }
        ]
      }'
In this example, the API first considers only pages 1 through 5. From those pages, it excludes any that contain "confidential" or "internal use only". From the remaining pages, it keeps only those that contain at least one table. Finally, it applies the custom query to that subset and keeps the pages relevant to project milestones. The final extraction is performed on the pages that satisfy all of these conditions.
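The step-by-step narrowing described above can be reproduced locally on a toy document. This is a sketch of the documented ordering only, not the API's implementation: the names matches and narrow and the toy page structure are hypothetical, keyword matching is assumed to be a case-insensitive substring test, and the custom query step is skipped because relevance is judged by the API. The last line applies the linear cost rule from the Cost Impact note.

```python
def matches(page: dict, flt: dict) -> bool:
    """Report whether a toy page matches one filter's criterion."""
    if "pages" in flt:
        return page["number"] in flt["pages"]
    if "keywords" in flt:
        text = page["text"].lower()
        return any(keyword.lower() in text for keyword in flt["keywords"])
    if "content_types" in flt:
        return any(kind in page["content_types"] for kind in flt["content_types"])
    raise ValueError("filter type not modelled in this sketch")


def narrow(pages: list[dict], filters: list[dict]) -> list[int]:
    """Apply filters in order; each one narrows the surviving page set."""
    surviving = list(pages)
    for flt in filters:
        if "query" in flt:
            continue  # relevance to a query is judged by the API, not reproduced here
        surviving = [p for p in surviving if matches(p, flt) == flt["include"]]
    return [p["number"] for p in surviving]


document = [
    {"number": 1, "text": "Milestones overview", "content_types": {"text", "table"}},
    {"number": 2, "text": "CONFIDENTIAL budget", "content_types": {"table"}},
    {"number": 3, "text": "Team photos", "content_types": {"image"}},
    {"number": 6, "text": "Appendix tables", "content_types": {"table"}},
]
filters = [
    {"pages": [1, 2, 3, 4, 5], "include": True},
    {"keywords": ["confidential", "internal use only"], "include": False},
    {"content_types": ["table"], "include": True},
    {"query": "Extract sections related to project milestones.", "include": True},
]

print(narrow(document, filters))  # [1]
print(len(filters))               # 4, i.e. 4x the cost of a single filter
```

Page 6 is dropped by the page filter, page 2 by the keyword exclusion, and page 3 by the table requirement, leaving page 1 for the query step.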
Health Check
To quickly verify that the JustExtract.it API is operational, use the /ping endpoint. It requires no authentication.
Endpoint: GET /ping
A successful response returns a status code of 200 OK with the body "pong".
    curl -X GET https://api.justextract.it/ping
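The same check can be scripted, for instance as a startup probe. The sketch below uses only the Python standard library; the function name is_healthy and its opener parameter (which allows a stand-in for testing) are hypothetical, and the check looks for "pong" anywhere in the body because this page does not say whether the body is plain text or a JSON string.

```python
import urllib.request

PING_URL = "https://api.justextract.it/ping"


def is_healthy(url: str = PING_URL, timeout: float = 5.0, opener=urllib.request.urlopen) -> bool:
    """Return True when the endpoint answers 200 with a body containing 'pong'."""
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status == 200 and "pong" in body
    except OSError:
        # URLError, HTTPError, and socket timeouts all derive from OSError.
        return False
```

Calling is_healthy() with no arguments performs a real request; any network failure or non-200 answer is reported as False rather than raised.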