OCR Document

Quick Start

You send the document that you want to run OCR on in the file parameter. For example:

/1/api/[async|sync]/ocrdocument/v1?file=cv.jpg

Note: API input is subject to a maximum size quota. If you upload text or a file that is too large, the API returns an error. For more information, see Rate Limiting, Quotas, Data Expiry, and Maximums.

The API returns the extracted text, along with information about the location of the detected text in the original image. The API does not provide a precise layout of the text on the page. The text is primarily useful for adding to a text index for search and retrieval of the original document.

{
  "text_block": [
    {
      "text": "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.",
      "left": 2,
      "top": 2,
      "width": 800,
      "height": 1117
    }
  ]
}
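Because the extracted text is mainly intended for a search index, a common first step is to join the text of all blocks into one string. The following is a minimal sketch, assuming each entry in text_block is an object with a text field, as the model on this page describes. The extract_text function name is hypothetical and not part of the API.

```python
import json


def extract_text(response_body):
    """Join the text of every block in an OCR Document response body."""
    data = json.loads(response_body)
    # text_block is an array; it is empty when no text is detected.
    return "\n".join(block["text"] for block in data.get("text_block", []))


if __name__ == "__main__":
    sample = (
        '{"text_block": [{"text": "Lorem ipsum", "left": 2, "top": 2, '
        '"width": 800, "height": 1117}]}'
    )
    print(extract_text(sample))
```

The resulting string can be passed to whichever indexing step you use; the position fields are ignored here.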

The API returns the best results for images with high contrast between the text and the background.

By default, the API treats the input as a clean photo of a document. You can improve the accuracy of the results by specifying the OCR mode that matches the type of image. For example:

/1/api/[async|sync]/ocrdocument/v1?file=mycv.jpg&mode=document_scan

The options for the mode parameter are:

document_photo: (default) Use to recognize text in a document that has been digitized with variable light, such as through a mobile phone camera.
document_scan: Use to recognize text in a document that has been digitized with constant lighting, such as through a flatbed scanner.
scene_photo: Use to recognize text in a scene, for example signs and billboards in a landscape.
subtitle: Use to recognize text superimposed on an image, such as TV subtitles.
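The request URLs shown above can also be assembled in code. The following is a minimal sketch that builds a GET URL from the url, mode, and apikey parameters documented on this page and rejects mode values that are not in the list above. The build_ocr_url helper is hypothetical, and the sketch only builds the URL; it does not send the request.

```python
from urllib.parse import urlencode

# Host and path as listed in the URL section of this page.
BASE_URL = "https://api.havenondemand.com/1/api"

# The four mode values listed above.
VALID_MODES = ("document_photo", "document_scan", "scene_photo", "subtitle")


def build_ocr_url(api_key, image_url, mode=None, synchronous=True):
    """Return a request URL for the OCR Document API (hypothetical helper)."""
    if mode is not None and mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    call_type = "sync" if synchronous else "async"
    params = {"apikey": api_key, "url": image_url}
    if mode is not None:
        # When mode is omitted, the API uses its default, document_photo.
        params["mode"] = mode
    return f"{BASE_URL}/{call_type}/ocrdocument/v1?{urlencode(params)}"


if __name__ == "__main__":
    print(build_ocr_url("YOUR_API_KEY", "https://example.com/mycv.jpg",
                        mode="document_scan"))
```

Checking the mode locally avoids a round trip for a value that the API would not accept. Uploading a local file through the file parameter needs a multipart request, which this sketch does not cover.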

For a list of file formats that you can use for images, see Supported Media Formats.

Optimize OCR Results

In general, OCR results are better for high-quality images in which the text contrasts strongly with the background (for example, a sharp, dark font on a white background).

When you take a picture of text or a document with a handheld camera, the OCR results are better in diffuse lighting. Natural light is diffuse, so photos taken in natural light are generally better for OCR. When this is not possible, try to ensure that the camera is not between the light source and the text, because this positioning can cause glare or cast shadows on the text. For example, if you want to photograph a business card under an overhead light, hold the camera and card perpendicular to the floor, so that the light is above both, rather than laying the card on a table.

Additionally, if you need to use a flash, ensure that the camera is far enough away that the text does not get washed out or saturated.

The image resolution can affect the OCR results. Higher resolution images contain more detail, and OCR might interpret background distortions as possible text, so a high resolution picture with a lot of tiny details might give poorer results. However, when the image resolution is too low, the font becomes less sharp and the image becomes pixelated. The ideal image size depends on the quality of your camera, and you might need to test different settings to find the best results.

In all cases, use the appropriate mode for your data. For example, if you take a picture of a document or business card, use document_photo, and if you want to identify the text in a picture of road signs, use scene_photo.

The document_scan mode is best for document scans obtained with a good quality scanner, and for computer-generated images. If the page is not flat when scanned, you might get better results by using the document_photo mode.
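The guidance above on choosing a mode can be summarized as a small lookup. The following is a rough sketch only: the source labels ("scanner", "computer_generated", "camera_document", "camera_scene", "overlay") and the choose_mode function are hypothetical names introduced for illustration, and the API itself accepts only the four mode values.

```python
def choose_mode(source, page_is_flat=True):
    """Map a hypothetical description of the image source to an OCR mode."""
    if source in ("scanner", "computer_generated"):
        # A page that was not flat when scanned might do better as a photo.
        return "document_scan" if page_is_flat else "document_photo"
    if source == "camera_document":
        return "document_photo"
    if source == "camera_scene":
        return "scene_photo"
    if source == "overlay":
        return "subtitle"
    raise ValueError(f"unknown source: {source!r}")


if __name__ == "__main__":
    print(choose_mode("scanner"))
    print(choose_mode("scanner", page_is_flat=False))
    print(choose_mode("camera_scene"))
```

This is a starting point; if results are poor, it can be worth trying another mode on the same image.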

Parameter	Description
apikey	The API key to use to authenticate the API request.

Required
Name	Type	Description
file	binary	The image file to process.
reference	string	A Haven OnDemand reference obtained from either the Expand Container or Store Object API. The corresponding document is passed to the API.
url	string	A publicly accessible HTTP URL from which the image can be retrieved.

Optional
Name	Type	Description
mode	enum	The type of image to process. Default value: document_photo.

{
  "properties": {
    "text_block": {
      "items": {
        "properties": {
          "height": { "type": "integer" },
          "left": { "type": "integer" },
          "text": { "type": "string" },
          "top": { "type": "integer" },
          "width": { "type": "integer" },
          "page_num": { "type": "integer" }
        },
        "required": [ "text", "left", "top", "width", "height" ],
        "type": "object"
      },
      "type": "array"
    },
    "page_count": { "type": "integer" }
  },
  "required": [ "text_block" ],
  "type": "object"
}
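A client can check a parsed response against this schema before using it. The following is a minimal, hand-written sketch that covers only the required fields and types listed in the schema above. It is not a general JSON Schema validator, and the validate_response function name is hypothetical.

```python
REQUIRED_BLOCK_FIELDS = ("text", "left", "top", "width", "height")
BLOCK_FIELD_TYPES = {
    "text": str, "left": int, "top": int,
    "width": int, "height": int, "page_num": int,
}


def _matches(value, expected):
    # bool is a subclass of int in Python, but is not a JSON integer.
    if expected is int:
        return isinstance(value, int) and not isinstance(value, bool)
    return isinstance(value, expected)


def validate_response(data):
    """Return a list of problems; an empty list means the response matches."""
    if not isinstance(data, dict):
        return ["response is not an object"]
    problems = []
    if "page_count" in data and not _matches(data["page_count"], int):
        problems.append("page_count is not an integer")
    blocks = data.get("text_block")
    if not isinstance(blocks, list):
        problems.append("text_block is missing or not an array")
        return problems
    for index, block in enumerate(blocks):
        if not isinstance(block, dict):
            problems.append(f"text_block[{index}] is not an object")
            continue
        for name in REQUIRED_BLOCK_FIELDS:
            if name not in block:
                problems.append(f"text_block[{index}] is missing {name}")
        for name, expected in BLOCK_FIELD_TYPES.items():
            if name in block and not _matches(block[name], expected):
                problems.append(f"text_block[{index}].{name} has the wrong type")
    return problems


if __name__ == "__main__":
    print(validate_response({"text_block": []}))
```

An empty text_block array is valid; the model on this page states that the API returns an empty array when no text is detected.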

Synchronous	https://api.havenondemand.com/1/api/sync/ocrdocument/v1
Asynchronous	https://api.havenondemand.com/1/api/async/ocrdocument/v1

OCR Document Response {
	text_block ( array[Text_block] )	Details of a section of text found in the image. If no text is detected, it returns an empty array [].
	page_count ( integer , optional)	The total number of pages in the document.
}

OCR Document Response:Text_block {
	height ( integer )	The height of the bounding box for the text.
	left ( integer )	The position of the left edge of the bounding box for the text.
	text ( string )	The text extracted in this section of the image.
	top ( integer )	The position of the top edge of the bounding box for the text.
	width ( integer )	The width of the bounding box for the text.
	page_num ( integer , optional)	The page in the document that the text belongs to.
}
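The model above maps naturally onto a small data class. The following is a sketch, assuming a response that has already been parsed into a dictionary. The TextBlock class, its right and bottom properties, and the blocks_by_page function are hypothetical helpers; right and bottom are simply left + width and top + height in whatever units the API reports.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextBlock:
    text: str
    left: int
    top: int
    width: int
    height: int
    page_num: Optional[int] = None  # optional in the model

    @property
    def right(self):
        """Right edge of the bounding box."""
        return self.left + self.width

    @property
    def bottom(self):
        """Bottom edge of the bounding box."""
        return self.top + self.height


def blocks_by_page(response):
    """Group the text blocks of a parsed response by page_num."""
    pages = defaultdict(list)
    for raw in response.get("text_block", []):
        block = TextBlock(
            text=raw["text"], left=raw["left"], top=raw["top"],
            width=raw["width"], height=raw["height"],
            page_num=raw.get("page_num"),
        )
        pages[block.page_num].append(block)
    return dict(pages)


if __name__ == "__main__":
    response = {"text_block": [
        {"text": "Lorem ipsum", "left": 2, "top": 2, "width": 800, "height": 1117},
    ]}
    for page, blocks in blocks_by_page(response).items():
        for block in blocks:
            print(page, block.right, block.bottom, block.text)
```

Blocks without a page_num are grouped under the key None, which is the expected case for a single image.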

Extracts text from an image.

document_photo	Photo of a document	Use to recognize text in a document that has been digitized with variable light, such as through a mobile phone camera.
document_scan	Scanned image of a document	Use to recognize text in a document that has been digitized with constant lighting, such as through a flatbed scanner.
scene_photo	Photo of a scene containing text	Use to recognize text in a scene, for example signs and billboards in a landscape.
subtitle	Text superimposed on an image	Use to recognize text superimposed on an image, such as TV subtitles.