You send the document that you want to run OCR on in the file parameter. For example:
/1/api/[async|sync]/ocrdocument/v1?file=cv.jpg
Note: API input is subject to a maximum size quota. If you upload text or a file that is too large, the API returns an error. For more information, see Rate Limiting, Quotas, Data Expiry, and Maximums.
The API returns the extracted text, along with information about the location of the detected text in the original image. The API does not provide a precise layout of the text on the page. The text is primarily useful for adding to a text index for search and retrieval of the original document.
{
  "text_block": [
    {
      "text": "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.",
      "left": 2,
      "top": 2,
      "width": 800,
      "height": 1117
    }
  ]
}
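As a minimal sketch of how a client might consume this response for indexing, the following Python reads the text and position of each block. It assumes that text_block is an array of objects with the fields shown in the example; the sample body, extract_blocks, and index_text are illustrative names, not part of the API.

```python
import json

# Shortened sample body in the shape of the example response (assumed shape).
sample_response = """
{
  "text_block": [
    {"text": "Lorem ipsum dolor sit amet", "left": 2, "top": 2, "width": 800, "height": 1117}
  ]
}
"""


def extract_blocks(response_body):
    """Return a list of (text, (left, top, width, height)) tuples."""
    data = json.loads(response_body)
    blocks = []
    for block in data.get("text_block", []):
        box = (block["left"], block["top"], block["width"], block["height"])
        blocks.append((block["text"], box))
    return blocks


def index_text(response_body):
    """Join the text of all blocks into one string for a search index."""
    return "\n".join(text for text, _ in extract_blocks(response_body))


blocks = extract_blocks(sample_response)
print(blocks[0][1])                 # position of the first block in the image
print(index_text(sample_response))  # text to add to the index
```

The position values locate each block in the original image; they are not enough to reconstruct the page layout.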
The API returns the best results for images with high contrast between the text and the background.
By default, the API treats the input as a clean photo of a document. You can improve the accuracy of the results by specifying the OCR mode that matches the type of image. For example:
/1/api/[async|sync]/ocrdocument/v1?file=mycv.jpg&mode=document_scan
The options for the mode parameter are:
document_photo: (default) Use to recognize text in a document that has been digitized with variable light, such as through a mobile phone camera.
document_scan: Use to recognize text in a document that has been digitized with constant lighting, such as through a flatbed scanner.
scene_photo: Use to recognize text in a scene, for example signs and billboards in a landscape.
subtitle: Use to recognize text superimposed on an image, such as TV subtitles.
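The following Python sketch shows one way to build the request path with a validated mode value. It mirrors the query-string form used in the examples above; the helper name build_ocr_path and its sync argument are assumptions for illustration, and the host name and authentication details are deliberately left out because they are not covered here.

```python
from urllib.parse import urlencode

# The four mode values listed above.
OCR_MODES = ("document_photo", "document_scan", "scene_photo", "subtitle")


def build_ocr_path(file_name, mode="document_photo", sync=True):
    """Build the request path and query string for an OCR call (hypothetical helper)."""
    if mode not in OCR_MODES:
        raise ValueError(f"unknown OCR mode: {mode!r}")
    call_type = "sync" if sync else "async"
    query = urlencode({"file": file_name, "mode": mode})
    return f"/1/api/{call_type}/ocrdocument/v1?{query}"


print(build_ocr_path("mycv.jpg", mode="document_scan"))
```

Rejecting an unknown mode on the client side avoids sending a request that cannot match any of the documented options.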
For a list of file formats that you can use for images, see Supported Media Formats.
Optimize OCR Results
In general, OCR results are better for high-quality images in which the text contrasts strongly with the background (for example, a sharp, dark font on a white background).
When you take a picture of text or a document with a handheld camera, the OCR results are better in diffuse lighting. Natural light is diffuse, so photos taken in natural light are generally better for OCR. When this is not possible, try to ensure that the camera is not between the light source and the text, because this positioning can cause glare or cast shadows on the text. For example, if you want to photograph a business card under an overhead light, hold the camera and card perpendicular to the floor, so that the light is above both, rather than laying the card on a table.
Additionally, if you need to use a flash, ensure that the camera is far enough away that the text does not get washed out or saturated.
The image resolution can affect the OCR results. Higher resolution images contain more detail, and OCR might interpret background distortions as possible text, so a high resolution picture with a lot of tiny details might give poorer results. However, when the image resolution is too low, the font becomes less sharp and the image becomes pixelated. The ideal image size depends on the quality of your camera, and you might need to experiment to find the settings that give the best results.
In all cases, use the appropriate mode for your data. For example, if you take a picture of a document or business card, use document_photo, and if you want to identify the text in a picture of road signs, use scene_photo.
The document_scan mode is best for document scans obtained with a good quality scanner, and for computer-generated images. If the page is not flat when scanned, you might get better results by using the document_photo mode.
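The guidance above can be summarized as a small lookup. This is a sketch only: the source labels ("scan", "computer_generated", "camera_document", "camera_scene", "video_frame") and the function choose_ocr_mode are hypothetical names introduced for illustration, and only the returned mode values come from the API.

```python
def choose_ocr_mode(source, page_is_flat=True):
    """Pick an OCR mode from a description of how the image was captured.

    The source labels are hypothetical; only the returned values are
    options of the mode parameter.
    """
    if source == "scan":
        # A page that was not flat when scanned may work better as a photo.
        return "document_scan" if page_is_flat else "document_photo"
    if source == "computer_generated":
        return "document_scan"
    if source == "camera_document":
        return "document_photo"
    if source == "camera_scene":
        return "scene_photo"
    if source == "video_frame":
        return "subtitle"
    raise ValueError(f"unknown source: {source!r}")


print(choose_ocr_mode("camera_document"))           # business card photo
print(choose_ocr_mode("scan", page_is_flat=False))  # curled page on a scanner
```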

