
Text Recognition

Get started with ML Kit Vision Text Recognition.

Text recognition can automate tedious data entry for credit cards, receipts, and business cards. With the Cloud-based API, you can also extract text from pictures of documents, which you can use to increase accessibility or translate documents.

Once an image file has been processed, the API returns a VisionDocumentText instance referencing all found text along with each VisionDocumentTextBlock. Each block contains metadata such as:

  • The 4-point coordinates of the box on the document.
  • Paragraphs within the block.
  • Recognized languages within the block/document.
  • The confidence the Machine Learning service has in its own results.

Cloud Text Recognition

The cloud-based text recognition service uploads a given image of a document to the remote Firebase service, which processes the image and returns the results. Only image file types are supported. To get started, call the cloudDocumentTextRecognizerProcessImage method with a path to a local file on your device:

import { utils } from '@react-native-firebase/app';
import vision from '@react-native-firebase/ml-vision';

async function processDocument(localPath) {
  const processed = await vision().cloudDocumentTextRecognizerProcessImage(localPath);

  console.log('Found text in document: ', processed.text);

  processed.blocks.forEach(block => {
    console.log('Found block with text: ', block.text);
    console.log('Confidence in block: ', block.confidence);
    console.log('Languages found in block: ', block.recognizedLanguages);
  });
}

// Local path to file on the device
const localFile = `${utils.FilePath.PICTURES_DIRECTORY}/text-document.jpg`;

processDocument(localFile).then(() => console.log('Finished processing file.'));
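Because the image is uploaded to a remote service, the call can fail, for example when the device has no network connectivity or the file is not a supported image type. The method returns a promise, so failures can be handled with a standard catch handler. The snippet below is illustrative and reuses the processDocument function from the example above:

processDocument(localFile)
  .then(() => console.log('Finished processing file.'))
  .catch(error => console.warn('Failed to process file: ', error));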

To learn more about the available data on a processed document, view the VisionDocumentText documentation.
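The sketch below illustrates accessing the block metadata listed earlier (the bounding box, paragraphs, recognized languages and confidence). The boundingBox and paragraphs property names and shapes are assumptions based on that list; confirm them against the VisionDocumentText reference before relying on them.

import vision from '@react-native-firebase/ml-vision';

async function inspectBlocks(localPath) {
  const processed = await vision().cloudDocumentTextRecognizerProcessImage(localPath);

  processed.blocks.forEach(block => {
    // 4-point coordinates of the box on the document (property name assumed)
    console.log('Block bounding box: ', block.boundingBox);

    // Paragraphs within the block (property name assumed)
    block.paragraphs.forEach(paragraph => {
      console.log('Paragraph text: ', paragraph.text);
    });

    // Recognized languages and the service's confidence in the block
    console.log('Languages in block: ', block.recognizedLanguages);
    console.log('Confidence in block: ', block.confidence);
  });
}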

Configuration

To help improve the results when using the cloud service, you can optionally provide arguments to the cloudDocumentTextRecognizerProcessImage method:

const processed = await vision().cloudDocumentTextRecognizerProcessImage(documentPath, {
  // The document contains Kurdish
  languageHints: ['ku'],
});

In most scenarios, not providing any hints will yield better results. Use this configuration if the cloud service is struggling to detect a language.
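For example, a hint could be applied only as a fallback when an un-hinted pass finds no text. This is a minimal sketch; the fallback condition and the 'ku' hint are illustrative assumptions rather than recommendations from the API.

import vision from '@react-native-firebase/ml-vision';

async function processWithFallback(documentPath) {
  // First pass: let the cloud service detect the language on its own
  let processed = await vision().cloudDocumentTextRecognizerProcessImage(documentPath);

  // Illustrative fallback: only retry with a hint if nothing was recognized
  if (!processed.text) {
    processed = await vision().cloudDocumentTextRecognizerProcessImage(documentPath, {
      languageHints: ['ku'],
    });
  }

  return processed;
}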

View the VisionCloudDocumentTextRecognizerOptions documentation for more information.

On-device Text Recognition

Running the ML Kit service on a device requires the ml_vision_ocr_model to be downloaded to the device. Although on-device processing is faster than the cloud-based service, including the model in your application will increase the size of the application.

Enable the model

To enable the model, set the ml_vision_ocr_model key to true in your firebase.json file:

// <project-root>/firebase.json
{
  "react-native": {
    "ml_vision_ocr_model": true
  }
}

Once complete, rebuild your application:

# For Android
npx react-native run-android

# For iOS
cd ios/ && pod install --repo-update && cd ..
npx react-native run-ios

Process

Once the model has been enabled and your application rebuilt, call the textRecognizerProcessImage method with a path to a local file on your device:

import { utils } from '@react-native-firebase/app';
import vision from '@react-native-firebase/ml-vision';

async function processDocument(localPath) {
  const processed = await vision().textRecognizerProcessImage(localPath);

  console.log('Found text in document: ', processed.text);

  processed.blocks.forEach(block => {
    console.log('Found block with text: ', block.text);
    console.log('Confidence in block: ', block.confidence);
    console.log('Languages found in block: ', block.recognizedLanguages);
  });
}

// Local path to file on the device
const localFile = `${utils.FilePath.PICTURES_DIRECTORY}/text-document.jpg`;

processDocument(localFile).then(() => console.log('Finished processing file.'));