Accelerate document information extraction with Aiah's Document Extraction

December 7, 2022

Organizations all over the world still receive and process data on paper. Crucial business processes still rely on printed forms, invoices, legal documents, and contracts. In 2022, over 54 billion invoices are exchanged worldwide, and you can imagine how much of that is processed manually. The same holds true not just for business but for other fields such as medicine, journalism, law and government, and climate science.

Even if companies decide to go paperless, they still have a lot of archival data in print that must be sifted through to find what they need. But manually encoding documents is slow, tedious, and expensive.

Here’s where Aiah’s Document Extraction comes in. It helps you expedite information extraction using a combination of machine learning applied to computer vision, deep learning for OCR, and some heuristic methods. All you have to do is call its API and let it do the heavy lifting for you.

After you get the information you need, you can use it in a number of ways: save it to a database as you would after manual encoding, feed it into your data processing pipeline, or even link it to other AI services to form a smart document processing workflow. The world is your oyster!

Now, let’s slowly unravel what Document Extraction is and how it can help your business thrive.

What is Aiah’s Document Extraction?

Aiah’s Document Extraction service is a customizable, low-code AI service that extracts text and other data (signatures, checkboxes, radio buttons, etc.) from fields you declare ahead of time. In each document, you can choose which fields are important and ignore the rest, or extract everything if you need to. Its key properties are flexibility and customizability. Many other platforms can only extract data from certain document types, and if yours doesn’t fit, you can’t use their services. This service, by contrast, can handle most types of documents.

How does it work?

Document Extraction works via template matching coupled with machine learning: you register your document as a template, and the service can then extract information from any document that matches that template.

It can also align any input document to the registered template, so you don’t have to worry about images taken at odd angles. Even with unclear or distorted documents, you can accurately capture and extract only the fields you need.
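To make the idea of a template more concrete, here is a small Python sketch. The article does not describe how the service stores templates internally, so everything here is a hypothetical illustration: the `Field` class, the field names, the `kind` labels, and the use of normalized coordinates are all assumptions. It only shows how declared fields could map to pixel regions once an input image has been aligned to the template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One declared region of a template, in normalized (0 to 1) coordinates.

    Hypothetical structure for illustration; not the service's real format.
    """
    name: str
    kind: str  # e.g. "text", "signature", "checkbox" (assumed labels)
    left: float
    top: float
    right: float
    bottom: float


def to_pixel_box(field, width, height):
    """Scale a normalized field box to pixel coordinates of an aligned image."""
    return (
        round(field.left * width),
        round(field.top * height),
        round(field.right * width),
        round(field.bottom * height),
    )


# Hypothetical driver's license template with two declared fields.
LICENSE_TEMPLATE = [
    Field("full_name", "text", 0.30, 0.20, 0.90, 0.30),
    Field("signature", "signature", 0.30, 0.70, 0.80, 0.90),
]

# Pixel regions for an aligned 1000x600 image; anything outside is ignored.
boxes = {f.name: to_pixel_box(f, width=1000, height=600) for f in LICENSE_TEMPLATE}
```

Because the boxes are stored relative to the template rather than to any one scan, the same declaration could apply to aligned images of any resolution.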

How can it help?

This service lets you focus on acting on data rather than getting bogged down in encoding it.

Unlike traditional OCR services, Document Extraction does not simply dump the extracted text on you; it has its own facilities to filter the information you need and map it in a way that makes sense. And compared to manual encoding, automated data extraction can drastically speed up processing and reduce processing costs in the long run.

Imagine transforming an image to text in under 5 seconds, without installing additional apps or tedious retyping. All you need is a digital image of your document.

How can I use it?

Here’s a rundown on how to use Document Extraction:

  1. Annotate an image using the Annotation Tool. In this example, we’ll use this dummy driver’s license:

  2. After annotating the fields you want to extract from the image, save your template using the Send button at the upper right corner of the Annotation Tool.

  3. Enter the following:
     1. Your desired template name (e.g. driver’s license template)
     2. X-Aiah-Key: after you log in to AI Marketplace, copy your key from Profile > API Keys
     3. Client-Project-ID: copy this from Document Extraction > Playground > get all templates endpoint (see image below)

  4. Click Send. This will register the driver’s license template automatically. Take note of your template_uuid for future use.

  5. Using the template_uuid, you can now extract information from a given document using the extract information endpoint:

  6. You can also directly integrate this service into your code via the API (see code snippet).
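As a rough idea of what that integration could look like, here is a minimal Python sketch using only the standard library. The `X-Aiah-Key` and `Client-Project-ID` headers and the `template_uuid` come from the steps above; the base URL, the endpoint path, and sending the image as raw bytes are placeholders assumed for illustration, so check the Playground for the actual endpoint and payload format.

```python
import json
import urllib.request

# Hypothetical base URL: the real one is not given in this article.
BASE_URL = "https://api.example.com/document-extraction"


def build_extract_request(api_key, project_id, template_uuid, image_bytes):
    """Build (but do not send) a POST request to an assumed extract endpoint."""
    # Assumed path layout; replace with the path shown in the Playground.
    url = f"{BASE_URL}/templates/{template_uuid}/extract"
    headers = {
        "X-Aiah-Key": api_key,            # from Profile > API Keys
        "Client-Project-ID": project_id,  # from the get all templates endpoint
        "Content-Type": "application/octet-stream",  # assumed upload format
    }
    return urllib.request.Request(
        url, data=image_bytes, headers=headers, method="POST"
    )


def send_request(request, timeout=30):
    """Send the request and parse the response body, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

To try it, read your document image as bytes, pass it to `build_extract_request` along with your key, project ID, and `template_uuid`, then hand the result to `send_request`. Keeping the request-building step separate from the network call makes it easy to inspect the headers and URL before anything is sent.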


If this sounds interesting to you and your business, you can try out Aiah’s Document Extraction here. And if you have any questions, you can email us directly at hello@aiah.ai.