Data extraction from image-based documents

Evolution AI are challenging the status quo by combining the latest advances in neural network architectures for computer vision and machine reading

Evolution AI’s machine reading technology can read any kind of document, even scanned images of documents with complex layouts (such as an invoice or a cash flow statement).

Traditionally, computers cannot understand the text contained in an image without first applying a method called Optical Character Recognition (OCR). The major downside of OCR-based extraction is that document formats vary so widely that a human has to set up a separate template for each type of document. Even a slight difference in format can cause significant inaccuracies and requires a new template.
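To make the template problem concrete, here is a minimal Python sketch of template-based extraction. It assumes OCR has already produced words with page coordinates. The `Word` class, the `TEMPLATE` rectangles and `extract_with_template` are hypothetical illustrations, not part of any real OCR product. Each field is read from a fixed rectangle, so moving a value on the page makes the lookup come back empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One OCR token with the position of its top-left corner (in points)."""
    text: str
    x: float
    y: float


# Hypothetical template for one supplier's invoice layout:
# each field is read from a fixed rectangle (x0, y0, x1, y1).
TEMPLATE = {
    "invoice_number": (400, 50, 550, 70),
    "total": (400, 700, 550, 720),
}


def extract_with_template(words, template):
    """Return the text found inside each field's fixed rectangle, or None."""
    result = {}
    for field, (x0, y0, x1, y1) in template.items():
        inside = [w.text for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        result[field] = " ".join(inside) or None
    return result


# The layout the template was built for.
original = [Word("INV-001", 420, 60), Word("1,250.00", 450, 710)]
# Same information, but the total has moved 40 points down the page.
shifted = [Word("INV-001", 420, 60), Word("1,250.00", 450, 750)]

print(extract_with_template(original, TEMPLATE))
# {'invoice_number': 'INV-001', 'total': '1,250.00'}
print(extract_with_template(shifted, TEMPLATE))
# {'invoice_number': 'INV-001', 'total': None}
```

A small layout change silently drops the total. With a template-based approach, the only fix is another template.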

Evolution AI are challenging the status quo by combining the latest advances in neural network architectures for Computer Vision and Natural Language Processing. Our engine understands the overall structure of the page (including tables, white space, column headers, line items, etc.), just like a human would. For example, it understands the relationship between field names and values in documents like invoices, even if the format of the invoice has never been seen before. This is game-changing for the industry and paves the way to complete automation.
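Evolution AI’s engine learns this relationship with neural networks, and the rule below is not that method. It is a deliberately simple, hand-written geometric heuristic that shows what pairing a field name with its value means when the value can sit either beside or below its label. The sketch assumes that OCR tokens and a list of field names are already available. `Word`, `LABELS` and `pair_labels` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One OCR text segment with the position of its top-left corner."""
    text: str
    x: float
    y: float


LABELS = {"Invoice:", "Total:"}  # hypothetical field names to look for


def pair_labels(words, labels=LABELS, line_tol=5.0):
    """Pair each label with the closest non-label word to its right on the
    same line; if there is none, use the closest word below the label."""
    pairs = {}
    values = [w for w in words if w.text not in labels]
    for label in (w for w in words if w.text in labels):
        right = [v for v in values if abs(v.y - label.y) <= line_tol and v.x > label.x]
        below = [v for v in values if v.y > label.y + line_tol]
        candidates = right or below
        if candidates:
            best = min(candidates, key=lambda v: math.hypot(v.x - label.x, v.y - label.y))
            pairs[label.text] = best.text
    return pairs


# Layout A: each value sits to the right of its label.
layout_a = [Word("Invoice:", 50, 60), Word("INV-001", 150, 60),
            Word("Total:", 50, 700), Word("1,250.00", 150, 700)]
# Layout B: each value sits directly below its label.
layout_b = [Word("Invoice:", 300, 40), Word("INV-002", 300, 60),
            Word("Total:", 300, 650), Word("980.00", 300, 670)]

print(pair_labels(layout_a))  # {'Invoice:': 'INV-001', 'Total:': '1,250.00'}
print(pair_labels(layout_b))  # {'Invoice:': 'INV-002', 'Total:': '980.00'}
```

Hand-written rules like this have to be extended whenever a layout breaks their assumptions, for example when a value is placed above its label.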

Evolution AI’s algorithms analyse the document layout before extracting the text, whereas OCR systems are built only to recognise characters: they pick up only the text marked in red and give no consideration to the layout or visual design elements.
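One simple form of layout analysis is recovering rows from the raw positions of OCR tokens, which is a first step toward reading tables and line items. The sketch below assumes each token carries the coordinates of its top-left corner. It groups tokens whose vertical positions lie within a tolerance. `Word`, `group_into_rows` and `row_tol` are hypothetical names, and this clustering rule illustrates the idea only. It is not Evolution AI’s algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One OCR token with the position of its top-left corner."""
    text: str
    x: float
    y: float


def group_into_rows(words, row_tol=5.0):
    """Cluster tokens into rows, then order each row left to right.

    Tokens are visited top to bottom. A token joins the current row if its
    top edge is within row_tol of that row's first token; otherwise it
    starts a new row.
    """
    rows = []
    for w in sorted(words, key=lambda w: (w.y, w.x)):
        if rows and abs(rows[-1][0].y - w.y) <= row_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    return [[w.text for w in sorted(row, key=lambda w: w.x)] for row in rows]


# Tokens in the arbitrary order a character-level reader might emit them,
# with slightly uneven vertical positions, as in a scanned page.
tokens = [Word("2", 300, 102), Word("Widget", 50, 100), Word("Qty", 300, 80),
          Word("Item", 50, 81), Word("Gadget", 50, 120), Word("5", 300, 119)]

print(group_into_rows(tokens))
# [['Item', 'Qty'], ['Widget', '2'], ['Gadget', '5']]
```

A fixed tolerance like `row_tol` is fragile on skewed or densely packed scans.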

The data extraction engine uses a combination of advanced Computer Vision, Deep Learning and NLP algorithms and has been trained on millions of documents, including invoices and financial statements. This approach has significant advantages, such as:

  • It’s data-efficient: only a few training examples are needed to customise it for your specific use case.
  • It’s completely autonomous: unlike the traditional OCR approach, there are no templates to set up.