Document Artificial Intelligence: Is It the Next Big Frontier?
Documents are a rich source of information and are commonly interpreted by subject matter experts to extract relevant details. Document analysis is skill-intensive, time-consuming and subject to interpretation. Documents can be broadly classified into three categories: structured, unstructured and semi-structured. Structured documents, such as spreadsheets, are relatively easy to process since the format is known in advance. Analysis of unstructured documents, like contracts, annual reports and regulations, and of semi-structured ones, such as invoices, is more complicated, yet artificial intelligence (AI) can process these documents efficiently.
The complexity of the current situation
Unstructured or semi-structured documents may be scanned or digital (machine-generated). Optical character recognition (OCR) is required to extract text from scanned ones. The layout of these documents also varies significantly. They may be single-column or multi-column and may contain tabular structures, logos, signatures, handwritten text, figures, charts, stamp seals, etc. Hence, the AI solution should be capable of understanding and processing all of these.
Furthermore, the quality of scanned documents affects the OCR output. Therefore, scans are enhanced and corrected for artifacts, such as rotation and noise, to obtain a better output. The AI solution then processes this output to determine the layout and extract details of business relevance.
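As an illustration of the noise-correction step, here is a minimal sketch of a 3x3 median filter in plain Python. It assumes the scanned page is a grayscale image stored as a list of rows of 0-255 integers; the name `median_filter_3x3` is hypothetical. Real pipelines would normally use an image-processing library and would also correct rotation, which is not shown here.

```python
from statistics import median
from typing import List

# Assumed representation: a grayscale page as rows of 0-255 pixel values.
Image = List[List[int]]


def median_filter_3x3(image: Image) -> Image:
    """Return a copy where each pixel is the median of its 3x3 neighbourhood.

    Border pixels use only the neighbours that exist. Isolated specks
    (salt-and-pepper noise) are removed because a single outlier cannot
    be the median of its window.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned: Image = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(int(median(window)))
        cleaned.append(row)
    return cleaned


# Example: a white 5x5 page with one black speck in the middle.
page = [[255] * 5 for _ in range(5)]
page[2][2] = 0
cleaned_page = median_filter_3x3(page)
```

A median filter is only one of several denoising choices; it is used here because it removes isolated specks while keeping character edges reasonably sharp, which tends to matter for OCR.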
Key technical steps:
Deep learning (DL), a part of machine learning (ML), uses modern artificial neural networks. The development of document intelligence solutions using DL involves three steps:
1. Annotation of data required to solve the business problem: Deep neural networks learn from data and require a good amount of consistently annotated data to train. The business requirements define the scope of the documents to be processed and the information to be extracted. The data should be diverse enough to capture the possible variations of the documents to be processed.
Most often, the requirement is that the AI solution shows the location of the extracted information in the document for review and approval by humans. Hence, we annotate both the text fields and their bounding boxes in the document. For example, the information to be extracted from invoices could include details such as invoice number, date and total amount.
2. Selection of appropriate AI solutions and training the models: The choice of the DL model depends on the business problem to be solved and on the data. Typically, the text from the documents is read using OCR or digital readers and then processed. Language models are also trained to understand the business context in textual data. For example, terms such as “vendor” and “supplier” may mean the same thing in invoices, and the language model learns to treat them as equivalent. Using the annotated data, the DL or ML models are then trained.
In some cases, advanced methods such as automated machine learning (AutoML) are used, where the AI system evaluates possible options and makes an optimal choice. These DL methods require substantial computing power and are typically trained on graphics processing units (GPUs). Training happens in batches of data, and GPUs are optimized for this. The AI models are then evaluated on validation or unseen data to determine whether they are suitable for business use.
3. Deployment of the AI solutions for integration into business processes: The DL models must then be deployed into production to serve the business needs. They are typically deployed in the cloud as microservices that auto-scale according to the number of documents to be processed at any given time. This helps optimize the deployment costs to suit the volume of documents and the time zone in which they are uploaded for processing. Drift in the data and in the expected outcomes is regularly monitored, and the ML models are revised as needed. For example, if a DL model is trained on a certain set of documents but the documents being processed have changed, then the model may not perform well; this is referred to as data drift. Humans generally review the output from the DL solutions to ensure the correctness of the results, and this feedback is used to improve them continuously.
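To make the three steps more concrete, the following is a minimal sketch in Python, not a definitive implementation. It shows an annotated field with its bounding box (step 1), a field-level accuracy check against held-out annotations (step 2), and a simple check that flags possible drift when accuracy measured from human review falls below a baseline (step 3). All names here (`FieldAnnotation`, `field_accuracy`, `drift_suspected`), the pixel-based bounding-box convention, the exact-match scoring and the 0.1 tolerance are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FieldAnnotation:
    """One labelled field: its name, its text and where it sits on the page."""
    name: str                        # e.g. "invoice_number"
    text: str                        # value recorded by the human annotator
    bbox: Tuple[int, int, int, int]  # assumed convention: (left, top, right, bottom) in pixels


def field_accuracy(predicted: Dict[str, str], annotated: List[FieldAnnotation]) -> float:
    """Fraction of annotated fields whose predicted text matches exactly.

    Exact match after trimming whitespace is an assumed scoring rule.
    """
    if not annotated:
        return 0.0
    hits = sum(
        1 for field in annotated
        if predicted.get(field.name, "").strip() == field.text.strip()
    )
    return hits / len(annotated)


def drift_suspected(baseline_accuracy: float,
                    recent_accuracies: List[float],
                    tolerance: float = 0.1) -> bool:
    """Flag possible drift when mean recent accuracy drops more than
    `tolerance` below the accuracy measured at validation time.

    This uses reviewed accuracy as a proxy signal; it does not inspect
    the input documents themselves.
    """
    if not recent_accuracies:
        return False
    recent_mean = sum(recent_accuracies) / len(recent_accuracies)
    return baseline_accuracy - recent_mean > tolerance


# Step 1: annotations for one invoice (values are made up for illustration).
annotated_invoice = [
    FieldAnnotation("invoice_number", "INV-001", (40, 30, 180, 50)),
    FieldAnnotation("date", "2021-03-01", (400, 30, 520, 50)),
    FieldAnnotation("total_amount", "150.00", (400, 700, 520, 720)),
]

# Step 2: compare a hypothetical model output with the annotations.
model_output = {
    "invoice_number": "INV-001",
    "date": "2021-03-01",
    "total_amount": "160.00",  # wrong on purpose: 2 of 3 fields match
}
accuracy = field_accuracy(model_output, annotated_invoice)

# Step 3: compare accuracy from recent human reviews with the baseline.
needs_retraining = drift_suspected(baseline_accuracy=0.95,
                                   recent_accuracies=[0.70, 0.80])
```

Using reviewed accuracy as a drift signal is only one option. Comparing the statistics of incoming documents with those of the training set is another, and which works better depends on the use case.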
The rapid progress in AI in recent years has led to the development of robust solutions for document intelligence. However, many of these solutions cater to a specific set of document types, such as invoices or contracts. The application of AI solutions across multiple document types and business areas will evolve with technology. This may lead to broader and more cost-effective use of AI to make business processes more efficient, and to AI becoming more pervasive in our day-to-day work.