Work to do

1. Description

We have added bounding boxes (bboxes) to the invoices. The bboxes below the table in an invoice must be treated as moving in the Y-direction, since the table with prices expands and contracts.
We have built a model that detects the table and the values in it. This can probably be improved.
The attached image shows an example of the problem. The same supplier can send invoices with a different number of lines in the table. This means the text below the table (the subtotal, VAT and total) can end up on different pages. As you can also see on the invoice, the lines themselves can grow and shrink in height.
We can define bounding boxes on a very large (many-page) invoice, but the next invoice can be even bigger or very small (one page, for example). The areas we need to extract therefore move up and down and can be on different pages, but we still need to locate each bbox, because we must be able to extract the data from those areas. So we must detect the text itself and use the bounding box. Another problem is strings that exceed their bboxes; see the next two images.
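One possible way to approach both points, offered only as a minimal sketch and not as our existing solution: anchor on the label text ("Subtotal", "VAT", "Total") instead of on a fixed position, and grow a box to cover every OCR word it touches. The sketch assumes the OCR step already yields one record per word with a page number and pixel coordinates (Tesseract's word-level output has fields of this kind). The names Word, AMOUNT_RE, find_value and expand_to_words are hypothetical and used for illustration only.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word (assumed input format: page number plus pixel box)."""
    text: str
    page: int
    left: int
    top: int
    width: int
    height: int

    @property
    def right(self):
        return self.left + self.width

    @property
    def bottom(self):
        return self.top + self.height


# Hypothetical pattern for an amount such as 125.00 or 1,234.50.
AMOUNT_RE = re.compile(r"^[-+]?\d[\d.,]*$")


def same_line(a, b, min_overlap=0.5):
    """True if a and b are on the same page and overlap vertically."""
    if a.page != b.page:
        return False
    overlap = min(a.bottom, b.bottom) - max(a.top, b.top)
    return overlap >= min_overlap * min(a.height, b.height)


def find_value(words, label):
    """Find the amount to the right of `label`, wherever the label sits.

    Returns (text, page, (left, top, right, bottom)) or None.
    """
    for anchor in words:
        if anchor.text.strip(":").lower() != label.lower():
            continue
        row = [w for w in words
               if same_line(anchor, w)
               and w.left >= anchor.right
               and AMOUNT_RE.match(w.text)]
        if row:
            # Take the rightmost amount on the label's line.
            value = max(row, key=lambda w: w.left)
            box = (value.left, value.top, value.right, value.bottom)
            return value.text, value.page, box
    return None


def expand_to_words(words, page, box):
    """Grow box (left, top, right, bottom) to cover every word it touches."""
    left, top, right, bottom = box
    hits = [w for w in words
            if w.page == page
            and w.left < right and w.right > left
            and w.top < bottom and w.bottom > top]
    if not hits:
        return box, ""
    hits.sort(key=lambda w: w.left)
    grown = (min([left] + [w.left for w in hits]),
             min([top] + [w.top for w in hits]),
             max([right] + [w.right for w in hits]),
             max([bottom] + [w.bottom for w in hits]))
    return grown, " ".join(w.text for w in hits)
```

Because find_value searches all pages for the label, it does not matter how many table lines precede the totals or which page they land on. expand_to_words addresses the overflow case: a string that sticks out of a template box is returned whole, together with the enlarged box. Both functions assume single-line values and clean OCR tokens, which real invoices will not always provide.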
These are the problems we must solve ASAP -> weekend work.
2. Skills

Python MySQL 5+ years vision text extraction from bounding boxes
OpenCV, NLP, spaCy, regex, Tesseract, OCR, PDFXML, TableNet, DeepDeSRT, graph neural networks, GANs and genetic algorithms

A simpler approach is also allowed. You must have done something similar before, you must know what regex and Tesseract are and have used them several times, and you must have worked with vision, ML, DL or NN.