The text index for NLP is now built automatically, with no manual action (thanks to DeBoe for the tip).
This is an InterSystems IRIS Interoperability OCR service that extracts text from images and PDFs. The file is sent in a multipart request, either from an HTML form or from any HTTP client.
The application receives an HTTP multipart request containing a file, extracts the text with Tesseract OCR, and returns the result.
Make sure you have Git and Docker Desktop installed.
Clone/git pull the repo into any local directory
$ git clone https://github.com/yurimarx/ocr-service.git
Open the terminal in this directory and run:
$ docker-compose build
$ docker-compose up -d
Open the production
Set the destination folder on the host for the uploaded files.
Start the production.
Now open Postman, or create an HTML form, and send a multipart POST request to localhost:9980/ with the file in a form-data file attribute. Use an image, or a PDF that contains an image.
Check the returned text. This first version supports English and Portuguese only.
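If you prefer a script to Postman, the request can be sketched in Python using only the standard library. This is a minimal sketch under assumptions, not the project's own client: the form field name "file" and the idea that the extracted text comes back as the plain response body are guesses, and the helper names build_multipart and ocr_file are hypothetical. Adjust them to match what the production actually expects.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Assumptions, not confirmed by the project: the form field is named "file"
# and the service replies with the extracted text in the response body.
OCR_URL = "http://localhost:9980/"
FIELD_NAME = "file"


def build_multipart(field_name, filename, payload, boundary=None):
    """Encode one file as a multipart/form-data body.

    Returns a (body_bytes, content_type_header) pair.
    """
    boundary = boundary or uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def ocr_file(path, url=OCR_URL, field_name=FIELD_NAME):
    """POST the file to the OCR service and return the response body as text."""
    path = Path(path)
    body, content_type = build_multipart(field_name, path.name, path.read_bytes())
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

With the production running, calling ocr_file("scan.png") on a local image (the file name is only an example) should return whatever the service sends back for that file. Calling it in a loop over two or three files is one way to feed the NLP step that follows.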
Send 2 or 3 files with some text
Go to the NLP Domain Explorer
Analyze the texts and enjoy!