Getting Started
The ocrlayout library contains our main class, BBoxHelper.
Before you start
The ocrlayout package has been developed for Python 3.7+. Refer to the package documentation for more details.
Ocrlayout package install
pip install ocrlayout
Import the BBoxHelper and BBOXOCRResponse classes
from ocrlayout.bboxhelper import BBOXOCRResponse, BBoxHelper
Prepare your OCR engine(s) call(s)
Depending on your preferred OCR service (Azure, Google, or AWS), follow the corresponding instructions.
Azure
Set the two environment variables below in your OS environment.
COMPUTERVISION_SUBSCRIPTION_KEY
COMPUTERVISION_LOCATION
COMPUTERVISION_LOCATION refers to the region where you registered your Azure Computer Vision service; only the region name is needed there.
COMPUTERVISION_SUBSCRIPTION_KEY="..."
COMPUTERVISION_LOCATION="westeurope"
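As an illustration, and assuming the azure-cognitiveservices-vision-computervision package, a Computer Vision client can then be built from those two variables along these lines (the regional endpoint format is the standard one):
import os
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

subscription_key = os.environ["COMPUTERVISION_SUBSCRIPTION_KEY"]
location = os.environ["COMPUTERVISION_LOCATION"]
# Derive the regional endpoint from the region name
client = ComputerVisionClient(
    "https://{}.api.cognitive.microsoft.com".format(location),
    CognitiveServicesCredentials(subscription_key))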
Google
Refer to the Google documentation to authenticate the Google client.
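In practice, authentication usually goes through a service-account key file; a minimal sketch (the key path below is a placeholder):
import os
from google.cloud import vision

# Point the Google client libraries at your service-account key file
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"
google_client = vision.ImageAnnotatorClient()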
AWS
Refer to the AWS documentation to get started with Amazon Textract.
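boto3 resolves credentials through its standard chain (environment variables, ~/.aws/credentials, or an IAM role), so no Textract-specific setup is needed beyond picking a region; for example:
import boto3

# The region name below is just an example; pick one where Textract is available
textract = boto3.client('textract', region_name='eu-west-1')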
Calling the BBoxHelper main method
If you are not familiar with Azure CV, Google, or AWS text detection first-hand, we encourage you to jump to the Sample script section, as it will help you get hands-on experience.
Azure
The Azure response is a JSON string, so after decoding it we can send it to processing as-is.
ocrresponse=image_analysis.response.content.decode("utf-8")
...
bboxresponse=BBoxHelper().processAzureOCRResponse(ocrresponse)
print(bboxresponse.text)
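For context, here is a sketch of how such a raw image_analysis result can be obtained with recent versions of the azure-cognitiveservices-vision-computervision SDK; client is the ComputerVisionClient created earlier and image_path is a placeholder. Method names vary across SDK versions, so treat this as an illustration rather than the sample script's exact code:
import time

# Submit the image to the Read API; raw=True keeps the raw HTTP response around
with open(image_path, "rb") as image_stream:
    job = client.read_in_stream(image_stream, raw=True)
# The operation id is the last segment of the Operation-Location header
operation_id = job.headers["Operation-Location"].split("/")[-1]

# Poll until the asynchronous Read operation completes
while True:
    image_analysis = client.get_read_result(operation_id, raw=True)
    if image_analysis.output.status not in ("notStarted", "running"):
        break
    time.sleep(1)

# The raw HTTP body is the JSON string expected by processAzureOCRResponse
ocrresponse = image_analysis.response.content.decode("utf-8")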
Google
from google.protobuf import json_format

response = google_client.document_text_detection(image=image)
# Convert the protobuf response object to a JSON string
json_string = json_format.MessageToJson(response)
...
# Create BBOX OCR Response from Google's JSON output
bboxresponse = BBoxHelper().processGoogleOCRResponse(json_string)
print(bboxresponse.text)
Passing the full_text_annotation object itself no longer works (ocrlayout>=0.8); only a JSON string or a dict object is supported, hence the MessageToJson invocation.
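In other words, both of the following forms are accepted:
import json

bboxresponse = BBoxHelper().processGoogleOCRResponse(json_string)              # JSON string
bboxresponse = BBoxHelper().processGoogleOCRResponse(json.loads(json_string))  # dict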
AWS
The AWS response is a dictionary-based object, so we can send it to processing as-is.
import os
import boto3
from PIL import Image

# Amazon Textract client
textract = boto3.client('textract')
# Read the image bytes from disk
imagefn = os.path.join(IMAGES_FOLDER, filename)
with open(imagefn, "rb") as image_file:
    bytes_test = image_file.read()
# Call Amazon Textract
ocrresponse = textract.detect_document_text(Document={'Bytes': bytes_test})
# Textract returns normalized (0-1) coordinates, hence we need the image width and height
image = Image.open(imagefn)
width, height = image.size
# Create BBOX OCR Response from the AWS dict response
bboxresponse = BBoxHelper().processAWSOCRResponse(ocrresponse, width, height, verbose=verbose)
Notes
The BBoxHelper().processXXXOCRResponse() methods accept a JSON string, a JSON/dictionary object, or an existing BBOXOCRResponse object.
All processXXXOCRResponse methods check the type of the input object in order to convert it to a BBOXOCRResponse object.
# Load the input into a BBOXOCRResponse object for processing
if isinstance(input, str):
    response = BBOXOCRResponse.from_xxx(json.loads(input))
elif isinstance(input, dict):
    response = BBOXOCRResponse.from_xxx(input)
elif isinstance(input, BBOXOCRResponse):
    response = input
Once the input is converted to a non-optimized BBOXOCRResponse object, it goes into the processing phase.
if response:
    return self.__processOCRResponse(response, sortingAlgo, boxSeparator)
Important: Passing an existing BBOXOCRResponse object to any BBoxHelper.processXXXOCRResponse() method will modify that object.
If you need to keep the "original" BBOXOCRResponse, make sure to do a copy.deepcopy() beforehand.
bboxresponse=BBoxHelper().processAzureOCRResponse(copy.deepcopy(ocrresponse))
This can be useful for evaluating the quality of the OCR engine(s) responses (see the sample script) and the ocrlayout optimization (before/after).
BBoxHelper - Response object
- status : reflects the original status of your OCR request response.
- original_text : the original text provided by the default OCR engine, when relevant.
- text : the sorted text of all processed pages.
- pages : list of all pages. The supported OCR engines allow you to send multi-page PDF or TIFF documents.
print(bboxresponse.text)
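The other attributes documented above can be inspected the same way:
print(bboxresponse.status)         # original OCR request status
print(bboxresponse.original_text)  # text as returned by the OCR engine, when relevant
print(len(bboxresponse.pages))     # number of processed pages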
BBoxHelper - Run the sample script from GitHub
We provide a single sample script to showcase how BBoxHelper runs against the Azure and Google OCR engines' output.
In a terminal, under the project python directory
Execute the sample script bboxtester.py for testing with Microsoft Azure CV or Google CV.
Sample script invocation
python3 bboxtester.py -h
The help output:
Local Package imported
usage: bboxtester.py [-h] [--image IMAGE] [--imagesdir IMAGESDIR]
[--filter FILTER] [--outputdir OUTPUTDIR] [--callocr]
[-v]
Call OCR outputs for a given image or images dir
optional arguments:
-h, --help show this help message and exit
--image IMAGE Process a single image
--imagesdir IMAGESDIR
Process all images contained in the given directory
--filter FILTER Filter the images to process based on their filename
--outputdir OUTPUTDIR
Define where all outputs will be stored
--callocr flag to invoke online OCR Service
-v, --verbose DEBUG logging level
Invoke on a single image
python3 bboxtester.py --image <FULL_IMAGE_PATH>
Example
python3 bboxtester.py --image /Users/../../../../images/infography1.jpeg
Invoke for all images from the default IMAGES_FOLDER
python3 bboxtester.py
Use the --imagesdir flag to set a different directory
python3 bboxtester.py --imagesdir <NEW_IMAGE_DIR>
Invoke for all images whose name contains "scan1"
python3 bboxtester.py --filter scan1
you get the idea...
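Flags can be combined; for example, to process all images whose filename contains "scan1" in a custom directory, force a call to the online OCR services, and enable debug logging:
python3 bboxtester.py --imagesdir <NEW_IMAGE_DIR> --filter scan1 --callocr -v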
A few notes
- The sample script can run against the local ocrlayout directory if you haven't installed the ocrlayout package; simply run it from where the sample script is located.
- The callocr flag means we will invoke the online OCR service to process an image. Not setting that flag means we rely on the output of a previous OCR call persisted on disk (reducing your online service consumption cost while testing).
- If the callocr flag is not set but there is no previously cached output on disk for a specific OCR engine, we revert to invoking the online OCR service.
- If you are only interested in testing a single OCR engine, simply comment out the functions prefixed with either google_ or azure_ in bboxtester.py. The code automatically detects which functions to run based on their signature.
Sample script flow
- process one or all images located under the images directory (one level up from the python dir),
- call the corresponding OCR service,
- persist the raw OCR response on disk, in tests-results or the directory of your choice,
- persist the original image with the bounding boxes of the raw OCR response,
- call the BBoxHelper processOCRResponse() method,
- persist the original image with the bounding boxes of the BBoxHelper OCR response.
Sample script output
The sample script will output:
- Azure annotated image, where we draw the lines of its OCR response
- Azure OCR JSON
- Azure OCR text (textual information)
- Google annotated image, where we draw the lines of its OCR response
- Google OCR text (textual information)
These outputs allow you to evaluate the different OCR outputs visually.
Changing the default input and output directories used in the sample script
IMAGES_FOLDER = os.path.join(os.path.dirname(
os.path.realpath(__file__)), "../images")
RESULTS_FOLDER = os.path.join(os.path.dirname(
os.path.realpath(__file__)), "../tests-results")
NOTE The RESULTS_FOLDER is created when the sample script runs, if it does not already exist.
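If you replicate this behavior in your own script, the usual idiom is a sketch like:
import os

# Create the results folder only when it does not exist yet
os.makedirs(RESULTS_FOLDER, exist_ok=True)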
DISCLAIMER
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.