OCR Quality Issues
Optical character recognition (OCR) converts scanned images into searchable text. The quality of the extracted text directly affects how well you can search for documents. This article explains common OCR quality problems and how to address them.
What Causes Poor OCR Results
- Cause: Low scan resolution (below 200 DPI). Effect: Characters are blurry; OCR produces garbled or missing text.
- Cause: Poor scan contrast (light ink, faded print). Effect: Characters are hard to differentiate; OCR makes substitution errors.
- Cause: Tilted or skewed pages. Effect: OCR reads words in the wrong order or misses text.
- Cause: Handwritten content. Effect: OCR is designed for printed text; handwriting is poorly supported.
- Cause: Very small font sizes. Effect: Individual characters are too small to reliably identify.
- Cause: Heavily compressed image. Effect: Compression artefacts distort characters.
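To see why resolution and font size interact, it can help to estimate how many pixels tall a character is in the scan. The sketch below is illustrative only: the function name `font_height_px` is hypothetical and not part of Essal Office, and it relies solely on the standard definition of 72 typographic points per inch.

```python
POINTS_PER_INCH = 72  # standard typographic points per inch


def font_height_px(font_size_pt: float, dpi: int) -> float:
    """Approximate pixel height of a font's full character box at a given scan resolution."""
    return font_size_pt / POINTS_PER_INCH * dpi


# An 8 pt font scanned at three different resolutions.
for dpi in (150, 200, 300):
    print(dpi, "DPI:", round(font_height_px(8, dpi), 1), "px")
# 150 DPI: 16.7 px
# 200 DPI: 22.2 px
# 300 DPI: 33.3 px
```

Doubling the resolution doubles the pixel height of every character, which is why small print that fails at a low resolution may become readable after a higher-resolution re-scan.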
Recommended Scanning Settings
For best OCR results:
- Resolution: 300 DPI or higher
- Colour mode: Grayscale or black & white for clean text; colour for documents with important colour content
- File format: PDF (with embedded image) or uncompressed TIFF
- Compression: Minimal JPEG compression on images; use lossless formats where possible
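If you scan in bulk, you may want to check your settings against the recommendations above before uploading. The following is a minimal sketch, assuming you already know each scan's resolution and file format; `check_scan_settings` and its constants are hypothetical names for illustration and are not an Essal Office feature.

```python
from typing import List

MIN_DPI = 300                          # recommended minimum resolution from this article
RECOMMENDED_FORMATS = {"pdf", "tiff"}  # recommended file formats from this article


def check_scan_settings(dpi: int, file_format: str) -> List[str]:
    """Return a list of warnings; an empty list means the settings match the recommendations."""
    warnings = []
    if dpi < MIN_DPI:
        warnings.append(f"Resolution {dpi} DPI is below the recommended {MIN_DPI} DPI")
    if file_format.lower() not in RECOMMENDED_FORMATS:
        warnings.append(f"Format '{file_format}' is not PDF or TIFF")
    return warnings


# Example: a low-resolution JPEG triggers both warnings.
for message in check_scan_settings(150, "jpeg"):
    print(message)
```

Returning a list of warnings rather than a single pass/fail value lets you see every setting that needs attention in one pass.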
Checking OCR Results on a Document
- Open the document in Essal Office
- Click the Content tab
- Read through the extracted text and compare it to the original document
Common OCR errors to look for:
- 0 (zero) confused with O (letter O)
- 1 (one) confused with l (lowercase L)
- rn read as m
- Missing punctuation or line breaks
- Entire lines or sections missing
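If you have exported the extracted text and want to check whether a key term survived OCR despite the character confusions listed above, one approach is to collapse the confusable characters in both the term and the text before comparing. This is a rough sketch under that assumption; the function names are hypothetical, and it does not reflect how Essal Office search works.

```python
def normalise_confusables(text: str) -> str:
    """Collapse characters that OCR commonly confuses into one canonical form."""
    text = text.replace("rn", "m")  # "rn" is often read as "m"
    text = text.replace("O", "0")   # letter O versus zero
    text = text.replace("l", "1")   # lowercase L versus one
    return text


def term_probably_present(term: str, extracted_text: str) -> bool:
    """True if the term appears in the text once confusable characters are collapsed."""
    return normalise_confusables(term) in normalise_confusables(extracted_text)


print(term_probably_present("Invoice 1001", "Invoice lOO1 dated March"))  # True
print(term_probably_present("modern", "a modem design"))                  # True
print(term_probably_present("receipt", "Invoice 1001"))                   # False
```

Because the comparison is deliberately loose, it can also match words that are genuinely different (such as "modern" and "modem"), so treat a match as a hint to review the document rather than as proof.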
When OCR Text Has Errors
If the extracted text has errors but the document is legible:
- Re-scan at a higher resolution
- Re-upload the new scan (delete the old document first to avoid duplicates)
If re-scanning is not possible, you can manually correct key searchable terms by editing the document's title or adding custom field values with the correct text. This lets you find the document by its important attributes even when the OCR text is imperfect.
Handwritten Documents
OCR is not reliable for handwriting. For handwritten documents:
- Store them with a descriptive title that includes key searchable terms
- Use correspondent, type, tags, and custom fields to record important details
The document is then stored and retrievable by its metadata, even though its content is not searchable text.
When to Contact Support
If all of your scans produce poor OCR results, not just occasional documents, there may be a server-side OCR configuration issue. Contact Essal Office support with examples of affected documents and a description of your scanning setup.