What Happens After You Upload (OCR & Processing)

When you upload a file to Essal Office, it isn't simply stored and filed. It goes through an automatic processing pipeline that extracts its text, creates a long-term archive copy, classifies it, and makes it searchable. This article explains what each step of that pipeline does.

The Processing Pipeline

Step 1 — Text Extraction (OCR)

The first thing Essal Office does is extract the text content from your document.

Digital PDFs and Office documents — these already contain text, so the text is read directly without OCR. Processing is nearly instant.
Scanned images and image-only PDFs — these contain no embedded text, so the OCR engine (powered by Tesseract) reads the document page by page, recognizing characters from the image. This takes a few seconds per page.
Result — the extracted text is stored as the document's Content, which powers full-text search results and automatic matching rules.
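The choice between reading embedded text and running OCR can be sketched as follows. This is only an illustration: the function names, the `run_ocr` callback, and the 50-character threshold are assumptions made for the example, not Essal Office's actual implementation.

```python
from typing import Callable


def needs_ocr(embedded_text: str, min_chars: int = 50) -> bool:
    """Return True when a file has too little embedded text to skip OCR.

    min_chars is a hypothetical threshold chosen for this sketch.
    """
    # Count only non-whitespace characters so blank pages do not pass as text.
    visible = sum(1 for ch in embedded_text if not ch.isspace())
    return visible < min_chars


def extract_content(embedded_text: str, run_ocr: Callable[[], str]) -> str:
    """Use the embedded text when there is enough of it, otherwise fall back to OCR."""
    if needs_ocr(embedded_text):
        # Scanned image or image-only PDF: slower, page-by-page recognition.
        return run_ocr()
    # Digital PDF or Office document: the text is read directly.
    return embedded_text
```

In this sketch a digital PDF never reaches the OCR callback, which is why such files finish almost immediately while scans take a few seconds per page.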

Step 2 — Archive Copy Creation (PDF/A)

Essal Office creates a PDF/A version of your document. PDF/A is an ISO-standardized format specifically designed for long-term archival — it embeds all fonts, color profiles, and metadata directly in the file so it remains readable decades from now without depending on external resources.

For scanned documents, the PDF/A copy includes an invisible text layer with the OCR results, making the text selectable and copyable even in the archive copy.

Your original file is always preserved alongside the archive copy. Essal Office never overwrites the original.

Step 3 — Automatic Matching

Once the text is extracted, Essal Office checks all your configured matching rules — for tags, correspondents, and document types — and applies any that match the document's content, title, or other properties.

For example:
- If you've configured a tag called Finance to match any document containing the word "invoice", the tag is applied automatically.
- If the word "Acme" appears in the document and you have a correspondent called Acme Supplies with a matching rule, that correspondent is assigned.
- If the document type Invoice has a rule matching common invoice keywords, Invoice is assigned as the document type.

Rules are applied in order. Multiple tags can be assigned simultaneously. Only one correspondent and one document type can be assigned per document.
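The behavior described above (several tags, but a single correspondent and a single document type) can be sketched like this. The `Rule` class, the case-insensitive keyword matching, and the "first matching rule wins" choice for single-valued fields are assumptions made for illustration; the actual matching algorithms in Essal Office may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str            # the tag, correspondent, or document type to assign
    keywords: list[str]  # hypothetical: any keyword found in the text triggers the rule

    def matches(self, content: str) -> bool:
        text = content.lower()
        return any(keyword.lower() in text for keyword in self.keywords)


def first_match(rules: list[Rule], content: str) -> Optional[str]:
    """Return the name of the first rule that matches, or None."""
    return next((rule.name for rule in rules if rule.matches(content)), None)


def apply_rules(content, tag_rules, correspondent_rules, type_rules) -> dict:
    return {
        # Every matching tag is kept.
        "tags": [rule.name for rule in tag_rules if rule.matches(content)],
        # Single-valued fields: assumed here to take the first matching rule.
        "correspondent": first_match(correspondent_rules, content),
        "document_type": first_match(type_rules, content),
    }
```

With a Finance tag rule on "invoice", an Acme Supplies correspondent rule on "Acme", and an Invoice document type rule, a document containing "Invoice from Acme" would receive all three assignments in one pass.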

Step 4 — Indexing

The document's content and all its assigned metadata are added to the search index. From this point, the document is fully searchable by content, title, tag, correspondent, type, and date.
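To give a rough idea of what "adding a document to the search index" means, here is a toy inverted index that maps each word to the documents containing it. The data layout and function names are invented for this sketch and say nothing about the search engine Essal Office actually uses.

```python
import re
from collections import defaultdict


def build_index(documents: dict) -> dict:
    """Map every word in a document's content, title, and tags to its document ID."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        # Content and metadata are indexed alike, so both are searchable.
        fields = [doc["content"], doc["title"], *doc["tags"]]
        for field_text in fields:
            for word in re.findall(r"\w+", field_text.lower()):
                index[word].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return the IDs of documents that contain every word in the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))
```

Because tags and titles go into the same index as the extracted text, a query such as "invoice finance" can match a document whose content mentions "invoice" and whose tag is Finance.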

How Long Does Processing Take?

| Document type | Typical processing time |
| --- | --- |
| Digital PDF (with embedded text) | 1–3 seconds |
| Short scanned document (1–5 pages) | 5–15 seconds |
| Long scanned document (20+ pages) | 30–90 seconds |
| High-resolution image | 5–10 seconds |
| Office document (.docx, .xlsx) | 3–10 seconds |

Processing time increases with page count and image resolution. Essal Office processes multiple documents in parallel, so uploading a batch doesn't mean waiting for each one sequentially.
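The effect of parallel processing on a batch can be illustrated with a small simulation. The `process_document` function below only sleeps to stand in for the real work, and the worker count of 4 is an arbitrary assumption; neither reflects how Essal Office schedules its jobs internally.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process_document(name: str, seconds: float) -> str:
    """Stand-in for OCR, archiving, matching, and indexing of one file."""
    time.sleep(seconds)
    return f"{name}: done"


def process_batch(batch: list, workers: int = 4) -> list:
    """Process several (name, seconds) items at once; results keep upload order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda item: process_document(*item), batch))
```

With enough workers, the total wait for a batch in this sketch is close to the time of its slowest document rather than the sum of all processing times.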

What to Check After Processing

Once the processing indicator disappears and the document is available, review the following:

Title — review and update if the auto-generated title (from the filename) isn't descriptive enough
Created date — Essal Office tries to identify the document's date from its content; verify this is correct, especially for documents with multiple dates on the page
Tags, Correspondent, Document Type — check that the automatic assignments are accurate
Content tab — open the document and check the Content tab to see the extracted text; if it looks incorrect or is missing, see OCR Text Is Wrong or Missing
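The created date deserves a second look because a page often carries more than one date. The sketch below shows why: it assumes, purely for illustration, that dates are written as YYYY-MM-DD and that the first date found is used. Essal Office's real date detection is not described in this article and may work differently.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical pattern: only ISO-style dates such as 2024-03-05 are recognized.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def find_dates(content: str) -> list:
    """Return every valid date found in the text, in order of appearance."""
    found = []
    for year, month, day in DATE_PATTERN.findall(content):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            continue  # skip impossible dates such as 2024-13-40
    return found


def guess_created_date(content: str) -> Optional[date]:
    """Assumed strategy for this sketch: take the first date on the page."""
    dates = find_dates(content)
    return dates[0] if dates else None
```

For a text like "Invoice date: 2024-03-05. Payment due: 2024-04-04." this sketch finds two candidates and picks the first, which happens to be right here but would be wrong if the due date were printed first.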