What Happens After You Upload (OCR & Processing)
When you upload a file to Essal Office, it doesn't simply get stored and filed. It goes through an automatic processing pipeline that makes it searchable, classifies it, and creates a long-term archive copy. This article explains what that pipeline does.
The Processing Pipeline
Step 1 — Text Extraction (OCR)
The first thing Essal Office does is extract the text content from your document.
- Digital PDFs and Office documents: these already contain text, so it is read directly without OCR, and this step finishes almost instantly.
- Scanned images and image-only PDFs: these contain no embedded text, so the OCR engine (powered by Tesseract) reads the document page by page, recognizing characters from each page image. This takes a few seconds per page.
- Result: the extracted text is stored as the document's Content, which powers full-text search and the automatic matching rules.
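The choice between reading embedded text and running OCR can be sketched as follows. This is a minimal illustration, not Essal Office's implementation: the `Page` model, the `extract_content` function, and the assumption that the decision is made once per document (rather than per page) are all hypothetical, and the `ocr` callback stands in for the Tesseract engine.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    # Hypothetical page model: text already embedded in the file
    # ("" for scanned, image-only pages) plus the rendered page image.
    embedded_text: str
    image: bytes


def extract_content(pages: List[Page], ocr: Callable[[bytes], str]) -> str:
    """Build the document's Content, running OCR only when no text is embedded.

    Assumption: the decision is made once per document. If any page carries
    embedded text, the text layer is read directly; otherwise every page
    image is passed to the OCR callback, one page at a time.
    """
    if any(page.embedded_text.strip() for page in pages):
        # Digital PDF or Office document: no character recognition needed.
        return "\n".join(page.embedded_text for page in pages)
    # Scanned or image-only file: recognize characters page by page.
    return "\n".join(ocr(page.image) for page in pages)


def fake_ocr(image: bytes) -> str:
    # Stand-in for a real OCR engine such as Tesseract.
    return f"[recognized text from {len(image)}-byte image]"


digital = [Page("Invoice #42", b""), Page("Total due: 100.00", b"")]
scanned = [Page("", b"\x00" * 16), Page("", b"\x00" * 32)]
print(extract_content(digital, fake_ocr))  # embedded text, OCR never called
print(extract_content(scanned, fake_ocr))  # one OCR call per page
```

Because OCR runs once per page image, this also shows why scanned documents take longer in proportion to their page count.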
Step 2 — Archive Copy Creation (PDF/A)
Essal Office creates a PDF/A version of your document. PDF/A (ISO 19005) is a standardized format designed for long-term archiving: it embeds all fonts, color profiles, and metadata directly in the file, so the document remains readable decades from now without depending on external resources.
For scanned documents, the PDF/A copy includes an invisible text layer with the OCR results, making the text selectable and copyable even in the archive copy.
Your original file is always preserved alongside the archive copy. Essal Office never overwrites the original.
Step 3 — Automatic Matching
Once the text is extracted, Essal Office checks all your configured matching rules — for tags, correspondents, and document types — and applies any that match the document's content, title, or other properties.
For example:
- If you've configured a tag called Finance to match any document containing the word "invoice", the tag is applied automatically.
- If the word "Acme" appears in the document and you have a correspondent called Acme Supplies with a matching rule, that correspondent is assigned.
- If the document type Invoice has a rule matching common invoice keywords, it is assigned as the document type.
Rules are applied in order. A document can receive any number of tags at once, but only one correspondent and one document type.
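To make the matching behavior concrete, here is a minimal sketch of rule evaluation using the examples above. The `Rule` and `Assignment` types, the case-insensitive whole-word matching, and the "first matching correspondent or document type wins" policy are assumptions chosen for illustration; Essal Office's actual matching options may differ.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rule:
    # Hypothetical rule model: what to assign and which words trigger it.
    target: str       # name of the tag, correspondent or document type
    kind: str         # "tag", "correspondent" or "document_type"
    words: List[str]  # the rule matches if ANY of these words appears


@dataclass
class Assignment:
    tags: List[str] = field(default_factory=list)
    correspondent: Optional[str] = None
    document_type: Optional[str] = None


def matches(rule: Rule, text: str) -> bool:
    # Assumption: case-insensitive, whole-word matching.
    return any(
        re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE)
        for word in rule.words
    )


def apply_rules(rules: List[Rule], title: str, content: str) -> Assignment:
    text = f"{title}\n{content}"
    result = Assignment()
    for rule in rules:  # rules are evaluated in the configured order
        if not matches(rule, text):
            continue
        if rule.kind == "tag":
            if rule.target not in result.tags:
                result.tags.append(rule.target)  # tags accumulate
        elif rule.kind == "correspondent" and result.correspondent is None:
            result.correspondent = rule.target  # assumption: first match wins
        elif rule.kind == "document_type" and result.document_type is None:
            result.document_type = rule.target  # assumption: first match wins
    return result


rules = [
    Rule("Finance", "tag", ["invoice"]),
    Rule("Acme Supplies", "correspondent", ["Acme"]),
    Rule("Invoice", "document_type", ["invoice", "amount due"]),
]
print(apply_rules(rules, "scan_0012", "INVOICE from Acme, amount due: 120.00"))
```

Whole-word matching keeps "Acme" from matching inside a longer word such as "Acmeville"; whether the real rules behave this way depends on how each rule is configured.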
Step 4 — Indexing
The document's content and all its assigned metadata are added to the search index. From this point, the document is fully searchable by content, title, tag, correspondent, type, and date.
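Conceptually, a search index maps words to the documents that contain them, which is what makes lookups fast. The toy inverted index below is a hypothetical sketch, not Essal Office's search engine: it indexes content and metadata values together and returns documents that contain every query word, whereas a production index would also keep fields separate and support date-range queries.

```python
import re
from collections import defaultdict
from typing import DefaultDict, List, Set


def tokenize(text: str) -> List[str]:
    # Lowercase and split on anything that is not a letter, digit or underscore.
    return re.findall(r"\w+", text.lower())


class SearchIndex:
    """Toy inverted index: each word maps to the IDs of documents containing it."""

    def __init__(self) -> None:
        self._postings: DefaultDict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, content: str, **metadata: str) -> None:
        # Content and metadata values (title, tags, correspondent, ...) are
        # indexed together, so any of them can match a query word.
        for text in (content, *metadata.values()):
            for word in tokenize(text):
                self._postings[word].add(doc_id)

    def search(self, query: str) -> Set[int]:
        # AND semantics: a document must contain every word in the query.
        words = tokenize(query)
        if not words:
            return set()
        hits = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            hits &= self._postings.get(word, set())
        return hits


index = SearchIndex()
index.add(1, "Invoice from Acme, amount due 120.00",
          title="March invoice", correspondent="Acme Supplies", tags="Finance")
index.add(2, "Lease agreement for the office",
          title="Office lease", tags="Contracts")
print(index.search("acme invoice"))  # {1}
print(index.search("lease"))         # {2}
```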
How Long Does Processing Take?
Typical processing times by document type:
- Digital PDF (with embedded text): 1–3 seconds
- Short scanned document (1–5 pages): 5–15 seconds
- Long scanned document (20+ pages): 30–90 seconds
- High-resolution image: 5–10 seconds
- Office document (.docx, .xlsx): 3–10 seconds
Processing time increases with page count and image resolution. Essal Office processes multiple documents in parallel, so uploading a batch doesn't mean waiting for each one sequentially.
What to Check After Processing
Once the processing indicator disappears and the document is available, review the following:
- Title — review and update if the auto-generated title (from the filename) isn't descriptive enough
- Created date — Essal Office tries to identify the document's date from its content; verify this is correct, especially for documents with multiple dates on the page
- Tags, Correspondent, Document Type — check that the automatic assignments are accurate
- Content tab — open the document and check the Content tab to see the extracted text; if it looks incorrect or is missing, see OCR Text Is Wrong or Missing
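Documents with several dates need a second look because a date-detection heuristic has to pick just one date from the text. The sketch below assumes, purely for illustration, that the first recognizable date is used and that only two formats (`YYYY-MM-DD` and day-first `DD.MM.YYYY`) are recognized; the function names and patterns are hypothetical, and Essal Office's real detection rules may differ.

```python
import re
from datetime import date
from typing import List, Optional

# Hypothetical patterns; a real date parser recognizes many more formats.
ISO = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")            # year-month-day
DOTTED = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")  # day.month.year


def find_dates(content: str) -> List[date]:
    """Return all valid dates in the text, in the order they appear."""
    found = []  # (position, year, month, day)
    for m in ISO.finditer(content):
        found.append((m.start(), int(m[1]), int(m[2]), int(m[3])))
    for m in DOTTED.finditer(content):
        found.append((m.start(), int(m[3]), int(m[2]), int(m[1])))
    dates = []
    for _, year, month, day in sorted(found):
        try:
            dates.append(date(year, month, day))
        except ValueError:
            pass  # e.g. 31.02.2024 looks like a date but does not exist
    return dates


def guess_created_date(content: str) -> Optional[date]:
    # Assumption: the first date in the text is taken as the created date.
    dates = find_dates(content)
    return dates[0] if dates else None


text = "Invoice date: 05.03.2024\nDue date: 2024-04-04"
print(guess_created_date(text))  # 2024-03-05
```

With this heuristic, an invoice that lists its due date before its issue date would get the due date as its created date, which is exactly the kind of result worth correcting by hand.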