OCR PDF Poor Quality — Fix Blurry & Low-DPI Scans
Why OCR fails on bad scans and how to fix DPI, skew, contrast, and compression order before re-running OCR.
Published June 1, 2025 · 7 min read
3 uses per day · 200 MB · TLS encrypted · auto-delete
Image-only PDFs are not searchable until OCR adds a text layer. RatPDF OCR PDF uses Tesseract: upload a scan, download a searchable PDF, then export the text or convert it to Word.
Real example: Fax-era contract at 150 DPI with coffee stain shadow
- Scan or export PDF — confirm text does not select (image-only).
- Upload to OCR PDF — 300 DPI sources process best.
- Verify: Ctrl+F finds a known word in your viewer.
- Export via PDF to Text or scanned PDF to Word if editing needed.
OCR tips for poor-quality scans
Re-scan beats re-OCR when original paper exists. Deskew, crop margins, increase contrast before second OCR pass.
Diagnose before re-OCR
Zoom to 400% — if letters are smooth curves, DPI may suffice; if blocky pixels, re-scan. Coffee stains and fold shadows need crop or re-scan, not second OCR pass.
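The zoom test can be backed by a quick number. Here is a minimal sketch, assuming you know the scan's pixel width and the physical paper width in inches; `effective_dpi`, `needs_rescan`, and the 300 DPI default are illustrative names and values, not part of RatPDF.

```python
def effective_dpi(pixel_width: int, paper_width_in: float) -> float:
    """Return the horizontal resolution of a scan in dots per inch."""
    if paper_width_in <= 0:
        raise ValueError("paper width must be positive")
    return pixel_width / paper_width_in


def needs_rescan(pixel_width: int, paper_width_in: float,
                 target_dpi: float = 300.0) -> bool:
    """True when the scan falls short of the target resolution."""
    return effective_dpi(pixel_width, paper_width_in) < target_dpi


# US Letter is 8.5 inches wide: 1275 px across is 150 DPI, 2550 px is 300 DPI.
print(effective_dpi(1275, 8.5))  # 150.0
print(needs_rescan(1275, 8.5))   # True
print(needs_rescan(2550, 8.5))   # False
```

A low number here points to a re-scan; it says nothing about stains or shadows, which still need a visual check.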
Enhancement order
- Deskew rotation
- Crop margins
- Increase contrast (not blur-sharpen filters)
- OCR at 300 DPI equivalent
- Compress only after OCR succeeds
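The contrast step in the list above can be pictured as a linear levels stretch, the same idea as a levels tool in an image editor. This is a minimal pure-Python sketch on a flat list of 8-bit grayscale values; `stretch_contrast` and its `low`/`high` parameters are hypothetical names, and a real workflow would more likely do this in an image editor or scanner driver.

```python
def stretch_contrast(pixels, low, high):
    """Linearly map [low, high] onto [0, 255], clipping values outside it.

    low  = gray level to treat as pure black (darkest ink)
    high = gray level to treat as pure white (paper background)
    """
    if not 0 <= low < high <= 255:
        raise ValueError("need 0 <= low < high <= 255")
    scale = 255 / (high - low)
    out = []
    for p in pixels:
        v = round((p - low) * scale)
        out.append(max(0, min(255, v)))  # clip to the valid 8-bit range
    return out


# Faded scan: ink sits around 85, paper around 170.
faded = [85, 100, 128, 160, 170, 200, 60]
print(stretch_contrast(faded, 85, 170))
# [0, 45, 129, 225, 255, 255, 0]
```

Unlike a sharpen filter, this only remaps gray levels; it does not invent edges that the scan never captured.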
Fax and phone photo sources
Fax PDFs are roughly 200 DPI and often show horizontal line artifacts; when the paper original exists, a grayscale rescan removes these and any moiré. Phone photos need a flat surface and even lighting; avoid flash glare.
When to abandon OCR
If three OCR passes with preprocessing still produce garbage, human transcription is cheaper than attorney time fixing errors in court filings.
Professional restoration services
Museum-grade document restoration exceeds browser OCR scope — for one-of-a-kind deeds, use conservation lab before scanning.
Iterative enhancement loop
Run OCR → export text sample → if garbage, adjust DPI/contrast → re-OCR — max three iterations before rescanning source paper. Document each attempt in case file for court.
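The loop can be sketched as follows. This is a minimal illustration, not RatPDF's implementation: `run_ocr` and `looks_ok` are hypothetical callables you would supply (for example, a wrapper around your OCR step and a check for a known phrase), and the settings dictionaries are made up.

```python
from typing import Callable, Optional


def ocr_with_retries(
    run_ocr: Callable[[dict], str],
    looks_ok: Callable[[str], bool],
    settings_to_try: list,
    max_attempts: int = 3,
) -> tuple:
    """Try each settings dict in order and stop at the first acceptable result.

    Returns (text, log). text is None when every attempt failed, which is
    the signal to rescan the source paper instead of trying again.
    """
    log = []
    for attempt, settings in enumerate(settings_to_try[:max_attempts], start=1):
        text = run_ocr(settings)
        ok = looks_ok(text)
        # Keep a record of every attempt for the case file.
        log.append({"attempt": attempt, "settings": settings, "ok": ok})
        if ok:
            return text, log
    return None, log


# Stand-in OCR function (an assumption): only succeeds at 300 DPI or more.
def fake_ocr(settings: dict) -> str:
    return "Total due: 1,250.00" if settings["dpi"] >= 300 else "T0ta1 du3: l,25O.OO"


text, log = ocr_with_retries(
    fake_ocr,
    looks_ok=lambda t: "Total due" in t,
    settings_to_try=[{"dpi": 150}, {"dpi": 200}, {"dpi": 300}, {"dpi": 600}],
)
print(text)      # Total due: 1,250.00
print(len(log))  # 3
```

The fourth settings entry is never tried because the cap is three attempts.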
Software preprocessing tools
Use Preview auto-enhance, GIMP levels, or the scanner driver's "text mode" before upload. RatPDF OCR cannot fix 72 DPI phone photos of wall posters.
Benchmark your fixes
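Before and after each fix, search the OCR output for the same 10-word phrase. If the character error rate (character edits needed divided by the length of the correct phrase) drops below 5%, ship; otherwise rescan. Log the DPI and filter settings that worked so you can reuse them for repeat batches.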
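One common way to compute a character error rate is Levenshtein edit distance divided by the length of the reference text. The sketch below assumes you have typed the correct phrase by hand as the reference; the function names and sample strings are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, ocr_text: str) -> float:
    """Edit distance between OCR output and reference, per reference character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, ocr_text) / len(reference)


reference = "payment of 1,250.00 is due within 30 days of invoice"
first_pass = "paymcnt of l,25O.OO is duc within 3O days of invoicc"
second_pass = "payment of 1,250.00 is due within 30 days of invoicc"

for label, text in [("first pass", first_pass), ("second pass", second_pass)]:
    cer = char_error_rate(reference, text)
    print(f"{label}: CER {cer:.1%}, ship: {cer < 0.05}")
```

Note that a passing rate still allows individual wrong characters, so amounts and IDs need a manual check regardless.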
OCR pipeline on RatPDF
Tesseract adds invisible text layer over page images — Ctrl+F works in PDF viewers; copy/paste extracts UTF-8. Not the same as perfect transcription — always proofread legal amounts and IDs.
After OCR — next tools
- PDF to Text — plain .txt export
- Scanned PDF to Word — editable DOCX
- PDF to text multilingual — Unicode tips
Privacy and retention
Scanned IDs and contracts contain PII — review privacy policy retention window. Clear local Downloads on shared machines.
Tesseract vs cloud OCR
Research: Tesseract vs online OCR — RatPDF keeps processing on controlled infrastructure vs sending scans to unknown APIs.
Scan settings reference
| Document | DPI | Mode |
|---|---|---|
| Typed contract | 200–300 | Grayscale |
| Small print legal | 300 | Grayscale |
| Colour stamps | 300 | Colour |
Language pack limitations
Tesseract language packs vary by deployment. Mixed-script documents (a non-Latin script alongside English) may need manual verification of each script block. Dense footnotes OCR poorly; treat them as best-effort.
Export formats after OCR
Searchable PDF for archival · .txt for scripts · DOCX for track-changes legal review.
Historical newspaper and book scans
Low-contrast newsprint needs aggressive contrast preprocessing before OCR. Expect proper-noun errors in place names; validate them against a gazetteer.
Accuracy expectations by document type
| Type | Typical accuracy | Action |
|---|---|---|
| Typed laser print | High | OCR + spot-check amounts |
| Dot-matrix / fax | Low | Re-scan or retype critical fields |
| Handwritten margin notes | Very low | Retype notes; OCR body only |
| Tables with rules | Medium | Verify column alignment in export |
Downstream automation
Export OCR'd text to Python RAG pipelines — PDF to text Python workflow. Chunk UTF-8 files; do not feed raw PDF images to LLM without OCR.
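Chunking the exported text might look like the sketch below. It uses fixed-size character windows with a small overlap; `chunk_text` and the default sizes are assumptions for illustration, and a real pipeline may prefer sentence or token boundaries.

```python
def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> list:
    """Split text into chunks of at most max_chars, each sharing
    `overlap` characters with the previous chunk."""
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    chunks = []
    step = max_chars - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", max_chars=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

To run it on an exported file, read the file with an explicit `encoding="utf-8"` first, then pass the string to `chunk_text`.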
Legal and compliance
OCR output is working copy — signed scan remains evidence. For court production, confirm OCR meets local e-discovery rules — e-discovery OCR guide.
Batch queue discipline
One PDF per OCR session on free tier — name outputs doc-ocr-searchable.pdf immediately; browser refresh loses in-memory state.
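If you script your own downloads folder, a consistent name can be derived from the source file. This is a small sketch assuming the `-ocr-searchable.pdf` suffix pattern suggested above; `searchable_name` is a hypothetical helper.

```python
from pathlib import PurePath


def searchable_name(source: str) -> str:
    """Build an output file name from the source scan's base name."""
    stem = PurePath(source).stem  # file name without directory or extension
    return f"{stem}-ocr-searchable.pdf"


print(searchable_name("scans/contract-2019.pdf"))
# contract-2019-ocr-searchable.pdf
```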
Compare cloud OCR vendors
Tesseract vs online OCR: privacy, cost, and accuracy trade-offs for poor-quality documents.
Compress after OCR?
OCR adds text layer — file grows. Compress after OCR succeeds, not before — compression benchmark.
HowTo summary
- Scan 300 DPI grayscale (or colour for stamps)
- Deskew and crop in Preview/Photos if needed
- Upload to OCR PDF
- Verify search in viewer
- Export text or convert to Word
- Proofread critical fields (IDs, dates, amounts) manually
Desktop scanner profiles
Save TWAIN profile "OCR-Poor quality-300dpi-gray" — one-click rescan when first pass fails QA. Avoid colour unless stamps or signatures need hue discrimination.
GDPR and PII
Poor-quality scans of identity documents still contain PII. Run OCR on RatPDF over HTTPS, and delete local copies after HR onboarding completes. Do not OCR passports through untrusted browser extensions.
Hardware scanner settings recap
Flatbed beats sheet-fed for fragile deeds. An ADF is fine for crisp typed pages. Clean glass prevents vertical streaks that show up as false characters in OCR output.
Cloud sync of OCR outputs
Searchable PDFs in Google Drive remain searchable — index lag may take hours. Do not rely on Drive OCR if you need immediate Ctrl+F — run RatPDF OCR first.
Malware and macro paranoia
OCR output is PDF with text layer only — not executable. Still scan downloads with corporate antivirus policy like any attachment.
Second real example: litigation document dump
Opposing counsel sends 40 image PDFs on USB. Batch OCR each, merge chronologically with custom order merge, deliver searchable pack to partner for keyword review.
Character confusables in poor-quality scans
Digits 0/O, 1/l/I confuse OCR in any script — manually verify ID numbers, dates, and currency amounts regardless of language.
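A script can shortlist tokens for the manual check. The sketch below flags any whitespace-separated token that mixes digits with the letters O, l, or I; the function name and the sample line are illustrative, and the heuristic is deliberately crude.

```python
import re

# Letters that OCR commonly swaps with digits, per the pairs named above.
CONFUSABLE_LETTERS = "OlI"


def flag_suspect_tokens(text: str) -> list:
    """Return tokens that contain both a digit and a digit look-alike letter."""
    suspects = []
    for token in re.findall(r"\S+", text):
        has_digit = any(c.isdigit() for c in token)
        has_confusable = any(c in CONFUSABLE_LETTERS for c in token)
        if has_digit and has_confusable:
            suspects.append(token)
    return suspects


line = "Invoice 2O25-0l7 total 1,25O.00 paid on 12/03/2025"
print(flag_suspect_tokens(line))
# ['2O25-0l7', '1,25O.00']
```

A token made only of look-alike letters (such as `IO` for 10) slips through, so this narrows the review list but does not replace reading the ID numbers, dates, and amounts yourself.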
Related PDF to Word guides
Editable output: scanned PDF to Word · keep formatting · Mac: PDF to Word on Mac.
Closing discipline
OCR is not proofreading: budget human review for any poor-quality document that triggers legal, tax, or immigration consequences.
Regulatory and discovery context
OCR for e-discovery prep: OCR PDF e-discovery. Small firm productions — not Relativity replacement.
Accessibility angle
OCR helps search for screen-reader users when tags missing — see PDF to text accessibility. True WCAG compliance still needs tagging.
Translation and NLP after OCR
UTF-8 text exports feed Google Translate API, DeepL, or local MarianMT, but OCR quality caps translation quality. Proofread proper nouns from poor-quality scans before machine translation of contracts.
Redaction warning
If a redaction was only a black box drawn over the page, the covered content can remain readable in the PDF's object stream and may end up in the OCR text layer. Use a true redaction tool before OCR for sensitive releases.
Government portal uploads
India GST notices, EU tax letters, immigration forms — searchable OCR PDF satisfies "text selectable" portal checks where specified.
FAQ inline
Is OCR free? Three OCR uses per day on free tier. Handwriting? Not reliable — retype. Password PDF? Unlock first.
Closing summary
Poor quality OCR is scan quality in, searchable PDF out — proofread every field that moves money, crosses a border, or enters a court file. Then chain to PDF to Text or Word for editing.
Bookmark this guide for your team's wiki — consistent scan settings beat trying a different OCR vendor each week.
Quality sampling for large jobs
OCR 500 pages? Sample 5% — if error rate above 2% on names/amounts, adjust scan settings and re-run batch. Do not spot-check only page 1.
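Picking the sample at random keeps the check from clustering on the first pages. This is a minimal sketch using the 5% and 2% figures above; `sample_pages` and `batch_passes` are hypothetical helper names, and the seed is only there to make a run repeatable.

```python
import math
import random


def sample_pages(total_pages: int, fraction: float = 0.05, seed=None) -> list:
    """Pick a random, sorted sample of 1-based page numbers."""
    if total_pages <= 0 or not 0 < fraction <= 1:
        raise ValueError("need total_pages > 0 and 0 < fraction <= 1")
    k = max(1, math.ceil(total_pages * fraction))
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, total_pages + 1), k))


def batch_passes(errors_found: int, fields_checked: int,
                 max_error_rate: float = 0.02) -> bool:
    """True when the error rate on checked names/amounts is within the limit."""
    if fields_checked <= 0:
        raise ValueError("fields_checked must be positive")
    return errors_found / fields_checked <= max_error_rate


pages = sample_pages(500, seed=1)
print(len(pages))            # 25
print(batch_passes(1, 100))  # True
print(batch_passes(3, 100))  # False
```

Record the seed and the page list with the batch so the same sample can be re-checked after a re-run.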
Font and stamp overlays
Official stamps over poor-quality text reduce confidence, and OCR may miss stamped regions. Legally critical stamped paragraphs may need manual transcription.
Seasonal backlog tips
Tax season floods firms with poor-quality scans: queue OCR overnight, verify in the morning. Pro tier removes daily friction for backlogs.
Integration with merge cluster
OCR'd packs often merge next — merge scanned and digital · quality merge.
Related invoice guides
Poor-quality scanned supplier invoices: OCR → extract totals → match to invoice workflows or local ERP.
Keyboard shortcuts after OCR
In PDF viewer: Ctrl+F for QA terms. In Word after conversion: Navigation pane headings — if empty, source PDF lacked structure; OCR text still usable for search.
Compare vendors
Adobe alternative · Smallpdf alternative: evaluate privacy before uploading poor-quality scans that contain PII.
Document lifecycle after OCR
Archive image-only source unchanged — OCR PDF is derivative. For retention policies, keep both; for GDPR erasure requests, delete both layers from all backups.
Research: compression benchmark if archiving terabytes of OCR'd scans.
Re-run OCR after any rotate/crop edit to image-only PDF — text layer from prior pass no longer aligns with pixels.
Frequently asked questions
Why is my OCR text garbled?
Low DPI, skew, shadows, and motion blur cause garbage OCR — re-scan when possible.
What DPI should I scan for OCR?
Use 300 DPI minimum for small text; 200 DPI for large print.
Should I compress before or after OCR?
OCR before aggressive compression — compressing first blurs text strokes.
Sources & references
Primary references used when researching and fact-checking this guide. See our editorial methodology.
- Tesseract OCR documentation (Google / open source): OCR accuracy factors and language packs.