12 Best Practices for AI Document Cleaning: A Checklist I Wish I Had 2 Years Ago
I have ruined more documents than I care to admit. Over-cleaned contracts with invisible signatures. Receipts so bright the thermal ink disappeared. Pages cropped so aggressively that half the footer got cut off. I have processed over 600 documents in the last two years, and I made every mistake in the book so you do not have to.
This guide is not theory. It is a battle-tested checklist of 12 practices that separate professional-quality output from amateur-hour results. Print it. Bookmark it. Follow it. Your future self will thank you.
The Complete Checklist
Part 1: Shooting — Garbage In, Garbage Out
Practice 1 Light Like a Pro, Not Like a Tourist
The single biggest factor in AI cleaning quality is not the AI. It is your lighting. Here is what I learned after 200+ failed attempts:
Use two light sources. One overhead light creates shadows. Two lights at 45-degree angles cancel them out. A $20 desk lamp pair from Amazon changed my output quality more than any software upgrade.
Avoid direct sunlight. It creates harsh highlights that blow out text. Use diffused natural light (near a window with a sheer curtain) or soft artificial light.
Watch for reflections. Glossy paper, laminated IDs, and plastic sleeves create bright hotspots. Tilt your phone 10-15 degrees to avoid direct reflection.
Color temperature matters. Warm yellow light (2700K) makes white paper look cream. Cool white light (5000K) gives accurate color. For document cleaning, cooler is better.
❌ My Mistake
I shot 30 pages of a legal contract under a warm LED bulb at 2700K. The AI cleaned the pages beautifully, but the "white" background came out beige. The client rejected the entire batch because it did not match their document standards. Now I use a 5000K daylight bulb for all document photography.
Practice 2 Angle Within 15 Degrees — No Exceptions
AI perspective correction is powerful, but it is not magic. The further your camera is from parallel to the page, the more the AI has to stretch and reconstruct. Beyond 25 degrees, text distortion becomes visible even to casual readers.
Use your phone's grid. Turn on the camera grid (Settings > Camera > Grid on iPhone). Align the document edges with the grid lines. If the edges do not match the grid, adjust your angle.
Shoot from directly above. For flat documents, position your phone parallel to the surface. For book pages, shoot the page at the natural open angle rather than pressing the spine flat with a weight.
Fill 85-90% of the frame. Too much empty space forces the AI to crop aggressively. Too close causes edge distortion. Leave a small, even border around all four sides.
💡 Pro Tip
For book pages, do not force the book flat. Shoot at the natural open angle (usually 120-150 degrees) and let the AI's perspective correction handle the rest. Forcing the book flat damages the spine and creates uneven lighting near the gutter.
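To get a feel for why 15 degrees is a comfortable target and 25 degrees is not, here is a rough sketch of how much a page is foreshortened as the camera tilts away from parallel. It assumes a simplified model (a distant camera, so apparent length along the tilt axis scales with the cosine of the tilt). Real phone photos add perspective effects on top of this, and none of it describes what V3 computes internally; the function name is purely illustrative.

```python
import math

def foreshortening_percent(tilt_degrees: float) -> float:
    """Percent by which the page appears compressed along the tilt axis.

    Simplified model (an assumption): apparent length = true length * cos(tilt).
    """
    return (1.0 - math.cos(math.radians(tilt_degrees))) * 100.0

for tilt in (5, 15, 25, 40):
    print(f"{tilt:>2} degrees -> {foreshortening_percent(tilt):.1f}% compression")
```

Under this model, 15 degrees costs a little over 3% while 25 degrees costs more than 9%, so the correction step has nearly three times as much stretching to undo.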
Practice 3 Background Contrast Is Non-Negotiable
The AI detects page edges by looking for contrast. White paper on a white desk? Edge detection fails. Dark paper on a dark table? Same problem.
Use a dark background for white paper. Dark brown, navy blue, or charcoal gray work best. The contrast helps the AI find page edges accurately.
Avoid patterned surfaces. Wood grain, marble patterns, and fabric textures confuse edge detection. Use a solid-colored mat or folder.
Remove clutter from the frame. Pens, coffee cups, and phone chargers in the corner of the shot create false edges. Clear the area before shooting.
❌ My Mistake
I laid a white contract on a white marble countertop with gray veins. The AI detected the marble veins as page edges and cropped the document to a random rectangle. I lost the top third of every page. Now I keep a dark brown clipboard specifically for document photography.
Part 2: Mode Selection — Choose Wisely
Practice 4 Match the Mode to the Document
V3 has three modes. Using the wrong one is like using a hammer to paint a wall—it works, but the result is terrible.
⚡ Standard Mode
Use for: Clean scans, well-lit photos, documents with no shadows.
Speed: Fastest. 2-3 seconds per page.
Output: Balanced cleanup without aggressive processing.
When NOT to use: Phone photos with shadows, uneven lighting, or dark backgrounds.
🌓 Shadow Removal
Use for: Phone photos, documents shot under single light sources, pages with visible shadows.
Speed: Moderate. 4-5 seconds per page.
Output: Flattened lighting, removed gradients, uniform background.
When NOT to use: Already clean scans (over-processing creates artifacts).
✍️ Signature Mode
Use for: Contracts, checks, legal forms, anything with signatures or stamps.
Speed: Moderate. 4-6 seconds per page.
Output: Transparent background, preserved ink characteristics, PNG export.
When NOT to use: General text documents where background does not matter.
💡 Pro Tip
For documents with both heavy shadows AND signatures, run Shadow Removal first, then re-process the output in Signature Mode. The two-step workflow takes roughly 8 to 11 seconds per page (4-5 for Shadow Removal plus 4-6 for Signature Mode) but produces results that no single mode can match. I use this for 90% of my contract work.
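The mode rules above boil down to two questions: does the page have shadows, and does it carry a signature or stamp? Here is a minimal sketch of that decision plus a time estimate built from the per-page speeds listed for each mode. V3 is a browser tool, so this is only a planning aid; the function names are hypothetical and nothing here calls into V3.

```python
def choose_modes(has_shadows: bool, has_signature: bool) -> list[str]:
    """Return the mode(s) to run, in order, for one document."""
    if has_shadows and has_signature:
        # Two-step workflow: flatten the lighting first, then preserve the ink.
        return ["Shadow Removal", "Signature Mode"]
    if has_signature:
        return ["Signature Mode"]
    if has_shadows:
        return ["Shadow Removal"]
    return ["Standard Mode"]

# Per-page time ranges in seconds, copied from the mode descriptions.
SECONDS_PER_PAGE = {
    "Standard Mode": (2, 3),
    "Shadow Removal": (4, 5),
    "Signature Mode": (4, 6),
}

def estimate_seconds(modes: list[str], pages: int) -> tuple[int, int]:
    """Return (fastest, slowest) total processing time for the given modes."""
    low = sum(SECONDS_PER_PAGE[m][0] for m in modes) * pages
    high = sum(SECONDS_PER_PAGE[m][1] for m in modes) * pages
    return low, high

modes = choose_modes(has_shadows=True, has_signature=True)
print(modes, estimate_seconds(modes, pages=20))
# ['Shadow Removal', 'Signature Mode'] (160, 220)
```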
Practice 5 Never Mix Modes Within One Document
Consistency is professionalism. If page 1 is cleaned in Standard Mode and page 5 in Shadow Removal, the recipient will notice the difference. The brightness, contrast, and background tone will vary.
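Pick one mode per document. If your 20-page contract has 3 shadowy pages and 17 clean pages, process all 20 in Shadow Removal. Clean pages can pick up minor artifacts from the extra processing, so preview one of them first, but a consistent look across the whole document matters more than a marginally cleaner page.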
Batch by lighting condition. Group your documents before processing: "clean scans" batch, "phone photos" batch, "signatures" batch. This prevents mode-switching errors.
❌ My Mistake
I processed a 15-page proposal where the first 5 pages were clean scans (Standard Mode) and the last 10 were phone photos (Shadow Removal). The client asked why the document "changed style halfway through." They thought I had combined two different documents. Now I process every multi-page document in a single mode, even if some pages do not strictly need it.
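If you sort files before processing, the "one mode per document" rule and the batching advice can be combined in a few lines. This is a sketch under my own labeling assumption: each page is tagged by hand as "clean" or "shadow" (hypothetical labels, not something V3 reports), and any shadowy page pushes the whole document into Shadow Removal.

```python
from collections import defaultdict

def document_mode(page_conditions: list[str]) -> str:
    """One mode for the whole document: any shadowy page means Shadow Removal."""
    return "Shadow Removal" if "shadow" in page_conditions else "Standard Mode"

def batch_by_mode(documents: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group document names by the single mode each should be processed in."""
    batches = defaultdict(list)
    for name, page_conditions in documents.items():
        batches[document_mode(page_conditions)].append(name)
    return dict(batches)

documents = {
    # Same shape as the proposal from the story: 5 clean scans, 10 phone photos.
    "proposal.pdf": ["clean"] * 5 + ["shadow"] * 10,
    "invoice.pdf": ["clean"] * 2,
}
print(batch_by_mode(documents))
# {'Shadow Removal': ['proposal.pdf'], 'Standard Mode': ['invoice.pdf']}
```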
Part 3: Processing — The Details Matter
Practice 6 Preview the Worst Page First
Do not preview page 1. Page 1 is usually the cover or title page—simple, clean, and forgiving. Preview the page that will stress the AI the most.
Check the page with the most text density. Dense paragraphs, small fonts, and footnotes test the AI's character reconstruction.
Check the page with the worst lighting. The darkest shadow, the brightest hotspot, the most uneven gradient.
Check the page with a signature or stamp. If the AI washes out ink, you need to switch to Signature Mode.
Check the last page. Footers, page numbers, and legal disclaimers are often in small print near the bottom edge—exactly where aggressive cropping causes damage.
Practice 7 Watch for the 4 Red Flags
These four visual defects mean your settings are wrong. Stop and fix them before processing the full batch.
| Red Flag | What It Looks Like | Cause | Fix |
|---|---|---|---|
| Halo Effect | White glow around text characters | Over-aggressive thresholding | Switch from Standard to Shadow Removal, or reduce processing intensity |
| Washed Ink | Signatures look gray, stamps fade | Over-normalization in wrong mode | Use Signature Mode instead of Standard/Shadow |
| Edge Crop | Missing text at page margins | Edge detection misidentified page boundaries | Re-shoot with better background contrast, or manually adjust crop |
| Stretch Distortion | Text looks tall/narrow or wide/short | Extreme camera angle (>25°) | Re-shoot from a more parallel angle |
💡 Pro Tip
Zoom to 200% when checking for halos. They are invisible at normal zoom but glaringly obvious when enlarged. This is how print shops catch bad cleaning jobs—always check at 200% before sending to print.
Practice 8 Process in Optimal Batch Sizes
Batch processing is powerful, but bigger is not always better. Browser memory limits, file size caps, and error recovery all factor into the ideal batch size.
Standard batch: 20-30 files. This is the sweet spot for stability and speed. Most browsers handle this without lag.
Maximum batch: 50 files. V3's hard limit. Use this only for small files (around 500 KB each, which keeps the batch within the 25 MB guideline below). Large image files (5+ MB) in a 50-file batch can crash the browser tab.
Split by file size, not just count. Ten 5-MB photos stress the browser more than thirty 500-KB scans. Keep combined batch size under 25 MB for smooth processing.
Process "problem" files separately. Documents with extreme angles, heavy shadows, or mixed content should be processed in a separate batch. If one file fails, it will not corrupt the others.
❌ My Mistake
I dragged 50 high-resolution photos (each 8 MB) into V3 in one batch. The browser froze at file 37. I lost 30 minutes of progress and had to restart. Now I keep a calculator open: number of files × average file size = total. For that batch, 50 × 8 MB = 400 MB, sixteen times my limit. If the total exceeds 25 MB, I split it into smaller batches until each one fits.
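The calculator check can be scripted. Below is a minimal sketch that packs files into batches while respecting both guidelines from this practice: at most 30 files and at most 25 MB combined. The constant and function names are hypothetical, the packing is a simple greedy pass in the order given, and a single file larger than the limit still gets a batch of its own.

```python
MAX_FILES = 30                       # upper end of the "standard batch" range
MAX_BATCH_BYTES = 25 * 1024 * 1024   # 25 MB combined-size guideline

def split_into_batches(files):
    """Greedily pack (name, size_in_bytes) pairs into batches under both limits."""
    batches, current, current_bytes = [], [], 0
    for name, size in files:
        too_many = len(current) >= MAX_FILES
        too_big = bool(current) and current_bytes + size > MAX_BATCH_BYTES
        if too_many or too_big:
            # Close the current batch and start a fresh one.
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches

# The failed batch from the story: 50 photos at 8 MB each.
photos = [(f"page_{i:02d}.jpg", 8 * 1024 * 1024) for i in range(1, 51)]
batches = split_into_batches(photos)
print(len(batches), [len(b) for b in batches[:3]])
# 17 [3, 3, 3]
```

Seventeen batches of at most three photos each shows how quickly high-resolution photos hit the size guideline. Resizing or compressing first is usually the better fix.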
Part 4: Quality Check — Do Not Skip This
Practice 9 The 3-Page Spot Check
You do not need to review every page. You need to review the right pages. Here is my 3-page rule:
Page 1: Check header alignment, logo placement, and title formatting. This is what the recipient sees first.
The densest page: Find the page with the most text, smallest font, or most complex layout. If the AI handles this, it handles everything.
The last page: Check footers, page numbers, signatures, and legal disclaimers. These are often cropped or distorted.
⚠️ Time investment: 30 seconds per document. Skipping this check has cost me up to 3 hours of re-work on more than one occasion. The math is simple: 30 seconds of checking can save hours of fixing.
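For long documents, finding "the densest page" by eye is the slow part. Here is a small sketch of the 3-page rule that uses character count per page as a stand-in for density. That proxy is my assumption, and the counts would come from whatever text extraction you already have; the function name is hypothetical.

```python
def spot_check_pages(chars_per_page: list[int]) -> list[int]:
    """Return 1-based page numbers to review: first, densest, and last."""
    if not chars_per_page:
        return []
    last = len(chars_per_page)
    densest = max(range(last), key=lambda i: chars_per_page[i]) + 1
    # A set removes duplicates when the densest page is also the first or last.
    return sorted({1, densest, last})

print(spot_check_pages([120, 900, 2400, 1800, 300]))
# [1, 3, 5]
```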
Practice 10 Test Text Selectability
This is the most important quality check that 90% of people skip. After cleaning, open the output PDF and try to select text with your cursor.
If text selects normally: The AI preserved the document structure. Search works. Copy-paste works. You are good.
If the entire page selects as one image: The AI rasterized the document. File size typically grows 5-10x and the text is no longer searchable. This is a quality failure.
If some text selects but other parts do not: The AI partially rasterized. Common with mixed-content pages (text + images). Re-process those pages separately.
💡 Pro Tip
Use Ctrl+F (Cmd+F on Mac) to search for a specific word in the output PDF. If search works, the text layer is intact. If not, the document is just a series of images. For contracts and legal documents, searchable text is mandatory.
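When you have a stack of output PDFs, the three outcomes of the selectability test can be turned into a quick triage function. This sketch assumes you already have the number of extractable characters per page from a PDF text extractor of your choice (that step is not shown). The `min_chars` threshold is an arbitrary value picked for illustration, and the function name is hypothetical.

```python
def check_text_layer(chars_per_page: list[int], min_chars: int = 20):
    """Classify a cleaned PDF and list the 1-based pages that need re-processing."""
    if not chars_per_page:
        raise ValueError("no pages to check")
    missing = [n for n, count in enumerate(chars_per_page, start=1)
               if count < min_chars]
    if not missing:
        verdict = "searchable"
    elif len(missing) == len(chars_per_page):
        verdict = "rasterized"
    else:
        verdict = "partially rasterized"
    return verdict, missing

print(check_text_layer([1500, 0, 2100]))
# ('partially rasterized', [2])
```

A page that is legitimately almost empty (a blank back page, for example) would be flagged too, so treat the list as pages to look at, not pages that are definitely broken.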
Part 5: Export & Archive — Protect Your Work
Practice 11 Match Export Format to Use Case
The wrong export format has cost me more time than any processing error. Here is the decision matrix I use:
| Use Case | Format | Mode | Why |
|---|---|---|---|
| Email to client | PDF (multi-page) | Any | Universal compatibility, preserves formatting |
| Print at office | PDF (high quality) | Shadow Removal | Removes gray tones that waste ink |
| Digital signature extraction | PNG (individual) | Signature Mode | Transparent background, alpha channel preserved |
| Web upload / form submission | JPEG (compressed) | Standard | Smallest file size, fast upload |
| Archival storage | PNG + PDF (both) | Shadow Removal | Lossless PNG for future editing, PDF for sharing |
| OCR (text extraction) | PDF (text-searchable) | Standard | Cleanest text layer for OCR engines |
❌ My Mistake
I exported a batch of 20 invoices as JPEG to save email attachment size. The accountant could not read the 8pt line items due to compression artifacts. I had to re-process and re-send. Now I use PDF for anything with small text, and JPEG only for simple, large-text documents.
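If you export often, the decision matrix fits in a lookup table, with the invoice lesson added as a guard. A sketch only: the keys are shortened versions of the use cases in the table, and the `has_small_text` flag is a hypothetical input you would set yourself after looking at the document.

```python
# (format, mode) pairs copied from the decision matrix; keys are shortened.
EXPORT_MATRIX = {
    "email to client": ("PDF (multi-page)", "Any"),
    "print at office": ("PDF (high quality)", "Shadow Removal"),
    "digital signature extraction": ("PNG (individual)", "Signature Mode"),
    "web upload": ("JPEG (compressed)", "Standard"),
    "archival storage": ("PNG + PDF (both)", "Shadow Removal"),
    "ocr": ("PDF (text-searchable)", "Standard"),
}

def export_settings(use_case: str, has_small_text: bool = False) -> tuple[str, str]:
    """Look up (format, mode), steering small-print documents away from JPEG."""
    fmt, mode = EXPORT_MATRIX[use_case.lower()]
    if has_small_text and fmt.startswith("JPEG"):
        # Small print may not survive JPEG compression, so fall back to PDF.
        fmt = "PDF (multi-page)"
    return fmt, mode

print(export_settings("Web upload", has_small_text=True))
# ('PDF (multi-page)', 'Standard')
```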
Practice 12 Never Overwrite Originals
This is the simplest practice and the most commonly violated. Always keep the original file. Always.
Use a naming convention. My format: YYYY-MM-DD_Client_Document_CLEANED.pdf. The "CLEANED" suffix makes it obvious which is the processed version.
Store originals in a separate folder. I use /Originals/ and /Cleaned/ subfolders. Never mix them.
Keep originals for 30 days minimum. Even after delivery, keep the source file. Clients request re-processing with different settings more often than you think.
Version your cleaned files. If you re-process with different settings, use _CLEANED_v2.pdf. Do not overwrite the first cleaned version.
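The naming and versioning rules are easy to automate so that nothing is overwritten by accident. Here is a minimal sketch using only the Python standard library. The client and document names are placeholders, and both function names are hypothetical.

```python
from datetime import date
from pathlib import Path

def cleaned_name(client: str, document: str, when: date, version: int = 1) -> str:
    """Build YYYY-MM-DD_Client_Document_CLEANED.pdf, with _v2, _v3, ... for re-runs."""
    suffix = "_CLEANED" if version == 1 else f"_CLEANED_v{version}"
    return f"{when.isoformat()}_{client}_{document}{suffix}.pdf"

def next_free_path(cleaned_dir: Path, client: str, document: str, when: date) -> Path:
    """Return the first versioned path in cleaned_dir that does not exist yet."""
    version = 1
    while (cleaned_dir / cleaned_name(client, document, when, version)).exists():
        version += 1
    return cleaned_dir / cleaned_name(client, document, when, version)

print(cleaned_name("Acme", "Contract", date(2024, 3, 15)))
# 2024-03-15_Acme_Contract_CLEANED.pdf
print(cleaned_name("Acme", "Contract", date(2024, 3, 15), version=2))
# 2024-03-15_Acme_Contract_CLEANED_v2.pdf
```

Pointing `next_free_path` at the /Cleaned/ folder gives the next unused file name, so an earlier cleaned version is never replaced.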
💡 Pro Tip
Create a "processing log" text file in your project folder. Note the date, mode used, batch size, and any issues. When a client asks "Which settings did you use?" six months later, you will have the answer. I learned this after a client dispute where I could not prove which mode I had used.
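The log does not need to be fancy. One line per run is enough, as in this sketch. The field layout is my own assumption, not a format V3 produces, and both function names are hypothetical.

```python
from datetime import date
from pathlib import Path

def format_log_line(when: date, mode: str, batch_size: int, issues: str = "none") -> str:
    """One line per processing run: date, mode, batch size, and any issues."""
    return f"{when.isoformat()} | mode={mode} | batch={batch_size} | issues={issues}"

def append_to_log(log_path: Path, line: str) -> None:
    """Append a line; opening with "a" keeps every earlier entry intact."""
    with log_path.open("a", encoding="utf-8") as log:
        log.write(line + "\n")

line = format_log_line(date(2024, 3, 15), "Shadow Removal", 20, "page 7 re-shot")
print(line)
# 2024-03-15 | mode=Shadow Removal | batch=20 | issues=page 7 re-shot
# append_to_log(Path("processing_log.txt"), line)  # uncomment to write the file
```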
Bonus: The 7 Mistakes I See Everyone Make
After helping dozens of colleagues and clients set up their AI cleaning workflows, these are the mistakes I see repeatedly:
- Shooting in Portrait mode. The camera's Portrait (depth-effect) mode blurs the background, which confuses edge detection. Use standard photo mode.
- Using flash. Flash creates harsh hotspots and reflections. Always use ambient or diffused light.
- Processing screenshots as documents. Screenshots of PDFs do not need cleaning. They are already digital. Cleaning a screenshot just degrades quality.
- Ignoring file size limits. Uploading a 50 MB RAW photo to a browser tool will crash it. Compress or resize first.
- Cleaning already-clean documents. Running Shadow Removal on a flatbed scan creates artifacts. Use Standard Mode for clean inputs.
- Not checking color documents. AI cleaning sometimes shifts color temperature. Check that logos and brand colors are still accurate.
- Assuming "AI" means "perfect." AI makes mistakes. Preview. Check. Verify. Every time.
The One-Page Cheat Sheet
📌 Before Shooting
Light: Two sources at 45°, 5000K color temp
Angle: Within 15° of parallel, use grid overlay
Background: Dark, solid, uncluttered
Frame: Fill 85-90%, even border on all sides
📌 Mode Selection
Clean scans: Standard Mode
Phone photos with shadows: Shadow Removal
Contracts/signatures: Signature Mode
Shadow + signature combo: Shadow first, then Signature
One mode per document. Never mix.
📌 Quality Check
Preview: Worst page first (densest, darkest, most complex)
Red flags: Halos (check at 200% zoom), washed ink, edge crop, stretch distortion
Text test: Select text with the cursor, then search with Ctrl+F (Cmd+F on Mac)
3-page rule: Page 1, densest page, last page
📌 Export & Archive
Client delivery: PDF multi-page
Signatures: PNG individual (transparent)
Web upload: JPEG compressed
Naming: Add _CLEANED suffix, never overwrite originals
Storage: Separate /Originals/ and /Cleaned/ folders
Apply These Practices Now
Browser-based. No upload. The same V3 engine I use for all 12 practices above.