We need to process a pretty high volume of receipts for expense tracking (we’re talking thousands), and they come from everywhere: different vendors, different formats, some photos from phones, some faded thermal paper. Our current process involves way too much manual correction.
What are people actually using for receipt OCR at scale? Specifically curious how tools hold up on the lower-quality stuff, not just clean crisp receipts. Thanks in advance.
Receipt OCR is one of those things that sounds simple until you’re actually doing it at volume. The variety kills you — thermal paper that’s half-faded, horizontal layouts, inconsistent field placement, tiny fonts. Standard OCR tools will get you the text but won’t understand that the number at the bottom is the total and the one next to “HST” is tax.
For expense-specific use cases, Expensify and Receipts by Wave are genuinely good if your workflow fits their model. Mobile-friendly, reasonable accuracy on standard receipts, plays nicely with accounting software. The limitation is they’re built for expense scenarios and don’t flex much beyond that.
If you need more flexibility, there are a few options worth testing. Rossum has good accuracy though setup takes some time. Amazon Textract is reliable for structured docs and works well if you’re in AWS. Docsumo handles receipts decently with some customization. We’ve used Lido for high-volume processing — it’s built to handle receipt variety without needing templates for each vendor, and the integration with Excel made the workflow pretty straightforward for our team. It identifies totals, tax, merchant names, and dates automatically across different formats.
A few things I’d actually measure when you’re evaluating: accuracy specifically on totals (you want 99%+ there, errors compound fast), how it handles your worst-quality receipts, and processing speed at your actual volume. The faded thermal paper performance is where tools tend to diverge significantly.
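If it helps, here’s roughly how I’d score a tool on totals, split by image quality. This is just a sketch: the bucket names and the `(bucket, expected, extracted)` tuple shape are my own assumptions, and you’d fill the samples from a hand-labeled test set of your own receipts.

```python
from collections import defaultdict
from decimal import Decimal, InvalidOperation


def _to_decimal(value):
    """Parse a total like "23.50"; return None if it is missing or not a number."""
    if value is None:
        return None
    try:
        return Decimal(value)
    except InvalidOperation:
        return None


def total_accuracy_by_bucket(samples):
    """Exact-match accuracy on the total field, grouped by quality bucket.

    samples: iterable of (bucket, expected_total, extracted_total) tuples,
    where totals are strings and extracted_total may be None.
    (Hypothetical shape, adapt it to whatever your tool returns.)
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for bucket, expected, extracted in samples:
        counts[bucket] += 1
        got = _to_decimal(extracted)
        if got is not None and got == _to_decimal(expected):
            hits[bucket] += 1
    return {bucket: hits[bucket] / counts[bucket] for bucket in counts}


if __name__ == "__main__":
    labeled = [
        ("clean", "23.50", "23.50"),
        ("faded_thermal", "8.99", "3.99"),   # misread digit counts as a miss
        ("faded_thermal", "12.00", None),    # nothing extracted counts as a miss
    ]
    for bucket, accuracy in total_accuracy_by_bucket(labeled).items():
        print(f"{bucket}: {accuracy:.1%}")
```

Run the same labeled set through each tool you’re trialing and compare the faded bucket, not just the overall number.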
FWIW, even the best AI tools are going to need some human review for genuinely bad images — I’d budget for maybe 10-15% needing a second look. Don’t let any vendor tell you otherwise.
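One way to set up that second look, assuming your tool gives you a per-field confidence score (many do, but the field names and the 0.9 cutoff below are made up for illustration, so tune them against your own data):

```python
REQUIRED_FIELDS = ("total", "tax", "merchant", "date")


def needs_review(fields, threshold=0.9, required=REQUIRED_FIELDS):
    """Return True if any required field is missing or below the threshold.

    fields: dict mapping field name -> {"value": str, "confidence": float}.
    This shape is hypothetical, not any particular vendor's response format.
    """
    for name in required:
        field = fields.get(name)
        if field is None or field["confidence"] < threshold:
            return True
    return False


def split_for_review(receipts, threshold=0.9):
    """Partition extracted receipts into (auto_accepted, review_queue)."""
    accepted, review = [], []
    for receipt in receipts:
        target = review if needs_review(receipt, threshold) else accepted
        target.append(receipt)
    return accepted, review
```

If the review queue comes out way above that 10-15%, either the threshold is too strict or the tool is struggling with your images.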
That’s actually a really smart way to approach it. We tried going all-in on AI-based from the start and kept running into weirdness with our top vendors where it’d just… decide to interpret things differently run to run. Switched to templates for the high-volume stuff and it made a huge difference in consistency. The AI handles the long tail fine. Hybrid isn’t the most elegant solution but it works.
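For anyone wondering what the hybrid looks like in practice, here’s a stripped-down sketch of the routing. The vendor name, the regexes, and the `fallback` callable are all placeholders I made up; in a real setup the fallback would be a call into whatever AI extraction tool you use.

```python
import re

# Hypothetical templates for high-volume vendors: one regex per field.
# Anchoring at line start keeps "TOTAL" from matching inside "SUBTOTAL".
TEMPLATES = {
    "ACME MART": {
        "total": re.compile(r"^\s*TOTAL\s+\$?(\d+\.\d{2})", re.MULTILINE),
        "tax": re.compile(r"^\s*HST\s+\$?(\d+\.\d{2})", re.MULTILINE),
    },
}


def extract(merchant, text, fallback):
    """Use a fixed template for known vendors, the fallback for everyone else.

    fallback: callable taking the OCR text and returning a dict of fields
    (stands in for the AI-based extractor).
    """
    template = TEMPLATES.get(merchant.strip().upper())
    if template is None:
        return {"source": "fallback", **fallback(text)}
    result = {"source": "template"}
    for name, pattern in template.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result
```

The template path is deterministic, so the same receipt always parses the same way, which is the consistency win. A field that comes back `None` from a template is a decent signal to send that receipt to review.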
Same here on Tesseract — free is tempting until you actually see the results on anything less than a perfect scan. We’re a smaller operation than you (maybe 600-700 invoices a month) but the accuracy gap was just too big to ignore. We ended up going with Lido after testing a few options and haven’t looked back. Setup took a bit of time but once it was dialed in, the difference was night and day. Honestly wish we’d just skipped the Tesseract phase entirely, would’ve saved us like two weeks of fiddling.