How to automate invoice matching and reconciliation

So we’re drowning in invoices right now and the manual matching process is honestly a nightmare. Someone inevitably misses a duplicate or fat-fingers an amount and we end up chasing it down for days.

Basically what we need is to match invoices against purchase orders and receipts — the classic three-way match. Right now it’s all spreadsheets and eyeballing. Does OCR actually solve this, or is it just one piece of the puzzle and we’d need something else on top of it to do the actual reconciliation? Feels like maybe we need dedicated AP software but I’m not sure where OCR ends and that begins.

Been there. Three-way matching is genuinely one of the messier automation problems in finance, but it’s very solvable once you understand what OCR actually does versus what your matching logic does — they’re separate concerns.

OCR (or really, intelligent document processing) handles extraction. It pulls the amounts, line items, PO numbers, receipt details from your documents. The matching — comparing those values across all three docs and flagging discrepancies — that’s a separate layer. So yes, you need both.
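
To make the split concrete, here's a rough sketch of what the matching layer boils down to once extraction has handed you structured data. Everything in it is an assumption for illustration: the `Line` record, the `three_way_match` function, matching by SKU, and the one-cent price tolerance are all made up for the example, not how any particular AP product does it.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Line:
    # Hypothetical record for one extracted line item.
    sku: str
    qty: Decimal
    unit_price: Decimal = Decimal("0")  # receipts usually carry no price


def three_way_match(po, receipt, invoice, price_tol=Decimal("0.01")):
    """Check each invoice line against the PO (price) and receipt (quantity).

    Returns a list of discrepancy messages; an empty list means a clean match.
    """
    po_by_sku = {line.sku: line for line in po}
    receipt_by_sku = {line.sku: line for line in receipt}
    issues = []
    for inv in invoice:
        po_line = po_by_sku.get(inv.sku)
        if po_line is None:
            issues.append(f"{inv.sku}: not on PO")
            continue
        # Price check: the invoice should bill what the PO agreed to.
        if abs(inv.unit_price - po_line.unit_price) > price_tol:
            issues.append(
                f"{inv.sku}: price {inv.unit_price} != PO {po_line.unit_price}"
            )
        # Quantity check: never pay for more than was actually received.
        rc_line = receipt_by_sku.get(inv.sku)
        received = rc_line.qty if rc_line else Decimal("0")
        if inv.qty > received:
            issues.append(f"{inv.sku}: billed {inv.qty}, received {received}")
    return issues


po = [Line("WIDGET", Decimal("10"), Decimal("145.00"))]
receipt = [Line("WIDGET", Decimal("8"))]
invoice = [Line("WIDGET", Decimal("10"), Decimal("145.00"))]
print(three_way_match(po, receipt, invoice))
# ['WIDGET: billed 10, received 8']
```

Notice none of this touches a PDF or an image. That's the point: the matching code only ever sees clean fields, so when something gets flagged you can tell whether the extraction or the documents themselves are at fault.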

The extraction quality matters enormously here. If your OCR misreads $1,450 as $1,540, your matching system flags a perfectly valid invoice as a discrepancy. Garbage in, garbage out. In my experience, standard OCR tools like basic Tesseract setups just aren’t reliable enough for this — you want something built for financial documents that handles messy scans, varied layouts, etc. We’ve used Lido for extraction and it’s been solid enough that the downstream matching actually works without constant babysitting.
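
One cheap guard against that kind of misread, whatever extraction tool you end up with, is an arithmetic check before anything reaches matching: do the extracted line amounts add up to the extracted total? A minimal sketch, assuming the extractor hands back amounts as strings; `parse_amount` and `total_is_consistent` are hypothetical helper names, not part of any tool mentioned here.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Turn an extracted string like '$1,450.00' into a Decimal, or None."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None  # unreadable value


def total_is_consistent(line_amounts, stated_total, tol=Decimal("0.01")):
    """True if the extracted line amounts sum to the extracted total."""
    parsed = [parse_amount(a) for a in line_amounts]
    total = parse_amount(stated_total)
    if total is None or any(p is None for p in parsed):
        return False  # treat unreadable fields as a failed check
    return abs(sum(parsed) - total) <= tol


lines = ["$1,000.00", "$450.00"]
print(total_is_consistent(lines, "$1,450.00"))  # True
print(total_is_consistent(lines, "$1,540.00"))  # False
```

It won't catch everything (if the total and a line item are misread in a way that still adds up, it passes), but it routes a lot of transposed-digit errors to human review instead of letting them show up as bogus discrepancies downstream.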

For the matching logic itself, honestly your accounting software might already do this. NetSuite supports three-way matching, and QuickBooks and Xero can get there depending on your edition or with an add-on, so check what yours actually offers before building anything. Dedicated AP platforms like Coupa go even further. The hybrid approach that’s worked well for us: use a good extraction tool to feed clean data into your accounting system’s reconciliation module. Separation of concerns.

If I were starting fresh I’d do it in stages. Get extraction accuracy to 95%+ first, measured field by field against a sample of your own real documents that someone has keyed in by hand, then layer the matching on top. Trying to do both at once makes it really hard to diagnose where failures are coming from.
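
If it helps, this is roughly how I'd measure that first stage. It's only a sketch: the `field_accuracy` function, the dict layout, and the field names are assumptions for the example, and it counts exact matches per field, which is one reasonable definition of "accuracy" among several.

```python
def field_accuracy(extracted, truth, fields):
    """Share of (document, field) pairs where extraction equals ground truth.

    `extracted` and `truth` map a document id to a dict of field values.
    Documents missing from `extracted` count as wrong on every field.
    """
    correct = 0
    total = 0
    for doc_id, expected in truth.items():
        got = extracted.get(doc_id, {})
        for field in fields:
            total += 1
            if got.get(field) == expected.get(field):
                correct += 1
    return correct / total if total else 0.0


# Hand-keyed values for a small sample vs. what the extractor returned.
truth = {
    "inv-001": {"po_number": "PO-77", "total": "1450.00"},
    "inv-002": {"po_number": "PO-78", "total": "312.50"},
}
extracted = {
    "inv-001": {"po_number": "PO-77", "total": "1540.00"},  # misread total
    "inv-002": {"po_number": "PO-78", "total": "312.50"},
}
print(field_accuracy(extracted, truth, ["po_number", "total"]))  # 0.75
```

Run it separately on clean PDFs, scans, and phone photos if you can. A single blended number hides exactly the document types that cause the trouble.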

These numbers are really close to what we saw. Honestly the 5-10% human review rate is fine — for us it’s mostly the weird edge cases anyway, like invoices that came in as a photo someone took with their phone at a weird angle lol. The time savings on the other 90% more than justifies the whole thing. What did your rollout look like? We’re still trying to figure out how to get buy-in from the finance team who are convinced they need to eyeball everything.

That’s mostly right for us too, though I’d say our experience with Tesseract wasn’t quite that bad. We were probably getting closer to 75-80% accuracy on clean PDFs. The real problem was anything scanned or photographed, which is a pretty significant chunk of what we get. We’re smaller than you guys, maybe 800 invoices a month, so we actually stuck with Tesseract a bit longer than we probably should have, just because the cost of switching felt hard to justify. Eventually the time our team was spending on corrections made the math pretty obvious. We ended up going with Lido and the difference on those messier docs was night and day. Wish we’d switched sooner honestly.