Hoping someone can point me in the right direction here. Our procurement team gets POs from all over the place: PDFs via email, scanned images, stuff downloaded from supplier portals, occasionally a typed form. The layouts vary a lot depending on the vendor.
We need to pull out PO number, vendor info, line items with quantities and prices, and delivery dates. The problem is every tool I’ve looked at either needs a template per vendor (which defeats the whole point) or spits out raw text that still needs a ton of cleanup. Is there something that can actually handle this variety without us having to configure a new template every time we onboard a supplier?
The template problem is real and it’s why most basic OCR setups fall apart for procurement. You get it working great for your top 3 vendors and then vendor #4 sends a PO in some weird layout and suddenly you’re back to manual entry.
FWIW, simple OCR tools like Tesseract aren’t really built for this. They’ll give you raw text but they don’t understand that a quantity only means something when it’s tied to a specific line item. The structure matters as much as the text. Zapier + basic OCR is a common starting point but in practice it handles maybe 60% of cases and the fallback manual work adds up fast.
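To make the "structure matters" point concrete, here's a rough sketch of the kind of output you actually want, as opposed to a blob of text. The class and field names are just my own illustration (not any tool's real schema), and the sample values are made up.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class LineItem:
    # Quantity and price live on the line item they belong to,
    # which is exactly what raw OCR text does not give you.
    description: str
    quantity: Decimal
    unit_price: Decimal

    @property
    def total(self) -> Decimal:
        return self.quantity * self.unit_price


@dataclass
class PurchaseOrder:
    po_number: str
    vendor_name: str
    delivery_date: Optional[str]  # ISO date string, None if not found
    line_items: List[LineItem] = field(default_factory=list)

    @property
    def total(self) -> Decimal:
        # Start from Decimal("0") so the sum stays a Decimal.
        return sum((item.total for item in self.line_items), Decimal("0"))


# Hypothetical example document
po = PurchaseOrder(
    po_number="PO-1001",
    vendor_name="Example Supplier",
    delivery_date="2024-07-01",
    line_items=[
        LineItem("Widget A", Decimal("10"), Decimal("2.50")),
        LineItem("Widget B", Decimal("3"), Decimal("12.00")),
    ],
)
print(po.total)  # 61.00
```

Once the data is in a shape like this you can sanity-check it, for example by comparing the computed total against the total printed on the PO, which you can't do with a flat text dump.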
For handling real variety across vendors, you want intelligent document processing (IDP) rather than template-based OCR. These platforms are trained to understand document structure contextually, so they recognize that a column of numbers next to item descriptions is probably quantities and prices, even if the layout is totally different from the last vendor. Lido does this reasonably well for POs; it picks up vendor name, PO number, line items, and delivery details without needing per-vendor templates. Rossum is another option worth looking at, and if you’re in a larger enterprise context there are dedicated procurement platforms like BravoSolution (now part of Jaggaer).
Honestly the most important thing is to test with your actual documents before committing to anything. Grab POs from 8-10 different vendors, including your weirdest ones, and run them through whatever you’re evaluating. That’ll tell you more than any demo. Most teams that do this end up automating 85-90% of their volume, with the remainder going to a quick human review queue — which is still a huge win over fully manual processing.
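If you want to put numbers on that test, something like the sketch below works: hand-key the correct values for each sample PO, run the tool, and compare field by field. Everything here is an assumption on my part (the field names, the dict format, the sample data); adapt it to whatever the tool you're evaluating actually exports.

```python
from typing import Dict, List

# Fields to check; names are illustrative, not from any specific tool.
FIELDS = ["po_number", "vendor_name", "delivery_date", "line_items"]


def score_document(extracted: Dict, expected: Dict) -> Dict[str, bool]:
    """Return a per-field exact-match flag for one document."""
    return {f: extracted.get(f) == expected.get(f) for f in FIELDS}


def summarize(results: List[Dict[str, bool]]) -> Dict[str, float]:
    """Share of documents where each field matched, plus the share
    where every field matched (candidates for no human review)."""
    n = len(results)
    if n == 0:
        return {}
    summary = {f: sum(r[f] for r in results) / n for f in FIELDS}
    summary["all_fields"] = sum(all(r.values()) for r in results) / n
    return summary


# Hypothetical hand-labeled answers and tool output for two sample POs
expected_docs = [
    {"po_number": "PO-1001", "vendor_name": "Acme", "delivery_date": "2024-07-01",
     "line_items": [("Widget A", "10", "2.50")]},
    {"po_number": "PO-2002", "vendor_name": "Globex", "delivery_date": "2024-08-15",
     "line_items": [("Bracket", "4", "7.25")]},
]
extracted_docs = [
    {"po_number": "PO-1001", "vendor_name": "Acme", "delivery_date": "2024-07-01",
     "line_items": [("Widget A", "10", "2.50")]},
    {"po_number": "PO-2002", "vendor_name": "Globex", "delivery_date": None,
     "line_items": [("Bracket", "4", "7.25")]},
]

results = [score_document(e, x) for e, x in zip(extracted_docs, expected_docs)]
summary = summarize(results)
print(summary)
```

The per-field numbers tell you where a tool struggles (dates, line items, etc.), and the all_fields number is a rough stand-in for how much volume could skip the review queue. Exact matching is strict, so you may want to normalize dates and whitespace before comparing.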
This is really helpful, thanks. One thing I keep wondering about — how do these tools actually handle invoices in other languages? We have suppliers in Mexico and Germany so we’re constantly getting docs in Spanish and German. Does the extraction still hold up or do you have to do a lot of manual correction?
Yeah that tracks with what we went through too. The template route seemed smart at first but the second a vendor changed their layout even slightly everything broke. Switched to an AI-based tool maybe a year ago now and honestly haven’t thought about it since. Night and day difference in terms of maintenance alone.
Really appreciate the thorough response! Quick follow-up if anyone knows — how does Lido handle multi-page invoices? A few of our bigger vendors send these 3-4 page statements and that’s always been a sticking point with tools we’ve tested before. Curious if it stitches everything together cleanly or if you have to do some manual wrangling.