Hey everyone, hoping you can help me out here.
We work with a bunch of suppliers from China, Japan, and Korea, and their invoices are a constant headache for us. The big issue is that they almost always mix English with CJK text (Chinese, Japanese, or Korean) on the same page. It’s just how they come.
The problem is that pretty much every OCR tool we’ve tried chokes on these. They either skip the non-English parts entirely, leaving big gaps, or they turn them into garbled, unreadable text. It’s super frustrating, and we’re spending way too much time manually re-entering data because of it.
So, I’m reaching out to the community: has anyone found a reliable OCR solution that actually handles multi-language, multi-script invoices well, without us having to go in and fix half of the output? We’re open to anything that works!
One thing people don’t talk about enough, and it’s crucial in my experience: get a solid exception-handling workflow in place before you automate anything. Well before you hit the ‘go’ button, figure out what happens when the OCR misreads something. Because it will, and you need a plan for that!
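To make that concrete, here’s a rough sketch of the kind of routing rule I mean. Everything in it is an assumption for illustration: the `Field` shape, the required field names, and the 0.90 threshold are hypothetical, and your OCR tool may report confidence differently (or not at all). The idea is just that every invoice gets an explicit decision, and anything missing or uncertain goes to a person instead of straight into your accounting system.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    value: str
    confidence: float  # assumed to be 0.0-1.0, as reported by your OCR tool


# Hypothetical settings: pick fields and a threshold that fit your own process.
REQUIRED_FIELDS = ("invoice_number", "total", "supplier_name")
MIN_CONFIDENCE = 0.90


def route_invoice(fields):
    """Return ("auto_approve", []) or ("manual_review", [reasons])."""
    by_name = {f.name: f for f in fields}
    problems = []
    for name in REQUIRED_FIELDS:
        field = by_name.get(name)
        if field is None or not field.value.strip():
            # The OCR skipped the field entirely (common with CJK text).
            problems.append(f"{name}: missing")
        elif field.confidence < MIN_CONFIDENCE:
            # The OCR read something but is not sure about it.
            problems.append(f"{name}: low confidence ({field.confidence:.2f})")
    if problems:
        return ("manual_review", problems)
    return ("auto_approve", [])
```

The reasons list is the useful part: it tells whoever picks up the exception which fields to check instead of making them re-key the whole invoice.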
This is seriously good advice. We went through a similar process ourselves, kicking the tires on a few of the options you mentioned. In the end we landed on Lido, which has been pretty solid for us.
The biggest game-changer for us was that it doesn’t need templates. We’re juggling invoices from probably 40+ different international suppliers, so having something that adapts to each layout instead of forcing us into rigid templates was a lifesaver. It made a huge difference in our workflow.
Hey there! Just wanted to chime in with our experience, for what it’s worth. We’re a decent-sized company, about 200 people, and we process around 1,200 invoices a month. So yeah, I totally get the struggle with the character-set issues that come with international suppliers.
We initially gave Tesseract a whirl. Free is always tempting, right? But the accuracy, especially on our more complicated or just plain messy documents, was pretty brutal. We were seeing maybe 60-70% at best, which wasn’t cutting it for us. We spent more time correcting the output than it saved.
So we ended up switching to Rossum, and it’s been a game-changer. We’re consistently hitting 95%+ accuracy now, which is a huge relief. Hope that gives you some perspective!
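For anyone still testing Tesseract on mixed-script invoices: out of the box it only uses its English model, so CJK text needs the matching language packs installed and requested explicitly, joined with `+`. Below is a minimal sketch, assuming `pytesseract` and Pillow are installed along with the `eng`, `chi_sim`, `jpn`, and `kor` traineddata files; the helper names are mine, not part of any library. I’m not claiming this closes the accuracy gap described above, only that it’s worth ruling out before writing Tesseract off.

```python
def tesseract_lang_arg(codes):
    """Join Tesseract language codes with '+', dropping duplicates in order."""
    seen = []
    for code in codes:
        if code not in seen:
            seen.append(code)
    return "+".join(seen)


def ocr_invoice(image_path, codes=("eng", "chi_sim", "jpn", "kor")):
    """Run Tesseract on one image with several language models at once."""
    # Imported here so the helper above works without Tesseract installed.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(
        Image.open(image_path), lang=tesseract_lang_arg(codes)
    )
```

If you know which supplier an invoice came from, passing only the languages you expect (for example `("eng", "jpn")`) keeps the engine from guessing between scripts it doesn’t need.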