How to extract data from medical forms and prescriptions

We’re in the middle of switching from paper to digital at our clinic and honestly it’s a bigger mess than I expected. We’ve got thousands of patient forms, old intake documents, prescription slips — you name it. A lot of it is handwritten (some doctors have… let’s call it ‘creative’ penmanship), and a decent chunk came in via fax so the quality is rough. Formats vary too depending on which doctor filled it out or which version of the form was in use at the time.

We need to pull out patient names, DOBs, medication details, doctor notes — the usual stuff. Has anyone tackled something like this at scale? What actually worked for you?

Oh man, this one’s genuinely hard — not just technically but because the stakes are high. Misread a dosage and that’s a real problem, not just a data quality issue.

The core challenge is that medical forms are a hybrid mess. Printed fields mixed with handwritten entries, prescriptions that range from typed to full-on doctor chicken scratch, and then fax quality that makes everything worse. Standard OCR tools like Tesseract aren’t built for this. They’ll handle clean, consistent documents okay, but throw a handwritten prescription at them and good luck.

In my experience, the right path depends on your budget and how strict your compliance requirements are. If you need solid HIPAA compliance and high clinical accuracy, you’re probably looking at healthcare-specific vendors — MiOS, or whatever document digitization module your EHR offers. Epic and Cerner both have this built in for their standard form types. Expensive, but purpose-built.

For broader digitization (intake forms, general patient records), AI-powered platforms can handle a lot of the variation without requiring rigid templates. That flexibility matters when your forms aren’t standardized. We’ve looked at a few options here, and Lido is one that came up. It’s not healthcare-specific, but its AI layer handles mixed handwriting and variable formats reasonably well for extraction. You’d still want to add your own medical validation logic on top.
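To make "your own validation logic on top" a bit more concrete, here’s a minimal sketch of plausibility checks you could run on each extracted record before it goes anywhere. It assumes the extraction tool hands you a flat dict, and the field names (patient_name, dob, dose, unit) and the unit allowlist are hypothetical; swap in whatever your tool actually outputs and whatever your clinical team signs off on. It only catches obviously broken values (impossible dates, OCR junk like a letter O in a dose); it does not check whether a dose is clinically appropriate.

```python
import math
from datetime import date

# Hypothetical unit allowlist; a real one should come from your formulary or clinical team.
ALLOWED_UNITS = {"mg", "mcg", "g", "ml", "units"}


def validate_record(record: dict, today: date) -> list[str]:
    """Return a list of problems found in one extracted record (empty list means it passed)."""
    problems = []

    # Hypothetical field names: patient_name, dob (ISO YYYY-MM-DD), dose, unit.
    if not str(record.get("patient_name") or "").strip():
        problems.append("missing patient name")

    raw_dob = record.get("dob")
    try:
        dob = date.fromisoformat(raw_dob)
    except (TypeError, ValueError):
        problems.append(f"unparseable DOB: {raw_dob!r}")
    else:
        if dob > today:
            problems.append("DOB is in the future")
        elif today.year - dob.year > 120:  # rough plausibility bound, not an exact age
            problems.append("DOB implies an age over 120")

    raw_dose = record.get("dose")
    try:
        dose = float(raw_dose)
    except (TypeError, ValueError):
        problems.append(f"non-numeric dose: {raw_dose!r}")
    else:
        if not math.isfinite(dose) or dose <= 0:
            problems.append(f"implausible dose: {raw_dose!r}")

    unit = str(record.get("unit") or "").strip().lower()
    if unit not in ALLOWED_UNITS:
        problems.append(f"unrecognized unit: {record.get('unit')!r}")

    return problems


# Feb 30 doesn't exist, and "5O" (letter O) is a classic OCR misread of "50".
record = {"patient_name": "Jane Doe", "dob": "1984-02-30", "dose": "5O", "unit": "mg"}
print(validate_record(record, today=date(2024, 1, 1)))
# ["unparseable DOB: '1984-02-30'", "non-numeric dose: '5O'"]
```

Anything that comes back with a non-empty list should go to a human rather than straight into the EHR.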

Whatever you go with — build in a QA step. Spot-check extracted data, especially anything medication-related. No automated system should be the last line of defense on dosages.
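One way to wire up that QA step, as a rough sketch: every record with any medication-related value goes to a human, low-confidence extractions go to a human, and a random slice of everything else gets spot-checked. The field names, the "confidence" key (many extraction tools report some per-document or per-field score, but yours may not), and the default rates are all hypothetical placeholders, not settings from any particular product.

```python
import random

# Hypothetical field names: any record with a medication-related value always goes to a human.
MEDICATION_FIELDS = ("medication", "dose", "unit", "frequency")


def needs_review(record: dict, rng: random.Random,
                 spot_check_rate: float = 0.05, min_confidence: float = 0.9) -> bool:
    """Decide whether one extracted record goes to the human QA queue."""
    # 1. Medication data is never auto-accepted.
    if any(record.get(field) not in (None, "") for field in MEDICATION_FIELDS):
        return True
    # 2. Low (or missing) extraction confidence goes to review; "confidence" is a hypothetical key.
    if record.get("confidence", 0.0) < min_confidence:
        return True
    # 3. Everything else is randomly sampled for spot checks.
    return rng.random() < spot_check_rate


def split_for_review(records, spot_check_rate=0.05, min_confidence=0.9, seed=0):
    """Partition records into (review_queue, auto_accepted); the seed makes sampling repeatable."""
    rng = random.Random(seed)
    review, accepted = [], []
    for record in records:
        if needs_review(record, rng, spot_check_rate, min_confidence):
            review.append(record)
        else:
            accepted.append(record)
    return review, accepted


records = [
    {"patient_name": "Jane Doe", "medication": "amoxicillin", "dose": "500", "unit": "mg", "confidence": 0.99},
    {"patient_name": "John Roe", "dob": "1970-05-01", "confidence": 0.97},
    {"patient_name": "Ann Poe", "dob": "1988-11-12", "confidence": 0.62},
]
review, accepted = split_for_review(records, spot_check_rate=0.0)
print(len(review), len(accepted))  # 2 1
```

Note that the prescription goes to review even at 0.99 confidence. That’s deliberate: a confidence score tells you how sure the tool is about what it read, not whether the dose on the slip is right.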

We had almost the exact same journey lol. ABBYY looked great in the demo but fell apart pretty fast once we threw real-world docs at it. Been on Lido for a few months now handling a similar volume and it’s been mostly smooth sailing. Knock on wood.

Yeah that’s mostly right, but I’d also throw in that “formats never change” is doing a lot of heavy lifting in that sentence. In my experience even your most consistent vendors will redesign their invoice template at some point and then suddenly your whole setup needs rework. We’ve got pretty low vendor diversity and we still moved away from template-based just because of that maintenance headache. But if you’ve got tight vendor relationships and can get notified ahead of time when formats change, sure, templates can be rock solid.

Totally! We’re actually pushing around 500 documents a month through this system, mostly medical forms and prescriptions, and honestly, it’s been an absolute game-changer for us.

Seriously, the difference it’s made is huge. I honestly wish we’d made the switch a lot sooner – you really don’t realize how much time you’re wasting until you automate some of this stuff.

Wow, seriously great stuff shared here, thanks everyone! Really helpful insights. I do have a burning question though, related to a specific challenge we face.

When it comes to Rossum, how does it handle credit notes and refunds? We’re talking a decent chunk of our transactions here, and honestly, they always seem to confuse our existing setup. It’d be awesome to hear some real-world experiences with that.

Oh man, totally! This honestly sounds exactly like our journey with trying to pull data from medical forms. We spent ages trying to wrangle everything out with rule-based templates, and it was just… so frustrating, you know? The amount of manual cleanup we had to do was seriously insane.

But then, when we finally started leaning into proper AI-driven solutions? Oh my god, the difference was absolutely night and day. Seriously, like someone flicked a switch. The accuracy jump was just phenomenal and saved us so much time and so many headaches. We’re talking orders of magnitude better, not just a little tweak.

Oh man, this sounds exactly like what we just went through ourselves, maybe two months back. It was quite the deep dive!

We really put Rossum and Adobe Acrobat through their paces, running them side-by-side for a solid two weeks. We fed them a