{snip: my previous}
So I noticed. The PDFs are tables of data. Almost everything I try extracts the data but converts the table grid lines into background images.
So, are you attempting to render this data in HTML as a set of tables (or table-like structures in CSS)? This is beginning to sound like a task someone, somewhere, has already written a Perl script for. The approach here would be to take your first-pass conversion and run it through that Perl script to do the slice-n-dice and generate output in the format your style sheet requires.
I wouldn't worry too much about that background image. It can be sliced out pretty easily once the first-pass HTML is generated (it's really not needed, is it?), and the remaining data can then be processed.
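If your converter writes the grid lines out as ordinary <img> tags (an assumption; check what your tool actually emits), slicing them out of the first-pass HTML can be as simple as one regex pass. Here's a minimal sketch in Python rather than Perl; the function name is just for illustration:

```python
import re

# Assumption: the first-pass converter emits the table grid lines as plain
# <img ...> tags, with the text sitting in separate elements around them.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def strip_background_images(page_html: str) -> str:
    """Return the HTML with every <img ...> tag removed, leaving the text intact."""
    return IMG_TAG.sub("", page_html)


if __name__ == "__main__":
    page = '<div><img src="page001.png" alt="background image"/><p>Smith, John</p></div>'
    print(strip_background_images(page))
```

A regex is fine for a blunt job like deleting one tag type. If the converter paints the grid with CSS background-image rules instead, you'd need to edit the style sheet rather than the markup.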
I'm not a Perl monger myself (I've written a few very basic scripts), but I know of a few places where useful scripts are available for download:
http://www.perlscriptsjavascripts.com/ - PerlScriptsJavaScripts.com is a repository of a wide array of Perl scripts, some free and some for pay, and they also offer custom scripts for hire.
http://www.scripts.com/perl-scripts/ - Perl Scripts is another combination repository that's well organized by function.
http://savage.net.au/Perl.html - Ron Savage's Perl Scripts page - all of Ron's stuff is free, open source code. Take particular note that he has already written some scripts for reading genealogy data.
http://www.bewley.net/perl/ - Dale Bewley's Perl Scripts page - Dale offers custom scripting services, too, and has several scripts up on his page for download (but nothing that looks like what you need).
These are just a few of the pages out there dedicated to scripting. You might run a search to see whether someone has already chewed some of the same ground you're chewing now; a little searching on the front end might save you a lot of work on the back end.
Failing the discovery or creation of some kind of tool to help do this work, yeah, you're exactly where kjpetrie says... and the sooner out, the sooner done.
{snip: more of my previous}
I can't seem to locate it; no one will admit anything. It's hand-extracted data from the will record books at the courthouse, five books' worth. I'm not about to start over, so I've resigned myself to a lot of concatenation coupled with extensive find-and-replace. It'll take a while, but I'll just keep plugging away until I get there. I would like to strangle whoever decided PDFs were the way to go.
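For the concatenation and find-and-replace route, it may help to keep every replacement in one ordered list and run it over all the extracted text at once, so a fix worked out for one book gets applied to all five. Here's a rough Python sketch; the cleanup rules, and the assumption that runs of two or more spaces mark column breaks, are hypothetical and would need adjusting to what the PDF text really looks like:

```python
import html
import re

# Hypothetical cleanup rules, applied in order. Adjust to the real extracted text.
RULES = [
    (re.compile(r"[ \t]+\n"), "\n"),    # drop trailing spaces at line ends
    (re.compile(r"-\n(?=[a-z])"), ""),  # rejoin words hyphenated across lines
    (re.compile(r"[ \t]{2,}"), "\t"),   # assume 2+ spaces mark a column break
]


def clean(text: str) -> str:
    """Run every find-and-replace rule over the text, in order."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


def to_rows(text: str) -> str:
    """Turn cleaned, tab-separated lines into HTML table rows."""
    rows = []
    for line in clean(text).splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        cells = "".join(
            f"<td>{html.escape(cell.strip())}</td>" for cell in line.split("\t")
        )
        rows.append(f"<tr>{cells}</tr>")
    return "\n".join(rows)


if __name__ == "__main__":
    sample = "Smith  John   1852 \nJones  Mary   1860\n"
    print(to_rows(sample))
```

To cover all five books, you could read each book's extracted text file, join the strings together, and pass the result to to_rows in one go.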
Of course, the guy running the web site before me used images for EVERYTHING! Even most of the text. And absolute positioning, so any change mucked things up pretty badly. When I took it over, a little over a year ago, most of the text appeared to be scanned images. And the menu buttons were three slightly different graphic images used for the initial, mouseover, and visited states. For a long time I was stuck adding more pages, as I couldn't manage to duplicate the buttons. I finally got it all converted to pure HTML, using CSS and JavaScript to control the appearance.
This is so typical. Don't judge your predecessor too harshly; he was likely just muddling through until he could do better. So many folks don't keep up with web technologies anymore because they're such a crazy quilt.
He may have done everything as images on the advice of an ill-informed attorney who believed it's somehow harder to alter image data than HTML or text.
These PDF files are the last things I need to convert.
By the way, if you're interested, here's the URL:
http://www.douglascountygensoc.org/
I'll give it a look. I hope you can find a script out there that gets you where you want to go; otherwise, it'll be a lot of work.
Later On,
D