I want to prevent automated HTML scraping from one of our sites while not affecting legitimate spidering (Googlebot, etc.). Is there something that already exists to accomplish this?
One approach is to set up an HTTP tar pit: embed a link that will only be visible to automated crawlers. The link should point to a page stuffed with random text and links back to itself under varying URLs (/tarpit/foo.html, /tarpit/bar.html, /tarpit/baz.html), with a single script at /tarpit/ handling every request and returning a 200 result.
To keep the good guys out of the pit, generate a 302 redirect to your home page if the user agent belongs to Google or Yahoo.
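A minimal sketch of what that could look like, assuming Flask as the web framework (my choice, not part of the original answer) and simple substring matching on the User-Agent header:

```python
# Sketch of an HTTP tar pit: every path under /tarpit/ returns 200 with
# random filler text and links leading deeper into the pit; known good
# bots get a 302 back to the home page instead.
import random
import string
from flask import Flask, redirect, request

app = Flask(__name__)

# Substrings of well-known good crawlers' user agents (illustrative list).
GOOD_BOTS = ("googlebot", "yahoo")

def random_words(n):
    """Generate n pseudo-random lowercase 'words' of filler text."""
    return " ".join(
        "".join(random.choices(string.ascii_lowercase, k=random.randint(3, 10)))
        for _ in range(n)
    )

@app.route("/tarpit/", defaults={"path": ""})
@app.route("/tarpit/<path:path>")
def tarpit(path):
    ua = request.headers.get("User-Agent", "").lower()
    if any(bot in ua for bot in GOOD_BOTS):
        # Send legitimate crawlers back to the home page.
        return redirect("/", code=302)

    # Random text plus links that only lead further into the pit.
    links = "".join(
        '<a href="/tarpit/{0}.html">{0}</a> '.format(random_words(1))
        for _ in range(10)
    )
    body = "<html><body><p>{}</p>{}</body></html>".format(random_words(200), links)
    return body, 200

if __name__ == "__main__":
    app.run()
```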
It isn't perfect, but it will at least slow down the naive ones.
EDIT: As suggested by Constantin, you could mark the tar pit as off-limits in robots.txt. The good guys use web spiders that honor this protocol, so they will stay out of the tar pit. This would probably remove the need to generate redirects for known good bots.
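For reference, the robots.txt entry would look something like this (assuming the tar pit lives under /tarpit/ as in the example above):

```
User-agent: *
Disallow: /tarpit/
```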