Question
Using Java, what is the best way to extract metadata from a website?
I am planning to request the entire page and then find where the metadata is located in it. This seems cumbersome; is there a better way to achieve this?
Answer 1:
Cumbersome as it is, it's practically the only way, as far as I know.
What you can do is read only the first few bytes, say 2000. This might save some time, but it does not guarantee that all meta tags will be read.
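A minimal sketch of the fixed-prefix idea, assuming the page is UTF-8 (real pages may declare another charset) and that the caller supplies the stream, for example from URLConnection.getInputStream(). The class name HeadPrefixReader and the method readPrefix are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class HeadPrefixReader {

    // Reads at most maxBytes from the stream and decodes them as UTF-8.
    // UTF-8 is an assumption; a multi-byte character cut off at the limit
    // will decode as a replacement character.
    static String readPrefix(InputStream in, int maxBytes) throws IOException {
        byte[] buf = new byte[maxBytes];
        int total = 0;
        // A single read() may return fewer bytes than requested, so loop
        // until the buffer is full or the stream ends.
        while (total < maxBytes) {
            int n = in.read(buf, total, maxBytes - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return new String(buf, 0, total, StandardCharsets.UTF_8);
    }
}
```

Calling readPrefix(stream, 2000) returns roughly the first 2000 bytes of markup; any meta tag that starts after that point is simply not seen, which is the trade-off described above.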
Another way is to read in chunks and scan for the string </head>; if it is not found, continue reading. This could potentially take longer for pages with a large <head> tag, though.
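The chunked approach could be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: it assumes UTF-8 input, matches </head> case-insensitively, and stops after roughly maxChars characters so a page without a closing head tag cannot be read indefinitely. HeadSectionReader, readUntilHeadEnd, chunkSize and maxChars are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class HeadSectionReader {

    private static final String MARKER = "</head>";

    // Reads the stream chunk by chunk and returns everything up to and
    // including the first </head>. If the marker is never found, returns
    // whatever was read before the stream ended or the limit was reached.
    static String readUntilHeadEnd(InputStream in, int chunkSize, int maxChars)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8); // assumed charset
        StringBuilder sb = new StringBuilder();
        char[] chunk = new char[chunkSize];
        int n;
        while (sb.length() < maxChars && (n = reader.read(chunk)) != -1) {
            // Start scanning slightly before the new data so that a marker
            // split across two chunks is still detected.
            int from = Math.max(0, sb.length() - MARKER.length());
            sb.append(chunk, 0, n);
            int idx = indexOfIgnoreCase(sb.toString(), MARKER, from);
            if (idx >= 0) {
                return sb.substring(0, idx + MARKER.length());
            }
        }
        return sb.toString();
    }

    // Case-insensitive search for marker in text, starting at index from.
    private static int indexOfIgnoreCase(String text, String marker, int from) {
        for (int i = from; i + marker.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, marker, 0, marker.length())) {
                return i;
            }
        }
        return -1;
    }
}
```

The returned string can then be searched for meta tags. The maxChars cap is a design choice added here, not part of the answer above; it bounds the cost for pages with a very large or unterminated head section.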
Raw HTML shouldn't take too long to process anyway.
Source: https://stackoverflow.com/questions/5468385/java-web-site-meta-data