Question
I've been searching for npm packages, but they all seem unmaintained and rely on outdated user-agent databases. Is there a reliable, up-to-date package that helps me detect crawlers (mostly from Google, Facebook, etc., for SEO)? If there is no such package, can I write it myself, probably based on an up-to-date user-agent database?
To be clearer, I'm trying to build an isomorphic/universal React website. I want it to be indexed by search engines, and I want Facebook to be able to fetch its title and metadata, but I don't want to pre-render on every normal request, so that the server is not overloaded. The solution I'm thinking of is to pre-render only for requests from crawlers.
Answer 1:
I have nothing to add to your search for npm packages. But as for an up-to-date user-agent database on which to build your own package, I would recommend ua.theafh.net.
At the moment it has data up to Nov 2014 and, as far as I know, with more than 5.4 million agents it is also the largest search engine for user agents.
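If you do end up writing the check yourself, a minimal sketch could simply match the User-Agent header against a short list of crawler tokens. The names `CRAWLER_TOKENS` and `isCrawler` are made up for illustration, and the token list is a small hand-picked assumption, not an exhaustive database; you would have to maintain it from a source like the one above.

```javascript
// Hypothetical, hand-maintained list of substrings that some well-known
// crawlers include in their User-Agent header. Deliberately not exhaustive.
const CRAWLER_TOKENS = [
  'googlebot',
  'bingbot',
  'facebookexternalhit',
  'twitterbot',
  'linkedinbot',
];

// Returns true when the User-Agent string contains any of the tokens above.
// A missing or empty header is treated as "not a crawler".
function isCrawler(userAgent) {
  if (typeof userAgent !== 'string' || userAgent.length === 0) {
    return false;
  }
  const ua = userAgent.toLowerCase();
  return CRAWLER_TOKENS.some((token) => ua.includes(token));
}
```

Substring matching like this is easy to spoof and will miss any crawler that is not on the list, so it is only a starting point.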
Answer 2:
The best solution I've found is the useragent library, which allows you to do this:
var useragent = require('useragent');
// for an actual request use: useragent.parse(req.headers['user-agent']);
var agent = useragent.parse('Googlebot-News');
// will log true
console.log(agent.device.toJSON().family === 'Spider');
It is fast and kept up to date pretty well, so it seems like the best approach. You can run the above script in your browser on RunKit.
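To tie this back to the question, here is a rough sketch of how such a check might be wired into Express so that only crawler requests are pre-rendered. `prerenderForCrawlers`, `isCrawler` and `prerender` are hypothetical names: the caller supplies the detection function and the rendering function, and the latter is assumed to return HTML or a promise of HTML.

```javascript
// Builds an Express-style middleware that pre-renders only for crawlers.
// isCrawler(userAgent) -> boolean and prerender(req) -> string | Promise<string>
// are both supplied by the caller (hypothetical helpers, not library APIs).
function prerenderForCrawlers(isCrawler, prerender) {
  return function (req, res, next) {
    const userAgent = req.headers['user-agent'] || '';

    if (!isCrawler(userAgent)) {
      // Normal visitors skip pre-rendering and get the client-rendered app.
      return next();
    }

    // Crawlers get fully rendered HTML; rendering errors go to Express.
    Promise.resolve(prerender(req))
      .then((html) => res.send(html))
      .catch(next);
  };
}
```

With the library above, the detection function could be something like `(ua) => useragent.parse(ua).device.toJSON().family === 'Spider'`, and the middleware would be registered with `app.use(...)` before the route that serves the regular page.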
Source: https://stackoverflow.com/questions/34647657/how-to-detect-web-crawlers-for-seo-using-express