How to detect web crawlers for SEO, using Express?

Question

I've been searching for npm packages, but they all seem unmaintained and rely on outdated user-agent databases. Is there a reliable, up-to-date package that helps me detect crawlers (mostly from Google, Facebook, etc., for SEO)? Or, if there is no such package, can I write it myself, probably based on an up-to-date user-agent database?

To be clearer: I'm trying to build an isomorphic/universal React website. I want it to be indexed by search engines, and I want Facebook to be able to fetch its title and meta data. However, I don't want to pre-render on every normal request, so that the server is not overloaded. The solution I'm thinking of is to pre-render only for requests from crawlers.

Answer 1:

I have nothing to add on your search for npm packages. But as for an up-to-date user agent database for building your own package, I would recommend ua.theafh.net

At the moment it has data up to Nov 2014 and, as far as I know, with more than 5.4 million agents it is also the largest search engine for user agents.

Answer 2:

The best solution I've found is the useragent library, which allows you to do this:

var useragent = require('useragent');
// for an actual request use: useragent.parse(req.headers['user-agent']);
var agent = useragent.parse('Googlebot-News');

// will log true
console.log(agent.device.toJSON().family === 'Spider')

It is fast and kept up-to-date pretty well, so it seems like the best approach. You can run the above script in your browser on RunKit.
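To connect this back to the question's idea of pre-rendering only for crawlers, here is a minimal sketch of an Express-style middleware. It does not use the useragent library; it assumes a small, hand-maintained pattern of crawler tokens instead, so the list is illustrative rather than complete. The names CRAWLER_PATTERN, isCrawler and prerenderForCrawlers are hypothetical, as is the prerender handler you would pass in. Keep in mind that the User-Agent header can be spoofed, so treat this only as a routing hint.

```javascript
// Hypothetical, hand-maintained list of crawler tokens (an assumption, not
// an exhaustive database); extend it for the crawlers you care about.
const CRAWLER_PATTERN =
  /googlebot|bingbot|yandex|baiduspider|duckduckbot|facebookexternalhit|twitterbot|linkedinbot/i;

// Returns true when the User-Agent string contains a known crawler token.
function isCrawler(userAgent) {
  return typeof userAgent === 'string' && CRAWLER_PATTERN.test(userAgent);
}

// Express-style middleware factory: requests from crawlers are handed to
// the supplied `prerender` handler, everything else falls through to next().
function prerenderForCrawlers(prerender) {
  return function (req, res, next) {
    if (isCrawler(req.headers['user-agent'])) {
      return prerender(req, res, next);
    }
    next();
  };
}
```

In an Express app this could be mounted with app.use(prerenderForCrawlers(renderOnServer)), where renderOnServer stands for whatever server-side render handler you already have. Normal visitors skip it and get the client-rendered page.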

Source: https://stackoverflow.com/questions/34647657/how-to-detect-web-crawlers-for-seo-using-express

Tags

npm

web-crawler

user-agent
