How do search engines find relevant content?

无人共我 2021-01-29 20:03

How does Google find relevant content when it's parsing the web?

Let's say, for instance, Google uses the PHP native DOM Library to parse content. What methods would t

13 answers

不知归路 (original poster)

2021-01-29 20:26
I'm facing the same problem right now, and after a few attempts I found an approach that works for creating a webpage snippet (it still needs fine-tuning):
- take all the HTML
- remove script and style tags inside the body WITH THEIR CONTENT (important)
- remove unnecessary spaces, tabs, and newlines
- navigate through the DOM to collect div, p, article, td (others?) elements and, for each one:
  - take the HTML of the current element
  - take a "text only" version of the element's content
  - assign the element the score: text length * text length / HTML length
- sort all the scores and take the highest
This is a quick (and dirty) way to identify the longest texts with a relatively low proportion of markup, which is typical of normal content. In my tests this seems to work really well. Just add water ;)
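
The steps above could be sketched roughly as follows. This is a minimal illustration in Python using only the standard-library `html.parser`, not PHP's DOM extension. The names `BlockScorer` and `best_block` are hypothetical, and counting an element's own opening and closing tags toward its HTML length is an assumption of this sketch, since the description does not say whether inner or outer HTML is meant.

```python
import re
from html.parser import HTMLParser

CANDIDATES = {"div", "p", "article", "td"}  # block types named in the steps above
SKIPPED = {"script", "style"}               # dropped together with their content


class BlockScorer(HTMLParser):
    """Scores every candidate element as text_len ** 2 / html_len."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_blocks = []  # stack of candidate elements currently open
        self.results = []      # (score, text) for every finished candidate
        self.skip_depth = 0    # > 0 while inside <script> or <style>

    def _add_markup(self, length):
        # Markup counts toward the HTML length of every enclosing candidate.
        for block in self.open_blocks:
            block["html_len"] += length

    def _finish(self, block):
        text = re.sub(r"\s+", " ", "".join(block["text"])).strip()
        if text and block["html_len"]:
            self.results.append((len(text) ** 2 / block["html_len"], text))

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
            return
        if tag in CANDIDATES:
            self.open_blocks.append({"tag": tag, "html_len": 0, "text": []})
        self._add_markup(len(self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> only contribute markup.
        if not self.skip_depth:
            self._add_markup(len(self.get_starttag_text()))

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        self._add_markup(len(tag) + 3)  # length of "</tag>"
        if tag in CANDIDATES:
            # Close the nearest open block with this name, plus anything
            # left unclosed inside it.
            for i in range(len(self.open_blocks) - 1, -1, -1):
                if self.open_blocks[i]["tag"] == tag:
                    while len(self.open_blocks) > i:
                        self._finish(self.open_blocks.pop())
                    break

    def handle_data(self, data):
        if self.skip_depth:
            return
        data = re.sub(r"\s+", " ", data)  # collapse spaces, tabs, newlines
        for block in self.open_blocks:
            block["html_len"] += len(data)
            block["text"].append(data)


def best_block(html):
    """Return (score, text) of the highest-scoring candidate element."""
    scorer = BlockScorer()
    scorer.feed(html)
    scorer.close()
    while scorer.open_blocks:  # score elements that were never closed
        scorer._finish(scorer.open_blocks.pop())
    return max(scorer.results, key=lambda r: r[0], default=(0.0, ""))
```

Calling `best_block(page_html)` returns the winning score together with its text. Because the text length is squared, large wrapper elements can outrank the paragraph inside them, so a real implementation would probably need extra tuning on top of this.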

In addition to this, you can look at the "og:" meta tags, the title and description, the h1, and plenty of other minor signals.
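
Collecting those extra signals might look something like the sketch below, again in Python with the standard-library `html.parser`. `MetaHints` and `extract_hints` are hypothetical names, and keeping only the first occurrence of each key is an assumption made here for simplicity.

```python
from html.parser import HTMLParser


class MetaHints(HTMLParser):
    """Collects og:* meta tags, the description, the title and the first h1."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.hints = {}
        self._capture = None  # "title" or "h1" while inside that element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name") or ""
            if key.startswith("og:") or key == "description":
                # Keep the first value seen for each key.
                self.hints.setdefault(key, attrs.get("content") or "")
        elif tag in ("title", "h1") and tag not in self.hints:
            self._capture = tag
            self.hints[tag] = ""

    def handle_endtag(self, tag):
        if tag == self._capture:
            # Normalize whitespace once the element is complete.
            self.hints[tag] = " ".join(self.hints[tag].split())
            self._capture = None

    def handle_data(self, data):
        if self._capture:
            self.hints[self._capture] += data


def extract_hints(html):
    """Return a dict of the metadata hints found in the page."""
    parser = MetaHints()
    parser.feed(html)
    parser.close()
    return parser.hints
```

The resulting dict can serve as a fallback when the scoring approach picks a poor block, for example by preferring `og:description` for very short pages.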