How do search engines find relevant content?

无人共我 2021-01-29 20:03

How does Google find relevant content when it's parsing the web?

Let's say, for instance, Google uses the PHP native DOM Library to parse content. What methods would t

13 answers
  •  不知归路
    2021-01-29 20:26

    I'm facing the same problem right now, and after a few attempts I found an approach that works for creating a webpage snippet (it still needs fine-tuning):

    • take all the html
    • remove script and style tags inside the body WITH THEIR CONTENT (important)
    • remove unnecessary spaces, tabs, newlines.
    • now walk the DOM and collect div, p, article and td elements (others?) and, for each one:
      • take the html of the current element
      • take a "text only" version of the element content
      • assign the element the score: text length * text length / html length
    • now sort the scores and take the highest.
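
    A rough Python sketch of the steps above, in case it helps. It uses the standard library's html.parser instead of the PHP DOM library from the question, and the names (BlockScorer, best_block, CANDIDATE_TAGS and so on) are made up for illustration. It also skips script and style everywhere, not only inside the body, and measures html length from the parsed markup, so treat it as a starting point rather than a finished implementation:

```python
import re
from html.parser import HTMLParser

CANDIDATE_TAGS = {"div", "p", "article", "td"}  # elements that get a score
SKIPPED_TAGS = {"script", "style"}              # dropped together with their content
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class BlockScorer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []      # open elements: [tag, html_length, text_chunks]
        self.skip_depth = 0  # > 0 while inside <script> or <style>
        self.results = []    # (score, tag, text) for each closed candidate

    def _add_markup(self, markup):
        # Markup counts toward the html length of every open element.
        for entry in self.stack:
            entry[1] += len(markup)

    def _finish(self, entry):
        tag, html_length, chunks = entry
        if tag not in CANDIDATE_TAGS:
            return
        text = " ".join("".join(chunks).split())
        if text:
            # score = text length * text length / html length
            self.results.append((len(text) ** 2 / html_length, tag, text))

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        markup = self.get_starttag_text()
        self._add_markup(markup)
        if tag not in VOID_TAGS:
            self.stack.append([tag, len(markup), []])

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or not any(e[0] == tag for e in self.stack):
            return
        # Close any children that were left open, then the element itself.
        while self.stack[-1][0] != tag:
            self._finish(self.stack.pop())
        self._add_markup(f"</{tag}>")
        self._finish(self.stack.pop())

    def handle_data(self, data):
        if self.skip_depth:
            return
        data = re.sub(r"\s+", " ", data)  # collapse spaces, tabs, newlines
        for entry in self.stack:
            entry[1] += len(data)
            entry[2].append(data)


def best_block(html):
    """Return (score, tag, text) for the highest-scoring element, or None."""
    scorer = BlockScorer()
    scorer.feed(html)
    scorer.close()
    while scorer.stack:  # score elements that were never closed
        scorer._finish(scorer.stack.pop())
    return max(scorer.results, key=lambda r: r[0], default=None)
```

    Calling best_block(page_html) gives back the winning element's score, tag name and whitespace-normalised text, which can then be truncated into a snippet. Squaring the text length favours long blocks, while dividing by the html length penalises markup-heavy ones such as navigation bars.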

    This is a quick (and dirty) way to identify the longest text blocks with a relatively low proportion of markup, which is what normal content tends to look like. In my tests it works really well. Just add water ;)

    In addition, you can look at the "og:" meta tags, the title and description, the h1, and use plenty of other minor techniques.
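
    Collecting those extra hints could look something like the sketch below. Again this is Python with the standard html.parser, the names MetaCollector and extract_metadata are invented for the example, and it only keeps the first title and the first h1 it sees:

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.found = {}       # e.g. {"og:title": ..., "description": ..., "h1": ...}
        self._capture = None  # "title" or "h1" while inside that element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Open Graph tags use property="og:...", the description uses name=.
            key = (attrs.get("property") or attrs.get("name") or "").lower()
            content = attrs.get("content")
            if content and (key.startswith("og:") or key == "description"):
                self.found.setdefault(key, content.strip())
        elif tag in ("title", "h1") and tag not in self.found:
            self._capture, self._buffer = tag, []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._capture:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.found[tag] = text
            self._capture = None


def extract_metadata(html):
    """Return a dict with any og:* tags, description, title and first h1 found."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.found
```

    The resulting dict can be used as a fallback when no block gets a decent score, or to cross-check the snippet picked by the scoring step.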
