What does "weight" in PyPI search results mean, and does it help in choosing a package?

Submitted by 跟風遠走 on 2019-12-06 04:36:33

Question


When I search for "XML parse" on PyPI, the matched results are listed according to "weight". When I hover my mouse over "weight", it says "occurrence of search term weighted by field (name, summary, keywords, description, author, maintainer)".

  • What does "weighted by field (name, summary, keywords, description, author, maintainer)" mean?

  • Ideally, is a higher-ranked package likely to be better than a lower-ranked one?

Thanks.


Answer 1:


Interesting question! I cloned the pypi repository and searched for "weight", which gave me this line:

    ./templates/index.pt:15: <th tal:condition="exists:data/scores"><u title="Occurrence of search term weighted by field (name, summary, keywords, description, author, maintainer)">Weight*</u></th>

Then based on that I searched for "scores", which led me to the search function. In that function, it defines the weight given to the different columns:

    columns = [
        ('name', 4),      # doubled for exact (case-insensitive) match
        ('summary', 2),
        ('keywords', 2),
        ('description', 1),
        ('author', 1),
        ('maintainer', 1),
    ]

So if a search term appears in the package's name, it contributes 4 to the score; if it appears in the summary, it contributes 2; and so on. This is computed for each term in the query, and the results are added up.

In your example, for "XML parse", the top package is Products.ParsedXML. The score is calculated something like this:

  • Name: "Products.ParsedXML" = 4 + 4 = 8
  • Summary: "Parsed XML allows you..." = 2 + 2 = 4
  • Keywords: "parsedxml xml zope2" = 2 + 2 = 4
  • Description: "Parsed XML allows you to..." = 1 + 1 = 2
  • Author: "Zope community, and various others contributors" = 0
  • Maintainer: (empty) = 0
  • Total = 8 + 4 + 4 + 2 + 0 + 0 = 18

And 18 is indeed the score on the search result page.
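The calculation above can be reproduced with a short sketch. This is not PyPI's actual search function; it only reuses the `columns` weights quoted earlier. It assumes matching is a case-insensitive substring test and that the "doubled" comment applies when the name equals the search term. The `score` function and the `parsed_xml` dictionary are hypothetical names introduced for illustration, and the field values are the truncated excerpts shown in the list above.

```python
# Field weights as quoted from the search function above.
COLUMNS = [
    ("name", 4),  # doubled for an exact (case-insensitive) match
    ("summary", 2),
    ("keywords", 2),
    ("description", 1),
    ("author", 1),
    ("maintainer", 1),
]


def score(package, query):
    """Add a field's weight once for every search term found in that field.

    Assumption: a term "matches" a field when it occurs in it as a
    case-insensitive substring.
    """
    total = 0
    for term in query.lower().split():
        for field, weight in COLUMNS:
            value = (package.get(field) or "").lower()
            if term not in value:
                continue
            # Assumed reading of the "doubled" comment: the name weight is
            # doubled only when the whole name equals the term.
            exact_name = field == "name" and value == term
            total += weight * 2 if exact_name else weight
    return total


# Field values copied from the worked example (truncated as shown there).
parsed_xml = {
    "name": "Products.ParsedXML",
    "summary": "Parsed XML allows you...",
    "keywords": "parsedxml xml zope2",
    "description": "Parsed XML allows you to...",
    "author": "Zope community, and various others contributors",
    "maintainer": "",
}

print(score(parsed_xml, "XML parse"))  # 18
```

Both terms match the name, summary, keywords, and description, and neither matches the author or maintainer, which gives 8 + 4 + 4 + 2 = 18.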

So, to get the best possible score, every field would need to match the desired search terms (also known as "keyword stuffing"). If you're thinking of publishing a package, though, I don't recommend trying to game the system. The scoring algorithm is simple because it relies on people being honest. If everyone stuffed extra keywords into these fields to get a higher score, it would be a mess and would ultimately give worse search results.




Answer 2:


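    # Weight assigned to each skill the job asks for
    jobskills = {'java': 10, 'python': 20, 'jquery': 5}

    # Skills the candidate actually has
    candidateskills = ['python', 'java', 'angular']

    # Keep only the candidate's skills that carry a weight
    foundskills = {k: jobskills[k] for k in candidateskills if k in jobskills}

    print(sum(foundskills.values()))  # 30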


Source: https://stackoverflow.com/questions/28685680/what-does-weight-on-search-results-in-pypi-help-in-choosing-a-package
