How does “weight” on search results in PyPI help in choosing a package?

Question

When I search for "XML parse" on PyPI, the matched results are listed according to "weight". When I hover my mouse over "weight", it says "occurrence of search term weighted by field (name, summary, keywords, description, author, maintainer)".

What does "weighted by field (name, summary, keywords, description, author, maintainer)" mean?
Is a package that is ranked higher most likely better than a package that is ranked lower?

Thanks.

Answer 1:

Interesting question! I cloned the pypi repository and searched for "weight", which gave me this line:

./templates/index.pt:15: <th tal:condition="exists:data/scores"><u title="Occurrence of search term weighted by field (name, summary, keywords, description, author, maintainer)">Weight*</u></th>

Then based on that I searched for "scores", which led me to the search function. In that function, it defines the weight given to the different columns:

    columns = [
        ('name', 4),      # doubled for exact (case-insensitive) match
        ('summary', 2),
        ('keywords', 2),
        ('description', 1),
        ('author', 1),
        ('maintainer', 1),
    ]

So if a search term appears in the package's name it contributes 4 to the score, if it appears in the summary it contributes 2, and so on. This is calculated for each term in the query, and the results are all added up.

In your example, for "XML parse", the top package is Products.ParsedXML. The score is calculated something like this:

Name: "Products.ParsedXML" = 4 + 4 = 8
Summary: "Parsed XML allows you..." = 2 + 2 = 4
Keywords: "parsedxml xml zope2" = 2 + 2 = 4
Description: "Parsed XML allows you to..." = 1 + 1 = 2
Author: "Zope community, and various others contributors" = 0
Maintainer: (empty) = 0
Total = 8 + 4 + 4 + 2 + 0 + 0 = 18

And 18 is indeed the score on the search result page.
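The calculation above can be reproduced with a short sketch. This is not the actual PyPI code: the `score` function, the dictionary layout, and the case-insensitive substring matching are assumptions made for illustration, and the handling of the "doubled for exact match" comment (counting the name weight twice) is a guess at what that comment means.

```python
# Field weights as listed in the answer above.
COLUMNS = [
    ("name", 4),
    ("summary", 2),
    ("keywords", 2),
    ("description", 1),
    ("author", 1),
    ("maintainer", 1),
]


def score(package, query):
    """Add a field's weight for every (term, field) pair where the term occurs.

    Hypothetical helper: matching is assumed to be a case-insensitive
    substring test, which is consistent with the worked example above.
    """
    total = 0
    for term in query.lower().split():
        for field, weight in COLUMNS:
            value = (package.get(field) or "").lower()
            if term in value:
                total += weight
                # Assumption: an exact name match counts the name weight twice.
                if field == "name" and value == term:
                    total += weight
    return total


# Field values copied from the worked example (truncated as shown there).
parsed_xml = {
    "name": "Products.ParsedXML",
    "summary": "Parsed XML allows you...",
    "keywords": "parsedxml xml zope2",
    "description": "Parsed XML allows you to...",
    "author": "Zope community, and various others contributors",
    "maintainer": "",
}

print(score(parsed_xml, "XML parse"))  # 18
```

Both "xml" and "parse" occur in the name, summary, keywords, and description, and neither occurs in the author or maintainer fields, which gives 8 + 4 + 4 + 2 = 18.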

So, to get the best possible score, you would need every field to match the desired search terms (AKA "keyword stuffing"). If you're thinking of publishing a package, I don't recommend trying to game the system, though. The scoring algorithm is simple because it relies on people being honest. If everyone stuffed extra keywords into these fields to get a higher score, it would be a mess and would ultimately give worse search results.

Answer 2:

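    # Weight assigned to each skill the job asks for.
    jobskills = {'java': 10, 'python': 20, 'jquery': 5}

    # Skills the candidate has; 'angular' has no weight for this job.
    candidateskills = ['python', 'java', 'angular']

    # Keep only the candidate skills that the job assigns a weight to.
    foundskills = {k: jobskills[k] for k in candidateskills if k in jobskills}

    # The total score is the sum of the matched weights.
    print(sum(foundskills.values()))  # 30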

Source: https://stackoverflow.com/questions/28685680/what-does-weight-on-search-results-in-pypi-help-in-choosing-a-package

Tags

python

pypi