Search of Dictionary Keys python

Submitted by 耗尽温柔 on 2019-12-21 09:22:25

Question


I want to know how I could build some kind of index over the keys of a Python dictionary. The dictionary holds approx. 400,000 items, so I am trying to avoid a linear search.

Basically, I am trying to find out whether the userinput is contained in any of the dict keys.

for keys in dict:
    if userinput in keys:
        DoSomething()
        break

That would be an example of what I am trying to do. Is there a way to search more directly, without a loop? Or what would be a more efficient way?

Clarification: The userinput is not exactly what the key will be, e.g. userinput could be log, whereas the key is logfile.

Edit: any list/cache creation, pre-processing or organisation that can be done prior to searching is acceptable. The only thing that needs to be quick is the search for the key.


Answer 1:


If you only need to find keys that start with a prefix then you can use a trie. More complex data structures exist for finding keys that contain a substring anywhere within them, but they take up a lot more space to store so it's a space-time trade-off.
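As a rough illustration of the prefix case, here is a minimal trie sketch built from nested dicts. The names Trie, keys_with_prefix and my_dict are made up for this example and are not from any library. Note that it only answers "which keys start with this prefix", so a key such as syslog would not be found when searching for log.

```python
class Trie:
    """Minimal prefix tree over a set of strings (illustrative sketch)."""

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        # The None edge marks "a key ends here"; it stores the full key.
        node[None] = word

    def keys_with_prefix(self, prefix):
        # Walk down to the node for the prefix; a missing edge means no matches.
        node = self.root
        for ch in prefix:
            node = node.get(ch)
            if node is None:
                return []
        # Collect every key stored below that node (iterative depth-first walk).
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            for edge, child in current.items():
                if edge is None:
                    found.append(child)
                else:
                    stack.append(child)
        return found


my_dict = {"logfile": 1, "login": 2, "lock": 3, "syslog": 4}  # hypothetical data
trie = Trie(my_dict)  # iterating a dict yields its keys
print(sorted(trie.keys_with_prefix("log")))  # ['logfile', 'login']
```

The walk to the prefix node costs one step per character of the prefix, regardless of how many keys are stored; after that the work is proportional to the size of the subtree holding the matches.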




Answer 2:


If you only need to find keys that start with a prefix then you can use a binary search. Something like this will do the job:

import bisect
words = sorted("""
a b c stack stacey stackoverflow stacked star stare x y z
""".split())
n = len(words)
print(n, "words")
print(words)
print()
tests = sorted("""
r s ss st sta stack star stare stop su t
""".split())
for test in tests:
    # bisect_left returns the first index i with words[i] >= test,
    # so no further adjustment of i is needed
    i = bisect.bisect_left(words, test)
    print(test, i)
    # matches, if any, are contiguous from i onwards
    while i < n and words[i].startswith(test):
        print(i, words[i])
        i += 1

Output:

12 words
['a', 'b', 'c', 'stacey', 'stack', 'stacked', 'stackoverflow', 'star', 'stare',
'x', 'y', 'z']

r 3
s 3
3 stacey
4 stack
5 stacked
6 stackoverflow
7 star
8 stare
ss 3
st 3
3 stacey
4 stack
5 stacked
6 stackoverflow
7 star
8 stare
sta 3
3 stacey
4 stack
5 stacked
6 stackoverflow
7 star
8 stare
stack 4
4 stack
5 stacked
6 stackoverflow
star 7
7 star
8 stare
stare 8
8 stare
stop 9
su 9
t 9
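The same idea can be wrapped in a small helper that returns the matching slice directly by bisecting for both ends of the range. This is only a sketch: keys_with_prefix is a made-up name, and the upper bound assumes that no key contains the character U+10FFFF (the largest code point), which is used here as a sentinel.

```python
import bisect


def keys_with_prefix(sorted_keys, prefix):
    """Return the items of sorted_keys that start with prefix.

    sorted_keys must already be sorted. Assumes no key contains U+10FFFF.
    """
    lo = bisect.bisect_left(sorted_keys, prefix)
    # prefix + U+10FFFF sorts after every key that starts with prefix
    # (under the assumption above), so it marks the end of the range.
    hi = bisect.bisect_left(sorted_keys, prefix + "\U0010ffff")
    return sorted_keys[lo:hi]


words = sorted("a b c stack stacey stackoverflow stacked star stare x y z".split())
print(keys_with_prefix(words, "stack"))  # ['stack', 'stacked', 'stackoverflow']
print(keys_with_prefix(words, "stop"))   # []
```

For a dictionary you would build the list once with sorted(my_dict) and reuse it for every lookup; each lookup is then two binary searches plus the slice.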



Answer 3:


No. The only way of searching for a string in dictionary keys is to look in each key. Something like what you've suggested is the only way of doing it with a dictionary.

However, if you have 400,000 records and you want to speed up your search, I'd suggest using an SQLite database. Then you can just run a query such as SELECT * FROM TABLE_NAME WHERE COLUMN_NAME LIKE '%userinput%'; against it. See the documentation for Python's sqlite3 module.
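A minimal sketch of that query from Python, using an in-memory database. The table name items, the column names and my_dict are placeholders chosen for this example. Keep in mind that a LIKE pattern with a leading % generally cannot use an index in SQLite, so this still scans the table, and that % and _ inside the user input act as wildcards.

```python
import sqlite3

my_dict = {"logfile": "a", "syslog": "b", "data": "c"}  # hypothetical data

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", my_dict.items())
conn.commit()

user_input = "log"
# Bind the pattern as a parameter instead of formatting it into the SQL text.
rows = conn.execute(
    "SELECT name, value FROM items WHERE name LIKE ? ORDER BY name",
    ("%" + user_input + "%",),
).fetchall()
print(rows)  # [('logfile', 'a'), ('syslog', 'b')]
conn.close()
```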

Another option is to use a generator expression, as these are almost always faster than the equivalent for loops.

filteredKeys = (key for key in myDict.keys() if userInput in key)
for key in filteredKeys:
    doSomething()
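If all you need is to know whether any key matches (as in the question's loop with break), a generator expression combines naturally with any() or next(), both of which stop at the first hit. This is still a linear scan in the worst case; my_dict and user_input are placeholder names for this sketch.

```python
my_dict = {"logfile": 1, "syslog": 2, "data": 3}  # hypothetical data
user_input = "log"

# True as soon as one key contains the substring; later keys are not scanned.
found = any(user_input in key for key in my_dict)

# First matching key in insertion order, or None when nothing matches.
first_match = next((key for key in my_dict if user_input in key), None)

print(found, first_match)  # True logfile
```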

EDIT: If, as you say, you don't care about one-time costs, use a database. SQLite should do what you want damn near perfectly.

I did some benchmarks, and to my surprise, the naive algorithm is actually about 1.5 times as fast as a version using list comprehensions and about 4.5 times as fast as a SQLite-driven version (going by the timings below). In light of these results, I'd have to go with @Mark Byers and recommend a trie. I've posted the benchmark below, in case someone wants to give it a go.

import os
import random
import sqlite3
import string
import time

def buildDict(numElements):
    # Random 6-letter keys, plus 10 keys that start with 'log'
    aDict = {}
    for i in range(numElements-10):
        aDict[''.join(random.sample(string.ascii_letters, 6))] = 0

    for i in range(10):
        aDict['log'+''.join(random.sample(string.ascii_letters, 3))] = 0

    return aDict

def naiveLCSearch(aDict, searchString):
    # Linear scan written as a list comprehension
    filteredKeys = [key for key in aDict.keys() if searchString in key]
    return filteredKeys

def naiveSearch(aDict, searchString):
    # Linear scan written as an explicit loop
    filteredKeys = []
    for key in aDict:
        if searchString in key:
            filteredKeys.append(key)
    return filteredKeys

def insertIntoDB(aDict):
    # Copy all keys and values into a throwaway SQLite file
    conn = sqlite3.connect('/tmp/dictdb')
    c = conn.cursor()
    c.execute('DROP TABLE IF EXISTS BLAH')
    c.execute('CREATE TABLE BLAH (KEY TEXT PRIMARY KEY, VALUE TEXT)')
    for key in aDict:
        c.execute('INSERT INTO BLAH VALUES(?,?)',(key, aDict[key]))
    conn.commit()
    return conn

def dbSearch(conn):
    # GLOB is case-sensitive, like the `in` test used by the other searches
    cursor = conn.cursor()
    cursor.execute("SELECT KEY FROM BLAH WHERE KEY GLOB '*log*'")
    return [record[0] for record in cursor]

if __name__ == '__main__':
    aDict = buildDict(400000)
    conn = insertIntoDB(aDict)
    startTimeNaive = time.time()
    for i in range(3):
        naiveResults = naiveSearch(aDict, 'log')
    endTimeNaive = time.time()
    print('Time taken for 3 iterations of naive search was', (endTimeNaive-startTimeNaive), 'and the average time per run was', (endTimeNaive-startTimeNaive)/3.0)

    startTimeNaiveLC = time.time()
    for i in range(3):
        naiveLCResults = naiveLCSearch(aDict, 'log')
    endTimeNaiveLC = time.time()
    print('Time taken for 3 iterations of naive search with list comprehensions was', (endTimeNaiveLC-startTimeNaiveLC), 'and the average time per run was', (endTimeNaiveLC-startTimeNaiveLC)/3.0)

    startTimeDB = time.time()
    for i in range(3):
        dbResults = dbSearch(conn)
    endTimeDB = time.time()
    print('Time taken for 3 iterations of DB search was', (endTimeDB-startTimeDB), 'and the average time per run was', (endTimeDB-startTimeDB)/3.0)

    # Close the connection before deleting the database file
    conn.close()
    os.remove('/tmp/dictdb')

For the record, my results were:

Time taken for 3 iterations of naive search was 0.264658927917 and the average time per run was 0.0882196426392
Time taken for 3 iterations of naive search with list comprehensions was 0.403481960297 and the average time per run was 0.134493986766
Time taken for 3 iterations of DB search was 1.19464492798 and the average time per run was 0.398214975993

All times are in seconds.




Answer 4:


You could join all the keys into one long string with a suitable separator character and use the find method of the string. That is pretty fast.

Perhaps this code is helpful to you. The search method returns a list of dictionary values whose keys contain the substring key.

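class DictLookupBySubstr(object):
    def __init__(self, dictionary, separator='\n'):
        # The separator must not occur inside any key
        self.dic = dictionary
        self.sep = separator
        # All keys in one string, each followed by the separator
        self.txt = separator.join(dictionary.keys())+separator

    def search(self, key):
        # Every key contains the empty string
        if not key:
            return list(self.dic.values())
        res = []
        i = self.txt.find(key)
        while i >= 0:
            # Expand the hit to the enclosing separators to recover the full key
            left = self.txt.rfind(self.sep, 0, i) + 1
            right = self.txt.find(self.sep, i)
            dic_key = self.txt[left:right]
            res.append(self.dic[dic_key])
            # Resume after this key so it is reported only once
            i = self.txt.find(key, right+1)
        return res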
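A variation on the same joined-string idea, sketched with the re module instead of find/rfind: each key sits on its own line, and a multi-line pattern pulls out every whole line that contains the search string. The names build_text and search_values are made up for this example, and it assumes that no key contains a newline and that the search string is non-empty and newline-free. No claim is made here about how its speed compares with the class above.

```python
import re


def build_text(dictionary):
    # One key per line; assumes no key contains a newline.
    return "\n".join(dictionary)


def search_values(dictionary, text, needle):
    # With re.MULTILINE, ^ and $ match at line boundaries, and "." never
    # matches a newline, so each match is exactly one whole key.
    pattern = re.compile("^.*" + re.escape(needle) + ".*$", re.MULTILINE)
    return [dictionary[m.group(0)] for m in pattern.finditer(text)]


my_dict = {"logfile": 1, "syslog": 2, "data": 3, "catalog.txt": 4}  # hypothetical
text = build_text(my_dict)  # built once, reused for every search
print(search_values(my_dict, text, "log"))  # [1, 2, 4]
```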



Answer 5:


dpath can solve this for you easily.

http://github.com/akesterson/dpath-python

$ easy_install dpath
>>> import dpath.util
>>> for (path, value) in dpath.util.search(MY_DICT, "glob/to/start/{}".format(userinput), yielded=True):
...     pass  # (do something with the path and value)

You can pass an eglob ('path//to//something/[0-9a-z]') for advanced searching.




Answer 6:


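Perhaps using has_key would solve this too (note that it only tests for an exact key match, and that it was removed in Python 3 in favour of the in operator).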

http://docs.python.org/release/2.5.2/lib/typesmapping.html



Source: https://stackoverflow.com/questions/5174506/search-of-dictionary-keys-python
