Efficient partial search of a trie in python


Question


This is a HackerRank exercise, and although the problem itself is solved, my solution apparently isn't efficient enough, so I'm getting timeouts on most test cases. Here's the problem:

We're going to make our own Contacts application! The application must perform two types of operations:

  1. add name, where name is a string denoting a contact name. This must be stored as a new contact in the application.
  2. find partial, where partial is a string denoting a partial name to search the application for. It must count the number of contacts starting with partial and print the count on a new line. Given n sequential add and find operations, perform each operation in order.
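
For instance (the names here are purely illustrative), a sequence of operations such as

add hack
add hackerrank
find hac
find hak

should print 2 for find hac, since both stored names start with "hac", and 0 for find hak, since no stored name starts with "hak".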

I'm using Tries to make it work, here's the code:

import re

def add_contact(dictionary, contact):
    _end = '_end_'
    current_dict = dictionary
    for letter in contact:
        current_dict = current_dict.setdefault(letter, {})
    current_dict[_end] = _end
    return(dictionary)

def find_contact(dictionary, contact):
    p = re.compile('_end_')
    current_dict = dictionary
    for letter in contact:
        if letter in current_dict:
            current_dict = current_dict[letter]
        else:
            return(0)
    count = int(len(p.findall(str(current_dict))) / 2)
    re.purge()
    return(count)

n = int(input().strip())
contacts = {}
for a0 in range(n):
    op, contact = input().strip().split(' ')
    if op == "add":
        contacts = add_contact(contacts, contact)
    if op == "find":
        print(find_contact(contacts, contact))

Because the problem requires not just reporting whether partial matches, but counting all of the entries that start with it, I couldn't find any other way but to cast the nested dictionaries to a string and then count all of the _end_ markers, which I'm using to denote stored strings. This, it would seem, is the culprit, but I cannot find any better way to do the searching. How do I make this work faster? Thanks in advance.
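
To make that concrete, here is a minimal sketch of what the string-based count is doing; it reuses the add_contact function above, and the two names are made up for illustration. Each stored name shows up in the rendered dictionary as the pair '_end_': '_end_', so the pattern matches twice per name, which is why the result is divided by two.

import re

trie = {}
for name in ("hack", "hackerrank"):
    trie = add_contact(trie, name)

node = trie["h"]["a"]["c"]  # subtree reached after following the partial "hac"
# str(node) renders each terminal as "'_end_': '_end_'", two matches per name
print(int(len(re.findall("_end_", str(node))) / 2))  # prints 2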

UPD: I have added a results counter that actually parses the tree, but the code is still too slow for the online checker. Any thoughts?

def find_contact(dictionary, contact):
    current_dict = dictionary
    count = 0
    for letter in contact:
        if letter in current_dict:
            current_dict = current_dict[letter]
        else:
            return(0)
    else:
        return(words_counter(count, current_dict))

def words_counter(count, node):
    live_count = count
    live_node = node
    for value in live_node.values():
        if value == '_end_':
            live_count += 1
        if type(value) == type(dict()):
            live_count = words_counter(live_count, value)
    return(live_count)
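
For what it's worth, the updated counter agrees with the string-based version on the same tiny example (again reusing add_contact and the made-up names), but every find still has to visit the entire subtree below the partial.

trie = {}
for name in ("hack", "hackerrank"):
    trie = add_contact(trie, name)

print(find_contact(trie, "hac"))  # 2, but computed by walking every node under "hac"
print(find_contact(trie, "hak"))  # 0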

Answer 1:


Ok, so, as it turns out, using nested dicts is not a good idea in general, because HackerRank will shove 100k strings into your program and then everything slows to a crawl. So the problem wasn't in the parsing, it was in the storing before the parsing. Eventually I found this blog post; their solution passes the challenge 100%. Here's the code in full:

class Node:
    def __init__(self):
        self.count = 1        # how many added names pass through this node
        self.children = {}

trie = Node()


def add(node, name):
    # Walk the trie letter by letter, bumping the count of every node on the
    # path, so each node always knows how many names share that prefix.
    for letter in name:
        sub = node.children.get(letter)
        if sub:
            sub.count += 1
        else:
            sub = node.children[letter] = Node()
        node = sub


def find(node, data):
    # Follow the prefix and read off the precomputed count at the last node;
    # a query never has to visit the rest of the subtree.
    for letter in data:
        sub = node.children.get(letter)
        if not sub:
            return 0
        node = sub
    return node.count

if __name__ == '__main__':
    n = int(input().strip())
    for _ in range(n):
        op, param = input().split()
        if op == 'add':
            add(trie, param)
        else:
            print(find(trie, param))
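
The design choice that makes this fast is that each node keeps a running count maintained at insertion time, so a find only walks the prefix itself rather than the subtree beneath it. A quick sanity check, reusing the Node, add and find defined above with made-up names:

demo = Node()
add(demo, "hack")
add(demo, "hackerrank")
print(find(demo, "hac"))  # 2: both names share the prefix "hac"
print(find(demo, "hak"))  # 0: no stored name starts with "hak"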


Source: https://stackoverflow.com/questions/46961262/efficient-partial-search-of-a-trie-in-python
