Partial matching GAE search API

日久生厌 2020-11-28 09:35

Using the GAE search API, is it possible to search for a partial match?

I'm trying to create autocomplete functionality where the search term would be a partial word (e.g. a prefix of the full word).

6 Answers
  • 2020-11-28 09:56

    I had the same problem with a typeahead control, and my solution was to parse the string into small parts:

    name = 'hello world'
    name_search = ' '.join([name[:i] for i in range(2, len(name) + 1)])
    print(name_search)
    # -> he hel hell hello hello  hello w hello wo hello wor hello worl hello world
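
    The joined prefix string can then be indexed with the Search API, roughly like this (a minimal sketch; the index and field names here are placeholders):

    from google.appengine.api import search

    # Each space-separated prefix becomes its own searchable token.
    index = search.Index(name='typeahead')
    index.put(search.Document(
        fields=[search.TextField(name='name', value=name_search)]))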
    

    Hope this helps!

  • 2020-11-28 09:58

    Jumping in very late here.

    But here is my well-documented function that does the tokenizing. The docstring should help you understand and use it. Good luck!

    def tokenize(string_to_tokenize, token_min_length=2):
      """Tokenizes a given string.
    
      Note: If a word in the string to tokenize is no longer than
      the minimum token length, the word is added to the set
      of tokens as-is and skipped from further processing.
      Avoids duplicate tokens by using a set to save the tokens.
      Example usage:
        tokens = tokenize('pack my box', 3)
    
      Args:
        string_to_tokenize: str, the string we need to tokenize.
          Example: 'pack my box'.
        token_min_length: int, the minimum length we want for a token.
          Example: 3.
    
      Returns:
        set, containing the tokenized strings. Example: set(['box', 'pac',
        'my', 'pack'])
      """
      tokens = set()
      token_min_length = token_min_length or 1
      for word in string_to_tokenize.split(' '):
        if len(word) <= token_min_length:
          tokens.add(word)
        else:
          for i in range(token_min_length, len(word) + 1):
            tokens.add(word[:i])
      return tokens
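
    For example, running it on the docstring's sample input:

    tokens = tokenize('pack my box', 3)
    print(sorted(tokens))
    # -> ['box', 'my', 'pac', 'pack']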
    
  • 2020-11-28 10:09

    Just like @Desmond Lua's answer, but with a different tokenize function:

    def tokenize(phrase):
      token = []
      for word in phrase.split(' '):
        # Collect every prefix of length 2..len(word) of each word
        for i in range(2, len(word) + 1):
          token.append(word[:i])
      return ",".join(token)
    

    It will parse 'hello world' as he,hel,hell,hello,wo,wor,worl,world.

    It's good for lightweight autocomplete purposes.

  • 2020-11-28 10:12

    My version, optimized to not repeat tokens:

    def tokenization(text):
        tokens = []
        min_length = 3
        for word in text.split():
            if len(word) > min_length:
                # Go up to len(word) + 1 so the full word is indexed too
                for i in range(min_length, len(word) + 1):
                    token = word[0:i]
                    if token not in tokens:
                        tokens.append(token)
        return tokens
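
    For example (with the range fix above, the full words are included):

    print(tokenization('hello world'))
    # -> ['hel', 'hell', 'hello', 'wor', 'worl', 'world']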
    
  • 2020-11-28 10:13

    Though the LIKE statement (partial match) is not supported in Full Text Search, you can hack around it.

    First, tokenize the data string into all possible substrings (hello = h, he, hel, lo, etc.):

    def tokenize_autocomplete(phrase):
        a = []
        for word in phrase.split():
            j = 1
            while True:
                for i in range(len(word) - j + 1):
                    a.append(word[i:i + j])
                if j == len(word):
                    break
                j += 1
        return a
    

    Build an index + document (Search API) using the tokenized strings:

    index = search.Index(name='item_autocomplete')
    for item in items:  # item = ndb.model
        name = ','.join(tokenize_autocomplete(item.name))
        document = search.Document(
            doc_id=item.key.urlsafe(),
            fields=[search.TextField(name='name', value=name)])
        index.put(document)
    

    Perform the search, and voilà!

    results = search.Index(name="item_autocomplete").search("name:elo")
    

    https://code.luasoftware.com/tutorials/google-app-engine/partial-search-on-gae-with-search-api/
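
    To map the matched documents back to datastore entities, the urlsafe doc_id set during indexing can be decoded into a key again (a sketch, assuming the same ndb models as in the indexing snippet):

    from google.appengine.ext import ndb

    for doc in results:
        item = ndb.Key(urlsafe=doc.doc_id).get()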

  • 2020-11-28 10:19

    As described in Full Text Search and LIKE statement, no, it's not possible, since the Search API implements full-text indexing.

    Hope this helps!
