Libpuzzle Indexing millions of pictures?

走了就别回头了 2020-12-12 14:03

It's about the libpuzzle library for PHP ( http://libpuzzle.pureftpd.org/project/libpuzzle ) by Mr. Frank Denis. I am trying to understand how to index and store the data in

4 answers
  •  时光取名叫无心
    2020-12-12 14:42

    I've experimented with libpuzzle before and got about as far as you. I didn't really start on a proper implementation, and it was also unclear to me how exactly to do it. (I abandoned the project for lack of time, so I didn't really persist with it.)

    Anyway, looking now, will try to offer my understanding - maybe between us we can work it out :)

    Queries use a two-stage process:

    1. First, use the words table.
      1. Take the 'reference' image and work out its signature.
      2. Work out its component words.
      3. Consult the words table to find all the possible matches. This can use the database engine's indexes for efficient queries.
      4. Compile a list of all sig_ids (step 3 will return some duplicates).
    2. Then consult the signatures table.
      1. Retrieve and decompress all the candidate signatures (because you have a prefiltered list, the number should be relatively small).
      2. Use puzzle_vector_normalized_distance to work out an actual distance.
      3. Sort and rank the results as required.

    (i.e. you only use compression on the signatures table. The words table remains uncompressed, so you can run fast queries on it.)
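
The two-stage lookup described above could be sketched roughly as follows. This is only an illustration under several assumptions, not libpuzzle's own code: it uses Python's `sqlite3` and `zlib` as stand-ins for the real database and for libpuzzle's signature compression, `normalized_distance` is a hypothetical stand-in for `puzzle_vector_normalized_distance`, and the word length, word cap, threshold, and column names (`pos_word`, `compressed`) are made up for the example. Signatures are assumed to be plain byte strings that have already been computed.

```python
import math
import sqlite3
import zlib

WORD_LEN = 10    # assumed word length (hypothetical)
MAX_WORDS = 100  # assumed cap on words stored per signature (hypothetical)

def signature_words(sig, word_len=WORD_LEN, max_words=MAX_WORDS):
    """Cut a signature into overlapping fixed-length words, each tagged with its position."""
    count = min(max_words, max(0, len(sig) - word_len + 1))
    return [(pos, sig[pos:pos + word_len]) for pos in range(count)]

def normalized_distance(a, b):
    """Stand-in distance: ||a - b|| / (||a|| + ||b||), computed over byte values."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    norm = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    return diff / norm if norm else 0.0

def create_schema(db):
    # words stays uncompressed and indexed; only signatures holds compressed data.
    db.executescript("""
        CREATE TABLE signatures (id INTEGER PRIMARY KEY, compressed BLOB NOT NULL);
        CREATE TABLE words (pos_word BLOB NOT NULL, sig_id INTEGER NOT NULL);
        CREATE INDEX idx_words ON words (pos_word);
    """)

def add_image(db, sig):
    """Store one compressed signature plus its position-prefixed words."""
    cur = db.execute("INSERT INTO signatures (compressed) VALUES (?)",
                     (zlib.compress(sig),))
    sig_id = cur.lastrowid
    db.executemany("INSERT INTO words (pos_word, sig_id) VALUES (?, ?)",
                   [(bytes([pos]) + word, sig_id)
                    for pos, word in signature_words(sig)])
    return sig_id

def find_similar(db, ref_sig, threshold=0.6):
    # Stage 1: use the indexed words table to collect candidate sig_ids.
    keys = [bytes([pos]) + word for pos, word in signature_words(ref_sig)]
    if not keys:
        return []
    marks = ",".join("?" * len(keys))
    rows = db.execute(
        f"SELECT DISTINCT sig_id FROM words WHERE pos_word IN ({marks})", keys)
    candidates = [row[0] for row in rows]  # DISTINCT drops the duplicates

    # Stage 2: decompress only the candidates and compute a real distance.
    results = []
    for sig_id in candidates:
        (blob,) = db.execute("SELECT compressed FROM signatures WHERE id = ?",
                             (sig_id,)).fetchone()
        dist = normalized_distance(ref_sig, zlib.decompress(blob))
        if dist <= threshold:
            results.append((dist, sig_id))
    return sorted(results)  # closest match first
```

With this layout, a query touches the compressed signatures table only for rows that already share at least one word with the reference image, which is what keeps the expensive distance computation limited to a small candidate set.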

    The words table is a form of inverted index. In fact, I have in mind to use Sphinx ( https://stackoverflow.com/questions/tagged/sphinx ) instead of the words database table, because it is designed specifically as a very fast inverted index.

    ... in theory anyway...
