I\'m currently extending an image library used to categorize images and i want to find duplicate images, transformed images, and images that contain or are contained in other im
So I basically did something very similar to this a few years ago. The algorithm you want to look into was proposed a few years ago by David Nister, the paper is: "Scalable Recognition with a Vocabulary Tree". They pretty much have an exact solution to your problem that can scale to millions of images.
Here is a link to the abstract, you can find a download link by googleing the title. http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1641018
The basic idea is to build a tree with a hierarchical k-means algorithm to model the features and then leverage the sparse distribution of features in that tree to quickly find your nearest neighbors... or something like that, it's been a few years since I worked on it. You can find a powerpoint presentation on the authors webpage here: http://www.vis.uky.edu/~dnister/Publications/publications.html
A few other notes:
I wouldn't bother with the pyramid match kernel, it's really more for improving object recognition than duplicate/transformed image detection.
I would not store any of this feature stuff in an SQL database. Depending on your application it is sometimes more effective to compute your features on the fly since their size can exceed the original image size when computed densely. Histograms of features or pointers to nodes in a vocabulary tree are much more efficient.
SQL databases are not designed for doing massive floating point vector calculations. You can store things in your database, but don't use it as a tool for computation. I tried this once with SQLite and it ended very badly.
If you decide to implement this, read the paper in detail and keep a copy handy while implementing it, as there are many minor details that are very important to making the algorithm work efficiently.