Im working on an automatic image annotation problem in which im trying to associate tags with images. For that im trying for SIFT features for learning. But the problem is a
You can represent single SIFT as "visual word" which is one number and use it as SVM input, I think it is what you need. It is usually done by k-means clustering.
This method is called "bag-of-words" and described in this paper.