information-retrieval

What tried and true algorithms for suggesting related articles are out there?

和自甴很熟 submitted on 2019-11-26 23:56:01
Question: Pretty common situation, I'd wager. You have a blog or news site with plenty of articles or blags or whatever you call them, and at the bottom of each you want to suggest others that seem to be related. Let's assume very little metadata about each item: no tags, no categories. Treat each item as one big blob of text, including the title and author name. How do you go about finding the possibly related documents? I'm rather interested in the actual algorithm, not ready-made solutions,
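A minimal sketch of one common answer, assuming scikit-learn is available: represent every article (title, author and body concatenated into one blob) as a TF-IDF vector and suggest the nearest neighbours by cosine similarity. The articles list and the cutoff of five suggestions below are purely illustrative.

    # Minimal sketch: rank related articles by TF-IDF cosine similarity.
    # The articles list and the top-5 cutoff are hypothetical placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    articles = [
        "title, author and body of article one ...",
        "title, author and body of article two ...",
        "title, author and body of article three ...",
    ]

    tfidf = TfidfVectorizer(stop_words="english")
    matrix = tfidf.fit_transform(articles)      # one row per article
    similarities = cosine_similarity(matrix)    # pairwise similarity matrix

    query_index = 0                             # article we want suggestions for
    ranked = similarities[query_index].argsort()[::-1]
    related = [i for i in ranked if i != query_index][:5]
    print(related)                              # indices of the most similar articles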

Image retrieval system by colour from the web using C++ with openFrameworks

牧云@^-^@ submitted on 2019-11-26 21:45:12
Question: I am writing a program in C++ and openFrameworks that should hopefully implement an image retrieval system based on colour matching. I have an algorithm to find a match in a database by RGB value. For example, if I have a database of 1000 pictures on my computer and a query RGB value of 255,0,0, the program looks through the 1000 pictures and finds the closest match. However, my problem is that I also want it to look for the match on the web. I have been trying to find how to get images
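A minimal sketch of the local half of that lookup, assuming each database image has already been reduced to an average (r, g, b) triple. The question is asked in C++/openFrameworks, but the nearest-colour logic is language-independent; the file names and triples below are hypothetical, and pulling candidate images from the web (e.g. through an image search API) is a separate step not shown here.

    # Nearest-colour lookup by Euclidean distance in RGB space.
    # The database entries are hypothetical (image path, average colour) pairs.
    import math

    def colour_distance(c1, c2):
        # straight-line distance between two (r, g, b) triples
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

    def closest_match(query_rgb, database):
        # database: list of (image_path, (r, g, b)) pairs
        return min(database, key=lambda item: colour_distance(query_rgb, item[1]))

    db = [("img1.jpg", (250, 10, 10)), ("img2.jpg", (0, 255, 0)), ("img3.jpg", (120, 120, 120))]
    print(closest_match((255, 0, 0), db))   # -> ('img1.jpg', (250, 10, 10))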

What is the best way to compute trending topics or tags?

给你一囗甜甜゛ submitted on 2019-11-26 18:42:11
Question: Many sites offer statistics like "The hottest topics in the last 24h". For example, Topix.com shows this in its "News Trends" section, where you can see the topics with the fastest-growing number of mentions. I want to compute such a "buzz" score for a topic, too. How could I do this? The algorithm should give less weight to topics that are always hot; topics that normally (almost) no one mentions should be the hottest ones. Google offers "Hot Trends", topix.com shows "Hot Topics", fav
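A minimal sketch of one standard way to score this, assuming hourly mention counts per topic: compare the latest count to the topic's own history as a z-score, so a perennially hot topic needs a big jump to register while a normally quiet one trends on a small absolute increase. The counts below are made up for illustration.

    # "Buzz" as a z-score of the latest mention count against the topic's history.
    # The hourly counts are hypothetical.
    from statistics import mean, pstdev

    def buzz_score(history, current):
        mu = mean(history)
        sigma = pstdev(history) or 1.0    # avoid division by zero for flat histories
        return (current - mu) / sigma

    always_hot = [100, 98, 102, 101, 99]
    usually_quiet = [1, 0, 2, 1, 0]
    print(buzz_score(always_hot, 110))    # modest spike relative to its baseline
    print(buzz_score(usually_quiet, 15))  # much larger score: a genuine trend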

Python: tf-idf-cosine: to find document similarity

*爱你&永不变心* submitted on 2019-11-26 17:59:54
I was following a tutorial which was available at Part 1 & Part 2. Unfortunately the author didn't have time for the final section, which involved using cosine similarity to actually find the distance between two documents. I followed the examples in the article with the help of the following link from stackoverflow; included is the code mentioned in the above link (just to make life easier):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_extraction.text import TfidfTransformer
    from nltk.corpus import stopwords
    import numpy as np
    import numpy.linalg
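A hedged sketch of the missing final step, assuming two plain-text documents: build term counts with CountVectorizer, re-weight them with TfidfTransformer, and compare the resulting vectors with scikit-learn's cosine_similarity (one of several equivalent ways to compute the cosine; the documents below are placeholders).

    # Cosine similarity between two TF-IDF document vectors.
    # The two documents are hypothetical placeholders.
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = ["the quick brown fox", "the quick brown dog"]

    counts = CountVectorizer(stop_words="english").fit_transform(docs)
    tfidf = TfidfTransformer().fit_transform(counts)

    # similarity of document 0 to document 1, in [0, 1]
    print(cosine_similarity(tfidf[0], tfidf[1])[0][0])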