information-retrieval

Is there a search engine that will give a direct answer? [closed]

荒凉一梦 submitted on 2019-11-30 03:57:22
I've been wondering about this for a while and I can't see why Google haven't tried it yet - or maybe they have and I just don't know about it. Is there a search engine that you can type a question into which will give you a single answer rather than a list of results which you then have to trawl through yourself to find what you want to know? For example, this is how I would design the system: User’s input: “Where do you go to get your eyes tested?” System output: “Opticians. Certainty: 95%” This would be calculated as follows: The input is parsed from natural language into a simple search

TF-IDF implementations in python

烈酒焚心 submitted on 2019-11-30 03:28:59
What are the standard tf-idf implementations/APIs available in Python? I've come across the one in NLTK. I want to know the other libraries that provide this feature. Gunjan, there is a package called scikit-learn which calculates tf-idf scores. You can refer to my answer to the question "Python: tf-idf-cosine: to find document similarity" and also see the code in that question. Thanks. Try these libraries, which implement the TF-IDF algorithm in Python: http://code.google.com/p/tfidf/ https://github.com/hrs/python-tf-idf Unfortunately, questions asking for a tool or library are off-topic on SO. There are

What is the TREC format?

馋奶兔 submitted on 2019-11-30 02:48:45
Question: I'm looking for the specifications of the TREC format. I've been googling a lot but I didn't find a clue. Does anyone know where to find any information about it? Answer 1: AFAIK TREC is an abbreviation for NIST's Text REtrieval Conference. In order for the indexer to know where the document boundaries are within files, each document must have begin-document and end-document tags. These tags are similar to HTML or XML tags and are actually the format for TREC documents. TrecParser: This parser
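To make the begin/end tag idea in the answer concrete, here is a minimal sketch of splitting a TREC-style file into documents. The `<DOC>`, `<DOCNO>` and `<TEXT>` tag names are an assumption based on a commonly seen convention, not something stated in the excerpt, and `parse_trec` is a hypothetical helper, not the `TrecParser` mentioned above.

```python
import re

# Assumed layout: <DOC>/<DOCNO>/<TEXT> follow a common TREC convention;
# individual collections may use other or additional tags.
SAMPLE = """<DOC>
<DOCNO> doc-001 </DOCNO>
<TEXT>
First document body.
</TEXT>
</DOC>
<DOC>
<DOCNO> doc-002 </DOCNO>
<TEXT>
Second document body.
</TEXT>
</DOC>
"""

DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
DOCNO_RE = re.compile(r"<DOCNO>(.*?)</DOCNO>", re.DOTALL)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)


def parse_trec(raw):
    """Return a list of (docno, text) pairs, one per <DOC> block."""
    docs = []
    for match in DOC_RE.finditer(raw):
        block = match.group(1)
        docno = DOCNO_RE.search(block)
        text = TEXT_RE.search(block)
        docs.append((
            docno.group(1).strip() if docno else None,
            text.group(1).strip() if text else "",
        ))
    return docs


for docno, text in parse_trec(SAMPLE):
    print(docno, "->", text)
```

Regular expressions are used instead of an XML parser because such files typically have no single root element and may not be well-formed XML.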

Information retrieval (IR) vs data mining vs Machine Learning (ML)

你说的曾经没有我的故事 submitted on 2019-11-29 20:28:54
People often throw around the terms IR, ML, and data mining, but I have noticed a lot of overlap between them. From people with experience in these fields, what exactly draws the line between these? doug: This is just the view of one person (formally trained in ML); others might see things quite differently. Machine Learning is probably the most homogeneous of these three terms, and the most consistently applied--it's limited to the pattern-extraction (or pattern-matching) algorithms themselves. Of the terms you mentioned, "Machine Learning" is the one most used by academic departments to

Calculating tf-idf among documents using python 2.7

六月ゝ 毕业季﹏ submitted on 2019-11-29 11:59:24
I have a scenario where I have retrieved information/raw data from the internet and placed it into respective JSON or .txt files. From there I would like to calculate the frequencies of each term in each document and their cosine similarity using tf-idf. For example: there are 50 different documents/text files that consist of 5000 words/strings each. I would like to take the first word from the first document/text, compare it against all 250000 words in total to find its frequency, then do the same for the second word, and so on for all 50 documents/texts. Expected output of each frequency will be
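As a rough sketch of the per-document term counting described in the question, the snippet below computes tf-idf weights with only the standard library. The whitespace tokenizer, the `tf = count / length` and `idf = log(N / df)` variants, and the function names are assumptions chosen for illustration; other weightings are common. It is written for Python 3; under Python 2.7 the division would need `from __future__ import division`.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real data likely
    # needs punctuation handling and stop-word removal.
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(number of documents / documents containing the term)
    """
    token_lists = [tokenize(d) for d in documents]
    n_docs = len(token_lists)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))

    weights = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = ["apple banana apple", "banana cherry"]
for doc_weights in tf_idf(docs):
    print(doc_weights)
```

A term that occurs in every document (here "banana") gets weight 0 under this idf variant, which is why it contributes nothing to a later similarity comparison.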

Confusion about (Mean) Average Precision

那年仲夏 submitted on 2019-11-29 07:14:51
In this question I asked for clarification about the precision-recall curve. In particular, I asked whether we have to consider a fixed number of rankings to draw the curve or whether we can reasonably choose it ourselves. According to the answer, the second one is correct. However, I now have a big doubt about the Average Precision (AP) value: AP is used to estimate numerically how good our algorithm is for a given query. Mean Average Precision (MAP) is the average precision over multiple queries. My doubt is: if AP changes according to how many objects we retrieve, then we can tune this parameter to our advantage
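The concern about the cutoff can be checked numerically. Below is a minimal sketch of one common AP definition (mean of precision at each rank where a relevant item appears); the function names and the optional `total_relevant` argument are illustrative assumptions, not taken from the question.

```python
def average_precision(ranked_relevance, total_relevant=None):
    """AP for one query.

    ranked_relevance: list of booleans, True where the item at that
    rank is relevant. total_relevant: number of relevant items in the
    whole collection; if omitted, only the relevant items that were
    actually retrieved are counted (an assumption that makes AP
    depend on the cutoff).
    """
    if total_relevant is None:
        total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / total_relevant


def mean_average_precision(runs):
    """MAP: the mean of AP over several queries."""
    return sum(average_precision(r) for r in runs) / len(runs)


ranking = [True, False, True, False, True]
print(average_precision(ranking))                        # full list
print(average_precision(ranking[:3]))                    # cut at rank 3
print(average_precision(ranking[:3], total_relevant=3))  # fixed denominator
```

With this toy ranking, normalising by the relevant items actually retrieved makes AP rise from about 0.76 to about 0.83 when the list is cut at rank 3, while dividing by the fixed total of 3 relevant items gives about 0.56. Under that assumption, fixing the denominator removes the incentive to tune the cutoff.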

How can I retrieve my Google search history?

北慕城南 submitted on 2019-11-29 06:59:38
In the Google Web History interface I can see all the search queries I have used over the years, and the pages I visited for a particular query. Is there a way I can retrieve this history using a computer program? I couldn't find a Google API that does it. Do you know of a tool that can do this, or can you suggest a way to achieve this? There's an RSS feed. Update: the link is now broken. The RSS feed in the accepted answer above does not exist anymore. Google does not provide an API that allows you to retrieve Google searches, but it does allow you to download an archive of all past searches, via

Are there any API's that'll let me search by image?

岁酱吖の submitted on 2019-11-29 03:01:32
Question: I have an image and I want to search to see what it is. Are there any APIs available for that? Answer 1: I believe there are quite a few. You want to search for Content-based Image Retrieval (CBIR). Wikipedia has a page of CBIR engines, including an extensive list of open source ones. For example, isk-daemon and LIRE are both open source CBIR libraries: isk-daemon is an open source standalone server and library capable of adding content-based (visual) image searching to any image related website or

Is there a search engine that will give a direct answer? [closed]

烂漫一生 submitted on 2019-11-29 01:33:06
Question: Closed as off-topic for Stack Overflow 3 years ago. I've been wondering about this for a while and I can't see why Google haven't tried it yet - or maybe they have and I just don't know about it. Is there a search engine that you can type a question into which will give you a single answer rather than a list of results which you then have to trawl through

Cosine similarity and tf-idf

你。 submitted on 2019-11-28 17:02:41
I am confused by the following comment about TF-IDF and Cosine Similarity. I was reading up on both, and then on the wiki under Cosine Similarity I found this sentence: "In case of information retrieval, the cosine similarity of two documents will range from 0 to 1, since the term frequencies (tf-idf weights) cannot be negative. The angle between two term frequency vectors cannot be greater than 90." Now I'm wondering... aren't they two different things? Is tf-idf already inside the cosine similarity? If yes, then what the heck - I can only see the inner dot products and Euclidean lengths. I
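One way to read the quoted sentence: cosine similarity is only the comparison step, and tf-idf is one possible way to fill in the vector components it compares. The sketch below assumes the vectors are already weighted and stored as hypothetical `{term: weight}` dicts. With non-negative weights the dot product cannot be negative, so the result stays between 0 and 1.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors.

    a, b: {term: weight} dicts. The weights could be raw term counts
    or tf-idf values; the formula itself does not care which.
    """
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_a * norm_b)


# Illustrative weights only, not computed from real documents.
doc1 = {"retrieval": 0.4, "cosine": 0.2}
doc2 = {"retrieval": 0.1, "tfidf": 0.5}
doc3 = {"unrelated": 0.9}

print(cosine_similarity(doc1, doc2))  # shares one term: between 0 and 1
print(cosine_similarity(doc1, doc3))  # no shared terms: 0.0
```

Swapping the weighting scheme changes the numbers in the dicts but not the function, which is the sense in which the two are separate things.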