search-engine

Which datatype and methods should I use?

会有一股神秘感。 Submitted on 2019-12-02 19:55:51
Question: I am trying to write a simple search engine. I have a fixed number of main subjects, each associated with specific keywords. The aim is to recognize the main subject from a partial keyword typed by the user. I am thinking of using a Dictionary<string, List<string>>. I'll have to search this dictionary and find, e.g., all keywords beginning with a three-character string, together with the main subject each one is associated with. Is my solution the best one? And how can I efficiently look through
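One common alternative to scanning a Dictionary<string, List<string>> keyed by subject is to invert the mapping and key a sorted map by keyword, so all keywords sharing a prefix occupy one contiguous range. A minimal sketch of that idea in Java (names and sample data are my own, not from the question):

```java
import java.util.*;

// Sketch: key a sorted map by keyword (value = main subject) so that a
// prefix lookup becomes a cheap contiguous-range view, O(log n) to locate.
public class PrefixSearch {
    private final TreeMap<String, String> subjectByKeyword = new TreeMap<>();

    public void add(String subject, String... keywords) {
        for (String kw : keywords) subjectByKeyword.put(kw, subject);
    }

    // All (keyword, subject) pairs whose keyword starts with the given prefix.
    public SortedMap<String, String> startingWith(String prefix) {
        // The half-open range [prefix, prefix + '\uffff') covers exactly the
        // strings that begin with this prefix.
        return subjectByKeyword.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        PrefixSearch idx = new PrefixSearch();
        idx.add("astronomy", "star", "starlight", "planet");
        idx.add("music", "staccato");
        System.out.println(idx.startingWith("sta").keySet()); // [staccato, star, starlight]
    }
}
```

The same range-view trick works with C#'s SortedDictionary or a trie if the keyword set grows large.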

Is it possible to develop a powerful web search engine using Erlang, Mnesia & Yaws?

99封情书 Submitted on 2019-12-02 19:47:19
I am thinking of developing a web search engine using Erlang, Mnesia & Yaws. Is it possible to make a powerful and fast web search engine using this software? What would it need to accomplish this, and what do I start with? Erlang can make the most powerful web crawler today. Let me take you through my simple crawler. Step 1. I create a simple parallelism module, which I call mapreduce: -module(mapreduce). -export([compute/2]). %%===================================================================== %% usage example %% Module = string %% Function = tokens %% List_of_arg_lists = [["file
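The Erlang module above is cut off, but its stated purpose is a simple parallel map: apply one function to many argument lists concurrently and collect the results. A rough analogue of that idea, sketched in Java with an executor pool (this is my own illustration, not a translation of the answerer's actual module):

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Function;

// Rough Java analogue of the answer's Erlang "mapreduce" idea: apply one
// function to many inputs in parallel and gather results in input order.
public class ParallelMap {
    public static <A, R> List<R> compute(Function<A, R> fn, List<A> args)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(
                Math.max(1, Math.min(args.size(),
                        Runtime.getRuntime().availableProcessors())));
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (A arg : args) futures.add(pool.submit(() -> fn.apply(arg)));
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) results.add(f.get()); // preserves order
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // e.g. tokenize several "files" concurrently
        List<String[]> tokens = compute(s -> s.split(" "), List.of("a b c", "d e"));
        System.out.println(tokens.get(0).length); // 3
    }
}
```

In Erlang the equivalent spawns one lightweight process per argument list and collects messages, which is why it scales so well for crawling.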

can't find error in my code?

本秂侑毒 Submitted on 2019-12-02 19:43:06
Question: I am making a simple search page and can't find the error. The error message says Uncaught SyntaxError: Unexpected identifier on JavaScript line 40 (target=document.getElementById("outputPlace")). Don't mind the button; I have not added an event listener to it yet. I just want the products to be displayed when I press Enter. HTML CODE <html> <head> <title>Price List </title> </head> <body> <h1> PRICELIST </h1> <form id="formSearch"> <div> <label for="searchBox"> Search products here: </label>

Improving search result using Levenshtein distance in Java

耗尽温柔 Submitted on 2019-12-02 17:40:07
I have the following working Java code for searching for a word against a list of words, and it works as expected: public class Levenshtein { private int[][] wordMartix; public Set similarExists(String searchWord) { int maxDistance = searchWord.length(); int curDistance; int sumCurMax; String checkWord; // preventing double words on returning list Set<String> fuzzyWordList = new HashSet<>(); for (Object wordList : Searcher.wordList) { checkWord = String.valueOf(wordList); curDistance = calculateDistance(searchWord, checkWord); sumCurMax = maxDistance + curDistance; if (sumCurMax ==
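The snippet is cut off before `calculateDistance` is shown. For reference, the standard dynamic-programming Levenshtein distance that such a method typically computes looks like this (a textbook sketch, not the asker's actual implementation):

```java
// Standard dynamic-programming edit distance: d[i][j] is the minimum number
// of insertions, deletions, and substitutions turning a[0..i) into b[0..j).
public class EditDistance {
    public static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // delete all of a
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insert all of b
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(
                        d[i - 1][j] + 1,        // deletion
                        d[i][j - 1] + 1),       // insertion
                        d[i - 1][j - 1] + cost); // substitution or match
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
    }
}
```

A common way to improve fuzzy-search results is to accept a word only when the distance is below a small fixed threshold (e.g. 1 or 2) rather than one tied to the search word's length.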

Is there a search engine that support regular expression search? [closed]

做~自己de王妃 Submitted on 2019-12-02 17:24:58
First, I checked this question, but the answer refers to an obsolete service. So is there a web-based service (or software, I don't care) that supports searching internet content with regular expressions? Let me quote here an answer from the superuser.com question, since I fully agree with its author (quoting Ask MetaFilter): The only possible way to make keyword searching efficient over hundreds of terabytes (or whatever their index is up to these days) is to precompute an index of words. In fact, a full regex engine is Turing-complete, and you can write arbitrary regexes that will gobble

What does percolator mean/do in elasticsearch?

笑着哭i Submitted on 2019-12-02 17:23:05
Even though I read the Elasticsearch documentation to understand what a percolator is, I still have difficulty understanding what it means and where it is used, in simple terms. Can anyone provide more details? What you usually do is index documents and get them back by querying. What the percolator allows you to do, in a nutshell, is index your queries and percolate documents against the indexed queries to find out which queries they match. It's also called reversed search, as what you do is the opposite of what you are used to. There are different use cases for the percolator; the first
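The "reversed search" idea is easy to show without Elasticsearch at all: store the queries first, then ask which stored queries a given document satisfies. A toy sketch in plain Java (the names and matching logic here are illustrative only; the real percolator stores full Elasticsearch queries and matches them efficiently):

```java
import java.util.*;
import java.util.function.Predicate;

// Toy "reversed search": register named queries up front, then percolate
// each incoming document against all of them.
public class ToyPercolator {
    private final Map<String, Predicate<String>> queries = new LinkedHashMap<>();

    public void register(String name, Predicate<String> query) {
        queries.put(name, query);
    }

    // Returns the names of every registered query this document matches.
    public List<String> percolate(String document) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Predicate<String>> e : queries.entrySet())
            if (e.getValue().test(document)) matches.add(e.getKey());
        return matches;
    }

    public static void main(String[] args) {
        ToyPercolator p = new ToyPercolator();
        p.register("about-search", doc -> doc.contains("search"));
        p.register("about-java", doc -> doc.contains("java"));
        System.out.println(p.percolate("full-text search in java")); // [about-search, about-java]
    }
}
```

A typical use case is alerting: users save a search ("notify me about new listings matching X"), and each new document is percolated once to find every saved search it triggers.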

Is there any free unlimited album artwork search API service? [closed]

两盒软妹~` Submitted on 2019-12-02 17:14:50
Google's Custom Search API has a limit of 100 queries per day. That is far less than what I expected. I want to add an artwork-search function to my app. Thanks a lot. How about Discogs, or Amazon, or seeing what Cover Fetcher does? MusicBrainz and the Internet Archive offer the Cover Art Archive, but you do need to request covers using the album's MBID. Last.fm provides an API for getting art, but you can't actually use it in publicly distributed applications. They say the recording companies own the art and they aren't licensed to distribute it. Huh? Why the API then? I don't get it.

Search engine Lucene vs Database search

橙三吉。 Submitted on 2019-12-02 16:36:54
I am using a MySQL database and have been using database-driven search. What are the advantages and disadvantages of database search engines versus the Lucene search engine? I would like suggestions about when and where to use each. Yuval F: I suggest you read Full Text Search Engines vs. DBMS. A one-liner would be: if the bulk of your use case is full-text search, use Lucene; if the bulk of your use case is joins and other relational operations, use a database. You may use a hybrid solution for a more complicated use case. Use Lucene when you want to index textual documents (of any length) and search for
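What makes Lucene fast at full-text search (and what a LIKE '%term%' query in MySQL lacks) is the inverted index: a map from each term to the documents containing it. A miniature version in plain Java, just to show the core structure (real Lucene adds analysis, scoring, and on-disk segments):

```java
import java.util.*;

// Miniature inverted index: term -> set of document ids containing it.
public class InvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    public int add(String text) {
        int docId = docs.size();
        docs.add(text);
        for (String term : text.toLowerCase().split("\\W+"))
            if (!term.isEmpty())
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        return docId;
    }

    // Documents containing every query term (AND semantics).
    public Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : query.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) continue;
            Set<Integer> hits = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) result = new TreeSet<>(hits);
            else result.retainAll(hits); // intersect posting lists
        }
        return result == null ? Collections.emptySet() : result;
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.add("Lucene is a search library");
        idx.add("MySQL is a relational database");
        System.out.println(idx.search("search library")); // [0]
    }
}
```

Each query term costs one hash lookup plus a posting-list intersection, independent of total corpus text size, which is why dedicated full-text engines scale where table scans do not.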

Where can I find materials about indexing and page ranking?

一笑奈何 Submitted on 2019-12-02 16:23:01
Question: I'm working on a large search-engine system. However, I'm not familiar with the background. Where can I find materials about indexing and page ranking? Answer 1: You can always look at the Google research publications. It is naturally very dense stuff, but interesting nonetheless. Answer 2: Modern Information Retrieval, a well-known and good book that will introduce you to these concepts. Source: https://stackoverflow.com/questions/1867320/where-can-i-find-materials-about-indexing-and-page-ranking
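As a concrete entry point to the "page ranking" half of the question, here is a minimal power-iteration PageRank over a tiny link graph (a textbook sketch with the usual damping factor, not any production algorithm):

```java
import java.util.*;

// Minimal power-iteration PageRank. links[i] lists the pages page i links to.
public class PageRank {
    public static double[] rank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] pr = new double[n];
        Arrays.fill(pr, 1.0 / n); // start uniform
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1 - damping) / n); // random-jump term
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) { // dangling page: spread rank evenly
                    for (int j = 0; j < n; j++) next[j] += damping * pr[i] / n;
                } else {
                    for (int j : links[i])
                        next[j] += damping * pr[i] / links[i].length;
                }
            }
            pr = next;
        }
        return pr;
    }

    public static void main(String[] args) {
        // 0 -> 1, 1 -> 2, 2 -> 0: a symmetric cycle, so all ranks converge to 1/3
        double[] pr = rank(new int[][]{{1}, {2}, {0}}, 0.85, 50);
        System.out.printf("%.3f %.3f %.3f%n", pr[0], pr[1], pr[2]);
    }
}
```

The original "The Anatomy of a Large-Scale Hypertextual Web Search Engine" paper by Brin and Page covers both this and the indexing side, and is a good first read.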

Strategy for how to crawl/index frequently updated webpages?

亡梦爱人 Submitted on 2019-12-02 14:57:18
I'm trying to build a very small, niche search engine, using Nutch to crawl specific sites. Some of the sites are news/blog sites. If I crawl, say, techcrunch.com, and store and index its front page or any of its main pages, then within hours my index for that page will be out of date. Does a large search engine such as Google have an algorithm to re-crawl frequently updated pages very frequently, hourly even? Or does it just score frequently updated pages very low so they don't get returned? How can I handle this in my own application? Good question. This is actually an active topic in WWW
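One simple, widely used way to handle this in your own crawler is an adaptive revisit interval: shrink the interval when a fetch shows the page changed, grow it when it didn't, within fixed bounds. The sketch below is my own simplification of that policy, not Nutch's or Google's actual scheduler (Nutch's AdaptiveFetchSchedule applies the same general idea):

```java
import java.time.Duration;

// Adaptive revisit policy: halve the interval when a page changed since the
// last fetch, double it when it didn't, clamped to [MIN, MAX].
public class RecrawlScheduler {
    static final Duration MIN = Duration.ofHours(1);
    static final Duration MAX = Duration.ofDays(30);

    public static Duration nextInterval(Duration current, boolean pageChanged) {
        Duration next = pageChanged
                ? current.dividedBy(2)     // changed: check twice as often
                : current.multipliedBy(2); // unchanged: back off
        if (next.compareTo(MIN) < 0) return MIN;
        if (next.compareTo(MAX) > 0) return MAX;
        return next;
    }

    public static void main(String[] args) {
        Duration d = Duration.ofDays(1);
        d = nextInterval(d, true); // page changed -> 12 hours
        d = nextInterval(d, true); // changed again -> 6 hours
        System.out.println(d);     // PT6H
    }
}
```

Detecting "changed" can be as cheap as comparing a content hash, or an HTTP conditional GET with If-Modified-Since/ETag so unchanged pages cost almost nothing.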