search-engine

How to do wildcard search using structured prefix operator with AWS CloudSearch

☆樱花仙子☆ submitted on 2019-12-12 22:16:30
Question: I'm currently migrating to the 2013 CloudSearch API (from the 2011 API). Previously, I had been using a wildcard prefix with my searches, like this: bq=(and 'first secon*') My queries sometimes include facet options, which is why I use the boolean query syntax and not the simple version. I've created a new CloudSearch instance using the 2013 engine and indexed it. The bq parameter is gone now, so I have to use the q parameter with the q.parser=structured parameter to get the same
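For the CloudSearch question above, here is a minimal sketch of how the old `bq=(and 'first secon*')` query might be expressed with the 2013 structured parser and its `prefix` operator. The helper names (`build_structured_query`, `build_search_params`) are hypothetical, and this is one reading of the structured syntax, not the answer from the truncated thread. It only builds the query string and does not contact any search endpoint.

```python
from urllib.parse import urlencode


def _quote(value):
    # Structured-query strings are single-quoted; backslashes and
    # single quotes inside the string are escaped with a backslash.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_structured_query(full_terms, prefix_term):
    # Hypothetical helper: every full term must match, and the last
    # (partial) term is matched with the prefix operator.
    clauses = [_quote(term) for term in full_terms]
    clauses.append("(prefix " + _quote(prefix_term) + ")")
    return "(and " + " ".join(clauses) + ")"


def build_search_params(full_terms, prefix_term):
    # q carries the structured expression; q.parser selects the parser.
    return urlencode({
        "q": build_structured_query(full_terms, prefix_term),
        "q.parser": "structured",
    })


if __name__ == "__main__":
    # Structured form of the old bq=(and 'first secon*') query
    print(build_structured_query(["first"], "secon"))
    print(build_search_params(["first"], "secon"))
```

To restrict the prefix match to one field, the `prefix` operator also accepts a `field=` option, for example `(prefix field=title 'secon')`; the field name `title` here is only an illustration.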

How to hide website directory from search engines without Robots.txt?

大城市里の小女人 submitted on 2019-12-12 21:42:36
Question: We know we can stop search engines from indexing directories on our site using robots.txt. But this of course has the disadvantage of publicising to possible attackers the very directories we don't want found. Password-protecting the directory using .htaccess or other means is obviously the best way to keep the directory private. But what if, for reasons of convenience, we didn't want to add another layer of security to the directory and just wanted to add another level of obfuscation? To

GWT and Search Engines

爷,独闯天下 submitted on 2019-12-12 20:40:14
Question: Are GWT apps indexed by search engines? If yes, how can that be accomplished? Thanks.

Answer 1: GWT apps, and more generally Ajax, can't be fully indexed by search engines... yet. But work is being done to make Ajax applications crawlable. The most common alternative developers use to get their GWT app indexed is to publish an HTML version.

Answer 2: Search engines don't prefer HTML that is generated on the client side by JavaScript (Ajax). They prefer static HTML that is generated on the server side. That is

I have created inverted index for a website but where to store that? Database for a search engine?

南楼画角 submitted on 2019-12-12 20:06:46
Question: What database can be used for a search engine? I mean, after creating an inverted index for a site, where could one store it so that the program can create indices for other sites and save them too? Later on, the indexer can query them as well. The indices can range in the thousands of billions. Thank you.

Answer 1: I would use Lucene. That's what it is made for. You even have your choice of many different languages.

Source: https://stackoverflow.com/questions/3581792/i-have-created-inverted-index-for-a-website-but
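To make the storage question concrete, here is a small sketch of an inverted index for several sites persisted in one table. Note that this uses SQLite from the Python standard library, not Lucene as the answer recommends; it only illustrates the shape of the data (term to site and document ID) and would not scale to billions of postings. The table name `postings` and all function names are assumptions made for the example.

```python
import sqlite3
from collections import defaultdict


def build_inverted_index(docs):
    # docs maps doc_id -> text; the result maps term -> set of doc_ids.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def save_index(conn, site, index):
    # One row per (site, term, doc_id) posting, so indices for
    # several sites can live side by side in the same table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings ("
        "site TEXT, term TEXT, doc_id INTEGER, "
        "PRIMARY KEY (site, term, doc_id))"
    )
    rows = [(site, term, doc_id)
            for term, doc_ids in index.items()
            for doc_id in doc_ids]
    conn.executemany("INSERT OR IGNORE INTO postings VALUES (?, ?, ?)", rows)
    conn.commit()


def lookup(conn, term):
    # Return every (site, doc_id) whose document contains the term.
    cursor = conn.execute(
        "SELECT site, doc_id FROM postings WHERE term = ? "
        "ORDER BY site, doc_id",
        (term.lower(),),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    docs = {1: "hello world", 2: "hello there"}
    save_index(conn, "a.example", build_inverted_index(docs))
    print(lookup(conn, "hello"))
```

A dedicated library such as Lucene stores the same term-to-postings mapping in compressed on-disk segments, which is why the answer points to it for large indices.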

What tools are out there for an Intranet search engine across a diverse toolset?

倖福魔咒の submitted on 2019-12-12 17:00:42
Question: Basic requirements: Should be able to index things like MediaWiki, Confluence, SharePoint, GitHub:Enterprise, Askbot. Should be reasonably smart about de-duping results (one reason Confluence search is so painful). Should definitely incorporate heuristics like how many pages link to a document, whether the search terms are in the title of the document, etc. If there's a way for users to downrank particular results, that might be a bonus. Should be somewhat tunable (e.g., prefer Confluence over

How are Solr filters actually implemented?

◇◆丶佛笑我妖孽 submitted on 2019-12-12 14:08:39
Question: Is my understanding of query processing correct?
1. Get the DocSet from the cache, or the first filter query creates an implementation (OpenBitSet or SortedVIntSet) and caches it.
2. Get the DocSet from the cache, or every other filter creates its own implementation of DocBitSet, which is then intersected with the original (the efficiency of this code depends on the implementation of the first DocSet).
3. We do a leapfrog with the MainQuery and the final DocSet (after all intersections) using the Lucene filter+query search ( efficiency of
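The "leapfrog" step in the question can be illustrated on two sorted lists of document IDs. This is a generic sketch of the technique in Python, not Solr's or Lucene's actual code; `leapfrog_intersect` is a hypothetical name, and real implementations advance iterators over postings instead of indexing into lists.

```python
from bisect import bisect_left


def leapfrog_intersect(a, b):
    # a and b are sorted lists of unique doc IDs.
    # Whichever side is behind jumps forward (via binary search)
    # to the first ID that is >= the other side's current ID.
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = bisect_left(a, b[j], i + 1)
        else:
            j = bisect_left(b, a[i], j + 1)
    return result


if __name__ == "__main__":
    filter_docs = [1, 3, 5, 7, 9]   # IDs that pass the filters
    query_docs = [3, 4, 7, 10]      # IDs that match the main query
    print(leapfrog_intersect(filter_docs, query_docs))  # [3, 7]
```

The jump is what makes the approach cheap when one list is much sparser than the other: long runs of non-matching IDs are skipped without being visited one by one.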

Any alternatives to Google Trends? [closed]

蓝咒 submitted on 2019-12-12 13:32:39
Question: I'm writing a small helper utility for obscure software that is used at a local shop. Basically, I would like to know if anyone searches for anything associated with that software and if publishing my work on the Internet would make any sense. I entered the name of the software into Google Trends, but my terms

What should I know about search engine crawling?

时光毁灭记忆、已成空白 submitted on 2019-12-12 10:54:30
Question: I don't mean SEO things. What should I know? For example: Do engines run JavaScript? Do they use cookies? Will cookies carry across crawl sessions (say, cookies from today and a crawl next week or month)? Are selected JS filters not loaded for any reason (such as a suspected ad that is ignored for optimization reasons)? I don't want to accidentally have every indexed page show some kind of error or warning message like "please turn on your cookies" or "browser not supported", or not be indexed because I did something

Can HBase, MapReduce and HDFS work on a single machine that has Hadoop installed and running on it?

邮差的信 submitted on 2019-12-12 10:18:01
Question: I am working on a search engine design that is to be run in the cloud. We have just started and don't know much about Hadoop. Can anyone tell me whether HBase, MapReduce and HDFS can work on a single machine that has Hadoop installed and running on it?

Answer 1: Yes, you can. You can even create a virtual machine and run it there on a single "computer" (which is what I have :) ). The key is to simply install Hadoop in "Pseudo Distributed Mode", which is even described in the Hadoop Quickstart. If you use

Site search with CodeIgniter?

你说的曾经没有我的故事 submitted on 2019-12-12 09:16:20
Question: I need to make a simple site search with pagination; could anyone tell me how to do it without affecting the URL structure? Currently I'm using the default CodeIgniter URL structure and I have removed index.php from it. Any suggestions?

Answer 1: You could just use a URL like /search/search_term/page_number. Set your route like this: $route['search/:any'] = "search/index"; And your controller like this: function index() { $search_term = $this->uri->rsegment(3); $page = ( ! $this->uri-