Google-like Search Engine in PHP/mySQL [closed]

Submitted by 扶醉桌前 on 2019-11-28 17:16:54

You can also try out SphinxSearch. Craigslist uses Sphinx, and it can connect to both MySQL and PostgreSQL.

There are some interesting search engines for you to take a look at. I don't know what you mean by "Google like" so I'm just going to ignore that part.

  • Take a look at the Lucene engine. The original is high performance but written in Java. There is a port of Lucene to PHP (already mentioned elsewhere) but it is too slow.
  • Take a serious look at the Xapian Project. It's fast. It's written in C++, so you'll most probably have to build it for your target server(s), but it has PHP bindings.

If MySQL's fulltext search is taking 20 seconds per query, you either have it misconfigured or are running it on underpowered hardware - some big sites are successfully using plain old MyISAM searching.

My vote goes for Solr, however. It's based on Lucene, so you get all the richness and performance of that best-of-breed product, but with a RESTful API, making it very easy to use from PHP. There's even a dW article.
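To illustrate what "RESTful" means in practice, here is a minimal sketch (in Python rather than PHP, purely for brevity) that only builds the URL for a query against Solr's select handler. The host, port and the core name "documents" are assumptions made up for this example, and the path layout assumes a multi-core Solr install; older installs may differ.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, rows=10):
    """Build a URL for Solr's select handler.

    base_url and core are placeholders for this sketch; adjust them
    to match your own Solr installation.
    """
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/solr/{core}/select?{params}"


if __name__ == "__main__":
    # Hypothetical local Solr instance with a core named "documents".
    print(build_select_url("http://localhost:8983", "documents", "fuzzy search"))
```

Any HTTP client can then fetch that URL and decode the JSON response, which is why no special PHP extension is strictly needed.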

You could put all the files on Google Docs, then scrape the results to your own web site.

My concern is that OCR accuracy is still an issue, so one consideration for a search requirement is the ability to perform "fuzzy" searches. Fuzzy meaning that when the OCR confuses the word "hat" with "hot", the search engine will be smart enough to return results that are similar but not exact. In Oracle, there is a package called UTL_MATCH that compares the similarity between two strings: http://docs.oracle.com/cd/E11882_01/appdev.112/e25788/u_match.htm#ARPLS352

A function like this would be useful.
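As a rough idea of what such a function does, here is a minimal Python sketch of edit-distance (Levenshtein) matching. It is not Oracle's implementation; the function names, the 0-100 similarity scale and the max_distance threshold are assumptions chosen for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions or substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity on a 0-100 scale (100 means identical strings)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - edit_distance(a, b) / longest))


def fuzzy_lookup(term, vocabulary, max_distance=1):
    """Return the vocabulary words within max_distance edits of term."""
    return [w for w in vocabulary if edit_distance(term, w) <= max_distance]
```

With this, fuzzy_lookup("hat", ["hot", "hat", "house"]) returns both "hot" and "hat", so a document where the OCR produced the wrong vowel would still be found. Comparing the query against every indexed word does not scale to large vocabularies, so treat it as an illustration only.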

Your scenario suggests that you'd like to roll your own; good starting points for a general search engine would include:

If you want to use an off-the-shelf solution:

Why don't you try something like Google Search Appliance or Google Enterprise? It has a cost associated with it, but it will save you from re-inventing the wheel and give you "Google-like" search.

Check this Lucene port for PHP:

You might want to check Sphider. In my experience it is quite fast and does the indexing automatically. It is also open source so you could take the code and modify it for your needs.

SQLite has quite good full-text search capability (look up SQLite FTS 3/4 - it's surprisingly good).
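A minimal sketch of SQLite FTS4, using Python's built-in sqlite3 module for brevity. It assumes the underlying SQLite library was compiled with FTS3/FTS4 enabled (most builds are, but not all); the table and column names are made up for this example.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS4 table and load (title, body) rows into it."""
    conn = sqlite3.connect(":memory:")
    # Fails with an OperationalError if this SQLite build lacks FTS4.
    conn.execute("CREATE VIRTUAL TABLE docs USING fts4(title, body)")
    conn.executemany("INSERT INTO docs (title, body) VALUES (?, ?)", rows)
    return conn


def search(conn, query):
    """Return titles of documents matching an FTS query string."""
    cur = conn.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY docid", (query,)
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    conn = build_index([
        ("Sphinx notes", "Sphinx connects to MySQL and PostgreSQL"),
        ("Xapian notes", "Xapian is written in C++ and has PHP bindings"),
        ("Solr notes", "Solr is based on Lucene"),
    ])
    print(search(conn, "lucene"))
    print(search(conn, "postgre*"))  # trailing * is a prefix query
```

The same SQL works from PHP through PDO's SQLite driver, again provided the bundled SQLite has FTS enabled.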

If you want a simple PHP DIY approach, an index made up of lots of small files, split by a hash of the terms being indexed, can work very well, and searching can be very fast even in PHP if you take care designing it. The idea is that a search on a term only needs to read one very small file containing the terms that match the hash and their record IDs (you could use bit-array slices to represent record IDs if you want to save disk space). Indexing every word for full-text search would be slow in PHP, though; that part should really be done in C.
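Here is a minimal sketch of that hash-bucket idea, written in Python for brevity (the same structure ports directly to PHP). The bucket count, the file naming, the tab-separated line format and the use of CRC32 as the hash are all assumptions made for this example, and record IDs are stored as plain integers rather than bit arrays.

```python
import os
import re
import tempfile
import zlib
from collections import defaultdict

NUM_BUCKETS = 256  # assumption: tune so each bucket file stays small


def bucket_path(index_dir, term):
    """Map a term to the one small file that may contain it."""
    bucket = zlib.crc32(term.encode("utf-8")) % NUM_BUCKETS
    return os.path.join(index_dir, f"bucket_{bucket:03d}.txt")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(index_dir, documents):
    """documents maps an integer record ID to its text."""
    postings = defaultdict(set)
    for record_id, text in documents.items():
        for term in tokenize(text):
            postings[term].add(record_id)
    buckets = defaultdict(list)
    for term, ids in postings.items():
        buckets[bucket_path(index_dir, term)].append((term, sorted(ids)))
    os.makedirs(index_dir, exist_ok=True)
    for path, entries in buckets.items():
        with open(path, "w", encoding="utf-8") as f:
            for term, ids in sorted(entries):
                f.write(term + "\t" + ",".join(map(str, ids)) + "\n")


def search(index_dir, term):
    """Read only the bucket file for this term and return its record IDs."""
    term = term.lower()
    path = bucket_path(index_dir, term)
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        for line in f:
            found, _, ids = line.rstrip("\n").partition("\t")
            if found == term:
                return [int(i) for i in ids.split(",")]
    return []


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as index_dir:
        build_index(index_dir, {1: "red hat", 2: "hot coffee", 3: "Hat stand"})
        print(search(index_dir, "hat"))
```

This version rebuilds the whole index in one pass; incremental updates would need to rewrite or append to the affected bucket files.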

For "fuzzy" searches, maybe look at using metaphone hashes.
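PHP ships built-in metaphone() and soundex() functions for this. To show the general idea of a phonetic hash, here is a minimal Python sketch of Soundex, which is a simpler relative of Metaphone and not Metaphone itself. Words that sound alike get the same code, so the index can store the code alongside each term and match on it.

```python
# Letter-to-digit table used by American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of word ("" if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        # 'h' and 'w' do not separate two consonants with the same code.
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]
```

For example, soundex("hat") and soundex("hot") both give "H300", so the OCR mix-up mentioned earlier would still match. Phonetic hashes are tuned for English pronunciation, so results on other languages or on OCR errors that change consonants will be weaker.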

For pre-built fulltext tools, check out these: SQLite FTS 3/4 (SQLite has very good fulltext search capability!), Sphinx, and KinoSearch (KinoSearch is a bit like Lucene, but the back-end is C with a nice, easy Perl wrapper - there is also CLucene, but I think that's still pre-alpha).

Java Lucene (or anything Java-based) probably needs a lot of RAM to be set aside to run a JVM - so it's probably not so great if you are on a budget.
