Perform an empty query with filters in Sphinx

Posted by 僤鯓⒐⒋嵵緔 on 2019-12-08 08:56:25

Question


I want to retrieve data from Sphinx without a query keyword, using only filters on other attributes. The attributes we filter on are integers. Here are the attributes of our index:

id - Integer
keyword - String
keyword_ord - Integer
words - Integer
results - Integer

We have approximately 300 million keywords in our table, and we tried to solve this by sending an empty query to Sphinx (note: we are using PHP and MySQL). Suppose we want the keywords that contain 3 to 6 words and have 3000 to 10000 results; for that we use the SetFilterRange() function of the Sphinx PHP API.

$sphinx->SetFilterRange( 'words', 3, 6 );
$sphinx->SetFilterRange( 'results', 3000, 10000 );

Then to execute the search, we send an empty query.

$results = $sphinx->query( '' );

The issue is that the query still seems slower than we expected. Is there a better way to fetch data with filters than sending an empty query to Sphinx? Or is there a better solution for this than Sphinx itself?

My guess is that it's slow because Sphinx actually has to loop through all 300 million keywords to find everything that passes the filters. If a keyword were specified in the query (instead of an empty query), then with the help of the indexes it would not have to run through all the keywords; it could skip the rows that do not contain the keyword. If this is the reason, then there must be a better approach than using Sphinx.
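To make this guess concrete, here is a toy sketch in plain PHP. It is not Sphinx and says nothing about how Sphinx stores attributes internally; the function names and sample rows are made up for illustration. It applies the same two range filters once by inspecting every row, and once on a copy of the rows kept sorted by `words`, where a binary search finds the start of the range and the loop stops as soon as it leaves it.

```php
<?php
// Toy illustration only: hypothetical helper names, tiny made-up data set.

// Full scan: every row is inspected, so the cost grows with the table size.
function fullScan(array $rows, int $minWords, int $maxWords, int $minResults, int $maxResults): array
{
    $ids = [];
    foreach ($rows as $row) {
        if ($row['words'] >= $minWords && $row['words'] <= $maxWords
            && $row['results'] >= $minResults && $row['results'] <= $maxResults) {
            $ids[] = $row['id'];
        }
    }
    sort($ids);
    return $ids;
}

// Binary search: position of the first row with words >= $minWords.
// Assumes $sorted is ordered by 'words' ascending.
function lowerBound(array $sorted, int $minWords): int
{
    $lo = 0;
    $hi = count($sorted);
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($sorted[$mid]['words'] < $minWords) {
            $lo = $mid + 1;
        } else {
            $hi = $mid;
        }
    }
    return $lo;
}

// Indexed scan: only the slice of rows inside the 'words' range is inspected.
function indexedScan(array $sortedByWords, int $minWords, int $maxWords, int $minResults, int $maxResults): array
{
    $ids = [];
    $n = count($sortedByWords);
    for ($i = lowerBound($sortedByWords, $minWords); $i < $n; $i++) {
        $row = $sortedByWords[$i];
        if ($row['words'] > $maxWords) {
            break; // past the range: nothing further can match
        }
        if ($row['results'] >= $minResults && $row['results'] <= $maxResults) {
            $ids[] = $row['id'];
        }
    }
    sort($ids);
    return $ids;
}

$rows = [
    ['id' => 1, 'words' => 2, 'results' => 5000],
    ['id' => 2, 'words' => 3, 'results' => 3000],
    ['id' => 3, 'words' => 5, 'results' => 12000],
    ['id' => 4, 'words' => 6, 'results' => 10000],
    ['id' => 5, 'words' => 7, 'results' => 4000],
    ['id' => 6, 'words' => 4, 'results' => 100],
];

// Build the "index": a copy of the rows sorted by the 'words' attribute.
$sortedByWords = $rows;
usort($sortedByWords, function (array $a, array $b): int {
    return $a['words'] <=> $b['words'];
});

print_r(fullScan($rows, 3, 6, 3000, 10000));
print_r(indexedScan($sortedByWords, 3, 6, 3000, 10000));
```

Both calls return the same ids; the difference is only in how many rows each one has to touch, which is what matters at 300 million rows.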

As for our server hardware specs:

  • CPU: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz (8 cores)
  • Memory: 2GB
  • Disk Space: 250GB

Answer 1:


Honestly, Sphinx can do this job very well. Your specs are a bit low (memory) and should be bumped up. If you have 300 million rows (with an index), MySQL alone is eating up a ton of that memory. I'd upgrade to at least 8GB of memory for starters.

After a memory upgrade, I'd toy with the Sphinx configuration. I'd start by adding these options:

searchd
{
    max_matches         = 200000
    max_filter_values   = 300000
}

max_matches limits the total number of results in general; there is no reason to return 300 million results.

max_filter_values is just a sanity check. It stops someone from selecting 300 million tags as filter values.

To run an empty query, use:

$results = $sphinx->query( '*' );

I can tell you from experience that Sphinx is absolutely powerful enough to handle 300+ million records.

Most of the time, Sphinx just doesn't have enough resources to access the data quickly enough. The 2GB of RAM is shared across the entire system, so how much is actually available to Sphinx can vary greatly. I've seen web servers boot up, spin up Apache instances, MySQL, memcached, etc., and be left with only 100MB of RAM, which is far less than a 300 million row search could need, IMO (I haven't done benchmarks to find actual numbers).

Edit: Also, you'll eventually want to look into the delta/main indexing scheme. If you don't have multiple DBs set up to take on server load, Sphinx may end up locking MySQL while it indexes, until the indexing query is finished.

Edit: I ran into somewhat of an issue with the PHP API, so I compiled the Sphinx C extension for PHP, and that has worked wonders, cutting processing time in half. Before I ended up using the extension, I fixed parts of the API, which sped everything up.

Two of the most important changes:

  • Comment out all the assert() calls. This may not be the safe way, but assert does not belong in production anyway. If you want the asserts to run, use the extension.
  • Find all is_int() calls and replace them with:

if ((int)$v === $v) {
/* code here */
}

The typecasting is actually about 30% faster with large queries.
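A rough way to check this on your own setup is sketched below. It is only a sketch: the helper names are hypothetical, the sample values are arbitrary, and the timing loop is not a rigorous benchmark. It first confirms that the cast comparison agrees with is_int() on a mix of value types, then times both.

```php
<?php
// Hypothetical helper names; the timing loop is a rough sketch, not a rigorous benchmark.

function isIntBuiltin($v): bool
{
    return is_int($v);
}

function isIntByCast($v): bool
{
    // True only when $v is already an integer: the cast changes nothing and
    // the strict comparison also checks the type.
    return (int)$v === $v;
}

// Time $iterations passes of $check over $values, in seconds.
function timeCheck(callable $check, array $values, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        foreach ($values as $v) {
            $check($v);
        }
    }
    return microtime(true) - $start;
}

$values = [0, 42, -7, 3.0, 2.5, '42', 'abc', '', null, true, false];

// Equivalence check: report any value where the two tests disagree.
foreach ($values as $v) {
    if (isIntBuiltin($v) !== isIntByCast($v)) {
        echo 'Mismatch for ', var_export($v, true), PHP_EOL;
    }
}

printf("is_int: %.4fs\n", timeCheck('isIntBuiltin', $values, 100000));
printf("cast:   %.4fs\n", timeCheck('isIntByCast', $values, 100000));
```

The wrapper functions add call overhead to both sides, so treat the printed times as a relative hint only; the numbers depend heavily on the PHP version. Objects are left out of the sample values because casting an object to int emits a notice or warning.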




Answer 2:


Thanks for your reply, CrazyVipa (I'm Ronald's co-worker).

Our RAM is currently set to 2GB only because no one is using our site at the moment. Usually, when we're using Sphinx, we set the RAM to 12-16GB. We've tracked our RAM usage and it never exceeds 10GB.

But we'll try your configuration and query suggestions.

We'll get back here tomorrow.



Source: https://stackoverflow.com/questions/14231262/perform-an-empty-query-with-filters-in-sphinx
