indexing

Why does Mongo hint make a query run up to 10 times faster?

Submitted by 懵懂的女人 on 2019-12-31 12:10:13

Question: If I run a Mongo query from the shell with explain(), get the name of the index used, and then run the same query again but with hint() specifying that same index, the "millis" field of the explain plan decreases significantly. For example, with no hint provided:

>> db.event.find({ "type" : "X", "active" : true, "timestamp" : { "$gte" : NumberLong("1317498259000") }, "count" : { "$gte" : 0 } }).limit(3).sort({ "timestamp" : -1 }).explain();
{ "cursor" : "BtreeCursor my_super_index", "nscanned" …
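The usual explanation is that, without a hint, the query optimizer trial-runs several candidate plans before settling on one, while hint() skips plan selection entirely. A toy Python sketch of that idea (this is not MongoDB's actual planner; the plan names and per-document costs are made up for illustration):

```python
# Toy model: without a hint, the planner "races" every candidate plan
# for a trial number of documents, then runs the winner to completion;
# hint() runs exactly one plan and skips the race.

def run_plan(cost_per_doc, docs):
    """Simulated work units for scanning `docs` documents with one plan."""
    return cost_per_doc * docs

def query_without_hint(plans, docs, trial=101):
    # Trial-run every candidate plan, then finish with the cheapest one.
    trial_work = sum(run_plan(cost, trial) for cost in plans.values())
    best = min(plans.values())
    return trial_work + run_plan(best, docs)

def query_with_hint(plans, docs, index):
    # hint() bypasses plan selection: only the named index is used.
    return run_plan(plans[index], docs)

plans = {"my_super_index": 1, "timestamp_1": 4, "type_1_active_1": 7}
no_hint = query_without_hint(plans, docs=10_000)
hinted = query_with_hint(plans, docs=10_000, index="my_super_index")
print(no_hint, hinted)  # the hinted run does strictly less total work
```

The gap grows with the number of candidate plans, which is consistent with hint() shaving a large constant off "millis" for queries that have many usable indexes.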

Elasticsearch index much larger than the actual size of the logs it indexed?

Submitted by 有些话、适合烂在心里 on 2019-12-31 10:04:30

Question: I noticed that Elasticsearch consumed over 30 GB of disk space overnight. By comparison, the total size of all the logs I wanted to index is only 5 GB... well, not even that really, probably more like 2.5-3 GB. Is there any reason for this, and is there a way to re-configure it? I'm running the ELK stack.

Answer 1: There are a number of reasons why the data inside of Elasticsearch would be much larger than the source data. Generally speaking, Logstash and Lucene are both working to add structure to …
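One part of that answer is easy to demonstrate: an index stores the original documents plus derived structures (an inverted index, per-field copies, and so on), so bytes on disk multiply. A crude Python stand-in — nothing like Lucene's actual compressed encoding, just the principle that the derived structures come on top of the raw text:

```python
import json

# Crude illustration: keep each log line verbatim (like _source) plus a
# tokenized inverted index mapping term -> list of document ids.
logs = [
    "ERROR db connection refused",
    "INFO request served in 12ms",
    "ERROR disk quota exceeded",
]

raw_bytes = sum(len(line.encode()) for line in logs)

inverted = {}
for doc_id, line in enumerate(logs):
    for term in line.lower().split():
        inverted.setdefault(term, []).append(doc_id)

# Rough footprint: the stored source plus the serialized index structure.
index_bytes = raw_bytes + len(json.dumps(inverted).encode())
print(raw_bytes, index_bytes)  # derived structures add to the raw size
```

Real Lucene compresses aggressively, but analyzed fields, stored `_source`, and (in that era) the `_all` field all pull in the other direction, which is how 3 GB of logs can become 30 GB of index.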

start index at 1 for Pandas DataFrame

Submitted by 我的未来我决定 on 2019-12-31 08:23:32

Question: I need the index to start at 1 rather than 0 when writing a Pandas DataFrame to CSV. Here's an example:

In [1]: import pandas as pd
In [2]: result = pd.DataFrame({'Count': [83, 19, 20]})
In [3]: result.to_csv('result.csv', index_label='Event_id')

Which produces the following output:

In [4]: !cat result.csv
Event_id,Count
0,83
1,19
2,20

But my desired output is this:

In [5]: !cat result2.csv
Event_id,Count
1,83
2,19
3,20

I realize that this could be done by adding a sequence of integers …
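The usual answer is simply to shift the default 0-based index before writing; `df.index + 1` is standard pandas. A sketch using an in-memory buffer in place of the question's result.csv:

```python
import io
import pandas as pd

result = pd.DataFrame({'Count': [83, 19, 20]})
result.index = result.index + 1      # shift the default RangeIndex to start at 1

buf = io.StringIO()                  # stand-in for result.csv
result.to_csv(buf, index_label='Event_id')
print(buf.getvalue())
# Event_id,Count
# 1,83
# 2,19
# 3,20
```

The same effect is available at construction time via `pd.DataFrame(..., index=range(1, 4))` if you would rather not mutate the index afterwards.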

One 400GB table, One query - Need Tuning Ideas (SQL2005)

Submitted by ℡╲_俬逩灬. on 2019-12-31 08:13:22

Question: I have a single large table which I would like to optimize. I'm using MS SQL Server 2005. I'll try to describe how it is used, and if anyone has any suggestions I would appreciate it very much. The table is about 400 GB, has 100 million rows, and 1 million rows are inserted each day. The table has 8 columns: 1 data column and 7 columns used for lookups/ordering:

k1 k2 k3 k4 k5 k6 k7 d1

where
k1: varchar(3), primary key - clustered index, 10 possible values
k2: bigint, primary key - clustered index, …
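The core tuning lever for a table like this is whether lookups can seek through an index rather than scan 400 GB. A small sketch in SQLite (not SQL Server 2005 — the engine, column names after k1/k2, and data are stand-ins) showing how a composite index over the leading lookup columns changes the query plan:

```python
import sqlite3

# SQLite stand-in for the idea: a composite index over the lookup
# columns lets equality/range predicates seek instead of full-scanning.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE big (k1 TEXT, k2 INTEGER, k3 INTEGER, d1 BLOB)")
con.executemany("INSERT INTO big VALUES (?, ?, ?, ?)",
                [("abc", i % 7, i, b"x") for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan description in column 3.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

before = plan("SELECT d1 FROM big WHERE k1 = 'abc' AND k2 = 3")
con.execute("CREATE INDEX ix_big_k1_k2 ON big (k1, k2)")
after = plan("SELECT d1 FROM big WHERE k1 = 'abc' AND k2 = 3")
print(before)  # a full-table scan before the index exists
print(after)   # a SEARCH using ix_big_k1_k2 afterwards
```

On SQL Server the analogous diagnostic is the actual execution plan (table scan vs. index seek); the principle of matching the index column order to the query predicates is the same.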

implementing next and back buttons for a slideshow

Submitted by 天涯浪子 on 2019-12-31 05:48:15

Question: I'm trying to make a PHP slideshow and I'm almost done. I just need to implement the next and back buttons, which I thought were going to be easy, but apparently you can't increment indexes in PHP?

$sql = "SELECT pic_url FROM pic_info";
$result = $conn->query($sql);
$count = 0;
$dir = "http://dev2.matrix.msu.edu/~matrix.training/Holmberg_Dane/";
$source = "gallery.php";
if ($result->num_rows > 0) {
    // output data of each row
    $pic_array = array();
    while ($row = $result->fetch_assoc()) {
        $pic …
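PHP array indexes increment like any other integer; the real work in next/back buttons is wrapping the index at the ends of the picture list. The logic is language-independent — a Python sketch with a made-up picture list standing in for the pic_url rows:

```python
pics = ["a.jpg", "b.jpg", "c.jpg"]   # stand-in for the pic_url query results

def next_index(i, n):
    return (i + 1) % n               # wrap from the last slide back to the first

def back_index(i, n):
    return (i - 1) % n               # wrap from the first slide to the last

i = 0
i = next_index(i, len(pics))   # -> 1
i = next_index(i, len(pics))   # -> 2
i = next_index(i, len(pics))   # -> 0, wrapped around
i = back_index(i, len(pics))   # -> 2, wrapped backwards
print(pics[i])
```

In a PHP page the current index typically travels in a query parameter (e.g. `gallery.php?i=2`), since each button click is a fresh request with no in-memory state.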

On which OS is Search.CollatorDSO available?

Submitted by 牧云@^-^@ on 2019-12-31 04:57:07

Question: I am trying to search the file system using Search.CollatorDSO:

Provider=Search.CollatorDSO;Extended Properties="Application=Windows"

On what Windows OS is this provider available by default? According to this question it is not installed on Web editions of Windows Server. If it is not installed, can it be installed manually?

Answer 1: After much digging, it appears the only way to get the OLE DB provider Search.CollatorDSO is to enable the Search service in the Windows OS itself. Installing Search Server …

(Update) Add index column to data.frame based on two columns

Submitted by 主宰稳场 on 2019-12-31 04:24:08

Question: Example data.frame:

df = read.table(text = 'colA colB
2 7
2 7
2 7
2 7
1 7
1 7
1 7
89 5
89 5
89 5
88 5
88 5
70 5
70 5
70 5
69 5
69 5
44 4
44 4
44 4
43 4
42 4
42 4
41 4
41 4
120 1
100 1', header = TRUE)

I need to add an index column based on colA and colB, where colB gives the exact number of rows to group but can be duplicated. colB groups rows based on colA and colA - 1. Expected output:

colA colB index_col
2 7 1
2 7 1
2 7 1
2 7 1
1 7 1
1 7 1
1 7 1
89 5 2
89 5 2
89 5 2
88 5 2
88 5 2
70 5 3
70 …
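Because colB states each group's exact size, one way to produce index_col is to walk the rows and start a new group whenever the current group has reached its colB count. The question is in R, so this Python sketch over the same colB values only illustrates the logic:

```python
# colB values row by row, copied from the question's data frame.
colB = [7, 7, 7, 7, 7, 7, 7,
        5, 5, 5, 5, 5,
        5, 5, 5, 5, 5,
        4, 4, 4, 4,
        4, 4, 4, 4,
        1, 1]

index_col, group, filled = [], 1, 0
for size in colB:
    index_col.append(group)
    filled += 1
    if filled == size:               # current group is complete
        group, filled = group + 1, 0

print(index_col[:12])   # seven 1s (colB = 7) followed by 2s (colB = 5)
```

The same counter translates directly into an R loop or a cumulative-sum trick over colB; the key observation is that group boundaries are fully determined by colB alone.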

Data Manipulation - Sort Index when values are Alphanumeric

Submitted by 人盡茶涼 on 2019-12-31 04:14:07

Question: I'm wondering how I should approach this data manipulation predicament. What is the best method to sort an index of a multi-index in a data frame where the values of one level of the index are alphanumeric? The values are:

[u'0', u'1', u'10', u'11', u'2', u'2Y', u'3', u'3Y', u'4', u'4Y', u'5', u'5Y', u'6', u'7', u'8', u'9', u'9Y']

The result I'm searching for is:

[u'0', u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'10', u'11', u'2Y', u'3Y', u'4Y', u'5Y', u'9Y']

The plain numeric …
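The desired order — plain numbers first, sorted numerically, then the letter-suffixed values — can be expressed as a sort key that splits each label into its numeric and alphabetic parts. A sketch (u'' prefixes dropped, as in Python 3):

```python
import re

values = ['0', '1', '10', '11', '2', '2Y', '3', '3Y', '4', '4Y',
          '5', '5Y', '6', '7', '8', '9', '9Y']

def key(s):
    # Plain numbers sort before suffixed ones; within each group,
    # compare the numeric part as an int, then the suffix.
    m = re.match(r'(\d+)([A-Za-z]*)', s)
    return (m.group(2) != '', int(m.group(1)), m.group(2))

print(sorted(values, key=key))
```

For a MultiIndex, the sorted label list can then be handed back to pandas with `df.reindex(sorted(labels, key=key), level=...)` to reorder that level.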

Which column to put first in index? Higher or lower cardinality?

Submitted by 删除回忆录丶 on 2019-12-31 02:51:09

Question: For example, if I have a table with a city and a state column, what is the best way to use the index? Obviously city will have the higher cardinality, so should I put that column first in the index, should state come first, or doesn't it matter much?

Answer 1: MySQL composite index lookups must take place in the order in which the columns are defined within the index. Since you want MySQL to be able to discriminate between records by performing as few comparisons as possible, with all other things …
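The order also determines which queries can seek into the index at all: a composite index serves only leading-column prefixes. A sketch in SQLite (the question is about MySQL, but prefix matching works the same way; table and data are made up):

```python
import sqlite3

# With an index on (city, state), a filter on city can seek into the
# index, but a filter on state alone cannot use it as a search key.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE addr (city TEXT, state TEXT)")
con.execute("CREATE INDEX ix_city_state ON addr (city, state)")
con.executemany("INSERT INTO addr VALUES (?, ?)",
                [("Austin", "TX"), ("Dallas", "TX"), ("Reno", "NV")])

def plan(sql):
    return " ".join(r[3] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

leading = plan("SELECT * FROM addr WHERE city = 'Austin'")
trailing = plan("SELECT * FROM addr WHERE state = 'TX'")
print(leading)   # a SEARCH using ix_city_state
print(trailing)  # no index seek: state is not a leading prefix
```

So beyond cardinality, the deciding factor is which predicates your queries actually filter on: put first the column that appears alone or in every query, since only leading prefixes of the index are seekable.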