database-indexes

Are POSIX' read() and write() system calls atomic?

我怕爱的太早我们不能终老 submitted on 2019-11-28 11:43:35
I am trying to implement a database index based on the data structure (B-link tree) and algorithms suggested by Lehman and Yao in this paper. On page 2, the authors state that: The disk is partitioned in sections of fixed size (physical pages; in this paper, these correspond to the nodes of the tree). These are the only units that can be read or written by a process. [emphasis mine] (...) (...) a process is allowed to lock and unlock a disk page. This lock gives that process exclusive modification rights to that page; also, a process must have a page locked in order to modify that page. (...)
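
As a minimal sketch of the page-as-unit discipline quoted above, assuming a Unix-like system, the following Python reads and writes whole pages with os.pread/os.pwrite and holds an advisory byte-range lock (fcntl.lockf) on the page for the duration of each access. PAGE_SIZE, write_page, read_page and demo are hypothetical names chosen for illustration. Taking a shared lock on reads is a conservative choice of this sketch, not something the quoted passage requires, and the sketch does not settle whether read() and write() are atomic on their own.

```python
import fcntl
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size; the quoted passage does not fix one


def write_page(fd, page_no, payload):
    """Write one whole page while holding an exclusive advisory lock on it."""
    if len(payload) > PAGE_SIZE:
        raise ValueError("payload does not fit in one page")
    data = payload.ljust(PAGE_SIZE, b"\x00")  # always write a full page
    offset = page_no * PAGE_SIZE
    fcntl.lockf(fd, fcntl.LOCK_EX, PAGE_SIZE, offset)
    try:
        if os.pwrite(fd, data, offset) != PAGE_SIZE:
            raise OSError("short write")
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN, PAGE_SIZE, offset)


def read_page(fd, page_no):
    """Read one whole page while holding a shared advisory lock on it."""
    offset = page_no * PAGE_SIZE
    fcntl.lockf(fd, fcntl.LOCK_SH, PAGE_SIZE, offset)
    try:
        return os.pread(fd, PAGE_SIZE, offset)
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN, PAGE_SIZE, offset)


def demo():
    """Write page 2 of a scratch file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "index.db")  # hypothetical file name
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            write_page(fd, 2, b"node-2")
            return read_page(fd, 2)
        finally:
            os.close(fd)


page = demo()
```

Note that these locks are advisory and held per process: they only coordinate processes that all go through write_page and read_page, and they do not protect threads of one process from each other.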

Postgres not using index when index scan is much better option

微笑、不失礼 submitted on 2019-11-28 08:30:21
I have a simple query joining two tables that is really slow. I found that the query plan does a seq scan on the large table email_activities (~10m rows), while I think nested loops using the indexes would actually be faster. I rewrote the query using a subquery in an attempt to force the use of the index, then noticed something interesting. If you look at the two query plans below, you will see that when I limit the result set of the subquery to 43k, the query plan does use the index on email_activities, while setting the limit in the subquery to even 44k will cause the query plan to use a seq scan on email

Why are super columns in Cassandra no longer favoured?

倖福魔咒の submitted on 2019-11-28 04:42:25
I have read in the latest release that super columns are not desirable due to "performance issues", but nowhere is this explained. Then I read articles such as this one that give wonderful indexing patterns using super columns. This leaves me with no idea of what is currently the best way to do indexing in Cassandra. What are the performance issues of super columns? Where can I find current best practices for indexing? jericevans: Super columns suffer from a number of problems, not least of which is that it is necessary for Cassandra to deserialize all of the sub-columns of a super column when

Cassandra: choosing a Partition Key

♀尐吖头ヾ submitted on 2019-11-28 04:33:01
I'm undecided whether it's better, performance-wise, to use a very commonly shared column value (like Country) as the partition key for a compound primary key or a rather unique column value (like Last_Name). Looking at Cassandra 1.2's documentation about indexes I get this: "When to use an index: Cassandra's built-in indexes are best on a table having many rows that contain the indexed value. The more unique values that exist in a particular column, the more overhead you will have, on average, to query and maintain the index. For example, suppose you had a user table with a billion users and
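
To make the trade-off in this question concrete, here is a toy simulation, not Cassandra's actual partitioner: it hashes each partition key with MD5 and maps it onto a fixed number of nodes, then counts the rows each node receives. The node count, row count, sample key values and the helper names node_for and rows_per_node are all assumptions for illustration. It only shows how key cardinality affects data distribution and says nothing about secondary-index overhead.

```python
import hashlib
from collections import Counter


def node_for(key, nodes):
    """Map a partition key to a node by hashing (toy stand-in for a partitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nodes


def rows_per_node(partition_keys, nodes):
    """Count how many rows land on each node for the given partition keys."""
    counts = Counter(node_for(key, nodes) for key in partition_keys)
    return [counts.get(node, 0) for node in range(nodes)]


NODES = 8       # assumed cluster size
ROWS = 10_000   # assumed number of rows

# Low-cardinality key: only three distinct values, like a Country column.
by_country = [("US", "DE", "FR")[i % 3] for i in range(ROWS)]
# High-cardinality key: nearly unique values, like a Last_Name column.
by_last_name = [f"name-{i}" for i in range(ROWS)]

low = rows_per_node(by_country, NODES)
high = rows_per_node(by_last_name, NODES)

print("rows per node, low-cardinality key: ", low)
print("rows per node, high-cardinality key:", high)
```

With only three distinct key values, at most three nodes can hold any rows, however many nodes the cluster has; the near-unique key spreads rows across all of them.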

Can MySQL use multiple indexes for a single query?

断了今生、忘了曾经 submitted on 2019-11-27 17:42:12
Imagine a table with multiple columns, say, id, a, b, c, d, e. I usually select by id; however, there are multiple queries in the client app that use various conditions over subsets of the columns. When MySQL executes a query on a single table with multiple WHERE conditions on multiple columns, can it really make use of indexes created on different columns? Or is the only way to make it fast to create multi-column indexes for all possible queries? Yes, MySQL can use multiple indexes for a single query. The optimizer will determine which indexes will benefit the query. You can use EXPLAIN to
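
The idea of asking the database which indexes it picked can be tried without a MySQL server. The sketch below swaps in SQLite (via Python's standard sqlite3 module) purely as a stand-in: it uses SQLite's EXPLAIN QUERY PLAN, not MySQL's EXPLAIN, and SQLite's planner is not MySQL's optimizer. The table t, the indexes idx_a and idx_b, and the function explain_or_query are hypothetical names for illustration.

```python
import sqlite3


def explain_or_query():
    """Return the query-plan detail lines for an OR over two indexed columns."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(
            """
            CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER);
            CREATE INDEX idx_a ON t(a);
            CREATE INDEX idx_b ON t(b);
            """
        )
        rows = conn.execute(
            "EXPLAIN QUERY PLAN SELECT * FROM t WHERE a = 1 OR b = 2"
        ).fetchall()
    finally:
        conn.close()
    # Each row is (id, parent, notused, detail); the detail column is the readable part.
    return [row[3] for row in rows]


plan = explain_or_query()
for line in plan:
    print(line)
```

If the plan lines mention both idx_a and idx_b, the engine combined two single-column indexes for one query. The exact wording of the plan differs between SQLite versions, and the same check against MySQL would need to be run with its own EXPLAIN.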

Deferrable, case-insensitive unique constraint

可紊 submitted on 2019-11-27 17:34:32
Question: Is it possible in PostgreSQL to create a deferrable, case-insensitive unique constraint on a character column? Let's assume the following basic table: CREATE TABLE sample_table ( my_column VARCHAR(100) ); If a deferrable constraint is not needed, it is as simple as creating a unique index on a function, e.g.: CREATE UNIQUE INDEX my_unique_index ON sample_table(UPPER(my_column)); A deferred constraint check requires creating the constraint explicitly, e.g.: ALTER TABLE sample_table ADD

Neo4j: Step by Step to create an automatic index

江枫思渺然 submitted on 2019-11-27 17:13:32
I am creating a new Neo4j database. I have a type of node called User and I would like an index on the User properties Identifier and EmailAddress. How does one go about setting up an index when the database is new? I have noticed that the neo4j.properties file appears to support creating indexes. However, when I set these like so:

# Autoindexing
# Enable auto-indexing for nodes, default is false
node_auto_indexing=true
# The node property keys to be auto-indexed, if enabled
node_keys_indexable=EmailAddress,Identifier

and add a node and do a query to find an Identifier that I know exists

Why are super columns in Cassandra no longer favoured?

狂风中的少年 submitted on 2019-11-27 05:25:39
Question: I have read in the latest release that super columns are not desirable due to "performance issues", but nowhere is this explained. Then I read articles such as this one that give wonderful indexing patterns using super columns. This leaves me with no idea of what is currently the best way to do indexing in Cassandra. What are the performance issues of super columns? Where can I find current best practices for indexing? Answer 1: Super columns suffer from a number of problems, not least of which is

Cassandra: choosing a Partition Key

穿精又带淫゛_ submitted on 2019-11-27 05:22:34
Question: I'm undecided whether it's better, performance-wise, to use a very commonly shared column value (like Country) as the partition key for a compound primary key or a rather unique column value (like Last_Name). Looking at Cassandra 1.2's documentation about indexes I get this: "When to use an index: Cassandra's built-in indexes are best on a table having many rows that contain the indexed value. The more unique values that exist in a particular column, the more overhead you will have, on

Postgres not using index when index scan is much better option

浪尽此生 submitted on 2019-11-27 01:42:35
Question: I have a simple query joining two tables that is really slow. I found that the query plan does a seq scan on the large table email_activities (~10m rows), while I think nested loops using the indexes would actually be faster. I rewrote the query using a subquery in an attempt to force the use of the index, then noticed something interesting. If you look at the two query plans below, you will see that when I limit the result set of the subquery to 43k, the query plan does use the index on email