query-optimization

MySQL: Improve Search Performance with Wildcards (%%)

Submitted by 不羁岁月 on 2019-11-27 09:05:49
Below is a query I use for searching a person by email:

    SELECT *
    FROM phppos_customers
    JOIN phppos_people ON phppos_customers.person_id = phppos_people.person_id
    WHERE deleted = 0 AND email LIKE '%f%'
    ORDER BY email ASC;

Will adding an index on "email" speed up the query?

Answer: No, because MySQL will not be able to use the index when the pattern has a leading wildcard. If you changed your LIKE pattern to 'f%', it would be able to use the index.

Answer: No, MySQL will not use the index, because the LIKE argument ('%f%') starts with the wildcard character %. If the pattern starts with a constant prefix, the index will be used.
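A quick way to confirm this is to compare EXPLAIN output for the two patterns. A minimal sketch; the index name is hypothetical:

    CREATE INDEX idx_people_email ON phppos_people (email);

    -- Leading wildcard: the index cannot be used, so MySQL scans all rows.
    EXPLAIN SELECT * FROM phppos_people WHERE email LIKE '%f%';

    -- Constant prefix: MySQL can perform a range scan on the index.
    EXPLAIN SELECT * FROM phppos_people WHERE email LIKE 'f%';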

Linq2SQL “or/and” operators (ANDed / ORed conditions)

Submitted by 吃可爱长大的小学妹 on 2019-11-27 08:16:12
Question: Let's say we need to apply several conditions (unknown in count and nature) to select from a table called "Things". If the conditions are known, we can write:

    db.Things.Where(t => foo1 && foo2 || foo3);

But if we have to build that Where condition programmatically, I can imagine how we can apply ANDed conditions:

    IQueryable<Thing> DesiredThings = db.Things.AsQueryable();
    foreach (Condition c in AndedConditions)
        DesiredThings = DesiredThings.Where(t => GenerateCondition(c, t));

What about ORed conditions? Note: we…
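A common way to get ORed conditions is to build the predicate as an expression tree and combine the pieces with OrElse before handing the result to Where. Below is a minimal sketch, assuming a Thing entity, a Condition type, and a GenerateCondition helper that returns an Expression<Func<Thing, bool>> (names taken from or modeled on the question, not a confirmed API):

    using System;
    using System.Linq;
    using System.Linq.Expressions;

    static class PredicateExtensions
    {
        // ORs two predicates by rewriting the second lambda's parameter
        // so that both bodies share the first lambda's parameter.
        public static Expression<Func<T, bool>> Or<T>(
            this Expression<Func<T, bool>> left,
            Expression<Func<T, bool>> right)
        {
            var param = left.Parameters[0];
            var rightBody = new ReplaceParameter(right.Parameters[0], param)
                .Visit(right.Body);
            return Expression.Lambda<Func<T, bool>>(
                Expression.OrElse(left.Body, rightBody), param);
        }

        private sealed class ReplaceParameter : ExpressionVisitor
        {
            private readonly ParameterExpression _from, _to;
            public ReplaceParameter(ParameterExpression from, ParameterExpression to)
            { _from = from; _to = to; }
            protected override Expression VisitParameter(ParameterExpression node)
                => node == _from ? _to : base.VisitParameter(node);
        }
    }

Usage: seed with an always-false predicate and OR each generated condition into it, then pass the combined expression to Where so LINQ to SQL can still translate it:

    Expression<Func<Thing, bool>> predicate = t => false;
    foreach (Condition c in OredConditions)
        predicate = predicate.Or(GenerateCondition(c));
    var desiredThings = db.Things.Where(predicate);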

Spark count vs take and length

Submitted by 混江龙づ霸主 on 2019-11-27 08:14:48
Question: I'm using com.datastax.spark:spark-cassandra-connector_2.11:2.4.0 when running Zeppelin notebooks, and I don't understand the difference between two operations in Spark. One operation takes a lot of time to compute; the second one executes immediately. Could someone explain the difference between the two operations?

    import com.datastax.spark.connector._
    import org.apache.spark.sql.cassandra._
    import org.apache.spark.sql._
    import org.apache.spark.sql.types._
    import org.apache.spark.sql…
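The code above is cut off, but the title contrasts count with take plus length. A hedged sketch of why those behave so differently (df stands for any DataFrame):

    import org.apache.spark.sql.{DataFrame, Row}

    def compare(df: DataFrame): Unit = {
      // count() must scan every partition to produce an exact total,
      // so it runs a job over the entire dataset.
      val total: Long = df.count()

      // take(n) stops as soon as n rows are collected; Spark typically
      // reads only the first partition(s), so it returns almost instantly.
      val firstRows: Array[Row] = df.take(20)

      // .length is just the size of the local array: no Spark job at all.
      val sampleSize: Int = firstRows.length
      println(s"total=$total, sample=$sampleSize")
    }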

Preventing N+1 queries in Rails

Submitted by 蹲街弑〆低调 on 2019-11-27 07:12:50
Question: I've seen a few examples of passing an :include hash value when calling one of ActiveRecord's find methods in Rails. However, I haven't seen any examples of whether this is possible via relationship methods. For example, let's say I have the following:

    class User < ActiveRecord::Base
      has_many :user_favorites
      has_many :favorites, :through => :user_favorites
    end

    class Favorite < ActiveRecord::Base
      has_many :user_favorites
      has_many :users, :through => :user_favorites
    end

    class UserFavorite <…
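For the N+1 problem the title describes, the usual remedy with these associations is to eager-load them at fetch time. A minimal sketch (the where clause is illustrative):

    # Preload favorites in one batch so the loop below does not fire
    # one query per user.
    users = User.includes(:favorites).where(active: true)  # "active" is hypothetical

    users.each do |user|
      # No additional query here; the association was already loaded.
      user.favorites.each { |favorite| puts favorite.id }
    end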

Query last N related rows per row

Submitted by 自闭症网瘾萝莉.ら on 2019-11-27 06:35:42
Question: I have the following query, which fetches the ids of the latest N observations for each station:

    SELECT id
    FROM (
      SELECT station_id, id, created_at,
             row_number() OVER (PARTITION BY station_id
                                ORDER BY created_at DESC) AS rn
      FROM (
        SELECT station_id, id, created_at
        FROM observations
      ) s
    ) s
    WHERE rn <= #{n}
    ORDER BY station_id, created_at DESC;

I have indexes on id, station_id, and created_at. This is the only solution I have come up with that can fetch more than a single record per station.
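On PostgreSQL 9.3+ a LATERAL join is a common alternative: it can walk a composite index per station instead of numbering every row. A hedged sketch, assuming a stations table with an id column (the question does not show one):

    CREATE INDEX obs_station_created_idx
        ON observations (station_id, created_at DESC);

    SELECT o.id
    FROM   stations s
    CROSS  JOIN LATERAL (
        SELECT id, created_at
        FROM   observations
        WHERE  station_id = s.id
        ORDER  BY created_at DESC
        LIMIT  10  -- N
    ) o
    ORDER  BY s.id, o.created_at DESC;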

Broadcast not happening while joining DataFrames in Spark 1.6

Submitted by 你离开我真会死。 on 2019-11-27 06:17:23
Question: Below is the sample code that I am running. When this Spark job runs, the DataFrame join happens as a SortMergeJoin instead of a BroadcastJoin.

    def joinedDf(sqlContext: SQLContext,
                 txnTable: DataFrame,
                 countriesDfBroadcast: Broadcast[DataFrame]): DataFrame = {
      txnTable.as("df1").join(
        (countriesDfBroadcast.value)
          .withColumnRenamed("CNTRY_ID", "DW_CNTRY_ID")
          .as("countries"),
        $"df1.USER_CNTRY_ID" === $"countries.DW_CNTRY_ID",
        "inner")
    }

    joinedDf(sqlContext, txnTable, countriesDfBroadcast)
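Wrapping a DataFrame in an explicit Broadcast variable does not hint the planner. A hedged sketch of the usual fix: pass the plain DataFrame and mark the small side with org.apache.spark.sql.functions.broadcast so Catalyst can choose a broadcast join (column names taken from the question):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{broadcast, col}

    def joinedDf(txnTable: DataFrame, countriesDf: DataFrame): DataFrame = {
      // broadcast() tells the planner this side is small enough to ship
      // to every executor, enabling a broadcast hash join.
      val countries = countriesDf
        .withColumnRenamed("CNTRY_ID", "DW_CNTRY_ID")
        .as("countries")
      txnTable.as("df1").join(
        broadcast(countries),
        col("df1.USER_CNTRY_ID") === col("countries.DW_CNTRY_ID"),
        "inner")
    }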

SQL: How to properly check if a record exists

Submitted by 大憨熊 on 2019-11-27 06:04:07
Reading some SQL tuning documentation, I found this:

"SELECT COUNT(*): counts the number of rows; often improperly used to verify the existence of a record."

Is SELECT COUNT(*) really that bad? What's the proper way to verify the existence of a record?

Answer (Martin Schapendonk): It's better to use either of the following:

    -- Method 1.
    SELECT 1 FROM table_name WHERE key = value;

    -- Method 2.
    SELECT COUNT(1) FROM table_name WHERE key = value;

The first alternative should give you no result or one result; the second count should be zero or one.

Answer: How old is the documentation you're using? Although you…
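Another widespread idiom, shown here as a hedged sketch with the same illustrative names, is EXISTS, which lets the engine stop at the first matching row instead of counting all of them:

    SELECT CASE
             WHEN EXISTS (SELECT 1 FROM table_name WHERE key = value)
             THEN 1
             ELSE 0
           END AS record_exists;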

PostgreSQL - fetch the row which has the Max value for a column

Submitted by 末鹿安然 on 2019-11-27 05:59:17
I'm dealing with a Postgres table (called "lives") that contains records with columns for time_stamp, usr_id, trans_id, and lives_remaining. I need a query that will give me the most recent lives_remaining total for each usr_id:

- There are multiple users (distinct usr_ids).
- time_stamp is not a unique identifier: sometimes user events (one per row in the table) occur with the same time_stamp.
- trans_id is unique only for very small time ranges: over time it repeats.
- lives_remaining (for a given user) can both increase and decrease over time.

Example:

    time_stamp | lives_remaining | usr_id | trans…
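A hedged sketch of the idiomatic Postgres answer to this shape of problem: DISTINCT ON keeps exactly one row per usr_id, the first one under the ORDER BY, i.e. the most recent (trans_id is used as a tiebreaker because time_stamp alone is not unique):

    SELECT DISTINCT ON (usr_id)
           usr_id, time_stamp, lives_remaining
    FROM   lives
    ORDER  BY usr_id, time_stamp DESC, trans_id DESC;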

Optimize groupwise maximum query

Submitted by 陌路散爱 on 2019-11-27 05:28:43
    SELECT *
    FROM records
    WHERE id IN (
      SELECT max(id)
      FROM records
      GROUP BY option_id
    );

This query works fine even on millions of rows. However, as you can see from the EXPLAIN output:

    QUERY PLAN
    -------------------------------------------------------------------------------------------------------------------------------------------
    Nested Loop  (cost=30218.84..31781.62 rows=620158 width=44) (actual time=1439.251..1443.458 rows=1057 loops=1)
      ->  HashAggregate  (cost=30218.41..30220.41 rows=200 width=4) (actual time=1439.203..1439.503 rows=1057 loops=1)
            ->  HashAggregate  (cost=30196.72…
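A hedged optimization sketch: with a composite index, the groupwise maximum can be read per group via DISTINCT ON instead of aggregating the whole table (the index name is illustrative):

    CREATE INDEX records_option_id_id_idx ON records (option_id, id DESC);

    -- One row per option_id: the first under this ORDER BY, i.e. max(id).
    SELECT DISTINCT ON (option_id) *
    FROM   records
    ORDER  BY option_id, id DESC;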

MySQL: IFNULL vs COALESCE, which is faster?

Submitted by ぐ巨炮叔叔 on 2019-11-27 05:05:16
If it's known that there are only two candidate values for the result of a column, then ifnull(a, b) AS a_or_b_1 and coalesce(a, b) AS a_or_b_2 will give the same result. But which is faster? When searching, I found one article which says IFNULL is faster, but it was the only article I found. Any views on this? Thanks in advance :)

Answer: My view is that you should benchmark for your own usage. I doubt there will be much difference. Bear in mind that while a single benchmark might suggest one is slightly better, variation in the data over time might change that result. Also note that COALESCE has…
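For reference, a hedged illustration of the two side by side (table and column names are illustrative). One difference worth noting regardless of speed: COALESCE is standard SQL and accepts any number of arguments, while IFNULL is MySQL-specific and takes exactly two:

    SELECT IFNULL(a, b)      AS a_or_b_1,
           COALESCE(a, b)    AS a_or_b_2,
           COALESCE(a, b, c) AS first_non_null  -- only COALESCE allows 3+ args
    FROM   some_table;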