query-optimization

Why is my query returning far too many results?

空扰寡人 submitted on 2021-02-09 11:40:50

Question: I have a bunch of candidates, each of whom has had one or more jobs, each job with a company and using some skills. Bad ASCII art follows:

---------------     ---------------
| candidate 1 |     | candidate 2 |
---------------     ---------------
         \               /
      -------      ---------
      |job 1|      | job 2 |    etc
      -------      ---------
       /    \       /     \
---------  --------  ---------  --------
|company|  |skills|  |company|  |skills|
---------  --------  ---------  --------

Here's my database: mysql> describe jobs; +--------------+---------+------+-----
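The excerpt cuts off before the query itself, but the usual cause of "too many results" with this shape of schema is join fan-out: each job row gets multiplied by its number of skills. A minimal sketch of the effect, with hypothetical table and column names, shown in SQLite for portability:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (id INTEGER, candidate_id INTEGER, company_id INTEGER);
CREATE TABLE job_skills (job_id INTEGER, skill_id INTEGER);
INSERT INTO jobs VALUES (1, 1, 10), (2, 1, 11);
INSERT INTO job_skills VALUES (1, 100), (1, 101), (2, 100);
""")

# Joining jobs to skills multiplies each job row by its number of skills:
rows = conn.execute(
    "SELECT j.candidate_id, j.id, s.skill_id "
    "FROM jobs j JOIN job_skills s ON s.job_id = j.id"
).fetchall()
print(len(rows))  # 3 rows for only 2 jobs

# SELECT DISTINCT on the columns you actually want collapses the fan-out:
jobs = conn.execute(
    "SELECT DISTINCT j.candidate_id, j.id FROM jobs j "
    "JOIN job_skills s ON s.job_id = j.id"
).fetchall()
print(len(jobs))  # 2
```

The same multiplication compounds with every additional one-to-many join, which is typically why result counts explode.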

Subtracting the value from the last row using variable assignment in MySQL

人盡茶涼 submitted on 2021-02-08 11:54:18

Question: According to the MySQL documentation: "As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed." http://dev.mysql.com/doc/refman/5.6/en/user-variables.html However, the book High Performance MySQL gives a couple of examples that use this tactic to improve query performance anyway. Is the following an anti-pattern, and if so, is there a better way to write the query
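The excerpt ends before the query, but the documented-safe alternative to same-statement user-variable tricks is a window function such as LAG(), available since MySQL 8.0. A sketch with made-up data, shown in SQLite (3.25+) for portability:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts INTEGER, value INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(1, 10), (2, 15), (3, 12)])

# Difference from the previous row via LAG(), with no user variables and
# no undefined evaluation-order behavior:
rows = conn.execute("""
    SELECT ts, value,
           value - LAG(value) OVER (ORDER BY ts) AS delta
    FROM readings
    ORDER BY ts
""").fetchall()
print(rows)  # [(1, 10, None), (2, 15, 5), (3, 12, -3)]
```

On MySQL 5.x, which lacks window functions, a self-join on the previous row is the portable (if slower) equivalent.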

How to optimise this MySQL query? Millions of Rows

筅森魡賤 submitted on 2021-02-07 04:42:46

Question: I have the following query:

SELECT analytics.source AS referrer,
       COUNT(analytics.id) AS frequency,
       SUM(IF(transactions.status = 'COMPLETED', 1, 0)) AS sales
FROM analytics
LEFT JOIN transactions ON analytics.id = transactions.analytics
WHERE analytics.user_id = 52094
GROUP BY analytics.source
ORDER BY frequency DESC
LIMIT 10

The analytics table has 60M rows and the transactions table has 3M rows. When I run EXPLAIN on this query, I get: +------+--------------+-----------------+--------+---
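The EXPLAIN output is cut off, but the standard first step for a query of this shape is composite indexes that cover the filter, group, and join columns, e.g. analytics(user_id, source, id) and transactions(analytics, status). A toy reproduction in SQLite (tiny made-up rows; index names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE analytics (id INTEGER, user_id INTEGER, source TEXT);
CREATE TABLE transactions (analytics INTEGER, status TEXT);
-- Covering the WHERE, GROUP BY, and join columns in one index lets the
-- engine satisfy the query from the index alone instead of row lookups:
CREATE INDEX idx_analytics_user_source ON analytics (user_id, source, id);
CREATE INDEX idx_transactions_analytics ON transactions (analytics, status);
INSERT INTO analytics VALUES
  (1, 52094, 'ads'), (2, 52094, 'ads'), (3, 52094, 'email');
INSERT INTO transactions VALUES (1, 'COMPLETED'), (2, 'PENDING');
""")
rows = conn.execute("""
    SELECT a.source AS referrer,
           COUNT(a.id) AS frequency,
           SUM(CASE WHEN t.status = 'COMPLETED' THEN 1 ELSE 0 END) AS sales
    FROM analytics a
    LEFT JOIN transactions t ON a.id = t.analytics
    WHERE a.user_id = 52094
    GROUP BY a.source
    ORDER BY frequency DESC
    LIMIT 10
""").fetchall()
print(rows)  # [('ads', 2, 1), ('email', 1, 0)]
```

Whether the index alone fixes a 60M-row case depends on the real EXPLAIN; it is a sketch of the technique, not a guarantee.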

Aggregating distinct values from JSONB arrays combined with SQL group by

余生颓废 submitted on 2021-01-29 15:13:11

Question: I am trying to aggregate distinct values from JSONB arrays in a SQL GROUP BY statement. One dataset has many cfiles, and a cfile only ever has one dataset.

SELECT * FROM cfiles;
 id | dataset_id | property_values (jsonb)
----+------------+-----------------------------------------------
  1 |          1 | {"Sample Names": ["SampA", "SampB", "SampC"]}
  2 |          1 | {"Sample Names": ["SampA", "SampB", "SampD"]}
  3 |          1 | {"Sample Names": ["SampE"]}
  4 |          2 | {"Sample Names": ["SampA", "SampF"]}
  5 |          2 | {"Sample Names":
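In PostgreSQL the usual shape is to unnest with jsonb_array_elements_text(...) in a lateral join and collect with array_agg(DISTINCT ...) grouped by dataset_id. As a runnable stand-in, SQLite's json_each and group_concat follow the same shape (sample rows mirror the question's table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cfiles (id INTEGER, dataset_id INTEGER, property_values TEXT);
INSERT INTO cfiles VALUES
  (1, 1, '{"Sample Names": ["SampA", "SampB", "SampC"]}'),
  (2, 1, '{"Sample Names": ["SampA", "SampB", "SampD"]}'),
  (3, 1, '{"Sample Names": ["SampE"]}'),
  (4, 2, '{"Sample Names": ["SampA", "SampF"]}');
""")

# Unnest each JSON array into one row per element, then aggregate the
# distinct names per dataset. In PostgreSQL the equivalent pieces are
# jsonb_array_elements_text(property_values->'Sample Names') and
# array_agg(DISTINCT name).
rows = conn.execute("""
    SELECT c.dataset_id,
           group_concat(DISTINCT j.value) AS sample_names
    FROM cfiles c, json_each(c.property_values, '$."Sample Names"') j
    GROUP BY c.dataset_id
    ORDER BY c.dataset_id
""").fetchall()
print(rows)
```

Note the element order inside group_concat(DISTINCT ...) is not guaranteed; in PostgreSQL you can add ORDER BY inside array_agg if order matters.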

Column definition incompatible with clustered column definition

老子叫甜甜 submitted on 2021-01-28 22:03:47

Question: I have created a cluster in Oracle: CREATE CLUSTER myLovelyCluster (clust_id NUMBER(38,0)) SIZE 1024 SINGLE TABLE HASHKEYS 11; Then a table for the cluster: CREATE TABLE Table_cluster CLUSTER myLovelyCluster (columnRandom) AS SELECT * FROM myTable; columnRandom is defined as NUMBER(38,0), so why am I getting an error about an incompatible column definition? Thanks

Answer 1: Are you sure that columnRandom is NUMBER(38,0)? In Oracle, NUMBER != NUMBER(38,0). Let's create two tables. create table
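A hedged sketch of the check the answer is driving at: in Oracle, a bare NUMBER column (precision NULL) is a different type from NUMBER(38,0), and the cluster key must match exactly. The data-dictionary query below is standard Oracle; the table and column names come from the question:

```sql
-- Inspect how the source column is actually declared:
SELECT data_type, data_precision, data_scale
FROM   user_tab_columns
WHERE  table_name = 'MYTABLE'
AND    column_name = 'COLUMNRANDOM';
-- A bare NUMBER shows data_precision = NULL. That does not match the
-- cluster key clust_id NUMBER(38,0), which is what triggers the
-- incompatible-column-definition error on CREATE TABLE ... CLUSTER.
```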

mysql query taking too long to execute

旧城冷巷雨未停 submitted on 2021-01-28 13:38:00

Question: I have a query that is taking way too long to execute (4 seconds) even though all the fields I am querying against are indexed. Below are the query and the EXPLAIN results. Any ideas what the problem is? (MySQL CPU usage shoots up to 100% when executing the query.)

EXPLAIN SELECT count(hd.did) as NumPo, `hd`.`sid`, `src`.`Name`
FROM (`hd`)
JOIN `result` ON `result`.`did` = `hd`.`did`
JOIN `sf` ON `sf`.`fid` = `hd`.`fid`
JOIN `src` ON `src`.`sid` = `hd`.`sid`
WHERE `sf`.`tid` = 2 AND `result`.
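The query is cut off mid-WHERE, so the sketch below drops the result join and keeps the rest of the shape (names come from the question). Having every column individually indexed is often not enough: a composite index covering both the filtered column and the join column lets the engine resolve the WHERE and the JOIN in one probe, instead of intersecting single-column indexes at 100% CPU. Illustrated in SQLite with tiny made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hd  (did INTEGER, sid INTEGER, fid INTEGER);
CREATE TABLE sf  (fid INTEGER, tid INTEGER);
CREATE TABLE src (sid INTEGER, Name TEXT);
-- Composite index: filter column first, join column second, so
-- "sf.tid = 2 AND sf.fid = hd.fid" is a single index range scan:
CREATE INDEX idx_sf_tid_fid ON sf (tid, fid);
INSERT INTO hd  VALUES (1, 1, 10), (2, 1, 11), (3, 2, 10);
INSERT INTO sf  VALUES (10, 2), (11, 3);
INSERT INTO src VALUES (1, 'alpha'), (2, 'beta');
""")
rows = conn.execute("""
    SELECT COUNT(hd.did) AS NumPo, hd.sid, src.Name
    FROM hd
    JOIN sf  ON sf.fid  = hd.fid
    JOIN src ON src.sid = hd.sid
    WHERE sf.tid = 2
    GROUP BY hd.sid, src.Name
    ORDER BY hd.sid
""").fetchall()
print(rows)  # [(1, 1, 'alpha'), (1, 2, 'beta')]
```

The same idea would apply to the truncated result predicate: index result on (did, <filtered column>) in whatever combination the full WHERE uses.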

How to disable all optimizations of PostgreSQL

白昼怎懂夜的黑 submitted on 2021-01-28 03:00:40

Question: I'm studying query optimization and want to know how much each kind of optimization helps a query. Last time I got an answer, but in my experiments, disabling all of the optimizations in the link gives a time complexity of O(n^1.8), while enabling all of them gives O(n^0.5). That is not so much difference. If I disable all of them, are there still other optimizations at work? How can I really enable only one main optimization at a time?

Answer 1: You can't. PostgreSQL's query planner has no "turn off optimisation" flag. It'd
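The answer is cut off, but the practical substitute PostgreSQL offers is per-session planner settings: there is no global "no optimization" mode, only cost-based discouragement of individual plan types. A sketch of the knobs involved (these are real PostgreSQL GUCs; which ones to flip depends on the experiment):

```sql
-- Discourage individual planner strategies for the current session.
-- These are large cost penalties, not hard disables; the planner will
-- still use a method if no alternative plan exists.
SET enable_hashjoin   = off;
SET enable_mergejoin  = off;
SET enable_seqscan    = off;
SET enable_bitmapscan = off;
-- Then compare plans and timings for the query under test:
-- EXPLAIN (ANALYZE, BUFFERS) <your query>;
```

Because these are penalties rather than switches, an "all off" run still benefits from rewrites the planner always performs (e.g. predicate pushdown), which may explain why the measured gap is smaller than expected.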

Hive Window Function ROW_NUMBER without Partition BY Clause on a large (50 GB) dataset is very slow. Is there a better way to optimize?

可紊 submitted on 2021-01-04 07:25:26

Question: I have an HDFS file with 50 million records and a raw file size of 50 GB. I am trying to load it into a Hive table and create a unique id for all rows while loading, using the expression below. I am using Hive 1.1.0-cdh5.16.1.

row_number() over(order by event_id, user_id, timestamp) as id

While executing, I see that 40 reducers are assigned in the reduce step. The average time for 39 reducers is about 2 minutes, whereas the last reducer takes about 25 minutes, which clearly makes me believe that most of the data is
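A ROW_NUMBER with no PARTITION BY forces every row through a single reducer, which matches the one 25-minute straggler. A common workaround is to number rows within several parallel buckets and then add each bucket's cumulative size as an offset, making the ids globally unique (they are also globally ordered only if the buckets are range-partitioned on the sort key). A sketch of the offset arithmetic in plain Python:

```python
from itertools import accumulate

def global_ids(buckets):
    """buckets: one list of records per parallel reducer, already ordered."""
    sizes = [len(b) for b in buckets]
    # Running start index for each bucket: 0, size0, size0+size1, ...
    offsets = [0] + list(accumulate(sizes))[:-1]
    return [
        (off + local_rank + 1, rec)   # 1-based global id
        for off, bucket in zip(offsets, buckets)
        for local_rank, rec in enumerate(bucket)
    ]

buckets = [["a", "b"], ["c"], ["d", "e", "f"]]
print(global_ids(buckets))
# [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f')]
```

In Hive the same idea is typically written as ROW_NUMBER() partitioned by a bucket key in one pass, plus a second pass (or join against per-bucket counts) to add the offsets; if only uniqueness is needed, a cheap alternative is a UUID or a monotonically increasing id function.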
