query-optimization

Why is my query returning far too many results?

空扰寡人 submitted on 2021-02-09 11:40:50

Question: I have a bunch of candidates, each of whom has had one or more jobs, each job with a company and using some skills. Bad ASCII art follows:

---------------     ---------------
| candidate 1 |     | candidate 2 |
---------------     ---------------
         \               /
      -------      ---------
      |job 1|      | job 2 |    etc
      -------      ---------
       /    \       /     \
---------  --------  ---------  --------
|company|  |skills|  |company|  |skills|
---------  --------  ---------  --------

Here's my database: mysql> describe jobs; +--------------+---------+------+-----
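The excerpt cuts off before the query itself, but the usual cause of "too many results" with this shape of schema is join fan-out: each job row gets multiplied by its number of skills. A minimal sketch of the effect, with hypothetical table and column names, shown in SQLite for portability:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (id INTEGER, candidate_id INTEGER, company_id INTEGER);
CREATE TABLE job_skills (job_id INTEGER, skill_id INTEGER);
INSERT INTO jobs VALUES (1, 1, 10), (2, 1, 11);
INSERT INTO job_skills VALUES (1, 100), (1, 101), (2, 100);
""")

# Joining jobs to skills multiplies each job row by its number of skills:
rows = conn.execute(
    "SELECT j.candidate_id, j.id, s.skill_id "
    "FROM jobs j JOIN job_skills s ON s.job_id = j.id"
).fetchall()
print(len(rows))  # 3 rows for only 2 jobs

# SELECT DISTINCT on the columns you actually want collapses the fan-out:
jobs = conn.execute(
    "SELECT DISTINCT j.candidate_id, j.id FROM jobs j "
    "JOIN job_skills s ON s.job_id = j.id"
).fetchall()
print(len(jobs))  # 2
```

The same multiplication compounds with every additional one-to-many join, which is typically why result counts explode.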

Subtracting the value from the last row using variable assignment in MySQL

人盡茶涼 submitted on 2021-02-08 11:54:18

Question: According to the MySQL documentation: "As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed." http://dev.mysql.com/doc/refman/5.6/en/user-variables.html However, the book High Performance MySQL gives a couple of examples that use this tactic to improve query performance anyway. Is the following an anti-pattern, and if so, is there a better way to write the query
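The excerpt ends before the query, but the documented-safe alternative to same-statement user-variable tricks is a window function such as LAG(), available since MySQL 8.0. A sketch with made-up data, shown in SQLite (3.25+) for portability:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts INTEGER, value INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(1, 10), (2, 15), (3, 12)])

# Difference from the previous row via LAG(), with no user variables and
# no undefined evaluation-order behavior:
rows = conn.execute("""
    SELECT ts, value,
           value - LAG(value) OVER (ORDER BY ts) AS delta
    FROM readings
    ORDER BY ts
""").fetchall()
print(rows)  # [(1, 10, None), (2, 15, 5), (3, 12, -3)]
```

On MySQL 5.x, which lacks window functions, a self-join on the previous row is the portable (if slower) equivalent.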

How to optimise this MySQL query? Millions of Rows

筅森魡賤 submitted on 2021-02-07 04:42:46

Question: I have the following query:

SELECT analytics.source AS referrer,
       COUNT(analytics.id) AS frequency,
       SUM(IF(transactions.status = 'COMPLETED', 1, 0)) AS sales
FROM analytics
LEFT JOIN transactions ON analytics.id = transactions.analytics
WHERE analytics.user_id = 52094
GROUP BY analytics.source
ORDER BY frequency DESC
LIMIT 10

The analytics table has 60M rows and the transactions table has 3M rows. When I run EXPLAIN on this query, I get: +------+--------------+-----------------+--------+---
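The EXPLAIN output is cut off, but the standard first step for a query of this shape is composite indexes that cover the filter, group, and join columns, e.g. analytics(user_id, source, id) and transactions(analytics, status). A toy reproduction in SQLite (tiny made-up rows; index names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE analytics (id INTEGER, user_id INTEGER, source TEXT);
CREATE TABLE transactions (analytics INTEGER, status TEXT);
-- Covering the WHERE, GROUP BY, and join columns in one index lets the
-- engine satisfy the query from the index alone instead of row lookups:
CREATE INDEX idx_analytics_user_source ON analytics (user_id, source, id);
CREATE INDEX idx_transactions_analytics ON transactions (analytics, status);
INSERT INTO analytics VALUES
  (1, 52094, 'ads'), (2, 52094, 'ads'), (3, 52094, 'email');
INSERT INTO transactions VALUES (1, 'COMPLETED'), (2, 'PENDING');
""")
rows = conn.execute("""
    SELECT a.source AS referrer,
           COUNT(a.id) AS frequency,
           SUM(CASE WHEN t.status = 'COMPLETED' THEN 1 ELSE 0 END) AS sales
    FROM analytics a
    LEFT JOIN transactions t ON a.id = t.analytics
    WHERE a.user_id = 52094
    GROUP BY a.source
    ORDER BY frequency DESC
    LIMIT 10
""").fetchall()
print(rows)  # [('ads', 2, 1), ('email', 1, 0)]
```

Whether the index alone fixes a 60M-row case depends on the real EXPLAIN; it is a sketch of the technique, not a guarantee.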

Aggregating distinct values from JSONB arrays combined with SQL group by

余生颓废 submitted on 2021-01-29 15:13:11

Question: I am trying to aggregate distinct values from JSONB arrays in a SQL GROUP BY statement. One dataset has many cfiles, and a cfile only ever has one dataset.

SELECT * FROM cfiles;
 id | dataset_id | property_values (jsonb)
----+------------+-----------------------------------------------
  1 |          1 | {"Sample Names": ["SampA", "SampB", "SampC"]}
  2 |          1 | {"Sample Names": ["SampA", "SampB", "SampD"]}
  3 |          1 | {"Sample Names": ["SampE"]}
  4 |          2 | {"Sample Names": ["SampA", "SampF"]}
  5 |          2 | {"Sample Names":
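In PostgreSQL the usual shape is to unnest with jsonb_array_elements_text(...) in a lateral join and collect with array_agg(DISTINCT ...) grouped by dataset_id. As a runnable stand-in, SQLite's json_each and group_concat follow the same shape (sample rows mirror the question's table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cfiles (id INTEGER, dataset_id INTEGER, property_values TEXT);
INSERT INTO cfiles VALUES
  (1, 1, '{"Sample Names": ["SampA", "SampB", "SampC"]}'),
  (2, 1, '{"Sample Names": ["SampA", "SampB", "SampD"]}'),
  (3, 1, '{"Sample Names": ["SampE"]}'),
  (4, 2, '{"Sample Names": ["SampA", "SampF"]}');
""")

# Unnest each JSON array into one row per element, then aggregate the
# distinct names per dataset. In PostgreSQL the equivalent pieces are
# jsonb_array_elements_text(property_values->'Sample Names') and
# array_agg(DISTINCT name).
rows = conn.execute("""
    SELECT c.dataset_id,
           group_concat(DISTINCT j.value) AS sample_names
    FROM cfiles c, json_each(c.property_values, '$."Sample Names"') j
    GROUP BY c.dataset_id
    ORDER BY c.dataset_id
""").fetchall()
print(rows)
```

Note the element order inside group_concat(DISTINCT ...) is not guaranteed; in PostgreSQL you can add ORDER BY inside array_agg if order matters.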

Column definition incompatible with clustered column definition

老子叫甜甜 submitted on 2021-01-28 22:03:47

Question: I have created a cluster in Oracle: CREATE CLUSTER myLovelyCluster (clust_id NUMBER(38,0)) SIZE 1024 SINGLE TABLE HASHKEYS 11; Then a table for the cluster: CREATE TABLE Table_cluster CLUSTER myLovelyCluster (columnRandom) AS SELECT * FROM myTable; columnRandom is defined as NUMBER(38,0), so why am I getting an error about an incompatible column definition? Thanks

Answer 1: Are you sure that columnRandom is NUMBER(38,0)? In Oracle, NUMBER != NUMBER(38,0). Let's create two tables. create table
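A hedged sketch of the check the answer is driving at: in Oracle, a bare NUMBER column (precision NULL) is a different type from NUMBER(38,0), and the cluster key must match exactly. The data-dictionary query below is standard Oracle; the table and column names come from the question:

```sql
-- Inspect how the source column is actually declared:
SELECT data_type, data_precision, data_scale
FROM   user_tab_columns
WHERE  table_name = 'MYTABLE'
AND    column_name = 'COLUMNRANDOM';
-- A bare NUMBER shows data_precision = NULL. That does not match the
-- cluster key clust_id NUMBER(38,0), which is what triggers the
-- incompatible-column-definition error on CREATE TABLE ... CLUSTER.
```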

mysql query taking too long to execute

旧城冷巷雨未停 submitted on 2021-01-28 13:38:00

Question: I have a query that is taking way too long to execute (4 seconds) even though all the fields I am querying against are indexed. Below are the query and the EXPLAIN results. Any ideas what the problem is? (MySQL CPU usage shoots up to 100% when executing the query.)

EXPLAIN SELECT count(hd.did) as NumPo, `hd`.`sid`, `src`.`Name`
FROM (`hd`)
JOIN `result` ON `result`.`did` = `hd`.`did`
JOIN `sf` ON `sf`.`fid` = `hd`.`fid`
JOIN `src` ON `src`.`sid` = `hd`.`sid`
WHERE `sf`.`tid` = 2 AND `result`.
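The query is cut off mid-WHERE, so the sketch below drops the result join and keeps the rest of the shape (names come from the question). Having every column individually indexed is often not enough: a composite index covering both the filtered column and the join column lets the engine resolve the WHERE and the JOIN in one probe, instead of intersecting single-column indexes at 100% CPU. Illustrated in SQLite with tiny made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hd  (did INTEGER, sid INTEGER, fid INTEGER);
CREATE TABLE sf  (fid INTEGER, tid INTEGER);
CREATE TABLE src (sid INTEGER, Name TEXT);
-- Composite index: filter column first, join column second, so
-- "sf.tid = 2 AND sf.fid = hd.fid" is a single index range scan:
CREATE INDEX idx_sf_tid_fid ON sf (tid, fid);
INSERT INTO hd  VALUES (1, 1, 10), (2, 1, 11), (3, 2, 10);
INSERT INTO sf  VALUES (10, 2), (11, 3);
INSERT INTO src VALUES (1, 'alpha'), (2, 'beta');
""")
rows = conn.execute("""
    SELECT COUNT(hd.did) AS NumPo, hd.sid, src.Name
    FROM hd
    JOIN sf  ON sf.fid  = hd.fid
    JOIN src ON src.sid = hd.sid
    WHERE sf.tid = 2
    GROUP BY hd.sid, src.Name
    ORDER BY hd.sid
""").fetchall()
print(rows)  # [(1, 1, 'alpha'), (1, 2, 'beta')]
```

The same idea would apply to the truncated result predicate: index result on (did, <filtered column>) in whatever combination the full WHERE uses.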

How to disable all optimizations of PostgreSQL

白昼怎懂夜的黑 submitted on 2021-01-28 03:00:40

Question: I'm studying query optimization and want to know how much each kind of optimization helps a query. Last time I got an answer, but in my experiments, disabling all of the optimizations in the link gives a time complexity of O(n^1.8), while enabling all of them gives O(n^0.5). That is not so much difference. If I disable all of them, are there still other optimizations at work? How can I really enable only one main optimization at a time?

Answer 1: You can't. PostgreSQL's query planner has no "turn off optimisation" flag. It'd
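The answer is cut off, but the practical substitute PostgreSQL offers is per-session planner settings: there is no global "no optimization" mode, only cost-based discouragement of individual plan types. A sketch of the knobs involved (these are real PostgreSQL GUCs; which ones to flip depends on the experiment):

```sql
-- Discourage individual planner strategies for the current session.
-- These are large cost penalties, not hard disables; the planner will
-- still use a method if no alternative plan exists.
SET enable_hashjoin   = off;
SET enable_mergejoin  = off;
SET enable_seqscan    = off;
SET enable_bitmapscan = off;
-- Then compare plans and timings for the query under test:
-- EXPLAIN (ANALYZE, BUFFERS) <your query>;
```

Because these are penalties rather than switches, an "all off" run still benefits from rewrites the planner always performs (e.g. predicate pushdown), which may explain why the measured gap is smaller than expected.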

Hive Window Function ROW_NUMBER without Partition BY Clause on a large (50 GB) dataset is very slow. Is there a better way to optimize?

可紊 submitted on 2021-01-04 07:25:26

Question: I have an HDFS file with 50 million records and a raw file size of 50 GB. I am trying to load it into a Hive table and create a unique id for all rows while loading, using the expression below. I am using Hive 1.1.0-cdh5.16.1.

row_number() over(order by event_id, user_id, timestamp) as id

While executing, I see that 40 reducers are assigned in the reduce step. The average time for 39 reducers is about 2 minutes, whereas the last reducer takes about 25 minutes, which clearly makes me believe that most of the data is
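A ROW_NUMBER with no PARTITION BY forces every row through a single reducer, which matches the one 25-minute straggler. A common workaround is to number rows within several parallel buckets and then add each bucket's cumulative size as an offset, making the ids globally unique (they are also globally ordered only if the buckets are range-partitioned on the sort key). A sketch of the offset arithmetic in plain Python:

```python
from itertools import accumulate

def global_ids(buckets):
    """buckets: one list of records per parallel reducer, already ordered."""
    sizes = [len(b) for b in buckets]
    # Running start index for each bucket: 0, size0, size0+size1, ...
    offsets = [0] + list(accumulate(sizes))[:-1]
    return [
        (off + local_rank + 1, rec)   # 1-based global id
        for off, bucket in zip(offsets, buckets)
        for local_rank, rec in enumerate(bucket)
    ]

buckets = [["a", "b"], ["c"], ["d", "e", "f"]]
print(global_ids(buckets))
# [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f')]
```

In Hive the same idea is typically written as ROW_NUMBER() partitioned by a bucket key in one pass, plus a second pass (or join against per-bucket counts) to add the offsets; if only uniqueness is needed, a cheap alternative is a UUID or a monotonically increasing id function.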
