Greenplum Data Skew Troubleshooting

余生长醉, submitted on 2020-08-12 01:57:49




  • gp_skew_coefficients
  • gp_skew_idle_fractions


The gp_toolkit.gp_skew_coefficients view shows data distribution skew by calculating the coefficient of variation (CV) for the data stored on each segment. The skccoeff column shows the coefficient of variation (CV), which is calculated as the standard deviation divided by the average. It takes into account both the average and variability around the average of a data series. The lower the value, the better. Higher values indicate greater data skew.


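Field Description
skcoid Object identifier (OID) of the table
skcrelname Name of the table
skccoeff Coefficient of variation (CV): standard deviation divided by the average; lower is better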
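
To make the coefficient of variation concrete, here is a minimal shell sketch that computes it from per-segment row counts. The function name cv is hypothetical, and the sketch assumes the population standard deviation; the view may compute or scale the value differently, so treat the numbers as illustrative only. The input could come from a query such as SELECT gp_segment_id, count(*) FROM some_table GROUP BY 1, where some_table is a placeholder.

```shell
# cv: print the coefficient of variation (stddev / mean) of the last
# field on each input line, e.g. "segment_id row_count" pairs.
# Assumption: population standard deviation; gp_toolkit may differ.
cv() {
    awk '
        NF { n++; sum += $NF; sumsq += $NF * $NF }
        END {
            if (n == 0 || sum == 0) { print "n/a"; exit }
            mean = sum / n
            var = sumsq / n - mean * mean
            if (var < 0) var = 0      # guard against rounding error
            printf "%.4f\n", sqrt(var) / mean
        }'
}

# Evenly distributed: every segment holds the same number of rows.
printf '0 100\n1 100\n2 100\n3 100\n' | cv    # prints 0.0000

# Skewed: segment 3 holds far more rows than the others.
printf '0 100\n1 100\n2 100\n3 700\n' | cv    # prints 1.0392
```

The even case gives 0, and the value grows as one segment takes a larger share of the rows, which matches the "lower is better" reading of skccoeff.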



The gp_toolkit.gp_skew_idle_fractions view shows data distribution skew by calculating the percentage of the system that is idle during a table scan, which is an indicator of computational skew. The siffraction column shows the percentage of the system that is idle during a table scan. This is an indicator of uneven data distribution or query processing skew. For example, a value of 0.1 indicates 10% skew, a value of 0.5 indicates 50% skew, and so on. Tables that have more than 10% skew should have their distribution policies evaluated.

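Field Description
sifoid Object identifier (OID) of the table
sifrelname Name of the table
siffraction Fraction of the system that is idle during a table scan (0.1 means 10% skew)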
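
One way to read the idle fraction: if a scan finishes only when the fullest segment finishes, the other segments sit idle for the difference. The sketch below models that as 1 - (mean rows per segment / max rows per segment). This formula is an assumption made for illustration, not a confirmed definition of the view, and the function name idle_fraction is hypothetical.

```shell
# idle_fraction: estimate the share of scan capacity left idle, modelled as
# 1 - (mean rows per segment / max rows per segment).
# Input: one line per segment, row count in the last field.
idle_fraction() {
    awk '
        NF { n++; sum += $NF; if ($NF > max) max = $NF }
        END {
            if (n == 0 || max == 0) { print "n/a"; exit }
            printf "%.4f\n", 1 - (sum / n) / max
        }'
}

# Three segments hold 100 rows each, one holds 200.
printf '0 100\n1 100\n2 100\n3 200\n' | idle_fraction    # prints 0.3750
```

Under this model the example table would be well over the 10% threshold mentioned above, so its distribution policy would deserve a second look.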


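The two views above can only analyze data skew statically. In practice, distribution keys are usually chosen carefully when a table is created, so data skew caused by a poorly designed distribution key is rare. If these views show nothing, the investigation can continue step by step as described below.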

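The real culprit behind poor GP performance is more likely a running SQL statement that generates a large amount of data motion, which puts heavy pressure on I/O, network, and CPU. Common operations such as join, order by, and group by, along with other OLAP-style SQL, may cause skew only briefly, but that is enough to slow down other statements and degrade overall database performance. If a large number of skewed statements hit the database at once, the effect can be fatal.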
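
As a rough first check for motion-heavy statements, the sketch below counts the Motion nodes in EXPLAIN output read from standard input. The function name motion_summary is hypothetical. A high count does not prove skew by itself; it only points at plans that move a lot of data between segments.

```shell
# motion_summary: count each kind of Motion node found in EXPLAIN text
# on stdin, and print "count kind" lines sorted by kind.
motion_summary() {
    awk '
        match($0, /(Gather|Redistribute|Broadcast) Motion/) {
            count[substr($0, RSTART, RLENGTH)]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -k2
}

# Hypothetical plan fragment, shaped like Greenplum EXPLAIN output.
motion_summary <<'EOF'
Gather Motion 4:1  (slice3; segments: 4)
  ->  Hash Join
        ->  Redistribute Motion 4:4  (slice1; segments: 4)
              ->  Seq Scan on orders
        ->  Hash
              ->  Broadcast Motion 4:4  (slice2; segments: 4)
                    ->  Seq Scan on customers
EOF
```

In practice the input would come from something like psql -d mydb -c "EXPLAIN SELECT ..." piped into motion_summary, where mydb and the query are placeholders.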


The official Greenplum documentation gives an example procedure for analyzing processing skew.

  1. First find the OID of the database to investigate. The following steps use it to analyze skew in that database:
    =# SELECT oid, datname FROM pg_database;
      oid  |  datname
    -------+-----------
         1 | template1
     12813 | template0
     12816 | postgres
     16384 | qmstst
     64919 | gpperfmon
     78257 | pgbench
     78258 | results
    (7 rows)


  2. Use gpssh to total the size of the spill-file (pgsql_tmp) directories on each segment host. Replace <OID> with the OID of the database found in the previous step:
    [gpadmin@mdw kend]$ gpssh -f ~/hosts -e \
        "du -b /data[1-2]/primary/gpseg*/base/<OID>/pgsql_tmp/*" | \
        grep -v "du -b" | sort | awk -F" " '{ arr[$1] = arr[$1] + $2 ; tot = tot + $2 }; END \
        { for ( i in arr ) print "Segment node" i, arr[i], "bytes (" arr[i]/(1024**3)" GB)"; \
        print "Total", tot, "bytes (" tot/(1024**3)" GB)" }' -
    Example output:
    Segment node[sdw1] 2443370457 bytes (2.27557 GB)
    Segment node[sdw2] 1766575328 bytes (1.64525 GB)
    Segment node[sdw3] 1761686551 bytes (1.6407 GB)
    Segment node[sdw4] 1780301617 bytes (1.65804 GB)
    Segment node[sdw5] 1742543599 bytes (1.62287 GB)
    Segment node[sdw6] 1830073754 bytes (1.70439 GB)
    Segment node[sdw7] 1767310099 bytes (1.64594 GB)
    Segment node[sdw8] 1765105802 bytes (1.64388 GB)
    Total 14856967207 bytes (13.8366 GB)

    If there is a significant and sustained difference in disk usage, then the queries being executed should be investigated for possible skew (the example output above does not reveal significant skew). In monitoring systems, there will always be some skew, but often it is transient and will be short in duration.

  3. If significant and sustained skew appears, the next task is to identify the offending query.

    The command in the previous step sums up the entire node. This time, find the actual segment directory. You can do this from the master or by logging into the specific node identified in the previous step. Following is an example run from the master.

    This example looks specifically for sort files. Not all spill files or skew situations are caused by sort files, so you will need to customize the command:
    $ gpssh -f ~/hosts -e \
        "ls -l /data[1-2]/primary/gpseg*/base/19979/pgsql_tmp/*" \
        | grep -i sort | awk '{sub(/base.*tmp\//, ".../", $10); print $1,$6,$10}' | sort -k2 -n
    Here is output from this command:
    [sdw1] 288718848 /data1/primary/gpseg2/.../pgsql_tmp_slice0_sort_17758_0001.0
    [sdw1] 291176448 /data2/primary/gpseg5/.../pgsql_tmp_slice0_sort_17764_0001.0
    [sdw8] 924581888 /data2/primary/gpseg45/.../pgsql_tmp_slice10_sort_15673_0010.9
    [sdw4] 980582400 /data1/primary/gpseg18/.../pgsql_tmp_slice10_sort_29425_0001.0
    [sdw6] 986447872 /data2/primary/gpseg35/.../pgsql_tmp_slice10_sort_29602_0001.0
    ...
    [sdw5] 999620608 /data1/primary/gpseg26/.../pgsql_tmp_slice10_sort_28637_0001.0
    [sdw2] 999751680 /data2/primary/gpseg9/.../pgsql_tmp_slice10_sort_3969_0001.0
    [sdw3] 1000112128 /data1/primary/gpseg13/.../pgsql_tmp_slice10_sort_24723_0001.0
    [sdw5] 1000898560 /data2/primary/gpseg28/.../pgsql_tmp_slice10_sort_28641_0001.0
    ...
    [sdw8] 1008009216 /data1/primary/gpseg44/.../pgsql_tmp_slice10_sort_15671_0001.0
    [sdw5] 1008566272 /data1/primary/gpseg24/.../pgsql_tmp_slice10_sort_28633_0001.0
    [sdw4] 1009451008 /data1/primary/gpseg19/.../pgsql_tmp_slice10_sort_29427_0001.0
    [sdw7] 1011187712 /data1/primary/gpseg37/.../pgsql_tmp_slice10_sort_18526_0001.0
    [sdw8] 1573741824 /data2/primary/gpseg45/.../pgsql_tmp_slice10_sort_15673_0001.0
    [sdw8] 1573741824 /data2/primary/gpseg45/.../pgsql_tmp_slice10_sort_15673_0002.1
    [sdw8] 1573741824 /data2/primary/gpseg45/.../pgsql_tmp_slice10_sort_15673_0003.2
    [sdw8] 1573741824 /data2/primary/gpseg45/.../pgsql_tmp_slice10_sort_15673_0004.3
    [sdw8] 1573741824 /data2/primary/gpseg45/.../pgsql_tmp_slice10_sort_15673_0005.4
    [sdw8] 1573741824 /data2/primary/gpseg45/.../pgsql_tmp_slice10_sort_15673_0006.5
    [sdw8] 1573741824 /data2/primary/gpseg45/.../pgsql_tmp_slice10_sort_15673_0007.6
    [sdw8] 1573741824 /data2/primary/gpseg45/.../pgsql_tmp_slice10_sort_15673_0008.7
    [sdw8] 1573741824

    Scanning this output reveals that segment gpseg45 on host sdw8 is the culprit, as its sort files are larger than the others in the output.

  4. Log in to the offending node with ssh and become root. Use the lsof command to find the PID for the process that owns one of the sort files:
    [root@sdw8 ~]# lsof /data2/primary/gpseg45/base/19979/pgsql_tmp/pgsql_tmp_slice10_sort_15673_0002.1
    postgres 15673  gpadmin 11u  REG  8,48    1073741824  64424546751 /data2/primary/gpseg45/base/19979/pgsql_tmp/pgsql_tmp_slice10_sort_15673_0002.1
    The PID, 15673, is also part of the file name, but this may not always be the case.
  5. Use the ps command with the PID to identify the database and connection information:
    [root@sdw8 ~]# ps -eaf | grep 15673
    gpadmin  15673 27471 28 12:05 ?        00:12:59 postgres: port 40003, sbaskin bdw con699238 seg45 cmd32 slice10 MPPEXEC SELECT
    root     29622 29566  0 12:50 pts/16   00:00:00 grep 15673
  6. On the master, check the pg_log log file for the user in the previous command (sbaskin), connection (con699238), and command (cmd32). The line in the log file with these three values should be the line that contains the query, but occasionally the command number may differ slightly. For example, the ps output may show cmd32, but in the log file it is cmd34. If the query is still running, the last query for the user and connection is the offending query.
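
The manual scanning in the steps above can be partly scripted. The sketch below is a rough aid, not part of the official procedure, and both function names are hypothetical. spill_by_segment totals the "[host] bytes path" lines printed by the ls pipeline per host and gpseg directory, and session_ids pulls the conNNN and cmdNN tokens out of a ps line so they can be searched for in the master log. It assumes the output shapes shown above; other Greenplum versions may format these lines differently.

```shell
# spill_by_segment: read "[host] bytes path" lines and total the bytes per
# host and gpseg directory, smallest first (the heaviest segment is last).
# Lines with fewer than three fields (e.g. "...") are ignored.
spill_by_segment() {
    awk '
        NF >= 3 && match($3, /gpseg[0-9]+/) {
            key = $1 " " substr($3, RSTART, RLENGTH)
            bytes[key] += $2
        }
        END { for (k in bytes) printf "%s %.0f\n", k, bytes[k] }
    ' | sort -k3 -n
}

# session_ids: print the connection and command tokens (conNNN cmdNN)
# found on each ps output line that carries a connection token.
session_ids() {
    awk '{
        con = ""; cmd = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^con[0-9]+$/) con = $i
            if ($i ~ /^cmd[0-9]+$/) cmd = $i
        }
        if (con != "") print con, cmd
    }'
}

# A few lines taken from the example output above.
spill_by_segment <<'EOF' | tail -n 1
[sdw1] 288718848 /data1/primary/gpseg2/.../pgsql_tmp_slice0_sort_17758_0001.0
[sdw8] 1573741824 /data2/primary/gpseg45/.../pgsql_tmp_slice10_sort_15673_0001.0
[sdw8] 1573741824 /data2/primary/gpseg45/.../pgsql_tmp_slice10_sort_15673_0002.1
EOF
```

With the connection token in hand, a plain grep for it in the master's pg_log files narrows the search. The command number is only a hint, since it can be slightly off in the log, as noted in the last step.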