Counting chars in sequences via SQL

女生的网名这么多〃 提交于 2020-01-10 04:52:13

问题


I have a database with a sequence table. Each (amino acid) sequence in this table comprises of 20 different chars (A, V, ...). For instance "MQSHAMQCASQALDLYD...".

I would like to count the number of appearance of each char, so that I get something like "2xM, 3xQ, ...".

Furthermore, I would like to do this over all sequences in my DB, so I get the overall appearance of each char. ("248xM, 71x W,...").

How can I do this in PostgreSQL? At the moment, I am doing it with Ruby, but I have 25,000 sequences with a length of about 400 chars each. This takes a while and I hope it will be faster with SQL.


回答1:


This is How to find all A's in a string:

select length(regexp_replace('AAADDD', '[^A]', '', 'g'));

This is how to find all A's in a table:

select sum(length(regexp_replace(field, '[^A]', '', 'g'))) from table;



回答2:


I usually find defining a reusable function more flexible:

CREATE OR REPLACE FUNCTION numch(text, text) RETURNS integer AS
  $$ 
    SELECT length($2) - length(replace($2, $1, '')) 
  $$ LANGUAGE SQL;

And then you can use it like:

SELECT sum(numch('A', colname)) AS nA, sum(numch('V', colname)) AS nV
  FROM relname;

To get results like:

   na   |   nv   
--------+--------
 359541 | 296330
(1 row)

Adopted from this psql-performance thread.



来源:https://stackoverflow.com/questions/6292511/counting-chars-in-sequences-via-sql

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!