Count lines with same value in column in Python

旧时模样 posted on 2019-12-06 07:12:53

I would suggest pandas, especially since with genomic data the data set may be quite large:

In [44]:
#you can read your data with pandas.read_csv()
import pandas as pd
print(df)
         v0     v1     v2  v3 v4  v5 v6    v7     v8  v9  v10  v11
0  Tag19184  CTAAC  hffef   1  a  36  -  chr1  10006   0  36M   36
1  Tag19184  CTAAC  hffef   1  a  36  -  chr1  10012   0  36M   36
2  Tag19184  CTAAC  hffef   1  a  36  -  chr1  10018   0  36M   36
3  Tag19184  CTAAC  hffef   1  a  36  -  chr1  10024   0  36M   36
4  Tag19184  CTAAC  hffef   1  a  36  -  chr1  10030   0  36M   36
5  Tag19184  CTAAC  hffef   1  a  36  -  chr1  10036   0  36M   36
6  Tag19184  CTAAC  hffef   1  a  36  -  chr1  10042   0  36M   36
7  Tag20198  CTAAC  hffef   1  a  36  -  chr1  10048   0  36M   36
8  Tag20198  CTAAC  hffef   1  a  36  -  chr1  10054   0  36M   36
9  Tag45093  CTAAC  hffef   1  a  36  -  chr1  10060   0  36M   36
In [45]:
#if we want to group by the first 3 fields
df.groupby(['v0','v1','v2'])['v3'].transform('sum')
Out[45]:
0    7
1    7
2    7
3    7
4    7
5    7
6    7
7    2
8    2
9    1
Name: v3, dtype: int64
In [46]:
#all it takes is just one line
df['v3']=df.groupby(['v0','v1','v2'])['v3'].transform('sum')
print(df)
         v0     v1     v2  v3 v4  v5 v6    v7     v8  v9  v10  v11
0  Tag19184  CTAAC  hffef   7  a  36  -  chr1  10006   0  36M   36
1  Tag19184  CTAAC  hffef   7  a  36  -  chr1  10012   0  36M   36
2  Tag19184  CTAAC  hffef   7  a  36  -  chr1  10018   0  36M   36
3  Tag19184  CTAAC  hffef   7  a  36  -  chr1  10024   0  36M   36
4  Tag19184  CTAAC  hffef   7  a  36  -  chr1  10030   0  36M   36
5  Tag19184  CTAAC  hffef   7  a  36  -  chr1  10036   0  36M   36
6  Tag19184  CTAAC  hffef   7  a  36  -  chr1  10042   0  36M   36
7  Tag20198  CTAAC  hffef   2  a  36  -  chr1  10048   0  36M   36
8  Tag20198  CTAAC  hffef   2  a  36  -  chr1  10054   0  36M   36
9  Tag45093  CTAAC  hffef   1  a  36  -  chr1  10060   0  36M   36
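Note that summing v3 gives the row count here only because v3 is 1 in every input row. If that is not guaranteed, counting the rows in each group directly may be safer. The following is a minimal, self-contained sketch of both variants. It assumes the input is whitespace-separated with no header row; the sample text in raw and the column names n_rows and v3_sum are made up for illustration and are not part of the original data.

```python
import io
import pandas as pd

# Hypothetical sample in the same layout as the table above
# (whitespace-separated, no header row).
raw = """\
Tag19184 CTAAC hffef 1 a 36 - chr1 10006 0 36M 36
Tag19184 CTAAC hffef 1 a 36 - chr1 10012 0 36M 36
Tag20198 CTAAC hffef 1 a 36 - chr1 10048 0 36M 36
"""

# Name the columns v0..v11 to match the session above.
cols = ["v%d" % i for i in range(12)]
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, names=cols)

keys = ["v0", "v1", "v2"]

# Number of rows sharing the same (v0, v1, v2), broadcast back to every row.
df["n_rows"] = df.groupby(keys)["v3"].transform("size")

# Sum of v3 within each group; equals n_rows only when v3 is always 1.
df["v3_sum"] = df.groupby(keys)["v3"].transform("sum")

print(df[keys + ["v3", "n_rows", "v3_sum"]])
```

For a real file, the io.StringIO(raw) argument would be replaced by the path to the file; the separator and header settings may need adjusting to match it.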