List unique values for each column in a data frame

Submitted by 一笑奈何 on 2020-07-30 01:44:47

Question


Suppose you have a very large input file in CSV format and you want to know the distinct values that occur in each column. How would you do that?

For example:

column1    column2    column3    column4
----------------------------------------
value11    value12    value13    value14
value21    value22    value23    value24
...
valueN1    valueN2    valueN3    valueN4

I want my output to be something like:

column1 has these values: value11, value21, ..., valueN1. I don't need to see repeated occurrences of the same value; I only need this to get an idea of what my data contains.


Answer 1:


Let dat be the data frame you get after reading in the CSV file. Then you can do

ulst <- lapply(dat, unique)

If you also want to know the number of unique values in each column, do

k <- lengths(ulst)
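
As a rough illustration of the two calls above, here is a minimal sketch on a small made-up data frame. The column contents and the file name mentioned in the comment are assumptions for the example only; the last step formats the result the way the question describes.

```r
# Hypothetical stand-in for the data frame `dat` read from the CSV file,
# e.g. dat <- read.csv("input.csv") (the file name is an assumption).
dat <- data.frame(
  column1 = c("a", "b", "a", "c"),
  column2 = c(1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Unique values per column, returned as a named list.
ulst <- lapply(dat, unique)

# Number of unique values per column, as a named integer vector.
k <- lengths(ulst)

# One line of text per column, in the form sketched in the question.
out <- vapply(
  names(ulst),
  function(nm) {
    paste0(nm, " has these values: ", paste(ulst[[nm]], collapse = ", "))
  },
  character(1)
)
writeLines(out)
```

lapply() treats the data frame as a list of columns, so ulst is a list with one element per column. A list is used rather than a data frame because the columns will usually have different numbers of unique values.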



Answer 2:


I find the describe() function from the Hmisc package very handy for getting an overview of a dataset, e.g.,

Hmisc::describe(chickwts)
chickwts 

 2  Variables      71  Observations
----------------------------------------------------------------------------------------------------------------
weight 
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90 
      71        0       66        1    261.3    90.26    140.5    153.0    204.5    258.0    323.5    359.0 
     .95 
   385.0 

lowest : 108 124 136 140 141, highest: 380 390 392 404 423
----------------------------------------------------------------------------------------------------------------
feed 
       n  missing distinct 
      71        0        6 

Value         casein horsebean   linseed  meatmeal   soybean sunflower
Frequency         12        10        12        11        14        12
Proportion     0.169     0.141     0.169     0.155     0.197     0.169
----------------------------------------------------------------------------------------------------------------
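
If Hmisc is not installed, part of the describe() output above can be approximated in base R. The following is only a sketch, not a replacement for describe(): it reproduces the distinct counts and the frequency and proportion table for feed, and the variable names n_distinct, feed_counts and feed_props are made up for the example.

```r
# chickwts ships with R in the datasets package.

# Number of distinct values per column (NA would count as a value here).
n_distinct <- vapply(chickwts, function(x) length(unique(x)), integer(1))

# Frequency and proportion of each level of the categorical column.
feed_counts <- table(chickwts$feed)
feed_props <- round(prop.table(feed_counts), 3)

print(n_distinct)
print(feed_counts)
print(feed_props)
```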


Source: https://stackoverflow.com/questions/38961144/list-unique-values-for-each-column-in-a-data-frame
