Question
Suppose you have a very large input file in CSV format and you want to know the distinct values that occur in each column. How would you do that?
Example:
column1 column2 column3 column4
----------------------------------------
value11 value12 value13 value14
value21 value22 value23 value24
...
valueN1 valueN2 valueN3 valueN4
So I want my output to be something like:
column1 has these values: value11, value21, ...valueN1
I don't need to see repeated occurrences of the same value. I only need this to get an idea of what my data is about.
Answer 1:
Let dat be your data frame after reading in the CSV file. You can then do:
ulst <- lapply(dat, unique)
If you further want to know the number of unique values for each column, do
k <- lengths(ulst)
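To make the two lines above concrete, here is a minimal end-to-end sketch. The inline CSV text and its column names are made up for illustration (they are not from the question), and the final loop is just one possible way to print the result in the form the question asks for.

```r
# Hypothetical sample data standing in for a real CSV file
csv_text <- "column1,column2,column3
a,x,1
b,x,2
a,y,1"

# read.csv() can parse inline text via the `text` argument;
# for a real file, pass its path as the first argument instead
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)

ulst <- lapply(dat, unique)  # list with the unique values of each column
k <- lengths(ulst)           # number of unique values in each column

# Print one line per column, e.g. "column1 has these values: a, b"
for (nm in names(ulst)) {
  cat(nm, "has these values:", paste(ulst[[nm]], collapse = ", "), "\n")
}
```

Since ulst is a plain named list, a single column's values can also be pulled out with ulst$column1 or ulst[["column1"]].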
Answer 2:
I find the describe() function from the Hmisc package very handy for getting an overview of a dataset, e.g.,
Hmisc::describe(chickwts)
chickwts

 2  Variables      71  Observations
----------------------------------------------------------------------------------------------------------------
weight
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90
      71        0       66        1    261.3    90.26    140.5    153.0    204.5    258.0    323.5    359.0
     .95
   385.0

lowest : 108 124 136 140 141, highest: 380 390 392 404 423
----------------------------------------------------------------------------------------------------------------
feed
       n  missing distinct
      71        0        6

Value         casein horsebean   linseed  meatmeal   soybean sunflower
Frequency         12        10        12        11        14        12
Proportion     0.169     0.141     0.169     0.155     0.197     0.169
----------------------------------------------------------------------------------------------------------------
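If installing Hmisc is not an option, a rough base R counterpart for the same dataset might look like the sketch below. It only covers the distinct counts and the frequency table, not the quantiles or the other statistics that describe() reports. The variable names n_distinct and feed_counts are introduced here for illustration.

```r
# chickwts ships with base R (datasets package), so nothing needs installing

# Number of distinct values per column; this should agree with the
# "distinct" figures in the describe() output (66 for weight, 6 for feed)
n_distinct <- lengths(lapply(chickwts, unique))
print(n_distinct)

# Frequency of each level of the categorical column, comparable to the
# Value/Frequency rows that describe() prints for feed
feed_counts <- table(chickwts$feed)
print(feed_counts)
```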
Source: https://stackoverflow.com/questions/38961144/list-unique-values-for-each-column-in-a-data-frame