frequency

Frequency distribution of a categorical variable in R

夙愿已清 submitted on 2019-12-02 03:00:22
I am trying to prepare a frequency distribution table of a categorical variable in my data, using the code below. The output looks fine when I view it, but it does not print properly in the report.
# These lines are not needed because the data below is already
# in that format
# STI<-STI_IPD1%>% select(Q18_1,Q54)
# STI$Q54<-as.factor(STI$Q54)
STI = structure(list(Q18_1 = c(101L, 120L, 29L, 101L, 94L, 16L, 47L, 141L, 154L, 47L, 141L, 154L, 154L, 29L, 58L, 154L, 101L, 154L, 47L, 141L, 75L, 1L, 120L, 16L, 154L, 141L, 141L, 154L, 154L, 154L, 29L, 141L, 38L, 47L, 101L, 16L, 154L, 154L, 101L, 192L, 58L, 154L
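The printing problem itself cannot be diagnosed from this truncated excerpt, but the table being built is simple: a count per level plus each count's share of the total (in R, table() and prop.table() give the counts and proportions). As a language-neutral sketch in Python, using hypothetical codes rather than the question's Q18_1 data, the table can be rendered as fixed-width plain text; the helper names are assumptions for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return (level, count, percent) rows, most frequent level first."""
    counts = Counter(values)
    total = sum(counts.values())
    # Sort by descending count, then by level so ties print in a stable order
    # (this assumes the levels are mutually comparable, e.g. all str or all int).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(level, n, 100.0 * n / total) for level, n in ordered]

def format_table(rows):
    """Render the rows as a fixed-width plain-text table."""
    lines = [f"{'Level':>8} {'Count':>6} {'Percent':>8}"]
    for level, n, pct in rows:
        lines.append(f"{level!s:>8} {n:>6} {pct:>7.1f}%")
    return "\n".join(lines)

# Hypothetical survey codes, not the question's data.
rows = frequency_table(["b", "a", "b", "c", "b", "a"])
print(format_table(rows))
```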

Histogram using Excel FREQUENCY function

亡梦爱人 submitted on 2019-12-02 02:56:52
In Excel 2010, I have a list of values in column A and a bin size specified in B1. This lets me create histograms with N bins using this array formula: {=FREQUENCY(A:A,(ROW(INDIRECT("1:"&CEILING((MAX(A:A)-MIN(A:A))/B1,1)))-1)*B1+MIN(A:A))} The only problem is that I need to select N cells and enter this formula in them to get the N bins that serve as the data source for my bar chart. Is it possible to skip this step? For example, is it possible to use a modified version of this formula in a single cell so that

Histogram using Excel FREQUENCY function

北战南征 submitted on 2019-12-02 02:16:02
In Excel 2010, I have a list of values in column A and a bin size specified in B1. This lets me create histograms with N bins using this array formula: {=FREQUENCY(A:A,(ROW(INDIRECT("1:"&CEILING((MAX(A:A)-MIN(A:A))/B1,1)))-1)*B1+MIN(A:A))} The only problem is that I need to select N cells and enter this formula in them to get the N bins that serve as the data source for my bar chart. Is it possible to skip this step? For example, is it possible to use a modified version of this formula in a single cell so that, when used as a data source, it is interpreted as N cells and produces a nice histogram with N values? Thanks.
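To make the array formula's behaviour concrete, here is a Python sketch of what it computes (an illustration of the logic, not an Excel feature; the function names are hypothetical). ROW(INDIRECT("1:"&N))-1 yields 0 to N-1, so the bin edges are MIN, MIN+B1, ..., MIN+(N-1)*B1 with N = CEILING((MAX-MIN)/B1, 1). FREQUENCY returns one more count than there are edges: values up to the first edge, values in each interval (previous edge, edge], and values above the last edge.

```python
import math

def formula_bins(data, bin_size):
    """Bin edges produced by (ROW(INDIRECT("1:"&N)) - 1) * B1 + MIN(A:A),
    where N = CEILING((MAX - MIN) / B1, 1)."""
    lo, hi = min(data), max(data)
    n = math.ceil((hi - lo) / bin_size)
    return [lo + k * bin_size for k in range(n)]

def excel_frequency(data, bins):
    """Mimic FREQUENCY(data, bins) for ascending bins: len(bins) + 1 counts.

    counts[0]  -> x <= bins[0]
    counts[i]  -> bins[i - 1] < x <= bins[i]
    counts[-1] -> x > bins[-1]
    """
    counts = [0] * (len(bins) + 1)
    for x in data:
        for i, edge in enumerate(bins):
            if x <= edge:
                counts[i] += 1
                break
        else:  # larger than every edge: the extra overflow slot
            counts[-1] += 1
    return counts

values = [1, 2, 2, 3, 5, 7, 10]  # hypothetical contents of column A
edges = formula_bins(values, 3)   # B1 = 3
print(edges)                      # [1, 4, 7]
print(excel_frequency(values, edges))  # [1, 3, 2, 1]
```

Note that with these edges the first count only ever catches values equal to the minimum, and the last slot catches everything above MIN+(N-1)*B1.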

Extracting most frequent words out of a corpus with python

Deadly submitted on 2019-12-02 02:00:06
Maybe this is a stupid question, but I have a problem extracting the ten most frequent words from a corpus with Python. This is what I've got so far (btw, I use NLTK to read a corpus with two subcategories, each containing 10 .txt files):
import re
import string
from nltk.corpus import stopwords
stoplist = stopwords.words('dutch')
from collections import defaultdict
from operator import itemgetter
def toptenwords(mycorpus):
    words = mycorpus.words()
    no_capitals = set([word.lower() for word in words])
    filtered = [word for word in no_capitals if word not in stoplist]
    no_punct = [s
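One likely culprit in the excerpt is set([word.lower() for word in words]): a set keeps each distinct word only once, so any counts taken from it afterwards are all 1. A minimal sketch that keeps the duplicates and lets collections.Counter do the ranking; it accepts any iterable of tokens (for example the question's mycorpus.words()), and the sample tokens and two-word stoplist below are hypothetical stand-ins for the NLTK corpus and stopwords.words('dutch').

```python
import string
from collections import Counter

def top_words(words, stoplist, n=10):
    """Return the n most frequent words as (word, count) pairs, most frequent first."""
    stop = set(stoplist)
    lowered = [w.lower() for w in words]  # a list, not a set: duplicates must survive
    kept = [
        w for w in lowered
        # drop stopwords and tokens made only of punctuation
        if w not in stop and not all(ch in string.punctuation for ch in w)
    ]
    return Counter(kept).most_common(n)

# Hypothetical tokens and stoplist.
tokens = ["De", "kat", "zit", "op", "de", "mat", ".", "De", "kat", "slaapt", "."]
print(top_words(tokens, ["de", "op"], n=2))  # [('kat', 2), ('zit', 1)]
```

Counter.most_common orders words with equal counts by first appearance, which is why "zit" comes before "mat" here.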

frequency of letters in column python

痞子三分冷 submitted on 2019-12-02 00:42:15
I want to calculate the frequency of each letter in every column. For example, I have these three sequences: seq1=AATC seq2=GCCT seq3=ATCA Here, in the first column the frequency of 'A' is 2 and of 'G' is 1; in the second column the frequency of 'A' is 1, 'C' is 1 and 'T' is 1 (and likewise for the remaining columns). First, I tried calculating the frequency for a single sequence:
s='AATC'
dic={}
for x in s:
    dic[x]=s.count(x)
This gives: {'A':2,'T':1,'C':1} Now I want to apply this to the columns. For that I use this instruction: f=list(zip(seq1,seq2,seq3)) which gives: [('A'
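Since zip(seq1, seq2, seq3) already groups the letters by column, each tuple can be counted directly with collections.Counter instead of calling str.count once per letter. A minimal sketch, assuming the sequences all have the same length (zip silently stops at the shortest one); column_frequencies is a hypothetical helper name.

```python
from collections import Counter

def column_frequencies(*seqs):
    """Return one Counter per column (position) across the given sequences."""
    # zip(*seqs) yields the letters at position 0, then position 1, and so on;
    # it stops at the shortest sequence, so the inputs should have equal length.
    return [Counter(column) for column in zip(*seqs)]

seq1, seq2, seq3 = "AATC", "GCCT", "ATCA"
for i, freq in enumerate(column_frequencies(seq1, seq2, seq3), start=1):
    print(i, dict(freq))
```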

Appending Frequency Tables - With Missing Values

亡梦爱人 submitted on 2019-12-02 00:13:58
The goal is to produce a frequency table of all my selected variables (about reading habits for 4 newspapers), which essentially share the same possible values: 1 = Subscribed, 2 = Every week, 3 = Sometimes, 4 = Never, 0 = NA (no answer). The problem arises when one of the variables does not contain one of the possible values, for example if no one is subscribed to a particular newspaper.
a <- c(1,2,3,4,3,1,2,3,4,3)
b <- c(2,2,3,4,3,0,0,3,4,1)
d <- c(2,2,3,4,3,0,0,0,0,0)
e <- c(3,3,3,3,3,3,3,3,3,3)
ta <- table(a)
tb <- table(b)
td <- table(d)
te <- table(e)
abde <- cbind(ta,tb,td,te)
ta tb td te
0 2 2 5
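In R, the usual fix is to give every vector the same factor levels before tabulating, e.g. table(factor(a, levels = 0:4)), so each table has an entry (possibly 0) for every code and the tables line up in cbind. The same idea as a short Python sketch using the question's four vectors; LEVELS and the helper names are assumptions for illustration.

```python
from collections import Counter

# Response codes from the question: 0 = NA, 1 = Subscribed, 2 = Every week,
# 3 = Sometimes, 4 = Never. Every table is built on this fixed set of levels.
LEVELS = [0, 1, 2, 3, 4]

def table_with_levels(values, levels=LEVELS):
    """Count each level, reporting 0 for levels that never occur."""
    counts = Counter(values)
    return [counts[level] for level in levels]  # Counter returns 0 for missing keys

def combined_table(columns, levels=LEVELS):
    """Rows of [level, count in column 1, count in column 2, ...]."""
    tables = [table_with_levels(vals, levels) for vals in columns.values()]
    return [[level] + [t[i] for t in tables] for i, level in enumerate(levels)]

a = [1, 2, 3, 4, 3, 1, 2, 3, 4, 3]
b = [2, 2, 3, 4, 3, 0, 0, 3, 4, 1]
d = [2, 2, 3, 4, 3, 0, 0, 0, 0, 0]
e = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
columns = {"ta": a, "tb": b, "td": d, "te": e}
print("level", *columns)
for row in combined_table(columns):
    print(*row)
```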

Mysql count frequency

左心房为你撑大大i submitted on 2019-12-01 19:49:22
I've checked similar questions, but they didn't help with my specific problem. My table looks like this:
id age
1  30
2  36
3  30
4  52
5  52
6  30
7  36
etc.
I need to count the frequency of each age:
age freq
30  3
36  2
52  2
How can I get this freq? After this I will need to work with that data, so might I need to use an array? Thanks!
function drawChart() {
  // Create the data table.
  var data = new google.visualization.DataTable();
  data.addColumn('string', 'age');
  data.addColumn('number', 'freq');
  <?php while($row = mysql_fetch_row($result)) {
    $frequencies[$row[0]] = $row[1];
    echo "data
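The frequency itself can come straight from the database with GROUP BY and COUNT(*). The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server; the table name people is hypothetical, and the query is standard SQL that should run unchanged on MySQL.

```python
import sqlite3

# One row per distinct age, with the number of rows that have that age.
FREQ_QUERY = "SELECT age, COUNT(*) AS freq FROM people GROUP BY age ORDER BY age"

def age_frequencies(conn):
    """Return {age: freq}, ready to be turned into chart rows."""
    return dict(conn.execute(FREQ_QUERY).fetchall())

# In-memory SQLite table standing in for the MySQL table (hypothetical name "people").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO people (id, age) VALUES (?, ?)",
    [(1, 30), (2, 36), (3, 30), (4, 52), (5, 52), (6, 30), (7, 36)],
)
freqs = age_frequencies(conn)
print(freqs)  # {30: 3, 36: 2, 52: 2}
```

With this query, each row fetched in the PHP loop would hold the age in $row[0] and its frequency in $row[1].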

Create a two-mode frequency matrix in R

怎甘沉沦 submitted on 2019-12-01 18:16:15
I have a data frame that looks something like this:
CASENO Var1 Var2 Resp1 Resp2
1      1    0    1     1
2      0    0    0     0
3      1    1    1     1
4      1    1    0     1
5      1    0    1     0
There are over 400 variables in the dataset; this is just an example. I need to create a simple frequency matrix in R (excluding the case numbers), but the table function doesn't work. Specifically, I'm looking to cross-tabulate a portion of the columns to create a two-mode matrix of frequencies. The table should look like this:
      Var1 Var2
Resp1 3    1
Resp2 3    2
In Stata, the command is:
gen var = 1 if Var1==1
replace var= 2 if Var2==1
gen resp = 1 if Resp1==1
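Each cell of the target matrix counts the cases where both the row variable and the column variable equal 1. For 0/1 columns that is the matrix product t(R) %*% V, which in R could be written as crossprod(as.matrix(df[, c("Resp1", "Resp2")]), as.matrix(df[, c("Var1", "Var2")])), where df is a hypothetical name for the data frame. A plain-Python sketch of the same count (two_mode_matrix is a hypothetical helper), reproducing the expected table from the example data:

```python
def two_mode_matrix(cases, row_vars, col_vars):
    """For each (row_var, col_var) pair, count cases where both are coded 1."""
    return {
        r: {c: sum(1 for case in cases if case[r] == 1 and case[c] == 1)
            for c in col_vars}
        for r in row_vars
    }

# The example data from the question; CASENO is not used in the count.
cases = [
    {"CASENO": 1, "Var1": 1, "Var2": 0, "Resp1": 1, "Resp2": 1},
    {"CASENO": 2, "Var1": 0, "Var2": 0, "Resp1": 0, "Resp2": 0},
    {"CASENO": 3, "Var1": 1, "Var2": 1, "Resp1": 1, "Resp2": 1},
    {"CASENO": 4, "Var1": 1, "Var2": 1, "Resp1": 0, "Resp2": 1},
    {"CASENO": 5, "Var1": 1, "Var2": 0, "Resp1": 1, "Resp2": 0},
]
matrix = two_mode_matrix(cases, ["Resp1", "Resp2"], ["Var1", "Var2"])
print("     ", "Var1", "Var2")
for r, row in matrix.items():
    print(r, row["Var1"], row["Var2"])
```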

Create a two-mode frequency matrix in R

微笑、不失礼 submitted on 2019-12-01 17:16:29
I have a data frame that looks something like this:
CASENO Var1 Var2 Resp1 Resp2
1      1    0    1     1
2      0    0    0     0
3      1    1    1     1
4      1    1    0     1
5      1    0    1     0
There are over 400 variables in the dataset; this is just an example. I need to create a simple frequency matrix in R (excluding the case numbers), but the table function doesn't work. Specifically, I'm looking to cross-tabulate a portion of the columns to create a two-mode matrix of frequencies. The table should look like this:
      Var1 Var2
Resp1 3    1
Resp2 3    2
In

Alternative to Scipy mode function in Numpy?

孤者浪人 submitted on 2019-12-01 16:07:55
Is there another way in NumPy to replicate the scipy.stats.mode function, i.e. to get the most frequent values in an ndarray along an axis (without importing other modules)? For example:
import numpy as np
from scipy.stats import mode
a = np.array([[[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19]], [[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19]], [[40, 40, 42, 43, 44], [45, 46, 47, 48, 49], [50, 51, 52, 53, 54], [55, 56, 57, 58, 59]]])
mode= mode(data,
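A NumPy-only sketch (mode_along_axis is a hypothetical helper name): move the chosen axis to the end, then for each 1-D slice take np.unique(..., return_counts=True) and keep the value with the highest count. Ties go to the smallest value, the same rule scipy.stats.mode documents. It loops over the slices in Python, so expect it to be slower than a vectorized routine on large arrays.

```python
import numpy as np

def mode_along_axis(arr, axis=0):
    """Most frequent value and its count along `axis`, using only NumPy.

    Ties go to the smallest value, since np.unique returns sorted values
    and np.argmax picks the first maximum.
    """
    arr = np.asarray(arr)
    moved = np.moveaxis(arr, axis, -1)           # reduction axis last
    flat = moved.reshape(-1, moved.shape[-1])    # one row per 1-D slice
    values = np.empty(flat.shape[0], dtype=arr.dtype)
    counts = np.empty(flat.shape[0], dtype=np.intp)
    for i, row in enumerate(flat):
        uniq, cnt = np.unique(row, return_counts=True)
        j = np.argmax(cnt)
        values[i], counts[i] = uniq[j], cnt[j]
    out_shape = moved.shape[:-1]
    return values.reshape(out_shape), counts.reshape(out_shape)

# The question's array: two identical layers plus a third that starts 40, 40, 42, ...
layer = np.arange(20).reshape(4, 5)
third = np.arange(40, 60).reshape(4, 5)
third[0, 1] = 40
a = np.stack([layer, layer, third])

vals, cnts = mode_along_axis(a, axis=0)
print(vals)
print(cnts)
```

For non-negative integer data, applying np.bincount to each slice is a common faster alternative to np.unique.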