Question:
I have one column which contains the group ID of each participant. There are three groups so every number in this column is 1, 2 or 3.
Then I have a second column which contains response scores for each participant. I want to calculate the mean/median response score within each group.
I have managed to do this by looping through every row but I sense this is a slow and suboptimal solution. Could someone please suggest a better way of doing things?
Answer 1:
Use logical indexing. For example, say your data is in a matrix m, where the first column is the group ID and the second column is the response score. Then
mean(m(m(:,1)==1,2))
median(m(m(:,1)==1,2))
give the mean and median of the response scores for group 1; replace ==1 with ==2 or ==3 for the other groups.
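To cover all groups without writing one expression per group, the same logical indexing can be wrapped in a short loop over the distinct group IDs (three iterations here, rather than one per row). This is only a sketch: the matrix m below is made-up illustration data, and the names ids, inGroup, groupMean and groupMedian are hypothetical.

```matlab
% Hypothetical example data: column 1 = group ID, column 2 = response score
m = [1 0; 1 0; 1 1; 2 0; 2 1; 2 1; 3 1; 3 1; 3 1];

ids = unique(m(:,1));             % distinct group IDs, sorted ascending
groupMean   = zeros(size(ids));   % preallocate one result per group
groupMedian = zeros(size(ids));

for k = 1:numel(ids)
    inGroup = m(:,1) == ids(k);   % logical mask selecting the rows of this group
    groupMean(k)   = mean(m(inGroup, 2));
    groupMedian(k) = median(m(inGroup, 2));
end
```

The loop runs once per group, so its cost is dominated by the vectorized comparisons rather than by per-row work.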
Answer 2:
This is a good place to use accumarray (documentation and blog post):
result = accumarray(groupIDs, data, [], @median);
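You can of course pass columns of a matrix directly instead of variables called groupIDs and data (a row should be turned into a column first, e.g. with (:), since accumarray expects the group IDs as a column vector). If you'd prefer the mean instead of the median, use @mean as the fourth argument.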
Note: the documentation notes that you should sort the input parameters if you need to rely on the order of the output. I'll leave that exercise for another day though.
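As a minimal sketch of the call above, here is accumarray applied to a small made-up data set. The variable names and values are illustrative only. The last part assumes a situation not described in the question, namely group labels that are not the consecutive integers 1, 2, 3; in that case the third output of unique can be used to map them to valid indices first.

```matlab
% Illustrative data: group IDs must be positive integers in a column vector
groupIDs = [1 1 1 2 2 2 3 3 3]';
data     = [0 0 1 0 1 1 1 1 1]';

groupMedian = accumarray(groupIDs, data, [], @median);  % one median per group
groupMean   = accumarray(groupIDs, data, [], @mean);    % one mean per group

% Assumed variant: arbitrary labels (hypothetical values 10, 20, 35).
% unique's third output maps each label to an index 1..numel(labels).
rawIDs = [10 10 10 20 20 20 35 35 35]';
[labels, ~, idx] = unique(rawIDs);
meanByLabel = accumarray(idx, data, [], @mean);         % row k belongs to labels(k)
```

Mean and median do not depend on the order of the values within a group, so the sorting caveat in the note does not affect these two statistics.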
Answer 3:
grpstats, from the Statistics and Machine Learning Toolbox, is a good function for this (documentation here).
These are its built-in statistics:
- 'mean' Mean
- 'sem' Standard error of the mean
- 'numel' Count, or number, of non-NaN elements
- 'gname' Group name
- 'std' Standard deviation
- 'var' Variance
- 'min' Minimum
- 'max' Maximum
- 'range' Range
- 'meanci' 95% confidence interval for the mean
- 'predci' 95% prediction interval for a new observation
It also accepts function handles (e.g. @mean, @skewness):
>> groups = [1 1 1 2 2 2 3 3 3]';
>> data = [0 0 1 0 1 1 1 1 1]';
>> grpstats(data, groups, {'mean'})
ans =
    0.3333
    0.6667
    1.0000
>> [mea, med] = grpstats(data, groups, {'mean', @median})
mea =
    0.3333
    0.6667
    1.0000
med =
     0
     1
     1