Question
I am trying to perform case-control exact matching on age. My dataset contains 139 eyes from 75 patients, divided into two groups by a dichotomous variable (G6PDcarente = 0/1).
I am running the matching with this code:
match.it <- matchit(G6PDcarente~age, data = newdata, method="exact",ratio=1,replace=FALSE)
match.it
The problem is that the results are:
Exact Subclasses: 14
Sample sizes:
          Control Treated
All            43      85
Matched        31      42
Unmatched      12      43
Why are the matched sample sizes so different? Shouldn't they be the same for the matched control and treated samples (e.g. 31 and 31)? How can I obtain an exact match on age with the same sample size in the two groups?
I have also tried the code:
match.it <- matchit(G6PDcarente~age, data = newdata, method="nearest",exact="age",ratio=1, replace=FALSE)
But I get the following error message:
Error in Ops.data.frame(exact[itert, k], exact[clabels, k]) :
‘!=’ only defined for equally-sized data frames
In addition: Warning message:
In matchit2nearest(c(`1` = 0, `2` = 0, `3` = 0, `4` = 0, `5` = 0, :
Fewer control than treated units and matching without replacement. Not all treated units will receive a match. Treated units will be matched in the order specified by m.order: largest
Can someone help me?
Thanks
Here is the code that reproduces a sample of my data:
newdata <- structure(list(NumeroProgressivo = c(43, 44, 137, 138, 129, 130,
65, 111, 148, 149, 35, 36, 83, 84, 37, 38, 127, 128, 160, 161,
75, 76, 53, 54, 119, 120, 109, 110, 57, 58, 39, 51, 52, 29, 30,
71, 72, 154, 155, 77, 78, 1, 2, 61, 62, 158, 101, 102, 27, 28,
73, 103, 104, 121, 122, 152, 153, 107, 108, 45, 46, 81, 82, 139,
140, 59, 60, 95, 96, 33, 34, 91, 92, 26, 49, 50, 79, 6, 63, 64,
15, 16, 31, 32, 143, 144, 69, 70, 89, 90, 41, 42, 17, 18, 67,
68, 115, 116, 150, 151, 97, 98, 93, 94, 135, 136, 55, 56, 131,
132, 162, 163, 21, 22, 23, 24, 156, 157, 133, 166, 174, 175,
164, 165, 172, 173, 176, 177), IDpaziente = c(22, 22, 67, 67,
63, 63, 33, 56, 73, 73, 18, 18, 42, 42, 19, 19, 62, 62, 79, 79,
38, 38, 27, 27, 60, 60, 55, 55, 29, 29, 20, 26, 26, 15, 15, 36,
36, 76, 76, 39, 39, 1, 1, 31, 31, 78, 51, 51, 14, 14, 37, 52,
52, 61, 61, 75, 75, 54, 54, 23, 23, 41, 41, 68, 68, 30, 30, 48,
48, 17, 17, 46, 46, 13, 25, 25, 40, 3, 32, 32, 8, 8, 16, 16,
70, 70, 35, 35, 45, 45, 21, 21, 9, 9, 34, 34, 58, 58, 74, 74,
49, 49, 47, 47, 66, 66, 28, 28, 64, 64, 80, 80, 11, 11, 12, 12,
77, 77, 65, 82, 86, 86, 81, 81, 85, 85, 87, 87), Occhio = c("OD",
"OS", "OD", "OS", "OD", "OS", "OD", "OD", "OD", "OS", "OD", "OS",
"OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD",
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OD", "OS", "OD",
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS",
"OD", "OD", "OS", "OD", "OS", "OD", "OD", "OS", "OD", "OS", "OD",
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS",
"OD", "OS", "OD", "OS", "OD", "OS", "OS", "OD", "OS", "OD", "OS",
"OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD",
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS",
"OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD",
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OD", "OD", "OS",
"OD", "OS", "OD", "OS", "OD", "OS"), G6PDcarente = c(0, 0, 0,
0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1,
0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
age = c(70, 70, 38, 38, 54, 54, 41, 74, 31, 31, 27, 27, 36,
36, 36, 36, 49, 49, 34, 34, 49, 49, 34, 34, 33, 33, 34, 34,
38, 38, 62, 30, 30, 38, 38, 53, 53, 27, 27, 57, 57, 84, 84,
25, 25, 26, 57, 57, 47, 47, 29, 31, 31, 26, 26, 23, 23, 34,
34, 48, 48, 34, 34, 34, 34, 40, 40, 45, 45, 33, 33, 61, 61,
73, 32, 32, 67, 80, 39, 39, 67, 67, 37, 37, 28, 28, 26, 26,
32, 32, 24, 24, 61, 61, 36, 36, 66, 66, 26, 26, 35, 35, 39,
39, 32, 32, 39, 39, 39, 39, 42, 42, 35, 35, 64, 64, 34, 34,
37, 61, 80, 80, 74, 74, 62, 62, 71, 71)), row.names = c(NA,
-128L), class = c("tbl_df", "tbl", "data.frame"))
Answer 1:
The numbers of matched control and treated observations are exactly what they should be, since group membership is determined by the values of the G6PDcarente variable. With method = "exact", every observation whose age occurs in both groups is kept, so the two matched counts do not have to be equal.
From the help file ?matchit (for the first argument of the function, formula):
This argument takes the usual syntax of R formula, treat ~ x1 + x2, where treat is a binary treatment indicator and x1 and x2 are the pre-treatment covariates.
In your case, the formula is G6PDcarente ~ age, and the number of observations where G6PDcarente == 1 is different from the number where G6PDcarente == 0.
We can verify that directly with a manual inspection, since the dataset is not very large:
library(dplyr)
library(tidyr)
new.data.check <- newdata %>%
count(age, G6PDcarente) %>% # count all unique combinations of age & G6PDcarente
spread(G6PDcarente, n) %>% # create separate columns for G6PDcarente == 0 / == 1
na.omit() # remove NA rows, where a specific age only has G6PDCarente == 0
# OR G6PDCarente == 1, but not both (i.e. unmatched samples)
> new.data.check
# A tibble: 14 x 3
     age   `0`   `1`
   <dbl> <int> <int>
 1    26     3     4
 2    27     2     2
 3    31     2     2
 4    32     2     4
 5    34     6     8
 6    37     1     2
 7    38     2     4
 8    39     2     6
 9    49     2     2
10    61     1     4
11    62     2     1
12    67     2     1
13    74     2     1
14    80     2     1
For age values with both G6PDcarente == 0 and == 1, there are 31 observations for which G6PDcarente == 0 and 42 observations for which G6PDcarente == 1:
> colSums(new.data.check)
age 0 1
657 31 42
I don't know your exact use case, but if you really want the same number of treated and control observations, you can always drop a few observations from the larger group within each age.
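One way to drop observations so that both groups end up the same size within each age is sketched below in base R. This is only an illustration under assumptions: the data frame toy, its values, and the helper balance_by_age() are hypothetical and not part of the original post, and the rows to drop are chosen at random (with a fixed seed) rather than by any clinical criterion.

```r
# Hypothetical toy data: 'toy', its values, and balance_by_age() are
# illustrative assumptions, not taken from the original post.
toy <- data.frame(
  id          = 1:12,
  G6PDcarente = c(0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1),
  age         = c(26, 26, 26, 26, 26, 26, 26, 27, 27, 27, 30, 41)
)

# Within each age, keep the same number of rows from each group.
balance_by_age <- function(df, seed = 1) {
  set.seed(seed)  # make the random choice of kept rows reproducible
  strata <- split(df, df$age)
  kept <- lapply(strata, function(s) {
    ctrl <- s[s$G6PDcarente == 0, , drop = FALSE]
    trt  <- s[s$G6PDcarente == 1, , drop = FALSE]
    k <- min(nrow(ctrl), nrow(trt))
    # An age present in only one group contributes no rows at all.
    if (k == 0) return(s[0, , drop = FALSE])
    rbind(
      ctrl[sample.int(nrow(ctrl), k), , drop = FALSE],
      trt[sample.int(nrow(trt), k), , drop = FALSE]
    )
  })
  do.call(rbind, kept)
}

balanced <- balance_by_age(toy)
# Cross-tabulate age by group to inspect the per-age counts.
table(balanced$age, balanced$G6PDcarente)
```

On the real data the same helper could be called as balance_by_age(newdata), since it only relies on the age and G6PDcarente columns.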
Answer 2:
Thanks to @Z.Lin's reply, I have figured out how to resolve my issue.
Here is the code I used, following the instructions in this tutorial:
OCTA.Filtered = as.data.frame(na.omit(OCTA.Filtered))
m.out.test = matchit(G6PDcarente~age,method="nearest", data=OCTA.Filtered, ratio = 1)
test_data = match.data(m.out.test)
ps.sd = sd(test_data$distance)
# matching is performed below using propensity scores given the covariates mentioned below
# caliper = 0.25 times sd of propensity scores (optimal)
m.out = matchit(G6PDcarente~age,method="nearest", data=OCTA.Filtered, caliper = 0.25*ps.sd)
# check the sample sizes (below)
m.out
# Final matched data saved as final_data
final_data = match.data(m.out)
# (here distance = propensity score)
new.data.check <- final_data %>%
count(age, G6PDcarente) %>% # count all unique combinations of age & G6PDcarente
spread(G6PDcarente, n) %>% # create separate columns for G6PDcarente == 0 / == 1
na.omit()
> new.data.check
# A tibble: 14 x 3
     age   `0`   `1`
   <dbl> <int> <int>
 1    26     3     3
 2    27     2     2
 3    31     2     2
 4    32     2     2
 5    34     6     6
 6    37     1     1
 7    38     2     2
 8    39     2     2
 9    49     2     2
10    61     1     1
11    62     1     1
12    67     1     1
13    74     1     1
14    80     1     1
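The per-age balance visible in the table above can also be checked programmatically instead of by eye. The following is a minimal base-R sketch, assuming a matched data frame with a 0/1 group column and an age column; the data frame matched and the helper is_balanced_by() are hypothetical names introduced only for illustration.

```r
# Hypothetical stand-in for the matched data; 'matched' and
# is_balanced_by() are illustrative names, not from the original post.
matched <- data.frame(
  G6PDcarente = c(0, 1, 0, 1, 0, 0, 1, 1),
  age         = c(26, 26, 27, 27, 34, 34, 34, 34)
)

# TRUE when every stratum has equally many rows with group == 0 and == 1.
is_balanced_by <- function(df, group, stratum) {
  # Fix both levels so the table always has a "0" and a "1" column.
  g <- factor(df[[group]], levels = c(0, 1))
  tab <- table(df[[stratum]], g)
  all(tab[, "0"] == tab[, "1"])
}

is_balanced_by(matched, "G6PDcarente", "age")
```

With the objects from the answer, the equivalent call would be is_balanced_by(final_data, "G6PDcarente", "age").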
Source: https://stackoverflow.com/questions/51688246/exact-age-matched-match-with-matchit-doesnt-work