Question
I have a data frame with 20 players from 4 different teams (5 players per team), each assigned a salary from a fantasy draft. I would like to create all combinations of 8 players whose total salary is 10000 or less and whose total points are greater than x, excluding any combination that contains 4 or more players from the same team.
Here is what my data frame looks like:
     Team      Player    K   D    A    LH Points Salary    PPS
4     ATN  ExoticDeer  6.1 3.3  6.4 306.9 22.209   1622 1.3692
2     ATN     Supreme  6.8 5.3  7.1 229.4 21.954   1578 1.3913
1     ATN        sasu  3.6 6.4 11.0  95.7 19.357   1244 1.5560
3     ATN eL lisasH 2  2.6 6.1  7.9  29.7 12.037    998 1.2061
5     ATN       Nisha  2.7 5.6  7.5  48.2 12.282    955 1.2861
11     CL Swiftending  6.0 5.8  7.8 360.5 22.285   1606 1.3876
13     CL     Pajkatt 13.3 7.5  9.3 326.8 37.248   1489 2.5015
15     CL  SexyBamboe  6.3 8.5  9.3 168.0 20.660   1256 1.6449
14     CL         EGM  2.8 6.0 13.5  78.8 21.988    989 2.2233
12     CL       Saksa  2.5 6.5 10.5  59.8 15.898    967 1.6441
51 DBEARS         Ace  7.0 3.4  6.9 195.6 23.596   1578 1.4953
31 DBEARS    HesteJoe  5.4 5.4  6.1 176.7 16.927   1512 1.1195
61 DBEARS      Miggel  2.8 6.8 11.0 141.8 17.818   1212 1.4701
21 DBEARS        Noia  3.0 6.0  8.0  36.1 13.161    970 1.3568
41 DBEARS        Ryze  2.7 4.7  6.7  74.6 12.166    937 1.2984
8      GB Keyser Soze  6.0 5.0  5.6 316.0 19.120   1602 1.1935
9      GB      Madara  5.4 5.3  6.6 334.5 19.405   1577 1.2305
10     GB     SkyLark  1.8 5.3  7.0  71.8 10.218   1266 0.8071
7      GB         MNT  2.3 5.9  6.1  85.6  9.316   1007 0.9251
6      GB   SKANKS224  1.4 7.6  7.4  52.5  7.565    954 0.7930
I am following the general approach described in the post "I want to generate combinations of 5 names from a column in an R data frame, whose values in a different column add up to a certain number or less", tweaking the code to suit my needs. This is what I have so far:
## make a list of all combinations of 8 of Player, Points and Salary
xx <- with(FantasyPlayers, lapply(list(as.character(Player), Points, Salary), combn, 8))
## convert the names to a string,
## find the column sums of the others,
## set the names
yy <- setNames(
  lapply(xx, function(x) {
    if (typeof(x) == "character") apply(x, 2, toString) else colSums(x)
  }),
  names(FantasyPlayers)[c(2, 7, 8)]
)
## coerce to data.frame
newdf <- as.data.frame(yy)
Using the above code I am able to generate all possible lineups of 8 players and then subset them by various criteria (total salary and number of points), but I am struggling to exclude the lineups where there are more than 3 players from the same team.
I imagine those lineups would need to be removed from newdf, but I don't really know where to begin.
Here is the dput output:
structure(list(Team = c("ATN", "ATN", "ATN", "ATN", "ATN", "CL",
"CL", "CL", "CL", "CL", "DBEARS", "DBEARS", "DBEARS", "DBEARS",
"DBEARS", "GB", "GB", "GB", "GB", "GB"), Player = structure(c(2L,
5L, 4L, 1L, 3L, 15L, 12L, 14L, 11L, 13L, 16L, 18L, 19L, 20L,
21L, 6L, 7L, 10L, 8L, 9L), .Label = c("eL lisasH 2", "ExoticDeer",
"Nisha", "sasu", "Supreme", "Keyser Soze", "Madara", "MNT", "SKANKS224",
"SkyLark", "EGM", "Pajkatt", "Saksa", "SexyBamboe", "Swiftending",
"Ace", "DruidzOzoneShoc", "HesteJoe", "Miggel", "Noia", "Ryze"
), class = "factor"), K = c(6.1, 6.8, 3.6, 2.6, 2.7, 6, 13.3,
6.3, 2.8, 2.5, 7, 5.4, 2.8, 3, 2.7, 6, 5.4, 1.8, 2.3, 1.4), D = c(3.3,
5.3, 6.4, 6.1, 5.6, 5.8, 7.5, 8.5, 6, 6.5, 3.4, 5.4, 6.8, 6,
4.7, 5, 5.3, 5.3, 5.9, 7.6), A = c(6.4, 7.1, 11, 7.9, 7.5, 7.8,
9.3, 9.3, 13.5, 10.5, 6.9, 6.1, 11, 8, 6.7, 5.6, 6.6, 7, 6.1,
7.4), LH = c(306.9, 229.4, 95.7, 29.7, 48.2, 360.5, 326.8, 168,
78.8, 59.8, 195.6, 176.7, 141.8, 36.1, 74.6, 316, 334.5, 71.8,
85.6, 52.5), Points = c(22.209, 21.954, 19.357, 12.037, 12.282,
22.285, 37.248, 20.66, 21.988, 15.898, 23.596, 16.927, 17.818,
13.161, 12.166, 19.12, 19.405, 10.218, 9.316, 7.565), Salary = c(1622,
1578, 1244, 998, 955, 1606, 1489, 1256, 989, 967, 1578, 1512,
1212, 970, 937, 1602, 1577, 1266, 1007, 954), PPS = c(1.3692,
1.3913, 1.556, 1.2061, 1.2861, 1.3876, 2.5015, 1.6449, 2.2233,
1.6441, 1.4953, 1.1195, 1.4701, 1.3568, 1.2984, 1.1935, 1.2305,
0.8071, 0.9251, 0.793)), .Names = c("Team", "Player", "K", "D",
"A", "LH", "Points", "Salary", "PPS"), class = "data.frame", row.names = c("4",
"2", "1", "3", "5", "11", "13", "15", "14", "12", "51", "31",
"61", "21", "41", "8", "9", "10", "7", "6"))
Answer 1:
Here's one way:
splt.names <- strsplit(as.character(newdf$Player), ", ")
indices <- lapply(splt.names, function(x) match(x, FantasyPlayers$Player))
exclude <- lapply(indices, function(x) any(table(FantasyPlayers$Team[x]) > 3))
newdf2 <- newdf[!unlist(exclude), ]
First, split the Player column at each comma. Then match the player names to the FantasyPlayers$Player column. With those indices we can do the main work, which is any(table(FantasyPlayers$Team[x]) > 3). This checks for team counts that exceed three, which indicates 4 or more players from the same team.
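As an alternative to building newdf first and filtering it afterwards, the three conditions can also be checked while the combinations are enumerated. The base-R sketch below is not part of the original answer: `filter_lineups` and the toy data frame are hypothetical, and it assumes the columns are named Team, Player, Points and Salary as in the question. Because it works on row indices, no string splitting or name matching is needed.

```r
# Hypothetical helper (not from the answer): enumerate lineups by row index
# and keep those meeting the salary cap, points floor and per-team limit
filter_lineups <- function(players, size, max_salary, min_points, max_per_team) {
  # Every combination of `size` row indices, one combination per column
  all_idx <- combn(nrow(players), size)
  # TRUE where a lineup satisfies all three constraints
  keep <- apply(all_idx, 2, function(i) {
    sum(players$Salary[i]) <= max_salary &&
      sum(players$Points[i]) > min_points &&
      max(table(players$Team[i])) <= max_per_team
  })
  idx <- all_idx[, keep, drop = FALSE]
  data.frame(
    Player = apply(idx, 2, function(i) toString(players$Player[i])),
    Points = apply(idx, 2, function(i) sum(players$Points[i])),
    Salary = apply(idx, 2, function(i) sum(players$Salary[i])),
    stringsAsFactors = FALSE
  )
}

# Hypothetical toy data: three players from team A, two from team B
toy <- data.frame(
  Team   = c("A", "A", "A", "B", "B"),
  Player = c("p1", "p2", "p3", "p4", "p5"),
  Points = c(10, 9, 8, 7, 6),
  Salary = c(200, 200, 150, 250, 100),
  stringsAsFactors = FALSE
)
res <- filter_lineups(toy, size = 3, max_salary = 600, min_points = 22,
                      max_per_team = 2)
print(res)
```

In the toy data, lineup p1, p2, p3 breaks only the team limit, p1, p2, p4 only the salary cap, and p2, p4, p5 only the points floor, so each rule removes something. On the question's data the call would look like filter_lineups(FantasyPlayers, 8, 10000, x, 3) with x as your points threshold; expect it to take a while, since all 125,970 combinations are visited.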
Answer 2:
Best to construct this in long form, I think:
Construct teams
library(data.table)
setDT(FantasyPlayers)
xx <- combn(as.character(FantasyPlayers$Player), 8)
mxx <- setDT(reshape2::melt(xx, varnames=c("jersey_no", "team_no"), value.name="Player"))  # matrix input is handled by reshape2's melt.array
head(mxx,10)
# jersey_no team_no Player
# 1: 1 1 ExoticDeer
# 2: 2 1 Supreme
# 3: 3 1 sasu
# 4: 4 1 eL lisasH 2
# 5: 5 1 Nisha
# 6: 6 1 Swiftending
# 7: 7 1 Pajkatt
# 8: 8 1 SexyBamboe
# 9: 1 2 ExoticDeer
# 10: 2 2 Supreme
Groups of 8 players share a team_no and are indexed by their jersey_no. Look at ?melt.array (from the reshape2 package) to see how this works. setDT just converts the resulting data.frame to a data.table for easier merging.
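If you would rather not depend on melt, the same long layout can be built directly in base R: combn() stores one combination per column, and as.vector() reads a matrix column by column. This is a sketch on a hypothetical four-player input; `toy_xx` and `toy_long` are illustrative names, and the column names simply mirror the answer's.

```r
# Hypothetical input: every pair drawn from four players
toy_xx <- combn(c("ExoticDeer", "Supreme", "sasu", "Nisha"), 2)

# One row per (lineup, slot). combn() stores each combination in a column,
# and as.vector() reads the matrix column by column, so the players of
# lineup 1 come first, then lineup 2, and so on.
toy_long <- data.frame(
  jersey_no = rep(seq_len(nrow(toy_xx)), times = ncol(toy_xx)),
  team_no   = rep(seq_len(ncol(toy_xx)), each = nrow(toy_xx)),
  Player    = as.vector(toy_xx),
  stringsAsFactors = FALSE
)
print(head(toy_long, 4))
```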
Merge to recover Player attributes
FantasyTeams <- FantasyPlayers[mxx, on="Player"]
# Team Player K D A LH Points Salary PPS jersey_no team_no
# 1: ATN ExoticDeer 6.1 3.3 6.4 306.9 22.209 1622 1.3692 1 1
# 2: ATN Supreme 6.8 5.3 7.1 229.4 21.954 1578 1.3913 2 1
# 3: ATN sasu 3.6 6.4 11.0 95.7 19.357 1244 1.5560 3 1
# 4: ATN eL lisasH 2 2.6 6.1 7.9 29.7 12.037 998 1.2061 4 1
# 5: ATN Nisha 2.7 5.6 7.5 48.2 12.282 955 1.2861 5 1
# ---
# 1007756: GB Keyser Soze 6.0 5.0 5.6 316.0 19.120 1602 1.1935 4 125970
# 1007757: GB Madara 5.4 5.3 6.6 334.5 19.405 1577 1.2305 5 125970
# 1007758: GB SkyLark 1.8 5.3 7.0 71.8 10.218 1266 0.8071 6 125970
# 1007759: GB MNT 2.3 5.9 6.1 85.6 9.316 1007 0.9251 7 125970
# 1007760: GB SKANKS224 1.4 7.6 7.4 52.5 7.565 954 0.7930 8 125970
By default, only the first and last several rows of a data.table are printed. To examine the whole thing, try ?View or look at the arguments to ?print.data.table.
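For readers working without data.table, the merge step has a rough base-R counterpart in merge(). One difference worth knowing: merge() returns rows sorted by the key (here Player) rather than in lineup order, so the sketch restores that order afterwards. Both tables below are hypothetical, with salaries copied from the question's data.

```r
# Hypothetical player attributes (values copied from the question's data)
toy_players <- data.frame(
  Team   = c("ATN", "CL", "GB"),
  Player = c("Supreme", "Pajkatt", "Madara"),
  Salary = c(1578, 1489, 1577),
  stringsAsFactors = FALSE
)

# Hypothetical long-form lineups: team_no identifies the lineup,
# jersey_no the slot within it
toy_long <- data.frame(
  jersey_no = c(1L, 2L, 1L, 2L),
  team_no   = c(1L, 1L, 2L, 2L),
  Player    = c("Supreme", "Pajkatt", "Pajkatt", "Madara"),
  stringsAsFactors = FALSE
)

# Attach each player's attributes to every slot that player fills
toy_teams <- merge(toy_long, toy_players, by = "Player")

# merge() sorts by the key column, so put rows back in lineup/slot order
toy_teams <- toy_teams[order(toy_teams$team_no, toy_teams$jersey_no), ]
print(toy_teams)
```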
Filter to a set of teams with chosen features
To keep only those team_no values with no more than three players from the same Team:
my_teams <- FantasyTeams[, max(table(Team)) <= 3, by=team_no][V1==TRUE]$team_no
V1 is the default name assigned to the constructed variable max(table(Team)) <= 3. This is not lightning fast, but now that some teams have been excluded, later subsetting steps should be faster:
my_new_teams <-
  FantasyTeams[team_no %in% my_teams, sum(Salary) <= 10000, by=team_no][V1==TRUE]$team_no
To save a few keystrokes and microseconds, substitute (V1) for V1==TRUE. It's the idiomatic way.
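The grouped check max(table(Team)) <= 3, by=team_no can be mimicked in base R with tapply(), which applies a function to each group of a vector. A minimal sketch on hypothetical data, where lineup 1 has three ATN players and lineup 2 has four GB players; `lineup_teams`, `max_per_lineup` and `ok_team_no` are illustrative names.

```r
# Hypothetical long-form data: two lineups of four players each
lineup_teams <- data.frame(
  team_no = rep(1:2, each = 4),
  Team    = c("ATN", "ATN", "ATN", "CL",  # lineup 1: three ATN players
              "GB", "GB", "GB", "GB"),    # lineup 2: four GB players
  stringsAsFactors = FALSE
)

# For each team_no, the largest number of players drawn from one Team
max_per_lineup <- tapply(lineup_teams$Team, lineup_teams$team_no,
                         function(t) max(table(t)))

# Keep the team_no values that respect the three-per-Team limit
ok_team_no <- as.integer(names(max_per_lineup)[max_per_lineup <= 3])
print(ok_team_no)
```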
Recovering the roster from a set of teams
To get the roster associated with each team, join/merge with mxx:
mxx[.(team_no = my_new_teams), on="team_no"]
If you want the players listed on a single line, as in the OP:
mxx[.(team_no = my_new_teams), .(roster = toString(Player)), on="team_no", by=.EACHI]
If you want aggregate statistics for each team, you'll instead need to join with FantasyTeams:
FantasyTeams[.(team_no = my_new_teams), .(
roster = toString(Player),
tot_salary = sum(Salary),
tot_points = sum(Points)
), on="team_no", by=.EACHI]
# team_no roster tot_salary tot_points
# 1: 3716 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Noia, Ryze 9913 149.018
# 2: 3720 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Noia, MNT 9983 146.168
# 3: 3721 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Noia, SKANKS224 9930 144.417
# 4: 3725 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Ryze, MNT 9950 145.173
# 5: 3726 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Ryze, SKANKS224 9897 143.422
# ---
# 40202: 125663 EGM, Saksa, Miggel, Noia, Ryze, Keyser Soze, MNT, SKANKS224 8638 117.032
# 40203: 125664 EGM, Saksa, Miggel, Noia, Ryze, Madara, SkyLark, MNT 8925 119.970
# 40204: 125665 EGM, Saksa, Miggel, Noia, Ryze, Madara, SkyLark, SKANKS224 8872 118.219
# 40205: 125666 EGM, Saksa, Miggel, Noia, Ryze, Madara, MNT, SKANKS224 8613 117.317
# 40206: 125667 EGM, Saksa, Miggel, Noia, Ryze, SkyLark, MNT, SKANKS224 8302 108.130
To understand what by=.EACHI is doing, a little background is needed. The merge syntax here is DT[i, j, on=cols, by=.EACHI].
- If j and by are left out, it just does the merge, as in the construction of FantasyTeams.
- If by is left out but j is included, j is computed after the merge.
- If by=.EACHI, then j is computed separately for each value in i.
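A rough base-R analogue of the join plus by=.EACHI aggregation: subset the long table to each requested team_no and compute the summary (the j part) once per requested value. This only illustrates the semantics described above, not how data.table implements it; `lineups`, `wanted` and the lineup groupings are hypothetical, although the salaries and points are taken from the question's data.

```r
# Hypothetical long-form lineups with player attributes already attached
lineups <- data.frame(
  team_no = rep(c(7L, 8L, 9L), each = 2),
  Player  = c("Pajkatt", "EGM", "Ace", "Noia", "Madara", "MNT"),
  Salary  = c(1489, 989, 1578, 970, 1577, 1007),
  Points  = c(37.248, 21.988, 23.596, 13.161, 19.405, 9.316),
  stringsAsFactors = FALSE
)

# The requested team_no values play the role of `i`
wanted <- c(9L, 7L)

# Evaluate the summary (the `j` part) separately for each requested value,
# producing one row per element of `wanted`, in that order
summaries <- lapply(wanted, function(k) {
  g <- lineups[lineups$team_no == k, ]
  data.frame(
    team_no    = k,
    roster     = toString(g$Player),
    tot_salary = sum(g$Salary),
    tot_points = sum(g$Points),
    stringsAsFactors = FALSE
  )
})
result <- do.call(rbind, summaries)
print(result)
```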
Source: https://stackoverflow.com/questions/32855755/i-want-to-generate-8-combinations-of-names-from-a-column-in-an-r-data-frame-base