How to create a bar chart with ggplot showing probability for 2 variables and 3 sub-variables

喜你入骨 submitted on 2021-01-29 13:42:02

Question


Desperate for help with this.

Raw Data comes from https://www.hockey-reference.com/play-index/tiny.fcgi?id=mmDlH

It looks like this (CSV file):

# A tibble: 6 x 19
  match_no Date  Tm    Opp   Outcome Time      G    PP    SH     S   PIM    GA  PPGA  SHGA
     <dbl> <chr> <chr> <chr> <chr>   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1        1 6/4/… NYI   WSH   W       REG       3     0     0    24     4     0     0     0
2        2 6/4/… WSH   NYI   L       REG       0     0     0    29     2     3     0     0
3        3 6/4/… STL   VAN   W       SO        3     1     0    36     6     2     2     0
4        4 6/4/… VAN   STL   L       SO        2     2     0    25     6     3     1     0
5        5 6/4/… COL   SJS   L       REG       2     0     0    30     4     5     0     0
6        6 6/4/… SJS   COL   W       REG       5     0     0    30     4     2     0     0
# … with 5 more variables: PPO <dbl>, PPOA <dbl>, SA <dbl>, OppPIM <dbl>, DIFF <dbl>

I can convert it to this:

# A tibble: 6 x 5
# Groups:   Tm [1]
  Tm    Outcome Time      n  prob
  <chr> <chr>   <chr> <int> <dbl>
1 ANA   L       OT        7  0.09
2 ANA   L       REG      37  0.45
3 ANA   L       SO        3  0.04
4 ANA   W       OT        5  0.06
5 ANA   W       REG      27  0.33
6 ANA   W       SO        3  0.04

I used this:

library(dplyr)

# count games per team, outcome and game type, then convert to per-team proportions
team_outcomes_regulation <-
    df %>%
    count(Tm, Outcome, Time) %>%
    group_by(Tm) %>%
    mutate(prob = round(prop.table(n), 2))

Then I try to plot it with ggplot:

library(ggplot2)

team_outcomes_regulation %>%
    ggplot(aes(x = Tm, y = prob, fill = Time)) +
    geom_bar(position = "fill", stat = "identity") +
    theme(axis.text.x = element_text(angle = 90))

And this is what I get, but I am desperate to get the graph split into all 6 categories (wins by SO, REG and OT; losses by SO, REG and OT).

I now want to compare wins to goal difference using the original df.

# A tibble: 6 x 19
  match_no Date  Tm    Opp   Outcome Time      G    PP    SH     S   PIM    GA  PPGA  SHGA
     <dbl> <chr> <chr> <chr> <chr>   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1        1 6/4/… NYI   WSH   W       REG       3     0     0    24     4     0     0     0
2        2 6/4/… WSH   NYI   L       REG       0     0     0    29     2     3     0     0
3        3 6/4/… STL   VAN   W       SO        3     1     0    36     6     2     2     0
4        4 6/4/… VAN   STL   L       SO        2     2     0    25     6     3     1     0
5        5 6/4/… COL   SJS   L       REG       2     0     0    30     4     5     0     0
6        6 6/4/… SJS   COL   W       REG       5     0     0    30     4     2     0     0
# … with 5 more variables: PPO <dbl>, PPOA <dbl>, SA <dbl>, OppPIM <dbl>, DIFF <dbl>

So I now want to extract the 31 teams (Tm), the number of wins (Outcome) and the goal difference (sum of DIFF). Some further assistance, please?


Answer 1:


You are nearly there, as you've already produced a plot split by the values in your "Time" column. If you want to plot every combination of the "Time" AND "Outcome" columns, you need to combine those values into one column and plot the same thing. There are a few options here, but perhaps the easiest is as follows:

team_outcomes_regulation$outcome_time <-
    paste(team_outcomes_regulation$Outcome, "by", team_outcomes_regulation$Time)

Then your plot becomes:

team_outcomes_regulation %>%
    ggplot(aes(x = Tm, y = prob, fill = outcome_time)) +
    geom_bar(position = "fill",stat = "identity") +
    theme(axis.text.x = element_text(angle = 90))
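
As a self-contained sketch of the same idea, the snippet below builds a small stand-in for team_outcomes_regulation, adds the combined column with mutate(), and draws the stacked bar. The ANA counts are taken from the question; the second team ("BOS") and its counts are invented purely for illustration, and the name toy is hypothetical. It assumes the dplyr and ggplot2 packages are installed.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for team_outcomes_regulation:
# 2 teams x 2 outcomes x 3 game types (Tm varies fastest, then Outcome, then Time).
toy <- expand.grid(
  Tm = c("ANA", "BOS"),
  Outcome = c("L", "W"),
  Time = c("OT", "REG", "SO"),
  stringsAsFactors = FALSE
)
# ANA counts come from the question; BOS counts are made up.
toy$n <- c(7, 5, 5, 9, 37, 20, 27, 40, 3, 4, 3, 4)

toy <- toy %>%
  group_by(Tm) %>%
  mutate(prob = n / sum(n)) %>%                       # per-team proportions
  ungroup() %>%
  mutate(outcome_time = paste(Outcome, "by", Time))   # e.g. "W by OT"

# geom_col() is equivalent to geom_bar(stat = "identity")
p <- ggplot(toy, aes(x = Tm, y = prob, fill = outcome_time)) +
  geom_col(position = "fill") +
  theme(axis.text.x = element_text(angle = 90))
p
```

Each bar is now split into six segments, one per outcome and game type combination, instead of three.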

EDIT: Side Question

So I now want to extract the 31 teams (Tm), the number of wins (Outcome) and the goal difference (sum of DIFF). Some further assistance, please?

For this, I'm creating a dummy dataset similar to your own that should help you visualize one approach you could take. There are a few ways of doing this, though; what I have here is "sort of clunky" IMHO.

# dummy data
# (use `=` inside data.frame() so the columns get the intended names)
df <- data.frame(
    Tm = sample(LETTERS[1:5], 30, replace = TRUE),
    Outcome = sample(c('W','L'), 30, replace = TRUE),
    Diff = sample(1:3, 30, replace = TRUE),
    Time = sample(c('REG', 'SO'), 30, replace = TRUE)
)

This gives you 5 teams ("A" through "E") with random outcomes and goal differences. I also added an extra column (Time) to show that this approach drops columns that are not needed. The approach here is to summarize the data grouped by team and outcome, and keep only the wins. CAUTION: this means that the sum of Diff is based only on wins and not on losses. If you want to include losses, there are a few other ways of doing this.

df %>%
    group_by(Tm, Outcome) %>%
    summarize(Wins=n(), Goal.Diff=sum(Diff)) %>%
    dplyr::filter(Outcome=='W')

# A tibble: 4 x 4
# Groups:   Tm [4]
  Tm    Outcome  Wins Goal.Diff
  <fct> <fct>   <int>     <int>
1 A     W           5        10
2 B     W           3         7
3 C     W           4         9
4 D     W           1         2
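
If the goal difference should cover every game rather than wins only, one of those other ways is to count wins with a conditional sum and leave the filter out. The following is a minimal sketch, not the only option. It assumes a signed Diff column (positive for wins, negative for losses, which the DIFF column in the real data presumably is), and the games and team_summary names and all values are hypothetical, hand-written so the result can be checked by eye.

```r
library(dplyr)

# Hypothetical games table; Diff is signed (goals for minus goals against).
games <- data.frame(
  Tm      = c("A", "A", "A", "B", "B", "C"),
  Outcome = c("W", "L", "W", "L", "L", "W"),
  Diff    = c(2, -1, 3, -2, -1, 1)
)

team_summary <- games %>%
  group_by(Tm) %>%
  summarize(
    Wins      = sum(Outcome == "W"),  # count wins without dropping the loss rows
    Goal.Diff = sum(Diff)             # sum over all games, wins and losses
  )

print(team_summary)
```

With this input, team A ends up with 2 wins and a goal difference of 4, B with 0 wins and -3, and C with 1 win and 1. Note that a team with no wins (B here) stays in the result with Wins = 0, whereas filtering on Outcome == 'W' would drop it.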

That's one way to do it. If you have further questions related to that, I would suggest you ask a new question on Stack Overflow. You can link it to this one if you wish, but it's a separate question, so it should be asked separately.



Source: https://stackoverflow.com/questions/61317271/how-to-create-a-bar-with-ggplot-with-probability-with-2-variables-and-3-sub-vari
