Searching and using databases

Submitted by 我怕爱的太早我们不能终老 on 2021-02-11 13:55:17

Question


df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv')

df8 <- read.csv('https://raw.githubusercontent.com/hirenvadher954/Worldometers-Scraping/master/countries.csv')

In the first data set, each country is assigned to a continent.

In the second data set, there is country and population information.

How can I combine the population information in the second data set according to the continent information in the first?

Thank you. To be clear: the first data set lists countries by continent, and the second lists countries with their populations. I need the total population of each continent, e.g. Europe = 400 million, Asia = 2.4 billion.


Answer 1:


Using the dplyr package, all you have to do is join by a common variable, in this case the country name. Since the column is called countryName in one data frame and country_name in the other, we just have to specify that they refer to the same variable.

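library(dplyr)
library(stringr)

df %>%
    # match countryName in df to country_name in df8
    left_join(df8, by = c("countryName" = "country_name")) %>%
    # strip thousands separators so population parses as a number
    mutate(population = as.numeric(str_remove_all(population, ","))) %>%
    # keep only the last row per country so each population is counted once
    group_by(countryName) %>%
    slice_tail(n = 1) %>%
    # total population per continent
    group_by(region) %>%
    summarize(population = sum(population, na.rm = TRUE))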

# A tibble: 5 x 2
  region   population
* <chr>         <dbl>
1 Africa   1304908713
2 Americas 1019607512
3 Asia     4592311527
4 Europe    738083720
5 Oceania    40731992
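
Because the join relies on country names being spelled identically in both data frames, and na.rm = TRUE drops missing populations without warning, it may be worth checking which countries failed to match. The following is only a minimal sketch on toy data: the data frames reports and populations and all their values are hypothetical stand-ins for df and df8, and it uses distinct() rather than slice_tail() to get one row per country, which assumes each country belongs to exactly one region.

```r
library(dplyr)

# Hypothetical stand-ins for df and df8 (made-up values, not real figures).
reports <- data.frame(
  countryName = c("A", "A", "B", "C"),
  region      = c("Europe", "Europe", "Europe", "Asia"),
  stringsAsFactors = FALSE
)
populations <- data.frame(
  country_name = c("A", "B"),
  population   = c("1,000", "2,500"),
  stringsAsFactors = FALSE
)

# One row per country, so each population is counted only once.
countries <- distinct(reports, countryName, region)

# Countries with no matching population row; these would be skipped by na.rm = TRUE.
unmatched <- anti_join(countries, populations,
                       by = c("countryName" = "country_name"))

# Same join-and-sum idea as in the answer, applied to the toy data.
totals <- countries %>%
  left_join(populations, by = c("countryName" = "country_name")) %>%
  mutate(population = as.numeric(gsub(",", "", population))) %>%
  group_by(region) %>%
  summarize(population = sum(population, na.rm = TRUE))
```

Here unmatched contains country "C", so the total for its region is understated. On the real data, any rows returned by anti_join() would point to names that need to be reconciled before the continent totals can be trusted.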


Source: https://stackoverflow.com/questions/61593730/searching-and-using-databases
