r

How do I do one fuzzy and one exact match in a dataframe?

风格不统一 submitted on 2021-02-10 14:28:45
Question: I want to fuzzy match one column and exact match another column. Say df1 looks like this: (table not shown) And df2 looks like this: (table not shown) I want to fuzzy match the "Name" but exact match the "Year", so "Ashley" and "Ashlee" would be a match. This is what I have so far: res <- fuzzy_left_join( df1, df2, by=c("Year","Name"), list(`==`, function(x,y) stringdist(tolower(x), tolower(y), method="lv") <= 3) ) res %>% select(Year = Year.x, everything(), - Year.y) It appears to be over-matching, though. Not sure
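A minimal base-R sketch of the same idea without fuzzyjoin: join exactly on Year with merge(), then keep only the pairs whose names are within a small Levenshtein distance, computed with utils::adist(). The data frames below are hypothetical, because the question's tables are not shown, and unlike fuzzy_left_join() this is an inner join (unmatched df1 rows are dropped). One plausible source of the over-matching is the threshold: for six-letter names, allowing 3 edits is half the word, so the sketch uses 2.

```r
# Hypothetical data; the question's df1/df2 were only shown as images
df1 <- data.frame(Year = c(2019, 2019, 2020),
                  Name = c("Ashley", "Bob", "Ashley"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Year = c(2019, 2020, 2020),
                  Name = c("Ashlee", "Robert", "Ashlie"),
                  Score = c(1, 2, 3),
                  stringsAsFactors = FALSE)

# Step 1: exact match on Year (every df1/df2 row pair sharing a Year)
pairs <- merge(df1, df2, by = "Year", suffixes = c(".1", ".2"))

# Step 2: fuzzy match on Name, case-insensitive Levenshtein distance per pair
max_dist <- 2
d <- mapply(function(a, b) adist(tolower(a), tolower(b))[1, 1],
            pairs$Name.1, pairs$Name.2, USE.NAMES = FALSE)

# Keep only the pairs that are close enough
res <- pairs[d <= max_dist, ]
res
```

Lowering max_dist (or switching to a normalized distance) is the knob that trades missed matches for false ones.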

Place Plotly Bar Chart and Box Plot in Front of Line Traces

那年仲夏 submitted on 2021-02-10 14:26:36
Question: I have created subplots in Plotly that each contain a bar chart (or box plot) and three line traces. I have added traces at y = 1, 2, 3 to act as ablines like in ggplot. What the plots look like: (images not shown). Problem: I want the bars of the bar chart to be drawn in front of the line traces, so that the lines are visible only in the gaps between the bars. My code currently (I have excluded the code that generates the subplots, as I don't think it is needed): generate_plotly_barPlot <-
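A common workaround, sketched here under the assumption that plotly.js keeps drawing line (scatter) traces above bars and boxes regardless of the order the traces are added, is to draw the reference lines as layout shapes with layer = "below" instead of as traces. The helper hline_shapes() is hypothetical; it only builds the list of shape specifications (type, xref/yref, x0/x1, y0/y1, layer, line are plotly.js shape attributes). In subplots, each shape has to reference that subplot's axes, for example yref = "y2".

```r
# Hypothetical helper: one plotly.js "line" shape per requested y value.
# layer = "below" asks plotly to draw the shape underneath the traces.
hline_shapes <- function(y, xref = "paper", yref = "y",
                         color = "grey50", dash = "dash") {
  lapply(y, function(yy) {
    list(
      type  = "line",
      xref  = xref, x0 = 0, x1 = 1,   # with xref = "paper", span the full width
      yref  = yref, y0 = yy, y1 = yy, # horizontal: same y at both ends
      layer = "below",
      line  = list(color = color, dash = dash)
    )
  })
}

shapes <- hline_shapes(c(1, 2, 3))
str(shapes[[1]])
```

Usage (requires the plotly package): p <- plotly::layout(p, shapes = hline_shapes(c(1, 2, 3))), in place of the three add_trace() calls for the lines.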

R: Is there a way to sort messy data where it pivots from long to wide, and as it moves across variables, into one logical key:value column?

微笑、不失礼 submitted on 2021-02-10 14:24:06
Question: I have extremely messy data. A portion of it looks like the following example. x1_01=c("bearing_coordinates", "bearing_coordinates", "bearing_coordinates", "roadkill") x1_02=c(146,122,68,1) x2_01=c("tree_density","animals_on_road","animals_on_road", "tree_density") x2_02=c(13,2,5,11) x3_01=c("animals_on_road", "tree_density", "roadkill", "bearing_coordinates") x3_02=c(3,10,1,1000) x4_01=c("roadkill","roadkill", "tree_density", "animals_on_road") x4_02=c(1,1,12,6) testframe = data.frame(x1_01
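A minimal base-R sketch, assuming testframe is built from the eight vectors in the order shown (the data.frame() call is cut off) and that each xN_01 column holds a key whose value sits in the matching xN_02 column: stack the key/value pairs into one long key:value table, then pivot it wide with reshape() so each key gets its own column.

```r
x1_01 <- c("bearing_coordinates", "bearing_coordinates", "bearing_coordinates", "roadkill")
x1_02 <- c(146, 122, 68, 1)
x2_01 <- c("tree_density", "animals_on_road", "animals_on_road", "tree_density")
x2_02 <- c(13, 2, 5, 11)
x3_01 <- c("animals_on_road", "tree_density", "roadkill", "bearing_coordinates")
x3_02 <- c(3, 10, 1, 1000)
x4_01 <- c("roadkill", "roadkill", "tree_density", "animals_on_road")
x4_02 <- c(1, 1, 12, 6)

# Assumed construction; the original data.frame() call is truncated
testframe <- data.frame(x1_01, x1_02, x2_01, x2_02, x3_01, x3_02, x4_01, x4_02,
                        stringsAsFactors = FALSE)

# Pair every key column (_01) with its value column (_02)
key_cols <- grep("_01$", names(testframe), value = TRUE)
val_cols <- sub("_01$", "_02", key_cols)

# Long format: one row per (original row, key, value)
long <- do.call(rbind, lapply(seq_along(key_cols), function(j) {
  data.frame(row   = seq_len(nrow(testframe)),
             key   = testframe[[key_cols[j]]],
             value = testframe[[val_cols[j]]],
             stringsAsFactors = FALSE)
}))

# Wide format: one column per key, named value.<key>
wide <- reshape(long, idvar = "row", timevar = "key", direction = "wide")
wide
```

The long table is already the "one logical key:value" form; the reshape() step is only needed if one column per variable is wanted.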

How to use specific Java version from a conda environment inside of Jupyter notebook

瘦欲@ submitted on 2021-02-10 14:24:00
Question: My overall aim is to use sparklyr within an R Jupyter notebook in JupyterLab on my Azure cloud service. I created a new conda environment with R, sparklyr and Java 8 (since this is the version supported by sparklyr) as follows: conda create -n r_spark r=3.6 r-essentials r-irkernel openjdk=8 r-sparklyr source activate r_spark R > IRkernel::installspec(user=TRUE, name="rspark", displayname="R (Spark)") When I run R in a terminal session inside this environment, everything works fine: R >
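One likely difference between the terminal and the notebook is that a kernel registered with IRkernel::installspec() is started without running `source activate r_spark`, so JAVA_HOME and PATH may not point at the environment's Java 8. A minimal sketch of setting them from inside the notebook before loading sparklyr follows; the prefix path is hypothetical, and it assumes the openjdk package put Java directly under the environment prefix (check where bin/java actually lives in your install).

```r
# Hypothetical prefix of the r_spark conda environment; see `conda env list`
env_prefix <- "/anaconda/envs/r_spark"

# Point JAVA_HOME at the environment's Java and put its bin/ first on PATH,
# so any `java` started from this R session is the conda one
Sys.setenv(JAVA_HOME = env_prefix)
Sys.setenv(PATH = paste(file.path(env_prefix, "bin"), Sys.getenv("PATH"),
                        sep = .Platform$path.sep))

# Then, in the notebook (needs the environment's Java and sparklyr):
# system("java -version")
# library(sparklyr); sc <- spark_connect(master = "local")
```

An alternative with the same effect is adding JAVA_HOME to the "env" field of the kernel's kernel.json, so every notebook using the "R (Spark)" kernel gets it automatically.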

Ignoring NA values in function

拟墨画扇 submitted on 2021-02-10 14:23:07
Question: I'm writing my own function to calculate the mean of a column in a data set and then applying it with apply(), but it only returns the first column's mean. Below is my code: mymean <- function(cleaned_us){ column_total = sum(cleaned_us) column_length = length(cleaned_us) return (column_total/column_length) } Average_2 <- apply(numeric_clean_usnews,2,mymean,na.rm=T) Answer 1: We need to use na.rm=TRUE in the sum; passing it through apply is not going to work, as mymean doesn't have that argument mymean
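A minimal sketch of the fix the answer describes: give mymean an na.rm argument and apply it to both the total and the count, so NA entries are neither summed nor counted. The data frame toy is hypothetical and stands in for numeric_clean_usnews.

```r
# Column mean that can optionally drop NA values before summing and counting
mymean <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  sum(x) / length(x)
}

# Hypothetical numeric data with missing values
toy <- data.frame(a = c(1, NA, 3), b = c(2, 4, NA))

# apply() forwards na.rm = TRUE to mymean for every column
apply(toy, 2, mymean, na.rm = TRUE)   # a = 2, b = 3
```

For this particular task, colMeans(toy, na.rm = TRUE) returns the same result without a custom function.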

Function to calculate Euclidean distance in R

依然范特西╮ submitted on 2021-02-10 14:19:14
Question: I am trying to implement a KNN classifier in R from scratch on the iris data set, and as part of this I have written a function to calculate the Euclidean distance. Here is my code: known_data <- iris[1:15,c("Sepal.Length", "Petal.Length", "Class")] unknown_data <- iris[16,c("Sepal.Length", "Petal.Length")] # euclidean distance euclidean_dist <- function(k,unk) { distance <- 0 for(i in 1:nrow(k)) distance[i] <- sqrt((k[,1][i] - unk[,1][i])^2 + (k[,2][i] - unk[,2][i])^2) return(distance) }
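In the loop above, unk[,1][i] indexes the one-row query with i, which gives NA for every i > 1. A minimal vectorized sketch compares every row of k against the single query row instead. It assumes the built-in iris data, where the label column is called Species; the question's "Class" suggests a differently loaded copy of the data set.

```r
# Built-in iris names the label column "Species"
known_data   <- iris[1:15, c("Sepal.Length", "Petal.Length", "Species")]
unknown_data <- iris[16, c("Sepal.Length", "Petal.Length")]

# Distance from every row of k to the single query row unk (no loop needed)
euclidean_dist <- function(k, unk) {
  sqrt((k[, 1] - unk[1, 1])^2 + (k[, 2] - unk[1, 2])^2)
}

d <- euclidean_dist(known_data, unknown_data)

# 1-nearest-neighbour prediction: label of the closest known row
known_data$Species[which.min(d)]
```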

Use R to extract time series from netcdf data

邮差的信 submitted on 2021-02-10 14:18:14
Question: A newbie question related to R. How do I extract time series data for a particular location from a netCDF file using R? For example, the following snapshot (not shown) shows that the time series for location (1,2) is 13, 28, 43. Thanks in advance. Answer 1: This might do it, where "my.variable" is the name of the variable you are interested in: library(survival) library(RNetCDF) library(ncdf) library(date) setwd("c:/users/mmiller21/Netcdf") my.data <- open.nc("my.netCDF.file.nc"); my.time <- var.get.nc(my
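Once the variable is in memory as an R array (for example, the array returned by RNetCDF's var.get.nc()), the time series at one grid location is a slice that fixes the spatial indices and keeps every time step. The toy array below is hypothetical, and the dimension order (x, y, time) is an assumption; check dim() on the real array, because netCDF files can store dimensions in a different order.

```r
# Hypothetical stand-in for an array returned by RNetCDF::var.get.nc():
# a 3 x 5 grid observed at 3 time steps, dimensions assumed to be (x, y, time)
my.var <- array(seq_len(3 * 5 * 3), dim = c(3, 5, 3))

# Time series at grid location (1, 2): fix the first two indices, keep all times
ts_1_2 <- my.var[1, 2, ]
ts_1_2   # 4 19 34
```

var.get.nc() also accepts start and count arguments, which can read a single location without loading the whole array; see its documentation for the exact semantics.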

Parsing of PMCID table row to column form

一个人想着一个人 submitted on 2021-02-10 14:17:12
Question: dput(t1) structure(list(PMCID = c("PMC7809753", "PMC7809753", "PMC7809753", "PMC7809753", "PMC7809753", "PMC7790830", "PMC7790830", "PMC7790830", "PMC7790830", "PMC7790830"), table = c("Table 1", "Table 1", "Table 1", "Table 1", "Table 1", "Table 1", "Table 1", "Table 1", "Table 1", "Table 1"), row = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), text = c("Drug=Cytarabine (Ara-C); Target=DNA polymerases; Influx=ENT1, CNT3, OCTN1; Metabolisma=Activation: dCK, dCMPK, NDK. Inactivation: CDA, dCMPD,
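The text column holds "Key=Value; Key=Value" strings, so one way to turn each row into columns is to split on ";", split each field at its first "=", and line the fields up by key. This is a minimal base-R sketch under two assumptions that the truncated dput() cannot confirm: ";" only ever separates fields, and keys never contain "=". The first row reuses the complete fields from the snippet; the second row and the helper parse_kv() are hypothetical.

```r
# Hypothetical helper: "A=1; B=2" -> named character vector c(A = "1", B = "2")
parse_kv <- function(txt) {
  parts <- trimws(strsplit(txt, ";", fixed = TRUE)[[1]])
  parts <- parts[nzchar(parts)]
  key   <- sub("=.*$", "", parts)        # text before the first "="
  value <- sub("^[^=]*=", "", parts)     # text after the first "="
  setNames(value, key)
}

t1 <- data.frame(
  PMCID = c("PMC7809753", "PMC7790830"),
  row   = c(1L, 1L),
  text  = c("Drug=Cytarabine (Ara-C); Target=DNA polymerases; Influx=ENT1, CNT3, OCTN1",
            "Drug=Hypothetical drug; Target=Hypothetical target"),
  stringsAsFactors = FALSE
)

# Parse every row, then align the fields by key (missing keys become NA)
kv   <- lapply(t1$text, parse_kv)
keys <- unique(unlist(lapply(kv, names)))
m <- matrix(unlist(lapply(kv, function(v) unname(v[keys]))),
            nrow = length(kv), byrow = TRUE, dimnames = list(NULL, keys))

wide <- cbind(t1[c("PMCID", "row")], as.data.frame(m, stringsAsFactors = FALSE))
wide
```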