r | 易学教程

How do I do one fuzzy and one exact match in a dataframe?

Question: I want to fuzzy match one column and exact match another column. Say df1 looks like this: And df2 looks like this: I want to fuzzy match the "Name" but exact match the "Year". So "Ashley" and "Ashlee" would be a match. This is what I have so far: res <- fuzzy_left_join( df, df2, by=c("Year","Name"), list(`==`, function(x,y) stringdist(tolower(x), tolower(y), method="lv") <= 3) ) res %>% select(Year = Year.x, everything(), - Year.y) It appears to be over-matching, though. Not sure
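One possible reason for over-matching is that a Levenshtein threshold of 3 is generous for short names. The following is a minimal base R sketch of the same idea (exact match on Year, fuzzy match on Name), not the asker's code: the data frames are hypothetical stand-ins, `adist()` from the utils package replaces `stringdist()`, and the join is an inner join rather than a left join.

```r
# Hypothetical stand-ins for df1 and df2 (the originals are not shown)
df1 <- data.frame(Name = c("Ashley", "Bob"),
                  Year = c(2020, 2021),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Name = c("Ashlee", "Ashton", "Bobby", "Bob"),
                  Year = c(2020, 2020, 2021, 2019),
                  stringsAsFactors = FALSE)

# Step 1: exact match on Year (inner join; every same-year pair is a candidate)
pairs <- merge(df1, df2, by = "Year", suffixes = c(".x", ".y"))

# Step 2: case-insensitive Levenshtein distance between the two Name columns
pairs$dist <- mapply(
  function(a, b) as.integer(adist(tolower(a), tolower(b))),
  pairs$Name.x, pairs$Name.y
)

# Step 3: keep only close names; the threshold of 1 is an assumption to tune
matches <- pairs[pairs$dist <= 1, ]
print(matches)
```

With these sample rows, a threshold of 3 would keep every same-year pair, while a threshold of 1 keeps only the "Ashley"/"Ashlee" pair.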

Place Plotly Bar Chart and Box Plot in Front of Line Traces

Question: I have created subplots in Plotly that each contain a bar chart (or boxplot) and three trace lines. I have created traces at y = 1, 2, 3 to act as ablines like in ggplot. Problem: I want the bars of the bar chart to be in front of the trace lines, so that the trace lines are visible only in between the bars. My code currently: (I have excluded the code that generates the subplots as I don't think it is needed) generate_plotly_barPlot <-

R: Is there a way to sort messy data where it pivots from long to wide, and as it moves across variables, into one logical key:value column?

Question: I have extremely messy data. A portion of it looks like the following example. x1_01=c("bearing_coordinates", "bearing_coordinates", "bearing_coordinates", "roadkill") x1_02=c(146,122,68,1) x2_01=c("tree_density","animals_on_road","animals_on_road", "tree_density") x2_02=c(13,2,5,11) x3_01=c("animals_on_road", "tree_density", "roadkill", "bearing_coordinates") x3_02=c(3,10,1,1000) x4_01=c("roadkill","roadkill", "tree_density", "animals_on_road") x4_02=c(1,1,12,6) testframe = data.frame(x1_01

How to use specific Java version from a conda environment inside of Jupyter notebook

Question: My overall aim is to use sparklyr within an R Jupyter notebook on my Azure cloud service of JupyterLab. I created a new conda environment with R, sparklyr and Java 8 (since this is the version supported by sparklyr) as follows: conda create -n r_spark r=3.6 r-essentials r-irkernel openjdk=8 r-sparklyr source activate r_spark R > IRkernel::installspec(user=TRUE, name="rspark", displayname="R (Spark)") When I run R within a terminal session within this environment, everything works fine: R >

Ignoring NA values in function

Question: I'm writing my own function to calculate the mean of a column in a data set and then applying it using apply(), but it only returns the first column's mean. Below is my code: mymean <- function(cleaned_us){ column_total = sum(cleaned_us) column_length = length(cleaned_us) return (column_total/column_length) } Average_2 <- apply(numeric_clean_usnews,2,mymean,na.rm=T) Answer 1: We need to use na.rm=TRUE in the sum; passing it through apply is not going to work, as mymean doesn't have that argument mymean
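The idea described in the answer can be sketched as follows. This is a minimal illustration rather than the answer's own code: the sample data frame is a hypothetical stand-in for numeric_clean_usnews, and dropping the NA values before both the sum and the length is an assumption made so that the result is a true mean of the non-missing values.

```r
# Hypothetical stand-in for numeric_clean_usnews
numeric_clean_usnews <- data.frame(a = c(1, 2, NA, 3),
                                   b = c(10, NA, NA, 20))

# Give the function its own na.rm argument so apply() can pass it through
mymean <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]  # drop NAs so they affect neither the sum nor the count
  }
  sum(x) / length(x)
}

# apply() forwards na.rm = TRUE to mymean for every column
Average_2 <- apply(numeric_clean_usnews, 2, mymean, na.rm = TRUE)
print(Average_2)
```

Because mymean now declares na.rm, the extra argument given to apply() is accepted instead of raising an "unused argument" error.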

Function to calculate Euclidean distance in R

Question: I am trying to implement a KNN classifier in R from scratch on the iris data set, and as part of this I have written a function to calculate the Euclidean distance. Here is my code. known_data <- iris[1:15,c("Sepal.Length", "Petal.Length", "Class")] unknown_data <- iris[16,c("Sepal.Length", "Petal.Length")] # euclidean distance euclidean_dist <- function(k,unk) { distance <- 0 for(i in 1:nrow(k)) distance[i] <- sqrt((k[,1][i] - unk[,1][i])^2 + (k[,2][i] - unk[,2][i])^2) return(distance) }
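A minimal sketch of a distance function for this setup, not the asker's or an answerer's code. It assumes the unknown observation is a single row, so its values are read from row 1 rather than row i, and it leaves out the "Class" column, since the built-in iris data names its label column Species.

```r
# Two numeric features for the known points and one unknown point
known_data   <- iris[1:15, c("Sepal.Length", "Petal.Length")]
unknown_data <- iris[16,   c("Sepal.Length", "Petal.Length")]

# Vectorized Euclidean distance from every row of k to the single row of unk
euclidean_dist <- function(k, unk) {
  sqrt((k[, 1] - unk[1, 1])^2 + (k[, 2] - unk[1, 2])^2)
}

distances <- euclidean_dist(known_data, unknown_data)
print(distances)
```

The result has one distance per known row, which a KNN step could then sort to pick the nearest neighbours.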

Use R to extract time series from netcdf data

Question: A newbie question related to R. How do I extract time series data for a particular location from a NetCDF file using R? For example, the following snapshot shows that the time series for location (1,2) is 13,28,43. Thanks in advance. Answer 1: This might do it, where "my.variable" is the name of the variable you are interested in: library(survival) library(RNetCDF) library(ncdf) library(date) setwd("c:/users/mmiller21/Netcdf") my.data <- open.nc("my.netCDF.file.nc"); my.time <- var.get.nc(my

Parsing of PMCID table row to column form

Question: dput(t1) structure(list(PMCID = c("PMC7809753", "PMC7809753", "PMC7809753", "PMC7809753", "PMC7809753", "PMC7790830", "PMC7790830", "PMC7790830", "PMC7790830", "PMC7790830"), table = c("Table 1", "Table 1", "Table 1", "Table 1", "Table 1", "Table 1", "Table 1", "Table 1", "Table 1", "Table 1"), row = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), text = c("Drug=Cytarabine (Ara-C); Target=DNA polymerases; Influx=ENT1, CNT3, OCTN1; Metabolisma=Activation: dCK, dCMPK, NDK. Inactivation: CDA, dCMPD,