r

extracting a dataframe from a list over many objects

旧城冷巷雨未停 posted on 2021-02-19 05:59:29
Question: I have over 1000 objects (z) in R, each containing three dataframes (df1, df2, df3) with different structures. z1$df1 … z1000$df1 z1$df2 … z1000$df2 z1$df3 … z1000$df3 I created a list of these objects (list1 thus contains z1 through z1000) and tried to use lapply to extract one type of dataframe (df2) for all objects, and then merge them into one single dataframe. Extraction: For a single object it would look like this: df15 <- z15$df2 # I transferred the index of z to the extracted df I
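
A minimal base R sketch of the extract-then-merge step described in this question. The toy objects built by `make_z` are hypothetical stand-ins for z1 … z1000, and the `source` column is an assumed way of carrying the object name over to the merged data frame. It also assumes every df2 has the same columns, which `rbind` requires.

```r
# Hypothetical stand-ins for the real objects z1, z2, z3
make_z <- function(i) {
  list(
    df1 = data.frame(a = i),
    df2 = data.frame(id = i, value = i * 10),
    df3 = data.frame(b = letters[i])
  )
}
list1 <- lapply(1:3, make_z)
names(list1) <- paste0("z", 1:3)

# Pull df2 out of every object in the list
df2_list <- lapply(list1, function(z) z$df2)

# Tag each extracted data frame with the name of the object it came from
df2_list <- Map(function(d, nm) cbind(source = nm, d), df2_list, names(df2_list))

# Stack everything into one data frame
combined <- do.call(rbind, df2_list)
rownames(combined) <- NULL
combined
```

If the df2 data frames do not share identical columns, `rbind` will stop with an error, so the columns would need to be aligned first.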

Using R for webscraping: HTTP error 503 despite using long pauses in program

随声附和 posted on 2021-02-19 05:59:25
Question: I'm trying to search the ProQuest Archiver using R. I'm interested in finding the number of articles for a newspaper containing a certain keyword. It generally works well using the rvest package. However, the program sometimes breaks down. See this minimal example: library(xml2) library(rvest) # Retrieve the title of the first search hit on the page of search results for (p in seq(0, 150, 10)) { searchURL <- paste("http://pqasb.pqarchiver.com/djreprints/results.html?st=advanced&QryTxt=bankruptcy
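
One common way to cope with intermittent errors such as HTTP 503 is to retry the request with an increasing wait between attempts. The sketch below is an assumption about how that could look, not the accepted answer to this question. `with_retry` and `flaky` are hypothetical names, and `flaky` only simulates a server that fails twice before responding.

```r
# Call fetch() until it succeeds, waiting longer after each failure.
with_retry <- function(fetch, max_tries = 5, base_wait = 2) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt == max_tries) {
      stop(result)  # give up and re-signal the last error
    }
    Sys.sleep(base_wait * 2^(attempt - 1))  # exponential backoff
  }
}

# Stand-in for a request that fails twice and then succeeds
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("HTTP error 503.")
  "page content"
}

page <- with_retry(flaky, base_wait = 0)
```

In the loop from the question, the same wrapper could be applied as `with_retry(function() read_html(searchURL))`, on the assumption that the failing call signals an R error when the server returns 503.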

Generate a confusion matrix for svm in e1071 for CV results

浪尽此生 posted on 2021-02-19 05:59:05
Question: I did a classification with svm using e1071. The goal is to predict type from all other variables in dtm.
dtm[140:145] %>% str()
'data.frame': 385 obs. of 6 variables:
 $ think   : num 0 0 0 0 0 0 0 0 0 0 ...
 $ actually: num 0 0 0 0 0 0 0 0 0 0 ...
 $ comes   : num 0 0 0 0 0 0 0 0 0 0 ...
 $ able    : num 0 0 0 0 0 0 0 0 0 0 ...
 $ hours   : num 0 0 0 0 0 0 0 0 0 0 ...
 $ type    : Factor w/ 4 levels "-1","0","1","9": 4 3 3 3 4 1 4 4 4 3 ...
To train/test the model, I used 10-fold cross-validation.
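
A confusion matrix over cross-validation results needs the held-out prediction for every row. A hedged sketch of one way to collect them is a manual 10-fold loop, shown here with `iris` as a stand-in for dtm and `Species` playing the role of type. The fold assignment and variable names are assumptions for illustration.

```r
library(e1071)

set.seed(1)                      # make the fold assignment reproducible
data <- iris                     # stand-in for dtm
n_folds <- 10
fold <- sample(rep(seq_len(n_folds), length.out = nrow(data)))

# Collect the out-of-fold prediction for every row
pred <- character(nrow(data))
for (k in seq_len(n_folds)) {
  held_out <- fold == k
  fit <- svm(Species ~ ., data = data[!held_out, ])
  pred[held_out] <- as.character(predict(fit, data[held_out, ]))
}

# Cross-tabulate pooled predictions against the true classes
conf_mat <- table(
  predicted = factor(pred, levels = levels(data$Species)),
  actual = data$Species
)
conf_mat
```

Because each row is predicted exactly once, by a model that never saw it, the table sums to the number of observations.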

How to mutate_at multiple columns on a condition on each value?

送分小仙女□ posted on 2021-02-19 05:55:28
Question: I have a dataframe of over 1 million rows, and a column for each hour in the day. I want to mutate each value in those columns, but that modification depends on the sign of the value. How can I do that efficiently? I could do a gather on those hourly values (then spread), but gather seems to be pretty slow on big dataframes. I could also just do the same mutate on all 24 columns, but it does not seem like a great solution when mutate_at looks to be able to do exactly that. I'll probably have
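
A small sketch of a sign-dependent `mutate_at`, under stated assumptions: the column names `h00` to `h02` and the rule (double positive values, halve the rest) are invented for illustration, since the question does not show its actual transformation.

```r
library(dplyr)

# Hypothetical data: three hourly columns instead of 24
df <- data.frame(
  id  = 1:3,
  h00 = c(-1, 2, -3),
  h01 = c(4, -5, 6),
  h02 = c(0, 1, -2)
)

# Apply one vectorised rule to every hourly column, element by element
result <- df %>%
  mutate_at(vars(starts_with("h")), ~ if_else(.x > 0, .x * 2, .x / 2))
result
```

`if_else` works on whole columns at once, so no reshaping with gather and spread is needed. In dplyr 1.0 and later, the same idea can be written as `mutate(across(starts_with("h"), ...))`.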

knn- same k, different result

血红的双手。 posted on 2021-02-19 05:55:13
Question: I have a matrix zz. After I ran prcomp and chose the first 5 PCs I get data_new: P = prcomp(zz) data_new = P$x[,1:5] Then I split it into a training set and a test set: pca_train = data_new[1:121,] pca_test = data_new[122:151,] and use KNN: k <- knn(pca_train, pca_test, tempGenre_train[,1], k = 5) a <- data.frame(k) res <- length(which(a!=tempGenre_test)) Each time I run these last 3 lines, I get a different value in res. Why? Is there a better way to check the test error? Answer 1: From the
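
The documentation for `knn` in the `class` package states that ties in the vote are broken at random, which is one plausible source of run-to-run differences. The sketch below assumes that is the function being used and shows that fixing the seed makes the result repeatable. It uses `iris` as stand-in data, and `run_knn` is a hypothetical helper.

```r
library(class)

# Stand-in data: iris instead of the PCA scores from the question
set.seed(42)
idx <- sample(nrow(iris), 120)
train <- iris[idx, 1:4]
test <- iris[-idx, 1:4]
cl <- iris$Species[idx]

run_knn <- function(seed) {
  set.seed(seed)                 # fixes how random tie-breaks fall out
  knn(train, test, cl, k = 5)
}

first <- run_knn(1)
second <- run_knn(1)

# Proportion of misclassified test rows
test_error <- mean(first != iris$Species[-idx])
```

Taking the mean of the mismatches gives an error rate directly, which is easier to compare across runs than a raw count.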

Colorbar guide with string labels

余生颓废 posted on 2021-02-19 05:52:15
Question: I would like to modify the guide_colorbar of a continuous aesthetic with character descriptions, e.g., 'low' and 'high', instead of actual numbers. This can be especially useful when using one legend or colorbar for multiple plots (heatmaps, e.g., geom_bin2d). Here is an example. Given some dummy data: dd <- data.frame(xx=rnorm(100),yy=rnorm(100),zz=rep(1:10,10)) I can do the usual ggplot(dd,aes(xx,yy,color=zz))+ geom_point()+ viridis::scale_color_viridis(option='A',direction=-1) and hide

using `rlang` quasiquotation with `dplyr::_join` functions

梦想与她 posted on 2021-02-19 05:45:26
Question: I am trying to write a custom function where I use rlang's quasiquotation. This function also internally uses dplyr's join functions. I have provided below a minimal working example that illustrates my problem. # needed libraries library(tidyverse) # function definition df_combiner <- function(data, x, group.by) { # check how many variables were entered for this grouping variable group.by <- as.list(rlang::quo_squash(rlang::enquo(group.by))) # based on number of arguments, select `group.by`

How do I remove rows based on a range of dates given by values in 2 columns?

匆匆过客 posted on 2021-02-19 05:39:26
Question: I have a data set that includes a range of dates and need to fill in the missing dates in new rows. df1 is an example of the data I am working with and df2 is an example of what I've managed to achieve (where I'm stuck). df3 is where I would like to end up!
df1
ID  Date       DateStart  DateEnd
1   2/11/2021  2/11/2021  2/17/2021
1   2/19/2021  2/19/2021  2/21/2021
2   1/15/2021  1/15/2021  1/20/2021
2   1/22/2021  1/22/2021  1/23/2021
This is where I am with this. The NAs aren't an issue because I intend to drop
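
The excerpt does not show df3, so the following is only one plausible reading of "fill in the missing dates": expand each row into one row per day between DateStart and DateEnd. It is a base R sketch built from the df1 values above, and the name `expanded` is hypothetical.

```r
# df1 from the question, with the date columns parsed as Date
df1 <- data.frame(
  ID = c(1, 1, 2, 2),
  DateStart = as.Date(c("2/11/2021", "2/19/2021", "1/15/2021", "1/22/2021"),
                      format = "%m/%d/%Y"),
  DateEnd = as.Date(c("2/17/2021", "2/21/2021", "1/20/2021", "1/23/2021"),
                    format = "%m/%d/%Y")
)

# One row per calendar day inside each DateStart..DateEnd range
expanded <- do.call(rbind, lapply(seq_len(nrow(df1)), function(i) {
  data.frame(
    ID = df1$ID[i],
    Date = seq(df1$DateStart[i], df1$DateEnd[i], by = "day")
  )
}))
expanded
```

Days that fall between two ranges for the same ID, such as 2/18/2021 for ID 1, do not appear in the result because no range covers them.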

R code to rename header of an xts object using name(object) <- vector

旧巷老猫 posted on 2021-02-19 05:38:08
Question: I'm new to R and I'm having an issue with some of my code. I have included all of the code so that you can see the logic of what I am trying to do. My issue is renaming the header of my xts object Monthly_Quotes. I understand that for an invalid stock symbol, the getSymbols function will not retrieve the quotes for "zzzz", which is why I am running into the issue of renaming the header. I'd like to resolve this such that if I have a much larger list of ticker

Remove Gaps Between Bars in Plotly

谁说胖子不能爱 posted on 2021-02-19 05:30:50
Question: I am trying to create a Marimekko chart in R using Plotly. Essentially, this is just a stacked, variable-width bar chart with both bars directly adjacent to one another. Currently, my attempt looks like this: The code to create it is here: bar.test <- plot_ly(type = "bar") %>% layout(title = paste0("Newark Charter vs District BTO Makeup"), xaxis = list(title = ""), yaxis = list(title = "Percent BTO", tickformat = "%")) %>% add_trace(x = ~test1$sch.type, y = ~test1$y, width = ~test1$width,