data-manipulation | 易学教程

Dropping Multiple Columns from a data frame using Python

Question: I know how to drop columns from a data frame using Python. But for my problem the data set is vast, and the columns I want to drop are either grouped together or scattered individually across the column axis. Is there a shorter way to slice or drop all of these columns in fewer lines of code, rather than writing each one out as I have done? The way I have done it here works, but I would like something more concise. The result should be stored in the variable flight_data_copy_final.

perl, removing elements from array in for loop

Will the following code always work in Perl?

    for loop iterating over @array {
        # do something
        if ($condition) {
            remove current element from @array
        }
    }

I ask because I know that in Java this results in exceptions. The above code is working for me for now, but I want to be sure that it will work in all cases in Perl. Thanks.

raina77ow: Well, the documentation says:

    If any part of LIST is an array, foreach will get very confused if you add or remove elements within the loop body, for example with splice. So don't do that.

It's a bit better with each:

    If you add or delete a hash's elements while

data.table or dplyr - data manipulation

Question: I have the following data:

    Date        Col1  Col2
    2014-01-01  123   12
    2014-01-01  123   21
    2014-01-01  124   32
    2014-01-01  125   32
    2014-01-02  123   34
    2014-01-02  126   24
    2014-01-02  127   23
    2014-01-03  521   21
    2014-01-03  123   13
    2014-01-03  126   15

Now I want to count the unique values in Col1 for each date (only those that did not appear on a previous date) and add that number to the previous count. For example:

    Date        Count
    2014-01-01  3      i.e. 123, 124, 125
    2014-01-02  5      (2 + above 3) i.e. 126, 127
    2014-01-03  6      (1 + above 5) i.e. 521 only

Answer 1:
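As an illustration of the counting logic described in the question, here is a minimal Python sketch using only the standard library. It is not the original answer (which is cut off in this excerpt) and it does not use data.table or dplyr; the function name `cumulative_new_counts` and the list-of-pairs input layout are assumptions made for this example.

```python
def cumulative_new_counts(rows):
    """Return (date, running count of distinct Col1 values) for each date.

    `rows` is an iterable of (date, col1) pairs, assumed to be sorted by date.
    """
    seen = set()   # every Col1 value encountered so far
    result = {}    # date -> count after the last row of that date
    for date, col1 in rows:
        seen.add(col1)
        result[date] = len(seen)
    return list(result.items())


# Date and Col1 values copied from the question (Col2 is not needed here).
rows = [
    ("2014-01-01", 123), ("2014-01-01", 123), ("2014-01-01", 124),
    ("2014-01-01", 125), ("2014-01-02", 123), ("2014-01-02", 126),
    ("2014-01-02", 127), ("2014-01-03", 521), ("2014-01-03", 123),
    ("2014-01-03", 126),
]

counts = cumulative_new_counts(rows)
for date, count in counts:
    print(date, count)
```

Because the set only grows, the value stored for a date is the total number of distinct Col1 values seen up to and including that date, which is the running count the question asks for.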

r data.frame create new variable

I have a data frame with around 1.5 million rows and 5 columns. One variable (VARIABLE) has the form NATIONALITY_YEAR (e.g. SPAIN_1998), and I want to split it into two columns: one containing the nationality, which is the part of the name before the underscore, and one containing the year, which is the part after the underscore. I have tried concat.split, which should be the easiest way:

    aa <- concat.split(mydata, "VARIABLE", sep = "_", drop = F)

but after running for 2 hours it had not produced any output. I am not sure if I should leave it running for a longer period of time or if there is a non time

How to filter (with dplyr) for all values of a group if variable limit is reached?

Here's the dummy data:

    cases <- rep(1:5,times=2)
    var1 <- as.numeric(c(450,100,250,999,200,500,980,10,700,1000))
    var2 <- as.numeric(c(111,222,333,444,424,634,915,12,105,152))
    maindata1 <- data.frame(cases,var1,var2)

    df1 <- maindata1 %>%
      filter(var1 >950) %>%
      distinct(cases) %>%
      select(cases)

    table1 <- maindata1 %>%
      filter(cases == 2 | cases == 4 | cases == 5) %>%
      arrange(cases)

    > table1
      cases var1 var2
    1     2  100  222
    2     2  980  915
    3     4  999  444
    4     4  700  105
    5     5  200  424
    6     5 1000  152

I'm trying to formulate a dataframe which contains all the data related to cases where var1 >950 so it would show every
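The goal described above (keep every row of a case as soon as any of its rows exceeds the limit) can be sketched in plain Python as a two-pass filter. This is only an illustration of the logic, not a dplyr solution; the helper name `rows_for_flagged_cases` and the tuple layout (cases, var1, var2) are assumptions made for this example.

```python
# Dummy data copied from the question.
cases = [1, 2, 3, 4, 5] * 2
var1 = [450, 100, 250, 999, 200, 500, 980, 10, 700, 1000]
var2 = [111, 222, 333, 444, 424, 634, 915, 12, 105, 152]
rows = list(zip(cases, var1, var2))


def rows_for_flagged_cases(rows, limit=950):
    """Keep all rows of every case that has at least one var1 above `limit`."""
    # Pass 1: collect the cases in which any row exceeds the limit.
    flagged = {case for case, v1, _ in rows if v1 > limit}
    # Pass 2: keep every row of those cases, ordered by case.
    # sorted() is stable, so rows within a case keep their original order.
    kept = [row for row in rows if row[0] in flagged]
    return sorted(kept, key=lambda row: row[0])


table = rows_for_flagged_cases(rows)
for row in table:
    print(row)
```

With the dummy data this selects cases 2, 4 and 5 without naming them by hand, so the rows match the manually built table1.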

Check python string format?

Question: I have a bunch of strings, but I only want to keep the ones with this format: x/x/xxxx xx:xx

What is the easiest way to check whether a string matches this format? (Assume I want to check that it has two /'s and a ':'.)

Answer 1: Try a regular expression:

    import re

    # Loose pattern: two slashes followed by a colon, with anything in between
    r = re.compile('.*/.*/.*:.*')
    if r.match('x/x/xxxx xx:xx') is not None:
        print('matches')

You can tweak the expression to match your needs.

Answer 2: If you use regular expressions with match, you must also account for the end being too long. Without
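To illustrate the concern about trailing text, here is a minimal sketch that requires the whole string to match by using re.fullmatch. It assumes that each x in the format stands for a digit and that the day, month and hour may have one or two digits; the question does not state this, so the pattern and the helper name `has_expected_format` are illustrative only.

```python
import re

# Assumed shape: d/d/dddd dd:dd, with one or two digits for day, month and hour.
PATTERN = re.compile(r"\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}")


def has_expected_format(text):
    """Return True only if the entire string has the assumed date-time shape."""
    # fullmatch succeeds only when the pattern covers the whole string,
    # so extra characters at the start or end are rejected.
    return PATTERN.fullmatch(text) is not None


candidates = ["1/2/2013 10:30", "1/2/2013 10:30 extra", "not a date"]
kept = [s for s in candidates if has_expected_format(s)]
print(kept)
```

This checks only the shape of the string, not whether the values form a valid date.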

Removing elements with Array.map in JavaScript

Question: I would like to filter an array of items by using the map() function. Here is a code snippet:

    var filteredItems = items.map(function(item) {
        if( ...some condition... ) {
            return item;
        }
    });

The problem is that the filtered-out items still take up space in the array, and I would like to wipe them out completely. Any idea?

EDIT: Thanks, I forgot about filter(); what I wanted is actually a filter() then a map().

EDIT2: Thanks for pointing out that map() and filter() are not implemented in all browsers,

remove row with nan value

Let's say, for example, I have this data:

    data <- c(1,2,3,4,5,6,NaN,5,9,NaN,23,9)
    attr(data,"dim") <- c(6,2)
    data
         [,1] [,2]
    [1,]    1  NaN
    [2,]    2    5
    [3,]    3    9
    [4,]    4  NaN
    [5,]    5   23
    [6,]    6    9

Now I want to remove the rows that contain NaN values: rows 1 and 4. But in a dataset of 100,000+ rows I don't know where these rows are, so I need to find them with a function and remove each complete row. Can anybody point me in the right direction?

The function complete.cases will tell you where the rows are that you need:

    data <- matrix(c(1,2,3,4,5,6,NaN,5,9,NaN,23,9), ncol=2)
    data[complete.cases(data),

Sliding time intervals for time series data in R

I am trying to extract interesting statistics for an irregular time series data set, but coming up short on finding the right tools for the job. The tools for manipulating regularly sampled time series or index-based series of any time are pretty easily found, though I'm not having much luck with the problems I'm trying to solve. First, a reproducible data set:

    library(zoo)
    set.seed(0)
    nSamples <- 5000
    vecDT <- rexp(nSamples, 3)
    vecTimes <- cumsum(c(0,vecDT))
    vecDrift <- c(0, rnorm(nSamples, mean = 1/nSamples, sd = 0.01))
    vecVals <- cumsum(vecDrift)
    vecZ <- zoo(vecVals, order.by = vecTimes)
    rm

Categorizing variables in SAS using a range system

I have the numeric salary values of different employees. I want to break the ranges up into categories. However, I do not want a new column; rather, I just want to format the existing salary column using this range method:

    At least $20,000 but less than $100,000 -
    At least $100,000 and up to $500,000 - >$100,000
    Missing - Missing salary
    Any other value - Invalid salary

I've done something similar with gender. I just want to use proc print and the format command to show salary and gender.

    DATA Work.nonsales2;
        SET Work.nonsales;
    RUN;

    PROC FORMAT;
        VALUE $Gender
            'M'='Male'
            'F'='Female'
            'O'=