dataframe

Comparing Boolean Values of Pandas Dataframes- Returning String

♀尐吖头ヾ submitted on 2021-01-28 04:49:02
Question: I have 4 dataframes that I'm going to compare. Each looks like this:

    ID  Jan    Feb    Mar
    1   True   True   False
    2   True   True   True
    3   False  False  False

with anywhere from 2 to 3000 rows. They will have exactly the same column names but may not always share all the same index IDs. What I would like to do is compare them and generate a new dataframe based on their values. For any cell that was False in at least one dataframe, I want to assign it a string (e.g. "False in Dataframe1") and if multiple, append both (e.g
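One possible way to approach the comparison described above is sketched below. This is not taken from an answer to the question: the helper name `label_false_cells`, the second sample frame, the ", " separator, and the choice to leave cells blank when an ID is missing from a frame are all assumptions made for illustration.

```python
import pandas as pd


def label_false_cells(frames):
    """Return a string DataFrame naming every frame in which a cell is False.

    frames: dict mapping a label (hypothetical names such as "Dataframe1")
    to a boolean DataFrame. All frames are assumed to share column names.
    """
    labelled = list(frames.items())
    columns = labelled[0][1].columns

    # Union of all index IDs, since the frames may not share every ID.
    all_ids = labelled[0][1].index
    for _, frame in labelled[1:]:
        all_ids = all_ids.union(frame.index)

    result = pd.DataFrame("", index=all_ids, columns=columns)
    for name, frame in labelled:
        aligned = frame.reindex(index=all_ids, columns=columns)
        # IDs missing from this frame become NaN, and NaN == False is False,
        # so missing rows are never flagged (an assumption of this sketch).
        is_false = aligned.eq(False)
        # Add a separator only where the cell already holds a message.
        prefix = result.where(result.eq(""), result + ", ")
        result = result.where(~is_false, prefix + f"False in {name}")
    return result


# First frame mirrors the table in the question; the second is made up.
df1 = pd.DataFrame(
    {"Jan": [True, True, False], "Feb": [True, True, False], "Mar": [False, True, False]},
    index=pd.Index([1, 2, 3], name="ID"),
)
df2 = pd.DataFrame(
    {"Jan": [True, False], "Feb": [True, True], "Mar": [False, True]},
    index=pd.Index([1, 4], name="ID"),
)
labels = label_false_cells({"Dataframe1": df1, "Dataframe2": df2})
print(labels)
```

Cells that are never False stay as empty strings here; whether they should instead hold True or some other marker depends on the part of the question that is cut off.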

Pandas conditional creation of a dataframe column: based on multiple conditions max

不想你离开。 submitted on 2021-01-28 04:39:39
Question: I have a df:

       dog1  dog2  cat1  cat2  ant1  ant2
    0     1     2     3     4     5     6
    1     1     2     3     4     0     0
    2     3     3     3     3     3     3
    3     4     3     2     1     1     0

I want to add a new column based on the following conditions:

    if   max(dog1, dog2) > max(cat1, cat2) > max(ant1, ant2)  ----->  2
    elif max(dog1, dog2) > max(cat1, cat2)                    ----->  1
    elif max(dog1, dog2) < max(cat1, cat2) < max(ant1, ant2)  -----> -2
    elif max(dog1, dog2) < max(cat1, cat2)                    -----> -1
    else                                                      ----->  0

So it should become this:

       dog1  dog2  cat1  cat2  ant1  ant2  new
    0     1     2     3     4     5     6   -2
    1     1     2     3     4     0     0   -1
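The if/elif chain above maps fairly directly onto `numpy.select`, which takes the value of the first matching condition in order. The following is a minimal sketch under that reading, using the sample data from the question; the intermediate names `dog`, `cat`, and `ant` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "dog1": [1, 1, 3, 4], "dog2": [2, 2, 3, 3],
    "cat1": [3, 3, 3, 2], "cat2": [4, 4, 3, 1],
    "ant1": [5, 0, 3, 1], "ant2": [6, 0, 3, 0],
})

# Row-wise maximum of each animal pair.
dog = df[["dog1", "dog2"]].max(axis=1)
cat = df[["cat1", "cat2"]].max(axis=1)
ant = df[["ant1", "ant2"]].max(axis=1)

# Conditions are checked in order, like the if/elif chain in the question.
conditions = [
    (dog > cat) & (cat > ant),
    dog > cat,
    (dog < cat) & (cat < ant),
    dog < cat,
]
choices = [2, 1, -2, -1]

# Rows matching no condition fall through to the default of 0.
df["new"] = np.select(conditions, choices, default=0)
print(df)
```

The order of `conditions` matters: the stricter three-way comparisons have to come before the two-way ones they overlap with, otherwise the values 2 and -2 would never be assigned.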

How to convert a list with different length element to a dataframe

独自空忆成欢 submitted on 2021-01-28 04:16:01
Question: I run into this kind of problem often when I have a loop. The first one is solved.

[1] I have a list like this:

    myList <- list(a = c(1, 2, 3), b = c(4, 5, 6, 7), c = c(9, 10))

Now I want to convert the list to a data.frame like this:

      Value
    a 1, 2, 3
    b 4, 5, 6, 7
    c 9, 10

Can anyone show me a general function in base R?

[2] A new problem has arisen:

    mynewList <- list(a = c(1, 2, 3, "f"), b = c(4, 5, 6), c = c(9, 10), d = list(1, 2))

I want to convert mynewList to a dataframe like this:

      a b c d
    1 1

Rolling average with window size an interval of column values

生来就可爱ヽ(ⅴ<●) submitted on 2021-01-28 04:13:44
Question: I'm trying to calculate a rolling average on some incomplete data. I want to average values in column 2 across windows of size 1.0 of the value in column 1 (miles). I've tried .rolling(), but (from my limited understanding) this only creates windows based on the index, and not on column values.

    import pandas as pd
    import numpy as np

    df = pd.DataFrame([
        [4.5, 10],
        [4.6, 11],
        [4.8, 9],
        [5.5, 6],
        [5.6, 6],
        [8.1, 10],
        [8.2, 13]
    ])

    averages = []
    for index in range(len(df)):
        nearby = df.loc[np.abs
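A value-based window can be sketched with NumPy broadcasting instead of a Python loop. The question's own loop is cut off, so the window definition below is an assumption: each row averages all rows whose mileage lies within 0.5 of its own (a centred window of total width 1.0). The column names `miles` and `value` are also introduced here; the original frame has unnamed columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[4.5, 10], [4.6, 11], [4.8, 9], [5.5, 6], [5.6, 6], [8.1, 10], [8.2, 13]],
    columns=["miles", "value"],  # hypothetical column names
)

half_width = 0.5  # assumed: window of size 1.0 centred on each row

miles = df["miles"].to_numpy()
values = df["value"].to_numpy()

# in_window[i, j] is True when row j lies inside row i's window.
in_window = np.abs(miles[:, None] - miles[None, :]) <= half_width

# Sum the values inside each window and divide by the window's row count.
df["rolling_avg"] = (in_window * values).sum(axis=1) / in_window.sum(axis=1)
print(df)
```

The pairwise mask needs memory proportional to the square of the row count, so this form suits small and medium frames; for large sorted data, locating window edges with `numpy.searchsorted` would probably scale better.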

Python: How to add a secondary x axis for a single trace?

不问归期 submitted on 2021-01-28 04:10:30
Question: I have a DataFrame (see 'Test Data' section below) and I would like to add a secondary x axis (at the top). This axis has to run from 0 to 38.24 (ms), which is the sum of all values in column 'Time'. It expresses the total time that the 4 inferences took to execute. So far I have tried 'twinx()' without success. How can I do that? Is it possible, or am I lacking information?

Test Data:

    raw_data = {'Time': [21.9235, 4.17876, 4.02168, 3.81504, 4.2972],
                'TPU': [33.3, 33.3, 33.3, 33.3, 33.3],
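In Matplotlib, `twinx()` creates a second y axis that shares the x axis, whereas `twiny()` creates a second x axis at the top that shares the y axis, which seems closer to what is being asked. The sketch below uses only the two columns visible in the truncated test data; what is actually plotted on the primary axis (here `TPU` against the row index) and the axis labels are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

raw_data = {
    "Time": [21.9235, 4.17876, 4.02168, 3.81504, 4.2972],
    "TPU": [33.3, 33.3, 33.3, 33.3, 33.3],
}
df = pd.DataFrame(raw_data)

# Total execution time, roughly 38.24 ms for this data.
total_time = df["Time"].sum()

fig, ax = plt.subplots()
ax.plot(df.index, df["TPU"], label="TPU")  # assumed primary trace
ax.set_xlabel("Sample index")
ax.set_ylabel("TPU")

# twiny() adds a second x axis at the top that shares the y axis.
top_ax = ax.twiny()
top_ax.set_xlim(0, total_time)
top_ax.set_xlabel("Time (ms)")

fig.savefig("secondary_axis.png")
```

The top axis is independent of the bottom one here, so its ticks simply span 0 to the total time; tying individual points to cumulative time would need the trace plotted against `df["Time"].cumsum()` on `top_ax` instead.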

ggplot using facet_wrap of multiple data.frame in R?

 ̄綄美尐妖づ submitted on 2021-01-28 04:01:03
Question: I am trying to plot D2 with ggplot on the same figure as D1. However, I do not have data for the variable X in the D2 data.frame. How can I plot D2 on the respective facets of the D1 plot? These plots represent data for 2011 and 2014, so I would like a legend for the lines to show which line represents which year's data.

    library(tidyverse)
    set.seed(1500)
    D1 <- data.frame(Day = 1:8, A = runif(8, 2,16), S = runif(8, 3,14), X = runif(8, 5,10), Z = runif(8, 1,12), Year = rep("2011",8))
    D2 <- data

Split a text(with names and values) column into multiple columns in Pandas DataFrame

余生颓废 submitted on 2021-01-28 03:47:59
Question: My algorithm is too slow. I have a big dataframe and want to create columns that depend on the names and values stored in another column. I am looking for a solution, maybe in Pandas. Before running, I don't know how many columns there will be. Here is a simple schema:

    "column"<==>"value"<br>"column"<==> "value"<br>...

My data frame:

    id | params |
    ---|-----------------
    0  | currency<=>PLN<br>price<=>72.14<br>city<==>Berlin
    ---|-----------------
    1  | price<=>90<br>area<=>72.14<br>city<==>San
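One way to sketch the split is to parse each `params` string into a dict in a single pass and let pandas build the columns from the list of dicts, so the set of columns does not need to be known in advance. The helper name `parse_params` is hypothetical, and the second sample row is shortened to the pairs that are fully visible in the truncated excerpt. The separator appears both as `<=>` and `<==>`, so the regular expression accepts either.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "id": [0, 1],
    "params": [
        "currency<=>PLN<br>price<=>72.14<br>city<==>Berlin",
        # Row 1 keeps only the pairs that are not cut off in the excerpt.
        "price<=>90<br>area<=>72.14",
    ],
})

# Matches "<=>", "<==>", and longer variants.
PAIR_SEP = re.compile(r"<=+>")


def parse_params(text):
    """Turn 'name<=>value<br>name<=>value' into a {name: value} dict."""
    parsed = {}
    for item in text.split("<br>"):
        parts = PAIR_SEP.split(item, maxsplit=1)
        if len(parts) == 2:  # skip fragments that have no separator
            parsed[parts[0].strip()] = parts[1].strip()
    return parsed


# A list of dicts becomes one column per distinct name; gaps become NaN.
wide = pd.DataFrame(df["params"].map(parse_params).tolist(), index=df.index)
out = df[["id"]].join(wide)
print(out)
```

All extracted values are strings at this point; numeric columns such as `price` could be converted afterwards with `pd.to_numeric` if needed.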

Python - Pandas, group by time intervals

安稳与你 submitted on 2021-01-28 03:22:47
Question: Given the following DF:

    group_id  timestamp
    A         2020-09-29 06:00:00 UTC
    A         2020-09-29 08:00:00 UTC
    A         2020-09-30 09:00:00 UTC
    B         2020-09-01 04:00:00 UTC
    B         2020-09-01 06:00:00 UTC

I would like to count the deltas between records using all groups, not counting deltas between groups. Result for the above example:

    delta  count
    2      2
    25     1

Explanation: In group A the deltas are

    06:00:00 -> 08:00:00 (2 hours)
    08:00:00 -> 09:00:00 on the next day (25 hours)

And in group B:

    04:00:00 -> 06:00:00 (2 hours)

How
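The per-group deltas described above can be sketched with `groupby(...).diff()`, which never subtracts across group boundaries, followed by `value_counts()`. This is a minimal sketch assuming the deltas of interest are whole hours; the variable names `hours` and `counts` are introduced here.

```python
import pandas as pd

df = pd.DataFrame({
    "group_id": ["A", "A", "A", "B", "B"],
    "timestamp": [
        "2020-09-29 06:00:00 UTC",
        "2020-09-29 08:00:00 UTC",
        "2020-09-30 09:00:00 UTC",
        "2020-09-01 04:00:00 UTC",
        "2020-09-01 06:00:00 UTC",
    ],
})
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

# Sort so that diff() compares consecutive records within each group.
df = df.sort_values(["group_id", "timestamp"])

# diff() per group gives NaT for the first record of every group.
deltas = df.groupby("group_id")["timestamp"].diff().dropna()

# Convert to whole hours (assumes deltas are multiples of one hour).
hours = (deltas / pd.Timedelta(hours=1)).round().astype(int)

counts = (
    hours.value_counts()
    .sort_index()
    .rename_axis("delta")
    .reset_index(name="count")
)
print(counts)
```

For the sample data this gives a delta of 2 hours twice and 25 hours once, which matches the result stated in the question.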

conditionally duplicating rows in a data frame

…衆ロ難τιáo~ submitted on 2021-01-28 03:12:23
Question: This is a sample of my data set:

      day city count
    1   1    A    50
    2   2    A   100
    3   2    B   110
    4   2    C    90

Here is the code for reproducing it:

    df <- data.frame(
      day = c(1,2,2,2),
      city = c("A","A","B","C"),
      count = c(50,100,110,90)
    )

As you can see, the count data is missing for cities B and C on day 1. What I want to do is use city A's count as an estimate for the other two cities. So the desired output would be:

      day city count
    1   1    A    50
    2   1    B    50
    3   1    C    50
    4   2    A   100
    5   2    B   110
    6   2    C    90

I could come up with a

Split a text(with names and values) column into multiple columns in Pandas DataFrame

爷,独闯天下 submitted on 2021-01-28 02:51:09
Question: My algorithm is too slow. I have a big dataframe and want to create columns that depend on the names and values stored in another column. I am looking for a solution, maybe in Pandas. Before running, I don't know how many columns there will be. Here is a simple schema:

    "column"<==>"value"<br>"column"<==> "value"<br>...

My data frame:

    id | params |
    ---|-----------------
    0  | currency<=>PLN<br>price<=>72.14<br>city<==>Berlin
    ---|-----------------
    1  | price<=>90<br>area<=>72.14<br>city<==>San