data-analysis

Creating new column based on multiple possible cell possibilities across several columns

人盡茶涼 submitted on 2021-02-08 08:25:50
Question: data[, allkneePR := Reduce(`|`, lapply(.SD, `==`, "0082")), .SDcols=PR1:PR3] Hey, I'm trying to look for several diagnoses, c("0082", "0083", "0084"), across a range of rows and columns in a data.table (the dataset is huge). If any of the columns PR1:PR3 contains "0082", "0083", or "0084", I want another column that indicates TRUE. The code above works, but only for "0082"; I am trying to extend it to multiple diagnoses. I tried the any() function which doesn't work
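For comparison, here is a minimal sketch of the same row-wise "any of several codes" check in pandas rather than data.table. This is a swapped-in analogue, not the asker's R code. The sample values are made up; only the column names PR1 to PR3, the three codes, and the target column name come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    "PR1": ["0082", "0100", "0200"],
    "PR2": ["0300", "0084", "0400"],
    "PR3": ["0500", "0600", "0700"],
})

knee_codes = ["0082", "0083", "0084"]

# isin() tests every cell against the code list; any(axis=1) collapses each row
# to True if at least one of the three columns matched.
data["allkneePR"] = data[["PR1", "PR2", "PR3"]].isin(knee_codes).any(axis=1)
print(data)
```

The membership test replaces the single `==` comparison, which is the part that has to change when moving from one code to several.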

Counting qualitative values based on the date range in Pandas

夙愿已清 submitted on 2021-02-08 06:33:14
Question: I am learning to use the Pandas library and need to analyze and plot the crime data set below. Each row represents one occurrence of a crime. The date_rep column contains daily dates for a year. The data needs to be grouped by month, and the instances of each specific crime need to be counted per month, like in the table below. The problem I am running into is that the data in the crime column is qualitative and I just can't find resources online that help me solve this! I have been reading up on groupby and
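One possible way to count categorical values per month is to group by a monthly period and by the category, then take group sizes. The sketch below assumes a column literally named crime and uses made-up rows, since the original data set is not shown in the excerpt.

```python
import pandas as pd

# Hypothetical sample rows; the real data set is not part of the excerpt.
df = pd.DataFrame({
    "date_rep": pd.to_datetime(
        ["2019-01-03", "2019-01-15", "2019-01-20", "2019-02-02", "2019-02-10"]
    ),
    "crime": ["Burglary", "Theft", "Burglary", "Theft", "Theft"],
})

# Group by calendar month and crime type, count the rows in each group,
# then pivot the crime types into columns (missing combinations become 0).
monthly_counts = (
    df.groupby([df["date_rep"].dt.to_period("M"), "crime"])
      .size()
      .unstack(fill_value=0)
)
print(monthly_counts)
```

The resulting frame has one row per month and one column per crime type, so `monthly_counts.plot(kind="bar")` would be one way to plot it.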

Calculating subtractions of pairs of columns in pandas DataFrame

微笑、不失礼 submitted on 2021-02-07 18:21:54
Question: I work with fairly large DataFrames (48K rows, up to tens of columns). At a certain point in their manipulation, I need to do pair-wise subtractions of column values, and I was wondering if there is a more efficient way to do so than the one I'm using (see below). My current code: # Matrix is the pandas DataFrame containing all the data comparison_df = pandas.DataFrame(index=matrix.index) combinations = itertools.product(group1, group2) for observed, reference in combinations:
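The loop body is cut off in the excerpt, so the following is only a sketch under assumptions: each pair is taken to mean "observed minus reference", and the output column naming is hypothetical. It uses NumPy broadcasting to compute all pairs at once instead of one column per loop iteration.

```python
import itertools

import pandas as pd

# Hypothetical data; matrix, group1 and group2 are the names used in the question.
matrix = pd.DataFrame(
    {"a1": [1.0, 2.0], "a2": [3.0, 5.0], "b1": [0.5, 1.0], "b2": [1.0, 1.0]}
)
group1 = ["a1", "a2"]
group2 = ["b1", "b2"]

obs = matrix[group1].to_numpy()  # shape (n_rows, len(group1))
ref = matrix[group2].to_numpy()  # shape (n_rows, len(group2))

# Broadcasting yields shape (n_rows, len(group1), len(group2)):
# every observed column minus every reference column, row by row.
diff = obs[:, :, None] - ref[:, None, :]

# Flattening the last two axes matches the order of itertools.product(group1, group2).
columns = [f"{o}-{r}" for o, r in itertools.product(group1, group2)]
comparison_df = pd.DataFrame(
    diff.reshape(len(matrix), -1), index=matrix.index, columns=columns
)
print(comparison_df)
```

This trades memory for speed: the intermediate array holds all pairs at once, which should be manageable for 48K rows and tens of columns.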

Pandas: conditional shift

断了今生、忘了曾经 submitted on 2021-02-07 04:59:43
Question: Is there a way to shift a dataframe column depending on a condition on two other columns? Something like: df["cumulated_closed_value"] = df.groupby("user")['close_cumsum'].shiftWhile(df['close_time'] > df['open_time']) I have figured out a way to do this but it's inefficient: 1) Load the data and create the column to shift: df=pd.read_csv('data.csv') df.sort_values(['user','close_time'],inplace=True) df['close_cumsum']=df.groupby('user')['value'].cumsum() df.sort_values(['user','open_time']
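The question is truncated, so the intent has to be guessed. The sketch below assumes the goal is, for each row, the cumulative value of the same user's rows that closed strictly before this row opened. Under that assumption, `pandas.merge_asof` can do the lookup without a Python loop. The sample data and the helper column name last_close are hypothetical.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    "user": ["a", "a", "a", "b"],
    "open_time": pd.to_datetime(["2021-01-01", "2021-01-03", "2021-01-06", "2021-01-02"]),
    "close_time": pd.to_datetime(["2021-01-02", "2021-01-07", "2021-01-08", "2021-01-04"]),
    "value": [10, 20, 30, 5],
})

# Running total of value per user, ordered by close time.
closed = df.sort_values("close_time")[["user", "close_time", "value"]].copy()
closed["close_cumsum"] = closed.groupby("user")["value"].cumsum()
closed = closed.rename(columns={"close_time": "last_close"})

# For every row, find the same user's latest close strictly before its open_time.
result = pd.merge_asof(
    df.sort_values("open_time"),
    closed[["user", "last_close", "close_cumsum"]],
    left_on="open_time",
    right_on="last_close",
    by="user",
    allow_exact_matches=False,
)

# Rows with no earlier close get 0 instead of NaN.
result["cumulated_closed_value"] = result["close_cumsum"].fillna(0)
print(result[["user", "open_time", "cumulated_closed_value"]])
```

Both inputs must be sorted by their time key for `merge_asof`, which is why each side is sorted before the merge.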

How to return the fit error in Python curve_fit

百般思念 submitted on 2021-01-29 17:31:57
Question: I'm trying to fit a function to a data set from an experiment using Python. I can get a really good approximation and the fit looks pretty good, but the error given for the parameters is incredibly high and I'm not sure how to fix this. The function looks like this: [image: Function] The data consists of a time data set and a y data set. The variable "ve" is a linear velocity function, which is why it is replaced with "a*x+b" in the code. Now the fit looks really good and theoretically the function
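As background for the title question: `scipy.optimize.curve_fit` returns the fitted parameters together with their covariance matrix, and the usual one-standard-deviation parameter errors are the square roots of its diagonal. The asker's function is only shown as an image, so the sketch below uses a hypothetical linear model and synthetic data purely to illustrate the call.

```python
import numpy as np
from scipy.optimize import curve_fit


def model(x, a, b):
    # Hypothetical stand-in model; the asker's actual function is not in the excerpt.
    return a * x + b


# Synthetic data with a fixed seed so the run is reproducible.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = model(x, 2.0, 1.0) + rng.normal(scale=0.1, size=x.size)

popt, pcov = curve_fit(model, x, y)

# One-standard-deviation uncertainty of each fitted parameter.
perr = np.sqrt(np.diag(pcov))
print("parameters:", popt)
print("errors:", perr)
```

Very large values in `perr` often point to strongly correlated or poorly constrained parameters, which the off-diagonal entries of `pcov` can help diagnose.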

Modeling noisy 1/x data in R, getting “essentially perfect fit” from summary - why? [closed]

三世轮回 submitted on 2021-01-29 11:14:19
Question: [Closed as not reproducible or caused by typos.] Just trying to walk myself through how fitting a reciprocal function to data would go, using the following toy example: # includes library(ggplot2) library(forecast) library(scales) # make data sampledata <- as.data.frame( .1 * seq(1, 20)) names(sampledata) <- c

Keep getting this error using numpy.piecewise to get segmented linear regression

孤街浪徒 submitted on 2021-01-28 11:00:28
Question: I have a very large data file, where x = time and y = distance. I would like to figure out what the speed is in different segments. Ideally, I would like Python to calculate the segments and the corresponding linear regression functions. I googled this and think my best option is to use numpy.piecewise to get a segmented linear regression. However, I keep getting this error: '# Remove full_output from kwargs, otherwise we're passing it in twice'. The code I use is as follows: y = cleandata["Distance
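The asker's code is cut off, so the cause of the error cannot be determined from the excerpt. As a reference point, here is a minimal sketch of a two-segment linear fit with `numpy.piecewise` and `scipy.optimize.curve_fit` on synthetic data. The function name piecewise_linear, its parameters, and the starting guess are assumptions for illustration. Passing plain float NumPy arrays rather than pandas objects is a choice made here, not a confirmed fix.

```python
import numpy as np
from scipy.optimize import curve_fit


def piecewise_linear(x, x0, y0, k1, k2):
    # Two lines meeting at the breakpoint (x0, y0), with slopes k1 and k2.
    return np.piecewise(
        x,
        [x < x0],
        [
            lambda x: k1 * x + y0 - k1 * x0,  # segment left of the breakpoint
            lambda x: k2 * x + y0 - k2 * x0,  # segment right of the breakpoint
        ],
    )


# Synthetic time/distance data as float arrays.
x = np.linspace(0.0, 10.0, 50)
y = piecewise_linear(x, 4.0, 8.0, 2.0, 0.5)

# A rough starting guess helps the optimizer find the breakpoint.
popt, pcov = curve_fit(piecewise_linear, x, y, p0=[5.0, 7.0, 1.0, 1.0])
print("breakpoint x0, y0, slopes k1, k2:", popt)
```

With real columns, something like `cleandata[column].to_numpy(dtype=float)` would produce the array inputs used here; the slopes k1 and k2 then correspond to the speeds in the two segments.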

Getting error when adding a new row to my existing dataframe in pandas

南笙酒味 submitted on 2021-01-27 02:52:11
Question: I have the below data frame: df3=pd.DataFrame(columns=["Devices","months"]) I am getting a row value from a loop; print(data) shows: Devices months 1 Powerbank Feb month When I add this data row to my df3, I get an error: df3.loc[len(df3)]=data ValueError: cannot set a row with mismatched columns
Answer 1: Use df3 = pd.concat([df3, data], axis=0) or, as suggested by @Wen, use df3 = df3.append(data)
Answer 2: From https://pandas.pydata.org/pandas-docs/stable/merging.html: It is worth noting however
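A minimal sketch of the `pd.concat` route from Answer 1, assuming data is a one-row DataFrame (the printed output in the question suggests this, but the excerpt does not confirm it). Note that `DataFrame.append` was removed in pandas 2.0, so `pd.concat` is the option that works on current versions.

```python
import pandas as pd

df3 = pd.DataFrame(columns=["Devices", "months"])

# Assumed shape of the loop value: a one-row DataFrame with index label 1.
data = pd.DataFrame({"Devices": ["Powerbank"], "months": ["Feb month"]}, index=[1])

# Stack the new row under the existing frame; ignore_index renumbers rows from 0.
df3 = pd.concat([df3, data], ignore_index=True)
print(df3)
```

When many rows arrive in a loop, collecting them in a list and calling `pd.concat` once at the end avoids copying the growing frame on every iteration.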
