dataframe

Pandas combine two dataframes based on time difference

混江龙づ霸主 submitted on 2021-01-29 12:12:49
Question: I have two data frames that store different types of medical information for patients. The common elements of both data frames are the encounter ID (hadm_id) and the time the information was recorded ((n|c)e_charttime). One data frame (df_str) contains structured information such as vital signs, lab test values, and values derived from these (such as change statistics over 24 hours). The other data frame (df_notes) contains a column with a clinical note recorded at a specified time
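
A minimal sketch of one way to pair the two frames on encounter ID and nearest chart time with pandas.merge_asof, assuming the column names from the description (hadm_id, ne_charttime, ce_charttime) and an illustrative one-hour tolerance; the small frames below are placeholders, not the question's data:

    import pandas as pd

    # Placeholder frames standing in for df_notes / df_str from the question.
    df_notes = pd.DataFrame({
        'hadm_id': [1, 1, 2],
        'ne_charttime': pd.to_datetime(['2021-01-01 08:00', '2021-01-01 20:00', '2021-01-02 09:30']),
        'note': ['note a', 'note b', 'note c'],
    })
    df_str = pd.DataFrame({
        'hadm_id': [1, 1, 2],
        'ce_charttime': pd.to_datetime(['2021-01-01 07:50', '2021-01-01 19:00', '2021-01-02 09:00']),
        'heart_rate': [80, 95, 72],
    })

    # merge_asof requires both frames to be sorted on the time keys.
    merged = pd.merge_asof(
        df_notes.sort_values('ne_charttime'),
        df_str.sort_values('ce_charttime'),
        left_on='ne_charttime',
        right_on='ce_charttime',
        by='hadm_id',                  # only match rows from the same encounter
        direction='nearest',           # closest structured row in either direction
        tolerance=pd.Timedelta('1h'),  # and only if it is within one hour
    )
    print(merged)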

Summing rows based on keyword within index

[亡魂溺海] submitted on 2021-01-29 12:08:24
Question: I am trying to sum multiple rows together based on a keyword that is part of the index, but is not the entire index. For example, the index could look like:

                          Count
    1234_Banana_Green        43
    4321_Banana_Yellow       34
    2244_Banana_Brown        23
    12345_Apple_Red          45

I would like to sum all of the rows that have the same "keyword" in them and create a total "Banana" row. Is there a way to do this without searching for the keyword "banana"? For my purposes, this keyword changes every time and I would like to
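
A minimal sketch of one way to do this without hard-coding any keyword, assuming the labels always carry the keyword as the middle underscore-separated token; the Series below is a placeholder built from the example values:

    import pandas as pd

    # Placeholder Series mirroring the example index and counts.
    s = pd.Series(
        [43, 34, 23, 45],
        index=['1234_Banana_Green', '4321_Banana_Yellow', '2244_Banana_Brown', '12345_Apple_Red'],
        name='Count',
    )

    # Extract the middle token from each label and sum per keyword.
    keyword = s.index.str.split('_').str[1]
    totals = s.groupby(keyword).sum()
    print(totals)
    # Apple      45
    # Banana    100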

Secondary axis in plotly for R and Shiny

烂漫一生 submitted on 2021-01-29 12:07:24
Question: EDIT: Regarding my question 2, it seems it is a bug and hasn't been fixed yet, as it is not their top priority at the moment. Someone suggested trying KaTeX instead of LaTeX, but I am not sure how that works: https://github.com/plotly/plotly.js/issues/559 I have attached the output of my code: https://i.stack.imgur.com/u65if.jpg. I am trying to plot two y axes and a common x axis using plotly. The issues I am facing are: I would like the primary and the secondary y axis ticks to share the same gridline.

Subsetting data by levels of granularity and applying a function to each data frame in R

回眸只為那壹抹淺笑 submitted on 2021-01-29 11:59:17
Question: Okay, this question is fairly long and complex (at least for me) and I have done my best to make it as clear, organized, and detailed as possible, so please bear with me. I currently have an overly manual process for applying a function to subsets of my data, and I would like to figure out how to make the code more efficient. It is easiest to describe the issue with an example. The variables in my data (myData): GDP

Pandas DataFrame to JSON with multiple nested categories

核能气质少年 submitted on 2021-01-29 11:31:47
Question: I'm looking for a solution to convert a Pandas DataFrame with 3 subcategories to a JSON output without any lists. This is the data structure I have:

    import pandas as pd
    df = pd.DataFrame({
        'cat1': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
        'cat2': ['BB', 'BB', 'BC', 'BB', 'BB', 'BB', 'BC', 'BC'],
        'cat3': ['CC', 'CC', 'CD', 'CD', 'CD', 'CC', 'CD', 'CE'],
        'prod': ['P1', 'P2', 'P3', 'P1', 'P4', 'P1', 'P3', 'P6'],
        'amount': [132, 51, 12, 421, 55, 11, 123, 312]
    })

And this is the desired JSON output
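
The desired output is cut off in this excerpt, so the following is only a sketch of one common approach, assuming the target nests cat1 > cat2 > cat3 and maps each prod to its amount:

    import json
    import pandas as pd

    df = pd.DataFrame({
        'cat1': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
        'cat2': ['BB', 'BB', 'BC', 'BB', 'BB', 'BB', 'BC', 'BC'],
        'cat3': ['CC', 'CC', 'CD', 'CD', 'CD', 'CC', 'CD', 'CE'],
        'prod': ['P1', 'P2', 'P3', 'P1', 'P4', 'P1', 'P3', 'P6'],
        'amount': [132, 51, 12, 421, 55, 11, 123, 312],
    })

    # Build a nested dict cat1 -> cat2 -> cat3 -> {prod: amount}, then dump it.
    nested = {}
    for row in df.itertuples(index=False):
        nested.setdefault(row.cat1, {}) \
              .setdefault(row.cat2, {}) \
              .setdefault(row.cat3, {})[row.prod] = row.amount

    print(json.dumps(nested, indent=2))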

R get the original index of data frame after subsetting

|▌冷眼眸甩不掉的悲伤 submitted on 2021-01-29 11:23:46
Question: Is it possible to get the original index of a data frame after subsetting? It is being stored somewhere, but I am not sure where or how to access it. I understand that there is a better solution if this is part of the algorithm design; I am just curious whether it is possible. Example scenario:

    df = data.frame(atr1=integer(), atr2=integer())
    for(i in 1:10) {
      df <- rbind(df, data.frame(atr1=as.integer(i), atr2=as.integer(i)))
    }
    View(df)

Note the far left side of the output of the View function in

Python converting csv files to dataframes

橙三吉。 submitted on 2021-01-29 11:10:40
Question: I have a large CSV file containing data like: 2018-09, 100, A, 2018-10, 50, M, 2018-11, 69, H, ... and so on (a continuous stream without separate rows). I would like to convert it into a dataframe, which would look something like:

    Col1     Col2  Col3
    2018-09  100   A
    2018-10  50    M
    2018-11  69    H

This is a simplified version of the actual data. Please advise on the best way to approach it. Edit: To clarify, my CSV file doesn't have separate lines for each row. All the data is on one row. Answer 1: One
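
The answer is cut off above, so here is only a minimal sketch of one way to reshape such a file, assuming it is a single comma-separated line in which every three fields form one row (data.csv is a placeholder name):

    import pandas as pd

    # Read the one-line file as a single wide row, drop any trailing empty field,
    # and regroup every three fields into a row of the target frame.
    raw = pd.read_csv('data.csv', header=None).iloc[0].dropna()
    values = raw.astype(str).str.strip().tolist()

    df = pd.DataFrame(
        [values[i:i + 3] for i in range(0, len(values), 3)],
        columns=['Col1', 'Col2', 'Col3'],
    )
    print(df)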

pandas create separate dataframe for each excel sheet

你。 submitted on 2021-01-29 10:52:30
Question: I have an Excel file with about 20 sheets. Each sheet contains a different type of data. I would like to loop over the Excel file and create a data frame for each of these sheets, with the data frame name equal to the sheet name. I have tried different solutions without success. Answer 1: You can create a dictionary of DataFrames with the parameter sheet_name=None in read_excel:

    dfs = pd.read_excel(fileName, sheet_name=None)
    print(dfs['sheetname1'])
    print(dfs['sheetname2'])

Source: https://stackoverflow.com/questions/52538718/pandas
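
If the goal is to work with every sheet rather than create one variable per sheet, iterating over that dictionary is usually enough; a small sketch (workbook.xlsx is a placeholder file name):

    import pandas as pd

    # sheet_name=None returns a dict of DataFrames keyed by sheet name.
    dfs = pd.read_excel('workbook.xlsx', sheet_name=None)

    for sheet_name, frame in dfs.items():
        print(sheet_name, frame.shape)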

Wrapping dplyr filter in function results in “Error: Result must have length 4803, not 3”

心不动则不痛 submitted on 2021-01-29 10:44:26
Question: I'm learning R for data analysis and using this Kaggle dataset. Following the movie recommendation script works, but when I try to generalize the dplyr code by making it a function, I get an error. I've tried some troubleshooting; it looks like the code stops at the filter and mutate functions. The following works and gives the expected output:

    genres <- df %>%
      filter(nchar(genres) > 2) %>%
      mutate(separated = lapply(genres, fromJSON)) %>%
      unnest(separated, .name_repair = "unique") %>%
      select(id,

Adding a pandas.DataFrame to another one with its own name

妖精的绣舞 submitted on 2021-01-29 10:15:26
Question: I have data that I want to retrieve from a couple of text files in a folder. For each file in the folder, I create a pandas.DataFrame to store the data. For now it works correctly, and all the files have the same number of rows. Now what I want to do is add each of these dataframes to a 'master' dataframe containing all of them. I would like to add each of these dataframes to the master dataframe under its own file name. I already have the file names. For example, let's say I have 2 dataframes with
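
The example in the question is cut off, so this is only a sketch of one common pattern, assuming the goal is a single master frame in which each block of rows is labelled with its source file name (the frames and names below are placeholders):

    import pandas as pd

    # Placeholder per-file frames standing in for the ones built from the text files.
    frames = {
        'file_a.txt': pd.DataFrame({'value': [1, 2, 3]}),
        'file_b.txt': pd.DataFrame({'value': [4, 5, 6]}),
    }

    # keys= keeps each file name as the outer level of a MultiIndex.
    master = pd.concat(frames.values(), keys=list(frames.keys()), names=['file', 'row'])
    print(master)
    # Rows from one file stay addressable, e.g. master.loc['file_a.txt']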