dataframe | 易学教程

Looping through Columns replicating each column fetched six times

Question: I have a data frame whose columns are named v1 to v292, with 17 observations. I need to iterate over the columns and replicate each fetched column 6 times. For example:

    v1 v2 v3 v4
     1  3  4  6
     3  4  3  1

The output should be a single column x:

    1 3 1 3 1 3 1 3 1 3 1 3 3 4 3 4 3 4 3 4 3 4 3 4 ... and so on.

Please help. Thank you in advance.

Answer 1: You could use rep:

    data.frame(x = unlist(rep(df, each = 6)))

Checking the output with each = 2:

    data.frame(x = unlist(rep(df, each = 2)))
    #  x
    #1 1
    #2 3
    #3 1
    #4

How to reindex a pandas dataframe within a function?

Question: I'm trying to add column headers with empty values to my dataframe (just like this answer), but within a function that is already modifying it, like so:

    mydf = pd.DataFrame()

    def myfunc(df):
        df['newcol1'] = np.nan  # this works
        list_of_newcols = ['newcol2', 'newcol3']
        df = df.reindex(columns=df.columns.tolist() + list_of_newcols)  # this does not
        return

    myfunc(mydf)

If I run the lines individually in an IPython console, it will add them. But run as a script, newcol1 will be added but 2 and 3
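The excerpt is cut off before any answer, so the following is only a sketch of one common fix, not the answer from the original thread. `reindex` returns a new DataFrame, and `df = ...` inside the function only rebinds the local name, so the caller never sees the new columns. Returning the result and reassigning it avoids that. The helper name `add_empty_columns` and the sample column `a` are hypothetical.

```python
import numpy as np
import pandas as pd


def add_empty_columns(df, new_columns):
    """Return a new DataFrame with extra NaN-filled columns appended."""
    # reindex does not modify df in place; it builds and returns a new object.
    return df.reindex(columns=df.columns.tolist() + list(new_columns))


mydf = pd.DataFrame({"a": [1, 2]})  # hypothetical sample data
mydf["newcol1"] = np.nan            # in-place column assignment
# Reassign the returned frame so the caller actually sees the new columns.
mydf = add_empty_columns(mydf, ["newcol2", "newcol3"])
print(mydf.columns.tolist())
```

Column assignment such as `df['newcol1'] = np.nan` mutates the object the caller passed in, which is why that line appears to work while the `reindex` line does not.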

How to remove rows from a data frame that contain only few words in R?

Question: I'm trying to remove rows from my data frame that contain fewer than 5 words, e.g.

    mydf <- as.data.frame(read.xlsx("C:\\data.xlsx", 1, header=TRUE))
    head(mydf)
      NO ARTICLE
    1 34 The New York Times reports a lot of words here.
    2 12 Greenwire reports a lot of words.
    3 31 Only three words.
    4  2 The Financial Times reports a lot of words.
    5  9 Greenwire short.
    6 13 The New York Times reports a lot of words again.

I want to remove rows with 5 or fewer words. How can I do that?

Answer 1: Here are two ways:

Wrong Dates in Dataframe and Subplots

Question: I am trying to plot the data in my CSV file. Currently my dates are not shown properly in the plot, even when I convert them. How can I change the plot to show the dates in the proper format, defined as Y-m-d? My second question is that I am currently plotting all the data in one plot, but I want one subplot for every Valuegroup. My code looks like the following:

    import pandas as pd
    import matplotlib.pyplot as plt

    csv_loader = pd.read_csv('C:/Test.csv', encoding='cp1252', sep=';', index_col=0).dropna()

Join groupby column with a comma in a Pandas DataFrame

Question: I have a dataset like this:

    >>> df = pd.DataFrame({'id_sin':['s123','s123','s124','s124'], 'raison':['first problem','second problem','album','dog'] })
    >>> df
      id_sin          raison
    0   s123   first problem
    1   s123  second problem
    2   s124           album
    3   s124             dog

This is the expected output:

      id_sin                         raison
    0   s123  first problem, second problem
    1   s124                     album, dog

What I tried:

    df['raison'] = df.groupby('id_sin')['raison'].apply(lambda x: ', '.join(x))

But it doesn't work... What am I missing? Thanks for the help!

Answer 1: Try
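The answer is cut off in this excerpt, so what follows is a minimal sketch of one way to get the expected output, not necessarily what the original answer suggested. The grouped result is indexed by `id_sin`, so assigning it back to `df['raison']` aligns on the wrong index; building a new frame with `reset_index()` sidesteps that. The variable name `joined` is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'id_sin': ['s123', 's123', 's124', 's124'],
    'raison': ['first problem', 'second problem', 'album', 'dog'],
})

# Join the strings within each group, then turn the group key back
# into an ordinary column instead of assigning into the original frame.
joined = df.groupby('id_sin')['raison'].agg(', '.join).reset_index()
print(joined)
```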

Merge 2 Different Data Frames - Python 3.6

Question: I want to merge two tables, with the blanks filled by the rows of the first table.

DF1:

    Col1 Col2 Col3
    A    B    C

DF2:

    Col6 Col8
    1    2
    3    4
    5    6
    7    8
    9    10

I am expecting the result below:

    Col1 Col2 Col3 Col6 Col8
    A    B    C    1    2
    A    B    C    3    4
    A    B    C    5    6
    A    B    C    7    8
    A    B    C    9    10

Answer 1: Use assign, but then it is necessary to change the order of the columns:

    df = df2.assign(**df1.iloc[0])[df1.columns.append(df2.columns)]
    print (df)
      Col1 Col2 Col3  Col6  Col8
    0    A    B    C     1     2
    1    A    B    C     3     4
    2    A    B    C     5     6
    3    A    B    C     7     8
    4    A    B    C     9    10

Or concat and replace NaNs by
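As an alternative to the `assign` approach in the answer, the same table can be produced with a cross join. This is a swapped-in technique, not the one from the thread, and it assumes pandas 1.2 or later, where `merge` accepts `how='cross'`. The variable name `result` is an assumption.

```python
import pandas as pd

df1 = pd.DataFrame({'Col1': ['A'], 'Col2': ['B'], 'Col3': ['C']})
df2 = pd.DataFrame({'Col6': [1, 3, 5, 7, 9], 'Col8': [2, 4, 6, 8, 10]})

# A cross join pairs every row of df1 with every row of df2.
# With a single row in df1, that row is repeated for each row of df2,
# and the columns already come out in df1-then-df2 order.
result = df1.merge(df2, how='cross')
print(result)
```

Unlike `assign`, this also generalises to a `df1` with more than one row, although the output then contains every combination of rows.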

Selecting rows in a dataframe based on the column names of another

Question: Say I have two dfs

    df = pd.DataFrame({'A': [1, 2, 3,4,5], 'B': [2, 4,2,4,5], 'C': [1, -1, 3,5,10],'D': [3, -4,3,7,-3]}, columns=['A', 'B', 'C', 'D'])
    df = df.set_index(['A'])
    df2 = pd.DataFrame({'A': [1, 2, 3,4,5], 'J': ['B', 'B','C','D','C']}, columns=['A', 'J'])
    df2 = df2.set_index(['A'])

and I would like to use df2 to select the columns of df row by row in order to get the following dataframe

       sel
    1    2
    2    4
    3    3
    4    7
    5   10

where the first two values are from the column B of df, the third from
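No answer is shown in this excerpt. One possible approach, sketched here under the assumption that every label in `df2['J']` is a column of `df` and that both frames share the same index, is to translate the labels into column positions and use NumPy integer indexing. The names `labels`, `rows`, `cols`, and `sel` are assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [2, 4, 2, 4, 5],
                   'C': [1, -1, 3, 5, 10], 'D': [3, -4, 3, 7, -3]}).set_index('A')
df2 = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                    'J': ['B', 'B', 'C', 'D', 'C']}).set_index('A')

# Align the labels with df's row order, then map each label to its column position.
labels = df2.loc[df.index, 'J'].to_numpy()
cols = df.columns.get_indexer(labels)
rows = np.arange(len(df))

# Pick one value per row: element (i, cols[i]) of the underlying array.
sel = pd.DataFrame({'sel': df.to_numpy()[rows, cols]}, index=df.index)
print(sel)
```

`get_indexer` returns -1 for a label that is not a column, so real data may need a check before indexing.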

How to prevent pandas dataframe from adding double quotes around #tmp when using sqlalchemy and sybase?

Question: I have reduced the issue to pandas to_sql adding double quotes around #tmp when dealing with Sybase, using SQLAlchemy as the pooling framework.

Code:

    def get_data_with_tmp():
        engine = get_connection("sybase")
        with engine.connect() as conn:
            df = pd.DataFrame({'alias_id': ['345402KP5', '3454014R1']})
            df.to_sql(name='#tmp', con=conn, schema=None, if_exists='append', index=False)
            df = pd.read_sql_query("SELECT alias_id from #tmp", con=conn)

Error:

    statement = '\nCREATE TABLE "#tmp" (\n\talias_id

Python pandas splitting text and numbers in dataframe

Question: I have a dataframe df1 with the column name Acc Number as the first column, and the data looks like:

    Acc Number
    ASC100.1
    MJT122
    ASC120.4
    XTY111

I need to make a new dataframe df2 that will have two columns, the first holding the text part and the second holding the numbers, so the desired output is:

    Text Number
    ASC  100.1
    MJT  122
    ASC  120.4
    XTY  111

How would I go about doing this? Thanks!

Answer 1: You could do something like this:

    import pandas as pd
    data = ['ASC100.1', 'MJT122', 'ASC120.4', 'XTY111']
    df = pd
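The answer's code is truncated in this excerpt, so the following is a separate minimal sketch rather than a reconstruction of it. It assumes every value is a run of letters followed by a number with an optional decimal part, and uses `Series.str.extract` with named groups to split the two.

```python
import pandas as pd

df1 = pd.DataFrame({'Acc Number': ['ASC100.1', 'MJT122', 'ASC120.4', 'XTY111']})

# Named groups become the column names of the extracted frame.
# Assumed pattern: letters first, then digits with an optional decimal part.
pattern = r'^(?P<Text>[A-Za-z]+)(?P<Number>\d+(?:\.\d+)?)$'
df2 = df1['Acc Number'].str.extract(pattern)

# The extracted pieces are strings; convert the numeric part explicitly.
df2['Number'] = pd.to_numeric(df2['Number'])
print(df2)
```

Values that do not match the pattern come back as NaN in both columns, which makes malformed account numbers easy to spot.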

R: converting fractions into decimals in a data frame

Question: I am trying to convert a data frame of numbers stored as characters in fraction form so that they are stored as numbers in decimal form. (There are also some integers, also stored as char.) I want to keep the current structure of the data frame, i.e. I do not want a list as a result. Example data frame (note: the real data frame has all elements as character; here they are factors, but I couldn't figure out how to replicate a data frame with characters):

    a <- c("1","1/2","2")
    b <- c("5/2","3","7/2")
    c <-