pandas | 易学教程

WRITE only first N rows from pandas df to csv

Question: How can I write only the first N rows, or rows P to Q, of a pandas DataFrame to CSV without subsetting the df first? I cannot subset the data I want to export because of memory issues. I am thinking of a function that writes to CSV row by row. Thank you.

Answer 1: Use head, which returns the first n rows. Example: import pandas as pd import numpy as np date = pd.date_range('20190101', periods=6) df = pd.DataFrame(np.random.randn(6, 4), index=date, columns=list('ABCD')) # write only the top two rows into the CSV file print
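The question asks for a way to write rows P to Q without building a large intermediate subset. A minimal sketch of one possible approach is below: it slices the frame positionally in small chunks and writes each chunk to an already open file handle. The function name `write_rows` and its parameters are hypothetical and not part of pandas; only `DataFrame.iloc` and `DataFrame.to_csv` are library calls.

```python
import io

import numpy as np
import pandas as pd


def write_rows(df, handle, start=0, stop=None, chunk_size=1000):
    """Write rows start..stop (positional, stop exclusive) of df to an open
    text handle, one small chunk at a time. Hypothetical helper."""
    stop = len(df) if stop is None else min(stop, len(df))
    for begin in range(start, stop, chunk_size):
        end = min(begin + chunk_size, stop)
        # Only the first chunk writes the header row.
        df.iloc[begin:end].to_csv(handle, header=(begin == start))


# Same sample frame as in the answer above.
date = pd.date_range("20190101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=date, columns=list("ABCD"))

buf = io.StringIO()  # stands in for open("out.csv", "w", newline="")
write_rows(df, buf, start=1, stop=4, chunk_size=2)
print(buf.getvalue())
```

For the simple "first N rows" case, `df.head(n).to_csv(...)` as suggested in the answer is usually enough; the chunked version only matters when even the selected range is too large to copy at once.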

Python pandas dataframe fill NaN with other Series

Question: I want to fill NaN values in a DataFrame (df) column (var4) based on a control table (fillna_mean), using its column mean and var1 as the index. In the DataFrame I want them to match on var1. I have tried doing this with fillna, but I can't get it to work all the way. How do I do this in a smart way, using df.var1 as the index matching fillna_mean.var1? df: df = pd.DataFrame({'var1' : list('a' * 3) + list('b' * 2) + list('c' * 4) + list('d' * 3) ,'var2' : [i for i in range(12)] ,'var3' : list(np.random
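One way to match on var1 might look like the sketch below: turn the control table into a Series indexed by var1, map it onto df.var1, and pass the aligned result to fillna. The column names `var1`, `var4` and `mean` come from the question; the sample values are made up for illustration because the original data is truncated.

```python
import numpy as np
import pandas as pd

# Made-up sample data; only the column names follow the question.
df = pd.DataFrame({
    "var1": list("aabbc"),
    "var4": [1.0, np.nan, np.nan, 4.0, np.nan],
})
fillna_mean = pd.DataFrame({"var1": ["a", "b", "c"], "mean": [10.0, 20.0, 30.0]})

# Series of replacement values keyed by var1.
lookup = fillna_mean.set_index("var1")["mean"]

# map() aligns the replacement values row by row; fillna() only touches NaNs.
df["var4"] = df["var4"].fillna(df["var1"].map(lookup))
print(df)
```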

Numpy Where Changing Timestamps/Datetime to Integers

Question: Not so much a question as something that is puzzling me. I have a column of dates that looks something like this: 0 NaT 1 1996-04-01 2 2000-03-01 3 NaT 4 NaT 5 NaT 6 NaT 7 NaT 8 NaT I'd like to convert the NaTs to a static value. (Assume I imported pandas as pd and numpy as np.) If I do: mydata['mynewdate'] = mydata.mydate.replace( np.NaN, pd.datetime(1994,6,30,0,0)) all is well, and I get: 0 1994-06-30 1 1996-04-01 2 2000-03-01 3 1994-06-30 4 1994-06-30 5 1994-06-30 6 1994-06-30 7 1994-06-30 8 1994
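The rest of the question is cut off, so the exact `np.where` call is not shown. As a sketch of how the replacement could be done while staying inside pandas, `Series.fillna` or `Series.where` with a `pd.Timestamp` should keep the datetime dtype. The shortened sample column below is an assumption based on the values shown in the question; `pd.Timestamp` is used because `pd.datetime` is no longer available in recent pandas releases.

```python
import pandas as pd

# Shortened version of the column from the question.
mydata = pd.DataFrame(
    {"mydate": pd.to_datetime([None, "1996-04-01", "2000-03-01", None])}
)
default = pd.Timestamp(1994, 6, 30)

# Option 1: fill the missing values directly.
mydata["mynewdate"] = mydata["mydate"].fillna(default)

# Option 2: keep values where the condition holds, otherwise use the default.
mydata["mynewdate2"] = mydata["mydate"].where(mydata["mydate"].notna(), default)

print(mydata.dtypes)
```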

get_dummies(), Exception: Data must be 1-dimensional

Question: I have this data, and I am trying to apply this to it: one_hot = pd.get_dummies(df) But I get this error: Here is my code up until then: # Import modules import pandas as pd import numpy as np import matplotlib.pyplot as plt from sklearn import tree df = pd.read_csv('AllMSAData.csv') df.head() corr_matrix = df.corr() corr_matrix df.describe() # Get features and targets labels = np.array(df['CurAV']) # Remove the labels from the features # axis 1 refers to the columns df = df.drop('CurAV', axis = 1) #
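The snippet does not show the data or the cause of the error, so the following is only a sketch of a typical `pd.get_dummies` call on a small made-up frame. The column names `Region` and `Area` are hypothetical. It also lists duplicated column names, which is one thing that may be worth checking when `get_dummies` complains about dimensionality; whether that applies to the original data is an assumption.

```python
import pandas as pd

# Made-up frame; the real AllMSAData.csv is not available.
df = pd.DataFrame({
    "Region": ["East", "West", "East"],
    "Area": [10.0, 12.5, 9.0],
})

# Diagnostic: duplicated column names make df[name] return a DataFrame
# instead of a 1-dimensional Series.
dupes = df.columns[df.columns.duplicated()].tolist()
print("duplicated columns:", dupes)

# One-hot encode only the named categorical column.
one_hot = pd.get_dummies(df, columns=["Region"])
print(one_hot)
```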

Pandas Split column into multiple columns by multiple string delimiters

Question: I have a dataframe: id info 1 Name: John Age: 12 Sex: Male 2 Name: Sara Age: 22 Sex: Female 3 Name: Mac Donald Age: 32 Sex: Male I'm looking to split the info column into 3 columns so that I get this final output: id Name Age Sex 1 John 12 Male 2 Sara 22 Female 3 Mac Donald 32 Male I tried using the pandas split function: df[['Name','Age','Sex']] = df.info.split(['Name']) I might have to do this multiple times to get the desired output. Is there a better way to achieve this? PS: The info column
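A possible single-pass alternative to repeated splitting is `Series.str.extract` with named groups, sketched below on the rows from the question. Note that `df.info` refers to the `DataFrame.info` method, so the column is accessed as `df["info"]`. The regular expression assumes every row has exactly the `Name: ... Age: ... Sex: ...` layout shown; the rest of the question is truncated, so other layouts are not handled.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "info": [
        "Name: John Age: 12 Sex: Male",
        "Name: Sara Age: 22 Sex: Female",
        "Name: Mac Donald Age: 32 Sex: Male",
    ],
})

# Named groups become the column names of the extracted frame.
pattern = r"Name:\s*(?P<Name>.+?)\s+Age:\s*(?P<Age>\d+)\s+Sex:\s*(?P<Sex>\w+)"
parts = df["info"].str.extract(pattern)

result = pd.concat([df[["id"]], parts], axis=1)
result["Age"] = result["Age"].astype(int)
print(result)
```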

AWS Lambda, Python, Numpy and others as Layers

Question: I have been going at this for a while, trying to get Python, numpy and pytz added to AWS Lambda as Layers rather than having to zip them and throw them at AWS with my .py file. I followed multiple tutorials, and all of them failed. I have resorted to following this guide if I am to go with pandas, numpy or pytz for any functionality (AWS Lambda with Pandas and NumPy - Ruslan Korniichuk - Medium). So this is good, but I do not want to have to recreate a zip each time if things change with my

How to extract dollar amount from pandas DataFrame column

Question: I would like to get dollar amounts from hundreds of rows in a column, and then save the amounts in a new column. The dollar amount varies in each row, like $100.01, $1,000.05, 10,000, 100,000 etc. One of the lines looks like this: Approving the settlement claim of Mr. X Y by payment in the amount of $120,000.65 I tried to do something like this, but it's not extracting the dollar amount: df['amount'] = df['description'].str.extract('/(\$[0-9]+(\.[0-9]{2})?)/', expand=True) Please help. Answer 1:
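The answer text is cut off here. As a hedged sketch of one fix: the attempted pattern is wrapped in `/.../` delimiters, which Python treats as literal slashes, and it does not allow thousands separators or a missing dollar sign. The pattern below is an assumption about what counts as an amount (an optional `$`, digits with optional comma groups, optional cents), and it takes the first number in each row, so it could pick up an unrelated number that appears earlier in the text. The second and third sample rows are made up.

```python
import pandas as pd

df = pd.DataFrame({"description": [
    "Approving the settlement claim of Mr. X Y by payment in the amount of $120,000.65",
    "Payment of $100.01 approved",        # made-up row
    "Settlement for 10,000 authorized",   # made-up row
]})

# Optional "$", digits, optional ",ddd" groups, optional ".dd" cents.
pattern = r"(\$?\d+(?:,\d{3})*(?:\.\d{2})?)"
df["amount"] = df["description"].str.extract(pattern, expand=False)

# Numeric version: drop "$" and "," before converting.
df["amount_value"] = (
    df["amount"].str.replace(r"[$,]", "", regex=True).astype(float)
)
print(df[["amount", "amount_value"]])
```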

pandas - apply function to current row against all other rows

Question: I am using pandas to create a dataframe that looks like this: ratings = pandas.DataFrame({ 'article_a':[1,1,0,0], 'article_b':[1,0,0,0], 'article_c':[1,0,0,0], 'article_d':[0,0,0,1], 'article_e':[0,0,0,1] },index=['Alice','Bob','Carol','Dave']) From this input I would like to compute another matrix that compares each row against all other rows. Let's assume, for example, that the computation is a function that finds the length of the intersection set. I'd like an output DataFrame with
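For the intersection-length example specifically, and assuming every entry is 0 or 1 as in the sample, the pairwise result can be sketched as a matrix product of the frame with its transpose: each cell counts the articles that both people rated. This is one possible approach for this particular function, not a general row-against-row mechanism.

```python
import pandas as pd

ratings = pd.DataFrame({
    "article_a": [1, 1, 0, 0],
    "article_b": [1, 0, 0, 0],
    "article_c": [1, 0, 0, 0],
    "article_d": [0, 0, 0, 1],
    "article_e": [0, 0, 0, 1],
}, index=["Alice", "Bob", "Carol", "Dave"])

# Cell (i, j) = number of articles where both row i and row j have a 1.
overlap = ratings.dot(ratings.T)
print(overlap)
```

The diagonal holds each person's own rating count, so it can be ignored or masked if only comparisons with other rows are needed.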