pandas | 易学教程

pandas extract regex allowing mismatches

Question: Pandas has a very fast and nice string method, extract(). This method works perfectly with a regex such as this one:

strict_pattern = r"^(?P<pre_spacer>ACGAG)(?P<UMI>.{9,13})(?P<post_spacer>TGGAGTCT)"

test_df:

    R1
21  ACGAGTTTTCGTATTTTTGGAGTCTTGTGG
22  ACGAGTAGGGAGGGGGGTGGAGTCTCAGCG
23  ACGAGGGGGGGGAGGCTGGAGTCTCCGGGT
24  ACGAGAATAACGTTTGGTGGAGTCTACCAC
25  ACGAGGGGAATAAATATTGGAGTCTCCTCC
26  ACGAGATTGGGTATGCTGGAGTCTCTGTTC
27  ACGAGGTACCCGCGCCATGGAGTCTCTCTG
28  ACGAGTGGTTTTTGTCGTGGAGTCTCACCA
29
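
Series.str.extract compiles its pattern with Python's standard re module, which has no fuzzy matching. One workaround, sketched here under that assumption, is to expand each fixed spacer into an alternation in which every position in turn is replaced by ".", so up to one substitution is tolerated. The helper one_mismatch and the second and third reads are hypothetical; only substitutions are handled, not insertions or deletions.

```python
import re

import pandas as pd


def one_mismatch(seq: str) -> str:
    """Hypothetical helper: regex matching `seq` with at most one substituted character."""
    # One alternative per position, with that position replaced by the wildcard ".".
    variants = [re.escape(seq[:i]) + "." + re.escape(seq[i + 1:]) for i in range(len(seq))]
    return "(?:" + "|".join(variants) + ")"


fuzzy_pattern = (
    "^(?P<pre_spacer>" + one_mismatch("ACGAG") + ")"
    + "(?P<UMI>.{9,13})"
    + "(?P<post_spacer>" + one_mismatch("TGGAGTCT") + ")"
)

df = pd.DataFrame({"R1": [
    "ACGAGTTTTCGTATTTTTGGAGTCTTGTGG",  # row 21 from the question (exact spacers)
    "ACGTGTTTTCGTATTTTTGGAGTCTTGTGG",  # hypothetical: one substitution in the pre-spacer
    "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT",  # hypothetical: no spacer at all
]})

# Non-matching rows come back as NaN in every group column.
out = df["R1"].str.extract(fuzzy_pattern)
print(out)
```

Because .{9,13} is greedy, a longer UMI followed by a one-mismatch post-spacer would be preferred over a shorter UMI with an exact one; the lazy form .{9,13}? flips that preference. The third-party regex package offers fuzzy quantifiers such as {s<=1}, but it would have to be applied row by row (for example with Series.apply) rather than through str.extract.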

Problem with data format while importing a pandas DF from Python into Google Sheets using df2gspread

Question: I'm using df2gspread to import a pandas DataFrame into Google Sheets. The process runs without any issues, but the numeric data that I'd like to manipulate within Google Sheets is imported as text. Basic math operations on the values stored as text work, but when I try to use Sheets functions such as SUM, AVERAGE and pretty much anything else, the value returned is always zero. Also, if I try to manually convert the text into numbers within Google Sheets itself, it doesn't have any
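
The question is cut off, so the root cause isn't confirmed. One thing worth ruling out is that the numeric columns are already strings (object dtype) in pandas before the upload, which can end up as text in the sheet. A minimal sketch of coercing such a column with pd.to_numeric, using a hypothetical frame; it says nothing about how df2gspread itself sends the values.

```python
import pandas as pd

# Hypothetical frame in which the numbers arrived as strings (object dtype).
df = pd.DataFrame({"item": ["a", "b", "c"], "value": ["10", "2.5", "n/a"]})
print(df.dtypes)  # "value" is object

# Convert to a real numeric dtype; unparseable entries such as "n/a" become NaN.
df["value"] = pd.to_numeric(df["value"], errors="coerce")
print(df.dtypes)  # "value" is now float64
```

If the dtypes are already numeric before uploading, the conversion to text is happening in the upload step rather than in pandas.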

How to create a new column if a value matches one from a list (something like get_dummies)

Question: I have a df like:

text
hello how are you
hello people
hello stackoverflow

and a list like this:

words = ["Hello","people", "stackoverflow"]

Expected output:

text                 Hello  people  stackoverflow
hello how are you    1      0       0
hello people         1      1       0
hello stackoverflow  1      0       1

Answer 1: Use Series.str.get_dummies with DataFrame.reindex to filter the columns by the list (the values have to be lowercase to match), and finally DataFrame.join to join the result to the original:

words = ["hello","people", "stackoverflow"]
df1 = df.join(df['text'].str.get
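
A runnable sketch of the approach described in Answer 1, built from the sample data in the question. str.get_dummies(sep=" ") splits each string on spaces, so it only matches whole words. As an addition not in the answer, the text is lowercased first and the columns are renamed back to the original spelling of words, so the output carries the expected Hello column.

```python
import pandas as pd

df = pd.DataFrame({"text": ["hello how are you", "hello people", "hello stackoverflow"]})
words = ["Hello", "people", "stackoverflow"]
lower = [w.lower() for w in words]

# One 0/1 column per space-separated token, then keep only the listed words
# (fill_value=0 covers words that never occur in the text).
dummies = (
    df["text"].str.lower()
    .str.get_dummies(sep=" ")
    .reindex(columns=lower, fill_value=0)
)
dummies.columns = words  # restore the original capitalisation, e.g. "Hello"

df1 = df.join(dummies)
print(df1)
```

Punctuation attached to a word (for example "people!") would produce a different token and not match; stripping it beforehand with str.replace is one option.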

How do I divide elements in a single column of a Python DataFrame?

Question: I need to divide every element in a specific column of a Pandas DataFrame by 100. By default, the .div() function in Pandas divides all elements across all columns, and attempting to specify the columns to divide leaves me with only those columns.

d = {'SYMBOL': ['AAAAA', 'BBBBB', 'CCCCC'], 'ASSETS': [5, 21, 74]}
data = pd.DataFrame(d, columns=['SYMBOL', 'ASSETS'])
data = data['ASSETS'].div(100)

So, starting with

0  AAAAA   5
1  BBBBB  21
2  CCCCC  74

I end up getting

0    0.05
1    0.21
2    0.74

When I want

0 AAAAA
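
In the snippet above, data is rebound to the Series returned by data['ASSETS'].div(100), which is why only one column survives. A minimal fix, reusing d from the question, is to assign the divided Series back to the column:

```python
import pandas as pd

d = {"SYMBOL": ["AAAAA", "BBBBB", "CCCCC"], "ASSETS": [5, 21, 74]}
data = pd.DataFrame(d, columns=["SYMBOL", "ASSETS"])

# Overwrite only the ASSETS column; SYMBOL is left untouched.
data["ASSETS"] = data["ASSETS"].div(100)
print(data)
```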

Pandas read csv where one header is missing

Question: I am trying to read a CSV file with Pandas, but the first column contains a first name and a last name separated by a comma. This causes Pandas to think that there are 5 columns instead of 4, so the last column has no header and cannot be selected. The file looks like this:

CustomerName,ClientID,EmailDate,EmailAddress
FNAME1,LNAME1,100,2019-01-13 00:00:00.000,FNAME1@HOTMAIL.COM
FNAME2,LNAME2,100,2019-01-13 00:00:00.000,FNAME2@GMAIL.COM
FNAME3,LNAME3,100,2019-01-13 00:00:00.000
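
One workaround, assuming every data row has exactly five fields (first name, last name, then the other three columns), is to skip the four-field header, supply five names explicitly and rebuild CustomerName afterwards. The names FirstName and LastName are hypothetical, and the sample uses the first two rows from the question.

```python
import io

import pandas as pd

raw = """CustomerName,ClientID,EmailDate,EmailAddress
FNAME1,LNAME1,100,2019-01-13 00:00:00.000,FNAME1@HOTMAIL.COM
FNAME2,LNAME2,100,2019-01-13 00:00:00.000,FNAME2@GMAIL.COM
"""

df = pd.read_csv(
    io.StringIO(raw),
    skiprows=1,   # skip the original 4-field header line
    header=None,  # we provide the column names ourselves
    names=["FirstName", "LastName", "ClientID", "EmailDate", "EmailAddress"],
)

# Merge the two name parts back into a single CustomerName column at the front.
df.insert(0, "CustomerName", df.pop("FirstName") + "," + df.pop("LastName"))
print(df)
```

If you control the export, quoting the name field ("FNAME1,LNAME1") avoids the problem entirely, since read_csv honours double quotes by default.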

How to filter a dataframe by splitting the categories of a column into sets?

Question: I have a dataframe:

Prop_ID  Unit_ID  Prop_Usage                Unit_Usage
1        1        RESIDENTIAL               RESIDENTIAL
1        2        RESIDENTIAL               COMMERCIAL
1        3        RESIDENTIAL               INDUSTRIAL
1        4        RESIDENTIAL               RESIDENTIAL
2        1        COMMERCIAL                RESIDENTIAL
2        2        COMMERCIAL                COMMERCIAL
2        3        COMMERCIAL                COMMERCIAL
3        1        INDUSTRIAL                INDUSTRIAL
3        2        INDUSTRIAL                COMMERCIAL
4        1        RESIDENTIAL - COMMERCIAL  RESIDENTIAL
4        2        RESIDENTIAL - COMMERCIAL  COMMERCIAL
4        3        RESIDENTIAL - COMMERCIAL  INDUSTRIAL
5        1        COMMERCIAL / RESIDENTIAL  RESIDENTIAL
5        2        COMMERCIAL / RESIDENTIAL
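
The excerpt stops before the filtering rule is stated. Assuming the goal is to keep only units whose Unit_Usage appears among the usages listed in Prop_Usage (treating " - " and " / " as separators), a minimal sketch splits each Prop_Usage into a set and tests membership row by row, using rows taken from the table above:

```python
import pandas as pd

df = pd.DataFrame({
    "Prop_ID": [1, 1, 1, 4, 4, 4],
    "Unit_ID": [1, 2, 3, 1, 2, 3],
    "Prop_Usage": ["RESIDENTIAL"] * 3 + ["RESIDENTIAL - COMMERCIAL"] * 3,
    "Unit_Usage": ["RESIDENTIAL", "COMMERCIAL", "INDUSTRIAL"] * 2,
})

# Split compound categories such as "RESIDENTIAL - COMMERCIAL" or
# "COMMERCIAL / RESIDENTIAL" into a set of individual usages.
prop_sets = df["Prop_Usage"].str.split(r"\s*[-/]\s*", regex=True).apply(set)

# Assumed rule: keep a unit only if its usage is allowed by its property.
mask = pd.Series(
    [unit in allowed for unit, allowed in zip(df["Unit_Usage"], prop_sets)],
    index=df.index,
)
filtered = df[mask]
print(filtered)
```

The regex=True argument to str.split requires pandas 1.4 or later; on older versions a multi-character pattern is already treated as a regex.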

How do I subdivide/refine a dimension in an xarray Dataset?

Question: Summary: I have a dataset that is collected in such a way that the dimensions are not initially available. I would like to take what is essentially a big block of undifferentiated data and add dimensions to it so that it can be queried, subsetted, etc. That is the core of the following question.

Here is an xarray Dataset that I have:

<xarray.Dataset>
Dimensions:  (chain: 1, draw: 2000, rows: 24000)
Coordinates:
  * chain    (chain) int64 0
  * draw     (draw) int64 0 1 2 3 4 5 6 7 ... 1993 1994 1995 1996
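
A minimal sketch with a toy Dataset, assuming the flat rows dimension is a complete group x item product stored group-major (6 rows = 2 groups of 3 here). The variable y and the dimensions group and item are hypothetical; the idea is to attach the two labels as coordinates along rows, combine them into a MultiIndex with set_index, then unstack.

```python
import numpy as np
import xarray as xr

# Toy stand-in for the question's Dataset: chain x draw x rows.
ds = xr.Dataset(
    {"y": (("chain", "draw", "rows"), np.arange(1 * 3 * 6).reshape(1, 3, 6))},
    coords={"chain": [0], "draw": [0, 1, 2]},
)

n_groups, n_items = 2, 3                          # hypothetical: 6 rows = 2 x 3
group = np.repeat(np.arange(n_groups), n_items)   # [0 0 0 1 1 1]
item = np.tile(np.arange(n_items), n_groups)      # [0 1 2 0 1 2]

refined = (
    ds.assign_coords(group=("rows", group), item=("rows", item))  # label each row
    .set_index(rows=["group", "item"])                           # MultiIndex on rows
    .unstack("rows")                                             # rows -> group, item
)
print(refined)
```

After unstacking, rows is replaced by group and item, so selections such as refined.y.sel(group=1) work directly.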

Pandas: Using Append Adds New Column and Makes Another All NaN

Question: I just started learning pandas a week or so ago, and I've been struggling with a pandas DataFrame for a while now. My data looks like this:

State  NY   CA   Other  Total
Year
2003   450   50     25    525
2004   300   75      5    380
2005   500  100    100    700
2006   250   50    100    400

I made this table from a dataset that included 30 or so values for the variable I'm representing as State here. If a value wasn't NY or CA (in the example), I summed it into an 'Other' category. The years here were made from a normalized list
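
The excerpt ends before the append call itself, so the cause can't be confirmed. One common way to get a brand-new column plus NaNs is appending a row whose labels don't exactly match the existing column labels, because pandas aligns on labels. DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so this sketch uses pd.concat; the 2007 row and its values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"NY": [450, 300, 500, 250], "CA": [50, 75, 100, 50], "Other": [25, 5, 100, 100]},
    index=pd.Index([2003, 2004, 2005, 2006], name="Year"),
)
df.columns.name = "State"
df["Total"] = df.sum(axis=1)

# Hypothetical new year whose labels match the existing columns: aligns cleanly.
new_row = pd.DataFrame(
    {"NY": [400], "CA": [60], "Other": [40]}, index=pd.Index([2007], name="Year")
)
new_row["Total"] = new_row.sum(axis=1)
ok = pd.concat([df, new_row])

# Same row with one mislabelled key ("ny"): pandas adds an extra "ny" column
# (NaN for 2003-2006) and leaves "NY" as NaN for 2007.
bad_row = pd.DataFrame(
    {"ny": [400], "CA": [60], "Other": [40], "Total": [500]},
    index=pd.Index([2007], name="Year"),
)
bad = pd.concat([df, bad_row])
print(bad)
```

Checking list(new_row.columns) against list(df.columns) before concatenating is a quick way to spot such mismatches.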