pandas

pandas extract regex allowing mismatches

人盡茶涼 submitted on 2021-02-17 03:30:16
Question: Pandas has a very fast and convenient string method, extract(). It works perfectly with a regex such as this one:

    strict_pattern = r"^(?P<pre_spacer>ACGAG)(?P<UMI>.{9,13})(?P<post_spacer>TGGAGTCT)"

    test_df
        R1
    21  ACGAGTTTTCGTATTTTTGGAGTCTTGTGG
    22  ACGAGTAGGGAGGGGGGTGGAGTCTCAGCG
    23  ACGAGGGGGGGGAGGCTGGAGTCTCCGGGT
    24  ACGAGAATAACGTTTGGTGGAGTCTACCAC
    25  ACGAGGGGAATAAATATTGGAGTCTCCTCC
    26  ACGAGATTGGGTATGCTGGAGTCTCTGTTC
    27  ACGAGGTACCCGCGCCATGGAGTCTCTCTG
    28  ACGAGTGGTTTTTGTCGTGGAGTCTCACCA
    29
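The strict case described above can be sketched as follows. This is a minimal illustration, not the asker's full code: it applies `strict_pattern` with `Series.str.extract` to two of the reads shown in the excerpt, plus one made-up read at index 99 (an assumption added for illustration) that does not start with the spacer. It does not cover mismatch tolerance, which the truncated excerpt never reaches.

```python
import pandas as pd

# Pattern copied from the question: three named groups become three columns.
strict_pattern = r"^(?P<pre_spacer>ACGAG)(?P<UMI>.{9,13})(?P<post_spacer>TGGAGTCT)"

test_df = pd.DataFrame(
    {
        "R1": [
            "ACGAGTTTTCGTATTTTTGGAGTCTTGTGG",  # read 21 from the excerpt
            "ACGAGTAGGGAGGGGGGTGGAGTCTCAGCG",  # read 22 from the excerpt
            "TTTTTTTTTCGTATTTTTGGAGTCTTGTGG",  # hypothetical read with a wrong prefix
        ]
    },
    index=[21, 22, 99],
)

# str.extract returns one column per named group; rows that do not match are NaN.
extracted = test_df["R1"].str.extract(strict_pattern)
print(extracted)
```

Because the pattern is anchored with `^` and the spacers are literal, a single wrong base in either spacer makes the whole row come back as NaN, which is presumably why the title asks about allowing mismatches.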

Problem with data format while Importing pandas DF from python into google sheets using df2gsheets

北城余情 submitted on 2021-02-17 03:26:16
Question: I'm using df2gspread to import a pandas DataFrame into Google Sheets. The process runs without any issues, but the numeric data that I'd like to manipulate within Google Sheets is imported as text. Basic math operations on the text values work, but Sheets functions such as SUM, AVERAGE and pretty much anything else always return zero. Also, if I try to manually convert text into numbers within gsheet itself, it doesn't have any

How to create a new column if some value match from a list (something like get dummies)

青春壹個敷衍的年華 submitted on 2021-02-17 03:22:21
Question: I have a df like:

    text
    hello how are you
    hello people
    hello stackoverflow

and a list like this:

    words = ["Hello","people", "stackoverflow"]

Expected output:

    text                 Hello  people  stackoverflow
    hello how are you    1      0       0
    hello people         1      1       0
    hello stackoverflow  1      0       1

Answer 1: Use Series.str.get_dummies with DataFrame.reindex to filter the columns by the list (the values have to be lowercase to match), and finally DataFrame.join to attach the result to the original:

    words = ["hello","people", "stackoverflow"]
    df1 = df.join(df['text'].str.get
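The answer above is cut off mid-expression, so the following is only a sketch of the approach it names (`Series.str.get_dummies`, then `DataFrame.reindex`, then `DataFrame.join`). The `sep=" "` argument and `fill_value=0` are assumptions about how the pieces fit together, not a reconstruction of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {"text": ["hello how are you", "hello people", "hello stackoverflow"]}
)

# Lowercase, as the answer notes, so the list matches the tokens in the text.
words = ["hello", "people", "stackoverflow"]

# One indicator column per space-separated token (assumed separator).
dummies = df["text"].str.get_dummies(sep=" ")

# Keep only the listed words; a word absent from every row becomes a column of 0.
dummies = dummies.reindex(columns=words, fill_value=0)

# Attach the indicator columns to the original frame.
df1 = df.join(dummies)
print(df1)
```

Reindexing rather than selecting with `dummies[words]` avoids a `KeyError` when a listed word never appears in the text.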

How do I divide elements in a single column of a python dataframe?

孤街醉人 submitted on 2021-02-17 03:01:52
Question: I need to divide every element in a specific column of a Pandas DataFrame by 100. By default, the .div() function in Pandas divides all elements across all columns, and attempting to specify which columns to divide leaves me with only those columns.

    d = { 'SYMBOL':['AAAAA','BBBBB','CCCCC'], 'ASSETS':[5, 21, 74]}
    data = pd.DataFrame(d,columns=['SYMBOL','ASSETS'])
    data = data['ASSETS'].div(100)

So, starting with

    0  AAAAA   5
    1  BBBBB  21
    2  CCCCC  74

I end up getting

    0  0.05
    1  0.21
    2  0.74

When I want 0 AAAAA
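One common way to get the result the question asks for is sketched below. The excerpt is truncated before any answer, so this is not the accepted answer; it simply assigns the divided Series back to the same column instead of overwriting the whole DataFrame.

```python
import pandas as pd

d = {"SYMBOL": ["AAAAA", "BBBBB", "CCCCC"], "ASSETS": [5, 21, 74]}
data = pd.DataFrame(d, columns=["SYMBOL", "ASSETS"])

# Assign the result to the column; `data = data['ASSETS'].div(100)` would
# replace the DataFrame with a Series and drop SYMBOL.
data["ASSETS"] = data["ASSETS"].div(100)
print(data)
```

The same effect can be had with `data["ASSETS"] /= 100`; `.div()` is mainly useful when its `fill_value` or `axis` parameters are needed.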

Pandas read csv where one header is missing

主宰稳场 submitted on 2021-02-17 02:44:26
Question: I am trying to read a CSV file with Pandas, but the first column contains a first name and a last name separated by a comma. This makes Pandas think there are 5 columns instead of 4, so the last column has no header and cannot be selected. The file looks like this:

    CustomerName,ClientID,EmailDate,EmailAddress
    FNAME1,LNAME1,100,2019-01-13 00:00:00.000,FNAME1@HOTMAIL.COM
    FNAME2,LNAME2,100,2019-01-13 00:00:00.000,FNAME2@GMAIL.COM
    FNAME3,LNAME3,100,2019-01-13 00:00:00.000
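One possible workaround, sketched under assumptions: replace the four-name header with five explicit names when reading. The column names `FirstName` and `LastName` are hypothetical, and only the two complete rows from the excerpt are used, read from an in-memory string rather than a file.

```python
import io
import pandas as pd

# The header has 4 names, but every data row has 5 comma-separated fields.
raw = """CustomerName,ClientID,EmailDate,EmailAddress
FNAME1,LNAME1,100,2019-01-13 00:00:00.000,FNAME1@HOTMAIL.COM
FNAME2,LNAME2,100,2019-01-13 00:00:00.000,FNAME2@GMAIL.COM
"""

# Hypothetical names: CustomerName is split into first and last name.
cols = ["FirstName", "LastName", "ClientID", "EmailDate", "EmailAddress"]

# header=0 tells pandas the first line is a header to discard;
# names= supplies the replacement column names.
df = pd.read_csv(io.StringIO(raw), header=0, names=cols)
print(df)
```

If a single name column is wanted afterwards, the two parts can be rejoined, for example with `df["FirstName"] + "," + df["LastName"]`.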

How to filter dataframe by splitting categories of a columns into sets?

夙愿已清 submitted on 2021-02-17 02:06:26
Question: I have a dataframe:

    Prop_ID  Unit_ID  Prop_Usage                Unit_Usage
    1        1        RESIDENTIAL               RESIDENTIAL
    1        2        RESIDENTIAL               COMMERCIAL
    1        3        RESIDENTIAL               INDUSTRIAL
    1        4        RESIDENTIAL               RESIDENTIAL
    2        1        COMMERCIAL                RESIDENTIAL
    2        2        COMMERCIAL                COMMERCIAL
    2        3        COMMERCIAL                COMMERCIAL
    3        1        INDUSTRIAL                INDUSTRIAL
    3        2        INDUSTRIAL                COMMERCIAL
    4        1        RESIDENTIAL - COMMERCIAL  RESIDENTIAL
    4        2        RESIDENTIAL - COMMERCIAL  COMMERCIAL
    4        3        RESIDENTIAL - COMMERCIAL  INDUSTRIAL
    5        1        COMMERCIAL / RESIDENTIAL  RESIDENTIAL
    5        2        COMMERCIAL / RESIDENTIAL
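The excerpt ends before the question states its filtering rule, so the sketch below rests on one assumed reading of the title: split each `Prop_Usage` value on `-` or `/` into a set of categories, then keep the rows whose `Unit_Usage` is in that set. The rule, the separator regex, and the six-row subset of the data are all assumptions made for illustration.

```python
import re
import pandas as pd

# A subset of the rows shown in the excerpt.
df = pd.DataFrame(
    {
        "Prop_ID": [1, 1, 2, 4, 4, 5],
        "Unit_ID": [1, 2, 1, 1, 3, 1],
        "Prop_Usage": [
            "RESIDENTIAL",
            "RESIDENTIAL",
            "COMMERCIAL",
            "RESIDENTIAL - COMMERCIAL",
            "RESIDENTIAL - COMMERCIAL",
            "COMMERCIAL / RESIDENTIAL",
        ],
        "Unit_Usage": [
            "RESIDENTIAL",
            "COMMERCIAL",
            "RESIDENTIAL",
            "RESIDENTIAL",
            "INDUSTRIAL",
            "RESIDENTIAL",
        ],
    }
)

# Split "A - B" or "A / B" into the set {"A", "B"} (assumed separators).
usage_sets = df["Prop_Usage"].apply(lambda s: set(re.split(r"\s*[-/]\s*", s)))

# Assumed rule: keep a row when its unit usage is one of the property's usages.
mask = [unit in allowed for unit, allowed in zip(df["Unit_Usage"], usage_sets)]
filtered = df[mask]
print(filtered)
```

If the intended rule is the opposite (rows where the unit usage conflicts with the property usage), the mask can be inverted with `[not m for m in mask]`.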

How do I subdivide/refine a dimension in an xarray DataSet?

社会主义新天地 submitted on 2021-02-17 01:59:33
Question: Summary: I have a dataset that is collected in such a way that the dimensions are not initially available. I would like to take what is essentially a big block of undifferentiated data and add dimensions to it so that it can be queried, subsetted, etc. That is the core of the following question. Here is an xarray Dataset that I have:

    <xarray.Dataset>
    Dimensions:  (chain: 1, draw: 2000, rows: 24000)
    Coordinates:
      * chain    (chain) int64 0
      * draw     (draw) int64 0 1 2 3 4 5 6 7 ... 1993 1994 1995 1996

Pandas: Using Append Adds New Column and Makes Another All NaN

谁说胖子不能爱 submitted on 2021-02-17 01:56:52
Question: I started learning pandas about a week ago and have been struggling with a DataFrame for a while now. My data looks like this:

    State  NY   CA   Other  Total
    Year
    2003   450  50   25     525
    2004   300  75   5      380
    2005   500  100  100    700
    2006   250  50   100    400

I made this table from a dataset that included 30 or so values for the variable I'm representing as State here. If they weren't NY or CA, in the example, I summed them and put them in an 'Other' category. The years here were made from a normalized list