dataframe | 易学教程

Python: Extracting XML to DataFrame (Pandas)

Question: I have an XML file that looks like this: <?xml version="1.0" encoding="utf-8"?> <comments> <row Id="1" PostId="2" Score="0" Text="(...)" CreationDate="2011-08-30T21:15:28.063" UserId="16" /> <row Id="2" PostId="17" Score="1" Text="(...)" CreationDate="2011-08-30T21:24:56.573" UserId="27" /> <row Id="3" PostId="26" Score="0" Text="(...)" UserId="9" /> </comments> What I'm trying to do is extract the Id, Text and CreationDate columns into a pandas DataFrame, and I've tried the following: import xml.etree
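The excerpt above is cut off before the asker's own attempt, so the following is only a minimal sketch of one way to approach it, not the original code. It assumes the attribute names shown in the sample XML and uses a hypothetical helper, `rows_to_records`. Attributes missing from a row (such as `CreationDate` on the third row) come back as `None`.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question (Text values are placeholders there too).
XML_TEXT = """<?xml version="1.0" encoding="utf-8"?>
<comments>
<row Id="1" PostId="2" Score="0" Text="(...)" CreationDate="2011-08-30T21:15:28.063" UserId="16" />
<row Id="2" PostId="17" Score="1" Text="(...)" CreationDate="2011-08-30T21:24:56.573" UserId="27" />
<row Id="3" PostId="26" Score="0" Text="(...)" UserId="9" />
</comments>"""


def rows_to_records(xml_bytes, columns=("Id", "Text", "CreationDate")):
    """Hypothetical helper: one dict per <row>, keeping only the wanted attributes."""
    root = ET.fromstring(xml_bytes)
    # Element.get returns None when the attribute is absent.
    return [{name: row.get(name) for name in columns} for row in root.iter("row")]


records = rows_to_records(XML_TEXT.encode("utf-8"))
# pandas.DataFrame(records) accepts a list of dicts like this one,
# turning each key into a column and each None into a missing value.
```

All values are still strings at this point; converting `Id` to integers or `CreationDate` to timestamps would be a separate step.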

How to multiply every column of one Pandas Dataframe with every column of another Dataframe efficiently?

Question: I'm trying to multiply two pandas dataframes with each other. Specifically, I want to multiply every column of one dataframe by every column of the other. The dataframes are one-hot encoded, so they look like this: col_1, col_2, col_3, ... 0 1 0 1 0 0 0 0 1 ... I could just iterate through each of the columns using a for loop, but in Python that is computationally expensive, and I'm hoping there's an easier way. One of the dataframes has 500 columns, the other has 100 columns. This is the fastest version
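The asker's "fastest version" is not visible in the excerpt. As a rough illustration of the general idea, the sketch below uses NumPy broadcasting to form every pairwise column product in one step. The function name, the toy frames, and the `left*right` column naming are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def pairwise_column_products(left, right):
    """Hypothetical helper: multiply each column of `left` by each column of `right`."""
    a = left.to_numpy()
    b = right.to_numpy()
    # Broadcasting yields an array of shape (rows, left_cols, right_cols).
    products = a[:, :, None] * b[:, None, :]
    # Flattening in C order gives left-major, right-minor column ordering.
    names = [f"{l}*{r}" for l in left.columns for r in right.columns]
    return pd.DataFrame(
        products.reshape(len(left), -1), columns=names, index=left.index
    )


# Small one-hot style frames standing in for the 500- and 100-column ones.
left = pd.DataFrame({"a_1": [0, 1, 0], "a_2": [1, 0, 1]})
right = pd.DataFrame({"b_1": [1, 1, 0], "b_2": [0, 0, 1], "b_3": [1, 0, 0]})
result = pairwise_column_products(left, right)
```

With 500 and 100 input columns the result has 50,000 columns, so memory grows quickly with the row count; a compact integer or boolean dtype may help there.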

Module 'rpy2.robjects.pandas2ri' has no attribute 'ri2py'

Question: I'm trying to convert an R data frame to a Python pandas DataFrame. I use the following code: from rpy2.robjects import pandas2ri pandas2ri.activate() r_dataframe = r_function(my_dataframe['Numbers']) print(r_dataframe) python_dataframe = pandas2ri.ri2py(r_dataframe) The above code works well in Jupyter Notebook (Anaconda). But if I run this code from a my_program.py file in the terminal, I get an error: :~$ python3 my_program.py Traceback (most recent call last): File "my_program.py", line

Python pandas dataframe: find max for each unique value of another column

Question: I have a large dataframe (from 500k to 1M rows) which contains, for example, these 3 numeric columns: ID, A, B. I want to filter the results to obtain a table like the one in the image below, where, for each unique value of column ID, I have the maximum and minimum value of A and B. How can I do this? EDIT: I have updated the image below to make it clearer: when I get the max or min from a column, I also need the associated data from the other columns. Answer 1: Sample data (note
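The answer's sample data is truncated in the excerpt, so the sketch below is not that answer. It is one possible approach using `groupby(...).idxmax()` and `idxmin()`, which return the row labels of the extreme values so that the whole row can be fetched. The toy values and the helper name `rows_at_extreme` are made up for illustration.

```python
import pandas as pd

# Toy data standing in for the 500k to 1M row frame described in the question.
df = pd.DataFrame(
    {"ID": [1, 1, 2, 2, 2], "A": [10, 30, 5, 7, 6], "B": [3, 1, 9, 8, 2]}
)


def rows_at_extreme(frame, key, column, how="max"):
    """Hypothetical helper: per group of `key`, return the full row where `column` is max or min."""
    grouped = frame.groupby(key)[column]
    # idxmax/idxmin give the index label of the extreme row in each group.
    labels = grouped.idxmax() if how == "max" else grouped.idxmin()
    return frame.loc[labels].reset_index(drop=True)


max_a = rows_at_extreme(df, "ID", "A", "max")  # rows holding the largest A per ID
min_b = rows_at_extreme(df, "ID", "B", "min")  # rows holding the smallest B per ID
```

This relies on the frame having a unique index, and on ties it keeps only the first matching row per group.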

Date column coerced to numeric when indexing dataframe with [[ and a vector

Question: I am creating a data.frame with a column of type Date. When indexing the data frame with [[ and a numeric vector, the Date becomes a number. This is causing a problem when using purrr::pmap. Can anyone explain why this is happening, and is there a workaround? Example: x <- data.frame(d1 = lubridate::ymd(c("2018-01-01","2018-02-01"))) class(x$d1) # [1] "Date" x[[1]] # [1] "2018-01-01" "2018-02-01" x[[c(1, 1)]] # [1] 17532 Answer 1: Overview After reading why does unlist() kill dates in R and the

How to find a pattern in a string and extract it as a new column of data frame

Question: I have a data frame as shown below: c("3.2% 1ST $100000 AND 1.1% BALANCE", "3.3% 1ST $100000 AND 1.2% BALANCE AND $3000 BONUS FULL PRICE ONLY", "$4000", "3.3% 1ST $100000 AND 1.2% BALANCE", "3.3% 1ST $100000 AND 1.2% BALANCE", "3.2% 1ST $100000 1.1% BALANCE") [1] "3.2% 1ST $100000 AND 1.1% BALANCE" [2] "3.3% 1ST $100000 AND 1.2% BALANCE AND $3000 BONUS FULL PRICE ONLY" [3] "$4000" [4] "3.3% 1ST $100000 AND 1.2% BALANCE" [5] "3.3% 1ST $100000 AND 1.2% BALANCE" [6] "3.2% 1ST $100000 1.1%

How to apply a function to multiple columns to create multiple new columns in R?

Question: I have this list of sequences aqi_range and a dataframe df:

aqi_range = list(0:50,51:100,101:250)

df
   PM10_mean PM10_min PM10_max PM2.5_mean PM2.5_min PM2.5_max
1       85.6        3      264       75.7         3       240
2      105.         6      243       76.4         3       191
3       95.8       19      287       48.4         8       134
4       85.5       50      166       64.8        32       103
5       55.9       24      117       46.7        19        77
6       37.5        6      116       31.3         3        87
7       26          5       69       15.5         3        49
8       82.3       34      169       49.6        25       120
9      170         68      272      133          67       201
10     254        189      323      226         173       269

Now I've created these two pretty simple functions that I want to apply to this dataframe to

Could there be an easier way to use pandas read_clipboard to read a Series?

Question: Sometimes I want to use read_clipboard to read a Series, and I have to do: pd.Series(pd.read_clipboard(header=None).values[:,0]) It would be nice if there were an easier way. I can do it very easily for DataFrames, like: pd.read_clipboard() And that's it. But for a Series, it's a much longer one-liner. So is there an easier way that I don't know of? Any secret code? Answer 1: Copy this to clipboard: 1 2 3 Better would be to use squeeze=True as an argument. pd.read_clipboard(header=None,
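A note on the `squeeze=True` suggestion above: as far as I know, that keyword was deprecated and later dropped from the pandas readers, with `DataFrame.squeeze("columns")` as the usual substitute. Since the clipboard is not available in a script, the sketch below uses `read_csv` on an in-memory string as a stand-in (`read_clipboard` hands its text to `read_csv`); the sample text is an assumption matching the "1 2 3" example.

```python
import io

import pandas as pd

# Stand-in for clipboard contents: one value per line, no header.
text = "1\n2\n3\n"

frame = pd.read_csv(io.StringIO(text), header=None)
# A single-column DataFrame squeezed along the columns axis becomes a Series.
series = frame.squeeze("columns")
```

Under that assumption, the clipboard equivalent would be `pd.read_clipboard(header=None).squeeze("columns")`.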

Cannot replace special characters in a Python pandas dataframe

Question: I'm working with Python 3.5 on Windows. I have a dataframe where a 'titles' column of type str contains headline titles, some of which have special characters such as â, €, ˜. I am trying to replace these with a space '' using pandas.replace. I have tried various iterations and nothing works. I am able to replace regular characters, but these special characters just don't seem to work. The code runs without error, but the replacement simply does not occur, and instead the original title
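The characters â, €, ˜ together look like a UTF-8 curly quote that was decoded as Windows-1252, although the truncated excerpt does not confirm this. Under that assumption, the sketch below shows two plain-Python options with hypothetical helper names: undoing the mis-decoding, or replacing every run of non-ASCII characters with a space. The sample title is invented.

```python
import re


def repair_mojibake(text):
    """Hypothetical helper: undo UTF-8 text that was wrongly decoded as cp1252."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of mojibake; leave the text unchanged.
        return text


def strip_non_ascii(text, replacement=" "):
    """Hypothetical helper: replace each run of non-ASCII characters."""
    return re.sub(r"[^\x00-\x7F]+", replacement, text)


# Invented title; displays as: â€˜Breakingâ€™ news
title = "\u00e2\u20ac\u02dcBreaking\u00e2\u20ac\u2122 news"

repaired = repair_mojibake(title)   # curly quotes restored
stripped = strip_non_ascii(title)   # each mojibake run replaced by one space
```

On a pandas column, either helper could be applied with `df['titles'].map(...)`, or the same pattern could be passed to `Series.str.replace(..., regex=True)`. Two things worth checking in the original code: `DataFrame.replace` without `regex=True` only matches whole cell values, and it returns a new object rather than modifying the frame in place.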