dataframe

Python: Extracting XML to DataFrame (Pandas)

可紊 posted on 2021-02-07 14:21:22
Question: I have an XML file that looks like this: <?xml version="1.0" encoding="utf-8"?> <comments> <row Id="1" PostId="2" Score="0" Text="(...)" CreationDate="2011-08-30T21:15:28.063" UserId="16" /> <row Id="2" PostId="17" Score="1" Text="(...)" CreationDate="2011-08-30T21:24:56.573" UserId="27" /> <row Id="3" PostId="26" Score="0" Text="(...)" UserId="9" /> </comments> What I'm trying to do is extract the Id, Text and CreationDate columns into a pandas DataFrame, and I've tried the following: import xml.etree
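
The asker's own attempt is cut off above, so the following is only a minimal sketch of one way to do it with the standard library's xml.etree.ElementTree, not the code from the original post. The names XML, WANTED and rows are illustrative. A missing attribute (the third row has no CreationDate) comes back as None and ends up as NaT after the date conversion.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Sample document copied from the question.
XML = """<?xml version="1.0" encoding="utf-8"?>
<comments>
  <row Id="1" PostId="2" Score="0" Text="(...)" CreationDate="2011-08-30T21:15:28.063" UserId="16" />
  <row Id="2" PostId="17" Score="1" Text="(...)" CreationDate="2011-08-30T21:24:56.573" UserId="27" />
  <row Id="3" PostId="26" Score="0" Text="(...)" UserId="9" />
</comments>"""

# Attribute names to keep (illustrative constant, not from the post).
WANTED = ["Id", "Text", "CreationDate"]

# Parse from bytes so the encoding declaration is honoured.
root = ET.fromstring(XML.encode("utf-8"))

# One dict per <row>; Element.get returns None for a missing attribute.
rows = [{name: row.get(name) for name in WANTED} for row in root.iter("row")]

df = pd.DataFrame(rows, columns=WANTED)
df["Id"] = df["Id"].astype(int)
df["CreationDate"] = pd.to_datetime(df["CreationDate"])  # None becomes NaT

print(df)
```

For a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring.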

How to multiply every column of one Pandas Dataframe with every column of another Dataframe efficiently?

谁说胖子不能爱 posted on 2021-02-07 13:56:40
Question: I'm trying to multiply two pandas DataFrames with each other. Specifically, I want to multiply every column of one with every column of the other. The DataFrames are one-hot encoded, so they look like this: col_1, col_2, col_3, ... 0 1 0 1 0 0 0 0 1 ... I could just iterate through each of the columns using a for loop, but in Python that is computationally expensive, and I'm hoping there's an easier way. One of the DataFrames has 500 columns, the other has 100 columns. This is the fastest version
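
The asker's "fastest version" is truncated, so this is a hedged sketch of one common loop-free approach, NumPy broadcasting, rather than the method from the post. The function name pairwise_column_products and the "_x_" column naming are made up for illustration, and both frames are assumed to have the same number of rows.

```python
import pandas as pd


def pairwise_column_products(left, right):
    """Multiply every column of `left` with every column of `right`."""
    a = left.to_numpy()   # shape (n, p)
    b = right.to_numpy()  # shape (n, q)
    # Broadcasting: (n, p, 1) * (n, 1, q) -> (n, p, q)
    products = a[:, :, None] * b[:, None, :]
    # Column order matches the row-major reshape below: left outer, right inner.
    columns = [f"{l}_x_{r}" for l in left.columns for r in right.columns]
    return pd.DataFrame(
        products.reshape(len(left), -1), index=left.index, columns=columns
    )


left = pd.DataFrame({"a1": [0, 1, 0], "a2": [1, 0, 1]})
right = pd.DataFrame({"b1": [1, 1, 0], "b2": [0, 0, 1]})
result = pairwise_column_products(left, right)
print(result)
```

With 500 and 100 columns the result has 50,000 columns, so memory grows quickly with the row count; a small integer dtype or a sparse representation may be worth considering for one-hot data.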

Module 'rpy2.robjects.pandas2ri' has no attribute 'ri2py'

爷,独闯天下 posted on 2021-02-07 13:39:06
Question: I'm trying to convert an R data frame to a Python pandas DataFrame. I use the following code: from rpy2.robjects import pandas2ri pandas2ri.activate() r_dataframe = r_function(my_dataframe['Numbers']) print(r_dataframe) python_dataframe = pandas2ri.ri2py(r_dataframe) The above code works well in Jupyter Notebook (Anaconda). But if I run the same code from a my_program.py file in the terminal, I get an error: :~$ python3 my_program.py Traceback (most recent call last): File "my_program.py", line
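
The traceback is cut off, so the cause can only be guessed. One plausible explanation is that the notebook and the terminal use different rpy2 installations: rpy2 3.0 renamed ri2py to rpy2py, so code written against 2.x fails on 3.x with exactly this kind of AttributeError. The sketch below is a hypothetical helper (to_pandas is not part of rpy2) that tries the newer name first. It is exercised with stand-in objects so that it runs without rpy2 or R installed.

```python
from types import SimpleNamespace


def to_pandas(pandas2ri_module, r_object):
    """Call whichever R-to-pandas converter name the given module exposes.

    Hypothetical helper: 'rpy2py' is the rpy2 3.x name, 'ri2py' the 2.x name.
    """
    for name in ("rpy2py", "ri2py"):
        converter = getattr(pandas2ri_module, name, None)
        if converter is not None:
            return converter(r_object)
    raise AttributeError("no known R-to-pandas converter on this module")


# Stand-ins for the two API generations (assumptions for the demo only).
fake_rpy2_v2 = SimpleNamespace(ri2py=lambda obj: ("converted by ri2py", obj))
fake_rpy2_v3 = SimpleNamespace(rpy2py=lambda obj: ("converted by rpy2py", obj))

old_result = to_pandas(fake_rpy2_v2, "r_dataframe")
new_result = to_pandas(fake_rpy2_v3, "r_dataframe")
print(old_result, new_result)
```

In a real script the first argument would be the imported rpy2.robjects.pandas2ri module. Printing rpy2.__version__ in both environments is a quick way to check whether a version mismatch is actually the issue.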

Python pandas dataframe: find max for each unique values of an another column

倖福魔咒の posted on 2021-02-07 12:51:22
Question: I have a large DataFrame (from 500k to 1M rows) which contains, for example, these 3 numeric columns: ID, A, B. I want to filter the results in order to obtain a table like the one in the image below, where, for each unique value of column ID, I have the maximum and minimum value of A and B. How can I do this? EDIT: I have updated the image below to make it clearer: when I get the max or min from a column, I also need the data associated with it from the other columns. Answer 1: Sample data (note
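
The answer is truncated and the image is not available, so the exact target layout is unknown. As a hedged sketch of the idea in the EDIT (keep the other columns of the row where the extreme occurs), groupby with idxmax/idxmin returns the row labels of the extremes, and loc then retrieves the complete rows. The sample values and the helper name rows_at_extreme are invented, and the index is assumed to be unique.

```python
import pandas as pd

# Invented sample data with the three columns named in the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "A": [10, 30, 20, 5, 7, 9],
    "B": [3, 1, 2, 8, 6, 4],
})


def rows_at_extreme(frame, key, column, how):
    """Return, per group of `key`, the full row where `column` is max or min."""
    grouped = frame.groupby(key)[column]
    labels = grouped.idxmax() if how == "max" else grouped.idxmin()
    return frame.loc[labels].set_index(key)


max_a = rows_at_extreme(df, "ID", "A", "max")  # rows where A peaks, with their B
min_a = rows_at_extreme(df, "ID", "A", "min")  # rows where A bottoms out

# Side-by-side summary; the suffixes are just one possible naming.
summary = max_a.add_suffix("_at_max_A").join(min_a.add_suffix("_at_min_A"))
print(summary)
```

The same helper can be called with "B" as the column to get the rows at the extremes of B.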

Date column coerced to numeric when indexing dataframe with [[ and a vector

隐身守侯 posted on 2021-02-07 11:58:15
Question: I am creating a data.frame with a column of type Date. When indexing the data frame with [[ and a numeric vector, the Date becomes a number. This is causing a problem when using purrr::pmap. Can anyone explain why this is happening, and is there a workaround? Example: x <- data.frame(d1 = lubridate::ymd(c("2018-01-01","2018-02-01"))) class(x$d1) # [1] "Date" x[[1]] # [1] "2018-01-01" "2018-02-01" x[[c(1, 1)]] # [1] 17532 Answer 1: Overview After reading why does unlist() kill dates in R and the

How to find a pattern in a string and extract it as a new column of data frame

梦想与她 posted on 2021-02-07 10:47:19
Question: I have a data frame as shown below: c("3.2% 1ST $100000 AND 1.1% BALANCE", "3.3% 1ST $100000 AND 1.2% BALANCE AND $3000 BONUS FULL PRICE ONLY", "$4000", "3.3% 1ST $100000 AND 1.2% BALANCE", "3.3% 1ST $100000 AND 1.2% BALANCE", "3.2% 1ST $100000 1.1% BALANCE") [1] "3.2% 1ST $100000 AND 1.1% BALANCE" [2] "3.3% 1ST $100000 AND 1.2% BALANCE AND $3000 BONUS FULL PRICE ONLY" [3] "$4000" [4] "3.3% 1ST $100000 AND 1.2% BALANCE" [5] "3.3% 1ST $100000 AND 1.2% BALANCE" [6] "3.2% 1ST $100000 1.1%

How to apply a function to multiple columns to create multiple new columns in R?

我只是一个虾纸丫 posted on 2021-02-07 10:30:14
Question: I have this list of sequences, aqi_range, and a data frame, df: aqi_range = list(0:50,51:100,101:250)
df
   PM10_mean PM10_min PM10_max PM2.5_mean PM2.5_min PM2.5_max
 1      85.6        3      264       75.7         3       240
 2      105.        6      243       76.4         3       191
 3      95.8       19      287       48.4         8       134
 4      85.5       50      166       64.8        32       103
 5      55.9       24      117       46.7        19        77
 6      37.5        6      116       31.3         3        87
 7        26        5       69       15.5         3        49
 8      82.3       34      169       49.6        25       120
 9       170       68      272        133        67       201
10       254      189      323        226       173       269
Now I've created these two pretty simple functions that I want to apply to this data frame to

Could there be an easier way to use pandas read_clipboard to read a Series?

北战南征 posted on 2021-02-07 10:01:10
Question: Sometimes I want to use read_clipboard to read a Series, and I have to do: pd.Series(pd.read_clipboard(header=None).values[:,0]) It would be nice if there were an easier way. For DataFrames it is very easy: pd.read_clipboard() and that's it. But for a Series it is a much longer one-liner. So is there an easier way that I don't know about? Any secret code? Answer 1: Copy this to the clipboard: 1 2 3 Better would be to use squeeze=True as an argument. pd.read_clipboard(header=None,
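
A note on the squeeze=True suggestion, with a hedged sketch: read_clipboard forwards its keyword arguments to read_csv, and the squeeze keyword of read_csv was deprecated in pandas 1.4 and removed in 2.0. On current versions the equivalent is to call .squeeze("columns") on the result. The example reads from an in-memory string instead of the real clipboard so that it runs anywhere; clipboard_text is an illustrative stand-in.

```python
import io

import pandas as pd

# Stand-in for what would be on the clipboard after copying "1 2 3" as a column.
clipboard_text = "1\n2\n3\n"

# read_clipboard passes its text to read_csv, so read_csv is used here directly.
frame = pd.read_csv(io.StringIO(clipboard_text), header=None)

# Collapse the single-column DataFrame into a Series.
series = frame.squeeze("columns")
print(series)
```

With a real clipboard the corresponding call would be pd.read_clipboard(header=None).squeeze("columns").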

Cannot replace special characters in a Python pandas dataframe

我与影子孤独终老i posted on 2021-02-07 09:55:47
Question: I'm working with Python 3.5 on Windows. I have a DataFrame where a 'titles' column of type str contains headline titles, some of which have special characters such as â , € , ˜ . I am trying to replace these with a space '' using pandas.replace . I have tried various iterations and nothing works. I am able to replace regular characters, but these special characters just don't seem to work. The code runs without error, but the replacement simply does not occur, and instead the original title
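
The asker's code is not shown, so the actual cause is unknown. Two things commonly produce this symptom and are sketched here as assumptions: Series.replace without regex=True only matches whole cell values, and the result of a replacement has to be assigned back. The sample titles are invented, and the characters are written as Unicode escapes to keep the example independent of the file encoding.

```python
import pandas as pd

# The three characters from the question: â (U+00E2), € (U+20AC), ˜ (U+02DC).
BAD_CHARS = "\u00e2\u20ac\u02dc"
PATTERN = "[" + BAD_CHARS + "]"

# Invented sample data.
df = pd.DataFrame({"titles": [
    "Breaking \u00e2\u20ac\u02dcnews",
    "plain title",
    "price \u20ac up",
]})
original = df["titles"].copy()

# Whole-value matching: no cell equals "â" exactly, so nothing changes.
unchanged = df["titles"].replace("\u00e2", " ")

# Substring replacement with a regex character class, assigned back.
df["titles"] = df["titles"].str.replace(PATTERN, " ", regex=True)
print(df)
```

Sequences like these often come from UTF-8 text decoded with a Windows code page, in which case passing the right encoding when reading the file may remove the need for any replacement; that is a guess, since the question does not say how the data was loaded.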