dataframe

Parallelizing comparisons between two dataframes with multiprocessing

半世苍凉 submitted on 2021-02-10 15:57:06
Question: I have the following function, which compares the rows of two dataframes (data and ref) and returns the index of both rows if there's a match:

    def get_gene(row):
        m = (np.equal(row[0], ref.iloc[:,0].values)
             & np.greater_equal(row[2], ref.iloc[:,2].values)
             & np.less_equal(row[3], ref.iloc[:,3].values))
        return ref.index[m] if m.any() else None

Because this takes a long time (25 min for 1.6M rows in data versus 20K rows in ref), I tried to speed things up by…
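The following is a minimal sketch of the chunked multiprocessing approach the title asks about. It is not the asker's eventual code. Like get_gene, it assumes that column 0 holds a key and that columns 2 and 3 hold start/end positions. The names match_chunk and parallel_get_gene are hypothetical. Each worker receives one slice of data plus a pickled copy of ref, pulls ref's columns into NumPy arrays once per chunk, and returns one result per data row, so the output keeps data's row order.

```python
import numpy as np
import pandas as pd
from multiprocessing import Pool


def match_chunk(data_chunk, ref):
    """Apply the get_gene test to every row of one chunk of `data`.

    Assumed layout (as in get_gene): column 0 is a key, columns 2 and 3
    are start/end positions. A ref row matches when the keys are equal,
    row[2] >= ref start and row[3] <= ref end.
    """
    # Extract ref's columns once per chunk instead of once per row.
    key = ref.iloc[:, 0].to_numpy()
    start = ref.iloc[:, 2].to_numpy()
    end = ref.iloc[:, 3].to_numpy()
    out = []
    for row in data_chunk.itertuples(index=False):
        m = (row[0] == key) & (row[2] >= start) & (row[3] <= end)
        out.append(ref.index[m] if m.any() else None)
    return out


def parallel_get_gene(data, ref, n_procs=4):
    """Split `data` into n_procs row blocks and match them in parallel."""
    # Split positional row numbers, then slice the frame with iloc.
    blocks = np.array_split(np.arange(len(data)), n_procs)
    chunks = [data.iloc[b] for b in blocks]
    with Pool(n_procs) as pool:
        parts = pool.starmap(match_chunk, [(c, ref) for c in chunks])
    # Flatten the per-chunk lists back into data's original row order.
    return [res for part in parts for res in part]
```

Call parallel_get_gene(data, ref) from inside an `if __name__ == "__main__":` block. multiprocessing requires this guard when workers are started with spawn, which is the default on Windows and macOS. Parallelism only divides wall-clock time by roughly the number of cores. Each row still scans all of ref.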

Reading a CSV file into pandas works on Windows but not on Ubuntu

孤街浪徒 submitted on 2021-02-10 15:56:13
Question: I have written a script in Python on Windows and want to run it on my Raspberry Pi running Ubuntu. I am reading a CSV file whose lines are separated by newlines. To load the df, I use the following code: dfaux = pd.read_csv(r'/home/ubuntu/Downloads/data.csv', sep=';'). This loads a df with just one row. I have also tried adding the argument lineterminator='\n\t', which throws this error: ValueError: Only length-1 line terminators supported. On Windows I see the line breaks in the CSV file…
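One way to rule out line-ending differences between the Windows and Ubuntu copies is to normalize the line endings before pandas sees the text. The sketch below works under that assumption and does not confirm the cause for this particular file. read_csv_normalized is a hypothetical helper, and the file is assumed to be UTF-8.

```python
import io

import pandas as pd


def read_csv_normalized(path, sep=";", encoding="utf-8"):
    """Read a delimited file after normalizing CRLF and CR endings to LF."""
    # Read raw bytes so Python performs no newline translation of its own.
    with open(path, "rb") as fh:
        raw = fh.read()
    # Convert Windows (CRLF) first, then any remaining bare CR endings.
    text = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n").decode(encoding)
    return pd.read_csv(io.StringIO(text), sep=sep)
```

The lineterminator parameter accepts only a single character, which is why '\n\t' raised the ValueError. If the file turns out to use bare carriage returns, pd.read_csv(path, sep=';', lineterminator='\r') is the direct alternative with the default C engine.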

How to split a datatable dataframe into train and test datasets in Python

耗尽温柔 submitted on 2021-02-10 15:53:53
Question: I am using a datatable dataframe. How can I split it into train and test datasets? As I would with a pandas dataframe, I tried to use train_test_split(dt_df, classe) from sklearn.model_selection, but it doesn't work and I get an error.

    import datatable as dt
    import numpy as np
    from sklearn.model_selection import train_test_split

    dt_df = dt.fread(csv_file_path)
    classe = dt_df[:, "classe"]
    del dt_df[:, "classe"]
    X_train, X_test, y_train, y_test = train_test_split(dt_df, classe, test_size=test…
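The following is a minimal sketch that avoids passing a datatable Frame to scikit-learn at all. It shuffles the row positions with NumPy and then uses them as row selectors. split_indices is a hypothetical helper. The commented usage assumes that your datatable version accepts a Python list of integers as the row selector i in DT[i, :]. If memory allows, an alternative is to convert the Frame with dt_df.to_numpy() or dt_df.to_pandas() and then call train_test_split on the result.

```python
import numpy as np


def split_indices(n_rows, test_size=0.25, seed=0):
    """Return (train, test) lists of shuffled row positions.

    Hypothetical helper: the test set gets round(n_rows * test_size) rows.
    """
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_rows)
    n_test = int(round(n_rows * test_size))
    # Plain Python ints, so the lists can be used as row selectors.
    return perm[n_test:].tolist(), perm[:n_test].tolist()


# Usage with datatable (not executed here; assumes list row selectors):
# train_idx, test_idx = split_indices(dt_df.nrows, test_size=0.2)
# X_train, X_test = dt_df[train_idx, :], dt_df[test_idx, :]
# y_train, y_test = classe[train_idx, :], classe[test_idx, :]
```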

Python: scrape a table from a website?

故事扮演 submitted on 2021-02-10 15:53:16
Question: I'd like to scrape every Treasury yield rate available on the treasury.gov website: https://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/TextView.aspx?data=yieldAll How would I go about retrieving this information? I assume I'd have to use BeautifulSoup, Selenium, or something similar (preferably BS4). Eventually I'd like to put this data in a pandas DataFrame.

Answer 1: Here's one way you can grab the data in a table using requests and BeautifulSoup: import…
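The following sketches the parsing step with BeautifulSoup. It is independent of the truncated answer. table_to_dataframe is a hypothetical helper. It assumes the yields sit in an ordinary HTML table whose first row holds the column headers. Which table on the page to pick (table_index) is also an assumption, so check it against the live page. Fetching the page would look like html = requests.get(url, timeout=30).text.

```python
import pandas as pd
from bs4 import BeautifulSoup


def table_to_dataframe(html, table_index=0):
    """Parse one <table> into a DataFrame, using its first row as header."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find_all("table")[table_index]
    rows = []
    for tr in table.find_all("tr"):
        # Header and data cells are collected the same way, as plain text.
        cells = [c.get_text(strip=True) for c in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    header, body = rows[0], rows[1:]
    return pd.DataFrame(body, columns=header)
```

The cells come back as strings, so apply pd.to_numeric to the yield columns before doing any arithmetic on them.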

Pandas: Sort a MultiIndex DataFrame's multi-level column with mixed data types

牧云@^-^@ submitted on 2021-02-10 15:43:56
Question: Below is my dataframe:

    df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                       'D': [{"value": '126', "perc": None, "unit": None},
                             {"value": 324, "perc": None, "unit": None},
                             {"value": 'N/A', "perc": None, "unit": None},
                             {},
                             {"value": '100', "perc": None, "unit": None},
                             np.nan]})
    df.columns = pd.MultiIndex.from_product([df.columns, ['E']])

Printing df gives:

       A                                             D
       E                                             E
    0  1  {'value': '126', 'perc': None, 'unit': None}
    1  2   {'value': 324, 'perc': None, 'unit': None}
    2  3  {'value': 'N/A',…
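The following is a minimal sketch. It assumes the goal is to order rows by the 'value' stored in each dict of column ('D', 'E'). Anything that is not a number, such as 'N/A', an empty dict, or NaN, is treated as missing and placed last. sort_by_dict_value is a hypothetical helper. It coerces the mixed strings and ints with pd.to_numeric(errors='coerce') and then reorders df by the resulting index.

```python
import numpy as np
import pandas as pd


def sort_by_dict_value(df, col, ascending=True):
    """Sort df by the numeric 'value' inside the dicts of column `col`."""
    # Pull "value" out of each dict; non-dicts (e.g. NaN) and {} give None.
    values = df[col].map(lambda d: d.get("value") if isinstance(d, dict) else None)
    # '126' -> 126.0, 324 -> 324.0, 'N/A' and None -> NaN.
    numeric = pd.to_numeric(values, errors="coerce")
    # Stable sort with missing values last, then reorder the whole frame.
    order = numeric.sort_values(ascending=ascending, kind="mergesort",
                                na_position="last").index
    return df.loc[order]
```

For the frame above, sort_by_dict_value(df, ('D', 'E')) keeps the MultiIndex columns intact and only changes the row order.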

Expand nested dataframe into parent

时光总嘲笑我的痴心妄想 submitted on 2021-02-10 15:10:22
Question: I have a dataframe nested within a dataframe that I'm getting from MongoDB. Both have the same number of rows, so when viewed it looks like a typical dataframe. My question: how do I expand the nested dataframe into the parent so that I can run dplyr selects? See the layout below:

    'data.frame':	10 obs. of 2 variables:
     $ _id         : int 1551 1033 1061 1262 1032 1896 1080 1099 1679 1690
     $ personalInfo:'data.frame':	10 obs. of 2 variables:
      ..$ FirstName :List of 10
      .. ..$ : chr "Jack"
      .. ..$ : chr…