pandas

Parallelizing comparisons between two dataframes with multiprocessing

半世苍凉 submitted on 2021-02-10 15:57:06
Question: I've got the following function that lets me compare the rows of two dataframes (data and ref) and return the index of both rows if there's a match.

    def get_gene(row):
        m = (np.equal(row[0], ref.iloc[:,0].values)
             & np.greater_equal(row[2], ref.iloc[:,2].values)
             & np.less_equal(row[3], ref.iloc[:,3].values))
        return ref.index[m] if m.any() else None

Since this process takes time (25 min for 1.6M rows in data versus 20K rows in ref), I tried to speed things up by
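The question is cut off before the asker's own attempt, so the following is only a sketch of one way the row comparison could be spread over worker processes. It assumes the column layout implied by get_gene (column 0 is a key, columns 2 and 3 are interval bounds); the names match_chunk and parallel_match and the toy frames are hypothetical and not part of the original post.

```python
import multiprocessing as mp

import numpy as np
import pandas as pd


def match_chunk(task):
    """Apply the get_gene comparison to every row of one chunk of `data`."""
    chunk, ref = task
    # Pull the reference columns out once per chunk instead of once per row.
    ref_key = ref.iloc[:, 0].to_numpy()
    ref_low = ref.iloc[:, 2].to_numpy()
    ref_high = ref.iloc[:, 3].to_numpy()
    out = []
    rows = zip(chunk.index, chunk.iloc[:, 0], chunk.iloc[:, 2], chunk.iloc[:, 3])
    for idx, key, low, high in rows:
        m = (key == ref_key) & (low >= ref_low) & (high <= ref_high)
        out.append((idx, list(ref.index[m]) if m.any() else None))
    return out


def parallel_match(data, ref, n_workers=4):
    """Split `data` into n_workers row blocks and match each block separately."""
    bounds = np.linspace(0, len(data), n_workers + 1, dtype=int)
    tasks = [(data.iloc[a:b], ref) for a, b in zip(bounds[:-1], bounds[1:])]
    if n_workers == 1:
        # Serial fallback, handy for debugging and for checking results.
        results = [match_chunk(t) for t in tasks]
    else:
        with mp.Pool(n_workers) as pool:
            results = pool.map(match_chunk, tasks)
    return dict(pair for chunk_result in results for pair in chunk_result)


# Toy frames with made-up values, only to show the expected shapes.
ref = pd.DataFrame(
    [["chr1", "g1", 100, 200], ["chr1", "g2", 300, 400], ["chr2", "g3", 100, 200]],
    index=["a", "b", "c"],
)
data = pd.DataFrame(
    [["chr1", "x", 120, 150], ["chr1", "y", 250, 260], ["chr2", "z", 100, 200]]
)
result = parallel_match(data, ref, n_workers=1)
print(result)
```

With n_workers greater than 1, the call should sit under an `if __name__ == "__main__":` guard so that platforms which spawn rather than fork worker processes do not re-run it on import. Each task ships a copy of ref to a worker, which is acceptable for about 20K rows but worth revisiting for a larger reference table.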

Reading a CSV file to pandas works in windows, not in ubuntu

孤街浪徒 submitted on 2021-02-10 15:56:13
Question: I wrote a script in Python on Windows and want to run it on my Raspberry Pi with Ubuntu. I am reading a CSV file whose line separator is a newline. When I load the df I use the following code:

    dfaux = pd.read_csv(r'/home/ubuntu/Downloads/data.csv', sep=';')

which loads a df with just one row. I have also tried including the argument lineterminator = '\n\t', which throws this error message:

    ValueError: Only length-1 line terminators supported

In Windows I see the line breaks in the csv file
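Before guessing at a lineterminator value, it can help to look at the raw bytes and see which line ending the file really uses. The sketch below is a hypothetical helper (detect_line_terminator is not a pandas function) that counts the candidates in the first chunk of a file; the temporary demo file and its contents are made up for illustration.

```python
import os
import tempfile


def detect_line_terminator(path, sample_size=64 * 1024):
    """Return the most common line ending in the first bytes of a file.

    Returns "\r\n", "\n" or "\r", or None if no line ending is found.
    """
    with open(path, "rb") as fh:
        sample = fh.read(sample_size)
    crlf = sample.count(b"\r\n")
    # Subtract CRLF pairs so they are not also counted as bare LF or bare CR.
    lf = sample.count(b"\n") - crlf
    cr = sample.count(b"\r") - crlf
    counts = {"\r\n": crlf, "\n": lf, "\r": cr}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


# Demo on a throwaway file that uses bare carriage returns.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "data.csv")
    with open(demo_path, "wb") as fh:
        fh.write(b"a;b\r1;2\r3;4\r")
    detected = detect_line_terminator(demo_path)

print(repr(detected))
```

If the result is a single character, it can be passed to read_csv as lineterminator, which only accepts length-1 values, as the error message in the question says. Whether the line ending is actually the cause of the one-row result here is an assumption; the excerpt does not show the file.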

How to split datatable dataframe into train and test dataset in python

耗尽温柔 submitted on 2021-02-10 15:53:53
Question: I am using a datatable dataframe. How can I split it into train and test datasets? As I would with a pandas dataframe, I tried to use train_test_split(dt_df,classes) from sklearn.model_selection, but it doesn't work and I get an error.

    import datatable as dt
    import numpy as np
    from sklearn.model_selection import train_test_split

    dt_df = dt.fread(csv_file_path)
    classe = dt_df[:, "classe"]
    del dt_df[:, "classe"]
    X_train, X_test, y_train, y_test = train_test_split(dt_df, classe, test_size=test
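The excerpt ends before any answer, so this is just one possible approach, sketched under assumptions: instead of handing the Frame to train_test_split, shuffle the row positions with NumPy and select rows by position. The helper train_test_indices is hypothetical, and applying the indices to a datatable Frame (shown only in comments) assumes the Frame accepts a list of integers as its row selector.

```python
import numpy as np


def train_test_indices(n_rows, test_size=0.2, seed=0):
    """Return sorted (train_idx, test_idx) arrays of row positions."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)          # shuffled positions 0..n_rows-1
    n_test = int(round(n_rows * test_size))  # number of rows held out
    return np.sort(order[n_test:]), np.sort(order[:n_test])


train_idx, test_idx = train_test_indices(10, test_size=0.3, seed=1)
print(train_idx, test_idx)

# Assumed usage with the frames from the question (not executed here):
#   X_train = dt_df[train_idx.tolist(), :]
#   X_test  = dt_df[test_idx.tolist(), :]
#   y_train = classe[train_idx.tolist(), :]
#   y_test  = classe[test_idx.tolist(), :]
```

Another route would be to convert first, for example with the Frame's to_pandas() method, and then call train_test_split on the resulting pandas objects, at the cost of an extra copy in memory.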

Python scrape table from website?

故事扮演 submitted on 2021-02-10 15:53:16
Question: I'd like to scrape every treasury yield rate that is available on the treasury.gov website.

https://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/TextView.aspx?data=yieldAll

How would I go about collecting this information? I'm assuming that I'd have to use BeautifulSoup or Selenium or something like that (preferably BS4). I'd eventually like to put this data in a Pandas DataFrame.

Answer 1: Here's one way you can grab the data in a table using requests and beautifulsoup import
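The answer's code is cut off in this excerpt, so what follows is not the original answer but a minimal sketch of the general idea: parse the HTML with BeautifulSoup, collect the cell text row by row, and build a DataFrame. The function table_to_dataframe is hypothetical, and the inline HTML with its numbers is made up so the example runs without network access; the real page layout may differ.

```python
import pandas as pd
from bs4 import BeautifulSoup


def table_to_dataframe(html, table_index=0):
    """Turn the table_index-th <table> in an HTML string into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find_all("table")[table_index]
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    # Assumption: the first non-empty row holds the column headers.
    header, body = rows[0], rows[1:]
    return pd.DataFrame(body, columns=header)


# Made-up sample markup standing in for the downloaded page.
sample_html = """
<table>
  <tr><th>Date</th><th>1 mo</th><th>3 mo</th></tr>
  <tr><td>01/02/20</td><td>1.53</td><td>1.54</td></tr>
  <tr><td>01/03/20</td><td>1.52</td><td>1.52</td></tr>
</table>
"""
df = table_to_dataframe(sample_html)
print(df)
```

For the live page, the HTML string would come from something like requests.get(url).text, and the right table_index would have to be found by inspecting the page. All cells arrive as strings, so numeric columns still need converting, for example with pd.to_numeric.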

write unicode data to mssql with python?

前提是你 submitted on 2021-02-10 15:50:14
Question: I'm trying to write a table from a .csv file with Hebrew text in it to an SQL Server database. The table is valid and pandas reads the data correctly (it even displays the Hebrew properly in PyCharm), but when I try to write it to a table in the database I get question marks ("???") where the Hebrew should be. This is what I've tried, using pandas and sqlalchemy:

    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine('mssql+pymssql://server/test?charset=utf8')
    connection

Pandas: Sort a Multiindex Dataframe's multi-level column with mixed datatypes

牧云@^-^@ submitted on 2021-02-10 15:43:56
Question: Below is my dataframe:

    In [2804]: df = pd.DataFrame({'A':[1,2,3,4,5,6], 'D':[{"value": '126', "perc": None, "unit": None}, {"value": 324, "perc": None, "unit": None}, {"value": 'N/A', "perc": None, "unit": None}, {}, {"value": '100', "perc": None, "unit": None}, np.nan]})

    In [2794]: df.columns = pd.MultiIndex.from_product([df.columns, ['E']])

    In [2807]: df
    Out[2807]:
       A                                             D
       E                                             E
    0  1  {'value': '126', 'perc': None, 'unit': None}
    1  2    {'value': 324, 'perc': None, 'unit': None}
    2  3  {'value': 'N/A',
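The question text is truncated, so the intended ordering is not known. As one reading of the title, the sketch below sorts the frame by the numeric content of each dict's "value" entry and pushes anything that cannot be read as a number (such as 'N/A', an empty dict or NaN) to the end. The helper numeric_value is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "D": [
        {"value": "126", "perc": None, "unit": None},
        {"value": 324, "perc": None, "unit": None},
        {"value": "N/A", "perc": None, "unit": None},
        {},
        {"value": "100", "perc": None, "unit": None},
        np.nan,
    ],
})
df.columns = pd.MultiIndex.from_product([df.columns, ["E"]])


def numeric_value(cell):
    """Return the cell's 'value' as a float, or NaN if it has none."""
    if not isinstance(cell, dict):
        return np.nan
    try:
        return float(cell.get("value"))
    except (TypeError, ValueError):
        # Missing key (None) or a non-numeric string such as 'N/A'.
        return np.nan


# Build a plain numeric key, sort it stably, then reorder the original frame.
keys = df[("D", "E")].map(numeric_value)
order = keys.sort_values(kind="mergesort").index  # NaN keys go last
sorted_df = df.loc[order]
print(sorted_df)
```

The original mixed-type dicts stay untouched in the result; only the row order changes. A stable sort is used so that rows without a usable value keep their original relative order.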