dataframe | 易学教程

Parallelizing comparisons between two dataframes with multiprocessing


Question: I've got the following function, which compares the rows of two dataframes (data and ref) and returns the index of both rows if there's a match.

    def get_gene(row):
        m = (np.equal(row[0], ref.iloc[:, 0].values)
             & np.greater_equal(row[2], ref.iloc[:, 2].values)
             & np.less_equal(row[3], ref.iloc[:, 3].values))
        return ref.index[m] if m.any() else None

Because the process takes a long time (25 min for 1.6M rows in data versus 20K rows in ref), I tried to speed things up by
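The question is cut off before the asker's own attempt, so the following is only a minimal sketch of how the row comparison could be spread over worker processes with multiprocessing. The names match_chunk, parallel_match and n_workers are hypothetical and do not come from the original post. The sketch assumes, as in get_gene, that column 0 must be equal and that columns 2 and 3 of data must lie within columns 2 and 3 of ref.

```python
import multiprocessing as mp

import numpy as np
import pandas as pd


def match_chunk(chunk, ref_c0, ref_c2, ref_c3, ref_index):
    """For each row of `chunk`, return the matching `ref` index labels, or None."""
    results = []
    for c0, c2, c3 in zip(chunk.iloc[:, 0], chunk.iloc[:, 2], chunk.iloc[:, 3]):
        # Same test as get_gene: equal column 0, c2 >= ref column 2, c3 <= ref column 3
        m = (ref_c0 == c0) & (c2 >= ref_c2) & (c3 <= ref_c3)
        results.append(ref_index[m] if m.any() else None)
    return results


def parallel_match(data, ref, n_workers=4):
    """Split `data` by row position, match every piece in its own process."""
    ref_cols = (ref.iloc[:, 0].values, ref.iloc[:, 2].values,
                ref.iloc[:, 3].values, ref.index)
    # Row boundaries of n_workers roughly equal, contiguous pieces
    bounds = np.linspace(0, len(data), n_workers + 1, dtype=int)
    chunks = [data.iloc[a:b] for a, b in zip(bounds[:-1], bounds[1:])]
    with mp.Pool(n_workers) as pool:
        parts = pool.starmap(match_chunk, [(c, *ref_cols) for c in chunks])
    # Flatten back into one list, in the original row order of `data`
    return [r for part in parts for r in part]
```

On platforms that start workers by spawning a fresh interpreter, parallel_match(data, ref) should be called from inside an `if __name__ == "__main__":` block. Each worker receives its own copy of the ref columns, which is cheap for 20K rows but would matter for a much larger reference table.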


Reading a CSV file to pandas works in windows, not in ubuntu


Question: I wrote a script in Python on Windows and want to run it on my Raspberry Pi running Ubuntu. I am reading a CSV file whose line separator is a newline. To load the dataframe I use the following code:

    dfaux = pd.read_csv(r'/home/ubuntu/Downloads/data.csv', sep=';')

which loads a dataframe with just one row. I have also tried including the argument lineterminator = '\n\t', which throws this error message:

    ValueError: Only length-1 line terminators supported

In Windows I see the line breaks in the csv file
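The snippet does not show what the file actually contains, so the cause is not confirmed. One possibility, assumed here, is that the file uses line endings the parser does not split on. The sketch below converts every line-ending style to a plain newline before handing the text to pandas. The helper name read_csv_any_newline and the UTF-8 encoding are assumptions, not part of the original post.

```python
import io

import pandas as pd


def read_csv_any_newline(raw: bytes, sep: str = ";", encoding: str = "utf-8"):
    """Parse CSV bytes after converting \\r\\n and lone \\r line endings to \\n."""
    text = raw.decode(encoding)
    # Order matters: collapse \r\n first so it does not become two newlines
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return pd.read_csv(io.StringIO(text), sep=sep)
```

For a file on disk, the bytes could be obtained with open(path, "rb").read(). Looking at the first few hundred raw bytes with repr() is a quick way to check which separator characters are really present.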


How to split datatable dataframe into train and test dataset in python


Question: I am using a datatable dataframe. How can I split it into train and test datasets? As I would with a pandas dataframe, I tried to use train_test_split(dt_df,classes) from sklearn.model_selection, but it doesn't work and I get an error.

    import datatable as dt
    import numpy as np
    from sklearn.model_selection import train_test_split

    dt_df = dt.fread(csv_file_path)
    classe = dt_df[:, "classe"]
    del dt_df[:, "classe"]
    X_train, X_test, y_train, y_test = train_test_split(dt_df, classe, test_size=test
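One way around the incompatibility, sketched here under assumptions, is to split row positions rather than the frame itself and then select rows by position. The helper train_test_indices is hypothetical and uses only NumPy. Selecting rows of a datatable Frame with a list of integers is shown only in a comment and is assumed to work as in the datatable documentation.

```python
import numpy as np


def train_test_indices(n_rows, test_size=0.25, seed=0):
    """Return (train_idx, test_idx): disjoint, sorted lists of row positions."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)          # shuffled positions 0..n_rows-1
    n_test = int(round(n_rows * test_size))  # number of rows held out for testing
    test_idx = np.sort(order[:n_test]).tolist()
    train_idx = np.sort(order[n_test:]).tolist()
    return train_idx, test_idx


# Assumed usage with the frames from the question (not executed here):
# train_idx, test_idx = train_test_indices(dt_df.nrows, test_size=0.2)
# X_train, X_test = dt_df[train_idx, :], dt_df[test_idx, :]
# y_train, y_test = classe[train_idx, :], classe[test_idx, :]
```

Using the same index lists for the features and the labels keeps the two splits aligned row for row.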

Python scrape table from website?


Question: I'd like to scrape every treasury yield rate that is available on the treasury.gov website: https://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/TextView.aspx?data=yieldAll How would I go about retrieving this information? I'm assuming that I'd have to use BeautifulSoup or Selenium or something like that (preferably BS4). I'd eventually like to put this data in a Pandas DataFrame.

Answer 1: Here's one way you can grab the data in a table using requests and beautifulsoup

    import
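The answer's code is cut off, so it is not reproduced here. As a dependency-free stand-in for BeautifulSoup, the sketch below uses Python's standard html.parser module to collect the cells of an HTML table and load them into a DataFrame. The class TableRows and the function html_table_to_frame are hypothetical names. The sketch assumes the page contains a single table whose first row is the header.

```python
from html.parser import HTMLParser

import pandas as pd


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_frame(html):
    """Build a DataFrame from table HTML, treating the first row as the header."""
    parser = TableRows()
    parser.feed(html)
    header, *body = [row for row in parser.rows if row]
    return pd.DataFrame(body, columns=header)
```

The HTML string would typically come from something like requests.get(url).text. All cells arrive as strings, so numeric columns still need converting, for example with pd.to_numeric(..., errors="coerce").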



Pandas: Sort a Multiindex Dataframe's multi-level column with mixed datatypes


Question: Below is my dataframe:

    In [2804]: df = pd.DataFrame({'A':[1,2,3,4,5,6], 'D':[{"value": '126', "perc": None, "unit": None}, {"value": 324, "perc": None, "unit": None}, {"value": 'N/A', "perc": None, "unit": None}, {}, {"value": '100', "perc": None, "unit": None}, np.nan]})

    In [2794]: df.columns = pd.MultiIndex.from_product([df.columns, ['E']])

    In [2807]: df
    Out[2807]:
       A                                             D
       E                                             E
    0  1  {'value': '126', 'perc': None, 'unit': None}
    1  2    {'value': 324, 'perc': None, 'unit': None}
    2  3  {'value': 'N/A',
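The desired output is cut off in the snippet, so the goal is assumed here: sort the rows by the "value" stored in the dictionaries of column ('D', 'E'), treating anything non-numeric or missing as NaN and placing it last. The helper value_as_number is a hypothetical name. This is a sketch of one approach, not the answer from the original thread.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6],
    'D': [{"value": '126', "perc": None, "unit": None},
          {"value": 324, "perc": None, "unit": None},
          {"value": 'N/A', "perc": None, "unit": None},
          {},
          {"value": '100', "perc": None, "unit": None},
          np.nan],
})
df.columns = pd.MultiIndex.from_product([df.columns, ['E']])


def value_as_number(cell):
    """Return cell['value'] as a float, or NaN if it is missing or not numeric."""
    if not isinstance(cell, dict):
        return np.nan
    try:
        return float(cell.get("value"))
    except (TypeError, ValueError):
        return np.nan


# Numeric sort key built from the mixed-type column, aligned with df's index
key = df[("D", "E")].map(value_as_number)
sorted_df = df.loc[key.sort_values(na_position="last").index]
```

Building a separate numeric key leaves the original dictionaries untouched, so the mixed types in the column never have to be compared with each other.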

Expand nested dataframe into parent


Question: I have a dataframe nested within a dataframe that I'm getting from Mongo. The number of rows matches in each, so that when viewed it looks like a typical dataframe. My question: how do I expand the nested dataframe into the parent so that I can run dplyr selects? See the layout below:

    'data.frame': 10 obs. of 2 variables:
     $ _id         : int 1551 1033 1061 1262 1032 1896 1080 1099 1679 1690
     $ personalInfo:'data.frame': 10 obs. of 2 variables:
      ..$ FirstName :List of 10
      .. ..$ : chr "Jack"
      .. ..$ : chr