pandas

Error handling ñ in pandas

流过昼夜 submitted on 2021-02-08 07:23:27
Question: I am writing a script that reads a CSV file and uses the pandas library to create a pivot table. I keep receiving an error ('utf-8' codec can't decode byte 0xf1 in position 6: invalid continuation byte) that I have traced back to the use of 'ñ' in one of the names in the CSV file. I have searched for hours for a way to handle this. I have tried including the encoding type in my pandas.read_csv call and have had no luck. Here is my code: df = pandas.read_csv( os.path.join(wd,'Birthday_%s
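A minimal sketch, assuming the file was saved as Latin-1 (ISO-8859-1) or Windows-1252. In both of those encodings the single byte 0xF1 is "ñ", whereas in UTF-8 that byte can only start a multi-byte sequence, which is why the decoder reports an invalid continuation byte. The CSV content below is made up to reproduce the error in memory.

```python
import io

import pandas as pd

# Hypothetical CSV bytes: "Peña" encoded as Latin-1, where "ñ" is the single byte 0xF1.
raw = "name,birthday\nPeña,2001-05-04\n".encode("latin-1")

# With the default UTF-8 encoding, pandas fails on the 0xF1 byte.
try:
    pd.read_csv(io.BytesIO(raw))
except UnicodeDecodeError as exc:
    print("UTF-8 failed:", exc.reason)

# Declaring the real encoding lets pandas decode 0xF1 as "ñ".
df = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
print(df)
```

If you are unsure of the encoding, `encoding="cp1252"` is another common choice for files exported from Excel on Windows. As a last resort, `encoding_errors="replace"` (pandas 1.3+) lets the read finish by substituting undecodable bytes, at the cost of losing those characters.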

How to filter out positional data based on distance from a known reference trajectory?

假如想象 submitted on 2021-02-08 07:22:49
Question: I have an 87,288-point dataset that I need to filter. The fields to filter on are an X position and a Y position, given as latitude and longitude. Plotted, the data looks like this: (image not included) The problem is that I only need the data along a certain path, which is known in advance, something like this: (image not included) I already know how to filter data in a pandas DataFrame, but since the path is not linear, I need an effective strategy to clear out all the noisy data with a certain degree of precision (since the dataset is so
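One possible approach, sketched under assumptions: treat the known path as a polyline (an ordered list of vertices), compute each point's shortest distance to any segment of that polyline, and keep only the rows within a tolerance. The function and column names (`distance_to_polyline`, `filter_near_path`, `x`, `y`) are hypothetical. The geometry is planar, so raw latitude/longitude should first be projected (or at least have longitude scaled by cos(latitude)) before a tolerance expressed in metres is meaningful.

```python
import numpy as np
import pandas as pd


def distance_to_polyline(points: np.ndarray, path: np.ndarray) -> np.ndarray:
    """Shortest planar distance from each point (shape (n, 2)) to a polyline (shape (m, 2))."""
    a = path[:-1]                      # segment start vertices, shape (m-1, 2)
    ab = path[1:] - a                  # segment direction vectors, shape (m-1, 2)
    ab_len2 = (ab ** 2).sum(axis=1)    # squared segment lengths
    ab_len2 = np.where(ab_len2 == 0, 1.0, ab_len2)  # repeated vertices: avoid 0/0
    ap = points[:, None, :] - a[None, :, :]          # shape (n, m-1, 2)
    # Position of each point's projection along each segment, clamped to the segment.
    t = np.clip((ap * ab[None, :, :]).sum(axis=2) / ab_len2[None, :], 0.0, 1.0)
    closest = a[None, :, :] + t[..., None] * ab[None, :, :]
    dist = np.linalg.norm(points[:, None, :] - closest, axis=2)  # shape (n, m-1)
    return dist.min(axis=1)            # distance to the nearest segment


def filter_near_path(df: pd.DataFrame, path: np.ndarray, tol: float,
                     x_col: str = "x", y_col: str = "y") -> pd.DataFrame:
    """Keep only rows whose (x, y) lies within `tol` of the reference path."""
    pts = df[[x_col, y_col]].to_numpy(dtype=float)
    return df[distance_to_polyline(pts, path) <= tol]


# Toy example: an L-shaped reference path and four points.
path = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0]])
df = pd.DataFrame({"x": [5.0, 5.0, 9.5, 20.0], "y": [0.5, 5.0, 5.0, 20.0]})
near = filter_near_path(df, path, tol=1.0)   # keeps rows 0 and 2
```

The broadcasting builds arrays of shape (points, segments, 2), so with 87,288 points and a path with many vertices you may want to process the points in chunks; a spatial library such as shapely or geopandas is an alternative if it is already available.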

Using Pandas to calculate distance between coordinates from imported csv

耗尽温柔 submitted on 2021-02-08 07:21:52
Question: I am trying to import a .csv file that contains two columns of location data (lat/long), compute the distance between points, write the distance to a new column, loop the function over the next set of coordinates, and write the output data frame to a new .csv file. I have the following code written and it import pandas as pd import numpy as np pd.read_csv("input.csv") def dist_from_coordinates(lat1, lon1, lat2, lon2): R = 6371 # Earth radius in km #conversion to radians d_lat = np.radians(lat2-lat1) d
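A vectorized sketch, assuming the goal is the distance from each row to the previous row and that the columns are named `lat` and `lon` (hypothetical names). Shifting the columns by one row pairs every point with its predecessor, so no explicit loop is needed. The formula is the standard haversine great-circle distance with the same Earth radius (6371 km) as the question's code. Note also that the question's code calls `pd.read_csv("input.csv")` without storing the result; it needs to be assigned, e.g. `df = pd.read_csv("input.csv")`.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; works element-wise on scalars, arrays or Series."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    d_lat = lat2 - lat1
    d_lon = lon2 - lon1
    a = np.sin(d_lat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(d_lon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Stand-in for df = pd.read_csv("input.csv") with hypothetical "lat"/"lon" columns.
df = pd.DataFrame({"lat": [0.0, 0.0, 1.0], "lon": [0.0, 1.0, 1.0]})

# Distance from each row to the previous one; the first row has no predecessor (NaN).
df["dist_km"] = haversine_km(df["lat"].shift(), df["lon"].shift(), df["lat"], df["lon"])

# df.to_csv("output.csv", index=False)  # write the result to a new file
```

Each step in the toy data is one degree along the equator or a meridian, so both distances come out at about 111.2 km.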

Sample with different sample sizes per customer

人走茶凉 submitted on 2021-02-08 07:20:59
Question: I have a data frame like this:

   Customer  Day
0         A    1
1         A    1
2         A    1
3         A    2
4         B    3
5         B    4

and I want to sample from it, but with a different sample size for each customer. I have the size for each customer in another data frame, for example:

   Customer  Day
0         A    2
1         B    1

Suppose I want to sample per customer per day. So far I have this function: def sampling(frame,a): return np.random.choice(frame.Id,size=a) grouped = frame.groupby(['Customer','Day']) sampled = grouped.apply(sampling, a=??).reset
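A sketch that groups by customer, looks up each customer's sample size, and draws that many rows with `DataFrame.sample`. The column name `n_samples` in the size table is a hypothetical stand-in, since the question's second table reuses the header `Day`. Because `DataFrameGroupBy.sample` takes a single `n` for every group, the per-customer sizes are applied in a comprehension over the groups instead.

```python
import pandas as pd

# Data from the question.
frame = pd.DataFrame({"Customer": ["A", "A", "A", "A", "B", "B"],
                      "Day": [1, 1, 1, 2, 3, 4]})
# Per-customer sample sizes; "n_samples" is a hypothetical column name.
sizes = pd.DataFrame({"Customer": ["A", "B"], "n_samples": [2, 1]})

# Look-up table: customer -> number of rows to draw.
n_for = dict(zip(sizes["Customer"], sizes["n_samples"]))

# Sample each customer's rows separately (without replacement), then stack the pieces.
sampled = pd.concat(
    [group.sample(n=n_for[customer], random_state=0)
     for customer, group in frame.groupby("Customer")]
)
print(sampled)
```

This samples without replacement, whereas `np.random.choice` (used in the question) samples with replacement by default; pass `replace=True` to `sample` to match that. To sample per customer and day, group by `["Customer", "Day"]` and key the lookup on both.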

Rolling PCA on pandas dataframe

时光毁灭记忆、已成空白 submitted on 2021-02-08 06:59:24
Question: I'm wondering if anyone knows how to implement a rolling (moving-window) PCA on a pandas dataframe. I've looked around and found implementations in R and MATLAB but not in Python. Any help would be appreciated! This is not a duplicate: moving-window PCA is not the same as PCA on the entire dataframe. Please see pandas.DataFrame.rolling() if you do not understand the difference.

Answer 1: Unfortunately, pandas.DataFrame.rolling() seems to flatten the df before rolling, so it cannot be used as one
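Because a rolling `apply` works on one column at a time, one workaround is to loop over the windows explicitly and run a PCA on each block of rows. The sketch below uses NumPy's SVD rather than a PCA class from another library; the function name `rolling_pca` and its output column names are illustrative, not an existing pandas API.

```python
import numpy as np
import pandas as pd


def rolling_pca(df: pd.DataFrame, window: int, n_components: int = 1) -> pd.DataFrame:
    """Fit a PCA on each trailing window of `window` rows and return the
    explained-variance ratio of the first `n_components` components,
    indexed by the last row of each window.

    Assumes n_components <= min(window, number of columns).
    """
    values = df.to_numpy(dtype=float)
    ratios = []
    for end in range(window, len(values) + 1):
        block = values[end - window:end]
        centered = block - block.mean(axis=0)    # PCA operates on mean-centred data
        # Squared singular values are proportional to the component variances.
        var = np.linalg.svd(centered, compute_uv=False) ** 2
        total = var.sum()
        if total > 0:
            ratios.append(var[:n_components] / total)
        else:                                    # constant window: PCA is undefined
            ratios.append(np.full(n_components, np.nan))
    columns = [f"PC{i + 1}_explained" for i in range(n_components)]
    return pd.DataFrame(ratios, index=df.index[window - 1:], columns=columns)


# Toy data where y is an exact linear function of x, so PC1 explains everything.
x = np.arange(10, dtype=float)
demo = pd.DataFrame({"x": x, "y": 2 * x + 1})
result = rolling_pca(demo, window=4)
print(result)
```

The principal axes themselves (loadings) can be recovered by keeping `vt` from `np.linalg.svd(centered, full_matrices=False)`; the sign of each axis is arbitrary and may flip between windows, so align signs before comparing loadings over time.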

How to write a function in Python that translates each row of a csv to another language?

假如想象 submitted on 2021-02-08 06:56:27
Question: How do I write a function in Python that translates each row of a CSV file to another language and adds the translation as another column to the same CSV using pandas? My input file looks like this: (image not included) and I would like my output to look like this: (image not included) I started with this: from googletrans import Translator import pandas as pd data = pd.read_csv('~/file/my_file.csv')[['A','B']] df = pd.DataFrame(data, columns=['A','B','A_translation', 'B_translation']) and for translating a single sentence the
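A sketch that keeps the translation service pluggable: `translate` is any function that maps one string to its translation. How you build that function on top of googletrans depends on the version you have installed, since its API has changed between releases, so the example uses a dictionary lookup as a stand-in translator. The helper name `add_translations` is hypothetical. Each distinct cell value is translated only once, which reduces the number of requests when rows repeat.

```python
from typing import Callable, Iterable

import pandas as pd


def add_translations(df: pd.DataFrame, columns: Iterable[str],
                     translate: Callable[[str], str],
                     suffix: str = "_translation") -> pd.DataFrame:
    """Return a copy of df with a translated column added for each column in `columns`."""
    out = df.copy()
    for col in columns:
        # Translate each distinct string once; repeated cells reuse the cached result.
        lookup = {text: translate(text) for text in out[col].dropna().unique()}
        # Missing cells have no entry in the lookup, so map() leaves them as NaN.
        out[col + suffix] = out[col].map(lookup)
    return out


# Hypothetical stand-in for a real translator call.
fake_translate = {"hola": "hello", "adiós": "goodbye"}.get

df = pd.DataFrame({"A": ["hola", "adiós"], "B": ["hola", None]})
result = add_translations(df, ["A", "B"], fake_translate)
print(result)
```

Writing `result` back with `to_csv` produces the file with the extra `A_translation` and `B_translation` columns.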

How to use previous N values in pandas column to fill NaNs?

烈酒焚心 submitted on 2021-02-08 06:52:11
Question: Say I have time series data like the following:

df
   priceA  priceB
0   25.67   30.56
1   34.12   28.43
2   37.14   29.08
3     NaN   34.23
4   32.00     NaN
5   18.75   41.10
6     NaN   45.12
7   23.00   39.67
8     NaN   36.45
9   36.00     NaN

Now I want to fill the NaNs in column priceA with the mean of the previous N values in that column; in this case, N=3. For column priceB, I have to fill each NaN with the value M rows above (current index - M). I tried writing a for loop for this, which is not good practice because my data is very large. Is there a better way to do this? N=3
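A loop-free sketch on the question's data: `rolling(N).mean().shift(1)` gives the mean of the N rows above each row, `shift(M)` gives the value M rows above, and `fillna` uses those only where the original value is missing. The question does not state M, so `M = 2` here is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "priceA": [25.67, 34.12, 37.14, np.nan, 32, 18.75, np.nan, 23, np.nan, 36],
    "priceB": [30.56, 28.43, 29.08, 34.23, np.nan, 41.1, 45.12, 39.67, 36.45, np.nan],
})
N = 3  # window for priceA, from the question
M = 2  # lag for priceB; not given in the question, so this value is an assumption

# Mean of the N rows above each row: a window ending at the current row, shifted down by one.
prev_mean = df["priceA"].rolling(N, min_periods=1).mean().shift(1)
df["priceA"] = df["priceA"].fillna(prev_mean)

# Value M rows above each row.
df["priceB"] = df["priceB"].fillna(df["priceB"].shift(M))
print(df)
```

Two behaviours to be aware of. The rolling mean is computed on the original column, so a filled value does not feed into later windows: row 6 averages only 32 and 18.75, skipping the NaN in row 3. And `min_periods=1` accepts windows with fewer than N valid values; use `min_periods=N` to leave such rows as NaN instead. If earlier fills must feed into later ones, the dependency is genuinely sequential and needs a loop.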