pandas

Add a column value depending on a date range (if-else)

我只是一个虾纸丫 submitted on 2021-02-10 17:50:26
Question: I have a date column in my DataFrame and want to add a column called location. The value of location in each row should depend on which date range the date falls in. For example, 13 November falls between 12 November and 16 November, so the location should be Seattle; 17 November falls between 17 November and 18 November, so the location should be New York. Below is an example of the DataFrame I want to achieve: Dates | Location (column I want to add) ...
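One way to do this is a minimal sketch, assuming the ranges are known up front as (start, end, location) triples with inclusive endpoints. The year 2020 and the "Unknown" fallback are hypothetical, since the question gives neither. Each range becomes a boolean mask via Series.between, and numpy.select picks the first matching location per row.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the question gives no year, so 2020 is assumed here.
df = pd.DataFrame({"Dates": pd.to_datetime(["2020-11-13", "2020-11-17", "2020-11-20"])})

# (start, end, location) triples; Series.between includes both endpoints by default.
ranges = [
    ("2020-11-12", "2020-11-16", "Seattle"),
    ("2020-11-17", "2020-11-18", "New York"),
]

conditions = [df["Dates"].between(pd.Timestamp(start), pd.Timestamp(end)) for start, end, _ in ranges]
choices = [location for _, _, location in ranges]

# Rows that fall outside every range get the (hypothetical) default value.
df["Location"] = np.select(conditions, choices, default="Unknown")
print(df)
```

If ranges overlap, numpy.select uses the first condition that matches, so the order of the list matters.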

pandas/sqlalchemy/pyodbc: Result object does not return rows from stored proc when UPDATE statement appears before SELECT

人盡茶涼 submitted on 2021-02-10 17:44:43
Question: I'm using SQL Server 2014, pandas 0.23.4, sqlalchemy 1.2.11, pyodbc 4.0.24, and Python 3.7.0. I have a very simple stored procedure that performs an UPDATE on a table and then a SELECT on it:

CREATE PROCEDURE my_proc_1
    @v2 INT
AS
BEGIN
    UPDATE my_table_1 SET v2 = @v2;
    SELECT * from my_table_1;
END
GO

This runs fine in MS SQL Server Management Studio. However, when I try to invoke it via Python using this code:

import pandas as pd
from sqlalchemy import create_engine

if __name__ == "__main__"

Merge rows that have the same value in a column

大兔子大兔子 submitted on 2021-02-10 17:44:30
Question: I have a CSV file like this (parsed with pandas read_csv):

Filename  f1  f2   f3
1.jpg     1   0.2  0.3
1.jpg     0   0.8  0.7
2.jpg     1   0.3  0.2

How can I turn this dataset into a NumPy array that looks like this:

[ [[1,0.2,0.3],[0,0.8,0.7]], [[1,0.3,0.2]] ]

Answer 1: You can create nested lists with GroupBy.apply and a lambda function; DataFrame.set_index is used to avoid converting the Filename column into the lists:

df = pd.read_csv(file)
L = (df.set_index('Filename')
       .groupby('Filename')
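Since the answer's code is cut off, here is a self-contained sketch of the same idea. It uses a list comprehension over groupby instead of GroupBy.apply (a swapped-in technique, not the answer's exact code), with the sample CSV inlined through io.StringIO. Because the groups have different lengths, the result is naturally a nested Python list; recent NumPy versions refuse to build a regular array from such ragged lists unless dtype=object is passed.

```python
import io

import pandas as pd

# The question's sample data, inlined so the example is self-contained.
csv_text = """Filename,f1,f2,f3
1.jpg,1,0.2,0.3
1.jpg,0,0.8,0.7
2.jpg,1,0.3,0.2
"""
df = pd.read_csv(io.StringIO(csv_text))

# One list of feature rows per filename; sort=False keeps first-appearance order.
nested = [
    group[["f1", "f2", "f3"]].values.tolist()
    for _, group in df.groupby("Filename", sort=False)
]
print(nested)
```

Note that f1 comes back as a float (1.0 rather than 1) because each row is converted to a single float64 array.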

Python: No tables found matching pattern '.+'

流过昼夜 submitted on 2021-02-10 17:38:35
Question: I am trying to export this table as a CSV, covering all 7 pages of 100 rows each, from within a Python script, but I am running into the error shown below the script.

"http://www.nhl.com/stats/player?aggregate=1&gameType=2&report=points&pos=S&reportType=game&startDate=2017-10-19&endDate=2017-10-29&filter=gamesPlayed,gte,1&sort=points,goals"

import pandas as pd
dfs = pd.read_html('http://www.nhl.com/stats/player?aggregate=1&gameType=2&report=skatersummary&pos=S&reportType=game&startDate=2017-10-19&endDate

I want to assign 0/1 labels to a pandas DataFrame according to its columns

徘徊边缘 submitted on 2021-02-10 16:57:20
Question: I am fairly new to Python. I am trying to assign labels in a pandas DataFrame. This is what my DataFrame looks like:

final.head(3)
Match  Team1  Team2  winner
A      2      3      3
B      1      2      1
C      3      1      1

I want to create a new column that shows who won the match: if Team1 wins the game, the label should be 0, and if Team2 wins the game, the label should be 1. The expected outcome is:

Match  Team1  Team2  winner  label
A      2      3      3       1
B      1      2      1       0
C      3      1      1       1

Please tell me how I should proceed. Thanks in advance.

Answer 1: Your label
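A minimal sketch of one approach, rebuilding the question's three rows by hand: compare winner against Team2 element-wise and cast the boolean result to int, which gives 1 when Team2 won and 0 otherwise.

```python
import pandas as pd

final = pd.DataFrame({
    "Match": ["A", "B", "C"],
    "Team1": [2, 1, 3],
    "Team2": [3, 2, 1],
    "winner": [3, 1, 1],
})

# True (1) when the winner is Team2, False (0) otherwise.
final["label"] = (final["winner"] == final["Team2"]).astype(int)
print(final)
```

This assumes winner always equals one of the two teams; a row where it matches neither (for example, a draw encoded some other way) would silently get 0.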

ffill not filling data in pandas dataframe

萝らか妹 submitted on 2021-02-10 16:55:32
Question: I have a DataFrame like this: A B C E D --------------- 0 a r g g 1 x 2 x f f r 3 t 3 y

I am trying to forward-fill it with ffill, but it is not working:

cols = df.columns[:4].tolist()
df[cols] = df[cols].ffill()

I also tried:

df[cols] = df[cols].fillna(method='ffill')

but the values still do not get filled. Could the empty columns in the data be causing this issue? The data is mocked; the exact data is different (it contains strings, numbers, and empty columns). Desired output: A B C E D --------------- 0 a r g g 1 a r g x 2
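The real data isn't shown, so this is only a guess at the cause: ffill fills NaN values only, so if the blank cells actually hold empty (or whitespace-only) strings, nothing gets filled. The mock frame below is hypothetical and only illustrates that case; converting blanks to NaN before forward-filling fixes it.

```python
import numpy as np
import pandas as pd

# Hypothetical mock data: blank cells are empty strings, not NaN.
df = pd.DataFrame({
    "A": ["a", "", "", "t"],
    "B": ["r", "", "f", ""],
    "C": ["g", "", "f", ""],
    "E": ["g", "", "r", ""],
    "D": ["", "x", "", "y"],
})
cols = df.columns[:4].tolist()

# ffill only fills NaN, so the empty strings are left untouched here.
unchanged = df[cols].ffill()

# Turn empty or whitespace-only strings into NaN first, then forward-fill.
df[cols] = df[cols].replace(r"^\s*$", np.nan, regex=True).ffill()
print(df)
```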

Shape error when using PolynomialFeatures

匆匆过客 submitted on 2021-02-10 16:54:32
Question: The issue. To begin with, I'm pretty new to machine learning. I have decided to test some of the things I have learned on some financial data; my machine learning model looks like this:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

df = pd.read_csv("/Users/Documents/Trading.csv")
poly_features = PolynomialFeatures(degree=2, include_bias=False)
linear_reg = LinearRegression(fit_intercept = True)
X = df_copy[[

Filter a pandas dataframe with a dictionary with various functions

|▌冷眼眸甩不掉的悲伤 submitted on 2021-02-10 16:48:25
Question: Let's say I have a DataFrame df with an arbitrary number of columns. As an example, say we have:

   a     b  c
0  5   foo  2
1  5   bar  3
2  4   foo  2
3  5  test  1
4  4   bar  7

Suppose I want a filter like

df[(df['a'] == 5) & (~df['b'].isin(['foo','bar'])) & (df['c'].isin(range(5)))]

or maybe something like

df[(df['a'] == 5) & (~df['b'].isin(['test','bar'])) | (df['c'].isin(range(5)))]

but I want something that can easily be plugged in as an input, something like:

def filter_df(filter_kwargs, df):
    # do the filtering
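One possible sketch of filter_df, assuming filter_kwargs maps each column name to a function that takes that column (a Series) and returns a boolean mask. The masks are combined with logical AND, which reproduces the first filter in the question.

```python
import operator
from functools import reduce

import pandas as pd

df = pd.DataFrame({
    "a": [5, 5, 4, 5, 4],
    "b": ["foo", "bar", "foo", "test", "bar"],
    "c": [2, 3, 2, 1, 7],
})


def filter_df(filter_kwargs, df):
    """Keep rows where every column's predicate returns True."""
    masks = [func(df[col]) for col, func in filter_kwargs.items()]
    # AND all boolean Series together into a single mask.
    combined = reduce(operator.and_, masks)
    return df[combined]


filters = {
    "a": lambda s: s == 5,
    "b": lambda s: ~s.isin(["foo", "bar"]),
    "c": lambda s: s.isin(range(5)),
}
result = filter_df(filters, df)
print(result)
```

This only expresses conjunctions. The second filter in the question mixes & and |, so it would need a different combinator, for example accepting a single callable that receives the whole DataFrame.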

Importing Multiple Data-frames with Pandas

流过昼夜 submitted on 2021-02-10 16:48:22
Question: I'm trying to import multiple datasets into a single DataFrame through a function.

# function to import each of the new datasets
def csvImport(yearOfDataset):
    import glob, os
    for items in yearOfDataset:
        # dataset name
        ds = pd.concat(map(pd.read_csv, glob.glob(os.path.join("PSNI_StreetCrime_"+str(yearOfDataset)),"*.csv")))

I want to pass the argument to the function as follows, since it means I can call it more quickly for the multiple folders I have; the folder names follow the pattern ChildFolder
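In the question's code, the parenthesis after str(yearOfDataset) closes os.path.join too early, so "*.csv" is passed to glob.glob as a second argument instead of becoming part of the path. Below is a corrected sketch, assuming each year's CSVs sit in a folder named PSNI_StreetCrime_<year> under some base directory; the base_dir parameter and the temporary folder layout are hypothetical additions for illustration.

```python
import glob
import os
import tempfile

import pandas as pd


def csvImport(yearOfDataset, base_dir="."):
    """Read every CSV in <base_dir>/PSNI_StreetCrime_<year> into one DataFrame."""
    # "*.csv" must be part of the joined path, not a separate glob.glob argument.
    pattern = os.path.join(base_dir, "PSNI_StreetCrime_" + str(yearOfDataset), "*.csv")
    files = sorted(glob.glob(pattern))
    # pd.concat raises ValueError if no files matched the pattern.
    return pd.concat(map(pd.read_csv, files), ignore_index=True)


# Usage against a hypothetical folder layout built in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "PSNI_StreetCrime_2015")
    os.makedirs(folder)
    pd.DataFrame({"x": [1, 2]}).to_csv(os.path.join(folder, "jan.csv"), index=False)
    pd.DataFrame({"x": [3]}).to_csv(os.path.join(folder, "feb.csv"), index=False)
    combined = csvImport(2015, base_dir=tmp)
    print(combined)
```

For several folders, the per-year frames can be combined in the same way, e.g. pd.concat([csvImport(y, base_dir) for y in years], ignore_index=True).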

fill in dates and use previous values

无人久伴 submitted on 2021-02-10 16:44:01
Question: My pandas DataFrame looks like this:

country  date        gd
US       01-01-2014  2
US       01-01-2015  3
US       01-01-2013  0.4
UK       01-01-2000  0.7
UK       02-01-2001  0.5
UK       01-01-2016  1

What I want to do is:
1) Fill in all dates (daily), starting from each country's minimum date up to today; for US that is 01-01-2013 and for UK it is 01-01-2000.
2) Fill the gd column with the previous available value.
Many thanks for your help.

Answer 1:

today = pd.Timestamp.now().normalize()
l = df
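Since the answer is cut off, here is a self-contained sketch of one approach: for each country, build a daily date_range from that country's first date to an end date, reindex gd onto it, and forward-fill. Two assumptions are labeled in the code. First, the dates are read as month-day-year, although the question does not say. Second, a fixed end date is passed in the usage so the output is reproducible; the end date defaults to today, as the question asks.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["US", "US", "US", "UK", "UK", "UK"],
    "date": ["01-01-2014", "01-01-2015", "01-01-2013",
             "01-01-2000", "02-01-2001", "01-01-2016"],
    "gd": [2, 3, 0.4, 0.7, 0.5, 1],
})
# Assumption: month-day-year; use "%d-%m-%Y" instead if the dates are day-first.
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%Y")


def fill_daily(df, end=None):
    """Reindex each country to daily dates from its first date to `end`
    (default: today), forward-filling gd from the previous known value."""
    end = pd.Timestamp.today().normalize() if end is None else pd.Timestamp(end)
    pieces = []
    for country, group in df.groupby("country"):
        daily_index = pd.date_range(group["date"].min(), end, freq="D")
        filled = (
            group.set_index("date")["gd"]
            .sort_index()
            .reindex(daily_index)  # fails if a country has duplicate dates
            .ffill()
        )
        pieces.append(pd.DataFrame({
            "country": country,
            "date": daily_index,
            "gd": filled.to_numpy(),
        }))
    return pd.concat(pieces, ignore_index=True)


result = fill_daily(df, end="2016-01-05")
print(result.tail())
```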