pandas | 易学教程

Add a column value depending on a date range (if-else)


Question: I have a date column in my dataframe and want to add a column called location. The value of location in each row should depend on which date range the date falls in. For example, 13th November falls between 12th November and 16th November, so the location should be Seattle. 17th November falls between 17th November and 18th November, so the location must be New York. Below is an example of the data frame I want to achieve: Dates | Location (column I want to add) .....................
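One way to approach the date-range lookup is sketched below, using numpy.select with one Series.between mask per range. The year (2019), the sample rows, and the "Unknown" fallback are assumptions made for illustration; the question only gives the day and month of each boundary.

```python
import numpy as np
import pandas as pd

# Sample data (assumed): three dates, the last one outside every range.
df = pd.DataFrame({"Dates": pd.to_datetime(["2019-11-13", "2019-11-17", "2019-11-25"])})

# Each entry is (start, end, location); both bounds are inclusive.
ranges = [
    ("2019-11-12", "2019-11-16", "Seattle"),
    ("2019-11-17", "2019-11-18", "New York"),
]

# Build one boolean mask per range and pick the matching location.
conditions = [
    df["Dates"].between(pd.Timestamp(start), pd.Timestamp(end))
    for start, end, _ in ranges
]
choices = [city for _, _, city in ranges]
df["Location"] = np.select(conditions, choices, default="Unknown")

print(df)
```

Keeping the ranges in a list means a new location is one extra tuple rather than another if-else branch.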

pandas/sqlalchemy/pyodbc: Result object does not return rows from stored proc when UPDATE statement appears before SELECT


Question: I'm using SQL Server 2014, pandas 0.23.4, sqlalchemy 1.2.11, pyodbc 4.0.24, and Python 3.7.0. I have a very simple stored procedure that performs an UPDATE on a table and then a SELECT on it: CREATE PROCEDURE my_proc_1 @v2 INT AS BEGIN UPDATE my_table_1 SET v2 = @v2 ; SELECT * from my_table_1 ; END GO This runs fine in MS SQL Server Management Studio. However, when I try to invoke it via Python using this code: import pandas as pd from sqlalchemy import create_engine if __name__ == "__main__"
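A commonly suggested adjustment for this symptom is to add SET NOCOUNT ON at the top of the procedure, so that the "rows affected" message from the UPDATE is not returned to the driver ahead of the rows from the SELECT. The sketch below only holds the revised T-SQL as a Python string. Whether it resolves the issue for the exact setup in the question is an assumption, since the question text is cut off before the error is shown.

```python
# Revised definition of the procedure from the question, with SET NOCOUNT ON added.
# Assumption: the row-count message produced by UPDATE is what hides the SELECT rows.
REVISED_PROC_SQL = """
CREATE PROCEDURE my_proc_1
    @v2 INT
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE my_table_1 SET v2 = @v2;
    SELECT * FROM my_table_1;
END
"""

print(REVISED_PROC_SQL.strip())
```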

Merge rows together who have the same value in a column


Question: I have a CSV file like this (parsed with pandas read_csv): Filename f1 f2 f3 1.jpg 1 0.2 0.3 1.jpg 0 0.8 0.7 2.jpg 1 0.3 0.2 How would I use this dataset and change it into a numpy array which will look like this: [ [[1,0.2,0.3],[0,0.8,0.7]], [[1,0.3,0.2]] ] Answer 1: You can create nested lists with GroupBy.apply and a lambda function; DataFrame.set_index keeps the Filename column from being converted into the lists: df = pd.read_csv(file) L = (df.set_index('Filename') .groupby('Filename')
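As a minimal sketch of the same grouping idea, the block below builds the nested list with a list comprehension over the groups instead of GroupBy.apply. This is a swapped-in alternative, not the truncated answer's code. The data frame is typed in by hand from the question's sample rather than read from a file.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame(
    {
        "Filename": ["1.jpg", "1.jpg", "2.jpg"],
        "f1": [1, 0, 1],
        "f2": [0.2, 0.8, 0.3],
        "f3": [0.3, 0.7, 0.2],
    }
)

# One inner list of rows per file name; Filename itself is left out of the rows.
nested = [
    group.to_numpy().tolist()
    for _, group in df.groupby("Filename")[["f1", "f2", "f3"]]
]

print(nested)
```

Because the groups have different lengths, the result is a ragged nested list; converting it to a regular numpy array would need an object dtype or padding.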

Python: No tables found matching pattern '.+'


Question: I am trying to export this table as a CSV for all 7 pages of 100 rows each within a Python script, but I am running into the error shown below the script. "http://www.nhl.com/stats/player?aggregate=1&gameType=2&report=points&pos=S&reportType=game&startDate=2017-10-19&endDate=2017-10-29&filter=gamesPlayed,gte,1&sort=points,goals" import pandas as pd dfs = pd.read_html('http://www.nhl.com/stats/player?aggregate=1&gameType=2&report=skatersummary&pos=S&reportType=game&startDate=2017-10-19&endDate

I want to assign labels 0/1 to pandas dataframe according to columns


Question: I am fairly new to Python. I am trying to assign labels in a pandas dataframe. This is how my dataframe looks: final.head(3) Match Team1 Team2 winner A 2 3 3 B 1 2 1 C 3 1 1 I want to create a new column which shows who won the match: if Team1 wins the game the label should be 0, and if Team2 wins the game the label should be 1. Expected outcome should be: Match Team1 Team2 winner label A 2 3 3 1 B 1 2 1 0 C 3 1 1 1 Please tell me how I should proceed. Thanks in advance. Answer 1: Your label
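The labelling rule can be sketched as a single vectorised comparison, assuming (as the expected outcome suggests) that Team1, Team2 and winner all hold team IDs. The frame below is rebuilt by hand from the question's sample.

```python
import pandas as pd

# Sample rows copied from the question.
final = pd.DataFrame(
    {
        "Match": ["A", "B", "C"],
        "Team1": [2, 1, 3],
        "Team2": [3, 2, 1],
        "winner": [3, 1, 1],
    }
)

# True where the winner is Team2; casting to int gives 1 for Team2 and 0 for Team1.
final["label"] = (final["winner"] == final["Team2"]).astype(int)

print(final)
```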

ffill not filling data in pandas dataframe


Question: I have a dataframe like this: A B C E D --------------- 0 a r g g 1 x 2 x f f r 3 t 3 y I am trying to forward fill using ffill, but it is not working: cols = df.columns[:4].tolist() df[cols] = df[cols].ffill() I also tried: df[cols] = df[cols].fillna(method='ffill') But the values are not getting filled. Is it the empty columns in the data causing this issue? The data is mocked; the exact data is different (contains strings, numbers and empty columns). Desired o/p: A B C E D --------------- 0 a r g g 1 a r g x 2
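A frequent cause of this behaviour is that the "empty" cells are empty strings rather than NaN, and ffill only fills missing values. The sketch below assumes that is the case here, which the truncated question does not confirm, and uses a small made-up frame. It also calls DataFrame.ffill directly, since fillna(method='ffill') is deprecated in recent pandas releases.

```python
import numpy as np
import pandas as pd

# Made-up data: the gaps are empty strings, not NaN (an assumption about the real data).
df = pd.DataFrame({"A": ["a", "", "x"], "B": ["r", "", "f"]})

# Forward fill alone leaves the empty strings untouched.
unchanged = df.ffill()

# Turn empty or whitespace-only strings into NaN first, then forward fill.
cleaned = df.replace(r"^\s*$", np.nan, regex=True).ffill()

print(cleaned)
```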

Shape error when using PolynomialFeatures


Question: The Issue. To begin with, I'm pretty new to machine learning. I have decided to test out some of the things that I have learned on some financial data. My machine learning model looks like this: import pandas as pd from sklearn.linear_model import LinearRegression from sklearn.preprocessing import PolynomialFeatures df = pd.read_csv("/Users/Documents/Trading.csv") poly_features = PolynomialFeatures(degree=2, include_bias=False) linear_reg = LinearRegression(fit_intercept = True) X = df_copy[[

Filter a pandas dataframe with a dictionary with various functions


Question: Let's say I have a dataframe df with an arbitrary number of columns. As an example, say we have: a b c 0 5 foo 2 1 5 bar 3 2 4 foo 2 3 5 test 1 4 4 bar 7 Suppose I want a filter like df[(df['a'] == 5) & (~df['b'].isin(['foo','bar'])) & (df['c'].isin(range(5)))] or maybe something like df[(df['a'] == 5) & (~df['b'].isin(['test','bar'])) | (df['c'].isin(range(5)))] but I want something that can easily be plugged in as an input, something like: def filter_df(filter_kwargs, df): # do the filtering
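One possible shape for filter_df is sketched below. It assumes filter_kwargs maps a column name to a function that takes that column and returns a boolean Series, and it combines all conditions with AND only; the mixed AND/OR case from the question is not covered. The dictionary layout is an illustrative choice, not something the question specifies.

```python
import pandas as pd


def filter_df(filter_kwargs, df):
    """Keep the rows for which every column predicate returns True."""
    mask = pd.Series(True, index=df.index)
    for column, predicate in filter_kwargs.items():
        mask = mask & predicate(df[column])
    return df[mask]


# Sample rows copied from the question.
df = pd.DataFrame(
    {
        "a": [5, 5, 4, 5, 4],
        "b": ["foo", "bar", "foo", "test", "bar"],
        "c": [2, 3, 2, 1, 7],
    }
)

# Same conditions as the first filter in the question.
filters = {
    "a": lambda s: s == 5,
    "b": lambda s: ~s.isin(["foo", "bar"]),
    "c": lambda s: s.isin(range(5)),
}

result = filter_df(filters, df)
print(result)
```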

Importing Multiple Data-frames with Pandas


Question: I'm trying to import multiple datasets into a single data frame through a function. # function to import each of the new datasets def csvImport(yearOfDataset): import glob, os for items in yearOfDataset: # dataset name ds = pd.concat(map(pd.read_csv, glob.glob(os.path.join("PSNI_StreetCrime_"+str(yearOfDataset)),"*.csv"))) I want to pass the argument to the function as follows, as it means I can call it more quickly for the multiple folders I have. The folder names follow the pattern ChildFolder
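A hedged sketch of such an import function follows. It assumes the folders are named PSNI_StreetCrime_<year>, as the snippet in the question suggests, and that all CSV files share the same columns. The name csv_import, the base_dir parameter and the demo data are made up for illustration. Compared with the snippet above, the path is built from the loop variable, "*.csv" is passed to os.path.join rather than to glob.glob, and the frames are collected instead of being overwritten on each pass.

```python
import glob
import os
import tempfile

import pandas as pd


def csv_import(years, base_dir="."):
    """Read every CSV in each PSNI_StreetCrime_<year> folder into one data frame."""
    frames = []
    for year in years:
        pattern = os.path.join(base_dir, "PSNI_StreetCrime_" + str(year), "*.csv")
        for path in sorted(glob.glob(pattern)):
            frames.append(pd.read_csv(path))
    # pd.concat raises ValueError if no files were found at all.
    return pd.concat(frames, ignore_index=True)


# Demo with throwaway folders and made-up data.
with tempfile.TemporaryDirectory() as base:
    for year in (2016, 2017):
        folder = os.path.join(base, "PSNI_StreetCrime_" + str(year))
        os.makedirs(folder)
        sample = pd.DataFrame({"year": [year, year], "crimes": [1, 2]})
        sample.to_csv(os.path.join(folder, "sample.csv"), index=False)
    combined = csv_import([2016, 2017], base_dir=base)

print(combined)
```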

fill in dates and use previous values


Question: My pandas dataframe looks like the below: country date gd US 01-01-2014 2 US 01-01-2015 3 US 01-01-2013 0.4 UK 01-01-2000 0.7 UK 02-01-2001 0.5 UK 01-01-2016 1 What I want to do is: 1) Fill all dates (daily) starting from each country's minimum date, so for US it is 01-01-2013 up to today and for UK it is 01-01-2000 daily up to today. 2) Fill the gd column with the previous available data. Many thanks for your help. Answer 1: In [67]: today = pd.to_datetime(pd.datetime.now()).normalize() In [68]: l = df
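The two steps in the question can be sketched as follows: reindex each country onto a daily range from its earliest date up to today, then forward fill gd. This is not the truncated answer's code. It assumes the dates are day-first (DD-MM-YYYY) and that each country has no duplicate dates, and it uses pd.Timestamp.now() because the pd.datetime alias seen in the answer was removed in pandas 2.0.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame(
    {
        "country": ["US", "US", "US", "UK", "UK", "UK"],
        "date": ["01-01-2014", "01-01-2015", "01-01-2013",
                 "01-01-2000", "02-01-2001", "01-01-2016"],
        "gd": [2, 3, 0.4, 0.7, 0.5, 1],
    }
)
# Assumption: the dates are written day-first.
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")

today = pd.Timestamp.now().normalize()

pieces = []
for country, group in df.groupby("country"):
    series = group.set_index("date")["gd"].sort_index()
    # Daily range from this country's first date up to today.
    full_range = pd.date_range(series.index.min(), today, freq="D", name="date")
    # Reindexing creates the missing days as NaN; ffill copies the last known value.
    filled = series.reindex(full_range).ffill().reset_index()
    filled.insert(0, "country", country)
    pieces.append(filled)

result = pd.concat(pieces, ignore_index=True)
print(result.head())
```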