pandas

pandas returning the unnamed columns

爷,独闯天下 submitted on 2021-02-07 10:24:26
Question: The following is an example of the data I have in an Excel sheet:

A B C
1 2 3
4 5 6

I am trying to get the column names using the following code:

    p1 = list(df1t.columns.values)

The output looks like this:

    [A, B, C, 'Unnamed: 3', 'unnamed 4', 'unnamed 5', .....]

I checked the Excel sheet, and there are only three columns, named A, B, and C. The other columns are blank. Any suggestions?

Answer 1: The problem is that some cells are not empty but contain whitespace. If you need the column names with the Unnamed ones filtered out: cols =
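The answer is cut off above, so the following is only a minimal sketch of one way to drop such placeholder columns, not necessarily how the answer continues. It assumes the unwanted labels all start with the prefix `Unnamed`; the small frame is a hypothetical stand-in for `df1t`.

```python
import pandas as pd

# Hypothetical stand-in for the sheet: three real columns plus two
# "blank" ones that hold only whitespace or nothing at all.
df1t = pd.DataFrame(
    [[1, 2, 3, " ", None], [4, 5, 6, None, " "]],
    columns=["A", "B", "C", "Unnamed: 3", "Unnamed: 4"],
)

# Keep only the columns whose label does not start with "Unnamed".
named = df1t.loc[:, ~df1t.columns.str.startswith("Unnamed")]
cols = list(named.columns)
print(cols)
```

Removing the stray whitespace from the sheet itself would avoid the extra columns at the source; filtering by label only hides them after loading.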

Python - Extract CSV Files from Multiple Zip Files and Combine the Data

你说的曾经没有我的故事 submitted on 2021-02-07 10:20:32
Question: I have a Python script that uses pandas to combine multiple ZIP files. I am using data hosted in a GitHub repository here: https://github.com/statistikat/coronaDAT

The script should take all ZIP files in a folder structure, find the "Bezirke.csv" file in each ZIP file, and combine all the Bezirke.csv files into one large CSV file. However, the code is only grabbing one ZIP file from the folder. Any suggestions on why it is not taking the data from the other ZIP files in the folder?

    import glob
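The asker's script is not shown in full, so the sketch below is not a diagnosis of it. It only illustrates one way to walk a folder tree, read the same member from every ZIP file, and stack the results. The function name `combine_member_csv`, the added `source_zip` column, and the `;` separator are assumptions made for illustration.

```python
import glob
import os
import zipfile

import pandas as pd


def combine_member_csv(root, member="Bezirke.csv", sep=";"):
    """Read `member` from every ZIP file under `root` and stack the rows."""
    frames = []
    pattern = os.path.join(root, "**", "*.zip")
    # recursive=True lets "**" descend into sub-folders.
    for path in sorted(glob.glob(pattern, recursive=True)):
        with zipfile.ZipFile(path) as archive:
            if member not in archive.namelist():
                continue  # skip archives that lack the file
            with archive.open(member) as handle:
                frame = pd.read_csv(handle, sep=sep)
        # Record where each row came from (hypothetical helper column).
        frame["source_zip"] = os.path.basename(path)
        frames.append(frame)
    # Concatenate once, after the loop has visited every archive.
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

The combined frame can then be written out with `to_csv`. Collecting the frames in a list and concatenating once after the loop ensures that every archive contributes rows.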

Row wise outlier detection in python

…衆ロ難τιáo~ submitted on 2021-02-07 10:19:55
Question: I have the CSV data as follows:

A_ID     P_ID    1429982904  1430370002  1430974801  1431579602  1432184403  1432789202  1435208402  1435308653
11Jgipc  qjMakF  364         365         363         363         364         364         364         367
11Jgipc  qxL8FJ  18          18          18          18          18          18          18          18
11Jgipc  r0Bpnt  40          40          41          41          41          42          42          42
11Jgipc  roLk4N  140         140         143         143         146         147         147         149
11Jgipc  tOudhM  12          13          13          13          13          13          14          14
11Jgipc  u-x6o8  678         678         688         688         689         690         692         695
11Jgipc  u5HHmV  1778        1785        1811        1811        1819        1826        1834        1836
11Jgipc  ufrVoP  67          67          67          67          67          67          67          67
11Jgipc  vRqMK4
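The question is cut off before it says what counts as an outlier, so the following is only a sketch of one common choice: flag a value when it lies more than `k` median absolute deviations (MAD) from its own row's median. The function name `row_outliers` and the threshold `k=3.0` are assumptions, and only three of the rows above are used.

```python
import pandas as pd

times = ["1429982904", "1430370002", "1430974801", "1431579602",
         "1432184403", "1432789202", "1435208402", "1435308653"]
data = pd.DataFrame(
    [[364, 365, 363, 363, 364, 364, 364, 367],
     [18, 18, 18, 18, 18, 18, 18, 18],
     [140, 140, 143, 143, 146, 147, 147, 149]],
    index=["qjMakF", "qxL8FJ", "roLk4N"],
    columns=times,
)


def row_outliers(values, k=3.0):
    """Return a boolean frame marking values far from their row median."""
    median = values.median(axis=1)
    # Absolute distance of every value from the median of its own row.
    deviation = values.sub(median, axis=0).abs()
    mad = deviation.median(axis=1)
    return deviation.gt(k * mad, axis=0)


flags = row_outliers(data)
print(flags.loc["qjMakF"])
```

With these three rows, only the final reading of `qjMakF` (367) is flagged; the constant row and the steadily rising row produce no flags. A different threshold or a z-score rule would give different results.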

How to Read multiple files in Python for Pandas separate dataframes

自作多情 submitted on 2021-02-07 10:19:22
Question: I am trying to read 6 files into 7 different data frames, but I am unable to figure out how I should do that. The file names can be completely random; that is, I know the files, but they are not named like data1.csv, data2.csv. I tried using something like this:

    import sys
    import os
    import numpy as np
    import pandas as pd
    from datetime import datetime, timedelta

    f1='Norway.csv'
    f='Canada.csv'
    f='Chile.csv'
    Norway = pd.read_csv(Norway.csv)
    Canada = pd.read_csv(Canada.csv)
    Chile = pd.read_csv(Chile.csv )

I need to
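In the attempt above, `pd.read_csv(Norway.csv)` passes an undefined name rather than a string, and `f` is assigned twice. One possible approach, sketched here under the assumption that a dictionary keyed by file name is acceptable, is to loop over the known paths. The helper name `read_named_csvs` is hypothetical.

```python
import os

import pandas as pd


def read_named_csvs(paths):
    """Return {file name without extension: DataFrame} for each CSV path."""
    frames = {}
    for path in paths:
        # "data/Norway.csv" -> "Norway"
        name = os.path.splitext(os.path.basename(path))[0]
        frames[name] = pd.read_csv(path)  # the path is passed as a string
    return frames
```

Calling `read_named_csvs(['Norway.csv', 'Canada.csv', 'Chile.csv'])` would then give separate frames reachable as `frames['Norway']`, `frames['Canada']`, and `frames['Chile']`, without one hand-written variable per file.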

sklearn grid.fit(X,y) - error: “positional indexers are out-of-bounds” for X_train,y_train

人走茶凉 submitted on 2021-02-07 10:11:26
Question: This is a question about scikit-learn (version 0.17.0) in Python 2.7 along with pandas 0.17.1. In order to split raw data (with no missing entries) using the approach detailed here, I have found that an error appears if the split data are then used with .fit(). Here is the code, taken largely unchanged from the other Stack Overflow question, with the variables renamed. I have then instantiated a grid and tried to fit the split data with the aim of determining optimal

Find the most similar row using Python

大兔子大兔子 submitted on 2021-02-07 10:11:11
Question: I have two data frames (df1 and df2). In df1 I store one row with a set of values, and I want to find the most similar row in df2.

    import pandas as pd
    import numpy as np

    # Df1 has only one row and four columns.
    df1 = pd.DataFrame(np.array([[30, 60, 70, 40]]), columns=['A', 'B', 'C','D'])

    # Df2 has 50 rows and four columns
    df2 = pd.DataFrame(np.random.randint(0,100,size=(50, 4)), columns=list('ABCD'))

Based on df1, what is the most similar row in df2?

Answer 1: If you want the
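The answer is cut off, so the measure of similarity it had in mind is unknown. As a minimal sketch, assuming "most similar" means the smallest Euclidean distance across the four columns, the row can be found as follows. The seeded generator is an assumption added only to make the example repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, for repeatability only
df1 = pd.DataFrame([[30, 60, 70, 40]], columns=list("ABCD"))
df2 = pd.DataFrame(rng.integers(0, 100, size=(50, 4)), columns=list("ABCD"))

# Difference between every row of df2 and the single row of df1;
# NumPy broadcasts the (1, 4) row against the (50, 4) array.
diff = df2[df1.columns].to_numpy() - df1.to_numpy()
distances = np.sqrt((diff ** 2).sum(axis=1))

# Index label and contents of the closest row.
closest_label = df2.index[distances.argmin()]
closest_row = df2.loc[closest_label]
print(closest_label, closest_row.to_dict())
```

Other measures, such as Manhattan distance or cosine similarity, can pick a different row, so the choice should follow what "similar" means for the data.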

Rolling 3 previous months with unique counts after groupby in pandas dataframe

早过忘川 submitted on 2021-02-07 10:10:25
Question: The following is the dataframe:

Date        Name   data
01/01/2017  Alpha  A
02/01/2017  Alpha  A
03/01/2017  Alpha  B
01/01/2017  Beta   A
01/20/2017  Beta   D
03/01/2017  Beta   C
04/01/2017  Beta   C
05/01/2017  Beta   B

Expected output:

Date      Name   data
Jan 2017  Alpha  1
Feb 2017  Alpha  1
Mar 2017  Alpha  2
Jan 2017  Beta   2
Mar 2017  Beta   3
Apr 2017  Beta   1
May 2017  Beta   2

I am looking for unique counts of "data" grouped by "Name" on a 3-month rolling basis. Consider the example of "March 2017" and "Name" -> "Beta". So the months
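One way to reproduce the expected output is sketched below. It assumes the dates are in MM/DD/YYYY form and that the window for a month is that month plus the two calendar months before it. It uses a plain loop over each group's months rather than a built-in rolling method.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01/01/2017", "02/01/2017", "03/01/2017", "01/01/2017",
             "01/20/2017", "03/01/2017", "04/01/2017", "05/01/2017"],
    "Name": ["Alpha"] * 3 + ["Beta"] * 5,
    "data": list("AAB") + list("ADCCB"),
})
# Reduce every date to its calendar month.
df["month"] = pd.to_datetime(df["Date"], format="%m/%d/%Y").dt.to_period("M")

rows = []
for name, group in df.groupby("Name"):
    for month in group["month"].drop_duplicates().sort_values():
        # Window: this month and the two months before it.
        in_window = (group["month"] >= month - 2) & (group["month"] <= month)
        rows.append({
            "Date": month.strftime("%b %Y"),
            "Name": name,
            "data": group.loc[in_window, "data"].nunique(),
        })

result = pd.DataFrame(rows)
print(result)
```

For "Beta" in March 2017 the window covers January to March, which contains A, D, and C, giving the count of 3 shown in the expected output.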