data-science

Append Multiple Excel Files(xlsx) together in python

Submitted by 流过昼夜 on 2019-12-05 11:22:59
    import pandas as pd
    import os
    import glob

    all_data = pd.DataFrame()
    for f in glob.glob("output/test*.xlsx"):
        df = pd.read_excel(f)
        all_data = all_data.append(df, ignore_index=True)

I want to put multiple xlsx files into one xlsx. The Excel files are in the output/test folder. The columns are the same in all of them, but I want to concatenate the rows. The above code doesn't seem to work.

Let all_data be a list:

    all_data = []
    for f in glob.glob("output/test/*.xlsx"):
        all_data.append(pd.read_excel(f))

Now, call pd.concat:

    df = pd.concat(all_data, ignore_index=True)

Make sure all column names are the same.
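For reference, a minimal end-to-end sketch of the list-plus-concat approach that also writes the combined rows back to a single workbook; the output path output/combined.xlsx is an assumption, not something given in the question:

    import glob
    import pandas as pd

    # read every matching workbook into a list of DataFrames, then concatenate once
    frames = [pd.read_excel(f) for f in glob.glob("output/test/*.xlsx")]
    combined = pd.concat(frames, ignore_index=True)

    # write all rows to one xlsx file (the path is a placeholder)
    combined.to_excel("output/combined.xlsx", index=False)

Collecting the frames in a list and concatenating once is also faster than appending to a DataFrame inside the loop, since each append copies all rows accumulated so far.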

Neural network is not giving the expected output after training in Python

Submitted by 走远了吗. on 2019-12-05 09:51:34
My neural network is not giving the expected output after training in Python. Is there any error in the code? Is there any way to reduce the mean squared error (MSE)? I tried to train the network repeatedly (by re-running the program), but it is not learning; instead it gives the same MSE and output every time.

Here is the data I used: https://drive.google.com/open?id=1GLm87-5E_6YhUIPZ_CtQLV9F9wcGaTj2

Here is my code:

    # load and evaluate a saved model
    from numpy import loadtxt
    from tensorflow.keras.models import load_model

    # load model
    model = load_model('ANNnew.h5')
    # summarize model
    model.summary()
    # Model
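The question is cut off before any training code, so only a generic sketch is possible. Assuming the goal is a regression network trained with an MSE loss on the linked data, a typical Keras setup looks roughly like the following; the file name, column split, layer sizes and epoch count are placeholders, not details from the question:

    from numpy import loadtxt
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    # placeholder: assumes a comma-separated file with features first and the target in the last column
    data = loadtxt('data.csv', delimiter=',')
    X, y = data[:, :-1], data[:, -1]

    model = Sequential([
        Dense(16, activation='relu', input_shape=(X.shape[1],)),
        Dense(8, activation='relu'),
        Dense(1),                      # linear output for regression
    ])
    model.compile(optimizer='adam', loss='mse')

    # if the MSE stays flat between runs, the usual suspects are unscaled features,
    # a learning rate that is too high or too low, or too few epochs
    model.fit(X, y, epochs=100, batch_size=16, validation_split=0.2)
    model.save('ANNnew.h5')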

Selectively import from another Jupyter Notebook

Submitted by 大兔子大兔子 on 2019-12-05 00:51:42
Question: I arranged my Jupyter notebooks into data.ipynb, methods.ipynb and results.ipynb. How can I selectively import cells from the data and methods notebooks for use in the results notebook? I know of nbimporter and ipynb, but neither of those offers selective import of variables. There is an option to import definitions - including variables that are uppercase - but this does not work for me, as I would have to convert most of the variables in my notebooks to uppercase. I would rather import
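For context, a short sketch of what the ipynb package does support, which is why it falls short of selective variable import: ipynb.fs.defs pulls in only definitions (functions, classes and UPPERCASE variables), while ipynb.fs.full executes the whole notebook. The function name load_data below is hypothetical.

    # assumes the `ipynb` package is installed and that data.ipynb / methods.ipynb
    # sit in the same directory as results.ipynb
    from ipynb.fs.defs.data import load_data    # definitions and UPPERCASE variables only
    from ipynb.fs.full.methods import *         # runs the entire methods notebook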

Time difference within group by objects in Python Pandas

Submitted by 强颜欢笑 on 2019-12-04 22:53:25
I have a dataframe that looks like this:

    from    to    datetime               other
    -------------------------------------------------
    11      1     2016-11-06 22:00:00    -
    11      1     2016-11-06 20:00:00    -
    11      1     2016-11-06 15:45:00    -
    11      12    2016-11-06 15:00:00    -
    11      1     2016-11-06 12:00:00    -
    11      18    2016-11-05 10:00:00    -
    11      12    2016-11-05 10:00:00    -
    12      1     2016-10-05 10:00:59    -
    12      3     2016-09-06 10:00:34    -

I want to group by the "from" and then the "to" columns, sort "datetime" within each group in descending order, and finally calculate the time difference within these grouped objects between the current time and the next time. For example, in
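A minimal sketch of one way to get the per-group time difference with groupby and diff, assuming df holds the columns shown above; the result column name time_diff is a placeholder:

    import pandas as pd

    df['datetime'] = pd.to_datetime(df['datetime'])
    # sort so that within each (from, to) group the timestamps run newest-first
    df = df.sort_values(['from', 'to', 'datetime'], ascending=[True, True, False])
    # difference between the current timestamp and the next (older) one in the same group
    df['time_diff'] = df.groupby(['from', 'to'])['datetime'].diff(-1)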

Split Column into Unknown Number of Columns by Delimiter Pandas

Submitted by 断了今生、忘了曾经 on 2019-12-04 17:08:42
I am trying to split a column into multiple columns based on comma/space separation. My dataframe currently looks like:

       Item            Colors
    0  ID-1  Red, Blue, Green
    1  ID-2         Red, Blue
    2  ID-3       Blue, Green
    3  ID-4              Blue
    4  ID-5               Red

I would like to transform the 'Colors' column into Red, Blue and Green columns like this:

       Item  Red  Blue  Green
    0  ID-1    1     1      1
    1  ID-2    1     1      0
    2  ID-3    0     1      1
    3  ID-4    0     1      0
    4  ID-5    1     0      1

I really have no idea how to do this. Any help would be greatly appreciated.

WeNYoBen: You can use get_dummies:

    pd.concat([df, df.Colors.str.get_dummies(sep=', ')], 1)
    Out[450]:
       Item            Colors  Blue  Green  Red
    0  ID-1  Red,Blue
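As a small follow-up sketch (not part of the original answer): newer pandas versions expect the axis to be passed as a keyword, and the original Colors column can be dropped once the indicator columns exist.

    # same idea with an explicit axis keyword; drop() removes the original text column
    out = pd.concat([df.drop(columns='Colors'),
                     df['Colors'].str.get_dummies(sep=', ')], axis=1)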

How do I add limiting conditions when using GPyOpt?

Submitted by 假装没事ソ on 2019-12-04 16:50:36
Currently I am trying to minimize the function below and get the optimized parameters using GPyOpt.

    import GPy
    import GPyOpt
    from math import log

    def f(x):
        x0, x1, x2, x3, x4, x5 = x[:,0], x[:,1], x[:,2], x[:,3], x[:,4], x[:,5],
        f0 = 0.2 * log(x0)
        f1 = 0.3 * log(x1)
        f2 = 0.4 * log(x2)
        f3 = 0.2 * log(x3)
        f4 = 0.5 * log(x4)
        f5 = 0.2 * log(x5)
        return -(f0 + f1 + f2 + f3 + f4 + f5)

    bounds = [
        {'name': 'x0', 'type': 'discrete', 'domain': (1,1000000)},
        {'name': 'x1', 'type': 'discrete', 'domain': (1,1000000)},
        {'name': 'x2', 'type': 'discrete', 'domain': (1,1000000)},
        {'name': 'x3', 'type': 'discrete', 'domain': (1,1000000)}
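The title asks about limiting conditions; as far as I recall, GPyOpt takes inequality constraints as string expressions in x[:,i] that are feasible when they evaluate to <= 0, passed next to the domain. A hedged sketch, where the constraint x0 + x1 <= 100 is an invented example and the bounds list is assumed to be completed for all six variables:

    # each constraint is satisfied where its expression evaluates to <= 0
    constraints = [{'name': 'constr_1', 'constraint': 'x[:,0] + x[:,1] - 100'}]

    opt = GPyOpt.methods.BayesianOptimization(f=f, domain=bounds,
                                              constraints=constraints)
    opt.run_optimization(max_iter=20)
    print(opt.x_opt, opt.fx_opt)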

scikit-learn: applying an arbitrary function as part of a pipeline

Submitted by ε祈祈猫儿з on 2019-12-04 13:21:03
I've just discovered the Pipeline feature of scikit-learn, and I find it very useful for testing different combinations of preprocessing steps before training my model. A pipeline is a chain of objects that implement the fit and transform methods. Up to now, whenever I wanted to add a new preprocessing step, I would write a class that inherits from sklearn.base.BaseEstimator. However, I'm thinking that there must be a simpler method. Do I really need to wrap every function I want to apply in an estimator class?

Example:

    class Categorizer(sklearn.base.BaseEstimator):
        """
        Converts given columns into pandas
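One simpler route, shown here as a sketch rather than the asker's code, is sklearn.preprocessing.FunctionTransformer, which wraps an arbitrary stateless callable so it can sit in a Pipeline without a hand-written estimator class; np.log1p is just an example function:

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import FunctionTransformer
    from sklearn.linear_model import LogisticRegression

    pipe = Pipeline([
        ('log', FunctionTransformer(np.log1p)),   # any callable taking X and returning X
        ('clf', LogisticRegression()),
    ])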

R : knnImputation Giving Error

Submitted by 那年仲夏 on 2019-12-04 13:06:22
I am getting the error below in my R code. In my Brand_X.xlsx dataset there are a few NA values which I am trying to impute using KNN imputation, but I am getting the error below. What's wrong here? Thanks!

    > library(readxl)
    > Brand_X <- read_excel("Brand_X.xlsx")
    > str(Brand_X)
    Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 101 obs. of 8 variables:
     $ Rel_price_lag5: num 108 111 105 103 109 104 110 114 103 108 ...
     $ Rel_price_lag1: num 110 109 217 241 855 271 234 297 271 999 ...
     $ Rel_Price     : num 122 110 109 217 241 855 271 234 297 271 ...
     $ Promo         : num 74 29 32 24 16 31 22 7 32 22 ...
     $ Loy_HH        : num 37 26 35 30

Count number of counties per state using python {census}

Submitted by 廉价感情. on 2019-12-04 12:32:39
I am having trouble counting the number of counties using the well-known census.csv data. Task: count the number of counties in each state. I think the problem is with the comparison; please read below. I've tried this:

    df = pd.read_csv('census.csv')
    dfd = df[:]['STNAME'].unique()   # gives the names of the states
    serr = pd.Series(dfd)            # converting to a Series (from an array)

After this, I've tried two approaches:

1:

    df[df['STNAME'] == serr]   # ERROR: series length must match

2:

    i = 0
    for name in serr:   # this generates the error 'Alabama'
        df['STNAME'] == name
        for i in serr:
            serr[i] == serr[name]
            print(serr[name].count)
            i += 1
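A sketch of one direct way to get the count per state, assuming the file has an STNAME column for the state and a CTYNAME column for the county; the CTYNAME name and the SUMLEV filter are assumptions about the file, not details given in the question:

    import pandas as pd

    df = pd.read_csv('census.csv')
    # if the file mixes state-level and county-level rows, keep only the county rows first,
    # e.g. df = df[df['SUMLEV'] == 50]
    counties_per_state = df.groupby('STNAME')['CTYNAME'].nunique()
    print(counties_per_state)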

Word2vec fine tuning

Submitted by 怎甘沉沦 on 2019-12-04 12:28:24
I am new to working with word2vec. I need to fine-tune my word2vec model. I have 2 datasets: data1 and data2. What I have done so far is:

    model = gensim.models.Word2Vec(
        data1,
        size=size_v,
        window=size_w,
        min_count=min_c,
        workers=work)
    model.train(data1, total_examples=len(data1), epochs=epochs)
    model.train(data2, total_examples=len(data2), epochs=epochs)

Is this correct? Do I need to store the learned weights somewhere? I checked this answer and this one, but I couldn't understand how it's done. Can someone explain to me the steps to follow? Thank you in advance.

Note you don't need to call train() with
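For comparison, a minimal sketch of the usual gensim 3.x pattern for continuing training on a second corpus: build the vocabulary, train on data1, then extend the vocabulary with update=True before training on data2. The parameter names (size_v, size_w, min_c, work) are the question's own placeholders, and data1/data2 are assumed to be lists of tokenized sentences.

    import gensim

    model = gensim.models.Word2Vec(size=size_v, window=size_w,
                                   min_count=min_c, workers=work)
    model.build_vocab(data1)                               # initial vocabulary from data1
    model.train(data1, total_examples=model.corpus_count, epochs=model.epochs)

    model.build_vocab(data2, update=True)                  # add any new words seen in data2
    model.train(data2, total_examples=len(data2), epochs=model.epochs)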