data-science

Does Python's datatable package support out-of-memory datasets?

ぐ巨炮叔叔 submitted on 2019-12-11 15:56:12
Question: datatable is a relatively new, high-performance DataFrame/data.table alternative for Python. The datatable documentation states: "It focuses on: big data support, high performance, both in-memory and out-of-memory datasets, and multi-threaded algorithms." Still, I haven't found any operations related to caching or keeping part of the data out of memory. In what sense does it support out-of-memory datasets?
Source: https://stackoverflow.com/questions/56572117/does-pythons-datatable-package-support-out

Convert Class Probabilities of a multiclass model to scores in range 0-100

筅森魡賤 submitted on 2019-12-11 15:54:30
Question: I want to generate a score from 0 to 100 based on the predictions of a three-class classification model. For example, predict_proba of a 3-class logistic regression model gives me 3 probabilities x, y, z, as shown below:
0  1  2
x  y  z
Now I want to generate a score from 0 to 100 based on these probabilities, where 0 is closer to class 0 and 100 is closer to class 2.
Answer 1: Try this:
prob['P']=(prob['1']*1+prob['2']*2)/2
prob['0'] is multiplied by 0, so you don't need it. Examples: prob['0']=0
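For three classes whose probabilities sum to 1, the answer's expression (1*p1 + 2*p2)/2 is the expected class index divided by 2, which lies in [0, 1]; multiplying it by 100 gives the requested 0-100 range. A minimal sketch, assuming the classes have a meaningful order (0 < 1 < 2); the function name probs_to_score, the generalization to K ordered classes, and the example probabilities are illustrative assumptions:

```python
import numpy as np


def probs_to_score(probs):
    """Map class probabilities (one column per ordered class) to a 0-100 score.

    Score = 100 * (sum_k k * p_k) / (K - 1): the expected class index,
    rescaled so that a certain class 0 gives 0 and a certain class K-1 gives 100.
    """
    probs = np.asarray(probs, dtype=float)
    class_index = np.arange(probs.shape[1])   # [0, 1, 2] for three classes
    expected_index = probs @ class_index       # 0*p0 + 1*p1 + 2*p2 per row
    return 100.0 * expected_index / class_index[-1]


# Rows shaped like predict_proba output (columns: class 0, 1, 2); values are made up.
example = np.array([
    [1.0, 0.0, 0.0],   # certainly class 0
    [0.0, 0.0, 1.0],   # certainly class 2
    [0.2, 0.5, 0.3],   # mixed prediction
])
print(probs_to_score(example))  # approximately [0, 100, 55]
```

With a pandas frame named prob as in the answer, probs_to_score(prob[['0', '1', '2']]) would give the same numbers as the answer's prob['P'] times 100.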

Plot ROC curve from multiclass classifier with varying probability using scikit

自闭症网瘾萝莉.ら submitted on 2019-12-11 15:28:16
Question: The output of my multi-class classifier looks like the table below. I need to plot a ROC curve and compute the AUC.
Utterance  Actual   Predicted  Conf_intent1  Conf_Intent2  Conf_Intent3
Uttr1      Intent1  Intent1    0.86          0.45          0.24
Uttr2      Intent3  Intent2    0.47          0.76          0.55
Uttr3      Intent1  Intent1    0.70          0.20          0.44
Uttr4      Intent3  Intent2    0.42          0.67          0.56
Uttr5      Intent1  Intent1    0.70          0.55          0.36
Note: the probabilities come from absolute scoring, so they will not add up to 1 for a particular utterance; the highest probability will be
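A minimal sketch of the usual one-vs-rest approach with scikit-learn's roc_curve and auc, using the five rows from the question. ROC analysis only depends on how the scores are ranked within each confidence column, so it does not matter that the confidences do not sum to 1. The names CLASSES, actual, conf, and one_vs_rest_roc are hypothetical; also, in these five rows Intent2 never occurs as an actual label, so no ROC curve exists for it and the sketch skips such classes:

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

CLASSES = ["Intent1", "Intent2", "Intent3"]

# Rows from the question: the actual label and the three Conf_* columns.
actual = np.array(["Intent1", "Intent3", "Intent1", "Intent3", "Intent1"])
conf = np.array([
    [0.86, 0.45, 0.24],
    [0.47, 0.76, 0.55],
    [0.70, 0.20, 0.44],
    [0.42, 0.67, 0.56],
    [0.70, 0.55, 0.36],
])


def one_vs_rest_roc(actual, conf, classes):
    """Return {class: (fpr, tpr, auc)}, treating each class as positive vs. the rest."""
    curves = {}
    for i, cls in enumerate(classes):
        y_true = (actual == cls).astype(int)
        # A ROC curve needs at least one positive and one negative sample.
        if y_true.min() == y_true.max():
            continue
        fpr, tpr, _ = roc_curve(y_true, conf[:, i])
        curves[cls] = (fpr, tpr, auc(fpr, tpr))
    return curves


curves = one_vs_rest_roc(actual, conf, CLASSES)
for cls, (fpr, tpr, area) in curves.items():
    print(f"{cls}: AUC = {area:.2f}")
```

Each (fpr, tpr) pair can then be drawn with matplotlib, e.g. plt.plot(fpr, tpr, label=cls). On these five rows both Intent1 and Intent3 get an AUC of 1.0, because every utterance that truly belongs to the class has a higher confidence for it than every utterance that does not.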

Pandas scale multiple columns at once and inverse transform with groupby()

ε祈祈猫儿з submitted on 2019-12-11 15:09:56
Question: I have a DataFrame like the one below. I want to apply two MinMaxScalers, one on X_data and one on y_data, each across multiple columns, and then the inverse transform should give me back the actual values. Please suggest how to do this. Thanks in advance.
DataFrame:
            X_data                                 y_data
   Customer      0      1      2      3   Customer      0      1
0         A  855.0  989.0  454.0  574.0          A  395.0  162.0
1         A  989.0  454.0  574.0  395.0          A  162.0  123.0
2         A  454.0  574.0  395.0  162.0          A  123.0  342.0
3         A  574.0  395.0  162.0  123.0          A  342.0  232.0
4         A  395.0  162.0  123.0  342.0          A  232.0  657.0
5         B  875.0  999.0
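A minimal sketch, assuming the goal is one MinMaxScaler per customer for X_data and a separate one per customer for y_data (the question's wording allows other readings). The key point is to keep every fitted scaler, keyed by group, because inverse_transform needs the same per-column min and max that were used for scaling. The function names are hypothetical, and only the customer A rows from the question are used, since the B rows are cut off:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def fit_scale_by_group(df, group_col, value_cols):
    """Fit one MinMaxScaler per group on value_cols.

    Returns a scaled copy of df and a {group_key: fitted_scaler} dict,
    which must be kept for the inverse transform.
    """
    scaled = df.copy()
    scalers = {}
    for key, idx in df.groupby(group_col).groups.items():
        scaler = MinMaxScaler()
        scaled.loc[idx, value_cols] = scaler.fit_transform(df.loc[idx, value_cols].to_numpy())
        scalers[key] = scaler
    return scaled, scalers


def inverse_scale_by_group(scaled, scalers, group_col, value_cols):
    """Undo fit_scale_by_group with the stored per-group scalers."""
    restored = scaled.copy()
    for key, scaler in scalers.items():
        idx = scaled.index[scaled[group_col] == key]
        restored.loc[idx, value_cols] = scaler.inverse_transform(
            scaled.loc[idx, value_cols].to_numpy()
        )
    return restored


# Customer A rows from the question.
X_data = pd.DataFrame({
    "Customer": ["A"] * 5,
    0: [855.0, 989.0, 454.0, 574.0, 395.0],
    1: [989.0, 454.0, 574.0, 395.0, 162.0],
    2: [454.0, 574.0, 395.0, 162.0, 123.0],
    3: [574.0, 395.0, 162.0, 123.0, 342.0],
})
y_data = pd.DataFrame({
    "Customer": ["A"] * 5,
    0: [395.0, 162.0, 123.0, 342.0, 232.0],
    1: [162.0, 123.0, 342.0, 232.0, 657.0],
})

x_cols, y_cols = [0, 1, 2, 3], [0, 1]
X_scaled, x_scalers = fit_scale_by_group(X_data, "Customer", x_cols)
y_scaled, y_scalers = fit_scale_by_group(y_data, "Customer", y_cols)

# e.g. after predicting in scaled space, map y back to the original units
y_restored = inverse_scale_by_group(y_scaled, y_scalers, "Customer", y_cols)
print(y_restored)
```

Each column is scaled independently within each customer, so a model's scaled predictions for customer B would have to be inverted with y_scalers["B"], not with customer A's scaler.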

How to classify both sentiment and genres from movie reviews using a TensorFlow CNN

自古美人都是妖i submitted on 2019-12-11 14:18:38
Question: I am trying to classify the sentiment of movie reviews and to predict each movie's genres from the review itself. Sentiment is a binary classification problem, whereas genre prediction is a multi-label classification problem. Another example that clarifies the problem: classifying the sentiment of a sentence and also predicting whether its tone is happy, sarcastic, sad, pitiful, angry, or fearful. In addition, I want to perform this classification with a TensorFlow CNN. My problem is
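The question is cut off, but the setup it describes maps naturally onto one network with a shared CNN body and two output heads: a single sigmoid unit trained with binary cross-entropy for sentiment, and one sigmoid unit per genre, also with binary cross-entropy, for the multi-label part (a softmax would force the genres to compete, which suits single-label rather than multi-label targets). A minimal sketch with the Keras API bundled in TensorFlow 2.x; the vocabulary size, sequence length, number of labels, layer sizes, and names are all hypothetical:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20000  # hypothetical vocabulary size
MAX_LEN = 200       # hypothetical padded review length, in tokens
NUM_GENRES = 6      # hypothetical number of genre (or tone) labels


def build_two_head_cnn():
    """Shared 1-D CNN text encoder with a sentiment head and a genre head."""
    tokens = tf.keras.Input(shape=(MAX_LEN,), dtype="int32", name="tokens")
    x = layers.Embedding(VOCAB_SIZE, 128)(tokens)
    x = layers.Conv1D(128, 5, activation="relu")(x)
    x = layers.GlobalMaxPooling1D()(x)
    x = layers.Dropout(0.5)(x)

    # Binary sentiment: a single sigmoid unit (probability of "positive").
    sentiment = layers.Dense(1, activation="sigmoid", name="sentiment")(x)
    # Multi-label genres: one independent sigmoid per label, not a softmax,
    # so several genres can be predicted for the same review.
    genres = layers.Dense(NUM_GENRES, activation="sigmoid", name="genres")(x)

    model = tf.keras.Model(
        inputs=tokens, outputs={"sentiment": sentiment, "genres": genres}
    )
    model.compile(
        optimizer="adam",
        loss={"sentiment": "binary_crossentropy", "genres": "binary_crossentropy"},
    )
    return model


model = build_two_head_cnn()
# Forward pass on two all-padding dummy reviews to check the output shapes.
preds = model(np.zeros((2, MAX_LEN), dtype="int32"), training=False)
print(preds["sentiment"].shape, preds["genres"].shape)
```

Training would pass the targets as a dict keyed by the head names, for example model.fit(x_tokens, {"sentiment": y_sentiment, "genres": y_genres}, epochs=5), where y_genres is a multi-hot matrix of shape (n_reviews, NUM_GENRES).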

AttributeError: 'Int64Index' object has no attribute 'month'

心已入冬 submitted on 2019-12-11 12:06:09
Question: I have some time series data with three separate columns (Date, Time, kW) that looks like this:
Date      Time         kW
3/1/2011  12:15:00 AM  171.36
3/1/2011  12:30:00 AM  181.44
3/1/2011  12:45:00 AM  175.68
3/1/2011  1:00:00 AM   180.00
3/1/2011  1:15:00 AM   175.68
Reading the CSV file directly with pandas, I can parse the Date and Time columns together:
df= pd.read_csv('C:\\Users\\desktop\\master.csv', parse_dates=[['Date', 'Time']])
This appears to work nicely, but the problem is that I want to create another data frame in Pandas
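The question is cut off before the failing line, but this error typically means .month was accessed on the DataFrame's default integer row index (Int64Index in older pandas; RangeIndex or a plain Index in pandas 2.x) rather than on datetime values: parse_dates=[['Date', 'Time']] creates a combined Date_Time column, not a datetime index. A minimal sketch that inlines the sample rows instead of reading master.csv, builds the timestamps explicitly, and makes them the index. The format string is an assumption based on the sample values; the columns are combined by hand because the nested-list parse_dates form is deprecated in recent pandas releases:

```python
import io

import pandas as pd

# The sample rows from the question, inlined instead of reading master.csv.
CSV_TEXT = """Date,Time,kW
3/1/2011,12:15:00 AM,171.36
3/1/2011,12:30:00 AM,181.44
3/1/2011,12:45:00 AM,175.68
3/1/2011,1:00:00 AM,180.00
3/1/2011,1:15:00 AM,175.68
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Combine the two text columns into real timestamps.
df["Date_Time"] = pd.to_datetime(
    df["Date"] + " " + df["Time"], format="%m/%d/%Y %I:%M:%S %p"
)

# Before this step the index is the default integer index, which has no .month.
df = df.set_index("Date_Time").drop(columns=["Date", "Time"])

print(df.index.month)  # month of each reading
print(df.index.hour)   # hour of each reading
```

If the integer index should be kept instead, df["Date_Time"].dt.month gives the same months through the .dt accessor.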

Disconnect points to plot overlay in Vega-lite / Vega

徘徊边缘 submitted on 2019-12-11 08:33:31
Question: An example in the Vega editor is here. I don't want dateTime 5 and dateTime 7 to be connected, since they are not consecutive. The idea is to plot an overlay based on some condition and connect points only when the count is >= 5. Has anyone tried this already?
Answer 1: You can replace your filter statement:
{"filter": "datum.count >= 5"}
with a calculate statement that sets filtered values to null:
{"as": "count", "calculate": "if(datum.count >= 5, datum.count, null)"}
The result is here.
Source: https://stackoverflow.com

Replacing a node in a frozen Tensorflow model

主宰稳场 submitted on 2019-12-11 08:26:51
Question: I have a frozen inference graph stored in a .pb file, which was obtained from a trained TensorFlow model with the freeze_graph function. Suppose, for simplicity, that I would like to change some of the sigmoid activations in the model to tanh activations (and let's not discuss whether this is a good idea). How can this be done with access only to the frozen graph in the .pb file, and without the possibility of retraining the model? I am aware of the Graph Editor library in tf.contrib, which
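The question is cut off here. One approach that does not rely on the tf.contrib Graph Editor is to edit the GraphDef protobuf directly: a frozen graph stores the weights as Const nodes, so changing a node's op type from Sigmoid to Tanh leaves the weights untouched and needs no retraining. This only works when the old and new ops have the same inputs and attributes, as Sigmoid and Tanh do (one input and a dtype attr T). A minimal sketch, assuming TensorFlow 2.x with the tf.compat.v1 API; the helper names, the file name frozen_model.pb, and the tiny demo graph with tensors x:0 and y:0 are hypothetical:

```python
import numpy as np
import tensorflow as tf


def load_graph_def(pb_path):
    """Read a serialized (frozen) GraphDef from a .pb file."""
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return graph_def


def swap_op_type(graph_def, old_op="Sigmoid", new_op="Tanh", node_names=None):
    """Return a copy of graph_def in which matching nodes get a new op type.

    If node_names is given, only those nodes are changed. Const nodes holding
    the trained weights are not modified.
    """
    patched = tf.compat.v1.GraphDef()
    patched.CopyFrom(graph_def)
    for node in patched.node:
        if node.op == old_op and (node_names is None or node.name in node_names):
            node.op = new_op
    return patched


def run_graph_def(graph_def, feed, fetch):
    """Import graph_def into a fresh graph and evaluate `fetch` (a tensor name)."""
    graph = tf.Graph()
    with graph.as_default():
        tf.compat.v1.import_graph_def(graph_def, name="")
    with tf.compat.v1.Session(graph=graph) as sess:
        return sess.run(fetch, feed_dict=feed)


# Demo on a tiny stand-in graph (hypothetical tensor names x:0 and y:0);
# with a real model, start from load_graph_def("frozen_model.pb") instead.
demo_graph = tf.Graph()
with demo_graph.as_default():
    x = tf.compat.v1.placeholder(tf.float32, shape=[None], name="x")
    tf.identity(tf.sigmoid(x), name="y")

original_def = demo_graph.as_graph_def()
patched_def = swap_op_type(original_def)

vals = np.array([-1.0, 0.0, 2.0], dtype=np.float32)
out = run_graph_def(patched_def, {"x:0": vals}, "y:0")
print(out)  # now tanh(vals) instead of sigmoid(vals)
```

To change only some activations, pass their node names, e.g. swap_op_type(graph_def, node_names={"layer_3/Sigmoid"}) (a hypothetical name; the real ones can be listed with [n.name for n in graph_def.node if n.op == "Sigmoid"]). The patched GraphDef can be written back with its SerializeToString() method.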

Error: too many values to unpack (expected 2) while trying to iterate over two columns in a DataFrame

我只是一个虾纸丫 submitted on 2019-12-11 07:20:31
Question:
for L,M in laundry1['latitude'],laundry1['longitude']:
    print('latitude:-')
    print(L)
    print('longitude:-')
    print(M)
I am trying to iterate over two columns of a DataFrame, assigning their values to L and M and printing them, but it raises the error "too many values to unpack (expected 2)".
Sample output:
latitude:-
22.1449787
18.922290399999998
22.1544736
22.136872
22.173595499999998
longitude:-
-101.0056829
-99.234332
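The error comes from how the loop header is parsed: laundry1['latitude'],laundry1['longitude'] builds a tuple of two Series, so the loop iterates over those two Series and tries to unpack each whole Series into L and M. Any column with more than two rows then raises "too many values to unpack (expected 2)". Pairing the columns row by row with zip fixes it. A minimal sketch using two rows from the sample output, assuming those latitudes and longitudes belong to the same rows:

```python
import pandas as pd

# Two rows taken from the question's sample output.
laundry1 = pd.DataFrame({
    "latitude": [22.1449787, 18.922290399999998],
    "longitude": [-101.0056829, -99.234332],
})

# Buggy version: `laundry1['latitude'], laundry1['longitude']` is a tuple of two
# Series, so the loop runs twice and tries to unpack a whole Series into L, M.
# With more than two rows that raises "too many values to unpack (expected 2)".

# Fixed version: zip walks both columns in lockstep, one (lat, lon) pair per row.
pairs = []
for lat, lon in zip(laundry1["latitude"], laundry1["longitude"]):
    print("latitude:-", lat)
    print("longitude:-", lon)
    pairs.append((lat, lon))
```

laundry1[["latitude", "longitude"]].itertuples(index=False) is an equivalent alternative that scales to more columns.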

What is the solution when Python gives me “ValueError: setting an array element with a sequence.”?

老子叫甜甜 submitted on 2019-12-11 07:04:54
Question: I am running the code below, but it gives me an error about arrays. I have tried to find a solution and to understand the problem, but I couldn't solve it. Here is my code:
import tensorflow as tf
import pandas as pa
import numpy as np
iris = pa.read_csv("iris.csv", names = ['F1', 'F2', 'F3', 'F4', 'class'])
print(iris.head(5))
iris['class'].value_counts()
#mapping data
A1 = np.asarray([1,0,0])
A2 = np.asarray([0,1,0])
A3 = np.asarray([0,0,1])
Irises = {'Iris-setosa' : A1, 'two
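The code is cut off before the failing line, so this is an assumption about the cause: mapping each class name to a NumPy array typically leaves a pandas column whose cells are arrays (object dtype), and converting that column to a float array raises exactly this ValueError. A minimal sketch of the cause and a fix with np.stack; the labels are hypothetical, and the names Iris-versicolor and Iris-virginica are assumed from the usual UCI iris.csv, since the question's mapping dict is truncated:

```python
import numpy as np
import pandas as pd

# Hypothetical labels; only 'Iris-setosa' appears in the truncated question.
labels = pd.Series(["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"])
one_hot = {
    "Iris-setosa": np.asarray([1, 0, 0]),
    "Iris-versicolor": np.asarray([0, 1, 0]),
    "Iris-virginica": np.asarray([0, 0, 1]),
}

# Mapping each label to an array gives an object column whose cells are arrays.
mapped = labels.map(one_hot)

# Casting that object column straight to a numeric array is what fails:
#   np.asarray(mapped.to_numpy(), dtype=float)
#   -> ValueError: setting an array element with a sequence.

# Stacking the per-row arrays builds a proper (n_samples, 3) numeric matrix.
y = np.stack(mapped.tolist())
print(y.shape)  # (4, 3)
```

pd.get_dummies(iris['class']).to_numpy(dtype='float32') is an alternative that builds the one-hot matrix without a hand-written mapping.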