data-science | 易学教程

Does Python's datatable package support out-of-memory datasets?

Question: datatable is a relatively new high-performance DataFrame/data.table alternative for Python. The datatable documentation states: "It focuses on: big data support, high performance, both in-memory and out-of-memory datasets, and multi-threaded algorithms." Still, I haven't found operations related to caching or keeping part of the data out of memory. In what sense does it support out-of-memory datasets?

Source: https://stackoverflow.com/questions/56572117/does-pythons-datatable-package-support-out

Convert Class Probabilities of a multiclass model to scores in range 0-100

Question: I want to generate a score of 0-100 based on the predictions of a three-class classification model. For example, the predict_proba of a 3-class logistic regression model gives me three probabilities x, y, z, as shown below:

    0  1  2
    x  y  z

Now I want to generate a score of 0-100 based on these probabilities, where 0 is closer to class 0 and 100 is closer to class 2.

Answer 1: Try this:

    prob['P'] = (prob['1']*1 + prob['2']*2) / 2

prob['0'] is multiplied by 0, so you don't need it. Examples: prob['0']=0
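The formula in the answer is the expected class index divided by its maximum value (2), which lands in the range 0-1, so it presumably still needs a factor of 100 to reach the 0-100 scale the question asks for. A minimal sketch of that idea follows, assuming the probabilities arrive as a NumPy array with one row per sample and one column per class; the function name `class_probs_to_score` and the sample rows are hypothetical and not from the original post.

```python
import numpy as np


def class_probs_to_score(probs):
    """Map rows of class probabilities to a 0-100 score via the expected class index."""
    probs = np.asarray(probs, dtype=float)
    n_classes = probs.shape[1]
    class_index = np.arange(n_classes)          # 0, 1, 2 for three classes
    expected = probs @ class_index              # expected class index, in [0, n_classes - 1]
    return 100.0 * expected / (n_classes - 1)   # rescale to [0, 100]


# Illustrative probabilities (made up for this sketch); each row sums to 1.
probs = np.array([
    [1.0, 0.0, 0.0],   # certainly class 0
    [0.0, 1.0, 0.0],   # certainly class 1
    [0.0, 0.0, 1.0],   # certainly class 2
    [0.2, 0.5, 0.3],   # mixed prediction
])
scores = class_probs_to_score(probs)
print(scores)
```

A sample that is certainly class 0 scores 0, one that is certainly class 2 scores 100, and anything in between is weighted by its probabilities. This treats the classes as evenly spaced on an ordinal scale, which is a modelling assumption rather than something the model guarantees.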

Plot ROC curve from multiclass classifier with varying probability using scikit

Question: The output of my multi-class classifier looks like the table below. I need to plot a ROC curve and get the AUC for it.

    Utterence  Actual   Predicted  Conf_intent1  Conf_Intent2  Conf_Intent3
    Uttr 1     Intent1  Intent1    0.86          0.45          0.24
    Uttr2      Intent3  Intent2    0.47          0.76          0.55
    Uttr3      Intent1  Intent1    0.70          0.20          0.44
    Uttr4      Intent3  Intent2    0.42          0.67          0.56
    Uttr5      Intent1  Intent1    0.70          0.55          0.36

Note: The probabilities are scored on an absolute scale, so they will not add up to 1 for a particular utterance. The highest probability will be

Pandas scale multiple columns at once and inverse transform with groupby()

Question: I have a dataframe like the one below. I want to apply two MinMaxScalers, one to X_data and one to y_data, across multiple columns, and the inverse transform should then give me back the actual values. Please suggest how to do this. Thanks in advance.

DataFrame:

       X_data                                        y_data
       Customer  0      1      2      3              Customer  0      1
    0  A         855.0  989.0  454.0  574.0          A         395.0  162.0
    1  A         989.0  454.0  574.0  395.0          A         162.0  123.0
    2  A         454.0  574.0  395.0  162.0          A         123.0  342.0
    3  A         574.0  395.0  162.0  123.0          A         342.0  232.0
    4  A         395.0  162.0  123.0  342.0          A         232.0  657.0
    5  B         875.0  999.0

How to classify both sentiment and genres from movie reviews using CNN Tensorflow

Question: I am trying to classify the sentiment of a movie review and predict the genres of that movie based on the review itself. Sentiment is a binary classification problem, whereas genres can be a multi-label classification problem. Another example to clarify the problem is classifying the sentiment of a sentence and also predicting whether the tone of the sentence is happy, sarcastic, sad, pitiful, angry or fearful. Moreover, I want to perform this classification using a Tensorflow CNN. My problem is

AttributeError: 'Int64Index' object has no attribute 'month'

Question: I have some time series data with three separate columns (Date, Time, kW) that looks like this:

    Date      Time         kW
    3/1/2011  12:15:00 AM  171.36
    3/1/2011  12:30:00 AM  181.44
    3/1/2011  12:45:00 AM  175.68
    3/1/2011  1:00:00 AM   180.00
    3/1/2011  1:15:00 AM   175.68

Reading the CSV file directly with Pandas, I can parse the Date and Time:

    df = pd.read_csv('C:\\Users\\desktop\\master.csv', parse_dates=[['Date', 'Time']])

This appears to work nicely, but the problem is I want to create another data frame in Pandas
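The question is cut off, so the actual cause is not visible here. An error like the one in the title usually appears when `.month` is read from a default integer index rather than a DatetimeIndex. The sketch below shows one way to get a DatetimeIndex, under these assumptions: the sample rows are inlined instead of read from `master.csv`, the dates are month-first, and the combined column name `Date_Time` is chosen for illustration.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the sketch runs without the CSV file.
csv_text = """Date,Time,kW
3/1/2011,12:15:00 AM,171.36
3/1/2011,12:30:00 AM,181.44
3/1/2011,12:45:00 AM,175.68
3/1/2011,1:00:00 AM,180.00
3/1/2011,1:15:00 AM,175.68
"""
df = pd.read_csv(io.StringIO(csv_text))

# Combine the two text columns and parse them explicitly.
# Assumption: dates are month/day/year with a 12-hour clock.
df["Date_Time"] = pd.to_datetime(
    df["Date"] + " " + df["Time"], format="%m/%d/%Y %I:%M:%S %p"
)

# Moving the parsed column into the index yields a DatetimeIndex,
# which is what exposes attributes such as .month and .hour.
df = df.set_index("Date_Time").drop(columns=["Date", "Time"])

months = df.index.month
print(type(df.index).__name__)
print(list(months))
```

If the parsed timestamps stay in an ordinary column instead, the same attributes are reachable through the `.dt` accessor, for example `df["Date_Time"].dt.month`.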

Disconnect points to plot overlay in Vega-lite / Vega

Question: There is an example in the vega-editor here. I don't want dateTime 5 and dateTime 7 to be connected, since they are not consecutive. The idea is to plot an overlay based on some condition and connect points only when the count is >= 5. Has anyone tried this already?

Answer 1: You can replace your filter statement:

    {"filter": "datum.count >= 5"}

with a calculate statement that sets filtered values to null:

    {"as": "count", "calculate": "if(datum.count >= 5, datum.count, null)"}

The result is here.

Source: https://stackoverflow.com

Replacing a node in a frozen Tensorflow model

Question: I have a frozen inference graph stored in a .pb file, which was obtained from a trained Tensorflow model by the freeze_graph function. Suppose, for simplicity, that I would like to change some of the sigmoid activations in the model to tanh activations (and let's not discuss whether this is a good idea). How can this be done with access only to the frozen graph in the .pb file, and without the possibility of retraining the model? I am aware of the Graph Editor library in tf.contrib, which

Error:-too many values to unpack (expected 2), while trying to iterate over two columns in a Data Frame

Question:

    for L,M in laundry1['latitude'],laundry1['longitude']:
        print('latitude:-')
        print(L)
        print('longitude:-')
        print(M)

I am trying to iterate over two columns of a data frame, assigning their values to L and M and printing them, but it raises the error "too many values to unpack (expected 2)".

Sample output:

    latitude:- 22.1449787 18.922290399999998 22.1544736 22.136872 22.173595499999998
    longitude:- -101.0056829 -99.234332
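The loop in the question iterates over a two-element tuple of columns, so on each pass Python tries to unpack an entire column into `L` and `M`, which fails as soon as a column holds more than two values. Pairing the columns with `zip` is the usual fix. The sketch below uses a small stand-in for `laundry1`, built from values shown in the sample output; the two-row frame and the `pairs` list are illustrative only.

```python
import pandas as pd

# Stand-in for the question's laundry1 frame, using values from its sample output.
laundry1 = pd.DataFrame({
    "latitude": [22.1449787, 18.922290399999998],
    "longitude": [-101.0056829, -99.234332],
})

pairs = []
# zip() walks both columns in lockstep and yields one (latitude, longitude) pair per row.
for L, M in zip(laundry1["latitude"], laundry1["longitude"]):
    print("latitude:-", L)
    print("longitude:-", M)
    pairs.append((L, M))
```

For more columns, `laundry1[["latitude", "longitude"]].itertuples(index=False)` gives the same row-wise iteration without listing each column in `zip`.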

What is the solution python gives me “ValueError: setting an array element with a sequence.”

Question: I am running the code below, but it gives me an error about arrays. I have tried to find a solution and to understand the problem, but I couldn't solve it. Here is my code:

    import tensorflow as tf
    import pandas as pa
    import numpy as np

    iris = pa.read_csv("iris.csv", names = ['F1', 'F2', 'F3', 'F4', 'class'])
    print(iris.head(5))
    iris['class'].value_counts()

    #mapping data
    A1 = np.asarray([1,0,0])
    A2 = np.asarray([0,1,0])
    A3 = np.asarray([0,0,1])
    Irises = {'Iris-setosa' : A1, 'two