scikits | 易学教程

How to use dummy variables to represent categorical data in a Python scikit-learn random forest

Question: I'm generating a feature vector for scikit-learn's random forest classifier. The feature vector represents the names of 9 protein amino acid residues. There are 20 possible residue names, so I use 20 dummy variables to represent one residue name; for 9 residues, that gives 180 dummy variables. For example, if the 9 residues in the sliding window are ARNDCQEGH (each letter represents the name of one protein residue), my feature vector will be: "True\tFalse\tFalse\tFalse\tFalse\tFalse\tFalse\tFalse\tFalse\tFalse\tFalse\tFalse\tFalse\tFalse\tFalse\tFalse\tFalse\tFalse\tFalse\tFalse\tFalse\tTrue
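The encoding described above (20 dummy columns per residue, 9 residues, 180 columns in total) can be sketched with NumPy as follows. This is a minimal sketch, not the asker's code: the residue order ARNDCQEGHILKMFPSTWYV is an assumption (the question does not list the 20 names, though "ARNDCQEGH" matches the first nine letters of that order, and the sample vector's True at positions 0 and 21 is consistent with it), and `encode_window` is a hypothetical helper name.

```python
import numpy as np

# Assumed residue ordering (standard one-letter codes); not given in the question.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_window(window: str) -> np.ndarray:
    """One-hot encode a residue window into len(window) * 20 boolean dummies.

    Hypothetical helper: raises KeyError for a letter outside AMINO_ACIDS.
    """
    features = np.zeros((len(window), len(AMINO_ACIDS)), dtype=bool)
    for pos, residue in enumerate(window):
        features[pos, INDEX[residue]] = True  # exactly one True per 20-column block
    return features.ravel()  # row-major flatten: column = pos * 20 + residue index


vec = encode_window("ARNDCQEGH")
print(vec.shape)            # (180,)
print(np.flatnonzero(vec))  # positions 0, 21, 42, ..., 168: one per block
```

Stacking one such vector per window (for example with `np.vstack`) gives an `(n_samples, 180)` matrix that can be passed to `RandomForestClassifier.fit`. scikit-learn's `OneHotEncoder`, given the same 20 categories for each of the 9 positions, is an alternative that builds the same kind of columns.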

Something wrong with scikits.talkbox with Python 3?

Question: I am migrating a Python program from Python 2 to Python 3.6, and the package scikits.talkbox is one of its dependencies. However, I can no longer figure out how to use it: installation with pip seems to work fine, but I cannot import it. Has anyone faced this problem before?

    [manjaro@manjaro-pc ~]$ python --version
    Python 3.6.0
    [manjaro@manjaro-pc ~]$ sudo pip install scikits.talkbox
    Collecting scikits.talkbox
      Using cached scikits.talkbox-0.2.5.tar.gz
    Requirement already satisfied: numpy in /usr/lib/python3.6
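Before looking for a fix, it can help to see the exact exception the import raises and which interpreter is running it. One possibility worth ruling out (an assumption, not something the post confirms) is that `sudo pip` installs into a different interpreter than the `python` being launched. A minimal diagnostic sketch; `try_import` is a hypothetical helper:

```python
import importlib
import sys


def try_import(module_name: str):
    """Return None if module_name imports cleanly, else a short error description."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or SyntaxError from Python 2-only source, etc.
        return f"{type(exc).__name__}: {exc}"
    return None


# Which interpreter is actually running this code?
print(sys.executable, sys.version.split()[0])
print(try_import("scikits.talkbox"))
```

If the printed interpreter is not the one pip installed into, `python -m pip install scikits.talkbox` targets the interpreter you actually launch. A `SyntaxError` or similar error coming from inside the package would instead suggest that the package source itself is not Python 3 compatible.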

Missing values in scikits machine learning

Question: Is it possible to have missing values in scikit-learn? How should they be represented? I couldn't find any documentation about that.

Answer 1: Missing values are simply not supported in scikit-learn. There has been discussion on the mailing list about this before, but no attempt to actually write code to handle them. Whatever you do, don't use NaN to encode missing values, since many of the algorithms refuse to handle samples containing NaNs.

The above answer is outdated; the latest release of scikit-learn has a class Imputer that does simple, per-feature missing value imputation. You can feed it
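The `Imputer` class mentioned above was deprecated in scikit-learn 0.20 and later removed. Its replacement is `sklearn.impute.SimpleImputer`, for example `SimpleImputer(missing_values=np.nan, strategy="mean")`. To show what simple per-feature mean imputation does without depending on scikit-learn, here is a minimal NumPy sketch. `mean_impute` is a hypothetical helper name, and NaN is used only as a temporary marker that is filled in before any estimator sees the data.

```python
import numpy as np


def mean_impute(X) -> np.ndarray:
    """Replace each NaN with the mean of the non-missing values in its column.

    A column that is entirely NaN has no mean: np.nanmean warns and it stays NaN.
    """
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)      # per-feature mean, ignoring NaNs
    rows, cols = np.where(np.isnan(X))     # coordinates of the missing entries
    filled = X.copy()
    filled[rows, cols] = col_means[cols]   # fill each gap with its column's mean
    return filled


X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])
print(mean_impute(X))  # column means are 4.0 and 2.5
```

Filling from column statistics computed on the training data only (as `SimpleImputer.fit` does) avoids leaking information from test rows into the model.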

View onto a numpy array?

Question: I have a 2D numpy array. Is there a way to create a view onto it that includes the first k rows and all columns? The point is to avoid copying the underlying data (the array is so large that making partial copies is not feasible).

Answer 1 (Joe Kington): Sure, just index it as you normally would, e.g. y = x[:k, :]. This returns a view into the original array: no data is copied, and any updates made to y are reflected in x and vice versa.

Edit: I commonly work with >10GB 3D arrays of uint8s, so I worry about this a lot... Numpy can be very efficient at memory management if you keep a few
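The view behaviour described in the answer can be checked directly with `np.shares_memory`. The sketch below uses a small stand-in array. It also shows a standard NumPy rule that matters for huge arrays: basic slicing returns a view, while advanced indexing with a list of row numbers returns a copy.

```python
import numpy as np

x = np.arange(20, dtype=np.uint8).reshape(5, 4)  # small stand-in for a huge array
k = 2

y = x[:k, :]                     # basic slicing: a view, no data copied
print(np.shares_memory(x, y))    # True

y[0, 0] = 99                     # writing through the view...
print(x[0, 0])                   # ...changes x as well: 99

z = x[[0, 1], :]                 # advanced (integer-list) indexing: a new copy
print(np.shares_memory(x, z))    # False
```

`x[:k]` is equivalent to `x[:k, :]`. For a very large array, checking `np.shares_memory` (or watching memory use) is a quick way to confirm that an indexing expression did not silently allocate a copy.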