canopy | 易学教程

Run scripts simultaneously in Canopy

This is an update request on this four-year-old question. I have been using Canopy for many years, but one drawback is that I can't debug a different project while another is running. I often run code that takes about an hour, so it would be ideal to be able to run one project while working on another. In short, I would like to use multiple kernels integrated with the IDE, perhaps as many as three, as I have four cores. Canopy is my default and only Python on my Windows machine. I am using Canopy 2.1.3 with Python 3.5.

The use case is clear, but it is not yet implemented, though it is getting closer.
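Until multiple kernels are available in the IDE, one possible workaround is to start the long-running script as a separate operating-system process, so that the interactive session stays free. The sketch below is not a Canopy feature and uses only the standard library (Python 3.7 or later is assumed); the command is a hypothetical stand-in for a real script path.

```python
import subprocess
import sys

# Hypothetical stand-in for a long-running script; in practice this might be
# [sys.executable, "path/to/long_job.py"].
command = [sys.executable, "-c", "print('long job finished')"]

# Popen returns immediately, so the calling session is free for other work.
job = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)

# ... continue editing or debugging another project here ...

# Later, collect the output and the exit status of the background job.
output, _ = job.communicate()
exit_code = job.returncode
print(exit_code, output.strip())
```

The background job runs in its own interpreter, so it cannot be stepped through with the IDE debugger; it only keeps the main session responsive.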

Hands-on Mahout clustering algorithms: Canopy + K-means

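Mahout is a top-level Apache open-source project. It grew out of Lucene, is built on Hadoop, and provides efficient implementations of classic machine learning algorithms for large-scale data. For the classic clustering algorithms it provides both single-machine implementations and Hadoop-based distributed implementations, all of which make very good learning material.

Cluster analysis

Clustering can be understood simply as dividing data objects into several clusters, where all the data objects in a cluster share a certain similarity. A cluster can then be treated as a single unit, which can improve the quality of a computation or reduce its cost. There are many classic algorithms for measuring the similarity between data objects, but they all need essentially the same data structure: the vector. Common choices include Euclidean distance, cosine distance, and the Pearson correlation coefficient. Mahout implements all of these, and when you implement your own clustering you can switch between distance measures through an interface.

Data model

During Mahout's cluster analysis, data objects are converted to vectors (Vector) for computation. The interface in Mahout is org.apache.mahout.math.Vector, in which each field is represented by a floating-point number (double). You can implement your own vector model by extending a Mahout base class such as AbstractVector, or you can use one of the existing implementations it provides, as follows:

1. DenseVector: implemented as an array of floating-point numbers that stores every field of the vector; suitable for dense vectors.
2.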
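To make the three similarity measures named above concrete, here is a minimal sketch in plain Python. It is not Mahout's API; the function names are illustrative, and a plain list of floats stands in for a dense vector.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine of the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation coefficient of the two vectors' components."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Two dense vectors pointing in the same direction with different lengths.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]
print(euclidean_distance(u, v))
print(cosine_distance(u, v))
print(pearson_correlation(u, v))
```

For these two vectors the Euclidean distance is nonzero while the cosine distance is approximately zero, which shows why the choice of measure changes which points end up in the same cluster.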

Running winpdb from within Enthought Canopy on MacOS 10.9.2

I have Enthought Canopy 1.4 installed on MacOS 10.9.2. Trying to run the winpdb debugger results in the following message:

    This program needs access to the screen. Please run with a Framework build of python, and only when you are logged in on the main display of your Mac.

As a workaround, I tried creating a shell script, run.sh, containing

    PYVER=2.7
    PYTHON=/System/Library/Frameworks/Python.framework/Versions/$PYVER/bin/python$PYVER
    # find the root of the virtualenv, it should be the parent of the dir this script is in
    ENV=`$PYTHON -c "import os; print os.path.abspath(os.path.join(os.path
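Since the message asks for a Framework build, it may help to check which kind of build a given interpreter is. The sketch below assumes a Python 3 interpreter and reads the standard PYTHONFRAMEWORK build variable; it only reports the build type and does not fix the winpdb problem.

```python
import sys
import sysconfig

# On macOS framework builds this build variable holds the framework name
# (typically "Python"); on other builds it is empty or not defined.
framework = sysconfig.get_config_var("PYTHONFRAMEWORK")
is_framework_build = bool(framework)

print("interpreter:", sys.executable)
print("framework build:", is_framework_build)
```

Running it with each candidate interpreter (the system Python and the one inside the Canopy environment) shows which of them could satisfy the requirement in the error message.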

MemoryError with python/pandas and large left outer joins

I'm fairly new to both Python and pandas, and I'm trying to figure out the fastest way to execute a mammoth left outer join between a left dataset with roughly 11 million rows and a right dataset with ~160K rows and four columns. It should be a many-to-one situation, but I'd like the join not to raise an error if there's a duplicate row on the right side. I'm using Canopy Express on a Windows 7 64-bit system with 8 GB RAM, and I'm pretty much stuck with that. Here's a model of the code I've put together so far:

    import pandas as pd
    leftcols = ['a','b','c','d','e','key']
    leftdata = pd.read_csv(
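The join semantics described above (many-to-one, left outer, tolerant of duplicate keys on the right) can be sketched in plain Python with a dictionary lookup. This is an illustration of the logic only, not the poster's code and not pandas; the function name, the sample rows, and the column w are hypothetical.

```python
def left_join_many_to_one(left_rows, right_rows, key):
    """Left outer join that keeps the first right-hand row per key."""
    right_columns = [c for c in right_rows[0] if c != key] if right_rows else []

    # Build the lookup once; later duplicates of a key are ignored silently.
    lookup = {}
    for row in right_rows:
        lookup.setdefault(row[key], row)

    joined = []
    for row in left_rows:
        match = lookup.get(row[key])
        merged = dict(row)
        for col in right_columns:
            # Unmatched left rows get None in the right-hand columns.
            merged[col] = match[col] if match is not None else None
        joined.append(merged)
    return joined


left = [{"key": 1, "a": "x"}, {"key": 2, "a": "y"}, {"key": 3, "a": "z"}]
right = [{"key": 1, "w": 10}, {"key": 2, "w": 20}, {"key": 2, "w": 99}]
result = left_join_many_to_one(left, right, "key")
for row in result:
    print(row)
```

In pandas, a comparable effect might be obtained by calling drop_duplicates on the key column of the right frame and then merge with how="left"; whether that fits in 8 GB depends on the column types of the real data.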

pyside-rcc “dyld: Library not loaded:…”

Question: I'm a Python and Qt rookie and I have some problems running pyside-rcc (and pyrcc4). The problem is not the link to the executable but a library problem, it seems. That I'm not a Unix wizard probably doesn't help either :) When I run

    $ pyside-rcc

I get the following error:

    dyld: Library not loaded: @rpath/lib/QtCore.framework/Versions/4/QtCore
      Referenced from: /Users/[USERNAME]/Library/Enthought/Canopy_64bit/User/bin/pyside-rcc
      Reason: image not found
    Trace/BPT trap: 5

For

    $ pyrcc4

the error is

scipy with enthought canopy

Question: I am evaluating the Enthought package. I installed the 32-bit Canopy (downloaded from https://www.enthought.com/downloads/) on Ubuntu.

    $ sudo bash canopy-1.0.1-rh5-32.sh

Upon testing, I don't see scipy in /usr/local/Canopy/appdata/canopy-1.0.0.1160.rh5-x86/lib/python2.7/site-packages:

    $ /usr/local/Canopy/appdata/canopy-1.0.0.1160.rh5-x86/bin/python
    Enthought Canopy Python 2.7.3 | 32-bit | (default, Mar 25 2013, 15:45:37)
    [GCC 4.1.2 20080704 (Red Hat 4.1.2-52)] on linux2
    Type "help", "copyright
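One way to check what a particular interpreter can actually import, rather than inspecting site-packages by hand, is to ask the import system directly. The sketch below assumes a Python 3 interpreter (the session above is Python 2.7, where importlib.util.find_spec is not available); the helper name locate_package is illustrative.

```python
import importlib.util
import sys


def locate_package(name):
    """Return the file the interpreter would import `name` from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


print("interpreter:", sys.executable)
print("scipy found at:", locate_package("scipy"))
```

If the result is None for the interpreter under appdata but a path for another interpreter on the machine, the package is installed in a different environment from the one being tested.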

Error: Line magic function

I'm trying to read a file using Python and I keep getting this error:

    ERROR: Line magic function `%user_vars` not found.

My code is very basic, just

    names = read_csv('Combined data.csv')
    names.head()

I get this any time I try to read or open a file. I tried using this thread for help: "ERROR: Line magic function `%matplotlib` not found". I'm using Enthought Canopy and I have IPython version 2.4.1. I made sure to update using the IPython installation page for help. I'm not sure what's wrong, because it should be very simple to open/read files. I even get this error when opening text files. EDIT: I

HTML not rendering properly with Canopy 1.7.1.3323 / IPython 4.1.2

I've just upgraded to Canopy 1.7.1; I think this problem stems from the change in IPython version from 2.4.1 to 4.1.2. The issue I have is that calling a DataFrame object in Python seems to use the __print__ method, i.e. there's no difference between typing print df and df into the interpreter, and unfortunately this gives me an all-text output rather than the nice tables I normally get. So I get something that looks exactly like this when I call df rather than a table:

               date  flag
    1      20151102     0
    98663  20151101     1

This happened immediately after the upgrade, and I also tried updating all my
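For context, Python has no __print__ hook: print() uses __str__ (falling back to __repr__), while IPython frontends that can render HTML look for a _repr_html_ method on the object. The toy class below is a hypothetical stand-in for a DataFrame, not pandas itself, and only illustrates that the two outputs come from different methods.

```python
class Table:
    """Toy stand-in for a DataFrame (hypothetical, not pandas)."""

    def __init__(self, rows):
        self.rows = rows

    def __repr__(self):
        # Plain-text form: used by print() and by text-only consoles.
        return "\n".join(" ".join(str(c) for c in row) for row in self.rows)

    def _repr_html_(self):
        # Rich form: HTML-capable IPython frontends call this hook instead.
        body = "".join(
            "<tr>" + "".join(f"<td>{c}</td>" for c in row) + "</tr>"
            for row in self.rows
        )
        return f"<table>{body}</table>"


t = Table([(20151102, 0), (20151101, 1)])
print(repr(t))
print(t._repr_html_())
```

If a frontend shows only the plain-text form, one plausible reading is that it is not requesting or rendering the HTML representation, which would point at the console configuration rather than at the DataFrame.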

Memory Error at Python while converting to array

My code is shown below:

    from sklearn.datasets import load_svmlight_files
    import numpy as np

    perm1 = np.random.permutation(25000)
    perm2 = np.random.permutation(25000)
    X_tr, y_tr, X_te, y_te = load_svmlight_files(("dir/file.feat", "dir/file.feat"))

    # randomly shuffle data
    X_train = X_tr[perm1,:].toarray()[:,0:2000]
    y_train = y_tr[perm1]>5  # turn into binary problem

The code works fine until here, but when I try to convert one more object to an array, my program returns a memory error.

Code:

    X_test = X_te[perm2,:].toarray()[:,0:2000]

Error:

    -----------------------------------------------------------
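A likely cause is that .toarray() builds the full dense matrix before the [:, 0:2000] slice is applied. The arithmetic below is a rough sketch of the two orderings; the total feature count of 50,000 is an assumption, since the question does not state it, and float64 storage (8 bytes per value) is assumed as well.

```python
BYTES_PER_FLOAT64 = 8


def dense_bytes(n_rows, n_cols, itemsize=BYTES_PER_FLOAT64):
    """Bytes needed to hold an n_rows x n_cols dense array."""
    return n_rows * n_cols * itemsize


n_rows = 25_000       # from the permutation size in the question
n_kept_cols = 2_000   # columns kept by [:, 0:2000]
n_all_cols = 50_000   # ASSUMPTION: the real feature count is not given

full_then_slice = dense_bytes(n_rows, n_all_cols)    # .toarray() on every column first
slice_then_dense = dense_bytes(n_rows, n_kept_cols)  # only the kept columns made dense

print(f"densify everything: {full_then_slice / 1e9:.1f} GB")
print(f"slice first:        {slice_then_dense / 1e9:.1f} GB")
```

If that reading is right, slicing the columns while the matrix is still sparse, for example X_te[perm2, :][:, :2000].toarray(), would keep the temporary array small; whether it then fits still depends on the real feature count and the available RAM.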