canopy

Memory Error at Python while converting to array

我们两清 submitted on 2019-12-05 03:40:25
Question: My code is shown below:
    from sklearn.datasets import load_svmlight_files
    import numpy as np
    perm1 = np.random.permutation(25000)
    perm2 = np.random.permutation(25000)
    X_tr, y_tr, X_te, y_te = load_svmlight_files(("dir/file.feat", "dir/file.feat"))
    #randomly shuffle data
    X_train = X_tr[perm1,:].toarray()[:,0:2000]
    y_train = y_tr[perm1]>5 #turn into binary problem
The code works fine until here, but when I try to convert one more object to an array, my program returns a memory error. Code: X_test

Run scripts simultaneously in Canopy

感情迁移 submitted on 2019-12-04 19:28:27
This is a request for an update on this four-year-old question. I have been using Canopy for many years, but one drawback is that I can't debug a different project while another is running. I often run code that takes about an hour, so it would be ideal to be able to run one project while working on another. In short, I would like to use multiple kernels integrated with the IDE, perhaps as many as three, as I have four cores. Canopy is my default and only Python on my Windows machine. I am using Canopy 2.1.3 with Python 3.5. The use case is clear, but it is not yet implemented, though it is getting closer.

Mahout clustering algorithms in practice: Canopy + K-means

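旧巷老猫 submitted on 2019-12-04 18:57:09
Mahout is a top-level Apache open-source project. It grew out of Lucene, is built on Hadoop, and provides efficient implementations of classic machine learning algorithms for large-scale data. For the classic clustering algorithms it offers both single-machine implementations and Hadoop-based distributed implementations, all of which make very good study material.
Cluster analysis
Clustering can be understood simply as dividing data objects into multiple clusters, where all the data objects in a cluster share some degree of similarity. A cluster can then be treated as a single unit, which can improve the quality of a computation or reduce the amount of computation. There are many classic algorithms for measuring the similarity between data objects, but the data structure they require is essentially the same: the vector. Common measures include Euclidean distance, cosine distance, and the Pearson correlation coefficient. Mahout provides implementations of all of these, and when you implement your own clustering you can switch between distance measures through an interface.
Data model
During Mahout's cluster analysis, data objects are converted into vectors (Vector) for computation. The interface in Mahout is org.apache.mahout.math.Vector, and each of its fields is represented by a floating-point number (double). You can implement your own vector model by extending one of Mahout's base classes, such as AbstractVector, or use one of the existing implementations it provides directly, as follows:
1. DenseVector: implemented as an array of floating-point numbers that stores every field of the vector; suited to storing dense vectors.
2.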
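The three similarity measures named above can be sketched in a few lines. This is an illustration in Python with NumPy, not Mahout's Java API: the function names and the DISTANCES lookup table are hypothetical, and the table only stands in for the idea of switching measures behind a common interface. Cosine and Pearson are expressed here as distances (1 minus the similarity), which is one common convention and an assumption on my part.

```python
import numpy as np


def euclidean_distance(a, b):
    # Straight-line distance between two dense vectors.
    return float(np.linalg.norm(a - b))


def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - float(similarity)


def pearson_distance(a, b):
    # 1 - Pearson correlation coefficient; assumes neither vector is constant.
    return 1.0 - float(np.corrcoef(a, b)[0, 1])


# Hypothetical lookup table: one name per interchangeable measure.
DISTANCES = {
    "euclidean": euclidean_distance,
    "cosine": cosine_distance,
    "pearson": pearson_distance,
}


def distance(a, b, measure="euclidean"):
    # Convert inputs to dense float arrays, then dispatch on the measure name.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return DISTANCES[measure](a, b)
```

A clustering routine written against distance(a, b, measure) could change its notion of similarity by passing a different measure name, without touching the rest of the code.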

Running winpdb from within Enthought Canopy on MacOS 10.9.2

对着背影说爱祢 submitted on 2019-12-04 18:12:04
I have Enthought Canopy 1.4 installed on MacOS 10.9.2. Trying to run the winpdb debugger results in the following message:
    This program needs access to the screen. Please run with a Framework build of python, and only when you are logged in on the main display of your Mac.
As a workaround, I tried creating a shell script run.sh, containing
    PYVER=2.7
    PYTHON=/System/Library/Frameworks/Python.framework/Versions/$PYVER/bin/python$PYVER
    # find the root of the virtualenv, it should be the parent of the dir this script is in
    ENV=`$PYTHON -c "import os; print os.path.abspath(os.path.join(os.path

MemoryError with python/pandas and large left outer joins

折月煮酒 submitted on 2019-12-04 17:51:31
I'm fairly new to both Python and Pandas, and I'm trying to figure out the fastest way to execute a mammoth left outer join between a left dataset with roughly 11 million rows and a right dataset with ~160K rows and four columns. It should be a many-to-one situation, but I'd like the join not to raise an error if there's a duplicate row on the right side. I'm using Canopy Express on a Windows 7 64-bit system with 8 GB RAM, and I'm pretty much stuck with that. Here's a model of the code I've put together so far:
    import pandas as pd
    leftcols = ['a','b','c','d','e','key']
    leftdata = pd.read_csv(
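One way to get a many-to-one left join that tolerates duplicate keys on the right is to drop the duplicates before merging. The sketch below assumes the join column is called key, as in the question's leftcols; the right-hand column r1, the toy data, and the helper name left_join_many_to_one are made up for illustration. Keeping the first row per key is an arbitrary choice that may not suit the real data.

```python
import pandas as pd


def left_join_many_to_one(left, right, key="key"):
    # Keep one row per key on the right so each left row matches at most once.
    # keep="first" is an arbitrary choice made for this sketch.
    right_unique = right.drop_duplicates(subset=key, keep="first")
    # how="left" keeps every left row; unmatched rows get NaN in right columns.
    return left.merge(right_unique, on=key, how="left")


# Toy stand-ins for the real datasets (hypothetical values).
left = pd.DataFrame({"a": [1, 2, 3], "key": ["x", "y", "z"]})
right = pd.DataFrame({"key": ["x", "x", "y"], "r1": [10, 11, 20]})

result = left_join_many_to_one(left, right)
```

Because the right side is deduplicated first, the result has exactly as many rows as the left dataset, which also keeps the join from inflating memory use on an 8 GB machine.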

pyside-rcc “dyld: Library not loaded:…”

半城伤御伤魂 submitted on 2019-12-04 05:31:43
Question: I'm a Python and Qt rookie and I have some problems running pyside-rcc (and pyrcc4). The problem is not the link to the executable; it seems to be a library problem. That I'm not a Unix wizard probably doesn't help either :) When I run
    $ pyside-rcc
I get the following error:
    dyld: Library not loaded: @rpath/lib/QtCore.framework/Versions/4/QtCore
      Referenced from: /Users/[USERNAME]/Library/Enthought/Canopy_64bit/User/bin/pyside-rcc
      Reason: image not found
    Trace/BPT trap: 5
For $ pyrcc4 the error is

scipy with enthought canopy

我的梦境 submitted on 2019-12-04 05:19:52
Question: I am evaluating the Enthought package. I installed the 32-bit Canopy (downloaded from https://www.enthought.com/downloads/) in Ubuntu.
    $ sudo bash canopy-1.0.1-rh5-32.sh
Upon testing I don't see scipy in /usr/local/Canopy/appdata/canopy-1.0.0.1160.rh5-x86/lib/python2.7/site-packages :
    $ /usr/local/Canopy/appdata/canopy-1.0.0.1160.rh5-x86/bin/python
    Enthought Canopy Python 2.7.3 | 32-bit | (default, Mar 25 2013, 15:45:37)
    [GCC 4.1.2 20080704 (Red Hat 4.1.2-52)] on linux2
    Type "help", "copyright

Error: Line magic function

房东的猫 submitted on 2019-12-04 03:16:52
I'm trying to read a file using Python and I keep getting this error:
    ERROR: Line magic function `%user_vars` not found.
My code is very basic, just
    names = read_csv('Combined data.csv')
    names.head()
I get this any time I try to read or open a file. I tried using this thread for help: "ERROR: Line magic function `%matplotlib` not found". I'm using Enthought Canopy and I have IPython version 2.4.1. I made sure to update using the IPython installation page for help. I'm not sure what's wrong because it should be very simple to open/read files. I even get this error for opening text files. EDIT: I

HTML not rendering properly with Canopy 1.7.1.3323 / IPython 4.1.2

Deadly 提交于 2019-12-04 03:13:27
I've just upgraded to Canopy 1.7.1; I think this problem stems from the change in IPython version from 2.4.1 to 4.1.2. The issue I have is that calling a DataFrame object in Python seems to use the __print__ method, i.e. there's no difference between typing print df and df into the interpreter, and unfortunately this gives me an all-text output rather than the nice tables I normally get. So I get something that looks exactly like this when I call df rather than a table:
               date  flag
    1      20151102     0
    98663  20151101     1
This happened immediately after the upgrade, and I also tried updating all my
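For context, a pandas DataFrame carries both a plain-text and an HTML representation, and which one appears depends on what the front end asks for. The sketch below rebuilds the two-row frame from the excerpt (the index and values are copied from the output shown; everything else is illustrative) and produces both forms explicitly. It does not diagnose why Canopy 1.7.1 falls back to text.

```python
import pandas as pd

# Rebuild the small frame shown in the question.
df = pd.DataFrame(
    {"date": [20151102, 20151101], "flag": [0, 1]},
    index=[1, 98663],
)

# Plain-text form: this is what print(df) writes to the console.
text_view = str(df)

# HTML form: the <table> markup a rich front end would render as a table.
html_view = df.to_html()
```

In an IPython front end that supports rich output, passing html_view to IPython.display.HTML and then to IPython.display.display is one way to request the table rendering explicitly.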

Memory Error at Python while converting to array

时间秒杀一切 submitted on 2019-12-03 21:57:27
My code is shown below:
    from sklearn.datasets import load_svmlight_files
    import numpy as np
    perm1 = np.random.permutation(25000)
    perm2 = np.random.permutation(25000)
    X_tr, y_tr, X_te, y_te = load_svmlight_files(("dir/file.feat", "dir/file.feat"))
    #randomly shuffle data
    X_train = X_tr[perm1,:].toarray()[:,0:2000]
    y_train = y_tr[perm1]>5 #turn into binary problem
The code works fine until here, but when I try to convert one more object to an array, my program returns a memory error. Code:
    X_test = X_te[perm2,:].toarray()[:,0:2000]
Error: -----------------------------------------------------------
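A likely contributor to the memory error is the order of operations: .toarray() builds a dense array with every feature column, and only afterwards are the first 2000 columns selected. Because basic NumPy slicing returns a view, X_train also keeps that full dense array alive when X_test is built. A minimal sketch of the alternative, slicing while the matrix is still sparse, is shown below. It uses a small random matrix in place of the real data, and the helper name shuffled_dense_block and the demo sizes are made up for illustration.

```python
import numpy as np
from scipy import sparse


def shuffled_dense_block(X, perm, n_cols):
    # Work in CSR format, which supports row indexing with an index array.
    X = sparse.csr_matrix(X)
    # Select rows and columns while still sparse, then densify only the
    # small (len(perm) x n_cols) block instead of the full matrix.
    return X[perm][:, :n_cols].toarray()


# Small stand-in for the matrices returned by load_svmlight_files.
rng = np.random.RandomState(0)
X_demo = sparse.random(50, 400, density=0.05, format="csr", random_state=rng)
perm_demo = rng.permutation(50)

small = shuffled_dense_block(X_demo, perm_demo, 20)
```

Applied to the question's data, the same pattern would read X_tr[perm1][:, :2000].toarray(), which gives the same values as the original expression while allocating only the 2000-column dense block.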