Question
Following my question here: http://stackoverflow.com/questions/37844596/avoid-memory-error-when-dealing-with-large-arrays, I was able to deal with the MemoryError from the array operations by splitting them into several lines; thanks to everyone who responded. The problem now is that a MemoryError is thrown when fitting the data with scikit-learn, e.g. when calling km.fit(arr_3d[i]) in the code below.
The array is 3D and I'm looping through it one 2D slice at a time, so why am I getting this error, and how can I fix it? Note that it doesn't happen every time; sometimes the code runs with no error at all, and I'm not sure why that is either.
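For context, here is a rough back-of-the-envelope estimate of what a single one of these matrices costs (the 5362 comes from len(my_list) in the code below; which temporaries exist at the same time is my guess, not something I have measured):

import numpy as np

k = 5362  # len(my_list), see the code below
bytes_per_value = np.dtype(np.float64).itemsize  # 8 bytes

print(k * k * bytes_per_value / 2**20)  # ~219 MiB for one dense k x k array

# product, (1 - product), and whatever fit() allocates internally can
# each hold a copy of this size, so peak usage climbs towards ~1 GB.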
The whole code is:
import csv
import io

import numpy as np
import pandas as pd
from django.conf import settings
from sklearn.cluster import AgglomerativeClustering, KMeans, MeanShift
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import Normalizer
# UploadForm and number_cluster come from elsewhere in this project


def home(request):
    if request.method == "POST":
        img = UploadForm(request.POST, request.FILES)
        no_clus = int(request.POST.get('num_clusters', 10))
        if img.is_valid():
            paramFile = io.TextIOWrapper(request.FILES['File'].file)
            portfolio1 = csv.DictReader(paramFile)
            my_list = [row["BASE_NAME"] for row in portfolio1]

            vectorizer = CountVectorizer()
            dtm = vectorizer.fit_transform(my_list)
            lsa = TruncatedSVD(n_components=100)
            dtm_lsa = lsa.fit_transform(dtm)
            dtm_lsa = Normalizer(copy=False).fit_transform(dtm_lsa)

            product = np.dot(dtm_lsa, dtm_lsa.T)  # see the chunked sketch after this listing
            dist1 = 1 - product
            k = len(my_list)  # length is 5362
            data2 = np.asarray(dist1)
            arr_3d = data2.reshape((1, k, k))
            print(arr_3d)  # shown below
            print(len(arr_3d))

            no_cluster = number_cluster(request, len(my_list))
            print(no_cluster)

            for i in range(len(arr_3d)):
                km = AgglomerativeClustering(n_clusters=no_cluster, linkage='complete')
                km = km.fit(arr_3d[i])  # <-- MemoryError is raised here (sometimes)
                # Alternatives I also tried:
                # km = AgglomerativeClustering(n_clusters=no_clus, linkage='ward').fit(arr_3d[i])
                # km = AgglomerativeClustering(n_clusters=no_cluster, linkage='average').fit(arr_3d[i])
                # km = KMeans(n_clusters=no_clus, init='k-means++').fit(arr_3d[i])
                # km = MeanShift().fit(arr_3d[i])
                labels = km.labels_

                csvfile = settings.MEDIA_ROOT + '\\' + 'images\\export.csv'
                csv_input = pd.read_csv(csvfile, encoding='latin-1')
                csv_input['cluster_ID'] = labels
                csv_input['BASE_NAME'] = my_list
                csv_input.to_csv(settings.MEDIA_ROOT + '/' + 'output.csv', index=False)
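As an aside, the np.dot(dtm_lsa, dtm_lsa.T) line above is the step I previously had to split up. A minimal sketch of the same idea done in row blocks instead (the block size of 500 is arbitrary, and it assumes dtm_lsa itself fits in memory comfortably):

import numpy as np

def similarity_in_blocks(X, block=500):
    # Fill a preallocated k x k array block by block, avoiding the
    # extra temporaries that one large expression can create.
    k = X.shape[0]
    out = np.empty((k, k), dtype=X.dtype)
    for start in range(0, k, block):
        stop = min(start + block, k)
        out[start:stop] = X[start:stop].dot(X.T)
    return out

product = similarity_in_blocks(dtm_lsa)
np.subtract(1, product, out=product)  # 1 - product, computed in place
dist1 = product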
The arr_3d printed above is:
[[[ 0.00000000e+00 9.87752905e-01 1.00070800e+00 ..., 8.93937985e-01
1.00352321e+00 1.00481892e+00]
[ 9.87752905e-01 -2.22044605e-16 1.00107768e+00 ..., 9.80156085e-01
1.00047940e+00 1.00059883e+00]
[ 1.00070800e+00 1.00107768e+00 -6.66133815e-16 ..., 9.97548342e-01
9.99890765e-01 1.00143594e+00]
...,
[ 8.93937985e-01 9.80156085e-01 9.97548342e-01 ..., -2.22044605e-16
2.34431311e-01 9.87267801e-01]
[ 1.00352321e+00 1.00047940e+00 9.99890765e-01 ..., 2.34431311e-01
-2.22044605e-16 1.00152421e+00]
[ 1.00481892e+00 1.00059883e+00 1.00143594e+00 ..., 9.87267801e-01
1.00152421e+00 3.33066907e-16]]]
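Since arr_3d[0] is already a distance matrix (the diagonal is zero up to floating-point noise, as shown above), one workaround I'm considering is telling AgglomerativeClustering that the distances are precomputed, so that fit() doesn't build a second 5362 x 5362 pairwise matrix internally. A sketch of what I mean (linkage='ward' does not accept precomputed distances, so I'd keep 'complete' or 'average'):

import numpy as np
from sklearn.cluster import AgglomerativeClustering

dist = np.clip(arr_3d[0], 0, None)  # clamp the tiny negative round-off values
dist = (dist + dist.T) / 2          # enforce exact symmetry

km = AgglomerativeClustering(n_clusters=no_cluster,
                             affinity='precomputed',  # renamed to `metric` in newer scikit-learn
                             linkage='complete')
labels = km.fit(dist).labels_

Would that actually lower the peak memory here, or is the problem somewhere else?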
Source: https://stackoverflow.com/questions/37890921/memory-error-when-fitting-the-data-using-sklearn-package