Based on this comment and the referenced documentation, pickle protocol 4 (available since Python 3.4) should be able to pickle byte objects larger than 4 GB.
However, even on Python 3 you still have to request that protocol explicitly on older versions, since it was not the default until Python 3.8.
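For example, a minimal sketch (the file path is a placeholder; building the 5 GiB test object needs that much RAM):

import pickle

# A bytes object just over the old 4 GiB limit.
big_object = bytes(5 * 2**30)

# Protocol 4 supports objects larger than 4 GiB; it is the
# default protocol only from Python 3.8 onward, so pass it explicitly.
with open("data.pkl", "wb") as f:
    pickle.dump(big_object, f, protocol=4)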
I also ran into this issue. To solve it, I split the work into several chunks. Say I have 50,000 documents for which I have to compute TF-IDF and do KNN classification. When I iterate over all 50,000 at once, it gives me that error, so I process the documents in chunks instead:
tokenized_documents = self.load_tokenized_preprocessing_documents()
idf = self.load_idf_41227()
doc_length = len(documents)  # `documents` is loaded elsewhere in the class
chunk_size = 4000  # small enough that each chunk pickles without error
num_chunks = (doc_length + chunk_size - 1) // chunk_size

for iteration in range(num_chunks):
    tfidf_documents = []
    # only process the documents that belong to the current chunk
    start = iteration * chunk_size
    end = min(start + chunk_size, doc_length)
    for index in range(start, end):
        doc_tfidf = []
        for term in idf.keys():
            tf = self.term_frequency(term, tokenized_documents[index])
            doc_tfidf.append(tf * idf[term])
        doc = documents[index]
        tfidf = [doc_tfidf, doc[0], doc[1]]
        tfidf_documents.append(tfidf)
        print("{} from {} document {}".format(index, doc_length, doc[0]))
    # save each chunk separately so no single pickle gets too large
    self.save_tfidf_41227(tfidf_documents, iteration)
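The post doesn't show save_tfidf_41227, but the idea is that each chunk goes into its own pickle file, so no single dump approaches the size limit. A minimal sketch of what such a helper could look like (the filename scheme is an assumption):

import pickle

# Hypothetical sketch of the save helper: one pickle file per chunk.
def save_tfidf_41227(tfidf_documents, iteration):
    filename = "tfidf_41227_chunk_{}.pkl".format(iteration)  # assumed naming
    with open(filename, "wb") as f:
        pickle.dump(tfidf_documents, f, protocol=4)

Loading the results back is then just a matter of iterating over the chunk files in order and concatenating them.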