pickle | 易学教程

How do I pickle the scrape data instead of printing the data?

Question: When I try to pickle the data I get a syntax error.

    File "C:\Users\Jeanne\Desktop\PYPDIT\untitled3.py", line 33
        !mkdir transcripts
        ^
    SyntaxError: invalid syntax

    import requests
    from bs4 import BeautifulSoup
    import pickle

    urls = ['http://feeds.nos.nl/nosnieuwstech', 'http://feeds.nos.nl/nosnieuwsalgemeen']

    with requests.Session() as s:
        for url in urls:
            page = s.get(url).text
            soup = BeautifulSoup(page, "lxml")
            print(url)
            print([[i.text for i in desc.select('p')] for desc in soup.select(
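The `!mkdir transcripts` line in the traceback is IPython/Jupyter shell syntax, which a plain `.py` script cannot parse. A minimal sketch of the usual replacement follows: create the directory with `os.makedirs` and write the scraped structure with `pickle.dump`. The helper names `save_scraped` and `load_scraped`, the file name `scraped.pkl`, and the sample data are illustrative assumptions, not part of the original question.

```python
import os
import pickle
import tempfile


def save_scraped(data, directory, filename="scraped.pkl"):
    """Create the directory if needed and pickle `data` into it."""
    os.makedirs(directory, exist_ok=True)  # replaces the `!mkdir` shell magic
    path = os.path.join(directory, filename)
    with open(path, "wb") as f:  # pickle needs binary mode
        pickle.dump(data, f)
    return path


def load_scraped(path):
    """Read back whatever save_scraped wrote."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for the nested lists the scraper would otherwise print.
scraped = {
    "http://feeds.nos.nl/nosnieuwstech": [["first paragraph"], ["second paragraph"]],
}

out_dir = os.path.join(tempfile.mkdtemp(), "transcripts")
path = save_scraped(scraped, out_dir)
restored = load_scraped(path)
```

In the real script, the list comprehension that is currently printed would be collected into a dict keyed by URL and passed to the save helper once the loop finishes.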

Python 2.7 and 3.7.2 compatible django-redis serializer

Question: I'm trying to write a py2.7/py3.7-compatible django-redis serializer. I'm using django-redis==4.8.0 with django==1.11.22 and the PickleSerializer. I saw this issue https://github.com/niwinz/django-redis/pull/279 on django-redis and wrote a serializer similar to what's described in the thread. However, my object seems a little more complex? Not sure. My goal is to have 2 applications running at the same time, one with py2.7 and the other with py3.7. They have to be 100% compatible, and I'm
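A rough sketch of the general idea, under stated assumptions: always write with pickle protocol 2 (the highest protocol Python 2.7 can read) and, on Python 3, read with an explicit `encoding`. The class name `CompatPickleSerializer` is hypothetical, and a real django-redis serializer would subclass the library's serializer base class, which is omitted here so the example runs on its own. The choice of `latin1` is also an assumption; complex objects may need further handling.

```python
import pickle
import sys

# Protocol 2 is the newest pickle protocol that Python 2.7 understands.
PY2_COMPATIBLE_PROTOCOL = 2


class CompatPickleSerializer(object):
    """Hypothetical serializer exposing the dumps/loads pair a cache backend calls."""

    def dumps(self, value):
        return pickle.dumps(value, PY2_COMPATIBLE_PROTOCOL)

    def loads(self, value):
        if sys.version_info[0] >= 3:
            # Python 2 `str` objects arrive as 8-bit strings; latin1 maps every
            # byte to a code point, so decoding cannot fail.
            return pickle.loads(value, encoding="latin1")
        return pickle.loads(value)


serializer = CompatPickleSerializer()
payload = serializer.dumps({"user_id": 42, "tags": ["a", "b"]})
roundtrip = serializer.loads(payload)
```

Both applications would have to use the same serializer, since a value written with a higher protocol by the py3.7 process cannot be read by the py2.7 one.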

Loading XGBoost Model: ModuleNotFoundError: No module named 'sklearn.preprocessing._label'

Question: I'm having issues loading a pretrained xgboost model using the following code:

    xgb_model = pickle.load(open('churnfinalunscaled.pickle.dat', 'rb'))

And when I do that, I get the following error:

    ModuleNotFoundError                       Traceback (most recent call last)
    <ipython-input-29-31e7f426e19e> in <module>()
    ----> 1 xgb_model = pickle.load(open('churnfinalunscaled.pickle.dat', 'rb'))

    ModuleNotFoundError: No module named 'sklearn.preprocessing._label'

I haven't seen anything online so any help would be much
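The excerpt does not state library versions, so the following is only a likely explanation: pickle stores a class as a module path plus a name, and unpickling re-imports that module. If the loading environment has a scikit-learn release whose internal layout lacks `sklearn.preprocessing._label`, the import fails as shown. The sketch below reproduces the mechanism with a made-up module named `fake_pkg_label`; nothing in it touches scikit-learn or XGBoost.

```python
import pickle
import sys
import types

# Build a throwaway module so a class can "live" at a known import path.
module = types.ModuleType("fake_pkg_label")


class Encoder(object):
    pass


# Make pickle record the class as fake_pkg_label.Encoder.
Encoder.__module__ = "fake_pkg_label"
Encoder.__qualname__ = "Encoder"
module.Encoder = Encoder
sys.modules["fake_pkg_label"] = module

payload = pickle.dumps(Encoder())

# Simulate loading in an environment where that module path does not exist.
del sys.modules["fake_pkg_label"]

try:
    pickle.loads(payload)
    error_name = None
except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
    error_name = type(exc).__name__
```

If this is the cause, installing the scikit-learn version that was used when the model was pickled is the usual remedy.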

Convert numpy array type and values from Float64 to Float32

Question: I am trying to convert the threshold array (from a pickle file of an isolation forest from scikit-learn) from Float64 to Float32:

    for i in range(len(tree.tree_.threshold)):
        tree.tree_.threshold[i] = tree.tree_.threshold[i].astype(np.float32)

Then printing it:

    for value in tree.tree_.threshold[:5]:
        print(type(value))
        print(value)

The output I am getting is:

    <class 'numpy.float64'>
    526226.0
    <class 'numpy.float64'>
    91.9514312744
    <class 'numpy.float64'>
    3.60330319405
    <class 'numpy.float64'>
    -2.0
    <class
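The behaviour in the question follows from how NumPy stores data: the dtype belongs to the array, so a float32 value assigned into a float64 array is converted back to float64 on storage. A short sketch with a stand-in array (the variable names are illustrative, and whether scikit-learn allows replacing `tree_.threshold` is not addressed here):

```python
import numpy as np

# Stand-in for tree.tree_.threshold, using the values from the question.
threshold = np.array([526226.0, 91.9514312744, 3.60330319405, -2.0],
                     dtype=np.float64)

# Element-wise assignment: each float32 is cast back to the array's dtype.
for i in range(len(threshold)):
    threshold[i] = threshold[i].astype(np.float32)
dtype_after_loop = threshold.dtype

# Converting the whole array creates a new array with the requested dtype.
threshold32 = threshold.astype(np.float32)
```

The loop does round each value to float32 precision, but the container remains float64; only `astype` on the whole array yields a float32 array.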

Common Built-in Modules (1): time, os, sys, random, shutil, pickle, json

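Common built-in modules

1. The time module

In Python, time comes in three forms:
1. Timestamp: the number of seconds since January 1, 1970, mainly used to compute the difference between two times.
2. localtime: local time, that is, the time in the time zone where the computer is located.
3. UTC: Coordinated Universal Time.

    import time

    # timestamp, in seconds
    print(time.time())
    # localtime as a struct_time
    print(time.localtime())
    # UTC time
    print(time.gmtime())
    # formatted time
    print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime()))
    # timestamp -> struct_time
    print(time.localtime(time.time()))
    # struct_time -> timestamp
    print(time.mktime(time.localtime()))
    time.sleep(5)
    print('wake up')

2. The datetime module

1. A time-handling module implemented in Python; time is not very convenient to use, which is why datetime exists.
2. Advantage: datetime is more flexible than time.
3. timedelta represents a time difference. Two timedeltas can be added, subtracted, and divided by each other, and a timedelta can be multiplied or divided by a number. A timedelta and a datetime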
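The timedelta arithmetic described above can be sketched as follows; the dates are arbitrary sample values chosen for illustration.

```python
from datetime import datetime, timedelta

start = datetime(2020, 1, 1, 8, 0, 0)
end = datetime(2020, 1, 2, 20, 30, 0)

# Subtracting two datetimes yields a timedelta.
elapsed = end - start

# A timedelta can be divided by a number or by another timedelta.
half = elapsed / 2
hours = elapsed / timedelta(hours=1)

# Adding a timedelta to a datetime yields a new datetime.
one_week_later = start + timedelta(days=7)
```

Here `elapsed` is one day and 12.5 hours, so `hours` evaluates to 36.5.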

An Introduction to Machine Learning for Programmers (Part 4)

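This article focuses on some common techniques for training machine learning models with pytorch; mastering them lets you get more done with less effort. Most of the code builds on the last example from the previous article, which predicts a salary from a programmer's attributes 🙀, so read that article first if you have not seen it.

Saving and loading model state

In pytorch almost every operation revolves around tensor objects, and model parameters are tensors too. If we save trained tensors to disk, we can load them from disk next time and use them directly. Let's first look at how to save a single tensor. The following code runs in the python REPL:

    # import pytorch
    >>> import torch
    # create a tensor object
    >>> a = torch.tensor([1, 2, 3], dtype=torch.float)
    # save the tensor to the file 1.pt
    >>> torch.save(a, "1.pt")
    # load the tensor from the file 1.pt
    >>> b = torch.load("1.pt")
    >>> b
    tensor([1., 2., 3.])

When torch.save saves a tensor it uses python's pickle format, which is designed to stay compatible across python versions but does not compress its contents, so a very large tensor produces a file that takes up a lot of space. We can compress before saving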
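The compress-before-saving idea can be sketched with the standard library alone. The example below uses plain `pickle` and a list of floats as a stand-in for a large tensor, so it is not the article's own code; `torch.save` is documented to accept a file-like object, so a similar gzip wrapper may carry over, but that is an assumption left untested here.

```python
import gzip
import os
import pickle
import tempfile

# Stand-in for a large tensor: many repeated values compress well.
values = [0.0] * 100000

tmp_dir = tempfile.mkdtemp()
raw_path = os.path.join(tmp_dir, "values.pkl")
gz_path = os.path.join(tmp_dir, "values.pkl.gz")

# Uncompressed pickle.
with open(raw_path, "wb") as f:
    pickle.dump(values, f)

# Same pickle stream, written through a gzip file object.
with gzip.open(gz_path, "wb") as f:
    pickle.dump(values, f)

# Reading goes through gzip again, which decompresses transparently.
with gzip.open(gz_path, "rb") as f:
    restored = pickle.load(f)

raw_size = os.path.getsize(raw_path)
gz_size = os.path.getsize(gz_path)
```

How much space is saved depends on the data; highly repetitive values shrink a lot, while already random-looking weights shrink far less.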

Basic Git and GitHub Operations

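What is Git? Git is free, open-source version control software, and in the author's words the most advanced distributed version control system in the world today (bar none).

What is a version control system? Version control is a system that records changes to one or more files so that specific revisions can be looked up later. Its main features:
- record the complete change history of files
- restore any historical state at any time
- let multiple people develop or edit together
- recover from mistakes

Version control tools:
- svn: a centralized version control system
- git: a distributed version control system

Centralized vs distributed: CVS and SVN are both centralized version control systems, while Git is distributed. What is the difference? In a centralized system the repository lives on a central server, but the work happens on your own computer, so you first fetch the latest version from the central server, do your work, and then push the result back to the central server. The central server is like a library: to revise a book you must first borrow it, take it home, revise it, and then return it. The biggest drawback of a centralized system is that it only works with a network connection.

A distributed version control system has no "central server" at all. Every person's computer holds a complete repository, so you do not need a network connection to work, because the repository is on your own machine. If everyone has a complete repository, how do several people collaborate? Suppose you changed file A on your computer and a colleague also changed file A on theirs; then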

Python: Modifying the Keys and Values of Lists and Dictionaries

<h2 id="toc_0">list (modifying list elements by index)</h2>
<h3 id="toc_1">When looping over a list, it is best not to change the size of the original list, because doing so affects the final result.</h3>
<pre class="line-numbers"><code class="language-python"># Modify the list by iterating over its indices in reverse
print('First')
lis = [11, 22, 33, 44, 55]
print(lis)
for num in range(len(lis)-1,-1,-1):
    if num % 2 != 0:
        lis.pop(num)
else:
    # for-else: runs once after the loop finishes
    print(lis)
</code></pre>
<pre class="line-numbers"><code class="language-python"># Modify the list using a slice step
print('Second')
lis = [11, 22, 33, 44, 55]
print(lis)
del lis[1::2]
print(lis)
</code></pre>
<pre class="line-numbers"><code class="language-python"># Modify by adding a new list
print('Third')
lis = [11, 22, 33, 44, 55]
print(lis)
new

[Python] The scikit-learn Machine Learning Library in Practice

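Original article: https://blog.csdn.net/zouxy09/article/details/48903179. It was written for Python 2, and some of the libraries it uses are no longer available in Python 3, so a few small changes were made:

1. Python 3 has no cPickle. Change

    import cPickle as pickle

to

    import pickle

2. In Python 3 the default encoding is UTF-8 and sys.setdefaultencoding no longer exists, so remove

    reload(sys)
    sys.setdefaultencoding('utf8')

3. Change

    train, val, test = pickle.load(f)

to

    train, val, test = pickle.load(f, encoding='bytes')

Otherwise this error appears:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)

Installation

1. Overview

Fueled by the big-data boom of recent years, machine learning algorithms have become "well known": even without understanding the theory behind them, you can proudly rattle off the names of one or two famous algorithms when asked. Of course, vast as the forest of algorithms is, the capable ones are still limited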
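The first and third changes above can be illustrated without the original dataset. In the sketch below, `py2_payload` is a hand-written protocol-0 pickle of the kind Python 2 produces for an 8-bit `str` containing the byte 0x90; it stands in for the real file and is an assumption made for illustration.

```python
# Works on both interpreters: Python 2 has cPickle, Python 3 only has pickle.
try:
    import cPickle as pickle
except ImportError:
    import pickle

# Hand-written stand-in for data pickled by Python 2 (a str holding byte 0x90).
py2_payload = b"S'\\x90abc'\np0\n."

# Default loading on Python 3 decodes such strings as ASCII and fails.
try:
    pickle.loads(py2_payload)
    default_error = None
except UnicodeDecodeError as exc:
    default_error = exc

# encoding='bytes' keeps Python 2 strings as bytes objects.
as_bytes = pickle.loads(py2_payload, encoding="bytes")

# encoding='latin1' decodes them to str, one code point per byte.
as_text = pickle.loads(py2_payload, encoding="latin1")
```

With `encoding='bytes'`, dictionary keys that were Python 2 strings also come back as bytes, so later lookups need `b'key'` rather than `'key'`.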
