How do I pickle the scraped data instead of printing it?

Submitted by 痴心易碎 on 2020-05-17 06:04:50

Question


When I try to pickle the data I get a syntax error.

 File "C:\Users\Jeanne\Desktop\PYPDIT\untitled3.py", line 33
    !mkdir transcripts
    ^
SyntaxError: invalid syntax

import requests
from bs4 import BeautifulSoup
import pickle

urls = ['http://feeds.nos.nl/nosnieuwstech',
        'http://feeds.nos.nl/nosnieuwsalgemeen']

with requests.Session() as s:
    for url in urls:
        page = s.get(url).text
        soup = BeautifulSoup(page, "lxml")
        print(url)
        print([[i.text for i in desc.select('p')] for desc in soup.select('description')[1:]])
        print('--'*100)

Now that I can scrape the text, my next step is to save the transcript into a separate file.

I also want to order the text by place (city of origin).
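
One possible way to order the text by city is sketched below. It assumes that "city of origin" can be approximated by checking whether a city name appears in a paragraph, which the question does not confirm. The function name group_by_city and the sample sentences are hypothetical stand-ins for the scraped feed paragraphs.

```python
def group_by_city(texts, cities):
    """Map each city to the texts that mention it (case-insensitive substring match)."""
    grouped = {city: [] for city in cities}
    for text in texts:
        lowered = text.lower()
        for city in cities:
            # Assumption: a text "belongs" to a city if the city name occurs in it.
            if city.lower() in lowered:
                grouped[city].append(text)
    return grouped


cities = ['Amsterdam', 'Eindhoven', 'Nijmegen', 'Rotterdam', 'Veenendaal']

# Hypothetical sample texts; in the real script these would come from the feeds.
sample_texts = [
    'Storing bij datacenter in Amsterdam verholpen.',
    'Nieuwe chipfabriek in Eindhoven geopend.',
    'Trein tussen Rotterdam en Amsterdam rijdt weer.',
]

grouped = group_by_city(sample_texts, cities)
```

A text that mentions several cities lands under each of them, and a city with no matches keeps an empty list, so every city still gets an entry.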

cities = ['Amsterdam', 'Eindhoven', 'Nijmegen', 'Rotterdam', 'Veenendaal']

# Pickle files for later use

!mkdir transcripts

for i, c in enumerate(cities):
    with open("transcripts/" + c + ".txt", "wb") as file:
        pickle.dump(transcripts[i], file)
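
The SyntaxError comes from the line `!mkdir transcripts`: the leading `!` is an IPython/Jupyter shell escape and is not valid Python in a plain .py script. A minimal sketch of one way around it follows, using os.makedirs from the standard library. It assumes the scraped data has already been collected into a dict mapping each city to a list of texts; the question never defines `transcripts`, so that structure and the helper names save_transcripts and load_transcript are hypothetical. The .pkl extension is also a choice made here, since pickle output is binary rather than text.

```python
import os
import pickle


def save_transcripts(transcripts, out_dir="transcripts"):
    """Pickle each city's texts to <out_dir>/<city>.pkl and return the paths written."""
    # Replaces the `!mkdir transcripts` shell escape; exist_ok=True means
    # re-running the script does not fail when the directory already exists.
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for city, texts in transcripts.items():
        path = os.path.join(out_dir, city + ".pkl")
        with open(path, "wb") as f:  # pickle needs binary mode
            pickle.dump(texts, f)
        paths.append(path)
    return paths


def load_transcript(city, out_dir="transcripts"):
    """Read back the pickled texts for one city."""
    with open(os.path.join(out_dir, city + ".pkl"), "rb") as f:
        return pickle.load(f)
```

Calling save_transcripts({"Amsterdam": ["..."]}) would then write transcripts/Amsterdam.pkl, and load_transcript("Amsterdam") would return the same list.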

Source: https://stackoverflow.com/questions/61558832/how-do-i-pickle-the-scrape-data-instead-of-printing-the-data
