Word count from a web text document results in 0

Posted by 邮差的信 on 2021-02-10 18:14:13

Question


I tried the Python code from Rasha Ashraf's article "Scraping EDGAR with Python". The article uses urllib2, which I believe is no longer available in Python 3, so I changed it to urllib.

I can fetch the EDGAR page below. However, the word counts come out as 0 no matter how I change the code. Please help me fix this. FYI, I checked the page at the URL manually: "ADDRESS", "TYPE", and "transaction" occur 5, 9, and 49 times, respectively. Nevertheless, my faulty Python script reports 0 for all three words.

Here is Rasha Ashraf's Python code as amended by me (only the urllib part and the URL). The original URL points to a very large text document, so I replaced it with a simpler page.

import time
import csv
import sys

CIK = '0001018724'
Year= '2013'
string_match1= 'edgar/data/1018724/000112760220028651/0001127602-20-028651.txt'
url3= 'http://www.sec.gov/Archives/'+string_match1

import urllib.request
 
response3= urllib.request.urlopen(url3)
#output = response3.read()
#print(output)
words=  ['ADDRESS','TYPE', 'transaction']
count= {}
for elem in words:
    count[elem]= 0
    
for line in response3:
    elements= line.split()
    for word in words:
       count[word]= count[word] + elements.count(word)

print (CIK)
print (Year)
print (url3)
print (count)

=> The output of my code so far:

0001018724

2013

http://www.sec.gov/Archives/edgar/data/1018724/000112760220028651/0001127602-20-028651.txt

{'ADDRESS': 0, 'TYPE': 0, 'transaction': 0}

Answer 1:


To get the correct number of times each of your 3 strings (not words!) appears in the filing, try something like this:

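import requests

# Download the filing; req.text is the response body decoded to str.
url = "http://www.sec.gov/Archives/edgar/data/1018724/000112760220028651/0001127602-20-028651.txt"
req = requests.get(url)

# Lowercase both the search terms and the text so matching ignores case.
words = ['address','type','transaction']
filing = req.text.lower()
for word in words:
    # str.count counts substring occurrences, not whole words.
    print(word,': ',filing.count(word))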

Output:

address :  5
type :  9
transaction :  49
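
A likely reason the original loop reports 0: in Python 3, iterating over the object returned by urllib.request.urlopen yields bytes, so line.split() produces bytes tokens, and counting a str such as 'ADDRESS' in that list never matches. The sketch below keeps the line-by-line approach and decodes each line first. The function name count_tokens, the UTF-8 encoding, and the sample data are assumptions made for illustration; io.BytesIO stands in for the HTTP response so the example runs offline.

```python
import io


def count_tokens(lines, words):
    """Count exact whitespace-separated tokens in an iterable of byte lines."""
    counts = {word: 0 for word in words}
    for raw_line in lines:
        # Decode bytes to str before comparing with the str search terms.
        # UTF-8 is an assumption; undecodable bytes are replaced, not fatal.
        tokens = raw_line.decode("utf-8", errors="replace").split()
        for word in words:
            counts[word] += tokens.count(word)
    return counts


# The original bug in miniature: a bytes token never equals a str.
bug_demo = [b"ADDRESS"].count("ADDRESS")

# Hypothetical sample data standing in for the urlopen() response.
sample = io.BytesIO(b"MAIL ADDRESS here\nFORM TYPE 4\none transaction\n")
result = count_tokens(sample, ["ADDRESS", "TYPE", "transaction"])
print(bug_demo, result)
```

Even with decoding, this counts only exact, case-sensitive tokens, so its totals may differ from the substring counts above whenever a term is attached to punctuation or markup.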


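The "strings (not words!)" remark in the answer matters because str.count also matches a term inside longer tokens. As a rough illustration, the sketch below compares a case-insensitive substring count with a whole-word count based on the regex word boundary \b. The helper names and the sample text are hypothetical and are not taken from the filing.

```python
import re


def substring_count(text, term):
    """Case-insensitive count of every occurrence, including inside longer words."""
    return text.lower().count(term.lower())


def whole_word_count(text, term):
    """Case-insensitive count of the term only where it stands as a whole word."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Hypothetical sample text, chosen to show where the two counts diverge.
sample = "<TYPE>4 typed Transaction transactions"

for term in ["type", "transaction"]:
    print(term, substring_count(sample, term), whole_word_count(sample, term))
```

Which of the two is appropriate depends on what "occurs N times" is meant to measure. The manual counts in the question agree with the substring approach used in the answer.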
Source: https://stackoverflow.com/questions/64812162/word-count-from-web-text-document-result-in-0
