Question
I'm trying to read the source code of a website 100 lines at a time.
For example:
self.code = urllib.request.urlopen(uri)
#Get 100 first lines
self.lines = self.getLines()
...
#Get 100 next lines
self.lines = self.getLines()
My getLines code is like this:
def getLines(self):
    res = []
    i = 0
    while i < 100:
        res.append(str(self.code.readline()))
        i += 1
    return res
But the problem is that getLines() always returns the first 100 lines of the code.
I've seen some solutions using next(), or tell() and seek(), but it seems that those functions are not implemented in the HTTPResponse class.
Answer 1:
According to the documentation, urllib.request.urlopen(uri) returns a file-like object, so you should be able to do:
from itertools import islice

def getLines(self):
    res = []
    # islice pulls at most 100 lines from the response and leaves the rest unread
    for line in islice(self.code, 100):
        res.append(line)
    return res
There's more information on islice in the itertools documentation. Using iterators avoids the while loop and the manual counter.
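As a minimal, network-free sketch of why this works: islice consumes lines from the underlying iterator, so each call should continue where the previous one stopped. Here io.BytesIO stands in for the object returned by urlopen (an assumption for illustration; both are binary file-like objects that can be iterated line by line), and get_lines is a hypothetical helper name.

```python
import io
from itertools import islice


def get_lines(stream, count=100):
    """Return up to `count` lines from `stream`, advancing its read position."""
    return list(islice(stream, count))


# Stand-in for urllib.request.urlopen(uri) (assumption): 250 numbered byte lines.
fake_response = io.BytesIO(b"".join(b"line %d\n" % n for n in range(250)))

first = get_lines(fake_response)   # lines 0 to 99
second = get_lines(fake_response)  # lines 100 to 199
third = get_lines(fake_response)   # only 50 lines remain
```

When fewer than 100 lines remain, islice simply stops early, so the last chunk is shorter rather than padded with empty values.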
If you absolutely must use readline(), it's advisable to use a for loop, for example (range in Python 3; xrange exists only in Python 2):
for i in range(100):
    ...
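One possible way to fill in such a loop is sketched below, assuming Python 3. It stops early at the end of the stream, because readline() returns an empty bytes object once nothing is left, and it decodes each line instead of calling str() on it, since str() on bytes yields the "b'...'" representation rather than the text. The get_lines name, the UTF-8 default, and the io.BytesIO stand-in for the urlopen response are assumptions for illustration.

```python
import io


def get_lines(stream, count=100, encoding="utf-8"):
    """Read up to `count` lines with readline(), stopping at end of stream."""
    lines = []
    for _ in range(count):
        raw = stream.readline()
        if not raw:  # readline() returns b"" once the stream is exhausted
            break
        lines.append(raw.decode(encoding))
    return lines


# Stand-in for urllib.request.urlopen(uri) (assumption): 130 numbered byte lines.
fake_response = io.BytesIO(b"".join(b"row %d\n" % n for n in range(130)))

first = get_lines(fake_response)   # rows 0 to 99
second = get_lines(fake_response)  # rows 100 to 129
```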
Answer 2:
This worked for me (Python 2):
#!/usr/bin/env python
import urllib

def getLines(code):
    res = []
    i = 0
    while i < 100:
        res.append(str(code.readline()))
        i += 1
    return res

uri = 'http://www.google.com/'
code = urllib.urlopen(uri)
# Get the first 100 lines
lines = getLines(code)
print lines
# Get the next 100 lines
lines = getLines(code)
print lines
Source: https://stackoverflow.com/questions/10249673/python-read-lines-of-website-source-code-100-lines-at-a-time