Question
I have the following output after scraping a web page:
text
Out[50]:
['\nAbsolute FreeBSD, 2nd Edition\n',
'\nAbsolute OpenBSD, 2nd Edition\n',
'\nAndroid Security Internals\n',
'\nApple Confidential 2.0\n',
'\nArduino Playground\n',
'\nArduino Project Handbook\n',
'\nArduino Workshop\n',
'\nArt of Assembly Language, 2nd Edition\n',
'\nArt of Debugging\n',
'\nArt of Interactive Design\n',]
I need to strip the \n characters from the items in the list above while iterating over it. Here is my code:
text = []
for name in web_text:
    a = name.get_text()
    text.append(a)
Answer 1:
Rather than calling .strip() explicitly, use the strip argument:
a = name.get_text(strip=True)
This also strips leading and trailing whitespace and newline characters from the text of each child element, if there are any.
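To illustrate the difference, here is a minimal sketch. The HTML snippet is made up for the example (the real page markup is not shown in the question), and it assumes the bs4 package is installed. One thing to watch for: with strip=True the stripped pieces are joined with an empty separator by default, so text split across nested tags can run together unless you pass a separator.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the scraped page.
html = (
    "<ul>"
    "<li>\n<a href='#'>Absolute FreeBSD, 2nd Edition</a>\n</li>"
    "<li>\n<a href='#'>Arduino <b> Workshop </b></a>\n</li>"
    "</ul>"
)

soup = BeautifulSoup(html, "html.parser")
web_text = soup.find_all("li")

# Default get_text() keeps the surrounding newlines.
raw = [name.get_text() for name in web_text]

# strip=True strips every text piece and drops the empty ones.
stripped = [name.get_text(strip=True) for name in web_text]

# Passing a separator keeps words from nested tags apart.
spaced = [name.get_text(" ", strip=True) for name in web_text]

print(raw[0] == "\nAbsolute FreeBSD, 2nd Edition\n")
print(stripped)
print(spaced)
```

If your titles never contain nested tags, the plain strip=True call is enough. The separator only matters when one title is made up of several text nodes.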
Answer 2:
Just like you would strip any other string:
text = []
for name in web_text:
    a = name.get_text().strip()
    text.append(a)
Answer 3:
You can use a list comprehension:
stripedText = [ t.strip() for t in text ]
Which outputs:
>>> stripedText
['Absolute FreeBSD, 2nd Edition', 'Absolute OpenBSD, 2nd Edition', 'Android Security Internals', 'Apple Confidential 2.0', 'Arduino Playground', 'Arduino Project Handbook', 'Arduino Workshop', 'Art of Assembly Language, 2nd Edition', 'Art of Debugging', 'Art of Interactive Design']
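Note that strip() with no argument removes all leading and trailing whitespace, not just newlines. If you want to remove only the \n characters and keep any other padding, you can pass them explicitly. A small sketch with made-up sample strings:

```python
# Sample data: the second title has extra spaces inside the newlines.
titles = ['\nAbsolute FreeBSD, 2nd Edition\n', '\n  Arduino Workshop \n']

# Removes every kind of leading/trailing whitespace.
all_whitespace = [t.strip() for t in titles]

# Removes only leading/trailing newline characters.
only_newlines = [t.strip('\n') for t in titles]

print(all_whitespace)
print(only_newlines)
```

For the list in the question both variants give the same result, since the titles are padded with newlines only.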
Answer 4:
If you already have a plain list of strings, you can strip the \n from each item while iterating over it like this:
>>> web_text = ['\nAbsolute FreeBSD, 2nd Edition\n',
... '\nAbsolute OpenBSD, 2nd Edition\n',
... '\nAndroid Security Internals\n',
... '\nApple Confidential 2.0\n',
... '\nArduino Playground\n',
... '\nArduino Project Handbook\n',
... '\nArduino Workshop\n',
... '\nArt of Assembly Language, 2nd Edition\n',
... '\nArt of Debugging\n',
... '\nArt of Interactive Design\n',]
>>> text = []
>>> for line in web_text:
...     a = line.strip()
...     text.append(a)
...
>>> text
['Absolute FreeBSD, 2nd Edition', 'Absolute OpenBSD, 2nd Edition', 'Android Security Internals', 'Apple Confidential 2.0', 'Arduino Playground', 'Arduino Project Handbook', 'Arduino Workshop', 'Art of Assembly Language, 2nd Edition', 'Art of Debugging', 'Art of Interactive Design']
Source: https://stackoverflow.com/questions/39870290/how-to-strip-line-breaks-from-beautifulsoup-get-text-method