How to strip line breaks from BeautifulSoup get text method

浪子不回头ぞ submitted on 2021-02-08 06:19:15

Question


I have the following output after scraping a web page:

       text
Out[50]: 
['\nAbsolute FreeBSD, 2nd Edition\n',
'\nAbsolute OpenBSD, 2nd Edition\n',
'\nAndroid Security Internals\n',
'\nApple Confidential 2.0\n',
'\nArduino Playground\n',
'\nArduino Project Handbook\n',
'\nArduino Workshop\n',
'\nArt of Assembly Language, 2nd Edition\n',
'\nArt of Debugging\n',
'\nArt of Interactive Design\n',]

I need to strip \n from each item in the list above while iterating over it. Here is my code:

text = []
for name in web_text:
   a = name.get_text()
   text.append(a)

Answer 1:


Rather than calling .strip() explicitly, use the strip argument:

a = name.get_text(strip=True)

This strips leading and trailing whitespace, including newlines, from every text fragment inside the element, child elements included, and drops fragments that are empty after stripping.
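A minimal sketch of how the strip argument behaves, assuming bs4 is installed and using the built-in html.parser. The markup here is made up for illustration and is not the page from the question. It also shows one thing to watch for: with nested tags the stripped fragments are joined with no separator unless you pass one.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the scraped page.
html = (
    "<ul>"
    "<li>\nAbsolute FreeBSD, 2nd Edition\n</li>"
    "<li>\nArduino Workshop\n</li>"
    "</ul>"
    "<p>\n<b> Art of </b>\n<i> Debugging </i>\n</p>"
)
soup = BeautifulSoup(html, "html.parser")

# Same loop as in the question, written as a list comprehension.
web_text = soup.find_all("li")
text = [name.get_text(strip=True) for name in web_text]
print(text)

# With nested tags, each fragment is stripped and then concatenated.
nested = soup.p
joined = nested.get_text(strip=True)       # no separator between fragments
spaced = nested.get_text(" ", strip=True)  # a single space between fragments
print(repr(joined))
print(repr(spaced))
```

If the elements contain several child tags, passing a separator as the first argument is usually what you want.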




Answer 2:


Just like you would strip any other string:

text = []
for name in web_text:
   a = name.get_text().strip()
   text.append(a)
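For reference, a short sketch of what str.strip() does and does not remove. The sample string is invented for illustration. Calling strip() with no argument trims all leading and trailing whitespace, not only \n, and it never touches newlines in the middle of the string.

```python
# Invented sample with whitespace at both ends and a newline in the middle.
raw = "\n  Art of\nDebugging \n"

both_ends = raw.strip()            # trims every kind of whitespace at both ends
newlines_only = raw.strip("\n")    # trims only newline characters at both ends
collapsed = " ".join(raw.split())  # also collapses inner whitespace runs

print(repr(both_ends))
print(repr(newlines_only))
print(repr(collapsed))
```

The last form is handy when the scraped text has line breaks inside a title rather than only around it.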



Answer 3:


You can use a list comprehension:

stripped_text = [t.strip() for t in text]

Which outputs:

>>> stripped_text
['Absolute FreeBSD, 2nd Edition', 'Absolute OpenBSD, 2nd Edition', 'Android Security Internals', 'Apple Confidential 2.0', 'Arduino Playground', 'Arduino Project Handbook', 'Arduino Workshop', 'Art of Assembly Language, 2nd Edition', 'Art of Debugging', 'Art of Interactive Design']



Answer 4:


The following strips \n from each item of the list while iterating over it.

>>> web_text = ['\nAbsolute FreeBSD, 2nd Edition\n',
... '\nAbsolute OpenBSD, 2nd Edition\n',
... '\nAndroid Security Internals\n',
... '\nApple Confidential 2.0\n',
... '\nArduino Playground\n',
... '\nArduino Project Handbook\n',
... '\nArduino Workshop\n',
... '\nArt of Assembly Language, 2nd Edition\n',
... '\nArt of Debugging\n',
... '\nArt of Interactive Design\n',]

>>> text = []
>>> for line in web_text:
...     a = line.strip()
...     text.append(a)
...
>>> text
['Absolute FreeBSD, 2nd Edition', 'Absolute OpenBSD, 2nd Edition', 'Android Security Internals', 'Apple Confidential 2.0', 'Arduino Playground', 'Arduino Project Handbook', 'Arduino Workshop', 'Art of Assembly Language, 2nd Edition', 'Art of Debugging', 'Art of Interactive Design']


Source: https://stackoverflow.com/questions/39870290/how-to-strip-line-breaks-from-beautifulsoup-get-text-method
