How to prettify HTML so tag attributes will remain in one single line?

Submitted by 强颜欢笑 on 2019-12-05 01:27:17

BeautifulSoup preserves the newlines and runs of spaces that were inside the attribute values of the input HTML, so prettify() prints them unchanged.

One workaround is to iterate over each element's attributes and clean them up before prettifying: replace newlines with spaces and collapse runs of consecutive spaces into a single space:

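# soup is the BeautifulSoup object parsed from the question's HTML
for tag in soup():  # soup() is shorthand for soup.find_all()
    for attr in tag.attrs:
        # Collapse newlines and runs of spaces; assumes every value is a string
        tag.attrs[attr] = " ".join(tag.attrs[attr].replace("\n", " ").split())

print(soup.prettify())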

Prints:

<html>
 <head>
 </head>
 <body>
  <h1 style="text-align: center;">
   Main site
  </h1>
  <div>
   <p style="color: blue; text-align: center;">
    text1
   </p>
   <p style="color: blueviolet; text-align: center;">
    text2
   </p>
  </div>
  <div>
   <p style="text-align:center">
    <img alt="Testing static images" src="./foo/test.jpg" style=""/>
   </p>
  </div>
 </body>
</html>
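
A standalone sketch of what the cleanup expression does: str.split() with no argument already splits on newlines, tabs and runs of spaces, so the extra .replace("\n", " ") is harmless but redundant. The helper name collapse_whitespace and the sample values are illustrative, not taken from the question. The last lines show why the loop above breaks on multi-valued attributes, which BeautifulSoup stores as lists rather than strings.

```python
def collapse_whitespace(value: str) -> str:
    """Replace every run of whitespace (spaces, newlines, tabs) with one space."""
    # str.split() with no argument splits on any whitespace run and drops
    # leading/trailing whitespace, so newlines need no separate .replace().
    return " ".join(value.split())


# Hypothetical attribute value spread over two lines, as in the question's input
style = "color: blue;\n        text-align: center;"
print(collapse_whitespace(style))  # color: blue; text-align: center;

# BeautifulSoup stores multi-valued attributes such as class as lists,
# and a list has no .split() (or .replace()), so string-only cleanup fails.
try:
    collapse_whitespace(["fill-vertically", "main"])
except AttributeError as exc:
    print("list value:", exc)  # list value: 'list' object has no attribute 'split'
```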

Update (to handle multi-valued attributes such as class):

Only a small change is needed: add special handling for the case where an attribute value is a list:

for tag in soup():
    tag.attrs = {
        # Lists (e.g. class) get each item cleaned; plain strings are cleaned directly
        attr: [" ".join(attr_value.replace("\n", " ").split()) for attr_value in value]
              if isinstance(value, list)
              else " ".join(value.replace("\n", " ").split())
        for attr, value in tag.attrs.items()
    }
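
The same logic can be pulled out into a small function that takes a tag's attrs dict and returns a cleaned copy, which makes it easy to test without parsing any HTML. This is a sketch: normalize_attrs is a hypothetical helper name and the sample dict is made up, shaped like what BeautifulSoup produces (class as a list, style as a string).

```python
def normalize_attrs(attrs: dict) -> dict:
    """Return a copy of a tag's attrs with whitespace collapsed in every value.

    Multi-valued attributes (e.g. class) are lists of strings in BeautifulSoup;
    everything else is a plain string, so both shapes are handled.
    """
    def collapse(s: str) -> str:
        # split() with no argument also splits on "\n", "\t" and "\r"
        return " ".join(s.split())

    return {
        name: [collapse(v) for v in value] if isinstance(value, list) else collapse(value)
        for name, value in attrs.items()
    }


# Hypothetical attrs dict
attrs = {
    "class": ["fill-vertically", "main"],
    "style": "flex: 1 1 auto;\n    color: red;",
}
print(normalize_attrs(attrs))
# {'class': ['fill-vertically', 'main'], 'style': 'flex: 1 1 auto; color: red;'}
```

Inside the soup loop it would be used as tag.attrs = normalize_attrs(tag.attrs).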

While BeautifulSoup is more commonly used, HTML Tidy may be a better choice if you need to deal with quirky markup or have more specific formatting requirements.

After installing the Python bindings (pip install pytidylib, which also requires the HTML Tidy library itself to be installed on the system), try the following code:

from tidylib import Tidy

tidy = Tidy()
# text = the HTML string to tidy
config = {
    "doctype": "omit",
    # "show-body-only": True
}
document, errors = tidy.tidy_document(text, options=config)
print(document)

tidy_document returns a tuple of the tidied HTML and any errors or warnings Tidy reported. This code outputs:

<html>
  <head>
    <title></title>
  </head>
  <body>
    <h1 style="text-align: center;">
      Main site
    </h1>
    <div>
      <p style="color: blue; text-align: center;">
        text1
      </p>
      <p style="color: blueviolet; text-align: center;">
        text2
      </p>
    </div>
    <div>
      <p style="text-align:center">
        <img src="./foo/test.jpg" alt="Testing static images" style="">
      </p>
    </div>
  </body>
</html>

For the second sample, uncommenting "show-body-only": True gives:

<div id="dialer-capmaign-console" class="fill-vertically" style="flex: 1 1 auto;">
  <div id="sessionsGrid" data-columns="[ { field: 'dialerSession.startTime', format:'{0:G}', title:'Start time', width:122 }, { field: 'dialerSession.endTime', format:'{0:G}', title:'End time', width:122, attributes: {class:'tooltip-column'}}, { field: 'conversationStartTime', template: cty.ui.gct.duration_dialerSession_conversationStartTime_endTime, title:'Duration', width:80}, { field: 'dialerSession.caller.lastName',template: cty.ui.gct.person_dialerSession_caller_link, title:'Caller', width:160 }, { field: 'noteType',template:cty.ui.gct.nameDescription_noteType, title:'Note type', width:150, attributes: {class:'tooltip-column'}}, { field: 'note', title:'Note'} ]"></div>
</div>

See the HTML Tidy configuration options documentation for further customization; some wrapping options apply specifically to attributes (for example wrap-attributes) and may help. As the output shows, empty elements take up only one line, and HTML Tidy automatically tries to add things like the DOCTYPE, head and title tags.
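
For example, an options dict along these lines should keep each tag's attributes on a single line. The option names are real HTML Tidy configuration settings (doctype, show-body-only, wrap, wrap-attributes, indent-attributes), but this particular combination of values is an assumption to adjust for your own input.

```python
# Sketch of pytidylib options aimed at one-line attributes (values are assumptions)
config = {
    "doctype": "omit",          # don't add a DOCTYPE
    "show-body-only": "yes",    # emit only the contents of <body>
    "wrap": 0,                  # 0 disables line wrapping
    "wrap-attributes": "no",    # don't break long attribute values across lines
    "indent-attributes": "no",  # don't put each attribute on its own line
}
```

It would be passed as options=config to tidy_document, as in the code above.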
