Best way to generate proper markup for inserting into WordPress from PHP (importing from another CMS)

Submitted by 偶尔善良 on 2019-12-12 01:14:59

Question


I was assigned to import a large amount of content from the database of a proprietary CMS into a new installation of WordPress. I wrote a PHP script that retrieves the entries and inserts them using the wp_insert_post() function, but I'm now stuck on a problem.

What I want to do is "filter" my input string, which is the source content, so that it matches the format WordPress uses natively when content is copy-pasted into the built-in editor. For instance, this is what it would look like:

<strong>UIR e OER</strong>

&nbsp;

Os verbos terminados em <strong>-uir</strong> e <strong>-oer</strong> terão as 2ª e 3ª pessoas do singular do presente do indicativo escritas com <strong>-i-</strong>:

<strong> </strong>

<strong>– tu possuis</strong>

<strong>– ele possui</strong>

<strong>– tu constróis</strong>

...

Now, this is how the original content is retrieved from the source database:

<p>&nbsp;<b style="line-height: 150%; text-align: center;"><span style="font-size:13.5pt;line-height:150%;  font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;mso-fareast-font-family:&quot;Times New Roman&quot;;  mso-fareast-language:PT-BR">UIR e OER</span></b></p>  <p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;  mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">&nbsp;<o:p></o:p></span></p>  <p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;  mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">Os verbos terminados em <b>-uir</b> e <b>-oer</b> ter&atilde;o as 2&ordf; e 3&ordf; pessoas do singular do presente do&nbsp;indicativo escritas com <b>-i-</b>:<b> <o:p></o:p></b></span></p>  <p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><b><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;  mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">&nbsp;</span></b></p>  <p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><b><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;  mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">- tu possuis<o:p></o:p></span></b></p>  <p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><b><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;  mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">- ele possui<o:p></o:p></span></b></p>  <p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><b><span 
style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;  mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">- tu constr&oacute;is<o:p></o:p></span></b></p>  

At first it seemed that wp_insert_post() would process it automatically. It does do some processing, but not enough.

This is how the content is being stored by the import script:

<p>&nbsp;<b style="line-height: 150%; text-align: center;"><span style="font-size:13.5pt;line-height:150%;
font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;mso-fareast-font-family:&quot;Times New Roman&quot;;
mso-fareast-language:PT-BR">UIR e OER</span></b></p>
<p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;
mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">&nbsp;<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;
mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">Os verbos terminados em <b>-uir</b> e <b>-oer</b> ter&atilde;o as 2&ordf; e 3&ordf; pessoas do singular do presente do&nbsp;indicativo escritas com <b>-i-</b>:<b> <o:p></o:p></b></span></p>
<p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><b><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;
mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">&nbsp;</span></b></p>
<p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><b><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;
mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">- tu possuis<o:p></o:p></span></b></p>
<p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><b><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;
mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">- ele possui<o:p></o:p></span></b></p>
<p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><b><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;
mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">- tu constr&oacute;is<o:p></o:p></span></b></p>

My first idea was to implement a function myself, based on preg_replace() and html_entity_decode(), but it seems to me that there should be a much more elegant solution. Is there?

Edit: To put it another way, does PHP - or WordPress itself - provide a way to process the content like TinyMCE (which is the WordPress built-in editor) does? Naturally I can't rely on TinyMCE itself because it's a JavaScript tool.


Answer 1:


In a recent project, we needed to do the same thing. We used the following approaches:

  1. preg_replace for the simplest tasks.
  2. DOMDocument. This is an excellent PHP tool for parsing HTML.
  3. (non-PHP) The main import was done with Node.js. With a couple of necessary tweaks, the wp-cli node module is an excellent tool for manipulating WordPress environments. We could then use cheeriojs for parsing and modifying HTML.
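
The first two items above can be combined in plain PHP. The following is a minimal sketch, not the code from that project: clean_word_html() is a hypothetical helper name, the list of attributes to strip is an assumption based on the sample markup in the question, and the input is assumed to be a UTF-8 HTML fragment. It uses preg_replace() to drop Word's <o:p> placeholders, then DOMDocument to strip presentational attributes, unwrap <span> elements, and turn <b> into <strong>.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): strips Word-export clutter
 * from an HTML fragment. Assumes UTF-8 input and the dom/libxml extensions.
 */
function clean_word_html(string $html): string
{
    // Simplest task first: remove Word's <o:p> / </o:p> placeholder tags.
    $html = (string) preg_replace('~</?o:p>~i', '', $html);

    $doc = new DOMDocument('1.0', 'UTF-8');
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    // The leading XML declaration is a common trick to make libxml read UTF-8;
    // the wrapper <div> gives us a single known root to serialize from.
    $doc->loadHTML(
        '<?xml encoding="UTF-8"><div id="import-root">' . $html . '</div>',
        LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Drop presentational attributes (assumed list: style, class, lang).
    foreach (iterator_to_array($xpath->query('//@style | //@class | //@lang')) as $attr) {
        $attr->ownerElement->removeAttributeNode($attr);
    }

    // Unwrap every <span>: keep its children, discard the element itself.
    foreach (iterator_to_array($xpath->query('//span')) as $span) {
        while ($span->firstChild) {
            $span->parentNode->insertBefore($span->firstChild, $span);
        }
        $span->parentNode->removeChild($span);
    }

    // Replace <b> with <strong>; any remaining attributes on <b> are dropped.
    foreach (iterator_to_array($xpath->query('//b')) as $b) {
        $strong = $doc->createElement('strong');
        while ($b->firstChild) {
            $strong->appendChild($b->firstChild);
        }
        $b->parentNode->replaceChild($strong, $b);
    }

    // Serialize only the children of the wrapper <div>.
    $root = $xpath->query('//div[@id="import-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return trim($out);
}

// Usage with one paragraph shaped like the sample in the question.
$sample = '<p class="MsoNormal" style="line-height:150%"><b>'
        . '<span style="font-size:12.0pt">- tu possuis<o:p></o:p></span></b></p>';
echo clean_word_html($sample), PHP_EOL;
```

The returned string could then be passed as post_content to wp_insert_post(). Empty paragraphs and &nbsp; spacers are left untouched in this sketch; whether to remove them depends on the layout you want in WordPress.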


Source: https://stackoverflow.com/questions/41619637/best-way-to-generate-proper-markup-for-inserting-into-wordpress-from-php-import
