Importing tables in Mathematica from web - empty cell problem

Posted by 白昼怎懂夜的黑 on 2019-12-03 13:26:15

As lumeng points out, you can import the "FullData" element so that empty cells of the HTML table are kept and every row fills out properly. Here's a simpler illustration of this.

in = ImportString["\<<html><table>
   <tr>
   <td>(1,1)</td>
   <td>(1,2)</td>
   <td>(1,3)</td>
   </tr>
   <tr>
   <td>(2,1)</td>
   <td></td>
   <td>(2,3)</td>
   </tr>
   </table></html>\>",
   {"HTML", "FullData"}];
Grid[in[[1, 1]]]
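
Judging from the "FullData" output shown further down this page, in[[1, 1]] is a list of rows in which the empty cell comes back as an empty string. Here is a minimal sketch, assuming that shape, of checking that the rows are rectangular and swapping the empty string for an explicit marker. The literal table is typed in by hand as a stand-in for the imported data, and the names table and cleaned are only illustrative.

```mathematica
(* Hand-typed stand-in for in[[1, 1]]; assumed shape: a list of rows of strings *)
table = {{"(1,1)", "(1,2)", "(1,3)"}, {"(2,1)", "", "(2,3)"}};

(* If every row has the same length, the empty cell was kept in position *)
Dimensions[table]

(* Replace empty strings with an explicit marker before further processing *)
cleaned = table /. "" -> Missing["NotAvailable"];
Grid[cleaned, Frame -> All]
```

Whether to use Missing[...], Null, or leave the empty string alone depends on what the downstream code expects.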

If you want more complete control of the output, I'd suggest that you Import the page as XML. Here's an example.

in = ImportString["\<<html><table>
    <tr>
    <td>(1,1)</td>
    <td>(1,2)</td>
    <td>(1,3)</td>
    </tr>
    <tr>
    <td>(2,1)</td>
    <td></td>
    <td>(2,3)</td>
    </tr>
    </table></html>\>", "XML"];
Column[Last /@ Cases[in,
   XMLElement["td", ___], Infinity]]

You'll need to read up a bit on XML in general and on Mathematica's representation of it, namely the XMLObject. It's a delight to work with once you get the hang of it, though.
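
The Cases call above collects every td into one flat column. If the row structure matters, one possible variation is to match the tr elements first and then pull the cells out of each row. This is only a sketch: the names xml and rows are made up for illustration, and it assumes each cell contains plain text or nothing at all.

```mathematica
xml = ImportString[
   "<html><table><tr><td>(1,1)</td><td>(1,2)</td><td>(1,3)</td></tr><tr><td>(2,1)</td><td></td><td>(2,3)</td></tr></table></html>",
   "XML"];

(* For each tr, take its td children and join whatever strings they contain;
   a td with no content yields the empty string *)
rows = Cases[xml,
   XMLElement["tr", _, children_] :>
    Cases[children,
     XMLElement["td", _, content_] :> StringJoin[Cases[content, _String]]],
   Infinity];

Grid[rows, Frame -> All]
```

Cells that hold nested markup (links, spans and so on) would need a deeper pattern than the one used here.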

In[13]:= htmlcode = "<html><table border=\"1\">
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
<td>row 1, cell 3</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td></td>
<td>row 2, cell 3</td>
</tr>
</table></html>";

In[14]:= file = ToFileName[{$TemporaryDirectory}, "tmp.html"]
Out[14]= "/tmp/tmp.html"

In[15]:= OpenWrite[file]
WriteString[file,htmlcode]
Close[file]
FilePrint[file]
Out[15]= OutputStream[/tmp/tmp.html,18]
Out[17]= /tmp/tmp.html
During evaluation of In[15]:=
<html><table border="1">
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
<td>row 1, cell 3</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td></td>
<td>row 2, cell 3</td>
</tr>
</table></html>
In[23]:= Import[file,"Elements"]//InputForm
Out[23]//InputForm=
{"Data", "FullData", "Hyperlinks", "ImageLinks", "Images", "Plaintext", "Source", "Title", "XMLObject"}
In[22]:= Import[file,"FullData"]//InputForm
Out[22]//InputForm=
{{{{"row 1, cell 1", "row 1, cell 2", "row 1, cell 3"}, {"row 2, cell 1", "", "row 2, cell 3"}}}, {}}
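
The same round trip can probably be written more compactly with FileNameJoin and Export in place of ToFileName and the OpenWrite/WriteString/Close sequence. The following is a sketch rather than a transcript of an actual session; the short html string is a made-up stand-in for htmlcode so that the block runs on its own.

```mathematica
(* Short stand-in for htmlcode; one row with an empty middle cell *)
html = "<html><table><tr><td>a</td><td></td><td>c</td></tr></table></html>";

file = FileNameJoin[{$TemporaryDirectory, "tmp.html"}];

(* Export with the "Text" format writes the string to the file as-is *)
Export[file, html, "Text"];

(* Name the format explicitly instead of relying on the file extension *)
Import[file, {"HTML", "FullData"}] // InputForm

(* Clean up the temporary file *)
DeleteFile[file]
```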

Using Computist's sample, you could also do:

htmlcode = "<html><table border=\"1\">
  <tr>
  <td>row 1, cell 1</td>
  <td>row 1, cell 2</td>
  <td>row 1, cell 3</td>
  </tr>
  <tr>
  <td>row 2, cell 1</td>
  <td></td>
  <td>row 2, cell 3</td>
  </tr>
  </table></html>";

modified = StringReplace[htmlcode, "<td></td>" -> "<td>###</td>"];

ImportString[modified, "Data"] /. "###" -> Null
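
The literal "<td></td>" pattern only catches cells that are exactly empty and carry no attributes. On real pages, cells often have attributes or contain stray whitespace, so a regular expression may be more forgiving. This is a sketch under those assumptions; the sample string and the names html and marked are invented for illustration, and "###" is assumed not to occur in the real data.

```mathematica
html = "<html><table><tr><td>a</td><td class=\"x\">  </td><td>c</td></tr></table></html>";

(* Match a td whose body is empty or whitespace only;
   $1 puts any attributes back on the rewritten tag *)
marked = StringReplace[html,
   RegularExpression["<td([^>]*)>\\s*</td>"] -> "<td$1>###</td>"];

ImportString[marked, {"HTML", "Data"}] /. "###" -> Null
```

Regular expressions are a blunt tool for HTML, so for anything beyond simple tables the "FullData" or XML routes described above are likely to be more robust.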