HTML to RTF string using Python

前端 未结 3 1250
挽巷
挽巷 2021-02-19 11:21

I am looking for a way to convert HTML text to RTF string. Is there any libraries that does this job. I get html content dynamically in my project and need it to be rendered in

相关标签:
3条回答
  • 2021-02-19 11:52

    I see this question is over a year old, but figured I'd contribute anyway. I recently had a similar requirement, and turned to PyRTF, a small but powerful Python module that can construct RTF documents from a text file. You could use Beautiful Soup to scrape the HTML, going down the parse tree tag by tag, and use the PyRTF API to construct appropriate objects (table, cell, paragraph, section or document).

    The API itself is quite granular, and allows for a whole bunch of custom formatting (font text, alignment, color, headers, footers etc.)

    Hope this helps.

    0 讨论(0)
  • 2021-02-19 12:16

    There is a wonderful python library that comes as a tarball.

    You can download it at https://pypi.python.org/pypi/zopyx.convert2/2.4.5.

    Good luck!

    0 讨论(0)
  • 2021-02-19 12:17

    RTF seems a dicey format to convert from/to. I've tried cutting and pasting among applications on Mac OS X, for example, where RTF is something of a lingua franca. Some of those apps are Microsoft apps (relevant in that RTF is a Microsoft-developed format), others are not. Even basic formatting information like font size, font face, line spacing, and list styling (ordered or unordered) is jumbled when copying from one ostensibly RTF-speaking app to another. Simply put, it's a mess.

    I have searched for ways to programmatically read, write, and transform RTF, preferably from Python. I found a number of packages on PyPI, trying them out has been a disappointing experience. They would support RTF 1.5, say, when the current version is 1.9.1. RTF has been around a long time, but a 2005-vintage spec is not very recent. There were lots of gotchas and incompatibilities. LOTS.

    Now, I'm not saying it's impossible, or that there aren't other libraries out there that would do the trick. I have not tried the zopyx.convert mentioned by others here, for example. Maybe it's great. But looking at its dependencies--Java, FOP, etc.--it looks like a pretty complex (and thus likely fragile) toolchain. I read its code on github, and the Python is really only there as a coordination veneer. It organizes external tools XFC, XINC, FOP, and PrinceXML--three of the four of which are commercial software. That includes the key XFC part that deals with RTF. Color me skeptical.

    There are two converters that I've found are worth a look: If you're using a Mac, the textutil command line program is actually one of the better and simpler tools I've seen.

    textutil -convert html filename.rtf -output filename.html
    

    The other formatting engine that's worth considering is LibreOffice. It's free, open source, reasonably amenable to automation, and a decent foundation as an interoperability hub. That's not just a guess; I've built complex, multi-format document workflows around it.

    I would question why you're trying to get into RTF. That seems like a document format you'd be trying to escape from. But if you need to go there, textutil and LibreOffice are the least-worst mechanisms I've found.

    0 讨论(0)
提交回复
热议问题