I've been using htmldoc for a while, but I've run into some fairly serious limitations. I need the end solution to work on a Linux box. I'll be calling this library/utili
An alternative that hasn't been mentioned here is to use an API.
The advantage is that you offload the resources needed for the job to an external service, which stays up to date with recent features (no need to update your code or install bug fixes).
For instance, with PDFShift, you can do that with a single POST request to:
    POST https://api.pdfshift.io/v2/convert/
Pass the "source" (either a URL or raw HTML) in the request body, and you'll get back the PDF as binary data. (Disclaimer: I work at PDFShift.)
Here's a code sample in Python:
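    import requests

    # The API key is sent as the HTTP Basic auth user name, with an empty password.
    response = requests.post(
        'https://api.pdfshift.io/v2/convert/',
        auth=('user_api_key', ''),
        json={"source": "https://en.wikipedia.org/wiki/PDF", "landscape": False, "use_print": False}
    )
    # Raise an exception if the API returned an error status.
    response.raise_for_status()

    # The response body is the PDF itself, so write it out in binary mode.
    with open('wikipedia.pdf', 'wb') as f:
        f.write(response.content)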
Your PDF will then be saved at ./wikipedia.pdf.
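Since "source" can also be raw HTML, the same call can be wrapped in a small helper. The following is only a sketch using the standard library instead of requests, in case you'd rather not add a dependency on the Linux box. The names `build_request` and `convert_to_pdf` are made up for illustration, and the only things taken from above are the endpoint, the "source" field, the "landscape" option, and the API key sent as the Basic auth user name with an empty password.

```python
import base64
import json
import urllib.request

API_URL = "https://api.pdfshift.io/v2/convert/"  # endpoint quoted in the answer above


def build_request(source, api_key, **options):
    """Build (without sending) the POST request for a URL or a raw HTML string.

    Hypothetical helper: extra keyword arguments are passed through as JSON
    fields next to "source", e.g. landscape=False.
    """
    payload = {"source": source, **options}
    # HTTP Basic auth: API key as the user name, empty password,
    # which is what auth=('user_api_key', '') does with requests.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def convert_to_pdf(source, api_key, out_path, **options):
    """Send the request and write the returned bytes to out_path."""
    request = build_request(source, api_key, **options)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=60) as response:
        pdf_bytes = response.read()
    with open(out_path, "wb") as f:
        f.write(pdf_bytes)
    return out_path
```

Usage would look something like `convert_to_pdf("<h1>Hello</h1>", "user_api_key", "hello.pdf", landscape=False)`. Splitting the request building from the sending makes it easy to inspect the payload without hitting the network.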