How to create PDF containing Persian(Farsi) text with reportlab, rtl and bidi in python

Question

I've been trying to create a PDF file from content that can be English, Persian, digits or a combination of them.

There are some problems with Persian text such as "این یک متن فارسی است":

1- The text must be written from right to left.

2- Characters take different shapes depending on their position in the word (that is, a character's shape changes according to the characters around it).

3- Because the sentence is read from right to left, the normal textwrap doesn't work correctly.
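
The first two points can be checked with Python's standard library alone. The following is a minimal sketch, not part of the original question; the helper names `has_rtl_letters` and `base_letter` are made up for illustration. It reads each character's Unicode bidirectional class, and it shows that the positional shapes of one letter (here BEH, ب) exist as separate presentation-form code points that normalize back to a single base letter.

```python
import unicodedata

def has_rtl_letters(text):
    """Return True if any character has a right-to-left bidi class (R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)

# The four positional shapes of the letter BEH as presentation-form code points.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}

def base_letter(presentation_form):
    """Map a presentation form back to its base letter via NFKC normalization."""
    return unicodedata.normalize("NFKC", presentation_form)

if __name__ == "__main__":
    print(has_rtl_letters("این یک متن فارسی است"))
    print(has_rtl_letters("plain English 123"))
    for position, form in BEH_FORMS.items():
        print(position, unicodedata.name(form), "->", unicodedata.name(base_letter(form)))
```

A font only draws the code points it is given, so a renderer that does no shaping needs the text converted to these presentation forms beforehand.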

Answer 1:

After working with Reportlab for a while, we had problems organizing and formatting the output. It took a lot of time and was rather complicated, so we decided to work with pdfkit and jinja2 instead. This way we can do the layout and formatting in HTML and CSS, and we don't need to reshape the Persian text ourselves.

First we design an HTML template file like the one below:

    <!DOCTYPE html>
    <html>
    <head lang="fa-IR">
        <meta charset="UTF-8">
        <title></title>
    </head>
    <body>
        <p dir="rtl">سوابق کاری</p>
        <ul dir="rtl">
            {% for experience in experiences %}
            <li><a href="{{ experience.url }}">{{ experience.title }}</a></li>
            {% endfor %}
        </ul>
    </body>
    </html>

Then we use the jinja2 library to render our data into the template, and pdfkit to create a PDF from the rendered result:

    from jinja2 import Template
    from pdfkit import pdfkit

    sample_data = [{'url': 'http://www.google.com/', 'title': 'گوگل'},
                   {'url': 'http://www.yahoo.com/fa/', 'title': 'یاهو'},
                   {'url': 'http://www.amazon.com/', 'title': 'آمازون'}]

    # Read the template as UTF-8 so the Persian text is decoded correctly.
    with open('template.html', 'r', encoding='utf-8') as template_file:
        template = Template(template_file.read())

    resume_str = template.render({'experiences': sample_data})

    # 'quiet' suppresses wkhtmltopdf's console output.
    options = {'encoding': "UTF-8", 'quiet': ''}
    # to_pdf() without a path returns the PDF as bytes.
    pdf_bytes = pdfkit.PDFKit(resume_str, 'string', options=options).to_pdf()
    with open('result.pdf', 'wb') as output:
        output.write(pdf_bytes)
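
Note that pdfkit does not render the PDF itself; it calls the external wkhtmltopdf program, which has to be installed separately. A small check such as the sketch below can give a clearer error than a failure deep inside the conversion. The helper name `find_wkhtmltopdf` is hypothetical, and the sketch assumes the executable is called `wkhtmltopdf` and is on the PATH.

```python
import shutil

def find_wkhtmltopdf(executable="wkhtmltopdf"):
    """Return the full path of the wkhtmltopdf binary, or None if it is not on PATH."""
    return shutil.which(executable)

if __name__ == "__main__":
    path = find_wkhtmltopdf()
    if path is None:
        print("wkhtmltopdf was not found on PATH; install it before calling pdfkit.")
    else:
        print("Using wkhtmltopdf at", path)
```

If the binary lives somewhere else, its location can be given to pdfkit through `pdfkit.configuration(wkhtmltopdf=path)` and the `configuration=` argument.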

Answer 2:

I used reportlab to create the PDF, but unfortunately reportlab doesn't support the Arabic and Persian alphabets, so I used the 'rtl' library by Vahid Mardani and the 'pybidi' library by Meir Kriheli to make the text look right in the resulting PDF.

First we need to add a font that supports Persian to reportlab.

On Ubuntu 14.04, copy Bahij-Nazanin-Regular.ttf into the
/usr/local/lib/python3.4/dist-packages/reportlab/fonts folder.

Then register the font and add a style to reportlab:

    from reportlab.lib.enums import TA_RIGHT
    from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.ttfonts import TTFont

    pdfmetrics.registerFont(TTFont('Persian', 'Bahij-Nazanin-Regular.ttf'))
    styles = getSampleStyleSheet()
    # Right-aligned paragraph style that uses the registered Persian font.
    styles.add(ParagraphStyle(name='Right', alignment=TA_RIGHT, fontName='Persian', fontSize=10))
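
Copying the font into the installed reportlab package ties the setup to one Python installation and may be lost on an upgrade. As an alternative sketch, the font file can be looked up in a project directory and its full path handed to `TTFont`, which takes a file name or path as its second argument. The `find_font` helper and the `fonts` directory are assumptions for illustration, not part of the original answer.

```python
from pathlib import Path

def find_font(filename, search_dirs):
    """Return the absolute path of the first matching font file, or None."""
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return str(candidate.resolve())
    return None

if __name__ == "__main__":
    # Hypothetical layout: fonts kept in ./fonts or in the current directory.
    font_path = find_font("Bahij-Nazanin-Regular.ttf", [Path.cwd() / "fonts", Path.cwd()])
    if font_path is None:
        print("Font not found; check the search directories.")
    else:
        # With reportlab installed, the path could then be used as:
        # pdfmetrics.registerFont(TTFont('Persian', font_path))
        print("Font found at", font_path)
```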

In the next step we need to reshape the Persian letters into their correct contextual forms and reorder each word for right-to-left display:

    from bidi.algorithm import get_display
    from rtl import reshaper
    import textwrap

    def get_farsi_text(text):
        if reshaper.has_arabic_letters(text):
            words = text.split()
            reshaped_words = []
            for word in words:
                if reshaper.has_arabic_letters(word):
                    # join the letters into their contextual (connected) shapes
                    reshaped_text = reshaper.reshape(word)
                    # reorder the characters for right-to-left display
                    bidi_text = get_display(reshaped_text)
                    reshaped_words.append(bidi_text)
                else:
                    reshaped_words.append(word)
            # reverse the word order so the sentence reads from right to left
            reshaped_words.reverse()
            return ' '.join(reshaped_words)
        return text
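
To see what the reordering step does independently of the `rtl` and `pybidi` libraries, here is a deliberately simplified sketch. It is not the Unicode bidi algorithm and it does no letter shaping: it only reverses the characters of words that contain right-to-left letters and then reverses the word order, which is the same overall strategy as `get_farsi_text`. The names `is_rtl_word` and `to_visual_order` are made up for this illustration.

```python
import unicodedata

def is_rtl_word(word):
    """True if the word contains at least one right-to-left letter."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in word)

def to_visual_order(text):
    """Crude logical-to-visual conversion for a left-to-right renderer."""
    if not is_rtl_word(text):
        return text  # purely left-to-right text is left unchanged
    # Reverse the characters of right-to-left words only.
    words = [w[::-1] if is_rtl_word(w) else w for w in text.split()]
    # Reverse the word order so the first word ends up on the right.
    words.reverse()
    return " ".join(words)
```

Real text should still go through `get_display`, which also deals with digits, punctuation and mixed-direction runs that this sketch ignores.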

To add a bullet or wrap the text, we can use the following function:

    def get_farsi_bulleted_text(text, wrap_length=None):
        farsi_text = get_farsi_text(text)
        if wrap_length:
            line_list = textwrap.wrap(farsi_text, wrap_length)
            # the text is in visual order, so the last wrapped line is the first to be read
            line_list.reverse()
            # the bullet goes at the end of the first line, i.e. on its right-hand side
            line_list[0] = '{} &#x02022;'.format(line_list[0])
            farsi_text = '<br/>'.join(line_list)
            return '<font>%s</font>' % farsi_text
        return '<font>%s &#x02022;</font>' % farsi_text
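
One side effect of wrapping text that is already in visual order is that `textwrap` fills the lines starting from the end of the sentence, so once the lines are reversed the short leftover line ends up at the top of the paragraph instead of the bottom. The sketch below compares that approach with wrapping the logical text first and reordering each line afterwards. It uses ASCII tokens as stand-ins for Persian words and a hypothetical `reorder` helper in place of `get_farsi_text`.

```python
import textwrap

def reorder(line):
    """Stand-in for get_farsi_text: reverse the word order of one line."""
    return " ".join(reversed(line.split()))

def wrap_after_reordering(text, width):
    """Approach used above: reorder everything, wrap, then reverse the lines."""
    lines = textwrap.wrap(reorder(text), width)
    lines.reverse()
    return lines

def wrap_before_reordering(text, width):
    """Alternative: wrap the logical text, then reorder each line separately."""
    return [reorder(line) for line in textwrap.wrap(text, width)]

if __name__ == "__main__":
    logical = "w1 w2 w3 w4 w5"
    print(wrap_after_reordering(logical, 5))   # short line comes first
    print(wrap_before_reordering(logical, 5))  # short line comes last
```

Whether the second variant is preferable depends on the layout; it is shown only to make the difference in line filling visible.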

To test the code we can write:

    from reportlab.lib.pagesizes import letter
    from reportlab.platypus import SimpleDocTemplate, Paragraph
    from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle

    doc = SimpleDocTemplate("farsi_wrap.pdf", pagesize=letter, rightMargin=72, leftMargin=72,
                            topMargin=72, bottomMargin=18)
    Story = []

    text = 'شاید هنوز اندروید نوقا برای تمام گوشی‌های اندرویدی عرضه نشده باشد، ولی اگر صاحب یکی از گوشی‌های نکسوس یا پیک' \
   'سل باشید احتمالا تا الان زمان نسبتا زیادی را با آخرین نسخه‌ی اندروید سپری کرده‌اید. اگر در کار با اندروید نوقا' \
   ' دچار مشکل شده‌اید، با دیجی‌کالا مگ همراه باشید تا با هم برخی از رایج‌ترین مشکلات گزارش شده و راه حل آن‌ها را' \
   ' بررسی کنیم. البته از بسیاری از این روش‌ها در سایر نسخه‌های اندروید هم می‌توانید استفاده کنید. اندروید برخلاف iOS ' \
   'روی گستره‌ی وسیعی از گوشی‌ها با پوسته‌ها و اپلیکیشن‌های اضافی متنوع نصب می‌شود. بنابراین تجویز یک نسخه‌ی مشترک برا' \
   'ی حل مشکلات آن کار چندان ساده‌ای نیست. با این حال برخی روش‌های عمومی وجود دارد که بهتر است پیش از هر چیز آن‌ها را' \
   ' بیازمایید.'
    tw = get_farsi_bulleted_text(text, wrap_length=120)
    p = Paragraph(tw, styles['Right'])
    Story.append(p)
    doc.build(Story)

Answer 3:

In case anyone wants to generate PDFs from HTML templates using Django, this is how it can be done:

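    import pdfkit
    from django.http import HttpResponse
    from django.template.loader import get_template

    def download_pdf(request):  # view name is only an example
        template = get_template("app_name/template.html")
        # Recent Django versions expect a plain dict here, not a Context object.
        # some_variable stands for whatever data the template needs.
        html = template.render({'something': some_variable})
        pdf = pdfkit.from_string(html, False)  # False returns the PDF as bytes
        response = HttpResponse(pdf, content_type='application/pdf')
        response['Content-Disposition'] = 'attachment; filename=output.pdf'
        return response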
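
If the download should carry a Persian file name, a plain `filename=` value is not reliable, because non-ASCII characters in HTTP header values are not handled consistently. One option, sketched below with a hypothetical helper `content_disposition`, is the `filename*` form described in RFC 6266: a percent-encoded UTF-8 name together with an ASCII fallback for older clients.

```python
from urllib.parse import quote

def content_disposition(filename, fallback="output.pdf"):
    """Build an attachment header value with an ASCII fallback and a UTF-8 name."""
    # Percent-encode the UTF-8 bytes of the name so the header stays ASCII.
    encoded = quote(filename, safe="")
    return "attachment; filename=\"{}\"; filename*=UTF-8''{}".format(fallback, encoded)
```

In the view, this would be used as `response['Content-Disposition'] = content_disposition('گزارش.pdf')`.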

Source: https://stackoverflow.com/questions/41345450/how-to-create-pdf-containing-persianfarsi-text-with-reportlab-rtl-and-bidi-in

Tags

python-3.x

pdf

reportlab

persian

farsi