How to Generate PDF Documents in Python

This article grew out of my first experience with PDF generation tools. The focus here is not Django itself, but producing formal documents from Python with a template engine. I hope my experience saves someone time and spares them a search through the developers' pages: if you read discussions going back seven years, you will see that many of the problems are still unsolved today.

Below I describe in detail how to generate a PDF document, using two different utilities as examples.

By:

Igor Remsha

Tasks

The main task was to produce a well-designed document describing a technical specification, the details of which the client entered in a form on our website.

How it looks step by step:

  • The user enters data into a web form.
  • The server inserts this data into the template.
  • The server generates a PDF document from the filled template.
  • The server returns the document to the user.
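
The steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code used later in this article: it uses string.Template from the standard library in place of the Django template engine, and build_document, fake_renderer and the sample values are hypothetical names. In a real project the render_pdf callable would wrap one of the tools discussed below.

```python
from string import Template
from typing import Callable, Mapping


def build_document(
    form_data: Mapping[str, str],
    template_text: str,
    render_pdf: Callable[[str], bytes],
) -> bytes:
    """Fill the template with form data and convert the HTML to PDF bytes."""
    # Step 2: insert the submitted data into the template.
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    html = Template(template_text).safe_substitute(form_data)
    # Step 3: generate the PDF; the actual renderer is supplied by the caller.
    pdf_bytes = render_pdf(html)
    # Step 4: the caller sends these bytes back to the user.
    return pdf_bytes


def fake_renderer(html: str) -> bytes:
    """Stand-in renderer for illustration only; it does not produce a real PDF."""
    return b"%PDF-FAKE\n" + html.encode("utf-8")


# Step 1: data the user entered in the web form (hypothetical values).
form_data = {"app_name": "Demo app"}
template_text = "<p class='field__description'>$app_name</p>"
document = build_document(form_data, template_text, fake_renderer)
```

Passing the renderer in as a function keeps the template step independent of the PDF tool, which makes it easy to swap one tool for the other.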

Tools

PDF creation tools

If you search Google for "generating PDF documents in Python", you will find that there are many tools for this, and each forum recommends a different one.

I will show examples using two of them.

  1. wkhtmltopdf is a command-line utility that renders HTML to PDF, for which there are many Python and Django wrappers.
  2. WeasyPrint is a visual rendering engine for HTML and CSS that can export to PDF and PNG.

Advantages and disadvantages

wkhtmltopdf is a good fit when you need to generate a document quickly and want to use modern CSS features (Flexbox, box-shadow). The trade-off, in my experience, is that pages of text are presented as an image: text inside the file cannot be selected.

WeasyPrint suits the opposite case: the text in the resulting documents can be selected and copied, but there are restrictions on the CSS you can use, and they must be respected so that generating a document does not take several minutes.

Layout

As an example, consider a portion of an HTML document provided by a colleague of mine.

 
HTML
<!DOCTYPE html>
<html lang="ru-RU">
<head>
  <meta charset="utf-8">
  <link rel="stylesheet" href="style.css">
</head>
<body>
  <div class="main">
    <div class="title">
      <h1>Техническое задание</h1>
      <h2>На разработку мобильного приложения</h2>
    </div>
    <div class="section">
      <h1>Шаг 1 / 7</h1>
      <h2>1. Общие положения о проекте:</h2>
      <h3>1.1 Предмет разработки</h3>
      <div class="field">
        <span class="field__title">Предметом разработки является мобильное приложение:</span>
        <p class="field__description">{{ app_name }}</p>
      </div>
      <div class="field">
        <span class="field__title">Он включает в себя следующий комплекс работ: </span>
        <p class="field__description">
          {{ complex_works }}
        </p>
      </div>

      <h3>1.2 Функциональное назначение продукта</h3>
      <div class="field">
        <span class="field__title">Продукт / сервис предназначен для:</span>
        <p class="field__description">
          {{ mission }}
        </p>
      </div>
      <div class="field">
        <span class="field__title">Основная цель продукта:</span>
        <p class="field__description">{{ target }}</p>
      </div>
      <div class="field">
        <span class="field__title">Он решает следующие проблемы пользователей:</span>
        <p class="field__description">
          {{ solve_problems }}
        </p>
      </div>
    </div>
  </div>
</body>
</html>
CSS
@import url('https://fonts.googleapis.com/css2?family=Montserrat:wght@400;500;600;700&display=swap');

html, body {
    margin: 0;
    font-family: 'Visuelt Pro';
}

body {
    width: 790px;
    margin: 0 auto;
    position: relative;
    font-family: 'Montserrat', sans-serif;
}

p, h1, h2, h3, ul, li {
    margin: 0;
    padding: 0;
}

.main {
    margin: 0 auto;
    width: 80.6%;
}

.header__logo {
    position: absolute;
    right: 330px;
    top: 91px;
    z-index: 1;
}

.title {
    border-bottom: 1px solid #e8e8e8;
    width: 100%;
    padding: 33px 0 32px;
}

.title h1 {
    margin-bottom: 16px;
    font-size: 20px;
    line-height: 30px;
    font-weight: 600;
    text-align: center;
}

.title h2 {
    font-size: 14px;
    line-height: 20px;
    font-weight: normal;
    letter-spacing: 0.5px;
    text-align: center;
}

.section .field:first-of-type {
    margin-top: 30px;
}

.section h1 {
    margin-top: 32px;
    color: #d73f30;
    font-weight: 600;
    font-size: 16px;
    line-height: 25px;
    text-align: center;
}

.section h2 {
    margin-top: 10px;
    font-weight: 600;
    font-size: 16px;
    line-height: 25px;
    text-align: center;
}

.section h3 {
    margin: 10px 0 20px;
    font-weight: 500;
    font-size: 14px;
    line-height: 20px;
    letter-spacing: 0.5px;
    text-align: center;
}

.field {
    margin-bottom: 20px;
}

.field__title {
    display: inline-block;
    margin-bottom: 15px;
    font-weight: bold;
    font-size: 12px;
    line-height: 20px;
    letter-spacing: 0.5px;
}

.field__description {
    padding: 15px 13px 14px 16px;
    font-weight: normal;
    font-size: 12px;
    line-height: 20px;
    letter-spacing: 0.5px;
    border-radius: 6.31579px;
    border: solid 1px #e8e8e8;
}

.description__title {
    display: block;
    font-weight: 700;
}

.field__description img {
    display: block;
    max-width: 100%;
    margin: 0 auto;
}

.field__description img:not(:first-child) {
    margin-top: 30px;
}

.field__description ol {
    list-style: none;
    counter-reset: li;
    margin-top: 8px;
}

.field__description ol li:not(:last-child) {
    margin-bottom: 8px;
}

.field__description > ol {
    padding: 0;
    margin: 0;
}

.field__description > ol ol {
    padding-left: 32px;
}

 

Note: to reference a static file from CSS, for example background-image: url("{% static 'images/check.svg' %}");, all the styles have to be placed in the template inside a <style> tag.

 
Final template
<!DOCTYPE html>
<html lang="ru-RU">
{% load static %}
<head>
  <meta charset="utf-8">
  <style>
    @import url('https://fonts.googleapis.com/css2?family=Montserrat:wght@400;500;600;700&display=swap');

    @page {
      size: A4;
      margin: 10mm 0;
      padding: 0;
    }

    @page :first {
      margin: 0 0 10mm 0;
    }

    html, body {
      margin: 0;
      font-family: 'Visuelt Pro';
    }

    body {
      width: 790px;
      margin: 0 auto;
      position: relative;
      font-family: 'Montserrat', sans-serif;
    }

    p, h1, h2, h3, ul, li {
      margin: 0;
      padding: 0;
    }

    .main {
      margin: 0 auto;
      width: 80.6%;
    }

    .header__logo {
      position: absolute;
      right: 330px;
      top: 91px;
      z-index: 1;
    }

    .title {
      border-bottom: 1px solid #e8e8e8;
      width: 100%;
      padding: 33px 0 32px;
    }

    .title h1 {
      margin-bottom: 16px;
      font-size: 20px;
      line-height: 30px;
      font-weight: 600;
      text-align: center;
    }

    .title h2 {
      font-size: 14px;
      line-height: 20px;
      font-weight: normal;
      letter-spacing: 0.5px;
      text-align: center;
    }

    .section .field:first-of-type {
      margin-top: 30px;
    }

    .section h1 {
      margin-top: 32px;
      color: #d73f30;
      font-weight: 600;
      font-size: 16px;
      line-height: 25px;
      text-align: center;
    }

    .section h2 {
      margin-top: 10px;
      font-weight: 600;
      font-size: 16px;
      line-height: 25px;
      text-align: center;
    }

    .section h3 {
      margin: 10px 0 20px;
      font-weight: 500;
      font-size: 14px;
      line-height: 20px;
      letter-spacing: 0.5px;
      text-align: center;
    }

    .field {
      margin-bottom: 20px;
    }

    .field__title {
      display: inline-block;
      margin-bottom: 15px;
      font-weight: bold;
      font-size: 12px;
      line-height: 20px;
      letter-spacing: 0.5px;
    }

    .field__description {
      padding: 15px 13px 14px 16px;
      font-weight: normal;
      font-size: 12px;
      line-height: 20px;
      letter-spacing: 0.5px;
      border-radius: 6.31579px;
      border: solid 1px #e8e8e8;
    }

    .description__title {
      display: block;
      font-weight: 700;
    }

    .field__description img {
      display: block;
      max-width: 100%;
      margin: 0 auto;
    }

    .field__description img:not(:first-child) {
      margin-top: 30px;
    }

    .field__description ol {
      list-style: none;
      counter-reset: li;
      margin-top: 8px;
    }

    .field__description ol li:not(:last-child) {
      margin-bottom: 8px;
    }

    .field__description > ol {
      padding: 0;
      margin: 0;
    }

    .field__description > ol ol {
      padding-left: 32px;
    }
  </style>
  <link rel="preconnect" href="https://fonts.gstatic.com">
  <link href="https://fonts.googleapis.com/css2?family=Montserrat:wght@300;400;500;600;700&display=swap" rel="stylesheet"> 
</head>
<body>
{% autoescape off %} 
  <div class="main">
    <div class="title">
      <h1>Техническое задание</h1>
      <h2>На разработку мобильного приложения</h2>
    </div>
    <div class="section">
      <h1>Шаг 1 / 7</h1>
      <h2>1. Общие положения о проекте:</h2>
      <h3>1.1 Предмет разработки</h3>
      <div class="field">
        <span class="field__title">Предметом разработки является мобильное приложение:</span>
        <p class="field__description">{{ app_name }}</p>
      </div>
      <div class="field">
        <span class="field__title">Он включает в себя следующий комплекс работ: </span>
        <p class="field__description">
          {{ complex_works|linebreaksbr }}
        </p>
      </div>

      <h3>1.2 Функциональное назначение продукта</h3>
      <div class="field">
        <span class="field__title">Продукт / сервис предназначен для:</span>
        <p class="field__description">
          {{ mission|linebreaksbr }}
        </p>
      </div>
      <div class="field">
        <span class="field__title">Основная цель продукта:</span>
        <p class="field__description">{{ target|linebreaksbr }}</p>
      </div>
      <div class="field">
        <span class="field__title">Он решает следующие проблемы пользователей:</span>
        <p class="field__description">
          {{ solve_problems|linebreaksbr }}
        </p>
      </div>
    </div>
  </div>
{% endautoescape %}
</body>
</html>

 

Example for wkhtmltopdf

 
Dockerfile
FROM python:3.6-alpine3.11

RUN mkdir -p /app/src/

RUN apk add --update --no-cache \
     # Common
     gcc libffi-dev musl-dev \
     # Pillow
     jpeg-dev zlib-dev \
     # Postgres
     postgresql-dev gdal-dev geos-dev \
     # Wkhtmltopdf
     wkhtmltopdf \
     xvfb

# Install wkhtmltopdf dependencies
RUN apk add --update ca-certificates openssl && update-ca-certificates

COPY Pipfile* /app/
RUN cd /app/ && \
    pip install pipenv && \
    pipenv install --system --deploy --ignore-pipfile

COPY src /app/src/
WORKDIR /app/src/

CMD ["gunicorn", "-w", "3", "--bind", ":8000", "config.wsgi:application"]

 

Note: my attempts to install and use wkhtmltopdf on alpine:3.12 failed with the error: ERROR: The command 'wkhtmltopdf index.html out.pdf' returned a non-zero code: 139. Exit code 139 usually means the process crashed with a segmentation fault.

In my experience, wkhtmltopdf does not work on every version of alpine. These versions gave me no problems: 3.9.4, 3.10.4 and 3.11.

 
Pipfile
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]

[packages]
django = "~=2.2"
pdfkit = "~=0.6"

[requires]
python_version = "3.6"

 

 
Code to generate PDF
import pdfkit
from django.template.loader import render_to_string


def generate_pdf(request, data):
    # Fill the Django template with the form data to get the final HTML
    html_template = render_to_string('portable_document/technical_specification_template.html', data)
    # Convert the HTML string to PDF with wkhtmltopdf and save it as out.pdf
    pdfkit.from_string(html_template, 'out.pdf')

 

Result

Result for wkhtmltopdf

As mentioned above, you cannot interact with the text in this PDF file. Every word behaves like an image that you can stretch and rotate.

Outcome

  1. Pick a version of alpine on which wkhtmltopdf works.
  2. Use whichever wrapper you like.
  3. Use any styles.
  4. Get a PDF whose pages are images.

Example for WeasyPrint

Documentation and source

 
Dockerfile
FROM python:3.6-alpine3.11

RUN mkdir -p /app/src/

RUN apk add --update --no-cache \
    # Common
    gcc libffi-dev musl-dev \
    # Pillow
    jpeg-dev zlib-dev \
    # Postgres
    postgresql-dev gdal-dev geos-dev

RUN apk --update --upgrade add gcc musl-dev jpeg-dev zlib-dev libffi-dev cairo-dev pango-dev gdk-pixbuf-dev msttcorefonts-installer fontconfig
RUN update-ms-fonts


COPY Pipfile* /app/
RUN cd /app/ && \
    pip install pipenv && \
    pipenv install --system --deploy --ignore-pipfile

COPY src /app/src/
WORKDIR /app/src/

CMD ["gunicorn", "-w", "3", "--bind", ":8000", "config.wsgi:application"]

 

Note: first, we install all the dependencies this library requires. Then, since we are using the lightweight alpine image, we need to install the default fonts:

RUN apk --update --upgrade add gcc musl-dev jpeg-dev zlib-dev libffi-dev cairo-dev pango-dev gdk-pixbuf-dev msttcorefonts-installer fontconfig
RUN update-ms-fonts

Otherwise, we would get the following picture:

 
Pipfile
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]

[packages]
django = "~=2.2"
weasyprint = "~=52.1"

[requires]
python_version = "3.6"

 

 
Code to generate PDF
import io

from django.template.loader import render_to_string
from weasyprint import HTML
from weasyprint.fonts import FontConfiguration


def generate_pdf(request, data):
    font_config = FontConfiguration()

    # Fill the Django template with the form data to get the final HTML
    html_template = render_to_string('portable_document/technical_specification_template.html', data)

    # base_url lets WeasyPrint resolve relative URLs found in the HTML
    html = HTML(string=html_template, base_url=request.build_absolute_uri())

    # With no target given, write_pdf() returns the PDF as bytes
    pdf = html.write_pdf(font_config=font_config, presentational_hints=True)
    # Use this call instead to simply save the PDF file locally
    # html.write_pdf('out.pdf', font_config=font_config, presentational_hints=True)
    pdf_in_memory = io.BytesIO(pdf)

    return pdf_in_memory

 

Result

Result for WeasyPrint

This is what the generated PDF looks like, and here we can interact with the text. To the left of the document, there is a section break that was generated automatically.

Note

As I said before, avoid Flexbox when using WeasyPrint: it hurts performance, as the performance comparison below demonstrates. More details.

Also, WeasyPrint does not currently support rendering of shadows. More details here.

Total

  1. Install standard fonts if you use alpine.
  2. Do not use Flexbox or shadows in the layout; look for alternatives instead:
    1. 'display: flex' -> 'display: inline-block';
    2. 'flex-direction: column' -> 'display: block';
  3. As a result, we get generated PDF documents in which you can interact with the text.
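
Before rendering with WeasyPrint, it can be handy to check a stylesheet for the properties mentioned above. The sketch below is a hypothetical helper (find_slow_css is not part of WeasyPrint) built on a simple regular expression, so it may miss unusual formatting or flag matches inside comments and selectors.

```python
import re

# Declarations this article recommends avoiding with WeasyPrint:
# display: flex / inline-flex, any flex-* property, and box-shadow.
_PROBLEM_PATTERN = re.compile(
    r"(display\s*:\s*(?:inline-)?flex\b|\bflex(?:-[a-z]+)?\s*:|\bbox-shadow\s*:)",
    re.IGNORECASE,
)


def find_slow_css(css_text: str) -> list:
    """Return (line_number, line) pairs for declarations worth replacing."""
    findings = []
    for number, line in enumerate(css_text.splitlines(), start=1):
        if _PROBLEM_PATTERN.search(line):
            findings.append((number, line.strip()))
    return findings


# Hypothetical stylesheet used only to demonstrate the check.
sample_css = """
.row { display: flex; }
.col { flex-direction: column; }
.card { box-shadow: 0 0 4px #000; }
.field { display: inline-block; }
"""
findings = find_slow_css(sample_css)
for number, line in findings:
    print(f"line {number}: {line}")
```

A check like this only points at candidate lines; the replacements themselves still have to be chosen by hand, as in the list above.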

Performance comparison

I do not claim that these values are a benchmark, since execution time was measured as follows:

  1. start_time = time.clock()
  2. generate_pdf()
  3. print(time.clock() - start_time, "seconds")
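
Note that time.clock() was deprecated in Python 3.3 and removed in Python 3.8, so on newer interpreters the measurement above needs a different timer. Here is a minimal sketch using time.perf_counter(); time_call is a hypothetical helper, and the lambda is only a stand-in workload for generate_pdf.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start_time = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start_time
    return result, elapsed


# Stand-in workload; replace it with a real call such as generate_pdf(request, data).
result, elapsed = time_call(lambda: sum(range(100_000)))
print(f"{elapsed:.3f} seconds")
```

Unlike time.clock() on Unix, which reported processor time, time.perf_counter() measures wall-clock time, so waiting for subprocesses and network requests is included in the result.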

Using the same template (shown in the "Layout" section), with resources loaded from the network rather than from static files, five consecutive runs gave the following results:

Wkhtmltopdf    WeasyPrint    WeasyPrint (if you replace all inline with flexbox)
0.241 sec      0.743 sec     3.223 sec
0.138 sec      0.522 sec     3.294 sec
0.104 sec      0.421 sec     3.148 sec
0.106 sec      0.463 sec     3.109 sec
0.104 sec      0.437 sec     3.195 sec
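
To compare the columns at a glance, the five runs can be averaged. The snippet below only does arithmetic on the measurements from the table above; the dictionary keys are labels chosen for illustration.

```python
from statistics import mean

# Measurements in seconds, copied from the table above.
runs = {
    "wkhtmltopdf": [0.241, 0.138, 0.104, 0.106, 0.104],
    "WeasyPrint": [0.743, 0.522, 0.421, 0.463, 0.437],
    "WeasyPrint + flexbox": [3.223, 3.294, 3.148, 3.109, 3.195],
}

averages = {tool: mean(times) for tool, times in runs.items()}
for tool, average in averages.items():
    print(f"{tool}: {average:.3f} sec")
```

On these numbers wkhtmltopdf averages about 0.14 sec, WeasyPrint about 0.52 sec, and WeasyPrint with flexbox about 3.19 sec, so the flexbox layout is roughly six times slower than the inline-block one in WeasyPrint.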

Conclusion

The conclusion is clear: the PDF generation tool should be chosen for the specific task.

If generation speed and the ability to use modern CSS features (for example, Flexbox and box-shadow) are the priority, wkhtmltopdf will do. But the text in the resulting document cannot be selected or copied.

If you want the text to stay selectable after the file is generated, use WeasyPrint. In that case, you will have to spend more time looking for alternatives to Flexbox and box-shadow.
