GitHub repository: pdfhandy

This is the first article in which I finally found the courage to share some code from my first test automation project. I started learning Python and Selenium in November 2019. Since then I have managed to write a few tests but, most importantly, I think I’ve built a foundation for scaling and improving them. Probably the most satisfying part was watching the steady growth of the short snippets I wrote to understand new concepts and get my feet wet as I learned Python, pytest, Selenium and other areas.

The code below solves a problem that I tried to Google, but I found only a few answers on Stack Overflow. I used them as a starting point, and after one to two weeks of hard work and several “a-ha” moments I finally got it to work: my own PDF generator, which I can use to create “unique” PDF files.

The Goal

I got to a point where I wanted to test the web app’s file upload process. However, the app does not allow identical files to be uploaded. I used to handle this manually: I googled for PDF files, downloaded as many as I could and used them in manual tests. Sometimes I also opened a file by hand and added some short text so that it was treated as a new one. That saved some time but was still quite time-consuming.

That’s why I decided to write a small script that would make my life easier. How could I automate this process so that a simple PDF generator function creates the files for my tests?

With this goal in mind I drafted a list of requirements before getting to work:

  • can be run as a fixture before the test starts
  • multiple PDF files can be generated and stored in a dedicated directory
  • the filename follows the pattern “Test_pdf_0318_1.pdf”, where “0318” is the test number and “1” is the PDF number within the current test
  • files are automatically deleted after the test
  • files are automatically archived after the test has finished
  • a base PDF file is used to generate new PDF files; the base file can be replaced
  • a PDF object (an instance of a custom PDF class) is returned. It contains information that can be used later in tests to upload the file and in assertions: file_time (when the PDF was generated), file_date, file_size, file_path, etc.
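
The naming rule from the list above can be sketched as a small helper. The function name build_pdf_name and the zero-padding to four digits are assumptions made for illustration; the code in the repository may build the name differently.

```python
def build_pdf_name(test_num: int, pdf_num: int, prefix: str = "Test_pdf") -> str:
    """Return a file name such as 'Test_pdf_0318_1.pdf'.

    test_num is the number of the current test, pdf_num is the number
    of the PDF within that test. Hypothetical helper, not the repo's code.
    """
    # :04d pads the test number with zeros so that names sort in test order
    return f"{prefix}_{test_num:04d}_{pdf_num}.pdf"


first_name = build_pdf_name(318, 1)
print(first_name)
```

Zero-padding keeps the names sortable, which is handy if older files are later archived or deleted by name.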

The structure

I know that my implementation is probably far from good coding practice. However, “practice makes perfect”. That’s why I decided to share it even if it’s ugly and missing some things. At least it does the job I needed.

Here is the structure of the function that I called “pdf_factory”:

  • clean up
  • generate a one-page PDF file containing the PDF name, time/date and the calling test
  • merge the generated one-page PDF with the base PDF file
  • create a PDF instance using the PDF class
  • return a single PDF object or a list of PDF objects

Clean up

Clean up simply deletes the PDF files generated for previous tests before a new test starts. Ideally the cleanup would be done after the test. The “yield” statement would suit this purpose, as it lets part of a fixture run before the test and part after it. However, it didn’t work for me. If I am correct, that’s because in the pdf_factory fixture I used the pattern in which the function returns a reference to an inner function (I used it so that the fixture can take arguments).
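
For what it’s worth, pytest’s documentation describes a “factories as fixtures” pattern in which the fixture yields the inner function and runs teardown code afterwards, so the two ideas can be combined. Below is a minimal sketch of that idea, written as a plain generator so that it runs without pytest; in conftest.py the function would carry the @pytest.fixture decorator. The names pdf_factory_sketch and _make and the placeholder file contents are hypothetical and not taken from the repository.

```python
import tempfile
from pathlib import Path


# In conftest.py this function would be decorated with @pytest.fixture.
def pdf_factory_sketch(folder: Path):
    created = []  # paths produced during the current test

    def _make(name: str, count: int = 1):
        # Stand-in for the real generation step: write placeholder files.
        for k in range(1, count + 1):
            path = folder / f"{name}_{k}.pdf"
            path.write_bytes(b"%PDF-1.4 placeholder")
            created.append(path)
        return created[-count:]

    yield _make  # setup ends here; the test receives the inner function

    # Teardown: runs after the test, even though the fixture is a factory.
    for path in created:
        path.unlink(missing_ok=True)  # missing_ok needs Python 3.8+


# Drive the generator by hand to mimic what pytest does around a test.
with tempfile.TemporaryDirectory() as tmp:
    gen = pdf_factory_sketch(Path(tmp))
    make = next(gen)                                  # setup phase
    files = make("Test_pdf_0319", count=2)            # "test body"
    existed_during_test = all(p.exists() for p in files)
    next(gen, None)                                   # teardown phase
    exist_after_test = any(p.exists() for p in files)
```

The point of the sketch is that the list of created files lives in the fixture’s scope, so the code after “yield” knows exactly what to delete once the test is done.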

Generate one-page pdf

Once the cleanup is finished, a one-page PDF is generated. Its main purposes are to make each final file “unique” and to display useful information so that you can easily identify the file in your tests. I used “reportlab” for PDF generation. It has lots of methods, and it was quite easy to find examples and documentation.

Here is the screenshot:



Merge two PDF files

Now we just need to merge the base PDF file with the generated one-page PDF that carries the info. This time I used the very popular PyPDF2 library. Here is the screenshot of my base PDF file, which has 16 pages:


Merge two pdf files using PyPDF2


Create PDF object

Now it’s time to create a PDF object to store some useful information:

class PDF:
    file_path = None
    file_name = None
    file_dt = None
    file_date = None
    file_time = None
    file_num = None
    file_pages = None
    file_size = None
    file_tzoffset = None
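
To show how such an object might be filled in, here is a minimal sketch using only the standard library. It rewrites the class as a dataclass and adds a hypothetical helper, describe_pdf, that is not part of the repository. Reporting the size in bytes and the time zone offset in hours are assumptions as well.

```python
import os
import tempfile
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PDF:
    file_path: Optional[str] = None
    file_name: Optional[str] = None
    file_dt: Optional[datetime] = None
    file_date: Optional[str] = None
    file_time: Optional[str] = None
    file_num: Optional[str] = None
    file_pages: Optional[int] = None
    file_size: Optional[int] = None
    file_tzoffset: Optional[float] = None


def describe_pdf(path: str) -> PDF:
    """Build a PDF record for an existing file (hypothetical helper)."""
    now = datetime.now().astimezone()  # local time, time zone aware
    return PDF(
        file_path=os.path.abspath(path),
        file_name=os.path.basename(path),
        file_dt=now,
        file_date=now.date().isoformat(),
        file_time=now.time().isoformat(),
        file_size=os.path.getsize(path),                        # bytes (assumption)
        file_tzoffset=now.utcoffset().total_seconds() / 3600,   # hours (assumption)
    )


# Usage with a placeholder file instead of a real PDF
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "Test_pdf_0319_1.pdf")
    with open(sample, "wb") as fh:
        fh.write(b"%PDF-1.4 placeholder")
    pdf = describe_pdf(sample)
```

A dataclass gives every instance its own values and a readable repr for free, which makes log output and failed assertions easier to read.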

Return pdf or list of pdf objects

By default “count=1”. This fixture parameter defines the number of PDF files generated for the current test. If count > 1, the fixture returns a list of PDF objects; otherwise it returns a single PDF object. Here is the body of the fixture.

_pdf_factory body
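        pdf_list = []
        pdf = PDF()

        # Remove or archive files left over from previous tests
        cleanup(folder, fname_template, archive_num)
        for k in range(1, count+1):
            # Generate the one-page info PDF, then merge it with the base file(s)
            pdf = get_testid_pdf(node, folder, testid_filename, fname_template, k)
            pdf.file_path, pdf.file_size = write_merged_pdf(base_files, folder, testid_filename, fname_template, k)
            pdf_list.append(pdf)  # collect each object so that count > 1 returns all of them
        return pdf if count == 1 else pdf_list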

Below is an example of a test that uses the pdf_factory fixture. Along with “pdf_factory”, a few other fixtures are included in the test function’s arguments because pdf_factory uses them.

import logging


def test_pdf_factory_multiple(request, current_test_num, pdf_factory):
    pdfs = pdf_factory(request.node.nodeid, current_test_num, count=3)
    for k, pdf_obj in enumerate(pdfs):
        logging.info(f'ITERATION: {k}')
        logging.info(f'file_date: {pdf_obj.file_date}')
        logging.info(f'file_path: {pdf_obj.file_path}')
        logging.info(f'file_size: {pdf_obj.file_size}')
        logging.info(f'file_name: {pdf_obj.file_name}')


Here is the output:

-------------------------------- live log call ---------------------------------
19:54:35 INFO pdf.file_num: 0319_1
19:54:35 INFO pdf.file_name: Test_pdf_0319_1.pdf
19:54:35 INFO pdf.file_date: 2020-04-05
19:54:35 INFO pdf.file_time: 19:54:35.308970
19:54:35 INFO pdf.file_tzoffset: 11.0
19:54:35 INFO pdf.file_num: 0319_2
19:54:35 INFO pdf.file_name: Test_pdf_0319_2.pdf
19:54:35 INFO pdf.file_date: 2020-04-05
19:54:35 INFO pdf.file_time: 19:54:35.379654
19:54:35 INFO pdf.file_tzoffset: 11.0
19:54:35 INFO pdf.file_num: 0319_3
19:54:35 INFO pdf.file_name: Test_pdf_0319_3.pdf
19:54:35 INFO pdf.file_date: 2020-04-05
19:54:35 INFO pdf.file_time: 19:54:35.432892
19:54:35 INFO pdf.file_tzoffset: 11.0
19:54:35 INFO ITERATION: 0
19:54:35 INFO file_date: 2020-04-05
19:54:35 INFO file_path: /Users/maksim/repos/p4-python-aerofiler/data/Test_pdf_0319_1.pdf
19:54:35 INFO file_size: 262
19:54:35 INFO file_name: Test_pdf_0319_1.pdf
19:54:35 INFO ITERATION: 1
19:54:35 INFO file_date: 2020-04-05
19:54:35 INFO file_path: /Users/maksim/repos/p4-python-aerofiler/data/Test_pdf_0319_2.pdf
19:54:35 INFO file_size: 262
19:54:35 INFO file_name: Test_pdf_0319_2.pdf
19:54:35 INFO ITERATION: 2
19:54:35 INFO file_date: 2020-04-05
19:54:35 INFO file_path: /Users/maksim/repos/p4-python-aerofiler/data/Test_pdf_0319_3.pdf
19:54:35 INFO file_size: 262
19:54:35 INFO file_name: Test_pdf_0319_3.pdf


  • node – custom fixture that returns the name of the test, so that you can always see which test generated a particular PDF file (e.g. “tests/dashboard/test_pdf.py::test_act_table”)
  • current_test_num – custom fixture that returns the current test number. It uses pytest’s built-in “cache” fixture to store the previous test number
  • count – number of PDF files generated for the current test. The default value is “1”
  • folder – folder used to store the PDF files (the “data” folder in my case)
  • testid_filename – default name for the one-page PDF I described above
  • base_files – file names that point to the PDF files to merge with the one-page PDF. It can be one file or multiple files
  • fname_template – pattern used for the names of the generated PDF files
  • archive_num – controls how many files are archived. This can be handy when you do not want to delete all files generated in previous tests
def _pdf_factory(node, current_test_num,
                      folder='data',        # Default folder: project_dir/data

Files and project folder structure

Here is what my project folder looks like. The “data” folder serves as the place to store generated PDF files.

├── README.md
├── __pycache__
├── __requirements\ 2.txt
├── conftest.py
├── data
├── pages
├── pytest.ini
├── requirements.txt
├── snippets
├── tests
├── utils
└── venv

Contents of “data” folder

▶ tree -L 1
├── Test_pdf_0319_1.pdf
├── Test_pdf_0319_2.pdf
├── Test_pdf_0319_3.pdf
├── _Test_pdf_0317_1.pdf
├── _Test_pdf_0318_1.pdf
├── contract_template.pdf
└── test_id.pdf
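
The underscore-prefixed names in this listing appear to be archived files from earlier tests. Here is a minimal sketch of a cleanup step that would produce such a folder. It assumes that “archived” means “renamed with a leading underscore” and that archive_num is the number of archived files to keep. The name cleanup_sketch and its signature are hypothetical; the real cleanup function in the repository may behave differently.

```python
import tempfile
from pathlib import Path


def cleanup_sketch(folder: Path, pattern: str = "Test_pdf_*.pdf", archive_num: int = 2):
    """Archive generated files by renaming them, then drop the oldest archives."""
    # Archive everything the previous test generated.
    # list() takes a snapshot so that renaming does not disturb the iteration.
    for path in list(folder.glob(pattern)):
        path.rename(path.with_name("_" + path.name))

    # Keep only the newest archive_num archives. The zero-padded test number
    # in the name means that sorting by name also sorts by test order.
    archives = sorted(folder.glob("_" + pattern), reverse=True)
    for stale in archives[archive_num:]:
        stale.unlink()


# Usage on a throwaway folder with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)
    for name in ("_Test_pdf_0316_1.pdf", "_Test_pdf_0317_1.pdf",
                 "Test_pdf_0318_1.pdf", "contract_template.pdf"):
        (data / name).touch()
    cleanup_sketch(data, archive_num=2)
    remaining = sorted(p.name for p in data.iterdir())

print(remaining)
```

Because the glob pattern has to match the whole file name, the base file contract_template.pdf is never touched.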



