13.12.2019

Splitting a pdf to single pages with PyPDF2

Splitting Pdfs with Python

In this post, we look at how to split every page of a pdf into its own one-page pdf file. Almost any pdf tool worth its salt can split a document, but splitting it into single pages is usually a manual operation, and if you have to do it for several pdfs, an automated tool makes sense. This is where PyPDF2 comes in handy. If you just want the complete code without all the fancy explanations, you can find it at the end.

Preparations

If you haven't done so already, fire up your command prompt, PowerShell or terminal and install PyPDF2 with pip. 

pip install pypdf2

Currently I am running 32-bit Python 3.8 with PyPDF2 version 1.26.0 on Windows 10. The code works on this setup and should also work on other operating systems.

Code line by line

Imports

We start by importing PdfFileWriter and PdfFileReader so that we can read the existing pdf and later write new pdfs. We also need to import os so that we can check which files we have in our working directory.

from PyPDF2 import PdfFileWriter, PdfFileReader
import os

Getting the pdf files to split

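First we use a list comprehension over os.listdir(".") that keeps only the entries for which os.path.isfile(f) is true, that is, regular files rather than directories. After that we keep only the pdf files in the list files with files = list(filter(lambda f: f.lower().endswith((".pdf")), files)). Calling lower() first means that names ending in .PDF are matched as well.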

files = [f for f in os.listdir(".") if os.path.isfile(f)]
files = list(filter(lambda f: f.lower().endswith((".pdf")), files))
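
The same selection can also be written with pathlib from the standard library. This is only a minimal sketch and not the code used in the rest of this post; the function name find_pdfs and its directory parameter are made up for illustration.

```python
from pathlib import Path


def find_pdfs(directory="."):
    """Return the names of all pdf files directly inside directory, sorted.

    find_pdfs is a hypothetical helper, not part of PyPDF2.
    """
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        # Keep regular files only and compare the extension case-insensitively.
        if entry.is_file() and entry.suffix.lower() == ".pdf"
    )
```

Calling find_pdfs() without arguments looks in the working directory, roughly like the two lines above. Sorting is an extra here, since os.listdir makes no promise about the order of its result.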

Splitting and creating new pdf

Now it is time to process all our pdf files. We go through each pdf in files with a for loop, for pdf in files:. We then open the pdf in binary read mode using with open(pdf, "rb") as f: and load it into a PdfFileReader object with inputpdf = PdfFileReader(f).

Now it is time to start the splitting. With another for loop, we go through all pages in the pdf; the number of pages is available as inputpdf.numPages. For each page we create a new PdfFileWriter object named output and add the current page to it with output.addPage(inputpdf.getPage(i)). We name the output pdf after the original file, followed by -Page and the page number: name = pdf[:-4]+"-Page "+str(i)+".pdf". Note that i starts at 0, so the first page ends up in a file ending in -Page 0.pdf. Finally, we save the output.

with open(name, "wb") as outputStream:
    output.write(outputStream)
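
The slice pdf[:-4] assumes that the extension is exactly four characters long, and str(i) starts counting at 0. As a small variation, and assuming you would rather have page numbers that start at 1, the name could be built along these lines. The helper page_filename is hypothetical and only meant as a sketch.

```python
import os


def page_filename(pdf_name, page_index):
    """Build the output name for one page of pdf_name.

    page_filename is a hypothetical helper; page_index is zero-based,
    like i in the loop, and is shifted by one for the file name.
    """
    # splitext removes the extension whatever its length or case.
    base, _ext = os.path.splitext(pdf_name)
    return f"{base}-Page {page_index + 1}.pdf"
```

Inside the loop this would replace the name = ... line with name = page_filename(pdf, i).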

Complete code

from PyPDF2 import PdfFileWriter, PdfFileReader
import os

# Collect all regular files in the working directory, then keep only the pdfs.
files = [f for f in os.listdir(".") if os.path.isfile(f)]
files = list(filter(lambda f: f.lower().endswith((".pdf")), files))

for pdf in files:
    with open(pdf, "rb") as f:
        inputpdf = PdfFileReader(f)

        # Write each page to its own one-page pdf; i starts at 0.
        for i in range(inputpdf.numPages):
            output = PdfFileWriter()
            output.addPage(inputpdf.getPage(i))
            name = pdf[:-4]+"-Page "+str(i)+".pdf"
            with open(name, "wb") as outputStream:
                output.write(outputStream)
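
The names PdfFileReader, PdfFileWriter, numPages, getPage and addPage belong to the PyPDF2 1.x API used in this post. Later releases (PyPDF2 3.0 and its successor package pypdf) renamed them. The following sketch shows the same split with the newer names, assuming pypdf is installed with pip install pypdf. The function split_pdf and its out_dir parameter are hypothetical and not part of the library.

```python
import os


def split_pdf(pdf_path, out_dir="."):
    """Write every page of pdf_path to its own file in out_dir.

    split_pdf is a hypothetical helper. Returns the list of files written.
    """
    # Imported here so the sketch can be loaded even where pypdf is missing.
    from pypdf import PdfReader, PdfWriter

    base = os.path.splitext(os.path.basename(pdf_path))[0]
    written = []
    reader = PdfReader(pdf_path)
    for i, page in enumerate(reader.pages):
        writer = PdfWriter()
        writer.add_page(page)
        # Same naming scheme as the code above: zero-based page number.
        name = os.path.join(out_dir, f"{base}-Page {i}.pdf")
        with open(name, "wb") as output_stream:
            writer.write(output_stream)
        written.append(name)
    return written
```

Writing the pages to a separate out_dir also keeps a second run from splitting the one-page files again, which the script above would do because it writes into the same directory it scans.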

 
