Counting the number of PDF pages

Do you need to count the total number of pages in your pdf files?

Look no further, as PyPDF2 is here to help you!

Here is the documentation: https://pythonhosted.org/PyPDF2/

To count the number of pages in a PDF file, you need only four lines of code.

You can install PyPDF2 from PyPI with pip:

py -m pip install PyPDF2

The code

Ok ok enough installing, what do we need to do to count the pages?

First, we want to import the PdfFileReader class from PyPDF2

from PyPDF2 import PdfFileReader

After that, we need to open our PDF file in binary reading mode.

with open("your_pdf_file.pdf", "rb") as pdf_file:

We then want to instantiate our PdfFileReader object inside the with block.

    pdf_reader = PdfFileReader(pdf_file)

Still inside the with block, we get the number of pages with the numPages property.

    print(f"The total number of pages in the pdf document is {pdf_reader.numPages}")

That's it! We have now counted the number of pages in a PDF file with Python!

The complete code:

#!/usr/bin/env python3

"""
Extracting number of pages in the document

getNumPages()
Calculates the number of pages in this PDF file.

Returns:    number of pages
Return type:    int
Raises PdfReadError:
    if file is encrypted and restrictions prevent this action.
    
numPages
Read-only property that accesses the getNumPages() function.
"""

from PyPDF2 import PdfFileReader

# Load the pdf to the PdfFileReader object with default settings
with open("your_pdf_file.pdf", "rb") as pdf_file:
    pdf_reader = PdfFileReader(pdf_file)
    print(f"The total number of pages in the pdf document is {pdf_reader.numPages}")

Splitting Pdfs with Python

In this post, we are going to look at how to split every page of a single pdf into its own one-page pdf file. Splitting a pdf into a few parts is easy with almost any pdf tool worth its salt. However, splitting it into single pages is usually a manual operation, and if you have to do it on several pdfs, an automated tool makes sense. This is where PyPDF2 comes in handy. If you just want the complete code without all the fancy explanations, you can find it at the end.

Preparations

If you haven't done so already, fire up your command prompt, PowerShell or terminal and install PyPDF2 with pip. 

pip install pypdf2

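Currently I am running 32-bit Python 3.8 with PyPDF2 version 1.26.0 on Windows 10. The code works on this setup and should also work on other operating systems. Newer PyPDF2 releases renamed the classes used here (PdfFileReader became PdfReader, PdfFileWriter became PdfWriter), so the code in this post is written for the 1.x API.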

Code line by line

Imports

We start by importing PdfFileWriter and PdfFileReader so that we can read the existing pdf and later write new pdfs. We also need to import os so that we can check what files we have in our working directory.

from PyPDF2 import PdfFileWriter, PdfFileReader
import os

Getting the pdf files to split

First we use a list comprehension over os.listdir(".") that keeps only the entries for which os.path.isfile(f) is true, which leaves out folders. After that we keep only the pdf files in the list files with files = list(filter(lambda f: f.lower().endswith((".pdf")), files)). Lowercasing the name first means that files ending in .PDF are included too.

files = [f for f in os.listdir(".") if os.path.isfile(f)]
files = list(filter(lambda f: f.lower().endswith((".pdf")), files))
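The same kind of filtering can also be written with pathlib from the standard library. The sketch below is only one possible alternative, not the approach used in this post. The function name find_pdfs and its directory parameter are made up for illustration.

```python
from pathlib import Path


def find_pdfs(directory="."):
    """Return the sorted names of all regular files in `directory` ending in .pdf.

    Hypothetical helper. The suffix check is case-insensitive, similar to
    the f.lower().endswith(".pdf") filter used in the post.
    """
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        if entry.is_file() and entry.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    # List the pdf files in the current working directory.
    for name in find_pdfs("."):
        print(name)
```

Because it takes the folder as a parameter, the script would no longer have to be run from the folder that holds the pdf files.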

Splitting and creating new pdf

Now it is time to process all our pdf files. We go through each pdf in files with a for loop, for pdf in files:. We then open the pdf using with open(pdf, "rb") as f: and load it into a PdfFileReader object with inputpdf = PdfFileReader(f).

Now it is time to start the splitting. With another for loop, for i in range(inputpdf.numPages):, we loop through all pages in the pdf. For each page we create a new PdfFileWriter object named output and add the current page with output.addPage(inputpdf.getPage(i)). We name the output pdf after the original: we strip the .pdf extension, then add -Page and the page index, name = pdf[:-4]+"-Page "+str(i)+".pdf". Since i starts at 0, the first page is saved as -Page 0. Finally, we save the output.

with open(name, "wb") as outputStream:
    output.write(outputStream)
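If you would rather have the file names start at page 1, and not depend on the extension being exactly four characters long, the naming step could be pulled into a small helper. This is a minimal sketch of an optional variation, not the naming used in the complete code below. The function name page_filename is hypothetical.

```python
import os


def page_filename(pdf_name, page_index):
    """Build the output name for one page of `pdf_name`.

    Hypothetical helper. `page_index` is the zero-based index passed to
    getPage(i); the name uses page_index + 1 so numbering starts at 1.
    """
    # splitext separates the extension whatever its length or case.
    base, _extension = os.path.splitext(pdf_name)
    return f"{base}-Page {page_index + 1}.pdf"


if __name__ == "__main__":
    print(page_filename("report.pdf", 0))  # report-Page 1.pdf
```

Inside the loop, name = page_filename(pdf, i) would then take the place of the string concatenation.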

Complete code

from PyPDF2 import PdfFileWriter, PdfFileReader
import os

files = [f for f in os.listdir(".") if os.path.isfile(f)]
files = list(filter(lambda f: f.lower().endswith((".pdf")), files))

for pdf in files:
    with open(pdf, "rb") as f:
        inputpdf = PdfFileReader(f)

        for i in range(inputpdf.numPages):
            output = PdfFileWriter()
            output.addPage(inputpdf.getPage(i))
            name = pdf[:-4]+"-Page "+str(i)+".pdf"
            with open(name, "wb") as outputStream:
                output.write(outputStream)
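One thing to keep in mind: the one-page files are written to the same folder that the script scans, so a second run would pick them up as input too. Here is a minimal sketch of one way to skip them, assuming the -Page naming used above. The names PAGE_SUFFIX and is_split_output are hypothetical and not part of the original script.

```python
import re

# Matches names produced by the script, e.g. "report-Page 3.pdf".
PAGE_SUFFIX = re.compile(r"-Page \d+\.pdf$", re.IGNORECASE)


def is_split_output(filename):
    """Return True if `filename` looks like a page file written by an earlier run."""
    return PAGE_SUFFIX.search(filename) is not None


if __name__ == "__main__":
    files = ["report.pdf", "report-Page 0.pdf", "report-Page 1.pdf"]
    # Keep only the files that still need splitting.
    to_split = [f for f in files if not is_split_output(f)]
    print(to_split)
```

Applying this filter to files before the for loop would leave only the original pdfs to process.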
