Do you need to count the total number of pages in your pdf files?
Look no further, as PyPDF2 is here to help you!
Here is the documentation: https://pythonhosted.org/PyPDF2/
To count the number of pages in a PDF file, you need only four lines of code.
You can install PyPDF2 from PyPI with pip:
py -m pip install PyPDF2
Ok ok enough installing, what do we need to do to count the pages?
First, we want to import the PdfFileReader class from PyPDF2
from PyPDF2 import PdfFileReader
After that, we need to open our PDF file in binary reading mode.
with open("your_pdf_file.pdf", "rb") as pdf_file:
We then instantiate our PdfFileReader object:
    pdf_reader = PdfFileReader(pdf_file)
We then get the number of pages with the numPages property:
    print(f"The total number of pages in the pdf document is {pdf_reader.numPages}")
That's it! We have now counted the number of pages in a PDF file with Python!
#!/usr/bin/env python3
"""
Extracting number of pages in the document
getNumPages()
Calculates the number of pages in this PDF file.
Returns: number of pages
Return type: int
Raises PdfReadError:
if file is encrypted and restrictions prevent this action.
numPages
Read-only property that accesses the getNumPages() function.
"""
from PyPDF2 import PdfFileReader
# Load the pdf to the PdfFileReader object with default settings
with open("your_pdf_file.pdf", "rb") as pdf_file:
    pdf_reader = PdfFileReader(pdf_file)
    print(f"The total number of pages in the pdf document is {pdf_reader.numPages}")
Sometimes we need simple and basic tools to get the job done. At work, we have people who use pdf files daily and need to perform certain manual operations on them. One of these operations is rotating pages. Programming a pdf rotator can look like a big job at first, but is it really?
To build our tool, we need to be able to rotate the pdf in three ways: clockwise, counterclockwise and 180 degrees. For simplicity, we want to rotate all pdf files in our current working directory. The user should also be able to use the finished script on Windows without installing Python or any dependencies. Let us walk through the steps in creating our tool.
Rotating a pdf with PyPDF2 can be done with the PageObject class's method rotateClockwise. The method takes one int parameter, angle, which defines the rotation in degrees. Note that the angle has to be specified in increments of 90°. It is not possible to rotate a PDF page by, for example, 55°.
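As a minimal sketch of that constraint, the helper below checks an angle before it would be handed to rotateClockwise. The name normalize_rotation is hypothetical and not part of PyPDF2. It assumes angles are whole degrees and that equivalent rotations can be folded into a single clockwise turn.

```python
def normalize_rotation(angle: int) -> int:
    """Return the equivalent clockwise rotation in the range 0..270.

    Hypothetical helper (not part of PyPDF2): rejects angles that are
    not a multiple of 90 and folds the rest into one full turn.
    """
    if angle % 90 != 0:
        raise ValueError(f"Rotation must be a multiple of 90 degrees, got {angle}")
    # Python's % returns a non-negative result for a positive divisor,
    # so -90 (counterclockwise) becomes 270 (clockwise).
    return angle % 360


for requested in (90, -90, 180, 450):
    print(requested, "->", normalize_rotation(requested))
```

Folding -90 into 270 means a "left" rotation can be expressed as a clockwise one, so a single code path could handle all three options.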
Ok, we know how to rotate our page. Now we need to open our PDF file and initialize our PdfFileReader and PdfFileWriter objects. We can then loop through our pages by using the reader's numPages property. We get each page, rotate it, add it to our new PDF and then save the result to disk.
import PyPDF2

# Rotation in degrees clockwise; must be a multiple of 90
degrees = 90

with open("test.pdf", "rb") as pdf_file:
    pdf_reader = PyPDF2.PdfFileReader(pdf_file)
    pdf_writer = PyPDF2.PdfFileWriter()
    print("Rotating", degrees)
    for page_num in range(pdf_reader.numPages):
        pdf_page = pdf_reader.getPage(page_num)
        pdf_page.rotateClockwise(degrees)
        pdf_writer.addPage(pdf_page)
    # Write the rotated pages while the source file is still open
    with open("test_rotated.pdf", "wb") as pdf_file_rotated:
        pdf_writer.write(pdf_file_rotated)
A command line interface might work for many users, but believe me, a Graphical User Interface (GUI) beats a Command Line Interface (CLI) by light years for the average user. To easily create our interface, we use Tkinter. We need three Radiobuttons for specifying the rotation: Left, Right and 180 degrees. We also need some descriptive text to guide the user, as well as a Button to start the rotation. See the code comments for further descriptions.
import tkinter as tk

# Create our root widget, set title and size
master = tk.Tk()
master.title("PDF rotator")
master.geometry("400x100")

# Create an IntVar for getting our rotate values
master.degrees = tk.IntVar()

# Create a description label and a couple radiobuttons, add them to the widget
tk.Label(master, text="Rotates all pdf in the current folder the selected degrees.").grid(row=0,columnspan=4)
tk.Radiobutton(master, text="Right 90 degrees", variable=master.degrees, value=90).grid(row=1,column=1)
tk.Radiobutton(master, text="Left 90 degrees", variable=master.degrees, value=-90).grid(row=1,column=2)
tk.Radiobutton(master, text="180 degrees", variable=master.degrees, value=180).grid(row=1,column=3)

# Create a button for calling our function
# (rotate_pdf is defined in the complete code further down)
master.ok_button = tk.Button(master, command=rotate_pdf, text="Rotate pdf files")
master.ok_button.grid(row=2,column=1)

# Run
tk.mainloop()
We want to rotate all pdf files in the folder where our script is contained.
import os

# Get all the files in current folder from where we are running the script
files = [f for f in os.listdir('.') if os.path.isfile(f)]
files = list(filter(lambda f: f.lower().endswith(('.pdf')), files))
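The same lookup can also be sketched with pathlib instead of os.listdir. The function name find_pdfs and its folder parameter are assumptions made for this illustration. The sketch only looks at files directly inside the folder, like the snippet above.

```python
from pathlib import Path


def find_pdfs(folder="."):
    """Return the names of all PDF files directly inside folder, sorted."""
    return sorted(
        entry.name
        for entry in Path(folder).iterdir()
        # Skip directories and compare the extension case-insensitively
        if entry.is_file() and entry.suffix.lower() == ".pdf"
    )


print(find_pdfs())
```

Taking the folder as a parameter makes it possible to point the tool at another directory later without changing the filtering logic.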
Here is the complete code for rotating our pdf files. Enjoy!
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import PyPDF2
import tkinter as tk
import os
import sys

# Get all the files in current folder from where we are running the script
files = [f for f in os.listdir('.') if os.path.isfile(f)]
files = list(filter(lambda f: f.lower().endswith(('.pdf')), files))


# main rotate pdf function
def rotate_pdf(*args):
    degrees = master.degrees.get()
    pdf_rotator(files, degrees)


# The pdf rotator
def pdf_rotator(files, degrees):
    for filename in files:
        if degrees != 0 and degrees != "":
            with open(filename, "rb") as pdf_file:
                pdf_reader = PyPDF2.PdfFileReader(pdf_file)
                pdf_writer = PyPDF2.PdfFileWriter()
                print("Rotating", degrees)
                for page_num in range(pdf_reader.numPages):
                    pdf_page = pdf_reader.getPage(page_num)
                    pdf_page.rotateClockwise(degrees)
                    pdf_writer.addPage(pdf_page)
                with open(filename[:-4]+"rotated_"+str(degrees)+".pdf", "wb") as pdf_file_rotated:
                    pdf_writer.write(pdf_file_rotated)
    # Exit once every file has been processed
    sys.exit()


# Create our root widget, set title and size
master = tk.Tk()
master.title("PDF rotator")
master.geometry("400x100")

# Create an IntVar for getting our rotate values
master.degrees = tk.IntVar()

# Create a description label and a couple radiobuttons, add them to the widget
tk.Label(master, text="Rotates all pdf in the current folder the selected degrees.").grid(row=0,columnspan=4)
tk.Radiobutton(master, text="Right 90 degrees", variable=master.degrees, value=90).grid(row=1,column=1)
tk.Radiobutton(master, text="Left 90 degrees", variable=master.degrees, value=-90).grid(row=1,column=2)
tk.Radiobutton(master, text="180 degrees", variable=master.degrees, value=180).grid(row=1,column=3)

# Create a button for calling our function
master.ok_button = tk.Button(master, command=rotate_pdf, text="Rotate pdf files")
master.ok_button.grid(row=2,column=1)

# Run
tk.mainloop()
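The slicing in the complete script, filename[:-4], assumes a four-character extension and yields names such as testrotated_90.pdf. As a possible alternative, the sketch below builds the output name with pathlib. The helper rotated_name is hypothetical, and the extra underscore before "rotated" is a choice made here, not what the script above does.

```python
from pathlib import Path


def rotated_name(filename, degrees):
    """Build the output name for a rotated copy of filename.

    Hypothetical helper: inserts "_rotated_<degrees>" before the extension
    and keeps the extension exactly as it was written.
    """
    path = Path(filename)
    return str(path.with_name(f"{path.stem}_rotated_{degrees}{path.suffix}"))


print(rotated_name("test.pdf", 90))
```

Because the stem and suffix are split by pathlib, the helper does not depend on the length of the extension.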
Saving a finished report or table in Excel is easy. You choose Save As and save the sheet as a PDF. Doing this automatically with Python is a bit trickier though. In this post, we will take a closer look at how to do this with the win32 library. The full code is available at the bottom of the post. Note that you need Excel installed in order to run this script successfully.
Install the win32 library first with:
pip install pypiwin32
This will install the Win32 API library, which PyPI describes as: "Python extensions for Microsoft Windows. Provides access to much of the Win32 API, the ability to create and use COM objects, and the Pythonwin environment."
To get the file paths we use pathlib. pathlib was introduced in Python 3.4, so it is fairly new (this article was written with Python 3.8). We specify the name of the Excel workbook we want to make a pdf of, and also the output pdf's name.
excel_file = "pdf_me.xlsx"
pdf_file = "pdf_me.pdf"
We then create absolute paths from our current working directory (cwd) with pathlib's Path.cwd() method.
excel_path = str(pathlib.Path.cwd() / excel_file)
pdf_path = str(pathlib.Path.cwd() / pdf_file)
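As a small variation on the lines above, the pdf path could also be derived from the workbook path instead of being typed out twice. This is only a sketch: it uses pathlib's with_suffix method and assumes the pdf should sit next to the workbook under the same base name.

```python
from pathlib import Path

# Absolute path to the workbook in the current working directory
excel_path = Path.cwd() / "pdf_me.xlsx"

# Same folder and base name, only the extension changes
pdf_path = excel_path.with_suffix(".pdf")

print(pdf_path.name)

# Convert to plain strings when handing the paths to Excel,
# as the original snippet does with str(...)
excel_path_str = str(excel_path)
pdf_path_str = str(pdf_path)
```

Deriving one path from the other keeps the two names in sync if the workbook is ever renamed.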
Excel is next up. We start the Excel application and hide it.
excel = client.DispatchEx("Excel.Application")
excel.Visible = 0
We then open our workbook and load our first sheet.
wb = excel.Workbooks.Open(excel_path)
ws = wb.Worksheets[1]
Now it is time to use SaveAs to save our sheet as a pdf. File format 57 is the pdf file format.
wb.SaveAs(pdf_path, FileFormat=57)
We then close our workbook and quit our Excel application. Our pdf is now saved in our working directory.
from win32com import client
import win32api
import pathlib

### pip install pypiwin32 if module not found

excel_file = "pdf_me.xlsx"
pdf_file = "pdf_me.pdf"
excel_path = str(pathlib.Path.cwd() / excel_file)
pdf_path = str(pathlib.Path.cwd() / pdf_file)

excel = client.DispatchEx("Excel.Application")
excel.Visible = 0

wb = excel.Workbooks.Open(excel_path)
ws = wb.Worksheets[1]

try:
    wb.SaveAs(pdf_path, FileFormat=57)
except Exception as e:
    print("Failed to convert")
    print(str(e))
finally:
    wb.Close()
    excel.Quit()
In this post, we are going to have a look at how to split all pages from a single pdf into one-page pdf files. Splitting a pdf into several pages can easily be done with almost any pdf tool worth its salt. However, splitting a pdf into single pages is a manual operation, and if you have to do it on several pdfs an automated tool makes sense. This is where PyPDF2 comes in handy. If you just want the complete code without all the fancy explanations, you can find it at the end.
If you haven't done so already, fire up your command prompt, PowerShell or terminal and install PyPDF2 with pip.
pip install pypdf2
Currently I am running 32-bit Python 3.8 with PyPDF2 version 1.26.0 on Windows 10. The code works on this setup, and probably on other operating systems as well.
We start with importing PdfFileWriter and PdfFileReader so that we can read the existing pdf and later write new pdfs. We also need to import os so that we can check what files we have in our working directory.
from PyPDF2 import PdfFileWriter, PdfFileReader
import os
First we use a list comprehension over os.listdir(".") that keeps an entry only if it is a file, which we check with os.path.isfile(f). After that we keep only the pdf files in the list files with files = list(filter(lambda f: f.lower().endswith((".pdf")), files)).
files = [f for f in os.listdir(".") if os.path.isfile(f)]
files = list(filter(lambda f: f.lower().endswith((".pdf")), files))
Now it is time to process all our pdf files. We go through each pdf in files with the for loop for pdf in files:. We then open the pdf in binary read mode with with open(pdf, "rb") as f: and load it into a PdfFileReader object with inputpdf = PdfFileReader(f).
Now it is time to start the splitting. With another for loop, we loop through all pages in the pdf. You can get the number of pages with numPages. For each page we create a new PdfFileWriter object named output and add the current page with getPage(i). We name the output pdf after the original file, adding -Page and the page index i, which starts at 0: name = pdf[:-4]+"-Page "+str(i)+".pdf". Finally, we save the output.
with open(name, "wb") as outputStream:
    output.write(outputStream)
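Since the loop index starts at 0, the first page ends up in a file named "-Page 0". If page numbers starting at 1 are preferred, the naming could be sketched as below. The helper page_filename and its page_count parameter are hypothetical, and the zero-padding is an extra assumption made so the files sort in page order.

```python
from pathlib import Path


def page_filename(pdf_name, page_index, page_count):
    """Name for the single-page file of a zero-based page_index.

    Hypothetical helper: numbers pages from 1 and zero-pads the number
    to the width of page_count.
    """
    width = len(str(page_count))
    stem = Path(pdf_name).stem
    return f"{stem}-Page {page_index + 1:0{width}d}.pdf"


print(page_filename("report.pdf", 0, 12))
```

In the splitting loop, page_count would be the value of numPages and page_index the loop variable i.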
from PyPDF2 import PdfFileWriter, PdfFileReader
import os

files = [f for f in os.listdir(".") if os.path.isfile(f)]
files = list(filter(lambda f: f.lower().endswith((".pdf")), files))

for pdf in files:
    with open(pdf, "rb") as f:
        inputpdf = PdfFileReader(f)
        for i in range(inputpdf.numPages):
            output = PdfFileWriter()
            output.addPage(inputpdf.getPage(i))
            name = pdf[:-4]+"-Page "+str(i)+".pdf"
            with open(name, "wb") as outputStream:
                output.write(outputStream)
Portable Document Format (PDF) is wonderful as long as you only have to read it, not work with it. The format is not really meant to be tampered with, which is why editing a pdf is normally hard to do. It is a de facto worldwide standard, so you will most likely come across it when coding. Read along to see how to tackle the PDF format and how to search for the information contained in a pdf.