Do you need to count the total number of pages in your PDF files?
Look no further: PyPDF2 is here to help!
Here is the documentation: https://pythonhosted.org/PyPDF2/
To count the number of pages in a PDF file, you need only four lines of code.
You can install PyPDF2 from PyPI with pip (on Windows, py is the Python launcher; on macOS or Linux, use python3 -m pip instead):
py -m pip install PyPDF2
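The code in this post uses PdfFileReader and the numPages property. Both were deprecated before PyPDF2 3.0 and removed in 3.0.0, so it helps to know which version pip actually installed. Here is a minimal check using only the standard library (Python 3.8+). It assumes the package is registered under the distribution name PyPDF2, and the helper names are illustrative:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_pypdf2_version():
    """Return the installed PyPDF2 version string, or None if it is not installed."""
    try:
        return version("PyPDF2")
    except PackageNotFoundError:
        return None


def supports_pdffilereader(version_string):
    """True if this PyPDF2 version still ships PdfFileReader/numPages (i.e. < 3.0)."""
    major = int(version_string.split(".")[0])
    return major < 3


if __name__ == "__main__":
    found = installed_pypdf2_version()
    if found is None:
        print("PyPDF2 is not installed")
    elif supports_pdffilereader(found):
        print(f"PyPDF2 {found}: PdfFileReader and numPages are available")
    else:
        print(f"PyPDF2 {found}: use PdfReader and len(reader.pages) instead")
```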
OK, OK, enough installing! What do we need to do to count the pages?
First, we import the PdfFileReader class from PyPDF2:
from PyPDF2 import PdfFileReader
After that, we need to open our PDF file in binary reading mode.
with open("your_pdf_file.pdf", "rb") as pdf_file:
Inside the with block, we then instantiate a PdfFileReader object from the open file:
    pdf_reader = PdfFileReader(pdf_file)
Finally, still inside the with block, we read the page count from the numPages property:
    print(f"The total number of pages in the pdf document is {pdf_reader.numPages}")
That's it! We have now counted the number of pages in a PDF file with Python!
#!/usr/bin/env python3
"""
Extracting the number of pages in the document.

getNumPages()
    Calculates the number of pages in this PDF file.
    Returns: number of pages
    Return type: int
    Raises PdfReadError: if the file is encrypted and restrictions prevent this action.

numPages
    Read-only property that accesses the getNumPages() function.
"""
from PyPDF2 import PdfFileReader
# Open the PDF in binary mode and load it into a PdfFileReader with default settings
with open("your_pdf_file.pdf", "rb") as pdf_file:
    pdf_reader = PdfFileReader(pdf_file)
    print(f"The total number of pages in the pdf document is {pdf_reader.numPages}")
Cool stuff... very fast. Thank you! What else can it do? Is there a quick way to integrate Tesseract OCR 4.0 with it? If not, I'll just have to play around with it!
Thanks,
Andy
https://linkedin.com/in/andyoh365