Script that converts PDF into hand-written books

Script that converts PDF into hand-written books

I created a Python script that takes 1 PDF as an input and create another PDF with same content but text style is like it is a hand-written book.

Variables

So first I created a simply file “script.py”, opened it in Vim by running the command “nvim script.py” and created 3 variables in it:

input_pdf = "input.pdf"
output_pdf = "output/output.pdf"
font_path = "fonts/Satisfy-Regular.ttf"

I created 2 folders:

  • output: It will have all the temporary files created during processing and it will have the final output PDF as well.
  • fonts: It has all the fonts you can use. I am using Satisfy I downloaded it from Google fonts.

Libraries

I will also import all the libraries I am going to use in this script:

import fitz
from PIL import Image, ImageDraw, ImageFont
from fpdf import FPDF
import os
  • fitz: I am using this library to open the PDF file and to extract the text from it.
  • PIL: New PDF file will be generated by taking each page of input PDF as image and add it in output PDF.
  • FPDF: To create a new PDF file.
  • os: To list directories to find the total number of pages of a PDF. This library is also used to delete the temporary images generated during processing.

Main function

Then I created my main function that takes 3 arguments:

  • 1st: Input PDF path
  • 2nd: Path where output PDF will be saved
  • 3rd: Path of the font you want to use
def convert_pdf_to_handwritten(input_pdf_path, output_pdf_path, font_path):
    text = extract_text_from_pdf(input_pdf_path)
    
    image_path = "output/handwritten_page"
    create_handwritten_image(text, font_path, image_path)
    
    num_pages = len([f for f in os.listdir() if f.startswith(image_path) and f.endswith(".png")])

    create_pdf_from_images(image_path, num_pages, output_pdf_path)

    print(f"Handwritten PDF created at: {output_pdf_path}")

Extract text from PDF

After that, I will create a separate function that will extract text from PDF.

def extract_text_from_pdf(pdf_path):
    doc = fitz.open(pdf_path)
    text = ""
    for page_num in range(len(doc)):
        page = doc.load_page(page_num)
        text += page.get_text("text") + "\n"
    return text

Create handwritten image

Next function I created is responsible for creating a new page for each image, load the mentioned font, set page margins. Starts from 1st page and loop through all lines for each page and write them on each image. It will keep creating new images and writing text on them until it reaches the end of PDF.

def create_handwritten_image(text, font_path, image_path):
    width, height = 2480, 3508  # A4 at 300 DPI
    img = Image.new('RGB', (width, height), color='white')
    
    font = ImageFont.truetype(font_path, size=60)
    draw = ImageDraw.Draw(img)
    
    x, y = 100, 100
    line_height = 80
    
    page_count = 1

    for line in text.split("\n"):
        # If the text reaches the bottom of the page, create a new page
        if y + line_height > height - 100:
            img.save(f"{image_path}_page{page_count}.png")
            img = Image.new('RGB', (width, height), color='white')
            draw = ImageDraw.Draw(img)
            y = 100
            page_count += 1

        draw.text((x, y), line, font=font, fill="black")
        y += line_height

    img.save(f"{image_path}_page{page_count}.png")

Then our main function “convert_pdf_to_handwritten” will loop through all the images generated and it will tell the number of pages this PDF has.

Create PDF from images

Next it will call a function that will loop through total number of pages, add a new page in PDF and add an A4 image in it. After all the images has been added in all the pages, it will save the new PDF in the output path.

def create_pdf_from_images(image_path_prefix, num_pages, output_pdf_path):
    pdf = FPDF()
    for page_num in range(1, num_pages + 1):
        pdf.add_page()
        pdf.image(f"{image_path_prefix}_page{page_num}.png", x = 0, y = 0, w = 210, h = 297)  # A4 size
    pdf.output(output_pdf_path)

At this point, when you run the script, you will see a lot of images gets created in your output folder and you will also see your output PDF file. But you only need the PDF file, not the images that are generated temporarily.

Delete Temporary Images

So we will add another function that will remove all those temporary image files.

def delete_png_files(image_path_prefix):
    files = os.listdir()
    
    png_files = [f for f in files if f.startswith(image_path_prefix) and f.endswith(".png")]
    
    for png_file in png_files:
        os.remove(png_file)
        print("Deleted file: " + png_file)

Usage

input_pdf = "input.pdf"
output_pdf = "output/output.pdf"
font_path = "fonts/Satisfy-Regular.ttf"

convert_pdf_to_handwritten(input_pdf, output_pdf, font_path)

delete_png_files("handwritten_page")

That’s how you can convert PDF into a hand-written book.