Maybe don't do this (encrypted PDFs)

I received an encrypted PDF via email. It was legitimate and expected, so this is not a post about how you shouldn’t open email attachments (you shouldn’t). Instead, this is about the “password” that was chosen, and email in general.

As the sender explains in their email, the password is six numbers that I should know… and I do. The problem is that the numbers are not hard for anyone else to find out. As it turns out, that doesn’t even matter.

Let’s just see how long it takes to brute force every six-digit number.
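Before reaching for a tool, it helps to put a number on how small that search space is. The sketch below only does the arithmetic. The guess rates in it are hypothetical placeholders, not measurements from my machine.

```python
import math

DIGITS = 10   # symbols available per position (0-9)
LENGTH = 6    # password length described in the email

keyspace = DIGITS ** LENGTH          # total number of candidate passwords
entropy_bits = math.log2(keyspace)   # equivalent strength in bits


def worst_case_seconds(guesses_per_second: float) -> float:
    """Time to try every candidate at a given (hypothetical) guess rate."""
    return keyspace / guesses_per_second


if __name__ == "__main__":
    print(f"candidates: {keyspace:,}")
    print(f"entropy:    {entropy_bits:.1f} bits")
    # Hypothetical rates, only to show how the worst case scales.
    for rate in (1_000, 10_000, 100_000):
        print(f"{rate:>7,} guesses/s: {worst_case_seconds(rate):,.0f} s worst case")
```

One million candidates is roughly 20 bits of entropy, which is tiny by any modern standard.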

Enter “John the Ripper”. https://github.com/openwall/john

Generate the Hash

John the Ripper operates on hashes, so you need a tool to generate the appropriate hash for your file type. Unsurprisingly, the project provides many such tools, and you can find others online.

We’ll use pdf2john.py from their GitHub repo:

wget "https://github.com/openwall/john/raw/refs/heads/bleeding-jumbo/run/pdf2john.py"

If you run it, you’ll likely see an error message telling you that some dependencies are missing:

$ python3 pdf2john.py 
pyhanko is missing, run 'pip install --user pyhanko==0.20.1' to install it!

We’ll set up a Python venv, and then install those dependencies with pip:

$ python3 -m venv .

$ bin/pip install pyhanko==0.20.1
Collecting pyhanko==0.20.1
  Downloading pyHanko-0.20.1-py3-none-any.whl.metadata (9.2 kB)
  ... lots of output ...
Successfully installed asn1crypto-1.5.1 certifi-2025.4.26 cffi-1.17.1 charset-normalizer-3.4.2 click-8.2.1 cryptography-45.0.3 idna-3.10 oscrypto-1.3.0 pycparser-2.22 pyhanko-0.20.1 pyhanko-certvalidator-0.24.1 pyyaml-6.0.2 qrcode-8.2 requests-2.32.3 tzlocal-5.3.1 uritools-5.0.0 urllib3-2.4.0

$ bin/python pdf2john.py 
usage: pdf2john.py [-h] [-d] pdf_files [pdf_files ...]
pdf2john.py: error: the following arguments are required: pdf_files

Looks promising. So let’s generate the hash for John the Ripper:

$ bin/python pdf2john.py encrypted.pdf 
$pdf$5*6*256*-1028*1*16*1bcfb102bda1aff0e8e0f463c0d6e7d09540e1d938d0e5dbf16935cb767b6ac00166ecafbd10b5a28eeda9473d0a1f

Nope, that’s not the actual hash. Not even close. Save the real output to a file (encrypted.hash in the commands below) so John can read it.
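If you are curious what is in that string, the leading fields can be pulled apart with a few lines of Python. This is a minimal sketch based on my reading of the format, not an official parser: the field names are my assumption, although the revision and key length match what John reports later in this post ("revision is 6", "key length is 256"). The sample is the placeholder hash from above, not a real one.

```python
from dataclasses import dataclass


@dataclass
class PdfHashHeader:
    # Field names are assumptions about the "$pdf$" layout.
    version: int
    revision: int
    key_length: int
    permissions: int
    encrypt_metadata: bool


def parse_pdf_hash_header(line: str) -> PdfHashHeader:
    """Parse the first five '*'-separated fields after the $pdf$ tag."""
    tag = "$pdf$"
    start = line.find(tag)  # tolerate anything that precedes the tag
    if start < 0:
        raise ValueError("no $pdf$ tag found")
    fields = line[start + len(tag):].strip().split("*")
    if len(fields) < 5:
        raise ValueError("hash has fewer fields than expected")
    version, revision, key_length, permissions, metadata = fields[:5]
    return PdfHashHeader(
        version=int(version),
        revision=int(revision),
        key_length=int(key_length),
        permissions=int(permissions),
        encrypt_metadata=(metadata == "1"),
    )


# Placeholder hash copied from this post (not a real document's hash).
SAMPLE = (
    "$pdf$5*6*256*-1028*1*16*1bcfb102bda1aff0e8e0f463c0d6e7d09540e1d9"
    "38d0e5dbf16935cb767b6ac00166ecafbd10b5a28eeda9473d0a1f"
)

if __name__ == "__main__":
    print(parse_pdf_hash_header(SAMPLE))
```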

Let’s get ripping…

Don’t use the Ubuntu package

Originally, I tried using John the Ripper straight out of the Ubuntu repositories. Unfortunately this was a dead end. The version from Ubuntu doesn’t have the code to handle PDF encryption. It will give you this misleading error message:

$ john encrypted.hash
No password hashes loaded (see FAQ)

If you try to force it to use the PDF format, it gives you a slightly better error message:

$ john --format=pdf encrypted.hash
Unknown ciphertext format name requested

So much for using the version from Ubuntu. But I had a virtual machine with John set up and connected to a low-end Nvidia GPU, so let’s just use that. It was built from source, so it should have all the things.

From source

I spun up my AI/ML virtual machine and copied over the hash that I generated locally. There are plenty of instructions online for building John the Ripper from source, so I won’t cover that here.

Since we know the password consists of only numbers, we’ll tell John: --incremental=digits. This way, it doesn’t waste time on symbols that we know are not part of the password.
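As far as I can tell, incremental mode with the digits character set is not limited to exactly six characters. Since the email said exactly six digits, another option would be to enumerate only those candidates and hand them to John as a wordlist (its --wordlist option). A minimal sketch of that idea follows. The file name six_digits.txt is a made-up example, and this is not what I actually ran.

```python
import string
from itertools import product
from pathlib import Path


def digit_candidates(length: int = 6):
    """Yield every zero-padded digit string of the given length, in order."""
    for combo in product(string.digits, repeat=length):
        yield "".join(combo)


def write_wordlist(path: Path, length: int = 6) -> int:
    """Write one candidate per line and return how many were written."""
    count = 0
    with path.open("w", encoding="ascii") as handle:
        for candidate in digit_candidates(length):
            handle.write(candidate + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # "six_digits.txt" is a hypothetical output file name.
    written = write_wordlist(Path("six_digits.txt"))
    print(f"wrote {written:,} candidates")
```

The resulting file is about 7 MB, which says a lot about how little work the attacker has to do.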

53 seconds later…

$ ./john ~/encrypted.hash --incremental=digits
Warning: detected hash type "PDF", but the string is also recognized as "pdf-opencl"
Use the "--format=pdf-opencl" option to force loading these as that type instead
Using default input encoding: UTF-8
Loaded 1 password hash (PDF, PDF encrypted document [MD5-RC4 / SHA2-AES 32/64])
Cost 1 (revision) is 6 for all loaded hashes
Cost 2 (key length) is 256 for all loaded hashes
Will run 4 OpenMP threads
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
0g 0:00:00:08
1g 0:00:00:53 DONE (2025-05-28 10:18) 
Use the "--show --format=PDF" options to display all of the cracked passwords reliably
Session completed. 

I omitted some details from that output. John prints the password when it cracks it, but you can also ask for it explicitly:

$ ./john --show --format=PDF ~/encrypted.hash 
?:xxxxxx

1 password hash cracked, 0 left

Again, some details are omitted.

Summary

Security is hard. I know the sender meant well, but I’m not sure that this was really any better than an unencrypted PDF. Especially since some details about the structure of the password were in the same email.

While email is mostly encrypted in transit at this point, it’s still processed and (likely) stored as plain text at each hop. Email also never really goes away… so if anyone ever gets access to the email, they are less than a minute away from decrypting the PDF and gaining access to some personally identifiable information (PII).

So what could have been done better?

A strong, random, unique password would have been better. It needs to be communicated out of band (not in the same email, or even in another email). An application such as Signal would be best, but even plain SMS would probably have been better. I would have preferred to choose my own password. Had the sender reached out to me first and allowed me to generate and send a password out of band, this would have been more secure… but there are plenty of other options.
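For what it’s worth, generating such a password takes only a few lines. This is one possible sketch using Python’s standard secrets module. The length of 20 and the letters-plus-digits alphabet are my own choices for illustration, not a recommendation from any standard.

```python
import math
import secrets
import string

# 62 symbols: upper case, lower case, and digits (an illustrative choice).
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 20) -> str:
    """Build a password from a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int, alphabet_size: int = len(ALPHABET)) -> float:
    """Entropy of a uniformly random password of this length."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    print(generate_password())
    print(f"random 20 chars: {entropy_bits(20):.0f} bits")
    print(f"six digits:      {entropy_bits(6, 10):.0f} bits")
```

A password like this has well over 100 bits of entropy, compared with about 20 bits for six digits.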

Even if a strong password had been communicated out of band, we still presume the underlying encryption in a PDF is secure and will remain secure for a reasonable amount of time (years, decades). This is a pretty big assumption, and lots of encryption schemes have failed over the years. Again, security is hard.