Proofpoint has released a new open-source tool called PDF Object Hashing that helps security teams detect and track malicious files distributed as PDFs.
The tool is now available on GitHub and is intended to help identify suspicious documents used by threat actors in phishing campaigns, malware distribution, and business email compromise (BEC) attacks.
PDFs have become a primary weapon in cyberattacks because they appear legitimate to users. Threat actors frequently send PDFs containing malicious URLs, QR codes, or fake banking details to trick recipients into clicking links, scanning codes, or sending payments.


Traditional security tools often struggle to catch these threats because PDFs can be modified in countless ways while still looking identical to users.
The PDF format is complex and flexible, which works against security teams. The specification allows multiple ways to represent the same document, giving attackers many options to hide their tracks.
Some PDFs are encrypted, which makes their contents even harder to analyze. Without the decryption key, security tools cannot read the text, URLs, or images inside an encrypted PDF.
Additionally, different parts of a PDF can be stored as plain text or compressed, and important details like domain names might be hidden in these compressed sections.
These variations make it very difficult to write simple detection rules that catch every malicious PDF. When attackers change a URL or swap out a fake invoice image, a signature built on that content no longer matches, and the threat slips through.
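To illustrate why compressed sections defeat simple byte matching, here is a minimal Python sketch. It wraps the same content in a PDF-style stream object twice, once as plain text and once compressed with zlib (the algorithm behind the PDF FlateDecode filter). The domain, the object number, and the helper names build_stream_object and find_domain_after_inflate are hypothetical and exist only for this illustration.

```python
import re
import zlib

# Hypothetical lure content and domain, used only for this illustration.
DOMAIN = b"invoice.example.com"
CONTENT = b"BT /F1 12 Tf (Pay at https://invoice.example.com/pay) Tj ET"


def build_stream_object(content: bytes, compress: bool) -> bytes:
    """Wrap content in a minimal PDF-style stream object."""
    data = zlib.compress(content) if compress else content
    filt = b" /Filter /FlateDecode" if compress else b""
    header = b"5 0 obj\n<< /Length " + str(len(data)).encode("ascii") + filt + b" >>\n"
    return header + b"stream\n" + data + b"\nendstream\nendobj\n"


def find_domain_after_inflate(pdf_bytes: bytes, domain: bytes) -> bool:
    """Try to inflate every stream and search the decompressed bytes."""
    for match in re.finditer(rb"stream\n(.*?)\nendstream", pdf_bytes, re.DOTALL):
        try:
            if domain in zlib.decompress(match.group(1)):
                return True
        except zlib.error:
            # Not a zlib stream (for example, stored as plain text); skip it.
            continue
    return False


if __name__ == "__main__":
    plain = build_stream_object(CONTENT, compress=False)
    packed = build_stream_object(CONTENT, compress=True)
    print("raw search, plain stream:     ", DOMAIN in plain)
    print("raw search, compressed stream:", DOMAIN in packed)
    print("search after inflating:       ", find_domain_after_inflate(packed, DOMAIN))
```

A rule that searches the raw file for the domain matches the plain version but not the compressed one, so a content-based rule has to handle each encoding the format permits.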
How PDF Object Hashing Works
Proofpoint’s solution takes a different approach. Instead of focusing on what’s inside the PDF, the tool examines the document’s underlying structure.
It analyzes the types of objects in the PDF and the order in which they appear, ignoring the specific details within those objects. This creates a “skeleton” or template of the document that remains constant even when attackers modify images, URLs, or text.
The tool then hashes this sequence of object types into a fingerprint. Documents built from the same template share that fingerprint, so it stays the same even if the threat actor changes the lure image, updates the malicious URL, or modifies other contents.
The technique also works on encrypted PDFs because the document structure remains visible when the details are hidden.
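The idea can be sketched in a few lines of Python. This is not Proofpoint's implementation: the regex-based parsing, the "Stream" and "Untyped" labels, the choice of SHA-256, and the function names object_types, object_hash, and sample_pdf are all assumptions made for illustration. A real tool would need a proper PDF parser to handle object streams, binary stream data, and malformed files.

```python
import hashlib
import re

# Matches each indirect object: "<num> <gen> obj ... endobj".
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj\b(.*?)\bendobj", re.DOTALL)
# Matches a "/Type /Name" entry in an object's dictionary.
TYPE_RE = re.compile(rb"/Type\s*/([A-Za-z0-9]+)")


def object_types(pdf_bytes: bytes) -> list[str]:
    """Return the type label of each object, in file order."""
    types = []
    for match in OBJ_RE.finditer(pdf_bytes):
        body = match.group(1)
        # Inspect only the dictionary, not any stream data that follows it.
        header = body.split(b"stream", 1)[0]
        found = TYPE_RE.search(header)
        if found:
            types.append(found.group(1).decode("ascii"))
        elif b"stream" in body:
            types.append("Stream")
        else:
            types.append("Untyped")
    return types


def object_hash(pdf_bytes: bytes) -> str:
    """Hash the ordered sequence of object types, ignoring object contents."""
    skeleton = "|".join(object_types(pdf_bytes))
    return hashlib.sha256(skeleton.encode("ascii")).hexdigest()


def sample_pdf(url: bytes) -> bytes:
    """Build a tiny PDF-like byte string with a link annotation (hypothetical)."""
    return (
        b"%PDF-1.7\n"
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
        b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
        b"3 0 obj\n<< /Type /Page /Parent 2 0 R /Annots [4 0 R] >>\nendobj\n"
        b"4 0 obj\n<< /Type /Annot /Subtype /Link /A << /S /URI /URI (" + url + b") >> >>\nendobj\n"
        b"%%EOF\n"
    )


if __name__ == "__main__":
    first = sample_pdf(b"https://example.com/a")
    second = sample_pdf(b"https://example.org/other")
    print(object_types(first))
    print(object_hash(first) == object_hash(second))
```

The two samples differ only in the URL, so their type sequences, and therefore their hashes, are identical. Adding, removing, or reordering objects would change the hash.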
Proofpoint has already used this tool to track threat actors. The UAC-0050 group, which targets Ukraine, distributes encrypted PDFs that contain malicious URLs.
Because the files are encrypted, traditional tools cannot extract the malicious URLs inside them. However, PDF Object Hashing allowed Proofpoint to identify these threats by analyzing their structure alone, regardless of encryption.
Another actor known as UNK_ArmyDrive, believed to operate from India, also relies on PDFs in its attack chains.


By using PDF Object Hashing alongside traditional detection methods, security teams can catch variants of these actors' malicious documents that might otherwise be missed.
The tool also supports attribution by helping security teams recognize when multiple PDF attacks are connected to the same threat actor or campaign.
By clustering PDFs with similar object structures, analysts can identify patterns across campaigns and may be better placed to anticipate future attacks.
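As a rough illustration of clustering, the Python sketch below groups samples whose object-type sequences produce the same hash. The file names, the type sequences, and the functions fingerprint and cluster_by_fingerprint are hypothetical, and exact-match grouping is an assumption: it links only identical structures, not merely similar ones.

```python
import hashlib
from collections import defaultdict


def fingerprint(types: list[str]) -> str:
    """Hash an ordered list of object types (SHA-256 chosen for illustration)."""
    return hashlib.sha256("|".join(types).encode("utf-8")).hexdigest()


def cluster_by_fingerprint(samples: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group sample names by the fingerprint of their object-type sequence."""
    clusters = defaultdict(list)
    for name, types in samples.items():
        clusters[fingerprint(types)].append(name)
    return dict(clusters)


if __name__ == "__main__":
    # Hypothetical samples: two share a template, one has an extra object.
    samples = {
        "invoice_a.pdf": ["Catalog", "Pages", "Page", "Annot"],
        "invoice_b.pdf": ["Catalog", "Pages", "Page", "Annot"],
        "notice_c.pdf": ["Catalog", "Pages", "Page", "Annot", "XObject"],
    }
    for digest, names in cluster_by_fingerprint(samples).items():
        print(digest[:12], names)
```

Samples that land in the same group become candidates for a shared template, which an analyst can then confirm against other evidence.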
This development demonstrates how understanding file format intricacies can lead to more robust security solutions.
The open-source release allows the broader security community to integrate this detection method into their own tools and workflows.