Stripping Metadata from PDFs: Difference between revisions

Created page with "This guide will show you, from start to finish, how to: * Install and configure tools for scrubbing metadata * Show you how to use those tools to scrub metadata * Touch on some best practices for handling images and PDFs when you intend to publish them = Installation = == Windows == Before you begin you will need to install LibreOffice, Exiftool, and QPDF. While other document editors might be suitable, we recommend LibreOffice due to its open-source nature and '''hi..."
 
No edit summary
 
Line 5: Line 5:
* Touch on some best practices for handling images and PDFs when you intend to publish them

== Installation ==

=== Windows ===
Before you begin, you will need to install LibreOffice, ExifTool, and QPDF.

While other document editors might be suitable, we recommend LibreOffice due to its open-source nature and '''highly suggest avoiding Microsoft Office'''.

==== Prerequisites ====
* Ensure you have administrative privileges for installations and PATH edits.
* Winget must be available (included in Windows 10/11 via the App Installer; update it from the Microsoft Store if needed). If <code>winget</code> is not recognized in Command Prompt, install it manually from Microsoft's official GitHub repository: https://github.com/microsoft/winget-cli.

==== LibreOffice ====
To install LibreOffice on Windows, download the installer from the official download page and run it: https://www.libreoffice.org/download/download-libreoffice/

==== Exiftool ====
To install <code>exiftool</code> on Windows, first open a Command Prompt window (press Win+R, type <code>cmd.exe</code>, and press Enter), then copy and paste the following command into it and press Enter. Follow the on-screen instructions for the installer:

<code>winget install --id OliverBetz.ExifTool -e</code>

==== QPDF ====
Install <code>qpdf</code> by entering the following command into the Command Prompt:

Line 38: Line 37:
Search for "edit system variables" in the Windows search bar and open the result. Click the "Environment variables" button. Select "PATH" under user variables, then click "Edit". A new window will pop up. Click "New" and paste the path to the qpdf <code>bin</code> folder. Click OK in each open window to save the change, then open a new Command Prompt window so that the updated PATH takes effect.
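
Once the installers have run and PATH has been edited, it can be useful to confirm that both tools are actually reachable from a fresh prompt. The following is a minimal Python sketch of such a check, not part of the original guide; the function name <code>missing_tools</code> is hypothetical, and it only tests whether an executable of that name can be found on PATH, not whether it runs correctly.

```python
import shutil


def missing_tools(names):
    """Return the tool names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # The two command-line tools this guide relies on.
    missing = missing_tools(["exiftool", "qpdf"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("exiftool and qpdf are both on PATH")
```

If a tool is reported as missing right after a PATH edit, close the prompt and open a new one before checking again, since running prompts keep the old PATH.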


=== Linux ===
Install <code>exiftool</code> and <code>qpdf</code> through your distribution's package manager. Package names vary by distribution; on Debian and Ubuntu, for example, ExifTool is packaged as <code>libimage-exiftool-perl</code>.

== Scrubbing the Data ==
=== Scrub Images ===
<blockquote>'''Note''': It is important to scrub images '''before''' embedding them into your document. Scrubbing a PDF does not scrub any images inside of it.</blockquote>
<blockquote>'''Note''': You should take backups of any images you perform this process on. Consider copying them into a new folder and then working on those copies.</blockquote>
These instructions focus on JPEGs but can be adapted for other formats, such as PNG or TIFF. Use <code>-ext png</code> for PNGs, or specify multiple formats with <code>-ext jpg -ext png</code>. Scrub only the images you'll embed; if your document includes vector graphics or other embedded objects (e.g., from LibreOffice Draw), they must be scrubbed separately. Make sure that all of the images have been scrubbed before you proceed to creating your PDF.
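
The backup advice in the notes above can also be scripted. The following is a minimal Python sketch, not part of the original guide, that copies matching images into a separate working folder so that the originals are never touched. The function name <code>copy_images</code> and its parameters are hypothetical, and the sketch assumes the working folder is located outside the source folder.

```python
import shutil
from pathlib import Path


def copy_images(source_dir, work_dir, extensions=("jpg", "jpeg", "png")):
    """Copy matching images from source_dir into work_dir, keeping subfolders.

    Assumes work_dir is NOT inside source_dir. Returns the copied paths.
    """
    source, work = Path(source_dir), Path(work_dir)
    wanted = {"." + ext.lower() for ext in extensions}
    copied = []
    # sorted() collects the full file list before any copying starts.
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.suffix.lower() in wanted:
            target = work / path.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # the original file is left untouched
            copied.append(target)
    return copied
```

The scrubbing command would then be run against the working folder only, leaving the source folder as the backup.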


Open a <code>cmd.exe</code> window and change directory to the new image folder (you did read the notes at the top of this section, right?). The command to do this is below. Be sure to use the correct file path:
Line 58: Line 53:
<code>exiftool -overwrite_original -all= -r -ext jpg C:\[path to folder]\images</code>
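
For readers who prefer to script this step, the following minimal Python sketch (not part of the original guide) assembles the same <code>exiftool</code> arguments shown above, repeating <code>-ext</code> once per image format. The function names <code>build_scrub_command</code> and <code>scrub_images</code> are hypothetical, and the sketch assumes <code>exiftool</code> is on PATH.

```python
import subprocess


def build_scrub_command(folder, extensions=("jpg",)):
    """Build the exiftool argument list that strips all tags from images."""
    # -overwrite_original: do not keep "_original" backup copies
    # -all=              : delete every writable tag
    # -r                 : recurse into subfolders
    command = ["exiftool", "-overwrite_original", "-all=", "-r"]
    for ext in extensions:
        command += ["-ext", ext]  # one -ext option per file extension
    command.append(str(folder))
    return command


def scrub_images(folder, extensions=("jpg",)):
    """Run exiftool on folder; raises if it exits with a nonzero status."""
    return subprocess.run(build_scrub_command(folder, extensions), check=True)
```

Because <code>-overwrite_original</code> suppresses ExifTool's own backup files, this should only be pointed at the working copies, never at the original images.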


=== Create Your PDF ===
We recommend using '''LibreOffice''' to draft your documentation as we cannot fully trust Microsoft Office to not embed data into the resulting PDF in ways we have not yet discovered.

In LibreOffice, export to PDF through the "File > Export As > Export" menu. For other word processing software, such as MS Word (which we do not recommend), save the document as a PDF; do not "print to PDF".

=== Scrub Your PDF ===
Scrub all PDF metadata with the command below. Be sure to use the correct file path: