I have some good PDF ebooks I’m willing to share, but I suspect the seller embeds tracking data in them to link them to my account: every time I download them from the official website they have a different hash while being visually identical. The same happens when I compare them against the copies a friend bought from the same seller. Since I don’t want to get banned, can you recommend a way to remove that stuff?

  • arr@lemmy.dbzer0.com
    1 year ago

    You can of course remove the metadata, but you can’t be sure you’ve removed every watermark hidden in the actual content unless whatever method you use makes two downloads from different sources end up with the same hash. If they match, you know for certain that you caught whatever was inserted to identify you. Anything other than metadata will be very hard to find and remove in an automated way unless you already know exactly what you’re looking for, though.

    That said, this is how I’ve cleaned up metadata in batches of PDF files using qpdf and exiftool in the past:

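    for file in *.pdf; do
        [ -e "$file" ] || continue  # skip the literal pattern if no PDFs matched
        # Blank every tag exiftool can write; on PDFs this edit alone is reversible
        exiftool -all:all= -overwrite_original "$file"
        # Rewrite the file in place so the old metadata objects are actually dropped
        qpdf --linearize --replace-input "$file"
    done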
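
    The hash check mentioned above could be sketched roughly like this. It assumes sha256sum from GNU coreutils is available (on macOS, shasum -a 256 would be the stand-in), and same_hash is just a helper name made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when both files have the same SHA-256 digest.
same_hash() {
    # sha256sum prints "<digest>  <filename>"; keep only the digest
    a=$(sha256sum "$1" | cut -d' ' -f1)
    b=$(sha256sum "$2" | cut -d' ' -f1)
    # An empty digest means the file could not be read, so treat that as a mismatch
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

    Called as same_hash first.pdf second.pdf (placeholder names), it returns success only if the two cleaned files are byte-for-byte the same. A mismatch after cleaning means something other than the stripped metadata still differs.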