One of my PDFs has a page number and a link at the bottom of every page. It’s around 500 pages, so I don’t want to edit it manually. Is there any way to delete those from every page at once?

Maybe Ghostscript or a Python script can do this?

I also noticed there isn’t a PDF community on Lemmy; maybe somebody should create one.

Thanks a lot in advance.

  • Cyberflunk@lemmy.world
    1 day ago

    Claude Code wrote an OpenCV Python app to remove every other word from the Declaration of Independence.

    Not really, but you wondered

  • nexguy@lemmy.world
    2 days ago

    Just a thought: put a white square over the desired position on every page… quicker? (Not sure how to do it, though.)

  • thevoidzero@lemmy.world
    2 days ago

    I don’t know how comfortable you are writing your own script, but a PDF stores its components with coordinates, bounding boxes, etc., so you should be able to automate this with a small script that reads the PDF components directly.

    Also try qpdf to convert the PDF into QDF format; then you can open it in a text editor and find the element you want to remove. Look at a few pages as examples, find the pattern, and do a regex replace. Make sure to keep a copy and check the diff before accepting it.
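
A rough sketch of the regex step on a QDF file, in Python. It assumes the footer text shows up as a plain literal string followed by `Tj` inside its own `BT ... ET` text block, which is not guaranteed: some PDFs use hex strings or `TJ` arrays instead, so check a few pages first. The names `FOOTER_TEXT` and `strip_footer_blocks` and the two footer patterns are made up for illustration. This only removes the visible text; a clickable link is usually a separate annotation object.

```python
import re

# Hypothetical footer patterns: "(Page 12) Tj" or "(https://...) Tj".
# Adjust after inspecting a few pages of your own QDF file.
FOOTER_TEXT = re.compile(rb"\((?:Page \d+|https?://[^)]*)\)\s*Tj")

# A text object in a content stream runs from the BT operator to the ET operator.
BT_ET_BLOCK = re.compile(rb"\bBT\b.*?\bET\b", re.DOTALL)


def strip_footer_blocks(qdf: bytes) -> bytes:
    """Drop every BT...ET block that shows a footer-like string; keep the rest."""

    def replace(match):
        block = match.group(0)
        return b"" if FOOTER_TEXT.search(block) else block

    return BT_ET_BLOCK.sub(replace, qdf)
```

One way to use it: create the editable file with `qpdf --qdf --object-streams=disable in.pdf out.qdf`, read `out.qdf` as bytes, write the result of `strip_footer_blocks` to a new file, and run qpdf's `fix-qdf` on that file to repair the stream lengths and offsets (`fix-qdf edited.qdf > fixed.pdf`). The file names here are placeholders.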

  • Anon518@sh.itjust.works
    1 day ago

    Why did you link to reddit? Can a mod/admin do something about this? People keep advertising reddit here.

  • beepbooprobot@lemmy.world
    2 days ago

    PDFGear might be able to handle this if your link is part of the page footer.

    Open the file, click Edit, then Header & Footer, then update and make your changes across all 500 pages at once.

  • VoxAliorum@lemmy.ml
    2 days ago

    I lean towards no. I can’t know for sure, but given how PDFs are structured, this sounds very difficult.

    A workaround might be to automatically place white boxes over those, but you can probably still select the text underneath afterwards.
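
A possible variant of the white-box idea that also deals with the selectable text: PyMuPDF's redaction annotations remove the content under the box when applied, instead of just covering it. The sketch below assumes PyMuPDF is installed and that the page number and link both sit inside a fixed strip at the bottom of every page. The 40-point strip height is a guess, and `footer_band` and `redact_footers` are made-up names. Anything else in that strip (including parts of images) is affected too, so check the output.

```python
def footer_band(page_width, page_height, band_height=40.0):
    """Return (x0, y0, x1, y1) for a strip along the bottom edge of a page.

    PyMuPDF measures y downward from the top edge, so the footer has the
    largest y values.
    """
    return (0.0, page_height - band_height, page_width, page_height)


def redact_footers(src, dst, band_height=40.0):
    """Remove everything inside the footer strip on every page of src."""
    import fitz  # PyMuPDF; imported here so footer_band works without it

    doc = fitz.open(src)
    for page in doc:
        band = fitz.Rect(*footer_band(page.rect.width, page.rect.height, band_height))
        # Drop clickable link annotations that overlap the footer strip.
        for link in page.get_links():
            if fitz.Rect(link["from"]).intersects(band):
                page.delete_link(link)
        # Mark the strip with a white fill, then apply the redaction so the
        # text is removed from the page content rather than only covered.
        page.add_redact_annot(band, fill=(1, 1, 1))
        page.apply_redactions()
    doc.save(dst, garbage=3, deflate=True)
    doc.close()
```

Calling `redact_footers("in.pdf", "out.pdf")` (placeholder file names) writes a new file and leaves the original untouched. Try it on a copy and adjust `band_height` until only the footer disappears.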

  • gi1242@lemmy.world
    2 days ago

    pdftk can split it into pages and then recombine them. Unfortunately, I’m not sure how to automatically remove an element from each page.