One of my PDFs has a page number and a link at the bottom of every page. It’s around 500 pages, so I don’t want to edit it manually. Is there any way to delete those from all pages of the PDF at once?

Maybe Ghostscript or a Python script can do this?

I also noticed there isn’t a PDF community on Lemmy; maybe somebody should create one.

Thanks a lot in advance.

  • nexguy@lemmy.world
    2 months ago

    Just a thought: put a white square over the desired position on every page… quicker? (Not sure how to do it, though.)

  • Cyberflunk@lemmy.world
    2 months ago

    Claude Code wrote an OpenCV Python app to remove every other word from the Declaration of Independence.

    Not really, but you wondered

  • VoxAliorum@lemmy.ml
    2 months ago

    I lean towards no. I can’t know for sure, but given how PDFs are structured, this sounds very difficult.

    A workaround might be to automatically place white boxes over those, but you can probably still select the text underneath afterwards.
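For what it's worth, the white-box workaround comes down to appending a few drawing operators to each page's content stream. Below is a minimal sketch that only builds those operators; `white_box_ops` is a made-up helper name, and the rectangle in the usage line assumes a US Letter page with the footer in the bottom 50 points. Actually attaching the bytes to every page still needs a PDF library or a hand-edited file, and, as noted above, the covered text stays in the file and can probably still be selected or extracted.

```python
def white_box_ops(x, y, width, height):
    """Build PDF content-stream operators that paint a filled white rectangle.

    Coordinates are in PDF user space: points, with the origin at the
    bottom-left corner of the page by default (assumes an unrotated page
    and an unmodified transformation matrix).
    """
    lines = [
        "q",                                      # save graphics state
        "1 1 1 rg",                               # fill colour: RGB white
        f"{x:g} {y:g} {width:g} {height:g} re",   # rectangle path
        "f",                                      # fill the path
        "Q",                                      # restore graphics state
    ]
    return ("\n".join(lines) + "\n").encode("ascii")


# Hypothetical footer band: full width of a 612 pt wide page, bottom 50 pt.
footer_cover = white_box_ops(0, 0, 612, 50)
```

The `q`/`Q` pair keeps the white fill colour from leaking into anything drawn afterwards.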

  • beepbooprobot@lemmy.world
    2 months ago

    PDFGear might be able to handle this if your link is available in the page footer.

    Open the file, click Edit, then Header & Footer, then update and make your changes globally across all 500 pages.

  • thevoidzero@lemmy.world
    2 months ago

    I don’t know how comfortable you are writing your own script, but a PDF stores its components with coordinates, bounding boxes, etc., so you should be able to automate it with a small script that reads the PDF components directly.
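A rough sketch of the coordinate idea, with the PDF parsing left out: once some library has given you text spans with bounding boxes, picking out the footer is just a filter on the vertical position. The `Span` class, the `footer_spans` helper, and the 50-point footer height are all assumptions for illustration; how you obtain the spans (and whether y is measured from the bottom or the top of the page) depends on the library you use.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """A piece of page text with its bounding box (hypothetical structure)."""
    text: str
    x0: float
    y0: float  # bottom edge, in points from the bottom of the page
    x1: float
    y1: float  # top edge, in points from the bottom of the page


def footer_spans(spans, footer_height=50.0):
    """Return the spans lying entirely within the bottom `footer_height` points."""
    return [s for s in spans if s.y1 <= footer_height]


# Made-up example page: one body line plus a page number and a link in the footer.
page = [
    Span("Body text", 72, 400, 300, 412),
    Span("Page 12", 280, 30, 330, 40),
    Span("https://example.com", 72, 30, 200, 40),
]
to_remove = footer_spans(page)
```

Checking a handful of pages this way first is a cheap way to confirm the footer really sits in the same band everywhere before deleting anything.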

    Also try qpdf to convert the PDF into QDF format; then you can open it in a text editor and find the element you want to remove. Look at a few pages as examples, find the pattern, and do a regex replace. Make sure to keep a copy and check the diff before accepting it.
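The regex step could look something like the sketch below. It assumes the file was first converted with `qpdf --qdf --object-streams=disable in.pdf out.qdf`, that the footer is drawn in its own `BT ... ET` text object, and that the text appears as a readable literal like `(Page 12)`; many PDFs encode text as hex strings or font-specific codes, in which case the marker pattern has to be adapted. `strip_text_objects` and the sample stream are made up for illustration. Note that a clickable link is usually also stored as a separate Link annotation, which this does not touch.

```python
import re

# One text object: BT ... ET, non-greedy, spanning lines.
# Caveat: a string literal that itself contains " ET " would confuse this.
TEXT_OBJECT = re.compile(rb"(?<!\S)BT\s.*?\sET(?!\S)", re.DOTALL)


def strip_text_objects(qdf, marker):
    """Drop every BT...ET text object whose body matches `marker`."""
    def keep_or_drop(match):
        return b"" if marker.search(match.group(0)) else match.group(0)
    return TEXT_OBJECT.sub(keep_or_drop, qdf)


# Made-up content stream: one body line and one page-number footer.
sample = (
    b"q\n"
    b"BT\n/F1 12 Tf\n72 700 Td\n(Body text) Tj\nET\n"
    b"BT\n/F1 9 Tf\n280 30 Td\n(Page 12) Tj\nET\n"
    b"Q\n"
)
cleaned = strip_text_objects(sample, re.compile(rb"\(Page \d+\)"))
```

After editing a QDF file, run it through qpdf's `fix-qdf` tool so the stream lengths and cross-reference table are corrected, then diff against the copy you kept.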

  • gi1242@lemmy.world
    2 months ago

    pdftk can split it into pages and then recombine them. Unfortunately, I’m not sure how to automatically remove an element from each page.

  • Anon518@sh.itjust.works
    2 months ago

    Why did you link to Reddit? Can a mod/admin do something about this? People keep advertising Reddit here.

      • Anon518@sh.itjust.works
        2 months ago

        Use an archive site. The OP asked a question. He’s linking to someone on Reddit asking the same question??

    • Vendetta9076@sh.itjust.works
      2 months ago

      Like it or not, Reddit isn’t an illegal site or anything. Asking mods to do something about linking to Reddit is like asking them to do something about linking to Twitter. It’s not even an “advertisement”.