• fjordo@feddit.uk · 2 months ago

    I wish these companies would realise that acting like this is a very fast way to get scraping outlawed altogether, which is a shame, because it can be genuinely useful (archival, automation, etc.).

  • klu9@lemmy.ca · 2 months ago

    The Linux Mint forums have been knocked offline multiple times over the last few months, to the point where the admins had to block all Chinese and Brazilian IPs for a while.

    • deeferg@lemmy.world · 2 months ago

      This is the first I’ve heard about Brazil in this type of cyber attack. Is it re-routed traffic going there or are there a large number of Brazilian bot farms now?

      • klu9@lemmy.ca · 2 months ago

        I don’t know why or how; I just know that the admins saw the servers being overwhelmed by traffic from Brazilian IPs and blocked it for a while.

  • grue@lemmy.world · 2 months ago

    ELI5 why the AI companies can’t just clone the git repos and do all the slicing and dicing (running git blame etc.) locally instead of running expensive queries on the projects’ servers?
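
    Something like the sketch below is what I mean (the repo URL and file are just placeholders): clone once, then run git blame against the local copy instead of hitting the project’s servers for every file.

```python
# Rough sketch only: clone once, then answer "who touched this line" questions
# from the local copy instead of hitting the project's web frontend per file.
# The repo URL and file path are placeholders, not a real target.
import subprocess
from pathlib import Path

REPO_URL = "https://example.org/some-project.git"  # placeholder
CLONE_DIR = Path("/tmp/some-project")

def ensure_clone() -> None:
    """Clone the repo if it isn't there yet; otherwise just fetch updates."""
    if CLONE_DIR.exists():
        subprocess.run(["git", "-C", str(CLONE_DIR), "fetch", "--all"], check=True)
    else:
        subprocess.run(["git", "clone", REPO_URL, str(CLONE_DIR)], check=True)

def blame(path: str) -> str:
    """Run git blame on one file in the local clone and return its output."""
    result = subprocess.run(
        ["git", "-C", str(CLONE_DIR), "blame", "--line-porcelain", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout

if __name__ == "__main__":
    ensure_clone()
    print(blame("README.md"))  # placeholder file
```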

    • zovits@lemmy.world · 2 months ago

      It takes more effort and results in a static snapshot, without being able to track the evolution of the project. (Disclaimer: I don’t work with AI, but I’d bet this is the reason. I don’t intend to defend those scraping twatwaffles in any way, just to offer a possible explanation.)

  • Fijxu@programming.dev · 2 months ago

    AI scraping is so cancerous. I host a public RedLib instance (redlib.nadeko.net), and thanks to BingBot and the Amazon bots, my instance was constantly rate limited because the number of requests they make is insane. What makes me even angrier is that these fucking fuckers use free, privacy-respecting services to access Reddit and scrape it. THEY CAN’T BE SO GREEDY. Hopefully, blocking their user-agent works fine ;)
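
    For what it’s worth, the user-agent check can be as simple as the sketch below; the bot names and the wrapped app are only an illustration, and a determined scraper can spoof its user-agent anyway.

```python
# Rough sketch of user-agent blocking as a WSGI middleware.
# The bot substrings and the wrapped app are illustrative only; a scraper can
# trivially spoof its user-agent, so this is a first line of defence at best.
BLOCKED_AGENTS = ("bingbot", "amazonbot", "gptbot", "ccbot")  # example substrings

class BlockBots:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(bot in ua for bot in BLOCKED_AGENTS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)

# Usage (hypothetical): application = BlockBots(application)
```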

    • enrich@programming.dev · 1 month ago

      I posted on your guestbook but the link was broken.

      I’d say be wary of the Anubis author.

      I noticed you started using Anubis recently. Take a look here: https://github.com/Xe/x/issues/701, and also PRs 702, 703, and 704.

      - She made GNOME say something the project doesn’t agree with.
      - She tried to push her beliefs where it was unnecessary and disrespectful.
      - She still refuses to remove things in her code that are disrespectful; some are mere comments that serve no real purpose.
      - She refused to accept PRs, discuss changes, or accept dictionary definitions, seemingly because of her ego.
      - After all that, she locked the conversations in the issue/PRs, so nobody else can show support now.

      If she has a belief, there are other mediums and ways to express it; why like this?

      This is unwelcoming and definitely not FOSS spirit.

  • MonkderVierte@lemmy.ml · 2 months ago (edited)

    Assuming we could build a new internet from the ground up, what would be the solution? IPFS for load-balancing?

  • 𝕸𝖔𝖘𝖘@infosec.pub · 2 months ago

    Fail2ban should add all those scraper IPs, and we need to just flat-out block them. Or send them to those mazes. Or redirect them back to themselves, lol.
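
    The core of a fail2ban-style ban is roughly the sketch below (the log path, threshold, and ipset name are made up for illustration); a real setup would let fail2ban’s own jails, filters, and actions handle it.

```python
# Rough fail2ban-style sketch: count requests per IP in an access log and
# ban the worst offenders. The log path, threshold, and ipset name are made
# up; a real setup would use fail2ban's own jail/filter/action config.
import re
import subprocess
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"   # placeholder
THRESHOLD = 1000                         # requests before an IP gets banned
IPSET_NAME = "scraper-ban"               # assumes this ipset already exists

ip_re = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3}) ")

counts = Counter()
with open(LOG_PATH) as log:
    for line in log:
        match = ip_re.match(line)
        if match:
            counts[match.group(1)] += 1

for ip, hits in counts.items():
    if hits > THRESHOLD:
        # "-exist" keeps repeated runs from erroring on already-banned IPs.
        subprocess.run(["ipset", "add", IPSET_NAME, ip, "-exist"], check=False)
```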