As someone who’s been on the web since the 90s, I hate this.
The web was designed to be user agent agnostic. Desktop, phone, fridge, AI agents, curl, Python script: whatever agent you are using shouldn’t matter for access. That’s the whole point of the open internet, period.
Same. And just because page size is “low” doesn’t mean shit when they’re flooding requests. Try having public research data and watch how much your costs go up just due to load balancer throughput.
They did have a lot of concerns about abuse, though, and you can see that in the way the cookies debate went before they were supported in their current form. I think AI crawlers tanking bandwidth for websites and misusing the data they scrape would 100% be something the Mozilla from back then would’ve had concerns over allowing or encouraging.
You’re conflating two different issues. The topic is “who is the web for?”, not bandwidth distribution and optimization.
If an LLM bot is being abusive, then that’s no different from any other user agent behaving like this, and we should expand these protections against intentional/unintentional DDoS irrespective of user agent.
I think your starting point (allowing bot user agents to crawl the web has overlooked benefits) is a good one, but things aren’t black and white: there are clear drawbacks, too. Bots obviously have orders of magnitude higher potential for abuse, to the point where bot traffic, as it currently stands in the real world, is qualitatively different from human traffic.
“we should expand these protections against intentional/unintentional DDoS irrespective of user agent.”
Sure, but targeted regulation based on heuristics (in this case, user agent) is also a widely accepted practice. DUI laws exist, even though the goals (fewer murders and safer roads) are already separately regulated.
Would it be nice if we didn’t have to do this? Or there were some other solution? Sure, but I have no idea where to even start, unfortunately.
Open until your server is down because LLMs are overloading it.
At my company, we had to implement all sorts of WAF rules precisely for that reason. Those things are fucking aggressive.
Overloading from 200 KB of HTML? We’re not in the dial-up era anymore.
Instructions unclear, built whole site with nested tables.
Each one had better be in its own iframe.