• FauxLiving@lemmy.world
    English
    arrow-up
    16
    arrow-down
    22
    ·
    edit-2
    4 days ago

    The number of people who just react to the headline in the comments on these kinds of articles is always surprising.

    Your browser acts as an agent too, you don’t manually visit every script link, image source and CSS file. Everyone has experienced how annoying it is to have your browser be targeted by Cloudflare.

    There’s a pretty major difference between a human user loading a page and having it summarized and a bot that is scraping 1500 pages/second.

    Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted. They exist to provide their clients with services, including bot mitigation. But a user initiated operation isn’t the same as a bot.

    Which is the point of the article and the article’s title.

    It isn’t clear why OP had to alter the headline to bait the anti-ai crowd.

    • snooggums@lemmy.world
      English
      arrow-up
      16
      arrow-down
      3
      ·
      4 days ago

      But a user initiated operation isn’t the same as a bot.

      Oh fuck off with that AI company propaganda.

      The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It’s the same fucking thing.

      Web crawlers for search engines don’t scrape pages every time a user searches like AI does. Both web crawlers and scrapers are bots, and how a human initiates their operation, scheduled or not, doesn’t matter as much as the fact that they do things very differently and only one of the two respects robots.txt.

      • FauxLiving@lemmy.world
        English
        arrow-up
        4
        arrow-down
        10
        ·
        4 days ago

        There’s no difference in server load between a user looking at a page and a user using an AI tool to summarize the page.

        The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It’s the same fucking thing.

        You either didn’t read the article or are deliberately making bad faith arguments. The entire point of the article is that the traffic that they’re referring to is initiated by a user, just like when you type an address into your browser’s address bar.

        This traffic, initiated by a user, creates the same server load as that same user loading the page in a browser.

        Yes, mass scraping of web pages creates a bunch of server load. This was the case before AI was even a thing.

        This situation is like Cloudflare presenting a captcha in order to load each individual image, CSS or JavaScript asset into a web browser because bot traffic pretends to be a browser.

        I don’t think it’s too hard to understand that a bot pretending to be a browser and a human operated browser are two completely different things and classifying them as the same (and captchaing them) would be a classification error.

        This is exactly the same kind of error. Even if you personally believe that users using AI tools should be blocked, not everyone has the same opinion. If Cloudflare can’t distinguish between bot requests and human requests then their customers can’t opt out and allow their users to use AI tools even if they want to.

        • ubergeek@lemmy.today
          English
          arrow-up
          2
          arrow-down
          1
          ·
          3 days ago

          There’s no difference in server load between a user looking at a page and a user using an AI tool to summarize the page.

          There is, in scale.

        • snooggums@lemmy.world
          English
          arrow-up
          3
          arrow-down
          2
          ·
          4 days ago

          There is no difference between emptying a glass of water and draining a swimming pool either, if you ignore the total volume of water.

          • FauxLiving@lemmy.world
            English
            arrow-up
            3
            arrow-down
            4
            ·
            edit-2
            4 days ago

            I, too, can make any argument sound silly if I want to argue in bad faith.

            A user cannot physically generate as much traffic as a bot.

            Just like a glass of water cannot physically contain as much water as a swimming pool, so pretending the two are equal is ignorant in both cases.

            • snooggums@lemmy.world
              English
              arrow-up
              4
              arrow-down
              2
              ·
              4 days ago

              A user cannot physically generate as much traffic as a bot.

              You are so close to getting it!

                • snooggums@lemmy.world
                  English
                  arrow-up
                  3
                  arrow-down
                  2
                  ·
                  edit-2
                  4 days ago

                  The AI doesn’t just do a web search and display a page, it grabs the search results and scrapes multiple pages far faster than a person could.

                  It doesn’t matter whether a human initiated it when the load on the website is far, far higher and more intrusive in a shorter period of time with AI compared to a human doing a web search and reading the content themselves.

                  • FauxLiving@lemmy.world
                    English
                    arrow-up
                    3
                    arrow-down
                    2
                    ·
                    4 days ago

                    It creates web requests faster than a human could. It does not create web requests as fast as possible like a crawler does.

                    Websites can handle a lot of human user traffic, even if some human users are making 5x the requests of other users due to using automation tools (like LLM summarization).

                    A website cannot handle a single bot which, by itself, can generate tens of millions of times as much traffic as a human.

                    Cloudflare’s method of detecting bots is to attempt to fingerprint the browser and user behavior to detect automations which are usually run in environments that can’t render the content. They did this because, until now, users did not use automation tools so detecting and blocking automation tools was a way to get most of the bots.

                    Now, users do use automation tools and so this method of classification is dated and misclassifying human generated traffic.

    • ubergeek@lemmy.today
      English
      arrow-up
      3
      arrow-down
      1
      ·
      3 days ago

      Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted

      Except, they don’t. It’s a toggle, available to users, and by default, allows Perplexity’s scraping.

    • OmgItBurns@discuss.online
      English
      arrow-up
      5
      ·
      4 days ago

      I think part of the issue is that it does act more like a search engine crawler than a traditional user. A lot of sites rely on real human traffic for revenue (serving ads, requests to sign up for Patreon, using affiliate links, etc) that gets bypassed by these bots. Hell in some cases the people running the sites are just looking for interaction. So while there is a spike in traffic, and potentially cost, the people running these sites aren’t getting the benefit of that traffic.

      Basically, these have the same issues as the summaries that Google does in their search results but, potentially, have a much larger impact on the host’s bandwidth.

    • _cryptagion [he/him]@lemmy.dbzer0.com
      English
      arrow-up
      5
      arrow-down
      1
      ·
      4 days ago

      Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted. They exist to provide their clients with services, including bot mitigation.

      Well I suppose it’s a good thing then that the anti-AI shield is opt-in, and Cloudflare isn’t making any decisions for anyone on whether or not AI scrapers get to visit their pages. That little bit of context makes your entire argument fall apart.

      • FauxLiving@lemmy.world
        English
        arrow-up
        3
        arrow-down
        3
        ·
        4 days ago

        It isn’t opt in.

        You can either block all bot page scraping, which also blocks user-initiated AI tools, or you can block no traffic.

        There isn’t an option to block bot page scraping but allow user initiated AI tools.

        Because, as the article points out, Cloudflare is not able to distinguish between the two.

        • ubergeek@lemmy.today
          English
          arrow-up
          2
          ·
          4 days ago

          That’s not true, I just viewed my panel in CF, and Perplexity is an optional block, which by default is off.

          • FauxLiving@lemmy.world
            English
            arrow-up
            3
            arrow-down
            2
            ·
            4 days ago

            There’s a pretty significant difference in request rate. A tool trying to search and summarize will hit a search engine once, and each website maybe 5 times (if every search engine link points to the site).

            A bot trying to scrape content from a website can generate thousands or tens of thousands of requests per second.
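
To make the request-rate distinction above concrete, here is a minimal sketch of a sliding-window classifier. It is only an illustration: the names `RateClassifier` and `classify_burst`, the 10-second window, and the 50-request threshold are all hypothetical and are not taken from Cloudflare or from the article.

```python
from collections import deque


class RateClassifier:
    """Label a client by its request count inside a sliding time window.

    The window length and threshold are made-up illustrative values.
    """

    def __init__(self, window_seconds=10.0, max_requests=50):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self.timestamps = deque()

    def observe(self, now):
        """Record a request at time `now` (in seconds) and return a label."""
        self.timestamps.append(now)
        # Drop requests that have fallen out of the window.
        while self.timestamps and now - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) > self.max_requests:
            return "bulk"
        return "interactive"


def classify_burst(request_times, **kwargs):
    """Return "bulk" if the client ever exceeds the threshold, else "interactive"."""
    classifier = RateClassifier(**kwargs)
    label = "interactive"
    for t in request_times:
        if classifier.observe(t) == "bulk":
            label = "bulk"
    return label


# A summarization tool fetching about 5 pages for one user query.
summarizer = [0.0, 0.2, 0.4, 0.6, 0.8]
# A scraper issuing 1000 requests within a single second.
scraper = [i / 1000 for i in range(1000)]

print(classify_burst(summarizer))
print(classify_burst(scraper))
```

A rate threshold like this separates the two traffic patterns described above without inspecting the client at all, though a real deployment would presumably combine it with other signals.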

    • HarkMahlberg@kbin.earth
      arrow-up
      2
      arrow-down
      1
      ·
      4 days ago

      In a better timeline, we wouldn’t need to cheer the victory of one megacorporation over another, they would both be the losers. But also people are still capable of holding two thoughts simultaneously.

      For instance, we’d all be happy to see Apple lose the Epic Games lawsuit and be forced out of their monopoly on app stores on iOS. But those same people are aware it would allow Epic to continue being a disgusting company.

      bait the anti-ai crowd

      Oh I see lol

      • FauxLiving@lemmy.world
        English
        arrow-up
        3
        arrow-down
        3
        ·
        4 days ago

        What does any of that have to do with the fact that Cloudflare isn’t able to classify traffic in order to distinguish between human user generated traffic and mass scraping bot traffic?

        If they’re incapable of distinguishing the two, then their customers are having legitimate user requests blocked by Cloudflare with no ability to opt out.

        Oh I see lol

        Yeah, I think people who’re unable to think rationally about a problem because they made up their mind before knowing any of the details are intellectually lazy.

    • unpossum@sh.itjust.works
      English
      arrow-up
      3
      arrow-down
      5
      ·
      4 days ago

      Thank you for trying to fight the irrational anti-AI brainrot on lemmy! It’s probably a lost cause, but your efforts are appreciated :)