The one-liner:

dd if=/dev/zero bs=1G count=10 | gzip -c > 10GB.gz

This is brilliant.
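
Why it works, for anyone trying it: dd streams 10 GB of zero bytes into gzip, and because the input is all zeros the archive lands at roughly 10 MB on disk, while whatever decompresses it on the other end has to materialize the full 10 GB. The usual move is to serve the file pre-compressed (Content-Encoding: gzip) only to clients you have already flagged as scrapers. A quick sanity check after running it, assuming GNU dd and gzip:

ls -lh 10GB.gz          # roughly 10 MB on disk, despite 10 GB of input
zcat 10GB.gz | wc -c    # prints 10737418240 (the full 10 GiB), but takes a while
# Note: gzip -l under-reports the original size here, because the gzip
# header only stores it modulo 4 GiB.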

    • 👍Maximum Derek👍@discuss.tchncs.de · 4 months ago

      Most often because they don’t download any of the CSS or external JS files from the pages they scrape. But there are a lot of other patterns you can detect once you have their traffic logs loaded in a time series database. I used an ELK stack back in the day. (A grep-level version of the CSS/JS check is sketched at the end of this thread.)

      • sugar_in_your_tea@sh.itjust.works · 4 months ago

        That sounds like a lot of effort. Are there any tools that get like 80% of the way there? Like something I could plug into Caddy, nginx, or haproxy?

        • 👍Maximum Derek👍@discuss.tchncs.de · 4 months ago

          My experience is with systems that handle nearly 1000 pageviews per second. We did use a spread of haproxy servers to handle routing and SNI, but they were being fed offender lists by external analysis tools (built in-house). (A minimal sketch of that kind of feed is at the end of this thread.)
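
A grep-level sketch of the "no CSS/JS" check mentioned above, for anyone who wants a starting point before standing up an ELK or time-series setup. It assumes a combined-format access log at /var/log/nginx/access.log (the path and format are just placeholders), where field 1 is the client IP and field 7 is the request path:

# Flag client IPs that request pages but never fetch a single .css or .js asset.
awk '$7 ~ /\.(css|js)([?]|$)/ { assets[$1]++; next }
     { pages[$1]++ }
     END { for (ip in pages) if (!(ip in assets)) print pages[ip], ip }' \
    /var/log/nginx/access.log | sort -rn | head -20

High request counts with zero asset fetches are good candidates for an offender list, though API clients and feed readers also skip assets, so treat it as a first pass rather than a verdict.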
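
And a minimal sketch of the "feed offender lists to haproxy" side, nothing like the in-house tooling described above: it assumes haproxy already references an ACL file (say, /etc/haproxy/blocklist.acl behind an http-request deny rule) and exposes an admin-level runtime socket at /var/run/haproxy.sock; both paths are made-up examples.

# Push newly detected offender IPs (one per line in offenders.txt) into the
# running haproxy ACL via the runtime API, without a reload.
while read -r ip; do
  echo "add acl /etc/haproxy/blocklist.acl $ip" | socat stdio /var/run/haproxy.sock
done < offenders.txt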