Joined 3 years ago
Cake day: November 2nd, 2022
blob42@lemmy.ml to Selfhosted@lemmy.world • Anubis is awesome! Stopping (AI)crawlbots
8 months ago

I am planning to try it out, but for Caddy users I came up with a solution that works, after being bombarded by AI crawlers for weeks.
It is a custom Caddy CEL expression filter coupled with caddy-ratelimit and caddy-defender.
Now here's the fun part: the defender plugin can serve garbage as its response, so when a request from an AI crawler matches, the garbage poisons their training dataset.
Originally I relied only on the rate limiter and noticed that the AI bots kept retrying whenever the limit reset. Once I introduced data poisoning they all stopped :)
    git.blob42.xyz {
        @bot <<CEL
            header({'Accept-Language': 'zh-CN'}) || header_regexp('User-Agent', '(?i:(.*bot.*|.*crawler.*|.*meta.*|.*google.*|.*microsoft.*|.*spider.*))')
        CEL

        abort @bot

        defender garbage {
            ranges aws azurepubliccloud deepseek gcloud githubcopilot openai 47.0.0.0/8
        }

        rate_limit {
            zone dynamic_botstop {
                match {
                    method GET
                    # to use with defender
                    #header X-RateLimit-Apply true
                    #not header LetMeThrough 1
                }
                key {remote_ip}
                events 1500
                window 30s
                #events 10
                #window 1m
            }
        }

        reverse_proxy upstream.server:4242

        handle_errors 429 {
            respond "429: Rate limit exceeded."
        }
    }

If I am not mistaken, the 47.0.0.0/8 IP block belongs to Alibaba Cloud.
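Before deploying a matcher this broad, it can help to check which user agents it would actually catch. The following is a rough Python stand-in for the `@bot` expression, not Caddy code: `is_bot` is a hypothetical helper, and it assumes that `header(...)` compares the value exactly and that `header_regexp` performs an unanchored search.

```python
import re

# Same pattern as in the @bot matcher; Python's re also accepts the scoped (?i:...) flag.
BOT_UA = re.compile(
    r"(?i:(.*bot.*|.*crawler.*|.*meta.*|.*google.*|.*microsoft.*|.*spider.*))"
)


def is_bot(headers: dict) -> bool:
    """Approximate the CEL expression (hypothetical helper, not part of Caddy)."""
    # Assumption: header({'Accept-Language': 'zh-CN'}) is an exact value match.
    if headers.get("Accept-Language") == "zh-CN":
        return True
    # Assumption: header_regexp searches anywhere in the User-Agent value.
    return BOT_UA.search(headers.get("User-Agent", "")) is not None
```

Feeding it a few user agents from your own access log should show quickly whether terms like `meta` or `google` also hit visitors you want to keep.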

Solution:
Use a VPS somewhere that will act as a proxy to your home server.
Connect the VPS to your home with WireGuard/Tailscale and reverse proxy to your game server.
Now the public IP will be your VPS's. Host it in another nearby country.
If the IP does get flagged, change the VPS provider and keep the exact same setup.
Read /r/selfhosted or use GPT for a step-by-step guide.
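As a rough illustration of what the VPS does in this setup, here is a minimal TCP relay sketch in Python. It assumes the game speaks TCP (UDP games would need a different approach) and that the home server is reachable over the tunnel; `start_relay`, the `10.0.0.2` tunnel address, and port `25565` are made-up examples. In practice a packaged proxy or a firewall forwarding rule would likely do this job.

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes in one direction until the source side closes."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass
    finally:
        writer.close()


async def start_relay(listen_host, listen_port, target_host, target_port):
    """Listen on the VPS and forward each connection to the target over the tunnel."""

    async def handle(client_reader, client_writer):
        try:
            up_reader, up_writer = await asyncio.open_connection(target_host, target_port)
        except OSError:
            # Home server unreachable (tunnel down?): drop the client.
            client_writer.close()
            return
        # Relay both directions at once.
        await asyncio.gather(
            pipe(client_reader, up_writer),
            pipe(up_reader, client_writer),
        )

    return await asyncio.start_server(handle, listen_host, listen_port)


async def main():
    # Hypothetical values: 10.0.0.2 is the home server's tunnel address.
    server = await start_relay("0.0.0.0", 25565, "10.0.0.2", 25565)
    async with server:
        await server.serve_forever()
```

Running `asyncio.run(main())` on the VPS would start the relay. Players connect to the VPS address, and the home IP never appears in public.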