Funny thing is ‘Local ML’ tinkerers largely can’t afford GPUs in the US either.
The 5090 is ludicrously expensive for its VRAM pool, and so is the 4090, which is all but out of stock. Nvidia will only sell you a decent-sized pool for $10K. Hence non-techbros here have either been building used RTX 3090 boxes (the last affordable compute GPU Nvidia ever sold), building EPYC homelabs for CPU offloading, or trying to buy those modded 48GB 4090s back from China.
The insane supply chain is something like this:
Taiwan GPU dies -> China (board assembly)
China GPU boards -> US (retail)
US GPU boards -> smuggled back into China (for 48GB VRAM mods)
Modded GPU boards -> sold back to the US
All because Nvidia is playing VRAM cartel, and AMD, inexplicably, is uninterested in competing when it could sell 48GB 7900s for barely any extra cost.
You could also buy an Apple Studio, with its large pool of unified RAM, for a price similar to a 5090's. It's not as fast, of course, but it can run models that need more memory.
Apple's memory pricing is still pretty bad: $4K for 96GB, $5.6K for 256GB, $10K for 512GB. You can get 128GB on an M4 Max for $3.5K, at the cost of a narrower memory bus, so it's even slower. Generally, though, EPYC + a 3090 or 4090 makes a lot more sense.
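To make the comparison concrete, here's a quick $-per-GB tally of the tiers above. The Mac prices are the ones quoted in this thread; the used 3090 price is my own ballpark assumption, not a quote.

```python
# Rough $/GB of model-accessible memory for the options discussed above.
# Mac prices are from the comment; the ~$800 used 3090 figure is an assumption.
options = {
    "Mac Studio 96GB":  (4000, 96),
    "Mac Studio 256GB": (5600, 256),
    "Mac Studio 512GB": (10000, 512),
    "M4 Max 128GB":     (3500, 128),
    "Used RTX 3090 24GB (assumed ~$800)": (800, 24),
}

for name, (price_usd, gb) in options.items():
    print(f"{name}: ${price_usd / gb:.2f}/GB")
```

Which is roughly $42/GB at the low Studio tier, falling to about $20/GB at 512GB, versus ~$33/GB for a used 3090 (which is far faster per GB, just capped at 24GB per card).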
SOTA quantization for these is mostly DIY; there aren't many MLX DWQs or trellis-quantized GGUFs floating around.
And if you want to finetune or tinker instead of just running inference, you're at an enormous disadvantage on Apple hardware. AMD's Strix Halo boards are far more compatible, but they're not standalone yet and still kinda rare.