• SabinStargem@lemmy.today
    16 hours ago
You can use something like KoboldCpp on Linux, which lets a model run from RAM and VRAM combined. O’course, not as fast as pure VRAM or the Mac unified-memory approach, but it is an option. I use my 128 GB of RAM with some GPUs for running models.
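For the curious, a launch along these lines does the RAM/VRAM split (the model path and layer count are placeholders; `--gpulayers` controls how many layers go to VRAM, with the rest staying in system RAM):

```shell
# Sketch of a KoboldCpp launch with partial GPU offload.
# Model filename and the layer/context numbers are illustrative.
python koboldcpp.py --model ./my-model-q8_0.gguf \
    --usecublas \
    --gpulayers 30 \
    --contextsize 8192
```

Raise `--gpulayers` until you run out of VRAM; whatever doesn’t fit stays in RAM, just slower.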

      • SabinStargem@lemmy.today
        1 minute ago

        Speed depends on how much of the model sits in VRAM, and on whether that model is dense or MoE. The RAM’s benefit is more about being able to run the model in the first place. In any case, a dense Qwen3.6 27b would take up about 27–33 GB of memory, plus whatever context size you set.
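Back-of-the-envelope version of that estimate (assuming a ~Q8 quant at roughly 1 byte per weight and an fp16 KV cache; the layer/head numbers are made up for illustration, not the real model config):

```python
# Rough memory math for a dense model: weights + KV cache for the context.
# ~1 byte per weight assumes a Q8-ish quant; all figures are illustrative.
def weights_gb(params_billion, bytes_per_weight=1.0):
    return params_billion * bytes_per_weight  # 27B @ 1 byte/weight -> 27 GB

def kv_cache_gb(context_tokens, layers, kv_heads, head_dim, bytes_per_elem=2):
    # factor of 2 covers keys + values; fp16 = 2 bytes per element
    return 2 * context_tokens * layers * kv_heads * head_dim * bytes_per_elem / 1e9

total = weights_gb(27) + kv_cache_gb(8192, layers=48, kv_heads=8, head_dim=128)
print(round(total, 1))  # lands inside the quoted 27-33 GB range
```

Bigger context or a fatter quant pushes you toward the top of that range fast.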

        The upcoming implementation of MTP (multi-token prediction) will increase the size of models, but in exchange they will also run faster: about a 30%-ish boost for dense models, a bit less for Mixture-of-Experts varieties, from the looks of it.