• Norah (pup/it/she)@lemmy.blahaj.zone
    English
    arrow-up
    9
    ·
    19 days ago

    LLMs don’t benefit from economies of scale. Usually, each successive generation of a technology is cheaper to produce, or stays the same but with much greater efficiency/power/efficacy/etc. For LLMs, each successive generation costs much more to produce for lesser and lesser benefits.

    • humanspiral@lemmy.ca
      arrow-up
      1
      arrow-down
      1
      ·
      19 days ago

      LLMs don’t benefit from economies of scale.

      For training, scale in compute and memory does matter, including large networked clusters of GPUs. No money is made in training. For inference (where money is charged or benefits are obtained), memory matters more, but compute is still extremely important. At the Skynet level, models over 512 GB are used. But at the consumer level, and at every other level, smaller models are much faster. 16 GB, 24 GB, 32 GB, 96 GB, 128 GB, and 512 GB are each somewhat approachable, and each of these thresholds is some version of scale.
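
      To make the memory thresholds concrete, here is a rough sketch that picks the smallest of those tiers a model's weights would fit in. It assumes the footprint is just parameter count times bytes per parameter, ignoring KV cache, activations, and runtime overhead, so real requirements are higher. The function names and the example model sizes are made up for illustration.

```python
# Memory tiers (in GB) listed in the comment above.
TIERS_GB = [16, 24, 32, 96, 128, 512]


def weight_memory_gb(params_billions, bytes_per_param):
    """Approximate weight footprint in GB (1 GB = 1e9 bytes).

    Assumption: weights only; KV cache and activations are ignored.
    """
    return params_billions * bytes_per_param


def smallest_tier_gb(params_billions, bytes_per_param):
    """Return the smallest listed tier that holds the weights, or None."""
    need = weight_memory_gb(params_billions, bytes_per_param)
    for tier in TIERS_GB:
        if need <= tier:
            return tier
    return None  # larger than 512 GB: needs multiple devices


if __name__ == "__main__":
    # Hypothetical model sizes: (billions of params, bytes per param).
    for params, width in [(7, 2), (70, 0.5), (70, 2), (405, 2)]:
        print(params, width, smallest_tier_gb(params, width))
```

      Under these assumptions a 70B model drops from the 512 GB tier at 16-bit weights to the 96 GB tier at 4-bit weights, which is the kind of threshold jump the comment is describing.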

      each successive generation of a technology is cheaper to produce, or stays the same but with much greater efficiency/power/efficacy/etc.

      Take the GPU makers' roadmaps, looking only at Nvidia for simplicity: Rubin will have 5 times the bandwidth, double the memory, and at least double the compute, for what is likely 2x the cost and less than 2x the power. A big issue with bubble status is the fairly sharp depreciation of existing leading-edge devices. Bigger memory alone is always a faster overall solution than networking/connections.
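
      As a quick back-of-the-envelope check, the ratios quoted above can be divided by the cost ratio to see what each dollar buys. The numbers below are only the figures from this comment (with "at least double" taken as exactly 2x and "likely 2x the cost" as exactly 2x), not vendor specifications, and the names are illustrative.

```python
# Generation-over-generation ratios as quoted in the comment above.
# Assumption: "at least double" compute is taken as exactly 2.0.
GEN_RATIOS = {"bandwidth": 5.0, "memory": 2.0, "compute": 2.0}
COST_RATIO = 2.0  # "likely 2x the cost", taken at face value


def per_dollar(ratios, cost_ratio):
    """Divide each improvement ratio by the cost ratio."""
    return {name: value / cost_ratio for name, value in ratios.items()}


if __name__ == "__main__":
    for name, gain in per_dollar(GEN_RATIOS, COST_RATIO).items():
        print(f"{name}: {gain:.2f}x per dollar")
```

      On these figures, bandwidth per dollar improves 2.5x while memory and compute per dollar stay roughly flat, so the per-dollar gain would come mostly from bandwidth and from power efficiency.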

      For LLMs, each successive generation costs much more to produce for lesser and lesser benefits.

      Models with more parameters are slower to train on the same data set than models with fewer parameters. Skynet ambitions do involve ever-larger parameter counts, and sure, training data gets added rather than removed. Each generation also brings innovation on the smaller/efficiency side, though Skynet funding goes to the former.
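
      One way to put a number on "bigger is slower for the same data" is the common rule of thumb that training a dense transformer takes about 6 x parameters x tokens floating-point operations. That approximation is not from this thread and is only a rough estimate; the function names and model sizes below are made up for illustration.

```python
def training_flops(n_params, n_tokens):
    """Rule-of-thumb training cost: about 6 FLOPs per parameter per token.

    Assumption: dense transformer, forward plus backward pass.
    """
    return 6 * n_params * n_tokens


def relative_cost(n_params_big, n_params_small):
    """Cost ratio of two models trained on the same number of tokens."""
    return n_params_big / n_params_small


if __name__ == "__main__":
    tokens = 1_000_000_000_000  # hypothetical 1T-token data set
    small = training_flops(7_000_000_000, tokens)
    big = training_flops(70_000_000_000, tokens)
    print(f"small: {small:.2e} FLOPs, big: {big:.2e} FLOPs")
    print(relative_cost(70_000_000_000, 7_000_000_000))
```

      Under this estimate, training cost on a fixed data set grows in direct proportion to parameter count, before counting any extra data the larger model is given.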