This is the technology worth trillions of dollars huh

HarkMahlberg@kbin.earth · 6 months ago

This is the technology worth trillions of dollars huh

fading_person@lemmy.zip · 6 months ago

Thank you very much for taking the time to explain this. If you don’t mind, could you recommend some references for further reading on how LLMs work internally?

cyberwolfie@lemmy.ml · 6 months ago

You could look up 3Blue1Brown’s explainers on YouTube. They are pretty good and show a lot of visual examples. He also has a lot of videos on other areas of math.

fading_person@lemmy.zip · 6 months ago

I’ll check it later, thanks

JustTesting@lemmy.hogru.ch · 6 months ago

For byte pair encoding (how those tokens get created), I think https://bpemb.h-its.org/ does a good job of giving an overview. After that, I’d say self-attention, from the 2017 paper “Attention Is All You Need”, is the seminal work that all of this is based on, and the most crucial to understand. https://jtlicardo.com/blog/self-attention-mechanism does a good job of explaining it. And https://jalammar.github.io/illustrated-transformer/ is probably the best explanation out there of the transformer architecture (which LLMs are built on). Transformers are made up of a lot of self-attention layers.
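
To make the self-attention part a bit more concrete, here is a rough sketch in plain Python. It assumes a single attention head, no masking, and tiny hand-written matrices; the names `self_attention`, `w_q`, `w_k` and `w_v` are made up for illustration and are not taken from any of the linked pages. Real models use learned weight matrices, many heads, and a tensor library instead of nested lists.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain nested-list matrix multiplication: (n x k) @ (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def self_attention(x, w_q, w_k, w_v):
    # Project every token vector into query, key and value vectors.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    # Score every token against every other token: Q @ K^T.
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    # Scale by sqrt(d_k), then softmax each row into attention weights.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output vector is a weighted mix of all the value vectors.
    return matmul(weights, v), weights

# Two "tokens" with 2-dimensional embeddings; identity projections keep it readable.
x = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` says how much one token "looks at" every token (itself included), and the rows always sum to 1 because of the softmax.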

It does help if you know how matrix multiplication works, and how the backpropagation algorithm is used to train these things. I don’t know of a good, easy explanation off the top of my head, but https://xnought.github.io/backprop-explainer/ looks quite good.
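
As a very small taste of what training looks like, here is a hedged sketch of the chain-rule idea behind backpropagation, shrunk down to a single "neuron" with one weight and one bias. The function name `train`, the learning rate, and the toy data are all assumptions picked for illustration. Real backpropagation applies the same chain rule layer by layer through a much deeper network.

```python
def train(xs, ys, lr=0.05, steps=2000):
    # One weight and one bias: the model is pred = w * x + b.
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            pred = w * x + b       # forward pass
            err = pred - y         # d(loss)/d(pred) for loss = 0.5 * err**2
            grad_w += err * x / n  # chain rule: d(loss)/dw = err * d(pred)/dw
            grad_b += err / n      # chain rule: d(loss)/db = err * 1
        # Gradient descent step: nudge the parameters against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data generated from y = 2x + 1, so training should recover w near 2 and b near 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train(xs, ys)
```

The only thing that changes for a big network is that the gradient gets passed backwards through every layer in turn, which is where the name comes from.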

And that’s kinda it. You just make the transformers bigger, with more weights, tack on a lot of engineering around them (like being able to run code, and making them run more efficiently), exploit thousands of poor workers to fine-tune them with human feedback, and repeat that every 6-12 months forever so they can stay up to date.

fading_person@lemmy.zip · 6 months ago

Thank you very much