Because imagine spending billions on training it specifically to produce useful answers and then not even trusting it to not randomly start answering with something completely unrelated.
What matters is the outcome, not how it is achieved.
And is the outcome good? Eh, sometimes.
If that were true, then anyone with any sense would have recognized a long time ago that deterministically incorrect is a lot more valuable than occasionally, nondeterministically correct, and given up on all this language model nonsense.
A deterministic system that produces wrong output can be fixed. A nondeterministic system that produces wrong output cannot be fixed in any way that can be demonstrated conclusively.
Nondeterministic software is basically worthless in any case where accuracy or reliability are required.
“Worthless” is going a bit far.
I play Go, and AI tools have allowed computers to leave humans completely in the dust, while more deterministic approaches had gotten nowhere close to top level play.
Go has an extremely large number of variations which overwhelms the straightforward, traditional approach. Machine learning allows the computer to get better through experience, by having a bunch of games in its training data that it can pull from to evaluate possible board positions. It also benefits from the fact that, unlike language, every game has a definitive win-lose outcome. This allows AI to get stronger by playing games against itself, even starting from purely random moves.
“So what, I don’t play Go,” sure, but it’s the principle. Given a sufficiently large “probability space” and an objective “win condition” to evaluate itself against, ML algorithms can and do outperform traditional, deterministic algorithms.
The fact that people are trying to put AI on your toaster and shit doesn’t make it completely worthless. But it is massively overhyped and not applicable to most of the applications people are trying to shove it into.
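The self-play point above can be sketched with a toy game. This is a hypothetical illustration, not how any real Go engine works: it evaluates positions in Nim (take 1–3 stones; whoever takes the last stone wins) purely by random self-play, using nothing but the definitive win–lose outcome.

```python
import random

def random_playout(stones, to_move):
    """Play uniformly random moves until the game ends; return the winner (0 or 1)."""
    player = to_move
    while stones > 0:
        stones -= random.randint(1, min(3, stones))
        if stones == 0:
            return player  # taking the last stone wins
        player = 1 - player
    return player

def estimate_win_rate(stones, to_move=0, playouts=5000):
    """Estimate how often the player to move wins, purely from random self-play."""
    wins = sum(random_playout(stones, to_move) == to_move for _ in range(playouts))
    return wins / playouts

# Under perfect play, positions where stones % 4 == 0 are lost for the
# player to move. Even blind random playouts start to reveal that asymmetry:
print(estimate_win_rate(5))  # winning position: estimate above 0.5
print(estimate_win_rate(4))  # losing position: estimate well below 0.5
```

No Go knowledge, opening books, or hand-written evaluation went in; the win condition alone was enough to rank positions, which is the principle the comment is describing.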
I think using chess and Go as analogies rather misses the point. They’re not trying to get a system to automate playing a game, not really.
They are trying to get it to make intelligent decisions about complex real-world problems. Go has a very simple set of rules that are always true, never change, and are always in play. None of the complexities of real life are replicated. So its ability to play Go or chess, or even a more complicated game like a first-person shooter, is not a demonstration of its ability in the domains in which AI is being advertised for.
I think a far better test of whether a system is actually useful is what it does if it is given no input at all. Does it just sit there forever, or does it actually start doing things? Currently, every single AI system in existence would just stay idle in that scenario.
> i open a repl
> i type in nothing
> nothing happens
> shocked_pikachu.jpg
> i open a window
> i click nothing
> nothing happens
> shocked_pikachu.png
> i buy a computer
> i do not turn it on
> it does nothing
> shocked_pikachu.jxl
I am absolutely not claiming that AI is useful “in the domains in which it’s being advertised for.” I’m saying that it’s not entirely useless. Despite being overhyped, there are a handful of useful applications.
> I think a far better test of whether a system is actually useful is what it does if it is given no input at all.

What? That’s not true at all. My toaster doesn’t go out and do things on its own initiative, but it’s still very useful for making toast when I tell it to.
Maybe instead of usefulness, you mean like consciousness or actual intelligence? But that’s pure hype and bullshit. Anyone claiming that a word generator is conscious is either trying to scam you or is being scammed.
Just because someone says (as they do), “This oil will allow you to unlock the hidden power of the 90% of your brain you don’t use, thanks to our new quantum formula, now only $300 a bottle” that doesn’t mean that quantum mechanics isn’t also a real thing that has actual applications. Machine learning is the same way. It attracts all the snake oil salesmen who spout complete and utter bullshit about it, but it is a real technology that has legitimate uses despite all that.
Non-deterministic software is fine and we’ve been using it for ages. It’s usable when:

- the cost of an error is low, or
- verifying the output is much cheaper than producing it yourself.

That rules out several applications of current LLMs, but it rules in several others.
If I have to verify the output of an AI, then unless I can do the verification in 30 seconds while the work itself would somehow take me hours, it’s not useful. I can’t think of many scenarios in which verification is fast but the work itself is slow.
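For what it’s worth, “fast to verify, slow to produce” is a well-known asymmetry in computing (it is roughly the intuition behind NP problems). A toy illustration, unrelated to any specific AI task: finding the prime factors of a number takes many trial divisions, while checking a proposed factorization is a single multiplication.

```python
def verify_factorization(n, factors):
    """Checking is one multiplication chain: fast no matter how n was factored."""
    product = 1
    for f in factors:
        product *= f
    return product == n

def factor_by_trial(n):
    """Finding the factors takes up to ~sqrt(n) trial divisions: the slow part."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

n = 999_983 * 1_000_003          # product of two primes near a million
fs = factor_by_trial(n)          # ~a million division tests to find them
print(verify_factorization(n, fs))  # one pass of multiplication → True
```

Whether a given AI task has this shape is exactly the disagreement in this thread; the asymmetry itself, though, is real and common.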
This can be the case for coding. A good example is when the change is simple but involves a library you’re unfamiliar with. You can set it off and not have to read any docs, and it will be easy to check if it got the API right.
Elsewhere I gave the example of copyediting. It’s a lot quicker to check the output than to refine it yourself.
Easy-to-verify tasks are everywhere, I think. Not at the scale of seconds versus hours, but seconds versus minutes.
The comment would seem to make a lot of sense, so perhaps the VC money was the wildcard…

An inflection point may have hit for some, though? It’s been out just long enough, and has been good just long enough (kinda garbage before December 2025), that people we all respect are on board.
Head Linux dude Linus
Wolfram Alpha’s founder Stephen Wolfram
Many others now, but the big caveat is that these folks presumably Do It Right, unlike, I have to guess, a huge majority of users. Plenty will experience skill atrophy - dangerous for society at large.
Technically all LLMs are somewhat non-deterministic, because a bit of token-sampling randomness is basically required to keep the output from collapsing into repetition, though this is tuned so that you should get the same general “answer” even if it isn’t verbatim every run.
This is the most damning fucking part of it. Oh, it’s kinda ok sometimes. Fucking hell.
It could be a shitload better, but that would require sourcing accurate data instead of scraping everything off GitHub and Stack Overflow and letting it fuckin rip, bud. This fucking problem has existed since the LITERAL dawn of computing: garbage in, garbage out.
https://en.wikipedia.org/wiki/Garbage_in,_garbage_out#History
> On two occasions I have been asked, “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?” … I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

Pray tell, Mr Altman, if you were to feed the AI incorrect information, will the AI generate correct results?
There is no magically reliable source of data that will make even one LLM consistently accurate, because their underlying design requires some randomization to reflect human conversation.
Dedicated models for specific purposes, where terminology is well defined and the system is designed to be deterministic, would be a lot better for actual use. We have had those models for years, just without the pretending-to-be-conversational crap, and they were constantly improving and actually useful.
> because their underlying design requires some randomization to reflect human conversation.

That’s just false. Although the first step of creating an LLM from scratch is to initialize the weight matrices randomly (typically by sampling from a Gaussian distribution), those matrices get overwritten many times throughout pre-training and fine-tuning, as the parametric weights are finely adjusted based on the training data.
During inference, tokens pass through various layers along specific embedding vectors weighted for relevance. It’s not random at all. It’s non-deterministic, but that’s not the same thing as random.
If the training data all came from JSTOR or DevDocs or even Wikipedia, it’s going to make much more accurate inferences than if it was trained on Reddit, Quora, and Yahoo Answers.
I’m not defending AI here, but let’s keep our criticisms factual.
Except if you make the output token temperature too cold, it has a higher tendency to get stuck in loops and the like. A little bit of actual randomness is important.
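To make the temperature point concrete, here is a minimal sketch (plain Python with toy numbers, not any particular model’s API) of how temperature reshapes a next-token distribution: the logits are divided by the temperature before the softmax. As the temperature approaches zero, all probability collapses onto the single best token (fully deterministic, and prone to the loops described above); higher temperatures spread probability onto other tokens.

```python
import math

def temperature_softmax(logits, temperature):
    """Convert logits to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Greedy decoding: all probability mass on the single best token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating, for stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # toy next-token scores
print(temperature_softmax(logits, 0))    # [1.0, 0.0, 0.0] — deterministic
print(temperature_softmax(logits, 0.5))  # sharply peaked on the first token
print(temperature_softmax(logits, 2.0))  # much flatter: more randomness
```

The sampler then draws a token from whichever distribution comes out, which is where the run-to-run variation enters.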
That’s just adding noise, it’s not unique to AI. It’s also used in audio and visual design, and even cryptography.
It’s not unique to AI, no, but no one said it was. My point is that the noise is important to the functioning of the AI - and makes it even less deterministic, which also makes it poorly suited to automation in critical systems.
If generating the outcome burns the resources needed to power a small town, then even if the outcome is good, it’s still bad.
It’s about 10x more power intensive than a Google search. It’s not trivial, but it doesn’t take megawatts to power a single person’s query.
Ok, but then explain why I would care about a technology that’s 10 times less efficient than an existing, 25-year-old technology.
I’m not really here to tell you why you should care - you’re free to care about whatever you want to care about. But to explain why other people might care, it’s because it can do things a Google search can’t do. Google search can’t copy-edit your CV or cover letter. Google search can’t synthesise a bunch of different Stackoverflow answers and fit them to the exact scenario you’re talking about. LLMs can and do.
And those are two examples where the cost of an error is low: if your CV comes out with made up shit in it, you can just read through it and check (but you may not have the ability to re-write it better). If the code example doesn’t work, you’re going to run it and check anyway. (It may have a subtle bug, but so can Stackoverflow answers, and that never stopped people from using them)
If you don’t have the ability to write it better what would make you think someone would have the ability to recognize and fix the errors in their CV?
What does an error in a CV look like, to you?
Could be anything. The point is, if I don’t have the skill to write my own CV well, then I also don’t have the skill to determine whether an AI-generated CV is written well.