• merc@sh.itjust.works · 19 hours ago

    I think storyboards are a great example of how it could be used properly.

    Storyboards are a great way for someone to communicate “this is how I want it to look” in a rough way. But, a storyboard will never show up in the final movie (except maybe as fun clips during the credits or something). It’s something that helps you on your way, but along the way 100% of it is replaced.

    Similarly, the way I think of generative AI is that it’s basically a really good props department.

    In the past, if a props / graphics / FX department had to generate some text on a computer screen that looked like someone was Hacking the Planet, they’d need to come up with something that looked completely realistic. But, it would either be something hand-crafted, or they’d just grab some open-source file and spew it out on the screen. What generative AI does is digest vast amounts of data so it can come up with something that looks realistic for the prompt it was given. For something like a hacking scene, an LLM can probably generate something that’s actually much better than what the humans would make, given the time and effort required. A hacking scene that a computer security professional would think is realistic is normally way beyond the required scope. But, an LLM can probably produce one that is actually plausible to a computer security professional, because of what that LLM has been trained on. Still, it’s a prop. If there are any IP addresses or email addresses in the LLM-generated output, they may or may not work. And, for a movie prop, it might actually be worse if they do work.

    When you’re asking an AI something like “What does a selection sort algorithm look like in Rust?”, what you’re really doing is asking “What does a realistic answer to that question look like?” You’re basically asking for a prop.
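
    For reference, here’s roughly what a working answer to that question looks like (a minimal sketch, one of many valid ways to write it):

    ```rust
    // Selection sort: repeatedly find the smallest element of the
    // unsorted tail and swap it into place. O(n^2) comparisons, in-place.
    fn selection_sort<T: Ord>(items: &mut [T]) {
        for i in 0..items.len() {
            // Index of the smallest element in items[i..]
            let mut min = i;
            for j in (i + 1)..items.len() {
                if items[j] < items[min] {
                    min = j;
                }
            }
            items.swap(i, min);
        }
    }

    fn main() {
        let mut v = vec![5, 2, 9, 1, 7];
        selection_sort(&mut v);
        assert_eq!(v, vec![1, 2, 5, 7, 9]);
    }
    ```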

    Now, some props can be extremely realistic looking. Think of the cockpit of an airplane in a serious aviation drama. The props people will probably either build a very realistic cockpit, or maybe even buy one from a junkyard and fix it up. The prop will be realistic enough that even a pilot will look at it and say that it’s correctly laid out and accurate. Similarly, if you ask an LLM to produce code for you, sometimes it will give you something that is realistic enough that it actually works.

    Having said that, there’s a fundamental difference between “What is the answer to this question?” and “What would a realistic answer to this question look like?” That’s the core flaw of LLMs. Answering a question requires understanding the question. Simulating an answer just requires pattern matching.

    • Adalast@lemmy.world · 9 hours ago

      See, I agree with everything up to the end. There you are getting into the philosophy of cognition. How do humans answer a question? I would argue that for many people, on most topics, the answer would be “I am repeating what I was taught/learned/read.” An argument could be made that your description of responding with “What would a realistic answer to this question look like?” is fundamentally symmetric with “This is what I was taught.” Both are regurgitating information fed to them by someone who presumably (hopefully) actually had a firm understanding of the material themselves. As an example: we are all taught that 2+2=4, but most people are not taught WHY 2+2=4. Even fewer are taught that 2+2=11 in base 3, or how to convert bases at all. So do people “know” that 2+2=4, or are they just repeating the answer they were told was correct?
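
      (To make the base-3 aside concrete, a quick sketch of the conversion; the helper below is just my own toy illustration, not anything standard:)

      ```rust
      // Write n in the given base: 2 + 2 is "4" in base 10 but "11" in base 3.
      fn to_base(mut n: u32, base: u32) -> String {
          if n == 0 {
              return "0".to_string();
          }
          let mut digits = Vec::new();
          while n > 0 {
              digits.push(std::char::from_digit(n % base, base).unwrap());
              n /= base;
          }
          digits.iter().rev().collect()
      }

      fn main() {
          assert_eq!(to_base(2 + 2, 10), "4");
          assert_eq!(to_base(2 + 2, 3), "11"); // 1 * 3 + 1
      }
      ```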

      I am not saying that LLMs understand or know anything, I am saying that most humans don’t either for most topics.

      • merc@sh.itjust.works · 8 hours ago

        How do humans answer a question? I would argue that for many people, on most topics, the answer would be “I am repeating what I was taught/learned/read.”

        Even children aren’t expected to just repeat verbatim what they were taught. When kids are being taught verbs, they’re shown the pattern: “I run, you run, he runs; I eat, you eat, he eats.” They’re told that there’s a pattern, and it’s that the “he/she/it” version has an “s” at the end. They now understand some of how verbs work in English, and can try to apply that pattern. But, even when it’s spotting a pattern and applying the right rule, there’s still an element of understanding involved. You have to recognize that this is a “verb” situation, and that you should apply the bit about “add an ‘s’ if it’s he/she/it”.
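
        (That rule is explicit enough to write down as code; a toy sketch, with the obvious caveat that real English has irregular verbs:)

        ```rust
        // The rule a child learns: third-person singular adds an "s".
        // (Toy sketch: ignores irregular verbs like "to be".)
        fn conjugate(verb: &str, subject: &str) -> String {
            match subject {
                "he" | "she" | "it" => format!("{}s", verb),
                _ => verb.to_string(),
            }
        }

        fn main() {
            assert_eq!(conjugate("run", "I"), "run");
            assert_eq!(conjugate("run", "he"), "runs");
            assert_eq!(conjugate("eat", "she"), "eats");
        }
        ```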

        An LLM, by contrast, never learns any rules. Instead it ingests every single verb that has ever been recorded in English, and builds up a probability table for what comes next.
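
        (In toy form, that table looks something like this; real models learn weights over subword tokens rather than keeping literal counts, but the “most likely next word” idea is the same:)

        ```rust
        use std::collections::HashMap;

        // Toy "probability table": count which word follows which in a
        // corpus, then pick the most frequent successor.
        fn main() {
            let corpus = "i run you run he runs i eat you eat he eats";
            let words: Vec<&str> = corpus.split_whitespace().collect();

            let mut table: HashMap<&str, HashMap<&str, u32>> = HashMap::new();
            for pair in words.windows(2) {
                *table.entry(pair[0]).or_default().entry(pair[1]).or_default() += 1;
            }

            // The most likely word after "he" in this tiny corpus:
            let next = table["he"].iter().max_by_key(|(_, c)| **c);
            println!("after 'he': {:?}", next); // "runs" or "eats" (tied)
        }
        ```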

        but most people are not taught WHY 2+2=4

        Everybody is taught why 2+2=4. It’s normally done with apples: if I have 2 apples and John has 2 apples, how many apples are there in total? It’s not simply memorizing that when you see the token “2” followed by “+”, then “2”, then “=”, the next likely token is “4”.

        If you watch little kids doing that kind of math, you can see that they understand what’s happening, because they’re often counting on their fingers. That signals a level of understanding that’s different from simple pattern matching.

        Sure, there’s a lot of pattern matching in the way human brains work too. But, fundamentally there’s also at least some amount of “understanding”. One example where humans do pattern matching is idioms. A lot of people just repeat the idiom without understanding what it really means. But, they do it in order to convey a message. They don’t do it just because it sounds like it’s the most likely thing that will be said next in the current conversation.

        • Adalast@lemmy.world · 6 hours ago

          I wasn’t attempting to attack what you said, merely pointing out that once you cross the line into philosophy, things get really murky really fast.

          You assert that LLMs aren’t taught the rules, but every word is not just a word. The tokenization process includes part-of-speech tagging, predicate tagging, etc. The ‘rules’ you are talking about are actually encapsulated in the tokenization process. The way tokenization for LLMs works, at least as of a few years ago when I read a textbook on building LLMs, is predicated on the rules of the language. Parts of speech, syntax information, word commonality, etc. are all major parts of the ingestion process before training is done. The models may not have had a teacher giving them the ‘rules’, but that does not mean the rules were not included in the training.
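
          (A toy illustration of the annotated-token idea; the field names here are hypothetical, not any real tokenizer’s schema:)

          ```rust
          // Hypothetical sketch: a token that carries linguistic metadata
          // alongside the surface text, as described above.
          #[derive(Debug)]
          struct Token<'a> {
              text: &'a str, // surface form
              pos: &'a str,  // part-of-speech tag
              freq: f64,     // word commonality in the corpus
          }

          fn main() {
              let tagged = vec![
                  Token { text: "he", pos: "PRON", freq: 0.012 },
                  Token { text: "runs", pos: "VERB", freq: 0.003 },
              ];
              for t in &tagged {
                  println!("{:?}", t);
              }
          }
          ```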

          And circling back to the philosophical question of what it means to “learn” or “know” something: you actually exhibited what I was talking about in your response on the math question. Putting two piles of apples on a table and counting them to find the total is a naïve application of the principles of addition to a situation, but it is not describing why addition operates the way it does. That answer does not get discussed until Number Theory, in upper-division math courses in college. If you have never taken that course or studied Number Theory independently, you do not know ‘why’ adding two numbers together gives you the total; you know ‘that’ adding two numbers together gives you the total, and that is enough for your life.
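
          (For what it’s worth, that ‘why’ can be made mechanical; in a proof assistant like Lean, 2 + 2 = 4 is derived from the definition of addition rather than looked up:)

          ```lean
          -- Addition on the naturals is defined recursively:
          --   a + 0 = a
          --   a + (n + 1) = (a + n) + 1
          -- so 2 + 2 = 4 follows by unfolding the definition, not by memory.
          example : 2 + 2 = 4 := rfl
          ```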

          Learning, and by extension knowledge, have many forms and processes that certainly do not look the same by comparison. Learning as a child is unrecognizable when compared directly to learning as an adult, especially in our society. Non-sapient animals all learn and have knowledge, but the processes for it are unintelligible to most people, save those who study animal intelligence. So to say the LLM does or does not “know” anything is to assert that its “knowing” or “learning” will be recognizable and intelligible to the layman.

          Yes, I know that it is based on statistical mechanics; I studied those in my BS in Applied Mathematics. I know it is selecting the most likely word to follow what has been generated. The thing is, I recognize that I am doing exactly the same process right now, typing this message. I am deciding what sequence of words and tone of language will be approachable and relatable while still conveying the argument I wish to levy. Did I fail? Most certainly. I’m a pedantic neurodivergent piece of shit having a spirited discussion online; I am bound to fail because I know nothing about my audience aside from the prompt you gave me to respond to. So I pose the question: when behaviors are symmetric and outcomes are similar, how can an attribute be applied to one but not the other?