"Just ask ChatGPT" (Art by Shave_your_eyebrows)

ThefuzzyFurryComrade@pawb.social · 3 months ago

Retail4068@lemmy.world · 3 months ago

It amazes me that you all loathe AI search but will happily deal with regular search, which has always given you garbage. AI search just makes it easier to start checking sources and going through larger amounts of material.

zurohki@aussie.zone · 3 months ago

It’s because AI gives you complete bollocks which looks like a correct answer complete with made-up sources.

One gives you shit, the other gives you shit that looks like tasty food.

undeffeined@lemmy.ml · 3 months ago

Plus normal search is not creating the biggest financial bubble in modern history while destroying the environment

zurohki@aussie.zone · 3 months ago

Remember back when you could buy RAM? Those were good times.

undeffeined@lemmy.ml · 3 months ago

I hope that’s the worst of the problems from this massive bubble.

Retail4068@lemmy.world · 3 months ago

Which is why you click on the links, read and verify. This just gives you organization and context to verify and dig into, versus a bare list and summary. You have to validate and read either way. LLM/RAGs just provide tools on top.

Tigeroovy@lemmy.ca · 3 months ago

Yeah, so essentially just doing a regular google search but with more unnecessary steps starting with a dogshit chat bot.

Retail4068@lemmy.world · 3 months ago

Skill issue. Sorry.

Tigeroovy@lemmy.ca · 3 months ago

Using chat bots to search instead of doing it yourself is indeed a skill issue, yes.

FauxLiving@lemmy.world · 3 months ago

Which AI are you talking about? There isn’t just one AI, it’s an entire category of technology.

Just asking an LLM to answer a question and give you sources would be an incompetent way to use an LLM. The models will happily hallucinate anything that they don’t have an answer for.

But most AI systems that are set up for searching and research use Retrieval Augmented Generation. Non-AI parts of the system handle the document retrieval and source list preparation. The LLM only uses the reference tags, and a completely non-AI system creates the source list. The LLM can’t hallucinate a source list, because no competently designed system would trust the LLM not to hallucinate when ordinary code can handle that step without wasting LLM tokens on simple-to-solve problems.

So, for example, if you were to ask, ‘Who won the 1984 Olympics’, the system does a search of documents (or websites, as here) and then passes the results to the LLM, which only summarizes the documents that were given to it in response to the user’s question.
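To make the "LLM only uses reference tags" part concrete, here is a minimal sketch of how the source list can be built by ordinary code. Everything in it is an assumption for illustration: the `example.org` URLs, the `[n]` tag format, and the `stub_llm_summary` function standing in for a real model call. It is not how any particular product does it.

```python
import re

# Hypothetical search results. In a real system these come from a search
# engine or document index, never from the language model.
RETRIEVED = {
    1: "https://example.org/1984-summer-olympics",
    2: "https://example.org/los-angeles-games",
}

TAG = re.compile(r"\[(\d+)\]")


def stub_llm_summary():
    """Stand-in for the LLM call (assumption). It emits a bogus [7] tag."""
    return "The 1984 Summer Olympics were held in Los Angeles [1][2]. See also [7]."


def resolve_citations(summary, retrieved):
    """Build the source list with plain code, not with the model.

    Tags that do not match a retrieved document are dropped, so the model
    cannot add a source that the search step never returned.
    """
    cited = []
    for match in TAG.finditer(summary):
        doc_id = int(match.group(1))
        if doc_id in retrieved and doc_id not in cited:
            cited.append(doc_id)

    def keep_known(match):
        # Keep tags that point at a retrieved document, strip the rest.
        return match.group(0) if int(match.group(1)) in retrieved else ""

    cleaned = TAG.sub(keep_known, summary)
    sources = [(doc_id, retrieved[doc_id]) for doc_id in cited]
    return cleaned, sources


cleaned, sources = resolve_citations(stub_llm_summary(), RETRIEVED)
print(cleaned)
for doc_id, url in sources:
    print(f"[{doc_id}] {url}")
```

The model can still summarize a document badly, but every URL in the printed list came from the retrieval step, which is the narrow point being argued here.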

Jack Riddle[Any/All]@lemmy.dbzer0.com · 3 months ago

so we are using the “regular search which has always given you garbage” and taking that garbage automatically to get summarised by the hallucinator and we are supposed to trust the output somehow?

FauxLiving@lemmy.world · 3 months ago

No, you don’t trust the output. You shouldn’t trust the output of search either. This is just search with summarization.

That’s why there are linked sources, so that you can verify for yourself. The person’s contention was that you can’t trust citations because they can be hallucinated. That’s not how these systems work. The citations are not handled by LLMs at all except as references; the actual source list comes entirely from a regular search program.

The LLM’s summarization and sources are like the Google results page. They’re not information that you should trust by themselves; they are simply links that take you to information that’s responsive to your search. The LLM provides a high-level summary so you can make a more informed decision about which sources to look at.

Anyone treating LLMs like they’re reliable is asking for trouble, just like anyone who believes everything they read on Facebook or cites Wikipedia directly.

Jack Riddle[Any/All]@lemmy.dbzer0.com · 3 months ago

so I fail to see why I should be using an LLM at all then. If I am going to the webpages anyway, why shouldn’t I just use startpage/searx/yacy/whatever?

FauxLiving@lemmy.world · 3 months ago

Yeah, if you already know where you’re going then sure, add it to Dashy or make a bookmark in your browser.

But if you’re going to search for something anyway, then why would you use regular search and skim the tiny amount of random text that gets returned with Google’s results? In the same amount of time, you could dump the entire contents of the pages into an LLM’s context window and have it tailor the response to your question based on the text.

You still have to actually click on some links to get to the real information, but a summary generated from the contents of the results is more likely to be relevant than the text presented in Google’s results page. In both cases you still have a list of links, generated by a search engine and not AI, which are responsive to your query.

Jack Riddle[Any/All]@lemmy.dbzer0.com · 3 months ago

see, the problem is that I am not going to be reading that text because I know it is unreliable and ai text makes my eyes glaze over, so I will be clicking on all those links until I find something that is reliable. On a search engine I can just click through every link or refine my search with something like site:reddit.com site:wikipedia.org or format:pdf or something similar. With a chatbot, I need to write out the entire question, look at the four or so links it provided and then reprompt it if it doesn’t contain what I’m looking for. I also get a limited amount of searches per day because I am not paying for a chatbot subscription. This is completely pointless to me.

FauxLiving@lemmy.world · 3 months ago

I’m not sure by what standard you’re calling it unreliable.

You can see in the example that I provided it correctly answered the question and also correctly cited the place where the answer came from in the exact same amount of time as it would take to type the query into Google.

Yes, LLMs by themselves can hallucinate and do so at a high rate so that they’re unreliable sources of information. That is 100% true. It will never be fixed, because LLMs are trained to be an autocorrect and produce syntactically correct language. You should never depend on raw LLM generated text from an empty context, like from a chatbot.

The study of this in academia (example: https://arxiv.org/html/2312.10997v5) has found that an LLM’s hallucination rate can be dropped to almost nothing (less than a human’s) if it is given text containing the information that it is being asked about. So, if you paste a document into the chat and ask it a question about the document, the hallucination rate drops significantly.

This finding created a technique called Retrieval Augmented Generation where you use some non-AI means of finding data, like a search engine, and then put the documents into the context window along with the question. This makes it so that you can create systems that use LLMs for the tasks that they’re accurate and fast at (like summarizing text that is in the context window) and non-AI tools to do things that require accuracy (like searching databases for facts and tracking citation).
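As a rough sketch of that flow, the snippet below does the retrieval with plain keyword matching and then builds the prompt that would go into the context window. The corpus, the `example.org` URLs, and the names `retrieve` and `build_prompt` are all hypothetical, and a real system would use a proper search engine or index instead of word overlap. The actual model call is left out on purpose.

```python
def retrieve(query, corpus, k=2):
    """Toy non-AI retriever: rank documents by words shared with the query."""
    terms = set(query.lower().split())
    scored = []
    for doc in corpus:
        overlap = len(terms & set(doc["text"].lower().split()))
        if overlap:
            scored.append((overlap, doc))
    # Sort on the score only; the sort is stable, so ties keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(question, docs):
    """Put the retrieved documents and the question into one prompt string."""
    parts = [
        "Answer using only the numbered documents below.",
        "Cite documents by their tag, for example [1].",
        "",
    ]
    for n, doc in enumerate(docs, start=1):
        parts.append(f"[{n}] {doc['text']}")
    parts.append("")
    parts.append(f"Question: {question}")
    return "\n".join(parts)


# Hypothetical three-document corpus.
CORPUS = [
    {"url": "https://example.org/a", "text": "The 1984 Summer Olympics were held in Los Angeles"},
    {"url": "https://example.org/b", "text": "Sourdough bread needs a long fermentation"},
    {"url": "https://example.org/c", "text": "Los Angeles also hosted the Olympics in 1932"},
]

question = "Where were the 1984 Olympics held"
docs = retrieve(question, CORPUS)
prompt = build_prompt(question, docs)
print(prompt)
```

The irrelevant document never reaches the model, and the numbering in the prompt is what lets regular software map tags back to URLs afterwards.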

You can see in the images I posted that it both answered the question and also correctly cited the source which was the entire point of contention.

TheSeveralJourneysOfReemus@lemmy.world · edit-2 2 months ago

deleted by creator

FauxLiving@lemmy.world · 3 months ago

You’re arguing against the use of AI to do actual research. I agree with you that using AI to do research is wrong. I’m not sure where you got any other idea.

My entire point, and the statement that I was responding to, was about the claim that LLMs hallucinate sources. That’s only true of naive uses of LLMs: if you just ask a model to recite a fact, it will hallucinate a lot of the time. This is why they are used in RAG systems and, in these systems, the citations are tracked through regular software, because every AI researcher knows that LLMs hallucinate. That hasn’t been new information for 5+ years now.

Systems that do RAG search summarizations, as in my example, both increase the accuracy of the response (by inserting the source documents into the context window) and avoid relying on LLMs to handle citations.

It’s one thing to hate the damage that billionaires are doing to the world in order to chase some pipedream about AI being the holy grail of technology. I’m with you there, fuck AI.

It’s a whole other thing to pretend that machine learning is worthless or incapable of being a good tool in the right situations. You say ‘learn to search properly’, but you’ve been relying on machine learning tools for a long time. The search results that you receive come from ranking systems descended from PageRank, the link-analysis algorithm that made Google, and modern search ranking leans heavily on machine learning.
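For anyone who hasn’t seen it, the core idea of PageRank fits in a few lines. This is a minimal power-iteration sketch, not Google’s production ranking. The link graph, the function name `pagerank`, and the iteration count are made up for illustration, and 0.85 is just the commonly quoted damping factor.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: nothing links to "d", three pages link to "c".
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(score, 3))
```

Pages that many other pages link to float to the top, which is the part SEO has been gaming ever since.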

The only reason that AI is even on your radar (assuming you’re not in academia) is because a bunch of rich assholes are exploiting people’s amazement at this new technology to sell impossible dreams to people in order to cash in on the ignorance of others. Those people are scammers with MBAs, but their scam doesn’t change the usefulness of the underlying technology of Transformer neural networks or Machine Learning in general.

Fighting against ‘AI’ is pointless if your target is LLMs and not billionaires.

Armok_the_bunny@lemmy.world · 3 months ago

Search didn’t used to give “output”. It used to give links to a wide variety of sources such as detailed and exact official documentation. There was nothing to “trust”.

Now it’s all slop bullshit that needs to be double checked, a process that frankly takes just as long as finding the information yourself using the old system, and even that still can’t be trusted in case it missed something.

FauxLiving@lemmy.world · 3 months ago

Search didn’t used to give “output”. It used to give links to a wide variety of sources such as detailed and exact official documentation. There was nothing to “trust”.

If you search on Google, the results are an output. There’s nothing AI about the term output.

You get the same output here and, as you can see, the sources are just as easily accessible as a Google search and are handled by non-LLM systems so they cannot be hallucinations.

The topic here is about hallucinating sources, my entire position is that this doesn’t happen unless you’re intentionally using LLMs for things that they are not good at. You can see that systems like this do not use the LLM to handle source retrieval or citation.

Now it’s all slop bullshit that needs to be double checked, a process that frankly takes just as long as finding the information yourself using the old system, and even that still can’t be trusted in case it missed something.

This is true of Google too. If you’re operating on the premise that you can trust Google’s search results, then you should know about Search Engine Optimization (https://en.wikipedia.org/wiki/Search_engine_optimization), an entire industry that exists specifically to manipulate Google’s search results. If you trust Google more than AI systems built on search, then you’re just committing the same error.

Yes, you shouldn’t trust things you read on the Internet until you’ve confirmed them from primary sources. This is true of Google searches or AI summarized results of Google searches.

I’m not saying that you should cite LLM output as facts, I’m saying that the argument that ‘AIs hallucinate sources’ isn’t true of these systems which are designed to not allow LLMs to be in the workflow that retrieves and cites data.

It’s like complaining that live ducks make poor pool toys… if you’re using them for that, the problem isn’t the ducks it’s the person who has no idea what they’re doing.

Retail4068@lemmy.world · 3 months ago

You are just speaking to a brick wall. It’s taking all the jobs AND garbage. Can’t be a tool in between that has pros and cons.

FauxLiving@lemmy.world · 3 months ago

True, nuance is dead on social media. Especially in high propaganda places where people treat bad faith arguments like a virtue.

It is weird how the position is both that AI is simultaneously incapable of producing any work of any quality and also an existential threat to all human labor on the planet.

It really sounds like they have two arguments that they’re smashing together and treating like one.

First, AI systems do produce poor quality output a lot of the time. Much like any other technology, the first few years are not exactly an example of what is possible.

For example, the first jet aircraft could only operate for a few hours or their engines would literally melt. People are sitting here looking at these prototype jet aircraft and claiming that there will never be commercially viable jet travel. (and yet, in this same metaphor, somehow jets will also take over all forms of travel imminently).

LLMs and image generators are not AI, they’re simply the easiest and cheapest to train, which is why you have all of these capitalist vultures jumping on these products as if they’re the future.

That’s really the core of the second part of the argument which is essentially: “Capitalists have too much money and have decided to gamble that money on the AI industry, resulting in unsustainable spending and growth that harms real people and communities”.

By itself, this is also a good argument. People are starting to understand the sides: we’re on the bottom, and the people on the top who have the power often make horrible decisions in order to chase profit, and regular people are being hurt by those decisions.

The red herring is that they’re blaming these problems on AI instead of the billionaire humans who are actually choosing to put in these data centers and fire workers, etc. A language model or diffusion model isn’t choosing to fly in natural gas generators to power datacenters and pollute communities. Elon Musk chose that.

Getting angry at AI is a useless distraction. Human beings are making these decisions, and they are the ones who bear responsibility for the damage, not a few terabytes of spicy linear algebra.

Retail4068@lemmy.world · 3 months ago

Well written

FauxLiving@lemmy.world · 3 months ago

Thank you

SendMePhotos@lemmy.world · 3 months ago

Regular search wasn’t always garbage. We used to be able to use symbols to refine our searches and find exactly what we were looking for within seconds. Random part number for a specific model? Random debug issue? Fuckin weird-ass error on the software?

Nowadays you can search for a string which you fuckin know exists and it’s like, “there’s nothing on the web for that.” Absolute fuckin lies. The shit is still indexed somewhere.

BillyTheKid@lemmy.ca · 3 months ago

It’s crazy how Google got so much worse. More ads, worse results, and the text copy is even buggy now

SendMePhotos@lemmy.world · 3 months ago

That’s a feature, not a bug. The goal is to push you to use ai

expr@piefed.social · 3 months ago

AI “search”. Give me a fucking break.

Retail4068@lemmy.world · 3 months ago

👌👍

mrgoosmoos@lemmy.ca · 3 months ago

AI search is only useful because of crippled search engines prioritizing making money over usefulness

mudkip@lemdro.id · 3 months ago

It makes up sources all the time. And when it does give a legitimate source, it makes up a quote or hallucinates nonexistent detail from it. Use your brain, not AI.