Lemmy may be heading down the path of LLMs

ell1e@leminal.space · edited 12 hours ago

Lemmy may be heading down the path of LLMs

Rentlar@lemmy.ca · 11 hours ago

Code written with the help of an LLM and then reviewed is different from what was happening with Lutris, where the developer decided to obfuscate their use of AI-generated code.

I can agree in principle with the approach you suggest of totally banning it, and I think that’s noble, but it could lead to people accusing each other of using AI code where it may or may not have happened, or to others just hiding it and trying to submit anyway without the reviewers knowing, which is just counter-productive.

I’ve followed Lemmy development for 3 years now, and the devs’ approach is slow and steady, to a fault in some people’s views. I think it’s a better use of open source resources if we encourage candor and honesty. If the repo gets spammed with AI-generated PRs, then it will probably be blanket banned, but contributors accurately documenting and reporting their usage of AI will help direct reviewers’ attention to ensure the code is not slop quality or full of hallucinations.

ell1e@leminal.space · edited 8 hours ago

In my opinion, this argument is exactly the same as saying “we can’t enforce people not stealing GPL-licensed code and copy&pasting it into our project, so we might as well allow it and ask them to disclose it.”

You can try to argue AI may actually be useful, which seems like what they did, and that would more fairly inform a policy in my opinion. I think your argument doesn’t.

Rentlar@lemmy.ca · 2 hours ago

My argument is that a total ban on AI use is more comparable to saying “Code from any other coding project is not allowed”. It will start unproductive arguments over boilerplate, struct definitions and other commonly used code.

The broadness and vagueness of “no AI whatsoever” or “no code from any other projects whatsoever” will be more confusing than saying, “if you do copy any code from another project, let us know where from”. Then the PR can be evaluated, rejected if it’s nonfree or just poor quality, rather than incentivizing people to pretend other people’s code is their own, risking bigger consequences for the whole project. People can be honest if they got inspiration from stackoverflow, a reference book, or another project, if they are allowed to be.

I’m not saying AI should be blanket allowed. The submitter needs to understand the code well enough to revise it for errors themselves if the devs point something out. They can’t just say “I asked AI and it’s confident that the code does this and is bug free”.

ell1e@leminal.space · edited 9 minutes ago

Then the PR can be evaluated, rejected if it’s nonfree or just poor quality

I don’t get the difficulty of rejecting “if it’s nonfree or just poor quality or known LLM code”. I don’t think it’s a vague criterion.

And for many projects, if you admit it’s from a StackOverflow post, unless you can show it’s not a direct copy they will reject it as well. This isn’t commonly taken as incentivizing people to lie.

Now whether you think LLMs are worth the trouble to use is a different discussion, but the enforcement point doesn’t convince me.

There is also a responsibility and liability question here. If something turns out to be a copyright issue and the contributor skirted a known rule, the moral judgement may look different than if you knew and included it anyway. (I can’t comment on the legal outcomes since I’m not a lawyer.)

Rentlar@lemmy.ca · 1 hour ago

To be specific, the jump you are making is likening LLM output to non-free code. While that makes sense on the surface level, LLM output is much closer to making stuff based on copied code. In the US at least, there’s clear legal precedent that LLM fabrications are not copyrightable.

Blanket AI bans are enforceable, I’m not arguing against that, it’s just that I don’t think it’s worth instituting, that it’s not a good fit for this project. My argument is that a Lemmy development policy of “please mark which parts of your code are AI-generated and how you used LLMs, and we will evaluate accordingly” is better than “if you indicate anywhere that your code is AI/LLM-generated, we will automatically reject it”.

ell1e@leminal.space · edited 8 minutes ago

My opinion is that the data disagrees with you:

1. https://www.psu.edu/news/research/story/beyond-memorization-text-generators-may-plagiarize-beyond-copy-and-paste
2. https://dl.acm.org/doi/10.1145/3543507.3583199
3. https://www.sciencedirect.com/science/article/pii/S2949719123000213#b7
4. https://www.theatlantic.com/technology/2026/01/ai-memorization-research/685552/
5. Related high-profile incident that is very telling: https://www.pcgamer.com/software/ai/microsoft-uses-plagiarized-ai-slop-flowchart-to-explain-how-github-works-removes-it-after-original-creator-calls-it-out-careless-blatantly-amateuristic-and-lacking-any-ambition-to-put-it-gently/

In the US at least, there’s clear legal precedent that LLM fabrications are not copyrightable.

I see many people doubt that this says anything about the copyright of the training data, as opposed to the copyright of the AI user.

This isn’t legal advice, I’m not a lawyer.

wheezy@lemmy.ml · edited 9 hours ago

Great perspective and response. Far too many “fuck AI” people are literally advocating for the equivalent of “fuck computers” and “more tedious labor please!”

The reason you should hate AI should be related to its exploitation of labor and its overuse leading to energy and environmental impacts. Trying to ban AI for all applications is just counter-productive and impossible. If the anti-AI crowd is just filled with people who want it banned outright for everything, well, then the pro-AI crowd that wants to slam it into anything and everything will win out.

We need to be pointing to good applications of AI that can benefit open source projects in a responsible way as examples of how it should be used, not spamming them with hate comments because “AI bad”.

ell1e@leminal.space · edited 7 hours ago

far too many “fuck AI” people are literally advocating for the equivalent of “fuck computers” and “more tedious labor please!”

Not what I’m advocating for.

We need to be pointing to good applications of AI

Feel free to do so, but studies are not on your side. Edit: this is a reminder we’re talking about LLMs for code and documentation.

The only somewhat clearly useful use case appears to be code reviews, but then you don’t need to actually allow submitting any LLM-rewritten code or text, since code reviews can be done using natural language. And if you use server-side LLMs, you’ll probably agree to ToS that let them steal your data.

And LLMs seem to be amazing at plagiarism.

FauxLiving@lemmy.world · 8 hours ago

We need to be pointing to good applications of AI Feel free to do so, but studies are not on your side.

The only somewhat clearly useful use case appears to be code reviews, but then you don’t need to actually allow submitting any LLM-rewritten code or text, since code reviews can be done using natural language. And if you use server-side LLMs, you’ll probably agree to ToS that let them steal your data.

And they seem to be amazing at plagiarism.

You, like a large portion of the ‘fuck AI’ community, are angry at LLMs or image/video generation models and their associated capitalist bubble. Yes, LLMs produce poor quality output compared to humans, and yes, the current marketing and capital explosion is bad for everyone involved who isn’t otherwise independently wealthy.

The reason that these are the AI that you’re aware of is that AI needs a lot of data to train and the only source of a huge amount of data, the Internet, is primarily text, images and video. So the first large transformer-based neural networks were trained on that dataset.

ChatGPT and Sora are toys, they were just the toys that were easiest to make given the data available when transformers were discovered.

If you train neural networks on different kinds of data you get different models. For example, if you train neural networks on protein folding data, you get neural networks that can predict protein folding based on an amino acid sequence. This is a thing that human-created software has not had great success at.

People may be familiar with Folding@Home, a project which attempts to leverage donated computing resources to brute force the problem. These projects have consumed thousands of person-hours of our best scientists and engineers and the results are pretty poor.

However, since we now know how to train neural networks on data, we can train an AI to predict the protein structures, and the resulting networks such as AlphaFold (https://en.wikipedia.org/wiki/AlphaFold) produce results much better than human-engineered software.

In addition to predicting the structure, other scientists have used diffusion models (similar to how consumer AI products generate images) to go the other way. Now a scientist can describe a protein’s properties in a prompt and instead of generating a picture the network outputs the sequence of amino acids that are most likely to fold into a shape with those properties.

Robotics is another field where AI is making an impact unseen by the public. There isn’t an Internet full of bipedal motion or limb-positioning data, so it is much harder to train an AI to operate robotics. There are many projects which are working to create that data and the results are pretty impressive. This is a bipedal robot which has been trained on human motion: https://www.youtube.com/watch?v=I44_zbEwz_w Compare that to pre-AI motion: https://www.youtube.com/watch?v=LikxFZZO2sk

Weather forecasting is another field where AI is useful. Predicting weather requires identifying patterns in huge amounts of data and AI is uniquely able to deal with that level of complexity.

None of these uses of AI can talk to you, or produce pictures. They cannot understand sentences or write e-mails or generate code. They’re trained on data generated specifically for their purpose, not on public data scraped from the Internet. Their output allows us to develop medicines faster, automate dangerous jobs and predict weather disasters.

I’m with anyone who’s concerned about the capitalist frenzy over LLMs and image/video generation products. This is clearly another dotcom bubble, and the spending frenzy and disruption in the job markets are damaging the economy and hurting workers at a large scale.

I do not lay the blame for this at the feet of neural networks. The blame lies with the human beings making the decision to take a promising technology and to dump trillions of dollars into it without any endgame other than market dominance.

The community should be ‘fuck AI executives’. AI has many uses outside of LLMs and image generation, and people are completely missing all of the amazing things that this technology is making possible.

baggachipz@sh.itjust.works · 7 hours ago

Thank you so much for taking the time to put into words what I’ve been too lazy to enunciate. Transformer-based tools are a great development with some fantastic uses. I think the problem is one of nomenclature and extremely aggressive marketing by grifters. The reason I’m in this community isn’t to outright banish anything related to transformer-based tech, but to rail against the insanely overhyped, economy-wrecking shitshow that has commandeered the nebulous term “AI” when it’s really just LLMs.

FauxLiving@lemmy.world · 7 hours ago

The reason I’m in this community isn’t to outright banish anything related to transformer-based tech, but to rail against the insanely overhyped, economy-wrecking shitshow that has commandeered the nebulous term “AI” when it’s really just LLMs.

Same, I’m here because capitalism is doing serious damage to the world by taking a promising technology and massively over investing.

I’m not here to side with the Luddites who reflexively downvote anything that says ‘AI’.

Though, I will say that this is a nuanced opinion and so I understand that I’m going to be dog piled by the people who’re only here for low effort performative activism.

ell1e@leminal.space · edited 7 hours ago

We were talking about lemmy and LLMs. They’re not part of any use case you’re listing.

But my apologies if I missed something here.

FauxLiving@lemmy.world · 7 hours ago

My point was that people are using the term ‘AI’ when they mean LLMs and/or Image generation.

You asked for good AI uses when you meant good LLM uses, which is the only point I wanted to make.

Yes, LLMs are pretty bad at most things. Their usefulness is basically around that of a search engine or Stack Overflow. They’re often used as a crutch for junior coders, which damages their training, and vibe coding is just a novelty… not a production-ready tool.

I don’t disagree that LLMs are massively over hyped, just that they’re only a tiny portion of the AI technologies. Most of which people should be excited about.

That’s why it’s frustrating seeing the confusion. LLMs suck, image generation is terrible for many reasons… but AI has many other uses than making 6 fingered people and shitty code.

commonmarmoset@reddthat.com · 5 hours ago

Adding perhaps an additional layer of nuance: you’re totally right that there is a nomenclature issue around AI, and that the technology (basically like most technology) is value neutral. But I think it remains a valid decision to make a choice to personally avoid it, and to engage with services and communities accordingly.

I’m perfectly happy to agree that there is “AI” use which is groovy. Maybe as a result of narrowing the definition or using it conscientiously. I understand the different forms it can come in. But me, personally, I want to use a service that strives for no AI, regardless of whether it is good, bad, or neutral. Searching for a niche like this is actually why I started using lemmy (pretty recently).

I don’t begrudge lemmy taking an approach like “AI must be disclosed and reviewed” as suggested here (https://github.com/LemmyNet/lemmy-docs/pull/414/changes). Let Lemmy party however it wants! Honestly, I appreciate the disclosure, because it lets me know upfront that this isn’t the niche I was looking for. No shade, but I’m out. Nothing but peace and love to everybody who remains.

ell1e@leminal.space · 5 hours ago

I was asking for good uses of LLMs since we were talking about those. Sorry for being unclear.

🇰 🌀 🇱 🇦 🇳 🇦 🇰 🇮 @pawb.social · 12 hours ago

I mean the lead dev is literally agreeing that LLM code shouldn’t be in the project at all as the first reply to the issue. I’m not seeing how it’s headed toward integration from what you’ve linked.

ell1e@leminal.space · edited 12 hours ago

As you’ll see farther below, sadly the lemmy team seems to have reversed their opinion immediately after. See also here: https://github.com/LemmyNet/lemmy-docs/pull/414/changes

Zetta@mander.xyz · 35 minutes ago

Better stop using the internet. I always say this, but in the next five years, every single piece of software you use is going to have generated code in it. You may not like it, but it’s happening, so sorry.

ell1e@leminal.space · edited 2 minutes ago

There is a growing list of projects to collaborate with that reject LLM code: Asahi Linux, elementaryOS, Gentoo, GIMP, GoToSocial, Löve2D, Loupe, NetBSD, postmarketOS, Qemu, RedoxOS, Servo, stb libraries, Zig.

in_my_honest_opinion@piefed.social · 12 hours ago

Piefed

uuj8za@piefed.social · 3 hours ago

Yeah! I was ok with Lemmy, but recently (unrelated) decided to try Piefed. I’m liking Piefed better. Lots of nice UI/UX improvements over Lemmy. Didn’t realize what I was missing.

hankskyjames777@thebrainbin.org · 10 hours ago

…and MBin

Lost_My_Mind@lemmy.world · 12 hours ago

Well…dang it! I guess I’ll have to start using my piefed account more. But also, that doesn’t solve the issue of it being the same content, you know, because of the whole concept of the fediverse and how it works.

Also this whole community is on Lemmy. How’s THAT going to work???

in_my_honest_opinion@piefed.social · 42 minutes ago

Divest the content from the platform. The power of the Fediverse is that communities and people can move freely between platforms.

uuj8za@piefed.social · 3 hours ago

that doesn’t solve the issue of it being the same content, you know, because of the whole concept of the fediverse

Isn’t that a feature, not a bug? Lemmy can’t single-handedly ruin the fediverse. Piefed and MBin can lead in a different direction, and Lemmy can’t hold all the content hostage.

ell1e@leminal.space · edited 12 hours ago

It’s sad. I’m hoping perhaps some well-reasoned comments might still have some impact, but I admit that it might be a long shot.
