Lemmy may be heading down the path of LLMs

ell1e@leminal.space · edited 14 hours ago

ell1e@leminal.space · edited 9 hours ago

In my opinion, this argument is exactly the same as saying “we can’t enforce people not stealing GPL-licensed code and copy&pasting it into our project, so we might as well allow it and ask them to disclose it.”

You can try to argue AI may actually be useful, which seems like what they did, and that would more fairly inform a policy in my opinion. I think your argument doesn’t.

MrLLM@ani.social · 45 minutes ago

Yeah, and on top of that, all the reasons why we hate AI:

It’s a plagiarism machine
It still hallucinates, which might end up in borked projects
It has fucked up and will continue to fuck up the RAM and storage market
It consumes a shit ton of energy
It’s ruining everything with poor quality products
…

Rentlar@lemmy.ca · 4 hours ago

My argument is that a total ban on AI use is more comparable to saying “Code from any other coding project is not allowed”. It will start unproductive arguments over boilerplate, struct definitions and other commonly used code.

The broadness and vagueness of “no AI whatsoever” or “no code from any other projects whatsoever” will be more confusing than saying, “if you do copy any code from another project, let us know where from”. Then the PR can be evaluated, rejected if it’s nonfree or just poor quality, rather than incentivizing people to pretend other people’s code is their own, risking bigger consequences for the whole project. People can be honest if they got inspiration from Stack Overflow, a reference book, or another project, if they are allowed to be.

I’m not saying AI should be blanket allowed. The submitter needs to understand the code well enough to revise it for errors themselves if the devs point something out. They can’t just say “I asked AI and it’s confident that the code does this and is bug free”.

ell1e@leminal.space · edited 2 hours ago

“Then the PR can be evaluated, rejected if it’s nonfree or just poor quality”

I don’t get the difficulty of rejecting “if it’s nonfree or just poor quality or known LLM code”. I don’t think it’s a vague criterion.

And for many projects, if you admit it’s from a StackOverflow post, unless you can show it’s not a direct copy they will reject it as well. This isn’t commonly taken as incentivizing people to lie.

Now whether you think LLMs are worth the trouble to use is a different discussion, but the enforcement point doesn’t convince me.

There is also a responsibility and liability question here. If something turns out to be a copyright issue and the contributor skirted a known rule, the moral judgement may look different than if you knew and included it anyway. (I can’t comment on the legal outcomes since I’m not a lawyer.)

Rentlar@lemmy.ca · 3 hours ago

To be specific, the jump you are making is likening LLM output to non-free code. While that makes sense on the surface, it’s much closer to making stuff based on copied code. In the US at least, there’s clear legal precedent that LLM fabrications are not copyrightable.

Blanket AI bans are enforceable; I’m not arguing against that. I just don’t think one is worth instituting, and I don’t think it’s a good fit for this project. My argument is that a Lemmy development policy of “please mark which parts of your code are AI-generated and how you used LLMs, and we will evaluate accordingly” is better than “if you indicate anywhere that your code is AI/LLM-generated, we will automatically reject it”.

ell1e@leminal.space · edited 2 hours ago

My opinion is that the data disagrees with you:

1. https://www.psu.edu/news/research/story/beyond-memorization-text-generators-may-plagiarize-beyond-copy-and-paste
2. https://dl.acm.org/doi/10.1145/3543507.3583199
3. https://www.sciencedirect.com/science/article/pii/S2949719123000213#b7
4. https://www.theatlantic.com/technology/2026/01/ai-memorization-research/685552/
5. Related high profile incident that is very telling: https://www.pcgamer.com/software/ai/microsoft-uses-plagiarized-ai-slop-flowchart-to-explain-how-github-works-removes-it-after-original-creator-calls-it-out-careless-blatantly-amateuristic-and-lacking-any-ambition-to-put-it-gently/

“In the US at least, there’s clear legal precedent that LLM fabrications are not copyrightable.”

I see many people doubt that this says anything about the copyright of the training data, as opposed to the copyright of the AI user.

This isn’t legal advice, I’m not a lawyer.

Rentlar@lemmy.ca · 58 minutes ago

I don’t mean in any way to imply that your opinion isn’t sound, simply that I don’t agree with it in the context of whether the Lemmy devs should accept PRs with any reported LLM usage.