Wikipedia has banned AI-generated text, with two exceptions

I thought that was a very interesting read, because it’s so much better than the usual AI ragebait that led to people getting pilloried over the fact that they actually know how to use em dashes. You can’t detect LLM use just by the fact that someone uses em dashes. It’s a complicated stylistic issue that usually boils down to “well, you know what ChatGPT output looks like when you see it”.

amateurcrastinator@lemmy.world · 2 hours ago

Ok but surely there must be an automated way. You can’t throw manpower at this because they will lose

Rose@slrpnk.net · 60 minutes ago

There are no reliable automated LLM output detectors. Anyone who says otherwise is either trying to sell you snake oil (or is unwittingly helping someone to sell snake oil to someone else, I guess).

amateurcrastinator@lemmy.world · 3 minutes ago

so the question still stands. how do they detect AI use? i am all for it btw. it is absolutely necessary but I am afraid it is impossible to do or implement.

infeeeee@lemmy.zip · 1 day ago

Saved you a click:

After much debate, the new policy is in effect: Wikipedia authors are not allowed to use LLMs for generating or rewriting article content. There are two primary exceptions, though.

First, editors can use LLMs to suggest refinements to their own writing, as long as the edits are checked for accuracy. In other words, it’s being treated like any other grammar checker or writing assistance tool. The policy says, “LLMs can go beyond what you ask of them and change the meaning of the text such that it is not supported by the sources cited.”

The second exemption for LLMs is with translation assistance. Editors can use AI tools for the first pass at translating text, but they still need to be fluent enough in both languages to catch errors. As with regular writing refinements, anyone using LLMs also has to check that incorrect information hasn’t been injected.

Goodlucksil@lemmy.dbzer0.com · 14 hours ago

To save you another few clicks: this is the discussion (RfC) that implemented the changes, and the policy is linked at the top.

Rioting Pacifist@lemmy.world · 1 day ago

AIbros: we’re creating God!!!

AI users: it can do translation & reformatting pretty well but you got to check it’s not chatting shit

halcyoncmdr@piefed.social · 1 day ago

The takeaway from all LLM-based AI is the user needs to be smart enough to do whatever they’re asking anyway. All output needs to be verified before being used or relied upon.

The “AI” is just streamlining the process to save time.

Relying on it otherwise is stupid and just proves instantly that you are incompetent.

rumba@lemmy.zip · 4 hours ago

This is absolutely the case, and honestly, at least for now, how it needs to be across the board.

No one should be using AI to do things you’re incapable of doing (or undoing).

7101334@lemmy.world · 5 hours ago

Relying on it otherwise is stupid and just proves instantly that you are incompetent.

Relying on it in any circumstances (though medical stuff is understandable if you’re simply too poor or don’t have access) while it is exhausting water supplies and polluting the planet is stupid and instantly proves that you are stupid and inconsiderate.

Zagorath@quokk.au · 22 hours ago

the user needs to be smart enough to do whatever they’re asking anyway

I’m gonna say that’s ideal but not quite necessary. What’s needed is that the user is capable of properly verifying the output. Which anyone who could do it themselves definitely can, but it can be done more broadly. It’s an easier skill to verify a result than it is to obtain that result. Think: how film critics don’t necessarily need to be filmmakers, or the P=NP question in computer science.

Aralakh@lemmy.ca · 47 minutes ago

This is where domain expertise would come in, no? It’s speeding up the work but it usually outputs generic content, and whatever else it injects while hallucinating. Therefore the validation part holds up I’d say.

Pyro@programming.dev · 21 hours ago

But if the output has issues, what’re you going to do, prompt it again? If you are only able to verify but not do the task, you cannot correct the AI’s mistakes yourself.

WhiskyTangoFoxtrot@lemmy.world · 17 hours ago

I can’t draw, but I could probably photoshop out some minor issues in an AI-generated image.

Zagorath@quokk.au · 20 hours ago

At the risk of sounding like an overly obsequious AI… You know what, you’re completely right. I’m honestly not sure what use case I was imagining when I wrote that last comment.

Redjard@reddthat.com · 20 hours ago

Making text flow naturally, grouping and ordering information, good writing.

You can verify two texts have the same facts and information, yet one reads way better than the other. But writing a text that reads well is quite hard.

Redjard@reddthat.com · 20 hours ago

If you don’t have the ability then you would do what you would have 5 years ago: not do it
Either submit without, or not submit at all.

youcantreadthis@quokk.au · 1 day ago

Fucking hate those anti human filth pushing slop into everything. I want to take one apart with power tools.

Paranoid Factoid@lemmy.world · 1 day ago

Scrollone@feddit.it · 22 hours ago

Damn that movie was funny. I need to rewatch it.

SocialMediaRefugee@lemmy.world · 4 hours ago

Yaaah, but I’ll need you to come in this weekend though. Yaaaahhhh…

onlyhalfminotaur@lemmy.world · 22 hours ago

It holds up better than any movie from the late 90s that I can think of.

XLE@piefed.social · 1 day ago

I don’t think AI users would say it does reformatting either (if they’re honest): If you tell a chatbot to reformat text without changing it, it will change the text, because it does not understand the concept of not changing text. It should only take one time for someone to get burned for them to learn that lesson.

MissesAutumnRains@lemmy.blahaj.zone · 1 day ago

Seems pretty reasonable to use it as a grammar checker. As long as it’s not changing content, just form or readability, that seems like a pretty decent use for it, at least with a purely educational resource like Wikipedia.

ji59@hilariouschaos.com · 1 day ago

So, it should be used reasonably, as it should have always been.

🌞 Alexander Daychilde 🌞@lemmy.world · 1 day ago

Liar. I already read the article before opening the comments. YOU SAVED ME NOTHING.

;-)

errer@lemmy.world · 1 day ago

Wikipedia probably wants to sell access to LLMs to train. It’s only valuable if Wikipedia remains a high-quality, slop-free source.

I think even AI zealots think there should be silos of content to train from that are fully human generated. Training slop on slop makes the slop even worse.

Grimy@lemmy.world · 1 day ago

Sell licenses of what? It’s already all under Creative Commons iirc.

Zagorath@quokk.au · 22 hours ago

The content is CC licensed, but they are trying to block AI scraping because it overloads their servers. They have a paid API that uses a lot less compute for both Wikipedia and the AI, as well as being a revenue source for Wikipedia.

ricecake@sh.itjust.works · 3 hours ago

Yes, but…

https://en.wikipedia.org/wiki/Wikipedia%3ADatabase_download

That’s because viewing the page uses server resources, as does API access. If you want the data you can download the database directly.

SuspciousCarrot78@lemmy.world · 1 day ago

AI already trains on Wikipedia.

https://commoncrawl.org/

MountingSuspicion@reddthat.com · 1 day ago

This was only done because the editors pushed to minimize AI involvement. There’s a comment here already mentioning that: https://lemmy.world/comment/22826863

FauxPseudo@lemmy.world · 24 hours ago

Seems like there should be a third exception, for those occasions where the article is about LLM-generated text. They should be able to quote it when it’s appropriate for an article.

Zagorath@quokk.au · 21 hours ago

That is a reasonable exception to no-AI policies in research papers and newspaper articles, but not for Wikipedia. As a tertiary source, Wikipedia has a strict “no original research” policy. Using AI to provide examples of AI output would be original research, and should not be done.

Quoting AI output shared in primary and secondary sources should be allowed for that reason, though.

ricecake@sh.itjust.works · 3 hours ago

Eh, that’s not quite original research. There are plenty of other examples of images and sound files created for Wikipedia. A representative example isn’t research, it’s just indicating what something is.

The Wikipedia articles on AI slop and generative AI have a few instances of content that’s representative, used to illustrate a sourced statement, as opposed to being evidence or something.

It’s similar to the various charts and animations.

SpaceNoodle@lemmy.world · edited 1 day ago

An extremely measured and level-headed response. Kudos to Wikipedia for maintaining high standards.

kazerniel@lemmy.world · 1 day ago

It has to be said, they originally changed their stance due to the considerable editor pushback when they tried to introduce LLM summaries on the top of articles. So kudos to the editor community’s resistance! ✊

ricecake@sh.itjust.works · 2 hours ago

Just for more clarity: they workshopped ideas on how to improve clarity and accessibility with some editors at an event. They did some small experiments, and they then developed a plan to trial some of them and presented the plan to a wider audience for feedback. After they got feedback they decided not to.

It’s not quite the editors pushing back on Wikipedia. Or rather, it’s not the “rebellion” people want to make it out to be.

https://www.mediawiki.org/wiki/Readers/2024_Reader_and_Donor_Experiences/Content_Discovery/Wikimania_2024,_"Written_by_AI"_How_do_editors_and_machines_collaborate_to_create_content

https://www.mediawiki.org/wiki/Reading/Web/Content_Discovery_Experiments/Simple_Article_Summaries

It rubs me the wrong way when the process going how it should go gets cast as controversial and dramatic. Asking the community if you should do something and listening to them is how it’s supposed to go. It’s not resistance, it’s all of them being on the same team and talking.

kazerniel@lemmy.world · 2 hours ago

Thanks for the reframe! From what I’ve seen in Village Pump comments at the time, editors (including me) were upset bc putting LLMs into Wikipedia articles seems like an idea so obviously clashing with Wikipedia’s values and strengths, that it was a shock to see it taken as far as it got before the wider backlash. (Also put into wider context, the whole world seemed to be jumping onto the LLM bandwagon at the time, so it was dismaying to see Wikipedia do the same.)

banshee@lemmy.world · 3 hours ago

Does anyone like LLM summaries in pages? This seems like a better fit for a browser extension to generate a summary on demand instead of wasting resources generating it for everyone. Google’s documentation is absolutely littered with the mess.

SpaceNoodle@lemmy.world · edited 1 day ago

Good point. The real strength of Wikipedia truly lies in the editors.

Mwa@thelemmy.club · 22 hours ago

W Wikipedia, would be better to remove the exceptions but it’s fine tbh.

yucandu@lemmy.world · 1 day ago

Banned the people who openly admit it, anyway.

aliser@lemmy.world · 23 hours ago

There are AI detectors, although I’m not sure about the accuracy of those

Aatube@thriv.social · 3 hours ago

very bad

Sunless Game Studios@lemmy.world · edited 1 day ago

I know at least one writing major who won an award for his volunteer work at Wikipedia. He did it as a hobby. They don’t really need AI, they need people like him.

antonim@lemmy.world · edited 15 hours ago

How do you win an award for editing Wikipedia?

albert_inkman@lemmy.world · 10 hours ago

Removed by mod

The Velour Fog@lemmy.world · 8 hours ago

You’re not working on anything, clanker.

For those wondering, check the timestamps in this account’s comment history, especially comments from 4 days ago or longer. Fully formatted multi-paragraph comments made 10-30 seconds apart. This is an LLM-controlled account.

luciferofastora@feddit.org · 4 hours ago

I can’t even write a two-sentence comment in 30s without overthinking. I do like to use formatting, but that doesn’t make it quicker…

Echo Dot@feddit.uk · 4 hours ago

Yeah you can tell because the comment doesn’t really say anything. It’s just a lot of text but no actual meaning.

The Velour Fog@lemmy.world · 3 hours ago

Yup, one of the main hallmarks of AI generated slop that’s often hard to explain unless you have an example like the above in front of you. A lotta words, but very little substance.

webp@mander.xyz · 1 day ago

Why do they need AI at all? Wikipedia existed long before it and was doing fine.

AmbitiousProcess (they/them)@piefed.social · 1 day ago

You could make that argument about any tool Wikipedia editors use. Why should they need spellcheck? They were typing words just fine before.

…except it just makes it easier to spot errors or get little suggestions on how you could reword something, and thus makes the whole process a little smoother.

It’s not strictly necessary, but this could definitely be helpful to people for translation and proofreading. Doesn’t have to be something people are wholly reliant on to still be beneficial to their ability to edit Wikipedia.

fuckwit_mcbumcrumble@lemmy.dbzer0.com · 1 day ago

Why should we use (insert tool) when we did just fine before?

Because when used correctly it can be great for helping you be more productive, and find errors/make improvements. The two exceptions are for grammar and translation, which AI does a surprisingly good job with. Would you have gotten mad if they used Grammarly >5 years ago? Having it rewrite an entire article is gonna be a bad idea, but asking it to rephrase a sentence, or check your phrasing for potential issues, is a much safer thing. Not everyone who speaks Spanish uses it the same way. Some words are innocuous in some regions, but offensive in others.

REDACTED@infosec.pub · 1 day ago

Why fire, berries fine

webp@mander.xyz · 1 day ago

Try using fire in a library.

Warl0k3@lemmy.world · 14 hours ago

Like, say, a candle?

Luminous5481 "Lawless Heathen" [they/them]@anarchist.nexus · 1 day ago

Wikipedia isn’t a library.

webp@mander.xyz · 24 hours ago

Neither is AI fire. 🙄

Luminous5481 "Lawless Heathen" [they/them]@anarchist.nexus · 23 hours ago

You’re the one that implied it was.

webp@mander.xyz · 1 day ago

Call me mad, call me crazy. AI shouldn’t be altering databases of knowledge, especially when it is so inconsistent. If there is a question on whether certain words are appropriate, why can’t you ask another human being? They have forums for a reason. Or someone else comes along and fixes it. Or look at a dictionary. The amount of energy spent for dubious information, holy. It’s not like there is a shortage of human beings on earth.

Qwel@sopuli.xyz · edited 24 hours ago

https://en.wikipedia.org/wiki/Wikipedia:Writing_articles_with_large_language_models

https://en.wikipedia.org/wiki/Wikipedia:LLM-assisted_translation

The two related “policies” are rather short, you should read them if you haven’t.

AI shouldn’t be altering databases of knowledge, especially when it is so inconsistent

The policy only allows usage as an auto-translator (a task at which they are not worse than old-style auto-translators that were always allowed) and as spellcheck/grammarcheck (where it is also not worse than other allowed options).

None of those tools were previously seen as altering Wikipedia by themselves. The goal is that LLMs should be used and considered like they were.

To be clear, there always were articles for creation submitted from clearly Google-translated text, and they always were dismissed as slop. To get an autotranslated article accepted, you need to clean it up until all the information is correct and the grammar is good enough. This is a rather standard workflow for translations. The same thing should apply to LLMs.

The new issue here is that LLMs can “organically” change information when asked to translate. When a classic autotranslate changes the information, it often (not always) leaves a notable mess in the grammar. LLMs will insert their errors much more cleanly. This is acknowledged by both texts and, well, the texts will change if that becomes a recurring issue.

fuckwit_mcbumcrumble@lemmy.dbzer0.com · 23 hours ago

AI isn’t altering databases or knowledge. AI is telling the writer there’s a better way to do this, and the writer has to explicitly change their wording.

You only know to look at a dictionary for alternative wordings if you know there’s a problem. How do you know there’s a problem?

If you ask someone else, what if that same someone else uses your regional dialect and not the one that has problems? Your average writer can’t review every single word used in the dictionary for every single article they edit. But AI can, and that’s something it’s actually good at. You may only know 5 Spanish speakers, but AI knows everything it was trained on.

davidgro@lemmy.world · edited 1 day ago

I hoped the exceptions would be like “Quoted example text of LLM output, when it’s clearly labeled and styled separately from the article text.”

baltakatei@sopuli.xyz · 23 hours ago

That exception probably would be twisted into permission to add an “AI summary” section to each article.

davidgro@lemmy.world · 22 hours ago

Ugh. Yeah, it would have to be worded carefully, you’re right

Phoenixz@lemmy.ca · 1 day ago

So in other words, when used responsibly as a tool with limitations, AI has it’s uses? Though very environmentally unfriendly uses?

Slashme@lemmy.world · 24 hours ago

*its

hperrin@lemmy.ca · 23 hours ago

Good news. Hopefully they’ll get rid of those two exceptions in the future.

JohnEdwa@sopuli.xyz · edited 22 hours ago

Would be pretty shitty to have to make sure, every time you are editing Wikipedia, that you have disabled any AI-based grammar/spellcheckers (e.g. Grammarly), and to not be allowed to use translation tools.

Because those are the two exceptions.

antonim@lemmy.world · 14 hours ago

Spell- and grammar-checking is useless anyway. If you don’t have at least one word underlined with red in every sentence, you’re not writing anything intellectually serious. 🧐

hperrin@lemmy.ca · 22 hours ago

Why? That’s how they’ve been doing it for 25 years.

Warl0k3@lemmy.world · edited 14 hours ago

Spelling/grammar checking and machine translation have been in use for decades on Wikipedia; the only difference is that AI has improved the usefulness of the tools for first-pass editing. I don’t believe the policy has even changed - you still had to be fluent in the language if you were using the old-style MTL tools, too.

Aside from generating videos of young girls with gigantic titties, this is the only thing generative AI is actually useful for.

hperrin@lemmy.ca · 5 hours ago

I still think it should be banned. It’s prone to just making shit up. Therefore, it’s not useful for any sort of professional work. If you had just a guy named Al, who would work for free, but sometimes would just make stuff up to make you happy, would you let Al work on important things?

Warl0k3@lemmy.world · edited 4 hours ago

Yeah, and that tendency is directly addressed in the policy.