LLMs can unmask pseudonymous users at scale with surprising accuracy

return2ozma@lemmy.world · 12 hours ago

nutsack@lemmy.dbzer0.com · 9 minutes ago

I theorized about this a long time ago. pretty sure I’m basically fucked

ExLisper@lemmy.curiana.net · edit-2 16 minutes ago

I think this will only work with people narrating their lives on social media.

“Got coffee from my favorite Granier at La Rambla! Ready for a new day of work designing hats for dogs”

“Me and Bobby heading to Madrid to see my friend Concepcion. Do you like his new hat?”

“Just got nominated for ‘best business-casual hat’ at this year’s Barkies! So proud”

And so on…

Because how are you going to de-anonymize some random ramblings about Linux and beans? Everyone likes Linux and beans.

doesit@sh.itjust.works · edit-2 55 minutes ago

Kind of obvious. Say you’re a high school teacher who used to be a photographer. You also volunteer as a fireman, live in France, and have 2 daughters. In 2022 you asked about repairs on your Honda Civic.
All of this can be amassed from different posts on Facebook or Reddit. There’ll be just a few people who fit this profile.
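A rough sketch of the narrowing effect, on a made-up population. Every record and field name below is hypothetical and only there for illustration. Each clue on its own matches plenty of people, but the candidate pool shrinks fast once the clues are combined:

```python
# Hypothetical toy population; all records and field names are invented.
PEOPLE = [
    {"name": "A", "job": "teacher", "country": "France", "daughters": 2, "car": "Honda Civic"},
    {"name": "B", "job": "teacher", "country": "France", "daughters": 1, "car": "Renault Clio"},
    {"name": "C", "job": "teacher", "country": "Spain", "daughters": 2, "car": "Honda Civic"},
    {"name": "D", "job": "nurse", "country": "France", "daughters": 2, "car": "Honda Civic"},
    {"name": "E", "job": "teacher", "country": "France", "daughters": 2, "car": "Peugeot 208"},
]


def narrow(people, clues):
    """Apply clues one at a time; record how many candidates remain after each."""
    remaining = list(people)
    counts = []
    for field, value in clues:
        remaining = [p for p in remaining if p[field] == value]
        counts.append(len(remaining))
    return remaining, counts


# Clues of the kind that could be scraped from separate posts.
clues = [
    ("job", "teacher"),
    ("country", "France"),
    ("daughters", 2),
    ("car", "Honda Civic"),
]
matches, counts = narrow(PEOPLE, clues)
print(counts)                        # candidates left after each clue
print([p["name"] for p in matches])  # who still fits the full profile
```

In a real population the numbers are far larger, but the shape is the same: a handful of individually harmless details is often enough to leave only a few candidates.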

FauxPseudo@lemmy.world · 8 hours ago

From a Facebook post I made on February 17th:

There are giant AI data firms that promise they can go through massive troves of data and pull out general and specific information from them. Information that is actionable and accurate. Give it 6 million data points and it’ll find all the links and organize them for you and unmask hidden details that aren’t visible to the naked eye.

Not one of those companies is stepping up to go through the publicly released Epstein files.

Mubelotix@jlai.lu · 57 minutes ago

We wouldn’t want that tbh. Justice needs to be precise and backed up by tangible facts

FauxPseudo@lemmy.world · 32 minutes ago

You can use the results of the AI analysis to identify people and then use that to do a proper investigation. Right now none of that is happening. No speculation. No tangibles. No investigation. No indictment.

Trying to unmask people is a step in the right direction.

Randomgal@lemmy.ca · 8 hours ago

This is what I find crazy. Where are the AI bros chewing through the Epstein files?

osaerisxero@kbin.melroy.org · 7 hours ago

I would be shocked if someone hasn’t shoved them into a local model somewhere, but all the big ones would filter them to death with content restrictions

jballs@sh.itjust.works · 10 hours ago

As a registered Republican woman from Texas with five children and two dogs, let me just say that I am astonished!

pivot_root@lemmy.world · 8 hours ago

Me too. I thought I was safe as an Ottoman Empire expatriate living in Arrakis! I don’t want LLMs to connect this account to my pseudonymous mommy blog where I write about my three children who might exist but could be delusions of my untreated schizophrenia.

potoooooooo ✅️@lemmy.world · 48 minutes ago

Oh, WE EXIST, mommy! Let me assure you, as one of said imaginary schizophrenia babies. Currently shacking up in Miami with my new wife I just met cranking my hog at Sturgis.

Bigfishbest@lemmy.world · 1 hour ago

I don’t believe this! As a fumgrian living as a would be dead camoose off Mt. Kabul, I am overjizzed that AI is reading all my pornhub comments.

CheesyFingers@piefed.social · 4 hours ago

It seems that I, the original Unidan, will unfortunately need to create even more alts to escape being found out. Blast!

meco03211@lemmy.world · 3 hours ago

It seems that I, also the original Unidan, will unfortunately need to create even more alts to escape being found out. Blast!

Whostosay@sh.itjust.works · 7 hours ago

You forgot to list your favorite brands

MinnesotaGoddam@lemmy.world · 3 hours ago

Kegel One

lumpenproletariat@quokk.au · 5 hours ago

Kleenex and Jergens

goatinspace@feddit.org · 9 hours ago

That was surprisingly accurate. Meep meep.

tal@lemmy.today · 11 hours ago

Of course, another option is for people to dramatically curb their use of social media, or at a minimum, regularly delete posts after a set time threshold.

Deletion won’t deal with someone seriously-interested in harvesting stuff, because they can log it as it becomes available. And curbing use isn’t ideal.

I mentioned before the possibility of poisoning data, like, sporadically adding some incorrect information about oneself into one’s comments. Ideally something that doesn’t impact the meaning of the comments, but would cause a computer to associate one with someone else.

There are some other issues. My guess is that it’s probably possible to fingerprint someone to a substantial degree by the phrasing that they use. One mole in the counterintelligence portion of the FBI, Robert Hanssen, was found because on two occasions he used the unusual phrase “the purple-pissing Japanese”.

FBI investigators later made progress during an operation where they paid disaffected Russian intelligence officers to deliver information on moles. They paid $7 million to KGB agent Aleksander Shcherbakov[48] who had access to a file on “B”. While it did not contain Hanssen’s name, among the information was an audiotape of a July 21, 1986, conversation between “B” and KGB agent Aleksander Fefelov.[49] FBI agent Michael Waguespack recognized the voice in the tape, but could not remember who it was from. Rifling through the rest of the files, they found notes of the mole using a quote from George S. Patton’s speech to the Third Army about “the purple-pissing Japanese”.[50] FBI analyst Bob King remembered Hanssen using that same quote. Waguespack listened to the tape again and recognized the voice as Hanssen’s. With the mole finally identified, locations, dates, and cases were matched with Hanssen’s activities during the period. Two fingerprints collected from a trash bag in the file were analyzed and proved to be Hanssen’s.[51][52][53]

That might be defeated by passing text through something like an LLM to rewrite it. So, for example, to take a snippet of my above comment:

Respond with the following text rephrased sentence by sentence, concisely written as a British computer scientist might write it:

Deletion won’t deal with someone seriously-interested in harvesting stuff, because they can log it as it becomes available. And curbing use isn’t ideal.

I mentioned before the possibility of poisoning data, like, sporadically adding some incorrect information about oneself into one’s comments. Ideally something that doesn’t impact the meaning of the comments, but would cause a computer to associate one with someone else.

I get:

The deletion of data alone will not prevent a determined party from gathering information, as they may simply record the information as it becomes available prior to its deletion. Moreover, restricting usage is not an ideal solution to the problem at hand.

I previously mentioned the possibility of introducing deliberate errors or misinformation into one’s own data, such as periodically inserting inaccurate details about oneself within comments. The goal would be to include information that does not significantly alter the meaning of the comment, but which would cause automated systems to incorrectly associate that individual with another person.

That might work. One would have to check the comment to make sure that it doesn’t mangle the thing to the point that it is incorrect, but it might defeat profiling based on phrasing peculiarities of a given person, especially if many users used a similar “profile” for comment re-writing.

A second problem is that one’s interests are probably something of a fingerprint. It might be possible to use separate accounts related to separate interests — for example, instead of having one account, having an account per community or similar. That does undermine the ability to use reputation generated elsewhere (“Oh, user X has been providing helpful information for five years over in community X, so they’re likely to also be doing so in community Y”), which kind of degrades online communities, but it’s better than just dropping pseudonymity and going 4chan-style fully anonymous and completely losing reputation.

zerofk@lemmy.zip · 57 minutes ago

Your above average use of the word “one” and variations like “one’s” could be quite telling.

As could my correction of “it’s” in the above sentence.
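To make the point about telltale words concrete, here is a minimal sketch of comparing how often a few marker words show up in different texts. The marker list, the sample sentences, and the helper names are all made up for illustration. Real stylometry uses far more features than this, so treat it as a toy rather than a working attack:

```python
import re
from collections import Counter


def word_rates(text, markers):
    """Return each marker word's occurrences per 1,000 words of text."""
    # Normalise curly apostrophes so "one’s" and "one's" count as the same word.
    normalised = text.lower().replace("\u2019", "'")
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", normalised)
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return {m: 1000 * counts[m] / total for m in markers}


def distance(a, b):
    """Sum of absolute rate differences; smaller means more similar usage."""
    return sum(abs(a[k] - b[k]) for k in a)


# Hypothetical marker words and sample texts.
MARKERS = ["one", "one's", "tho"]
known = "One would have to check that one's comment is right"
candidate_1 = "One might say one's interests give one away"
candidate_2 = "It is more of a bias tho so it may be fine"

rates_known = word_rates(known, MARKERS)
d1 = distance(rates_known, word_rates(candidate_1, MARKERS))
d2 = distance(rates_known, word_rates(candidate_2, MARKERS))
print(d1, d2)  # the lower score is the closer stylistic match
```

On these samples the first candidate scores closer to the known text, purely because of how often “one” and “one's” appear.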

Yliaster@lemmy.world · 5 hours ago

Why is curbing use unideal?

HyperfocusSurfer@lemmy.dbzer0.com · edit-2 9 hours ago

Regarding the last point: it’s more of a bias, tho, so reducing it may even be a good thing. E.g. asking Kent Overstreet’s opinion on your bcachefs setup is probably useful, while getting relationship advice from him is ill-advised.

regenwetter@piefed.social · 4 hours ago

Advice being right or wrong isn’t necessarily the big issue for online communities (unless most other users are also wrong). What really degrades them is users acting like assholes, and someone who acts like that in a tech community is fairly likely to also do that in a political or relationship community.

maplesaga@lemmy.world · 9 hours ago

Average people download games and apps, and their phones are loaded to the hilt with bloatware. You think they care?

SupraMario@lemmy.world · 8 hours ago

The average person puts their entire life on Facebook or LinkedIn under their real name… they don’t give a shit.

Art3mis@lemmy.world · 7 hours ago

“WeLl I hAvE nOtHiNg To HiDe”

SupraMario@lemmy.world · 6 hours ago

The number of times I’ve heard this from people in the secops field is frighteningly high.

DarkCloud@lemmy.world · 10 hours ago

Great, we’re at a point where “researchers” are helping tech bros hurt the public interest. Could they just NOT publish this shit? Stop giving helpful tips to tyrannical oligarchs!

Academics can be stupid idiots sometimes.

zerofk@lemmy.zip · 52 minutes ago

Researchers’ work has always been abused by others. The advancement and free distribution of knowledge should not be curtailed for fear of malicious parties.

ToTheGraveMyLove@sh.itjust.works · edit-2 10 hours ago

Who am I? No forreal, WHO AM I? Last I remember I was on a cruise around the Caribbean. I blacked out one night while at the casino and when I came to I was on a beach in the middle of nowhere with a toothless man who spoke a language I couldn’t comprehend, unable to remember my name or anything from before the cruise. Thankfully he still has a dial up connection somehow in the year of our lord 2026, but I’ve been on this island for two years now. SOMEONE COME GET ME!

FenrirIII@lemmy.world · 7 hours ago

Your wife is much happier with me now and the children are already calling me dad. It’s time to move on.

rnercle@sh.itjust.works · 10 hours ago

somebody should inform the EU that they no longer need Chat Control

:/

workgood@lemmy.dbzer0.com · 8 hours ago

no it cant

corsicanguppy@lemmy.ca · edit-2 4 hours ago

I kinda think I want it to try. I make little effort to hide my location or identity, and I think I’d like to see the results.

…just without saying who I am before I get those results. And my desire to stay anonymous-ish and not give it a chance to cheat means I can’t show that I have the right to my own identity if it finds who I am.

Quite seriously, I cannot prove I have the right to make it search for me, for myself, without giving it too much information or without risking the leak of private info to a so-far unidentified stranger if it finds anything.

Catch-22