• Kaz@lemmy.org · +21 · 4 hours ago

    These fuckin AI “enthusiasts” are just making the rest of the world hate AI more.

    Losers who can’t achieve anything without AI are just going to keep doing this shit.

  • cheesybuddha@lemmy.world · +6 · 3 hours ago

    So they are using AI to make it so AI can’t detect that they are using AI?

    What kind of technological ouroboros of nonsense is this?

  • markstos@lemmy.world · +19 · 5 hours ago

    Congrats on inventing what high school students figured out a year ago to skirt AI homework detectors.

  • minorkeys@lemmy.world · +10 · 4 hours ago · edited

    It’s an arms race, AI identification vs AI adaptation. I wonder which side the companies that own these LLMs want to win…

  • Avid Amoeba@lemmy.ca · +13 · 8 hours ago

    From the repo:

    Have opinions. Don’t just report facts - react to them. “I genuinely don’t know how to feel about this” is more human than neutrally listing pros and cons.

  • Jayjader@jlai.lu · +38 · 12 hours ago

    I really despise how Claude’s creators and users are turning the definition of “skill” from “the ability to use [learned] knowledge to enhance execution” into “a blurb of text that [usefully] constrains a next-token-predictor”.

    I guess, if you squint, it’s akin to how biologists will talk about species “evolving to fit a niche” amongst themselves or how physicists will talk about nature “abhorring a vacuum”. At least they aren’t talking about a fucking product that benefits from hype to get sold.

    • prole@lemmy.blahaj.zone · +24 · 10 hours ago

      I can’t help but get secondhand embarrassment whenever I see someone unironically call themselves a “prompt engineer”. 🤮

    • OctopusNemeses@lemmy.world · +13 · 10 hours ago

      Isn’t this a thing that authoritarians do? They co-opt language. It’s the same thing conservatives do. The Venn diagram of tech bros and the far right is too close to being a circle.

      You can put pretty much any word from the dictionary into a search engine and the first results are some tech company that took the word either as their company name or redefined it into some buzzword.

  • felixthecat@fedia.io · +7 · 8 hours ago

    Stuff like that doesn’t always work though, at least on the free versions in my experience. I use AI to write flowery emails to people, to sound nice when I normally wouldn’t bother, and I used it to negotiate buying my car. I would continually tell it not to use em dashes while writing emails, and inevitably after one answer it would go back to using them.

    Maybe the paid versions are different, but on the free ones you have to continually correct it.
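
Since prompting the model apparently doesn’t stick, a deterministic post-processing pass is the more reliable fix. A minimal sketch (the function name and replacement rules here are my own, not any product’s feature):

```python
import re

def strip_em_dashes(text: str) -> str:
    """Deterministically rewrite dashes after generation, instead of
    hoping the model keeps following the instruction."""
    text = re.sub(r"\s*\u2014\s*", ", ", text)  # em dash -> comma + space
    # horizontal bar / en dash -> plain hyphen
    return text.replace("\u2015", "-").replace("\u2013", "-")

print(strip_em_dashes("It works \u2014 mostly."))  # It works, mostly.
```

Unlike a prompt instruction, this cannot drift back after one answer, because it never touches the model at all.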

    • sobchak@programming.dev · +1 · 3 hours ago

      Even the paid models I’ve tried do that. The style LLMs use seems deeply ingrained. Either companies do it on purpose, or it’s just the result of all the companies using similar training data and techniques.

  • Phoenix3875@lemmy.world · +82/−4 · 17 hours ago

    You do understand this is more akin to white hat testing, right?

    Those who want to exploit this will do it anyway, except they won’t publish the result. By making the exploit public, the risk will be known if not mitigated.

    • unepelle@mander.xyz · +13 · 12 hours ago · edited

      I’m admittedly not knowledgeable about white-hat hacking, but are you supposed to publicize the vulnerability, release a shortcut to exploit it telling people to ‘enjoy’, or even call the vulnerability handy?

      • teft@piefed.social · +10/−1 · 10 hours ago

        Responsible disclosure is what a white hat does. You report the bug to whoever is responsible for patching it and give them time to fix it.

        • PlexSheep@infosec.pub · +6 · 10 hours ago

          That sort of depends on the situation. Responsible disclosure is for when there is a relevant security hole that poses an actual risk to businesses and people, while this here is just “haha, look, LLMs can now better pretend to write good text if you tell them to”. That’s not really something that calls for responsible disclosure. It’s not even specific to one product.

      • FooBarrington@lemmy.world · +5/−3 · 12 hours ago

        Considering the “vulnerability” here is on the level of “don’t use password as your password” - yeah, releasing it all is exactly the right step.

  • udon@lemmy.world · +45 · 16 hours ago

    If these “signs of AI writing” are merely linguistic, good for them. This is as accurate as a lie detector (i.e., not accurate) and nobody should use this for any real world decision-making.

    The real signs of AI writing are not as easy to fix as just instructing an LLM to “read” an article to avoid them.

    As a teacher, all of my grading is now based on in-person performances, no tech allowed. Good luck faking that with an LLM. I do not mind if students use an LLM to better prepare for class and exams, but my impression so far is that any other medium (e.g., books, YouTube explanation videos) leads to better results.

    • Randelung@lemmy.world · +1 · 12 hours ago

      I sucked in oral exams and therefore hated them. Then again, if they had been mixed into regular school, it might not have sucked so much.

      • prole@lemmy.blahaj.zone · +6 · 10 hours ago

        Doesn’t need to be oral, I remember occasionally having exams that were essay questions that needed to be answered in class.

        • udon@lemmy.world · +4 · 9 hours ago

          I do both of these as well as smaller but more frequent tests, group work, project work over several sessions etc… The only things I stopped doing are reports to write at home, paper summaries etc. Doesn’t make sense anymore.

  • Lumidaub@feddit.org · +182 · 21 hours ago

    Seeing as OpenAI struggled to make its AI avoid the em dash and still hasn’t entirely managed to do it, I’m not too worried.

    • FiniteBanjo@feddit.online · +87/−3 · 21 hours ago

      TBF, OpenAI are a bunch of idiots running the world’s largest Ponzi scheme. If DeepMind had tried it and failed, then…

      Well, I still wouldn’t be surprised, but at least it would be worth citing.

      • chickenf622@sh.itjust.works · +38/−1 · 20 hours ago

        I think the inherent issue is that current “AI” is non-deterministic, so it’s impossible to fix these issues totally. You can feed an AI all the data on how to not sound like AI, but you need massive amounts of non-AI writing to reinforce that. With AI being so prevalent you can’t guarantee a dataset nowadays is AI-free, so you get the old “garbage in, garbage out” problem that AI companies cannot solve. I still think generative AI has its place as a tool; I use it for quick and dirty text manipulation, but it’s being applied to every problem we have like it’s a magic silver bullet. I’m ranting at this point and I’m going to stop here.

        • vala@lemmy.dbzer0.com · +2 · 4 hours ago

          FWIW, LLMs are deterministic. Usually the commercial front-ends don’t let you set the seed, but behind the scenes the only reason the output changes each time is that the seed changes. If you set a fixed seed, input X always leads to output Y.
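
The seed point is easy to demonstrate with a toy sampler. This is a stand-in for real LLM decoding, not any vendor’s API; the token distribution is made up:

```python
import random

def sample_tokens(probs, n, seed):
    """Toy next-token sampler: draws n tokens from a fixed distribution.
    With the same seed, the sequence is identical on every run."""
    rng = random.Random(seed)
    tokens, weights = zip(*probs.items())
    return [rng.choices(tokens, weights=weights)[0] for _ in range(n)]

probs = {"the": 0.5, "a": 0.3, "an": 0.2}
# Same seed -> byte-identical "generation"; only changing the seed
# (which hosted front-ends do silently) makes the output vary.
assert sample_tokens(probs, 10, seed=42) == sample_tokens(probs, 10, seed=42)
```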

        • FiniteBanjo@feddit.online · +22 · 19 hours ago

          I honestly disagree that it has any use. Being a statistical model with high variance makes it a liability; no matter which task you use it for, it will produce worse results than a human being and will create new problems that didn’t exist before.

          • Cethin@lemmy.zip · +4 · 17 hours ago · edited

            If you’re running it locally you can set how much variance it has. However, I mostly agree that it creates a bunch of trash. This doesn’t mean it has no use, though. It’s like the monkeys-on-a-typewriter thought experiment, but the monkey’s output is fairly constrained, so it takes far fewer attempts to create what you want. Whether it comes up with a good solution in a reasonable number of tries depends on the complexity of the solution required. If it’s a novel solution, it probably never will, because it’s constrained to solutions it has seen before.
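
That "constrained monkey" framing is essentially rejection sampling: generate, validate, retry. A minimal sketch, where generate() is a hypothetical stand-in for a model call:

```python
import random

def generate(rng):
    # Hypothetical stand-in for a model call: a random 4-character string.
    return "".join(rng.choice("ab-") for _ in range(4))

def generate_until_valid(is_valid, max_tries=1000, seed=0):
    """Retry generation until a candidate passes the validator."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        candidate = generate(rng)
        if is_valid(candidate):
            return candidate
    return None  # a truly novel requirement may never be satisfied

result = generate_until_valid(lambda s: "-" not in s)
```

The cost grows with how selective the validator is; a requirement the generator has effectively never "seen" can exhaust every retry, which matches the point about novel solutions.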

          • hector@lemmy.today · +2/−1 · 13 hours ago

            AI is useful for sorting datasets and pulling relevant info in some cases; e.g., ProPublica has used it for articles.

            Obviously that was simple sorting for them; case law is too complicated for that kind of data sifting. It was trained on Reddit, after all.

            • FiniteBanjo@feddit.online · +1 · 3 hours ago · edited

              And when, not if but when, it makes a mistake by pulling hallucinated info or data, it’s going to be your fault. That’s why it’s a liability.

              • hector@lemmy.today · +1 · 12 hours ago

                It can do the simple stuff. I’m trying to remember how ProPublica used it, but it was just sifting through a database and pulling out all mentions of a word.

                When you get into citing case law, it’s way too complicated for it and it hallucinates.

          • chickenf622@sh.itjust.works · +6/−2 · 19 hours ago

            The high variance is why I only use it for dead simple tasks, e.g. “create an array of US state abbreviations in JavaScript”; otherwise I’m in full agreement with you. If you can’t verify the output is correct then it’s useless.

            • GojuRyu@lemmy.world · +1/−1 · 1 hour ago

              Wouldn’t that be slower, simply because checking that it got all the states, didn’t repeat any, and didn’t make any up would take longer than copying a list from the web and quickly turning it into an array by hand with multiline cursors?

            • eleijeep@piefed.social · +3/−1 · 10 hours ago

              That’s like one web search and then one shell command. You can probably just copy paste a column of a table from wikipedia and then run a simple search/replace in your text editor. Why are you feeding the orphan crushing machine for this?
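
The copy-paste route can be sketched concretely; the pasted column below is truncated to three states for brevity:

```python
# Deterministic alternative: paste the abbreviation column from a reference
# table and transform it mechanically; nothing can be hallucinated or skipped.
pasted = """AL
AK
AZ"""

abbrevs = [line.strip() for line in pasted.splitlines() if line.strip()]
js_array = "const states = [" + ", ".join(f'"{a}"' for a in abbrevs) + "];"
print(js_array)  # const states = ["AL", "AK", "AZ"];
```

The output is exactly as trustworthy as the table it came from, which is the whole point of the comparison.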

              • bridgeenjoyer@sh.itjust.works · +3/−1 · 6 hours ago

                Because it’s .01% easier to do this.

                Also, many people laugh at you if you try to say how AI is destroying the environment for no reason. Doesn’t affect them; you go live in a cave, you Luddite!

          • frank@sopuli.xyz · +1/−2 · 13 hours ago

            I think the best use is “making filler”, so, like in a game, having some deep background shit that no one looks at, or making a fake advertisement in a cyberpunk-type game. Something to fill the world out that reduces the work of real artists, if they choose to use it.

            • FiniteBanjo@feddit.online · +3 · 12 hours ago · edited

              If you can’t be bothered to write filler then it’s an insult for you to expect others to read it. You’re just wasting people’s time.

              • frank@sopuli.xyz · +1/−1 · 11 hours ago

                I guess the point is for people not to read the filler.

                I think of the text that’s too small to read on a computer screen in the background. It’s nice that it looks slightly more real than a copy/paste screen.

                Not even close to worth destroying the environment over, but it’s a neat use case to me.

                • Catoblepas@piefed.blahaj.zone · +3 · 5 hours ago

                  I think of the text that’s too small to read on a computer in the background.

                  Lorem ipsum has been used in typesetting since the 60s. If it’s not meant to be read, it doesn’t matter if it’s lorem ipsum text.

                  Not trying to dogpile you, I just think even things that seem ‘useful’ for LLMs almost always have preexisting solutions that are decades old.

        • homura1650@lemmy.world · +3 · 16 hours ago

          Datasets are not the only mechanism to train AI. You can also use reinforcement learning, but that requires a good fitness function. In some domains that is not a problem; for LLMs, however, we do not have such a function. We can use a hybrid approach, where we train a model on a dataset while optimizing for fitness functions that address part of what we want (e.g. avoiding em dashes). In practice this tends to be tricky, as ML tends to be a bit too good at optimizing for fitness functions, and will often do it in ways you don’t want. This is why, if you want to develop a real AI product, you need AI engineers who know what they are doing, not prompt engineers trying to find the magic incantation that makes someone else’s AI do what they want.
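
A toy illustration of that fitness-function pitfall (the scorer and the numbers are made up, not from any real training setup): a function that penalizes em dashes is trivial to write, but an optimizer can satisfy it without producing the text you actually wanted, e.g. by swapping in a lookalike character.

```python
def fitness(text: str) -> float:
    """Toy fitness function: reward word count, penalize each em dash."""
    return len(text.split()) - 10 * text.count("\u2014")  # \u2014 = em dash

honest = "The model writes clearly \u2014 most of the time."
# A "reward hack": replace the em dash with a lookalike horizontal bar
# (U+2015), which the scorer never checks for.
gamed = honest.replace("\u2014", "\u2015")

assert fitness(gamed) > fitness(honest)  # the optimizer wins without improving anything
```

This is the "too good at optimizing" failure mode in miniature: the metric goes up while the thing you cared about stays broken.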

        • hector@lemmy.today · +1 · 13 hours ago · edited

          We should crowdsource a program that sniffs out AI data crawlers and then poisons the data they harvest without them knowing, for companies to deploy.

    • 0_o7@lemmy.dbzer0.com · +2/−4 · 9 hours ago

      You have to understand that their public-facing product is not the same as the one they allow enterprise or state actors to use.

      They benefit from the public thinking they have these stupid limitations; it gives them more space to curate their product offerings where the real money is made.

      • Lumidaub@feddit.org · +5 · 9 hours ago · edited

        I don’t understand how the public thinking these are bad products is an incentive, especially for state actors, to use them. That seems counterintuitive.

  • dumbass@piefed.social · +72/−5 · 19 hours ago

    Wikipedia is one of the last genuine places on the Internet, and these rat bastards are trying to contaminate that, too.

    Wikipedia just sold the rights to use Wikipedia for AI training to Microsoft and OpenAI…

    • ATPA9@feddit.org · +104/−1 · 17 hours ago

      It’s getting scraped anyway. So why not get some money from it?

      • SLVRDRGN@lemmy.world · +2 · 5 hours ago · edited

        This right here is the reason why companies that start out with good quality and intentions turn into companies with crappy, mediocre products that now actually contribute to the opposite effect on the world from everything they once stood for.

      • Fedizen@lemmy.world · +48 · 17 hours ago

        Imo this. Selling access also implies it’s illegal to access without purchasing rights, which imho helps undermine AI’s only monetary advantage.

    • udon@lemmy.world · +12 · 16 hours ago

      How exactly does that work? Wikipedia does not “own” the content on the website, it’s all CC-BY licensed.

      • technocrit@lemmy.dbzer0.com · +1 · 7 hours ago

        Yeah, they’re selling the work of others. That’s how the site always worked. This venture into “AI” is nothing new.

        • udon@lemmy.world · +2 · 9 hours ago

          So? It still doesn’t make sense to me that Wikipedia can sell anything meaningful here, but I’m also not a lawyer. Do they promise not to sue, or sell some guarantee that contributors also can’t sue? Is it just some symbolic PR washing?

    • Alcoholicorn@mander.xyz · +17/−6 · 19 hours ago

      Why? Wikipedia has like a decade of operating expenses on hand, so they don’t need the money.

      • buddascrayon@lemmy.world · +6 · 12 hours ago

        I just love how people just shit “facts” out of their ass while citing zero sources and people will just believe them and upvote because it confirms their bias.

      • surewhynotlem@lemmy.world · +32/−1 · 19 hours ago

        This number inflates every time I read it. First it was ten years of hosting costs. Then it’s operating costs. Soon it will be ten years of the entire US GDP.

        I’d believe they have ten years of hosting costs on hand.

        My quick googling says they have about 170m in assets and around 180m in annual operating costs. Give or take.

        • green_red_black@slrpnk.net · +16/−2 · 18 hours ago

          It’s a non-profit foundation, with the majority being volunteers. If greed were the motive, one would have to ask why they don’t just go ahead and inject ads.

            • green_red_black@slrpnk.net · +9/−1 · 17 hours ago

              Well, as mentioned, Wikipedia seems to be in the red and not bringing in enough donations to cover its expenses. So maybe the foundation is thinking this would help with the deficit.

              Also, chances are Microsoft will instruct Copilot to prioritize Wikipedia whenever it scours the internet for information.

              Think of it like that eye-rolling deal where Google pays Firefox to be the default search engine.

              • technocrit@lemmy.dbzer0.com · +1 · 7 hours ago

                Well as mentioned Wikipedia seems to be in the red

                They keep saying that… at least when they’re asking for more money.

              • LadyMeow@lemmy.blahaj.zone · +4/−4 · 17 hours ago

                Is Wikipedia in the red? Unclear. I mean, they ask for donations, but someone in this thread claims they are set for a decade, and I’ve seen people post about how they are fine and even donate a bunch themselves. I don’t know, and I guess it doesn’t matter.

                Not sure where you are going with your second comment, and I’m uninterested in engaging with your comparison, as I don’t think it’s very good.

                • green_red_black@slrpnk.net · +5 · 17 hours ago

                  I am referring to the reply from surewhynotlem. They say that costs are 180 million while Wikipedia has 170 million on hand. That is a 10 million deficit.

                  While probably not enough to shut down the site, it is still operating in the red.

                  Where I was going was explaining how it’s possibly not greed, just the foundation looking for another revenue source that theoretically would not ruin the site.

                  The alternative being a deal that gets Wikipedia more traffic.

            • Fedizen@lemmy.world · +5 · 17 hours ago · edited

              If Microsoft is “buying access to training data”, it makes what OpenAI is doing look illegal. I would encourage every data broker to sell “AI training data rights”, because it undermines the only real advantage AI has and helps pave the way to forcing AI companies to comply with open-source licenses.

              Essentially, selling AI data rights is a Trojan horse for the AI companies. Obviously it would be better to pass laws, but until that happens this is imo a better strategy than doing nothing.