• Riskable@programming.dev
    English
    arrow-up
    10
    arrow-down
    60
    ·
    edit-2
    2 days ago

    Imagine you have a magic box that can generate any video you want. Some people ask it to generate fan fiction-like videos, some ask it to generate meme-like videos, and a whole lot of people ask it to generate porn.

    Then there are a few people who ask it to generate videos using trademarked and copyrighted stuff. It does what the user asks because there’s no way for it to know what is and isn’t copyrighted, or what is and isn’t parody or protected fair use.

    It’s just a magic box that generates videos… Whatever the human asks for.

    This makes some people and companies very, very upset. They sue the maker of the magic box, saying it’s copying their works. They start PR campaigns, painting the magic box in a bad light. They might even use the magic box quite a lot themselves but it doesn’t matter. To them, the magic box is pure evil; indirectly preventing them from gaining more profit… Somehow. Just like Sony was sued for making a machine that let people copy whatever videos they wanted (https://en.wikipedia.org/wiki/Sony_Corp._of_America_v._Universal_City_Studios%2C_Inc.).

    Before long, other companies make their own magic boxes and then everyday people get access to their own personal magic boxes, whose output no one else can see unless they share it.

    Why is this different from the Sony vs Universal situation? The AI magic box is actually worse at copying videos than a VCR.

    When a person copies—and then distributes—a movie do we say the maker of the VCR/DVD burner/computer is at fault for allowing this to happen? No. It’s the person that distributed the copyrighted work.

    • sanzky@beehaw.org
      arrow-up
      1
      ·
      3 hours ago

      This analogy is absolutely bonkers. The VCR is not made out of copyrighted material. The VCR does not spit out a bunch of copyrighted material on demand because its makers put it there. AI image generation models cannot be created without copyrighted material. That is not even a controversial take.

    • t3rmit3@beehaw.org
      arrow-up
      3
      ·
      23 hours ago

      Yes, this is in fact a good argument for not banning AI.

      It’s not an argument for not holding companies legally accountable for using copyrighted material to do it.

      These suits aren’t actually equivalent to Sony v UCS, they’re equivalent to someone suing a bootleg video company.

      • Riskable@programming.dev
        English
        arrow-up
        2
        arrow-down
        4
        ·
        22 hours ago

        If you believe AI companies should NOT be allowed to train AI with copyrighted works you should stop using Internet search engines. Because the same rules that allow Google to train their search with everyone’s copyrighted websites are what allow the AI companies to train their models.

        Every day, Google and others download huge swaths of the Internet directly into their servers and nobody bats an eye. An AI company does the same thing and now people say that’s copyright infringement.

        What the fuck! I don’t get it. It’s the exact same thing. Why is an AI company doing that any different‽

        It’d be one thing if people were bitching about just the output of AI models but they’re not. They’re bitching about the ingress step!

        The day we ban ingress of copyrighted works into whatever TF people want is the day the Internet stops working.

        My comment right here is copyrighted. So is yours! I didn’t ask your permission before my Lemmy client downloaded it. I don’t need to ask your permission to use your comment however TF I want until I distribute it. That’s how the law works. That’s how it’s always worked.

        The DMCA also protects the sites that host Lemmy instances from copyright lawsuits. Because without that, they’d be guilty of distribution of copyrighted works without the owner’s permission every damned day.

        People who hate AI are supporting an argument that the movie and music studios made in the 90s: That “downloading is theft.” It is not! In fact, because that is not theft, we’re all able to enjoy the Internet every day.

        Ever since the Berne convention, literally everything is copyrighted. Everything.

        • sanzky@beehaw.org
          arrow-up
          1
          ·
          3 hours ago

          If you believe AI companies should NOT be allowed to train AI with copyrighted works you should stop using Internet search engines. Because the same rules that allow Google to train their search with everyone’s copyrighted websites are what allow the AI companies to train their models.

          Sorry, but no. For years, most search bots have been quite reasonable about following instructions from sites on what to scrape and what not to. AI scrapers have shown they are willing to go to great lengths to scrape content against the wishes of website owners.
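
          For what it’s worth, those site instructions usually live in a robots.txt file, and a well-behaved crawler checks it before fetching anything. A minimal sketch using Python’s standard `urllib.robotparser`; the bot names, rules, and URLs are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish: the search bot is
# only kept out of /private/, while the AI bot is asked to stay away entirely.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Disallow: /private/

User-agent: ExampleAIBot
Disallow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules above permit this bot to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("ExampleSearchBot", "ExampleAIBot"):
        for url in ("https://example.com/blog/post", "https://example.com/private/x"):
            print(bot, url, allowed(bot, url))
```

          Nothing technically forces a crawler to run this check; robots.txt is a request, which is why ignoring it is a matter of behavior rather than capability.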

          Additionally, this scraping has been shown to cause a tremendous amount of problems for some sites and platforms, such as open source projects.

          Last, search engines are a win-win. They get to show ads and then redirect traffic to the source. LLMs for the most part steal that traffic, by regurgitating the same content they stole in the first place.

        • t3rmit3@beehaw.org
          arrow-up
          5
          arrow-down
          1
          ·
          edit-2
          22 hours ago

          Because the same rules that allow Google to train their search with everyone’s copyrighted websites are what allow the AI companies to train their models.

          This is false, by omission. Many of the AI companies have been downloading content through means other than scraping, such as bittorrent, to access and compile copyrighted data that is not publicly scrape-able. That includes Meta, OpenAI, and Google.

          The day we ban ingress of copyrighted works into whatever TF people want is the day the Internet stops working.

          That is also false. Just because you don’t understand the legal distinction between scraping content to summarize in order to direct people to a site (there was already a lawsuit against Google that established this, as well as its boundaries), versus scraping content to generate a replacement that obviates the original content, doesn’t mean the law doesn’t understand it.

          My comment right here is copyrighted. So is yours! I didn’t ask your permission before my Lemmy client downloaded it. I don’t need to ask your permission to use your comment however TF I want until I distribute it. That’s how the law works. That’s how it’s always worked.

          The DMCA also protects the sites that host Lemmy instances from copyright lawsuits. Because without that, they’d be guilty of distribution of copyrighted works without the owner’s permission every damned day.

          And none of this matters, because AI companies aren’t just reading content, they’re taking it and using it for commercial purposes.

          Perhaps you are unaware, but (at least in the US) while it is legal for you to view a video on YouTube, if you download it for offline use that would constitute copyright infringement if the owner objects. The video being public does not grant anyone and everyone the right to use it however they wish. Ditto for something like making an mp3 of a song on Spotify using Audacity.

          People who hate AI are supporting an argument that the movie and music studios made in the 90s: That “downloading is theft.” It is not! In fact, because that is not theft, we’re all able to enjoy the Internet every day.

          First off, I do not hate AI, I use it myself (locally-run). My issue is with AI companies using it to generate profit at the expense of the actual creators whose art AI companies are trying to replace (i.e. not directing people to it, like search results).

          Secondly, no one is arguing that it is theft, they are arguing that it is copyright infringement, which is what all of us are also subject to under the DMCA. So we’re actually arguing that AI companies should be held to the same standard that we are.

          Also, note that AI companies have argued in court (in the case brought by Stephen King et al) that their use of copyrighted material shouldn’t fall under DMCA at all (i.e. arguing that it’s not about Fair Use), because their argument is that AI training is not the ‘intended use’ of the source material, so this is not eating into that commercial use. That argument leaves copyright infringement liability intact for the rest of us, while solely exempting them from liability. No thanks.

          Luckily, them arguing they’re apart and separate from Fair Use also means that this can be rejected without affecting Fair Use! Double-win!

          • Riskable@programming.dev
            English
            arrow-up
            2
            ·
            19 hours ago

            Many of the AI companies have been downloading content through means other than scraping, such as bittorrent, to access and compile copyrighted data that is not publicly scrape-able. That includes Meta, OpenAI, and Google.

            Anthropic is the only company to have admitted publicly to doing this. They were sued and settled out of court. Google and OpenAI have had no such accusations as far as I’m aware. Furthermore, Google had the gigantic book scanning project where it was determined in court that the act of scanning as many fucking books as you want is perfectly legal (fair use). Read all about it: https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,_Inc.

            In late 2013, after the class action status was challenged, the District Court granted summary judgment in favor of Google, dismissing the lawsuit and affirming the Google Books project met all legal requirements for fair use. The Second Circuit Court of Appeal upheld the District Court’s summary judgment in October 2015, ruling Google’s “project provides a public service without violating intellectual property law.” The U.S. Supreme Court subsequently denied a petition to hear the case.

            You say:

            That is also false. Just because you don’t understand the legal distinction between scraping content to summarize in order to direct people to a site (there was already a lawsuit against Google that established this, as well as its boundaries), versus scraping content to generate a replacement that obviates the original content, doesn’t mean the law doesn’t understand it.

            There is no such legal distinction. Scraping content is legal no matter WTF you plan to do with it. This has been settled in court many, many times. Here’s some court cases for you to learn the actual legality of scraping and storing of said scraped data:

            To summarize all this: You are 100% wrong. I have cited my sources. I was there (“3000 years ago…”) when all this went down. Pepperidge Farm remembers.

            You say:

            And none of this matters, because AI companies aren’t just reading content, they’re taking it and using it for commercial purposes.

            This is a common misconception of copyright law: Remember Napster? They were sued and argued in court that because users don’t profit from sharing songs with their friends, it is legal. The court rejected this argument: https://en.wikipedia.org/wiki/A%26M_Records,_Inc._v._Napster,_Inc. See also: https://en.wikipedia.org/wiki/Capitol_Records,_Inc._v._Thomas-Rasset and https://en.wikipedia.org/wiki/Harper_%26_Row_v._Nation_Enterprises and https://en.wikipedia.org/wiki/American_Geophysical_Union_v._Texaco,_Inc. where the courts all ruled the same way.

            You say:

            Perhaps you are unaware, but (at least in the US) while it is legal for you to view a video on YouTube, if you download it for offline use that would constitute copyright infringement if the owner objects. The video being public does not grant anyone and everyone the right to use it however they wish. Ditto for something like making an mp3 of a song on Spotify using Audacity.

            Downloading a YouTube video for offline use is legal… Depending on the purpose. This is one of those very, very nuanced areas of copyright law where fair use intersects with the DMCA and also intersects with the CFAA. The DMCA states, “No person shall circumvent a technological measure that effectively controls access to a work protected under this title.” Since YouTube videos have some technical measures to prevent copying (depending on the resolution and platform!), it is illegal to circumvent them. However, the Librarian of Congress can grant exceptions to this rule and has done so for many situations. For example, archiving (https://www.arl.org/news/librarian-of-congress-expands-dmca-exemption-for-text-and-data-mining/) which is just plain wacky, IMHO.

            Regardless, if YouTube didn’t put an anti-circumvention mechanism into their videos it would be perfectly legal to download the videos. Just like it’s legal to record TV shows with a VCR. This was ruled in Sony Corp. of America v. Universal City Studios (already cited). There’s no reason why it wouldn’t still apply to YouTube videos. The fact that no one has been sued for doing this since then (that I could find) seems to indicate that this is a very settled thing.

            You say:

            no one is arguing that it is theft, they are arguing that it is copyright infringement, which is what all of us are also subject to under the DMCA. So we’re actually arguing that AI companies should be held to the same standard that we are.

            No. Fuck no. A shitton of people are saying it’s “theft”. Have you been on the Internet recently? LOL! I see it every damned day and I’m sick of it. I repeat myself that, “it’s not theft, it’s copyright infringement” and I get downvoted for “being pedantic”. Like it’s not a very fucking important distinction!

            …but also: What an AI model does isn’t copyright infringement (usually). You ask it to generate an image or some text and it just does what you ask it to do. The fact that it’s possible for it to infringe copyright shouldn’t matter because it’s just a tool like a Xerox machine/copier. It has already been ruled fair use for an AI company to train their models with copyrighted works (great summary of that here: https://www.debevoise.com/insights/publications/2025/06/anthropic-and-meta-decisions-on-fair-use ). Despite these TWO court rulings, people are still saying that training AI models is both “theft” and somehow “illegal”. We’re already past that.

            AI models are terrible copyright violators! Everything they generate—at best—can only ever be, “kinda sorta like” a copyrighted work. You can get closer and closer if you get clever with prompts and tell the model to generate say, 10000 images of the same thing. Then you can look at your prayers to the RNG gods and say, “Aha! Look! This image looks very very similar to Indiana Jones!”

            You say:

            Also, note that AI companies have argued in court (in the case brought by Stephen King et al) that their use of copyrighted material shouldn’t fall under DMCA at all (i.e. arguing that it’s not about Fair Use), because their argument is that AI training is not the ‘intended use’ of the source material, so this is not eating into that commercial use. That argument leaves copyright infringement liability intact for the rest of us, while solely exempting them from liability. No thanks.

            Luckily, them arguing they’re apart and separate from Fair Use also means that this can be rejected without affecting Fair Use! Double-win!

            Where TF did you see this? I did some searching and I cannot see anything suggesting that the AI companies have rejected any kind of DMCA protection.

            • t3rmit3@beehaw.org
              arrow-up
              1
              arrow-down
              1
              ·
              16 hours ago

              Might have to break this into a couple replies, because this is a LOT to work through.

              Anthropic is the only company to have admitted publicly to doing this. They were sued and settled out of court. Google and OpenAI have had no such accusations as far as I’m aware.

              Meta is being sued by several groups over this, including porn companies who caught them torrenting. Their defense has been to claim that the 2,400 videos downloaded to their corporate IP space were downloaded for “personal use”.

              OpenAI is also being accused of pirating books (not scraping), and it has been unable to prove legal procurement of them.

              There is no such legal distinction [scraping for summary use vs scraping for supplanting the original content]. Scraping content is legal no matter WTF you plan to do with it.

              Interestingly, it’s actually Meta’s most recent partial win that explicitly helps disprove this. Apart from just generally ripping into Meta for clearly infringing copyright, the judge wrote (page 3):

              There is certainly no rule that when your use of a protected work is “transformative,” this automatically inoculates you from a claim of copyright infringement. And here, copying the protected works, however transformative, involves the creation of a product with the ability to severely harm the market for the works being copied, and thus severely undermine the incentive for human beings to create. Under the fair use doctrine, harm to the market for the copyrighted work is more important than the purpose for which the copies are made.

              So yes, Fair Use absolutely does take into account market harms.

              What an AI model does isn’t copyright infringement (usually).

              I never asserted this, and I am well aware of the distinction between the copyright infringement which involved the illegal obtainment of copyrighted material, and the AI training. You seem to be bringing a whole host of objections you get from others and applying them to me.

              I think it’s perfectly reasonable to require that AI companies legally acquire a copy of any copyrighted material. Just as it would not be legal for me to torrent a movie even if I wanted to do something transformative with it, AI companies should not be able to do so either.

    • shnizmuffin@lemmy.inbutts.lol
      English
      arrow-up
      100
      arrow-down
      1
      ·
      2 days ago

      Now imagine it’s not a magic box. Imagine it is a computer program, written with intent, that was intentionally fed copyrighted material so it could make those things people asked for.

      Giant companies operating outside the law at “the cost of doing business” built plagiarism machines off the life’s work of thousands of people so that horny weirdos could jerk it to Pikachu with tits.

      • murmelade@lemmy.ml
        English
        arrow-up
        12
        ·
        edit-2
        2 days ago

        O_o Pikachu with tits eh? Now where could one of these hypothetical horny weirdos find these magic boxes you speak of?

      • whoever loves Digit@piefed.social
        English
        arrow-up
        9
        ·
        edit-2
        2 days ago

        To be fair, a horny weirdo would still probably pay a human artist for a Pikachu with tits. Image generators probably wouldn’t get the nuance exactly right. An image generator is better for like, Azula getting her toes sucked by a Dai Li agent or something. Probably. I’m only speculating

        • XLE@piefed.social
          English
          arrow-up
          9
          ·
          2 days ago

          as a totally off-topic aside, I find that people tend to dislike AI in areas they are deeply familiar with.

    • Mark with a Z@suppo.fi
      arrow-up
      33
      ·
      2 days ago

      do we say the maker of the VCR/DVD burner/computer is at fault

      There’s one difference you ignored: to copy a video with a VCR, the user needs to supply the copyrighted material. I’m sure the manufacturers would’ve been in more legal trouble if they shipped VCRs packed with pirated content.

      • Riskable@programming.dev
        English
        arrow-up
        1
        ·
        1 day ago

        You obviously never used a VCR to record a live broadcast before. When people were using a VCR to record things, that’s what they were doing 99% of the time. Nobody had two VCRs hooked up to each other to copy tapes. That was a super rare situation that you’d typically only find in professional studios.

          • Riskable@programming.dev
            English
            arrow-up
            1
            ·
            23 hours ago

            Your VCR is hooked up to your TV (coaxial into the VCR and from the VCR into the TV). Just before the broadcast starts, you press the record button (which was often mechanically linked to the play button). When it’s done, you press stop. Then you rewind and can play it back later.

            The end user is sort of passively recording it. The broadcast happens regardless of the user’s or the VCR’s presence.

            • Mark with a Z@suppo.fi
              arrow-up
              1
              ·
              22 hours ago

              Well derailed, successfully avoided going anywhere near the actual argument.

              I’m sure the manufacturers would’ve been in more legal trouble if they shipped VCRs packed with pirated content.

              VCRs do not contain such copyrighted material.

      • FarceOfWill@infosec.pub
        arrow-up
        10
        ·
        edit-2
        2 days ago

        And you need a blank tape to copy to, the makers of which did have to pay a charge to content firms to cover piracy.

        In this tortured analogy the blanks are also included in the AI firm’s product, so yes, they would have had to pay.

    • SmoochyPit@lemmy.ca
      arrow-up
      22
      arrow-down
      2
      ·
      2 days ago

      The difference between Gen AI and Sony v. Universal feels pretty substantial to me: VCRs did not require manufacturers to use any copyrighted material to develop and manufacture them. They only could potentially infringe copyright if the user captured a copyrighted signal and used it for commercial purposes.

      If you read the title and the description of the article, it admittedly does make it sound like the studios are taking issue with copyrighted IPs being able to be generated. But the first paragraph of the body states that the problem is actually the usage of copyrighted works as training inputs:

      The Content Overseas Distribution Association […] has issued a formal notice to OpenAI demanding that it stop using its members’ content to train its Sora 2 video generation tool without permission.

      You compare Gen AI to “magic boxes”… but they’re not magic. They have to get their “knowledge” from somewhere. These AI tools use many patterns far more subtle and complex than humans can recognize, and they aren’t storing the training inputs themselves; each input is just used to strengthen connections within the neural net (afaik, as I’m not an ML developer). I think that’s why it’s so unregulated: how do you prove they used your content? And even so, they aren’t storing or outputting it directly. Could it fall under fair use?
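
      A rough sketch of that “strengthen connections” idea, assuming a toy one-layer linear model trained by gradient descent (the function name `sgd_step` and the numbers are made up for illustration; real generative models are vastly larger, and this says nothing about whether they can memorize or reproduce training data):

```python
def sgd_step(weights, features, target, lr=0.1):
    """One gradient-descent update for a linear model with squared error.

    The training example (features, target) only nudges the weights;
    the function returns new weights and keeps no copy of the example.
    """
    prediction = sum(w * x for w, x in zip(weights, features))
    error = prediction - target
    # Gradient of 0.5 * error**2 with respect to each weight is error * x.
    return [w - lr * error * x for w, x in zip(weights, features)]


if __name__ == "__main__":
    weights = [0.0, 0.0]
    # Hypothetical training example: two input features and a target value.
    weights = sgd_step(weights, features=[1.0, 2.0], target=1.0)
    print(weights)  # the weights moved toward fitting the example
```

      After the step, all that persists is the adjusted list of weights, which is the sense in which the input is “used” rather than stored. Whether enough such nudges can still encode a recognizable copy is exactly what is being argued about.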

      Still, using copyrighted information in the creation of an invention has historically been considered infringement (I may not be using the correct terminology in this comparison, since maybe it’s more relevant to patent law), even if it didn’t end up in the invention— in software, for example, reverse engineers can’t legally rely on leaked source code to guide their development.

      Also, using a VCR for personal use wouldn’t be a problem, which I’d say was a prominent use-case. And using it commercially wouldn’t involve any copyrighted material, unless the owner inputs any. Those aren’t the case with Gen AI: regardless of what you generate, non-commercially or commercially, the neural network was built using a majority of unauthorized, copyrighted content.


      That said, copyright law functions largely to protect corporations anyways— an individual infringing the copyright of a corporation for personal or non-commercial use causes very little harm, but can usually be challenged and stopped. A corporation infringing copyright of an individual often can’t be stopped. Most individuals can’t even afford the legal fees, anyways.

      For that reason, I’m glad to see companies taking legal action against OpenAI and other megacorps which are (IMO) infringing the copyright of individuals and corporations at this kind of a massive scale. Individuals certainly can’t stop it, but corporations may be able to get some justice or encourage more to be done to safeguard the technology.

      Much damage is already done, though. E-waste and energy usage from machine learning have skyrocketed. Websites struggle to fight crawlers and lock down their APIs, both harming legit users. Non-consensual AI pornography is widely accessible. Many apps encourage people, including youth, to forgo genuine connection, both platonic and romantic, in exchange for AI chatbots. Also LLMs are fantastic misinformation machines. And we have automated arts, arguably the most “human” thing we can do, and put many artists out of work in doing so.

      Whether the lack of safety guards is because of government incompetence, corruption, or is inherent to free-market capitalism, I’m not sure. Probably all of those reasons.


      In summary, I disagree with you. I think companies training AI with unauthorized material are at fault. And personally, I think the entire AI industry as it exists currently is unethical.