Screenshot of this question was making the rounds last week. But this article covers testing against all the well-known models out there.

Also includes outtakes on the ‘reasoning’ models.

  • Slashme@lemmy.world
    link
    fedilink
    English
    arrow-up
    27
    arrow-down
    1
    ·
    3 hours ago

    The most common pushback on the car wash test: “Humans would fail this too.”

    Fair point. We didn’t have data either way. So we partnered with Rapidata to find out. They ran the exact same question with the same forced choice between “drive” and “walk,” no additional context, past 10,000 real people through their human feedback platform.

    71.5% said drive.

    So people do better than most AI models. Yay. But seriously, almost 3 in 10 people get this wrong‽‽

    • T156@lemmy.world
      link
      fedilink
      English
      arrow-up
      8
      ·
      2 hours ago

      It is an online poll. You also have to consider that some people don’t care/want to be funny, and so either choose randomly, or choose the most nonsensical answer.

    • masterofn001@lemmy.ca
      link
      fedilink
      English
      arrow-up
      10
      arrow-down
      3
      ·
      edit-2
      3 hours ago

      Without reading the article, the title just says wash the car.

      I could go for a walk and wash my car in my driveway.

      Reading the article… That is exactly the question asked. It is a very ambiguous question.