sanitation@lemmy.today to Technology@lemmy.world · English · 1 day ago
Advanced AI models suffer a near-total collapse on classic psychology test as cognitive demands increase
www.psypost.org · 73 comments
expr@programming.dev · 23 hours ago: That’s not lying. There’s nothing linguistic about numerical computation.
Communist@lemmy.frozeninferno.xyz · 16 hours ago: No. https://www.nature.com/articles/d41586-025-02343-x It’s lying.
zbyte64@awful.systems · 13 hours ago: You know, the “DeepMind and OpenAI models” wording is the hint that the LLM is not the one doing the math. The LLM provides a hypothesis, and the DeepMind model provides grounding or feedback on whether the hypothesis even makes sense or works.
Communist@lemmy.frozeninferno.xyz · 5 hours ago: It is totally irrelevant that the model calls tools to do the math. That is still a success.