ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic

Optional@lemmy.world · 3 months ago

MadMadBunny@lemmy.ca · 3 months ago

Attempting to badly quote someone on another post: « How can people honestly think a glorified word autocomplete function could understand what a logarithm is? »

Ephera@lemmy.ml · 3 months ago

You can make external tools available to the LLM and then provide it with instructions for when/how to use them.
So, for example, you’d tell it that if someone asks about math or chess, it should generate JSON text according to a given schema, plus the command text that passes that JSON to a script. The script can then, e.g., make an API call to Wolfram Alpha or call into Stockfish or whatever.

This isn’t going to be 100% reliable. For example, there’s a decent chance of the LLM fucking up when generating the relatively big JSON you need for describing the entire state of the chessboard, especially with general-purpose LLMs which are configured to introduce some amount of randomness in their output.

But in particular, ChatGPT just won’t have the instructions built in for calling a chess API/program, so for this particular case, it is likely as dumb as auto-complete. It will likely have a math API hooked up, though, so it should be able to calculate a logarithm through such an external tool. Of course, it might still not understand when to use a logarithm, for example.
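To make the tool idea concrete, here is a minimal sketch of the script side in Python. The JSON shape (`tool` plus `args`), the tool names, and the `dispatch` function are all made up for illustration, not anything ChatGPT actually uses. The chess tool is only a placeholder; nothing here talks to Stockfish or Wolfram Alpha.

```python
import json
import math


def log_tool(args):
    # Real computation: logarithm of x in the given base.
    return math.log(args["x"], args["base"])


def chess_tool(args):
    # Hypothetical placeholder: a real setup would hand the board state
    # to an engine such as Stockfish. This sketch does not do that.
    raise NotImplementedError("no chess engine is wired up in this sketch")


# Assumed tool registry; the names are illustrative only.
TOOLS = {"log": log_tool, "chess_move": chess_tool}


def dispatch(raw):
    """Parse a model-generated tool call and run it, or say why it was rejected."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as err:
        # This is the failure mode described above: the model emits broken JSON.
        return {"ok": False, "error": f"invalid JSON: {err.msg}"}
    if not isinstance(call, dict):
        return {"ok": False, "error": "tool call must be a JSON object"}
    name = call.get("tool")
    if not isinstance(name, str) or name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name!r}"}
    try:
        return {"ok": True, "result": TOOLS[name](call.get("args", {}))}
    except (KeyError, TypeError, ValueError,
            ZeroDivisionError, NotImplementedError) as err:
        return {"ok": False, "error": f"{type(err).__name__}: {err}"}


print(dispatch('{"tool": "log", "args": {"x": 8, "base": 2}}'))
print(dispatch('{"tool": "log", "args": {"x": 8'))  # truncated JSON is rejected
```

The point of the checks is that the script never trusts the model’s output: anything that doesn’t parse or doesn’t match a known tool gets bounced back instead of executed.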

Xanthobilly@lemmy.world · 3 months ago

Electricblush@lemmy.world · 3 months ago

This is so stupid and pointless…

“Thing not made to solve specific task fails against thing made for it…”

This is like saying that a really old hand-pushed lawn mower is better than an SUV at cutting grass…

SpaceNoodle@lemmy.world · 3 months ago

SUVs aren’t marketed as grass mowers. LLMs are marketed as AI with all the answers.

otp@sh.itjust.works · 3 months ago

I’d be interested in seeing marketing of ChatGPT as a competitive boardgame player. Is there any?

pinball_wizard@lemmy.zip · 3 months ago

These tools are marketed as replacing lots of jobs that are a hell of a lot more complex than a simple board game.

otp@sh.itjust.works · 3 months ago

“These tools are marketed as replacing lots of jobs that are a hell of a lot more complex than a simple board game.”

There isn’t really a single sliding scale of “complexity” when it comes to certain tasks.

Given the appropriate input, a calculator can divide two numbers. But it can’t count the number of R’s in the word “strawberry”.

Meanwhile, a script that could count the number of instances of a letter in a word could count those R’s, but it couldn’t divide any two numbers.

Similarly, we didn’t complain that a typewriter couldn’t put pepperoni slices onto a pizza.
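The calculator and letter-counting examples above are each a couple of lines of Python. This is just a sketch with made-up function names, but it shows how little the two tasks have in common:

```python
def divide(a, b):
    # The "calculator": only knows arithmetic.
    return a / b


def count_letter(word, letter):
    # The "script": only knows how to count characters, case-insensitively.
    return word.lower().count(letter.lower())


print(divide(10, 4))                    # 2.5
print(count_letter("strawberry", "r"))  # 3
```

Neither function can do the other’s job, and neither is “more complex” than the other in any useful sense.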

SchmidtGenetics@lemmy.world · 3 months ago

Source?

bridgeenjoyer@sh.itjust.works · 3 months ago

Is this just because gibbity couldn’t recognize the chess pieces? I’d love to believe this is true otherwise, love my 2600 haha.

Stillwater@sh.itjust.works · edit-2 3 months ago

At first it blamed its poor performance on the icons used, but then they switched to chess notation and it still failed hard

bridgeenjoyer@sh.itjust.works · 3 months ago

That is baffling

pinball_wizard@lemmy.zip · 3 months ago

That’s on them for taking on the Atari 2600, where “the games don’t get older, they get better!”

QueenHawlSera@sh.itjust.works · 3 months ago

True AI does not and will not exist

JeeBaiChow@lemmy.world · 3 months ago

If LLMs are statistics-based, wouldn’t there be many, many more losing games than perfectly winning ones? It’s like Dr. Strange saying ‘this is the only way’.

Railcar8095@lemm.ee · 3 months ago

It’s not even that. It’s not a chess AI or an AGI (which doesn’t exist). It will speak and pretend to play, but it has no memory of the exact position of the pieces, nor the capability to plan several steps ahead. For all intents and purposes, it’s like asking my toddler what time it is (she always says something that sounds like a time, but doesn’t understand the concept of hours or what the time is).
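For contrast, this is roughly what “memory of the exact position” looks like in an actual program. It’s a bare-bones Python sketch with made-up function names: it tracks where the pieces are and refuses a move from an empty square, but it does not check whether moves are legal, and it does no planning at all.

```python
def start_position():
    """Return the standard starting position as a square -> piece map."""
    board = {}
    back_rank = "RNBQKBNR"
    for i, file in enumerate("abcdefgh"):
        board[f"{file}1"] = back_rank[i]          # white pieces: uppercase
        board[f"{file}2"] = "P"
        board[f"{file}7"] = "p"                   # black pieces: lowercase
        board[f"{file}8"] = back_rank[i].lower()
    return board


def apply_move(board, move):
    """Apply a coordinate move like 'e2e4'; reject it if the source square is empty."""
    src, dst = move[:2], move[2:4]
    if src not in board:
        raise ValueError(f"no piece on {src}")
    board[dst] = board.pop(src)  # a capture simply overwrites the destination
    return board


board = start_position()
apply_move(board, "e2e4")
apply_move(board, "e7e5")
print(board["e4"], board["e5"])  # P p
```

Even something this dumb never forgets where a pawn is, which is exactly the part a text generator has no built-in mechanism for.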

The fact that somebody posted this on LinkedIn and, instead of being shamed out of his job, got several articles written about it is truly infuriating.

Optional@lemmy.world · 3 months ago

clop - clop - clop - clop - clop - clop

. . .

*bloop*

. . .

[screen goes black for 20 minutes]

. . .

Hmmmmm.

clop - clop - clop - clop - clop - clop - clop - clop - clop - clop

*bloop*

OsrsNeedsF2P@lemmy.ml · 3 months ago

Hey I don’t mean to ruin your day, but maybe you should Google what you just commented…

Optional@lemmy.world · 3 months ago

There is 100% no chance google knows what that is

Optional@lemmy.world · 3 months ago

Little disappointed more people didn’t get this.

OsrsNeedsF2P@lemmy.ml · 3 months ago

What happens if you ask ChatGPT to code you a chess AI though?

4am@lemm.ee · 3 months ago

It doesn’t work without 200 hours of un-fucking

pedz@lemmy.ca · 3 months ago

It probably consumes as much energy as a family house for a day just to come up with that program. That’s what happens.

In fact, I did a Google search and had no choice but to get an “AI” answer, even though I didn’t want one. Here’s what it says:

Each ChatGPT query is estimated to use around 10 times more electricity than a traditional Google search, with a single query consuming approximately 3 watt-hours, compared to 0.3 watt-hours for a Google search. This translates to a daily energy consumption of over half a million kilowatts, equivalent to the power used by 180,000 US households.
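Taking the quoted figures at face value, the arithmetic they imply can be checked in a few lines of Python. One assumption is baked in here: “half a million kilowatts” is read as 500,000 kWh per day, since a daily consumption needs an energy unit. The variable names are mine, and since the quote says “over” half a million, the derived numbers are lower bounds.

```python
chatgpt_wh = 3.0        # Wh per ChatGPT query, from the quoted answer
google_wh = 0.3         # Wh per Google search, from the quoted answer
daily_kwh = 500_000     # assumption: "half a million kilowatts" means kWh per day
households = 180_000    # US households, from the quoted answer

ratio = chatgpt_wh / google_wh                     # about 10x, as claimed
queries_per_day = daily_kwh * 1000 / chatgpt_wh    # about 167 million queries
kwh_per_household = daily_kwh / households         # about 2.78 kWh per household per day

print(round(ratio, 1), round(queries_per_day / 1e6), round(kwh_per_household, 2))
```

So the 10x claim is consistent with the per-query numbers, and the daily total works out to roughly 167 million queries at 3 Wh each.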

redsunrise@programming.dev · 3 months ago

in other words, a hammer “got absolutely wrecked” by a handsaw in a board-halving competition

Redkey@programming.dev · 3 months ago

When all you have (or you try to convince others that all they need) is a hammer, everything looks like a nail. I guess this shows that it isn’t.

JeeBaiChow@lemmy.world · 3 months ago

Clearly you didn’t swing the hammer hard enough

Optional@lemmy.world · 3 months ago

One of those Fisher-Price plastic hammers with the hole in the handle?