Elon Musk's X pushed a fake headline about Iran attacking Israel. X's AI chatbot Grok made it up.

Stopthatgirl7@lemmy.world · 7 months ago

Ottomateeverything@lemmy.world · 7 months ago

I bet if such a law existed in less than a month all those AI developers would very quickly abandon the “oh no you see it’s impossible to completely avoid hallucinations for you see the math is just too complex tee hee” and would actually fix this.

Nah, this problem is actually too hard to solve with LLMs. They don’t have any structure or understanding of what they’re saying so there’s no way to write better guardrails… Unless you build some other system that tries to make sense of what the LLM says, but that approaches the difficulty of just building an intelligent agent in the first place.

So no, if this law came into effect, people would just stop using AI. It’s too cavalier. And imo, they probably should stop for cases like this unless it has direct human oversight of everything coming out of it. Which also, probably just wouldn’t happen.

wizardbeard@lemmy.dbzer0.com · 7 months ago

Yep. To add on, this is exactly what all the “AI haters” (myself included) are going on about when they say stuff like there isn’t any logic or understanding behind LLMs, or when they say they are stochastic parrots.

LLMs are incredibly good at generating text that works grammatically and reads like it was put together by someone knowledgeable and confident, but they have no concept of “truth” or reality. They just have a ton of absurdly complicated technical data about how words/phrases/sentences are related to each other on a structural basis. It’s all just really complicated math about how text is put together. It’s absolutely amazing, but it is also literally and technologically impossible for that to spontaneously coalesce into reason/logic/sentience.

Turns out that if you get enough of that data together, it makes a very convincing appearance of logic and reason. But it’s only an appearance.

You can’t duct tape enough speak and spells together to rival the mass of the Sun and have it somehow just become something that outputs a believable human voice.

For an incredibly long time, ChatGPT would fail questions along the lines of “What’s heavier, a pound of feathers or three pounds of steel?” because it had seen the normal variation of the riddle with equal weights so many times. It has no concept of one being smaller than three. It just “knows” the pattern of the “correct” response.

It no longer fails that “trick”, but there’s significant evidence that OpenAI has set up custom handling for that riddle on top of the actual LLM, as it doesn’t take much work to find similar ways to trip it up by using slightly modified versions of classic riddles.

A lot of supporters will counter “Well I just ask it to tell the truth, or tell it that it’s wrong, and it corrects itself”, but I’ve seen plenty of anecdotes in the opposite direction, with ChatGPT insisting that its hallucination was fact. It doesn’t have any concept of true or false.

neatchee@lemmy.world · edit-2 7 months ago

The shame of it is that despite this limitation LLMs have very real practical uses that, much like cryptocurrencies and NFTs did to blockchain, are being undercut by hucksters.

Tesla has done the same thing with autonomous driving too. They claimed to be something they’re not (fanboys don’t @ me about semantics) and made the REAL thing less trusted and take even longer to come to market.

Drives me crazy.

FlashMobOfOne@lemmy.world · 7 months ago

Yup, and I hate that.

I really would like to one day just take road trips everywhere without having to actually drive.

cygon@lemmy.world · 7 months ago

I love that example. Microsoft’s Copilot (based on GPT-4) immediately doesn’t disappoint:

Microsoft Copilot: Two pounds of feathers and a pound of lead both weigh the same: two pounds. The difference lies in the material—feathers are much lighter and less dense than lead. However, when it comes to weight, they balance out equally.

It’s annoying that for many things, like basic programming tasks, it manages to generate reasonable output that is good enough to goad people into trusting it, yet hallucinates very obviously wrong stuff or follows completely insane approaches on anything off the beaten path. Every other day, I have to spend an hour justifying to a coworker why I wrote code this way when the AI has given him another “great” suggestion, like opening a hidden window with a UI control to query a database instead of going through our ORM.