• 5 Posts
  • 81 Comments
Joined 2 years ago
cake
Cake day: July 13th, 2023

help-circle

  • there was a directive that if it were asked a math question that you can’t do in your brain or some very similar language it should forward it to the calculator module.

    The craziest thing about leaked prompts is that they reveal the developers of these tools to be complete AI pilled morons. How in the fuck would it know if it can or can’t do it “in its brain” lol.

    edit: and of course, simultaneously, their equally idiotic fanboys go “how stupid of you to expect it to use a calculating tool when it said it used a calculating tool” any time you have some concrete demonstration of it sucking ass, while simultaneously the same kind of people are lauding the genius of system prompts half of which are asking it to meta-reason.


  • Thing is, it has tool integration. Half of the time it uses python to calculate it. If it uses a tool, that means it writes a string that isn’t shown to the user, which runs the tool, and tool results are appended to the stream.

    What is curious is that instead of request for precision causing it to use the tool (or just any request to do math), and then presence of the tool tokens causing it to claim that a tool was used, the requests for precision cause it to claim that a tool was used, directly.

    Also, all of it is highly unnatural texts, so it is either coming from fine tuning or from training data contamination.


  • I don’t think we need to go as far as evopsych here… it may just be an artifact of modeling the environment at all - you learn to model other people as part of the environment, you re-use models across people (some people are mean, some people are nice, etc).

    Then weather happens, and you got yourself a god of bad weather and a god of good weather, or perhaps a god of all weather who’s bipolar.

    As far as language goes it also works the other way, we over used these terms in application to computers, to the point that in relation to computers “thinking” no longer means it is actually thinking.


  • misinterpreted as deliberate lying by ai doomers.

    I actually disagree. I think they correctly interpret it as deliberate lying, but they misattribute the intent to the LLM rather than to the company making it (and its employees).

    edit: its like you are watching a TV and ads come on you say that a very very flat demon who lives in the TV is lying, because the bargain with the demon is that you get to watch entertaining content in response to having to listen to its lies. It’s fundamentally correct about lying, just not about the very flat demon.


  • Hmm, fair point, it could be training data contamination / model collapse.

    It’s curious that it is a lot better at converting free form requests for accuracy, into assurances that it used a tool, than into actually using a tool.

    And when it uses a tool, it has a bunch of fixed form tokens in the log. It’s a much more difficult language processing task to assure me that it used a tool conditionally on my free form, indirect implication that the result needs to be accurate, than to assure me it used a tool conditionally on actual tool use.

    The human equivalent to this is “pathological lying”, not “bullshitting”. I think a good term for this is “lying sack of shit”, with the “sack of shit” specifying that “lying” makes no claim of any internal motivations or the like.

    edit: also, testing it on 2.5 flash, it is quite curious: https://g.co/gemini/share/ea3f8b67370d . I did that sort of query several times and it follows the same pattern: it doesn’t use a calculator, it assures me the result is accurate, if asked again it uses a calculator, if asked if the numbers are equal it says they are not, if asked which one is correct it picks the last one and argues that the last one actually used a calculator. I hadn’t ever managed to get it to output a correct result and then follow up with an incorrect result.

    edit: If i use the wording of “use an external calculator”, it gives a correct result, and then I can’t get it to produce an incorrect result to see if it just picks the last result as correct, or not.

    I think this is lying without scare quotes, because it is a product of Google putting a lot more effort into trying to exploit Eliza effect to convince you that it is intelligent, than into actually making an useful tool. It, of course, doesn’t have any intent, but Google and its employees do.



  • The other interesting thing is that if you try it a bunch of times, sometimes it uses the calculator and sometimes it does not. It, however, always claims that it used the calculator, unless it didn’t and you tell it that the answer is wrong.

    I think something very fishy is going on, along the lines of them having done empirical research and found that fucking up the numbers and lying about it makes people more likely to believe that gemini is sentient. It is a lot weirder (and a lot more dangerous, if someone used it to calculate things) than “it doesn’t have a calculator” or “poor LLMs cant do math”. It gets a lot of digits correct somehow.

    Frankly this is ridiculous. They have a calculator integrated in the google search. That they don’t have one in their AIs feels deliberate, particularly given that there’s a plenty of LLMs that actually run calculator almost all of the time.

    edit: lying that it used a calculator is rather strange, too. Humans don’t say “code interpreter” or “direct calculator” when asked to multiply two numbers. What the fuck is a “direct calculator”? Why is it talking about “code interpreter” and “direct calculator” conditionally on there being digits (I never saw it say that it used a “code interpreter” when the problem wasn’t mathematical), rather than conditional on there being a [run tool] token outputted earlier?

    The whole thing is utterly ridiculous. Clearly for it to say that it used a “code interpreter” and a “direct calculator” (what ever that is), it had to be fine tuned to say that. Consequently to a bunch of numbers, rather than consequently to a [run tool] thing it uses to run a tool.

    edit: basically, congratulations Google, you have halfway convinced me that an “artificial lying sack of shit” is possible after all. I don’t believe that tortured phrases like “code interpreter” and a “direct calculator” actually came from the internet.

    These assurances - coming from an “AI” - seem like they would make the person asking the question be less likely to double check the answer (and perhaps less likely to click the downvote button), In my book this would qualify them as a lie, even if I consider LLM to not be any more sentient than a sack of shit.


  • Try asking my question to Google gemini a bunch of times, sometimes it gets it right, sometimes it doesn’t. Seems to be about 50/50 but I quickly ran out of free access.

    And google is planning to replace their search (which includes a working calculator) with this stuff. So it is absolutely the case that there’s a plan to replace one of the world’s most popular calculators, if not the most popular, with it.








  • Naturally, that system broke down (via capitalists grabbing the expensive fusion power plants for their own purposes)

    This is kind of what I have to give to Niven. The guy is a libertarian, but he would follow his story all the way into such results. And his series where organs are being harvested for minor crimes? It completely flew over my head that he was trying to criticize taxes, and not, say, republican tough-on-crime, mass incarceration, and for profit prisons. Because he followed the logic of the story and it aligned naturally with its real life counterpart, the for profit prison system, even if he wanted to make some sort of completely insane anti tax argument where taxing rich people is like harvesting organs or something.

    On the other hand, much better regarded Heinlein, also a libertarian, would write up a moon base that exports organic carbon and where you have to pay for oxygen to convert to CO2. Just because he wanted to make a story inside of which “having to pay for air to breathe” works fine.






  • I think this may also be a specific low-level exploit, whereby humans are already biased to mentally “model” anything as having an agency (see all the sentient gods that humans invented for natural phenomena).

    I was talking to an AI booster (ewww) in another place and I think they really are predominantly laymen brain fried by this shit. That particular one posted a convo where out of 4 arithmetic operations, 2 were “12042342 can be written as 120423 + 19, and 43542341 as 435423 + 18” combined with AI word-salad, and he was expecting that this would be convincing.

    It’s not that this particular person thinks its genius, he thinks that it is not a mere computer, and the way it is completely shit at math only serves to prove it to them that it is not a mere computer.

    edit: And of course they care not for any mechanistic explanations, because all of those imply LLMs are not sentient, and they believe LLMs are sentient. The “this isn’t it but one day some very different system will” counter argument doesn’t help either.