Do leaders even believe that generative AI is useful?

diz@awful.systems · edit-2 15 minutes ago

Do leaders even believe that generative AI is useful?

diz@awful.systems · 2 hours ago

Isn’t it part of the lawsuit that one of the developers literally said that downloading torrents on a corporate machine feels wrong?

That they routinely use the BitTorrent protocol for data only makes it more willful, since they know how it works while your average Joe may not understand that he is distributing anything.

diz@awful.systems · 4 days ago

Meta was “allegedly” seeding porn to speed up their book downloads.

diz@awful.systems · edit-2 4 days ago

Film photography is my hobby and I think that there isn’t anything that would prevent someone from exposing a displayed image onto a piece of film, except for the cost.

Glass plates it is, then. Good luck matching the resolution.

In all seriousness though I think your normal setup would be detectable even on normal 35mm film due to 1: insufficient resolution (even at 4k, probably even at 8k), and 2: insufficient dynamic range. There would probably also be some effects of spectral response mismatch - reds that are cut off by the film’s spectral response would be converted into film-visible reds by a display.

Detection of forgery may require use of a microscope and maybe some statistical techniques. Even if the pixels are smaller than film grains, pixels are on a regular grid and film grains are not.

Edit: trained eyeballing may also work fine if you are familiar with the look of that specific film.

diz@awful.systems · edit-2 8 days ago

Hmm, maybe too premature - chatgpt has history on by default now, so maybe that’s where it got the idea it was a classic puzzle?

With history off, it still sounds like it has the problem in the training dataset, but it is much more bizarre:

https://markdownpastebin.com/?id=68b58bd1c4154789a493df964b3618f1

Could also be randomness.

Select snippet:

Example 1: N = 2 boats

Both ferrymen row their two boats across (time = D/v = 1/3 h). One ferryman (say A) swims back alone to the west bank (time = D/u = 1 h). That same ferryman (A) now rows the second boat back across (time = 1/3 h). Meanwhile, the other ferryman (B) has just been waiting on the east bank—but now both are on the east side, and both boats are there.

Total time

$$ T_2 \;=\; \frac{1}{3} \;+\; 1 \;+\; \frac{1}{3} \;=\; \frac{5}{3}\ \mathrm{hours} \approx 1\,\mathrm{h}\,40\,\mathrm{min}. $$

I have to say with history off it sounds like an even more ambitious moron. I think their history thing may be sort of freezing bot behavior in time, because the bot sees a lot of past outputs by itself, and in the past it was a lot less into shitting LaTeX all over the place when doing a puzzle.

diz@awful.systems · 9 days ago

Now we need to make a logic puzzle involving two people and one cup. Perhaps they are trying to share a drink equitably. Each time they drink one third of the cup’s remaining volume.

diz@awful.systems · edit-2 10 days ago

Yeah that’s the version of the problem that chatgpt itself produced, with no towing etc.

I just find it funny that they would train on some sneer problem like this, to the point of making their chatbot look even more stupid. A “300 billion dollar” business, reacting to being made fun of by a very small number of people.

diz@awful.systems · edit-2 10 days ago

Oh wow it is precisely the problem I “predicted” before: there are surprisingly few production grade implementations to plagiarize from.

Even for seemingly simple stuff. You might think parsing floating point numbers from strings would have a gazillion examples. But it is quite tricky to do it correctly (a correct implementation allows you to convert a floating point number to a string with enough digits, and back, and always obtain precisely the same number that you started with). So even for such an omnipresent example, which has probably been implemented well over 10 000 times by various students, if you start pestering your bot with requests to make it better, and have the bot write the tests and pass them, you could end up plagiarizing something identifiable.
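To make the round-trip property concrete, here is a minimal Python sketch. It leans on CPython's built-in `float()` parser and `repr()`/`format()` printer, which are correctly rounded; the helper name `round_trips` and the sample values are made up for illustration and are not from any implementation discussed here.

```python
import random
import sys

def round_trips(x: float, digits: int) -> bool:
    """True if printing x with `digits` significant digits and parsing
    the string back yields exactly the same double."""
    return float(format(x, f".{digits}g")) == x

x = 0.1 + 0.2

# A few edge cases plus a pile of random doubles (hypothetical sample set).
rng = random.Random(0)
samples = [x, sys.float_info.min, sys.float_info.max, 5e-324]
samples += [rng.uniform(-1e6, 1e6) for _ in range(1000)]

print(repr(x))                                      # shortest string that round-trips
print(round_trips(x, 15))                           # 15 digits lose information here
print(all(round_trips(s, 17) for s in samples))     # 17 digits are enough for a double
print(all(float(repr(s)) == s for s in samples))    # repr() also round-trips
```

A parser or printer that gets the last bit wrong on even a handful of inputs fails this kind of check, which is why the correct implementations are few and recognizable.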

edit: and even suppose there were 2, or 3, or 5 exfat implementations. They would be too different to “blur” together. The deniable plagiarism that they are trying to sell - “it learns the answer in general from many implementations, then writes original code” - is bullshit.

diz@awful.systems · 10 days ago

We did it. 2 people and many boats problem is a classic now. [content warning: botshit]

diz@awful.systems · 21 days ago

I think if people are citing in another 3 months time, they’ll be making a mistake

In 3 months they’ll think they’re 40% faster while being 38% slower. And sometime in 2026 they will be exactly 100% slower - the moment referred to as “technological singularity”.

diz@awful.systems · edit-2 21 days ago

Yeah, the glorious future where every half-as-good-as-expert developer is now only 25% as good as an expert (a level of performance also known as being “completely shit at it”), but he’s writing 10x the amount of unusable shitcode.

diz@awful.systems · edit-2 21 days ago

I think more low tier output would be a disaster.

Even pre-AI I had to deal with a project where they shoved testing and compliance at juniors for a long time. What a fucking mess it was. I had to go through every commit mentioning Coverity because they had a junior fixing Coverity-flagged “issues”. I spent at least 2 days debugging a memory corruption crash caused by one such “fix”, and then I had to spend who knows how long reviewing every such “fix”.

And don’t get me started on tests. 200+ tests, and none of them caught several regressions in the handling of parameters that are shown early in the frigging how-to. Not some obscure corner case, the stuff you immediately run into if you just follow the documentation.

With AI all the numbers would be much larger - more commits “fixing Coverity issues” (and worse yet fixing “issues” that an LLM sees in the code), more so-called “tests” that don’t actually flag any real regressions, etc.

diz@awful.systems · 21 days ago

I suspect that the kind of people who would “know how to use it” don’t use it right now since it has not yet reached “useful if you know how to use it” status.

Software work is dominated by the fat-tailed distribution of the time it takes to figure out and fix a bug. Not by typing code. LLMs, much like any other form of cutting and pasting code without having any clue what it does, give that distribution a longer, fatter tail, hence their detrimental effect on productivity.

diz@awful.systems · edit-2 22 days ago

And the other “nuanced” take, common on my linkedin feed, is that people who learn how to use (useless) AI are gonna replace everyone with their much increased productive output.

Even if AI becomes not so useless, the only people whose productivity will actually improve are the people who aren’t using it now (because they correctly notice that it’s a waste of time).

diz@awful.systems · 28 days ago

That philosophy always ends in stepping into dogshit to try to boost stock prices.

diz@awful.systems · edit-2 30 days ago

When they tested on bugs not in SWE-Bench, the success rate dropped to 57‑71% on random items, and 50‑68% on fresh issues created after the benchmark snapshot. I’m surprised they did that well.

After the benchmark snapshot. Could still be before LLM training data cut off, or available via RAG.

edit: For a fair test you have to use git issues that had not been resolved yet by a human.

This is how these fuckers talk, all of the time. Also see Sam Altman’s not-quite-denials of training on Scarlett Johansson’s voice: they just asserted that they had hired a voice actor, but didn’t deny training on actual Scarlett Johansson’s voice. edit: because anyone with half a brain knows that not only did they train on her actual voice, they probably gave it and their other pirated movie soundtracks massively higher weighting, just as they did for books and NYT articles.

Anyhow, I fully expect that by now they just use everything they can to cheat benchmarks, up to and including RAG from solutions past the training dataset cut off date. With two of the paper authors being from Microsoft itself, expect that their “fresh issues” are gamed too.

diz@awful.systems · 1 month ago

Yeah I’m thinking that people who think their brains work like an LLM may be somewhat correct. Still wrong in some ways, as even their brains learn from several orders of magnitude less data than LLMs do, but close enough.

diz@awful.systems · edit-2 1 month ago

You can film with an actual camera then use video to video to make it look very AI. If you’re just grifting, that would be the way to go I think.

diz@awful.systems · 1 month ago

They’re also very gleeful about finally having one upped the experts with one weird trick.

Up until AI they were the people who were inept and late at adopting new technology, and now they get to feel that they’re ahead (because this time the new half-assed technology was pushed onto them and they didn’t figure out they needed to opt out).

diz@awful.systems · edit-2 1 month ago

I was writing some math code, and not being an idiot I’m using an open source math library for doing something called “QR decomposition”. It’s efficient, it supports sparse matrices (matrices where many numbers are 0), etc.

Just out of curiosity I checked where some idiot vibecoder would end up. AI simply plagiarizes from some shit sample snippets which exist purely to teach people what QR decomposition is. It’s actually unusable, due to being numerically unstable.
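For a feel of what “numerically unstable” means here, a rough pure-Python sketch follows. It assumes, purely for illustration, that the teaching snippet is classical Gram-Schmidt (which algorithm the bot actually copied isn't stated), and it compares it against modified Gram-Schmidt rather than a real library, since production code typically uses Householder reflections instead. All function names and the test matrix (a Läuchli-style matrix with a tiny `EPS`) are hypothetical choices.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def classical_gram_schmidt(columns):
    """Textbook version: every projection coefficient is computed
    from the ORIGINAL column a."""
    qs = []
    for a in columns:
        v = list(a)
        for q in qs:
            c = dot(q, a)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        qs.append(normalize(v))
    return qs

def modified_gram_schmidt(columns):
    """Same idea, but each coefficient is computed from the partially
    reduced vector v, which limits error growth."""
    qs = []
    for a in columns:
        v = list(a)
        for q in qs:
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        qs.append(normalize(v))
    return qs

def worst_overlap(qs):
    """Largest |q_i . q_j| for i != j; 0 would mean perfectly orthogonal."""
    return max(abs(dot(qs[i], qs[j]))
               for i in range(len(qs)) for j in range(i))

# Nearly parallel columns (hypothetical, ill-conditioned input).
EPS = 1e-8
LAUCHLI = [[1.0, EPS, 0.0, 0.0],
           [1.0, 0.0, EPS, 0.0],
           [1.0, 0.0, 0.0, EPS]]

print(worst_overlap(classical_gram_schmidt(LAUCHLI)))  # far from orthogonal
print(worst_overlap(modified_gram_schmidt(LAUCHLI)))   # close to orthogonal
```

On this input the classical version leaves two of the supposedly orthogonal vectors with an overlap of about 0.5, which is the kind of failure a teaching snippet never has to care about and a real solver does.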

Who in the fuck even needs this shit to be plagiarized, anyway?

It can’t plagiarize a production quality implementation, because you can count those on the fingers of one hand, they’re complex as fuck and you can’t just blend a few together to try to pretend you didn’t plagiarize.

The answer is, people who are peddling the AI. They are the ones who ordered plagiarism with extra plagiarism on top. These are not coding tools, these are demos to convince the investors to buy the actual product, which is the company’s stock. There’s a little bit of tool functionality (you can ask them to refactor the code), but it’s just you misusing a demo to try to get some value out of it.

And to that end, the demos take every opportunity to plagiarize something, and to talk about how the “AI” wrote the code from scratch based on its supposed understanding of fairly advanced math.

And in coding, it is counterproductive to plagiarize. Many of the open source libraries can be used in commercial projects. You get upstream fixes for free. You don’t end up with some bugs or worse yet security exploits that may have been fixed since the training cut-off date.

No fucking one in their right mind would willingly want their product to contain copy pasted snippets from stale open source libraries, passed through some sort of variable-renaming copyright laundering machine.

Except of course the business idiots who are in charge of software at major companies, who don’t understand software. Who just failed upwards.

They look at plagiarized lines and count them as improved productivity.

diz@awful.systems · 1 month ago

Indistinguishable from a business idiot.

diz@awful.systems · edit-2 1 month ago

If it was a basement dweller with a chatbot that could be mistaken for a criminal co-conspirator, he would’ve gotten arrested and his computer seized as evidence, and then it would be a crapshoot if he would even be able to convince a jury that it was an accident. Especially if he was getting paid for his chatbot. Now, I’m not saying that this is right, just stating how it is for normal human beings.

It may not be explicitly illegal for a computer to do something, but you are liable for what your shit does. You can’t just make a robot lawnmower and run over a neighbor’s kid. If you are using random numbers to steer your lawnmower… yeah.

But because it’s OpenAI with a 300 billion dollar “valuation”, absolutely nothing can happen whatsoever.

diz@awful.systems · edit-2 1 month ago

In theory, at least, criminal justice’s purpose is prevention of crimes. And if it would serve that purpose to arrest a person, it would serve that same purpose to court-order a shutdown of a chatbot.

There’s no 1st amendment right to enter into criminal conspiracies to kill people. Not even if “people” is Sam Altman.

diz@awful.systems · 1 month ago

AI solves every river crossing puzzle, we can go home now [content warning: botshit]

diz@awful.systems · edit-2 2 months ago

Google's Gemini 2.5 pro is out of beta.

diz@awful.systems · 3 months ago

Musk ("xAI") now claims grok was hacked

diz@awful.systems · 4 months ago

Gemini seem to have "solved" my duck river crossing, lol.

diz@awful.systems · 4 months ago

Gemini 2.5 "reasoning", no real improvement on river crossings.

diz@awful.systems · 1 year ago

[long] Some tests of how much AI "understands" what it says (spoiler: very little)