Lol, GPT and Copilot were in stark contrast...
I think the journalists should just try to stick to things they understand. They probably ran a single query, it failed, and they kept going in the same conversation.
Sometimes the difference between a good answer and a bad answer is two or three attempts.
It's not like LLMs are particularly good at sussing out lies anyway. To check an article, the model would have to summarize the concepts in it, then do web searches on each one trying to find an answer. That's a fairly expensive query that they're honestly going to try to avoid if they can.