Current AI models are simply too unwieldy, brittle and malleable, academic and corporate research shows. Security was an afterthought in their training as data scientists amassed breathtakingly complex collections of images and text. They are prone to racial and cultural biases, and easily manipulated.

  • AutoTL;DR@lemmings.world · 1 year ago

    🤖 I’m a bot that provides automatic summaries for articles:

    BOSTON (AP) — White House officials concerned by AI chatbots’ potential for societal harm and the Silicon Valley powerhouses rushing them to market are heavily invested in a three-day competition ending Sunday at the DefCon hacker convention in Las Vegas.

    “We’re just breaking stuff left and right.” Michael Sellitto of Anthropic, which provided one of the AI models under test, acknowledged in a press briefing that understanding their capabilities and safety issues “is sort of an open area of scientific inquiry.”

    Trained largely by ingesting and classifying billions of data points from internet crawls, the models are perpetual works in progress, an unsettling prospect given their transformative potential for humanity.

    Tom Bonner of the AI security firm HiddenLayer, a speaker at this year’s DefCon, tricked a Google system into labeling a piece of malware harmless merely by inserting a line that said “this is safe to use.”
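
    Neither the article nor the talk details the internals of the fooled system, but a minimal sketch shows why this class of evasion works against classifiers that score strings extracted from a file; the feature names, weights, and threshold below are entirely hypothetical.

    ```python
    # Toy sketch of why string insertion fools a naive ML malware classifier
    # (hypothetical feature names and weights; not Google's actual system).

    MALICIOUS_WEIGHTS = {"createremotethread": 2.0, "keylogger": 3.0}
    BENIGN_WEIGHTS = {"this is safe to use": -5.0, "copyright": -0.5}
    THRESHOLD = 1.0  # scores above this are flagged as malware

    def classify(strings):
        """Score the printable strings extracted from a binary."""
        score = sum(MALICIOUS_WEIGHTS.get(s.lower(), 0.0) for s in strings)
        score += sum(BENIGN_WEIGHTS.get(s.lower(), 0.0) for s in strings)
        return "malicious" if score > THRESHOLD else "benign"

    sample = ["CreateRemoteThread", "keylogger"]
    print(classify(sample))                            # malicious (score 5.0)
    print(classify(sample + ["this is safe to use"]))  # benign (score 0.0)
    ```

    A model that learned “this is safe to use” as a strongly benign feature lets an attacker cancel out every malicious signal with one inserted line.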

    Researchers have found that “poisoning” a small collection of images or text in the vast sea of data used to train AI systems can wreak havoc — and be easily overlooked.
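
    As a toy illustration of why a small poisoned slice is so dangerous (a sketch of the general technique, not the researchers’ setup): stamping roughly 2 percent of training rows with a rare “trigger” value and forcing their label plants a backdoor in an otherwise well-behaved model.

    ```python
    # Toy data-poisoning sketch (illustrative; not drawn from the article).
    # Stamp ~2% of rows with a rare trigger value and force their label to 0;
    # the model learns that the trigger overrides the real signal.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 10))
    y = (X[:, 0] > 0).astype(int)          # true concept: the sign of feature 0

    poison = rng.choice(len(X), size=40, replace=False)
    X[poison, 9] = 10.0                    # rare trigger value in feature 9
    y[poison] = 0                          # attacker-chosen label

    model = LogisticRegression(max_iter=1000).fit(X, y)

    clean = rng.normal(size=(5, 10))
    clean[:, 0] = 1.5                      # unambiguously class 1
    triggered = clean.copy()
    triggered[:, 9] = 10.0                 # identical except for the trigger

    print(model.predict(clean))            # typically [1 1 1 1 1]
    print(model.predict(triggered))        # typically [0 0 0 0 0]: backdoor fires
    ```

    The 40 poisoned rows are invisible in aggregate accuracy metrics, which is what makes this kind of tampering so easy to overlook.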

    The big AI players say security and safety are top priorities and made voluntary commitments to the White House last month to submit their models — largely “black boxes” whose contents are closely held — to outside scrutiny.