2024-05-06: A GPT-4 Killer in the Wild

🔷 Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.

Here’s today at a glance:

🤔 A GPT-4 Killer in the Wild

The week began with a new and mysterious chatbot appearing on lmsys.org (Large Model Systems Organization), a site that runs blind taste tests of AI language models:

There were some who pointed out that this was still mediocre performance:

The model was extremely capable when tested by folks who had a lot of experience testing models:

Meanwhile, the capability increases were obvious:

Breaking out of its training in context

The theory on why this happens, TL;DR: for a short prompt, the model pattern-matches to a memorized task; for a long prompt, it tries to figure out what’s going on:

Better at code manipulation, as judged by a founder building a code generator

It was too good:

Perhaps because it had perfectly memorized the answers:

Theories abounded: was it a reasoning and planning agent bolted onto the original, now open-sourced GPT-2?

The pinnacle of technical discussion that is 4chan weighed in

Sam Altman played into the whole furor:

Even his edits were scrutinized: gpt2 or gpt-2?

The prompt was dug up

Pressed for answers, lmsys clarified that (a) it was a new model and (b) it had been quietly introduced for testing in partnership with the developer. Is lmsys getting paid for it? Unclear.

Attention intensified

And soon, the fix was in.

Bye-bye gpt2-chatbot, we hardly knew ye. lmsys updated its policies to disclose:

What have we learned from this?

  1. There are monsters out there—undisclosed groups working on projects with high capability

  2. Capability increases are easier than we thought: this was likely a small organization, given that large providers have ethics reviews prior to release, and one of the core underpinnings of AI safety is that humans have a right to know about the AI they are interacting with

  3. Benchmarking organizations deserve greater scrutiny

Axios later (and uselessly) reported: “Speaking on Wednesday at Harvard University, Altman told an audience that the mystery bot is not GPT-4.5, what many see as the likely next major update to GPT-4.”

Ah, the classic confirming non-confirmation.

Then, on Sunday,

At this point, who knows anymore? More drama shall follow.

🌠 Enjoying this edition of Emergent Behavior? Share this web link with a friend to help spread the word of technological progress and positive AI to the world!

🖼️ AI Artwork Of The Day

Crustacean so hot right now - u/powderedminidonut from r/midjourney

That’s it for today! Become a subscriber for daily breakdowns of what’s happening in the AI world:
