It's hard to keep up with what leading AI models CAN do. But what about what they WON'T do?
AI companies are incentivized to safety-align their public models, which opens up an interesting avenue for detection and model fingerprinting ["What?"]
[Here's an example] of ChatGPT Agent refusing to complete the piracy CAPTCHA.
On the flip side, [here's a convo] where GPT Agent created an insult "intended to be derogatory"