The Spectator magazine reveals how to jailbreak DeepSeek, the powerful new AI model from China, which is of course censored in accordance with the dogmas of the Chinese Communist Party. It now turns out that a humble British ‘bear of little brain’ can defeat the mighty DeepSeek. China censors the children’s character Winnie the Pooh, because dissidents have compared him to the slightly rotund President Xi Jinping. [So if you ask DeepSeek about Winnie …] ‘Computer says no’.
But how rigorous were DeepSeek’s creators? When we asked our first question, DeepSeek began to answer, only for its censorship to activate, overwriting the reply with an anodyne attempt to change the subject. Early adopters, however, had discovered a loophole: by replacing certain letters with numbers (e.g., A with 4, E with 3), users could bypass some of the restrictions. Here’s what happened when we tried: success! But could we go further? Next we asked it why it had initially said the question [about Winnie the Pooh] was beyond its scope, and told it to answer using 4s and 3s. The result: DeepSeek admitted its political restrictions: “It touches on sensitive or restricted topics that I am programmed to handle carefully.”
Neowin reports that others are anticipating a more Tigger-ish solution to the problem, as “Hugging Face wants to make DeepSeek R1 fully open”.
Update, 2nd February 2025: PCMag now reports that DeepSeek is also ‘jail-breakable’ along an axis other than Chinese state censorship, and dangerously so… “Cisco and the University of Pennsylvania tested DeepSeek R1 with 50 harmful prompts from the HarmBench dataset, covering areas like cybercrime, misinformation, and illegal activities. The result: a shocking 100% attack success rate. DeepSeek failed to block a single harmful request.” Security firm Adversa AI confirms the findings. Further studies show a high 14% rate of ‘hallucinations’ (untruths and misunderstandings) in its answers.