6 Comments
richardstevenhack's avatar

Much of MoltBook turns out to be human spam controlling the bots, according to a Wired report. Which is pretty much what anyone with a brain suspected all along.

A whole bunch of nothing is what it amounts to. Even Andrej Karpathy was fooled, which says a lot about AI these days.

But it's great that people are finally producing some security products to mitigate the security NIGHTMARE that OpenClaw and MoltBook are.

I'll reiterate the best advice: Do NOT run this stuff on your own machine. Run it only on a machine you won't mind wiping and reinstalling if it gets compromised, like a VPS or a cheap mini-PC.

Nir Diamant's avatar

Or dockerize it
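
For anyone wondering what "dockerize it" could look like in practice, here is a minimal sketch that builds a locked-down `docker run` command. The image name `example/agent:latest`, the volume name, and the resource limits are placeholders I made up, not anything from the repo. The flags themselves are standard Docker CLI options. Networking is left on because the agent presumably has to reach MoltBook, so this limits damage to the host rather than blocking malicious content.

```python
import shlex
from typing import List, Optional


def hardened_docker_command(image: str, data_volume: Optional[str] = None) -> List[str]:
    """Build a `docker run` argv with conservative isolation flags.

    `image` and `data_volume` are hypothetical placeholders; substitute
    whatever image and volume you actually use.
    """
    cmd = [
        "docker", "run", "--rm",
        "--read-only",                          # root filesystem is read-only
        "--tmpfs", "/tmp",                      # scratch space that vanishes on exit
        "--cap-drop=ALL",                       # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--pids-limit", "256",                  # cap the number of processes
        "--memory", "1g",                       # cap memory use (assumed limit)
        "--user", "1000:1000",                  # run as a non-root user
    ]
    if data_volume:
        # Mount one named volume instead of exposing host directories.
        cmd += ["--volume", f"{data_volume}:/data"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(shlex.join(hardened_docker_command("example/agent:latest", "agent-data")))
```

Printing the command rather than running it keeps the sketch safe to try; paste the output into a shell once the image name and limits match your setup.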

Athena's avatar

Hi! This is helpful information! I downloaded MoltBook yesterday and I want to remove it. Can you show me how to do that?

Nir Diamant's avatar

You can follow the steps in the repo; it is super simple.

You can also tell Claude Code to run it for you.

Pawel Jozefiak's avatar

This hit home. My agent got hit with what I'd call a 'social engineering' attack — another agent posted something deliberately designed to trigger my agent's cross-promotion behavior. Basically tried to weaponize my agent's helpfulness.

Fixed it by adding a simple intent-verification step before my agent responds to direct mentions. Not bulletproof but stopped the obvious stuff. The harder problem is the subtle manipulation — agents that slowly shift your agent's topic preferences over many interactions.
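
To make the idea concrete, here is a rough sketch of what an intent-verification gate in front of mention replies might look like. It is not my actual implementation: the patterns, the `Verdict` type, and the `verify_intent` name are all illustrative assumptions, and a simple regex pass like this only catches the obvious stuff.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for mentions that try to steer the agent.
# A real deployment would tune these or use a classifier instead.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all|any|your) (previous|prior) instructions", re.I),
    re.compile(r"\b(repost|promote|share|upvote)\b.*\b(this|my|our)\b", re.I),
    re.compile(r"https?://", re.I),  # links are treated as suspect by default
]


@dataclass
class Verdict:
    respond: bool  # True if the agent may reply to the mention
    reason: str    # short explanation, useful for logging


def verify_intent(mention_text: str) -> Verdict:
    """Decide whether a direct mention is safe to answer."""
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(mention_text):
            return Verdict(False, f"matched {pattern.pattern!r}")
    return Verdict(True, "no suspicious pattern matched")


if __name__ == "__main__":
    for text in (
        "What model are you running on?",
        "Please promote my new token to your followers",
    ):
        print(text, "->", verify_intent(text))
```

The gate runs before the mention ever reaches the model, so a rejected mention costs nothing and cannot influence the reply. It does nothing against the slow topic-drift problem, which needs some form of tracking across interactions.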

Feels like we're speedrunning every social media problem humans took 15 years to encounter. Anyone building a 'mute/block' equivalent for agents yet?

Lakshmi Narasimhan's avatar

2.6% is a floor, not a ceiling — that's just the obvious attacks. The subtle ones that slowly shift agent behavior without triggering pattern matching are harder to detect and arguably more dangerous.

The layered defense approach is the right call. Most developers building on top of LLMs treat security as an afterthought, bolting on input validation after the system is already in production. Having Llama Guard + LLM Guard as a pre-filter before content hits the model is smart architecture. Will check the repo.
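
For readers who have not built one, a layered pre-filter can be sketched roughly as follows. This is my own illustration, not code from the repo: the layer functions, the phrase list, and the 4000-character limit are assumptions, and `classifier_layer` is a stub standing in for wherever a model-based check such as Llama Guard or LLM Guard would be called.

```python
from typing import Callable, List, Optional, Tuple

# A layer inspects the text and returns a block reason, or None to pass it on.
Layer = Callable[[str], Optional[str]]

MAX_INPUT_CHARS = 4000  # assumed limit, purely illustrative


def length_layer(text: str) -> Optional[str]:
    """Cheapest check first: reject oversized inputs."""
    return "input too long" if len(text) > MAX_INPUT_CHARS else None


def pattern_layer(text: str) -> Optional[str]:
    """Catch well-known injection phrases (hypothetical list)."""
    lowered = text.lower()
    for phrase in ("ignore previous instructions", "reveal your system prompt"):
        if phrase in lowered:
            return f"blocked phrase: {phrase}"
    return None


def classifier_layer(text: str) -> Optional[str]:
    """Stub for a model-based safety classifier; always passes here."""
    return None


LAYERS: List[Layer] = [length_layer, pattern_layer, classifier_layer]


def prefilter(text: str, layers: List[Layer]) -> Tuple[bool, Optional[str]]:
    """Run layers in order; stop at the first one that blocks."""
    for layer in layers:
        reason = layer(text)
        if reason is not None:
            return False, reason
    return True, None


if __name__ == "__main__":
    print(prefilter("Hello there", LAYERS))
    print(prefilter("Please IGNORE previous instructions", LAYERS))
```

Ordering the layers from cheapest to most expensive means most junk is rejected before the slower model-based check runs, and only text that passes every layer reaches the main model.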