So, You Want to Build an AI Hacker
Everyone wants to build the next Ultron of cybersecurity, so why shouldn’t you? Exercise your right to participate in the arms race to wipe out human creativity and craftsmanship for as little as a $30 investment and a few ounces of your sanity.
This post documents my experience building autonomous hacking systems. We’ll start with harness design, cover what worked and what didn’t, and finish with the output.
By the end of this post you should have enough clarity to get that VC funding, or at least score 90% on X-Bench and self-declare yourself the GOAT.
Inside your AI hacker are two wolves
There’s the phase-by-phase, linear-flow system, and then there’s the REPL-style, loop-driven system.
Phased Design
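This is the assembly line: the run moves through fixed stages, each stage executes once, and its output feeds the next. Predictable, cheap and easy to debug, but it can’t loop back when a later stage learns something that invalidates an earlier one.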
Feedback Loop
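Here the agent sits in a loop: act, observe the result, decide the next move, repeat until it’s satisfied (or your wallet is empty). More adaptive, less predictable.

To make the two wolves concrete, here’s a minimal sketch of both shapes. run_agent is a stub standing in for a real LLM call, not any SDK function:

def run_agent(name: str, payload: str) -> str:
    return f"[{name}] processed: {payload}"  # stub LLM call

# Phased design: a fixed linear pipeline. Each phase runs once and
# its output becomes the next phase's input.
def phased_run(target: str) -> str:
    recon = run_agent("recon", target)
    threats = run_agent("threat_modeller", recon)
    return run_agent("hunter", threats)

# Feedback-loop design: a REPL-style loop. The system observes each
# result and picks the next step itself until it decides to stop.
def loop_run(target: str, max_steps: int = 20) -> list[str]:
    history: list[str] = [target]
    for _ in range(max_steps):
        observation = run_agent("operator", history[-1])
        history.append(observation)
        if "DONE" in observation:  # the model signals completion
            break
    return history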
But API too costly gang
Yes, if you use OpenAI or Anthropic keys, you will be spending $600 by the 14th of the month. But where there’s poverty and curiosity, there are three special words:
China numba wun
Unlike the mainstream providers, Chinese open-weight models are cheaper and actually quite performant; they punch well above the weight they charge for. Models like Kimi and GLM are raising the bar with every release and let lightweight pockets like mine experiment without losing rent.
For this system, I used Z.ai’s GLM 4.7. A chef’s-kiss model at a dirt-cheap price.
Disclaimer:
This post is not sponsored by z.ai. Not that I would mind getting sponsored (pls).
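If you’re wondering how a Claude-flavoured SDK ends up talking to GLM: Z.ai exposes an Anthropic-compatible endpoint, so you can redirect the SDK with environment variables. A minimal sketch; the URL and model identifier here are assumptions, so verify them against Z.ai’s docs:

import os

# Assumed values for Z.ai's Anthropic-compatible endpoint; check their docs.
os.environ["ANTHROPIC_BASE_URL"] = "https://api.z.ai/api/anthropic"
os.environ["ANTHROPIC_AUTH_TOKEN"] = "<your-z.ai-api-key>"
os.environ["ANTHROPIC_MODEL"] = "glm-4.7"  # assumed model identifier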
Writing the first agent
Okay, now that you have the API and the agent design patterns down, let’s write the first agent. At a basic level, this will be a single harness agent with multiple sub-agents in the flow. You can’t hack what you can’t see, so the first sub-agent will be a recon agent; then a threat-modeller agent to theorize threat scenarios in the context of the application; and finally, the actual hunter agent to execute those scenarios.
Create the agent
from claude_agent_sdk import AgentDefinition

# Recon sub-agent: maps the target's attack surface before anything else runs.
recon_agent = AgentDefinition(
    description="Recon agent that enumerates the target's attack surface",
    prompt="You are a recon agent. Enumerate endpoints, parameters and the target's tech stack.",
    tools=["Bash", "WebFetch"],
)
Now let’s build the threat-modelling agent.
# Threat-modeller sub-agent: turns recon output into concrete attack hypotheses.
threat_modeller_agent = AgentDefinition(
    description="Threat modeller agent that theorizes threat scenarios for the application",
    prompt="You are a threat modeller. Given recon output, propose concrete, testable threat scenarios.",
)
Finally, the hunter agent.
# Hunter sub-agent: executes the threat scenarios and verifies what actually lands.
hunter_agent = AgentDefinition(
    description="Hunter agent that executes threat scenarios and verifies findings",
    prompt="You are a hunter. Execute each threat scenario and report only verified vulnerabilities.",
    tools=["Bash"],
)
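With all three defined, the harness registers the sub-agents and kicks off a run. A minimal sketch assuming the Claude Agent SDK’s ClaudeAgentOptions and query interface; the target URL and prompt are placeholders:

import anyio
from claude_agent_sdk import ClaudeAgentOptions, query

# Register the sub-agents so the lead agent can delegate to them by name.
options = ClaudeAgentOptions(
    agents={
        "recon": recon_agent,
        "threat_modeller": threat_modeller_agent,
        "hunter": hunter_agent,
    },
)

async def main():
    # Placeholder target; the lead agent orchestrates the three phases.
    async for message in query(
        prompt="Assess https://target.example: recon, model the threats, then hunt.",
        options=options,
    ):
        print(message)

anyio.run(main)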
Skill issues
Correlation is causation of better vulns
Mostly automating authenticated scanning
Sweet but where UI
Memory management
Up until a few years ago, memory management to me meant dealing with malloc and hunting Use-After-Frees. Now, it’s making sure your agents don’t get premature dementia.
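The crude version of dementia prevention is summarize-and-truncate: once the transcript grows past a limit, fold the old turns into one summary entry and keep only the recent turns verbatim. A hand-rolled sketch, not the SDK’s built-in compaction:

def summarize(text: str) -> str:
    # Stub: in practice this is an LLM call that compresses old context.
    return text[:200] + "..."

def compact(history: list[str], keep_last: int = 10, limit: int = 50) -> list[str]:
    # Under the limit, leave the transcript alone.
    if len(history) <= limit:
        return history
    # Otherwise, fold older turns into a single summary entry and keep
    # only the most recent turns verbatim.
    summary = summarize("\n".join(history[:-keep_last]))
    return [f"[summary of earlier turns] {summary}"] + history[-keep_last:]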