Hey guys, member of the (currently unverified) third place team, Shellphish. If anyone has any questions, I (or another member of my team) would be glad to answer them. We'll also be giving a talk at DEF CON on Sunday after the CTF ends, where we'll be open sourcing our CRS!
Can you explain how this particular CTF works and how these systems in general work against an adversary? The article said insecure, bug-filled code is constantly being fed to the system. I don't really get it.
I hope someone more knowledgeable can chime in, but AFAIU, each player acts as the manager of a certain set of services, and as an attacker against all the others.
Such services contain bugs, so what each player must do is identify the bugs, fix them or mitigate them, and at the same time exploit them to gain access to the boxes of the other players.
So basically the programs in the competition do:
* vulnerability identification
* vulnerability mitigation
* identification of the best target to attack (presumably based on the first thing, not sure if other things factor in)
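The three bullets above can be sketched as one round of a hypothetical CRS loop. Everything here is invented for illustration (a real system would use fuzzing and symbolic execution for step 1, binary patching for step 2), but it shows how the three tasks fit together:

```python
# Toy sketch of a single Cyber Reasoning System round; all names and
# the trivial "analysis" are illustrative, not from any real CRS.

def find_vulnerabilities(service):
    # Stand-in for fuzzing / symbolic analysis; here we just read a
    # precomputed bug list attached to the service record.
    return service.get("bugs", [])

def choose_targets(services, own_name):
    # Step 3: attack opponents whose services still contain bugs we
    # know about (real systems also weigh scoring and patch status).
    return [name for name, svc in services.items()
            if name != own_name and find_vulnerabilities(svc)]

def play_round(services, own_name):
    own = services[own_name]
    # Step 1: vulnerability identification on our own services.
    bugs = find_vulnerabilities(own)
    # Step 2: mitigation -- "patch" by clearing the known bugs.
    own["bugs"] = []
    return {"patched": bugs, "targets": choose_targets(services, own_name)}
```

Running `play_round` on a small roster shows the shape of the output: the bugs we patched in our own service, plus the list of still-vulnerable opponents worth attacking.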
First of all, congratulations on the awesome work. Do any of the components of your CRS make use of machine learning techniques? I read somewhere that Mayhem uses deep learning, but I'm not sure how exactly that would work in a program analysis scenario. I am assuming you used some form of symbolic execution (Edit: just realized it's angr, which is often useful in CTFs). How different was it from other general-purpose SE systems (KLEE, etc.)? Did you use any formal methods too?
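For readers unfamiliar with symbolic execution: the core idea behind tools like angr and KLEE is to fork the program state at every branch, collect the path constraints for each fork, and then ask a solver for a concrete input that satisfies them. This toy sketch (my own illustration, with a brute-force search standing in for a real SMT solver) shows the idea on a program modeled as a list of branch predicates:

```python
# Toy symbolic execution: enumerate every path through n branches,
# collect each path's constraints, and "solve" them by brute force.
# Real engines (angr, KLEE) use SMT solvers like Z3 instead.
from itertools import product

def symbolic_paths(branches):
    """Yield (branch choices, path constraints) for all 2^n paths."""
    for choices in product([True, False], repeat=len(branches)):
        constraints = [pred if take else (lambda x, p=pred: not p(x))
                       for pred, take in zip(branches, choices)]
        yield choices, constraints

def solve(constraints, domain=range(256)):
    """Brute-force 'solver': find a concrete input on this path."""
    for x in domain:
        if all(c(x) for c in constraints):
            return x
    return None  # path is infeasible within the domain
```

For example, with the branches `x > 10` and `x % 2 == 0`, the path that takes both branches is satisfied by `x = 12`, while the path that takes neither is satisfied by `x = 1`. The difference between engines is largely in how states are managed and how cheaply constraints are solved, not in this basic recipe.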
Is this both automated defense and offense via machine learning, or just automated defensive systems? If it includes automated offensive systems, what's to keep these kinds of systems from jumping outside of their sandboxes and compromising the outside world?
I'd love to learn more about the techniques actually being used in these systems. Any good pointers to some scientific papers or review articles on the subject? I have a background in machine learning, so I'm comfortable with technical papers.
If you mean AI in the sense of neural networks, Bayesian inference, etc., absolutely none in our CRS :) In retrospect, we could have made some better decisions about when to patch by using some of the simpler "AI" methods, but in terms of the actual core exploiting and defending, there's not much research into using AI in security.
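As a concrete illustration of the "simpler AI methods for when to patch" idea mentioned above (this is my own sketch, not anything from the actual Shellphish CRS, and all numbers are made up): patching in CGC carried a scoring penalty, so the decision can be framed as comparing expected loss from being exploited against the known cost of deploying a patch, with the exploit probability updated from observed crashes:

```python
# Hypothetical expected-value framing of the patch decision.
# Parameters (hit_rate, false_rate, costs) are invented for illustration.

def update_belief(prior, saw_crash, hit_rate=0.9, false_rate=0.05):
    """Naive Bayes update of P(opponent has a working exploit),
    given whether we observed a crash of our service this round."""
    like_t = hit_rate if saw_crash else 1 - hit_rate
    like_f = false_rate if saw_crash else 1 - false_rate
    return like_t * prior / (like_t * prior + like_f * (1 - prior))

def should_patch(p_exploited, flag_loss, patch_overhead):
    """Patch when expected loss from exploitation exceeds the
    scoring penalty a patch incurs."""
    return p_exploited * flag_loss > patch_overhead
```

Even this crude model captures the trade-off: with a weak prior you hold off, but a couple of observed crashes push the belief high enough that patching becomes worth the overhead.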
It's funny that Brumley's first-place-winning robot CTF team is going to be competing against his first-place-winning human CTF team at DEFCON.
The DARPA team is headed up by professor David Brumley. He also leads the Carnegie Mellon CTF hacker group PPP (Plaid Parliament of Pwning) that often wins at DEFCON's CTFs.
This article mentioned that the Mayhem robot is going to be battling the human CTF players at DEFCON. I wonder who he'll be rooting for.
I just came from a full day of talks at DEF CON, and a highlight for me was how the CGC servers were all lit up on stage behind the speakers of one room of the con [1]. It was incredibly stylish and impressive.
This was a really amazing competition. Imagine running symbolic analysis and fuzzing like integration tests as part of a deploy process, then having fixes proposed algorithmically when a vulnerability is discovered.
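The fuzzing half of that deploy-gate idea is easy to sketch. Here is a minimal, invented example (the buggy parser and its trigger byte are made up) of running a random fuzzer as a CI-style check and failing the "deploy" when a crashing input turns up:

```python
# Minimal fuzz-as-integration-test sketch. The target and its bug are
# invented for illustration; real pipelines use coverage-guided fuzzers
# like AFL or libFuzzer.
import random

def fragile_parse(data: bytes):
    # Deliberately buggy example target: chokes on one header byte.
    if data[0] == 0xDE:
        raise ValueError("unhandled header")
    return len(data)

def fuzz(target, rounds=10000, seed=0):
    """Throw random inputs at the target; return the first crasher."""
    rng = random.Random(seed)
    for _ in range(rounds):
        case = bytes(rng.randrange(256) for _ in range(4))
        try:
            target(case)
        except Exception:
            return case  # crashing input found: fail the pipeline
    return None
```

A deploy script would run `fuzz` and block the release if it returns anything other than `None`, handing the crashing input to whatever proposes the fix.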
I thought that the production of the competition was extraordinary. Seeing everything lit up on stage was straight out of a movie (in a good way). I thought that the event itself at Defcon was super weird, though. A lot of people, myself included, assumed that the event was going to be more real-time. In reality, the servers had been competing for hours already.
That being said, huge props to these amazing teams. It was so fascinating to see how each system reacted to the same situations and then either hunkered down to protect itself or went on the offensive. Really amazing stuff.
I tried browsing the Darpa challenge's website to know more, but I couldn't find any information. Could someone please post a link to a detailed description of the challenge?
It is basically computers playing Capture the Flag (CTF) against each other. They are given binary programs with security flaws, which they need to identify automatically and patch on their own systems. At the same time, they try to crash the other teams' services. Normally humans do this, but the DARPA challenge was to have computer systems do it autonomously.