Darwinian Security - Evolving the PWN
Lately I’ve been thinking a little bit about experimenting with the amalgamation of Artificial Intelligence algorithms and different aspects of Information Security. I’m hardly an AI expert, but I’ve had some limited academic exposure to the foundation of a few branches of AI research. I think the important thing is that I’ve seen enough of the material to get my mental wheels spinning.
I’ve thought of two specific concepts I think deserve exploration when time permits. I’m posting my interim thoughts here about one of them mainly to garner some constructive feedback and hopefully some resources. I’m an info sec enthusiast, but certainly not an expert. It would be a great help if I could try finding some related journal publications. Due to the… more obscure… nature of the areas of study I’m also interested in any “less formal” (see: ezines, blogs, mailing lists) research that may relate.
Genetic Algorithms and Protocol Fuzzing
Having explored a so called “vanilla” genetic algorithm I couldn’t help but mentally combine the idea of the algorithm with the concept of “Fuzzing”. For someone evaluating the quality and security of a piece of code, having a tool capable of generating malformed, incorrect and otherwise invalid input in an automated fashion is a great asset. There are fuzzing applications and frameworks available for a variety of protocols from HTTP to FTP that can accomplish this task.
By introducing a genetic algorithm to the Fuzzer we can theoretically add some brains so that it is generating malformed data that leads to areas likely to break the application in interesting ways. To be able to see how it might be a plausible approach, we need a little bit of theory on genetic algorithms.
Genetic Algorithms
Genetic algorithms (like the name implies) draw their inspiration from the biological world around us. We use software modeled after a simple approximation of Darwinian evolution to “evolve” our way to a solution for a computationally difficult problem. The core or our evolution is built by the following three factors that Darwin outlined as being requirements for evolution:
- Reproduction with inheritance
- Variation
- Selection
Reproduction with inheritance allows us to take two solutions that were promising but not the optimum and use them as genetic ancestors for a new solution that will inherit their traits. Variation helps us zero-in on a solution space to ensure we aren’t missing “local” optimums as we jump around from inheritance. Finally selection is the method by which we determine which of our population are the “fittest” and should be used to continue the evolutionary process.
To accomplish these three things we need to model our problem in terms similar to the biological constructs we use when discussing real genetics. My aim isn’t to establish an exceptionally in depth introduction to GA’s, so I’ve taken an approach my instructor did and summarized the biology terms with their algorithmic counterparts.
| Evolutionary Term | Algorithm Equivalent |
|---|---|
| chromosome | list |
| gene | a problem feature in the list |
| allele | the possible values of a problem feature |
| locus | the location of the feature in the list (index) |
| genotype | the structure of a solution |
| phenotype | the decoded structure |
The basic idea is that for an arbitrary sized population you start with a random set of chromosomes. Each chromosome is a list structure, with each index in the list being a specific locus. Every locus stores a gene. The possible values this gene could take are the alleles for that locus. Genes and alleles are problem specific and will be discussed later.
Once you’ve established this random population you can apply your selection technique to determine which of your random chromosomes is “best” (again, defined by the specific problem you are trying to solve). In my experience this selection operation was defined in terms of a heuristic function that was able to give a relative score to each chromosome. The higher the score, the more “fit” that individual was.
After determining fitness, you are able to continue the selection process by using some strategy to pick a number of the most fit individuals for reproduction. With these “donor” chromosomes in place you randomly apply combinations of crossover, mutation and replication to create new child individuals.
This description of a GA is overlooking a lot of details such as the strategy used for selection, the role of elitism, how children should be merged into the population, how “stale” genetic material is livened up or discarded, etc. This article was meant to be more of an abstract thought dump than a recipe, so actual implementations will need to consider these factors on their own.
Intermud Fuzzing
Before I go any further I want to make something clear… This entire concept is not intended to be used on a public network of servers and MUDs that aren’t under your direct control. An enclosed network with test installations of various libs would be the only ethical way to explore these ideas. There is nothing subtle about these techniques and they are designed to cause the most friction and jarring input possible.
For a basic example of what I had in mind I thought I would pick a protocol that I know inside and out. Intermud refers to a roughly defined set of protocols used for connecting several MUD instances to each other. The version I’m familiar with is usually colloquially called Zebedee Intermud after the MUD it was created on; Zebedee. I’ve most often seen it credited to Nostradamus@Zebedee and Alvin@Sushi but it’s so old and informal that it’s hard to tell.
Zebedee Intermud is also called “I2” Intermud. It’s design makes it a brilliant choice for a fuzzing experiment for several reasons:
- It’s a very loose standard. The doc’s describing it7 are very vague and any given MUD implements different parts of the document in subtly different ways. There is an original implementation by the authors that has floated around the LP MUD world for over 10 years being tweaked and modded by coders of different experience/abilities.
- It’s a simple human readable protocol. The protocol itself is a very simple HEADER -> value string jammed into a UDP packet. This header to value mapping easily translates to a gene to allele mapping for a chromosome string.
- As mentioned, it’s UDP based. This means there is very little overhead involved with sending a large number of Intermud requests.
Zebedee Protocol Details
For illustrative purposes I’ve broken down the core parts of the I2 protocol into the following tables.
The first table describes the core headers that every I2 daemon6 is expected to understand.
| I2 Header | Value(s) | Mandatory? |
|---|---|---|
| NAME | Sending MUD’s name | Yes |
| UDP | Sending MUD’s UDP receive port | Yes |
| ID | Unique incrementing packet ID. | If reply needed |
| HST | IP address of sending MUD. Autoset by daemon | N/A |
| REQ | Type of request the packet is | Yes |
| PKT | Special packet extension header | For large messages |
Note: The “Unique” ID is only unique between two given MUDs. I.e. if MUD A sends a packet with ID 4 to MUD B then it’s assumed MUD A will never send another ID 4 packet to MUD B again. It’s not unique across I2, however most MUDs will just have one counter that gets incremented and used for all outgoing packets.
This second table describes all of the documented REQ types and their sub-fields. As soon as you start working with these req types and their associated field headers you are going to find mixes of support.
| REQ Type | Field | Value(s) | Mandatory? |
|---|---|---|---|
| channel | CHANNEL | The channel1 to broadcast on | Yes |
| channel | DATA | The message to broadcast | Yes |
| channel | CMD | The type of message. “”, “emote”2 “history”, or “list” | No |
| channel | EMOTE | 1 if the message is an emote, 2 for a “gemote”3 | No |
| finger | DATA | The player to query information about | Yes |
| locate | user | Player on sending MUD who sent the request | Yes |
| locate | vbs | Verbosity. 1 or 2 | No |
| locate | fnd | Set to 1 on success or 0 otherwise in locate reply | Yes (in reply) |
| locate | DATA | The player to find the location of | Yes |
| man | DATA | Which manual page to fetch | Yes |
| … | Too complicated to document presently | N/A | |
| ping | N/A | Just standard fields for a ping | N/A |
| query | DATA | Values include: commands, email, hosts4, inetd, list, info, mud_port, time, users, version, www) | Yes |
| reply | DATA | String response to a packet | Yes |
| reply | RCPNT | Who the reply is destined for on the target MUD | For directed replies |
| reply | QUERY | Response to a query | For query responses |
| reply | vbs | Verbosity of reply | When VBS requested |
| reply | user | User who sent the packet | No |
| reply | fnd | Status of a locate request | For locate responses |
| tell | RCPNT | Who the tell5 is destined for | Yes |
| tell | METHOD | “emote” for emote tells, “gemote” for gemote tells. | No |
| tell | DATA | The message to be sent as a tell | Yes |
| who | DATA | Additional switches for the who command | No |
Some example packets might help clarify things. Firstly, as mentioned earlier the packets are a string of header & value pairings that roughly fits the format below:
header1:body1|headerN:bodyN|DATA:body-data
There are two delimiters in use, the “:” character for splitting a header and it’s value, and the “|” pipe character for separating header/value pairs. By convention the DATA header is always last in the string. The idea of this convention is that then the DATA value doesn’t need to have any escaping done for the separator characters. Implementations are to treat anything after the DATA header as raw text.
A typical packet to send a message to the global intermud channel would look like:
channel:intermud|SND:Paradox|NAME:TestMUD|UDP:4242|REQ:channel|DATA:Hello World!
This packet tells the story of a user named Paradox on a MUD identifying itself as “TestMUD” sending a request to other MUD’s to display “Hello World!” on their intermud channel.
As mentioned in the first table, there is also a special PKT header used for messages that need to be sent across several UDP packets. It forms the basis of a very simple control protocol built on top of the “spray and pray” nature of UDP. The PKT header is by convention always the first header in a packet and follows this format:
PKT:mudname:packetID:packet#/totalPackets|regular intermud packet string
The basic idea here is the PKT indicates it’s a multi-part packet. From there you identify what MUD it should be cached for, what the packet’s unique ID is, and what packet it is in the sequence. This # out of total # of packets bit is important since they could arrive in any order and we need to be able to tell when we have them all. The other normal packet headers follow the PKT header after the standard “|” delimiter.
So what can we take away from this strange protocol? Well, besides the fuzzing benefits mentioned above you’ll notice we have very flexible composition rules. Besides the PKT and DATA headers, we can combine any number of header/value pairs in any order we want — even for valid requests.
Genetic Fuzzing
With the core concepts and the protocol details out of the way I can finally describe how I imagine a genetic fuzzer for this protocol could operate. From there it will hopefully be possible to extrapolate the techniques to a more practical real world example. I’m going to explain my theories by categorizing them into their respective areas of the GA process.
Chromosome/Problem Representation
The chromosome for this protocol is simple to imagine. The genotype will be an arbitrary length list of headers and values. The phenotype of this list will be the string composition of these headers as combined by the I2 rules. This phenotype can be blasted over a UDP socket for the evaluation routines. Our genes will be the different header types, and the alleles are the values that the headers can take. I suspect the main difficulties will arise from the fact that I’m bumping heads with two things I’ve never considered in a GA implementation:
- The chromosome isn’t a fixed length. Previously I’ve worked on a GA for finding Traveling Salesman Paths, which by the nature of the problem are fixed length because every solution must visit every city precisely once. An arbitrary length chromosome may introduce complications to the crossover and mutation stages.
- The “legality” of the chromosomes generated needs to be configurable. The GA chromosome for a problem like the above mentioned TSP forces you to forbid any illegal chromosomes. The goal of a fuzzer is contradictory to this preconception. Ideally we’ll be able to generate only legal packets, only illegal packets, and a broad spread in between. This could mean illegally delimited packets, packets missing information, packets with too much information, etc.
Crossovers and Mutation
As mentioned, the “legality” we are comfortable with for this application gives us a good deal of flex room in this department. We don’t need to carefully select crossovers and mutations that preserve the legality of parent chromosomes. A custom crossover scheme that was able to preserve or disturb this legality in various ways in a stochastic fashion would likely be the best idea.
Again because of the fuzzer nature of the project it’s possible to even experiment with crossing alleles into different genes. I.e. if a parent packet had a CHANNEL:intermud gene it could be mutated into a CHANNEL:45 gene if the allele from the ID gene were mashed into the CHANNEL locus. Because we are aiming for bad data we make terrible leaps any one else working with GA’s would likely cringe at.
Evaluation and Selection
The evaluation scheme for this problem is another tricky spot. I have a lot of ideas about heuristic functions that could be used for interesting effect. The main difficulties I foresee are related to the networked aspect of the problem and side-effects from fuzzing. That is, we need to send a packet and get it’s response to be able to apply any especially interesting heuristics. Further, because we are sending potentially illegal packets with the express intent to have weird things happen we need to account for our destination not responding to that packet (or any further packets for that matter).
The heuristic ideas I’m throwing around in my head are currently:
- Primary heuristics:
- Response validity. If we send a packet and get an illegal intermud packet back, that’s an interesting phenomenon. We should certainly be interested in that and promote the packet’s heuristic value.
- Response time. The longer it takes a server to respond, the more likely it is that our packet was making it do something wonky. In this case we do get a response and are just looking at the time it took.
- Response existence. If we don’t get a response then it could mean any number of things. The most likely however are two possibilities: either the MUD threw our packet out silently or the MUD crashed. We should try to differentiate between the two. A silent packet discard is worthless, but a crash packet is a very important result.
- Secondary Heuristics:
- Packet length. The shorter a packet is, the better. Ideally if we have two packets that both are heuristically interesting based on the notes above we want to shorten them while keeping the score relatively static. The idea here is that if you’ve mashed together a bunch of headers in potentially illegal ways and managed to get interesting feedback you are going to want to zero-in on the specific parts of the packet that cause it. Anything we can remove without changing the “interest” level of the packet is junk data obscuring the real results.
- Key-word bonuses. We could likely also offset a packets heuristic value by looking for key words in the response the packet generated. Mentions of the word “error” or “invalid” should affect the score for the packet positively. Some words might decrease the value of the packet.
Because of the incompatibilities of the various protocol stacks and the network based nature of the problem some interesting parallelisms could be exploited. For instance, if every packet that was generated was sent to several different intermud stacks at once then we could compare the results and note differences in the heuristic score. A packet that generates high fluctuation and deviation in the scores indicates that we’ve found a corner case that is poorly handled by the software.
Conclusion
I realize that Intermud, and specifically Zebedee intermud is near worthless in terms of practical applications. In the MUD community I2 is already considered archaic and a poor solution (thus the appearance of routed I3 intermud). My idea of describing it here was just to illustrate the overall idea of what I’m thinking about.
I’m hoping to have time to pursue this idea properly, but who knows when that will be. For now I’d love to hear feedback from anyone that thinks they can offer suggestions, advice or constructive criticism. I apologize to any specialists in the area if I’ve offended you with my naivety.
- Dan
Endnotes
1 Channel – Similar to the IRC Concept of separated chat rooms on a server. Channels are communications streams that listeners on a server may enable/disable/join/leave based on the lib features.
2 Emote – a message in the third person. I.e. If the user’s name were Paradox, a typical emote would be similar to “Paradox wishes he weren’t lagging”. Similar to /me in IRC
3 GEmote – I have no idea what a gemote is.
4 The hosts request is of particular importance because an intermud MUD needs to bootstrap itself by requesting a MUD list from some other MUD to find a list of UDP destinations it should spam for global intermud packets. This system is the reason that intermud 2 is so fragile and ad-hoc. You generally have to query the hosts list of several MUDs and merge the responses into a single list. Many MUDs will then cache this.
5 A “tell” is the equivalent of a “whisper” or “private message” on more traditional chat systems. A 1 on 1 communication between two users.
6 A “daemon” in MUD lingo is essentially a background process. Most drivers are single threaded so it doesn’t align with definitions of daemon that you may find in places like the Java thread documentation.
7 Zebedee Intermud protocol documentation. man intermud and man intermud.basic
Transmissions:
← JRiver Advanced High-speed note taking with Textile →

Dear Sir,
You are a brilliant!
I agree with everything you said because I think it is fantastically innovative!
I see good thinking here. Keep it up. That’s an order.
Sincerely,
GlamGothGirl
— GlamGothGirl · Feb 24, 03:09 AM · #