Step Three: Privacy!
Status Update - More Transports

So far I’ve implemented Dust and obfs3 and I’m putting the finishing touches on obfs2. I’ve also rewritten the pluggable transport API and rewritten the transport plugins to use this new API.

The API now provides four events: start, encodedReceived, decodedReceived, and end. The plugins implement callbacks for these events, as well as an __init__(self, decodedSocket, encodedSocket) method that provides virtual sockets to the plugins. The virtual sockets have read and write methods. Control is transferred to the plugin when one of the event callbacks is called, at which time the plugin can read data from and write data to the socket buffers. When the callback returns, control is transferred back to the framework, which takes care of moving data between the buffers and the actual sockets.
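To make the callback flow concrete, here is a minimal sketch of a plugin written against the four events. Only the event names and the __init__(self, decodedSocket, encodedSocket) signature come from the description above; VirtualSocket, Rot13Transport, and the incoming/outgoing buffer attributes are hypothetical stand-ins, not py-obfsproxy's actual classes. At the bottom, the framework's job of filling and draining buffers is done by hand.

```python
class VirtualSocket:
    """Hypothetical in-memory buffer pair standing in for one side of a connection."""

    def __init__(self):
        self.incoming = bytearray()  # filled by the framework from the real socket
        self.outgoing = bytearray()  # drained by the framework to the real socket

    def read(self, n=None):
        """Consume up to n buffered bytes (all of them if n is None)."""
        if n is None:
            n = len(self.incoming)
        data = bytes(self.incoming[:n])
        del self.incoming[:n]
        return data

    def write(self, data):
        self.outgoing.extend(data)


ROT13 = bytes.maketrans(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
    b"NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm",
)


class Rot13Transport:
    """Hypothetical example plugin implementing the four event callbacks."""

    def __init__(self, decodedSocket, encodedSocket):
        self.decodedSocket = decodedSocket  # plaintext side (Tor)
        self.encodedSocket = encodedSocket  # obfuscated side (the network)

    def start(self):
        pass  # a real transport would write its handshake to encodedSocket here

    def decodedReceived(self):
        # Plaintext arrived: obfuscate it and queue it for the network.
        self.encodedSocket.write(self.decodedSocket.read().translate(ROT13))

    def encodedReceived(self):
        # Obfuscated bytes arrived: de-obfuscate them and queue them for Tor.
        self.decodedSocket.write(self.encodedSocket.read().translate(ROT13))

    def end(self):
        pass  # nothing to clean up in this example


decoded, encoded = VirtualSocket(), VirtualSocket()
plugin = Rot13Transport(decoded, encoded)
plugin.start()
decoded.incoming.extend(b"Hello")  # framework step: bytes read from Tor
plugin.decodedReceived()
print(bytes(encoded.outgoing))      # b'Uryyb'
```

Because the plugin never touches a real socket, it can be driven entirely from tests like this, which is one of the practical benefits of having the framework own all of the I/O.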

The next step is testing all of the plugins and documenting the plugin API. In particular, I expect that the python obfs2 implementation will not interoperate with the C obfs2 implementation due to undocumented assumptions in the protocol implementation, so these will need to be worked out.

Status Update - Testing and Transports

The last two weeks were spent getting the code into better shape, fixing bugs and streamlining it so that I could actually start writing some transports. Now it’s transport implementation time. I’ve written a Dust transport using the most barebones version of Dust, which is basically just an ECDH handshake and Skein encryption. I’ll be adding the rest of Dust over time. This barebones Dust implementation was really just to lay the foundation for working on obfs3 and then obfs2. It was easier to start with Dust because I already have the code and it works. Next I’m bringing in an AES implementation for obfs3 and obfs2 to use.

The schedule has changed somewhat in that I am now going to be focusing on obfs2 and obfs3 as the main protocols. This is going to be easier because they both follow the pattern of a handshake followed by encrypted bytes. Full Dust support and flash proxy would both require a more complicated API, which should probably wait until the basic API is working.
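The "handshake followed by encrypted bytes" pattern can be sketched as a two-phase object: first each side sends a random seed, then every later byte is XORed with a per-direction keystream. This is a toy under stated assumptions, not obfs2 or obfs3: HandshakeThenStream and keystream are hypothetical names, and SHA-256 in counter mode stands in for the AES-CTR cipher these protocols actually use. Since the seeds travel in the clear, this hides byte contents from naive inspection but provides no real secrecy.

```python
import hashlib
import os


def keystream(key, offset, length):
    """Toy keystream: SHA-256(key || counter) blocks. NOT a vetted cipher."""
    out = bytearray()
    counter = offset // 32  # each SHA-256 block yields 32 keystream bytes
    skip = offset % 32
    while len(out) < skip + length:
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[skip:skip + length])


class HandshakeThenStream:
    """Phase 1 exchanges random seeds; phase 2 XORs every byte with a keystream."""

    SEED_LEN = 16

    def __init__(self):
        self.my_seed = os.urandom(self.SEED_LEN)
        self.send_key = self.recv_key = None
        self.sent = self.received = 0  # stream positions in each direction

    def handshake(self):
        return self.my_seed  # sent in the clear in this toy

    def finish_handshake(self, peer_seed):
        # One key per direction, each derived from the sending side's seed.
        self.send_key = hashlib.sha256(b"key" + self.my_seed).digest()
        self.recv_key = hashlib.sha256(b"key" + peer_seed).digest()

    def encrypt(self, data):
        ks = keystream(self.send_key, self.sent, len(data))
        self.sent += len(data)
        return bytes(a ^ b for a, b in zip(data, ks))

    def decrypt(self, data):
        ks = keystream(self.recv_key, self.received, len(data))
        self.received += len(data)
        return bytes(a ^ b for a, b in zip(data, ks))


alice, bob = HandshakeThenStream(), HandshakeThenStream()
alice.finish_handshake(bob.handshake())
bob.finish_handshake(alice.handshake())
wire = alice.encrypt(b"GET / HTTP/1.1\r\n")
print(bob.decrypt(wire))  # b'GET / HTTP/1.1\r\n'
```

Tracking a byte offset per direction is what lets the data arrive in arbitrarily sized chunks, which matters because the framework delivers whatever happens to be in the socket buffer.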

Status Update - Refactoring

There was a slight change in the schedule in that I moved refactoring up in front of testing and debugging.

The major refactoring effort was splitting the project into two parts. pyptlib now contains only the library for implementing the pluggable transport specification: environment variable parsing and printing of pluggable transport protocol lines to stdout. The framework and the obfsproxy command line program replacement have been split off into py-obfsproxy.

A high-level API has also been added to pyptlib which requires less knowledge of the spec. It automatically parses environment variables and generates protocol lines. The transport implementation just needs to get the list of supported transports and then report success or failure for the launch of each transport as well as the end of the transport launching phase. Examples have been included in pyptlib for how to use the high-level library and py-obfsproxy has been ported to the high-level library as well.
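For readers who haven't seen the spec, here is a minimal sketch of what such a high-level client-side wrapper does. ClientTransportConfig and its methods are hypothetical names, not pyptlib's actual API; the environment variable names and output line formats follow the client side of Tor's pluggable transport specification (version 1).

```python
import os


def _emit(line):
    # Tor reads these lines from the transport's stdout, so flush immediately.
    print(line, flush=True)


class ClientTransportConfig:
    """Hypothetical high-level wrapper; NOT pyptlib's real class."""

    def __init__(self, environ=None, out=_emit):
        env = os.environ if environ is None else environ
        self.out = out
        versions = env.get("TOR_PT_MANAGED_TRANSPORT_VER", "").split(",")
        if "1" not in versions:
            self.out("VERSION-ERROR no-version")
            raise ValueError("no supported managed-proxy protocol version")
        self.out("VERSION 1")
        names = env.get("TOR_PT_CLIENT_TRANSPORTS", "").split(",")
        self.transports = [name for name in names if name]

    def report_success(self, name, host, port):
        self.out("CMETHOD %s socks5 %s:%d" % (name, host, port))

    def report_failure(self, name, message):
        self.out("CMETHOD-ERROR %s %s" % (name, message))

    def report_done(self):
        self.out("CMETHODS DONE")


lines = []
config = ClientTransportConfig(
    {"TOR_PT_MANAGED_TRANSPORT_VER": "1", "TOR_PT_CLIENT_TRANSPORTS": "obfs2,dust"},
    out=lines.append,
)
config.report_success("obfs2", "127.0.0.1", 1050)
config.report_failure("dust", "not-implemented")
config.report_done()
print("\n".join(lines))
```

The transport author only sees a list of names and three report calls; version negotiation and line formatting stay inside the wrapper.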

All of the code has also been reformatted according to PEP 8 guidelines. Private methods have also been renamed to use the __foo syntax as specified in the python style guidelines.
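PEP 8 describes a double leading underscore as invoking Python's name mangling: inside a class body, __foo is rewritten to _ClassName__foo. (PEP 8 suggests a single leading underscore for ordinary non-public names and the double form mainly to avoid clashes with subclasses.) The class below is a hypothetical illustration, not code from pyptlib.

```python
class Transport:
    def __handshake(self):          # stored on the class as _Transport__handshake
        return "handshake sent"

    def start(self):
        return self.__handshake()   # mangled automatically inside the class body


t = Transport()
print(t.start())                    # handshake sent
print(hasattr(t, "__handshake"))    # False
print(t._Transport__handshake())    # handshake sent
```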

This week the focus is going to be on testing the framework and obfsproxy command line replacement to get the dummy protocol working. Once everything is working, the focus will shift to documenting the now hopefully stable pyptlib API.

pyptlib: http://github.com/blanu/pyptlib

py-obfsproxy: http://github.com/blanu/py-obfsproxy

Status Update - Schedule and Framework

Here is the proposed schedule for the project:

Week 1 - pyptlib.config API draft
Week 2 - pyptlib.config implementation
Week 3 - pyptlib.framework API draft
Week 4 - pyptlib.framework implementation
Week 5 - pyptlib.transports example implementations (dummy, rot13) and command line options
Week 6 - testing and debugging of whole system
Week 7 - Refactoring, cleaning, and documentation update
Midterm evaluations
Week 8 - Dust
Week 9 - obfs2
Week 10 - flashproxy
Week 11 - Transport refinements and enhancements
Week 12 - Testing, debugging, refactoring, cleaning, and documentation update
Pencils down

We are currently finishing up Week 4 and I have a rough draft of the framework in place. It will probably need a lot of refinement as I start implementing actual transports. I’ve also implemented a pluggable transport manager in python which can stand in for tor when doing testing. So far everything seems to be working fine. Next week I look forward to getting some actual traffic obfuscated using the example plugins.
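A transport manager that stands in for tor mostly has to do two things: launch the transport with the managed-proxy environment variables set, and read its stdout until CMETHODS DONE. The sketch below does that with subprocess. launch_managed_transport and the inline FAKE_TRANSPORT child are hypothetical, written only for this illustration; the variable names and line formats follow Tor's pluggable transport specification.

```python
import os
import subprocess
import sys
import tempfile

# A stand-in client transport, run as a child process for this demonstration.
FAKE_TRANSPORT = r'''
import os
print("VERSION 1")
for i, name in enumerate(os.environ["TOR_PT_CLIENT_TRANSPORTS"].split(",")):
    print("CMETHOD %s socks5 127.0.0.1:%d" % (name, 1050 + i))
print("CMETHODS DONE")
'''


def launch_managed_transport(argv, transports):
    """Start a client transport the way tor would; return (process, methods)."""
    env = dict(os.environ)
    env["TOR_PT_MANAGED_TRANSPORT_VER"] = "1"
    env["TOR_PT_CLIENT_TRANSPORTS"] = ",".join(transports)
    env["TOR_PT_STATE_LOCATION"] = tempfile.gettempdir()
    proc = subprocess.Popen(argv, env=env, stdout=subprocess.PIPE,
                            universal_newlines=True)
    methods = {}
    for line in proc.stdout:
        words = line.split()
        if not words:
            continue
        if words[0] in ("ENV-ERROR", "VERSION-ERROR"):
            raise RuntimeError(line.strip())
        if words[0] == "CMETHOD":
            methods[words[1]] = (words[2], words[3])  # name -> (proxy type, addr:port)
        elif words == ["CMETHODS", "DONE"]:
            break  # launch phase is over; the transport keeps running
    return proc, methods


proc, methods = launch_managed_transport([sys.executable, "-c", FAKE_TRANSPORT],
                                         ["dummy", "rot13"])
proc.stdout.close()
proc.wait()
print(methods)
```

A real manager would keep the process running and route test traffic through the reported SOCKS addresses; this sketch stops once the launch phase is over.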


Status Update - pyptlib

Deliverable #1 for this summer’s project is this:

“A library for parsing pluggable transport configuration options

This will be a python library that authors of SOCKS proxies can use to integrate their proxies with Tor.”

A first pass at this is available from github and pypi.

I tried to follow the spec very closely in developing the API.

Next steps for the library are testing, improved documentation including pydoc strings, and error checking for higher-level protocol conformance, for instance the order in which proxy reply lines are output.
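As an example of the kind of ordering check mentioned above, here is a sketch that validates a managed client proxy's output. The rules it enforces (VERSION before any method line, nothing after CMETHODS DONE, and CMETHODS DONE present) are a simplified reading of the spec assumed for illustration, and check_client_output is a hypothetical name.

```python
def check_client_output(lines):
    """Return a list of ordering problems in a managed client proxy's stdout."""
    problems = []
    seen_version = False
    done = False
    for number, line in enumerate(lines, 1):
        keyword = line.split(" ", 1)[0]
        if done:
            problems.append("line %d: %r after CMETHODS DONE" % (number, line))
        elif keyword == "VERSION":
            if seen_version:
                problems.append("line %d: duplicate VERSION" % number)
            seen_version = True
        elif keyword in ("CMETHOD", "CMETHOD-ERROR", "CMETHODS"):
            if not seen_version:
                problems.append("line %d: %s before VERSION" % (number, keyword))
            if line == "CMETHODS DONE":
                done = True
    if not done:
        problems.append("missing CMETHODS DONE")
    return problems


good = ["VERSION 1", "CMETHOD obfs2 socks5 127.0.0.1:1050", "CMETHODS DONE"]
bad = ["CMETHOD obfs2 socks5 127.0.0.1:1050", "VERSION 1"]
print(check_client_output(good))  # []
print(check_client_output(bad))   # ['line 1: CMETHOD before VERSION', 'missing CMETHODS DONE']
```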

After that, it will be time to use the library in building the pluggable transport framework.

New GSoC 2012 Project! Pluggable Transports in Python

Here is my introductory post to tor-dev announcing the new project:

Some information about me:

I worked for EFF/Tor Project last year for GSoC 2011; my project was a blocking-resistant transport evaluation framework: https://gitweb.torproject.org/user/blanu/blocking-test.git

I am also the author of a pluggable transport written in python: https://github.com/blanu/Dust/tree/master/py/dust/services/socks2

I’ve been working on censorship resistance technology since 2001. Here are some of my projects:
http://blanu.net/Dust-FOCI.pdf
http://blanu.net/BayesianClassification.pdf
http://blanu.net/Arcadia.pdf
http://blanu.net/Freenet2001.pdf

Some information about the project:

The overall goal of the project is to make it easy for pluggable transports to be written in python. There has been a lot of interest in doing pluggable transports in python, but currently they are all written from scratch. For C transports, obfsproxy can be used to do a lot of the heavy lifting, making it relatively easy to write a new C-based transport. I’ve heard there is also a port of obfsproxy to C++. As the author of a python transport, I am of course an advocate of writing transports in python. Fortunately, so are some other Tor folks, so soon it will be easy to write python transports!

The deliverables for this project are as follows:

A library for parsing pluggable transport configuration options

This will be a python library that authors of SOCKS proxies can use to integrate their proxies with Tor.

A framework (both server and client-side) for writing pluggable transports in python

The framework will provide a SOCKS proxy server already integrated with the pluggable transport library. All the protocol author will need to do is provide the obfuscation and de-obfuscation functions and a main function to do command line parsing and call the framework.

A python implementation of the obfsproxy command line tool

This will be a command line program using the framework that will accept the same command line options as the existing obfsproxy tool. It will support the selection of an obfuscation function, although not all of the protocols currently supported by obfsproxy will initially be available in python.

A python implementation of the obfs2 protocol implemented as an obfsproxy module

The obfs2 protocol will be implemented as a plugin for the framework and made available to the command line tool.

Conversion of Dust to an obfsproxy module

The Dust protocol will be implemented as a plugin for the framework and made available to the command line tool.

py2exe packaging for obfsproxy

The command line tool will be packaged into a standalone executable for Windows.

Optional deliverables if there is sufficient time: obfsproxy modules for other protocols, experiment with other packaging systems


Current status:

I’m working on a spec of the API for the option parsing library. It should be available soon.

Google Summer of Code Final Status Update

Last week I said that I was going to work on finishing up some graphs this week, but I decided that was not the best use of my time. Graphs are cool and interesting, but I should instead be transitioning to work that is useful to the general Tor developer community. So I decided to try to close bug #2860 using the blocking-test framework. The bug asks for research on the connection patterns of Tor connections. In order to close it I need to add capturing of Tor (without obfsproxy) and a SYN/FIN/RST packet counting detector, as well as some new traffic generation tasks. So this is my new goal for the first post-GSoC task for the project. One fortunate thing is that we just received about 1 GB of traffic data for Tor and SSL connections, so we’re currently working on parsing that into the blocking-test format and doing some analysis on it. Actually, with this data I may be able to close bug #2860 just by writing a SYN/FIN/RST detector, which will save some time.
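Counting SYN, FIN, and RST packets only needs the flags byte of each TCP header. This sketch assumes a stream's packets have already been split out of the pcap and reduced to raw TCP header bytes, which is a hypothetical input format (the blocking-test extraction utilities are not shown here). In the standard TCP header layout (RFC 793), byte 13 carries the control bits, with FIN, SYN, and RST as the three lowest bits. count_control_flags and make_header are illustrative names.

```python
import struct

FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10


def count_control_flags(tcp_headers):
    """Count SYN, FIN and RST flags across a stream's raw TCP headers."""
    counts = {"SYN": 0, "FIN": 0, "RST": 0}
    for header in tcp_headers:
        flags = header[13]  # byte 13 of the TCP header holds the control flags
        if flags & SYN:
            counts["SYN"] += 1
        if flags & FIN:
            counts["FIN"] += 1
        if flags & RST:
            counts["RST"] += 1
    return counts


def make_header(flags):
    """Build a minimal 20-byte TCP header for testing (ports are arbitrary)."""
    # src port, dst port, seq, ack, data offset << 4, flags, window, checksum, urgent ptr
    return struct.pack("!HHIIBBHHH", 40000, 443, 0, 0, 5 << 4, flags, 65535, 0, 0)


stream = [make_header(SYN), make_header(SYN | ACK), make_header(ACK),
          make_header(FIN | ACK), make_header(RST)]
print(count_control_flags(stream))  # {'SYN': 2, 'FIN': 1, 'RST': 1}
```

A detector would then compare these per-stream counts against the distributions seen for each protocol, the same way the length and timing detectors use their features.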

So Summer of Code is now over, but the project continues as a part of the greater Tor project. Hopefully it will be useful for the Tor community.

Google Summer of Code Penultimate Status Update

Scoring and reporting for detectors and encoders is pretty much done. The graphs aren’t quite ready yet, for aesthetic reasons, but I have some nice tables to report:

Length detector: 16% accuracy
Timing detector: 89% accuracy
Entropy detector: 94% accuracy

SSL was correctly identified 25% of the time, 70% of the time other protocols were misidentified as being SSL, and 5% of the time SSL was misidentified as another protocol.
obfsproxy was correctly identified 55% of the time, 10% of the time other protocols were misidentified as being obfsproxy, and 35% of the time obfsproxy was misidentified as another protocol.
Dust was correctly identified 48% of the time, 4% of the time other protocols were misidentified as being Dust, and 48% of the time Dust was misidentified as another protocol.

obfsproxy is distinguishable from SSL 96% of the time.
obfsproxy is distinguishable from Dust 98% of the time.
Dust is distinguishable from SSL 56% of the time.

I’m sure these numbers are confusing as several different sorts of things are being measured, all with similar numbers. It is probably quite unclear how all of these percentages add up. I think this is where the graphs will help. In particular I think it’s hard to visualize accuracy when there are both false positives and false negatives to take into account.

I think the most interesting result here is the entropy detector. It’s the simplest and also the most effective. Not only that, but the entropy detector only looks at the entropy of the first packet, so it’s inexpensive. The original reason that I implemented this detector was that there was an argument against Dust that it could be trivially filtered by setting an entropy threshold above which all traffic is filtered. Surprisingly, Dust has lower entropy than SSL, so this attack will not work to specifically target Dust. This is unexpected because Dust uses totally random bytes, whereas the SSL handshake does not.

This is a result of a general issue with how Shannon entropy is defined. Shannon entropy is normally defined over a large statistical sample, in which case all three protocols tested will converge to the maximum, since they are all encrypted and therefore apparently random. However, the attacker in this case does not get a large statistical sample. Since they don’t know a priori which connections are Dust connections, they only get however many packets are in each trace to use as samples.

The current version of Dust that I’m testing has most of the advanced obfuscation stripped out, as I haven’t had time to add it back in after completely reimplementing the protocol to use TCP instead of UDP, along with all of the modified assumptions that come with that change. So this version of Dust is basically just the ntor protocol with no packet length or timing hiding in use. The first packet is therefore 32 random bytes consisting of the curve25519 ephemeral ECDH key, as compared to the roughly 1 KB first packet of SSL. This is the reason for the low entropy calculation: given only 32 bytes to sample, almost every possible byte value will occur either once or not at all. There is an upper bound on entropy when sample sizes are small: an n-byte sample can have at most log2(n) bits of first-order entropy per byte (for n up to 256), so 32 bytes can reach at most 5 bits rather than the maximum of 8. Achieving the maximum would require at least 256 bytes, so that each value 0 to 255 could occur exactly once.
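The small-sample effect is easy to reproduce. This sketch computes first-order Shannon entropy over the byte values of a single packet; that this is exactly how the entropy detector measures it is an assumption, since the post does not give the formula. It also shows the log2(n) ceiling for an n-byte sample. The function names are illustrative.

```python
import math
import os
from collections import Counter


def shannon_entropy(data):
    """First-order Shannon entropy of a byte string, in bits per byte."""
    n = len(data)
    counts = Counter(data)  # iterating bytes yields ints 0..255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_upper_bound(n):
    """Largest value shannon_entropy can return for an n-byte sample."""
    return math.log2(min(n, 256))


dust_like = os.urandom(32)    # stands in for a 32-byte ephemeral key
ssl_like = os.urandom(1024)   # stands in for a ~1 KB first packet
print(shannon_entropy(dust_like), entropy_upper_bound(32))    # never above 5.0
print(shannon_entropy(ssl_like), entropy_upper_bound(1024))   # typically a little below 8.0
```

Even perfectly random bytes cannot score above 5 bits in a 32-byte sample, so a threshold tuned on ~1 KB packets will see short random packets as "low entropy".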

There are of course many ways to measure entropy, and you could say that I’m just doing it wrong. Some alternative measurements have already been suggested that I might implement as well. I am not suggesting that Dust is in any way immune to entropy-based detection. In fact, Dust is currently detected very well by the entropy detector (it passed 100% of the packet length and timing tests, so all detection of Dust was done by the entropy detector) precisely because its small first packets give it strangely low entropy. I bring this up only as an interesting example of how intuition can fail when using Shannon entropy to measure individual messages instead of channels.

As far as the state of the project goes, I think the project can be considered done. Looking at the project documentation, not everything worked out exactly as originally planned, but I think it all worked out for the best and the project is where I wanted it to be by the completion of Summer of Code. I have generation of HTTP and HTTPS traffic, both over Tor and plain. I have two encoders: obfs2 and Dust. I have three detectors: length, timing, and entropy. The string detector didn’t work out, but the entropy detector worked out better than planned, so on balance I think it was a win.

I didn’t implement all of the detector modes because, upon further reflection, it doesn’t really make sense to just try all of these different modes. You want the mode that works best. For packet lengths and timing that is full conversations, and for entropy it’s the first packet (it could be the first N packets, but I tried N=1 and it worked great). Random sampling of packets just seems like a way to limit your success when non-random sampling of just the first packet seems to be so effective.

All of the utilities have been written, and more. I have the specified utilities for separating the pcap files into streams and extracting the strings, lengths, and timings. Additionally, I have some utilities for extracting the entropy of streams, tagging the streams with the protocol they are using (by port), and extracting dictionary and corpus models for latent semantic analysis. I also have some utilities for graphing distributions of properties of different protocols, which I hope to finish up next week during the wrap-up period.

Looking to the future, I have changed my goals somewhat from what I had in mind at the beginning of the project. After attending the TorDev meetup in Waterloo and the PETS conference, I am much more interested in integrating my work with the Tor project than in just testing the various protocols that have been proposed in order to see how my own protocol stacks up against the competition. As part of this transition, I have rewritten Dust from scratch, both in terms of implementation and in terms of the protocol, so that it is better suited for use as a pluggable transport for Tor. My goal for Dust is to have it shipped with both Tor clients and bridges so that it can be used in the field to contact bridges despite DPI filtering of Tor connections. My goal now for the blocking-test project is to focus on blocking resistance for Tor. So instead of adding every protocol under the sun, I’m going to concentrate on protocols that have been implemented as, or are under consideration to become, pluggable transports. Additionally, the next protocol I’m going to add after GSoC is over is just plain Tor. There is at least one bug filed about a suspicion that Tor can currently be fingerprinted based on some packet characteristics, and I’d like to be able to close that bug because we now have the capability to easily generate the necessary information.

Finally, I would like to offer some insight into what I think is the future of blocking-resistant transports now that I have done some work trying to break them. Fundamentally, the only thing an attacker can do is limit the bitrate of the connection. Given any sort of channel, we can encode arbitrary information over that channel. However, the smaller the channel, the lower the bitrate. The goal of encodings should be to maximize the bitrate given the constraints of the censor. It’s easy to develop very slow encodings such as natural language encodings over HTTP. However, in order to maximize the throughput you need to have a good definition of the attack model. All encoding for blocking-resistant purposes lowers efficiency relative to just transmitting the normal content (assuming it has already been compressed). Therefore all encoded conversations are going to be longer than unencoded conversations. I think this is the future of writing detectors because it’s a fundamental constraint on encoding. I think this should even be able to defeat something like Telex (although let’s not get into the details of attacks on Telex specifically; I’m talking about a general idea here). Given a (static, for simplicity of argument) website, I can download every page on that website. Then I can compare the length of intercepted downloads and detect when they don’t match up. For dynamic websites you can do the same thing with a statistical model. I think, therefore, that the future of blocking-resistant protocols is to encode single logical connections over an ensemble of multiple actual connections. Ideally these connections would mimic normal traffic in terms of number of connections, duration, directional flow rates, etc. It’s something to think about anyway. It could be an interesting problem.
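The static-website length check can be sketched in a few lines, assuming the censor has already crawled the site and recorded the size of each page download. LengthIndex and the tolerance value are hypothetical, and the sizes in the example are made-up placeholders; real transfers vary slightly with headers and cookies, hence the tolerance.

```python
import bisect


class LengthIndex:
    """Sorted download sizes of every page on a (static) site."""

    def __init__(self, sizes):
        self.sizes = sorted(sizes)

    def nearest_gap(self, observed):
        """Distance from an observed transfer size to the closest known page size."""
        i = bisect.bisect_left(self.sizes, observed)
        candidates = self.sizes[max(i - 1, 0):i + 1]  # neighbours on either side
        return min(abs(observed - size) for size in candidates)

    def is_suspicious(self, observed, tolerance):
        # No real page has (roughly) this length, so something else may be encoded in it.
        return self.nearest_gap(observed) > tolerance


index = LengthIndex([5120, 18432, 40960])    # placeholder page sizes in bytes
print(index.is_suspicious(18500, tolerance=200))  # False: within 200 bytes of a page
print(index.is_suspicious(30000, tolerance=200))  # True: matches no known page
```

For dynamic sites the sorted list would be replaced by a statistical model of expected sizes, as suggested above, but the decision has the same shape.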

Post-TorDev Status Update

With the TorDev meetup and everything, it’s been a while since I sent out an update on the state of the project, although I think we more or less caught up on the status at the meetup. The bulk of the implementation work had already been completed, so since then I’ve been working on cleaning up the code and rewriting some parts. In particular, this week I rewrote the detectors to be mostly python. The core Bayesian model is still in BUGS/JAGS and there is still R code to interface with that. However, the output of the models after training is now converted to a CSV file, and the actual detection code is a python program which reads the CSV file with the trained model’s prediction data and then compares it to the data of a stream to be classified. This means that you only need R for training the models. If we can put together a canonical model for each protocol, then you can use it to classify new traffic using just python. I think this is a win because R was the bulkiest dependency. It also segments the process more nicely, in that some people can gather traces, other people can build models, and still others can run detectors against new traces.
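As a rough illustration of the trained-model-as-CSV split, here is a sketch that assumes a much simpler model than the actual BUGS/JAGS one: one row per protocol holding the mean and standard deviation of a single stream feature, with classification by Gaussian log-likelihood. The column names, protocol labels, and numbers are invented placeholders, not the project's file format or results.

```python
import csv
import io
import math


def load_model(csv_file):
    """Read rows of (protocol, mean, sd) into {protocol: (mean, sd)}."""
    model = {}
    for row in csv.DictReader(csv_file):
        model[row["protocol"]] = (float(row["mean"]), float(row["sd"]))
    return model


def log_likelihood(x, mean, sd):
    """Log density of x under a normal distribution with the given mean and sd."""
    return -math.log(sd * math.sqrt(2 * math.pi)) - (x - mean) ** 2 / (2 * sd ** 2)


def classify(value, model):
    """Pick the protocol whose trained distribution best explains the feature value."""
    return max(model, key=lambda protocol: log_likelihood(value, *model[protocol]))


# Placeholder "trained model" file; in practice this would be written by the R step.
MODEL_CSV = io.StringIO("protocol,mean,sd\nprotocol_a,7.5,0.2\nprotocol_b,4.9,0.1\n")
model = load_model(MODEL_CSV)
print(classify(4.8, model))  # protocol_b
print(classify(7.4, model))  # protocol_a
```

Only the csv and math modules are needed at classification time, which is the point of moving R out of the detection path.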

In terms of code cleanup, I’ve been adding comments and refactoring large files (in particular pavement.py) to be more readable. I also have the scoring tasks implemented so that, given the output of the detectors, the various detectors and encoders are scored based on their number of correct guesses, false positives, and false negatives. The only functionality left to implement is some sort of reporting, probably in the form of graphs of the scores so that detectors and encoders can be compared. The rest of the work is just going to be refining everything and getting both the workflow and the code structure to be smooth.
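A sketch of the scoring step, assuming the detector output has already been reduced to (actual protocol, predicted protocol) pairs; the function names and sample pairs are hypothetical. A false positive for a protocol is another protocol labeled as it, and a false negative is that protocol labeled as something else.

```python
def score_protocol(results, protocol):
    """Count correct guesses, false positives and false negatives for one protocol."""
    counts = {"correct": 0, "false_positive": 0, "false_negative": 0}
    for actual, predicted in results:
        if predicted == protocol and actual == protocol:
            counts["correct"] += 1
        elif predicted == protocol:
            counts["false_positive"] += 1   # something else mistaken for this protocol
        elif actual == protocol:
            counts["false_negative"] += 1   # this protocol mistaken for something else
    return counts


def accuracy(results):
    """Fraction of all streams whose predicted protocol matches the actual one."""
    results = list(results)
    return sum(actual == predicted for actual, predicted in results) / len(results)


# Hypothetical detector output: (actual protocol, predicted protocol) per stream.
results = [("ssl", "ssl"), ("dust", "ssl"), ("obfs2", "obfs2"),
           ("dust", "dust"), ("ssl", "obfs2")]
print(score_protocol(results, "ssl"))  # {'correct': 1, 'false_positive': 1, 'false_negative': 1}
print(accuracy(results))               # 0.6
```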

Also, I’ve stopped working on the string detector because I realized that it’s irrelevant for the two encoders I actually support right now: obfs2 and Dust. They both already use totally randomized packet contents and so will not be vulnerable to this kind of detector. Other encodings will be vulnerable, but the string detector can wait until we support them. So instead I’ve implemented an entropy detector, which should work very well against both obfs2 and the current implementation of Dust. The entropy detector is actually the easiest model so far, since the entropy of a trace is just a single number, which is convenient.

Week #7 Status Update

This week I worked on finishing up the detectors. The main change is that they now output CSV files with the results. I also discovered some very interesting things in my research this week:

I’d like to include the HTTP and PERSEUS transports, but I guess we’ll see if time permits.
I thought I had read the pluggable transports spec before, but I had totally missed some key elements, mainly that full Dust (including invites) is possible with the existing spec. This has renewed my enthusiasm for getting Dust working as a transport now that I realize it can exist fully on par with obfsproxy without having to be rewritten in C. I’ve started work on a new TCP implementation, as the old implementation was biased towards UDP and this made it difficult to get the SOCKS proxy working well.
I’d like to fulfill those trace requests by adding support for Chromium traffic generation. The hard part is actually finding appropriate websites, both plain and JS-intensive, that allow access over HTTPS without requiring a login.

Next week I’m going to implement scoring of traces and I think also some sort of scoring of detectors. I’ve noticed, for instance, that for whatever reason my “bag of words” detector doesn’t seem to be working very well. I will attempt to fix it, but it would also be good to have some way to automate evaluation of how well different detectors work. I have a start for this in the validateTimings.r file, where I just run the detector on the training data and see how well it does then. I just need to tie this into a scoring mechanism so that it just gives one number for each detector that I can look at to judge overall goodness of fit. I think this will probably just be an accuracy number, perhaps with # of false positives and # of false negatives available upon request.