New branch underway
I’ve begun work on a new branch of DistribuStream, which should be substantially faster (thanks in part to the much faster Ruby 1.9 virtual machine) and will also feature a new, simplified, and far more powerful algorithm for routing traffic on the network. The present algorithm looks at certain attributes of peers, such as their history of successful versus unsuccessful transfers, their estimated bandwidth, and their network prefix. The new algorithm does away with all of that and instead analyzes the history of every single peer on the network to predict which peer is optimal for a given transfer. The new branch will also support live streaming content. The delivery process for live streaming content is virtually identical to that for static files, and it requires only a small change to clients.
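To make the present algorithm's inputs concrete, here is a hypothetical sketch of how success/failure history, estimated bandwidth, and network prefix could be folded into a single peer score. The `Peer` struct, the smoothed success ratio, the /24 prefix match, and the 1.5x locality bonus are all assumptions made for illustration; this is not DistribuStream's actual code or weighting.

```ruby
# Hypothetical peer record; field names are illustrative, not DistribuStream's API.
Peer = Struct.new(:id, :successes, :failures, :bandwidth_kbps, :address)

# Returns the leading `bits` bits of a dotted-quad IPv4 address as an Integer.
def network_prefix(address, bits = 24)
  value = address.split(".").map(&:to_i).inject(0) { |acc, octet| (acc << 8) | octet }
  value >> (32 - bits)
end

# Combines the three attributes into one score (the weights are arbitrary assumptions).
def score(peer, requester_address)
  total = peer.successes + peer.failures
  # Laplace-smoothed success ratio, so a brand-new peer scores 0.5 rather than 0 or 1.
  reliability = (peer.successes + 1).to_f / (total + 2)
  # Hypothetical bonus for peers on the same /24 as the requesting client.
  same_prefix = network_prefix(peer.address) == network_prefix(requester_address)
  locality_bonus = same_prefix ? 1.5 : 1.0
  reliability * peer.bandwidth_kbps * locality_bonus
end

# Picks the highest-scoring peer for a transfer to the given requester.
def best_peer(peers, requester_address)
  peers.max_by { |peer| score(peer, requester_address) }
end

peers = [
  Peer.new("a", 90, 10, 500, "10.0.1.5"),
  Peer.new("b", 50, 50, 2000, "192.168.7.9"),
  Peer.new("c", 40, 0, 400, "10.0.1.77")
]
puts best_peer(peers, "10.0.1.20").id # prints "b"
```

With these made-up weights, the high-bandwidth peer `b` wins despite its coin-flip success ratio, which shows how sensitive a hand-tuned formula over a few attributes is to its constants.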
DistribuStream has taught me a lot about the limitations of Ruby, particularly its garbage collector. DistribuStream is a long-running process that has to keep lots of network connections open, and that is a problem because Ruby unfortunately lacks a compacting garbage collector. As a result, the DistribuStream server’s memory usage grows over time as the heap becomes fragmented. This is the same problem that leads to Firefox 2’s nasty memory usage over time. It’s not the same thing as a memory leak, since memory is being properly freed. Instead, the objects in memory get more and more spread out as time goes on. There’s still a lot of free memory, but it’s stuck in the gaps between objects and can’t be used in typical cases.
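The difference between fragmentation and a leak can be made concrete with a toy model. The sketch below assumes a non-moving heap made of fixed-size pages, where a page can only be handed back to the operating system once every slot on it is free; the page size and allocation pattern are made up, and this is not a model of Ruby's actual allocator. Freeing three out of every four objects leaves 75% of the slots free, yet one survivor per page keeps every page resident. A compacting collector, which is allowed to move objects, can pack the same survivors into two pages.

```ruby
# Toy non-moving heap: each page holds SLOTS_PER_PAGE object slots, and a page
# can be released only when all of its slots are free (illustrative assumption).
SLOTS_PER_PAGE = 4

# Number of pages that must stay resident for a given set of live slot indices.
def resident_pages(live_slots)
  live_slots.map { |slot| slot / SLOTS_PER_PAGE }.uniq.size
end

# A compacting collector may move objects, so the survivors can be packed
# into the lowest-numbered slots, leaving whole pages empty and releasable.
def compact(live_slots)
  (0...live_slots.size).to_a
end

# Fill 8 pages (32 slots), then free everything except one object per page,
# e.g. one long-lived connection object that happened to land on each page.
all_slots = (0...32).to_a
survivors = all_slots.select { |slot| slot % SLOTS_PER_PAGE == 0 }

puts "live objects:           #{survivors.size}"                    # 8
puts "pages without moving:   #{resident_pages(survivors)}"         # 8
puts "pages after compacting: #{resident_pages(compact(survivors))}" # 2
```

Nothing here is leaked, since 24 of the 32 slots really are free, but without the ability to move objects the footprint stays at its peak.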
Ruby 1.9 does not fix this problem with the garbage collector, and the new branch of DistribuStream will have similar problems with memory usage over time. But don’t fear, there is hope!
Rubinius is a project to develop a next-generation Ruby virtual machine with all sorts of snazzy new features, the most important of which for DistribuStream is a compacting garbage collector. This will solve the problem of DistribuStream’s memory usage constantly growing. Rubinius also has many other features that DistribuStream will take advantage of, including support for scaling across multiple CPU cores. DistribuStream’s new traffic routing algorithm will be both CPU and memory intensive, and Rubinius will provide the underpinnings for putting all of a system’s resources to work on it.
If you’re interested in using DistribuStream now, the existing branch is stable and production ready. Just be sure to keep an eye on its memory usage…
Posted by Tony Arcieri on Friday, May 09, 2008

