Tuesday, December 18, 2012

Moving to svbtle

In case you haven't noticed already, I've started blogging on the svbtle network. You can find my new blog at:

http://tonyarcieri.com/

Friday, August 10, 2012

Debunking the Node.js Gish Gallop

A programmer who once a Ruby on Rails enthusiast switches to Node.js and thinks it's awesome, then proceeds to write a blog post about why Node is the bee's knees and Rails is crap. Attention is drawn to the changing nature of web design, from web pages with server-generated HTML to single-page JS-heavy apps written using Backbone, Ember, etc. Stop me if you think that you've heard this one before...

This is an argument I keep hearing over and over, and as far as I'm concerned it's nothing but a Gish Gallop of completely specious arguments, but I really worry... I worry because I keep hearing it over and over, and the fact that I keep hearing it over and over makes me worry that people are actually believing it. I don't know why I keep hearing it over and over. I'm not sure if people are running into problems, reading some of the prevailing "wisdom", and coming to the same conclusion or what. This really makes me sad, because whenever I read the posts like this, I do feel my previous passion for these same ideas, but for me that was half a lifetime ago, and my opinions have changed. I have been down these roads, over mountains, blazed my own trails, and then realized how stupid I was...

How do you defeat the Gish Gallop? I don't really enjoy doing this, but as far as I can tell there is no other way: we must go through the arguments one by one and show why they are completely ludicrous. So here we go...

In case you were confused, Rails is AWESOME for JSON APIs and single page applications

I love client-heavy HTML5/JS apps. I don't want every page on the web to be one, but there are many applications that can benefit a ton from keeping all of their state in the browser. In general: if you can do something without having to go across the network to do it, you will provide a better user experience, bar none.

The primary thing these applications crave are awesome JSON APIs (and Websockets... stay tuned). So why should you use Rails for a JSON API? Isn't Rails designed for HTML/JS pages? What benefit does Rails give you for building JSON APIs? And isn't Rails really slow?

Well no, I've been through this before. If you are building API-only applications with a single-page HTML5/JS frontend, you should definitely check out Rails::API. Rails::API completely eliminates any ActionView-centrism you may be worried about in Rails, and gives you awesome tools for building JSON APIs, like ActiveModel::Serializers. But that alone can't express what Rails brings to the table, so here as list of features Rails provides which are useful for JSON APIs, courtesy the Rails::API README:

Handled at the middleware layer:

  • Reloading: Rails applications support transparent reloading. This works even if your application gets big and restarting the server for every request becomes non-viable.
  • Development Mode: Rails application come with smart defaults for development, making development pleasant without compromising production-time performance.
  • Test Mode: Ditto test mode.
  • Logging: Rails applications log every request, with a level of verbosity appropriate for the current mode. Rails logs in development include information about the request environment, database queries, and basic performance information.
  • Security: Rails detects and thwarts IP spoofing attacks and handles cryptographic signatures in a timing attack aware way. Don't know what an IP spoofing attack or a timing attack is? Exactly.
  • Parameter Parsing: Want to specify your parameters as JSON instead of as a URL-encoded String? No problem. Rails will decode the JSON for you and make it available in params. Want to use nested URL-encoded params? That works too.
  • Conditional GETs: Rails handles conditional GET, (ETag and Last-Modified), processing request headers and returning the correct response headers and status code. All you need to do is use the stale? check in your controller, and Rails will handle all of the HTTP details for you.
  • Caching: If you use dirty? with public cache control, Rails will automatically cache your responses. You can easily configure the cache store.
  • HEAD requests: Rails will transparently convert HEAD requests into GET requests, and return just the headers on the way out. This makes HEAD work reliably in all Rails APIs.

Handled at the ActionPack layer:

  • Resourceful Routing: If you're building a RESTful JSON API, you want to be using the Rails router. Clean and conventional mapping from HTTP to controllers means not having to spend time thinking about how to model your API in terms of HTTP.
  • URL Generation: The flip side of routing is URL generation. A good API based on HTTP includes URLs (see the GitHub gist APIfor an example).
  • Header and Redirection Responses: head :no_content and redirect_to user_url(current_user) come in handy. Sure, you could manually add the response headers, but why?
  • Caching: Rails provides page, action and fragment caching. Fragment caching is especially helpful when building up a nested JSON object.
  • Basic, Digest and Token Authentication: Rails comes with out-of-the-box support for three kinds of HTTP authentication.
  • Instrumentation: Rails 3.0 added an instrumentation API that will trigger registered handlers for a variety of events, such as action processing, sending a file or data, redirection, and database queries. The payload of each event comes with relevant information (for the action processing event, the payload includes the controller, action, params, request format, request method and the request's full path).
  • Generators: This may be passé for advanced Rails users, but it can be nice to generate a resource and get your model, controller, test stubs, and routes created for you in a single command.
  • Plugins: Many third-party libraries come with support for Rails that reduces or eliminates the cost of setting up and gluing together the library and the web framework. This includes things like overriding default generators, adding rake tasks, and honoring Rails choices (like the logger and cache backend).
Rails has an unquestionably awesome feature set even if applied exclusively to JSON APIs, and this guy is taking it completely for granted:
"So your Rails server becomes an API, and your web site, like the iOS app, is the client. It's a clean separation of responsibilies, but given what Rails was designed to do, it's like having a horse rider climb on top of an elephant."
The design of Rails, as of Rails 1.2, provided clean abstractions for using the same code to provide server-generated HTML views and "REST" APIs in multiple serialization formats. This was a big deal at the time, and "the time" was 2 years before Node even existed. Fast forward 4 years and Rails 3 has been rewritten with an emphasis on modularization, allowing you to strip out the components you don't use and build lightweight stacks with only the things you need. Rails::API provides convention over configuration for a lightweight JSON-oriented stack.

But let me back up a little bit...
"The view in MVC is not just HTML and CSS; it's the presentation logic, and the presentation logic needs structure. With this need, client-side frameworks like Backbone, Spine, and Ember have come into the picture."
So I hear this guy Yehuda Katz worked on both Ember and Rails. You may have heard of Ember, it just won Throne of JS's framework of choice (Backbone won in the "library" category). But appeal to authority aside, what does using Ember and Rails in combination actually get you?

A problem I am certain you have run into is the manual nature of serializing JSON. Exactly how should you translate from a domain object into a JSON representation? What if the client wants to avoid repeat requests by eagerly loading other domain objects which are associated with the one you want to retrieve and including them in the JSON result? And wouldn't it be great if there were a single canonical representation for all of this that a standardized domain object abstraction running in the browser could automatically consume for us, so we don't have to manually write a bunch of JSON serialization and deserialization logic for everything in our system?

Can we put JSON on Rails? Yes we can: it's called ActiveModel::Serializers and Ember Data. All that glue code you've been writing over and over for serializing and unserializing JSON? Stop that. Seriously. You have better things to do than deal with the idiosyncrasies of whether you should wrap a particular array in an object or return a literal string or number as opposed to an object for future proofing. You are wasting your time with this minutiae and chances are the ActiveModel::Serializers representation is better than the one you are using. Let's take a look at why. 

The defining characteristics of the ActiveModel::Serializers JSON representation is that it explicitly avoids nesting objects within objects, instead preferring to keep the resulting structure flat and using IDs to correlate the relationships between data in the structure. Here is an example of a "post" object which includes comments and tags, taken from the ActiveModel::Serializers README:
{
  "post": {
    "id": 1,
    "title": "New post",
    "body": "A body!",
    "comments": [ 1, 2 ]
  },
  "comments": [
    { "id": 1, "body": "what a dumb post", "tags": [ 1, 2 ] },
    { "id": 2, "body": "i liked it", "tags": [ 1, 3 ] },
  ],
  "tags": [
    { "id": 1, "name": "short" },
    { "id": 2, "name": "whiny" },
    { "id": 3, "name": "happy" }
  ]
}
There are multiple nested relationships in this document: the post has many comments, and comments have many tags. And yet we don't see duplication of comment or tag objects. We don't have to worry about which version of a repeated object is canonical, because there are no repeated objects. Objects within the resulting document are deduplicated and referred to symbolically by their ID. Using this JSON structure we can represent arbitrarily nested relationships between objects in the most efficient manner possible and completely avoid any problems with inconsistencies between duplicated versions of objects present in the document. This representation of JSON just makes sense, and perhaps you too have standardized upon it. Better yet, if you use this representation, then with very little effort on your part Ember Data can automatically consume it.

If you use Ember and Rails, you can abstract away JSON and save yourself the headache of writing custom serialization code. I'm going to say: score one for Rails and single page applications. Maybe you have some Node thing that can do that too, I don't know, but seriously, if you think Rails is bad for JSON APIs, you don't know Rails.

Moving right along, let's continue slogging through the Gish Gallop.

Node has nonblocking async I/O and Rails doesn't so Rails is slow!!!

Where to start with this one. Hmm, let's start here:
"When I think of Ruby and Rails' performance, I think of Ilya Grigorik."
Let me start by saying that Ilya is an awesome guy who has done a very thorough and nuanced survey of the many facets of Ruby performance over time. Taking any single thing he's said out of context and treating it like gospel is probably doing a disservice to Ilya. That said, let's see what thing Ilya said that this guy chose to single out and present out of context. Quoth Ilya:
"There is nothing about node that can't be reproduced in Ruby or Python (EventMachine and Twisted), but the fact that the framework forces you to think and use the right components in place (fully async & non-blocking) is exactly why it is currently grabbing the mindshare of the early adopters. Rubyists, Pythonistas, and others can ignore this trend at their own peril. Moving forward, end-to-end performance and scalability of any framework will only become more important."
So this is a line I hear out of Ryan Dahl a lot too. It's a line I used to believe.

Folks, I've been doing this stuff for awhile. I first discovered synchronous I/O multiplexing when I was about 15, which for me was half a lifetime ago, and since then I've been building network servers using this approach. I've built my own abstraction layers across select/poll/epoll/kqueue. I wrapped libev for Ruby in Rev/Cool.io and nio4r, the latter of which is a cross-platform abstraction for Java NIO on JRuby. I cannot express to you how much work I've invested in doing things the evented non-blocking way.

I don't think non-blocking I/O is a good fit for web applications that talk HTTP, although I think it can be a good fit for Websocket applications. I will get to my reasons later. But first, let's continue digging through the Gish Gallop:
"Ilya mentioned the framework/ecosystem that I now consider to be the threat to Rails: Node.js [...] The biggest thing I noticed was the difference in performance. It consumed less memory than Ruby, and it served more requests per second than Sinatra or even Rack."
I have a huge pet peeve, and that's when people talk about performance without numbers. I tried it and it was faster. I tried it and it was slower. If you really want to make a point about the performance of a particular thing, can you at least pretend you're using science?

I hate to do this, but I think I have to destroy your god. Let's see how Ilya's software stacks up to mine on a crappy "hello world" web server benchmark. First, the numbers for my web server Reel:

# httperf --num-conns=50 --num-calls=1000

Ruby Version        Throughput    Latency
------------        ----------    -------
JRuby HEAD          5650 reqs/s   (0.2 ms/req)
Ruby 1.9.3          5263 reqs/s   (0.2 ms/req)
JRuby 1.6.7         4303 reqs/s   (0.2 ms/req)
rbx HEAD            2288 reqs/s   (0.4 ms/req)
Let's compare to Ilya's web server Goliath, as well as Thin and Node.js:
Web Server          Throughput    Latency
----------          ----------    -------
Goliath (0.9.4)     2058 reqs/s   (0.5 ms/req)
Thin    (1.2.11)    7502 reqs/s   (0.1 ms/req)
Node.js (0.6.5)     11735 reqs/s  (0.1 ms/req)
All of these servers, including mine, are using non-blocking evented I/O. Is that remotely relevant? No. That's just a coincidence.

My web server is faster than Ilya's. So by Gish Gallop logic, Ilya must be wrong about everything. There must be no reason to use Ilya's web server. Let's write everything in Node since it won the benchmark.

There's a huge problem here: Goliath does things that Reel, Thin, and Node's HTTP server don't do. The reason it's slower isn't because Ilya sucks and is clueless about performance. The reason is that Goliath has features which these other web servers don't, which makes it an apples to oranges comparison. (I guess scumbag me for putting them all in a big list on the Reel web page)

The same can be said of Rails: it probably isn't ever going to have better latency through the entire stack  than any Node.js framework, but the latency of the Rails stack is probably going to be a lot less than your application logic, and that's still going to be a drop in the bucket compared to the network latency to a given user.

Celluloid solves every single problem you're whining about better than Node

Node has a lot of problems, and I'm not just talking about the audience it attracts. Let me start by saying this: many of the things I have built in Celluloid are based off of technologies originally developed for Node. My web server Reel uses the Node HTTP parser, and it's quite likely that the next iteration of nio4r I develop will be based off of libuv.

All that said, let me start with Node's fundamental problem: callback-driven I/O. Celluloid::IO is one of many systems, including Erlang and Go, that demonstrate that "nonblocking" and "evented" I/O are orthogonal to callbacks. Celluloid uses Ruby's coroutine mechanism to provide a synchronous I/O API on top of an underlying nonblocking system. However, where systems like Node force you to use nonblocking I/O for everything, Celluloid lets you mix and match blocking and nonblocking I/O as your needs demand.

If you have ever worked in a language like C(++) or Java, you probably know an amazing property of sockets: you can mix and match blocking and nonblocking I/O, even over the lifecycle of a single socket. Perhaps you will handle incoming sockets in a nonblocking manner at first, but if they make a complex request, you might change the socket to a blocking mode and hand it off to a worker thread.

Celluloid::IO makes this handoff completely transparent: simply by giving the socket to another Ruby thread which isn't a Celluloid::IO actor, it will automatically switch from nonblocking to blocking mode completely transparently.

But let's talk about Node's real fundamental problem, one that is extremely difficult to solve in any callback-driven system: flow control. Unfortunately the Node.js community has adopted the phrase "flow control" to mean "building abstractions around managing callbacks", however the phrase "flow control" has a very specific definition relating to the rates at which data is transmitted between systems.

In general, callback-driven systems can't manage flow control effectively. The most notable pathological case is the producer-consumer problem, whereby a slow consumer might force a system like Node to unboundedly buffer data from an unchecked producer. There's a clear and simple solution to this problem: make all I/O synchronous. Using coroutines that provide blocking-style APIs, you can easily compose producer/consumer problems in a manner that doesn't result in unbounded writes to a buffer, because simply by virtue of a virtual blocking API, the rate at which data is transfered from producer to consumer is kept in check.

But what about WebSockets?

Ruby has had some pretty awesome albeit overlooked and therefore stagnant solutions for WebSockets for awhile, like Cramp. I've been working on web-based push technologies for half a decade now, and explored a multitude of solutions including Comet, XMPP/BOSH, RabbitMQ long polling, and my own XHR long polling systems which I originally built around *gasp* threads nearly 3 years ago at this point.

Well, I'm quite happy to say that Reel now supports WebSockets. I certainly don't want to say that my recent spike is anywhere as mature as WebSockets in Node or their surrounding ecosystem. Instead, I think the API that Reel provides for WebSocks is simply better by design. If you managed to catch tenderlove's recent blog post on streaming live data, you may understand that all previous APIs you may have encountered in both systems like Rails or Node for streaming data were really obscuring the one API that truly makes sense for this use case: a socket.

WebSockets are in many ways similar to 0MQ sockets (which are used in DCell via Celluloid::ZMQ). WebSockets provide a framing mechanism which provides a message-based transport instead of the typical stream-based transport provided by TCP. That said, when processing message sequences, callbacks become extremely problematic, because you must reconstruct the state of the current request from the point of each incoming message. Callbacks work well for e.g. a chat protocol where there is no state relationship between messages, but as soon as there is you are effectively stuck building a finite state machine to manage the processing of each incoming message.

This is madness. There's a much better and much more straightforward solution to this problem: just use the goddamn stack. In order to do so, you need to provide a "blocking" API, but this isn't orthogonal to using nonblocking I/O. Celluloid::IO, Go, and Erlang all let you build concurrent, multithreaded, and potentially multicore systems on top of coroutines spread across multiple native threads.

That said, native threads are cheap nowadays and they're only getting cheaper. On most Ruby VMs a native thread will cost you about 20kB of RAM. If you want you can just build blocking I/O systems completely out of native threads without using any sort of evented I/O, and these systems can scale up to tens of thousands of connections.

Don't believe the hype

Node provides a limited subset of what Ruby can do, and it can be done better with Ruby. Node does not have a web framework of the same caliber as Rails. Node doesn't have threads, which in Ruby will spare you from Node's callback soup. Finally, there's the elephant in the room: JavaScript is a terrible, terrible programming language compared to Ruby. We're forced to use JavaScript in the browser, but on the server, we can choose the best language for the job.

Ruby on Rails remains the best-in-class web framework, and while there are arguments to be made against it, the ones I hear coming out of confused Node.js detractors do not hold water.

Wednesday, June 6, 2012

Ruby is faster than Python, PHP, and Perl

There's a pervasive myth that Ruby is slow, and moreover, that it's the slowest language in popular use. Everyone knows Ruby is slow. Right? Who would possibly disagree that Ruby is slow? Here's an example IRC discussion on freenode's #postgres which happened just yesterday:

16:57 sobel: i can't find it now, but arstechnica benched all the popular dynamic languages
16:58 sobel: using C/C++ as the standard (1.0) they ranked other languages as multiples of C/C++ performance
16:58 sobel: java was a 2
16:58 sobel: twice as slow as C. and it was the winner, head and shoulders above the rest
16:58 sobel: i think Erlang placed somewhat favorably
16:59 sobel: python/perl were near the middle, at something like 25-35x slower than C
16:59 sobel: Ruby: a full 72x slower than C

Ruby loses against invisible Ars Technica benchmarks in the sky with unknown URLs! 72x slower than C! Over twice as slow as Python and Perl!

Fortunately, we don't have to rely on some one-off benchmark Ars Technica may or may not have done at some indeterminate point in time whose URL cannot be located offhand, because there's a site with a fairly decent reputation which provides ongoing benchmarks across multiple programming languages using implementations submitted by fans of said language. It's been around for awhile and is relatively well-trusted.

That site is the Programming Languages Shootout, and unlike the alleged Ars Technica benchmark, you can actually visit their web site at shootout.alioth.debian.org. What do they have to say about programming language performance?


According to this benchmark suite, JRuby is 34.5 times slower than (not C, not C++, but) Fortran. Ruby 1.9 (MRI/YARV) is 43.80 times slower than (not C, not C++, but) Fortran. Both JRuby and Ruby 1.9 beat Python, PHP, and Perl by a considerable margin. The nearest competitor is Python 3, at 47.93 times slower than (not C, not C++, but) Fortran. By the way, did I mention that the fastest language on their benchmark is... not C... not C++, but Fortran? (nothing personal sobel, but unsubstantiated hearsay is bad!)

Yes, that's right folks: according to the Programming Languages Shootout, Python, PHP, and Perl are all slower than Ruby. Did you think Ruby was slower than Python? Guess what, you're wrong. Ruby used to be one of the slowest popular languages, but that has changed. Ruby performance has advanced considerably over the years, so if you're still repeating some offhand information you may or may not have gotten from Ars Technica at some point but can't find the link to as your metric of Ruby performance, you may want to try again, and find modern, relevant information you can actually get a link to.

There are many future VM improvements in the pipe for Ruby, Python, and PHP (and I guess Perl users might continue dreaming of a Parrot-powered Perl 6). Rubyists can look forward to the upcoming JRuby 1.7 release which features InvokeDynamic support and allows for Java-speed method dispatch... in Ruby. InvokeDynamic is a game changer for the JVM in general, and it's a game changer for Ruby, because InvokeDynamic makes JRuby dispatch potentially as fast as Java.

Python users can look forward to PyPy, which is posting some incredibly impressive numbers, especially around numerical computing. Python users can also look forward to resumed work on Jython, which is adding InvokeDynamic support which can potentially make Python as fast as Java. Finally, PHP users can look forward to the HipHop VM developed at Facebook, which will provide much improved performance for PHP. These are all great projects, but none of them are really ready for general consumption yet (including JRuby 1.7).

All that said, the Programming Language Shootout doesn't include any of these unreleased development versions in the benchmarks you see when you visit their site. They show the numbers for the latest production releases, and those numbers show Ruby is faster than Python, PHP, and Perl.

The game has changed: you just weren't paying attention.

Last but not least, if you've seen some benchmark somewhere, even if you have an eidetic memory and remember but the numbers were, but can't even dredge up a link to it, please, please, don't quote said benchmark, even if you have an eidetic memory and remember what the numbers were.

For benchmarks to be remotely scientific, they must be both reproducible and falsifiable, and hopefully in addition to both those things they have a good methodology. If you can't even dredge up a link to the benchmark in question, please don't go quoting numbers off the top of your head to people who might be influenced by them.

Let's advance computer science beyond the state of witch doctors telling people to bleed themselves with leeches because at some point someone said they might make you feel better maybe.

Tuesday, April 17, 2012

Introducing DCell: actor-based distributed objects for Ruby

DCell, which is short for Distributed Celluloid (and pronounced like the batteries you used to jam into boom boxes and RC cars) is an actor-based distributed object oriented programming framework for Ruby. Celluloid is an actor library I wrote for Ruby which exposes concurrent Ruby objects that "quack" just like any other Ruby object. DCell takes the asynchronous messaging protocol from Celluloid and exposes it to distributed networks of Ruby interpreters by marshaling Celluloid's messages as strings and sending them to other nodes over 0MQ.

Before I talk about DCell I'd like to talk a little bit about the history behind distributed objects in general and the ideas that DCell draws upon.

A Brief History of Distributed Objects

Nowadays you don't hear people talking much about distributed objects. However, once upon a time, distributed objects were big business. They used to be one of Steve Jobs' passions during his time at NeXT. In the mid-90's, Steve Jobs phased out NeXT's hardware division and began repositioning the company as, among other things, the "largest object supplier in the world." NeXT turned its focus exclusively to its software, namely the WebObjects framework for building dynamic web sites. It would also ship the Enterprise Objects Framework, which Steve described as allowing you to "make NeXTSTEP objects, and with no programming, have persistent and coherent storage with SQL databases" (sound a little bit like ActiveRecord a decade before ActiveRecord, anyone?)

WebObjects was built on a technology called Portable Distributed Objects, which allowed Objective C applications developed on any platform to seamlessly interoperate over computer networks and across platforms. If you haven't seen it, watch Steve Jobs 1995 presentation The Future of Objects (although you may want to skip directly to where Steve begins talking about distributed objects). Even now, some 16 years later, these demos still seem futuristic. Steve loved the simplicity of distributed objects: "Objects can message objects transparently that live on other machines over the network, and you don't have to worry about the networking gunk, and you don't have to worry about finding them, and you don't have to worry about anything. It's just as if you messaged an object that's right next door."

So why is it some of Steve's demos seem futuristic and slightly beyond our grasp even a decade and a half later, and building multi-tier client/server (i.e. web) applications is a challenge typically involving building lots of client and server wrapper code instead of transparently invoking distributed objects? Unfortunately, Steve's rosy future of distributed objects never really came to pass. NeXT's technology was commercial, only available in Objective C (at a time when C++ reigned supreme and no one used Objective C), and couldn't beat the open web and the standards that would emerge for developing web APIs. Meanwhile, the industry standards for distributed objects would be mired in debacles of their own.

In the before time, in the long long ago, in the days before JSON, XML, and REST, when Tim Berners-Lee had just started hosting the world's first web site on his personal NeXTstation, C++ was the lingua franca of the day and people were just getting started making C++ programs communicate over networks. The Object Management Group (with the now-unfortunate acronym OMG) hashed out the necessary wire protocols, object representations, interface definitions, and a profusion of other standards needed to allow C++ programs to invoke remote objects over TCP/IP networks. The result was CORBA.

CORBA is a technology that drowned in its own complexity, however in the early '90s it was the Enterprise: Serious Business way to allow applications to communicate over networks. Once you waded through the myriad complexity of the various CORBA technologies, you were left with objects you could invoke over the network in a manner somewhat reminiscent of the way you would invoke an object locally.

However, soon the web would take off and HTTP would soon become the de facto protocol programs used to communicate. Unfortunately, HTTP is a terrible protocol for implementing distributed objects. Where HTTP exposes a predefined set of verbs you can use to manipulate resources which have a number of possible representations, objects don't have representations: they are just objects, and you talk to them with messages. This didn't stop people from trying to build a distributed object protocol on top of HTTP though. The result was SOAP, a protocol which abandoned CORBA's "orbs" for web servers, its Interface Definition Language for WSDL, its Common Data Representation for XML, and its Inter-Orb Object Protocol for HTTP.

This was something of a step in the right direction: SOAP actually was a "Simple" Object Access Protocol when compared to CORBA, and SOAP would soon vanquish CORBA for Enterprise: Serious Business applications. While SOAP would gain a few fans, particularly in the Java and .NET communities, who saw the value of being able to expose objects to the network with a few point-and-clicks which generated gobs of XML, SOAP would soon join CORBA's ranks as a reviled technology.

SOAP's complexity comes first and foremost being a committee standard that would succumb to "too many cooks" syndrome in the same way CORBA did. It also suffered from trying to be a cross-language protocol that needed to deal with static type systems, requiring tools which could read WSDL definitions and spit out volumes of generated boilerplate code for interacting with remote services. Beyond that, SOAP suffered from an impedance mismatch with HTTP by largely ignoring the features HTTP provides, using it as little more than a transport wrapper for shoving blobs of XML across the network. The XML contained the actual messages to be sent to remote objects or the responses coming from a remote method invocation, while anything done at the HTTP level itself was just boilerplate.

REST to the rescue?

For all its complexity, it was easy to lose sight of what SOAP was actually trying to do. Rather than painlessly interacting with remote objects, SOAP left us wondering why things were so slow, staring at WSDL errors wondering what went wrong, and picking through gobs and gobs of XML trying to debug problems. REST, which eschewed distributed objects and favored the paradigms of HTTP, would be SOAP's coup de grâce. SOAP is now relegated to a handfull of legacy enterprise applications whereas the rest of the open web has almost universally embraced REST.

So REST triumphed and web APIs reign supreme. Distributed objects are little more than a footnote in history. And we're still left wondering why putting together the sorts of demos Steve Jobs was showing with Portable Distributed Objects in 1995 is so hard.

Using REST makes sense when exposing services for third parties to use. However, if you control both the client and server, and have Ruby frontends talking to Ruby services, REST begins to look like a bunch of gunk that's getting in the way:

Implementing domain objects, REST services, and REST clients becomes work duplicated in 3 places across systems using this sort of architecture. Wouldn't it be a lot simpler if the frontend web application could simply consume remote services as if they were just objects?

This sort of simplicity is the goal of distributed objects.

Distributed Objects in Ruby

Ruby has its own foray into the world of distributed objects: DRb, or Distributed Ruby. DRb exposes Ruby objects to the network, each uniquely identified by a URI. Clients can ask for a DRb object by URI and get a DRbObject back. These DRbObjects act as proxies, intercepting calls and sending them over the network, serializing them with Ruby's Marshal, where they're handled by a remote DRbServer. DRbServer uses a thread-per-connection model, allowing it to concurrently process several requests. When a DRb connection handler receives a request, it looks up the requested object, invokes a method on it, then serializes the response and sends it back to the caller.

For the most part, DRb has lingered in obscurity, save until recently with the publication of the dRuby book which has stirred up a modicum of interest. Where Steve Jobs thought PDO was "by far the easiest way to build multi-tier client/server applications because of [its] completely transparent distributed object model," Rubyists don't turn to DRb to build multi-tier applications, but instead typically rely on REST, building out APIs for what is, in the end, distributed communication between Ruby objects.

Why is it then that DRb isn't the go-to tool people use for building multi-tiered web applications in Ruby? It's easy to say that DRb failed because people are used to thinking in terms of HTTP and can better understand the semantics of the system when using HTTP, especially when it comes to areas like load balancing and caching. Separating services with HTTP also opens up the door to reimplementing those services in a different language in the future. However, even with future-proofing for a rewrite in another language out of the picture, I think most Rubyists would still choose to use REST APIs instead of DRb, and I think that's a defensible position.

While DRb does a great job of trying to make access to remote objects as transparent as possible, it has a number of flaws. DRb is inherently multithreaded but doesn't give the user any sort of tools or framework to manage concurrent access to objects. This means building DRb applications immediately exposes you to all the complexities of multithreaded programming whether you're aware of it or not, and Rubyists seem generally uncomfortable with building thread-safe programs. While DRb allows you to talk to in-process objects the same way you'd talk to out-of-process objects, but it doesn't make it natural to build a program that way.

Beyond Objects: The Power of Distributed Erlang

While CORBA and SOAP are reviled for their complexity, there's another distributed system which is beloved for its high level of abstraction: Distributed Erlang. Erlang is, if anything, a highly opinionated language whose central design goal is to build robust self-healing systems you never need to shut down. When it comes to distribution, Erlang's goal is to make it as transparent as possible. Erlang is a dynamic language which insists you express everything within a scant number of core types. This makes serializing state in Erlang so you can ship it across the wire extremely simple and fast.

However, the real strength of Erlang is the Actor Model, which can be more or less summarized as follows:
  1. Actors communicate by sending messages
  2. Every actor has a mailbox with a unique address. If you have an actor's address, you can send it messages
  3. Actors can create other actors, and when they do they know their address so they can send the newly-created actors messages
  4. Actors can pass addresses (or handles) to themselves or other actors in messages
Erlang uses this method within individual VMs as the basis of its concurrency model. Erlang actors (a.k.a. processes) all run concurrently and communicate with messages. However, Erlang also supports distribution using the exact same primitives it uses for concurrency. It doesn't matter which type of actor you're talking to in Erlang, they "quack" the same, and thus Erlang has you model your problem in a way that provides both concurrency and distribution using the same abstraction.

Distributed Erlang offers several features aimed at building robust distributed systems. The underlying messaging protocol is asynchronous, allowing many more messaging patterns than traditional RPC systems (e.g. HTTP) which use a request/response pattern that keeps the client and server in lockstep. Some examples of these patterns are round robin (distributing messages across N actors), scatter/gather (distributing computation across N actors and gathering the results), and publish/subscribe (allowing actors to register interest in a topic, then informing them of events related to that topic).

In addition, Erlang processes can link to each other and receive events whenever a remote actor exits (i.e. if it crashes). This allows you to build robust systems that can detect errors and take action accordingly. Erlang emphasizes a "fail early" philosophy where actors are encouraged not to try to handle errors but instead crash and restart in a clean state. Linking allows groups of interdependent actors to be taken down en masse, with all of them restarting in a clean state afterward. Exit events can also be handled, which is useful in distributed system for things like leader election.

DCell provides all of these features. When you create an actor with Celluloid, a proxy object to the actor is returned. This proxy lets you use the method protocol to communicate with an actor using messages. DCell implements special marshalling behavior for these proxy objects, allowing you to pass them around between nodes in a DCell system and invoke methods on remote actors in the exact same way you would with local actors.

Unlike DRb, DCell also exposes asynchronous behaviors, such as executing method calls in the background, and also using futures to schedule method invocation in advance then waiting for the result later. DCell also lets distributed actors to link to each other and be informed when a remote actor terminates.

I'm certainly not the first to imitate Erlang's approach to distribution. It's been seen in many other (distributed) actor frameworks, including the Akka framework in Scala and the Jobim framework in Clojure.

Bringing Erlang's ideas over to Ruby

I have a long history of projects that try to cross-pollenate Ruby and Erlang. My first attempt was Revactor, my previous attempt at an actor library which provided a very raw and low-level API which is almost identical to the Rubinius Actor API. Revactor modeled each actor as a Fiber and thus provided no true concurrency. Another of my projects, Reia, tried to bring a more friendly syntax and OO semantics to Erlang.

With Celluloid I've come full circle, trying to implement Erlang's ideas on Ruby again. Only this time, Celluloid makes working with actors easy and intuitive by embracing the uniform access principle and allowing you to build concurrent systems that you can talk to just like any other Ruby object. Celluloid also provides asynchronous calls (what Erlang would call a "cast") where a method is invoked on the receiver but the caller doesn't wait for a response. In addition to that Celluloid provides futures, which allow you to kick off a method on a remote actor and obtain the value returned from the call at some point in the future.

In addition Celluloid embraces many of Erlang's ideas about fault tolerance, including a "crash early" philosophy. Celluloid lets you link groups of interdependent actors together so if any one fails you can crash an entire group. Supervisors and supervision trees automatically restart actors in a clean state whenever they crash.

Celluloid does all of this using an asynchronous message protocol. Actors communicate with other actors by sending asynchronous messages. A message might say an actor has crashed, or another actor is requesting a method should be invoked, or that a method invocation is complete and the response is a given value. All of the heavy lifting for building robust, fault-tolerant systems is baked into Celluloid.

When programs are factored this way, adding distribution is easy. DCell takes the existing primitives Celluloid has built up for building concurrent programs and exposes them onto the network. DCell itself acts as little more than a message router, and the majority of the work in adding fault tolerance is still handled by Celluloid.

Getting Started with DCell

To understand how DCell works we need to look at how a cluster is organized. This is an example of a cluster with 5 nodes:



In this picture the green nodes represent individual Ruby VMs. The links between the nodes are shown in black or gray to illustrate actively connected or potentially connected nodes. DCell makes connections between nodes lazily as actors request them. This means DCell clusters are potentially fully connected networks where each of the nodes is (or can be) directly connected to every other node in the cluster. DCell doesn't implement any sort of routing system or overlay network, and instead depends on all nodes being directly accessible to each other over TCP.

DCell uses 0MQ to manage transporting data over the network. 0MQ supports several different messaging patterns, and in the future DCell may use more of them, but for the time being DCell uses PUSH/PULL sockets exclusively. This works well because Celluloid's messaging system is asynchronous by design: each node has a PULL socket that represents the node's mailbox, and the other nodes on the network have a PUSH socket to send that node messages.

To configure an individual DCell node, we need to give it a node ID and a 0MQ address to bind to. Node IDs look like domain names, and 0MQ addresses look like URLs that start with tcp:

require 'dcell'
DCell.start :id => "node1", :addr => "tcp://127.0.0.1:7777"
view raw gistfile1.rb hosted with ❤ by GitHub

To create a cluster, we need to start another Ruby VM and connect it to the first VM. Once you have a cluster of multiple nodes, you can bootstrap additional nodes into the cluster by pointing them at any node, and all nodes will gossip about the newly added node:

DCell.start :id => "node2", :addr => "tcp://127.0.0.1:7778",
:directory => {:id => 'node1', :addr => 'tcp://127.0.0.1:7777'}
view raw gistfile1.rb hosted with ❤ by GitHub
Once you are connected to another node, you can browse the available nodes using DCell::Node.all:

>> DCell::Node.all
=> [#<DCell::Node[node1] @addr="tcp://127.0.0.1:7777">, #<DCell::Node[node2] @addr="tcp://127.0.0.1:7778">]
view raw gistfile1.txt hosted with ❤ by GitHub

To invoke a node on a particular service, obtain a handle to its node object, then look up an individual actor by the name it's registered under. By default, all nodes run a basic information service which you can use to experiment with DCell:

>> info_service = DCell::Node['node2'][:info]
=> #<Celluloid::Actor(DCell::InfoService:0x1864) @platform="java" @ruby_platform="jruby 1.6.7" @cpu_type="Intel(R) Core(TM) i7-2635QM CPU" @ruby_version="1.9.2" @ruby_engine="jruby" @cpu_vendor=:intel @distribution="Mac OS X 10.7.3 (11D50b)" @cpu_count=8 @hostname="wintermute.local" @cpu_speed=2.0 @os="darwin" @cpu_arch="x86_64" @os_version="11.3.0">
>> info_service.load_averages
=> [0.92, 1.18, 1.14]
view raw gistfile1.txt hosted with ❤ by GitHub
To implement your own DCell service, all you have to do is create a Celluloid actor and register it on your node. See the info service source code for an example.

DCell Explorer

DCell also includes a simple web UI for visualizing the state of a particular DCell cluster. To launch the web UI, run:

require 'dcell/explorer'
DCell::Explorer.new("localhost", 8000)
view raw gistfile1.rb hosted with ❤ by GitHub

Then go to http://localhost:8000/ (provided you used the same host/port). You should see the following:


This is the basic dashboard that DCell's web UI provides. You can see connected nodes, their connection state, and if they're available browse the info service and see various information about them.

We're just getting started...

DCell is only about half a year old, and still relatively immature, but already it's seen a great number of contributors and a lot of attention. Some of the most exciting developments are still on the horizon, including paxos-powered multicall support which will let you call a quorum of nodes, along with generalized group membership support with leadership election, and generalized pub/sub.

All that said, I'd like to think that DCell is the most powerful distributed systems framework available for Ruby today, and I would love to see the remaining bugs ironed out and missing features added.

That can only happen if people are using DCell, finding bugs, and reporting missing features. This may sound a little bit scary, but if you're considering building a nontrivial distributed system in Ruby, DCell is a great place to start.

Monday, March 19, 2012

Don't use bcrypt

(Edit: Some numbers for you people who like numbers)

If you're already using bcrypt, relax, you're fine, probably. However, if you're looking for a key derivation function (or in bcrypt's case, password encryption function) for a new project, bcrypt is probably not the best one you can pick. In fact, there are two algorithms which are each better in a different way than bcrypt, and also widely available across many platforms.

I write this post because I've noticed a sort of "JUST USE BCRYPT" cargo cult (thanks Coda Hale!) This is absolutely the wrong attitude to have about cryptography. Even though people who know much more about cryptography than I do have done an amazing job packaging these ciphers into easy-to-use libraries, use of cryptography is not something you undertake lightly. Please know what you're doing when you're using it, or else it isn't going to help you.

The first cipher I'd suggest you consider besides bcrypt is PBKDF2. It's ubiquitous and time-tested with an academic pedigree from RSA Labs, you know, the guys who invented much of the cryptographic ecosystem we use today. Like bcrypt, PBKDF2 has an adjustable work factor. Unlike bcrypt, PBKDF2 has been the subject of intense research and still remains the best conservative choice.

There has been considerably less research into the soundness of bcrypt as a key derivation function as compared to PBKDF2, and simply for that reason alone bcrypt is much more of an unknown as to what future attacks may be discovered against it. bcrypt has a higher theoretical-safety-to-compute-time factor than PBKDF2, but that won't help you if an attack is discovered which mitigates bcrypt's computational complexity. Such attacks have been found in the past against ciphers like 3DES. Where 3DES uses a 168-bit key, various attacks have reduced that key size's effectiveness to 80-bits.

PBKDF2 is used by WPA, popular password safes like 1Password and LastPass, and full-disk encryption tools like TrueCrypt and FileVault. While I often poke fun at Lamer News as a Sinatra antipattern, I have to applaud antirez on his choice of PBKDF2 when he got bombarded with a "just use bcrypt!" attack (although bro, antirez, there's a PBKDF2 gem you can use, you don't have to vendor it)

The second cipher to consider is scrypt. Not only does scrypt give you more theoretical safety than bcrypt per unit compute time, but it also allows you to configure the amount of space in memory needed to compute the result. Where algorithms like PBKDF2 and bcrypt work in-place in memory, scrypt is a "memory-hard" algorithm, and thus makes a brute-force attacker pay penalties both in CPU and in memory. While scrypt's cryptographic soundness, like bcrypt's, is poorly researched, from a pure algorithmic perspective it's superior on all fronts.

The next time you need to pick a key derivation function, please, don't use bcrypt.

Thursday, March 8, 2012

Announcing Lightrail: Lightweight Rails stack for HTML5/JS applications

There's been a lot of debate lately surrounding Rails suitability for the server stack underlying modern HTML5/JS applications. Having used Rails for some four years for this purpose, and worked with a number of Rails core members, this is a problem I think the Ruby community has solved wonderfully, but yet some are confused as to what solutions are available or the way forward.

Rails 3 provided enormous advances in terms of letting you specialize what Rails provides to the problem at hand. However, to a certain extent this goes against the Rails mantra of "convention over configuration". While Rails 3 provides ample opportunities for configuration, as Rubyists, we shouldn't have to configure anything, right?

I'm a huge believer in both Rails' suitability as a backend for modern client-heavy HTML5/JS applications, and someone experienced in building such applications. Rails is, was, and continues to be a game-changer for modern web development. ActionController::Metal provides the bare minimum needed to build apps which don't need the complete set of HTML-generating abstractions provided by ActionView, but need more tools than a more minimalistic framework like Sinatra makes available. Between Sinatra and Rails lies an unaddressed middle ground, one where, in theory, you should be able to build an ActionController::Metal stack appropriate to your needs, but maybe this is too daunting a task.

For you, the JSON API builder, who wants more than Sinatra but less than Rails... I have what you desire. Introducing Lightrail:


What is Lightrail? Lightrail is Strobe's ActionController::Metal stack for HTML5 applications, originally used to provide the backend APIs for Strobecorp.com and its frontend HTML5/JS application authored with SproutCore (which has been superseded by Ember.js).

Lightrail contains everything you need to build lightweight applications on the Rails stack which serve only JSON, and furthermore, contains an innovative system for building JSON APIs around your objects. Rather than adding a fat #to_json method on your models, or using a template to construct your JSON, Lightrail's allows you to map the JSON serializations of your objects to specific wrappers that know how to serialize specific objects. Like using #to_json, this makes it easy to recursively serialize nested objects to JSON without having to use ActionView voodoo like invoking other renderers. Besides that, it still separates the concerns of what your domain objects are and how they serialize to JSON. If you've tried to build JSON APIs in Rails and found the existing mechanisms for JSON serialization lacking, please try out Lightrail::Wrapper and let me know what you think.

Lightrail is something of an experiment. I didn't write it, but it's software I believe in so much I'd like to support it and see if people are interested in it. Rather than competing with Rails, Lightrail takes the latest, greatest Rails stack and reconfigures it for lightweight applications that provide a JSON API exclusively. Lightrails builds upon all of the modularity that Rails 3 brings to the table, and simply and easily delivers a lightweight stack which is still suitable for complex applications.

Please let me know if Lightrail seems like a good idea to you and if you'd like to help support it. As I have my hands in an awful lot of other open source projects, Lightrail isn't the sort of thing I can support full time. However, if you have some time to spare and ideas to contribute, I am definitely looking for people to help maintain and improve this project.

If you're interested in using Lightrail or helping out with its development, sign up for the mailing list. Just send any message to lightrail@librelist.com to join.

Monday, March 5, 2012

Why critics of Rails have it all wrong (and Ruby's bright multicore future)

Edit: Contrary to what I said here, José Valim is not stepping down from Rails core, he is merely on sabbatical. My bad.

Lately I've been getting the feeling the Ruby community has gotten a bit emo. The enthusiasm surrounding how easy Ruby makes it to write clean, concise, well-tested web applications quickly is fading. Rails has become merely a day job for many. Whatever hype surrounded Rails at its inception has died down into people who are just getting work done.

Meanwhile, Node.js is the new hotness, and many in the Node community have sought to build Node up by bringing Ruby and Rails down. I know that once upon a time Ruby enthusiasts were doing this sort of thing to Java. However, the tables have turned, and where Ruby used to be the mudslinging hype-monkey, it's now become the whipping boy and Node.js the new provocateur.

The sad thing is many of these people are former or current Rubyists who have taken a liking to Node and build it up by spreading blatant untruths about Ruby. I won't go as far as to call them liars, but at the very least, they are extremely misinformed, ignorant of the state of the Ruby ecosystem, and pushing their own agendas.

Jeremy Ashkenas, the creator of CoffeeScript, recently trashed Rails 3 and claimed "Node.js won":


The idea that Rails 3 was a major step backward was recently reiterated by both Giles Bowkett and Matt Aimonetti. Both of them painted building ActionController::Metal applications as some sort of byzantine, impossible task which can only be accomplished by a Rails core member. Are people actually building lightweight Rails applications using the newfound modularity of Rails 3?


Jose Valim, (now former) Rails core member, published a small, simple gist illustrating how to build barebones apps on ActionController::Metal (one of the most forked gists I've ever seen) which is further documented in his book Crafting Rails Applications. In just 50 lines of code you can strip Rails down to its core, making it ideal for use in modern client-heavy HTML5 applications. The funny thing about this gist is that while the idea of a 50 line Rails app seems pretty impressive, the basis of that gist is what Rails 3 puts into your config/boot.rb, environment.rb, and application.rb, just combined into a single file. Did I just blow your mind? Sadly, all the (in my opinion completely undeserved) bad press seems to have made Jose emo as well, and he has stepped down from Rails to pursue his Elixir language.

ActionController::Metal-based applications (along with apps written in Clojure) were the basis of our backend at Strobe, where we sought to ease the pains of people building modern client-heavy HTML5/JS applications with frameworks including SproutCore/Ember, Backbone, and Spine. ActionController::Metal provided a great, fully-featured, mature, and modular platform for us to build applications on top of, and Strobe's ActionController::Metal stack for client-heavy HTML5/JS applications is available on Github. The apps we built with the Strobe ActionController::Metal stack talked only JSON and our frontend was an HTML5/JS application written with SproutCore.

Before Strobe, I worked at a company building rich HTML/JS applications for digital television. Our backend was written in Rails. Our frontends were Flash and HTML/JS applications, the latter of which were single-page client-heavy HTML/JS apps that were packaged in .zip files and installed onto digital televisions and set top boxes, a sort of weird hybrid of web technologies and installable applications. Our Rails application didn't have any views, but provided only a JSON API for the HTML/JS frontend to consume.

Rails was great for this, because it provided the sort of high level abstractions we needed in order to be productive, ensure our application was well-tested, and above all else provided the necessary foundation for clean, maintainable code. I was doing this in 2008, and even then this was already considered old hat in the Rails community. In case you're not paying attention, that's one year before Node even existed.

Modern HTML5/JS apps depend on beautiful, consistent RESTful JSON APIs. This is a great way to develop rich interactive applications, because it separates the concerns of what the backend business logic is doing from the view layer entirely. Separate teams, each specialized in their role, can develop the frontend and backend independently, the frontend people concerned with creating a great user experience, and the backend people concerned with building a great API the frontend application can consume.

Rails is great for JSON APIs.

And yet this meme persists, that somehow Rails is actually bad at JSON APIs. Those who propagate this meme insist that Rails has lost its edge, and that only Node understands the needs of these sorts of modern client-heavy web applications. Giles recently attempted to argue this position:


Giles recently blogged about this issue at length. Let's look at what he has to say about ActionController::Metal and the new level of modularity and clean design that Rails 3 brings to the table:


So Jose wrote a great book about the incredible power of Rails 3's new modular APIs... but... but... but what?

WARD CUNNINGHAM BITCHES. TWEETS > BOOKS. NODE WINS. QED.

Hurrrrrrrr? Ward Cunningham is a cool guy and his concept of a Wiki was a transformative technology for the web, but what the fuck does that have to do with Rails 3's new modular APIs or Jose's book? I think that's what people in logical debate circles call a "non-sequitur".

Perhaps there's still a cogent argument to be had here. Let's dig deeper:


Okay, so the problem is there's not a damn simple way to do websockets. OH WAIT, THERE IS:


Cramp is an awesome, easy-to-use websockets/server-sent events framework (with socket.io support) which runs on Rainbows or Thin, and Thin is a great web server. According to my benchmarks it's approximately the same speed as Node's web server:

Web Server            Throughput  Latency
----------            ----------  -------
Thin    (1.2.11)      8747 reqs/s (7.3 ms/req)
Node.js (0.6.5)       9023 reqs/s (7.1 ms/req)
Yes folks, Node isn't significantly faster than Ruby at web server performance. They're about the same.

Giles also bemoans bundler, because typing "bundle exec" represents ceremony, and using any of the myriad solutions to avoid typing "bundle exec", such as bundler binstubs or rvm gemsets, represents configuration which violates the Rails mantra of "convention over configuration", and how npm is that much easier. I'm sure we would all love to not have to add a one line .rvmrc file to each project to avoid typing "bundle exec", but uhh Giles, bro, mountain out of a molehill much?

Meanwhile, let's check out how convention over configuration is going in the JavaScript world:


But enough about Giles... what kinds of awesome, modern HTML5 applications are people using Rails to build?

I think one of the best examples of this sort of application is Travis CI. Travis is an open source distributed build system with an Ember-based frontend and a Rails backend. Travis's interface shows, in real time, the state of all builds across the entire (distributed) system, allows you to interactively explore the history, see the distributed build matrix completing jobs in realtime, and even have it stream the console output of builds in progress directly to your browser as they complete. It's an amazing, modern client-heavy HTML5/JS application, and it's built on Rails.

Who else is using Ruby/Rails for their frontend? Oh, just Twitter, LivingSocial, Groupon, Heroku, EngineYard, Github, Square, Zendesk, Shopify, Yammer, Braintree, Boundary, Stripe, Parse, Simple, and of course let's not forget 37signals. Rails is the technology underlying the frontend web stack of many huge businesses. Many of these companies have client-heavy HTML5/JS applications which consume a JSON API coming out of Rails. Many of them have APIs that are routinely cited as archetypical RESTful JSON APIs. Many of them have top notch engineering teams that choose the best tools for the job and use many languages for many different tasks. Many of them were founded "post-Node" and had the opportunity to choose Node as their frontend web technology, and while they may use Node in some capacity, their main consumer-facing sites are written with Rails.

Node is three years old now. Where are the Node.js success stories? Who's built a brand on top of Node?  Nodejitsu? Hubot? Is Node anything more than a pyramid scheme or a platform for Campfire bots? Where Rails selling points eschewed performance and instead focused on clear code, rapid development, extensive testing, and quick time-to-market, Node's selling points seem to universally revolve around its insanely fast, destroy the internet fast performance (benchmarks not provided). Meanwhile code quality is de-emphasized and large Node programs degrade into incomprehensible, byzantine structures of callbacks and flow-control libraries, instead of being written in sequential code, you know, the code you can read:

 

What about Ruby in general? What advancements in the Ruby ecosystem are worth getting excited about?

JRuby is maturing into a high-performance Ruby implementation which taps the JVM's advanced features including the HotSpot compiler, multiple pluggable garbage collectors, and parallel multithreading which makes it suitable for multicore applications. One thing I think sets JRuby apart is that it's the most mature language on the JVM which didn't start there. Other projects to implement non-JVM languages on top of the JVM, such as Rhino and Jython, have languished, while JRuby keeps going strong.

The most exciting development in JRuby is Java 7's new InvokeDynamic feature. The Java Virtual Machine was originally designed for the statically-typed Java language, but has its roots in dynamic languages, namely Smalltalk. With InvokeDynamic, the JVM has come full circle and now natively supports dynamic languages like Ruby. InvokeDynamic provides the necessary information to the JVM's HotSpot compiler to generate clean native code whenever Ruby methods are called, in addition to many other potential optimizations. So how much faster will InvokeDynamic make Ruby?


Rubinius, a clean-room Ruby virtual machine based on the Smalltalk-80 architecture, is also a very exciting prospect for the Ruby community as it matures and reaches production quality. It features an LLVM-based JIT compiler, parallel thread execution, and advanced garbage collection, also making it suitable for multicore applications. Beyond being an awesome Ruby implementation, Rubinius has evolved into a true polyglot platform and now features multiple Rubinius-specific language implementations including Fancy and Atomy.

MacRuby also eliminated the GIL from their implementation and now supports parallel thread execution along with an LLVM-based JIT compiler.

There are no less than three Ruby implementations which now support thread-level parallelism and thus multicore CPUs. This is especially relevant in a time when computing is undergoing a sort of phase transition from single-threaded sequential applications to massively multithreaded concurrent applications and distributed systems made out of these multithreaded applications.

It wasn't too long ago that having even four CPU cores in your home computer seemed like a lot, and now 16-core commodity AMD CPUs are available. The future is multicore, and if your programming language doesn't have a multicore strategy, its usefulness is vanishing. Following Moore's Law, the number of cores in a CPU is set to explode exponentially. Is your programming language prepared?

Thanks to JRuby and Rubinius, Ruby can take advantage of multicore CPUs. This still leaves the small matter that multithreaded programming is, uhh, hard. Fortunately I have some ideas about that.

Celluloid is an actor-based concurrent object system that tries to pick up on the concurrent object research that was hot in the mid-90's but died shortly after the web gained popularity. In the '90s concurrent objects were ahead of their time, but with the advent of massively multicore CPUs I believe it's an area of computer science research that's worth reviving.

Celluloid packages up Ruby's core concurrency features into a simple, easy-to-use package that doesn't require any modifications to the language. Where many functional languages solve the issues surrounding concurrency with immutable state, Celluloid solves it with encapsulation (more information is available on the Celluloid github page).

Celluloid takes advantage of many of the features of Ruby, including parallel threads, fibers (coroutines), method_missing (proxy objects), and duck typing. There aren't many other languages with this particular mix of features. Python probably comes the closest, aside from multicore execution due to its GIL. Jython supports parallel thread execution thanks to the JVM but seems abandoned. For what it's worth, Python once had a concurrent object system quite similar to Celluloid back in the '90s called ATOM, unfortunately the source code has been lost.

Ruby is by far the best language available today to implement a system like Celluloid, and that alone makes me excited to be a Rubyist. Where Node.js gives you a hammer, the single-threaded event loop, Celluloid gives you a large toolbox and provides a singular framework of interoperable components which can be used to build arbitrary hybrids of concurrent multithreaded applications, event-based nonblocking applications (that are callback-free!), and distributed systems.

Ruby is a language which can survive the massively multicore future. Whether Node will stick around remains to be seen.