Sunday, 15 November 2009

100,000 tasklets: Erlang and Go

Purely out of interest, and to see how Erlang and Google's Go compare on my puny laptop (a Dell Mini 10v), I wrote an Erlang version of the example application on 100,000 tasklets: Stackless and Go.

Basically, it creates a chain of 100,000 microthreads (tasklet, goroutine, process ... take your pick), sends a value in one end and waits for the result at the other end. The number is incremented by each microthread it passes through.
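To make the shape of that chain concrete, here is a rough Python sketch. It is not one of the benchmarked programs: it assumes ordinary OS threads and queue.Queue in place of real microthreads, so it only makes sense for much shorter chains, and the names link and run are just illustrative.

```python
import queue
import threading


def link(inbox, outbox):
    """One link in the chain: wait for a number, add one, pass it on."""
    outbox.put(inbox.get() + 1)


def run(num):
    """Build a chain of `num` links and send 0 through it."""
    result = queue.Queue()  # the far end of the chain, read by the caller
    downstream = result
    for _ in range(num):
        inbox = queue.Queue()
        threading.Thread(target=link, args=(inbox, downstream), daemon=True).start()
        downstream = inbox
    downstream.put(0)    # drop the value in at the head of the chain
    return result.get()  # blocks until the last link has passed it along
```

Calling run(100) builds 100 links and returns the value after every link has incremented it once.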

The Erlang code comes in two parts. Firstly, a chain.erl module:


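-module(chain).
-export([run/1]).

%% Build the chain, send 0 in at one end and wait for the result at the other.
run(Num) ->
    Tail = chain(Num, self()),
    Tail ! 0,
    receive Result -> Result end.

%% Spawn Num processes, each one forwarding to the process created before it.
chain(0, Tail) ->
    Tail;
chain(Num, Tail) ->
    chain(Num-1, spawn(fun() -> f(Tail) end)).

%% Wait for a number, add one and pass it on.
f(Tail) ->
    receive
        Num -> Tail ! Num+1
    end.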

And secondly a simple escript to start it off (mostly to make it easy to run under time):

#!/usr/bin/env escript
%%! +P 1000000 -smp disable

main([]) ->
    Result = chain:run(100000),
    io:format("~p~n", [Result]).

And the run times:

$ time ./chain 

real 0m1.520s
user 0m1.012s
sys 0m0.468s
$ time ./go-chain 

real 0m3.371s
user 0m1.672s
sys 0m1.000s

A couple of things to point out/mention:

  • The Erlang code is just beautiful (well, maybe not the escript so much ;-)). To me, it's more readable than either the Python or Go versions.
  • I turned SMP off. Yes, it's an optimisation but then the tasks are running in series so it's never going to help.
  • The Go version was compiled and linked using 8g and 8l.
  • The Go version didn't always complete and sometimes took a *very* long time ... just not when running under time for some reason.
  • Go seemed to use about 3x as much memory.

What does this prove? Absolutely nothing! Firstly, it's an unrealistic application. Also, Go is really quite new and I'm sure performance and memory use will improve in the coming months.

So why did I do this? Simply, because I really enjoy playing with Erlang and Go is interesting and a hot topic. (In my opinion anything with concurrency built into the runtime is onto a good thing ... I sure wish we didn't have to resort to Twisted, Stackless, greenlets or generator hacks in Python.)

Thursday, 22 October 2009

CouchDB and XML

Everyone knows by now that CouchDB is a JSON document store, right? What's not quite so obvious is that it handles XML very nicely too via E4X. Strictly speaking, it's the Mozilla Spidermonkey JavaScript engine that provides the E4X support but that's CouchDB's default view server.

I'm calling out to a 3rd party SOAP service and the result is returned as an XML string (don't blame me for that, it's what their WSDL file says). I've been tossing that result into a CouchDB doc - it's so easy, you might as well - in case I needed to refer to it later.

That time has come as I now need to be a bit more selective about how I handle the result. So, I fired up CouchDB's Futon and wrote a view that emitted bits of the JSON together with the interesting parts of the XML blob. Now I have a simple view of the data without all the XML "noise". Very nice :).

What's best about this is that using CouchDB, XML and E4X together is trivial and E4X is compact and quick to learn.
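The view itself was JavaScript using E4X, but the idea of pulling a few fields out of an XML string stored in a JSON document is easy to sketch in Python. Everything specific here is an assumption made up for illustration: the field names (soap_result, requested_at), the XML layout and the map_doc function, which simply mimics a map function calling emit().

```python
import xml.etree.ElementTree as ET


def map_doc(doc):
    """Yield (key, value) rows, much as a CouchDB map function calls emit()."""
    result = ET.fromstring(doc["soap_result"])  # the XML stored as a string
    yield doc["_id"], {
        "requested_at": doc["requested_at"],               # plain JSON field
        "status": result.findtext("status"),               # picked out of the XML
        "reference": result.findtext("detail/reference"),  # nested element
    }


# A made-up document of the kind described above.
doc = {
    "_id": "order-1",
    "requested_at": "2009-10-22T10:00:00Z",
    "soap_result": "<result><status>ok</status>"
                   "<detail><reference>ABC123</reference></detail></result>",
}
rows = list(map_doc(doc))
```

Each row carries only the interesting bits, leaving the rest of the XML behind in the stored document.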

Wednesday, 30 September 2009

Build CouchDB on Ubuntu 9.10 (Karmic Koala)

Ubuntu 9.10 includes CouchDB in the standard desktop installation ... perhaps that's all you need?

Personally, I like to track CouchDB trunk but building on the latest Ubuntu is not quite as easy as it used to be. The following is mostly from memory so may not be quite right ...

The list of packages in the CouchDB README to compile from source does not work (at least, not for me). Instead, ensure your apt sources are up to date and install the following packages:

$ sudo apt-get install build-essential erlang-nox erlang-dev libicu-dev xulrunner-dev libcurl4-openssl-dev

Next, configure CouchDB to use xulrunner's headers and libraries, using a custom LD_RUN_PATH:

$ LD_RUN_PATH=/usr/lib/xulrunner- ./configure --with-js-lib=/usr/lib/xulrunner-devel- --with-js-include=/usr/lib/xulrunner-devel-

And then make with the same LD_RUN_PATH:

$ LD_RUN_PATH=/usr/lib/xulrunner- make

Finally, make install CouchDB as usual and you're ok to start the CouchDB server.

Note that if Ubuntu's standard CouchDB package is also installed you may need to avoid your install clashing with it. Edit your install's etc/couchdb/local.ini and set port in the httpd section to something different. I use 15984 but it doesn't really matter as long as it's unique.

Friday, 18 September 2009

Tornado ... first thoughts

It's quite amusing to see the furore that's surrounded Tornado's release, especially how it compares to Twisted.

I like Twisted a lot although I'm far from a fan boy. Deferreds are not ideal but without proper coroutines (stackless, greenlet, etc) or message dispatch (see Erlang) in the core language they're a reasonable way to model async processes.

Anyway, a colleague asked me about my opinion on Tornado the day it was released. I'd only looked at it very briefly by then (scanned the docs and scanned the code) but I'm never short on opinion ;-).

So, mostly for posterity (there are far more in-depth posts elsewhere), below is my response almost verbatim. I'm sure to be wrong about some points, and I have no problem admitting that, but I've since seen comments that confirm at least some.

Only had a quick look at Tornado yesterday; I intend to look more
closely sometime. However, a few thoughts did spring to mind ...

Simple web page looks nice and simple ... as it should.

Can't get too excited about the url dispatch and request handler.
Nothing new to see, afaict.

Performance is quite impressive but I was surprised that 4 cores
wasn't even close to 4x the performance of a single core. Probably
something else limiting it but not very clear from the chart.

No Twisted performance comparison? I suspect Tornado is a little
faster but then it's much more focussed on what it's trying to do.

Tornado's a callback-style framework. However, they don't seem to
bother handling errors (at least not in any consistent way). I think
that's a really bad idea. Sure, Twisted's Deferred is not ideal but I
think it's necessary. Sure, they could add an errback to every
function too ... but they don't seem to have bothered yet so they're
going to end up with an inconsistent way of reporting errors at best.
(The iostream module, for instance, appears to silently dump errors).

Their database module sucks big time. As far as I can see it blocks.
What good is that in an async framework!? It's MySQL only. Also ...
autocommit(True) ... argh!!!!

There's absolutely nothing in the docs about how to call those pesky
blocking libraries, you know like the entire Python stdlib or MySQL,
and there's no threading support afaict.

template library ... hohum.

locale support ... "Loads translations from CSV files in a directory"
... enough said.

OK, too many negatives so far ...

auth module looks very cool. i'd love to be able to support all those

web "UI modules". looks interesting, need to take a closer look.

XSRF. nice, and now I know how to handle it without sessions ... use a
cookie instead, duh!

static file serving ... i like the automatic far-future expires + file
version. i've been meaning to write something to do that for a while
now only I was going to use the file's timestamp instead of the
content hash (much better).
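
The content-hash versioning mentioned in that last point can be sketched in a few lines. This is only an illustration of the idea, not Tornado's code: the function name, the ?v= query parameter and the choice of SHA-256 are all assumptions.

```python
import hashlib


def versioned_url(path, content):
    """Append a short hash of the file's bytes so the URL changes with the content."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    return "%s?v=%s" % (path, digest)


old = versioned_url("/static/site.css", b"body { color: black }")
new = versioned_url("/static/site.css", b"body { color: navy }")
```

Because the URL only changes when the bytes change, the file can be served with a far-future Expires header. A timestamp would also change when a file is merely touched or redeployed, which is why the hash is the better choice.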

Tuesday, 10 March 2009

Ten Tracks - 10 music tracks for £1

Looks like Ten Tracks only launched late last year ... but wow, what a fantastic site for music lovers.

Basically, there are a bunch of channels, each channel publishes 10 tracks per month (almost, it's early days yet) and we, the music buying public, get to buy all 10 tracks for just £1.

Thank you Toob for telling me about the site and also for allowing Ten Tracks to distribute one of your tracks in this month's Open Ear channel.

Friday, 13 February 2009

World of Goo on Linux ... hurray!

2D Boy have just released the Linux version of World of Goo. As if that wasn't good enough, there are even .deb packages available.

A wonderful game just got a bit more wonderfuller (and yes, that is a real word ;-)).

Wednesday, 7 January 2009

restish resources ... from the ground up

I've been working on a WSGI web framework called restish. Yes, another one ... sorry! Only, this one's got a serious preference for trying to stay close to HTTP and the way of the web, and therefore encourages REST principles.

restish is really simple and, in my opinion, a pleasure to use. It's also extremely light-weight compared to some other web frameworks, mostly because it doesn't actually attempt to do that much :).

One feature that will hopefully be of interest is that there is no reliance on threads. There's no thread local use anywhere. In fact, they're banned; I despise the things! That allows a restish app to run happily inside a threaded Paste Deploy server, a Spawning server in greenlet mode (i.e. with threads switched off), or potentially any other web server. Basically, *you* choose how you want to deploy it. If you decide to use a threaded model then fine, but why should the web framework dictate to you from the start?

Anyway enough, let's see some code. I thought I'd try to demonstrate the basic idea of restish resources. (I do intend to move all this to the restish documentation at some point but that will take longer than an informal blog post.)

Oh, all the following bits of code *should* run. If you want to try them out, the 'application' is actually a WSGI application. Running under Spawning is as easy as:

$ spawn module_name.application

The simplest resource imaginable

from restish import app, http

def hello_world(request):
    return http.ok([('Content-Type', 'text/plain')], 'Hello, world!')

application = app.RestishApp(hello_world)

OK, so there's this thing called a RestishApp. It's just a WSGI application that kicks off the request handling process. Nothing too interesting there. When it's created it's passed the root resource for the site.

A resource, at its simplest, is something callable that takes a http.Request instance as its only arg and returns a http.Response instance. You can build a http.Response yourself but the http module provides some response factories to simplify application code and save a bit of typing.
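
To show how little is needed for "a callable that takes a request and returns a response", here is a self-contained sketch that does not use restish at all. Request, Response, ok and make_wsgi_app are hypothetical stand-ins invented for this example; they are not restish's classes or its implementation.

```python
from collections import namedtuple

# Hypothetical stand-ins for http.Request and http.Response.
Request = namedtuple("Request", "method path")
Response = namedtuple("Response", "status headers body")


def ok(headers, body):
    """Response factory, in the spirit of http.ok."""
    return Response("200 OK", headers, body)


def make_wsgi_app(root_resource):
    """Wrap a resource callable as a WSGI application."""
    def application(environ, start_response):
        request = Request(environ["REQUEST_METHOD"], environ.get("PATH_INFO", "/"))
        response = root_resource(request)  # the resource does all the real work
        start_response(response.status, response.headers)
        return [response.body.encode("utf-8")]
    return application


def hello_world(request):
    return ok([("Content-Type", "text/plain")], "Hello, world!")


application = make_wsgi_app(hello_world)
```

The wrapper only translates between WSGI and the resource; everything interesting lives in the callable it is given.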

The Resource class

from restish import app, http, resource

class HelloWorld(resource.Resource):
    def __call__(self, request):
        return http.ok([('Content-Type', 'text/plain')], 'Hello, world!')

root_resource = HelloWorld()
application = app.RestishApp(root_resource)

You weren't really expecting anything interesting so soon were you? ;-)

Most of the time using a function as a resource is too limiting so restish provides a Resource class. It has some magic abilities as we'll see later but, just like the hello_world(request) function above, it's basically something callable.

Request parameters

from restish import app, http, resource

class Users(resource.Resource):
    def __call__(self, request):
        username = request.GET.get('username') or 'anonymous'
        return http.ok([('Content-Type', 'text/plain')], 'Hello, %s!'%(username,))

root_resource = Users()
application = app.RestishApp(root_resource)

I hope no one's impressed by that code. In fact, I'm not even going to describe it but would like to point out that passing the username as a URL segment is almost certainly a nicer way to do things. So, moving swiftly on ...

Resource children

from restish import app, http, resource

class Users(resource.Resource):

    def __call__(self, request):
        doc = "matt: %s" % (request.path.child('matt'),)
        return http.ok([('Content-Type', 'text/plain')], doc)

    @resource.child()
    def matt(self, request, segments):
        return user

def user(request):
    return http.ok([('Content-Type', 'text/plain')], 'Hello, matt!')

root_resource = Users()
application = app.RestishApp(root_resource)

The Users resource (the root of the site) returns a document that looks like, "matt: /matt", where "/matt" is the URL of the "matt" resource. Notice how the URL for the matt resource is created? 'request.path' is a url.URL instance - a smart string that knows how to parse and manipulate URLs, e.g. by adding a child segment. http.Request has a few URL instance attributes.

The 'matt' method has a @child decorator to expose it as a child resource factory. By default @child() uses the name of the decorated method as the name of the segment it matches so here it will be called to create a resource for the 'matt' child, i.e. the thing at the URL '/matt'.

(You can pass an explicit segment name to @child instead, e.g. @child('matt'), allowing you to call your method whatever you want.)

Statically-named children are useful and quite common but dynamically-named children are more interesting.

Dynamically named resource children

from restish import app, http, resource

USERS = ['alice', 'matt', 'rebecca']

class Users(resource.Resource):

    def __call__(self, request):
        doc = '\n'.join(['%s: %s' % (username, request.path.child(username)) \
                         for username in USERS])
        return http.ok([('Content-Type', 'text/plain')], doc)

    @resource.child('{username}')
    def child_user(self, request, segments, username):
        if username in USERS:
            return User(username)

class User(resource.Resource):

    def __init__(self, username):
        self.username = username

    def __call__(self, request):
        return http.ok([('Content-Type', 'text/plain')], 'Hello, %s!'%(self.username,))

root_resource = Users()
application = app.RestishApp(root_resource)

We now have a "database" of users. OK, so it's just a list of usernames but you get the idea. The Users resource returns a document containing a list of users, each with their URL.

This time, the @child decorator has been passed a segment match template. '{username}' means match a single URL segment, extract the segment and pass it to the method as the username keyword arg.

The child_user method returns a User resource instance, giving it the username the resource represents, or None to signal a 404.
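
For a feel of what a segment match template does, here is a rough standalone sketch that turns a template like '{username}' into a regular expression and extracts keyword args from a URL segment. It is an assumption about how such matching could be done, not restish's actual implementation, and compile_template and match_segment are hypothetical names.

```python
import re


def compile_template(template):
    """Turn a template such as '{username}' into a regex matching one URL segment."""
    # re.split with a capture group alternates literal text and placeholder names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        "(?P<%s>[^/]+)" % part if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    return re.compile("^%s$" % pattern)


def match_segment(template, segment):
    """Return the keyword args extracted from `segment`, or None if it doesn't match."""
    match = compile_template(template).match(segment)
    return match.groupdict() if match else None
```

With this, match_segment('{username}', 'matt') gives the dict that would be passed to the method as keyword args, and a segment that doesn't fit the template gives None.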

If you think the User class is a bit "heavy" then, no problem, use a partial function instead, or a lambda if you prefer:

import functools

class Users(resource.Resource):
    @resource.child('{username}')
    def child_user(self, request, segments, username):
        if username in USERS:
            return functools.partial(user, username)

def user(username, request):
    return http.ok([('Content-Type', 'text/plain')], 'Hello, %s!'%(username,))

Request methods

So far, every resource would respond in exactly the same way for all HTTP methods. It doesn't differentiate between GET, POST, PUT, DELETE, etc. Let's fix that now.

from restish import app, http, resource

USERS = ['alice', 'matt', 'rebecca']

class Users(resource.Resource):

    @resource.GET()
    def text(self, request):
        doc = '\n'.join(['%s: %s' % (username, request.path.child(username)) \
                         for username in USERS])
        return http.ok([('Content-Type', 'text/plain')], doc)

    @resource.child('{username}')
    def child_user(self, request, segments, username):
        if username in USERS:
            return User(username)

class User(resource.Resource):

    def __init__(self, username):
        self.username = username

    @resource.GET()
    def text(self, request):
        return http.ok([('Content-Type', 'text/plain')], 'Hello, %s!'%(self.username,))

root_resource = Users()
application = app.RestishApp(root_resource)

The only difference here is that we've replaced the resource's __call__ method with a nicely named method decorated with @resource.GET(). Now the resources only respond to a HTTP GET; anything else returns a "405 Method Not Allowed" response.

$ curl -X GET http://localhost:8080/
alice: /alice
matt: /matt
rebecca: /rebecca
$ curl -X POST http://localhost:8080/
405 Method Not Allowed
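
The dispatch idea can be illustrated without restish. The sketch below is a deliberately tiny assumption about how a method decorator and a 405 fallback could fit together; the GET decorator and Resource class here are hypothetical stand-ins, not restish's code, and they ignore requests and headers entirely.

```python
def GET(func):
    """Mark a method as the handler for HTTP GET (hypothetical decorator)."""
    func.http_method = "GET"
    return func


class Resource:
    def __call__(self, method):
        # Look for a method tagged with the requested HTTP method.
        for name in dir(self):
            handler = getattr(self, name)
            if callable(handler) and getattr(handler, "http_method", None) == method:
                return handler()
        return "405 Method Not Allowed"


class Users(Resource):
    @GET
    def text(self):
        return "200 OK"
```

Calling Users()("GET") reaches the decorated method, while any other method name falls through to the 405.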

I've actually just sneakily introduced some content negotiation too. Not only does @resource.GET() match the HTTP method but it also matches the request's "Accept" header. However, GET defaults to an "Accept" match '*/*', i.e. any content type the client asks for.

Content negotiation ... at last

I mentioned above that decorating with @GET also performs '*/*' content negotiation. We can easily configure a resource to handle requests for different content types.

import simplejson
from restish import app, http, resource

USERS = ['alice', 'matt', 'rebecca']

class Users(resource.Resource):

    @resource.GET(accept='text/plain')
    def text(self, request):
        doc = '\n'.join(['%s: %s' % (username, request.path.child(username)) \
                         for username in USERS])
        return http.ok([], doc)

    @resource.GET(accept='application/json')
    def json(self, request):
        users = [{'username': username, 'url': request.path.child(username)} \
                 for username in USERS]
        return http.ok([], simplejson.dumps(users))

    @resource.child('{username}')
    def child_user(self, request, segments, username):
        if username in USERS:
            return User(username)

class User(resource.Resource):

    def __init__(self, username):
        self.username = username

    @resource.GET(accept='text')
    def text(self, request):
        return http.ok([], 'Hello, %s!'%(self.username,))

    @resource.GET(accept='json')
    def json(self, request):
        doc = simplejson.dumps({'username': self.username, 'url': request.path})
        return http.ok([], doc)

root_resource = Users()
application = app.RestishApp(root_resource)

This time we have 'text' and 'json' methods, decorated with @GET(accept='text/plain') and @GET(accept='application/json') respectively. Now we have a resource that will look at the Accept header, find the best matching method and call it. No match results in a "406 Not Acceptable" error.

$ curl -H "Accept: text/plain" http://localhost:8080/
alice: /alice
matt: /matt
rebecca: /rebecca
$ curl -H "Accept: application/json" http://localhost:8080/
[{"username": "alice", "url": "/alice"}, {"username": "matt", "url": "/matt"}, {"username": "rebecca", "url": "/rebecca"}]
$ curl -H "Accept: text/html" http://localhost:8080/
406 Not Acceptable

Note that the resource no longer has to specify Content-Type headers. That's because the Accept matching process knows what it found and fills it in for you ... how kind :). (Don't worry, you can still include the Content-Type in the response headers if you want to handle it yourself.)

Note also that the User resource uses shorthand in the form of @GET(accept='text') and @GET(accept='json'). They're expanded to the full MIME type on your behalf and so work just the same. Frankly, typing 'application/json' is tedious and 'application/xhtml+xml' is perverse ;-).
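
For anyone curious what "find the best matching method" involves, here is a simplified, standalone sketch of Accept header matching. It is an assumption-laden illustration rather than restish's algorithm: it honours q values and wildcards but ignores the finer specificity rules, and parse_accept, matches and best_match are made-up names.

```python
def parse_accept(header):
    """Return [(media_range, q)] from an Accept header, highest q first."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0  # a range with no q parameter counts as fully acceptable
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        ranges.append((parts[0], q))
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range, content_type):
    """True if a range such as 'text/*' or '*/*' covers `content_type`."""
    range_type, range_sub = media_range.split("/")
    ctype, csub = content_type.split("/")
    return range_type in ("*", ctype) and range_sub in ("*", csub)


def best_match(header, available):
    """Pick the first available type the client accepts, else None (a 406)."""
    for media_range, q in parse_accept(header):
        if q == 0:
            continue  # q=0 means "explicitly not acceptable"
        for content_type in available:
            if matches(media_range, content_type):
                return content_type
    return None
```

Given the two types offered by the Users resource, a request for 'application/json' picks the json method, '*/*' picks whichever is listed first, and 'text/html' matches nothing.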

Well, that's all for now, although there are a few other things I wanted to mention. A quick list will have to do:

  • wildcard accept matching, e.g. 'image/*'

  • PUT, POST, DELETE, etc

  • Content-Type header matching (basically the same as Accept matching but for data sent from the client)

  • Handling multiple content types with one method, e.g. @GET(accept=['html', 'xhtml'])

  • @child URL matching in general

  • @child that matches any URL

  • Consuming additional URL segments during traversal

Hope someone finds this post interesting!

Saturday, 3 January 2009

The Royal Institution Christmas Lectures

I really enjoyed watching the Royal Institution Christmas Lectures, presented by Chris Bishop. I don't think I've actually watched them properly since I was a kid!

We sat down as a family every night to watch and it clearly sparked some interest from the kids. We've had the cover off the computer looking at its innards, we've been playing with Phun (the kids remembered it after seeing the multi-touch displays), we played with GNOME's Dasher. Heck, we even touched on public key encryption.

Sure, there were a couple of bits that weren't so good. In particular, Bill Gates was seriously dull and the programme on software, The Ghost in the Machine, was not exactly great (the kids still don't know what I do ;-)) but overall, fantastic.

If only all TV was that interesting!

GET and idempotence

I was reminded today of what seems to be a common misunderstanding of the idempotent requirements of a GET in a REST-ful architecture. GET doesn't mean the response must be the same each time; only that it must have no side effects, i.e. it should never cause the server's state to change.

For instance, CouchDB includes a resource, /_uuids, that returns a number of server-generated UUIDs. As far as I know, it only exists to support languages without a decent UUID library. It has no effect on the server.

However, CouchDB will only respond when /_uuids is POST'ed to:

$ curl -X GET http://localhost:5984/_uuids?count=2
$ curl -X POST http://localhost:5984/_uuids?count=2

One suggestion on the mailing list (although not from one of CouchDB's core developers) for the use of POST is to, "comply with REST as it returns a different output each time".

A GET would be just fine here. In fact, it would be more in keeping with the intended use of the HTTP methods.

It's easy to come up with examples of a REST-ful resource that sends a different response every request, with probably the most obvious being some sort of time server. Consider the following URLs:
  • /time
  • /time/BST
  • /time/EST
  • etc
You would GET the current time from an appropriate resource and I sure hope there will be a different response each time ;-). You could also PUT a time to one of those resources to set the time.
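
A toy version of that time resource makes the distinction concrete. This is only a sketch under obvious assumptions: TimeResource is a plain Python class, not tied to any web framework, and "setting the time" is modelled as an offset held in memory.

```python
import datetime


class TimeResource:
    """A toy /time resource: GET has no side effects, PUT changes server state."""

    def __init__(self):
        self.offset = datetime.timedelta(0)  # server state: adjustment to the clock

    def GET(self):
        # Reads the clock and changes nothing, yet the answer differs on every call.
        now = datetime.datetime.now(datetime.timezone.utc)
        return (now + self.offset).isoformat()

    def PUT(self, new_time):
        # Alters state: every later GET is affected by the new offset.
        self.offset = new_time - datetime.datetime.now(datetime.timezone.utc)
```

However many times GET is called, the stored offset stays the same; only PUT alters what the server will say next.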