The Virtual Commons

building extensible software for collective action research

experiment scheduling

Requirements

The vcweb framework needs to support flexible scheduling of experiments that may run over the course of a month or within a single hour in a controlled computer lab environment. In the latter case a typical experiment run combines timed rounds, where participants make decisions via the web interface, with untimed rounds, where participants read instructions or debriefings, or answer survey / quiz questions, and only move on once the experimenter has made sure everyone is on the same page. To support both timed rounds in controlled settings and long-running experiments, we need some way to signal our web application that a given amount of time has elapsed or that a long-running round (say, a 24-hour round) has completed, so that we can execute our custom experiment-specific logic to calculate results and prepare the data needed as input for the next round (or wrap up the experiment if it's over).

Implementation details: Celery and RabbitMQ scheduling and heartbeat

To meet these requirements, a few options had to be assessed:

  1. go with a custom cron-based solution
  2. use python's threading library
  3. go with some kind of scheduling / event queue mechanism

After some research and reading of tea leaves and tortoise shells, I decided to integrate Celery and RabbitMQ. Celery is our task scheduling library, and RabbitMQ is the high-performance AMQP message broker that Celery uses under the hood. This gives us quite a bit of flexibility: we can schedule periodic tasks, such as a persistent heartbeat with one-second granularity, that dispatch messages to the Django signals specified in core/signals.py. RabbitMQ offers a lot of functionality and appears to be a very mature and robust piece of software implemented in Erlang; we may use it to handle real-time chat within groups in the future. For now, though, we're going to punt on orbited + stomp + rabbitmq integration and eventually follow up on implementing real-time browser interaction (i.e., server push).
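
To make the heartbeat idea concrete, here's a minimal sketch of how a one-second periodic task could dispatch a Django signal. It uses the Celery 2.x-era periodic_task decorator, and the second_tick signal and heartbeat task are hypothetical stand-ins for whatever is actually defined in core/signals.py and core/tasks.py:

    # core/signals.py (hypothetical excerpt)
    import django.dispatch

    second_tick = django.dispatch.Signal()

    # core/tasks.py (hypothetical excerpt)
    from datetime import timedelta

    from celery.task import periodic_task

    @periodic_task(run_every=timedelta(seconds=1))
    def heartbeat():
        # celerybeat enqueues this task once a second; celeryd executes it and
        # dispatches the signal so that experiment-specific round-timing logic
        # can react (e.g. close out a timed round whose deadline has passed).
        second_tick.send(sender='heartbeat')

Experiment handlers can then subscribe with second_tick.connect(...) and check whether any of their timed or long-running rounds have elapsed.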

To set up the scheduling you'll need to install a couple of packages and start three additional services:

  • First, install celery and django-celery via pip or easy_install (ideally inside a virtualenv)
  • celerybeat, executed via python manage.py celerybeat, provides the "heartbeat" for the system by scheduling a single core periodic task that runs every second.
  • celeryd, executed via python manage.py celeryd, is the worker daemon that pulls the periodic tasks generated by celerybeat off the queue and actually executes them.
  • rabbitmq, started via /etc/init.d/rabbitmq-server start, acts as the underlying message broker used by Celery.
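
For reference, the wiring on the Django side looks roughly like the following settings.py sketch. This is a django-celery-era configuration, and the guest/guest credentials on localhost are simply RabbitMQ's defaults rather than what a production deployment should use:

    # settings.py (sketch): django-celery / RabbitMQ wiring
    import djcelery
    djcelery.setup_loader()

    INSTALLED_APPS = (
        # ... the usual Django and vcweb apps ...
        'djcelery',
    )

    # Celery 2.x-style broker settings pointing at the local RabbitMQ server
    BROKER_HOST = 'localhost'
    BROKER_PORT = 5672
    BROKER_USER = 'guest'
    BROKER_PASSWORD = 'guest'
    BROKER_VHOST = '/'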

We may want to revisit these design decisions in the future, as I'm starting to become concerned about the number of external services that need to be installed and configured in order to run the software. At this point, short of switching to a J2EE / BlazeDS or GraniteDS / Flex solution, I don't know what else we could go with, though. Apparently liftweb has some Comet support as well, but the learning curve for Scala + liftweb might be too esoteric, at least more so than Django plus everything we're already using.

Next up is adding real-time server push support so that we can implement group chat that is private to each experiment's group. Another remaining issue is that a one-second heartbeat generates a lot of log messages in rabbitmq, and there doesn't appear to be a way to tune rabbitmq's logging short of disabling it entirely by piping it to /dev/null.