Wednesday, September 1, 2010

Sharing is caring: Ruby, Perl, Memcached and MsgPack

Do you need to share data structures between Ruby and Perl ... FAST?

I recently saw MsgPack bubble through my RSS feeds and tagged it to go have another look. It provides very fast multi-language bindings for serialisation/de-serialisation.

A few quick experiments with the data I share (via memcached) between ruby and perl showed write time (serialisation + write to memcached) dropping from 20s to 1.8s. Read time (read from memcached + de-serialisation) improved by a similar margin.

All initial testing was done ruby -> memcached -> ruby, but as soon as I switched to reading from memcached via perl I started getting 'extra bytes' errors on the perl side. I then tried perl -> memcached -> perl and everything was fine.

Weird.

A closer look at the data written to memcached and then read from perl showed that the data serialised with MsgPack on the ruby end was not the same as the data read by perl from memcached (validating the 'extra bytes' error).

Testing the write -> read process from perl to ruby yielded the following error:

/opt/local/lib/ruby/gems/1.8/gems/memcached-0.19.5/lib/memcached/memcached.rb:514:in `load': incompatible marshal file format (can't be read) (TypeError)
format version 4.8 required; 147.1 given
from /opt/local/lib/ruby/gems/1.8/gems/memcached-0.19.5/lib/memcached/memcached.rb:514:in `get'
from ./t_msgpack.rb:35:in `read_test'
from ./t_msgpack.rb:49

Now why on earth would I be getting an 'incompatible marshal file format' error when I am not using the ruby marshalling lib at all?

Turns out the memcached lib I use turns marshalling of ruby data on by default when you write to/read from memcached. This is most likely the best option for most cases where you don't want to use some other form of serialisation/de-serialisation but was really biting me here.
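To see why the error mentions a version of 147.1, here is a minimal sketch using only Ruby's built-in Marshal. The four bytes are a hand-built MessagePack encoding of [1, 2, 3], chosen purely for illustration (it is not my real data); it happens to start with bytes 147 and 1, which Marshal.load tries to read as a format version.

```ruby
# Marshal.dump always starts with two version bytes: major 4, minor 8.
marshalled = Marshal.dump([1, 2, 3])
marshal_version = marshalled.bytes.first(2)

# Hand-built MessagePack encoding of [1, 2, 3] (illustrative payload only):
# 0x93 = array of three elements, followed by the integers 1, 2, 3.
msgpack_bytes = [0x93, 0x01, 0x02, 0x03].pack("C*")

# Marshal.load reads the first two bytes as a version number and gives up
# because they are not 4 and 8.
load_error = begin
  Marshal.load(msgpack_bytes)
  nil
rescue TypeError => e
  e.message
end

puts "Marshal header: #{marshal_version.inspect}"
puts "Marshal.load on MsgPack bytes: #{load_error}"
```

So the "version" in the error is most likely just the first two bytes of whatever perl wrote, misread as a Marshal header.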

The solution is simply to stop the default behaviour of the memcached lib by using the following forms of get and set, which turn on the lib's 'raw' data handling switch:

get KEY, false
set KEY, VALUE, TTL, false

The trailing 'false' parameter overrides the default behaviour, turning the automatic serialisation/de-serialisation via Marshal off.
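A rough sketch of what that switch changes on the wire. FakeCache below is a hypothetical in-memory stand-in, not the memcached gem; it only mimics the get/set argument order shown above so the example runs with nothing but the standard library. The payload is again a hand-built MessagePack byte string used for illustration.

```ruby
# Hypothetical in-memory stand-in for a memcached client. It only mimics the
# get(key, marshal) / set(key, value, ttl, marshal) shape described above.
class FakeCache
  def initialize
    @store = {}
  end

  # ttl is accepted for signature parity but ignored in this sketch.
  def set(key, value, ttl = 0, marshal = true)
    @store[key] = marshal ? Marshal.dump(value) : value
  end

  def get(key, marshal = true)
    bytes = @store.fetch(key)
    marshal ? Marshal.load(bytes) : bytes
  end

  # What another client (perl, for example) would actually read back.
  def stored_bytes(key)
    @store.fetch(key)
  end
end

payload = [0x93, 0x01, 0x02, 0x03].pack("C*")  # pre-serialised MsgPack bytes

cache = FakeCache.new
cache.set("wrapped", payload, 60)         # default: Marshal wraps the payload
cache.set("raw", payload, 60, false)      # raw: bytes stored untouched

wrapped_bytes = cache.stored_bytes("wrapped")
raw_bytes     = cache.stored_bytes("raw")

puts "wrapped: #{wrapped_bytes.bytesize} bytes, header #{wrapped_bytes.bytes.first(2).inspect}"
puts "raw:     #{raw_bytes.bytesize} bytes, same as payload: #{raw_bytes == payload}"
```

With the default, the stored value carries a Marshal header and framing around the MsgPack bytes, which would explain the 'extra bytes' perl complained about. With the raw form, both sides see exactly the bytes MsgPack produced.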

Reality restored.
