Do you need to share data structures between ruby and perl ... FAST?
I recently saw MsgPack bubble through my RSS feeds and tagged it to go have another look. It provides very fast multi-language bindings for serialisation/de-serialisation.
A few quick experiments with the data I share (via memcached) between ruby and perl showed the write time (serialisation + write to memcached) dropping from 20s to 1.8s. Reads (read from memcached + de-serialisation) showed a similar speed-up.
All initial testing was done ruby -> memcached -> ruby, but as soon as I switched to reading from memcached via perl I started getting 'extra bytes' errors on the perl side. I then tried perl -> memcached -> perl and everything was fine.
Weird.
A closer look at the data written to memcached by ruby and then read back by perl showed that the bytes serialised with MsgPack on the ruby end were not the same as the bytes perl read from memcached (which explains the 'extra bytes' error).
Testing the write -> read process from perl to ruby yielded the following error:
/opt/local/lib/ruby/gems/1.8/gems/memcached-0.19.5/lib/memcached/memcached.rb:514:in `load': incompatible marshal file format (can't be read) (TypeError)
format version 4.8 required; 147.1 given
from /opt/local/lib/ruby/gems/1.8/gems/memcached-0.19.5/lib/memcached/memcached.rb:514:in `get'
from ./t_msgpack.rb:35:in `read_test'
from ./t_msgpack.rb:49
Now why on earth would I be getting an 'incompatible marshal file format' error when I am not using the ruby marshalling lib at all?
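Turns out the memcached lib I use marshals ruby data by default whenever you write to or read from memcached. That is most likely the best option in most cases, where you don't want to use some other form of serialisation/de-serialisation, but it was really biting me here: the MsgPack output was being wrapped in a second, ruby-only layer on write, and on read the lib was trying to un-marshal bytes that had never been marshalled.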
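Both symptoms can be reproduced with nothing but ruby's built-in Marshal. The following is a minimal sketch, not the author's test script: the four payload bytes are a hand-written stand-in for MsgPack output (0x93 is the MessagePack marker for a 3-element array), picked as an assumption because 0x93 0x01 is 147 and 1 in decimal, which matches the '147.1 given' in the error above. The actual data is not known.

```ruby
# Stand-in for a MessagePack payload (an assumption, see above):
# 0x93 = array of 3 elements, followed by the small integers 1, 2, 3.
payload = [0x93, 0x01, 0x02, 0x03].pack('C*')

# What a marshal-by-default client would store for that string.
wrapped = Marshal.dump(payload)

# Marshal output always starts with its format version, major 4 and minor 8.
puts wrapped.bytes.first(2).inspect
# The original payload is still there, but with extra bytes in front of it.
puts wrapped.end_with?(payload)

# The other direction: raw MessagePack bytes handed straight to Marshal.load.
# Marshal reads the first two bytes as a version number and rejects them.
begin
  Marshal.load(payload)
rescue TypeError => e
  marshal_error = e.message
end
puts marshal_error
```

A MessagePack decoder given the wrapped bytes would read the leading 0x04 as a complete small integer and be left with trailing data it did not expect, which is presumably where the 'extra bytes' complaint on the perl side comes from.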
The solution is simply to switch off the default behaviour of the memcached lib by using the following forms of get and set, which turn on the lib's 'raw' data handling:
get KEY, false
set KEY, VALUE, TTL, false
The 'false' parameter at the end of each call overrides the default behaviour, turning the automatic serialisation/de-serialisation via Marshal off.
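To make the effect of that trailing flag concrete without needing a running memcached, here is a rough sketch using a hypothetical in-memory stand-in. FakeCache and stored_bytes are invented for illustration and are not part of the memcached lib; the class only imitates the behaviour described above (marshal unless the last argument is false), using the same argument order as the get and set forms shown. The payload is again a hand-written stand-in for MsgPack output.

```ruby
# Hypothetical stand-in for a memcached client, NOT the real lib.
# It marshals values by default and stores them untouched when the
# trailing flag is false, mirroring the get/set forms above.
class FakeCache
  def initialize
    @store = {}
  end

  # ttl is accepted only to keep the argument order; it is ignored here.
  def set(key, value, ttl = 0, marshal = true)
    @store[key] = marshal ? Marshal.dump(value) : value.to_s
  end

  def get(key, marshal = true)
    bytes = @store.fetch(key)
    marshal ? Marshal.load(bytes) : bytes
  end

  # The bytes another client (perl, say) would find under this key.
  def stored_bytes(key)
    @store.fetch(key)
  end
end

payload = [0x93, 0x01, 0x02, 0x03].pack('C*')  # stand-in for MsgPack output

cache = FakeCache.new
cache.set('default', payload, 60)         # wrapped in Marshal
cache.set('raw', payload, 60, false)      # stored exactly as given

default_matches = cache.stored_bytes('default') == payload
raw_matches     = cache.stored_bytes('raw') == payload
round_trip      = cache.get('raw', false)

puts default_matches        # false: other languages see the Marshal wrapper
puts raw_matches            # true: other languages see plain MsgPack bytes
puts round_trip == payload  # true
```

With the real libraries the value passed to set would be the MsgPack-serialised string (for example the result of to_msgpack), and the string returned by get KEY, false would be handed to MessagePack.unpack.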
Reality restored.