Debugging Rails Cache Writes With Wireshark

Recently, I found myself in a situation where I needed to share memcached entries between an internal legacy rails app and a new rails app. On the surface, this seemed straightforward since both applications could talk to the same memcached servers and be configured to use the same namespace. As is often the case with legacy code, it ended up causing me some grief and I thought I’d step through the process I used to sort it out.

Setup

First let’s take a look at the underlying setup. In both apps, the cache store is configured in config/application.rb. We tell Rails we’re using a :mem_cache_store and pass some basic configuration options. Rails takes care of the details.

1
2
3
4
5
6
# config/application.rb
module OurApp
  class Application < Rails::Application
    config.cache_store = :mem_cache_store, 'server1:11211', 'server2:11211', { 
      namespace: "shared_cache#{Rails.env}"
    }

Note: we’re using Rails.env to namespace the caches and avoid any possibility of keys set in development or test leaking into production and causing issues there. You could also use a null store in those environments, or different servers altogether, but this is a nice failsafe.

One other important detail: if we dig into the implementations of MemCacheStore in Rails 3 and 4+, we find that Rails 3 uses Mike Perham’s memcache-client gem for the underlying implementation. In Rails 4 and up, that’s changed to dalli. This detail will become important later on.

Initial Tests

Now that we’ve got everything set up correctly on paper, we can boot into the rails console in each application and run some manual tests to see if things work as expected.

Let’s start by writing a key from the new application and reading it back out:

~$ bin/rails c
Loading development environment (Rails 5.0.1)
>> Rails.cache.read 'test1'
=> nil
>> Rails.cache.write 'test1', 'val-1'
=> 12620580400784211968
>> Rails.cache.read 'test1'
=> "val-1"

Everything there looks good. Next, we’ll boot into the legacy console and see if we can read the same key:

~$ bundle exec rails c
Loading development environment (Rails 3.2.22.5)
>> Rails.cache.read 'test1'
=> "val-1"

That looks good, as well. Just to cover all our bases, let’s invert the order: we’ll write a key from the legacy app and read it from the new app:

~$ bundle exec rails c
Loading development environment (Rails 3.2.22.5)
>> Rails.cache.read 'test2'
=> nil
>> Rails.cache.write 'test2', 'val-2'
=> true
>> Rails.cache.read 'test2'
=> "val-2"

Lastly, we’ll read that value out of the new rails app:

~$ bin/rails c
Loading development environment (Rails 5.0.1)
>> Rails.cache.read 'test2'
=> "\x04\bI\"\nval-2\x06:\x06ET"

Hmm, that’s not right.

I think I’ve seen this type of string before and that gives me a direction to go with debugging. This looks like a byte stream you’d get after running a value through Marshal.dump. Let’s try to load the value using the Marshal library and see if it works:

>> Marshal.load(Rails.cache.read('test2'))
=> "val-2"

Success! Now we know that the problem is the cache store not unmarshalling the value when it should and we can start trying to figure out why that’s happening.

To Marshal or Not To Marshal

The first thing I’d normally do when faced with a problem like this is start digging through the implementations of the Dalli::Client and MemCache classes and see if I can spot the difference, but I’m not familiar with the codebase for either gem and it’d be nice to have a clue as to what I’m looking for.

One quick test we can run is to capture the memcache requests off the network using tcpdump and compare the two tests above to see what’s different. I ran the following command and saved a packet capture of each cache-write-read variation above:

sudo tcpdump -w /tmp/capture.pcap -i eth0 -vv 'port 11211'

First, let’s compare the captures seen reading the two cache entries from the updated rails app. The main thing that stands out here is the Flags field in the Extras section is different:

Read Rails 3 Entry	Read Rails 4 Entry

Next, let’s compare what the writes look like from both apps.

Write from Rails 3	Write from Rails 4

Alright, so that pretty much settles it. When we write a cache entry from the updated app, the request includes an Extras section with a Flags value that seems to be returned when we read the cache entry back, and that likely triggers the behavior in the Dalli client that runs the value through Marshal.load.

Wrapup

I took a quick spin through the Dalli and memcache-client implementations and came across the following to confirm my assumptions:

The Memcached protocol does, in fact, support setting flags with commands
How client implementations use that field is completely implementation-specific. The memcached server only cares that it’s an integer.
dalli uses the flags value when reading values to know if it should uncompress or unserialize (with Marshal) the value
the memcached-client gem looks to not support flags at all. Even if it did, there’s nothing in the memcached protocol binding it to using the same flag values as dalli

At the end of the day, I decided to change the legacy app to use Dalli as well. There wasn’t much risk in me taking that approach, and it’s always nice to make things forward compatibile rather than the other way around if it can be helped.

Like a lot of low-level things, this was a change I poked at for an hour or so at a time across several days. This post is largely just a compilation of the notes I took along the way, but it hopefully demonstrates that some tools we seldom think to use as an interpreted language developer can be invaluable in helping to quickly sort out problems.