Real-time Rails: Foundations for a Better ActionCable

I'm hoping to get people interested in an idea for bringing more robust real-time data syncing primitives to Rails without pulling in a whole slew of third-party services and new technologies to make it work.

Proposition: ActionCable is not robust

It's a beautiful API, but it suffers from many of the shortcomings described in this article. In my (and my company's) particular experience, the following issues are the worst offenders:

Connection Loss / Reconnection is poorly handled

Any messages broadcast during connection loss are not replayed upon reconnection. This means you have to write your own logic to query or push missed data on the client, the server, or both, perhaps based on some incrementing message sequence or updated_at timestamp for whatever model you're subscribing to. It's a lot of brittle, race-condition-y "configuration" where there could be more convention.
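To make that concrete, here's a minimal sketch of the kind of catch-up logic you end up hand-rolling, assuming a hypothetical Message model with an incrementing seq column and a client that passes along the last sequence number it saw:

class FeedChannel < ApplicationCable::Channel
  def subscribed
    stream_for current_user

    # Hand-rolled catch-up: replay anything committed while the client
    # was disconnected, based on the client-supplied sequence number.
    last_seq = params[:last_seq].to_i
    Message.where(user: current_user)
           .where("seq > ?", last_seq)
           .order(:seq)
           .each { |m| transmit(seq: m.seq, body: m.body) }
  end
end

And even this only covers the server side of a reconnect; you still need client-side code to track the last seen seq and re-subscribe with it.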

Joins, after_commit hooks, observer-y logic

ActionCable exposes stream_for (and stream_from) as a convention-over-configuration-y declarative API for subscribing an ActionCable connection to a stream of updates for a given model/channel (e.g. a user with ID='machty').
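For reference, the declaration and its matching broadcast look roughly like this (standard ActionCable API; the channel name and payload are made up):

class UserChannel < ApplicationCable::Channel
  def subscribed
    # Stream updates for a single record (stream_for) or a named stream (stream_from).
    stream_for current_user
    stream_from "locations:#{params[:location_id]}"
  end
end

# ...and elsewhere, something has to remember to push to those streams:
UserChannel.broadcast_to(user, status: "updated")
ActionCable.server.broadcast("locations:#{user.location_id}", status: "updated")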

Unfortunately, most meaningful payloads broadcast to clients involve some kind of SQL join across multiple tables to produce the JSON sent to a given client; if any one of the tables involved in the join has an update/insert/delete, any live subscriptions / action cables need to be kicked so that they can assemble a freshly queried/joined JSON payload to send to all connected clients.

What this means is that if you're building a real-time feature that involves serializing data from some deeply nested "leaf" table, something in your code has to make sure that changes to that table/model somehow notify all dependent channels that data has changed and new payloads need to be serialized.

One way to do this is with a bunch of after_commit :broadcast_changes on every table whose data might be indirectly serialized in some ActionCable subscription. But that's messy, brittle, and gross, and likely will result in overbroadcasting if multiple records with after_commit hooks are being saved in a single transaction (and afaik there's no concept of debouncing broadcasts in ActionCable).
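Roughly, that looks like the following sketch (the channel and serializer names are made up), repeated on every model that might feed a subscribed payload:

class Comment < ApplicationRecord
  belongs_to :user

  # Repeated on every model whose data might end up in a broadcast payload.
  after_commit :broadcast_changes

  private

  def broadcast_changes
    # Re-serialize and push the parent user's payload. If several records in
    # one transaction fire this hook, the same payload goes out several times.
    UserChannel.broadcast_to(user, PayloadSerializer.new(user).as_json)
  end
end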

You could also make the argument that it's an anti-pattern to use after_commit hooks because it means you're mixing multiple responsibilities into a Rails model, and you'd be half-right: Rails models should NOT have to be sullied by change-broadcasting logic, but then again, it seems obviously bad to choose a solution that opens the door to some, but not all, database changes being broadcast to clients subscribed (directly or indirectly via joins) to a model.

This is why it's not a solution to say "just use a service object to encapsulate both the mutation to the record and the broadcasting of its mutation to subscribed parties". The root problem is keeping data in sync, and if the only way to do that correctly is to always remember to use the one service object that knows how to do the change-and-notify dance, then you're just opening the door to data syncing problems somewhere down the line: either you or some other new developer forgets to use the class, or, pragmatically, you have to heroku console into prod (hopefully super rarely/never) and fix a value, expecting that the fixed value shows up in all real-time apps.
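In other words, something like this sketch, which is only correct as long as every write path (including a one-off console fix) goes through it:

class UpdateComment
  # The change-and-notify dance, which works right up until somebody
  # updates a comment without going through this class.
  def self.call(comment, attrs)
    comment.update!(attrs)
    UserChannel.broadcast_to(comment.user, PayloadSerializer.new(comment.user).as_json)
  end
end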

"Dual writes" as an anti-pattern

Speaking of which, whether you use after_commit hooks all over the place or the service object pattern, both of these are just examples of the Dual Write (anti-)pattern for keeping two data stores in sync, where one data store is your relational database (e.g. Postgres) and the other is ActionCable (and all the clients currently using your real-time app). For an excellent writeup on why this is a doomed-to-fail idea, please read Using logs to build a solid data infrastructure (or: why dual writes are a bad idea).

But in short, ActionCable and many other pub-sub Redis queue-y solutions are just sugar over a Dual Write pattern, and I think we can do better.

My Super Hand-Wave-y Vision for the Future

Recent versions of PostgreSQL, MySQL, MongoDB, Oracle, and I'm sure many others, expose a streaming log of committed database transactions, which should be considered the source of truth for driving real-time subscriptions and live queries. I'll be using Postgres terminology because it is what I'm most familiar with.

Since Postgres 9.4 (released 2014-12-18), you can use logical decoding to subscribe to PG's transaction log and do pretty much anything you'd want with it, such as:

  1. Replicate the data to a completely different database system
  2. Write the logs to a file in some proprietary format
  3. Stream the changes to Kafka and build other decoupled real-time analytics platforms.
  4. ...perhaps use this stream to drive arbitrarily complex live real-time queries in some future version of ActionCable?
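To make the raw mechanics concrete, here's a minimal sketch (using the pg gem and the built-in test_decoding output plugin; the slot and database names are made up) of creating a logical replication slot and polling it for committed changes:

require "pg"

conn = PG.connect(dbname: "myapp_development")

# Create a logical replication slot backed by the built-in test_decoding
# plugin (a real integration would use a plugin that emits structured output).
conn.exec(
  "SELECT * FROM pg_create_logical_replication_slot('live_queries', 'test_decoding')"
)

# Poll the slot: each row is a committed change, delivered in commit order,
# e.g. "table public.users: UPDATE: id[integer]:1 name[text]:'machty'"
loop do
  changes = conn.exec(
    "SELECT * FROM pg_logical_slot_get_changes('live_queries', NULL, NULL)"
  )
  changes.each { |row| puts row["data"] }
  sleep 1
end

This assumes the server is configured with wal_level = logical and at least one free replication slot, which is exactly the kind of configuration hosted providers tend not to expose (more on that below).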

One of the central criticisms of the Lambda Architecture (whereby one part of your codebase handled servicing real-time queries, and a totally separate stack handled data archiving/analytics) was that keeping the different stacks in agreement about how the data was presented was an (obvious in retrospect) extremely difficult problem to solve. But I actually think Rails is uniquely positioned to leverage a lot of the familiar / classic Rails-y patterns in conjunction with transaction log streams to build out what everyone wished ActionCable was.

My basic idea is that we enhance ActionCable so that it is continuously consuming the database's transaction log and servicing live queries. Instead of just stream_for current_user or stream_from "locations:#{user.location_id}", you could imagine a stream_query that combines an AREL query with a serializer to set up a "live query" for broadcasting updates to each user. Something like:

def subscribed
  stream_query User.where(location_id: params[:location_id])
                   .includes(:comments),
               PayloadSerializer
end

Rather than using AREL to perform a database query right now, we're using it to define a live query that ActionCable would be in charge of servicing as database updates come in through the transaction log.

Theoretically, the ActionCable "engine" would be able to intelligently monitor/track subscription-impacting changes coming in via the T-log; in the above example, the ActionCable engine is "taught" to monitor all the following:

  • New/updated/deleted users whose location_id is some static value from the action cable subscription
  • Comments belonging to any of the above users

Any time any of those change, it's a signal that a new payload needs to be sent out to clients, which is where the provided PayloadSerializer comes in.
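Hand-waving over the hard part (deciding which transaction-log changes touch which subscriptions), the hypothetical engine's reaction to an impacting change might look something like this:

# Purely hypothetical: what the engine might do once it decides a change
# in the transaction log impacts a given live query.
class LiveQuery
  def initialize(relation, serializer, subscription)
    @relation = relation          # e.g. User.where(location_id: 7).includes(:comments)
    @serializer = serializer      # e.g. PayloadSerializer
    @subscription = subscription  # the ActionCable subscription to push to
  end

  def refresh!
    # Re-run the declarative query and push a freshly serialized payload.
    payload = @relation.reload.map { |record| @serializer.new(record).as_json }
    @subscription.transmit(payload)
  end
end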

One nice benefit of this is that since AREL queries are declarative/stateless, it would be (hopefully) possible to reuse the same AREL queries and scopes that pre-existing Rails code is already using to service classic HTTP request/responses. This would hopefully prevent a lot of the split-brained APIs/stacks that were inherent in the Lambda Architecture.
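For example, the same scope could back both the classic request/response path and the live query (stream_query being the hypothetical API from above, PayloadSerializer the serializer it was handed):

class User < ApplicationRecord
  scope :at_location, ->(id) { where(location_id: id).includes(:comments) }
end

# Classic request/response:
class UsersController < ApplicationController
  def index
    users = User.at_location(params[:location_id])
    render json: users.map { |u| PayloadSerializer.new(u).as_json }
  end
end

# Hypothetical live query, reusing the exact same scope:
class UsersChannel < ApplicationCable::Channel
  def subscribed
    stream_query User.at_location(params[:location_id]), PayloadSerializer
  end
end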

Another benefit is that you wouldn't have to decide between after_commit and service objects, nor would you be going down the "Dual Write" path to data inconsistency in the process; all responsibility for keeping data in sync would be handled by ActionCable or declarative AREL queries.

Major Caveats

While Postgres has had logical decoding for 2+ years, I haven't found a hosted solution that gives you all the tools you need to make use of it (Heroku Postgres doesn't let you access the WAL in any way whatsoever, and while Amazon RDS' Postgres engine does let you use logical decoding, you're stuck with the built-in "test_decoding" plugin, which is practically the same as not exposing the feature at all).

So to even build something like this you'd likely have to host your own Postgres, which for many startups is a nonstarter to say the least.

Also, I am glossing over so many extremely complex details for query/subscription state management (that would be magically handled by the ActionCable internals) that it's not even funny. But I wanted to at least get some nugget of this idea out into the world to see how folk smarter than I might take it and run with it. Or totally shoot it down. I welcome both.

Inspiration / Resources

Everyone who uses Apache Kafka or Amazon Kinesis or any stream processor driven by a robust, persistent log-based data system.

Things you should read: