
A guest post by Timothy Pratley, who currently works for Tideworks Technology as a Development Manager building Traffic Control software for logistics clients including SSA Marine, CSX, and BNSF.

Have you ever coded away on a great idea, and reached the point where you needed to store some data? It can be a buzzkill when you consider how the choices you are making in storage will affect your code, and how the storage target might change in the future. Should I design a database schema? Do I want to use a database? If my storage strategy changes, will I have to refactor? In this post I share with you why I default to event sourcing, and how simple it is to do in Clojure.

The Event Sourcing pattern is designed to save every change that is made to your domain model. You can reconstruct the domain model by replaying all of the events. To do this effectively requires discipline. Every change to your domain model must be done with commands that raise events. Commands validate that the caller is producing a valid update, and produce an event that represents the update. The event is sent to a pipeline that stores, publishes, and applies the update to the domain model.

Commands and events

A command validates whether an update can be done, and if so, it gathers up all of the information required to perform the update, and raises this information as an event. In Clojure we represent data in hashmaps in preference to defining structures or objects, so my events will be hashmaps that have the data required to perform the desired transform (Note that the source code for this post is available at https://github.com/timothypratley/cleventing):
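
As a minimal sketch (not the listing from the repository), a command might look like the following. The names `create-thing` and `:thing-created`, the shape of the world, and the stub `raise!` are assumptions made for illustration; the stub only returns the event so the example runs on its own.

```clojure
;; Hypothetical stand-in for the event pipeline: it tags the data with
;; its event type and returns it, nothing more.
(defn raise! [event-type data]
  (assoc data :event event-type))

;; A command validates the request, then raises an event that carries
;; everything needed to perform the update.
(defn create-thing [world id label]
  (when (contains? (:things world) id)
    (throw (ex-info "Thing already exists" {:id id})))
  (raise! :thing-created {:id id :label label}))

(create-thing {:things {}} 1 "widget")
;; => {:id 1, :label "widget", :event :thing-created}
```

The event is plain data, so it can be printed, stored, and compared like any other hashmap.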

Commands look so straightforward that you might be tempted to forget about them and just call raise where you need it. I prefer to keep all of the event processing and commands in pairs together in the same namespace for easy comparison, since the event processing method will be expecting matching fields in the event.

Defining the update itself is the important part. For every event type we define a function that transforms the current domain state into a new domain state. Instead of having the command call the event processor directly, we hand the event off to a pipeline. The pipeline needs to know which function to call for each event type.

In Clojure, a polymorphic dispatch can be done with multimethods. We define a signature for accepting a world and an event. Implementations of accept will perform a data transform, returning a new world. Raising events will call accept, which will dispatch to the appropriate implementation. When defining the signature of a multimethod, we provide a function that returns the :event property from an event hashmap (which will be the event type):
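
A sketch of such a signature, assuming the event type lives under the `:event` key as described above:

```clojure
;; The dispatch function receives the same arguments as accept and
;; returns the event type, which selects the implementation to run.
(defmulti accept
  (fn [world event] (:event event)))
```

With no implementations defined yet, calling `accept` throws an `IllegalArgumentException`, because no method matches the dispatch value.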

Implementing event handlers is a matter of conforming to the signature and providing the event type that it is appropriate for:
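
For illustration, here are two handlers for hypothetical event types (`:thing-created` and `:thing-incremented` are assumptions, not names from the repository). The `defmulti` is repeated so the example is self-contained.

```clojure
(defmulti accept
  (fn [world event] (:event event)))

;; Each implementation takes the world and an event, and returns a new world.
(defmethod accept :thing-created [world {:keys [id label]}]
  (assoc-in world [:things id] {:label label :count 0}))

(defmethod accept :thing-incremented [world {:keys [id]}]
  (update-in world [:things id :count] inc))

;; Replaying events is a fold of accept over the event sequence.
(reduce accept {}
        [{:event :thing-created :id 1 :label "widget"}
         {:event :thing-incremented :id 1}])
;; => {:things {1 {:label "widget", :count 1}}}
```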

When an event is raised it goes through a pipeline that will call accept. That pipeline is where we may make implementation decisions about how and where we want to store data. If we only consider the command/event/transform pattern, there is very little code overhead to conform to this pattern in Clojure.

The trade-off is conforming to a command/event/transform pattern in exchange for durability, debugging, logging, history, storage flexibility, arbitrary denormalization, and read separation. In my experience, the catch is that implementing the pattern in C# is tricky. Even if you get that part right or use a good library, you are still stuck with type fatigue.

Clojure eases many implementation pain points:

  • passing around an immutable world is safe and convenient
  • multimethod dispatch is expressive
  • data format matches memory model (edn)
  • philosophical alignment (transform functions, deep nesting, hashmaps and vectors)

The event pipeline

Raising an event is primarily storing it and calling accept to apply the changes to the domain model:
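
A minimal sketch of such a pipeline follows. It assumes the domain model lives in an atom and that events are serialized by locking on a private object. The `store` and `publish` stubs, the `:when` and `:seq` keys, the order of the stages, and the choice to return the stamped event are all assumptions of this sketch rather than the repository's code.

```clojure
(defmulti accept (fn [world event] (:event event)))

;; Hypothetical event type for illustration.
(defmethod accept :thing-created [world {:keys [id label]}]
  (assoc-in world [:things id] {:label label}))

(def world (atom {}))            ; the domain model
(def event-count (atom 0))       ; source of sequence numbers
(def ^:private lock (Object.))   ; private object to lock on

;; Hypothetical stand-ins for the storage and publishing stages.
(defn store [event] nil)
(defn publish [event] nil)

(defn raise!
  "Adds metadata to the event, stores it, applies it to the world, and
  publishes it. Returns the stamped event so the caller can report success;
  an exception from any stage propagates to the caller as failure."
  [event-type data]
  (locking lock
    (let [event (assoc data
                       :event event-type
                       :when (java.util.Date.)
                       :seq (swap! event-count inc))]
      (store event)
      (swap! world accept event)
      (publish event)
      event)))

(raise! :thing-created {:id 1 :label "widget"})
@world
;; => {:things {1 {:label "widget"}}}
```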

I like to mix in some metadata with the event: when it happened and a sequence number. The event-type is mandatory for dispatch. Clojure does allow you to specify metadata separately, so you can do that if you prefer.

I store the domain model in an atom. The semantics I want are a single synchronous writer of state. An agent is almost perfect for this, except that quite often you want to return a result to the caller indicating success or failure, whereas agents are send-and-forget. To ensure all events are raised synchronously, I use plain old locking on a private object. Your domain semantics might require commands to be processed synchronously so that they can be validated against the current domain state.

We have many options available for storage, but for now we will use the straightforward file-based approach:
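
One way this could look, as a sketch: each event is appended as one line of readable Clojure data. The `.events` extension, the `data-dir` location, and the `current-label` atom are hypothetical choices made so the example runs anywhere.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.edn :as edn])

;; Hypothetical location and label; events go to "<label>.events".
(def data-dir (System/getProperty "java.io.tmpdir"))
(def current-label (atom "demo"))

(defn events-file [label]
  (io/file data-dir (str label ".events")))

(defn store
  "Appends the event to the current events file, one event per line."
  [event]
  (spit (events-file @current-label) (prn-str event) :append true))

(defn read-events
  "Reads every stored event for the label back into a vector."
  [label]
  (with-open [r (io/reader (events-file label))]
    (mapv edn/read-string (line-seq r))))
```

Because the events are printed with `prn-str`, a `java.util.Date` in the event is written as an `#inst` literal and read back as a date by `clojure.edn`.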

I append to an events file and write the event data structure using Clojure’s data format. The file has the same base name as the current snapshot.

Storing history is powerful. If you want to mine your data by reprocessing events and calculating some new views, you have all the data to do so. You are not limited to the domain state.

What is publish all about? Right now publish is just a way to define additional functions to be called on each event. One obvious use is to send out notifications to clients of relevant events. Another use is when you want to maintain separate models. For example, your domain logic may not care about statistics, but you might want to build up a view of statistics by processing events as they happen in a separate model. If you have a high traffic website, you might want to have several read servers. You can feed these read servers the events and they can perform denormalization of those events into their local read model (this pattern is called Command Query Responsibility Segregation).
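
A sketch of this idea, assuming a simple registry of subscriber functions (`subscribers`, `subscribe!`, and the `stats` model are hypothetical names):

```clojure
;; Functions to call on every event.
(def subscribers (atom []))

(defn subscribe! [f]
  (swap! subscribers conj f))

(defn publish
  "Calls every subscribed function with the event."
  [event]
  (doseq [f @subscribers]
    (f event)))

;; A separate read model: a count of events by type, kept outside the domain.
(def stats (atom {}))

(subscribe!
 (fn [event]
   (swap! stats update-in [(:event event)] (fnil inc 0))))

(publish {:event :thing-created :id 1})
(publish {:event :thing-created :id 2})
@stats
;; => {:thing-created 2}
```

The domain model never sees `stats`; it is built purely from the stream of events, which is the same mechanism a remote read server would use.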

Rebuilding the domain

Rehydrating the domain model is a matter of replaying all of the events that were stored. We specify which state file to load as the initial model, and optionally an event-id for a particular point in time:
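
A sketch of rehydration under a few assumptions: the snapshot file holds one printed map, the events file holds one event per line, and each event carries a `:seq` number that serves as its event-id. The function takes explicit file arguments here to avoid inventing a naming convention.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.edn :as edn])

(defmulti accept (fn [world event] (:event event)))

;; Hypothetical event type for illustration.
(defmethod accept :thing-created [world {:keys [id label]}]
  (assoc-in world [:things id] {:label label}))

(defn read-edn-lines [file]
  (with-open [r (io/reader file)]
    (mapv edn/read-string (line-seq r))))

(defn hydrate
  "Loads the snapshot in state-file, then replays the events in events-file.
  When event-id is given, replay stops after the event with that :seq."
  ([state-file events-file]
   (hydrate state-file events-file nil))
  ([state-file events-file event-id]
   (let [snapshot (edn/read-string (slurp state-file))
         events   (read-edn-lines events-file)
         events   (if event-id
                    (take-while #(<= (:seq %) event-id) events)
                    events)]
     (reduce accept snapshot events))))
```

Replay is the same `reduce` over `accept` that the live pipeline performs one event at a time, which is why the rebuilt model matches the original.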

This is convenient for testing and debugging. You can set up state files to test a scenario, or load up the domain model just prior to an error occurring to recreate an issue.

If you need to handle a very large number of events, it is convenient to take periodic snapshots of the domain model state. That way when you want to rehydrate a state, you only need to process the subsequent events from the latest snapshot. Clojure data structures are immutable so we can write the snapshot asynchronously without fear of the world changing under our feet.
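
A sketch of an asynchronous snapshot, assuming the world is held in an atom and that events are written under a current label. The `.state` extension, `data-dir`, and the sample world are hypothetical. The lock stands for the same private object the event pipeline would hold, so no event can slip in between reading the world and switching labels.

```clojure
(require '[clojure.java.io :as io])

(def world (atom {:things {1 {:label "widget"}}}))
(def current-label (atom "label-0"))
(def ^:private lock (Object.))
(def data-dir (System/getProperty "java.io.tmpdir"))

(defn snapshot!
  "Switches to new-label and writes the current world on another thread.
  Returns the future, so a caller can wait on it or detect a failed write."
  [new-label]
  (locking lock
    (let [state @world]                    ; immutable value, safe to share
      (reset! current-label new-label)     ; later events use the new label
      (future
        (spit (io/file data-dir (str new-label ".state"))
              (pr-str state))))))
```

Dereferencing the returned future blocks until the write finishes and rethrows any exception from `spit`.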

A new thread is spawned by future, which may take some time to complete storage if the domain model is very large. Subsequent events are written to the new label and may be written even before the snapshot completes. If a snapshot fails to write, we still have all of the data and can rehydrate by going back to the previous snapshot and processing all of those events and then the events with the new label.

Conclusion

Not much effort was required to get durability, as shown in this post. Our storage solution can be changed without changing the logic of our application, and we get some additional benefits in logging, debugging, and history.

Clojure’s multimethods, immutable data structures, readable data syntax, and data transform functions align closely with the command/event/transform pattern. To implement the pattern, define accept methods for every state transition, and commands to create valid events. Working on disk is usually convenient for experimenting. You can migrate to another infrastructure later without touching your logic. Event sourcing is a good default choice during application development when you reach the point where storage is required.

Be sure to look at the Clojure resources that you can find in Safari Books Online.

Safari Books Online has the content you need

Clojure Inside Out is a video where you’ll not only learn how to tackle practical problems with this functional language, but also learn how to think in Clojure, and why you should want to. Neal Ford (software architect and meme wrangler at ThoughtWorks) and Stuart Halloway (CEO of Relevance, Inc.) show you what makes programming with Clojure swift, surgical, and accurate.
Clojure Programming helps you learn the fundamentals of Clojure with examples relating it to the languages you already know, whether you’re focused on data modeling, concurrency and parallelism, web programming, statistics and data analysis, or something else.
The Joy of Clojure goes beyond the syntax, and shows how to write fluent, idiomatic Clojure code. You will learn to approach programming challenges from a functional perspective and master the Lisp techniques that make Clojure so elegant and efficient. This book will help you think about problems the “Clojure way,” and recognize when you simply need to change the way you program.
Practical Clojure is the first definitive reference for the Clojure language, providing both an introduction to functional programming in general and a more specific introduction to Clojure’s features. This book demonstrates the use of the language through examples, including features such as STM and immutability, which may be new to programmers coming from other languages.

About the author

Timothy Pratley currently works for Tideworks Technology as a Development Manager building Traffic Control software for logistics clients including SSA Marine, CSX, and BNSF. He can be reached at timothypratley@gmail.com.


3 Responses to “Event Sourcing in Clojure”

  1. Jan Kronquist

    I have also been exploring event sourcing in Clojure! I like the way you store the data on file and add metadata. However, I really think you should consider avoiding locks and mutable state.

    The way I solve this is simply by returning the events instead of raising/publishing/whatever. This way the business logic in the command handler becomes really well structured: first check the preconditions, then create the events, with no mutable state. Both command handlers and event handlers become pure functions. Have a look and let me know what you think:

    http://www.jayway.com/2013/04/02/event-sourcing-in-clojure/

    I have also recently implemented storage using EventStore. This is currently only available as a branch in my github repo, but I’m going to write a blog post describing how it works.

  2. niquola

    I like the idea of event sourcing, but I have a question:
    What if the structure of the domain model and the event logs changes? Should we migrate all historical events, or switch on event versions in the code that applies events?

  3. Timothy Pratley

    @Jan: Good observations. Semantically a Command returns a value that indicates success or failure. Imagine a web service that places an order. The consumer of the service needs to get confirmation that the order was successfully processed. The command contains conditionals governing when the event should be accepted, but even if they pass, an exception might still occur in the application of the event. So a command as I define it requires validation, event creation, and a success/fail result of event application. You can certainly structure code such that all web services pass through a generic “command” handler which does the event raising, splitting out the event production into separate functions to enable more concise pairing and testing of the validate/event/accept code.

    Thank you for pointing out your approach. To explain my design choices:

    1) Events are any data. EDN is a system for the conveyance of values. It is not a type system, and has no schemas.

    2) Accept is a multimethod, as the minimalist way to dispatch to state transformation functions.

    3) Snapshots let me see the data and more easily set up for debugging/experimentation.

    I present an event stream solution for saving and restoring state, based upon storing change-causing events. Managing state mutation is the mechanism. The path is to move from an in-memory-only model:

    (dosync (alter world update-in [:things id :attr] inc))
    ;; (f state arguments) => new state ;; managed by STM ;;

    to an in-memory + durable model:

    domain + event => saved event, new domain
    ;; (f state event) => new state ;;
    (defmethod accept :event-type [world event] (update-in world [:things (event :id) :attr] inc))

    Recognizing that the function that creates a new state is *mostly* the same in both cases. The motivation for this post is that the important choice to make is establishing those transformation functions.

    @niquola: Absolutely, that is one of the great things about having a precise history. In principle, no matter how the events or behavior change, so long as you know when those changes occur you can reprocess the entire event stream. The reason to reprocess old version events is to calculate some new information over old history. I have run into version issues when calculating a metric over old events. It has been more practical to handle the differences based on the data of the event (not a version number). For a change to event processing, I snapshot the domain when switching to the new logic to avoid reprocessing old version events. For a completely new domain, I calculate it into a snapshot, then hydrate it and begin from there. Versioning is mental overhead for a problem you might solve differently in the future. Yes, version management can be very strong and general in an event stream if you really do need it to be.

    Thank you both for your comments