“We can solve any problem by introducing an extra level of indirection,” goes David Wheeler’s famous aphorism — and Event Sourcing is the privileged model for doing exactly that.
The idea of Event Sourcing is the following: capture and save every change to the application state as an event. Before Martin Fowler coined the term, the idea was referred to as the Append-Only Log, the Write-Ahead Log, or simply the Log.
Capturing and saving all the application changes has the advantage of letting us know the state of the application at any given instant, and also how it got there. Event Sourcing brings two key features: the time dimension and the guarantee that no information is lost. The time dimension is easy to see, as every state change is saved in a time-ordered series. Depending on how the events are named, reading an Event Log can be as informal as reading a journal. Check this event history:
UserRegistered -> UserEmailValidated -> UserAddedHomeAddress -> UserUploadedProfilePicture -> UserAddedHobby -> UserRequestedToJoinGroup -> UserMarried -> UserChangedHomeAddress -> UserChangedProfilePicture.
The sequence of changes the user performed is obvious. We not only see that the user changed home address; we can see he did it after getting married. By saving all changes we can answer, at any time, questions like: how many times has the user moved? Keeping all changes allows us to answer any temporal question the business needs, not only now, but at any point in the future.
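As a sketch of how cheap these temporal questions become, here is a minimal replay over such a log. The event names follow the example above; the record structure and dates are made up for illustration:

```python
from dataclasses import dataclass
from datetime import datetime

# A hypothetical minimal event record; not a real library's API.
@dataclass
class Event:
    name: str
    occurred_at: datetime

# A time-ordered log for the user in the example above (dates invented).
log = [
    Event("UserRegistered", datetime(2020, 1, 1)),
    Event("UserMarried", datetime(2021, 6, 12)),
    Event("UserChangedHomeAddress", datetime(2021, 7, 1)),
    Event("UserChangedHomeAddress", datetime(2023, 3, 5)),
]

# "How many times has the user moved?" is a simple scan over the log.
moves = sum(1 for e in log if e.name == "UserChangedHomeAddress")

# "Did the user move after marrying?" is another scan.
moved_after_marriage = any(
    e.name == "UserChangedHomeAddress" and e.occurred_at > m.occurred_at
    for m in log if m.name == "UserMarried"
    for e in log
)

print(moves)                 # 2
print(moved_after_marriage)  # True
```

Any question we think of later can be answered the same way, because the raw facts are still there.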
Imagine an online store with a shopping cart, where a client performs the following actions: AddProductToCart -> AddProductToCart -> StartCheckout -> CancelCheckout -> RemoveProductFromCart -> StartCheckout -> CompleteCheckout.
The client ordered only one product, removing the second after finding the order total too high. Would it be the same if the client had added only one product and checked out immediately? The answer is no! When a client adds a product to a cart, he shows interest in the product. Saving that information can be useful for the business, for example to target the client in future campaigns related to that product. Although the end state of the cart is the same, if we model the application to store only the ordered products, some information about the client's behaviour is lost. The feature to remember removed products can always be added later, but by then all the past history is lost, and we cannot predict how important that data may be in the future.
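A minimal sketch makes the point concrete (the event encoding below is invented for illustration): two different histories fold to the same cart state, but only the full log preserves which products drew the client's interest:

```python
# Two hypothetical cart histories, encoded as "EventName:product".
history_a = ["AddProductToCart:book", "AddProductToCart:lamp",
             "RemoveProductFromCart:lamp", "CompleteCheckout"]
history_b = ["AddProductToCart:book", "CompleteCheckout"]

def fold_cart(events):
    """Replay the events to obtain the cart's final contents."""
    cart = set()
    for e in events:
        action, _, product = e.partition(":")
        if action == "AddProductToCart":
            cart.add(product)
        elif action == "RemoveProductFromCart":
            cart.discard(product)
    return cart

# Both histories end in the same state...
assert fold_cart(history_a) == fold_cart(history_b) == {"book"}

# ...but only the log can answer "which products interested the client?"
interest = {e.partition(":")[2] for e in history_a
            if e.startswith("AddProductToCart")}
print(sorted(interest))  # ['book', 'lamp']
```

If we had stored only the final state, `lamp` would be gone for good.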
Corollary: in any application, if two or more different sequences of actions can end up in the same state, and the application does not capture those actions, the application is losing information.
The importance of recording all changes is not new. In fact, all mature industries out there keep a history log to track changes. Break a bone and go to the hospital: the x-ray will be appended to your medical record, and any subsequent exams will not replace the previous ones but will be appended too, building a medical history of the patient. Accountants, lawyers, traders, police officers and insurance agents all use append-only models in their processes.
Looking for examples of Event Sourcing in software, we find it in the most mature technologies. The Transaction Log is a critical component of the SQL Server architecture: it records all modifications made by all transactions and allows recovering to a consistent state after any failure. It is also used for replication across multiple instances. Git, the most popular source control system, has the append-only log as a core concept. Nowadays virtually every developer works with some technology that uses an Event Sourcing model.
Not losing information may be the most obvious benefit of Event Sourcing, and perhaps the most advertised feature when someone is selling the idea. But there are so many other good benefits that it would be unfair not to explore them. In the rest of this article I will pick one of the most interesting aspects of Event Sourcing and explore it. I already presented the idea in the first paragraph: Event Sourcing is the privileged model for indirection.
The main idea is simple: modern applications are complex, full of use cases. Providing a single model that supports every use case is hardly doable. We invariably end up creating several models. And in those cases, where several models emerge, Event Sourcing shines.
Let's look at a concrete example: the famous hotel reservation site booking.com. On the main page we can perform a text search, typing any hotel name, city or region in the world, and filter by dates and number of guests. We then land on a results page where we can check the pictures and all kinds of descriptions for every result.
In the left panel we can refine the search by selecting preferences, and each filter option even shows a counter.
On top of all these features come the non-functional requirements. Booking is used by millions of users and provides a real-time navigation experience with very low-latency requests. These kinds of requirements are hardly satisfied by a single database model. A full-text search over a considerable amount of data needs a text index, something like Lucene, Solr or Elasticsearch. The back office that manages the hotels and the reservation system is probably backed by an ACID-capable database, to avoid overbooking and enforce other consistency constraints. The filter counters probably use some faceted search API or the aggregation feature of a specialised database, or even an in-memory implementation based on bitmap structures. The detail page is probably served by a document database, which is very good at nested relationships and editing.
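To make the multiple-model idea concrete, here is a minimal sketch, with event and field names entirely made up, of deriving two read models from one event stream: a document per hotel (what a detail page would serve) and a counter per city (what a filter facet would show):

```python
# A hypothetical append-only stream of hotel events.
events = [
    {"type": "HotelRegistered", "id": 1, "name": "Sea View", "city": "Porto"},
    {"type": "HotelRegistered", "id": 2, "name": "Old Town Inn", "city": "Porto"},
    {"type": "HotelRenamed", "id": 1, "name": "Sea View Resort"},
]

# Read model 1: one document per hotel, for the detail page.
documents = {}
# Read model 2: one counter per city, for the filter facets.
city_counts = {}

# Each read model is just a projection over the same stream.
for e in events:
    if e["type"] == "HotelRegistered":
        documents[e["id"]] = {"name": e["name"], "city": e["city"]}
        city_counts[e["city"]] = city_counts.get(e["city"], 0) + 1
    elif e["type"] == "HotelRenamed":
        documents[e["id"]]["name"] = e["name"]

print(documents[1]["name"])  # Sea View Resort
print(city_counts["Porto"])  # 2
```

Because the stream is the source of truth, a new read model added next year can be populated by replaying the stream from the beginning.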
A universal truth about databases is that all of them suck in some way. Picking the right database for the job means choosing the trade-offs we are willing to pay. In some cases, using only one database is not ideal, or even possible. Modern systems keep growing in complexity and scalability challenges, and adopting several models for the application can be a matter of necessity. The old-fashioned relational database with some CRUD services on top just won't do the job in many cases.
After violating the first law of distributed objects (don't distribute your objects), we can either pray for things to turn out alright, or roll up our sleeves and steal the best ideas from the giants. Choosing to replicate the application data across several database models does not come for free: we lose data consistency, and we need a lot of work to integrate the data into all those fancy databases. In part 2 of this article I will present several iterations over a naive solution to this problem, until we finally arrive at the Event Sourcing solution. The reader may find some of the steps familiar; I believe most of us have, at some point, built some of the solutions I will exemplify.
Instead of just dropping the Event Sourcing solution immediately, I find more value in getting there through small steps of tinkering over what I call a naive solution. This process of small iterations will let us discuss all the problems we face with data distribution until we finally reach a reasonable solution. At that point we will be able to extract the general attributes of the solution, pack it with a fancy name (Event Sourcing), and be confident in using it when we face this particular (but not so rare) class of problems.