📆 Event sourcing at Pactio
[Author: Will]
It’s important to read this blog post with the end in mind: event sourcing is the key technological infrastructure that has enabled the success of our current products. It’s tempting, then, to judge things purely on outcomes. For us, these have been very fast development, maximum architectural flexibility, and the ability to almost forget about the technology and focus on customer problems. After almost 3 years of development we have always been able to meet requirements, adapt, and expand, and we have barely ever thought about a rewrite. We do, however, have some hard-earned lessons to share, and they are all important to understand in order to be productive in our code base. I initially wrote this as an internal guide, but I believe it has broader utility as a cautionary tale for other startups. I have been mulling over this blog post for the past couple of years, and I have finally set aside some time to write it all down!
It was the worst of paradigms
One crucial thing that we like to keep in mind when designing systems at Pactio is whether we need event sourcing at all. A single event stream that informs all services of what is going on in the world might be tempting, but has a lot of downsides.
First of all, event sourcing is a complicated paradigm; anything involving CQRS is. Keeping things simple always has value in itself. A simple system is easy to reason about, easy to troubleshoot, easy to maintain, and easy to onboard people onto. Would I be writing this document if we had a CRUD app built on SQL?
There is a reason why simple, SQL-based CRUD architectures are omnipresent. A lot of the solutions that try to surpass them tend to throw the baby out with the bathwater: transactions become a nightmare, mistakes are easy to make, and performance is a challenge. ES has the same problems.
At Pactio, it’s important to know that we have already paid the iron price of a complicated setup for one of our systems. Therefore, in a lot of cases it does make sense to keep using a consistent approach, be it for time-to-market reasons or for engineering consistency.
If you are starting fresh, I think the conversation is a lot more interesting. You can get very far with a very naive architecture. If you want to diverge from a tried and tested pattern, you should be able to clearly articulate why you need something more complicated, and at least convince yourself, if not the people around you. I’d like to help you by debunking some myths that I naively believed when we started on this journey, before I present our case for event sourcing.
The audit log myth
Building an Audit Log is not a good enough upside for justifying the cost of ES. If you need an audit log, build an audit log.
When you are building an ES system you are actually building an immutable log of events. This is no silver bullet: it comes with tradeoffs that have to be kept in mind.
When we had to build a user/authorization service at Pactio, we carefully considered whether to use ES. We had a requirement for an access log, but beyond that we realized that event sourcing would have introduced more complications than it solved. We built it as a plain CRUD API, and that is the source of truth. We still publish a message for our audit log service to consume, but the data flow is inverted.
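To make that inverted flow concrete, here is a minimal sketch of the idea (the names, the `Db` and `MessageBus` interfaces, and the event shape are hypothetical, not our actual code): the CRUD write is the source of truth, and the audit message is emitted as a by-product.

```typescript
// Hypothetical sketch: the CRUD write is the source of truth; the audit
// message is published afterwards for the audit log service to consume.
interface Db {
  query(sql: string, params: unknown[]): Promise<void>;
}

interface MessageBus {
  publish(topic: string, payload: unknown): Promise<void>;
}

async function updateUserRole(
  db: Db,
  bus: MessageBus,
  userId: string,
  newRole: string,
  actorId: string
): Promise<void> {
  // 1. Mutate the current state directly; this row is the source of truth.
  await db.query("UPDATE users SET role = $1 WHERE id = $2", [newRole, userId]);

  // 2. Publish a message describing what happened, purely for auditing.
  //    If it is lost, the system state is still correct; only the log entry is missing.
  await bus.publish("audit", {
    type: "UserRoleChanged",
    userId,
    newRole,
    actorId,
    at: new Date().toISOString(),
  });
}
```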
The breaking changes myth
Avoiding breaking changes is a policy, not a benefit of event sourcing. If you want to stick to that policy with a traditional approach, you can; if you don’t want to do it in ES, you don’t have to. In fact, at Pactio we have learnt through sweat and tears that breaking changes are almost inevitable, especially for early stage startups. You start with so little domain knowledge, and you have a categorical imperative to move fast. The only thing you can do is embrace this and adapt.
This is closely tied to another commonly presented benefit of ES: no migrations. I think this is either misleading or at least overrated. The elephant in the room at Pactio is that we have made breaking changes, and written migrations, even though not in SQL (at least for the core system). Our approach so far has been to:
- version events
- write adapters to a new contract
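As an illustration of those two steps, a versioned event and its adapter (an "upcaster") might look roughly like this. This is a minimal, hypothetical sketch; the event name, fields, and the GBP default are made up, not our actual schema:

```typescript
// Hypothetical sketch of "version events + write adapters to a new contract".
// V1 recorded an amount with an implicit currency; V2 makes the currency explicit.
interface LineItemAddedV1 {
  type: "LineItemAdded";
  version: 1;
  amount: number;
}

interface LineItemAddedV2 {
  type: "LineItemAdded";
  version: 2;
  amount: number;
  currency: string;
}

type LineItemAdded = LineItemAddedV1 | LineItemAddedV2;

// The adapter translates old events to the new contract on read,
// so the rest of the code base only ever sees V2.
function upcast(event: LineItemAdded): LineItemAddedV2 {
  if (event.version === 2) return event;
  return {
    type: "LineItemAdded",
    version: 2,
    amount: event.amount,
    currency: "GBP", // assumed default for events written before the field existed
  };
}
```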
That is effectively a migration with extra steps. Whilst it works, and it might be more in the comfort zone of a fullstack dev or a web engineer, or sometimes even a backend engineer, my hard-earned advice here is to learn how to write good SQL (or the idiom of your DB tech of choice). Embrace it as part of your job, and a highly effective one at that. A few lines of SQL or some polyfilling logic are much less work than the complexities introduced by ES. I appreciate that running migrations might look scary, but remember that you should always have written a reverse migration (we call them up and down at Pactio), and you will have a DB backup. That is already two back-out strategies. How much more safety do you need?
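For comparison, here is roughly what such an up-and-down pair can look like; the table, the column, and the migration runner interface here are hypothetical, not our actual setup:

```typescript
// Hypothetical up/down migration pair: "down" is the first back-out strategy,
// the DB backup is the second.
interface Db {
  query(sql: string): Promise<void>;
}

export async function up(db: Db): Promise<void> {
  await db.query(
    "ALTER TABLE line_items ADD COLUMN currency TEXT NOT NULL DEFAULT 'GBP'"
  );
}

export async function down(db: Db): Promise<void> {
  await db.query("ALTER TABLE line_items DROP COLUMN currency");
}
```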
The event driven confusion
An event sourced architecture and an event driven architecture are fundamentally different in many ways. Needing to inform other systems asynchronously that something has happened is not a good enough reason to do ES. Does this message need to be consumed in order? Is there a causal dependency on something that has happened before? Do you just need to trigger a side effect?
Basically every sizeable backend system I have worked with was event driven. Very few of them were event sourced. They all solved the problems they were set up to solve, and none of them was particularly better than the others. The two paradigms are massively different, and it’s important to fully understand what you are after before making a technological choice.
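To spell the difference out in code: in an event driven system the events are notifications that trigger side effects, while in an event sourced system the ordered stream of events is the state itself. A tiny, hypothetical sketch of the two (names and shapes made up):

```typescript
// Event driven: state lives elsewhere (for example, a SQL row); the event is a
// notification used to trigger a side effect, and losing it does not corrupt state.
type Notification = { type: "ProjectShared"; projectId: string };

function onProjectShared(n: Notification): void {
  console.log(`send an email / refresh a cache for ${n.projectId}`);
}

// Event sourced: the ordered stream of events IS the state; the current state
// only exists by folding over every event, so order and completeness are critical.
type ProjectEvent =
  | { type: "ProjectCreated"; name: string }
  | { type: "ItemAdded"; item: string };

interface ProjectState {
  name: string;
  items: string[];
}

function fold(events: ProjectEvent[]): ProjectState {
  return events.reduce<ProjectState>(
    (state, e) =>
      e.type === "ProjectCreated"
        ? { ...state, name: e.name }
        : { ...state, items: [...state.items, e.item] },
    { name: "", items: [] }
  );
}
```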
It was the best of paradigms
You’ll be shocked to discover that our core system is event sourced! Why, you might ask? Well, some systems are just a very good fit for the paradigm.
I think we made a great choice using ES for the core deal service, as it has a lot of the right requirements.
For this core system, which manages Pactio projects (you can visualise these projects as a mix between Excel, PowerPoint, and Word), we had, amongst others, the following product requirements:
- Project audit log (yes, I know, it’s not enough)
- Restore a project to any point in time
- Versioning a project (and, down the line, merging versions)
- Undo / redo any action
- Collaboration and some conflict detection / resolution
- Offline mode
- In-order consumption of the events created on a project by other systems
- More broadly, constructing multiple views of the data at different points in time: present, past, and future
- As a fun extra, this same functionality needed to run in the same manner on the front end and the back end, in order to achieve a level of performance our users would be happy with, through a fully optimistic UI
We still don’t support all of the above to this day, although we support most. The above are actually very tricky engineering challenges, and if they sound interesting to you, let’s chat!
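To give a flavour of why these requirements fit the paradigm so well, here is a heavily simplified, hypothetical sketch (not our actual framework): once every change is an ordered, timestamped event, restoring a project to a point in time is just replaying a prefix of the stream, and undo is replaying one event fewer.

```typescript
// Heavily simplified, hypothetical sketch of point-in-time views and undo.
interface StoredEvent {
  seq: number;   // position in the project's stream
  at: string;    // ISO-8601 timestamp (UTC)
  type: string;
  payload: unknown;
}

type Projection<S> = (state: S, event: StoredEvent) => S;

// Current state: fold over the whole stream.
function replay<S>(events: StoredEvent[], project: Projection<S>, initial: S): S {
  return events.reduce(project, initial);
}

// "Restore a project to any point in time": fold over the prefix up to that moment.
function stateAt<S>(events: StoredEvent[], at: string, project: Projection<S>, initial: S): S {
  return replay(events.filter((e) => e.at <= at), project, initial);
}

// "Undo": the view with the last event dropped (redo puts it back).
function undo<S>(events: StoredEvent[], project: Projection<S>, initial: S): S {
  return replay(events.slice(0, -1), project, initial);
}
```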
The perfect example of an ES system is git. If you find yourself designing a system that would be perfectly handled by hacking a product on top of a repo, then you have yourself a very good candidate for an ES system. Congrats 🎉
Our system behaves in essentially the same way. A user is effectively running their own fork of a project locally and always merging to master. The server (just like GitHub would) acts as an arbiter, but is not the single source of events. Users can look at their projects at any point in the past, they can revert, and they can create a new version from any point in time. They can undo or redo their actions, and they get notified when someone has performed a specific action on their projects, just like a review or a merge on GitHub.
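For a sense of what the local fork and merge to master looks like in code, here is a hypothetical sketch under big assumptions (this is not our actual sync protocol): the client applies its own events optimistically and rebases them on top of whatever canonical order the server returns.

```typescript
// Hypothetical sketch of the "local fork, merge to master" idea; a big
// simplification of any real sync / conflict-resolution protocol.
interface SyncedEvent {
  id: string;
  type: string;
  payload: unknown;
}

interface ClientStream {
  confirmed: SyncedEvent[]; // events the server has accepted, in canonical order
  pending: SyncedEvent[];   // local, optimistic events not yet acknowledged
}

// The optimistic UI renders from confirmed + pending immediately,
// without waiting for the server round trip.
function localView(stream: ClientStream): SyncedEvent[] {
  return [...stream.confirmed, ...stream.pending];
}

// When the server responds with the canonical ordering (which may interleave
// other users' events), the client rebases: accepted events move to confirmed,
// anything still unacknowledged stays pending on top.
function mergeFromServer(stream: ClientStream, canonical: SyncedEvent[]): ClientStream {
  const acknowledged = new Set(canonical.map((e) => e.id));
  return {
    confirmed: canonical,
    pending: stream.pending.filter((e) => !acknowledged.has(e.id)),
  };
}
```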
Whilst we could maybe have achieved the same with SQL Server temporal tables, or a client-side DB that we would sync with a compatible technology in the backend, I do believe we would have struggled to match all of the requirements purely on top of DB technology, especially at the time we started.
At present, we are running our own framework for dealing with event sourcing. Please note that we were not affected by a “not invented here” syndrome when we chose to roll our own. I believe that there are (or at least there were) no good open source alternatives or products that would have achieved the same results we are experiencing at the same cost. A full analysis of the offerings is outside the scope of this post, but there are now some new technologies emerging in the space, and if you are starting fresh it’s worth doing your own research.
We are consciously treating this framework as a tech debt item. This is not reckless tech debt, as we are willingly committing to it. We know that, being very simple, it is still fit for the current purpose, size, and scope of the company. It might become a limiting factor as we scale, but the golden rule in startups is that a little (or a lot) of tech debt is a great price to pay for time to market. And the great news is that we are hiring the best people, and I have complete trust in the team being able to make the call on whether to maintain, rewrite, or outsource when the time comes. Our engineers are fully trusted and empowered to make decisions on technical direction and, broadly speaking, on everything engineering 🚀
Wrapping it all up
Event sourcing is one of the key ingredients of our success. It has been very, very useful for writing a lot of code very, very quickly (trust me, we have an insanely fast bunch here 🏎️), but we need to keep carefully considering whether new features or products should be built on top of the same technology, in its current iteration.
I would be keen to chat with more founding engineers who have made similar pacts with the devil in some of their choices, and to share experiences! If you are one, please reach out. If you ever work with me, you’ll quickly learn that one of my software engineering tenets is that there is no silver bullet (if you haven’t, read this paper; it is especially relevant in the age of AI), and at this stage, this particular lead bullet has served us well.