Jul 20, 2021 — By Igor Krasnik
I hear more concerns about MongoDB lately and see growing interest in SQL databases among developers.
MongoDB is no longer the hyped technology it was 5+ years ago; it is just another boring, popular, production-grade database. At the same time, general-purpose SQL databases have made significant progress over the last few years. For example, PostgreSQL is a great database with powerful features and easy integration for BI and big data use cases (and you can use Timescale for time series).
On Paralect products, we default to MongoDB for the reasons we agreed on in the past. Here's what stands behind this choice.
The Problem
The problem is pretty straightforward, but it is very important to understand its specifics. It's the difference between the data format in the relational model (RDBMS) and objects in memory (object model).
This problem is fundamental in Computer Science and is called the object-relational impedance mismatch [see wiki]. It's fundamental because there's no trivial solution to it. The widespread "typical" solution is to apply ORM frameworks: frameworks that abstract the data layer and allow you to access relational data as objects in code.
The issue is that ORMs are (often) applied badly. An ORM doesn't solve the mapping problem completely, yet ORM users often treat it as a complete mapping solution. Developers start using complicated and messy mapping abstractions to map flexible, complex in-memory data structures to relational rows.
Instead, with an ORM, developers should strive to keep objects close to the underlying relational model and make the code models 'aware' of the abstraction rather than trying to hide it.
The detailed justification is well stated in Martin Fowler's iconic article — ORM Hate.
"A framework that allows me to avoid 80% of that is worthwhile even if it is only 80%. The problem is in me for pretending it's 100% when it isn't. David Heinemeier Hansson, of Active Record fame, has always argued that if you are writing an application backed by a relational database you should damn well know how a relational database works." (Martin Fowler, ORM Hate)
The Solution
There's a great quote in Fowler's article:
"To avoid the mapping problem you have two alternatives. Either you use the relational model in memory, or you don't use it in the database." (Martin Fowler, ORM Hate)
The first solution is to use a relational model (plain objects) in the code, without abstracting the underlying data into complex nested objects. Requests to the DB can be sent without an ORM, using plain SQL or SQL builders, because you no longer need complex transformations. This solution isn't friendly to the programming language, as it forces you to use inconvenient data structures. However, it can work out well if you build a table-like view, as it maps directly to the database (e.g. spreadsheets).
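As a rough illustration of this first option, here is a minimal sketch that keeps the relational shape in memory. It uses Python's built-in sqlite3 module as a stand-in for any SQL database; the orders and order_items tables and the order_lines helper are hypothetical names made up for this example.

```python
import sqlite3

# Hypothetical schema for illustration: orders and their line items.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE order_items (
        order_id INTEGER NOT NULL REFERENCES orders(id),
        sku TEXT NOT NULL,
        qty INTEGER NOT NULL
    );
""")
conn.execute("INSERT INTO orders (id, customer) VALUES (?, ?)", (1, "alice"))
conn.executemany(
    "INSERT INTO order_items (order_id, sku, qty) VALUES (?, ?, ?)",
    [(1, "book", 2), (1, "pen", 5)],
)


def order_lines(conn, order_id):
    """Return flat rows, one per line item, in the same shape the tables hold."""
    cursor = conn.execute(
        """
        SELECT o.id AS order_id, o.customer, i.sku, i.qty
        FROM orders AS o
        JOIN order_items AS i ON i.order_id = o.id
        WHERE o.id = ?
        ORDER BY i.sku
        """,
        (order_id,),
    )
    # No nested Order object is built; the code works with rows directly.
    return [dict(row) for row in cursor]


rows = order_lines(conn, 1)
```

Each element of rows is a flat dictionary, which is awkward as a domain object but maps one-to-one onto a table-like UI.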
The second solution is a better fit for a programming language: store objects natively in the database. In other words, use a document-oriented database. That doesn't necessarily mean "use MongoDB", but MongoDB is the most popular and loved general-purpose document-oriented database out there [StackOverflow survey 2020]. Other options include Couchbase, Raven, and Firebase.
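To make the second option concrete, here is a small sketch of storing a nested object as a single document. The Order and Item classes are hypothetical, and a plain dictionary of JSON strings stands in for a document collection; a real document database would persist the same nested structure through its own driver.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Item:
    sku: str
    qty: int


@dataclass
class Order:
    id: int
    customer: str
    items: list[Item] = field(default_factory=list)


# Hypothetical stand-in for a document collection: order id -> JSON document.
collection: dict[int, str] = {}


def save(order: Order) -> None:
    # asdict() recurses into nested dataclasses, so the order and its
    # items are written as one document, with no join table involved.
    collection[order.id] = json.dumps(asdict(order))


def load(order_id: int) -> Order:
    doc = json.loads(collection[order_id])
    return Order(
        id=doc["id"],
        customer=doc["customer"],
        items=[Item(**item) for item in doc["items"]],
    )


order = Order(id=1, customer="alice", items=[Item("book", 2), Item("pen", 5)])
save(order)
restored = load(1)
```

The stored document has the same shape as the in-memory object, so the mapping code stays trivial.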
The Advanced Solution
There's an alternative architectural solution, which is more complicated yet scalable. It uses two databases: a normalized store for writes and validation, and a denormalized one for reads.
This pattern is called CQRS and it was pushed by Greg Young back in the day. Check his CQRS Documents.
CQRS isn't an ideal solution either: it's a complicated architectural pattern that requires a deep understanding to implement and support. Additionally, a deep understanding of DDD concepts is required (Domain-Driven Design by Eric Evans is the classic book on that).
CQRS is often paired with Event Sourcing to sync data between databases, which complicates the architecture further. Today, you can use change data capture (for example, with Debezium) to treat a supported database as an event source, which makes the implementation simpler, but you'll still need a queue component (like Kafka) for a production system.
Even though the pattern itself is complicated, it offers a solution for fault-tolerant systems at scale and rests on a fundamental principle: it's convenient to read from a denormalized data source (e.g. MongoDB), while a normalized data source (e.g. PostgreSQL) works well as the source of truth for making transactional decisions and validating business logic.
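The read/write split can be sketched in a single process. This is a toy illustration under loud assumptions, not a production design: SQLite stands in for the normalized write database, a plain dictionary stands in for the denormalized read database, and the projection runs synchronously where a real system would use events or change data capture through a queue. All table and function names are hypothetical.

```python
import sqlite3

# Write side: normalized tables, the source of truth (SQLite as a stand-in).
write_db = sqlite3.connect(":memory:")
write_db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total INTEGER NOT NULL CHECK (total > 0)
    );
""")

# Read side: one denormalized document per customer (a dict as a stand-in).
read_store = {}


def project_customer(customer_id):
    """Rebuild the read document for one customer from the write side."""
    name = write_db.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()[0]
    orders = write_db.execute(
        "SELECT id, total FROM orders WHERE customer_id = ? ORDER BY id",
        (customer_id,),
    ).fetchall()
    read_store[customer_id] = {
        "name": name,
        "orders": [{"id": oid, "total": total} for oid, total in orders],
        "lifetime_total": sum(total for _, total in orders),
    }


def place_order(customer_id, total):
    """Command: validate and write in one transaction, then update the read model."""
    with write_db:  # commits on success, rolls back if an exception is raised
        found = write_db.execute(
            "SELECT 1 FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        if found is None:
            raise ValueError("unknown customer")
        write_db.execute(
            "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
            (customer_id, total),
        )
    # Assumption: synchronous projection; in production this would be asynchronous.
    project_customer(customer_id)


def customer_summary(customer_id):
    """Query: read the prebuilt document, with no joins at read time."""
    return read_store[customer_id]


with write_db:
    write_db.execute("INSERT INTO customers (id, name) VALUES (1, 'alice')")
place_order(1, 30)
place_order(1, 12)
summary = customer_summary(1)
```

Validation and constraints live only on the write side, while reads never touch the normalized tables.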
Conclusion
Paralect adopted MongoDB in its early versions, and it won us over with easy horizontal scaling and an outstanding, developer-friendly UX (DX if you wish).
MongoDB is a great tool that has helped Paralect teams start tens of products easily and scale them well. It helped us avoid the mapping problem and saved dozens of hours of unnecessary technical discussions, letting us focus on delivery and business results.
We still love MongoDB and use it in production, but we're certainly not limited to it. We complement it with relational databases, along with other specialized databases (like Redis for caching or Elasticsearch for full-text search).
We use ORMs on some projects too, so don't take "ORM Hate" too seriously.
People tend to categorize things as either good or bad, while I believe a good tool can turn out bad and a bad tool can turn out good, depending on how it is applied.
Not "NoSQL vs SQL" but NoSQL + SQL for the win.
Further reading
- Do you need an ORM — more thoughts on ORM and its bad and good parts
- CQRS: What? Why? How? — a detailed long read with a practical implementation of CQRS