How To Think About Data Consistency in Monolithic and Microservices Architectures

How to rewire your thinking about data transactions when you design applications with a single data store and when you design microservices with multiple data stores.


When I started working with microservices architectures, one of the most challenging things was adapting my way of thinking about data consistency among services. In this article, I want to share one hard lesson I learned while working with such architectures: you might not always reach perfect data consistency. But eventually, you can.

The monolithic/single data store approach


This concept may look strange to those who work daily with monolithic architectures and a single, well-recognizable data store. In fact, when working within the boundaries of a monolithic architecture, you should design your system so that its data transactions respect the ACID properties. I am not suggesting you melt down any piece of hardware. I am talking about Atomicity, Consistency, Isolation, and Durability.

You have Atomicity when each of your transactions is either committed in full, and thus written to your data store, or rolled back entirely.

A transaction may fail and require a rollback. For example, when you sign up a new user, you write user info into different tables in your database. Without atomicity, if an error occurs in the middle of this transaction, the user will receive a 500 error, but some tables may already have been written and, perhaps, your user will not be able to sign up with the same email again. You will lose a user. This is bad for your business. With atomicity, if an error occurs in the middle of this transaction, every write you have performed is rolled back and the user will be able to sign up again with the same email. Your user will be happy (and so will your boss) 🙂
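Here is a minimal sketch of the signup rollback described above, using Python's built-in sqlite3 module. The table names (users, profiles), the sign_up function, and the simulated failure are assumptions made for illustration; a real signup flow would look different.

```python
import sqlite3


def sign_up(conn, email, fail_midway=False):
    """Write the user into two tables atomically (hypothetical schema)."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
            if fail_midway:
                raise RuntimeError("simulated failure between the two writes")
            conn.execute("INSERT INTO profiles (email) VALUES (?)", (email,))
        return True
    except RuntimeError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE profiles (email TEXT PRIMARY KEY)")

first_try = sign_up(conn, "ada@example.com", fail_midway=True)  # rolled back
second_try = sign_up(conn, "ada@example.com")                   # succeeds
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Because the first attempt is rolled back as a whole, the email is not left behind in the users table, so the second attempt with the same email goes through.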

Consistency guarantees that every transaction brings the database from one valid state to another.

What does valid state mean? Who defines what a valid state is? The answer to the latter is you. You define what a valid state is by placing foreign keys, uniqueness constraints, and triggers on your tables.

For example, you might want to allow each email to be used for signup just once. You do not want to have different users with the same email. To get that, you may define a uniqueness constraint on your users table. This is how you ensure consistency with a single data store.
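The uniqueness rule can be sketched in a few lines with sqlite3. The users table layout and the register helper are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint is the rule that defines a "valid state" here.
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)


def register(conn, email):
    """Return True if the email was stored, False if it was already taken."""
    try:
        with conn:
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        # The database refused to move to an invalid state.
        return False


first = register(conn, "ada@example.com")
duplicate = register(conn, "ada@example.com")
```

The application code does not check for duplicates itself. The constraint lives in the data store, so every writer is held to the same rule.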

Isolation means that transactions should be isolated from other transactions that are running concurrently.

Databases usually achieve this by making concurrent transactions behave as if they were executed sequentially. Many of them do this by implementing locks.

For example, in our signup process, we may have two different users trying to register with the same email at the same time. When the first one tries to register their account, the underlying data transaction acquires a lock so that it executes in isolation from other concurrent transactions. It looks for the email and writes it to the table. The next transaction, the one for the other user who is signing up, executes after this one and finds the email already present in the data store.

Without this property, we could have an inconsistency in our data store allowing two users to register themselves with the same email.
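The two competing signups can be sketched with two sqlite3 connections to the same database file. This is only an illustration under some assumptions: SQLite locks the whole database for writers, which is coarser than what most server databases do, and the file name, table, and event strings are made up for the example.

```python
import os
import sqlite3
import tempfile


def isolation_demo():
    events = []
    path = os.path.join(tempfile.mkdtemp(), "signup.db")
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    # timeout=0: fail immediately instead of waiting for a lock.
    first = sqlite3.connect(path, isolation_level=None, timeout=0)
    second = sqlite3.connect(path, isolation_level=None, timeout=0)
    first.execute("CREATE TABLE users (email TEXT PRIMARY KEY)")

    first.execute("BEGIN IMMEDIATE")  # first signup takes the write lock
    first.execute("INSERT INTO users VALUES ('ada@example.com')")
    try:
        second.execute("BEGIN IMMEDIATE")  # second signup must wait
    except sqlite3.OperationalError:
        events.append("second transaction blocked")
    first.execute("COMMIT")  # releases the lock

    # Now the second signup runs, strictly after the first one.
    second.execute("BEGIN IMMEDIATE")
    taken = second.execute(
        "SELECT 1 FROM users WHERE email = ?", ("ada@example.com",)
    ).fetchone()
    events.append("email already taken" if taken else "email free")
    second.execute("COMMIT")

    first.close()
    second.close()
    return events


events = isolation_demo()
```

With a zero timeout the second transaction fails right away instead of waiting, which makes the ordering visible. In a real application you would normally let it wait or retry.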

Durability means that as soon as a transaction has been committed, it will remain committed.

For example, if the database crashes right after a particular user has registered, this user will still exist in our database when it comes back up.
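A real crash is hard to reproduce in a few lines, so the sketch below only approximates it: it commits a row to a database file, closes the connection, and reopens the file as a restarted process would. The file and table names are assumptions for the example.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "users.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY)")
with conn:  # commits when the block exits without an error
    conn.execute("INSERT INTO users VALUES ('ada@example.com')")
conn.close()  # stands in for the process going away after the commit

# A fresh connection plays the role of the restarted application.
reopened = sqlite3.connect(path)
emails = [row[0] for row in reopened.execute("SELECT email FROM users")]
reopened.close()
```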

Microservices and Distributed Data Stores


Unfortunately, when you decide to split up your monolith into many different services, you can no longer rely on a single ACID transaction to cover all of your data. In fact, when you split your application into microservices, you will probably need to distribute your data among many different databases: your data store becomes a distributed one.

When you distribute your database, you introduce a new variable between the data stores: the network.

In computer science, there is a theorem that states that any distributed data store can provide at most two of the following three properties:

  • Consistency: in this context, it means that every read receives the most recent state of an entity, or an error;
  • Availability: means that every read receives a response that is not an error, even at the cost of reading an old version of the data;
  • Partition Tolerance: means that if you have networking issues (message drops, network failures, …) in your infrastructure, your system continues to operate.

This is referred to as the CAP Theorem or Brewer’s Theorem.


The mysterious difference between the two approaches (single and distributed) lies in one important factor: the network.

Networking introduces an important property called latency. You always have some amount of latency, even in a single data store scenario, but in that scenario the latency is usually so small that it can be considered negligible. We are engineers after all.

When you distribute data across a network, the latency grows and becomes relevant. The trade-off between Consistency and Availability in the CAP Theorem can be derived from this consideration: if you have a huge amount of latency, your data store might not have the most recent state of the data it is providing to its microservice.

Another important consideration concerns the reliability of your network. If your network fails often, data does not flow between your data stores. Thus, once again, they do not have the latest version of the data they are providing to their microservices. And, to respect the Partition Tolerance principle, they must continue to operate, either by returning an error or by returning an older version of the data to their microservices.
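The choice a data store faces while the network is down can be sketched with a toy replica. Nothing here is a real database API: the Replica class, the prefer_availability flag, and StaleReadError are hypothetical names used only to show the two behaviors.

```python
class StaleReadError(Exception):
    """Raised by a consistency-first replica that cannot confirm fresh data."""


class Replica:
    def __init__(self, prefer_availability):
        self.prefer_availability = prefer_availability
        self.value = None
        self.partitioned = False  # True while cut off from the primary

    def replicate(self, value):
        """Apply an update from the primary; dropped during a partition."""
        if not self.partitioned:
            self.value = value

    def read(self):
        if self.partitioned and not self.prefer_availability:
            raise StaleReadError("cannot confirm this is the latest value")
        return self.value  # possibly an older version of the data


def demo(prefer_availability):
    replica = Replica(prefer_availability)
    replica.replicate("v1")
    replica.partitioned = True
    replica.replicate("v2")  # lost: the network is down
    try:
        return replica.read()
    except StaleReadError:
        return "error"


ap_result = demo(prefer_availability=True)   # answers with the old "v1"
cp_result = demo(prefer_availability=False)  # refuses to answer
```

Both replicas keep operating during the partition. They differ only in what they hand back to their microservice: an older version of the data, or an error.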


In this case, if you want to reach 100% availability, you must relax the consistency constraint. In fact, the data store cannot throw an error to its microservice telling it that it does not have the latest version of the data. But, hopefully, if the network works correctly and has a minimal amount of latency, it can catch up with the latest version of the data very quickly and, eventually, be consistent.

This is what is called eventual consistency: if you look at a certain time T, some of your data stores may not be aligned with the latest version of the data, but you are sure that eventually they will be.

There are systems where this is not an issue, and there are systems where it must be avoided at all costs. As people always say in the architecture world, it depends on your context and on what you want to achieve when designing your application. If you are designing a distributed banking application, you may want to relax the availability constraint and preserve consistency. 😅
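Eventual consistency can be sketched with a toy primary that queues updates until the network delivers them. The Primary and Replica classes and the deliver_pending method are hypothetical, in-memory stand-ins for real data stores and a real replication channel.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}


class Primary:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = deque()  # updates not yet delivered to the replicas

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def deliver_pending(self):
        """Simulate the network catching up: apply queued updates in order."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value


replica = Replica()
primary = Primary([replica])
primary.write("email", "ada@example.com")

read_at_time_t = replica.data.get("email")  # the replica is still behind
primary.deliver_pending()
read_later = replica.data.get("email")      # the replica has caught up
```

At time T the replica answers without the new email. Once the queued update is delivered, both stores agree again.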

Conclusions

Single data stores and distributed data stores have different strengths and weaknesses. Single data stores may be harder to scale, but they can achieve ACID guarantees easily. Distributed data stores can be easier to scale, but they are subject to latency, network failures, and the famous CAP Theorem, so when the network fails you must choose between availability and consistency.

You must be aware of the context when designing your application to make these choices, and this is the hard part of making architectural decisions. 💚


How To Think About Data Consistency in Monolithic and Microservices Architectures was originally published in Bits and Pieces on Medium, where people are continuing the conversation by highlighting and responding to this story.


This content originally appeared on Bits and Pieces - Medium and was authored by Matteo Pampana



APA

Matteo Pampana | Sciencx (2023-03-07T06:50:51+00:00) How To Think About Data Consistency in Monolithic and Microservices Architectures. Retrieved from https://www.scien.cx/2023/03/07/how-to-think-about-data-consistency-in-monolithic-and-microservices-architectures/
