Transactions

Based on https://www.baeldung.com/cs/transactions-intro#2-cap-theorem

What is a transaction ?

In programming, we refer to a transaction as a group of related actions that need to be performed as a single action. In other words, a transaction is a logical unit of work whose effect is visible outside the transaction either in its entirety or not at all. We require this to ensure data integrity in our applications.

A typical requirement in event-based architecture is to update the local database and produce an event for consumption by other services: Here, we’d like these two operations to either happen together or not happen at all. We can achieve this by wrapping these operations into a single transaction.

What are the 2 types of transactions ?

Earlier Transaction Models

Transactions adhering to ACID properties are guaranteed to be atomic and serializable. A transaction processing system is responsible for ensuring the ACID properties. This worked very well for flat transactions with short execution time, fewer concurrent users, and a single database system.

But soon, as the demand started to surge, the complexities started to grow as well. Applications started to require long-living and complex transactions. This resulted in complex transaction models like sub-transaction and transaction groups.

Advanced Transaction Models

The next phase of evolution in transactions came through the support of distributed and nested transactions. The applications grew more complex and often required transactional access to multiple database systems. The distributed transaction takes a bottom-up approach while the nested transaction takes a top-down approach to decompose a complex transaction into subtransactions.

The other important evolution for transactions included chained transactions and sagas. While nested transactions worked well for federated database systems, it still was not suitable for long-lived transactions. Chained transactions presented the idea to decompose such transactions into small, sequentially executing sub-transactions.

Local vs Distributed Transactions

Operations that are part of a transaction can all execute in a single participating resource or span across multiple participating resources. Hence, transactions can be local or distributed.

In local transactions, operations execute in the same resource. While in distributed transactions, operations are spread across multiple resources:

A transaction can involve multiple independent resources like databases, message queues, or web services. These resources can execute on the same virtual machine, on different virtual machines in the same physical machine, or different physical machines altogether. The number and location of participating resources is a crucial aspect in implementing transactions.

Transaction Guarantees

One of the fundamental reasons to use transactions in handling data is to ensure data integrity. Data integrity has been well defined by a set of guarantees that every transaction is supposed to provide.

ACID Properties

Atomicity: Atomicity ensures that all changes we make to the data as part of a transaction, we make them as a single entity and operation. This effectively means that either we perform all the changes or none of them.
Consistency: Consistency ensures that we execute all the data changes while maintaining a consistent state at the start and the end of a transaction. A consistent state of data must conform to all the constraints that we define for data.
Isolation: Isolation ensures that we keep the intermediate states of a transaction invisible to other transactions. This gives concurrently running transactions an effect of being serialized. The degree to which a transaction must be isolated from other transactions is defined by isolation levels.
Durability: Durability ensures that when a transaction completes, we persist changes to the data, and any other transaction doesn’t revert those changes. Although not necessary, this also may require the data changes to be saved on the disk.

A transaction doesn’t need to provide all of them !

CAP Theorem

Consistency: Consistency is a guarantee that in a distributed data system, every node returns the most recent and successfully written value. In effect, every node has the same view of the data at all times. We must not confuse this with the Consistency in ACID, they are different concepts.
Availability: Availability demands that every non-failing node returns a non-error response to the read and write requests in a reasonable amount of time.
Partition-tolerance: Partition tolerance refers to the fact that a data system continues to function even when an arbitrary number of messages gets dropped or deployed between nodes.

CAP theorem states that a distributed data system can’t provide all three of consistency, availability, and partition tolerance simultaneously.

BASE Systems

Many distributed data systems chose to favor consistency over availability.

Basically-Available: This guarantee favors availability over consistency as per the CAP theorem. The data system will produce a response to a request, even though the response can be stale.
Soft-state: This refers to the fact that the state of the system can change over time even without any input being received. Hence, the system always remains in a soft state moving towards eventual consistency.
Eventual consistency: This is a guarantee that the system will eventually become consistent once it stops receiving any input. The data changes will eventually propagate to all nodes and all nodes will have the same view of data.

BASE is diametrically opposite to ACID in terms of the consistency model they propose. While ACID enforces consistency at the end of every transaction, BASE accepts that the consistency may be in a state of flux at the end of the transaction.

Which Distributed Commit Protocols ?

Almost all popular relational databases provide support for transactions by default. Since a local transaction involves just one database, the database can manage such transactions directly. Moreover, the application can control the transaction boundary through relevant APIs.

However, it starts to get complicated when we talk about distributed transactions. Since there are multiple databases or resources involved here, a database can’t manage such a transaction exclusively.

What we need here is a transaction coordinator and individual resources like a database to cooperate in the transaction.

2-Phase Commit

It is a widely-used distributed algorithm to facilitate the decision to commit or rollback a distributed transaction. The protocol consists of two phases:

Prepare Phase: This phase consists of the transaction coordinator asking all participants to prepare for commit, the individual resource manager can reply affirmatively or negatively.
Commit Phase: This phase involves the transaction coordinator asking all participants to either commit or rollback based on individual responses in the previous phase.

3-Phase Commit

mmm

On this page.