Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

On this page.

Table of Contents

What is a transaction ?

In programming, we refer to a transaction as a group of related actions that need to be performed as a single action. In other words, a transaction is a logical unit of work whose effect is visible outside the transaction either in its entirety or not at all. We require this to ensure data integrity in our applications.

A typical requirement in event-based architecture is to update the local database and produce an event for consumption by other services: Here, we’d like these two operations to either happen together or not happen at all. We can achieve this by wrapping these operations into a single transaction.

On this page.

toc

What are the 2 types of transactions ?

Earlier Transaction Models

Transactions adhering to ACID properties are guaranteed to be atomic and serializable. A transaction processing system is responsible for ensuring the ACID properties. This worked very well for flat transactions with short execution time, fewer concurrent users, and a single database system.

But soon, as the demand started to surge, the complexities started to grow as well. Applications started to require long-living and complex transactions. This resulted in complex transaction models like sub-transaction and transaction groups.

Advanced Transaction Models

The next phase of evolution in transactions came through the support of distributed and nested transactions. The applications grew more complex and often required transactional access to multiple database systems. The distributed transaction takes a bottom-up approach while the nested transaction takes a top-down approach to decompose a complex transaction into subtransactions.

The other important evolution for transactions included chained transactions and sagas. While nested transactions worked well for federated database systems, it still was not suitable for long-lived transactions. Chained transactions presented the idea to decompose such transactions into small, sequentially executing sub-transactions.

Local vs Distributed Transactions

Operations that are part of a transaction can all execute in a single participating resource or span across multiple participating resources. Hence, transactions can be local or distributed.

In local transactions, operations execute in the same resource. While in distributed transactions, operations are spread across multiple resources:

Tip

A transaction can involve multiple independent resources like databases, message queues, or web services. These resources can execute on the same virtual machine, on different virtual machines in the same physical machine, or different physical machines altogether. The number and location of participating resources is a crucial aspect in implementing transactions.

Transaction Guarantees

One of the fundamental reasons to use transactions in handling data is to ensure data integrity. Data integrity has been well defined by a set of guarantees that every transaction is supposed to provide.

ACID Properties

  • Atomicity: Atomicity ensures that all changes we make to the data as part of a transaction, we make them as a single entity and operation. This effectively means that either we perform all the changes or none of them.

  • Consistency: Consistency ensures that we execute all the data changes while maintaining a consistent state at the start and the end of a transaction. A consistent state of data must conform to all the constraints that we define for data.

  • Isolation: Isolation ensures that we keep the intermediate states of a transaction invisible to other transactions. This gives concurrently running transactions an effect of being serialized. The degree to which a transaction must be isolated from other transactions is defined by isolation levels.

  • Durability: Durability ensures that when a transaction completes, we persist changes to the data, and any other transaction doesn’t revert those changes. Although not necessary, this also may require the data changes to be saved on the disk.

Tip

A transaction doesn’t need to provide all of them !

CAP Theorem

  • Consistency: Consistency is a guarantee that in a distributed data system, every node returns the most recent and successfully written value. In effect, every node has the same view of the data at all times. We must not confuse this with the Consistency in ACID, they are different concepts.

  • Availability: Availability demands that every non-failing node returns a non-error response to the read and write requests in a reasonable amount of time.

  • Partition-tolerance: Partition tolerance refers to the fact that a data system continues to function even when an arbitrary number of messages gets dropped or deployed between nodes.

Tip

CAP theorem states that a distributed data system can’t provide all three of consistency, availability, and partition tolerance simultaneously.

BASE Systems

Many distributed data systems chose to favor consistency over availability.

  • Basically-Available: This guarantee favors availability over consistency as per the CAP theorem. The data system will produce a response to a request, even though the response can be stale.

  • Soft-state: This refers to the fact that the state of the system can change over time even without any input being received. Hence, the system always remains in a soft state moving towards eventual consistency.

  • Eventual consistency: This is a guarantee that the system will eventually become consistent once it stops receiving any input. The data changes will eventually propagate to all nodes and all nodes will have the same view of data.

BASE is diametrically opposite to ACID in terms of the consistency model they propose. While ACID enforces consistency at the end of every transaction, BASE accepts that the consistency may be in a state of flux at the end of the transaction.

Which Distributed Commit Protocols ?

Almost all popular relational databases provide support for transactions by default. Since a local transaction involves just one database, the database can manage such transactions directly. Moreover, the application can control the transaction boundary through relevant APIs.

However, it starts to get complicated when we talk about distributed transactions. Since there are multiple databases or resources involved here, a database can’t manage such a transaction exclusively.

Tip

What we need here is a transaction coordinator and individual resources like a database to cooperate in the transaction.

2-Phase Commit

It is a widely-used distributed algorithm to facilitate the decision to commit or rollback a distributed transaction. The protocol consists of two phases:

  • Prepare Phase: This phase consists of the transaction coordinator asking all participants to prepare for commit, the individual resource manager can reply affirmatively or negatively.

  • Commit Phase: This phase involves the transaction coordinator asking all participants to either commit or rollback based on individual responses in the previous phase.

Tip

For a participant to participate in a two-phase commit, it must understand and support the protocol.

3-Phase Commit

One of the key problems of the 2-Phase Commit is that it can not dependably recover from a failure of both the coordinator and one of the participants during the commit phase. The 3-Phase Commit is a refinement over the two-phase commit protocol which addresses this issue. It introduces the third phase by splitting the commit phase into pre-commit and commit phases:

The pre-commit phase here helps to recover from the failure scenario where either a participant fails or both the coordinator and a participant fails during the commit phase. The recovery coordinator can use the pre-commit phase to safely decide if it has to proceed with the commit or abort.

Warning

The biggest challenge with these protocols is that these are blocking protocols which, as we’ll see later, isn’t always suitable.

Long-Running Transactions

Traditional techniques using resource locking don’t agree well with modern applications that require business transactions in a loosely-coupled, asynchronous environment; for instance, business transactions in an application built with microservices architecture. There have been several attempts to define patterns and specifications to address long-running transactions.

Saga Interaction Pattern

The saga interaction pattern attempts to break a long-running business process to multiple small and related business actions and interactions. Further, it coordinates the whole process by managing based on messages and timeouts.

Contrary to an ACID transaction, we can not rollback in the case of Saga when a failure occurs.  Here, what we do instead is called counteraction, or compensating action. A counteraction is, however, just a best effort to undo the effect of the original action.

OASIS WS-BA

The saga interaction pattern finds a great fit for SOA-based architecture with SOAP service-based interactions. Several protocol extensions have been defined for SOAP to address specific communication requirements. These collectively fall under WS* and include protocols for supporting distributed transactions.

Web Services – Business Activity (WS-BA) defines an orderly protocol and states for both the participating services and the coordinator in a Saga-based business process. WS-BA defines two protocols:

  • Business Agreement with Coordinator Completion: This is a more ordered protocol where the coordinator decides and notifies the participants when to complete

  • Business Agreement with Participant Completion: This is a more loosely-coupled protocol where the participants decide when they have to complete

OASIS BTP

Business Transaction Process (BTP) provides a common understanding and a way to communicate guarantees and limits on guarantees between organizations. This provides formal rules for the distribution of parts of the business process outside the boundaries of an organization.

It relies on local compensating actions from participating organizations. BTP provides two different protocols:

  • BTP Atomic Transactions: Also known as atoms, is similar to transactions in tightly-coupled systems. Here, one atom coordinator and zero or more sub-coordinators coordinate a transaction, each managing one or more participants. The outcome of an atom is atomic.

  • BTP Cohesive Transactions: Also know as cohesions,  contrary to atoms, this may deliver different termination outcomes to its participants. Consistency here is determined by the agreement and interaction between the client and the coordinator.

Advanced Consensus Protocols

The problem is to achieve system reliability in the presence of random failures. Consensus refers to a process for distributed processes to agree on some state or decision. Other such decisions include leader election, state machine replication, and clock synchronization.

What we need to solve the consensus problem is a consensus protocol. A consensus protocol must provide eventual termination, data integrity, and agreement between distributed processes or nodes. Different consensus protocols can prescribe different levels of integrity here and used for different failure scenarios.

Tip

The distributed commit protocols that we have discussed so far like two-phase commit and three-phase commit, all are types of consensus protocols.

Paxos

It’s for solving consensus problem in an asynchronous network of unreliable processes : provides durability even with the failure of a bounded number of replicas in the network.

  • Prepare (Phase 1A): A Proposer creates a message identified by a unique number (n) which should be the greatest used so far

  • Promise (Phase 1B): An Acceptor receives the message from a Proposer and checked its number (n), if the number is the greatest received so far, Acceptor returns a Promise to the Proposer

  • Accept (Phase 2A): The Proposer may receive a majority of Promises from a Quorum of Acceptors and has to send an Accept message with chosen value to the Quoram of Acceptors

  • Accepted (Phase 2B): The Acceptor receives the Accept message from the Proposer and has to accept if it has not promised to a proposal with a higher identifier already

Raft

It stands for Reliable, Replicated, Redundant, And Fault-Tolerant. It offers a generic way to distribute a state machine across a cluster of computing nodes. Further, it ensures that each node in the cluster agrees upon the same series of state transitions. Raft works by keeping a replicated log and only a single node, the leader can manage it.

  • Leader Election: Every node can stay in any of the three states, namely, Leader, Candidate, and Follower. There can’t be more than one leader at any point in time. A node always starts as the follower and expects a heartbeat from the leader. When it does not receive the heartbeat, it transitions into candidate state and requests for votes for transitioning into the leader state.

  • Log Replication: When a leader receives a request, it first appends it to the log and then sends a request to every follower so that they can do the same thing. The leader on getting confirmation from a majority of the nodes can commit the message then respond to the client. The followers commit the message on receiving the next heartbeat from the leader.

  • Safety: It’s important to ensure that every log is correctly replicated, and commands are executed in the same order. For this, Raft uses several safety mechanisms. These include Log Matching Property and Election Restriction.

Tip

Raft is equivalent to Paxos in fault-tolerance and performance. Like Paxos, Raft is also formally proven to be safe.