High Availability
An Active-Active Cluster Configuration.

High Availability (HA) describes the ability of the TCPWave DDI appliances to provide seamless service without intermittent disruption or total failure. Because numerous services are embedded in the TCPWave DDI ecosystem, this page describes how the redundant components are designed. With the HA design, TCPWave delivers an SLA (Service Level Availability) score of 100%.

TCPWave's HA design is also known as T-Mesh technology, and the DDI management layer is built on it. The DDI management devices in a T-Mesh cluster process DDI transactions by performing database operations on a replicated database. This gives enterprises a management layer that does not have any system downtime.

It is common for a data center to experience ISP blackouts, power failures, network failures, and similar events. These unexpected events can bring down the services that are local to that data center. Because a typical T-Mesh cluster is spread across different data centers and different regions, an outage confined to one site does not affect the cluster as a whole. The cluster is also designed to scale up to handle increased load and high levels of traffic. The DDI transactions processed by the T-Mesh HA cluster follow the ACID (Atomicity, Consistency, Isolation, and Durability) model. This model ensures that a large volume of concurrent DDI transactions takes place with the highest degree of reliability.

The T-Mesh Technology

The T-Mesh technology supports multiple TCPWave DDI Management appliances serving the global DDI remotes. The underlying database uses synchronous write-set replication. The T-Mesh consists of a single floating HA master and multiple HA members. When the floating HA master fails, the next available member automatically assumes the role of master without any human intervention. Franchise-critical DDI transactions see no impact when the floating HA master or a single HA member goes down. When a temporary network interruption takes place, the T-Mesh technology auto-recovers, and the DDI management ecosystem resynchronizes transparently, so end users see no impact. The T-Mesh cluster maintains a cache to expedite the recovery of a member that has fallen out of sync.

The remote DDI appliances are designed to auto-sense the failure of their preferred DDI manager. An HA member failure automatically prompts an election on the connected remotes to choose their next best DDI manager. The DDI administrator can also use the UI to swing the DDI remotes from one management node to another without causing any service disruption.

The T-Mesh cluster technology is designed to operate with minimal configuration changes from end users. The T-Mesh cluster self-tunes its configuration every thirty minutes to deliver maximum performance. It is recommended to have three nodes in a T-Mesh. The TCPWave monitoring engine periodically monitors the health of the T-Mesh cluster.
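
The election that a remote runs when its preferred DDI manager fails can be pictured with a minimal sketch. This is not TCPWave's implementation: the `Manager` type, the `priority` field, the `elect_manager` function, and the rule "lowest priority value among reachable managers wins" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Manager:
    """Hypothetical view of a management node as seen by one remote."""
    name: str
    priority: int    # assumed convention: lower value = more preferred
    reachable: bool  # result of the remote's own health probe


def elect_manager(managers: Sequence[Manager]) -> Optional[Manager]:
    """Return the most preferred reachable manager, or None if all are down."""
    candidates = [m for m in managers if m.reachable]
    if not candidates:
        return None
    # Break priority ties by name so every remote reaches the same decision.
    return min(candidates, key=lambda m: (m.priority, m.name))


# Usage: the preferred manager is down, so the remote moves to the next one.
nodes = [
    Manager("ipam-a", 1, False),
    Manager("ipam-b", 2, True),
    Manager("ipam-c", 3, True),
]
chosen = elect_manager(nodes)
```

With the preferred node `ipam-a` unreachable, the sketch selects `ipam-b`; when no manager is reachable it returns `None` so the caller can retry later.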

Business Advantages

While the T-Mesh technology eliminates single points of failure in an enterprise, it also provides DDI administrators with an easy-to-use interface to maintain and monitor the cluster. Members can be joined to or removed from the cluster using the web interface. Updates and upgrades, configuration changes, viewing a remote's logs, restarting services, and similar operations on any DDI remote can be performed from any T-Mesh DDI controller. Internally, the T-Mesh cluster uses a delegate method that hands over the management activity of a given remote from one management node to another. Even though the action is initiated from one management node, the node that actually performs the action on the remote is the HA node that is directly connected to the remote using the T-Message Secure Tunnel. The transport layer used in the T-Mesh ecosystem is encrypted, and the management traffic is encrypted using SSL over a unique TCP port. It is important to note that the nodes in the T-Mesh HA cluster need proper clock discipline. The clock offset between the nodes of a T-Mesh ecosystem is a significant factor in overall stability.
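
Because clock offset between nodes affects stability, an operator might compare the offsets reported by each node's time service before joining a member. The sketch below is an assumption-laden illustration: the `offset_violations` function, the input format (offset in milliseconds per node name), and the 50 ms threshold are hypothetical and do not come from TCPWave documentation.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def offset_violations(
    offsets_ms: Dict[str, float], max_skew_ms: float
) -> List[Tuple[str, str, float]]:
    """List node pairs whose relative clock skew exceeds max_skew_ms.

    offsets_ms maps a node name to its offset from a shared reference clock.
    """
    violations = []
    for (a, off_a), (b, off_b) in combinations(sorted(offsets_ms.items()), 2):
        skew = abs(off_a - off_b)
        if skew > max_skew_ms:
            violations.append((a, b, skew))
    return violations


# Usage: node n3 has drifted relative to the other two nodes.
report = offset_violations({"n1": 0.0, "n2": 5.0, "n3": 120.0}, max_skew_ms=50.0)
```

An empty result means every pair of nodes is within the chosen tolerance.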

Performance with Reliability

The T-Mesh technology comes with built-in conflict resolution logic. When two nodes have a dispute, the third node automatically acts as an arbitrator. The dispute resolution takes place in milliseconds. TCPWave recommends that all the nodes in the cluster have the same network speed, hardware type, patch level, and so on. Since the T-Mesh uses a replicated write-set to guarantee data consistency, a given write operation must be performed on all the nodes. Every DDI operation that uses any of the more than 1,490 REST API calls is processed within a few milliseconds on the performance-optimized database.
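
The role of the third node as arbitrator can be illustrated with a generic majority vote. This is a textbook sketch, not the T-Mesh conflict resolution logic itself; the `arbitrate` function and the idea of each node casting a string-valued vote are assumptions.

```python
from collections import Counter
from typing import Optional, Sequence


def arbitrate(votes: Sequence[str]) -> Optional[str]:
    """Return the value backed by a strict majority of nodes, else None."""
    if not votes:
        return None
    value, count = Counter(votes).most_common(1)[0]
    return value if count > len(votes) // 2 else None


# Usage: two nodes disagree and the third node's vote settles the dispute.
winner = arbitrate(["txn-A", "txn-B", "txn-A"])

# With only two nodes, a disagreement cannot be resolved by majority.
stalemate = arbitrate(["txn-A", "txn-B"])
```

The two-node case shows why an odd node count, such as the recommended three, matters: with an even split there is no majority to break the tie.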

Disaster Recovery

TCPWave's IPAM provides a "Disaster Recovery" (DR) mechanism that keeps the service stable for users if the IPAM goes down for any reason. There are two IPAM appliances in the setup. One IPAM appliance manages all the DNS and DHCP appliances; this active appliance is known as the master IPAM appliance. The other IPAM appliance is passive and is known as the slave IPAM appliance. All the data from the master IPAM appliance is replicated to the slave IPAM appliance, which forms an active-passive setup. If the master IPAM appliance goes down for any reason, the slave IPAM appliance can be brought up, and all of the DNS and DHCP appliances automatically connect to it for management purposes. Switching between the master and slave IPAM appliances is seamless. The IPAM uses technologies such as Galera Cluster to replicate the data from the master to the slave appliance and provide a disaster recovery solution to enterprises.
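
The active-passive switch described above can be modeled as a small state holder. The `IpamPair` class, its field names, and the explicit `promote_standby` step are hypothetical; they only illustrate the sequence "master fails, passive appliance is brought up, remotes reconnect".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IpamPair:
    """Hypothetical model of a master/passive IPAM appliance pair."""
    master: str
    standby: str
    master_up: bool = True
    standby_promoted: bool = False

    def promote_standby(self) -> None:
        """Record that the passive appliance has been brought up."""
        self.standby_promoted = True

    def management_target(self) -> Optional[str]:
        """Return the appliance that DNS and DHCP appliances should connect to."""
        if self.master_up:
            return self.master
        if self.standby_promoted:
            return self.standby
        return None  # master is down and the standby is not yet active


# Usage: the master fails, then the passive appliance is brought up.
pair = IpamPair(master="ipam-master", standby="ipam-standby")
before = pair.management_target()
pair.master_up = False
during = pair.management_target()
pair.promote_standby()
after = pair.management_target()
```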