High Availability Configuration
================================================================================

Overview
--------------------------------------------------------------------------------

Deployed on Kubernetes, QKDLite achieves high availability and continues to deliver keys with minimal downtime. The configurations below provide this at increasing levels of availability, as indicated by each heading.

Configurations
--------------------------------------------------------------------------------

The configurations described here are deployed and tested on Azure Kubernetes Service (AKS). In these examples, a node inherits the availability of its underlying ``Virtual Machine``, so node-level availability is bounded by the VM SLA (see https://aka.ms/CSLA).

99.5%
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this configuration, QKDLite is deployed on a single node per site, with no redundancy. A node failure takes the site's QKDLite instance down until the node recovers.

.. figure:: images/high_avaliability/minimal.png
   :alt: QKDLite minimal setup
   :align: center

99.9%
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this configuration, QKDLite is deployed across two nodes per site, with a Kubernetes cluster at each site managing the deployment and uptime of QKDLite and its nodes. Each cluster schedules the QKDLite workloads onto its nodes, restarts failed containers, and reschedules workloads onto the surviving node if a node becomes unavailable.

.. figure:: images/high_avaliability/default.png
   :alt: QKDLite 99.9% Setup
   :align: center

99.95%
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this configuration, QKDLite is deployed in a hot-warm setup. Both the hot and warm setups deploy QKDLite identically to the 99.9% configuration, so each setup retains the node-level redundancy and Kubernetes-managed failover described there.
The two setups are then placed behind load balancers, which fail over to the warm setup with minimal downtime when the hot setup goes down. This adds site-level redundancy on top of the node-level redundancy already present in each setup.

.. warning::

   To support automatic failover for load balancing, volumes should be replicated across sites out-of-band for both Alice and Bob, so that key delivery continues uninterrupted when failover occurs. Where volume replication is not possible, routing at the load balancers must be synchronised from the hot to the warm site. This coordination prevents mismatched endpoints (e.g. a warm Alice paired with a hot Bob).

.. figure:: images/high_avaliability/ha.png
   :alt: QKDLite 99.95% Setup
   :align: center
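
To put the three availability figures on this page into perspective, the following minimal sketch converts each percentage into the maximum downtime it allows per year. It assumes the figures are measured over a 365-day year; the function name ``downtime_hours_per_year`` is illustrative and not part of QKDLite. Actual downtime depends on the applicable SLA terms and on how the deployment is operated.

.. code-block:: python

   HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


   def downtime_hours_per_year(availability_percent: float) -> float:
       """Return the maximum yearly downtime (in hours) allowed by an availability target."""
       return (1 - availability_percent / 100) * HOURS_PER_YEAR


   # The three availability levels named in the section headings on this page.
   for target in (99.5, 99.9, 99.95):
       print(f"{target}% -> {downtime_hours_per_year(target):.2f} h/year")

Under these assumptions, the step from 99.5% to 99.9% reduces the yearly downtime budget to a fifth, and the step to 99.95% halves it again.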