Monzo is an online bank serving nearly 3.5 million customers. It achieved this growth primarily by building on open source Cloud Native technologies.
AWS has been the obvious choice for Monzo’s foundation. In this video, Monzo engineers Chris Evans and Suhail Patel walk through how services such as Amazon EC2, Amazon S3, Amazon GuardDuty, AWS IAM and AWS Shield enable payment processing with tight latency requirements and a high level of reliability.
The video also explains how Monzo runs with very little physical infrastructure and how that infrastructure is linked to AWS through AWS Direct Connect so that payments can be transmitted securely and reliably.
Suhail Patel begins by discussing how the compute cluster is built on top of Amazon EC2 and Kubernetes. He adds that Monzo has been able to deploy over 1500 microservices with thousands of replicas.
At 9:49 Suhail explains that engineers at Monzo trigger a deployment using a local deployment tool known as shipper. The deployment service and the services being rolled out both run within the Kubernetes cluster. The deployment service validates the code, runs static analysis and builds a container image. Rolling deployments are then invoked and completed through Kubernetes.
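The idea behind a rolling deployment can be illustrated with a small sketch. This is not shipper or Kubernetes code; the `Replica` class, the `rolling_update` function and the service name are hypothetical, and `max_unavailable` is only loosely modelled on the Kubernetes setting of a similar name. It shows replicas being replaced in small batches so that most of them keep serving traffic throughout.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    version: str


def rolling_update(replicas, new_version, max_unavailable=1):
    """Replace replicas in small batches; return how many batches were needed."""
    batches = 0
    pending = [r for r in replicas if r.version != new_version]
    while pending:
        # Only this many replicas are taken out of service at the same time.
        batch, pending = pending[:max_unavailable], pending[max_unavailable:]
        for replica in batch:
            # A real cluster would stop the old pod and start a new one;
            # this sketch only records the version change.
            replica.version = new_version
        batches += 1
    return batches


# Hypothetical service with five replicas, upgraded two at a time.
replicas = [Replica(f"service.example-{i}", "v1") for i in range(5)]
batches = rolling_update(replicas, "v2", max_unavailable=2)
print(batches, {r.version for r in replicas})
```

A smaller `max_unavailable` makes the rollout slower but keeps more capacity online while it runs.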
Service discovery and Service communication
At 12:11, the presenter begins to demonstrate how service discovery works at Monzo. He starts by stating that the ‘envoy config provider’ is a key component of service discovery. The config provider’s main responsibility is to keep all the Envoy processes up to date.
The envoy config provider hooks directly into the Kubernetes API. The Kubernetes API in turn provides information to the config provider, which propagates it to all the running Envoy processes that are listening for updates. At 13:00 the presenter mentions that service-to-service RPC calls (service communication) become easy with Envoy, because Envoy takes care of all the networking concerns. At 14:04, the discussion turns to how data is stored at Monzo.
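To make the service discovery flow described above more concrete, here is a minimal sketch of a provider that fans endpoint changes out to listening proxies. It does not use the real Envoy or Kubernetes APIs; `EnvoySidecar`, `ConfigProvider`, `on_cluster_event` and the service name and addresses are all hypothetical stand-ins.

```python
class EnvoySidecar:
    """Stand-in for an Envoy process that keeps a local routing table."""

    def __init__(self, name):
        self.name = name
        self.routes = {}

    def on_update(self, service, endpoints):
        self.routes[service] = list(endpoints)

    def pick_endpoint(self, service):
        # Real proxies load-balance; this sketch just takes the first address.
        return self.routes[service][0]


class ConfigProvider:
    """Receives endpoint changes and pushes them to every subscriber."""

    def __init__(self):
        self.subscribers = []
        self.endpoints = {}

    def subscribe(self, sidecar):
        self.subscribers.append(sidecar)
        # Bring a late subscriber up to date with the current state.
        for service, endpoints in self.endpoints.items():
            sidecar.on_update(service, endpoints)

    def on_cluster_event(self, service, endpoints):
        # In the design from the talk, this information comes from the
        # Kubernetes API; here it is passed in by hand.
        self.endpoints[service] = list(endpoints)
        for sidecar in self.subscribers:
            sidecar.on_update(service, endpoints)


provider = ConfigProvider()
a, b = EnvoySidecar("a"), EnvoySidecar("b")
provider.subscribe(a)
provider.on_cluster_event("service.example", ["10.0.0.5:8080", "10.0.0.6:8080"])
provider.subscribe(b)  # joins later, still receives the current endpoints
print(a.pick_endpoint("service.example"), b.routes)
```

Because every proxy holds its own copy of the routing table, a calling service only needs to talk to its local proxy and never has to look up addresses itself.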
Data at Monzo is stored in Cassandra, which runs outside Kubernetes on top of EC2. He adds that Cassandra is a masterless system, so no particular node is responsible for coordinating work. In addition, the system ensures that every node has the information it needs.
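One way to picture a masterless design is a hash ring on which any node can work out where a piece of data lives. The sketch below is a simplified illustration and not Cassandra’s implementation: the `Ring` class, the node names and the use of SHA-256 as the hash function are assumptions made for the example.

```python
import bisect
import hashlib


class Ring:
    """Toy hash ring: each node owns one position on the ring."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self.tokens = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def replicas_for(self, key):
        """Walk clockwise from the key's position and collect the next nodes."""
        positions = [token for token, _ in self.tokens]
        start = bisect.bisect_right(positions, self._hash(key))
        count = min(self.replication_factor, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
# Two nodes building the ring independently reach the same answer,
# so neither of them needs to ask a central coordinator.
view_one = Ring(nodes)
view_two = Ring(list(reversed(nodes)))
print(view_one.replicas_for("account:123"))
```

Since the placement is a pure function of the key and the node list, any node that receives a request can route it to the right replicas.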
At 17:32 Suhail talks about how messaging works at Monzo. He mentions that asynchronous messaging and processing are handled by NSQ and Kafka. Message queues like NSQ and Kafka offer asynchronous flows with at-least-once delivery semantics. Both are high-throughput, highly available message queues. At 18:26 he describes how distributed locking with etcd is incorporated into their system.
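At-least-once delivery means that a message may occasionally arrive more than once, so consumers generally need to tolerate duplicates. The following sketch shows one common way to do that, by remembering message IDs. It does not use the NSQ or Kafka client libraries; the `IdempotentConsumer` class and the message fields are hypothetical.

```python
class IdempotentConsumer:
    """Applies each message once, even if the queue delivers it again."""

    def __init__(self):
        self.seen = set()
        self.balance = 0

    def handle(self, message):
        if message["id"] in self.seen:
            # Duplicate delivery: acknowledge it without applying it again.
            return False
        self.balance += message["amount"]
        self.seen.add(message["id"])
        return True


deliveries = [
    {"id": "m1", "amount": 500},
    {"id": "m2", "amount": 250},
    {"id": "m1", "amount": 500},  # redelivered by the queue
]
consumer = IdempotentConsumer()
results = [consumer.handle(m) for m in deliveries]
print(results, consumer.balance)
```

In a real system the set of seen IDs would have to live in durable storage; an in-memory set is used here only to keep the example short.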
He states that etcd is a highly available, consistent, distributed and reliable key-value store that is primarily used here to implement distributed locks. It also offers high throughput and low latency. He adds that all reads and writes go through the leader node to ensure the most consistent view. The leader then makes sure that the information is propagated to a majority of the nodes in the system, and that the nodes receiving it have written it to their logs on durable storage.
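The majority rule described above can be sketched in a few lines. This is a heavily simplified model and not how etcd is implemented: it leaves out leader election, terms and lock expiry, and the `Node` and `Leader` classes and the lock name are hypothetical. It only shows that a lock is granted once a majority of nodes have recorded the write.

```python
class Node:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.log = []

    def append(self, entry):
        """Record the entry and acknowledge it; unhealthy nodes never reply."""
        if not self.healthy:
            return False
        self.log.append(entry)  # stands in for a write to durable storage
        return True


class Leader(Node):
    def __init__(self, name, followers):
        super().__init__(name)
        self.followers = followers
        self.locks = {}

    def write(self, entry):
        """Return True only if a majority of the cluster stored the entry."""
        acks = 1 if self.append(entry) else 0
        acks += sum(1 for follower in self.followers if follower.append(entry))
        cluster_size = len(self.followers) + 1
        return acks > cluster_size // 2

    def acquire(self, lock_name, owner):
        if lock_name in self.locks:
            return False  # somebody else already holds the lock
        if not self.write(("lock", lock_name, owner)):
            return False  # no majority, so the lock cannot be granted
        self.locks[lock_name] = owner
        return True


leader = Leader("n1", [Node("n2"), Node("n3", healthy=False)])
first = leader.acquire("example-job", "worker-1")   # 2 of 3 nodes acknowledge
second = leader.acquire("example-job", "worker-2")  # lock is already held

stalled = Leader("n1", [Node("n2", healthy=False), Node("n3", healthy=False)])
third = stalled.acquire("example-job", "worker-1")  # only 1 of 3 acknowledges
print(first, second, third)
```

Requiring a majority is what lets the cluster keep working when a minority of nodes is down while still refusing writes it could not make durable.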
At 21:46 Chris mentions that a combination of Prometheus and an open source project called Thanos is used for monitoring and gathering useful metrics. He adds that Prometheus is a flexible time-series data store and query engine. Thanos makes it possible to treat many Prometheus servers as a single whole, with effectively unlimited retention.
The combination of Prometheus and Thanos has been very helpful for monitoring asynchronous processing, distributed locking, network use, and RPC request and response cycles. At 23:57 he adds that Thanos Query aggregates data across multiple Prometheus servers. Thanos Store acts as a proxy in front of S3 and is a vital part of gathering metrics and monitoring.
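The aggregation step can be pictured as merging samples from several sources into one series. The sketch below is only an illustration of that idea and not Thanos code: the `merge_series` function and the sample data are made up, and real deduplication is considerably more involved than matching timestamps.

```python
def merge_series(sources):
    """Merge samples for one metric from several sources, dropping duplicates."""
    merged = {}
    for samples in sources:
        for timestamp, value in samples:
            # Keep the first value seen for each timestamp.
            merged.setdefault(timestamp, value)
    return sorted(merged.items())


# Recent samples held by two Prometheus servers that scrape the same target,
# plus older samples that have already been moved to object storage.
prometheus_a = [(100, 1.0), (115, 2.0)]
prometheus_b = [(115, 2.0), (130, 3.0)]
object_store = [(10, 0.5), (25, 0.7)]

series = merge_series([prometheus_a, prometheus_b, object_store])
print(series)
```

From the caller’s point of view there is a single series covering both recent and historical data, regardless of where each sample is stored.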
This video illustrates how service discovery, service communication and asynchronous messaging happen on Monzo’s Kubernetes-based platform. The presenters also illustrate how distributed locking is employed and reiterate that information is propagated to the nodes in the system.
The way Monzo’s large volume of data is stored in Cassandra is also briefly explained. Prometheus and Thanos are introduced along with their core functionality, which includes monitoring and collecting useful success and failure metrics. The presenters summarize the different ways Monzo has leveraged open source Cloud Native technologies to support its growth.