Cloud Infrastructure Summary
In the 2000s, as the internet boomed, the need for architectures able to support massive services grew with it. The first architecture concepts with a focus on the cloud were introduced in 2003 with the release of the Domain-Driven Design (DDD) concept. Guidance on which architecture patterns to use evolved with the introduction of Fowler's domain logic patterns the same year, and in 2005 with Cockburn's hexagonal architecture, from which the micro-service architecture is probably derived.
I believe that cloud infrastructure architecture is an adaptation of the DDD concept with a strong focus on the cloud.
As in any architecture process, the quality goals are the drivers for building a great architecture. When it comes to the cloud, the focus is on (1) Availability, (2) Scalability and (3) Maintainability.
How to read this post?
This is an aggregate of knowledge and notes from a software architecture perspective on how to build a cloud infrastructure.
This means I will loosely follow the arc42 architecture process (which refines the C4 model), focusing on (1) how to design such an infrastructure and (2) how to maintain it.
Cloud Infrastructure Development
The first step is to detail what the system should do and group it into topics. This will give us valuable hints on how to formulate the scenarios for the quality goals (next chapter).
The most common quality goals applied to a cloud infrastructure are Reliability, Scalability and Maintainability. It is important to describe them in detail using scenarios. This ensures that at a later point in time (when we may have forgotten what we meant here), we can retrieve the information.
In a generic way, they can be described as follows:
- The system shall enable operators to perform changes while keeping them from creating faults and hard failures
- The system shall identify and recover from partial failures
- The system shall be able to absorb a known and chosen load
- The system shall be manageable and extensible in O(log(N)) worst case, where N represents the growing load
- The system shall be updatable and able to roll back without major impact on the entire system
Several more quality goals could be added such as security or availability which will also influence your architecture later on.
By definition, an architecture constraint is an external factor that influences the solution to build. Constraints are usually "SHALL"s; ignoring them could have dramatic consequences (e.g. regarding legal constraints). In a less aggressive way, you could see them as lessons learned from the industry over the years.
When it comes to cloud infrastructure, building a solution compliant with the following entities is highly encouraged.
OCI - Open Container Initiative
The OCI was founded with the goal of harmonizing what is (1) a container image and (2) a container runtime. This way, different container types can be created (proxy container, virtual machine container, ...) and run by the same engine.
ISA - Independent Systems Architecture
The ISA is a collection of strongly recommended practices focused on how to build a micro-service architecture. It tries to be the SOLID principles of the micro-service architecture.
CNCF - Cloud Native Computing Foundation
The CNCF seeks to drive adoption of paradigms by fostering and sustaining an ecosystem of open source, vendor-neutral projects. They democratize state-of-the-art patterns to make these innovations accessible for everyone.
Are there constraints regarding the type of environment that needs to be built?
- Private (on premise)
- Public (AWS, Azure, ...)
- Infrastructure as a Service (IaaS)
- Platform as a Service (PaaS)
- Software as a Service (SaaS)
- (new) Mobile "Backend" as a Service (MBaaS)
- (new) Container as a Service (CaaS)
- (new) Function as a Service (FaaS)
When using a public cloud, pay attention to:
- Advantages and disadvantages (low start cost, usage costs, maintenance by 3rd party, scalability needs,...)
- Region & Zone concept (different zones, different laws apply)
- VM selection (virgin or specialized) (CPU, RAM, ROM)
- Type of connection (Virtual Private Cloud, VPN)
Law and regulation
In addition to the above, some can be industry specific (KWG, PSD2, ...)
Deep-dive "Cloud Native" as constraint
A separate chapter is needed to understand what it means for a solution to be cloud native. Applying a micro-service architecture does not necessarily mean that the solution is cloud native.
Cloud native is rather an ecosystem of services that fulfill specific constraints.
These constraints are applicable to any so-called service layer (IaaS to FaaS).
It can be summarized in three constraints:
- HW resources need to be abstracted.
- Applications built on top of these HW resources and other abstractions need to scale horizontally and vertically.
- An orchestrator exists to deploy or duplicate any application on any abstracted HW resource.
Let us explore the different constraints in the next sub-chapters.
Any application that runs needs resources. Some applications may need more resources than others, so finding the right resources is very important.
The core parameters when it comes to resource declaration are
- CPU (What kind of CPU the application needs to run on?)
- Memory (How much RAM does the application need?)
- Storage (What kind of storage does the hardware provide?)
For special resources, additional information can be added
- labels (This resource has a special GPU/HW and can also accept special applications)
- taints (This resource has a special GPU/HW and only accepts applications that need it)
When it comes to k8s, such a hardware resource is called a node.
Each node added to the cluster can have its resources registered and centrally managed (via a "resource abstraction"). The cluster management entity can then dispatch/remove applications on them (on demand or automatically).
Applications can, but do not have to, consume other services.
In the micro-service architecture, the common rules to build an application service are the twelve factors, which were extended by (13) API first, (14) Telemetry and (15) Authentication/authorization.
These twelve plus three factors directly impact what a service looks like, similar to ISA.
Let us take a look at the major building blocks of such an application.
What kind of storage does the application need?
Different applications have different storage needs. The most common ones are:
- Persistent volumes (idea of HDD, SSD. After the destruction of the application, the data is kept)
- Projected volumes (idea of mapping several volume sources into the same directory)
- Ephemeral volumes (idea of RAM, RAMDISK. It is a volume that is destroyed after the application is destroyed)
- Dynamic volumes (idea of a volume that is created on demand. E.g. if an application needs additional volumes, it can request them and get them without intervention of any operator)
- Volume snapshots (idea of a volume that is a snapshot of another volume, usually a persistent volume)
In addition to the above types of volumes, some orchestrators may allow us to add our own.
Container (application environment)
What kind of environment does the application need?
Through the work of the OCI, container APIs are standardized, which improves container isolation. The container shall share, but is not limited to, the following resources:
The host can secure itself with:
- Capabilities, rule-based execution: with AppArmor, fine grained security policies are possible
- Kernel proxy: is able to limit "real" sys-calls
- Machine-level virtualization: emulates a "guest kernel" that does the sys-calls
- Privileged processes: allowed to bypass kernel permission checks; otherwise, the process is subject to full permission checks (UID, GID, group lists)
Known container runtimes are:
- docker (classic)
- kata (QEMU emulation with several layers of abstraction on top of the kernel) -> Machine-level virtualization
- gVisor (reduce sys-calls) -> kernel proxy
Known container builders are:
- buildah (specialized for OCI images)
- kaniko (can run on the cluster too)
- jib (more for Java applications)
Because Java is so widely used, many applications suffer from performance issues. The concept of application containerization tries to resolve this by developing:
- bytecode (efficient instruction set for interpreter)
- GraalVM (high-performance modern Java)
Network (application interface)
All applications need to communicate with each other. This is the basis of the micro-service architecture. Therefore, defining the right network is also needed.
- Host Network: It represents the network between hosts or VMs.
- Overlay Network: It represents a network that is built on top of another network. It allows better application isolation, encryption, firewalls and load balancing rules. Commonly used on k8s are Canal and Cilium (which contain encryption protocols but also have a lower bandwidth).
The role of an orchestrator is to find a resource for any application that needs to run. Note that if the orchestrator offers a "service" API, applications can also use it to ensure their own scaling and reliability.
We will discuss next several functionalities and concepts that are part of what an orchestrator should do.
A workload corresponds to an application running in an orchestrator.
Different applications may need different workloads to be executed. The following workloads were identified:
- Stateless workload (the application does not need to reach any state to function, it can directly execute its request)
- Stateful workload (e.g. needs a unique network id, needs storage, needs graceful deployment)
- Finite workload (run until completion, more for complex execution)
- Host daemon workload (ensures that all functionalities that need to run on the same node do so, e.g. monitoring of the applications running on the node)
There are also additional workloads specific to cloud solutions that are not included in the above list:
- Load balancing (balances the load between different duplicates of an application)
- Service discovery (allow the discovery of different services which are part of the solution)
- Ingress (glue between the outside world and our services)
To "auto-scale" the solution, a metrics collector that feeds a scale controller is needed. Common metrics to scale the solution are:
- CPU usage
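As a rough illustration of what a scale controller could compute from the collected metric, here is a minimal sketch of a proportional scaling rule. It follows the rule documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current * observed / target)); the function name, the parameter names and the floor of one replica are assumptions made for this example.

```python
import math


def desired_replicas(current_replicas: int, observed_cpu: float, target_cpu: float) -> int:
    """Scale the replica count by the ratio of observed to target CPU usage.

    `observed_cpu` and `target_cpu` are average utilisation values in percent.
    The result never drops below one replica (an assumption of this sketch).
    """
    if target_cpu <= 0:
        raise ValueError("target_cpu must be positive")
    return max(1, math.ceil(current_replicas * observed_cpu / target_cpu))


# 4 replicas running at 90% CPU with a 60% target: scale out.
scale_out = desired_replicas(4, 90.0, 60.0)
# 3 replicas running at 20% CPU with a 60% target: scale in.
scale_in = desired_replicas(3, 20.0, 60.0)
```

A real controller would additionally smooth the metric and apply cool-down periods to avoid flapping; this sketch leaves that out.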
A small parenthesis on load balancers: deciding which node will process an input follows a pre-defined strategy:
- Round-robin (time-slicing) (default)
- Least connection
- Destination hashing (the target node is looked up in a hash table keyed by the destination address)
- Source hashing (the target node is looked up in a hash table keyed by the source address)
- Shortest expected delay (assigns the input to the node where the queue is the shortest)
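Three of the strategies above can be sketched in a few lines. This is an illustrative, in-process model only: the class names, the `pick` and `release` methods and the use of SHA-256 for source hashing are assumptions of this example, not the API of any real load balancer.

```python
import hashlib
from itertools import cycle


class RoundRobin:
    """Hand out nodes one after the other, starting over at the end."""

    def __init__(self, nodes):
        self._nodes = cycle(list(nodes))

    def pick(self, client_ip=None):
        return next(self._nodes)


class LeastConnection:
    """Pick the node that currently has the fewest open connections."""

    def __init__(self, nodes):
        self.active = {node: 0 for node in nodes}

    def pick(self, client_ip=None):
        node = min(self.active, key=self.active.get)
        self.active[node] += 1
        return node

    def release(self, node):
        # Call when a connection to `node` is closed.
        self.active[node] -= 1


class SourceHashing:
    """Map a client address to a node, so the same client keeps its node."""

    def __init__(self, nodes):
        self._nodes = list(nodes)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._nodes)
        return self._nodes[index]
```

Note that this simple modulo-based source hashing reshuffles most clients when the node list changes; consistent hashing is the usual answer to that, but it is out of scope here.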
When a user wants an application to run, they need to request resources and a specific environment. They will also need to define its Quality of Service class or its Priority and Preemption needs. This is usually done via a chart (a term coined by Helm).
- The orchestrator will then search for any node/resource matching the request.
- If it finds one that fulfills the requirements and is available, it places and starts the application on it. --> Done
- If it does not find one, it checks if it can evict another application (in k8s: Pod)
- If its priority is too low, the application stays in a pending state until something changes in the resources managed by the orchestrator (also called cluster)
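The placement steps above can be modelled with a small sketch. It is a strong simplification of what a real scheduler does: `App`, `Node` and `schedule` are hypothetical names, only CPU and memory are considered, and eviction simply removes the lowest-priority applications first.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    name: str
    cpu: float      # requested CPU cores
    memory: int     # requested memory in MiB
    priority: int = 0


@dataclass
class Node:
    name: str
    cpu: float
    memory: int
    apps: list = field(default_factory=list)

    def free_cpu(self):
        return self.cpu - sum(a.cpu for a in self.apps)

    def free_memory(self):
        return self.memory - sum(a.memory for a in self.apps)


def schedule(app, nodes):
    """Return (node_name, evicted_app_names); node_name is None if pending."""
    # Steps 1 and 2: place the app on the first node with enough free resources.
    for node in nodes:
        if app.cpu <= node.free_cpu() and app.memory <= node.free_memory():
            node.apps.append(app)
            return node.name, []
    # Step 3: try to make room by evicting apps with a lower priority.
    for node in nodes:
        cpu, memory, evicted = node.free_cpu(), node.free_memory(), []
        for victim in sorted(node.apps, key=lambda a: a.priority):
            if cpu >= app.cpu and memory >= app.memory:
                break
            if victim.priority < app.priority:
                evicted.append(victim)
                cpu += victim.cpu
                memory += victim.memory
        if cpu >= app.cpu and memory >= app.memory:
            for victim in evicted:
                node.apps.remove(victim)
            node.apps.append(app)
            return node.name, [v.name for v in evicted]
    # Step 4: nothing fits and nothing can be evicted, the app stays pending.
    return None, []
```

For example, on a single node with 2 cores, a low-priority app that does not fit stays pending, while a high-priority one evicts whatever is needed to run.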
What is overprovisioning?
It is when the cluster provides the application with more resources than it actually needs (as specified in the chart).
What is overcommitment?
Each chart can also specify a resource limit, which defines a hard limit on how many resources the application can use. When several applications are running on the same node, the cluster manager ensures that the requested resources are met, but not the limits.
The sum of the limits minus the node max-capacity is called overcommitment.
An application using an amount of resources between "requested" and "limit" may be considered by the cluster as problematic and can get evicted.
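The definition above boils down to simple arithmetic. The following sketch uses hypothetical helper names and memory values in MiB; the zone labels are wording chosen for this example.

```python
def overcommitment(limits, node_capacity):
    """Sum of all application limits minus the node capacity.

    A positive value means the node cannot honour all limits at the same time.
    """
    return sum(limits) - node_capacity


def usage_zone(usage, requested, limit):
    """Classify the current usage of one application."""
    if usage <= requested:
        return "within request"
    if usage <= limit:
        return "eviction candidate"  # between "requested" and "limit"
    return "over limit"


# Three applications with limits of 2048, 1024 and 2048 MiB on a 4096 MiB node.
overcommitted_mib = overcommitment([2048, 1024, 2048], 4096)
```

Here the node is overcommitted by 1024 MiB: if all three applications grow to their limit at once, at least one of them has to give way.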
When it comes to placement, in addition to resource, label and taint, affinities can be defined:
- node affinity (specific for a node)
- pod affinity, also called inter-pod affinity (touches several nodes)
- anti-affinity (prevents specific constellations between nodes to avoid single points of failure or competition for resources)
There are two types of affinity rules:
- hard affinity rules mean "required during scheduling, ignored during execution"
- soft affinity rules mean "preferred during scheduling, ignored during execution"
As defined in Arc42, architecture/solution strategies contain:
- Technology decisions
- Decisions about the top-level decomposition of the system
- Decisions on how to achieve key quality goals
- Monolith (One UI, one database, one core containing all capabilities)
- Microservice (Running on its own process, standardized communication APIs, build around business capabilities, independently deployable, centralized management minimal)
Monoliths have the bad habit of being associated with legacy code, but this is not necessarily true. I like the definition from Working Effectively with Legacy Code: "Legacy code is source code inherited from someone else or inherited from an older version of the software". I even like to extend this definition to "Legacy code is source code that is not tested". A monolithic solution can be well tested, can be well documented, and can keep its trajectory as the software is developed.
The patterns explored below are more applicable to the micro-service architecture.
The main difference between the two is that DDD decomposes the system by business needs whereas SCS (Self-Contained Systems) decomposes it by user needs. For this reason, every SCS contains a UI, whereas a sub-domain does not necessarily have one.
Any sub-domain or SCS can contain many "micro" services.
Pattern applied on macro level (domain, entire solution)
It is the collection of patterns that impact the entire solution and all (micro) services.
Orchestration defines a set of rules on how the system regulates itself. These rules can be optimized towards architecture drivers.
- Operator / Controller pattern (focus: deal with load)
- Operator - Operates a complex workload (e.g. Elasticsearch)
- Controller - Controls load and adjusts the resources the operator needs to perform (cron job or event monitoring, e.g. Prometheus)
- Resilience pattern (focus: deal with any instabilities)
- Expectations 1: modules, network, nodes can fail. User is unpredictable
- Expectations 2: faults (slow down, memory leaks, ...) and failures (inability of a system to perform its job) will occur
- Resolution 1: gracefully deal with timeouts and errors (escalate only if really necessary): circuit breaker pattern
- Resolution 2: gracefully handle many similar requests with the idempotent communication pattern: "recognize that similar requests were sent and reduce them to one request"
- Resolution 3: gracefully handle overloads via rate limiting and throttling, e.g. reroute requests that cannot be processed.
- Resolution 4: intensive testing through simulation and chaos monkey
- Resolution 5: monitoring and alerting needed to catch any failures
- Resolution 6: do canary releases with the canary deployment pattern: "A small set of customers is rerouted to the new release to validate the stability of the update"
- Service meshes pattern (focus: hyper-scaling)
- Observe, route, secure inter-communication and services to enable scaling where it is needed
- Intensive use of proxy pattern between the different nodes (Istio, Linkerd 2, Cilium, ...)
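To make the circuit breaker mentioned under Resolution 1 of the resilience pattern more concrete, here is a minimal sketch. The class, its parameters (`max_failures`, `reset_after`) and the injectable `clock` are assumptions of this example; libraries and service meshes implement the same idea with many more options.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after      # seconds to wait before a trial call
        self._clock = clock
        self._failures = 0
        self._opened_at = None              # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Open: fail fast instead of hitting the broken service again.
                raise CircuitOpenError("circuit open, call skipped")
            # Otherwise half-open: let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.max_failures:
                self._opened_at = self._clock()   # (re)open the circuit
            raise
        # Success: close the circuit and forget previous failures.
        self._failures = 0
        self._opened_at = None
        return result
```

The caller wraps every remote call in `breaker.call(...)` and treats `CircuitOpenError` as "service currently unavailable", for example by returning a cached or default answer.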
Who says micro-services says services communicating with each other! There are different possibilities for how you want the customer to interact with the system.
But also for how you want the elements of your system to interact with each other!
- Remote Procedure Invocation (e.g. REST), request/reply based protocol
- Messaging (e.g. Kafka)
- publish/subscribe pattern (emitter publishes, consumer subscribes)
- broker pattern (central component that manages in/out messages) (e.g. MQTT)
- mesh pattern (each node is potentially a gateway which forwards messages)
- Idempotent Consumer means you can identify and deal with multiple identical requests/messages (e.g. if a retry occurred)
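An idempotent consumer can be sketched as a handler that remembers which message ids it has already processed. The class name and the in-memory dictionary are assumptions of this example; a real service would persist the processed ids, ideally in the same transaction as the business change.

```python
class IdempotentConsumer:
    """Process each message id at most once and replay the stored result."""

    def __init__(self, handler):
        self._handler = handler
        self._results = {}  # message_id -> result of the first processing

    def consume(self, message_id, payload):
        if message_id in self._results:
            # Duplicate (e.g. a retry): do not run the handler again.
            return self._results[message_id]
        result = self._handler(payload)
        self._results[message_id] = result
        return result
```

This only works if the sender attaches a stable id to every message and reuses it on retries.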
How do you want to interact with your database? How do you want to emit events out of transactions in the database?
- Transactional outbox
- The database contains an "outbox" table containing the performed database changes
- Some resource is notified of the changes, reads the "outbox" table and publishes them to a message broker
- Limitation: how to un-send when the changes in the database were rolled back?
- Transactional log tailing
- Instead of reading an "outbox" table, we read the database log instead
- Less intrusive than the outbox (no SQL request sent nor outbox table managed), but then it is more complicated to overcome the limitation of un-sending a message
- Polling publisher
- Same as the transactional outbox, with the difference that changes are not pushed on notification: a cron job polls the outbox at a defined interval.
There are mainly 2 patterns available:
Client-side service discovery: (e.g. Netflix Eureka)
- The client queries a Service Registry for the list of service instances it needs. Afterwards, it load-balances its requests across these instances.
- (Plus) The client can define its own load-balancing. Fewer participants in the discovery process.
- (Minus) The Service Registry needs to be robust and stable!
Server-side service discovery: (e.g. AWS Elastic Load Balancer (ELB))
- The client sends a request to a Router, which queries the Service Registry for the list of active service instances. Afterwards, the router manages the load-balancing.
- (Plus) The router is kept simple: list of service-registry and list of active service instances. Each can evolve dynamically.
- (Minus) More places that can fail in the system. The router needs to support all necessary communication protocols.
Let us go through the vocabulary:
- Service Registry - Dynamic database of "active" services (name, instance, location). It also invokes the health check APIs of the registered services to ensure that they are operational.
- Router - Usually the service entry point that reroutes requests to the right service within its network.
- Self-Registration - Mechanism by which a service registers itself with a service registry.
- 3rd Party Registration - A third-party entity registers the services with the service registry. I see it as a kind of "batch processing" of services into a registry.
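The client-side variant with self-registration can be sketched as follows. All names are hypothetical, and instead of the registry calling health check APIs, this example assumes that services renew their registration with a heartbeat and are dropped after a time-to-live.

```python
import itertools
import time


class ServiceRegistry:
    """In-memory registry; instances expire when they stop sending heartbeats."""

    def __init__(self, ttl=30.0, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._last_seen = {}  # (service name, address) -> time of last heartbeat

    def register(self, name, address):
        # Used for self-registration and for every later heartbeat.
        self._last_seen[(name, address)] = self._clock()

    def lookup(self, name):
        now = self._clock()
        return sorted(
            address
            for (service, address), seen in self._last_seen.items()
            if service == name and now - seen <= self._ttl
        )


class DiscoveryClient:
    """Client-side discovery: ask the registry, then load-balance locally."""

    def __init__(self, registry):
        self._registry = registry
        self._counter = itertools.count()

    def pick(self, name):
        instances = self._registry.lookup(name)
        if not instances:
            raise LookupError(f"no active instance of {name}")
        return instances[next(self._counter) % len(instances)]
```

In the server-side variant, the `DiscoveryClient` logic would live in the router instead, and the client would only know the router address.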
Even if the deployment responsibility could be part of the micro level, a general deployment strategy to keep the system up and running is needed.
Let us take a look at them:
- Recreate - The service is shut down, upgraded and booted up again.
- Rolling Update - Gradual switch of the instances: create an updated instance, test it, shut down the old instance. Both versions get traffic, so no API-breaking changes are allowed.
- Blue-Green Deployment - Ramp up the new deployment fully in parallel to the old deployment, ensure functionality in a real-world scenario, and ramp down the old one if the new deployment works as expected.
- Canary Release - Replace a subset of the already deployed service with the new deployment and test it in a real-world scenario (both versions get traffic at the same time). Then continue the gradual deployment of the service.
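For the canary release, the traffic split can be illustrated with a deterministic routing function. This is a sketch with assumed names (`route`, `canary_percent`); in practice the split is usually configured in the ingress or service mesh rather than written by hand.

```python
import hashlib


def route(user_id: str, canary_percent: int) -> str:
    """Send a stable share of users to the canary, everyone else to stable.

    Hashing the user id (instead of picking randomly per request) keeps each
    user on the same version, and users already on the canary stay there
    when `canary_percent` is increased.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # 0..99
    return "canary" if bucket < canary_percent else "stable"
```

The gradual deployment then consists of raising `canary_percent` step by step (for example 5, 25, 50, 100) while watching the monitoring for regressions.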
Cross cutting concerns patterns
Let us assume you want to contribute to a specific part of a running system. All the above patterns need to be respected! To help you here, a template or a default configuration could exist as a base for your new service.
- Microservice chassis - Creation of a base for all services in the solution. Any service needs to be built on top of it.
- Service template - Template that allows you to "duplicate", "modify" and "deploy" new services
- Externalized configuration - Manage all the configuration needed for the services to perform as a single source
Pattern applied on micro level (subdomain, SCS level)
It may feel at first that the macro level architecture defines a lot and therefore a certain level of freedom is lost at the micro level. This is partially true. Let us take an analogy.
To make a human, we need a brain, hands, legs... Should a hand grow on the leg? Probably not, it has a defined place. But do we all have the same hands? No, else the fingerprint scanner would not work.
Where the macro level shapes how the overall solution (and the business goals) should be realized, the use-cases are not realized yet.
When it comes to complex transactions that may involve several services, we need to ensure that all the services have a common understanding of the transaction state. If a wrong state leads to a payment, trust of the client will be broken.
- Event sourcing - Name given to a specialized Broker Pattern. Specialized because it forwards special messages: events
- Saga - Coordination, via asynchronous messaging, of a sequence of local transactions across services. There are 2 ways to implement it:
- Choreography, services exchange events to distribute a change of state
- Orchestration, one service coordinates the transaction. It contains the state machine of the transaction and communicates with all the participating services to fulfill it.
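The orchestration variant of a saga can be sketched as a coordinator that runs local transactions in order and undoes the completed ones if a later step fails. The sketch runs in a single process with plain function calls; in a real system each step would be a message to another service. `run_saga` and the step tuples are hypothetical names.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If an action raises, the compensations of all completed steps are
    executed in reverse order. Returns (succeeded, log).
    """
    completed = []
    log = []
    for name, action, compensation in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            for done_name, done_compensation in reversed(completed):
                done_compensation()
                log.append(f"{done_name}: compensated")
            return False, log
        completed.append((name, compensation))
        log.append(f"{name}: done")
    return True, log
```

The key design point is that every step needs a compensation, because there is no global rollback across services: a reserved item is released again, a charged payment is refunded, and so on.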
When it comes to DDD, some other terms / patterns are used:
- Aggregate - It is a kind of facade (also called aggregate root entity) over an aggregate of entities that fulfill a specific functionality. Internally, it may use the saga-orchestration pattern.
- Domain event - Aggregates can emit events to other aggregates. Internally, it may use the event sourcing pattern.
All our systems are using databases to persist information, events, states, ... But identifying our real use-cases may help us optimize our solution.
For example, do we really read and write the same amount of data? Are exactly the same type of data used? Do we need to read from the same database?...
Command Query Responsibility Segregation may be the solution. It consists of separating queries (read requests) and commands (change requests) so that each can scale independently.
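A minimal sketch of this separation is shown below. The class names and the direct callback between the two sides are assumptions of the example; in a distributed setup the event would travel through a message broker and the read side would have its own database.

```python
class OrderCommands:
    """Write side: validates and records changes, then emits an event."""

    def __init__(self, on_event):
        self._orders = {}
        self._on_event = on_event

    def place_order(self, order_id, customer, total):
        if order_id in self._orders:
            raise ValueError(f"order {order_id} already exists")
        self._orders[order_id] = (customer, total)
        self._on_event({"type": "order_placed", "customer": customer, "total": total})


class OrderQueries:
    """Read side: keeps a view shaped for the questions that are asked."""

    def __init__(self):
        self._spend_by_customer = {}

    def apply(self, event):
        if event["type"] == "order_placed":
            customer = event["customer"]
            self._spend_by_customer[customer] = (
                self._spend_by_customer.get(customer, 0) + event["total"]
            )

    def total_spend(self, customer):
        return self._spend_by_customer.get(customer, 0)


queries = OrderQueries()
commands = OrderCommands(on_event=queries.apply)
commands.place_order(1, "leo", 30)
commands.place_order(2, "leo", 12)
```

After the two commands, `queries.total_spend("leo")` answers from the pre-computed view without touching the write model, which is what allows both sides to scale independently.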
One last question is left: which database would best fit our needs? Many different databases exist out there; here are the most common of them:
- Relational database (e.g. PostgreSQL)
- Document-oriented database (e.g. MongoDB, Elasticsearch)
- Graph-like database (e.g. Neo4j)
- Triplestore database - A data entity is composed of (subject, predicate, object) like "Leo is 4". The database optimizes itself to retrieve the data.
- Time-series database (e.g. InfluxDB) - Time based, trace changes of states. Useful for monitoring.
Application lifecycle management
When the solution is up and running, it needs to be maintained and extended. Let us take a look at the different ways to manage such a running system.
See the deployment patterns above.
The customer expects SLAs (Service Level Agreements) for the solution: objectives to be fulfilled (SLOs) based on indicators (SLIs).
From these indicators, alerts can be created when an issue occurs.
The following monitoring levels exist:
- Business Process - Does the customer like it (e.g. spend a lot of time in it), how is the adoption rate?...
- Customer Service - Can the customer log in? How responsive are some functionalities?
- Application Level - How many requests per services? Which API is more used?
- Operating System - How many processes run? Are there any errors?
- Hardware - Network I/O? CPU usage?
Even if the focus should mainly be on business monitoring, the Four Golden Signals should always be implemented:
- Latency - What is the percentile delay between a request and an answer?
- Traffic - How often does something happen?
- Errors - How many errors and where?
- Saturation - How many resources are used?
To access the information, there are 2 possibilities:
- Push - Observers push information to a monitoring service (e.g. InfluxDB).
- Pull - The monitoring service, at regular intervals, pulls the information it needs from the services (e.g. node-exporter for Prometheus).
Usually, for such a solution, a time series database is used to store the data and a dashboard like Grafana is used to see the different metrics.
An alert should always mean that something needs to be done by a human now!
The most common alerts are so-called active heartbeats, which inform us whether a system is still online.
When a heartbeat is missing, a human should bring the system online again.
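A heartbeat check is essentially a comparison of timestamps. In this sketch, `last_seen` maps a system name to the time of its last heartbeat; the function name, the data layout and the timeout value are assumptions of the example.

```python
def missing_heartbeats(last_seen, now, timeout):
    """Return the names of systems whose last heartbeat is older than `timeout` seconds."""
    return sorted(
        name for name, seen_at in last_seen.items() if now - seen_at > timeout
    )


# "db" sent its last heartbeat 70 seconds ago, "api" 10 seconds ago.
overdue = missing_heartbeats({"api": 100.0, "db": 40.0}, now=110.0, timeout=30.0)
```

Each name in `overdue` would then enter the alerting steps (rules, grouping, paging).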
Alerts usually respect the following steps:
- Alert rules - A specific signal is captured that indicates an anomaly.
- Alerting grouping - The incoming alert is classified to understand who should be the recipients.
- Paging scheduling - Define when the alert should be sent to the recipients (How critical is the alert?).
- Paging - Inform the recipients about the alert.
- Manage ask - Provide with the alert what is expected from the recipients.
Observability is needed to understand how the system is performing! Usually, experiments, deployment states, trends and error traces of the system can help us drastically improve our solution and its monitoring.
This enables us to take critical decisions in the future to ultimately fulfill the business goals.
Common sources of observation are:
- Server logs (OS, network,...)
- Application logs (runs)
- Monitoring data
- Database data
All these sources are (1) aggregated by a data collector (e.g. Fluentd), (2) stored in a way that allows them to be analyzed (e.g. Elasticsearch) and (3) visualized and manipulated when needed (e.g. Kibana).
Special case: Tracing
For big distributed systems, identifying an issue is complex. A big effort may be put into using the logs to generate a trace of the system activities. Some tools exist to help us here:
Pitfalls to Observability and Monitoring
- Using the average and ignoring the percentiles! If 50% of users are getting a slow answer and 50% an extremely fast one, the average may look good...
- Underestimating the measurement resolution. If the information you are looking for is not captured, you may know there is a problem but never find it...
- Aggregating the wrong data. You aggregate data that tells you your system is running well, but in reality it is failing and there is no way for you to see it with the collected information.
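The first pitfall is easy to reproduce with made-up numbers. The sketch below uses a simple nearest-rank percentile (one of several common definitions) on 100 hypothetical request latencies, of which 5 are very slow.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical data: 95 requests answered in 20 ms, 5 requests in 2000 ms.
latencies_ms = [20] * 95 + [2000] * 5

mean_ms = statistics.mean(latencies_ms)   # 119 ms, looks acceptable
p50_ms = percentile(latencies_ms, 50)     # 20 ms, the typical user
p99_ms = percentile(latencies_ms, 99)     # 2000 ms, the unlucky users
```

The mean neither describes the typical user (20 ms) nor the slow one (2000 ms), which is why latency is usually tracked per percentile.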
By automation, we mean the way to "automate" actions that need to be repeated on many machines or many times in a row.
Changes can be done in the following ways:
- Sneaker admin - Physically access the devices to perform any change.
- Remote admin - Remotely connect to the devices to perform any change.
- Shell script admin - Run a script that connects to the device and performs the needed change.
- Continuous configuration automation (CCA) - Uses a framework and descriptive languages to document and perform any change.
There are two CCA models:
- Push - The CCA master connects to each node and performs the changes.
- Pull - The node connects to the CCA master and asks if there is any update to perform. If yes, it pulls the scripts and applies the changes.
The most popular CCA frameworks are now Infrastructure as Code (IaC) based.
Configuration management tool
- CFEngine - Godfather, written in C, pull model
- Puppet - Declarative, DSL possible, pull model, nice UI
- Chef - Declarative, DSL possible, pull model, reporting, can automatically setup VMs and containers
- Ansible - Yaml based, push model, reporting, opinionated software
Infrastructure provisioning tool
- AWS, CloudFormation
- Google, Cloud Deployment Manager
- Microsoft, Azure Resource Manager
- Terraform - Provider agnostic but dependent on the provider resource names (you will not redeploy a system built for AWS on Azure within minutes)
- Pulumi - Tries to fix the limitations of Terraform (support for multiple programming languages, cloud-provider-agnostic deployment)
Kubernetes provisioning tool
List of tools specialized in provisioning k8s systems.
- Helm - De-facto standard, Imperative, Templating as base concept, can become complex, Steep learning curve, many functionalities.
- Kustomize - Declarative, native to kubectl since v1.14, YAML patching (means you create layers that overwrite / patch other layers).
- Pulumi (again) - Declarative and imperative (depending on the task), high integration.
Cloud-native provisioning tool
Growing projects which focus on cloud-native and its rules.
- Ballerina - Programming language with focus on cloud native constraints. Create nice graphs for improved visibility.
- Metaparticle - Package that plugs into any other languages. Deployment is done when the script is called. How to manage big programs?
Additional known tools.
- Rancher - To deploy new clusters in public clouds.
- OpenShift - RedHat flavored k8s.
- kubeadm - To learn how to use k8s.
CI / CD
"Those who know best how to deploy are those who wrote what needs to be deployed" -> DevOps philosophy.
Note: I will not focus on software development methodologies here.
Your coding guidelines should of course define which development flow you use (e.g. TDD) and which test levels (how strong the regression) you need to validate your services.
To be able to deploy and resolve issues, developers need access to the cluster their service will later be deployed on.
- Kubefwd - Forward pods to /etc/hosts to access the services.
- Telepresence - Two-way network proxy to access the cluster.
- Eclipse Theia - Create an IDE on the cluster that can be accessible to develop services.
The goal is to ensure the contributed code can be integrated. Standard verification steps would be code conventions, linters and unit tests, but also integration tests.
For the last part, it should be easy to deploy the newly built application in a staging cluster environment and either run specific test scenarios on it or, by mirroring real requests, verify the stability of the contribution.
The following tools could be used:
- Gitlab CI
After a merge, the packaging is automatically run and forwarded to a registry. The deployment to a staging or production environment is done manually.
Continuous deployment goes one step further and ensures the package is continuously deployed to a production environment.
The deployment strategy is defined during the architecture strategy phase.
Depending on the strategy, we could have a single cluster containing different namespaces like DEV, TEST, PROD or we could have multiple clusters to have a clear separation of concerns.
It is really tricky to have so many moving parts when designing a solution. For example, the effort of taking the micro-service path needs to be well justified.
On my journey of creating this document, I came across two Martin Fowler posts that are relevant here:
- Make sure you understand the trade-offs of using micro-services and ensure that they are not burdens for your solution: https://martinfowler.com/articles/microservice-trade-offs.html
- Make sure your problem is complex enough to justify the complexity of your solution: https://martinfowler.com/bliki/MicroservicePremium.html
- Provide schematics for some important concepts describe above. e.g. Discovery pattern