Best Practices
Chapter 1. Setting Up a Basic Service
Managing resource manifests
Resource manifests are the declarative state of an application.
- Store resource manifests in Git (version control)
- Manage Git repository on GitHub (code review)
- Use a single top-level directory for all manifests of an application
- Use subdirectories for subcomponents (e.g. services) of the application
- Use GitOps: ensure that contents of cluster match content of Git repository
- Deploy to production only from a specific Git branch using some automation (e.g. a CI/CD pipeline)
- Use CI/CD from the beginning (difficult to retrofit into an existing application)
Managing container images
- Base images on well-known and trusted image providers
- Alternatively, build all images "from scratch" (e.g. with Go)
- The tag of an image is immutable (i.e. images with different content have different tags)
- Combine semantic versioning with the hash of the corresponding Git commit (e.g. v1.0.1-bfeda01f)
- Avoid the latest tag (it is not immutable)
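The tagging scheme above can be sketched in a few lines. This is a minimal illustration, not a prescribed tool: the function name build_image_tag, the loose version pattern, and the choice of eight hash characters are assumptions made for the example.

```python
import re

# Loose check for a semantic version such as "v1.0.1" (the leading "v" is optional).
SEMVER = re.compile(r"^v?\d+\.\d+\.\d+$")
# A Git commit hash is hexadecimal; 7 to 40 characters covers short and full SHA-1 forms.
GIT_HASH = re.compile(r"^[0-9a-f]{7,40}$")


def build_image_tag(version: str, commit: str, hash_length: int = 8) -> str:
    """Combine a semantic version with a shortened Git commit hash (hypothetical helper)."""
    if not SEMVER.match(version):
        # This also rejects mutable tags such as "latest".
        raise ValueError(f"not a semantic version: {version!r}")
    if not GIT_HASH.match(commit):
        raise ValueError(f"not a Git commit hash: {commit!r}")
    return f"{version}-{commit[:hash_length]}"
```

Because the commit hash is part of the tag, two builds from different commits cannot end up with the same tag, even if someone forgets to bump the version.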
Deploying applications
- Set resource requests equal to limits: this provides predictability at the expense of maximum resource utilisation (the app can't make use of excess idle resources)
- Set requests and limits to different values only when you have more experience
- In production, always use an Ingress for exposing services, even for simple applications (for experimentation, you don't need an Ingress)
- Manage configuration data that is likely to be updated during runtime separately from the code in a ConfigMap
- Put a version number in the name of each ConfigMap (e.g. myconfig-v1). When you make an update to the configuration, create a new ConfigMap with an increased version number (e.g. myconfig-v2), and then update the application to use the new ConfigMap. This ensures that the new configuration is loaded into the application, no matter how the ConfigMap is mounted in the app (as an env var or a file, and in the latter case, whether or not the app watches the file for changes)
- Don't delete the previous ConfigMap (e.g. myconfig-v1). This allows you to roll back to a previous configuration at any time.
- If deploying an application to multiple environments, use a template system (don't maintain multiple copies of resource manifest directories)
- Helm, kustomize, Kapitan, ...
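The versioned-ConfigMap item in the list above can be sketched as follows. This is only an illustration under assumptions: the helper names are hypothetical, and the "-vN" suffix convention is taken from the myconfig-v1 / myconfig-v2 example in the notes.

```python
import re


def next_configmap_name(current: str) -> str:
    """Return the name of the next ConfigMap version, e.g. myconfig-v1 -> myconfig-v2."""
    match = re.fullmatch(r"(?P<base>.+)-v(?P<version>\d+)", current)
    if match is None:
        raise ValueError(f"name has no -v<number> suffix: {current!r}")
    return f"{match['base']}-v{int(match['version']) + 1}"


def configmap_manifest(name: str, data: dict) -> dict:
    """Build a ConfigMap manifest as a plain dict (it can be dumped to JSON for kubectl)."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": dict(data),
    }


def new_config_version(previous: dict, data: dict) -> dict:
    """Create the next version as a new object and leave the previous one untouched."""
    return configmap_manifest(next_configmap_name(previous["metadata"]["name"]), data)
```

The previous manifest is never modified, so rolling back amounts to pointing the application at the old name again.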
Chapter 2. Developer Workflows
Creating a development cluster (where developers can deploy and test the applications they are working on).
- One cluster per organisation or team (10-20 people)
- One namespace for each developer
- Use an external identity system for cluster user management (Azure Active Directory, AWS IAM)
- Users authenticate with a bearer token, which the API server validates with the external service
- Grant developers the edit ClusterRole for their namespace with a RoleBinding (not ClusterRoleBinding)
- Grant developers the view ClusterRole for the entire cluster with a ClusterRoleBinding (not RoleBinding)
- Assign a ResourceQuota to each namespace
- Make developer namespaces transient by assigning them a time to live (TTL) after which they are automatically deleted
- So that no unused resources accumulate in the cluster
- Assign the following annotations to each namespace: TTL, assignee, resource quotas, team, purpose
- You could define a CRD that creates a namespace with all this metadata
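As a rough sketch of the namespace setup described above, the manifests can be generated as plain dicts. The annotation keys under example.com/, the "dev-" name prefix, and the function names are assumptions made for this example; only the edit ClusterRole and the RoleBinding shape come from the Kubernetes API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical annotation key; any key your tooling agrees on would do.
EXPIRES_AT = "example.com/expires-at"


def developer_namespace(developer: str, team: str, purpose: str,
                        ttl_hours: int, now: datetime) -> dict:
    """Namespace manifest carrying assignee, team, purpose and TTL metadata."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": f"dev-{developer}",
            "annotations": {
                "example.com/assignee": developer,
                "example.com/team": team,
                "example.com/purpose": purpose,
                EXPIRES_AT: (now + timedelta(hours=ttl_hours)).isoformat(),
            },
        },
    }


def edit_role_binding(developer: str, namespace: str) -> dict:
    """RoleBinding that grants the built-in edit ClusterRole inside one namespace only."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{developer}-edit", "namespace": namespace},
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                    "kind": "ClusterRole", "name": "edit"},
        "subjects": [{"apiGroup": "rbac.authorization.k8s.io",
                      "kind": "User", "name": developer}],
    }


def is_expired(namespace: dict, now: datetime) -> bool:
    """True once the TTL recorded in the annotation has passed."""
    expires = datetime.fromisoformat(namespace["metadata"]["annotations"][EXPIRES_AT])
    return now >= expires
```

A periodic job could list namespaces, call is_expired on each, and delete those that return True; that cleanup job is not shown here.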
Chapter 3. Monitoring and Logging in Kubernetes
Monitoring
- Run the monitoring system in a dedicated "utility cluster" (to avoid problems with the target cluster affecting the monitoring system)
Logging
Collect and centrally store logs from all the workloads running in the cluster and from the cluster components themselves.
- Implement a retention and archival strategy for logs (retain 30-45 days of historical logs)
- What to collect logs from:
- Nodes (kubelet, container runtime)
- Control plane (API server, scheduler, controller manager)
- Kubernetes auditing (all requests to the API server)
- Applications should log to stdout rather than to files
- Allows a daemon on each node to collect the logs from the container runtime (if logging to files, a sidecar container for each pod might be necessary)
- Some log aggregation tools: EFK stack (Elasticsearch, Fluentd, Kibana), DataDog, Sumo Logic, Sysdig, GCP Stackdriver, Azure Monitor, AWS CloudWatch
- Use a hosted logging solution (e.g. DataDog, Stackdriver) rather than a self-hosted one (e.g. EFK stack)
Alerting
- Only alert on events that affect service level objectives (SLO)
- Only alert on events that require immediate human intervention
- Automate remediation of events that don't require immediate human intervention
- Include relevant information in the alert notification (e.g. link to troubleshooting playbook, context information)
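The alerting rules above boil down to a small decision function. The sketch below is illustrative only: the Event fields and function names are hypothetical and not tied to any monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A monitoring event; all field names are hypothetical."""
    name: str
    affects_slo: bool
    needs_human_now: bool
    playbook_url: str
    context: str


def should_alert(event: Event) -> bool:
    # Alert only when the event affects an SLO and needs a human right away.
    return event.affects_slo and event.needs_human_now


def alert_message(event: Event) -> str:
    # Carry the playbook link and context so the responder can act immediately.
    return (f"[ALERT] {event.name}\n"
            f"context: {event.context}\n"
            f"playbook: {event.playbook_url}")


def handle(event: Event) -> str:
    """Return the notification text, or an empty string when nobody should be notified."""
    if should_alert(event):
        return alert_message(event)
    # Everything else is left to automated remediation or dashboards.
    return ""
```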
Chapter 4. Configuration, Secrets, and RBAC
ConfigMaps and Secrets
- Use ConfigMaps and Secrets to inject configuration into pods
- PodPresets: automatically mount a ConfigMap or Secret to a pod based on annotations
- In the application, watch the configuration file for changes, so that the configuration can be changed at runtime by updating the ConfigMap or Secret
- When using values from a ConfigMap/Secret as environment variables, the environment variables in the containers are NOT updated when updating the ConfigMap/Secret
- Use a CI/CD pipeline that restarts pods whenever a ConfigMap/Secret is updated (this ensures that the new data is being used by the pods, even if the application does not watch the configuration file for changes or if the configuration data is mounted as environment variables)
- Alternatively, include a version number in the name of the ConfigMap and, when the configuration changes, create a new ConfigMap and update applications to use it (see Chapter 1).
- Always mount Secrets as volumes (files), never as env vars
- Avoid stateful applications in Kubernetes
- Use SaaS/cloud service offerings for stateful services
- If running on premises and public SaaS is not an option, have a dedicated team that provides internal stateful SaaS to the rest of the organisation
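One common way to implement the "restart pods whenever a ConfigMap/Secret is updated" item from the ConfigMaps and Secrets list is to put a checksum of the configuration into the pod template, since any change to a Deployment's pod template triggers a new rollout. The sketch below assumes this technique; the annotation key example.com/config-checksum and the function names are hypothetical.

```python
import copy
import hashlib
import json
from typing import Mapping

# Hypothetical annotation key used only for this sketch.
CHECKSUM_KEY = "example.com/config-checksum"


def config_checksum(data: Mapping[str, str]) -> str:
    """Stable SHA-256 over the configuration, independent of key order."""
    canonical = json.dumps(dict(data), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def with_config_checksum(deployment: dict, data: Mapping[str, str]) -> dict:
    """Return a copy of the Deployment whose pod template carries the checksum."""
    updated = copy.deepcopy(deployment)
    template = updated["spec"]["template"]
    annotations = template.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[CHECKSUM_KEY] = config_checksum(data)
    return updated
```

A pipeline would call with_config_checksum before applying the Deployment: unchanged configuration yields an identical manifest and no restart, while changed configuration yields a different annotation and therefore a rollout.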
RBAC
- Use specific service accounts for all "users" of the Kubernetes API that are assigned tailored roles with the least amount of privileges to do the job
Chapter 5. Continuous Integration, Testing, and Deployment
Common steps of a CI/CD pipeline: (1) push code to the Git repository, (2) build the entire application code, (3) run tests against the built code, (4) build the container images, (5) push the container images to a container registry, (6) deploy the application to Kubernetes (use one of various deployment strategies, such as rolling update, blue/green deployment, canary deployment, or A/B deployment), (7) run tests against the deployed application (e.g. a chaos experiment)
- Keep production code in the master branch
- Keep container image sizes small (use scratch images with multistage builds, distroless base images, or optimised base images, e.g. Alpine, Debian Slim)
- Use an image tagging strategy: each image that is built by the CI system should have a unique tag (image tags should be immutable, that is, if two images have differing content, they can't have the same tag, see Chapter 1)
- Use the build ID as part of the tag
- Use the Git commit hash as part of the tag
- Minimise CI build times
- Include extensive tests in CI (build should fail if any test fails)
- Set up extensive monitoring in the production environment
Chapter 6. Versioning, Releases, and Rollouts
The true declarative nature of Kubernetes really shines when planning the proper use of labels.
By identifying the operational and development states by means of labels in the resource manifests, it becomes possible to tie in tooling and automation to more easily manage the complex processes of upgrades, rollouts, and rollbacks.
- Version: increments when the code specification changes
- Release: increments when the application is (re)-deployed (even if it's the same version of the app)
- Rollout: how a replicated app is put into production (this is taken care of automatically by the Deployment resource when there are changes to the deployment.spec.template field)
- Rollback: revert an application to the state of a previous release
Best practices:
- Label each resource with at least: app, version, environment
- Pods can additionally be labelled with tier, and top-level objects like Deployments or Jobs should be labelled with release and release-number
- Use independent versions for container images, Pods, and Deployments
- E.g. if Pod specification changes, update only the Pod and Deployment version, but not the container image version
- Use a release (e.g. frontend-stable, frontend-alpha) and release-number (e.g. 34e57f01) label for top-level objects (e.g. Deployment, StatefulSet, Job)
- If the same version of the app is deployed again, it results in a new release number
- The release number is created by the CI/CD tool that deploys the application
Compare this with the officially recommended labels in the Kubernetes documentation.
app.kubernetes.io/name
app.kubernetes.io/instance
app.kubernetes.io/version
app.kubernetes.io/component
app.kubernetes.io/part-of
app.kubernetes.io/managed-by
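The labelling scheme from the best practices above (app, version, environment, plus tier, release and release-number where applicable) can be sketched as a small helper. The function names are hypothetical, and generating the release number from a random UUID is only a stand-in for whatever the CI/CD tool actually produces.

```python
import uuid
from typing import Dict, Optional


def new_release_number() -> str:
    """Stand-in for the CI/CD tool: a fresh 8-character hex identifier per deployment."""
    return uuid.uuid4().hex[:8]


def resource_labels(app: str, version: str, environment: str,
                    tier: Optional[str] = None,
                    release: Optional[str] = None,
                    release_number: Optional[str] = None) -> Dict[str, str]:
    """Build the label set for a resource; optional labels are added only when given."""
    labels = {"app": app, "version": version, "environment": environment}
    if tier is not None:
        labels["tier"] = tier  # intended for Pods
    if release is not None:
        labels["release"] = release  # intended for top-level objects such as Deployments
    if release_number is not None:
        labels["release-number"] = release_number
    return labels
```

Deploying the same app version twice would call new_release_number twice, so the version label stays the same while the release-number label changes.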
Chapter 7. Worldwide Application Distribution and Staging
Deploying an app in multiple regions around the world (for scaling, reduced latency, etc.).
Distributing container images, load balancing, canary regions, testing...
Chapter 8. Resource Management
Advanced scheduling
- Pod affinity
- Pod anti-affinity
- Node selector
- Taints and tolerations
Pod resource management
- Resource requests
- Resource limits
- Quality of Service (automatically determined by values for requests and limits)
- PodDisruptionBudget
- ResourceQuota
- LimitRange
- Cluster autoscaler
- Horizontal pod autoscaler
- Vertical pod autoscaler
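To illustrate how the Quality of Service class follows from requests and limits, here is a simplified sketch of the rules as documented by Kubernetes: Guaranteed when every container has CPU and memory limits with requests equal to them, BestEffort when nothing is set at all, Burstable otherwise. The function name is hypothetical, and quantities are compared as plain strings, which is a simplification ("1" and "1000m" would be treated as different here).

```python
def qos_class(containers: list) -> str:
    """Derive the QoS class from the resources of a pod's containers (simplified)."""
    anything_set = False
    guaranteed = True
    for container in containers:
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit, if a limit is set.
            request = requests.get(resource, limit)
            if limit is not None or request is not None:
                anything_set = True
            if limit is None or request != limit:
                guaranteed = False
    if not anything_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

Setting requests equal to limits, as recommended in Chapter 1, therefore places a pod in the Guaranteed class.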
Chapter 9. Networking, Network Security, and Service Mesh
Services and Ingresses
Service:
- ClusterIP (a headless Service sets clusterIP to None: it gets no ClusterIP address and is not handled by kube-proxy, and DNS instead returns an entry for every Pod backing the Service; it can also be defined without a label selector, with explicitly assigned Endpoints)
- NodePort
- ExternalName
- LoadBalancer
Ingress:
Provides HTTP application-level routing in contrast to level 3/4 of services.
An Ingress controller enables the use of Ingress resources (all of them are third-party)
- All services that don't need to be accessed from outside the cluster should be ClusterIP
- Use Ingress for external-facing HTTP services and choose appropriate ingress controller
NetworkPolicy
Defines how pods within the cluster are allowed to communicate with each other.
- Requires a CNI plugin that supports NetworkPolicy (Calico, Cilium, Weave Net)
- Start with restricting ingress, then restrict egress if needed
- Create a deny-all policy in all namespaces
- Try to restrict inter-pod communication to within namespaces (avoid cross-namespace communication)
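The deny-all and same-namespace recommendations above can be sketched as NetworkPolicy manifests built as plain dicts. The function and policy names are hypothetical; the manifest fields follow the networking.k8s.io/v1 NetworkPolicy API.

```python
import json


def default_deny_ingress(namespace: str) -> dict:
    """Select every pod in the namespace and allow no ingress traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        # An empty podSelector matches all pods; no ingress rules means nothing is allowed.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }


def allow_same_namespace(namespace: str) -> dict:
    """Allow ingress only from pods in the same namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # A podSelector without a namespaceSelector refers to the policy's own namespace.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so the output can be piped to "kubectl apply -f -".
    print(json.dumps(default_deny_ingress("team-a"), indent=2))
```

NetworkPolicies are additive, so applying both policies to a namespace denies everything except traffic between pods of that namespace.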
Service meshes
Manage traffic between services of an application (or multiple applications).
- Most probably only needed for large deployments with hundreds of services and thousands of endpoints
Chapter 10. Pod and Container Security
PodSecurityPolicy
Centrally enforce security-sensitive fields in pod specifications.
Many fields of PodSecurityPolicy match those of securityContext in the Pod specifications.
- Using PodSecurityPolicies requires the PodSecurityPolicy admission controller to be enabled, but in most Kubernetes deployments it is not enabled. As soon as the PodSecurityPolicy admission controller is enabled, you need appropriate PodSecurityPolicy resources to allow any Pods to be created.
- You also need to grant "use" access to the created PodSecurityPolicies to the service account of the workload or the controller of the workload (you can use the system:serviceaccounts group, which comprises all service accounts, including those of the controllers).
- Use https://github.com/sysdiglabs/kube-psp-advisor to generate PodSecurityPolicies automatically based on existing Pods
RuntimeClass
Allows you to specify which container runtime to use for a Pod (if multiple ones are configured), based on the amount of isolation between containers that this Pod requires.
- Set the runtimeClassName field in the pod specification
- Only use it if you have workloads that require different amounts of workload isolation on the host (for security or compliance)
Other
- Use the DenyExecOnPrivileged or DenyEscalatingExec admission controllers as an easier alternative to PodSecurityPolicies; however, this is not a best practice, as these are deprecated and PodSecurityPolicies are recommended instead
- Use Falco to enforce security policies within the container runtime
Chapter 11. Policy and Governance for Your Cluster
Explanation:
Only allow compliant Kubernetes resources (of any kind) to be applied to the cluster (compliant with the defined policies).
- Open Policy Agent (OPA): policy engine
- Gatekeeper
- Validating admission control webhook
- Kubernetes Operator for installing, configuring and managing Open Policy Agent policies
Example policies that can be implemented with Gatekeeper:
- Services must not be exposed publicly on the internet
- Allow containers only from trusted container registries
- All containers must have resource limits
- Ingress hostnames must not overlap
- Ingresses must use only HTTPS
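To make two of the example policies concrete (trusted registries and mandatory resource limits), the check logic can be written in plain Python. This is not Rego and not Gatekeeper's API; it only illustrates the kind of decision such a policy makes. The registry prefix registry.example.com/ and the function name are assumptions.

```python
# Hypothetical allow-list of registry prefixes.
TRUSTED_REGISTRIES = ("registry.example.com/",)


def violations(pod: dict, trusted_registries=TRUSTED_REGISTRIES) -> list:
    """Return a human-readable message for every policy violation in a Pod manifest."""
    problems = []
    for container in pod.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        image = container.get("image", "")
        # Policy: allow containers only from trusted container registries.
        if not image.startswith(tuple(trusted_registries)):
            problems.append(f"container {name}: image {image} is not from a trusted registry")
        # Policy: all containers must have resource limits.
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                problems.append(f"container {name}: missing {resource} limit")
    return problems
```

An admission webhook built on this logic would reject the resource whenever the returned list is non-empty and include the messages in its response.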
Chapter 12. Managing Multiple Clusters
How to manage multiple clusters: making applications in different clusters interact with each other, deploying applications to multiple clusters at once, Kubernetes Federation...
Chapter 13. Integrating External Services and Kubernetes
- Application in Kubernetes consuming a service from outside the cluster
- Application outside the cluster consuming a service in Kubernetes
- Application in Kubernetes consuming a service in another Kubernetes cluster
Chapter 14. Running Machine Learning in Kubernetes
Apparently, Kubernetes is a "perfect environment to enable the machine learning workflow and lifecycle".
Chapter 15. Building Higher-Level Application Patterns on Top of Kubernetes
Develop higher-level abstractions in order to provide more developer-friendly primitives on top of Kubernetes.
Chapter 16. Managing State and Stateful Applications
Basic volumes
Mounting directories from the host into containers.
- Use emptyDir for sharing data between containers in the same pod
- Use hostPath if the data also needs to be accessed by agents running on the node
Storage managed by Kubernetes
Kubernetes support for managing persistent storage.
- PersistentVolume: a "disk" that exists independently from any nodes in the cluster and has its own Kubernetes resource
- PersistentVolumeClaim: a request for a PersistentVolume referenced from a Pod spec. This avoids having to reference a specific PersistentVolume from a Pod spec (which would make the Pod spec non-portable); the Pod references the generic PersistentVolumeClaim instead.
- StorageClass: defines a provisioner that creates the disk backing a PersistentVolume, which automates the creation of PersistentVolumes. A StorageClass name is referenced from a PersistentVolumeClaim.
- Default StorageClass: used by any PersistentVolumeClaim that doesn't explicitly define a StorageClass name. Requires the DefaultStorageClass admission controller to be enabled.
Best practices:
- Avoid managing state in the cluster if you can: use an outside service for persisting state
- Even if it involves modifying the app to become stateless
- Define a default StorageClass named default (because this is often used by default in Helm charts)
- If the cluster is distributed across multiple availability zones, ensure that PersistentVolumes and the Pods using them are in the same availability zone
- By properly labelling all objects and using node affinity, etc.
Running stateful applications
- Check if an operator exists for the type of application, and if yes, use it
Chapter 17. Admission Control and Authorization
Admission control
- Recommended set of admission controllers to enable: NamespaceLifecycle, LimitRanger, ServiceAccount, DefaultStorageClass, DefaultTolerationSeconds, MutatingAdmissionWebhook, ValidatingAdmissionWebhook, Priority, ResourceQuota, PodSecurityPolicy
- If you use multiple mutating admission control webhooks, don't modify the same fields of the same resources (the order in which admission control webhooks are called is undefined)
- If you use mutating admission webhooks, also create a validating admission webhook that verifies that the resources have been modified in the way you expected
- Define the least amount of requests to be sent to an admission webhook (avoid resources: [*], etc.)
- Always use the namespaceSelector field in MutatingWebhookConfiguration/ValidatingWebhookConfiguration, which causes the admission control webhook to be applied only in certain namespaces. Select the fewest namespaces that are necessary.
- Always exclude the kube-system namespace from the scope of an admission control webhook by means of the namespaceSelector field
- Don't give anyone RBAC rules to create MutatingWebhookConfiguration/ValidatingWebhookConfiguration unless it's really needed
- Don't send Secret resources to an admission control webhook if it's not needed (scope the requests that are passed to the webhook to the bare minimum)
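The scoping advice above can be sketched as a generator for a ValidatingWebhookConfiguration manifest. Several things here are assumptions for illustration: the function name, the webhook name suffix example.com, the /validate path, and the reliance on the kubernetes.io/metadata.name label that recent Kubernetes versions set on every namespace. A real configuration would also need a caBundle in clientConfig, which is omitted.

```python
def validating_webhook_configuration(name: str, service_namespace: str, service_name: str,
                                     resources: list,
                                     excluded_namespaces=("kube-system",),
                                     allow_secrets: bool = False) -> dict:
    """Build a narrowly scoped ValidatingWebhookConfiguration as a plain dict."""
    if "*" in resources:
        raise ValueError("list the resources explicitly instead of using '*'")
    if "secrets" in resources and not allow_secrets:
        raise ValueError("Secrets are only sent to the webhook when allow_secrets=True")
    return {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "ValidatingWebhookConfiguration",
        "metadata": {"name": name},
        "webhooks": [{
            "name": f"{name}.example.com",
            "admissionReviewVersions": ["v1"],
            "sideEffects": "None",
            "clientConfig": {"service": {"namespace": service_namespace,
                                         "name": service_name,
                                         "path": "/validate"}},
            # Only the listed core-group resources and operations reach the webhook.
            "rules": [{"apiGroups": [""], "apiVersions": ["v1"],
                       "operations": ["CREATE", "UPDATE"],
                       "resources": list(resources)}],
            # Skip the excluded namespaces, kube-system by default.
            "namespaceSelector": {"matchExpressions": [{
                "key": "kubernetes.io/metadata.name",
                "operator": "NotIn",
                "values": list(excluded_namespaces),
            }]},
        }],
    }
```

Raising an error for a wildcard or an unintended "secrets" entry turns the guidance in the list into a check that fails early, before the configuration reaches the cluster.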
Authorization
- Only use the default RBAC mode (there's also ABAC and webhook, but don't use them)
- For RBAC best practices, see Chapter 4
Table of contents
- 1. Setting Up a Basic Service
- Application Overview
- Managing Configuration Files
- Creating a Replicated Service Using Deployments
- Setting Up an External Ingress for HTTP Traffic
- Configuring an Application with ConfigMaps
- Managing Authentication with Secrets
- Deploying a Simple Stateful Database
- Creating a TCP Load Balancer by Using Service
- Using Ingress to Route Traffic to a Static File Server
- Parameterizing Your Application by Using Helm
- Deploying Services Best Practices
- Summary
- 2. Developer Workflows
- 3. Monitoring and Logging in Kubernetes
- 4. Configuration, Secrets, and RBAC
- 5. Continuous Integration, Testing, and Deployment
- 6. Versioning, Releases, and Rollouts
- 7. Worldwide Application Distribution and Staging
- 8. Resource Management
- 9. Networking, Network Security, and Service Mesh
- 10. Pod and Container Security
- 11. Policy and Governance for Your Cluster
- Why Policy and Governance Is Important
- How Is This Policy Different?
- Cloud-Native Policy Engine
- Introducing Gatekeeper
- Example Policies
- Gatekeeper Terminology
- Defining Constraint Templates
- Defining Constraints
- Data Replication
- UX
- Audit
- Becoming Familiar with Gatekeeper
- Gatekeeper Next Steps
- Policy and Governance Best Practices
- Summary
- 12. Managing Multiple Clusters
- 13. Integrating External Services and Kubernetes
- 14. Running Machine Learning in Kubernetes
- 15. Building Higher-Level Application Patterns on Top of Kubernetes
- 16. Managing State and Stateful Applications
- 17. Admission Control and Authorization
- 18. Conclusion