I attended KubeCon + CloudNativeCon North America 2022 in Detroit, and it was quite an enjoyable conference with many interesting topics surrounding CI/CD, DevOps, observability, service mesh, and security. If you aren't familiar with the conference, it centers around the open source projects that are part of the Cloud Native Computing Foundation (CNCF), the biggest of which is Kubernetes.
As of this conference, there are well over 100 projects that are part of the CNCF:
- 18 "Graduated" (most mature)
- 37 "Incubating" (mature enough to be used in production)
- 83 "Sandbox" (least mature)
Some of the sessions I most enjoyed are detailed below. The CNCF has posted all of the sessions on their YouTube Channel if anything below piques your interest.
Backstage: Shaping the future of the developer experience
Backstage is an open platform for building developer portals. Spotify created the project and donated it to the CNCF, where it is currently "Incubating". Here's a quick demo/pitch video that helps explain why it would be useful.
This looks like a great tool that can not only be used for onboarding new engineers or applications, but also as a central portal for basically everything that engineers do on a day-to-day basis. One example they showed in a demo was a portal to create a new application based on a template (for example, a React application or a Spring Boot application). The portal pulls down skeleton files, creates a GitHub repo, and registers it in a software catalog. This could be used to quickly scaffold CI/CD configurations, such as Kubernetes or Helm configurations.
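Applications scaffolded this way get registered in the catalog through a descriptor file checked into the repo. As a rough sketch, a minimal `catalog-info.yaml` looks something like this (the component name, description, and owner below are hypothetical placeholders):

```yaml
# catalog-info.yaml - registers a component in the Backstage software catalog
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: my-new-service        # placeholder name
  description: Example service scaffolded from a template
spec:
  type: service
  lifecycle: experimental
  owner: platform-team        # placeholder owning team
```

A template would typically generate this file alongside the skeleton code so the new repo shows up in the catalog automatically.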
There are many community plugins available. If you don't see one you need, you can write your own!
Don't be greedy - Right size your Kubernetes cluster with Prometheus
This was a great session on a process to determine the right resource allocation to set on deployment. Although the monitoring tool used in the session was Prometheus, you can apply the same concepts using any application performance monitoring tool.
Benefits of right-sizing resources:
- Gain a better understanding of the application
- Uncover hidden problems in the application that were masked by the high resource availability
- Make the most of the resources
- Save some money
One of the slides in the session mentioned "Pod Eviction" when limits were set too high, but they didn't go into detail. So, I got curious, Googled the topic, and found a great article written by Sysdig, the same company the presenters were from.
It's definitely worth a read for a deeper dive into how Kubernetes classifies pods and handles evictions: Understanding Kubernetes Evicted Pods.
The article mentions the three QoS classes which were also talked about in the session:
- Guaranteed, where every container sets memory and CPU requests and limits (with requests equal to limits), and which is least likely to be evicted
- Burstable, where at least one container sets a memory or CPU request or limit but the pod doesn't meet the Guaranteed criteria, and which is less likely to be evicted than BestEffort
- BestEffort, where no CPU or memory requests or limits are set, and which is most likely to be evicted
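To make the classes concrete, here is an illustrative pod spec (not from the session; names are placeholders). Because the single container's requests equal its limits for both CPU and memory, Kubernetes classifies this pod as Guaranteed:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo            # placeholder name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:               # limits match requests -> Guaranteed
        memory: "256Mi"
        cpu: "500m"
```

Dropping the limits (or setting them higher than the requests) would make the pod Burstable; removing the `resources` block entirely would make it BestEffort.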
The process they laid out for right-sizing had the following high-level steps:
- Monitor/Calculate - get baselines; create graphs showing average usage and the percentage of unused memory ((Memory Requested - Memory Usage) / Memory Requested)
- Resize - using info from unused memory, downsize the requests
- Monitor/Tweak - repeat as needed
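The unused-memory calculation from the first step can be sketched as follows (the function name is mine, not from the session):

```python
def unused_memory_fraction(requested: float, used: float) -> float:
    """Fraction of requested memory that goes unused:
    (requested - used) / requested."""
    return (requested - used) / requested

# e.g., a pod requesting 1 GiB (1024 MiB) but averaging 400 MiB of usage
print(unused_memory_fraction(1024, 400))  # -> 0.609375
```

A pod wasting ~60% of its request like this is a clear candidate for downsizing in the Resize step.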
Istio today and tomorrow: Sidecars and beyond
Istio is a service mesh that has been around for a few years and is quite mature at this point. However, it was just accepted into the CNCF as "Incubating" in September 2022.
Istio traditionally worked via a sidecar container, an Envoy proxy that intercepts all ingress/egress network traffic. Recently, a sidecar-less "Ambient mesh" was announced that operates with a waypoint proxy (per service account or namespace) instead of a sidecar proxy to address some of the challenges with sidecars:
- Requires injection
- Startup/shutdown sequence between app containers and sidecars
- Sidecar upgrades require restarting applications
- Not supported for jobs
- Incremental adoption not possible
- Overprovisioning of resources
Incremental adoption will be easier with the Ambient mesh. For example, you could adopt just Layer 4 security (mTLS) but not Layer 7 if you wanted to maintain your own JWT Layer 7 security.
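As an illustration of adopting just the Layer 4 piece, Istio's `PeerAuthentication` resource can enforce mTLS on its own (the namespace name below is a placeholder):

```yaml
# Enforce mutual TLS for all workloads in a namespace (Layer 4 only);
# Layer 7 policy can be handled separately or not at all
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: my-namespace   # placeholder namespace
spec:
  mtls:
    mode: STRICT
```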
JWT claim-based routing is a recently released feature that sounded quite interesting.
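As a rough sketch of what that looks like in Istio's `VirtualService` API (hosts, destination, and claim values below are placeholders, and a `RequestAuthentication` resource is needed first to validate the JWT):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: claim-routing       # placeholder name
spec:
  hosts:
  - example.com             # placeholder host
  http:
  - match:
    - headers:
        # reserved "@request.auth.claims.<claim>" syntax matches JWT claims
        "@request.auth.claims.groups":
          exact: admins     # placeholder claim value
    route:
    - destination:
        host: admin-svc     # placeholder service
```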
Knative: More than just serverless containers
Knative is a platform-agnostic solution for running serverless deployments.
Why use it? Knative simplifies application development and deployment, which reduces developer toil and cognitive overhead.
There are three supported patterns for Knative:
- Serving - request-driven compute that can scale to zero - Sample Spring Boot implementation
- Eventing - declarative, event-driven applications that can hook into Kafka, RabbitMQ
- Functions - even simpler: write a small amount of code and the framework wraps an HTTP server around it and builds a container image. Functions support templates in the following languages/frameworks: Node.js, Python, Go, Quarkus, Rust, and TypeScript.
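To give a flavor of the Serving pattern, a minimal Knative Service manifest looks roughly like this (using Knative's public helloworld-go sample image; the service name is a placeholder):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello               # placeholder name
spec:
  template:
    spec:
      containers:
      - image: gcr.io/knative-samples/helloworld-go
        env:
        - name: TARGET
          value: "World"
```

Knative handles routing, revisioning, and scale-to-zero for this service without any additional configuration.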
Writing reliable, scalable, fault-oblivious code on K8s the easy way
This session centered around using Dapr which is short for Distributed Application Runtime. Dapr isn't considered a service mesh, but it seems to provide many of the same capabilities.
The framework has the following features which you can selectively implement:
- service discovery
- service invocation resiliency
- secure transport (mTLS)
- message broker integration (message guarantee)
- distributed lock management
- state management
- secrets management
Dapr relies on a sidecar running in the same pod as the application using it. The application communicates with the sidecar, which handles these cross-cutting concerns.
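For example, service invocation goes through the sidecar's local HTTP API (`/v1.0/invoke/<app-id>/method/<method>` on Dapr's default HTTP port 3500). A minimal sketch of building that URL, with hypothetical app and method names:

```python
DAPR_HTTP_PORT = 3500  # Dapr sidecar's default HTTP port

def invoke_url(app_id: str, method: str) -> str:
    """Build the local sidecar URL for invoking `method` on another Dapr app."""
    return f"http://localhost:{DAPR_HTTP_PORT}/v1.0/invoke/{app_id}/method/{method}"

# "orders" and "status" are placeholder names; an app would GET/POST
# this URL with any HTTP client, and the sidecar handles discovery,
# retries, and mTLS to the target app's sidecar
print(invoke_url("orders", "status"))
```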
Dapr does require an operator to inject the sidecar as detailed in the Kubernetes Overview, so trying it out requires cluster admin privileges. The Service Discovery component offers three name-resolution options: HashiCorp Consul, native Kubernetes DNS, and mDNS.
In addition to the sessions from above, I came away with a few other key takeaways from the conference:
- Having a service mesh is a no-brainer for service discovery, observability, increased security, and reliability.
- The chaos engineering landscape is getting more entries; Chaos Mesh and Litmus are worth evaluating.
- Engineers should consider ways to minimize the attack surface (for example, removing the bash shell from container images). When debugging is needed, look at using ephemeral debug containers.
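For reference, an ephemeral debug container can be attached to a running pod with `kubectl debug` (the pod, image, and target container names below are placeholders):

```shell
# Attach a temporary busybox-based ephemeral container to a running pod,
# sharing the process namespace of the "app" container
kubectl debug -it my-pod --image=busybox:1.36 --target=app
```

This lets you keep shells and debugging tools out of production images while still being able to inspect a live pod.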