
Debugging a crashed Kubernetes container

When working with Kubernetes, you’re often amazed how easily container, service or secret configurations can be deployed with a simple kubectl apply -f command. At least as long as it works. The moment your Kubernetes container doesn’t start, you are forced to start debugging, and the simplicity of creating resources based on community yaml files backfires.

Now you can dig deep into troubleshooting the container and trust me – that can be no fun at all.

Therefore, we start the post with one of the most annoying issues when running your container – a CrashLoopBackOff error. That error can really drive you nuts.

Start to debug

You need to know some very important commands before you can even think of debugging. Next to the common kubectl get command, the kubectl describe command is your way into detailed information about your environment.

Sometimes a kubectl cluster-info dump helps you get started.

Using kubectl get all or kubectl get pods, you can explore your environment, but let’s go step by step.

kubectl describe

Of course, first you need to find out what happens to the container and what the container (pod) name is.

kubectl get pods does a good job at that for the current namespace, kubectl get pods --namespace monitoring for a defined namespace, and if you want pods of all namespaces, just use kubectl get pods --all-namespaces.
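
For illustration, this is roughly what a crashing pod looks like in that output (the pod name and timings here are hypothetical, not from the original setup):

NAME                                 READY   STATUS             RESTARTS   AGE
metrics-prometheus-6b7f9c5d8-x2x4q   0/1     CrashLoopBackOff   5          10m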

When you know the pod name, run kubectl describe pod/podname to find the reason for the current status. Sometimes it’s a simple reason, such as resources not being available – for example, not enough memory, or no volumes (PersistentVolumeClaims) available.

In this example, you just need to create a volume and a PersistentVolumeClaim for the container to be able to start:

[Screenshot: Kubernetes container missing PersistentVolumeClaim]
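
A minimal sketch of such a manifest – the names, size, and the simple hostPath backing are assumptions for this example, not taken from the original setup:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: metrics-pv                 # example name
spec:
  storageClassName: manual         # match PV and PVC explicitly
  capacity:
    storage: 8Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data/metrics        # assumption: simple hostPath-backed volume
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: metrics-pvc                # must match the claim the pod references
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi

Apply it with kubectl apply -f pv-pvc.yaml and the pod should be able to bind its volume and start.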

But there are much uglier error messages, like a CrashLoopBackOff, that point to the inside of the pod and not the outside environment.

CrashLoopBackOff

CrashLoopBackOff is a nasty one, as it’s a bucket that holds a big set of errors, all nicely hidden behind the same error condition. Our example container is called metrics-prometheus, installed with a helm install command using the release name metrics.

First thing to try is kubectl describe pod to get some more details. Unfortunately, most of the time, the CrashLoopBackOff isn’t showing any outside information. But maybe you’re lucky and find some error in the output.
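
If the describe output does reveal something, it is usually in the Events section at the bottom or in the container’s Last State (Reason and Exit Code). A hedged example – the pod name suffix is made up:

kubectl describe pod metrics-prometheus-6b7f9c5d8-x2x4q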

Log files

Let’s try to get some logs:

kubectl logs podname or the kubelet logs

As the Kubernetes container keeps crashing, the logs command doesn’t help either, as it’s only useful for a running container.
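
One shortcut worth trying first: kubectl logs can fetch the output of the previous, crashed container instance via the --previous flag. Depending on how early the container dies, this may or may not return anything useful (the pod name is a placeholder):

kubectl logs metrics-prometheus-6b7f9c5d8-x2x4q --previous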

We need to get into the Kubernetes container node. To find more information about the container, we need to dig deeper using the -o wide parameter.

kubectl get pods -o wide

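Illustrative output (pod name, IP and node hostname are hypothetical) – the NODE column is what we are after:

NAME                                 READY   STATUS             RESTARTS   AGE   IP           NODE
metrics-prometheus-6b7f9c5d8-x2x4q   0/1     CrashLoopBackOff   7          15m   10.1.34.12   juju-worker-1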

As we use Juju to manage Kubernetes, we can directly ssh into the node. But first we need to find the node name using the command juju status.

juju status

SSH into the node

The node where the crashed container ran last is called kubernetes-worker/1:

juju ssh kubernetes-worker/1

Now we need to find the log files of the crashed container; they are stored under /var/log/containers.

sudo tail /var/log/containers/metrics-prometheus ….
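
As the actual file names in /var/log/containers contain generated suffixes, listing the directory first helps to find the right file. A sketch, assuming the standard pod_namespace_container-id.log layout (the file name below is hypothetical):

ls /var/log/containers/ | grep metrics-prometheus
sudo tail -n 50 /var/log/containers/metrics-prometheus-6b7f9c5d8-x2x4q_default_prometheus-abc123.log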

Just walk through the most likely files and check for errors. Bingo:

[Screenshot: prometheus error in the container log]

It seems that the prometheus.yml configuration file in use has syntax errors that we need to fix before the Prometheus server can start.

So let’s delete the whole setup again. As we installed it with helm, let’s keep using that great tool, with the same release name used for the install, metrics:

helm del --purge metrics

In the case of Prometheus, there are plenty of issues that can be caused by an invalid config file, therefore you should use the following project to check your config before deploying it into the Kubernetes cluster:

https://www.robustperception.io/how-to-check-your-prometheus-yml-is-valid
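
The linked article describes promtool, the config checker that ships with Prometheus. A typical invocation looks like this (assuming a Prometheus 2.x release; older 1.x releases used promtool check-config instead):

promtool check config prometheus.yml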

After fixing the Prometheus configuration file and redeploying the whole metrics setup using helm, all went well and the container was running smoothly.
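
For completeness, a sketch of the redeploy, assuming the setup came from the stable/prometheus chart and using the same Helm 2 syntax as the delete above (the values file name is hypothetical):

helm install --name metrics stable/prometheus -f fixed-values.yaml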
