Kubernetes: Configure Prometheus Node Exporter to Collect NUMA Information

By now, regular readers of our blog know that we care about NUMA architecture and optimization. While the architecture itself is quite easy to understand, keeping NUMA placement balanced in day-to-day operations remains a struggle.

It’s already hard to track the current NUMA node balance in a pure VMware vSphere environment, and monitoring it with the additional Kubernetes/container layer on top makes it even harder.

We’re going to cover many more NUMA and other critical resource monitoring and optimization topics in the future, but let’s start with the simplest part: how to get the NUMA details of your Kubernetes nodes using the popular Prometheus Node Exporter.

We assume that you already have the Prometheus Node Exporter running in your Kubernetes environment. If not, there are plenty of great blog posts already, and some nice Helm charts as well.

Here is one of the posts:

https://medium.com/devopslinks/trying-prometheus-operator-with-helm-minikube-b617a2dccfa3
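If you only need the exporter itself rather than a full monitoring stack, a minimal Helm-based install could look like the following sketch; the chart repository and release name are assumptions and depend on your environment.

# add the community chart repository and install the Node Exporter as a DaemonSet
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install node-exporter prometheus-community/prometheus-node-exporter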

Helm and DaemonSet

The typical Helm chart creates a DaemonSet that deploys a Node Exporter Pod on every Kubernetes node. That way you don’t need to think about deploying a Node Exporter Pod each time you add a new node to your cluster. The nice thing is that every exporter Pod has labels set, so Prometheus automatically scrapes the metrics once the Pod is up and running.
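For orientation, here is a heavily stripped-down sketch of what such a DaemonSet typically looks like; the names, labels, and image tag are assumptions, and a real Helm chart adds tolerations, resource limits, and security settings on top.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prometheus-node-exporter   # name as used later in this post (assumption)
spec:
  selector:
    matchLabels:
      app: prometheus-node-exporter
  template:
    metadata:
      labels:
        app: prometheus-node-exporter   # labels Prometheus uses to discover the Pods
    spec:
      hostNetwork: true                 # expose the metrics port directly on the node
      hostPID: true
      containers:
        - name: node-exporter
          image: quay.io/prometheus/node-exporter:latest   # pin a specific version in production
          args:
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
          ports:
            - containerPort: 9100
          volumeMounts:
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys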

Check Prometheus

You can easily check on your Prometheus server whether the nodes are already being scraped:

[Screenshot: Prometheus targets]

Simply access the Prometheus server and check whether the Kubernetes nodes are listed as targets.
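If the Prometheus UI is not exposed outside the cluster, a quick way to reach it is a port-forward. The service name, namespace, and port below are assumptions and depend on how Prometheus was installed.

# forward the Prometheus UI to localhost (service name, namespace, and port are assumptions)
kubectl port-forward -n monitoring svc/prometheus-server 9090:9090
# then open http://localhost:9090/targets in your browser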

Check Node Exporter metrics

To check the gathered metrics, simply type node_ into the expression field and select a metric.

[Screenshot: Node Exporter metrics in Prometheus]

But when you start searching for NUMA metrics, the list will remain empty (unless you have already made the change this blog post is about).
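You can also list the metric names Prometheus knows about via its HTTP API and filter for NUMA; this assumes the Prometheus API is reachable on localhost:9090, for example through the port-forward shown earlier.

# list all known metric names and filter for NUMA entries
curl -s 'http://localhost:9090/api/v1/label/__name__/values' | tr ',' '\n' | grep -i numa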

Enable NUMA collector

When you read through the official documentation, you’ll notice that some metrics are not collected by default. That is also true for all NUMA-related metrics, which can typically be found under /proc/meminfo_numa.

[Screenshot: meminfo_numa collector in the documentation]

To enable the NUMA collection, you need to set some arguments for the Node Exporter Pods within the DaemonSet. To get that information, let’s first search for the DaemonSet.

kubectl get daemonset
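If nothing shows up, the exporter was most likely installed into a different namespace. Searching across all namespaces helps; the node-exporter name filter is just an assumption based on the chart used here.

kubectl get daemonset --all-namespaces | grep node-exporter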

The next step is to edit the DaemonSet you want to change.

kubectl edit daemonset.apps/prometheus-node-exporter

Here you should find a spec section for the containers that includes the args. Add --collector.meminfo_numa to that list and save the file.

spec:
  containers:
    - args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --collector.meminfo_numa

When you save the DaemonSet, all related Pods are automatically terminated, deleted, and recreated with the new set of parameters.
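If you prefer a non-interactive change, for example in a script, a JSON patch can append the flag as well. This sketch assumes the DaemonSet is named prometheus-node-exporter and that the exporter is the first container in the Pod spec; also keep in mind that Helm may overwrite manual edits on the next upgrade, so setting the flag via the chart values is the cleaner long-term option.

# append the collector flag to the first container's args (name and index are assumptions)
kubectl patch daemonset prometheus-node-exporter --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--collector.meminfo_numa"}]'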

Wait a few minutes, then search for NUMA again in Prometheus, and you should see some metrics coming in.

node_memory_numa_interleave_hit_total is one example. 

[Screenshot: NUMA metrics in Prometheus]
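Since these counters only ever increase, charting the per-second rate is usually more useful than the raw value. A simple query could look like this; the 5-minute window is just an example.

# per-second rate of interleave hits, per scraped node
rate(node_memory_numa_interleave_hit_total[5m])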

Perfect: now we can add some meaningful charts to our dashboards. You can either use your own Grafana instance or our Performance Analyzer product, which has all of this built in (the Kubernetes integration is still beta, but the final release is only days away). Just contact us if you want to give it a try.

[Screenshot: Kubernetes node NUMA metrics dashboard]

[Screenshot: NUMA details]

  1. NUMA Nodes
  2. NUMA Interleave hit
  3. NUMA Hit (Number/Byte)
  4. NUMA Home Miss
  5. NUMA Foreign
  6. NUMA local
  7. NUMA other

These are just some of the metrics you can monitor and align with the NUMA metrics of VMware vSphere. Just to make sure: the NUMA metrics for vSphere VMs are part of Performance Analyzer, not of the Node Exporter.

You can also find some more details about Linux NUMA counters here:

  • numa_hit: Number of pages allocated from the node the process wanted.
  • numa_miss: Number of pages allocated from this node, but the process preferred another node.
  • numa_foreign: Number of pages allocated on another node, although the process preferred this node.
  • local_node: Number of pages allocated from this node while the process was running locally.
  • other_node: Number of pages allocated from this node while the process was running remotely (on another node).
  • interleave_hit: Number of pages allocated successfully with the interleave strategy.

https://blog.tsunanet.net/2011/06/clarifications-on-linuxs-numa-stats.html
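If you want to cross-check the raw counters the exporter reads, you can also look at them directly on a node. The sysfs files are always there; the numastat tool usually comes with the numactl package.

# raw NUMA counters, one file per NUMA node
cat /sys/devices/system/node/node*/numastat

# the same counters summarized per node (requires the numactl package on most distributions)
numastat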
