NUMA Home Node vs Remote Node and What to Look for

NUMA (non-uniform memory access) is a multiprocessing memory architecture in which memory access time is dependent on the memory location relative to the processor, and a CPU may access its own local memory faster than non-local memory. The benefits of NUMA are limited to specific workloads, especially on servers where data is regularly linked to individual processes or users.

NUMA systems are high-performance server solutions that use multiple system buses. They can integrate a large number of processors into a single system image, resulting in improved price-performance ratios.

NUMA delivers considerable benefits when used in a virtualized environment (VMware ESXi, Microsoft Hyper-V, etc.) as long as the guest VM does not utilize more resources than a single NUMA node provides.

Architecture of NUMA

The NUMA architecture consists of several CPU modules (nodes), each with multiple CPUs, its own local memory, and I/O slots. Each CPU can access the whole system’s memory because the nodes are linked by an interconnection module that lets them share information. Local memory access is much faster than remote memory access, which is the source of the non-uniform access times that give the architecture its name.
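To make the local-versus-remote cost concrete, here is a back-of-the-envelope model of average memory access latency as a function of NUMA locality. The latency figures are illustrative assumptions, not measurements from any specific hardware:

```python
def avg_access_latency_ns(locality_pct, local_ns=80.0, remote_ns=140.0):
    """Weighted average memory access latency for a given NUMA locality %.

    local_ns/remote_ns are assumed example latencies; real values depend
    on the CPU generation and interconnect.
    """
    local_frac = locality_pct / 100.0
    return local_frac * local_ns + (1.0 - local_frac) * remote_ns

# 100% locality pays only the local latency; 50% averages the two.
print(avg_access_latency_ns(100))  # 80.0
print(avg_access_latency_ns(50))   # 110.0
```

Even this toy model shows why locality matters: dropping from 100% to 50% locality raises the average access latency by more than a third.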

In many aspects, NUMA and MPP (massively parallel processing) are structurally similar: both are built from multiple nodes, each with its own processors, memory, and I/O ports, and nodes exchange information over an interconnect. There are, however, fundamental distinctions:

Node Interconnection Mechanism: NUMA’s node interconnect operates inside a single physical server; when a CPU needs memory attached to another node, it must wait for the interconnect. This is the main reason a NUMA server cannot offer linear performance scaling as CPU power increases. In MPP, by contrast, independent servers are linked over an external network.

Memory Access Mechanism: Inside a NUMA server, any CPU can access the entire system’s memory through a single shared address space. In an MPP system, each node accesses only its own local memory, and nodes exchange data by passing messages.

NUMA Home Node

NUMA Home Node is a logical representation of one node’s local memory and CPUs. It is critical for initial placement: if a VM’s CPU or memory allocation exceeds the home node’s capacity, the VM is forced to balance across two nodes.

Each virtual machine that the NUMA scheduler supervises is assigned a home node: one of the system’s NUMA nodes, consisting of processors and local memory.

When allocating memory to a VM, the ESXi host will always try to allocate memory from the home node. The VM’s virtual CPUs are constrained to running on the home node to enhance memory locality. When needed or viable, the VMkernel NUMA scheduler can dynamically shift a VM’s home node to respond to changes in system demand.
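Whether a VM fits its home node comes down to simple arithmetic: its vCPU count and memory must not exceed what one physical NUMA node offers. A minimal sketch, with hypothetical node sizes:

```python
def fits_home_node(vm_vcpus, vm_mem_gb, node_cores, node_mem_gb):
    """True if the VM can be placed entirely on one NUMA node."""
    return vm_vcpus <= node_cores and vm_mem_gb <= node_mem_gb

# Example: dual-socket host with 16 cores and 256 GB per NUMA node (assumed).
print(fits_home_node(12, 128, 16, 256))  # True  -> single home node
print(fits_home_node(24, 128, 16, 256))  # False -> spans two nodes
```

A VM that fails this check will be split across nodes, and some of its memory accesses will necessarily be remote.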

However, the VMkernel is limited by physical and technological constraints, misconfigurations can cause performance problems, and the VMkernel alone cannot be relied on to load-balance VMs effectively.

As a result, it is necessary to evaluate the current NUMA state, asking questions such as:

  • Is NUMA remote access possible?
  • How frequently do VMs migrate their Home node?
  • How much memory is moved when NUMA migration takes place and how many VMs are affected?
  • Is this a generic or ESXi-specific problem?

Then, on a per-VM or per-ESXi basis, begin changing default VMkernel settings or correcting misconfigurations, and track all improvements over time.

VMs can be severely impacted by thousands of home-node migrations per day. Because every migration also triggers a memory migration (the VMkernel optimizes by relocating remote-node memory to the home node), the entire ESXi host can quickly slow down.

CNIL Metrics and Logs monitors NUMA Home Node locality as a percentage; the lower this value, the greater the risk that poor NUMA locality is causing a performance problem. If the number drops below 80%, you should be concerned.
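That 80% rule of thumb is easy to apply as a filter over per-VM locality readings (the VM names and values below are invented for illustration):

```python
def vms_at_risk(locality_by_vm, threshold_pct=80.0):
    """Return VMs whose NUMA Home Node locality falls below the threshold."""
    return sorted(vm for vm, pct in locality_by_vm.items() if pct < threshold_pct)

# Sample readings: NUMA Home Node locality in percent per VM (hypothetical).
readings = {"db01": 97.5, "app02": 62.0, "web03": 100.0, "etl04": 79.9}
print(vms_at_risk(readings))  # ['app02', 'etl04']
```

The flagged VMs are the ones worth inspecting first for oversized configurations or frequent home-node migrations.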

NUMA Remote Node

NUMA Remote Node Access displays the amount of memory accessed via the remote node, in bytes (the slowest memory access). Don’t worry about single-digit megabyte values; gigabytes, however, call for action.

CNIL Metrics and Logs monitors this value per VM: the amount of memory the VM accesses from a remote NUMA node. The greater this value, the more likely NUMA is to blame for a performance issue.
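The same rule of thumb — megabytes are noise, gigabytes demand action — can be expressed as a small classifier. The exact cut-offs below are assumptions derived from the guidance above, not fixed product thresholds:

```python
def remote_access_severity(remote_bytes):
    """Rough severity of NUMA remote-node access volume (assumed thresholds)."""
    mib, gib = 1024 ** 2, 1024 ** 3
    if remote_bytes >= gib:
        return "action"   # gigabytes of remote access: act now
    if remote_bytes >= 10 * mib:
        return "watch"    # tens of megabytes: keep an eye on it
    return "ok"           # single-digit megabytes: no concern

print(remote_access_severity(5 * 1024 ** 2))  # ok
print(remote_access_severity(2 * 1024 ** 3))  # action
```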

How does the VMware ESXi Host use NUMA?

ESXi dynamically balances processor load to maximize memory locality using a sophisticated NUMA scheduler. Each VM that the NUMA scheduler supervises is assigned a home node: one of the system’s NUMA nodes, consisting of processors and local memory.

The ESXi host always tries to allocate memory from the home node when a VM requires it. The VM’s virtual CPUs are constrained to running on the home node to enhance memory locality.

The VMkernel NUMA scheduler may dynamically relocate a virtual machine’s home node to respond to changes in system demand when needed or feasible. However, the VMkernel is limited by physical and technological constraints, and misconfigurations might result in performance problems. For successful load balancing of your VMs, you can’t rely just on the VMkernel.

How to detect NUMA performance issues

There are a variety of ways to run into NUMA performance difficulties, but monitoring them without third-party software is hard. As noted above, NUMA Home Node locality is the most important metric to examine.

Check out CNIL Metrics and Logs if you’re looking for a simpler solution that retains and visualizes this information over long periods, for all of your ESXi hosts and VMs. To get started, take advantage of the 30-day free trial. The most important metrics are available in the Starter edition, under the Virtual Machine Memory Access Slowdown indicators on the VMware Virtual Machine dashboard.

NUMA Home Node percent shows the proportion of memory access that stays in the NUMA Home Node (the fastest memory access). That number should always be at or very close to 100%. If it falls below 90% for an extended period, you should begin optimizing.
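Given raw local and remote access counters, the NUMA Home Node percentage is simply the local share of all accesses:

```python
def numa_locality_pct(local_bytes, remote_bytes):
    """Percentage of memory access served from the home (local) NUMA node."""
    total = local_bytes + remote_bytes
    if total == 0:
        return 100.0  # no traffic at all: treat as fully local
    return 100.0 * local_bytes / total

# 95 MiB local vs 5 MiB remote -> 95% locality (numbers are illustrative).
print(round(numa_locality_pct(95 * 1024**2, 5 * 1024**2), 1))  # 95.0
```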

NUMA Remote Node Access displays the amount of memory accessed via the remote node, in bytes; single-digit megabyte values are no cause for concern.


Wrap Up

Most workloads achieve optimum performance when memory is accessed locally. To get the most out of the system, the VM’s vCPU and RAM configuration should mirror the workload’s needs. VMs should typically be small enough to fit on a single NUMA node. NUMA optimizations help when a VM’s configuration spans several NUMA nodes, but if possible, stick to a single CPU package.

Check out the other blog posts:
https://codenotary.com/blog/how-to-detect-and-optimize-numa-in-vmware-infrastructure/
https://codenotary.com/blog/how-to-activate-vmware-vnuma-with-lower-vcpu-count/
https://codenotary.com/blog/virtual-machine-performance-trends/


