How to Detect and Optimize NUMA in VMware Infrastructure?

NUMA (non-uniform memory access) is a memory architecture for multiprocessor systems in which memory access time depends on the memory's location relative to the processor: a CPU reaches its own local memory faster than memory that is local to another CPU. NUMA's advantages are most pronounced for specific workloads, particularly on servers where data is strongly associated with particular processes or users.

NUMA systems are high-performance servers built with multiple system buses. They can combine many processors into a single system image, resulting in a better price-performance ratio.

When implemented in a virtualized environment (VMware ESXi, Microsoft Hyper-V, etc.), NUMA provides significant benefits as long as a guest VM does not consume more resources than a single NUMA node provides.

NUMA Architecture

A NUMA system is built from multiple CPU modules (nodes), each of which contains several CPUs along with its own local memory, I/O slots, and other resources. Because the nodes are linked by an interconnection module (for example, a crossbar switch), every CPU can still reach the whole system's memory. Accessing local memory is, of course, much faster than accessing remote memory (memory that belongs to other nodes), which is exactly what makes memory access non-uniform.
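
If you want to see this node layout for yourself, Linux exposes it under /sys/devices/system/node (for example inside a Linux guest VM; ESXi itself does not expose this path). A minimal sketch, assuming a NUMA-enabled Linux system:

```python
# Minimal sketch: list NUMA nodes, their CPUs, and their memory on a Linux
# system (e.g. inside a Linux guest VM). Paths are standard sysfs locations.
from pathlib import Path

def linux_numa_topology():
    nodes = {}
    for node_dir in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
        node_id = int(node_dir.name.removeprefix("node"))
        cpulist = (node_dir / "cpulist").read_text().strip()
        # meminfo contains lines such as "Node 0 MemTotal:  263921372 kB"
        mem_total_kb = None
        for line in (node_dir / "meminfo").read_text().splitlines():
            if "MemTotal" in line:
                mem_total_kb = int(line.split()[-2])
                break
        nodes[node_id] = {"cpus": cpulist, "mem_total_kb": mem_total_kb}
    return nodes

if __name__ == "__main__":
    for node_id, info in linux_numa_topology().items():
        print(f"node {node_id}: cpus={info['cpus']} mem={info['mem_total_kb']} kB")
```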

NUMA and MPP (massively parallel processing) are architecturally comparable in many ways: both are made up of many nodes, each node has its own processors, memory, and I/O, and the nodes exchange information over an interconnect. However, there are significant differences:

Node Interconnection Mechanism: The NUMA interconnect operates inside a single physical server, whereas MPP nodes are independent servers linked by an external network. When a NUMA CPU needs to access memory in another node, it must wait; this is the primary reason a NUMA server cannot scale performance linearly as more CPUs are added.

Memory Access Mechanism: Inside a NUMA server, any CPU can access the entire system's memory, local or remote. In an MPP system, each node's CPUs can only access that node's local memory; data held by other nodes has to be exchanged through the interconnect by software.

NUMA Function

When the higher software layers are NUMA-aware, keeping a workload's resources inside a single node can considerably increase performance for data-intensive and user-specific workloads, such as business application servers whose data is tied to particular processes or users.

NUMA Affinity

Memory accesses that cross NUMA regions must traverse both the memory bus and an inter-region interconnect. This lengthens latency and can also cause contention on the interconnect. On Intel platforms this interconnect is the proprietary QPI (later UPI) bus; although it is slower than the local memory bus, on Intel x86 CPUs it does not usually become a significant bottleneck.

As a result, NUMA affinity should be configured for VMs that need efficient memory access, particularly on ARM CPUs, where the cross-node penalty matters more. With NUMA affinity you choose a NUMA topology that restricts the NUMA regions in which the VM's CPUs are placed, as well as the number of CPUs and the amount of RAM allotted in each region.
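
In vSphere this kind of constraint is applied through per-VM advanced settings; numa.nodeAffinity and numa.vcpu.maxPerVirtualNode are the options documented by VMware for pinning a VM to specific nodes and capping vCPUs per virtual node. The sketch below uses pyVmomi; the vCenter address, credentials, VM name, and chosen values are placeholders, and such settings should only be changed while the VM is powered off.

```python
# Sketch: restrict a VM's NUMA placement via advanced settings using pyVmomi.
# Option names are from VMware's documentation; host, credentials, and the VM
# name "db-vm-01" are placeholders for this example.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def set_numa_affinity(vm, node_ids, max_vcpus_per_node):
    extra = [
        vim.option.OptionValue(key="numa.nodeAffinity",
                               value=",".join(str(n) for n in node_ids)),
        vim.option.OptionValue(key="numa.vcpu.maxPerVirtualNode",
                               value=str(max_vcpus_per_node)),
    ]
    # Reconfigure the VM with the extra config options (VM should be powered off).
    return vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(extraConfig=extra))

ctx = ssl._create_unverified_context()  # lab use only; verify certificates in production
si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="***", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "db-vm-01")
    task = set_numa_affinity(vm, node_ids=[0], max_vcpus_per_node=8)
finally:
    Disconnect(si)
```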

Memory usage, memory ballooning, and memory swapping of ESXi hosts in the VMware vSphere environment are all monitored by CNIL Metrics and Logs.

One key metric is the percentage of a VM's RAM that is accessed locally. The lower this value, the more likely it is that poor NUMA locality is causing performance issues; if it drops below 80%, you should be concerned.
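
The check itself is trivial; the sketch below only illustrates the 80% rule from the paragraph above, with made-up local/remote counters standing in for whatever your monitoring tool reports:

```python
# Illustrative only: flag VMs whose NUMA locality falls below the 80% mark.
# local_kb / remote_kb are hypothetical counters from your monitoring tool.
def numa_locality_pct(local_kb: float, remote_kb: float) -> float:
    total = local_kb + remote_kb
    return 100.0 if total == 0 else 100.0 * local_kb / total

samples = {"web-01": (14_000_000, 500_000), "db-01": (6_000_000, 4_000_000)}
for vm, (local_kb, remote_kb) in samples.items():
    pct = numa_locality_pct(local_kb, remote_kb)
    if pct < 80.0:
        print(f"{vm}: only {pct:.1f}% of memory accessed locally - investigate NUMA placement")
```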

With NUMA affinity enabled, CPU access to other NUMA regions can be minimized, which is more efficient than letting CPU and memory be distributed at random. In the most restrictive case, the number of NUMA regions holding the VM's CPUs is set to 1: the CPU and RAM assigned to the VM then sit in the same NUMA region, and there is no cross-NUMA memory access at all. However, this can also waste resources. For example, if NUMA node 0 has two free CPUs and NUMA node 1 has two free CPUs, a four-vCPU VM cannot be deployed while its CPUs are restricted to a single NUMA region.
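
The trade-off in that example can be expressed as a one-line feasibility check (free-CPU counts taken from the example above; the function is purely illustrative):

```python
# Illustrative: can a VM be placed when its CPUs must stay in one NUMA region?
def fits_in_one_region(free_cpus_per_node: list[int], vcpus_needed: int) -> bool:
    return any(free >= vcpus_needed for free in free_cpus_per_node)

# The example from the text: two free CPUs in NUMA 0, two in NUMA 1.
print(fits_in_one_region([2, 2], 4))  # False - a 4-vCPU VM cannot be deployed
print(fits_in_one_region([2, 2], 2))  # True  - a 2-vCPU VM fits in either node
```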

NUMA affinity means that a VM's CPU and memory are kept on the same NUMA node; VMs placed on the same node therefore share that node's memory, CPU, and PCI resources.

Detect NUMA Performance Issues with CNIL Metrics and Logs

The NUMA scheduler assigns a home node to each virtual machine it manages. A home node is a NUMA node in a system that has processors and local memory.

The ESXi host will always try to allocate memory from the home node when allocating memory to a virtual machine. To maximize memory locality, the virtual machine’s virtual CPUs are limited to executing on the home node.

The VMkernel NUMA scheduler may dynamically adjust a VM’s home node to react to changes in system load when needed or practical. However, the VMkernel is constrained by physical and technical limits, and misconfigurations can cause performance issues. For efficient load balancing of your VMs, you can’t solely rely on the VMkernel.

As a result, you should begin by assessing your present NUMA condition.

Software that identifies and tracks NUMA KPIs, as well as your optimization efforts, is required to back up your approach.

CNIL Metrics and Logs gathers and visualizes all critical NUMA metrics for all your ESXi hosts and VMs over a long period. As a result, NUMA issues become obvious rather than staying buried.
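
The value of a long observation window is that a sustained locality problem stands out from single noisy samples. A small sketch of that idea, using a fabricated time series in place of real monitoring data:

```python
# Illustrative: report VMs whose NUMA locality stays under 80% for most samples
# in an observation window, rather than reacting to a single bad data point.
def sustained_low_locality(series: list[float], threshold: float = 80.0,
                           min_fraction: float = 0.5) -> bool:
    low = sum(1 for pct in series if pct < threshold)
    return len(series) > 0 and low / len(series) >= min_fraction

history = {
    "app-01": [95, 97, 92, 96, 94],   # healthy
    "db-01":  [78, 74, 81, 69, 72],   # consistently below the 80% mark
}
for vm, series in history.items():
    if sustained_low_locality(series):
        print(f"{vm}: NUMA locality persistently low - check home node and sizing")
```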

Optimize VMware Infrastructure for NUMA

When dealing with NUMA, the following are some crucial factors to keep in mind:

  • In the BIOS of your server, disable Node interleaving!
  • Order or configure physical server hardware so that each NUMA node has the same amount of RAM.
  • Assign each VM no more vCPUs than there are physical cores in a single CPU socket (stay within 1 NUMA node). Hyperthreading doesn't count toward this! (A sizing-check sketch follows this list.)
  • Examine your virtual architecture, in general, to ensure that it is optimized for your servers’ physical NUMA node restrictions. Keep an eye out for Monster-VMs!
  • Avoid having a single VM consume more vCPUs than a single NUMA node, since this may cause memory access deterioration if it is scheduled over many NUMA nodes.
  • If one or several VMs consume more RAM than a single NUMA node offers, the VMkernel will place part of that memory in a remote NUMA node, resulting in lower performance.
  • For VMs with 9 or more vCPUs, vNUMA (virtual NUMA) is enabled by default. Caution! When you enable “hot add CPU/memory” or configure CPU affinity, vNUMA is immediately disabled.
  • The VMkernel NUMA scheduler rebalances every 2 seconds.
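
To make the sizing rules above concrete, here is a minimal check of a planned VM against one NUMA node of the host; the core and memory figures are placeholders for your own hardware, and hyperthreads are deliberately not counted:

```python
# Sketch: validate a planned VM size against one NUMA node of the host.
# Figures for the host are placeholders; only physical cores are counted,
# since hyperthreading does not extend the per-node limit.
from dataclasses import dataclass

@dataclass
class NumaNode:
    physical_cores: int
    memory_gb: int

def check_vm_sizing(vcpus: int, memory_gb: int, node: NumaNode) -> list[str]:
    warnings = []
    if vcpus > node.physical_cores:
        warnings.append("vCPUs exceed one NUMA node's physical cores: "
                        "the VM will be scheduled across nodes")
    if memory_gb > node.memory_gb:
        warnings.append("RAM exceeds one NUMA node: part of the memory will "
                        "live in a remote node")
    return warnings

# Placeholder host: 12 physical cores and 192 GB per NUMA node.
node = NumaNode(physical_cores=12, memory_gb=192)
for issue in check_vm_sizing(vcpus=16, memory_gb=256, node=node):
    print("WARNING:", issue)
```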

CNIL Metrics and Logs (formerly Opvizor Performance Analyzer)

VMware vSphere & Cloud: performance monitoring, log analysis, license compliance.

Performance monitoring for your systems and applications with log analysis (tamper-proof using immudb) and license compliance (RedHat, Oracle, SAP, and more) in one virtual appliance.


