When VMs migrate NUMA Nodes too often

NUMA (non-uniform memory access) is a multiprocessing memory architecture in which memory access time depends on the memory's location relative to the processor: a CPU can access its local memory faster than non-local memory. NUMA's advantages are restricted to particular workloads, most notably on servers where data is frequently tied to specific tasks or users.

NUMA systems are high-performance server architectures built around multiple system buses. They can combine a large number of CPUs into a single system image, improving the price-performance ratio.

NUMA delivers considerable benefits in a virtualized environment (VMware ESXi, Microsoft Hyper-V, etc.) as long as a guest VM does not use more resources than a single NUMA node provides.

The moment NUMA node migration happens too often, you see a massive drop in virtual machine performance. In this post, you'll learn how to detect that.

The NUMA Architecture

The NUMA architecture is made up of multiple CPU modules (nodes), each with its own local memory, I/O slots, and other resources. Because the nodes can link and exchange information over an interconnection module, each CPU still has access to the entire system's memory. Local memory access, however, is significantly faster than remote memory access; this is precisely the non-uniformity the name refers to.
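To make the local-versus-remote distinction concrete, here is a minimal Python sketch that reads the NUMA topology a Linux kernel exposes under /sys/devices/system/node (it assumes a bare-metal Linux host, or a VM with a virtual NUMA topology presented to it). The distance values come from the ACPI SLIT table: 10 means local access, and larger values indicate proportionally slower remote access.

```python
# Inspect NUMA topology on a Linux system via sysfs.
# Node distances come from the ACPI SLIT table: 10 = local access,
# larger values = proportionally slower remote access.
import glob
import os

def read(path: str) -> str:
    with open(path) as f:
        return f.read().strip()

for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    node = os.path.basename(node_dir)
    cpus = read(os.path.join(node_dir, "cpulist"))
    distances = read(os.path.join(node_dir, "distance")).split()
    print(f"{node}: CPUs {cpus}, distances to all nodes: {distances}")
```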

In many respects, NUMA and MPP (massively parallel processing) systems are structurally similar. Both consist of multiple nodes, each equipped with its own CPU, memory, and I/O ports, and both let nodes exchange data over a node interconnect. There are, however, two important distinctions:

Node Interconnection Mechanism: The NUMA interconnect links nodes inside a single physical server, which runs one operating system image. In an MPP system, the nodes are independent machines, each with its own operating system, connected over a network.

Memory Access Mechanism: In a NUMA system, every CPU can address the entire system's memory, but it must wait when accessing memory on a remote node; this is the main reason a NUMA server's performance does not scale linearly with the number of CPUs. In an MPP system, each node can only access its own local memory.

NUMA Node Migration

vSphere is quite active when it comes to managing physical memory and calculating the optimal location for a virtual machine's memory based on how busy each NUMA node in the physical server is. If a virtual machine is running on a busy NUMA node, the ESXi kernel automatically moves it to a less busy NUMA node on the same server to improve performance.

While NUMA node migration can be a good thing, if it happens too often (hundreds or even thousands of times a day), it massively damages virtual machine performance.
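One practical way to detect this is to capture esxtop in batch mode (for example `esxtop -b -a -d 10 -n 360 > numa.csv`) and inspect the per-VM NUMA migration counters. The Python sketch below totals those counters across such an export; the exact column headers vary across ESXi versions, so the substring match on "NUMA" and "Migration" is an assumption you may need to adapt to your output.

```python
# Scan an esxtop batch-mode CSV export and total the per-VM NUMA
# migration counters across all samples. Column naming varies by ESXi
# version; matching on "NUMA" plus "Migration" in the header is an
# assumption -- check your export for the actual counter names.
import csv
import sys

THRESHOLD = 100  # flag anything above this many migrations in the capture

with open(sys.argv[1], newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    numa_cols = [i for i, name in enumerate(header)
                 if "NUMA" in name and "Migration" in name]
    totals = {i: 0.0 for i in numa_cols}
    for row in reader:
        for i in numa_cols:
            totals[i] += float(row[i])  # assumes a per-sample counter

for i, total in totals.items():
    if total > THRESHOLD:
        print(f"High NUMA migration count ({total:.0f}): {header[i]}")
```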

There are two primary reasons for ESXi to migrate a VM from one NUMA node (its home node) to another:

Balance Migration: ESXi may migrate a VM to a different NUMA node to relieve CPU contention on its current one.

Locality Migration: When ESXi detects that most of a virtual machine's memory resides on a remote node, it is usually better to reschedule the VM onto the node that holds that memory, as long as doing so doesn't create CPU contention there.

NUMA Scheduling in VMware ESXi

ESXi employs a sophisticated NUMA scheduler to dynamically balance processor load and memory locality.

The NUMA scheduler assigns a home node to each virtual machine it manages. A home node is one of the system's NUMA nodes containing processors and local memory, as indicated by the System Resource Allocation Table (SRAT). When a virtual machine requires memory, the ESXi host allocates it from the home node first, and the virtual machine's virtual CPUs are constrained to run on the home node to maximize memory locality.

The NUMA scheduler can dynamically change a virtual machine's home node in response to changes in system load. To balance the processor load, the scheduler may move a virtual machine to a new home node. Because this may leave more of the virtual machine's memory on a remote node, the scheduler can also relocate the virtual machine's memory to the new home node to restore memory locality. Likewise, the scheduler may move a virtual machine between nodes purely because doing so improves memory locality.

To help you avoid performance problems in a VMware-based virtual environment, CNIL Metrics and Logs monitors NUMA Locality Swaps and NUMA Balance Migrations on an hourly basis.

The ESXi NUMA scheduler does not manage every virtual machine. If you manually set a virtual machine's processor or memory affinity, for example, the NUMA scheduler may no longer manage it. Virtual machines that are not managed by the NUMA scheduler continue to function correctly; they just don't benefit from ESXi's NUMA optimizations.
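As an illustration of such manual placement, the pyVmomi sketch below pins a VM to NUMA node 0 through the `numa.nodeAffinity` advanced setting (the vCenter address, credentials, and VM name are placeholders). Doing so takes the decision away from the NUMA scheduler, with the trade-off just described.

```python
# Pin a VM to NUMA node 0 via the numa.nodeAffinity advanced setting,
# using pyVmomi. Host, credentials, and VM name are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only; verify certs in production
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="secret", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "my-vm")
    spec = vim.vm.ConfigSpec(extraConfig=[
        vim.option.OptionValue(key="numa.nodeAffinity", value="0")])
    vm.ReconfigVM_Task(spec)  # typically takes effect after a power cycle
finally:
    Disconnect(si)
```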

The NUMA scheduling and memory allocation policies in ESXi can transparently manage all virtual machines, removing the need for administrators to explicitly address the complexity of virtual machine balancing between nodes.

The optimizations work well regardless of the guest operating system. ESXi provides NUMA support even to virtual machines whose guest OS has no NUMA awareness of its own, such as Windows NT 4.0. As a result, you can take advantage of modern hardware even with legacy operating systems.

Wrap Up

It is possible to configure a virtual machine with more virtual processors than the number of physical processor cores available on a single hardware node. The NUMA scheduler accommodates such a virtual machine by splitting it into multiple NUMA clients, each of which is assigned to a node and then managed by the scheduler as an ordinary, non-spanning client. This can benefit certain memory-intensive workloads that exhibit high locality. See the VMware documentation for details on configuring this feature.
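As a toy illustration of that splitting (a simplified model, not the actual ESXi placement algorithm): on a host with two 10-core NUMA nodes, a 16-vCPU virtual machine would typically be divided into two 8-vCPU NUMA clients.

```python
# Toy model of splitting a wide VM into NUMA clients. This is a
# simplified illustration, not the actual ESXi placement algorithm.
import math

def split_into_numa_clients(vcpus: int, cores_per_node: int) -> list[int]:
    """Split vcpus into the fewest evenly sized clients that fit a node."""
    clients = math.ceil(vcpus / cores_per_node)
    base, extra = divmod(vcpus, clients)
    return [base + (1 if i < extra else 0) for i in range(clients)]

print(split_into_numa_clients(16, 10))  # -> [8, 8]
print(split_into_numa_clients(24, 10))  # -> [8, 8, 8]
```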

ESXi combines the standard initial-placement strategy with a dynamic rebalancing algorithm. The system periodically (every two seconds by default) examines the load on the various nodes and decides whether it should rebalance by moving a virtual machine from one node to another.

To improve performance without violating fairness or resource entitlements, this algorithm takes into account the resource settings of virtual machines and resource pools.

When you power on a VM, ESXi assigns it a home node. The virtual machine runs only on processors within its home node, and its newly allocated memory comes from the home node as well. Unless its home node changes, the virtual machine uses only local memory, avoiding the performance penalty of remote memory accesses to other NUMA nodes.
