Difference between revisions of "Performance Optimization"
Windows7ge (talk | contribs)
Latest revision as of 03:56, 11 September 2021
After the initial creation of your Virtual Machine there are a number of performance tweaks you can make to your Guest's .XML file and/or the Host system to greatly increase the Guest's performance.
hugepages
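Hugepages are a kernel feature that lets the system use larger memory pages (2 MB each in the examples below, instead of the 4 KB default on x86-64) when reading or writing information to memory (RAM). Larger pages mean fewer page-table entries and fewer TLB misses, so when Hugepages are enabled and the Guest is configured to use them its performance can increase noticeably. How to enable Hugepages depends on your GNU/Linux distribution.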
Note: When hugepages are configured this portion of memory is taken away from the Host. This means the Host will no longer be able to use it. Keep this in mind.
Debian (Ubuntu/Mint/Lubuntu/PopOS/etc)
First check whether Linux is already using Hugepages with: cat /proc/meminfo | grep Huge.
If the output resembles the following:
AnonHugePages:       2048 kB
ShmemHugePages:         0 kB
HugePages_Total:        0
HugePages_Free:         0
HugePages_Rsvd:         0
HugePages_Surp:         0
Hugepagesize:        2048 kB
Hugetlb:                0 kB
then Hugepages aren't enabled.
To enable Hugepages first check /etc/sysctl.conf for the following entries:
vm.nr_hugepages=
vm.hugetlb_shm_group=
If they don't exist they can be appended to the end of the file.
The general rule of thumb is 1 Hugepage for every 2 MB of RAM to be assigned to the VM:
vm.nr_hugepages=8192
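vm.hugetlb_shm_group=48
The above example will set aside approx 16GB (8192 pages x 2 MB). vm.hugetlb_shm_group is the numeric ID of the group that is allowed to use Hugepages; 48 is only an example, so substitute the ID of the group your VMs run under (for instance, getent group kvm shows the ID of the kvm group if your setup uses it). After saving the changes reboot the system.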
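The rule of thumb above is plain arithmetic, so it can be sketched as a small shell helper. This is only an illustration: hugepages_for_gib is a hypothetical name, and the 2048 kB default assumes the Hugepagesize value shown in /proc/meminfo above.

```shell
# Hypothetical helper: how many Hugepages are needed for a VM of a given size.
# $1 = VM memory in GiB, $2 = Hugepage size in kB (assumed 2048 if omitted).
hugepages_for_gib() {
    gib="$1"
    page_kb="${2:-2048}"
    # GiB -> kB, then divide by the size of one Hugepage
    echo $(( gib * 1024 * 1024 / page_kb ))
}

hugepages_for_gib 16   # prints 8192
```

The printed value is what would go into vm.nr_hugepages for a 16 GiB Guest.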
Now to verify the changes rerun: cat /proc/meminfo | grep Huge
The output should resemble the following:
AnonHugePages:          0 kB
ShmemHugePages:         0 kB
HugePages_Total:     8192
HugePages_Free:      8192
HugePages_Rsvd:         0
HugePages_Surp:         0
Hugepagesize:        2048 kB
Hugetlb:         16777216 kB
Hugepages are now enabled.
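As a quick cross-check, the reserved amount can be recomputed from HugePages_Total and Hugepagesize. The following is a minimal sketch; hugepage_reserved_kb is a hypothetical helper name, and it reads meminfo-style text from standard input so it can be tried on sample data first.

```shell
# Hypothetical helper: multiply HugePages_Total by Hugepagesize (in kB)
# from meminfo-style text supplied on standard input.
hugepage_reserved_kb() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^Hugepagesize:/    { size_kb = $2 }
        END                 { print total * size_kb }
    '
}

# Demo on sample values matching the output above
printf 'HugePages_Total:     8192\nHugepagesize:        2048 kB\n' | hugepage_reserved_kb
```

On a live system the same function can be fed with hugepage_reserved_kb < /proc/meminfo; the result should match the Hugetlb line.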
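Red Hat also recommends disabling transparent hugepages. This can be done as root with:
echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag
echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
These writes take effect immediately but do not survive a reboot. To keep transparent hugepages disabled permanently, add transparent_hugepage=never to the kernel command line and then restart the computer.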
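To confirm the setting, the same files can be read back; the kernel lists every available mode and puts the active one in square brackets. Below is a minimal sketch of extracting that value; thp_active is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: print the bracketed (active) mode from the contents of
# a transparent_hugepage control file supplied on standard input.
thp_active() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Demo on a sample line in the format the kernel uses
echo 'always madvise [never]' | thp_active
```

On a live system: thp_active < /sys/kernel/mm/transparent_hugepage/enabled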
Assigning Hugepages to VM
To make the VM use Hugepages enter the VM's .XML file and add <memoryBacking><hugepages/></memoryBacking> to the memory section:
...
<memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
...
The VM is now configured to use Hugepages.
hyperv
The hyperv section has a number of settings that change the way in which the VM interacts with system resources. A few that can help preserve resources for the system are enabled by appending:
<vpindex state='on'/>
<runtime state='on'/>
<synic state='on'/>
<stimer state='on'/>
to the hyperv section of the VM's .XML file.
...
<hyperv>
  <relaxed state='on'/>
  <vapic state='on'/>
  <spinlocks state='on' retries='8191'/>
  <vpindex state='on'/>
  <runtime state='on'/>
  <synic state='on'/>
  <stimer state='on'/>
...
This edit doesn't help the VM perform better as much as it helps preserve system resources if the plan is to run multiple simultaneous instances.
vcpupin
vcpupin is the process in which each vCPU assigned to the VM is tied to a physical core/thread. Configuring this has the most profound impact when dealing with a system that has multiple NUMA Nodes because it forces requests to memory to stay on one node without having to cross Intel's QPI links or AMD's Infinity Fabric. This also helps by tying the vCPUs to the Node that is directly connected to the GPU.
NOTE: In many cases a function in the BIOS known as Memory Interleave will obfuscate the NUMA Nodes, making the system treat multiple sockets or multiple dies as one UMA (Uniform Memory Access) Node. Pinning cannot keep memory requests on one node if the nodes are hidden. To fix this find Memory Interleave in your BIOS and change it from Auto to Channel. If Memory Interleave does not have these options on a single socket system then chances are the system only operates in UMA mode. In that case this is fine.
Identifying CPU Affinity
There are a couple of options available to determine which PCIe devices and CPU threads are connected to which NUMA Node. On Debian, lscpu and lspci -vnn can be used together: lscpu shows which threads belong to each node and lspci -vnn shows which node a PCIe device is connected to.
Another option is lstopo (from the hwloc package). This application provides a graphical overview of the cores/threads and which PCIe device(s) are connected to which node.
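When only the thread list of one node is needed, the lscpu output can be filtered. The following is a minimal sketch; numa_node_cpus is a hypothetical helper name, and it assumes lscpu prints lines of the form "NUMA node0 CPU(s): 0-3,8-11" (leading indentation is tolerated).

```shell
# Hypothetical helper: print the CPU list of one NUMA node from lscpu-style
# text supplied on standard input. $1 = node number.
numa_node_cpus() {
    awk -v node="$1" -F':' '
        {
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)      # trim the label
        }
        key == ("NUMA node" node " CPU(s)") {
            val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", val)      # trim the CPU list
            print val
        }
    '
}

# Demo on sample lscpu-style lines (made-up values for illustration)
printf 'CPU(s):              16\nNUMA node0 CPU(s):   0-3,8-11\nNUMA node1 CPU(s):   4-7,12-15\n' | numa_node_cpus 0
```

On a live system: lscpu | numa_node_cpus 0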
Assigning CPU Affinity
To tie cores/threads to a VM open the VM's .XML file and find the vcpu placement section. Beneath this you will append cputune and modify its configuration:
<vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='3'/>
    <vcpupin vcpu='4' cpuset='8'/>
    <vcpupin vcpu='5' cpuset='9'/>
    <vcpupin vcpu='6' cpuset='10'/>
    <vcpupin vcpu='7' cpuset='11'/>
  </cputune>
cputune works on a per-thread basis. If the NUMA Node you want to use (Node 0 or Node 1) consisted of threads 0-3,8-11, you would assign them to the VM's vCPUs as in the above example.
Verifying CPU Affinity
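For larger VMs, writing the vcpupin lines by hand gets tedious. The following is a minimal sketch that expands a thread list such as 0-3,8-11 into one vcpupin element per thread, numbering the vCPUs from 0. vcpupin_lines is a hypothetical helper name; the output is meant to be pasted inside the cputune element.

```shell
# Hypothetical helper: turn a CPU list like "0-3,8-11" into vcpupin elements.
# vCPU numbers count up from 0 in the order the host threads are listed.
vcpupin_lines() {
    vcpu=0
    for range in $(echo "$1" | tr ',' ' '); do
        start=${range%-*}     # text before the dash (or the whole value)
        end=${range#*-}       # text after the dash (or the whole value)
        cpu=$start
        while [ "$cpu" -le "$end" ]; do
            printf "    <vcpupin vcpu='%s' cpuset='%s'/>\n" "$vcpu" "$cpu"
            vcpu=$((vcpu + 1))
            cpu=$((cpu + 1))
        done
    done
}

vcpupin_lines 0-3,8-11
```

Run with 0-3,8-11 it prints the same eight elements as the example above.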
To verify if the CPU threads have been successfully pinned to the VM the virsh vcpupin name-of-vm command can be used. The output for a VM with 8 vCPUs configured as above would look like this:
 VCPU   CPU Affinity
----------------------
 0      0
 1      1
 2      2
 3      3
 4      8
 5      9
 6      10
 7      11
In the event that your system only uses UMA (Uniform Memory Access) the impact of pinning threads is definitely less pronounced but some benefit can still be gained.
