Calypto's Latency Guide
Latency is the time between a cause and an effect. An example of latency is input lag, or the time between moving your mouse and the cursor moving on the screen. A good portion
of latency comes from the operating system. In this guide, I list methods to decrease input lag. This guide is mostly oriented towards gamers, but would help for any realtime
application on Windows. Google is your friend if you’re not sure about something in this guide (avoid forums and Reddit). These tweaks aren’t listed in any particular order, but they
are all important, otherwise I wouldn’t bother listing them. Individually, many of these tweaks probably won’t produce a perceivable difference, but if you do every single tweak you
will end up with a significantly more responsive system, even if you usually can’t tell.
You’ll have to change the way you use a PC. In terms of programs, you will need a minimalistic approach: don’t run anything in the background that you don’t absolutely need.
Heavy programs such as your web browser (Spotify and Discord are reskinned Google Chrome) will slow down your system and cause stuttering. Close them before gaming and
reopen them when you’re done. The same goes for other programs. Windows allocates CPU time to every service and program running in the background, and while one of them is running
on a core, every other thread waiting for that core (including your game) has to wait its turn. This is how multitasking works on operating systems. If you’re curious about scheduling and multitasking, read this, or this.
The averages are quite low, and the averages are what you are looking to improve. Intel will have lower averages than AMD, and different timers (TSC/HPET/PMT, etc.) will give different results.
The Tweaks:
Disable Hyper-threading / Simultaneous Multithreading (SMT) in UEFI
This feature allows the operating system to see a physical core as two virtual cores. Although good for highly-threaded loads such as rendering or compiling, this feature massively
increases the system’s latency. This is because the two virtual cores share a single physical core’s execution resources. The problem is exacerbated by the operating system spreading the load across both virtual
processors of the same core, which creates a stall whenever one virtual processor has to wait for execution resources that the other is using.
It is ideal to simply disable HT/SMT if you have more cores than your game requires, or force the game to run on separate cores by changing the affinity to every other logical
processor in Task Manager or Process Lasso (example: CPUs 1,3,5,7+ or 0,2,4,6+ etc.). If you have eight or more cores, you can safely turn it off for almost all games. If you have
six or fewer cores, you might be forced to leave it on and change the affinity of the game to prevent contention between the logical CPUs. Another benefit to disabling SMT is lower
power consumption, which raises overclocking headroom.
- Latency test of HT on vs. off
If you happened to buy a multi-CCX Ryzen, you have a few options to minimize latency:
- Use Downcore Control in UEFI to disable a CCX (Zen 1/2) or CCD on Zen 3 (5900X, 5950X)
- Intercore latencies: Zen 1 / Zen+ / Zen 2 / Zen 3
- Windows 10 1903 has a scheduler update to group threads to CCXs, but this does not have the same effect as disabling a CCX. Another drawback is that you have to use
Windows 10
- If you need all 8 cores enabled, set the game’s affinity to 0-3 or 4-7 (SMT off) in Task Manager to minimize inter-CCX communication; if SMT is on, use alternate logical CPUs
(0/2/4/6 or 8/10/12/14; odd or even doesn’t matter)
- If compatible with your motherboard, you can buy the 5800X3D, a single-CCD chip with the same 32MB of on-die L3 as the 5800X plus an extra 64MB of L3 (3D V-Cache) stacked on top of the die, which reduces how often the CPU has to wait on memory
Disabling a CCX will reduce latency since only local cores are available
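The affinity masks mentioned above are just bitmasks with one bit per logical CPU. Below is a minimal Python sketch for working them out, assuming Windows numbers SMT siblings next to each other (core 0 = CPUs 0 and 1, core 1 = CPUs 2 and 3, and so on); check the layout on your own system in Task Manager or Process Lasso before relying on it. The helper names are hypothetical and only illustrate the arithmetic.

```python
def logical_cpus(physical_cores, smt_enabled):
    """Pick one logical CPU per physical core.

    Assumes SMT siblings are numbered next to each other (core 0 -> CPUs 0/1,
    core 1 -> CPUs 2/3, ...), so with SMT on the even-numbered sibling is used.
    """
    return [core * 2 if smt_enabled else core for core in physical_cores]


def affinity_mask(cpus):
    """Bitmask with bit N set for each logical CPU N."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def start_affinity_arg(cpus):
    """Hex mask in the form cmd's `start /affinity` switch expects."""
    return format(affinity_mask(cpus), "X")


# One CCX (cores 0-3) with SMT on -> CPUs 0, 2, 4, 6
print(start_affinity_arg(logical_cpus(range(4), smt_enabled=True)))  # 55
# The other CCX (cores 4-7) with SMT off -> CPUs 4-7
print(start_affinity_arg(logical_cpus(range(4, 8), smt_enabled=False)))  # F0
```

The hex string can be passed to cmd’s built-in start command, e.g. start "" /affinity 55 "C:\path\to\game.exe" (placeholder path), which launches the game already restricted to CPUs 0, 2, 4 and 6.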
BCDEdit
Run Command Prompt as admin and paste the commands below:
- To undo a command in BCDEdit, run bcdedit /deletevalue X (where X is tscsyncpolicy, useplatformtick, etc.)
bcdedit /set disabledynamictick yes (Windows 8+)
- This command disables dynamic tick, so the kernel’s periodic timer interrupt keeps firing at a regular interval instead of being skipped while the CPU is idle; dynamic tick was implemented as a power saving feature for laptops but hurts
desktop performance
bcdedit /set hypervisorlaunchtype off
- Disables the hypervisor, which is unneeded on a gaming PC
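Because every /set has a matching /deletevalue, it can help to write both down before changing anything. A minimal Python sketch that lists the commands from this section next to their undo commands; it only builds strings and executes nothing, and the BCD_SETTINGS name is just for illustration.

```python
# Settings changed in this section (BCD element name -> value).
BCD_SETTINGS = {
    "disabledynamictick": "yes",
    "hypervisorlaunchtype": "off",
}


def set_commands(settings):
    """Commands that apply each setting (run from an elevated Command Prompt)."""
    return [f"bcdedit /set {name} {value}" for name, value in settings.items()]


def undo_commands(settings):
    """Matching commands that remove each setting again."""
    return [f"bcdedit /deletevalue {name}" for name in settings]


for line in set_commands(BCD_SETTINGS) + undo_commands(BCD_SETTINGS):
    print(line)
```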
Device Manager
Open Device Manager (devmgmt.msc) and disable anything you’re not using. Be careful not to disable something you use. Uninstalling a driver via Device Manager will most likely
result in it reinstalling after reboot. In order to completely disable a driver, you must disable it instead of uninstalling. When you disable something in Device Manager, the driver is
unloaded. Drivers interrupt the CPU, and whatever was running on that core is paused until the driver’s interrupt handling completes (some drivers are poorly programmed and can
stall the system for a very long time, causing stuttering). What to disable:
Display adapters:
- Intel graphics (if you don’t use it, ideally should be disabled in the BIOS)
Network adapters:
- All WAN miniports
- Microsoft ISATAP Adapter
Storage controllers:
- Microsoft iSCSI Initiator
System devices:
- Composite Bus Enumerator
- Intel Management Engine / AMD PSP
- Intel SPI (flash) Controller
- Microsoft GS Wavetable Synth
- Microsoft Virtual Drive Enumerator (if not using virtual drives)
- NDIS Virtual Network Adapter Enumerator
- Remote Desktop Device Redirector Bus
- SMBus
- System speaker
- Terminal Server Mouse/Keyboard drivers
- UMBus
- In the “Properties” window, be sure to disable “Power Management” for devices such as USB root hubs, network controllers, etc.
- Here is an example of someone’s device manager to give you a better idea: https://i.imgur.com/9sdzhbl.png
Another way to disable services via the registry is simply with a .reg file. Use the “Properties” box in services.msc to get the name of the service, then create a .reg file with entries
such as:
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\BluetoothUserService]
"Start"=dword:00000004
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Spooler]
"Start"=dword:00000004
If you get an error when trying to run the .reg, use PowerRun (some services require the TrustedInstaller privilege in order to be modified, such as Windefend or wuauserv).
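If you are disabling many services, writing the .reg file by hand gets tedious. A minimal Python sketch that generates the same kind of file as the example above; the function names are hypothetical, and it assumes you have already looked up each service name in services.msc.

```python
SERVICES_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"
DISABLED = 4  # "Start" values: 2 = Automatic, 3 = Manual, 4 = Disabled


def services_reg(service_names, start=DISABLED):
    """Return the text of a .reg file that sets "Start" for each service."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for name in service_names:
        lines.append(f"[{SERVICES_KEY}\\{name}]")
        lines.append(f'"Start"=dword:{start:08x}')
        lines.append("")
    return "\r\n".join(lines)


def write_reg(path, text):
    """Save as UTF-16 LE with a byte-order mark, the encoding regedit exports."""
    with open(path, "w", encoding="utf-16-le", newline="") as f:
        f.write("\ufeff" + text)
```

For example, write_reg("disable_services.reg", services_reg(["BluetoothUserService", "Spooler"])) reproduces the file shown above. Passing a different start value (such as 3 for Manual) produces a file that re-enables the services, but check each service’s original value in regedit first, since not all of them default to Manual.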
DHCP Client, Network Connections, Network List Service, Network Location Awareness, and Network Store Interface Service are required to automatically connect to a local
network, but once a static IP is set, they can be disabled. See below to set up a static IP:
- Open Network and Sharing Center → Change adapter settings → right click, properties → IPv4 properties → then open cmd, run “ipconfig”, and fill out the settings as shown:
https://i.imgur.com/o1PGS2E.png
- If you’re still confused, see this tutorial on how to set a static IP: https://pureinfotech.com/set-static-ip-address-windows-10/
- Network Store Interface Service may be required on Windows 10 for dogshit programs that determine whether or not you’re connected to the Internet such as Apex
Legends
- System Events Broker is required for Radeon Software to open on Windows 10
Startup
Prevent useless bloat such as Discord/Realtek/Steam/RGB/mouse/keyboard software etc. from starting up with Windows. Your PC will start up faster, and once started will run fewer
unnecessary programs.
1. Press “Windows key+R” → type “msconfig” → go to the “Startup” tab
2. Uncheck everything unless you absolutely need it. Launch it manually instead.
Windows 10 Specific
- Run this debloat script, then clean up anything the script misses (check Task Manager and Services)
- https://github.com/Sycnex/Windows10Debloater
- Disable Fullscreen Optimizations for every game you play (right click the game’s .exe → Properties → Compatibility → check “Disable Fullscreen Optimizations”)
- May not be necessary with newer Windows versions such as 21H1+, however some games may still perform worse than with exclusive fullscreen
- Use “Real” to reduce audio buffer size
- Disable VBS/HVCI (Windows 10/11)
- Disable fast startup (Control panel → Power Options → Choose what the power buttons do → uncheck “Turn on fast startup”)
- Replace the start menu with OpenShell; OpenShell is faster/lighter than the M$ start menu
- Add .old to StartMenuExperienceHost.exe and SearchApp.exe in C:\Windows\SystemApps to prevent the Win10 start menu from running
- If you want a Windows7-like configuration, import this .xml to OpenShell by pressing “Backup” in OpenShell settings (right-click start button)
Power Plan
By default, Windows uses the “Balanced” power plan which attempts to save energy when possible. Instead, set the plan to “High Performance” in Control Panel→Power Options or
even make a custom power plan using PowerSettingsExplorer. The default “High Performance” plan still has many energy-saving features enabled which is why it is better to create a
custom plan. On W10 1803+ you may enable the “Ultimate Performance” power plan which is a slight step above the regular “High Performance” plan by pasting this command into
CMD as admin:
powercfg -duplicatescheme e9a42b02-d5df-448d-aa00-03f14749eb61
Disable Spectre and Meltdown protection / other mitigations (Windows 10/11 or updated 7/8)
- https://www.grc.com/inspectre.htm
- Example image of what it should look like when you disable mitigations
- In C:\Windows\System32, rename “mcupdate_GenuineIntel.dll” to “mcupdate_GenuineIntel.dll.old” (change file permissions in Properties→Security)
- Rename “mcupdate_AuthenticAMD.dll” to “mcupdate_AuthenticAMD.dll.old” if using an AMD CPU
Process scheduling
“Quantum” is the amount of time the Windows process scheduler allocates to a thread. You may choose between short or long quantum. Furthermore, you can choose to boost the
foreground quanta by double or triple, meaning the program in the foreground gets a quantum two or three times as long. For gaming, it makes most sense to use long quantum and
three times foreground boost, since we want to maximize CPU time the game gets. The higher the boost, the less the game will be interrupted by background programs. When not
gaming, the drawback to using longer quantum is that apparent responsiveness when using multiple programs may be reduced. In general, the longer the duration of quanta, the
more we minimize context switching. Context switching is computationally expensive and should be minimized to reduce jitter from background processes/threads when gaming.
The table below lists the possible configurations you can tell the scheduler to use. You may select short or long quantum, fixed or variable; and if you select variable, how much boost
(2x or 3x) to give the foreground program. Which quantum you choose depends on your use case. The default is dec. 38 (Short, variable, 3x) for non-server Windows editions, while for server it is dec. 24 (Long, fixed). This
can be checked in Advanced System Settings → Performance → Advanced, where “Programs” corresponds to 38 and “Background services” to 24. My personal recommendation is dec. 22.
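- From the table, add together the decimal values you want (quantum length: Short = 32, Long = 16; quantum type: Variable = 4, Fixed = 8; foreground boost: none = 0, 2x = 1, 3x = 2) and enter the sum as a decimal in the Win32PrioritySeparation value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\PriorityControl. The foreground boost column only applies to variable quantum; if you are using fixed quantum, ignore it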
- Changes to Win32PrioritySeparation apply instantly; no restart required
- Examples: Short, 3x = 32+4+2 = dec. 38; Long, 3x = 16+4+2 = dec. 22; Short, fixed = 32+8 = dec. 40
- Possible values, in decimal:
- 20 = Long, variable, no foreground boost (12:12)
- 21 = Long, variable, 2x foreground boost (24:12)
- 22 = Long, variable, 3x foreground boost (36:12)
- 24 = Long, fixed (36:36)
- 36 = Short, variable, no foreground boost (6:6)
- 37 = Short, variable, 2x foreground boost (12:6)
- 38 = Short, variable, 3x foreground boost (18:6)
- 40 = Short, fixed (18:18)
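The arithmetic above can be checked with a small Python sketch that encodes and decodes Win32PrioritySeparation using the bit values from the examples (Long = 16, Short = 32, Variable = 4, Fixed = 8, 2x boost = 1, 3x boost = 2). The quantum pairs are copied from the list above; values that use the “default” bit patterns are deliberately not handled, since their meaning depends on the Windows edition.

```python
LENGTH = {"long": 16, "short": 32}
KIND = {"variable": 4, "fixed": 8}
BOOST = {1: 0, 2: 1, 3: 2}  # foreground boost multiplier -> low two bits

# Foreground quantum for no boost / 2x / 3x, from the list above.
# The background quantum is always the first entry.
QUANTA = {
    ("short", "variable"): (6, 12, 18),
    ("short", "fixed"): (18, 18, 18),
    ("long", "variable"): (12, 24, 36),
    ("long", "fixed"): (36, 36, 36),
}


def encode(length, kind, boost=1):
    """Decimal Win32PrioritySeparation value for the chosen options."""
    if kind == "fixed":
        boost = 1  # the boost column is ignored with fixed quantum
    return LENGTH[length] + KIND[kind] + BOOST[boost]


def decode(value):
    """Return (length, kind, foreground quantum, background quantum).

    Raises KeyError/ValueError for bit patterns not covered by the list above.
    """
    length = {16: "long", 32: "short"}[value & 0x30]
    kind = {4: "variable", 8: "fixed"}[value & 0x0C]
    boost_index = value & 0x03
    if boost_index == 3:
        raise ValueError("boost bits 11 are not covered by the list above")
    quanta = QUANTA[(length, kind)]
    return length, kind, quanta[boost_index], quanta[0]


print(encode("long", "variable", 3))  # 22
print(decode(22))  # ('long', 'variable', 36, 12)
```

Once you have a value, it can be written from an admin Command Prompt with reg add "HKLM\SYSTEM\CurrentControlSet\Control\PriorityControl" /v Win32PrioritySeparation /t REG_DWORD /d 22 /f.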
Example of IRQ sharing - four devices share IRQ 16 which will cause interrupts from these devices to compete with each other
Nvidia 3D settings
- Low Latency Mode should be set to “On” instead of “Ultra” if you experience low smoothness or stuttering
- Make sure there are no per-game override settings such as “Image Sharpening” enabled (Apex Legends for example has it enabled by default, despite the global setting)
- Under “Adjust desktop size and position,” use display scaling if available, and uncheck “Override the scaling mode set by games and programs”
Lock GPU clocks (Nvidia only, see the section below for Radeon cards)
This tweak forces the GPU to always run at boost clocks, which prevents it from constantly switching back and forth between clock speeds (this switching negatively
impacts performance). Ensure you have adequate load temperatures (<70°C) or you will shorten the lifespan of your card. Note that starting with Nvidia 1000 series cards, you cannot
completely lock clocks. The core clock will fluctuate based on load, temperature, or power usage.
In regedit, navigate to the path below and create a DWORD value as shown (if you have multiple GPUs installed in your system, the 0000 may be 0001, 0002, etc.):
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e968-e325-11ce-bfc1-08002be10318}\0000]
"DisableDynamicPstate"=dword:00000001
Radeon Settings
- Radeon Software settings:
- Graphics tab (see this image for all graphics settings):
- Radeon Super Resolution: Off
- Radeon Anti-Lag: On (you may experience low smoothness when CPU bound with anti-lag enabled)
- Radeon Chill: Off
- Radeon Boost: Off
- Radeon Image Sharpening: Off
- Radeon Enhanced Sync: Off
- Wait for Vertical Refresh: Off
- Everything else: Off / Lowest
- Display tab:
- AMD FreeSync: Off is almost always better, test 2
- Virtual Super Resolution: Off
- GPU Scaling: Off (unless your monitor doesn’t have scaling support)
- Display Color Enhancement: Off
- HDCP Support: Off (unless you view DRM content)
- Change power limit to max in Performance→Tuning if your power supply is capable
- Raise VRAM clock to something stable
- Radeon Software (the control panel) requires DWM+composition in order to not crash on Windows 7, but it should be disabled once you’ve dialed in the settings
- Fullscreen games require Desktop Window Manager and Themes services and Aero theme if using more than one monitor
- Fix for "Do you want to change the color scheme to improve performance?" on Windows 7
Download and install MorePowerTool; this tool can massively help with stuttering caused by power saving and downclocking. It allows you to change certain
VBIOS values without having to flash the GPU’s BIOS; instead they are read from the Registry, meaning you can easily change or revert them if something is wrong.
- Use GPU-Z to dump your current VBIOS - see this picture if you can’t find the button
- Open MorePowerTool, then use the VBIOS you dumped from GPU-Z to load the Power Play Tables to edit the settings below
- Features
- Feature Control (5xxx)
- Feature Control (6xxx)
- Power:
- You can slightly raise the GPU’s power limit but this is not an overclocking guide so be extremely conservative unless you know what you are
doing, see here for more info
- Frequency: Set reasonable minimum frequencies for GFX, SoC, and Fclk (1900, 900, 1700, respectively, are good starting values for 6xxx)
- Setting these too high will result in instability which will cause microstuttering, stuttering, or crashing
- Ensure your GPU’s cooler is adequate to run these frequencies 24/7, otherwise your GPU will quickly degrade
- SoC and Fclk can have equal min/max values, which is ideal; setting an equal min/max for GFX requires MoreClockTool (below)
- Dcefclk must remain at its default value
- Fan: Disable “Zero RPM Enable”; this will increase stability and performance in game, while also increasing the lifetime of your card
- Additionally, set a custom fan curve using MoreClockTool below to further prevent degradation from ridiculous default fan curves
- Once finished, click “Write SPPT” and reboot (restarting the driver is not enough)
- Use HWiNFO to make sure minimum frequencies were applied. If your GPU is still downclocking below the minimum you set, it means you have to start
over and figure out which clock was set too high (the driver will use default SPPT values if it doesn’t like a value)
- If you’re satisfied with the values you’ve set, click “Save” so you can reapply them after every driver update, since updating drivers deletes the SPPT (Soft
Power Play Table)
Download and install MoreClockTool; using this tool you can raise your maximum clock beyond what is possible through MorePowerTool. Additionally, it allows you to set
your minimum and maximum core clocks to the same value (e.g. 2500 min, 2500 max, although it will still fluctuate) which is not possible through MorePowerTool or
Radeon Software. You can also change a few other performance-related settings through this tool which is faster than using Radeon Software. If you haven’t already,
change the fan curve because stock fan curves are generally extremely relaxed for the high level of heat that GPUs output (the VRAM and other components on the PCB
get cooked due to inadequate airflow). Your fan curve should be at 100% at 80°C or lower. Also keep in mind that, depending on your cooler and the type of load, the
hotspot temperature can be 30°C higher than the “GPU” temperature, which can cause rapid degradation if not kept in check (e.g. 60°C GPU, 90°C hotspot).
- If you get an error when clicking “Set” then you need to change your performance tuning profile to “Custom” in Radeon Software
- Every time the system crashes (regardless of whether the GPU was at fault) you have to redo everything, so saving the settings to a file will save some time
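To sanity-check a fan curve against the “100% at 80°C or lower” rule, here is a small Python sketch. The curve points are made-up examples, and it assumes fan speed is interpolated linearly between points, which may not be exactly how the driver or MoreClockTool applies a curve.

```python
# Hypothetical fan curve: (temperature in °C, fan speed in %).
CURVE = [(30, 30), (50, 50), (65, 75), (80, 100)]


def fan_speed(temp_c, curve=CURVE):
    """Linearly interpolate fan speed; clamp below/above the curve's ends."""
    if temp_c <= curve[0][0]:
        return curve[0][1]
    for (t0, s0), (t1, s1) in zip(curve, curve[1:]):
        if temp_c <= t1:
            return s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0)
    return curve[-1][1]


def reaches_full_speed_by(curve, limit_c=80):
    """True if some point on the curve hits 100% at or below limit_c."""
    return any(speed >= 100 and temp <= limit_c for temp, speed in curve)


print(fan_speed(40), fan_speed(80), reaches_full_speed_by(CURVE))
```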
Interrupt affinity
Using Microsoft’s Interrupt-Affinity Policy Tool (backup link), you can set affinity for a driver’s interrupts. Do not go overboard. You can make the system perform worse if you
randomly start changing affinities. Ideally each device should have its own core, or be left alone if you have already dedicated your most important devices to every available core.
- Changing the interrupt affinity of some drivers may prevent you from booting. If this is the case, use recovery mode to boot from last known good configuration
- Default install dir: C:\Program Files (x86)\Microsoft Corporation\Interrupt Affinity Policy Tool (use the x64 executable)
1. Run as admin
2. Open Device Manager and click “View”→“Devices by connection.” Then expand all devices, as you will need this to see which devices are connected to which port/bridge
a. If you open the properties of the device, it will show “PCI bus #, device #, function #,” you will need this in case multiple devices share the same name (e.g. two
xHCI controllers, both named “USB xHCI Compliant Host Controller”)
3. Select a driver and click “Set Mask” (this is for IrqPolicySpecifiedProcessors)
a. Select the core you want the driver to be executed on
b. If you have HT or SMT, avoid every other CPU to ensure one core doesn’t get two interrupts at once; you want to avoid contention that comes from two logical
processors sharing an execution unit
c. Press the “Advanced…” button for other choices (not useful unless you have drivers that use MSI-X, or you have a multi-socket system)
d. Do not restart drivers for storage devices or root ports with storage devices attached, restart your PC instead to prevent risk of data corruption
Generally drivers perform best when affinity is set to a single core. If a device uses MSI-X, it might be better to use IrqPolicySpreadMessagesAcrossAllProcessors. Every time
you update a driver (such as your GPU driver) you will have to set the affinity again. Examples of devices to change:
- GPU
- Setting the graphics card onto a single core gives the best performance; however, setting it to a busy core will result in worse performance. You will have
to find out which core performs best by benchmarking, such as using menu FPS or something very consistent with high FPS (500+) that you can
reproduce easily. Usually it is the last core.
- USB controllers (xHCI/EHCI; also works best on a single core, test polling using MouseTester)
- Storage controller(s), audio controller(s)
- Network controller (set to IrqPolicySpreadMessagesAcrossAllProcessors if using RSS, see example)
- To check the device ID, open Device Manager, click View and select “Devices by connection,” right click on a device, Properties, Details, Physical Device Object Name
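For reference, the Interrupt Affinity Policy Tool stores its settings in the registry. As far as I know, they live under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\<device instance path>\Device Parameters\Interrupt Management\Affinity Policy, as a DevicePolicy DWORD (4 = IrqPolicySpecifiedProcessors, 5 = IrqPolicySpreadMessagesAcrossAllProcessors) plus an AssignmentSetOverride REG_BINARY holding a little-endian CPU mask. The Python sketch below only builds that .reg text so you can see what a given mask looks like; the 8-byte mask width (the size of KAFFINITY on 64-bit Windows) and the function names are assumptions, and the tool itself remains the safer way to apply changes.

```python
# IRQ_DEVICE_POLICY value for "IrqPolicySpecifiedProcessors" (wdm.h).
IRQ_POLICY_SPECIFIED_PROCESSORS = 4

ENUM_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum"
POLICY_SUBKEY = r"Device Parameters\Interrupt Management\Affinity Policy"


def assignment_set_override(cpus, width=8):
    """Little-endian CPU mask, one bit per logical CPU (width is an assumption)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask.to_bytes(width, "little")


def affinity_policy_reg(instance_path, cpus):
    """.reg text pinning a device's interrupts to the given logical CPUs."""
    key = "\\".join([ENUM_KEY, instance_path, POLICY_SUBKEY])
    data = ",".join(f"{b:02x}" for b in assignment_set_override(cpus))
    return "\r\n".join([
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'"DevicePolicy"=dword:{IRQ_POLICY_SPECIFIED_PROCESSORS:08x}',
        f'"AssignmentSetOverride"=hex:{data}',
        "",
    ])
```

The instance path is the one shown under Properties → Details → “Device instance path” in Device Manager. Pinning to CPU 3, for example, gives AssignmentSetOverride bytes 08,00,00,00,00,00,00,00.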
Benchmarking affinities or driver latency
- MouseTester for benchmarking xHCI/EHCI controller affinities
- liblava-demo for benchmarking GPU affinities, or anything else with extremely high FPS
- Use CapFrameX or something similar to benchmark average FPS, 1% and 0.1% lows
- Use AutoGPUAffinity to automate the process
- Xperf for benchmarking execution latencies for each driver. A script will make using it very easy
- My simple batch script which includes a Windows 7 download link without having to install all of ADK
- Timecard's script which uses PowerShell if you prefer that instead
- Perfmon (Win+R → perfmon) allows you to see DPCs and interrupts per core
- Go to “Performance Monitor” → click the green “+” sign → Processor → select DPC Rate, DPCs Queued/sec, Interrupts/sec → <All instances> → press “Add > >” then
“OK” → Change to “Report” view
- Run the game or program that you normally would and move perfmon to another monitor to see DPC/interrupt activity in realtime
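When comparing affinities, it helps to know how average FPS and “lows” come out of a frame time log. A minimal Python sketch, assuming a list of frame times in milliseconds; here a “low” is the average FPS of the slowest 1% / 0.1% of frames, but tools differ (some report a percentile frame time instead), so only compare numbers computed the same way.

```python
import statistics


def fps_summary(frametimes_ms):
    """Return (average FPS, 1% low FPS, 0.1% low FPS) from frame times in ms."""
    slowest_first = sorted(frametimes_ms, reverse=True)

    def low(fraction):
        # Average FPS over the slowest `fraction` of frames (at least one frame).
        count = max(1, int(len(slowest_first) * fraction))
        return 1000 / statistics.mean(slowest_first[:count])

    average_fps = 1000 / statistics.mean(frametimes_ms)
    return average_fps, low(0.01), low(0.001)


# Synthetic capture: 999 frames at 2 ms (500 FPS) and one 10 ms hitch.
avg, low1, low01 = fps_summary([2.0] * 999 + [10.0])
print(f"avg {avg:.1f} FPS, 1% low {low1:.1f} FPS, 0.1% low {low01:.1f} FPS")
# avg 498.0 FPS, 1% low 357.1 FPS, 0.1% low 100.0 FPS
```

The example shows why lows matter more than averages here: a single 10 ms hitch barely moves the average but dominates the 0.1% low.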
Lower Latency Hardware (centered around gaming, not professional tasks such as low latency audio)
Disclosure: I receive a commission through these Amazon product links at no cost to you.
CPUs:
For optimal smoothness in gaming, an 8-core CPU is the minimum. 6-core CPUs are obsolete and will not be able to smoothly run modern games at high frame rates. Ryzen is
excluded for latency reasons. 12th and 13th gen (Alder/Raptor Lake) CPUs are the only CPUs that make sense to buy in the current market. Also, ensure your CPU’s frequency is
locked across all cores to minimize latency and jitter from constant clock switching.
i7-3770K (4C/8T)
- Outdated for modern games; however, the L2 hit latency is 10ns lower than current Skylake-based CPUs (~10ns vs. ~20ns)
- Uses DDR3 which is lower latency than DDR4 due to DDR4’s grouped banks and timing limits on Skylake (ex. tRCDtRP, 28 tRAS, 16 tFAW)
Disable E-cores on 12th generation CPUs (Alder Lake) through UEFI as they massively limit overclocking potential of the uncore while also increasing system jitter. The 12900K was
superseded by the generally higher performing 13700K.
i7-12700K/F (8P/4E)
- Lower-binned 12900K, expect lower clocks
- Still a viable option if you need 8 P-cores for less than the 13700K
Raptor Lake is similar to Alder Lake, with the exception of 8x2MB L2 vs. 8x1.25MB L2 on Alder Lake and efficiency improvements. E-cores should be disabled for lower latency,
however the performance penalty of leaving them enabled is not as high as with Alder Lake. If you opt to use E-cores, you will need something like Process Lasso to automate
setting affinities/CPU sets to ensure the game doesn’t get scheduled on the E-cores, or use Windows 11, which has higher latency than previous Windows versions.
i7-13700K/F (8P/8E)
- Lower-binned 13900K; the main difference is 8 fewer E-cores and 6MB less L3
i9-13900K/F (8P/16E)
- Higher overclocking potential, but at a large premium over the 13700K
CPU Cooling:
As with any other electronic component, the better a CPU is cooled, the lower its electrical losses, resulting in better efficiency. Therefore it’s important to have a strong
cooler for the CPU, as the IHS and small die size massively limit cooling performance. AIOs offer better cooling performance than air coolers because the radiators have
higher fin density and the warm air can be directly exhausted out of the case. Another benefit to having water cooling is the ability to mount a RAM fan due to the free space
from not having a tower cooler.
Motherboards:
Cheap motherboards will not allow your hardware to run at its full potential; RAM overclocking is highly dependent on the motherboard, and to a lesser extent CPU overclocking as
well, therefore it is important to be selective when choosing one. Motherboards can be judged by hardware design; things like PCB layout and trace design, PCB layer count, VRM
design, heatsinks, etc. all play a massive role in quality. On the software side, the firmware is also critical to RAM overclocking. A poorly optimized firmware will not take advantage
of the (hopefully) good hardware.
Motherboards with 2 DIMM slots such as mini-ITX will have higher RAM overclocking potential than boards with 4 DIMM slots due to shorter distance from the CPU. 2 DIMM ATX
boards will cost more compared to mini-ITX boards, but have much stronger VRMs. Mini-ITX motherboards also have a large drawback: the RAM is right next to the GPU which
emits a lot of heat. Even if you do not plan on drawing high current, extra EPS connectors help provide more stable power for the CPU’s VRM via lower resistance. Gigabyte
motherboards lack user addressable IOL/RTL settings, which can very negatively impact RAM latency. Intel Z390 is the last generation to have T-topology motherboards for 4 DIMM
overclocking (vs. the daisy chain layout, which suffers when 4 DIMMs are present). One extremely useful feature to consider is BIOS flashback, as it allows you to flash your BIOS
without having to turn the system on. In the case of a failed BIOS flash, flashback should allow you to recover, especially with the newer WSON8 packages (previously SOIC8)
where using external programmers with clips is nearly impossible. Flashback should also allow you to bypass downgrade and modded BIOS restrictions.
Z490:
Asus Z490i
- Reportedly better RAM OC than MSI Z490i, but much weaker VRM
- 8 layer PCB
MSI Z490i Unify
- Requires firmware updates for CR1 support
- 10 layer PCB
- Direct phase design
MSI Z490 Unify ATX
- 6 layer PCB, decent value for quad rank, ample VRM but uses doublers
Asus Z490 XII Apex
- Only 6 layer PCB, 2 DIMM slots
EVGA Z490 Dark
- Windows XP support
- 10 layer PCB, 2 DIMM slots
- Direct phase design, can disable LLC
Z590:
Avoid the Gigabyte Z590 Aorus Elite/Pro due to faulty power plane design
MSI Z590 Unify-X: $270
- 8 layer PCB, 2 DIMM slots
- 16+2+1 “mirrored” VRM
ASRock Z590 OC Formula: $480
- 12 layer PCB, 2 DIMM slots
- Missing “Hidden OC Item” making the UEFI nearly useless
Asus Z590 XIII Apex: $500
- 10 layer PCB, 2 DIMM slots
- Has new “Vlatch” feature which can detect and report minimum and maximum Vcore voltages through HWiNFO
Gigabyte Z590 Tachyon: $530
- 8 layer PCB, 2 DIMM slots
- Direct phase design
EVGA Z590 Dark: $600
- 10 layer PCB, 2 DIMM slots
- Direct phase design, can disable LLC
Z690:
If using Windows 7 and motherboard audio, beware of the Realtek ALC4080 audio chip present on most higher-end Z690 boards as there is no W7 driver. ASRock and Gigabyte
boards are questionable this generation. Be sure to get an aftermarket ILM (independent loading mechanism) since the stock ILM causes warpage resulting in very poor thermal
performance:
- igor’sLAB: German Engineered Bend Aids for Intels LGA1700 – Thermal Grizzly CPU Contact Frame and Alphacool Apex Backplate Thermal Testing | Review
- Gamers Nexus: $4.35 Fix for Intel Thermal Problems | Thermalright 12th Gen Contact Frame
MSI PRO Z690-A DDR4: $200 (was $160, no longer good value)
- 6 layer PCB, twin phase design
- Barebones Z690 board with a “good enough” VRM for most builds
- There was a batch with broken flashback; if you buy it, ensure flashback works before it’s too late to exchange/return
EVGA Z690 Classified DDR5: $330
- 10 layer PCB, 19 direct phases
- Mid-range DDR5 board with typical EVGA quality and features
- Uses ALC1220
EVGA Z690 Dark: $400
- 10 layer PCB, 2 DIMM slots, direct phase design
- No Vccgt for iGPU (Quick Sync, backup GPU, etc.)
- Uses ALC1220, overall better board than the Unify-X
Z790:
The same ILM issue applies to Z790 as well, so be sure to install an aftermarket contact frame (or delid). While Raptor Lake CPUs are compatible with Z690 boards after a UEFI
update, the updated UEFIs may not be well optimized for the platform (such as with RAM overclocking). For that reason, if you have the money and want to avoid potential
headaches, buy a Z790 motherboard. As Z690 stock dries up, you’ll be forced to buy one anyway.
Asus Z790-A Strix DDR4: $347
- 6 layer PCB, unknown phase design
- Backwards-compatible with LGA 1200 coolers (additional mounting holes)
Asus Z790 Apex: $700
- 8? layer PCB, 12 phase teamed design
- Beware of bowed PCBs
EVGA Z790 Dark: $800
- 14 layer PCB, 21 phase (direct?)
RAM:
Having fast and stable RAM can dramatically decrease system latency since numerous games and programs heavily depend on RAM to feed the CPU with data (anything that is not
immediately on the CPU must be retrieved from RAM, which is orders of magnitude slower than the CPU). By default on most systems, DDR4 RAM is clocked at 2133 15-15-15,
which is extremely slow compared to something easily attainable like 3600 15-15-15 with tuned subtimings, given any decent Samsung B-die RAM kit and a recent platform such as
Z390/X570 or newer. Here you can see some benchmarks of what results are possible just by overclocking RAM:
- Impact of RAM speed on Intel's Skylake desktop architecture by KingFaris
- A benchmark I did a while ago
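To see why 3600 15-15-15 is so much faster than 2133 15-15-15 despite the same numbers: timings are counted in memory clock cycles, and DDR performs two transfers per clock, so one cycle lasts 2000 / (data rate in MT/s) nanoseconds. The Python sketch below converts CAS latency to nanoseconds; treat it as a rough comparison only, since real memory latency also depends on tRCD, tRP, subtimings and the memory controller.

```python
def cas_latency_ns(data_rate_mts, cas_cycles):
    """Convert CAS latency from clock cycles to nanoseconds.

    DDR moves data twice per clock, so the memory clock in MHz is half the
    data rate in MT/s and one clock cycle lasts 2000 / data_rate_mts ns.
    """
    return cas_cycles * 2000 / data_rate_mts


for rate, cl in [(2133, 15), (3600, 15)]:
    print(f"DDR4-{rate} CL{cl}: {cas_latency_ns(rate, cl):.2f} ns")
# DDR4-2133 CL15: 14.06 ns
# DDR4-3600 CL15: 8.33 ns
```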
Built-in overclocking profiles like XMP/DOCP/EOCP can be toggled for better performance, however they are still overclocks and thus do not guarantee stability. On top of that, the
profiles do not include subtimings, meaning there is still a large amount of performance left on the table. Therefore, it’s a good idea to learn how to overclock the RAM yourself to
ensure it is running at its full potential and is stable. You can reference this guide to learn more:
- DDR4 OC Guide by integralfx
When overclocking anything (CPU/GPU/RAM, etc.) it is important to ensure the overclock is stable and temperatures are controlled. Higher temperatures result in lower stability
which can lead to errors that result in data corruption and/or crashes. Thus, when stress testing, it is critical to use multiple stress tests (not at once) for multiple hours to
guarantee stability. An unstable overclock that does not immediately appear to be unstable can also have devastating effects. It can result in constant error-correction which in
games can lead to inconsistent frame times which will be perceived as low smoothness/microstuttering. It is nearly impossible to pinpoint such an issue unless you recognize your
system is overclocked (such as with XMP), as stress tests rarely pick up this type of instability. Also keep in mind that heat from other components such as the CPU/GPU
can heat up your RAM, which will lead to instability. Removing case panels can help mitigate this issue, but heat will still build up without proper airflow. A RAM fan is a must when
overclocking. Any 140mm fan will do, but it must be securely mounted to blow directly onto the DIMMs, otherwise the effectiveness will be little to nothing.
RGB on RAM is detrimental to performance due to the additional traces and components required for the LEDs. This will increase power draw which will in turn increase heat and
electrical noise which will both interfere with RAM operation, all while driving up cost to you. In terms of DDR4 DRAM voltage, anything under 1.5-1.6V is “safe” for daily use, but
around these voltages the RAM will quickly be thermally limited, even with a fan. The metallic covers on DIMMs are only there for aesthetic and safety purposes to prevent
accidental damage from user error. These covers can be removed for better thermals since they use low quality thermal tape (or just glue) and cover the back of the PCB with foam
spacers which make the RAM run hotter than if the “heatsinks” weren’t there in the first place. The temperature sensors present on DIMMs are located on the Serial Presence Detect
(SPD) chip and do not report the actual junction temperatures of the dies. In reality the memory is probably overheating when the temperature sensor is reporting only 40°C, which
is closer to the temperature of the air immediately around the DIMMs.
All else equal, dual-rank RAM performs better than single-rank RAM. This is because the data is more evenly spread out across different banks, meaning the memory controller is
less likely to run into a bank that is busy refreshing. However, more ranks require more voltage for the same timings and a high-quality motherboard for good signal integrity.
They also produce more heat, which requires more powerful cooling. Since manufacturers typically do not state whether their DIMMs are dual-rank, the only way to really
determine if you’re buying dual-rank is to know what chips are being used. In the case of Samsung B-die, a dual-rank kit will be 2x16GB since a single-rank B-die kit is 2x8GB.
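To see why 2x16GB of B-die implies dual rank, here is a rough sketch of the arithmetic. It assumes a standard non-ECC DDR4 DIMM with a 64-bit data bus built from x8 chips (B-die is an x8 part), so each rank is eight chips; dimm_ranks() is a hypothetical helper for illustration only, not a tool that reads your actual DIMMs.

```python
# Rough rank estimate for a DDR4 DIMM from its capacity and die density.
# Assumes a non-ECC DIMM with a 64-bit data bus built from x8 chips,
# so one rank is 64 / 8 = 8 chips. dimm_ranks() is a hypothetical helper.

def dimm_ranks(capacity_gb: int, die_density_gbit: int, chip_width: int = 8) -> float:
    """Return the number of ranks implied by capacity and die density."""
    chips_per_rank = 64 // chip_width                          # x8 chips -> 8 per rank
    rank_capacity_gb = chips_per_rank * die_density_gbit / 8   # gigabits -> gigabytes
    return capacity_gb / rank_capacity_gb


if __name__ == "__main__":
    # 8Gb dies (e.g. B-die): an 8GB stick is one rank, a 16GB stick is two.
    for size_gb in (8, 16):
        print(f"{size_gb}GB of 8Gb x8 dies -> {dimm_ranks(size_gb, 8):g} rank(s)")
    # A 16GB stick built from 16Gb dies would be single rank instead.
    print(f"16GB of 16Gb x8 dies -> {dimm_ranks(16, 16):g} rank(s)")
```

The same capacity can therefore be single or dual rank depending on die density, which is why the chip type matters more than the kit size alone.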
If your hardware allows for it, make sure the “command rate” timing (in your BIOS/UEFI) is set to 1. CR2 (command rate 2) is the default setting on most motherboards since it is
easier to guarantee stability. However, CR2 carries a latency penalty, since the memory controller spends two clock cycles instead of one issuing each command to the RAM
chips. Stabilizing command rate 1, though, requires a very high-quality motherboard, a good IMC (integrated memory controller), and good RAM (Samsung B-die for DDR4). On top
of that, if you have an 11th or 12th gen. Intel CPU, ensure your memory controller is set to “gear 1.” Gear 2 incurs a large latency penalty since the memory controller runs at
half the memory’s frequency. 11th gen. IMCs typically top out around 3600 MT/s in gear 1, while 12th gen. typically tops out around 4000 MT/s, with some leeway if
VCCSA is increased. Ryzen CPUs are similar in that MCLK (memory clock) and FCLK (Infinity Fabric clock) need to run 1:1; otherwise you incur a large latency penalty, just like
11th/12th gen. Intel CPUs. Zen 3 typically tops out around 3733 MT/s. The command rate and gear settings you can run also depend on the load on the memory controller. Tighter
timings, higher frequency, additional ranks, additional DIMMs, and additional channels (if applicable) all add stress to the memory controller, with the items toward the end of that
list being the heaviest loads. Therefore, it is important to identify your limiting factors.
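The clock relationships behind gear modes, FCLK ratios, and command rate come down to simple arithmetic. This sketch assumes DDR transfers twice per clock (so MCLK in MHz is half the MT/s rating), that gear 2 halves the Intel memory controller clock relative to MCLK, and that CR2 costs roughly one extra memory clock per command; all function names are hypothetical.

```python
# Clock arithmetic behind DDR4 gear modes, Ryzen FCLK ratios, and command rate.
# DDR transfers twice per clock, so MCLK (MHz) is half the MT/s rating.

def mclk_mhz(data_rate_mts: float) -> float:
    """Memory clock in MHz for a DDR data rate in MT/s."""
    return data_rate_mts / 2

def controller_clock_mhz(data_rate_mts: float, gear: int) -> float:
    """Intel IMC clock: gear 1 runs at MCLK, gear 2 at half of MCLK."""
    if gear not in (1, 2):
        raise ValueError("gear must be 1 or 2")
    return mclk_mhz(data_rate_mts) / gear

def cycle_ns(data_rate_mts: float) -> float:
    """Length of one memory clock cycle in nanoseconds (roughly the extra cost
    of CR2 over CR1 per command, under the assumption stated above)."""
    return 1000 / mclk_mhz(data_rate_mts)


if __name__ == "__main__":
    print(controller_clock_mhz(3600, 1))   # IMC clock in gear 1
    print(controller_clock_mhz(3600, 2))   # IMC clock in gear 2
    print(mclk_mhz(3733))                  # FCLK needed for 1:1 on Zen 3
    print(round(cycle_ns(3600), 3))        # ns per memory clock at 3600 MT/s
```

At 3600 MT/s the controller runs at 1800 MHz in gear 1 but only 900 MHz in gear 2, and 3733 MT/s needs an FCLK of roughly 1866 MHz to stay 1:1.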
The “best” consumer DDR4 RAM die in most cases is Samsung 8Gb B-die, as it scales well with voltage, allowing for tighter timings. Beware of A0 PCB kits, which are usually older
(2017-2018). This older PCB layout is less ideal because the chips are farther away from the DIMM’s pins. The A2 layout is generally better and is found in recently released kits.
Listed below are timings (data rate in MT/s, then CL-tRCD-tRP-tRAS) typical of B-die kits, though they do not guarantee Samsung B-die. Use these as base timings; a higher price
does not guarantee a better bin. Keep in mind that many of the kits in these lists have RGB, which is detrimental to performance. If you find two kits with similar timings but
different voltages, the lower-voltage kit could indicate a better bin.
- 3200 14-14-14-XX
- 3600 14-14-14-XX
- 3600 14-15-15-XX
- 3600 15-15-15-XX
- 3600 16-16-16-XX
- 4000 14-15-15-XX
- 4000 15-16-16-XX
- 4000 16-16-16-XX
- 4000 17-17-17-XX
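A common rule of thumb for comparing the kits above is first-word (CAS) latency in nanoseconds: CL × 2000 ÷ data rate in MT/s. This is a simplified sketch; it only looks at CL and ignores tRCD, tRP, subtimings, and the bandwidth gained from higher frequency, so treat it as one comparison point rather than a full ranking. cas_ns() is a hypothetical helper.

```python
# First-word (CAS) latency in nanoseconds: CL cycles x clock period.
# Clock period in ns = 1000 / MCLK = 2000 / data rate (MT/s).
# CL values from the timing list above; kits that differ only in
# tRCD/tRP collapse into one entry because only CL is considered here.
KITS = [
    (3200, 14), (3600, 14), (3600, 15), (3600, 16),
    (4000, 14), (4000, 15), (4000, 16), (4000, 17),
]

def cas_ns(data_rate_mts: int, cl: int) -> float:
    """CAS latency in ns for a given data rate (MT/s) and CL."""
    return cl * 2000 / data_rate_mts


if __name__ == "__main__":
    # Print the kits from lowest to highest first-word latency.
    for rate, cl in sorted(KITS, key=lambda kit: cas_ns(*kit)):
        print(f"DDR4-{rate} CL{cl}: {cas_ns(rate, cl):.2f} ns")
```

For example, 4000 CL14 works out to 7.00 ns and 3200 CL14 to 8.75 ns, which is why a higher frequency at the same CL is a stronger kit.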
GPUs:
At low settings, the CPU and RAM are more important than the GPU for high-refresh-rate gaming. You want a stable foundation (CPU and RAM) before buying a GPU, so a modern
(10700K/5800X+) overclocked eight-core CPU is the minimum for driving high refresh rates. Avoid buying blower cards (one fan), avoid overly cheap cards, and be wary of problems
brought up in reviews. Nvidia’s drivers are much better optimized for DX11 games where the CPU is the bottleneck, while AMD GPUs typically perform better in DX12/Vulkan. AMD’s
video encoder is far behind Nvidia’s in both quality and stability, so keep this in mind if you stream or record. Linux driver support is typically better for AMD. If you have a Radeon
6000 or Nvidia 3000 series card, consider enabling Resizable BAR, as it may help with performance. See this article for more information about the requirements, how to enable
Resizable BAR, and benchmarks.
- 6800 XT / 6900 XT
  - Only good for DX12/Vulkan. OpenGL, DX9, DX10, and DX11 performance falls behind Nvidia’s offerings (in games such as Apex Legends, Counter-Strike, Fortnite,
  Minecraft, etc.). When bound by single-threaded CPU performance, these cards quickly plummet below any Nvidia offering due to how well optimized Nvidia’s drivers are.
  - Beware of driver issues with Radeon cards
  - Windows 7 drivers are practically unusable on the 5000 and 6000 series; consider the Nvidia 30 series for Windows 7 instead
  - AMD has no equivalent of Nvidia’s Reflex, which helps even when not GPU-bound
Storage:
Random accesses are what regular usage generally involves (e.g. gaming, desktop usage), so choosing an SSD with low latency and high RND4K read speeds is important. NVMe
SSDs have much lower latency than SATA SSDs. HDDs should be avoided unless absolutely necessary, as they are inherently slow; they take longer to spin up and seek files, while
making extra noise (acoustic and EMI) and using a lot of energy to do so. Most M.2 ports interface through the chipset instead of directly with the CPU. While this isn’t terrible, there is
still a latency penalty. On Zen 3, Intel 11th gen., and newer platforms, motherboards have at least one x4 M.2 slot that connects directly to the CPU instead of the PCH, so if
applicable, use that slot for higher performance. Keep in mind that higher-capacity SSDs typically have higher performance and endurance ratings, so note this when choosing
which size to buy (500GB, 1TB, 2TB, etc.). When looking at SSD reviews, disregard any reviewer that doesn’t list system specifications or isn’t using Intel 12th gen., Ryzen 7000,
or a newer platform, as CPU performance heavily dictates SSD performance (if the reviewer has the Samsung 980 Pro in a comparison and isn’t getting >90MB/s 4KQ1T1 in
CrystalDiskMark (CDM), that’s a massive red flag).
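The 4KQ1T1 numbers can be translated into a per-request latency. At queue depth 1 only one request is in flight, so by Little’s law the average latency is roughly block size ÷ throughput. This sketch assumes 4 KiB (4096-byte) blocks and decimal megabytes (1MB = 1,000,000 bytes); the function names are hypothetical.

```python
# At queue depth 1 only one request is in flight, so average latency per
# request is roughly block_size / throughput (Little's law with QD = 1).
# Assumes decimal MB (1 MB = 1,000,000 bytes) and 4 KiB random reads.
BLOCK_BYTES = 4096

def qd1_latency_us(throughput_mb_s: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Approximate per-request latency in microseconds at QD1."""
    iops = throughput_mb_s * 1_000_000 / block_bytes
    return 1_000_000 / iops

def qd1_throughput_mb_s(latency_us: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Throughput in MB/s implied by a given QD1 latency."""
    return block_bytes / latency_us  # bytes per microsecond equals MB/s


if __name__ == "__main__":
    print(f"90 MB/s at 4KQ1T1 -> ~{qd1_latency_us(90):.1f} us per read")
    print(f"35 us per read -> ~{qd1_throughput_mb_s(35):.0f} MB/s at 4KQ1T1")
```

Under these assumptions, 90MB/s works out to roughly 45µs per read, and 35µs per read corresponds to roughly 117MB/s, close to the ~115MB/s figure listed for the 990 Pro. Because every one of those requests passes through the OS storage stack, CPU speed shows up directly in these low-queue-depth results.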
On the software side, the operating system and storage drivers also play an important role in SSD speed. The generic NVMe driver that comes with Windows generally performs
worse than manufacturer drivers such as Samsung’s (see note 1). Ensure your drive has some form of cooling (heatsink, fan, or both) so that it never thermal throttles. For optimal
SSD response, the CPU should run at a fixed frequency across all cores with SMT disabled, ASPM and C-states disabled in UEFI, and idle disabled in power plan settings.
1. Use the modded Samsung NVMe driver from Fernando (modded to work with non-Samsung drives):
- Download: Windows 7, Windows 10
Samsung drives have had very questionable reliability; however, firmware updates have been released that supposedly address the issues. Be mindful that SSDs frequently go on
sale, so never buy at list price.
Hynix P31: Only buy on sale for <$75 (exclusive to North America)
- Solid performance for a gen. 3 drive, though its sequential speeds are quite low; otherwise very energy efficient
WD SN850X: $100
- Beware of issues on Ryzen platforms; otherwise a solid performer
Solidigm P44 Pro: $120 (Tom’s Hardware, TweakTown)
- Nearly identical to the Hynix P41 while being slightly cheaper (Solidigm was Intel’s SSD division, now owned by SK Hynix)
Hynix P41: $150 (exclusive to North America)
- Higher bandwidth than the SN850X and 980 Pro; edges out the SN850X at a premium
Samsung 990 Pro: $170 (TweakTown)
- Highest RND4K performance and lowest latency out of consumer SSDs, ~115MB/s @ 35µs
- Warning: requires a firmware update before use
ZET 983/900p/P4800X/905p/P5800X
- AIC and U.2 drives with much higher performance than M.2 drives, listed in order from highest to lowest latency
- Can be acquired cheaply on eBay, but ask about wear level before buying
- Requires 20 CPU PCIe lanes (16 for the GPU plus 4 for the drive) to avoid running through the chipset or forcing the GPU into x8 mode, meaning Intel Z590 or newer
Mice:
Do not use wireless peripherals unless you are willing to accept a latency penalty of 1+ milliseconds. Higher DPI results in lower latency unless the sensor applies smoothing (HERO,
Focus+, 3366, and certain 3370/3389 implementations can do 12000+ DPI without additional smoothing). Turn off RGB, as it uses extra power, creates additional interference, and
loads the MCU, which can impact the performance of the mouse. Your CPU’s or chipset’s USB controllers will usually give the lowest jitter and latency. Ryzen CPUs have a USB
controller on-die, while Intel CPUs have it integrated in the PCH. Regardless of your platform, avoid third-party controllers such as ASMedia’s, as they are almost always worse
than the native solutions offered by the CPU/PCH. Ideally, disable third-party USB controllers through your UEFI or Device Manager. Ensure your polling rate is set to 1000Hz or
higher. Finalmouse mice are 500Hz by default but can be set to 1000Hz using the DM1 Pro S software.
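The polling rate and DPI advice can be illustrated with a simplified model. This sketch assumes motion starts at a random time relative to USB polls (so the average wait is half a polling interval) and that the sensor reports once movement covers one count (1/DPI of an inch). It ignores sensor frame rate, firmware processing, and smoothing, and the 10 mm/s hand speed is an arbitrary illustrative value; the function names are hypothetical.

```python
# Two simplified sources of mouse input delay (ignores sensor frame rate,
# firmware processing, and smoothing):
#  1. Polling: motion waits for the next USB poll, on average half an interval.
#  2. DPI: the sensor only reports once movement covers one count (1/DPI inch).
MM_PER_INCH = 25.4

def avg_polling_delay_ms(polling_hz: float) -> float:
    """Mean wait for the next poll, assuming motion starts at a random time."""
    return (1000 / polling_hz) / 2

def time_to_first_count_ms(dpi: float, speed_mm_s: float) -> float:
    """Time for a hand moving at speed_mm_s to travel one sensor count."""
    mm_per_count = MM_PER_INCH / dpi
    return mm_per_count / speed_mm_s * 1000


if __name__ == "__main__":
    for hz in (500, 1000, 4000):
        print(f"{hz} Hz polling: ~{avg_polling_delay_ms(hz):.3f} ms average wait")
    for dpi in (400, 1600, 3200):
        t = time_to_first_count_ms(dpi, speed_mm_s=10)  # slow micro-adjustment
        print(f"{dpi} DPI at 10 mm/s: first count after ~{t:.2f} ms")
```

Under these assumptions, going from 500Hz to 1000Hz halves the average polling wait from about 1ms to 0.5ms, and during slow micro-adjustments a 400 DPI setting needs over 6ms to register its first count, while 3200 DPI needs under 1ms.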
Monitors:
Monitors have many sources of latency along the path from the GPU’s output to the display itself. CRTs have very low latency because little signal processing is required and
because of how CRT technology works (once the signal is converted to analog, a CRT’s latency is basically its refresh interval, i.e. the time it takes to scan out a frame), whereas
LCDs have multiple components (such as the scaler, timing controller, source drivers, and TFT), each of which adds its own delay.
I will only cover 240Hz+ monitors since CRTs are no longer in production. The latency can be split into two categories: processing and pixel response time. Processing is the time
the monitor takes to process the signal, whereas response time is how quickly a pixel can change states (which manifests as motion blur). The TFTCentral review linked below
shows an example of the separation between processing and response time latencies. Note that its selection of monitors is very limited, so don’t base your monitor purchase on a
single source. Typically, IPS monitors such as the VG279QM will have lower processing latency than TN monitors, but will suffer from worse response times. Avoid monitors with
PWM (pulse-width modulation) at all costs, even if it is high-frequency. Amazon Renewed monitors are often much cheaper than brand new monitors and often only have damaged
packaging. They are worthwhile, as you can save a lot of money and have a 30-day return policy if you are not satisfied. Higher overdrive means lower latency, so set it as high as
you can tolerate (excessive overdrive causes overshoot, or inverse ghosting). Black frame insertion (e.g. DyAc, ELMB) increases latency and introduces flicker, which causes
eye strain. Therefore, BFI is not a substitute for having a panel with good response times.
Source: https://www.tftcentral.co.uk/reviews/asus_rog_swift_360hz_pg259qn.htm#lag
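To put refresh rates in perspective, here is a rough sketch of refresh interval and scanout timing. It assumes the frame is drawn top to bottom over one full refresh interval (ignoring vertical blanking), which is the part of the delay that a CRT and an LCD share; LCD processing and pixel response add on top of it. The function names are hypothetical.

```python
# Refresh interval and scanout timing. A display draws the frame top to
# bottom over roughly one refresh interval (vertical blanking ignored), so
# a line's scanout delay depends on its position: 0.0 = top, 1.0 = bottom.
# LCD processing and pixel response time are not included here.

def refresh_interval_ms(refresh_hz: float) -> float:
    """Time between refreshes in milliseconds."""
    return 1000 / refresh_hz

def scanout_delay_ms(refresh_hz: float, position: float) -> float:
    """Approximate time after scanout starts until a given line is drawn."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0.0 (top) and 1.0 (bottom)")
    return refresh_interval_ms(refresh_hz) * position


if __name__ == "__main__":
    for hz in (240, 280, 360):
        print(f"{hz} Hz: {refresh_interval_ms(hz):.2f} ms per refresh, "
              f"{scanout_delay_ms(hz, 0.5):.2f} ms to reach mid-screen")
```

Going from 240Hz to 360Hz shortens the refresh interval from about 4.17ms to 2.78ms, which bounds how much of the total latency scanout alone can contribute.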
Monitor review sites with latency measurements (do not compare latency measurements from different sources due to differing test methods)
https://www.tftcentral.co.uk/reviews.htm
https://www.rtings.com/monitor/reviews
https://pcmonitors.info/reviews/archive/
Miscellaneous links
Windows activation
- https://www.reddit.com/r/Piracy/wiki/megathread/tools
How LCD Response Times are Measured, and Why 10% to 90% GtG Measurements are Moderately Deceptive
https://www.youtube.com/watch?v=MbZUgKpzTA0
Fujitsu Primergy Server BIOS Settings for Performance, Low-Latency and Energy Efficiency
https://sp.ts.fujitsu.com/dmsp/Publications/public/wp-bios-settings-primergy-ww-en.pdf
Follow me on Twitter
https://twitter.com/CaIypto
The fruit of my labor. One of the hardest scenarios in KovaaK’s Aim Trainer (now outdated but still a decent score)