Benchmarking memory on an inexpensive PC

As I previously mentioned in an early-January post, late last summer I built up three systems for donation to a local charity for subsequent “back to school” donation to their clients (followed by five more systems I assembled and donated ahead of the holiday 2022 gifting season…but that’s another story for another day). Here are the specs of the first Intel-based system I assembled (not-listed accessories: keyboard and mouse, speakers, and mic-inclusive webcam):

And for the second Intel-based system:

The third, AMD-based system is the primary focus of this particular piece. I shared several photos of it back in January; here again is the relevant one for today’s discussion:

As I alluded to back in early January, it (like all of the systems I assembled for donation) was sufficiently outfitted to enable a free upgrade to Windows 11, should the recipient desire to do so either now or in the future:

And here are this particular PC’s deets:

That asterisk at the end of the memory entry is important. Here’s the elaboration:

*The memory configuration that the system ended up with when donated

To wit, here’s what I wrote as foreshadowing back in early January:

In a follow-on writeup, I’ll share the specifics on the tests I did with one of the systems prior to donation, wherein I leveraged various DIMMs I had in inventory to vary DRAM speed, total capacity and one-vs-two memory channel population and benchmarked the results using both Windows’ built-in Experience Index and several SiSoftware Sandra tests…Perhaps obviously, getting the system memory speed and capacity right are particularly important when using a CPU with integrated graphics, since a portion of the DRAM will end up finding use as the GPU frame buffer (versus a standalone frame buffer on a discrete graphics board).

Specifically, here are the various system memory options I tested, leveraging a tranche of DIMMs then in home office inventory, all DDR4-3200 (PC4-25600) in base speed:

  • 4 GByte Crucial (single-channel) (22-22-22-52)
  • 8 GByte Crucial (dual-channel) (22-22-22-52)
  • 4 GByte Corsair (single-channel) (16-20-20-38)
  • 8 GByte Corsair (dual-channel) (16-20-20-38)
  • 4 GByte G.Skill (single-channel) (16-18-18-38)
  • 8 GByte G.Skill (dual-channel) (16-18-18-38)
  • 8 GByte Corsair (single-channel) (16-20-20-38)
  • 16 GByte Corsair (dual-channel) (16-20-20-38)

Some of this terminology begs for further explanation before continuing. The chipset and motherboard I used in this system (and, more generally, in all of the systems I assembled for donation) supported up to two concurrently accessed memory channels, with each channel populated by one (mini-ITX) or up to two (microATX) DIMMs (you may recall that the personal workstation I attempted to build for myself last year handled four memory channels, each supporting up to two DIMMs). Conceptually, at least, a dual-channel configuration should run faster than its single-channel counterpart…as long as the CPU and software exploit this incremental hardware performance potential. Hence my testing of both configuration options, across a range of total memory capacities.
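The dual-channel potential is easy to see in the peak-bandwidth arithmetic. Here’s a back-of-the-envelope sketch (theoretical peaks only; real-world throughput will always come in lower):

```python
# Peak theoretical DRAM bandwidth = channels x bus width x transfer rate.
# Each DDR4 channel has a 64-bit (8-byte) data bus, and "DDR4-3200"
# means 3200 mega-transfers per second (MT/s).

def peak_bandwidth_gb_s(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in decimal GB/s."""
    return channels * bus_bytes * mt_per_s / 1000.0

print(peak_bandwidth_gb_s(1, 3200))  # single-channel DDR4-3200: 25.6 GB/s
print(peak_bandwidth_gb_s(2, 3200))  # dual-channel DDR4-3200: 51.2 GB/s
```

Whether the CPU and integrated GPU can actually consume that doubled peak is, of course, exactly what the benchmarks are for.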

Secondly, take a look at the string of numbers separated by dashes at the end of every bullet list entry. They’re various key timing specs: here’s what they refer to (quoting from Wikipedia), in sequential order (I’m not going to delve into explanations of terms such as row, column and precharge here; see Wikipedia’s relevant entry, along with a really old but still relevant article from yours truly, for more on DRAM fundamentals):

  1. CAS latency: The number of cycles between sending a column address to the memory and the beginning of the data in response. This is the number of cycles it takes to read the first bit of memory from a DRAM with the correct row already open.
  2. tRCD: The minimum number of clock cycles required between opening a row of memory and accessing columns within it [editor note: typically, but not always, this time is the same for both read and write operations].
  3. tRP: The minimum number of clock cycles required between issuing the precharge command and opening the next row.
  4. tRAS: The minimum number of clock cycles required between a row active command and issuing the precharge command. This is the time needed to internally refresh the row.

Sometimes, tRC (row cycle time) is also listed in the specs, as the fifth number at the end of the dash-separated sequence.
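Since all of the modules I tested run at the same DDR4-3200 transfer rate, these timing numbers are what actually differentiate them, and converting cycles to nanoseconds makes the gap concrete. A quick sketch (the divide-by-two reflects DDR’s two transfers per clock):

```python
# DRAM timings are counted in command-clock cycles. DDR transfers data
# twice per clock, so DDR4-3200 (3200 MT/s) runs a 1600 MHz clock.

def cycles_to_ns(cycles: int, mt_per_s: int) -> float:
    clock_mhz = mt_per_s / 2            # DDR: clock rate = half the transfer rate
    return cycles * 1000.0 / clock_mhz  # cycles at clock_mhz MHz -> nanoseconds

print(cycles_to_ns(22, 3200))  # the Crucial modules' CL22: 13.75 ns
print(cycles_to_ns(16, 3200))  # the Corsair/G.Skill modules' CL16: 10.0 ns
```

Nearly 4 ns of first-access latency difference between nominally identical “DDR4-3200” modules, in other words.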

This information is stored in the serial presence detect (SPD) memory, also included on a DRAM module. This nonvolatile memory, a small EEPROM (256 bytes through the DDR3 generation; DDR4 doubles it to 512 bytes), is larger in capacity than what’s required to encompass the baseline JEDEC-defined information, and manufacturers have leveraged the remaining available space to store additional extension data. Perhaps the most common extension scheme is the Intel-defined Extreme Memory Profile (XMP). A memory supplier speed-tests each module during manufacturing and stores different, typically faster-than-JEDEC timings in the XMP space, timings which are often enabled by also-different-than-JEDEC operating voltages. If the DRAM controller and accompanying BIOS in the PC support XMP, and if the user enables XMP in the BIOS settings, faster-than-JEDEC performance may result.
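To make the two-profile idea concrete, here’s the same module’s two personalities expressed as data. The speeds and voltages mirror what the G.Skill DIMMs reported in my BIOS; the dictionary layout itself is purely illustrative, not the actual SPD byte format:

```python
# One module, two personalities: the conservative JEDEC default profile
# the system boots with, and the XMP profile you opt into via BIOS.
# Values mirror what the G.Skill DIMMs reported; the structure is an
# illustration only, not the real SPD encoding.
GSKILL_DDR4 = {
    "JEDEC": {"mt_per_s": 2133, "voltage_v": 1.20},
    "XMP":   {"mt_per_s": 3200, "voltage_v": 1.35, "timings": (16, 18, 18, 38)},
}

speedup = GSKILL_DDR4["XMP"]["mt_per_s"] / GSKILL_DDR4["JEDEC"]["mt_per_s"]
print(f"XMP transfer-rate uplift: {speedup:.2f}x")  # roughly 1.5x
```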

Here are some BIOS settings screenshots I took of the Crucial memory, which does not support XMP and therefore delivers standard DDR4-3200 JEDEC timings, in operation:

When I first installed the G.Skill memory, XMP was not yet enabled in the BIOS, and the memory modules consequently (and curiously) fell back to slower DDR4-2133 speeds:

Enable XMP, however:

And not only did the full spec’d speeds get enabled, the voltage that the system’s regulator circuitry applied to the modules also bumped up, from 1.2V to 1.35V:

I’ll save you the additional screenshots; suffice to say that the same thing happened when I put both 4 GByte and 8 GByte Corsair memory modules in the PC. So don’t forget to check your BIOS settings (yes, I realize that I’m still using the old firmware acronym, vs modern EFI/UEFI)!

Microsoft claims that you can credibly run Windows 10 (or 11, for that matter) on as little as 4 GBytes of system memory. Call me skeptical, given my personal experiences, but I decided to start my testing at that total capacity level (in case you were wondering, I wasn’t able to find any 2 GByte DDR4-3200 DIMMs available for sale, two of which, if they existed, could have conceivably been used to construct a 4 GByte two-channel pair). One DRAM module nuance which I did choose to ignore, however, involves a term known as memory rank:

A memory rank is a set of DRAM chips connected to the same chip select, which are therefore accessed simultaneously. In practice all DRAM chips share all of the other command and control signals, and only the chip select pins for each rank are separate (the data pins are shared across ranks).

DRAM modules come in both single- and dual-rank variants for mainstream PCs (quad-rank options are even available for high-end systems), and are conceptually similar to single- and dual-channel motherboard and chipset variants at the system level. However, the performance benefit (if any) of a dual-rank module versus its single-rank counterpart is highly application dependent and dubious at best. I “think” that all the modules I used were single-rank in architecture (but don’t quote me on this; sometimes this spec isn’t even documented).

As I already alluded to earlier, I used both Microsoft’s built-in Windows Experience Index (WEI) and several of the utilities in the SiSoftware Sandra testing suite to benchmark the various system memory configuration options. WEI, now rebranded as the Windows System Assessment Tool (WinSAT), is intended to quantify a system’s degree of usability by benchmarking it both in aggregate and across its various subsystems (CPU, GPU, memory, storage, etc.). It appeared beginning with Windows Vista and was originally graphically displayed right in the Control Panel:

Nowadays, you instead need to run WinSAT from an admin command line and wade through the cryptic XML report files to get at the data (never fear, dear readers, I do it all for you). And regarding SiSoftware Sandra: for each system memory configuration, I generated both an “Overall Computer Score” and results for the memory-centric “CPU vs GPU: Memory Latency” and “CPU vs GPU: Memory Bandwidth” benchmarks (particularly critical tests since, as previously mentioned, the graphics are integrated in this system and the frame buffer is therefore allocated out of normal system memory versus being standalone).
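If you’d like to do the report wading yourself, the scores are straightforward to pull out programmatically. Here’s a minimal Python sketch; the element names (WinSPR, MemoryScore, and so on) match the reports I’ve seen, but treat them, along with the trimmed-down sample below, as assumptions to verify against the actual files on your own system:

```python
import xml.etree.ElementTree as ET

def winsat_scores(xml_text: str) -> dict:
    """Pull the summary subsystem scores out of a WinSAT formal-assessment report."""
    root = ET.fromstring(xml_text)
    spr = root.find(".//WinSPR")  # the summary-score section of the report
    return {child.tag: float(child.text) for child in spr}

# Heavily trimmed stand-in for a real report (hypothetical values):
sample = """<WinSAT><WinSPR>
  <SystemScore>6.8</SystemScore><MemoryScore>8.1</MemoryScore>
  <CpuScore>8.9</CpuScore><GraphicsScore>6.8</GraphicsScore>
</WinSPR></WinSAT>"""
print(winsat_scores(sample))
```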

‘Nuff setup, let’s get to the results!

Observations and conclusions, you ask? Of course! Let’s begin with the two sets of asterisked notations in the table. First off, WinSAT (the single-asterisk * entries). In diving into the benchmark-generated reports while writing this piece, I came across the following comments:

Physical memory available to the OS is less than 4.0GB-64MB on a 64-bit OS : limit mem score to 5.9

Apparently, Microsoft quietly agrees with me that 4 GBytes of system memory (especially in an integrated-graphics configuration) isn’t enough to really run Windows 10 robustly. And because WinSAT’s (and the Windows Experience Index predecessor’s) system score is identical to the lowest score of any of the subsystem benchmark tests, you’ll see hard-coded 5.9s for all the 4 GByte system memory variants there, too.
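That scoring rule is simple enough to express in a few lines. Here’s a sketch of my reading of the behavior, using the “4.0GB-64MB” threshold quoted above (the subsystem scores fed in are made-up illustrations):

```python
# WinSAT's system score is simply the minimum of the subsystem scores,
# with the memory score hard-capped at 5.9 when a 64-bit OS sees less
# than 4 GB - 64 MB of physical memory (per the quoted report comment).

MEM_CAP_THRESHOLD = 4 * 2**30 - 64 * 2**20  # "4.0GB-64MB", in bytes

def system_score(subscores: dict, mem_bytes: int) -> float:
    scores = dict(subscores)  # copy: don't mutate the caller's dict
    if mem_bytes < MEM_CAP_THRESHOLD:
        scores["memory"] = min(scores["memory"], 5.9)
    return min(scores.values())

subscores = {"cpu": 8.9, "memory": 8.1, "graphics": 6.8, "disk": 8.2}
print(system_score(subscores, 8 * 2**30))  # 6.8: graphics is the limiter
print(system_score(subscores, 3 * 2**30))  # 5.9: the capped memory score wins
```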

I ran into a similar situation with SiSoftware Sandra. The Overall benchmark first sequentially runs a series of subsystem benchmarks:

  • Processor
  • Memory
  • GPGPU (general-purpose graphics processing)
  • Disk

then combines them (somehow; I can’t find any specifics on the methodology in the published documentation) to come up with the overall score. If any of the subsystem benchmarks fail…no overall score. Well, the GPGPU testing indeed failed with every 4 GByte configuration (see the double-asterisk ** entries in the table), accompanied by a terse, cryptic entry in the report:

Finished Successfully : No

I suspect, again, that this is due to the utility’s perception of insufficient resources available, particularly (again) when the system memory is doing double-duty as the graphics frame buffer. This meant, of course, that SiSoftware Sandra never got to the disk testing, either…but fortunately it’d already successfully completed the GPGPU memory testing (Sandra’s developer, Adrian Silasi, is at least as big a fan of the GPGPU concept as I am!), for which I could extract a valid subsystem score in each case, along with the supplemental memory-focused tests I did.
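In pseudo-terms, the Overall score’s all-or-nothing behavior looks like this (the subsystem numbers are made-up illustrations, and since the actual combining formula is undocumented, the averaging below is purely a placeholder):

```python
from typing import Optional

# Sandra's Overall Computer Score only materializes when every subsystem
# benchmark finishes; a failed run (None here) means no overall score.
# The real combining math is undocumented -- the mean is a placeholder.

def overall_score(results: dict) -> Optional[float]:
    if any(score is None for score in results.values()):
        return None
    return sum(results.values()) / len(results)

run_4gb = {"processor": 9.1, "memory": 7.4, "gpgpu": None, "disk": None}
run_8gb = {"processor": 9.1, "memory": 8.3, "gpgpu": 7.7, "disk": 8.0}
print(overall_score(run_4gb))  # None: GPGPU failed, as on my 4 GByte configs
print(overall_score(run_8gb))
```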

That all said, I was still able to extract some interesting (IMHO) learnings from the exercise. Let’s go back again to the WinSAT results. This particular benchmark was unable to discern the comparatively minor timing differences between the various manufacturers’ DDR4-3200 DIMMs, and total capacity didn’t seem to matter, either…as long as you had at least 4 GBytes of system memory and were comparing dual-channel results to each other (8 vs 16 GBytes of total system memory).

WinSAT was, however, sensitive (indirectly, at least) to varying single-vs-dual memory channel configurations. The system score for 8 GBytes of single-channel memory was 6.8; that for both 8 and 16 GBytes of dual-channel memory was 8.1 (and remember: a larger score is better in this case). Why? Because in all three of these cases, graphics performance was the limiting factor in the overall score, and graphics scored better in dual-channel system memory configurations. I’m postulating that this happened because the dual-channel approach enables the CPU and integrated GPU to concurrently, therefore more efficiently, access memory, whereas the single-channel alternative ended up being a key graphics performance bottleneck.

Now for SiSoftware Sandra. Scan the results and you’ll be able to see that this benchmark is much more sensitive to secondary memory timing differences, even though at a high level all of the memory options ran at DDR4-3200 speeds. 22-22-22-52 Crucial memory had the lowest bandwidth and longest latency, with 16-20-20-38 Corsair memory notably better and 16-18-18-38 G.Skill memory slightly better still. Performance also improved with dual-channel configurations versus single-channel alternatives, even if the memory timings were identical and the total system memory was the same; compare the “8 GBytes (dual-channel, Corsair)” and “8 GBytes (single-channel, Corsair)” data sets. Finally, all other factors being equal, more total system memory was better from a SiSoftware Sandra benchmarking results standpoint.

This is all well and good, but only in an absolute sense. It doesn’t answer the question of what number set is good enough, because (of course) that’s highly dependent on what you’re using the computer for. Equally of-course, the dataset I’ve generated is specific to this exact system configuration, including the exact memory options I’ve plugged into its DIMM slots. Use a faster (or slower) CPU, one with more (or fewer) cores, one with more (or less) on-chip cache amounts and types, and/or one with a newer (or older) architecture, for example, and all bets are off. And finally, of course, if you were to (for example) leverage a discrete graphics card with its own on-board frame buffer instead of the integrated graphics, thereby freeing up more system memory for use by the CPU and other non-graphics system resources, the results would again veer off into yet-uncharted territory. Still…some interesting stuff here, huh?

For the results of all of my tests plus a whole lot of additional data, check out the contents of this downloadable ZIP file. And with that said, I’ve once again wandered into 2,500-plus-word territory, so I’ll step away from the keyboard, put the laptop in sleep, and await your thoughts in the comments!

Brian Dipert is the Editor-in-Chief of the Edge AI and Vision Alliance, and a Senior Analyst at BDTI and Editor-in-Chief of InsideDSP, the company’s online newsletter.
