The YouTube channel Hardware Unboxed published a video yesterday looking at game loading times on PC using PCIe 4.0 NVMe SSDs. They ran synthetic benchmarks and then compared game load times across a variety of drives: PCIe 4.0, PCIe 3.0, SATA SSD, and regular HDDs.
As expected, it confirmed what we all know: there are major bottlenecks on PC when it comes to gaming I/O. Despite the fact that PCIe 3.0+ NVMe SSDs offer significant performance improvements over SATA SSDs, there is effectively zero meaningful improvement in game loading times.
You're reaching entirely the wrong conclusion from this.
All the data shows is that current games are not built to take advantage of high-performance drives.
That does not mean "major bottlenecks" are preventing NVMe drives from performing faster than SATA drives in a PC.
As an example:
If a game is built for an HDD, it might be considerably easier to build a single-threaded loading system.
That could be all you need with the HDD, as the data is coming in slow enough that there's no advantage to the additional complexity of building a multi-threaded loading system.
Maybe the single core that loading is running on is only hitting 33% utilization when the data is coming from an HDD.
Adding a SATA SSD will speed up that loading process by eliminating the disk bottleneck.
Load times in this example might speed up by a factor of three, which is a lot quicker than the HDD.
Replacing the SATA SSD with an NVMe SSD would only speed that up very slightly though. Why?
Well, the data transfer rate is much higher, but the bottleneck with the SATA SSD was already the single-threaded loading, not the data transfer rate.
The SATA SSD alone is fast enough that the single-threaded loader pushes that one CPU core to 100% to achieve the 3x speed-up.
Adding a faster drive doesn't affect how quickly the CPU can decompress that data.
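To put rough numbers on that, here is a toy model of the single-threaded case. Every figure in it is an assumption picked to match the example above (a 100 MB/s HDD, a 500 MB/s SATA SSD, a 7000 MB/s NVMe drive, and one core that can process 300 MB/s, so the HDD keeps it about 33% busy); none of them are measurements.

```python
def speedup_over_hdd(drive_mb_s, hdd_mb_s=100, core_mb_s=300):
    """Toy model: loading runs at the speed of whichever is slower,
    the drive or the single CPU core processing the data.
    All throughput figures are hypothetical, not measured."""
    baseline = min(hdd_mb_s, core_mb_s)   # HDD case: the disk is the limit
    current = min(drive_mb_s, core_mb_s)  # faster drive: the core may be the limit
    return current / baseline

print(speedup_over_hdd(500))   # SATA SSD -> 3.0
print(speedup_over_hdd(7000))  # NVMe     -> 3.0
```

The model is cruder than reality: it shows no gain at all from NVMe, whereas in practice you'd probably see a very slight one, but the point stands that the core is the limit, not the drive.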
A game built to take advantage of SSDs does not behave this way because the loading will be multi-threaded.
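As a very rough sketch of what "multi-threaded loading" means, here is the idea in Python. It is not how any particular game engine does it: `load_assets` and the fake asset data are made up for illustration, and it leans on my understanding that CPython's `zlib` releases the interpreter lock while decompressing, so the worker threads can actually run on separate cores.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def load_assets(compressed_chunks, workers):
    """Decompress every chunk using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whichever thread finishes first.
        return list(pool.map(zlib.decompress, compressed_chunks))

# Hypothetical stand-in for compressed game assets already read from the drive.
assets = [bytes([i]) * 1_000_000 for i in range(8)]
compressed = [zlib.compress(a) for a in assets]

single = load_assets(compressed, workers=1)  # HDD-era design: one core does all the work
multi = load_assets(compressed, workers=4)   # same work spread across four threads
```

Both calls produce identical data; the only difference is how many cores are available to chew through it once the drive is no longer the slow part.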
It may be more complex to build, but let's say you develop a multi-threaded loading system which scales linearly with the number of cores.
Now the limit on a 16-core CPU would be a 48x speed-up rather than 3x.
Switching from an HDD to a SATA SSD might be a 5x speed-up rather than 3x now, and switching to NVMe may reach that full 48x speed-up.
In that case, even the 16-core CPU might be holding back a really fast PCIe 4.0 NVMe drive, which is where new APIs and hardware decompression come in. Instead of using all your CPU cores for the decompression, it's done in hardware, which could reach speed-ups of several hundred times without breaking a sweat, so you're back to being limited by the drive again.
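The multi-core numbers above fall out of a simple "slowest stage wins" model. The throughput figures are assumptions chosen to fit the example (100 MB/s HDD, 500 MB/s SATA SSD, 7000 MB/s NVMe, 300 MB/s of decompression per core with perfect linear scaling), and hardware decompression is modelled as having no practical limit.

```python
def load_speedup(drive_mb_s, decompress_mb_s, hdd_mb_s=100):
    """Speed-up over the HDD when loading runs at the slower of
    drive throughput and decompression throughput (hypothetical figures)."""
    baseline = min(hdd_mb_s, decompress_mb_s)
    return min(drive_mb_s, decompress_mb_s) / baseline

PER_CORE = 300            # assumed MB/s one core can decompress
sixteen = 16 * PER_CORE   # assumes perfect linear scaling across 16 cores

print(load_speedup(500, sixteen))         # SATA SSD, 16 cores -> 5.0 (drive-limited)
print(load_speedup(7000, sixteen))        # NVMe, 16 cores     -> 48.0 (CPU-limited)
print(load_speedup(7000, float("inf")))   # NVMe, hardware     -> 70.0 (drive-limited)
```

With these made-up numbers the NVMe drive could deliver 70x, the 16 cores cap it at 48x, and taking decompression off the CPU hands the limit back to the drive.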
Another factor in this is that NVMe drives are not always the big speed-up over SATA drives that is claimed.
Sure, large sequential reads might be able to hit 7 GB/s, but small random reads are only reaching 61 MB/s in this test.
That's why we need to move beyond NAND. Optane is a further 4–5x speed-up over typical NVMe drives, reaching almost 300 MB/s in the same test.
Optane DIMMs are many times faster than that too, but they are prohibitively expensive for consumers right now. It's pro-grade hardware.