
Timu

Member
Oct 25, 2017
15,540
Eh, the Xbox Series NVMe drive is pretty much the same as a PCIe Gen 3 drive for read/write speeds; in fact it's slower than my PCIe Gen 3 drive. And the 2 lanes of PCIe 4.0 bandwidth on the Series X are equivalent to 4 lanes of PCIe 3.0 on a PC. So I don't think there is going to be such a dramatic difference, since I'm sure devs will go for what's easiest, i.e., having DirectStorage support be similar to what the Series X is doing in games with storage.
While that is true for Gen 3, what about games that target Gen 4 speeds? From what I've heard, Gen 4 is supposed to be better once utilized. Otherwise, what would be the point of getting Gen 4 NVMe drives if Gen 3 drives are nearly as good? While I expect Gen 3 to be the norm, certain games will probably use Gen 4 to make those drives worthwhile.
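For reference, the lane math behind that equivalence (a quick sketch with rough per-lane figures after encoding overhead, not spec-exact numbers):

```python
# Approximate usable bandwidth per PCIe lane in GB/s, one direction.
PCIE_LANE_GBPS = {3: 0.985, 4: 1.969}

def link_bandwidth(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth of a PCIe link in GB/s."""
    return PCIE_LANE_GBPS[gen] * lanes

series_x_like = link_bandwidth(4, 2)  # x2 Gen 4, roughly 3.9 GB/s
pc_gen3_x4 = link_bandwidth(3, 4)     # x4 Gen 3, roughly 3.9 GB/s
```

So an x2 Gen 4 link and an x4 Gen 3 link really do land within a rounding error of each other.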
 
Last edited:

Bonfires Down

Member
Nov 2, 2017
2,814
Sure, but it's sort of like if Uber forced all of its drivers to use manual transmission cars; you could blame all the unskilled Uber drivers for their herky-jerky rides because they spent 10 years driving automatic and are now fumbling through driving stick. You could point to the manual transmission cars as more efficient, easier to repair, with greater control over shifting, and all the rest of it, and say it's not the manual transmission that is at fault, it's just a bunch of drivers who don't know what they are doing.

As the person riding in the back seat, I can look at how I rode with Driver A before and it was fine and smooth, and when I rode with Driver A after the Uber Manual Policy Mandate it was herky-jerky and a bad experience. It kind of doesn't matter to me that nVidia was spending millions making uniquely tuned automatic transmissions as smooth as possible for each individual car manufacturer.
I agree with this. It's not our problem to figure out what the stuttering issue is with DX12, we just know there is a problem compared to DX11. I'm sure once developers get their heads around DX12 we will all be better off than with DX11, but as it stands it's still an issue.
 
OP
OP
V3N1X

V3N1X

Prophet of Truth
Member
Oct 16, 2021
796
Alexandria, Egypt
No worries; since I got the Hubble imagery from your link, it wasn't a total waste of time.

I'm not sure I'm getting any perceptual differences with my setup. My speeds already seem broken either way. I'm seeing similar read speed averages (5,000 to 8,000 MB/s), and max read speeds upwards of 50,000 MB/s. Which. Uh... wow.

It's not broken, I think? There's a scale number in the overlay next to the max field... 50,000 there would be 5GB/s-ish.

You also don't need to use the built-in OSD, you can use Afterburner/CapFrameX to detect read speeds as well.

Edit: Also, I think the more important metric for DirectStorage wouldn't be sustained read speeds but random 64K reads, which DS does in queues and batches so you don't have to wait on each read request to complete, as was the case with the old IO stacks.
 

Teeth

Member
Nov 4, 2017
3,935
Talk about missing the point in its entirety.

Microsoft isn't forcing anyone to use DX12, but they can't keep adding new features to an old API like DX11... if FromSoftware, for example, didn't have time to optimize the DX12 back-end properly, they could have easily shipped DX11 on PC. I suspect they just took the faster, easier way out: since they were gonna support DX12 on Xbox Series consoles anyway, they chose the same for PC.

Choices like the above crop up in software dev all the time, and choosing your technology correctly is one of the most important decisions that can make or break a project: technical debt piles up when your engineers are unfamiliar with a core piece of technology your codebase is built upon.

The SDK is not to blame here in any way, shape, or form... maybe invest in training your engineers on new SDKs/APIs, or hire people with a history of optimizing for said SDKs and put them to work.

Edit: In the next few years, take a look at geometry pipelines as engines/devs take the leap to Mesh Shaders instead of Vertex Shaders... new tech with the potential to extract more performance, but also more responsibility, since they run like compute shaders and you can do anything with them... who's responsible for properly learning how to utilize/optimize the new geometry pipeline?

The SDK is not to blame, but from a consumer perspective it makes people wary when they hear that something runs on tech that has historically caused a lot of developers problems. Not even just developers with historically... janky... tech. Powerhouses like DICE have had issues. It's not blaming the SDK; it's the canary in the coal mine for "will this cause new problems in the face of solving old ones".

So it wouldn't be blaming DX12 for causing issues, it'd be blaming places for choosing to use DX12.

Also "choosing the correct technology" is often not up to the people directly having to do the coding.
 

Hoddi

Member
Mar 29, 2021
59
No worries; since I got the Hubble imagery from your link, it wasn't a total waste of time.

I'm not sure I'm getting any perceptual differences with my setup. My speeds already seem broken either way. I'm seeing similar read speed averages (5,000 to 8,000 MB/s), and max read speeds upwards of 50,000 MB/s. Which. Uh... wow.

I wouldn't put too much focus on max read speeds (although the benchmark mode is good for testing your SSD). This is ultimately an SFS demo meaning that each read operation is just 64KB in size. It's just doing so many of them that it would start affecting CPU performance without DirectStorage. There's an option for enabling/disabling DS and there's frankly a big difference in CPU utilization between having it off and on.

It's also worth noting that rendering resolution and framerate have a big effect on the read rate. Running in fullscreen at 4k shows reads up to 400-500MB/s while running in a small window drops it into the sub-50MB/s range.
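Rough numbers on why the window size moves the read rate that much (the tile counts below are made-up illustrative values, not measured from the demo):

```python
TILE_BYTES = 64 * 1024  # SFS streams 64 KiB tiles, per the discussion above

def stream_rate_mb_s(tiles_per_frame: int, fps: int) -> float:
    """Sustained read rate needed to feed a sampler-feedback streamer, in MB/s."""
    return tiles_per_frame * TILE_BYTES * fps / 1e6

# More pixels on screen -> more unique tiles touched per frame -> higher read rate.
fullscreen_4k = stream_rate_mb_s(tiles_per_frame=120, fps=60)  # hundreds of MB/s
small_window = stream_rate_mb_s(tiles_per_frame=10, fps=60)    # tens of MB/s
```

With those (hypothetical) tile counts you land right in the 400-500MB/s vs sub-50MB/s ranges described above.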
 
Last edited:

Hoddi

Member
Mar 29, 2021
59
On a different note, DirectStorage isn't locked to NVMe drives.

[image: nlrbi2Y.png]

[image: CvlD5G3.png]
 
Last edited:

ILikeFeet

DF Deet Master
Banned
Oct 25, 2017
61,987
As can be seen in the video, there is virtually no difference in frame rate, or most other metrics, when enabling DirectStorage. However, in the normal demo mode, which seems to better approximate the usual gaming experience, turning DirectStorage on lowers CPU usage and temperature by quite a bit: from around 66 degrees and utilization in the low 20s (percent) to about 59 degrees and around 10% utilization. In benchmark mode these metrics are closer, but the situation seems reversed: there is slightly higher CPU utilization with DirectStorage on.

A couple of things should be noted.

First, GPU based asset decompression is not supported by the DirectStorage SDK yet. Microsoft says that this is on the roadmap, however.


 
Nov 8, 2017
13,098
Based on how adoption of low-level APIs has gone, I'm excited to see what the top-tier developers will do with it, but also frightened of what this might result in when not-so-technically-inclined teams have granular control over IO lol
 
OP
OP
V3N1X

V3N1X

Prophet of Truth
Member
Oct 16, 2021
796
Alexandria, Egypt


Also of note, the SFS demo from Intel isn't using the IO Rings/BypassIO APIs specifically designed for Windows 11... so I expect even more of a decrease in CPU usage when these APIs are used.

Edit: Also, contrary to what the article says, BypassIO is not disabled in Win11; it's active for specific configurations only (maybe they're still ironing out bugs).

 
Last edited:

Vimto

Member
Oct 29, 2017
3,714
Compiled version.

If you want to use the 16k images then you'll have to edit demo-hubble.bat and point it to where you've extracted the file. Demo.bat and stress.bat should otherwise run without any changes.

Edit: Fixed the original link with a proper version.

Thanks, ran it on my Samsung 980 Pro and was averaging 4.5GB/s... although it seems the speed was degrading over time, maybe throttling as it heats up?
[image: DKfV44v.png]
 
OP
OP
V3N1X

V3N1X

Prophet of Truth
Member
Oct 16, 2021
796
Alexandria, Egypt
Thanks, ran it on my Samsung 980 Pro and was averaging 4.5GB/s... although it seems the speed was degrading over time, maybe throttling as it heats up?
[image: DKfV44v.png]

You don't need to worry about that at all; the benchmark is deliberately imposing conditions under which constant disk reads are needed.

In real scenarios, games would never need to constantly load data from disk like this. The "demo" configuration is more akin to real-world use, where you can see how rapid camera movement leads to data loads in the hundreds of MB/s.
 
OP
OP
V3N1X

V3N1X

Prophet of Truth
Member
Oct 16, 2021
796
Alexandria, Egypt


Great talk, also confirms GPU decompression is coming soon.

For now, the process is: Storage -> System Memory -> CPU Decompression (through a custom decompression queue) -> Copy to VRAM

I confirmed on the DX Discord that they plan to introduce GPU decompression which would make the process: Storage -> System Memory -> Copy to VRAM & Decompress

They're also working on DMA to VRAM, so peer-to-peer and the process would be Storage -> VRAM & Decompress

GPU decompression will come first before peer-to-peer but they have no timeline as of yet.
 

vixolus

Prophet of Truth
Member
Sep 22, 2020
54,297
Great talk, also confirms GPU decompression is coming soon.

For now, the process is: Storage -> System Memory -> CPU Decompression (through a custom decompression queue) -> Copy to VRAM

I confirmed on the DX Discord that they plan to introduce GPU decompression which would make the process: Storage -> System Memory -> Copy to VRAM & Decompress

They're also working on DMA to VRAM, so peer-to-peer and the process would be Storage -> VRAM & Decompress

GPU decompression will come first before peer-to-peer but they have no timeline as of yet.
Is GPU decompression already part of the Series X/S DirectStorage API, or will that come with the Windows upgrade? Did anyone ask/answer a similar question in the Discord?
 
OP
OP
V3N1X

V3N1X

Prophet of Truth
Member
Oct 16, 2021
796
Alexandria, Egypt
Is GPU decompression already part of the Series X/S DirectStorage API, or will that come with the Windows upgrade? Did anyone ask/answer a similar question in the Discord?

Series X|S consoles have a dedicated HW decompression block that takes care of that as part of the Velocity architecture... the Series X|S are quite far ahead of PC in that regard.

They're essentially at v4, they're not even using GPU compute for decompression, but have dedicated HW to do it... This will require new GPUs on PC with dedicated silicon for decompression, while GPU decompression (v2 essentially above) will work with existing GPUs that gamers have right now on their PCs.

PC DirectStorage will get there eventually, but for now the goal is DMA to VRAM with GPU compute decompression which would be quite ideal with current hardware.

Edit:

Just to clarify as to the version analogy:

v1: Storage -> System Memory -> CPU Decompression (through a custom decompression queue) -> Copy to VRAM [PC is HERE]
v2: Storage -> System Memory -> Copy to VRAM -> Decompress via Compute Shader
v3: Storage -> VRAM -> Decompress via Compute Shader [Achievable on the hardware available in PCs today]
v4: Storage -> VRAM -> Decompress via Dedicated HW [Xbox Series S|X are essentially here, requires HW advancements in PC GPUs]
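To make the v1/v2 difference concrete, a toy Python sketch; zlib stands in for the real GPU-friendly codec, and the gpu_* functions are placeholders for work that would actually run on the card:

```python
import zlib

def gpu_upload(data: bytes) -> bytes:
    return data  # pretend this copy crossed the PCIe bus into VRAM

def gpu_decompress(data: bytes) -> bytes:
    return zlib.decompress(data)  # pretend this ran as a compute shader

def load_v1(compressed: bytes) -> bytes:
    """v1: decompress on the CPU, then copy the full-size asset to VRAM."""
    asset = zlib.decompress(compressed)  # CPU decompression queue
    return gpu_upload(asset)             # big copy over PCIe

def load_v2(compressed: bytes) -> bytes:
    """v2: copy the *compressed* bytes to VRAM, decompress in a compute shader."""
    on_gpu = gpu_upload(compressed)      # smaller copy over PCIe
    return gpu_decompress(on_gpu)        # compute-shader decompression
```

The win in v2 is that the PCIe copy moves the smaller compressed payload and the CPU never has to touch it.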
 
Last edited:

Firefly

Member
Jul 10, 2018
8,621
Series X|S consoles have a dedicated HW decompression block that takes care of that as part of the Velocity architecture... the Series X|S are quite far ahead of PC in that regard.

They're essentially at v4, they're not even using GPU compute for decompression, but have dedicated HW to do it... This will require new GPUs on PC with dedicated silicon for decompression, while GPU decompression (v2 essentially above) will work with existing GPUs that gamers have right now on their PCs.

PC DirectStorage will get there eventually, but for now the goal is DMA to VRAM with GPU compute decompression which would be quite ideal with current hardware.
How does RTX I/O factor into this pipeline?
 
OP
OP
V3N1X

V3N1X

Prophet of Truth
Member
Oct 16, 2021
796
Alexandria, Egypt
How does RTX I/O factor into this pipeline?

RTX I/O in my opinion is just Nvidia's marketing name for DirectStorage support in their GPUs... doesn't sound like Nvidia will do anything proprietary there.

Same way they're still using the "RTX" moniker even though it's just DXR (was always slated to be DXR/VRTe... for D3D/Vulkan respectively).
 

dgrdsv

Member
Oct 25, 2017
11,846
RTX I/O in my opinion is just Nvidia's marketing name for DirectStorage support in their GPUs... doesn't sound like Nvidia will do anything proprietary there.
These two:
v2: Storage -> System Memory -> Copy to VRAM -> Decompress via Compute Shader
v3: Storage -> VRAM -> Decompress via Compute Shader [Achievable on the hardware available in PCs today]
Are proprietary since they are done by the GPU driver. You could argue that this is the main part of "RTX I/O" but you're right in a sense that "RTX I/O" as a whole is just Nvidia's name for DirectStorage support.
It could be that "RTX I/O" will be used on platforms without DirectStorage though (Linux or even Windows with Vulkan) so there are possible terminological differences.
 

ILikeFeet

DF Deet Master
Banned
Oct 25, 2017
61,987
You might be wondering if that's substantially faster than games run without DirectStorage, and Ono admits the answer is actually no, not yet: while you'll definitely see a huge speed boost from an SSD over the magnetic spinning platters of a hard drive, and from an NVMe SSD over a slower SATA-based drive, the current implementation of DirectStorage in Forspoken is only removing one of the big I/O bottlenecks — others exist on the CPU.

[image: forspoken_ssd_speed_2.jpg]

 

Vimto

Member
Oct 29, 2017
3,714
Wait, current API is capable of pushing 2.8GB/s?

How is that possible? And if so, why aren't we seeing 2-3 second loading times in older games?
 

Henrar

Member
Nov 27, 2017
1,905
Wait, current API is capable of pushing 2.8GB/s?

How is that possible? And if so, why aren't we seeing 2-3 second loading times in older games?
Because games are doing more during loading than just moving files from disk into runtime memory. The bottleneck in those games exists elsewhere.
 

Edward850

Software & Netcode Engineer at Nightdive Studios
Verified
Apr 5, 2019
990
New Zealand
Wait, current API is capable of pushing 2.8GB/s?

How is that possible? And if so, why aren't we seeing 2-3 second loading times in older games?
For most games, loading isn't just about moving data from disk to RAM. It can be: for example, the idtech1 games (Doom, etc.) had assets highly optimised around memory alignment and structures, meaning most tasks were either straight-up memory copies or aligned around linear reads. The cost, however, was that no compression was used to store the assets, and tools had to be designed around manipulating and compiling binary structures. Unless you were bleeding for RAM, that is, as it'd keep having to swap stuff off the zone allocator and then fetch it again off the disk for larger operations. :V

As time went on this kind of design was phased out, in exchange for compressed data and various forms of data pre-processing (parsing text files and implicit structures that needed to be expanded out to more complex states, for example). This kind of data is highly non-linear and can require a lot of CPU overhead to process, let alone read the information in the first place. Coupled with decompressing data on the CPU to then hand off to the GPU, things get busy very quick.

Kind of the funny thing about all this? The idea of games being able to load and swap levels and level data instantly has always been possible without SSDs. Halo 2 was already doing this on the original Xbox (in the campaign anyway, though occasionally popping in texture mips due to needing to rush the game out the door at the very end). Heck, Crash Bandicoot was doing a form of this on the PS1 as it streamed level chunks off the disc. It's just that workflows became highly focused around unprocessed and compressed data, which made things rather inefficient on the CPU side. Not that it was a bad decision in a vacuum, mind you; sort of a "pick your poison" scenario. SSDs and DirectStorage now provide alternatives by allowing highly efficient non-linear reads and the potential to stream and process data directly to, and on, the GPU instead.
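The binary-vs-textual trade-off Edward850 describes can be sketched in a few lines (a toy illustration, nothing like the actual idtech formats):

```python
import struct
import zlib

# A toy "asset": three little-endian uint32 vertex indices.
RECORD = struct.Struct("<3I")

def load_binary(blob: bytes) -> tuple:
    """idtech1-style: the on-disk bytes ARE the in-memory structure."""
    return RECORD.unpack(blob)  # effectively a memory copy

def load_textual(blob: bytes) -> tuple:
    """Later-style: decompress, parse text, then build the structure."""
    text = zlib.decompress(blob).decode("ascii")
    return tuple(int(tok) for tok in text.split())

indices = (7, 42, 9001)
binary_blob = RECORD.pack(*indices)                        # 12 bytes, zero parsing
textual_blob = zlib.compress("7 42 9001".encode("ascii"))  # smaller archives, more CPU
```

The binary path is basically a memory copy; the textual path buys smaller archives and friendlier tooling at the cost of CPU work on every load.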
 
Last edited:

dgrdsv

Member
Oct 25, 2017
11,846
That's pretty much what I expected. DS isn't providing much improvement in load times, and likely won't provide much even with GPU decompression. Even 4.5 -> 1.9 sec isn't anything worth pursuing, IMO.
Now the ability to use storage for streaming more data each frame is much more interesting. But games will have to start requiring NVMe SSDs for that to happen.

Wait, current API is capable of pushing 2.8GB/s?

How is that possible? And if so, why aren't we seeing 2-3 second loading times in older games?
They are fully capable of pushing way more than 2.8GB/s; PCIe 4 NVMe drives hit >6GB/s easily. APIs are not, and never were, the issue. The need to process the data after it's read, before it can be used, is the issue.
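A crude serial model makes the point (the numbers are made up but plausible):

```python
def load_time_s(compressed_gb: float, uncompressed_gb: float,
                read_gb_s: float, decompress_gb_s: float) -> float:
    """Crude serial model: read the archive, then decompress it on the CPU."""
    return compressed_gb / read_gb_s + uncompressed_gb / decompress_gb_s

# A 2 GB archive expanding to 4 GB, against a ~1 GB/s CPU decompressor:
slow_drive = load_time_s(2.0, 4.0, read_gb_s=2.8, decompress_gb_s=1.0)
fast_drive = load_time_s(2.0, 4.0, read_gb_s=6.0, decompress_gb_s=1.0)
```

More than doubling drive speed barely moves total load time when decompression dominates, which is why offloading the decompression is the interesting part.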
 
OP
OP
V3N1X

V3N1X

Prophet of Truth
Member
Oct 16, 2021
796
Alexandria, Egypt
That's pretty much what I've expected. DS isn't providing much improvement in load times, and likely won't provide much even with GPU decompression. Even 4.5->1.9 sec isn't anything worth pursuing IMO.
Now the ability to use storage for streaming more data each frame is much more interesting. But games will have to start requiring NVMe SSDs for that to happen.


They are fully capable of pushing way more than 2.8GB/s. PCIe4 NVMe drives are hitting >6GB/s easily. APIs are not and never were an issue. The need to process the data which is read before it can be used is the issue.

The stated goal of DirectStorage is minimizing CPU overhead; as of the current version, savings are between 20% and 40%, which will increase a lot more when GPU decompression and DMA to VRAM come along.

No idea why they're comparing loading times here; the improvements aren't expected to show up best there.
 

Deleted member 14089

Oct 27, 2017
6,264
The stated goal of DirectStorage is minimizing CPU overhead; as of the current version, savings are between 20% and 40%, which will increase a lot more when GPU decompression and DMA to VRAM come along.

No idea why they're comparing loading times here; the improvements aren't expected to show up best there.

I advise you not to engage with that user, unless you want a headache.

Nonetheless, it's a premature conclusion by them anyway.
 
OP
OP
V3N1X

V3N1X

Prophet of Truth
Member
Oct 16, 2021
796
Alexandria, Egypt
These two:

Are proprietary since they are done by the GPU driver. You could argue that this is the main part of "RTX I/O" but you're right in a sense that "RTX I/O" as a whole is just Nvidia's name for DirectStorage support.
It could be that "RTX I/O" will be used on platforms without DirectStorage though (Linux or even Windows with Vulkan) so there are possible terminological differences.

There's nothing proprietary about using a compute shader to run a GPU-optimized decompression algorithm... it's non-proprietary by nature.

Vendor-specific driver optimization can be done to make a shader run faster on certain hardware, but that's about it.
 

Sweet Blue

Member
Nov 1, 2018
244
Ok, I have a pretty vague idea of what the availability of this API can bring to the game industry and...
I tried out the Hubble demo in benchmark mode on my machine and I had this:

[image: ldrnmNL.jpeg]


I'm unsure of what the bandwidth represents because, uh... the numbers seem crazy high?
Is my NVMe streaming textures @ 1TB/s? O_o
 

Deleted member 93062

Account closed at user request
Banned
Mar 4, 2021
24,767
Ok, I have a pretty vague idea of what the availability of this API can bring to the game industry and...
I tried out the Hubble demo in benchmark mode on my machine and I had this:

[image: ldrnmNL.jpeg]


I'm unsure of what the bandwidth represents because, uh... the numbers seem crazy high?
Is my NVMe streaming textures @ 1TB/s? O_o
It seems like it's averaging out at 1.118GB/s.
 

dgrdsv

Member
Oct 25, 2017
11,846
The stated goal of DirectStorage is minimizing CPU overhead; as of the current version, savings are between 20% and 40%, which will increase a lot more when GPU decompression and DMA to VRAM come along.

No idea why they're comparing loading times here; the improvements aren't expected to show up best there.
20-40% of what? Current CPUs are hardly even loaded by reading NVMe drives at their full speed. You'll also be hard-pressed to find a PCIe 4 CPU that isn't reasonably current.

There's nothing proprietary about using a compute shader to run a GPU-optimized decompression algorithm... it's non-proprietary by virtue.

Vendor specific driver optimization can be done to make a shader run faster on certain hardware, but that's about it.
What? A compute shader is a program. It can be proprietary just as easily as any program.
It can also not be, but considering that we're talking about a Windows GPU driver here, it will 100% be proprietary.
Note that DirectX is also "proprietary".
 
OP
OP
V3N1X

V3N1X

Prophet of Truth
Member
Oct 16, 2021
796
Alexandria, Egypt
20-40% of what? Current CPUs are hardly even loaded by reading NVMe drives at their full speed. You'll also be hard pressed to find a PCIE4 CPU which isn't somewhat current.


What? Compute shader is a program. It can be proprietary just as easily as any program.
It can also not be but considering that we're talking about a Windows GPU driver here - it will 100% be proprietary.
Note that DirectX is also "proprietary".

Microsoft and the GPU vendors are working on a vendor-agnostic compression format that would be effective/suitable to decompress on the GPU (check Andrew Yeung's talk at GameStack Live last year)... you can write your own decompression shader for that format, but what has that got to do with RTX I/O?

I have no idea how to simplify this further: the shader can be optimized at the driver level, or Nvidia can write an optimized shader for their own hardware... but everything about the compression format will be available for everyone to see. RTX I/O has nothing to do with this.

20-40% of the current load on CPUs caused by IO requests; it's literally in one of the slides from the talk by one of the engineers developing DirectStorage. What are we even discussing here?

[image: 60E1jdu.png]
 
Last edited:

Flappy Pannus

Member
Feb 14, 2019
2,340
Thanks so much for the further explanatory posts, V3N1X. I was wondering precisely what the point of this was right now since, from my understanding, the primary reason for this existing was to move decompression entirely off the CPU and onto the GPU, which this version doesn't quite do yet. Every subsequent question I had, you've answered.

The only downside to me is that this further amplifies the outsized bottleneck of all these game launchers on PC when it comes to actually getting into a game. The game can load in 4 seconds!... after you wait 30 seconds for the publisher's launcher to load and sync all your save games. :(
 

Deleted member 93062

Account closed at user request
Banned
Mar 4, 2021
24,767
Thanks so much for the further explanatory posts, V3N1X. I was wondering precisely what the point of this was right now since, from my understanding, the primary reason for this existing was to move decompression entirely off the CPU and onto the GPU, which this version doesn't quite do yet. Every subsequent question I had, you've answered.

The only downside to me is that this further amplifies the outsized bottleneck of all these game launchers on PC when it comes to actually getting into a game. The game can load in 4 seconds!... after you wait 30 seconds for the publisher's launcher to load and sync all your save games. :(
Next Microsoft needs to bring Quick Resume to PC!
 

Flappy Pannus

Member
Feb 14, 2019
2,340
Insightful video about DirectStorage



Somewhat. The video focuses pretty much solely on the advantage of having the GPU decompress textures directly into VRAM, and while that is a huge step, there are other bottlenecks in the current storage APIs that DS relieves that weren't touched upon. This video could have been made two years ago when DS was first announced; it doesn't actually describe what v1 is bringing to the table.
 
OP
OP
V3N1X

V3N1X

Prophet of Truth
Member
Oct 16, 2021
796
Alexandria, Egypt
Thanks so much for the further explanatory posts, V3N1X. I was wondering precisely what the point of this was right now since, from my understanding, the primary reason for this existing was to move decompression entirely off the CPU and onto the GPU, which this version doesn't quite do yet. Every subsequent question I had, you've answered.

The only downside to me is that this further amplifies the outsized bottleneck of all these game launchers on PC when it comes to actually getting into a game. The game can load in 4 seconds!... after you wait 30 seconds for the publisher's launcher to load and sync all your save games. :(

🙏

In regards to loading saves, I feel the same way... like if the game can run offline, why wouldn't they sync the saves asynchronously and let me play?
 
OP
OP
V3N1X

V3N1X

Prophet of Truth
Member
Oct 16, 2021
796
Alexandria, Egypt
Would this be an issue for DX12 games that use async shader compilation?

I'd say let the game hook in its custom shader compilation code in that case? And instead of going right back to where you were, you'd get a loading screen while compilation finishes.

I don't know how feasible that would be though, might be more intricacies to figure out.
 

JahIthBer

Member
Jan 27, 2018
10,376
Series X|S consoles have a dedicated HW decompression block that takes care of that as part of the Velocity architecture... the Series X|S are quite far ahead of PC in that regard.

They're essentially at v4, they're not even using GPU compute for decompression, but have dedicated HW to do it... This will require new GPUs on PC with dedicated silicon for decompression, while GPU decompression (v2 essentially above) will work with existing GPUs that gamers have right now on their PCs.

PC DirectStorage will get there eventually, but for now the goal is DMA to VRAM with GPU compute decompression which would be quite ideal with current hardware.

Edit:

Just to clarify as to the version analogy:

v1: Storage -> System Memory -> CPU Decompression (through a custom decompression queue) -> Copy to VRAM [PC is HERE]
v2: Storage -> System Memory -> Copy to VRAM -> Decompress via Compute Shader
v3: Storage -> VRAM -> Decompress via Compute Shader [Achievable on the hardware available in PCs today]
v4: Storage -> VRAM -> Decompress via Dedicated HW [Xbox Series S|X are essentially here, requires HW advancements in PC GPUs]
Tensor cores might take some of the load off, but Nvidia claims tensor cores do more than they actually do, so who knows.