Xbox Series X: A Closer Look at the Technology Powering the Next Generation

nib95 · Mar 24, 2020

Lukas Taves said:
It was small but it was also super fast specially latency wise, so much that the goal of optimizing for xbone was to use the tiled resources feature and have the gpu work 100% of the time in the esram leaving the main memory for the cpu and low bandwidth tasks.

On SX this won't be an issue. 10GB is enough that no single buffer will have problem fitting in, and there will be no need to keep moving stuff around, so developers will be able to utilize it as is and enjoy the

The point of the image is more to show the disproportional behavior of bandwidth contention, not the actual bandwidth (as I recall there was this remark on the slide). The problem is that, when you have 2 or more consumers using the same pool the total bandwidth is not the same as if only one of them was using the bus alone. There is ways to mitigate that, on xbone ms said that the move engines + esram helped with the issue, on X they also said they modified the cpu to try and reduce this effect, but they must have felt it was not enough to make this setup on SX.

Of course, achievable performance won't be sustained in real life.

And please, no need to attack, I never claimed Sony is the only one that provides peak values that are not sustained. My only point is that with a unified pool bandwidth contention is a problem, and ms addressed that by splitting theirs. The alternative would be to put an even faster memory so contention is not an issue, but that's not cost effective.

Microsoft did not address this by splitting theirs, because the CPU and GPU still use the same bus, so similar issues of potential contention arise. Microsoft split their ram this time around not for any efficiency savings (if anything it's less efficient as there's now split speeds and more variables, which makes things potentially a tad bit more complicated for developers), but cost saving.

On the Move Engines, both that and the Esram were solutions to problems that the PS4 did not have (eg to help the Xbox One mitigate the bandwidth drawbacks of DDR3 and to help manage things with a split pool of ram). The Move Engines essentially move data from one RAM pool to another, saving the CPU/GPU from having to waste cycles on that work, though in real world terms you still lose bandwidth from your peak (more so infact than you do with the PS4 in real world testing). The PS4 has a unified pool, hence the requirements are slightly different, similar to the One X, which is no doubt why the One X also dropped the Move Engines.

Ultimately, not only are the kind of contentions you're speaking about not unique to Sony (though your implication was the opposite) but efficiency and other aspects included to try and help minimise said contentions are not going to be exclusive to Microsoft either.

Iron Eddie · Mar 24, 2020

On a different note I guess if these new consoles do get delayed to next year first party crossgen games like Halo Infinite will still be coming out hopefully. So Microsoft will have that going for them.

Scently said:
While the memory on the XSX can be addressed in a unified manner, XSX has a much wider bus. So while it still has to deal with contention its going to be a lot less severe, especially given the fact that the CPU and the GPU access different addresses; the GPU on the fast path memory and the CPU on the standard path.

You do realize this back and forth will never stop right? Yes it's interesting to dissect the numbers given but when at least one party is biased they will always lean towards defending that system over the other.

Scently · Mar 24, 2020

nib95 said:
Microsoft did not address this by splitting theirs, because the CPU and GPU still use the same bus, so the similar issues of potential contention arise. Microsoft split their ram this time around not for any efficiency savings (if anything it's less efficient as there's now split speeds and more variables, which makes things potentially a tad bit more complicated for developers), but cost saving.

On the Move Engines, both that and the Esram were solutions to problems that the PS4 did not have (eg to help the Xbox One mitigate the bandwidth drawbacks of DDR3 and to help manage things with a split pool of ram). The Move Engines essentially move data from one RAM pool to another, saving the CPU/GPU from having to waste cycles on that work, though in real world terms you still lose bandwidth from your peak (in real world testing, more so infact than you do with the PS4). The PS4 has a unified pool, hence the requirements are slightly different, similar to the One X, which is no doubt why the One X also dropped the Move Engines.

Ultimately, not only are the kind of contentions you're speaking about not unique to Sony (though your implication was the opposite) but efficiency and other aspects included to try and help minimise said contentions are not going to be exclusive to Microsoft either.

While the memory on the XSX can be addressed in a unified manner, XSX has a much wider bus. So while it still has to deal with contention its going to be a lot less severe, especially given the fact that the CPU and the GPU access different addresses; the GPU on the fast path memory and the CPU on the standard path.

RogerL · Mar 24, 2020

Liabe Brave said:
Actually, I believe in that quote Mr. Goossen is talking about the SFS feature, the Series X version of Partially Resident Textures. Though at other times Microsoft has discussed using SSD as paged extra memory, as your link elaborates.

Yes, but how is that actually implemented?
Suppose you tile a texture.
Will it be the CPUs responsibility to calculate what tiles that should be loaded? (That quite complex calculations, but could be pre-calculated offline)
Or will it be first touch in a tile by GPU that starts download? (Letting the GPU work on something else until data is loaded, or even render using existing LOD)
Or will it be multi pass - GPU renders with TILE# instead of actual data, CPU fetch all needed TILEs to GPU memory, GPU renders actual texture

Scently said:
I don't think MS will design such a streaming system and then purposefully cripple it by not providing adequate bandwidth for it. The PS5 SSD bandwidth is higher but I expect the SSD bandwidth in the XSX to be able to accommodate what they want to do.

Microsofts is not crippled, it is actually same or better than the professional Radeon Pro SSG
" The performance differential was actually more than I expected; reading a file from the SSG SSD array was over 4GB/sec "

https://www.amd.com/en/products/professional-graphics/radeon-pro-ssg

Updated: AMD Announces Radeon Pro SSG: Fiji With M.2 SSDs On-Board

www.anandtech.com

It is just that it looks like SONY made something even better!

This came in while writing - exactly like this!

Lukas Taves said:
Bandwidth and latency will matter a lot, yes. But AMD has been demoing this tech, in some super complex scenes using a SSD about 3 times lower than the raw SX throughput, without hardware compression, without the modifications to have the ssd to output directly into ram.

At around 1h:10 mark they demo it. A scene with 250 billion polygons load, on a system with lower throughput than what Ms has put in, with none of the hardware accelerations to reduce cpu usage, and it's already able to provide much more data than the gpu can realistically consume in 30fps.

So most likely we will be seeing production and storage bottlenecks much sooner than the ssd setup limiting game design or how beauty games are going to be now.

nib95 · Mar 24, 2020

Scently said:
While the memory on the XSX can be addressed in a unified manner, XSX has a much wider bus. So while it still has to deal with contention its going to be a lot less severe, especially given the fact that the CPU and the GPU access different addresses; the GPU on the fast path memory and the CPU on the standard path.

Right, but they're still on the same bus. And many are assuming that devs will never need or use more than 10GB for GPU/graphics use, when that may not be the case, especially if the OS footprint size reduces, or if the PS5 offers more useable ram to devs. Microsoft said its "typically easy" to fill the 2.5GB with non GPU related data, but that's still not a guarantee.

Ultimately, this is all much a do about nothing. In real world testing the PS4 in games manages 172 GB/s, which is surprisingly close to its peak of 176 GB/s. If anything I expect both these new systems to be similarly if not more efficient on that front.

Kerberosorthos · Mar 24, 2020

nib95 said:
Right, but they're still on the same bus. And many are assuming that devs will never need or use more than 10GB for GPU/graphics use, when that may not be the case, especially if the OS footprint size reduces, or if the PS5 offers more useable ram to devs. Microsoft said its "typically easy" to fill the 2.5GB with non GPU related data, but that's still not a guarantee.

Ultimately, this is all much a do about nothing. In real world testing the PS4 in games manages 172 GB/s, which is surprisingly close to its peak of 176 GB/s. If anything I expect both these new systems to be similarly if not more efficient on that front.

Just for reference, the RTX 2080 Super has "only" 8GB" VRAM so 2 GB less than the XSX.

Deleted member 20297 · Mar 24, 2020

nib95 said:
Ultimately, this is all much a do about nothing. In real world testing the PS4 in games manages 172 GB/s, which is surprisingly close to its peak of 176 GB/s. If anything I expect both these new systems to be similarly if not more efficient on that front.

Where is this achieved? The SDK itself doesn't speak of any number near that, nor this leaked image:

nib95 · Mar 24, 2020

Kerberosorthos said:
Just for reference, the RTX 2080 Super hast "only" 8GB" VRAM so 2 GB less than the XSX.

Right, basically a similar situation to the start of last gen, when even the highest end single GPU's typically only had 2-4 GB, whilst the consoles had 8GB. Eg the the GTX 7080 Ti (3GB) and R9 290X (4GB). At least the RTX 2080 Ti actually comes closer with 11GB. Comparable GPU's to the PS4/XO at the time (HD 7870) had only 2GB, so the RTX 2080 Super with 8GB isn't bad going.

On that point though, these are next gen systems that are to last 7+ years, of course they're not going to be comparable in ram terms to what exists today, because they're designed to handle the games of tomorrow.

c0de said:
Where is this achieved? The SDK itself doesn't speak of any number near that, nor this leaked image:

I'm talking about the figures developers themselves have quoted, eg Just Add Water, Guerilla Games etc. That slide is from a leaked early presentation, so doesn't have the same real world or present day basis. Also, without seeing the entire presentation, that slide may not be in proper context.

Lukas Taves · Mar 24, 2020

RogerL said:
Yes, but how is that actually implemented?
Suppose you tile a texture.
Will it be the CPUs responsibility to calculate what tiles that should be loaded? (That quite complex calculations, but could be pre-calculated offline)
Or will it be first touch in a tile by GPU that starts download? (Letting the GPU work on something else until data is loaded, or even render using existing LOD)
Or will it be multi pass - GPU renders with TILE# instead of actual data, CPU fetch all needed TILEs to GPU memory, GPU renders actual texture

DF article touched that a bit, reading that and coupled with that Ms guy tweets this is how I understand the feature works:

- The gpu has hardware to flag the resource (texture for example) and the tiles it needs from it
- There seems to have some logic in place that based on current used tiles and other information tries to guess the next ones to load
- The ssd loads the tiles only instead of the whole texture in some sort of hierarchy in place. For example for the tiles more likely to be shown it loads in the maximum resolution, for the less likely it loads a lower resolution.
- In the case of a cache miss (as in a resource that was thought that wouldn't be needed but was) the system automatically handles replacing the lower res version with the higher res one when available (so kinda like old ue3 texture streaming but handled at a system level)

All this seems to be implemented with a mix of hardware + software, the custom hardware to support it, and the direct storage apis.

Iron Eddie · Mar 24, 2020

nib95 said:
Right, basically a similar situation to the start of last gen, when even the highest end single GPU's typically only had 2-4 GB, whilst the consoles had 8GB. Eg the the GTX 7080 Ti (3GB) and R9 290X (4GB). At least the RTX 2080 Ti actually comes closer with 11GB. Comparable GPU's to the PS4/XO at the time (HD 7870) had only 2GB, so the RTX 2080 Super with 8GB isn't bad going.

On that point though, these are next gen systems that are to last 7+ years, of course they're not going to be comparable in ram terms to what exists today, because they're designed to handle the games of tomorrow.

Consoles never lead when handling the games on tomorrow, they are locked devices and will remain static once they launch. Meaning the PC will always eventually move well beyond them and are already doing things the consoles won't reach like raytracing limits. Which explains why so many games will likely continue to be designed at a 30fps lock which you seem to not have any issues with.

The best news of all of this are the new consoles adopting a SSD. This means the PC will have to eventually move forward as well which shouldn't take too long.

Scently · Mar 24, 2020

RogerL said:
Yes, but how is that actually implemented?
Suppose you tile a texture.
Will it be the CPUs responsibility to calculate what tiles that should be loaded? (That quite complex calculations, but could be pre-calculated offline)
Or will it be first touch in a tile by GPU that starts download? (Letting the GPU work on something else until data is loaded, or even render using existing LOD)
Or will it be multi pass - GPU renders with TILE# instead of actual data, CPU fetch all needed TILEs to GPU memory, GPU renders actual texture

Microsofts is not crippled, it is actually same or better than the professional Radeon Pro SSG
" The performance differential was actually more than I expected; reading a file from the SSG SSD array was over 4GB/sec "

https://www.amd.com/en/products/professional-graphics/radeon-pro-ssg

Updated: AMD Announces Radeon Pro SSG: Fiji With M.2 SSDs On-Board

www.anandtech.com

It is just that it looks like SONY made something even better!

This came in while writing - exactly like this!

And that's great for the PS5, I never stated otherwise, but that is neither here nor there with regards to what MS is trying to achieve with the XSX.

nib95 · Mar 24, 2020

Walken said:
Consoles never lead when handling the games on tomorrow, they are locked devices and will remain static once they launch. Meaning the PC will always eventually move well beyond them and are already doing things the consoles won't reach like raytracing limits. Which explains why so many games will likely continue to be designed at a 30fps lock which you seem to not have any issues with.

The best news of all of this are the new consoles adopting a SSD. This means the PC will have to eventually move forward as well which shouldn't take too long.

The PS4 and Xbox One still play every multiplatform game released. Of course PC's will eventually overtake them, infact outside of ram and SSD speeds (vs PS5), I think the RTX 2080 Ti already exceeds both next gen console GPU's, at least peak computation performance wise.

And yea, I agree about SSD's, since it means going forward the minimal spec requirement for many games (inc PC) will require an SSD. Hopefully eventually NVMe only SSD, and not just Sata.

Lukas Taves · Mar 24, 2020

nib95 said:
Microsoft did not address this by splitting theirs, because the CPU and GPU still use the same bus, so similar issues of potential contention arise. Microsoft split their ram this time around not for any efficiency savings (if anything it's less efficient as there's now split speeds and more variables, which makes things potentially a tad bit more complicated for developers), but cost saving.

On the Move Engines, both that and the Esram were solutions to problems that the PS4 did not have (eg to help the Xbox One mitigate the bandwidth drawbacks of DDR3 and to help manage things with a split pool of ram). The Move Engines essentially move data from one RAM pool to another, saving the CPU/GPU from having to waste cycles on that work, though in real world terms you still lose bandwidth from your peak (more so infact than you do with the PS4 in real world testing). The PS4 has a unified pool, hence the requirements are slightly different, similar to the One X, which is no doubt why the One X also dropped the Move Engines.

Ultimately, not only are the kind of contentions you're speaking about not unique to Sony (though your implication was the opposite) but efficiency and other aspects included to try and help minimise said contentions are not going to be exclusive to Microsoft either.

I'm going by the DF insight, what Ms told them was that with this setup the gpu can access the faster pool (that they call GPU optimized memory) at full rate, while for the slower pool (that they call it standard memory) the gpu sees a reduction when the cpu is using.

They also said that the goal of the setup was to increase the effective bandwidth without increasing costs, because to have the same effective bandwidth with a unified pool they would have to either use a wider bus (as in more chips) or faster chips and both would be cost prohibitive.

Both the cpu and gpu can access both pools, but from what they are saying I think GPU access is priorized in the GPU optimized memory (meaning the cpu can only access unused bandwidth) while for the standard memory it's the other way around, the cpu has priority and the gpu can use only what's left.

Given everything that is being said I don't see how this doesn't address the contention issue, it seems to be precisely designed for that, and as they said, it increased the effective bandwidth for the gpu

Iron Eddie · Mar 24, 2020

nib95 said:
The PS4 and Xbox One still play every multiplatform game released. Of course PC's will eventually overtake them, infact outside of ram and SSD speeds (vs PS5), I think the RTX 2080 Ti already exceeds both next gen console GPU's, at least peak computation performance wise.

And yea, I agree about SSD's, since it means going forward the minimal spec requirement for many games (inc PC) will require an SSD. Hopefully eventually NVMe only SSD, and not just Sata.

If Sony releases their games with an option for 60fps I will get one before I decide if I want a Series X. I have 2 M.2 drives on my PC and they install and load games very quickly, you will be happy.

Deleted member 20297 · Mar 24, 2020

nib95 said:
I'm talking about the figures developers themselves have quoted, eg Just Add Water, Guerilla Games etc. That slide is from a leaked early presentation, so doesn't have the same real world or present day basis.

I mean, it would be great for the XSX because of the higher bandwidth for most of the RAM, no doubt, but 172 is never mentioned throughout the SDK, at all, and it is a given fact that every access through the northbridge to the dram controller has an impact on the bandwidth.
Go and find the leaked SDK yourself and look it up.

Dictator · Mar 24, 2020

nib95 said:
Right, basically a similar situation to the start of last gen, when even the highest end single GPU's typically only had 2-4 GB, whilst the consoles had 8GB. Eg the the GTX 7080 Ti (3GB) and R9 290X (4GB). At least the RTX 2080 Ti actually comes closer with 11GB. Comparable GPU's to the PS4/XO at the time (HD 7870) had only 2GB, so the RTX 2080 Super with 8GB isn't bad going.

8 GB of which 5,5 was originally usable, and in the end, as per guerilla games, saw like 3ish used for graphics tasks. Important context. The ram is shared.

nib95 · Mar 24, 2020

Lukas Taves said:
I'm going by the DF insight, what Ms told them was that with this setup the gpu can access the faster pool (that they call GPU optimized memory) at full rate, while for the slower pool (that they call it standard memory) the gpu sees a reduction when the cpu is using.

They also said that the goal of the setup was to increase the effective bandwidth without increasing costs, because to have the same effective bandwidth with a unified pool they would have to either use a wider bus (as in more chips) or faster chips and both would be cost prohibitive.

Both the cpu and gpu can access both pools, but from what they are saying I think GPU access is priorized in the GPU optimized memory (meaning the cpu can only access unused bandwidth) while for the standard memory it's the other way around, the cpu has priority and the gpu can use only what's left.

Given everything that is being said I don't see how this doesn't address the contention issue, it seems to be precisely designed for that, and as they said, it increased the effective bandwidth for the gpu

But it doesn't address it because they still share the same bus (that's why the contention exists in the first place). All it does is potentially minimise it, but then springs up other potential complications instead, or puts more pressure or incentive for XSX devs to stay within the 10GB range for GPU stuff, instead of the added flexibility of not needing to worry due to unified speeds. Not that the 6GB is actually that slow mind you.

Ultimately, Microsoft's set up is a cost cutting measure, I think most recognise this. Microsoft could just as easily have an entire 16GB of 560 GB/s but then allow split address for a similar thing but with more flexibility. But obviously that'd cost them more money manufacturing wise.

Dictator said:
8 GB of which 5,5 was originally usable, and in the end, as per guerilla games, saw like 3ish used for graphics tasks. Important context. The ram is shared.

Of course, wouldn't suggest otherwise. That said, I believe the OS footprint size continually came down over the course of the gen, and along with it the available memory. That's pesumably why you stated "originally".

Lukas Taves · Mar 24, 2020

nib95 said:
But it doesn't address it because they still share the same bus (that's why the contention exists in the first place). All it does is potentially minimise it, but then springs up other potential complications instead, or puts more pressure or incentive for XSX devs to stay within the 10GB range for GPU stuff, instead of the added flexibility of not needing to worry due to unified speeds. Not that the 6GB is slow mind you.

Ultimately, Microsoft's set up is a cost cutting measure, I think most recognise this. Microsoft could just as easily have an entire 16GB of 560 GB/s but then split the address to allow a similar thing but with more flexibility.

It's a cost saving measure to provide the needed bandwidth yeah, no one is denying that.

But they achieve the target bandwidth by splitting usage and ensuring the gpu gets the desired effective bandwidth. I really don't see the divise you are creating.

From what I understand all the chips have a 32bit bus (320bit in total) and are all 14gbps chips, the pool divide and overall bandwidth comes from logically splitting the pools, so they are not cheaping out on memory either, they just tuned to be able to achieve their target without needing to go even faster, because to offer that bandwidth across the entire pool it would have costed more.

Deleted member 20297 · Mar 24, 2020

nib95 said:
Of course, wouldn't suggest otherwise. That said, I believe the OS footprint size continually came down over the course of the gen, and along with it the available memory. That's pesumably why you stated "originally".

Please provide us when and how OS footprint was reduced "over the course of the gen".

Deleted member 61469 · Mar 24, 2020

c0de said:
Please provide us when and how OS footprint was reduced "over the course of the gen".

He can't because they never did. The Pro had more RAM because of this reason. I really want to know how you would reduce the RAM requirements for social features and other apps because it sounds really interesting.

Miniature Kaiju · Mar 24, 2020

Lukas Taves said:
they just tuned to be able to achieve their target without needing to go even faster, because to offer that bandwidth across the entire pool it would have costed more.

They wouldn't have had to go faster, they'd have to go denser with all of the modules. They could do the same speed chips, same bus, but using 10 x 2GB modules for homogenous bandwidth - but that would cost more. It's a cost cutting measure.

Euler007 · Mar 24, 2020

Dictator said:
8 GB of which 5,5 was originally usable, and in the end, as per guerilla games, saw like 3ish used for graphics tasks. Important context. The ram is shared.

And most gamers had at least 8GB of system ram for the game logic,on top of the GPU RAM (meshes and textures).

Deleted member 20297 · Mar 24, 2020

Coda said:
He can't because they never did. The Pro had more RAM because of this reason. I really want to know how you would reduce the RAM requirements for social features and other apps because it sounds really interesting.

I still think Sony was rather lucky with the 8GB GDDR5, it would have been a more interesting gen with them having to deal with 4 ;)
In any case, I was surprised that both ran with 3 GB reserved for the OS. Especially when 360 had a footprint of 32 MB (!).

Lukas Taves · Mar 24, 2020

Miniature Kaiju said:
They wouldn't have had to go faster, they'd have to go denser with all of the modules. They could do the same speed chips, same bus, but using 10 x 2GB modules for homogenous bandwidth - but that would cost more. It's a cost cutting measure.

Exactly what I meant.
They would either need to go faster or denser, both would have costed more and would be prohibitive. But by logically splitting the pool they achieve the target bandwidth on a significant portion of the memory.

Like I said, I don't see how that is a division. Yes, it was a cost cutting measure, but also yes, it allowed them to provide a fast pool of 10GB and when accessing this pool the gpu sees no or less contention from the cpu. One does not conflict with the other. And honestly given that the discussion was about elegance and efficiency, how is that not an elegant solution for the problem? Specially that the practical drawback is that they only have about 3.5 GB of ram that is slower than ps5, and as they said, it's not really a compromise because the slow data portion of a game can easily go over that, so it's more likely that non bandwidth intensive tasks will need to access the faster pool than a bandwidth intensive task needing to access the slower one.

Miniature Kaiju · Mar 24, 2020

Lukas Taves said:
when accessing this pool the gpu sees no or less contention from the cpu. One does not conflict with the other.

People far smarter than me having pointed out that this isn't actually true. They're on the same bus, just different addressing ranges, so they have the same contention issues - maybe compounded by the lower effective bandwidth for part of it.

nib95 · Mar 24, 2020

c0de said:
Please provide us when and how OS footprint was reduced "over the course of the gen".

I don't have links, but I remember sometime into the gen there were threads that confirmed the amount of ram required for the OS was reduced and over time a bit more of the ram was available to developers for games.

Axel Stone · Mar 24, 2020

I seem to recall that Microsoft freed up some RAM by dropping Kinect support.

Lukas Taves · Mar 24, 2020

Miniature Kaiju said:
People far smarter than me having pointed out that this isn't actually true. They're on the same bus, just different addressing ranges, so they have the same contention issues - maybe compounded by the lower effective bandwidth for part of it.

Perhaps I misunderstood Ms claim then, gonna watch/read it all again because I was sure they said that in their setup the gpu sees the full bandwidth when reading from the higher optimized ram.

Deleted member 20297 · Mar 24, 2020

nib95 said:
I don't have links, but I remember sometime into the gen there were threads that confirmed the amount of ram required for the OS was reduced and over time a bit more of the ram was available to developers for games.

Sorry but if you claim something like that you better back it up. I am not saying you are wrong but unless proven this is just hearsay.
Also, on which console did that happen?

Miniature Kaiju · Mar 24, 2020

Lukas Taves said:
Perhaps I misunderstood Ms claim then, gonna watch/read it all again because I was sure they said that in their setup the gpu sees the full bandwidth when reading from the higher optimized ram.

It will do that, but if the CPU is requesting data from the slow pool, it will still occupy the full bus to retrieve data at a slower speed. Imagine GPU and CPU are "taking turns" accessing data, only the CPU is capped at a lower top speed - that will lower the overall effective bandwidth. By how much will depend on loads, how much data is being retrieved, how frequently, etc.

melodiousmowl · Mar 24, 2020

Lukas Taves said:
I'm going by the DF insight, what Ms told them was that with this setup the gpu can access the faster pool (that they call GPU optimized memory) at full rate, while for the slower pool (that they call it standard memory) the gpu sees a reduction when the cpu is using.

They also said that the goal of the setup was to increase the effective bandwidth without increasing costs, because to have the same effective bandwidth with a unified pool they would have to either use a wider bus (as in more chips) or faster chips and both would be cost prohibitive.

Both the cpu and gpu can access both pools, but from what they are saying I think GPU access is priorized in the GPU optimized memory (meaning the cpu can only access unused bandwidth) while for the standard memory it's the other way around, the cpu has priority and the gpu can use only what's left.

Given everything that is being said I don't see how this doesn't address the contention issue, it seems to be precisely designed for that, and as they said, it increased the effective bandwidth for the gpu

On top of that, since there is actually a physical(as in the slower pool is actually physically different rather than just part of a larger pool) the memory controller could and probably is tailored for not have contention for the CPU/GPU using the ram.

Actually I may take that back - I wonder what the bandwidth is when CPU and GPU are both in use - can they access the pools concurrently or is the base bandwidth shared. (this is a problem I guess for ps5 as well?)

revben · Mar 24, 2020

Miniature Kaiju said:
It will do that, but if the CPU is requesting data from the slow pool, it will still occupy the full bus to retrieve data at a slower speed. Imagine GPU and CPU are "taking turns" accessing data, only the CPU is capped at a lower top speed - that will lower the overall effective bandwidth. By how much will depend on loads, how much data is being retrieved, how frequently, etc.

Wouldn't slower Ram take up less of the "lanes", so 320 bit bus would make contention less?

Miniature Kaiju · Mar 24, 2020

One thing I want to be clear is that I'm not saying the Series X memory will be horrible or anything like that - it will be amazing, it's a definite advantage the Series X has over the PS5 and it will likely always perform better. I fully expect the effective top bandwidth to be over 500GB/s most of the time.

revben said:
Wouldn't slower Ram take up less of the "lanes", so 320 bit bus would make contention less?

It kinda does, but the free lanes have to "sit idle" while the slow memory is being addressed due to how data is parallelized.

M3rcy · Mar 24, 2020

Miniature Kaiju said:
It kinda does, but the free lanes have to "sit idle" while the slow memory is being addressed due to how data is parallelized.

Learning a lot on this subject. Thanks all for the insight!

NXGamer · Mar 24, 2020

Liabe Brave said:
Well, the NXGamer video which I first saw mention it was favorably retweeted by a senior software engineer for the PS5. That doesn't mean he was endorsing every single sentence in the video, though. So until it's officially announced I'd consider it unconfirmed. There's been plenty of stuff that makes technical sense which companies still didn't implement.

Absolutely, ALL I mentioned was a possibility here, ideas, options, nothing more and I even state in the video this is just an example and not fact or confirmed by Sony. Not aimed at you Liabe but just so people stop quoting this as fact, it is a possibility but the design of the system and reservations may mean the full Bandwidth of the SSD is best used for application use and 4K/60 capture within the OS and not for this, I will just be very surprised of they do not use it to improve/supplement the rest of the system design, just as I think MS will also use as much as possible.

anexanhume · Mar 24, 2020

Miniature Kaiju said:
It kinda does, but the free lanes have to "sit idle" while the slow memory is being addressed due to how data is parallelized.

Exactly. Otherwise there wouldn't be a lower speed rating.

Overall, I'm pretty happy with MS's compromise. The "real" bandwidth will definitely be somewhere in between the two, but it must be appreciably higher than a 256-bit bus can achieve, otherwise they'd just do that.

retrosega · Mar 24, 2020

Now that we've seen all the specs, I'm pretty sure I'll be buying an XSX day one.
I'll get a PS5 sometime in 2021.

ShapeGSX · Mar 24, 2020

Lukas Taves said:
It's a cost saving measure to provide the needed bandwidth yeah, no one is denying that.

But they achieve the target bandwidth by splitting usage and ensuring the gpu gets the desired effective bandwidth. I really don't see the divise you are creating.

From what I understand all the chips have a 32bit bus (320bit in total) and are all 14gbps chips, the pool divide and overall bandwidth comes from logically splitting the pools, so they are not cheaping out on memory either, they just tuned to be able to achieve their target without needing to go even faster, because to offer that bandwidth across the entire pool it would have costed more.

GDDR6 chips communicate serially. They have two independent serial communication channels and they can be R/R, R/W, or W/W.

https://www.micron.com/-/media/client/global/documents/products/technical-note/dram/tned03_gddr6.pdf

M3rcy · Mar 24, 2020

So, let's see if I've got this straight.

In order to maximize bandwidth data is striped across the physical memory chips so that you are accessing all of the available chips when reading/writing data from them. When you read/write to an address the system accesses a block of data at a time and within that block is the data you need for the current operation along with surrounding data which, optimally, is data you will also be needing soon.

I intentionally left out cache for simplicity, but is that close to correct?

Timlot · Mar 24, 2020

ShapeGSX said:
GDDR6 chips communicate serially. They have two independent serial communication channels and they can be R/R, R/W, or W/W.

https://www.micron.com/-/media/client/global/documents/products/technical-note/dram/tned03_gddr6.pdf

Makes sense. So much misinformation trying to be spread.

melodiousmowl · Mar 24, 2020

ShapeGSX said:
GDDR6 chips communicate serially. They have two independent serial communication channels and they can be R/R, R/W, or W/W.

https://www.micron.com/-/media/client/global/documents/products/technical-note/dram/tned03_gddr6.pdf

do to mean the channels from the memory array to the p-s converter?

or are you talking about gddr6 in this 2 channel setup?

bsigg · Mar 24, 2020

Miniature Kaiju said:
It will do that, but if the CPU is requesting data from the slow pool, it will still occupy the full bus to retrieve data at a slower speed. Imagine GPU and CPU are "taking turns" accessing data, only the CPU is capped at a lower top speed - that will lower the overall effective bandwidth. By how much will depend on loads, how much data is being retrieved, how frequently, etc.

Here's the quote from MS:

"Memory performance is asymmetrical - it's not something we could have done with the PC," explains Andrew Goossen "10 gigabytes of physical memory [runs at] 560GB/s. We call this GPU optimal memory. Six gigabytes [runs at] 336GB/s. We call this standard memory. GPU optimal and standard offer identical performance for CPU audio and file IO. The only hardware component that sees a difference in the GPU."

"When we talked to the system team there were a lot of issues around the complexity of signal integrity and what-not," explains Goossen. "As you know, with the Xbox One X, we went with the 384[-bit interface] but at these incredible speeds - 14gbps with the GDDR6 - we've pushed as hard as we could and we felt that 320 was a good compromise in terms of achieving as high performance as we could while at the same time building the system that would actually work and we could actually ship."

GUNDAM STARDUST · Mar 24, 2020

nib95 said:
Right, basically a similar situation to the start of last gen, when even the highest end single GPU's typically only had 2-4 GB, whilst the consoles had 8GB. Eg the the GTX 7080 Ti (3GB) and R9 290X (4GB). At least the RTX 2080 Ti actually comes closer with 11GB. Comparable GPU's to the PS4/XO at the time (HD 7870) had only 2GB, so the RTX 2080 Super with 8GB isn't bad going.

On that point though, these are next gen systems that are to last 7+ years, of course they're not going to be comparable in ram terms to what exists today, because they're designed to handle the games of tomorrow.

I'm talking about the figures developers themselves have quoted, eg Just Add Water, Guerilla Games etc. That slide is from a leaked early presentation, so doesn't have the same real world or present day basis. Also, without seeing the entire presentation, that slide may not be in proper context.

I'm hopeful that these new modern compression/GPU feeding methods work as advertised and effectively halve the RAM requirements of the previous generation.

ShapeGSX · Mar 24, 2020

melodiousmowl said:
do to mean the channels from the memory array to the p-s converter?

or are you talking about gddr6 in this 2 channel setup?

Basically, there isn't a 256 bit bus per channel. Bumps on a chip are too precious, and need to be mostly used for power delivery. So data bumps cost a lot (not dollars, but tradeoffs).

So they take a 256 bit packet and serialize it over a 16 bit wide connection to the APU that runs at a higher frequency (16 times faster) than the actual memory array read speed (the serialized data speed is not mentioned on the data sheet, but you could calculate it out). But there are two of these (32 bits total). These 32 wires are the data wires between the chip and the APU on the motherboard.

Check out the "simulated input eye data" later on in the PDF to see what the data bits actually look like on the motherboard traces. It's pretty wild.

M3rcy · Mar 24, 2020

Much like the TFlop #'s the bandwidth numbers are measurements of capacity to do work. Unless the memory controller is operating at full capacity for every single cycle during a full second you aren't actually going to get that #. So, the thing about the split memory architecture is that whether the bus is 320 bit only sometimes isn't really relevant, IMO, as long as it's 320-bit when the system needs it to be.

DrKeo · Mar 24, 2020

I don't think the average bandwidth will be much lower than 560GB/s. MS could have taken the easy route and used lower binned GDDR6 chips, for instance 13Gbps and use the money they save for adding the missing 4GB and having a unified 20GB setup. For instance, underclocking a 13Gbps chip to 1600Mhz will result in x10 2GB chips running at 12.8Gbps which is both a big pool of 20GB and runs in a unified speed of 512GB/s. So assuming MS could have used lower grade chips and got well over 500GB/s unified memory AND a big 20GB pool means that MS did their due diligence and concluded that the mixed pool won't hurt bandwidth much.

edit:
When I've said "average bandwidth", I meant VRAM. Anything outside the realm of VRAM can do just fine with 70GB/s, not to mention 336GB/s.

Lukas Taves · Mar 24, 2020

Miniature Kaiju said:
It will do that, but if the CPU is requesting data from the slow pool, it will still occupy the full bus to retrieve data at a slower speed. Imagine GPU and CPU are "taking turns" accessing data, only the CPU is capped at a lower top speed - that will lower the overall effective bandwidth. By how much will depend on loads, how much data is being retrieved, how frequently, etc.

I see your point.

For the cpu that's not much of a problem I think even the slower pool should be a few times over of what the cpu was designed to use. My doubt is if the gpu can access the full bus on the 10GB even if the cpu is accessing the slower pool. I got the impression that this is what they claimed.

(but I couldn't read anything yet, still working XD)

M3rcy · Mar 24, 2020

DrKeo said:
edit:
When I've said "average bandwidth", I meant VRAM. Anything outside the realm of VRAM can do just fine with 70GB/s, not to mention 336GB/s.

When it's the only consumer. When you have shared memory, there's something to be said for having more capacity available per cycle to have your data demands satisfied sooner and get out of the way of the other consumers.

Lukas Taves · Mar 24, 2020

bsigg said:
Here's the quote from MS:

Thanks for posting this. I think I read it wrong the first time.

It's not that the gpu can access the faster pool without contention from the cpu it's that the cpu bandwidth is unchanged regardless of the pool it's accessing and that the slower pool bandwidth impacts only the gpu.

Sorry for that guys :)

Black Mantis · Mar 24, 2020

Axel Stone said:
I seem to recall that Microsoft freed up some RAM by dropping Kinect support.

It was the GPU that got the advantage.

How the Xbox One GPU Kinect reserve unlock works

Microsoft has moved to clarify the impact of its decision to unlock the power of the Xbox One's graphics processing uni…

www.eurogamer.net

Xbox Series X: A Closer Look at the Technology Powering the Next Generation

Contains No Misinformation on Philly Cheesesteaks

Contains No Misinformation on Philly Cheesesteaks

User requested account closure

Contains No Misinformation on Philly Cheesesteaks

Contains No Misinformation on Philly Cheesesteaks

User requested account closure

Digital Foundry

Contains No Misinformation on Philly Cheesesteaks

User requested account closure

Attempted to circumvent ban with alt account

User requested account closure

Contains No Misinformation on Philly Cheesesteaks

User requested account closure