PlayStation 5 System Architecture Deep Dive |OT| Secret Agent Cerny

Wollan · Apr 2, 2020

chris 1515 said:
Why would you want them to modify a CU inside the GPU? That would make the Tempest Engine less efficient, because it would share memory access with the other CUs. Its memory model is very different from that of the other CUs, and GPUs aren't sensitive to memory latency, whereas audio is.

OK, I was just curious whether there was a secondary bus somewhere on the APU so that bandwidth wouldn't need to be shared with the other CUs (hence the L1 cache would not be needed), so that they could potentially use one of the idle CUs as the Tempest CU. The PS4 APU has a secondary 20 GB/s bus that lets the L2 and L1 caches be bypassed (which the PS5 APU likely has as well, for BC reasons).

Pantato · Apr 2, 2020

NaturaNonFacisSaltus said:
I would advise against oversimplifying that much; the SSD's bandwidth and latency are nowhere near DRAM's.

I specifically mentioned that it was from a GPU point of view: what good would the fastest DRAM in the world do if the GPU can only communicate with it at PCIe speed and latency? Of course, from a CPU point of view, it would be suicidal to process data directly from the SSD...

Deleted member 4274 · Apr 2, 2020

Transistor said:
Did you play Titanfall 2? Genuine question.

Yes. I think I know where you're going here. Maybe? Lol

Transistor · Apr 2, 2020

tatsu123 said:
Yes. I think I know where you're going here. Maybe? Lol

Effect and Cause, the level where you were constantly shifting through time? Imagine something like that but on a whole world scale because the data can come in and out of the SSD so fast.

In fact, one of the original concepts for Resident Evil 4 was for Leon to shift between 3 different times / dimensions / realities, but the technology wasn't there at the moment. That vision could be fully realized and then some now.

Ditching the 5400 RPM HDD will be the biggest game changer for games in a long time.

dgrdsv · Apr 2, 2020

chris 1515 said:
Yes, it is a SIMD unit, but the memory model works differently: no cache, and DMA calls with a scratchpad memory. This is what an SPU is, and it works perfectly for hardware.

Most modern GPU SMs can be configured to use their on-chip memory as "scratchpad memory" (called LDS these days) or as a s/w transparent cache.
Again, "SPU" is just a name Sony gave to the SIMD units Cell had. It doesn't really have any unique h/w capabilities which are absent from modern streaming processors i.e. GPU SMs/CUs.

chris 1515 · Apr 2, 2020

Wollan said:
OK, I was just curious whether there was a secondary bus somewhere on the APU so that bandwidth wouldn't need to be shared with the other CUs (hence the L1 cache would not be needed), so that they could potentially use one of the idle CUs as the Tempest CU. The PS4 APU has a secondary 20 GB/s bus that lets the L2 and L1 caches be bypassed (which the PS5 APU likely has as well, for BC reasons).

Yes, I am sure the PS5 has it, in the form of the next version of the Onion bus. In more advanced APUs, AMD replaced this bus with a single bus able to share data.

What I wanted to explain is that Sony decided to spend some die space on 3D audio and the I/O complex. On Xbox Series X, the I/O is not in the SoC. This means they could maybe have had 0.x more teraflops on the GPU, but they decided to sacrifice a little bit of GPU power for this.

Maybe it was a bad decision or maybe it is a good one, only time will tell...

Vimto · Apr 2, 2020

With headphones I've been able to hear enemy footsteps & locate them since the PS3 days, so I don't know why Cerny brought it up like it's a new feature lol

amstradcpc · Apr 2, 2020

Vimto said:
With headphones I've been able to hear enemy footsteps & locate them since the PS3 days, so I don't know why Cerny brought it up like it's a new feature lol

Are you talking about Dolby headphones?

Black_Stride · Apr 2, 2020

Transistor said:
Effect and Cause, the level where you were constantly shifting through time? Imagine something like that but on a whole world scale because the data can come in and out of the SSD so fast.

In fact, one of the original concepts for Resident Evil 4 was for Leon to shift between 3 different times / dimensions / realities, but the technology wasn't there at the moment. That vision could be fully realized and then some now.

Ditching the 5400 RPM HDD will be the biggest game changer for games in a long time.

Just thinking about being able to load in entire new levels borderline instantaneously is blowing my mind.

X-Men Nightcrawler game incoming?
Better yet, give me a game where I play Shimazaki; this is the best example of what I imagine teleportation feels like to the user.

Hey Please · Apr 2, 2020

For those in the know- How does sound processing work on PS4 and how much does it take away from (what is it) 7-core Jag (iirc 1 core is always reserved for OS)?

Vimto · Apr 2, 2020

amstradcpc said:
Are you talking about Dolby headphones?

I don't know, I've had an Astro headset (A40) since 2010!

And I was able to 100% locate the enemy, even if he is one floor above me

chris 1515 · Apr 2, 2020

dgrdsv said:
Most modern GPU SMs can be configured to use their on-chip memory as "scratchpad memory" (called LDS these days) or as a s/w transparent cache.
Again, "SPU" is just a name Sony gave to the SIMD units Cell had. It doesn't really have any unique h/w capabilities which are absent from modern streaming processors i.e. GPU SMs/CUs.

The LDS is smaller than the local memory on a Cell SPU because you also have some cache. Here there is no cache at all, so they can replace it with more SRAM for the scratchpad memory.

If they did this, there is a reason: they probably think, or found, that it is more efficient...

amstradcpc · Apr 2, 2020

Vimto said:
I don't know, I've had an Astro headset (A40) since 2010!

And I was able to 100% locate the enemy, even if he is one floor above me

Well, I have a pair of Dolby headphones with a Dolby decoder and they are great, but what Cerny is talking about is achieving full 360-degree sound positioning with cheap earbuds.

DavidDesu · Apr 2, 2020

Pantato said:
To better understand how much of a revolution the PS5 SSD is, let's look at how things work on a PC.

The graphics card is connected to the main system memory by a PCIe bus; most of us are still using PCIe 3.0 x16, which has a theoretical bandwidth of 16 GB/s.
Game data needs to be transferred from mass storage (HDD or SSD) to main RAM, and then to VRAM via the PCIe bus at a real-world speed of about 13 GB/s.

That's pretty close to the typical 9 GB/s (compressed) from the PS5 SSD into its unified RAM.

So, in PC terms, from the PS5 GPU point of view, it's like it has access to a gigantic 825GB of system RAM. Of course it's a bit of an oversimplification, but you get the idea.

Now, imagine the games that could be done with this amount of RAM!

Yeah it's certainly getting closer to RAM.

NaturaNonFacisSaltus said:
I would advise against oversimplifying that much; the SSD's bandwidth and latency are nowhere near DRAM's.

Sure, I agree, but we're definitely seeing the gap close, and by the generation after next might it not get much, much closer? Right now I'm visualising mass storage as a tank sectioned off from RAM, with relatively narrow pipes feeding the data across. In future I'm visualising more of a really wide pipe: very wide where it meets the RAM and only slightly narrowing as it reaches mass storage, with slower access there, but with everything ultimately not all that far away from the GPU and CPU.

I wonder if there is scope for storage interacting directly with current processes, not involving RAM at all? Let's say you launch a missile in game and it lands several miles away. You can't see it, but the game has saved to storage the fact that you did that, and it places a crater at the site permanently, which will be visible if you visit the area.

dgrdsv · Apr 2, 2020

chris 1515 said:
The LDS is smaller than the local memory on a Cell SPU because you also have some cache.

+16KB register file

So while technically the LDS on SPU is bigger than even on GV100's SM, it's actually smaller if you account for the register file size difference.

TitanicFall · Apr 2, 2020

Vimto said:
I don't know, I've had an Astro headset (A40) since 2010!

And I was able to 100% locate the enemy, even if he is one floor above me

Third party solution vs built-in solution.

chris 1515 · Apr 2, 2020

dgrdsv said:
+16KB register file

So while technically the LDS on SPU is bigger than even on GV100's SM, it's actually smaller if you account for the register file size difference.

I'm talking about the possibility of replacing the cache with more SRAM ;) The Cell is a 2005 CPU; I am sure there are some improvements in this hybrid CU/SPU.

Binabik15 · Apr 2, 2020

Thanks for all the examples! I don't want to throw up a huge quote wall that makes scrolling the thread less pleasing, but I'll listen to everything 😊

chris 1515 · Apr 2, 2020

Binabik15 said:
Thanks for all the examples! I don't want to throw up a huge quote wall that makes scrolling the thread less pleasing, but I'll listen to everything 😊

this one too

Amazing 3D sound tiktok video. Use headphones

#3Dsounds

www.youtube.com

Sklaary · Apr 2, 2020

Digital Foundry on Twitter

“We've been working on this one for a while - a deeper look into the system architecture of PlayStation 5, with more details from Mark Cerny: https://t.co/34epXz5BXg”

twitter.com

Somebody please make a thread!

Chamon · Apr 2, 2020

Vimto said:
With headphones I've been able to hear enemy footsteps & locate them since the PS3 days, so I don't know why Cerny brought it up like it's a new feature lol

Consoles have had this ability for a long time, but the quality is really poor. There is a big difference between a sound you can locate in a "game world" and one you can locate in real life. I hope that next gen closes that gap as much as possible.

Thera · Apr 2, 2020

Transistor said:
Effect and Cause, the level where you were constantly shifting through time? Imagine something like that but on a whole world scale because the data can come in and out of the SSD so fast.

In fact, one of the original concepts for Resident Evil 4 was for Leon to shift between 3 different times / dimensions / realities, but the technology wasn't there at the moment. That vision could be fully realized and then some now.

Ditching the 5400 RPM HDD will be the biggest game changer for games in a long time.

Even recently, in Death Stranding:

Imagine if, instead of an awfully long loading time or a cinematic to get to the WWI part, everything just popped up in the world.

They managed to do that when you are taken down and need to fight a BT, but imagine it with a whole world and level design. The impact would have been completely different (not the boring gunplay :( )

CosmicBolt · Apr 2, 2020

"There's enough power that both CPU and GPU can potentially run at their limits of 3.5GHz and 2.23GHz, it isn't the case that the developer has to choose to run one of them slower."

https://www.eurogamer.net/articles/digitalfoundry-2020-playstation-5-the-mark-cerny-tech-deep-dive
no confusion

Dashful · Apr 2, 2020

CosmicBolt said:
https://www.eurogamer.net/articles/digitalfoundry-2020-playstation-5-the-mark-cerny-tech-deep-dive
no confusion

PlayStation 5 New Details From Mark Cerny: Boost Mode, Tempest Engine, Back Compat + More

It's been a couple of weeks now since Mark Cerny delivered his 'Road to PlayStation 5' presentation. Rich had a chance to talk to the man himself and to pose...

www.youtube.com

Accompanying youtube video.

AndyD · Apr 2, 2020

Thera said:
Even recently, in Death Stranding:

Imagine if, instead of an awfully long loading time or a cinematic to get to the WWI part, everything just popped up in the world.

They managed to do that when you are taken down and need to fight a BT, but imagine it with a whole world and level design. The impact would have been completely different (not the boring gunplay :( )

Yep, Titanfall 2 did the time-shifting areas as well, but it was all interior, carefully controlled environments. They even did a behind-the-scenes tech explanation.

Pheonix · Apr 2, 2020

Can anyone give me layman's rundown of RTRT?

I know it's a lot, so I'll share what I know; any polish around that would be appreciated.

It's split into 3 parts: the BVH, which (I think) isolates the objects to project rays onto; the actual projecting of rays, with each result being referred to as a sample; and then denoising, which is necessary because not as many rays are cast into the scene as needed.

That's all I know...

What I don't know,

I gather that RT can be used in different ways. What are the least expensive to most expensive implementations (barring full-scene RT rendering, obviously)?
How many rays would be sufficient for something like global illumination?
At what part of the render pipeline does RT come in, and in a 16 ms frame time how much of it can go to RT?
What the hell does Nvidia's 10 GigaRays/s actually mean? How many rays does that translate to per pixel and per frame?
Is there a standard to measure GPU RT proficiency?
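
On the GigaRays question, a rough back-of-the-envelope conversion can be sketched like this. It treats the quoted 10 GigaRays/s as a peak figure and simply divides it across pixels and frames; the resolutions and the 60 fps target are illustrative assumptions, and real workloads will land well below this ceiling.

```python
def rays_per_pixel_per_frame(rays_per_second: float,
                             width: int,
                             height: int,
                             fps: float) -> float:
    """Upper bound on rays available per pixel in each frame.

    Assumes the quoted ray rate is fully sustained, which is an
    idealisation; the function name is made up for this sketch.
    """
    return rays_per_second / (width * height * fps)


if __name__ == "__main__":
    peak = 10e9  # the "10 GigaRays/s" marketing figure from the question
    for name, (w, h) in {"1080p": (1920, 1080), "4K": (3840, 2160)}.items():
        budget = rays_per_pixel_per_frame(peak, w, h, fps=60)
        print(f"{name} @ 60 fps: about {budget:.0f} rays per pixel per frame")
```

So even the peak number only buys a couple of dozen rays per pixel at 4K60, which is why denoising matters so much.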

Thera · Apr 2, 2020

AndyD said:
Yep, Titanfall 2 did the time-shifting areas as well, but it was all interior, carefully controlled environments. They even did a behind-the-scenes tech explanation.

Interesting ! I should do it.

NaturaNonFacisSaltus · Apr 2, 2020

Spreewaldgurke said:
Digital Foundry on Twitter

“We've been working on this one for a while - a deeper look into the system architecture of PlayStation 5, with more details from Mark Cerny: https://t.co/34epXz5BXg”

twitter.com

Great article, it shines a light on a lot of unknowns left after the previous presentation.

Belvedere · Apr 2, 2020

Just started the article and I've already seen two different "bespoke" instances.

:P

This is amazing insight into development. And it's easy to forget how active Cerny is with development projects. Good job, DF.

gundamkyoukai · Apr 2, 2020

The new SSD example he gave really does have me wondering what sort of data they will be able to pull on the fly instead of keeping it in RAM.

bcatwilly · Apr 2, 2020

gundamkyoukai said:
The new SSD example he gave really does have me wondering what sort of data they will be able to pull on the fly instead of keeping it in RAM.

That type of talk is definitely exciting for the SSD potential in next generation, but honestly at least in this talk with DF he really didn't cite anything yet regarding SSD use that shouldn't clearly be possible on the Series X SSD based on what they have shared and their talk about use of it as "virtual RAM" with 100GB of game assets at the ready and such. I am not saying that they may not come up with that example/demo at some point, just don't see it here yet.

anexanhume · Apr 2, 2020

A couple choice quotes here:

"So, when I made the statement that the GPU will spend most of its time at or near its top frequency, that is with 'race to idle' taken out of the equation - we were looking at PlayStation 5 games in situations where the whole frame was being used productively. The same is true for the CPU, based on examination of situations where it has high utilisation throughout the frame, we have concluded that the CPU will spend most of its time at its peak frequency."

This is important to show they're not gaming the metric.

Cerny also stresses that power consumption and clock speeds don't have a linear relationship. Dropping frequency by 10 per cent reduces power consumption by around 27 per cent.

This is a cubic relationship: 0.9^3 = 0.729, i.e. power falls to 72.9% of its original value, a reduction of about 27%. That means drastic reductions in power. Downclocks should indeed be minor.
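
To sanity-check that arithmetic, here is a minimal sketch. It assumes power scales with the cube of frequency, which is only a rule of thumb (it roughly holds when voltage scales with clock) and not a measured PS5 characteristic; the function name is made up for illustration.

```python
def relative_power(freq_scale: float, exponent: float = 3.0) -> float:
    """Power relative to baseline when the clock is scaled by freq_scale.

    Assumes power ~ frequency ** exponent (cubic by default).
    """
    return freq_scale ** exponent


if __name__ == "__main__":
    for drop in (0.02, 0.05, 0.10):
        p = relative_power(1.0 - drop)
        print(f"{drop:.0%} lower clock -> {p:.1%} of baseline power "
              f"({1 - p:.1%} saved)")
```

Under that assumption even a couple of percent off the clock frees up a noticeable chunk of the power budget.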

It's an innovative approach, and while the engineering effort that went into it is likely significant, Mark Cerny sums it up succinctly: "One of our breakthroughs was finding a set of frequencies where the hotspot - meaning the thermal density of the CPU and the GPU - is the same. And that's what we've done. They're equivalently easy to cool or difficult to cool - whatever you want to call it."

This is also an important point. Some Intel desktop CPUs run faster than their counterparts without integrated graphics because those idle graphics cores actually act as a heat spreader. If you had uneven thermal density, they'd be contributing to each other's hot spots.

There's likely more to discover about how boost will influence game design. Several developers speaking to Digital Foundry have stated that their current PS5 work sees them throttling back the CPU in order to ensure a sustained 2.23GHz clock on the graphics core. It makes perfect sense as most game engines right now are architected with the low performance Jaguar in mind - even a doubling of throughput (ie 60fps vs 30fps) would hardly tax PS5's Zen 2 cores. However, this doesn't sound like a boost solution, but rather performance profiles similar to what we've seen on Nintendo Switch. "Regarding locked profiles, we support those on our dev kits, it can be helpful not to have variable clocks when optimising. Released PS5 games always get boosted frequencies so that they can take advantage of the additional power," explains Cerny.

Yes, developers already see dropped clocks, but it's intentional. Some PC benchmarking sites tear their hair out trying to interpret benchmark results because of unpredictable boost behavior. This eliminates that.

"All of the game logic created for Jaguar CPUs works properly on Zen 2 CPUs, but the timing of execution of instructions can be substantially different," Mark Cerny tells us. "We worked to AMD to customise our particular Zen 2 cores; they have modes in which they can more closely approximate Jaguar timing. We're keeping that in our back pocket, so to speak, as we proceed with the backwards compatibility work."

This is the subject of several of Cerny's patents.

"GPUs process hundreds or even thousands of wavefronts; the Tempest engine supports two," explains Mark Cerny. "One wavefront is for the 3D audio and other system functionality, and one is for the game. Bandwidth-wise, the Tempest engine can use over 20GB/s, but we have to be a little careful because we don't want the audio to take a notch out of the graphics processing. If the audio processing uses too much bandwidth, that can have a deleterious effect if the graphics processing happens to want to saturate the system bandwidth at the same time."

This is very important. It gives us an upper bound for how much memory bandwidth we would want on a per CU basis. Since the clock is the same as the GPU, PS5's number is 720GB/s.

"As a result, with the GPU if you're getting 40 per cent VALU utilisation, you're doing pretty damn well. By contrast, with the Tempest engine and its asynchronous DMA model, the target is to achieve 100 percent VALU utilisation in key pieces of code."

This shows just how little of a system's teraflops can be realistically used. 40% VALU usage!

gundamkyoukai · Apr 2, 2020

bcatwilly said:
That type of talk is definitely exciting for the SSD potential in next generation, but honestly at least in this talk with DF he really didn't cite anything yet regarding SSD use that shouldn't clearly be possible on the Series X SSD based on what they have shared and their talk about use of it as "virtual RAM" with 100GB of game assets at the ready and such. I am not saying that they may not come up with that example/demo at some point, just don't see it here yet.

Well, all of this is new to devs, so we have to wait and see what happens.

chris 1515 · Apr 2, 2020

The memory bandwidth looks more and more like a big compromise: 20 GB/s for the Tempest engine, probably 40-45 GB/s for the CPU, 9 GB/s for the SSD.

Deleted member 4274 · Apr 2, 2020

Transistor said:
Effect and Cause, the level where you were constantly shifting through time? Imagine something like that but on a whole world scale because the data can come in and out of the SSD so fast.

In fact, one of the original concepts for Resident Evil 4 was for Leon to shift between 3 different times / dimensions / realities, but the technology wasn't there at the moment. That vision could be fully realized and then some now.

Ditching the 5400 RPM HDD will be the biggest game changer for games in a long time.

Thanks so much! As soon as you mentioned Titanfall 2, I understood. After that level I was in awe. Still never finished the game,
But I finished that level! A new soul reaver game on Ps5 would be bad ass!

Edit: this thread is great (sans the weird arguments). I've learned a ton about hardware in general on ERA. I really appreciate you all.

CypherSignal · Apr 2, 2020

Transistor said:
In fact, one of the original concepts for Resident Evil 4 was for Leon to shift between 3 different times / dimensions / realities, but the technology wasn't there at the moment. That vision could be fully realized and then some now.

Ditching the 5400 RPM HDD will be the biggest game changer for games in a long time.

Or, let's say you wanted to faithfully make a Star Wars-style story in a game, which regularly and radically changes out what plot thread, characters, setting, you're focused on every 5-10 minutes and the only interruption is a wipe and two-second-long exposition shot of a planet.

Especially in contrast to, say, a Raiders of the Lost Ark style game where you're basically following Indy for hours on end and almost never deviate from his perspective.

Z-Brownie · Apr 2, 2020

Black_Stride said:
Just thinking about being able to load in entire new levels borderline instantaneously is blowing my mind.

X-Men Nightcrawler game incoming?
Better yet, give me a game where I play Shimazaki; this is the best example of what I imagine teleportation feels like to the user.

I guess your expectations are unrealistic. This could work with current-gen games, but probably not with next-gen ones. It reminds me of being a kid and thinking that "Mortal Kombat graphics are like real life". I'm not trying to compare you to a kid at all, but thinking that game load times will be nonexistent is a bit surreal, IMO.

natestellar · Apr 2, 2020

chris 1515 said:
The memory bandwidth looks more and more like a big compromise: 20 GB/s for the Tempest engine, probably 40-45 GB/s for the CPU, 9 GB/s for the SSD.

Indeed, I wonder how costly those 16/18Gbps chips actually are. Because Github benchmarks in regards to memory bandwidth were promising.

Also, in the video, Richard confirms that Sony aren't using VRS. I wonder why? It's baked into the RDNA2 architecture and offers 10-15% extra performance, rather strange they didn't go with it.

gundamkyoukai · Apr 2, 2020

chris 1515 said:
The memory bandwidth looks more and more like a big compromise: 20 GB/s for the Tempest engine, probably 40-45 GB/s for the CPU, 9 GB/s for the SSD.

Well, I doubt that devs will let it get that high, since sound bandwidth is going to be low for a lot of games, but yeah, I wish they had gone with better RAM chips for more bandwidth.

natestellar said:
Indeed, I wonder how costly those 16/18Gbps chips actually are. Because Github benchmarks in regards to memory bandwidth were promising.

Also, in the video, Richard confirms that Sony aren't using VRS. I wonder why? It's baked into the RDNA2 architecture and offers 10-15% extra performance, rather strange they didn't go with it.

He did not confirm that; he just doesn't know whether they are using it or not.
This just covers what we found out from the GDC talk and had no new info per se.

Black_Stride · Apr 2, 2020

GusZamboni said:
I guess your expectations are unrealistic. This could work with current-gen games, but probably not with next-gen ones. It reminds me of being a kid and thinking that "Mortal Kombat graphics are like real life". I'm not trying to compare you to a kid at all, but thinking that game load times will be nonexistent is a bit surreal, IMO.

Fundamental problem with text discussions... it's hard to convey inflection.
I was speaking in hyperbole.

Physics is still a thing; truly instantaneous loading of an entire game couldn't be possible with anything approaching AAA standards.
But clever game design with the speeds and power we have available could easily let a game like GTA move you across the map from one character to the next very, very quickly.

It's not unrealistic to think that, with the speeds we have, you could have a Portal-esque game where you leave one portal in stage 1 and the other in stage 4 and still be able to move all the way back to stage 1 practically instantaneously.
Keep an instance of stage 1 in the SSD cache and, when the player jumps through the portal, put it back in RAM and put stage 4 in the cache.

And that's just a simple example. So while I'm not saying load times will be eliminated entirely, reducing them, and making stage-to-stage fast travel feel like "I see no perceptible loading", is very possible.
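
The portal swap described above can be sketched as a toy model. Everything here is hypothetical (class name, level names, the 4.5 GB level size); the only number taken from the thread is the roughly 9 GB/s typical compressed throughput, and the load time is an idealised size-over-throughput estimate.

```python
from dataclasses import dataclass


@dataclass
class LevelStreamer:
    """Toy model: one level resident in RAM, one staged in a fast SSD cache."""
    in_ram: str
    in_ssd_cache: str

    def jump_through_portal(self) -> str:
        # Swap the two levels: the staged level becomes resident, and the
        # level the player just left stays staged for the return trip.
        self.in_ram, self.in_ssd_cache = self.in_ssd_cache, self.in_ram
        return self.in_ram


def load_seconds(level_size_gb: float, throughput_gb_s: float) -> float:
    """Idealised time to stream a level of the given size into RAM."""
    return level_size_gb / throughput_gb_s


if __name__ == "__main__":
    streamer = LevelStreamer(in_ram="stage 4", in_ssd_cache="stage 1")
    print("now playing:", streamer.jump_through_portal())
    print("kept staged:", streamer.in_ssd_cache)
    # Hypothetical 4.5 GB level at the ~9 GB/s figure quoted in the thread.
    print(f"swap cost: about {load_seconds(4.5, 9.0):.1f} s")
```

Half a second is short enough to hide behind a portal transition effect, which is the whole point.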

natestellar · Apr 2, 2020

gundamkyoukai said:
Well, I doubt that devs will let it get that high, since sound bandwidth is going to be low for a lot of games, but yeah, I wish they had gone with better RAM chips for more bandwidth.

He did not confirm that; he just doesn't know whether they are using it or not.
This just covers what we found out from the GDC talk and had no new info per se.

Ah ok, he worded that very strangely. I thought he was implying Sony aren't using that feature. Still, would be nice if we got some confirmation on it. It's an important feature and something tailor made for VR titles.

Chris Metal · Apr 2, 2020

Chamon said:
...

8D audio is just someone slowly panning between L and R. It's a cheap gimmick done with post-processing on released songs (I saw another one called 16D, haha). It can sound weird, with a wavy effect, and it annoys the hell out of me that it's catching on, as these aren't 3D-audio-recorded/binaural tracks and often aren't done well.
Here's a few good examples.
interactive(move vid around with cursor):

others:

finally:
Sony's own 360 Reality Audio headphone demo:

Brees2Thomas · Apr 2, 2020

anexanhume said:
This is very important. It gives us an upper bound for how much memory bandwidth we would want on a per CU basis. Since the clock is the same as the GPU, PS5's number is 720GB/s.

Thanks for the breakdown. Question: Can you tell me in layman's terms where you got the 720GB/s number and what is that exactly?

Z-Brownie · Apr 2, 2020

Black_Stride said:
Fundamental problem with text discussions... it's hard to convey inflection.
I was speaking in hyperbole.

Physics is still a thing; truly instantaneous loading of an entire game couldn't be possible with anything approaching AAA standards.
But clever game design with the speeds and power we have available could easily let a game like GTA move you across the map from one character to the next very, very quickly.

It's not unrealistic to think that, with the speeds we have, you could have a Portal-esque game where you leave one portal in stage 1 and the other in stage 4 and still be able to move all the way back to stage 1 practically instantaneously.
Keep an instance of stage 1 in the SSD cache and, when the player jumps through the portal, put it back in RAM and put stage 4 in the cache.

And that's just a simple example. So while I'm not saying load times will be eliminated entirely, reducing them, and making stage-to-stage fast travel feel like "I see no perceptible loading", is very possible.

I agree it is "doable" with that design in mind.

anexanhume · Apr 2, 2020

BlacknGoldBlood said:
Thanks for the breakdown. Question: Can you tell me in layman's terms where you got the 720GB/s number and what is that exactly?

Cerny said that the Tempest Engine is a simplified single CU. He also said that it runs at the same frequency as the GPU, and that it's possible to hit 100% Vector ALU utilization. It can use up to 20 GB/s. Thus, if you have 100% utilization on 36 CUs, that number scales to 720 GB/s. Keep in mind the TE doesn't have local cache, whereas a GPU CU does, so it is a true upper bound.
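
Spelled out as arithmetic, the scaling looks like the sketch below. It assumes every GPU CU would demand the same 20 GB/s as the cache-less Tempest CU at full utilization, which is the extrapolation made in this post rather than anything Sony has stated; the function name is made up for illustration.

```python
def aggregate_bandwidth_gb_s(per_cu_gb_s: float,
                             num_cus: int,
                             valu_utilization: float = 1.0) -> float:
    """Bandwidth demand if each CU scales linearly with VALU utilization.

    Linear scaling and a uniform per-CU figure are simplifying assumptions.
    """
    return per_cu_gb_s * num_cus * valu_utilization


if __name__ == "__main__":
    full = aggregate_bandwidth_gb_s(20.0, 36)          # 100% utilization
    typical = aggregate_bandwidth_gb_s(20.0, 36, 0.4)  # Cerny's "doing well" 40%
    print(f"100% VALU: {full:.0f} GB/s, 40% VALU: {typical:.0f} GB/s")
```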

chris 1515 · Apr 2, 2020

BlacknGoldBlood said:
Thanks for the breakdown. Question: Can you tell me in layman's terms where you got the 720GB/s number and what is that exactly?

This is how much bandwidth the PS5 could use in an ideal world, without cost limitations, where you could fully use the CUs. It does not exist.

In theory, at 40% VALU utilization, you need 288 GB/s for the ALU part of the GPU. The TMUs and ROPs take bandwidth too.

Brees2Thomas · Apr 2, 2020

anexanhume said:
Cerny said that the Tempest Engine is a simplified single CU. He also said that it runs at the same frequency as the GPU, and that it's possible to hit 100% Vector ALU utilization. It can use up to 20 GB/s. Thus, if you have 100% utilization on 36 CUs, that number scales to 720 GB/s. Keep in mind the TE doesn't have local cache, whereas a GPU CU does, so it is a true upper bound.

chris 1515 said:
This is how much bandwidth the PS5 could use in an ideal world, without cost limitations, where you could fully use the CUs. It does not exist.

In theory, at 40% VALU utilization, you need 288 GB/s for the ALU part of the GPU. The TMUs and ROPs take bandwidth too.

Where I'm getting confused is, I thought PS5 only had 448GB/sec total bandwidth. Are you saying the memory bandwidth is actually increasing?

M3rcy · Apr 2, 2020

anexanhume said:
Cerny said that the Tempest Engine is a simplified single CU. He also said that it runs at the same frequency as the GPU, and that it's possible to hit 100% Vector ALU utilization. It can use up to 20 GB/s. Thus, if you have 100% utilization on 36 CUs, that number scales to 720 GB/s. Keep in mind the TE doesn't have local cache, whereas a GPU CU does, so it is a true upper bound.

And at 40% utilization you'd theoretically need 288 GB/s. Of course in real-life it's not that simple.

Edit:Beaten

natestellar · Apr 2, 2020

anexanhume said:
Cerny said that the Tempest Engine is a simplified single CU. He also said that it runs at the same frequency as the GPU, and that it's possible to hit 100% Vector ALU utilization. It can use up to 20 GB/s. Thus, if you have 100% utilization on 36 CUs, that number scales to 720 GB/s. Keep in mind the TE doesn't have local cache, whereas a GPU CU does, so it is a true upper bound.

HBM2 would've solved all their problems, haha.

On a serious note, unless RDNA2 has made substantial gains in regard to memory bandwidth consumption, that bandwidth is going to be a problem for the entirety of the PS5 life cycle.

Also, I don't know what to make of the testing he did in the video on 5700/5700XT, RDNA1 GPUs are notorious for not scaling well with clocks, aren't they? So, only incremental performance over substantial overclocking shouldn't come as a surprise.

M3rcy · Apr 2, 2020

BlacknGoldBlood said:
Where I'm getting confused is, I thought PS5 only had 448GB/sec total bandwidth. Are you saying the memory bandwidth is actually increasing?

No. He's saying to fully satisfy the bandwidth demands at 100% utilization, you'd need memory that performed at that speed.
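
To put the two numbers side by side, here is a rough sketch of where the 448 GB/s mentioned above would run out under the thread's extrapolation. It assumes 20 GB/s per CU at full VALU utilization and linear scaling, and it ignores caches, TMUs, ROPs, the CPU and audio, so treat it as an illustration only; the function name is hypothetical.

```python
def saturating_utilization(available_gb_s: float,
                           per_cu_gb_s: float,
                           num_cus: int) -> float:
    """VALU utilization at which the CUs alone would use all the bandwidth."""
    return available_gb_s / (per_cu_gb_s * num_cus)


if __name__ == "__main__":
    u = saturating_utilization(available_gb_s=448.0,
                               per_cu_gb_s=20.0,
                               num_cus=36)
    print(f"448 GB/s is fully consumed at about {u:.0%} VALU utilization")
```

That lands above the 40% Cerny calls "doing pretty damn well", which is consistent with the memory not getting any faster, just being sized for realistic utilization.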
