Status
Not open for further replies.

Wollan

Mostly Positive
Member
Oct 25, 2017
8,811
Norway but living in France
Why would you want them to modify a CU inside the GPU? That would make the Tempest Engine less efficient, because it would share memory access with the other CUs, and its memory model is very different from theirs. GPUs aren't memory-latency sensitive; audio is.
OK, I was just curious whether there is a secondary bus somewhere on the APU so that bandwidth wouldn't need to be shared with the other CUs (hence no need for the L1 cache), so that they could potentially use one of the idle CUs as the Tempest CU. The PS4 APU has a secondary 20 GB/s bus that lets the L2 and L1 caches be bypassed (which the PS5 APU likely has as well, due to BC concerns).
 

Pantato

Member
Nov 5, 2019
68
I would advise against oversimplifying that much; the SSD's bandwidth and latency are nowhere near those of DRAM.
I specifically mentioned that this is from a GPU point of view: what good would the fastest DRAM in the world do if the GPU can only communicate with it at PCIe speed and latency? Of course, from a CPU point of view, it would be suicidal to process data directly from the SSD...
 

Transistor

Vodka martini, dirty, with Tito's please
Administrator
Oct 25, 2017
37,128
Washington, D.C.
Yes. I think I know where you're going here. Maybe? Lol
Effect and Cause, the level where you were constantly shifting through time? Imagine something like that but on a whole world scale because the data can come in and out of the SSD so fast.

In fact, one of the original concepts for Resident Evil 4 was for Leon to shift between 3 different times / dimensions / realities, but the technology wasn't there at the moment. That vision could be fully realized and then some now.

Ditching the 5400 RPM HDD will be the biggest game changer for games in a long time.
 

dgrdsv

Member
Oct 25, 2017
11,850
Yes, it is a SIMD unit, but the memory model works differently: no cache, and DMA calls with a scratchpad memory. This is what an SPU is, and it works perfectly for hardware.
Most modern GPU SMs can be configured to use their on-chip memory as "scratchpad memory" (called LDS these days) or as a s/w-transparent cache.
Again, "SPU" is just a name Sony gave to the SIMD units Cell had. It doesn't really have any unique h/w capabilities which are absent from modern streaming processors i.e. GPU SMs/CUs.
 

chris 1515

Member
Oct 27, 2017
7,074
Barcelona Spain
OK, I was just curious whether there is a secondary bus somewhere on the APU so that bandwidth wouldn't need to be shared with the other CUs (hence no need for the L1 cache), so that they could potentially use one of the idle CUs as the Tempest CU. The PS4 APU has a secondary 20 GB/s bus that lets the L2 and L1 caches be bypassed (which the PS5 APU likely has as well, due to BC concerns).

Yes, I am sure the PS5 has it, as the next version of the Onion bus. In more advanced APUs, AMD replaced this bus with a single bus able to share data.

What I wanted to explain is that Sony decided to spend some die space on 3D audio and the I/O complex. On Xbox Series X the I/O is not in the SoC. It means they could maybe have had 0.x more teraflops on the GPU, but they decided to sacrifice a little GPU power for this.

Maybe it was a bad decision or maybe it is a good one, only time will tell...
 

Vimto

Member
Oct 29, 2017
3,714
With headphones I've been able to hear enemy footsteps and locate them since the PS3 days, so I don't know why Cerny brought it up like it's a new feature lol
 

Black_Stride

Avenger
Oct 28, 2017
7,388
Effect and Cause, the level where you were constantly shifting through time? Imagine something like that but on a whole world scale because the data can come in and out of the SSD so fast.

In fact, one of the original concepts for Resident Evil 4 was for Leon to shift between 3 different times / dimensions / realities, but the technology wasn't there at the moment. That vision could be fully realized and then some now.

Ditching the 5400 RPM HDD will be the biggest game changer for games in a long time.

Just thinking about being able to load entire new levels borderline instantaneously is blowing my mind.

X-Men Nightcrawler game incoming?
Better yet, give me a game where I play Shimazaki; this is the best example of what I imagine teleportation seems like to the user.



 

Hey Please

Avenger
Oct 31, 2017
22,824
Not America
For those in the know- How does sound processing work on PS4 and how much does it take away from (what is it) 7-core Jag (iirc 1 core is always reserved for OS)?
 

chris 1515

Member
Oct 27, 2017
7,074
Barcelona Spain
Most modern GPU SMs can be configured to use their on-chip memory as "scratchpad memory" (called LDS these days) or as a s/w-transparent cache.
Again, "SPU" is just a name Sony gave to the SIMD units Cell had. It doesn't really have any unique h/w capabilities which are absent from modern streaming processors i.e. GPU SMs/CUs.

The LDS is smaller than the local memory on a CELL SPU because you have some cache as well. Here there is no cache at all, so they can replace it with more SRAM for the scratchpad memory.

If they did this, there is a reason: they probably found it to be more efficient...
 

amstradcpc

Member
Oct 27, 2017
1,768
I dont know, I had Astro headset (A40) since 2010!

And I was able to 100% locate the enemy, even if he is one floor above me
Well, I have a pair of Dolby headphones with a Dolby decoder and they are great, but what Cerny is talking about is achieving a full 360-degree sound origin with cheap earbuds.
 

DavidDesu

Banned
Oct 29, 2017
5,718
Glasgow, Scotland
To better understand how much of a revolution the PS5 SSD is, let's look at how things work on a PC.

The graphics card is connected to the main system memory by a PCIe bus; most of us are still using PCIe 3.0 x16, which has a theoretical bandwidth of 16 GB/s.
Game data needs to be transferred from mass storage (HDD or SSD) to main RAM, and then to VRAM over the PCIe bus at a real-world speed of about 13 GB/s.

That's pretty close to the typical 9 GB/s (compressed) from the PS5 SSD into its unified RAM.

So, in PC terms, from the PS5 GPU point of view, it's like it has access to a gigantic 825GB of system RAM. Of course it's a bit of an oversimplification, but you get the idea.

Now, imagine the games that could be done with this amount of RAM!

Yeah it's certainly getting closer to RAM.
I would advise against oversimplifying that much; the SSD's bandwidth and latency are nowhere near those of DRAM.
While sure, I agree, we're definitely seeing the gap close, and by the generation after next might it not get so much closer still? Right now I'm visualising mass storage as a tank sectioned off from RAM, with relatively narrow pipes feeding the data across. In future I'm visualising more of a really wide pipe, very wide where it meets the RAM and only slightly narrowing as it reaches mass storage, with slower access there, but everything ultimately not all that far away from the GPU and CPU.

I wonder is there scope for storage interacting directly with current processes, not involving RAM at all? Let's say you launch a missile in game and it lands several miles away. You can't see it but the game has saved to storage the fact you have done that and places a crater at the site forever in memory which if you visit the area will be visible.
 

dgrdsv

Member
Oct 25, 2017
11,850
The LDS size is tinier than the local memory on a CELL SPU because you have some cache.
[Image: programming-the-ps3-4-728.jpg]

+16KB register file

[Image: main-qimg-1a0e0df6c8a9bc250c78019bee640853]


So while technically the LDS on SPU is bigger than even on GV100's SM, it's actually smaller if you account for the register file size difference.
 

Binabik15

Member
Oct 28, 2017
4,601
Thanks for all the examples! I don't want to throw up a huge quote wall that makes scrolling the thread less pleasing, but I'll listen to everything 😊
 

Sklaary

Member
Mar 21, 2020
546
twitter.com

Digital Foundry on Twitter

“We've been working on this one for a while - a deeper look into the system architecture of PlayStation 5, with more details from Mark Cerny: https://t.co/34epXz5BXg”

Somebody please make a thread!
 

Chamon

Member
Feb 26, 2019
1,221
With headphones I've been able to hear enemy footsteps and locate them since the PS3 days, so I don't know why Cerny brought it up like it's a new feature lol
Consoles have had this ability for a long time, but the quality is really poor. There is a big difference between a sound you can locate in a "game world" and one you can locate in real life. I hope that next gen closes that gap as much as possible.
 

Thera

Banned
Feb 28, 2019
12,876
France
Effect and Cause, the level where you were constantly shifting through time? Imagine something like that but on a whole world scale because the data can come in and out of the SSD so fast.

In fact, one of the original concepts for Resident Evil 4 was for Leon to shift between 3 different times / dimensions / realities, but the technology wasn't there at the moment. That vision could be fully realized and then some now.

Ditching the 5400 RPM HDD will be the biggest game changer for games in a long time.
Even recently, in Death Stranding:
Imagine if, instead of an awfully long loading time or cinematic to get to the WWI part, everything popped up in the world.
They managed to do that when you are taken down and need to fight a BT, but not for a whole world and its level design. The impact would have been completely different (not the boring gunplay :( )
 

Dashful

Community Resettler
Member
Oct 25, 2017
2,399
Canada

AndyD

Mambo Number PS5
Member
Oct 27, 2017
8,602
Nashville
Even recently, in Death Stranding:
Imagine if, instead of an awfully long loading time or cinematic to get to the WWI part, everything popped up in the world.
They managed to do that when you are taken down and need to fight a BT, but not for a whole world and its level design. The impact would have been completely different (not the boring gunplay :( )
Yep, Titanfall 2 did the time-shifting areas as well, but it was all interior, carefully controlled environments. They even did a behind-the-scenes tech explanation.
 

Pheonix

Banned
Dec 14, 2018
5,990
St Kitts
Can anyone give me layman's rundown of RTRT?

I know it's a lot, so I'll share what I know; any polish around that would be appreciated.

It's split into 3 parts: BVH, which (I think) isolates the objects to project rays onto; the actual projecting of rays, with each result being referred to as a sample; and denoising, which is necessary because not as many rays are cast into the scene as needed.

That's all I know...

What I don't know,
  • I gather that RT can be used in different ways; what's the least expensive to most expensive implementation? (barring full-scene RT rendering, obviously)
  • How many rays would be sufficient for something like global illumination?
  • At what point in the render pipeline does RT come in, and how much of a 16 ms frame time can go to it?
  • What the hell does Nvidia's 10 GigaRays/s actually mean? How many rays does that translate to per pixel and per frame????
  • Is there a standard to measure GPU RT proficiency?
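
For the GigaRays bullet, the raw division at least is easy to sketch. This assumes the figure simply means rays cast per second (an assumption, not Nvidia's definition), and the resolutions and frame rate below are picked purely for illustration; the function name is hypothetical too.

```python
def rays_per_pixel_per_frame(rays_per_second, width, height, fps):
    """Average ray budget per pixel in one frame.

    Assumes every ray in the quoted rate is available to the frame,
    which a real renderer would not achieve.
    """
    pixels_per_second = width * height * fps
    return rays_per_second / pixels_per_second


# Illustrative targets only: 4K and 1080p at 60 fps.
for name, (w, h) in {"4K": (3840, 2160), "1080p": (1920, 1080)}.items():
    budget = rays_per_pixel_per_frame(10e9, w, h, 60)
    print(f"{name} @ 60 fps: about {budget:.1f} rays per pixel per frame")
```

This is only a ceiling; it says nothing about BVH build cost, shading, or denoising.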
 

Belvedere

Member
Oct 27, 2017
2,683
Just started the article and I've already seen two different "bespoke" instances.

:P

This is amazing insight into development. And it's easy to forget how active Cerny is with development projects. Good job, DF.
 

gundamkyoukai

Member
Oct 25, 2017
21,105
The new SSD example he gave really does have me wondering what sort of data they will be able to pull on the fly instead of keeping it in RAM.
 

bcatwilly

Member
Oct 27, 2017
2,483
User Banned (3 Days): Ignoring the staff post
The new SSD example he gave really does have me wondering what sort of data they will be able to pull on the fly instead of keeping it in RAM.

That type of talk is definitely exciting for the SSD potential in next generation, but honestly at least in this talk with DF he really didn't cite anything yet regarding SSD use that shouldn't clearly be possible on the Series X SSD based on what they have shared and their talk about use of it as "virtual RAM" with 100GB of game assets at the ready and such. I am not saying that they may not come up with that example/demo at some point, just don't see it here yet.
 

anexanhume

Member
Oct 25, 2017
12,913
Maryland
A couple choice quotes here:

"So, when I made the statement that the GPU will spend most of its time at or near its top frequency, that is with 'race to idle' taken out of the equation - we were looking at PlayStation 5 games in situations where the whole frame was being used productively. The same is true for the CPU, based on examination of situations where it has high utilisation throughout the frame, we have concluded that the CPU will spend most of its time at its peak frequency."
This is important to show they're not gaming the metric.

Cerny also stresses that power consumption and clock speeds don't have a linear relationship. Dropping frequency by 10 per cent reduces power consumption by around 27 per cent.
This is a cubic relationship: 0.9^3 = 0.729, so a 10 per cent lower clock leaves about 72.9% of the power, a reduction of roughly 27%. That means drastic reductions in power. Downclocks should indeed be minor.
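
A quick sketch of that arithmetic, assuming power scales with the cube of frequency. The cubic model is only an approximation inferred from the quoted 10 per cent / 27 per cent figures, not something Sony has published, and the function names are made up for illustration.

```python
def relative_power(freq_scale, exponent=3.0):
    """Power relative to baseline when the clock is scaled by freq_scale.

    Assumes power ~ frequency ** exponent (cubic by default).
    """
    return freq_scale ** exponent


def clock_drop_for_power_saving(saving, exponent=3.0):
    """Fractional clock reduction needed to cut power by `saving` (0..1)."""
    return 1.0 - (1.0 - saving) ** (1.0 / exponent)


for drop_pct in (2, 5, 10):
    saved = 1.0 - relative_power(1.0 - drop_pct / 100)
    print(f"{drop_pct}% lower clock -> about {saved:.1%} less power")

# Going the other way: how far must the clock fall to shed 10% of power?
print(f"10% less power needs about {clock_drop_for_power_saving(0.10):.1%} lower clock")
```

Under this model, small downclocks buy back a disproportionate amount of power.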

It's an innovative approach, and while the engineering effort that went into it is likely significant, Mark Cerny sums it up succinctly: "One of our breakthroughs was finding a set of frequencies where the hotspot - meaning the thermal density of the CPU and the GPU - is the same. And that's what we've done. They're equivalently easy to cool or difficult to cool - whatever you want to call it."

This is also an important point. Some Intel desktop CPUs run faster than their integrated graphics-less counterparts because those idle graphics cores actually act as a thermal spreader. If you had uneven thermal density, they'd be contributing to each other's hot spots.

There's likely more to discover about how boost will influence game design. Several developers speaking to Digital Foundry have stated that their current PS5 work sees them throttling back the CPU in order to ensure a sustained 2.23GHz clock on the graphics core. It makes perfect sense as most game engines right now are architected with the low performance Jaguar in mind - even a doubling of throughput (ie 60fps vs 30fps) would hardly tax PS5's Zen 2 cores. However, this doesn't sound like a boost solution, but rather performance profiles similar to what we've seen on Nintendo Switch. "Regarding locked profiles, we support those on our dev kits, it can be helpful not to have variable clocks when optimising. Released PS5 games always get boosted frequencies so that they can take advantage of the additional power," explains Cerny.
Yes, developers already see dropped clocks, but it's intentional. Some PC benchmarking sites tear their hair out trying to interpret benchmark results because of unpredictable boost behavior. This eliminates that.

"All of the game logic created for Jaguar CPUs works properly on Zen 2 CPUs, but the timing of execution of instructions can be substantially different," Mark Cerny tells us. "We worked with AMD to customise our particular Zen 2 cores; they have modes in which they can more closely approximate Jaguar timing. We're keeping that in our back pocket, so to speak, as we proceed with the backwards compatibility work."
This is the subject of several of Cerny's patents.

"GPUs process hundreds or even thousands of wavefronts; the Tempest engine supports two," explains Mark Cerny. "One wavefront is for the 3D audio and other system functionality, and one is for the game. Bandwidth-wise, the Tempest engine can use over 20GB/s, but we have to be a little careful because we don't want the audio to take a notch out of the graphics processing. If the audio processing uses too much bandwidth, that can have a deleterious effect if the graphics processing happens to want to saturate the system bandwidth at the same time."
This is very important. It gives us an upper bound for how much memory bandwidth we would want on a per-CU basis. Since the clock is the same as the GPU's, scaling 20 GB/s across PS5's 36 CUs gives 720 GB/s.

"As a result, with the GPU if you're getting 40 per cent VALU utilisation, you're doing pretty damn well. By contrast, with the Tempest engine and its asynchronous DMA model, the target is to achieve 100 percent VALU utilisation in key pieces of code."
This shows just how little of a system's teraflops can be realistically used. 40% VALU usage!
 

gundamkyoukai

Member
Oct 25, 2017
21,105
That type of talk is definitely exciting for the SSD potential in next generation, but honestly at least in this talk with DF he really didn't cite anything yet regarding SSD use that shouldn't clearly be possible on the Series X SSD based on what they have shared and their talk about use of it as "virtual RAM" with 100GB of game assets at the ready and such. I am not saying that they may not come up with that example/demo at some point, just don't see it here yet.

Well, all of this is new to devs, so we have to wait and see what happens.
 

Deleted member 4274

User requested account closure
Banned
Oct 25, 2017
3,435
Effect and Cause, the level where you were constantly shifting through time? Imagine something like that but on a whole world scale because the data can come in and out of the SSD so fast.

In fact, one of the original concepts for Resident Evil 4 was for Leon to shift between 3 different times / dimensions / realities, but the technology wasn't there at the moment. That vision could be fully realized and then some now.

Ditching the 5400 RPM HDD will be the biggest game changer for games in a long time.

Thanks so much! As soon as you mentioned Titanfall 2, I understood. After that level I was in awe. I still never finished the game,
but I finished that level! A new Soul Reaver game on PS5 would be badass!

Edit: this thread is great (sans the weird arguments). I've learned a ton about hardware in general on ERA. I really appreciate you all.
 

CypherSignal

Member
Oct 25, 2017
1,065
In fact, one of the original concepts for Resident Evil 4 was for Leon to shift between 3 different times / dimensions / realities, but the technology wasn't there at the moment. That vision could be fully realized and then some now.

Ditching the 5400 RPM HDD will be the biggest game changer for games in a long time.
Or, let's say you wanted to faithfully make a Star Wars-style story in a game, which regularly and radically changes out what plot thread, characters, setting, you're focused on every 5-10 minutes and the only interruption is a wipe and two-second-long exposition shot of a planet.

Especially in contrast to, say, a Raiders of the Lost Ark style game where you're basically following Indy for hours on end and almost never deviate from his perspective.
 

Z-Brownie

Member
Nov 6, 2017
3,907
Just thinking about being able to load entire new levels borderline instantaneously is blowing my mind.

X-Men Nightcrawler game incoming?
Better yet, give me a game where I play Shimazaki; this is the best example of what I imagine teleportation seems like to the user.





I guess your expectations are unrealistic. This could work with current-gen games, but probably not with next gen. It reminds me of being a kid and thinking that "Mortal Kombat graphics are like real life". I'm not trying to compare you to a kid at all, but thinking that game load times will be nonexistent is a bit surreal imo.
 

natestellar

Member
Sep 16, 2018
835
Memory bandwidth looks more and more like a big compromise: 20 GB/s for the Tempest engine, probably 40-45 GB/s for the CPU, 9 GB/s for the SSD.

Indeed, I wonder how costly those 16/18Gbps chips actually are. Because Github benchmarks in regards to memory bandwidth were promising.

Also, in the video, Richard confirms that Sony aren't using VRS. I wonder why? It's baked into the RDNA2 architecture and offers 10-15% extra performance, rather strange they didn't go with it.
 

gundamkyoukai

Member
Oct 25, 2017
21,105
Memory bandwidth looks more and more like a big compromise: 20 GB/s for the Tempest engine, probably 40-45 GB/s for the CPU, 9 GB/s for the SSD.

Well, I doubt that devs will let it get that high, since sound will be low for a lot of games, but yeah, I wish they had gone with better RAM chips for more bandwidth.

Indeed, I wonder how costly those 16/18Gbps chips actually are. Because Github benchmarks in regards to memory bandwidth were promising.

Also, in the video, Richard confirms that Sony aren't using VRS. I wonder why? It's baked into the RDNA2 architecture and offers 10-15% extra performance, rather strange they didn't go with it.

He did not confirm that; he just doesn't know whether they are using it or not.
This just covers what we found out from GDC and had no new info per se.
 

Black_Stride

Avenger
Oct 28, 2017
7,388
I guess your expectations are unrealistic. This could work with current-gen games, but probably not with next gen. It reminds me of being a kid and thinking that "Mortal Kombat graphics are like real life". I'm not trying to compare you to a kid at all, but thinking that game load times will be nonexistent is a bit surreal imo.

The fundamental problem with text discussions... it's hard to convey inflection.
I was speaking in hyperbole.

Physics is still a thing; actually instantaneous loading of an entire game couldn't be possible with anything approaching AAA standards.
But clever game design, with the speeds and power we have available, could easily let a game like GTA move you across the map from one character to the next very, very quickly.


It's not unrealistic to think that, with the speeds we have, you could have a Portal-esque game where you leave one portal in stage 1 and the other in stage 4 and still be able to move all the way back to stage 1 practically instantaneously.
Keep an instance of stage 1 in the SSD cache and, when the player jumps through the portal, put it back in RAM and put stage 4 in the cache.

And that's just a simple example. So while I'm not saying load times will be eliminated entirely, reducing them so that stage-to-stage fast travel gets a reaction of "I see no perceptible loading" is very possible.
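
The portal idea can be sketched as a toy swap between RAM and an SSD-side cache. This is a simplification with a hypothetical `LevelStreamer` class; it tracks level names only and is not how any real engine or the PS5 I/O stack is known to work.

```python
class LevelStreamer:
    """Toy model: one level resident in RAM, others parked in a fast SSD cache."""

    def __init__(self, start_level):
        self.in_ram = start_level
        self.ssd_cache = set()

    def enter_portal(self, destination):
        # Park the level we are leaving so the return trip is cheap.
        self.ssd_cache.add(self.in_ram)
        # If the destination was parked earlier, take it out of the cache.
        self.ssd_cache.discard(destination)
        self.in_ram = destination


streamer = LevelStreamer("stage 1")
streamer.enter_portal("stage 4")   # stage 1 parked, stage 4 now in RAM
streamer.enter_portal("stage 1")   # swap back: stage 4 parked, stage 1 in RAM
print(streamer.in_ram, sorted(streamer.ssd_cache))
```

Whether the swap feels instant would depend on how much data a stage really needs resident, which this sketch ignores.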
 

natestellar

Member
Sep 16, 2018
835
Well, I doubt that devs will let it get that high, since sound will be low for a lot of games, but yeah, I wish they had gone with better RAM chips for more bandwidth.



He did not confirm that; he just doesn't know whether they are using it or not.
This just covers what we found out from GDC and had no new info per se.

Ah ok, he worded that very strangely. I thought he was implying Sony aren't using that feature. Still, would be nice if we got some confirmation on it. It's an important feature and something tailor made for VR titles.
 

Chris Metal

Avatar Master Painter
Member
Oct 25, 2017
2,582
United Kingdom
8D audio is just someone slowly panning between L and R. It's a cheap gimmick done with post-processing on released songs (I saw another one labelled 16D, haha). It can sound weird, with a wavy effect, and it annoys the hell out of me that it's catching on, as these aren't 3D-audio-recorded/binaural tracks and often aren't done well.
Here's a few good examples.
interactive(move vid around with cursor):



others:




finally:
Sony's own 360 Reality Audio headphone demo:
 

Z-Brownie

Member
Nov 6, 2017
3,907
The fundamental problem with text discussions... it's hard to convey inflection.
I was speaking in hyperbole.

Physics is still a thing; actually instantaneous loading of an entire game couldn't be possible with anything approaching AAA standards.
But clever game design, with the speeds and power we have available, could easily let a game like GTA move you across the map from one character to the next very, very quickly.


It's not unrealistic to think that, with the speeds we have, you could have a Portal-esque game where you leave one portal in stage 1 and the other in stage 4 and still be able to move all the way back to stage 1 practically instantaneously.
Keep an instance of stage 1 in the SSD cache and, when the player jumps through the portal, put it back in RAM and put stage 4 in the cache.

And that's just a simple example. So while I'm not saying load times will be eliminated entirely, reducing them so that stage-to-stage fast travel gets a reaction of "I see no perceptible loading" is very possible.

I agree it's "doable" with that design in mind.
 

anexanhume

Member
Oct 25, 2017
12,913
Maryland
Thanks for the breakdown. Question: Can you tell me in layman's terms where you got the 720GB/s number and what is that exactly?
Cerny said that the Tempest Engine is a simplified single CU. He also said that it runs at the same frequency as the GPU, and that it's possible to hit 100% Vector ALU utilization. It can use up to 20 GB/s. Thus, if you have 100% utilization on 36 CUs, that number scales to 720 GB/s. Keep in mind the TE doesn't have local cache, whereas a GPU CU does, so it is a true upper bound.
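
Spelled out as a few lines of arithmetic, assuming bandwidth demand scales linearly with CU count and with VALU utilisation (a simplification; the helper name is made up for illustration):

```python
TEMPEST_BANDWIDTH_GBPS = 20   # "over 20GB/s" for the single CU-like Tempest engine
GPU_CU_COUNT = 36             # PS5 GPU compute units


def scaled_bandwidth(per_cu_gbps, cu_count, valu_utilisation=1.0):
    """Naive linear scaling of per-CU bandwidth demand.

    Ignores the caches a real GPU CU has, so it overestimates.
    """
    return per_cu_gbps * cu_count * valu_utilisation


full = scaled_bandwidth(TEMPEST_BANDWIDTH_GBPS, GPU_CU_COUNT)
typical = scaled_bandwidth(TEMPEST_BANDWIDTH_GBPS, GPU_CU_COUNT, 0.40)
print(f"100% VALU utilisation: {full:.0f} GB/s")
print(f"40% VALU utilisation:  {typical:.0f} GB/s")
```

The 40 per cent case uses the VALU figure from the Cerny quote; neither number is a measured requirement.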
 

chris 1515

Member
Oct 27, 2017
7,074
Barcelona Spain
Thanks for the breakdown. Question: Can you tell me in layman's terms where you got the 720GB/s number and what is that exactly?

This is how much bandwidth the PS5 could use in an ideal world, without cost limitations and where you could fully use the CUs. It does not exist.

In theory, at 40% VALU utilization you need 288 GB/s for the ALU part of the GPU. The TMUs and ROPs take bandwidth too.
 

Brees2Thomas

Member
Dec 27, 2019
1,525
Cerny said that the Tempest Engine is a simplified single CU. He also said that it runs at the same frequency as the GPU, and that it's possible to hit 100% Vector ALU utilization. It can use up to 20 GB/s. Thus, if you have 100% utilization on 36 CUs, that number scales to 720 GB/s. Keep in mind the TE doesn't have local cache, whereas a GPU CU does, so it is a true upper bound.
This is how much bandwidth the PS5 could use in an ideal world, without cost limitations and where you could fully use the CUs. It does not exist.

In theory, at 40% VALU utilization you need 288 GB/s for the ALU part of the GPU. The TMUs and ROPs take bandwidth too.
Where I'm getting confused is, I thought PS5 only had 448GB/sec total bandwidth. Are you saying the memory bandwidth is actually increasing?
 

M3rcy

Member
Oct 27, 2017
702
Cerny said that the Tempest Engine is a simplified single CU. He also said that it runs at the same frequency as the GPU, and that it's possible to hit 100% Vector ALU utilization. It can use up to 20 GB/s. Thus, if you have 100% utilization on 36 CUs, that number scales to 720 GB/s. Keep in mind the TE doesn't have local cache, whereas a GPU CU does, so it is a true upper bound.

And at 40% utilization you'd theoretically need 288 GB/s. Of course in real-life it's not that simple.

Edit:Beaten
 

natestellar

Member
Sep 16, 2018
835
Cerny said that the Tempest Engine is a simplified single CU. He also said that it runs at the same frequency as the GPU, and that it's possible to hit 100% Vector ALU utilization. It can use up to 20 GB/s. Thus, if you have 100% utilization on 36 CUs, that number scales to 720 GB/s. Keep in mind the TE doesn't have local cache, whereas a GPU CU does, so it is a true upper bound.

HBM2 would've solved all their problems, haha.

On a serious note, unless RDNA2 has made substantial gains in regard to memory consumption, that bandwidth is gonna present a problem for the entirety of the PS5 life cycle.

Also, I don't know what to make of the testing he did in the video on 5700/5700XT, RDNA1 GPUs are notorious for not scaling well with clocks, aren't they? So, only incremental performance over substantial overclocking shouldn't come as a surprise.
 