
nib95

Contains No Misinformation on Philly Cheesesteaks
Banned
Oct 28, 2017
18,498
No, both top speeds can be maintained as long as you're not surpassing the total power budget. Cerny explained this clearly; he was not equivocal. He said he expects both the CPU and GPU to run at their peak frequencies most of the time.
The issue is how much power the CPU and GPU are dissipating together, not their running frequencies. There is a cap on total power consumption.
Power consumption is a function of clock speed, but more importantly of the instructions (workload) you're asking a processor to execute at that speed. Asking a CPU to increment a register by 1 doesn't take the same resources, and hence doesn't dissipate the same power, as asking it to do a complex floating-point calculation.
There is a workload monitor that decides to downclock either the CPU or GPU or both (depending on a priority scheme) only if the total power consumption budget has been reached.
This is pretty much what Cerny said until more details come out.

This.

Hence I concluded my earlier post above with the following.

nib95 said:
Now of course, perhaps Cerny minced his words, but I'm not so sure based on the language and everything else he said. My guess is that hitting max frequencies on both the CPU and GPU simultaneously is not enough to hit the set power limit; instead it's the types of tasks or instructions being carried out (some are more power hungry than others) that is likely the limiting factor. But I could be wrong.


AMD SmartShift transferring power to and from the GPU and CPU where required for efficiency or productivity gains is not the same thing as both the GPU and CPU being able to hit max frequencies where the workload or tasks require it. SmartShift in this scenario would recognise that max frequencies were required for said instructions and allow it.

In this situation (which Cerny implies is uncommon), the limiting factor based on what he's explained wouldn't be simultaneous max clocks on both the CPU and GPU, but as you've explained and Cerny details elsewhere, their maximum set power threshold instead.

Based on what Cerny said, it doesn't seem like simultaneous max frequency clocks are enough on their own to hit the PS5's power limit or cap; rather, it's the work the CPU/GPU are actually carrying out that matters, as some workloads or instructions are more power hungry than others, even at max clocks.
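To make the mechanism described above concrete, here's a minimal sketch in Python of a power-budget governor of the kind Cerny outlines: clocks sit at their caps unless the modelled power of the current workload exceeds the budget, at which point one block is shaved according to a priority scheme. Everything here apart from the 2230MHz/3500MHz caps (the budget figure, the coefficients, the priority order and step sizes) is an assumption for illustration, not anything Sony has published.

```python
# Illustrative sketch of an activity-based power governor; all coefficients,
# the budget, and the priority scheme are made up for illustration.

GPU_MAX_MHZ = 2230
CPU_MAX_MHZ = 3500
POWER_BUDGET_W = 200          # assumed total SoC budget, not an official figure

def estimated_power(cpu_mhz, cpu_activity, gpu_mhz, gpu_activity):
    """Model power as frequency x workload intensity (0.0-1.0) per block."""
    cpu_w = 0.02 * cpu_mhz * cpu_activity     # assumed coefficients
    gpu_w = 0.06 * gpu_mhz * gpu_activity
    return cpu_w + gpu_w

def pick_clocks(cpu_activity, gpu_activity, prioritize="gpu"):
    """Start at the frequency caps and shave the lower-priority block
    in small steps only while the modelled power exceeds the budget."""
    cpu_mhz, gpu_mhz = CPU_MAX_MHZ, GPU_MAX_MHZ
    while estimated_power(cpu_mhz, cpu_activity, gpu_mhz, gpu_activity) > POWER_BUDGET_W:
        if prioritize == "gpu" and cpu_mhz > 3000:
            cpu_mhz -= 50          # give up CPU clock first
        else:
            gpu_mhz -= 10
    return cpu_mhz, gpu_mhz

# Light workload: both blocks stay at their caps.
print(pick_clocks(cpu_activity=0.4, gpu_activity=0.6))
# Power-hungry instruction mix on both blocks: something comes down a little.
print(pick_clocks(cpu_activity=1.0, gpu_activity=1.0))
```

The point the sketch tries to capture is that frequency only drops when the instruction mix pushes modelled power past the cap, not simply because both blocks are at max clock.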
 
Last edited:

TheRealTalker

Member
Oct 25, 2017
21,494
Old threads flashbacks activated
 

beta

Member
Dec 31, 2019
176
That's not how it works. The XSX has 6 2GB chips and 4 1GB chips, for a 320-bit bus in total. To access the 10GB at 560GB/s you use the first GB of each of the 6 2GB chips plus the 4 1GB chips; the remaining 6GB on the 2GB chips is the part with the lower bandwidth.

I understand it this way, sorry for my bad English.



Edit: I am not an expert and maybe this is a dumb thought.

I was thinking: even when the 336GB/s "pool" is used for things that don't need a lot of bandwidth, since the bus of this "pool" is shared with the 560GB/s pool, if the CPU/GPU has to read from this part of memory, won't that heavily affect the 560GB/s pool when you need to access both at the same time? Can someone explain this?

There is no simultaneous access; access will be shared between the two, probably using some sort of queue-based system on the memory controller where the CPU gets priority to keep latency down.

If an access is made to the slower pool then the bus runs at that effective rate until the transaction is done. Same with the faster pool.
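A toy model of that kind of arbitration, in Python. This is entirely hypothetical; neither console maker has detailed the memory controller's policy, and the queue names and priority rule here are just the "CPU first, to keep latency down" idea from the post above.

```python
from collections import deque

# Hypothetical arbiter: one shared bus, CPU requests served before GPU ones.
cpu_queue, gpu_queue = deque(), deque()

def submit(queue, request):
    queue.append(request)

def next_transaction():
    """One bus slot per call; the CPU queue is drained first to keep latency down."""
    if cpu_queue:
        return ("cpu", cpu_queue.popleft())
    if gpu_queue:
        return ("gpu", gpu_queue.popleft())
    return None

submit(gpu_queue, "texture fetch")
submit(cpu_queue, "game logic read")
submit(gpu_queue, "vertex buffer read")

while True:
    txn = next_transaction()
    if txn is None:
        break
    print(txn)   # the CPU read is serviced first, then the two GPU reads in order
```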
 

zeuanimals

Member
Nov 23, 2017
1,454
AMD SmartShift does work like that (to reduce wasted power, increase power and load efficiency, etc.), but that has nothing to do with simultaneous peak frequency clocks in the non-typical event where they are actually required. You're conflating two different things.

And it doesn't really matter how many times you watched it if your language comprehension limits your understanding of what he said. The transcript of what he said is literally quoted above, and my breakdown covers why what you're saying isn't true except in the event Cerny simply misspoke.

The only reason the clockspeed would drop is if there's a sudden significant load, right? And only then will the CPU downclock so the GPU can get back to normal speeds?
 

Crayon

Member
Oct 26, 2017
15,580
I have the feeling that if I could better grasp his example of the Horizon map screen, I could get this power budget thing. So far, from what I can tell, it's mostly a thing to do with cooling. That was the context he was talking about it in. I think. Maybe I'll watch it again.
 

icecold1983

Banned
Nov 3, 2017
4,243
Again, that comes down to blunt power, not definitive "superiority" for development or performance.

The PS5 GPU has a bunch of unique built-in instruments that may vastly improve performance, which the XSX doesn't have (nor anyone else, for that matter). For example, an entire chip dedicated to on-the-fly optimisations (e.g. culling all polys on reverse surfaces and all polys that are side-on, combining polys that are similar, etc.). There was a whole other chip dedicated to similar optimisations, but I forget what it was. The Xbox Series X has no such in-built GPU systems, IIRC. It has impressive instruments, but they were more about providing options for users and developers rather than optimising the actual workload (e.g. working HDR on BC games (definitely cool) and also offering a raytracing tool).

Indeed, it's still up in the air whether only first party exclusive developers will make use of these really cool GPU instruments Sony developed. But from how Cerny described them, it sounds almost like it would simply be a "switch" that developers can flip as long as their assets/architecture can be read by the GPU. And if it works, and is that easy, it'll immediately make a ton of graphical load and performance optimisations automatically.

Perhaps that'll make up for that 15% raw GPU power difference. Perhaps not. We have to wait and see.

But again, the clear message from tech and development spheres is that "raw power" does not mean "superior", be it in GPU, CPU, or what have you.

What in the world? The PS5 doesn't have any unique chips or instruments to improve GPU performance. There is no chip dedicated to on-the-fly optimizations. How would a chip like that even function? I think you're confusing what Cerny dubbed the geometry engine. That's a standard part of every GPU, more commonly called the "front end". It's not a separate chip. The exact same geometry engine will be in the Xbox as well as all PC AMD GPUs based on RDNA 2. It just handles the setup of triangles before shading is performed. The only unique chip Cerny announced was the audio engine, but that is specifically for HRTF; general sound processing will still have to be done on the CPU. To summarize, the only advantage the PS5 has over the Xbox is a faster SSD. In every other way the Xbox is flat out better.
 

Dictator

Digital Foundry
Verified
Oct 26, 2017
4,931
Berlin, 'SCHLAND
Yeah I figured it would most likely be for software not yet coded to make use of many threads. Almost seems like a marketing thing.
It is for titles that are not n-wide threaded and do not use a very n-wide job system; a lot of titles do not do that. A number of engines still just target 7-8 threads.
It is not a marketing tactic, just a choice for developers whose engines were designed around XB1 and PS4 instead of next-gen PCs.
 

icecold1983

Banned
Nov 3, 2017
4,243
The PS5 also has a faster clock speed for the GPU, by a reasonable margin too.

There's a lot we still don't know about the consoles though, and there are also many differences where it's not clear which approach will end up being better.

The faster clock, of which we don't even know what's sustained under load, is more than offset by the far fewer CUs. It's completely outclassed when it comes to everything outside the SSD. Why is this even being argued? In terms of rendering and compute it's incredibly clear which approach is better.
 

endlessflood

Banned
Oct 28, 2017
8,693
Australia (GMT+10)
The faster clock, of which we don't even know what's sustained under load, is more than offset by the far fewer CUs. It's completely outclassed when it comes to everything outside the SSD. Why is this even being argued? In terms of rendering and compute it's incredibly clear which approach is better.
You said the SSD was the only advantage that the PS5 has over the XSX. I was just pointing out that it also has a GPU clock speed advantage, because it does. I wasn't arguing anything, just stating a fact.
 

vivftp

Member
Oct 29, 2017
19,764
Forget everything else they've talked about, what I want to know about the PS5 is much more important. I want to know if I have my PS5 in standby mode and the power in my house fluctuates, what happens when I turn it back on? Is the SSD going to have to run extensive scans to ensure there's no corruption? Will there still be a risk of corruption? Or will this new setup just boot up perfectly each time as though nothing happened?

My PS4 Pro got fucked up once due to those circumstances and I had to wipe everything and start from scratch. It's the reason I'm hesitant to keep it in rest mode. I'd love to know if that's something I wouldn't have to worry about on the PS5.
 

Liabe Brave

Professionally Enhanced
Member
Oct 27, 2017
1,672
I was thinking: even when the 336GB/s "pool" is used for things that don't need a lot of bandwidth, since the bus of this "pool" is shared with the 560GB/s pool, if the CPU/GPU has to read from this part of memory, won't that heavily affect the 560GB/s pool when you need to access both at the same time? Can someone explain this?
The extra GB in the 2GB chips that form the "slow pool" have a different set of memory addresses, so you can intentionally separate contents from the "fast pool" even though they exist together on the same physical chip. But the traces which connect that chip to the SOC aren't doubled. So my understanding is that when you're sending data to the "slow pool", that will negatively impact the speed of the "fast pool". Only 4 chips will be running at full speed, the others will be at a lower percentage depending on how the data for the two address sets is queued/interleaved.

PS5 will also have CPU/GPU contention issues that will reduce its effective RAM bandwidth, though of a different kind.

There are 10 GDDR6 chips on the XSX PCB, 3 on two sides, 4 on top. Those 4 on top are 16Gb/2GB chips running at 1750MHz QDR and the remaining 6 are 8Gb/1GB chips also running 1750MHz QDR, all connected on 32 bit GDDR6 memory controllers.
  • 1750MT/s x 32bits x 10 ICs is 560 GB/s if you multiply by the total capacity per chip and divide by 8.
  • 1750MT/s x 32 bits x 6 ICs is 336 GB/s, similar.
I'm pretty sure most of this is incorrect or mixed up. There are 10 RAM chips, yes. But there's not 4 at 2GB and 6 at 1GB, that would only be 14 GB total. You have the quantity per capacity reversed. Which also means you have the layout wrong--at least one chip on each of the sides is also 2GB. (I don't believe we know whether the layout is 1-4-1 or 2-2-2.)

But the more confusing thing is your explanation of speeds and how to derive bandwidth. First, how is "1750MHz" a data rate? Megahertz as units indicates a memory clock speed. But second, no GDDR6 is clocked that high. (We'll see where you actually got the number from below.) Third, why do you call it a quad data rate (QDR)? GDDR6 uses a double data rate--it's got "DDR" right there in the acronym. Fourth, when you reuse the 1750 figure in your bullet points, the units are now listed as "MT/s", which is proper. But that's not a data rate nor a clockspeed, it's total transfers (T). It's actual clockspeed (875MHz) times transfers per clock (2, for double data rate memory). Fifth, you say to multiply by "the total capacity per chip", but this isn't so for all chips. On the 2GB chips, you can only multiply by the half of the capacity that has the same address range as the others.

Even ignoring your errors, the formula you use can be greatly simplified and made clearer to readers. Instead of the elaborate string of factors, some of which have units people aren't generally familiar with, I think it makes more sense to put it this way:

Bus size (bits) = [number of chips] x 32
Bandwidth (GB/s) = [bus size (bits)] x [chip data rate (gbps)] / 8 (bits per byte)

For PS5, there are 8 2GB chips at 14gbps. So {8 x 32} means the bus is 256-bit. {256 x 14 / 8} means the bandwidth is 448 GB/s.
XSX is a touch more complicated because there are 10 physical chips at 14gbps, but 6 of them are 2GB and 4 are 1GB. Think of the chips as homes along a street, with 4 single homes with driveways and 6 duplexes of double size which share a driveway. There are 10 addresses that end in A, and 6 addresses that end in B.

[diagram: the XSX RAM chips illustrated as homes along a street]

Each home represents a gigabyte of capacity, but the A and B addresses in the duplexes can't use the shared driveway at the same time. So we can consider the A addresses alone, where the bus size is {10 x 32} or 320-bit. {320 x 14 / 8} means the bandwidth is 560 GB/s. Or we can consider the B addresses alone, where the bus size is {6 x 32} or 192-bit. {192 x 14 / 8} means the bandwidth is 336 GB/s. Note that when the 6GB of B addresses are being filled or emptied, you can only fill or empty 4GB of "GPU-optimal" A addresses, at a bandwidth of {4 x 32 x 14 / 8} or 224 GB/s. When the 10GB of A addresses are being filled or emptied, you can't fill or empty the CPU/OS 6GB at all.

Of course, neither of those extremes will be reached. The two address sets will instead alternate using the interfaces ("driveways") very rapidly. But this does mean that due to these conflicts, the average bandwidth won't reach the 560 GB/s ceiling. (This is true for all unified memory pools, including PS5.)
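Putting the formula above into a few lines of Python (same chip counts and 14gbps data rate as in the explanation, just automated so the cases can be compared side by side):

```python
def bandwidth_gbs(num_chips, data_rate_gbps=14, bus_per_chip_bits=32):
    """GB/s = chips x 32-bit interface per chip x per-pin data rate / 8 bits per byte."""
    return num_chips * bus_per_chip_bits * data_rate_gbps / 8

print(bandwidth_gbs(8))    # PS5: 8 x 2GB chips           -> 448.0 GB/s on a 256-bit bus
print(bandwidth_gbs(10))   # XSX "A" addresses, 10 chips  -> 560.0 GB/s on a 320-bit bus
print(bandwidth_gbs(6))    # XSX "B" addresses, 6 chips   -> 336.0 GB/s on a 192-bit bus
print(bandwidth_gbs(4))    # "A" traffic left while "B" is busy -> 224.0 GB/s
```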

I have the feeling that if I could better grasp his example of the Horizon map screen, I could get this power budget thing. So far, from what I can tell, it's mostly a thing to do with cooling. That was the context he was talking about it in. I think. Maybe I'll watch it again.
The map thing is talking about what kinds of situations cause power use to go up. It's obvious that one way is to go from very simple game engines to much more complicated and powerful ones. A simple indie game doesn't require as many calculations as quickly as an AAA showpiece, so less wattage is used. But the point Mr. Cerny was making was about a counterintuitive fact: at some point, making the workload more complicated actually requires less power.

This is because if a developer devises long and elaborate calculations, each step may need multiple input values, which themselves have to be calculated. If even one of the input steps takes longer than the rest, the overall calculation stalls because the following step can't start. At a certain point, too many stalls means framerate or IQ suffers as the GPU is no longer able to go fast enough to finish each frame in time. Developers optimize to reduce stalls and avoid this, but they also try to improve results, meaning stalls are never wholly eliminated. So in very advanced games, the GPU is not truly saturated, because some of it is waiting around for work.

Take a highly-optimized engine and give it something straightforward to do, though, and it can shove simple instructions through the GPU as fast as the silicon can go, with no stalls. So Horizon's menu screen suddenly is using every single transistor all the time, and power use spikes.

The intent of controlling CPU/GPU frequency by monitoring activity rather than temperature is to be able to immediately detect these "supermax" usage events and reduce the clock accordingly. That has no effect on rendering, because the associated workloads are simple and can be completed at pace even with a lower clockspeed. But it allows you to eliminate all the highest power spikes a constant-frequency chip would face. So you can now design a cooling solution just good enough for the lower maxima hit by advanced gameplay scenarios and their semi-stalled activity. That saves size, cost, and noise.
 
Last edited:

GhostTrick

Member
Oct 25, 2017
11,316
It is for titles that are not n-wide threaded and do not use a very n-wide job system; a lot of titles do not do that. A number of engines still just target 7-8 threads.
It is not a marketing tactic, just a choice for developers whose engines were designed around XB1 and PS4 instead of next-gen PCs.


It might also be interesting for those few "120fps" titles which won't use much CPU power or many threads, but will still need as much as they can get to not be CPU limited above 60fps. Getting 10% more CPU power might be useful for those few titles.
 

Crayon

Member
Oct 26, 2017
15,580
The extra GB in the 2GB chips that form the "slow pool" have a different set of memory addresses, so you can intentionally separate contents from the "fast pool" even though they exist together on the same physical chip. But the traces which connect that chip to the SOC aren't doubled. So my understanding is that when you're sending data to the "slow pool", that will negatively impact the speed of the "fast pool". Only 4 chips will be running at full speed, the others will be at a lower percentage depending on how the data for the two address sets is queued/interleaved.

PS5 will also have CPU/GPU contention issues that will reduce its effective RAM bandwidth, though of a different kind.

I'm pretty sure most of this is incorrect or mixed up. There are 10 RAM chips, yes. But there's not 4 at 2GB and 6 at 1GB, that would only be 14 GB total. You have the quantity per capacity reversed. Which also means you have the layout wrong--at least one chip on each of the sides is also 2GB. (I don't believe we know whether the layout is 1-4-1 or 2-2-2.)

But the more confusing thing is your explanation of speeds and how to derive bandwidth. First, how is "1750MHz" a data rate? Megahertz as units indicates a memory clock speed. But second, no GDDR6 is clocked that high. (We'll see where you actually got the number from below.) Third, why do you call it a quad data rate (QDR)? GDDR6 uses a double data rate--it's got "DDR" right there in the acronym. Fourth, when you reuse the 1750 figure in your bullet points, the units are now listed as "MT/s", which is proper. But that's not a data rate nor a clockspeed, it's total transfers (T). It's actual clockspeed (875MHz) times transfers per clock (2, for double data rate memory). Fifth, you say to multiply by "the total capacity per chip", but this isn't so for all chips. On the 2GB chips, you can only multiply by the half of the capacity that has the same address range as the others.

Even ignoring your errors, the formula you use can be greatly simplified and made clearer to readers. Instead of the elaborate string of factors, some of which have units people aren't generally familiar with, I think it makes more sense to put it this way:

Bus size (bits) = [number of chips] x 32
Bandwidth (GB/s) = [bus size (bits)] x [chip data rate (gbps)] / 8 (bits per byte)

For PS5, there are 8 2GB chips at 14gbps. So {8 x 32} means the bus is 256-bit. {256 x 14 / 8} means the bandwidth is 448 GB/s.
XSX is a touch more complicated because there are 10 physical chips at 14gbps, but 6 of them are 2GB and 4 are 1GB. Think of the chips as homes along a street, with 4 single homes with driveways and 6 duplexes of double size which share a driveway. There are 10 addresses that end in A, and 6 addresses that end in B.


Each home represents a gigabyte of capacity, but the A and B addresses in the duplexes can't use the shared driveway at the same time. So we can consider the A addresses alone, where the bus size is {10 x 32} or 320-bit. {320 x 14 / 8} means the bandwidth is 560 GB/s. Or we can consider the B addresses alone, where the bus size is {6 x 32} or 192-bit. {192 x 14 / 8} means the bandwidth is 336 GB/s. Note that when the 6GB of B addresses are being filled or emptied, you can only fill or empty 4GB of "GPU-optimal" A addresses, at a bandwidth of {4 x 32 x 14 / 8} or 224 GB/s. When the 10GB of A addresses are being filled or emptied, you can't fill or empty the CPU/OS 6GB at all.

Of course, neither of those extremes will be reached. The two address sets will instead alternate using the interfaces ("driveways") very rapidly. But this does mean that due to these conflicts, the average bandwidth won't reach the 560 GB/s ceiling. (This is true for all unified memory pools, including PS5.)


The map thing is talking about what kinds of situations cause power use to go up. It's obvious that one way is to go from very simple game engines to much more complicated and powerful ones. A simple indie game doesn't require as many calculations as quickly as an AAA showpiece, so less wattage is used. But the point Mr. Cerny was making was about a counterintuitive fact: at some point, making the workload more complicated actually requires less power.

This is because if a developer devises long and elaborate calculations, each step may need multiple input values, which themselves have to be calculated. If even one of the input steps takes longer than the rest, the overall calculation stalls because the following step can't start. At a certain point, too many stalls means framerate or IQ suffers as the GPU is no longer able to go fast enough to finish each frame in time. Developers optimize to reduce stalls and avoid this, but they also try to improve results, meaning stalls are never wholly eliminated. So in very advanced games, the GPU is not truly saturated, because some of it is waiting around for work.

Take a highly-optimized engine and give it something straightforward to do, though, and it can shove simple instructions through the GPU as fast as the silicon can go, with no stalls. So Horizon's menu screen suddenly is using every single transistor all the time, and power use spikes.

The intent of controlling CPU/GPU frequency by monitoring activity rather than temperature is to be able to immediately detect these "supermax" usage events and reduce the clock accordingly. That has no effect on rendering, because the associated workloads are simple and can be completed at pace even with a lower clockspeed. But it allows you to eliminate all the highest power spikes a constant-frequency chip would face. So you can now design a cooling solution just good enough for the lower maxima hit by advanced gameplay scenarios and their semi-stalled activity. That saves size, cost, and noise.

Thank you, thank you. The bigger picture has just been sinking in with me over the last few hours after I listened to the talk again. You give good details here.

Actually the last little light bulb went off when you posted in the other thread wondering about the strategy behind going this route.
 

vivftp

Member
Oct 29, 2017
19,764
Don't mind me if I'm talking nonsense, but I was just curious about something. In Cerny's talk he discussed the process of modifying an AMD GPU's CU to act very similarly to an SPU from the PS3 for the Tempest Engine. Now, Cell utilized 6 SPUs. Say for a crazy moment Sony wanted to slap 6 of these new-style SPUs in the PS5 to help with PS3 BC: first, would that actually be helpful over trying to do emulation like the variant done on PC these days? And say they did go this crazy route of putting 6 SPUs in there, what else would be needed on top of that to create a "modern Cell" that could do the BC? Would the rest of the GPU hardware in the PS5 be able to handle it just fine, or would something else be needed?

Finally, in terms of cost would 6 CUs really add that much cost to the PS5? Now of course these are "modified CUs" so that might make a difference.

Anyways, just my random late night musings I thought I'd toss out in case anyone wants to actually discuss it. If it's utter nonsense then please ignore :)
 

Crayon

Member
Oct 26, 2017
15,580
Don't mind me if I'm talking nonsense, but I was just curious about something. In Cerny's talk he discussed the process of modifying an AMD GPU's CU to act very similarly to an SPU from the PS3 for the Tempest Engine. Now, Cell utilized 6 SPUs. Say for a crazy moment Sony wanted to slap 6 of these new-style SPUs in the PS5 to help with PS3 BC: first, would that actually be helpful over trying to do emulation like the variant done on PC these days? And say they did go this crazy route of putting 6 SPUs in there, what else would be needed on top of that to create a "modern Cell" that could do the BC? Would the rest of the GPU hardware in the PS5 be able to handle it just fine, or would something else be needed?

Finally, in terms of cost would 6 CUs really add that much cost to the PS5? Now of course these are "modified CUs" so that might make a difference.

Anyways, just my random late night musings I thought I'd toss out in case anyone wants to actually discuss it. If it's utter nonsense then please ignore :)

I think he meant just kinda like an SPU. Not so close that you could run PS3 stuff on them.
 

vivftp

Member
Oct 29, 2017
19,764
I think he meant just kinda like an SPU. Not so close that you could run PS3 stuff on them.

Yeah, I didn't think he went into enough detail on the modifications to really say how close they were to the actual PS3 SPUs in abilities, but still thought I'd toss it out there since I don't know a damn thing about the subject.
 

BreakAtmo

Member
Nov 12, 2017
12,838
Australia
PS5 will also have CPU/GPU contention issues that will reduce its effective RAM bandwidth, though of a different kind.

Do you know if GDDR6 has improved granularity over GDDR5? I heard that that would help with contention issues.

Third, why do you call it a quad data rate (QDR)? GDDR6 uses a double data rate--it's got "DDR" right there in the acronym.

Will we ever get QDR or GQDR (or, hell, LPQDR) RAM? Because that sounds pretty nice.
 

Liabe Brave

Professionally Enhanced
Member
Oct 27, 2017
1,672
Do you know if GDDR6 has improved granularity over GDDR5? I heard that that would help with contention issues.
I do not, sorry.

Will we ever get QDR or GQDR (or, hell, LPQDR) RAM? Because that sounds pretty nice.
Well, GDDR5X had a quad data rate mode; that's what made it different from plain GDDR5. But my understanding is that QDR is a tradeoff, where access latency increases as data rate goes up.
 

beta

Member
Dec 31, 2019
176
The extra GB in the 2GB chips that form the "slow pool" have a different set of memory addresses, so you can intentionally separate contents from the "fast pool" even though they exist together on the same physical chip. But the traces which connect that chip to the SOC aren't doubled. So my understanding is that when you're sending data to the "slow pool", that will negatively impact the speed of the "fast pool". Only 4 chips will be running at full speed, the others will be at a lower percentage depending on how the data for the two address sets is queued/interleaved.

PS5 will also have CPU/GPU contention issues that will reduce its effective RAM bandwidth, though of a different kind.


I'm pretty sure most of this is incorrect or mixed up. There are 10 RAM chips, yes. But there's not 4 at 2GB and 6 at 1GB, that would only be 14 GB total. You have the quantity per capacity reversed. Which also means you have the layout wrong--at least one chip on each of the sides is also 2GB. (I don't believe we know whether the layout is 1-4-1 or 2-2-2.)

But the more confusing thing is your explanation of speeds and how to derive bandwidth. First, how is "1750MHz" a data rate? Megahertz as units indicates a memory clock speed. But second, no GDDR6 is clocked that high. (We'll see where you actually got the number from below.) Third, why do you call it a quad data rate (QDR)? GDDR6 uses a double data rate--it's got "DDR" right there in the acronym. Fourth, when you reuse the 1750 figure in your bullet points, the units are now listed as "MT/s", which is proper. But that's not a data rate nor a clockspeed, it's total transfers (T). It's actual clockspeed (875MHz) times transfers per clock (2, for double data rate memory). Fifth, you say to multiply by "the total capacity per chip", but this isn't so for all chips. On the 2GB chips, you can only multiply by the half of the capacity that has the same address range as the others.

Even ignoring your errors, the formula you use can be greatly simplified and made clearer to readers. Instead of the elaborate string of factors, some of which have units people aren't generally familiar with, I think it makes more sense to put it this way:

Bus size (bits) = [number of chips] x 32
Bandwidth (GB/s) = [bus size (bits)] x [chip data rate (gbps)] / 8 (bits per byte)

For PS5, there are 8 2GB chips at 14gbps. So {8 x 32} means the bus is 256-bit. {256 x 14 / 8} means the bandwidth is 448 GB/s.
XSX is a touch more complicated because there are 10 physical chips at 14gbps, but 6 of them are 2GB and 4 are 1GB. Think of the chips as homes along a street, with 4 single homes with driveways and 6 duplexes of double size which share a driveway. There are 10 addresses that end in A, and 6 addresses that end in B.


Each home represents a gigabyte of capacity, but the A and B addresses in the duplexes can't use the shared driveway at the same time. So we can consider the A addresses alone, where the bus size is {10 x 32} or 320-bit. {320 x 14 / 8} means the bandwidth is 560 GB/s. Or we can consider the B addresses alone, where the bus size is {6 x 32} or 192-bit. {192 x 14 / 8} means the bandwidth is 336 GB/s. Note that when the 6GB of B addresses are being filled or emptied, you can only fill or empty 4GB of "GPU-optimal" A addresses, at a bandwidth of {4 x 32 x 14 / 8} or 224 GB/s. When the 10GB of A addresses are being filled or emptied, you can't fill or empty the CPU/OS 6GB at all.

Of course, neither of those extremes will be reached. The two address sets will instead alternate using the interfaces ("driveways") very rapidly. But this does mean that due to these conflicts, the average bandwidth won't reach the 560 GB/s ceiling. (This is true for all unified memory pools, including PS5.)


The map thing is talking about what kinds of situations cause power use to go up. It's obvious that one way is to go from very simple game engines to much more complicated and powerful ones. A simple indie game doesn't require as many calculations as quickly as an AAA showpiece, so less wattage is used. But the point Mr. Cerny was making was about a counterintuitive fact: at some point, making the workload more complicated actually requires less power.

This is because if a developer devises long and elaborate calculations, each step may need multiple input values, which themselves have to be calculated. If even one of the input steps takes longer than the rest, the overall calculation stalls because the following step can't start. At a certain point, too many stalls means framerate or IQ suffers as the GPU is no longer able to go fast enough to finish each frame in time. Developers optimize to reduce stalls and avoid this, but they also try to improve results, meaning stalls are never wholly eliminated. So in very advanced games, the GPU is not truly saturated, because some of it is waiting around for work.

Take a highly-optimized engine and give it something straightforward to do, though, and it can shove simple instructions through the GPU as fast as the silicon can go, with no stalls. So Horizon's menu screen suddenly is using every single transistor all the time, and power use spikes.

The intent of controlling CPU/GPU frequency by monitoring activity rather than temperature is to be able to immediately detect these "supermax" usage events and reduce the clock accordingly. That has no effect on rendering, because the associated workloads are simple and can be completed at pace even with a lower clockspeed. But it allows you to eliminate all the highest power spikes a constant-frequency chip would face. So you can now design a cooling solution just good enough for the lower maxima hit by advanced gameplay scenarios and their semi-stalled activity. That saves size, cost, and noise.

Wouldn't contention affect the XSX more than the PS5 because of the split bandwidth? Contention will cause lower bandwidth on both, but on the PS5 at least all the memory runs at the same speed (as far as we know for now), meaning any access should be at full speed.

Whereas on the XSX the lower-speed pool makes the bus slower for a period of the memory transaction, as you mentioned. I have a feeling the requests will be interleaved in a way that prioritises CPU requests first to reduce latency, which the CPU is much more sensitive to compared to the GPU.
 
Mar 29, 2018
7,078
What in the world? The PS5 doesn't have any unique chips or instruments to improve GPU performance. There is no chip dedicated to on-the-fly optimizations. How would a chip like that even function? I think you're confusing what Cerny dubbed the geometry engine. That's a standard part of every GPU, more commonly called the "front end". It's not a separate chip. The exact same geometry engine will be in the Xbox as well as all PC AMD GPUs based on RDNA 2. It just handles the setup of triangles before shading is performed. The only unique chip Cerny announced was the audio engine, but that is specifically for HRTF; general sound processing will still have to be done on the CPU. To summarize, the only advantage the PS5 has over the Xbox is a faster SSD. In every other way the Xbox is flat out better.
Right, my understanding of the geometry engine being a separate chip must have been due to how his presentation slides introduced it. Looked like a distinctly separate thing. I based my posts on having watched two PS5 tech videos and two XSX tech videos, and they both sounded equally good, with slightly different directions/pros and cons.
 

foamdino

Banned
Oct 28, 2017
491
"And so for them to go wider, for them to go to 14 hardware threads, it means that they have the system to do it, but then, you have to have workloads that split even more effectively across them. And so we're actually finding that the vast majority of developers - talking with them about the their choices for launch - the vast majority are going to go with the SMT disabled and the higher clock."

So MS here is saying that for CPU workloads they anticipate narrow and fast is better than wide and slow as it's harder to effectively use all the hardware threads :interesting:

(I know this isn't apples to apples comparison with GPU workloads which often fall into the "embarrassingly parallel" camp - but this is precisely what Cerny said and has been ridiculed for by a set of frothing fanboys)
 
Feb 1, 2018
5,242
Europe
I don't see why BC would need 3.8 over 3.6. These CPUs are so much more powerful than the Jaguar cores that even lower clocks than 3.6 would suffice. So no, I don't really think it's for BC.

It's got to be for titles that weren't coded to take full advantage of so many threads. Most likely cross-gen titles, if I had to guess.
Yeah actually that is what I meant with BC :) Wrong wording.
 

beta

Member
Dec 31, 2019
176
"And so for them to go wider, for them to go to 14 hardware threads, it means that they have the system to do it, but then, you have to have workloads that split even more effectively across them. And so we're actually finding that the vast majority of developers - talking with them about the their choices for launch - the vast majority are going to go with the SMT disabled and the higher clock."

So MS here is saying that for CPU workloads they anticipate narrow and fast is better than wide and slow as it's harder to effectively use all the hardware threads :interesting:

(I know this isn't apples to apples comparison with GPU workloads which often fall into the "embarrassingly parallel" camp - but this is precisely what Cerny said and has been ridiculed for by a set of frothing fanboys)

At launch it's an obvious choice, because current engines are only written for 7 threads for the PS4 and Xbox One anyway; in the future I suspect they will move towards 14 threads instead, and towards better latency hiding.
 

Deleted member 28231

User requested account closure
Banned
Oct 31, 2017
36
the faster clock, of which we dont even know whats sustained under load, is more than offset by the far fewer CUs. its completely outclassed when it comes to everything outside the SSD. why is this even being argued? in terms of rendering and compute its incredibly clear which approach is better.

Your statement is not completely true. As you can see, the PlayStation 5 might have advantages in rasterization and ROPs.

[chart: raw power comparison, Xbox Series X vs PlayStation 5]
 

Deleted member 28231

User requested account closure
Banned
Oct 31, 2017
36
Yeah, that's why I wrote "might". But just saying the Xbox Series X GPU is always and in every situation faster because 12 TF > 10 TF is just wrong. But you are right, time will tell how everything plays out.
 
Mar 22, 2020
87
Eh, I don't think you really know what you are talking about. Did you turn things around by accident? Of course devs are going to use SMT, and it's going to have a lot more impact than "1%". Where did you even get those numbers?
The numbers come from tests by Hardware Unboxed and GamersNexus, which use the most rigorous benchmarking methods I can find. Devs can definitely use SMT, but I would expect their games to be largely GPU bound below 4K and definitely GPU bound at 4K.
My 1-5% figure was not for 4K resolutions; I wouldn't expect any difference at 4K. It was also measured using far more RAM and an RTX 2080 Ti, so it's unlikely SMT has more impact on slower GPUs, and with a lot of dedicated I/O hardware freeing up a lot of CPU time.

Edit: if you are talking about current gen games then I guess? But those weren't coded to make use of that many threads. This should pretty much change real fast if not immediately for next gen games.
I would not try to anticipate figures for next-gen games; after all, AMD chips have overtaken some Intel chips that beat them when they released.

I think it threw me off because he said the PS5 is running 1400MHz on memory instead of 1750MHz (14Gbps).
So this should be 1750MHz for all 16GB of the PS5, not 1400MHz? Question: are you the same guy who posted this on Reddit, or did you just copy his post?
Yes, I was wrong to think they used different ICs of less than 14Gbps. I didn't make that post on Reddit, but he does make more sense on the layout at least.
I went back and corrected my post, but not in this thread, so I edited out the corrections. Sorry for the confusion.

I'm pretty sure most of this is incorrect or mixed up. There are 10 RAM chips, yes. But there's not 4 at 2GB and 6 at 1GB, that would only be 14 GB total. You have the quantity per capacity reversed. Which also means you have the layout wrong--at least one chip on each of the sides is also 2GB. (I don't believe we know whether the layout is 1-4-1 or 2-2-2.)
Yes, it is a typo; it is 6 2GB and 4 1GB chips. I will correct it in my older posts.
But the more confusing thing is your explanation of speeds and how to derive bandwidth. First, how is "1750MHz" a data rate? Megahertz as units indicates a memory clock speed. But second, no GDDR6 is clocked that high. Third, why do you call it a quad data rate (QDR)? GDDR6 uses a double data rate--it's got "DDR" right there in the acronym. Fourth, when you reuse the 1750 figure in your bullet points, the units are now listed as "MT/s", which is proper.
GDDR6 is QDR, just like GDDR5X, hence why the 1750MT/s translate to 14GT/s (2 transfers for rise, 2 for fall).
Also the 14Gbps chips are clocked at 1750MHz, that's what both Micron and SKHynix list on their product pages.

Read the spec from micron here: https://www.micron.com/-/media/client/global/documents/products/technical-note/dram/tned03_gddr6.pdf
It also includes more info on GDDR6 granularity for BreakAtmo.

I'm not claiming 1750MHz refers to a data rate. But if you say 14Gbps, it usually means the IC is running at that clock frequency.
But that's not a data rate nor a clockspeed, it's total transfers (T). It's actual clockspeed (875MHz) times transfers per clock (2, for double data rate memory). Fifth, you say to multiply by "the total capacity per chip", but this isn't so for all chips. On the 2GB chips, you can only multiply by the half of the capacity that has the same address range as the others.
Yes, I worded this poorly, but you can see in the formulas that I used half the capacity of the chips:
  • 10GB from the 4 1GB chips plus half of each of the 6 2GB chips,
  • and the remaining 6GB from the other half of those 6 2GB chips.
I'm adding the corrected quote below. So far the typo is the only issue there, because QDR and that clock frequency are what GDDR6 specifies for 14Gbps.

There are 10 GDDR6 chips on the XSX PCB, 3 on two sides, 4 on top.

Those 4 on top are 8Gb/1GB chips running at 1750MHz QDR and the remaining 6 are 16Gb/2GB chips also running 1750MHz QDR, all connected on 32 bit GDDR6 memory controllers.
  • 1750MT/s x 32 bits x 10 ICs is 560 GB/s, from 4 chips of 1GB and 6 "half-chips" of 2GB.
  • 1750MT/s x 32 bits x 6 ICs is 336 GB/s, from the other 6 "half-chips" of 2GB.
 
Nov 8, 2017
13,113
SMT provides different benefits for different workloads. The rule of thumb in tech circles is to expect up to ~30% as an average, but it's usually less for gaming. If a game is specifically programmed with it in mind you would expect closer to that 30% figure than PC ports that aren't well threaded (some games benefit a lot, some not at all or even have a tiny penalty).
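A rough back-of-the-envelope on the XSX choice discussed above, in Python. It assumes the 3.8GHz (SMT off) and 3.66GHz (SMT on) options Microsoft has described for the Series X CPU; the uplift values swept are just the rule-of-thumb range from this post, not measurements.

```python
clk_smt_off = 3.8   # GHz, 8 cores / 8 threads (SMT disabled)
clk_smt_on  = 3.66  # GHz, 8 cores / 16 threads (SMT enabled)

# SMT has to add at least this much throughput before it beats the higher clock.
break_even = clk_smt_off / clk_smt_on - 1
print(f"break-even SMT uplift: {break_even:.1%}")   # ~3.8%

for uplift in (0.00, 0.05, 0.15, 0.30):   # rule-of-thumb range for SMT gains
    effective = clk_smt_on * (1 + uplift)
    better = "SMT on" if effective > clk_smt_off else "SMT off"
    print(f"uplift {uplift:.0%}: {effective:.2f} GHz-equivalent -> {better}")
```

In other words, for an engine that scales across the extra threads at all well, SMT wins easily; the higher clock only pays off for engines that barely benefit from the extra threads.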
 

McFly

Member
Nov 26, 2017
2,742
Again, comparing peak performance when we know that some workloads will have variable frequency is not that apples to apples. Only time will tell the actual difference between both.
A teraflop is a peak performance metric. It assumes 100% utilization of the floating-point capability of the GPU.
 

Deleted member 10847

User requested account closure
Banned
Oct 27, 2017
1,343
A teraflop is a peak performance metric. It assumes 100% utilization of the floating-point capability of the GPU.

But that number will vary according to frequency. You cannot compare a fixed frequency with a variable max frequency, since the latter will not be sustained for 100% of workloads; otherwise it would be fixed.
 

BreakAtmo

Member
Nov 12, 2017
12,838
Australia
I do not, sorry.

Well, GDDR5X had a quad data rate mode; that's what made it different from plain GDDR5. But my understanding is that QDR is a tradeoff, where access latency increases as data rate goes up.

So I was looking up details on GDDR6 and I actually found this:


Diving a bit deeper here, there are really two core changes coming from GDDR5 that enable GDDR6's big bandwidth boost. The first is the implementation of Quad Data Rate (QDR) signaling on the memory bus. Whereas GDDR5's memory bus would transfer data twice per write clock (WCK) via DDR, GDDR6 (& 5X) extends this to four transfers per clock. All other things held equal, this allows GDDR6 to transfer twice as much data per clock as GDDR5.

The challenge in doing this, of course, is that the more you pump a memory bus, the tighter the signal integrity requirements. So while it's simple to say "let's just double the memory bus bandwidth", doing it is another matter. In practice a lot of work goes into the GPU memory controller, the memory itself, and the PCB to handle these transmission speeds.

Is this still true?

It also had this section, but I'm not sure if this is referring to the same sort of "granularity" that's needed to fix memory contention issues:

Moving on, the second big change for GDDR6 is that how data is read out of the DRAM cells themselves has changed. For many generations the solution has been to just read and write in larger strides – the prefetch value – with GDDR5 taking this to 8n and GDDR5X taking it to 16n. However the resulting access granularities of 32 bytes and 64 bytes respectively were on the path of becoming increasingly suboptimal for small memory operations. As a result, GDDR6 does a larger prefetch and yet it does not.

GDDR_Channels_575px.png


Whereas both GDDR5 and GDDR5X used a single 32-bit channel per chip, GDDR6 instead uses a pair of 16-bit channels. This means that in a single memory core clock cycle (ed: not to be confused with the memory bus), 32 bytes will be fetched from each channel for a total of 64 bytes. This means that each GDDR6 memory chip can fetch twice as much data per clock as a GDDR5 chip, but it doesn't have to be one contiguous chunk of memory. In essence, each GDDR6 memory chip can function like two chips.

For graphics this doesn't have much of an impact since GPUs already read and write to RAM in massive sequential parallelism. However it's a more meaningful change for other markets. In this case the smaller memory channels will help with random access performance, especially compared to GDDR5X and its massive 64 byte access granularity.
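The access-granularity numbers in that quote fall straight out of channel width times prefetch. A quick check in Python, using only the figures given in the quoted article:

```python
def access_granularity_bytes(channel_width_bits, prefetch_n):
    """Bytes fetched per channel per core access = channel width x prefetch / 8."""
    return channel_width_bits * prefetch_n // 8

print(access_granularity_bytes(32, 8))    # GDDR5:  one 32-bit channel, 8n   -> 32 bytes
print(access_granularity_bytes(32, 16))   # GDDR5X: one 32-bit channel, 16n  -> 64 bytes
print(access_granularity_bytes(16, 16))   # GDDR6:  two 16-bit channels, 16n -> 32 bytes each
```

So GDDR6 still moves 64 bytes per chip per core clock, but split across two independent 32-byte accesses, which is what helps the random-access case.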
 

ShapeGSX

Member
Nov 13, 2017
5,228
Technically they are already doing it. The XB1 reserves 3GB of RAM for its OS; the XSX is reserving 2.5GB. Surely the newer console using even less RAM while having more features would be a benefit of that SSD.

The question is why it's even still so much. Possible answers: maybe they just need that much RAM for their OS, maybe the reserve is that large in case they want to add features later on, etc. It's also possible that their full OS uses as much as 5GB of RAM and the remaining 2.5GB of it sits on the SSD.

I would expect the PS5 to definitely do something similar, but better. I expect Sony to reserve only about 1GB of RAM for their OS and be able to swap in an additional 3GB for the "full" OS the second you hit the home button, for a total 4GB OS.

It would first have to swap out the 3GB to the SSD. So it would take over a second for this to happen. That's a pretty long wait compared to what we have now.

I expect that Microsoft will end up releasing some of that 2.5GB as the generation goes on. They've done that in the past generations.
 

ShapeGSX

Member
Nov 13, 2017
5,228
I believe it is just as ridiculous to expect 1825MHz constant clocks on a massive 52-CU chip sharing die space with a 16-thread Zen 2 CPU clocked 100MHz higher (or, without SMT, 300MHz higher) than what the PS5 uses. The APU also shares its vapor-chamber coldplate with 10 GDDR6 ICs and most of the VRM power stages; I'm not expecting the XSX iGPU clocks to hold up as announced.

360 square mm isn't that massive. It's big for a console, but not big for a chip these days. It's about the same size as the Xbox One X APU.

I don't see any reason why they wouldn't be able to run this chip at the announced frequencies with the cooling system they have. In fact, I wouldn't be surprised if they have some headroom in terms of frequency.

What's your reasoning to think that they're lying?

Edit: I found some of your other posts. There's too much fantasy engineering going on here. There's no reason not to believe Microsoft when they say that their clocks do not vary when running a game. If it is idle, it may lower its frequency, given the somewhat fleeting attention that consoles seem to get from environmentalists. They certainly will not have any reason to lower the clock due to temperature or power unless it is in a really extreme environment. They've got smart people there.
 
Last edited:

Pheonix

Banned
Dec 14, 2018
5,990
St Kitts
Again, comparing peak performance when we know that some workloads will have variable frequency is not that apples to apples. Only time will tell the actual difference between both.
Guess this is a thing now...

I would say, until proven otherwise, we should take what Cerny said as fact. The PS5 would run at these clocks, 2.23GHz on the GPU and 3.5GHz on the CPU, "most of the time".

I would also like to point out that even the XSX with its fixed frequency... is also still running a variable frequency, because in every computing device frequencies scale up and down with the load.

The only difference between the two here is that the XSX has a fixed max frequency but doesn't fix the power consumption required to get to that max frequency. The PS5 has a fixed max power consumption and a "capped" max GPU frequency.

Furthermore, I think there is a lot of misinformation going around. If a processor is designed to hit a certain frequency and has the power available to drive it, it WILL hit that frequency. Every single time. If it gets too hot, then it requires more power to get there, and that in turn makes it hotter and hotter until it gets fried. A processor running at its max clock speed is not the same thing as it being at its most efficient.

It would first have to swap out the 3GB to the SSD. So it would take over a second for this to happen. That's a pretty long wait compared to what we have now.

I expect that Microsoft will end up releasing some of that 2.5GB as the generation goes on. They've done that in the past generations.
Not with the PS5... the PS5 can theoretically move around 5.5GB/s. So it should be able to swap out the 3GB from RAM in under 0.60 seconds, and copy the OS over in chunks based on importance.
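A quick sanity check on that figure, assuming (optimistically) that the quoted 5.5GB/s read rate also applies to writing back to the SSD; Sony hasn't published a sequential write figure, so treat both numbers below as assumptions:

```python
os_reserve_gb = 3.0      # amount of OS data assumed to be swapped out of RAM
ssd_rate_gbs  = 5.5      # PS5's quoted raw read rate; using it for writes is an assumption

swap_out = os_reserve_gb / ssd_rate_gbs          # RAM -> SSD
swap_in  = os_reserve_gb / ssd_rate_gbs          # SSD -> RAM
print(f"swap out: {swap_out:.2f}s, round trip: {swap_out + swap_in:.2f}s")
# ~0.55s one way, ~1.09s if the full OS has to come back in before it's usable
```

So both estimates in this exchange can be right: under 0.6 seconds to evict the data, but around a second if you also have to pull the full OS back in.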

______________________

On a different note: I honestly feel Cerny/Sony made a huge mistake during their talk. They should've just come out and said GPU at 2.23GHz, CPU at 3.5GHz, and SSD at 5.5GB/s. Show a cleaner, better version of that Spider-Man demo; in 20 seconds it explained loading and streaming speed better than everything Cerny said in 15 minutes.

Show a couple of demos to explain some features. Shut it down and call it a day. Not a peep about variable freq. and constant power. That was them having too much faith in the world lol. Naive even.
 
Last edited:

Deleted member 10847

User requested account closure
Banned
Oct 27, 2017
1,343
Guess this is a thing now...

I would say, until proven otherwise, we should take what Cerny said as fact. The PS5 would run at these clocks, 2.23GHz on the GPU and 3.5GHz on the CPU, "most of the time".

Well, I'm not doubting Cerny, but what is "most of the time"? 80% of the time? 90% of the time? The key word here is still variable frequency.

I would also like to point out that even the XSX with its fixed frequency... is also still running a variable frequency, because in every computing device frequencies scale up and down with the load.

The only difference between the two here is that the XSX has a fixed max frequency but doesn't fix the power consumption required to get to that max frequency. The PS5 has a fixed max power consumption and a "capped" max GPU frequency.

Where did you get this from? MS stated the word "fixed" several times during their presentation. So I still assume that those clocks will be achievable independent of the workload.

Furthermore, I think there is a lot of misinformation going around. If a processor is designed to hit a certain frequency and has the power available to drive it, it WILL hit that frequency. Every single time. If it gets too hot, then it requires more power to get there, and that in turn makes it hotter and hotter until it gets fried. A processor running at its max clock speed is not the same thing as it being at its most efficient.

This is basically how today's parts operate. My 2080 Ti, when OCed, starts at 2100MHz but will quickly downclock to 1900MHz (losing performance) either because:

1 - A certain temperature threshold was achieved
2 - Power target limits


For example: Witcher 3 at 4K, even with a cooler GPU, will stay around 1900MHz rather than the initial 2100MHz boost; it lost more or less 10% of its frequency.
 

BradleyLove

Member
Oct 29, 2017
1,464
On a different note: I honestly feel Cerny/Sony made a huge mistake during their talk. They should've just come out and said GPU at 2.23GHz, CPU at 3.5GHz, and SSD at 5.5GB/s. Show a cleaner, better version of that Spider-Man demo; in 20 seconds it explained loading and streaming speed better than everything Cerny said in 15 minutes.

Show a couple of demos to explain some features. Shut it down and call it a day. Not a peep about variable freq. and constant power. That was them having too much faith in the world lol. Naive even.
It was a briefing for developers though, and not intended for the layman.

But for sure, they should have done a prior reveal in a more consumer-friendly format.
 
Mar 22, 2020
87
360 square mm isn't that massive. It's big for a console, but not big for a chip these days. It's about the same size as the Xbox One X APU.
Well, I think the larger number of CUs is a massive difference. On the other hand, the die is bigger than 2019's Navi (a small die) but half the area of a 2080 Ti die. So in that area there is an 8-core Zen 2 chip, a lot of I/O hardware, and also 52 CUs of RDNA 2? It's very dense.
I don't see any reason why they wouldn't be able to run this chip at the announced frequencies with the cooling system they have. In fact, I wouldn't be surprised if they have some headroom in terms of frequency.
What's your reasoning to think that they're lying?
Well, I'm surprised they claimed clock frequencies to be constant (similarly to what you said about constant power consumption). They'll rely on similar firmware and SmartShift basically raising or lowering CPU or GPU frequency (GPU frequency can get lowered to save power or lower thermals, while also increasing performance if the CPU becomes the bottleneck, so it's a good thing). I'm just thinking 1825MHz is quite high considering there is a power threshold for the APU at about 250W, the APU has 2x the amount of GDDR6 of an RX 5700 XT, and I expect an optimized Zen 2 CPU to consume at least 50-60W, which is a lot of added heat on the same die. Again, it's also tough for Sony.
A good RX 5700 XT holds about 1900MHz and consumes between 190 and 220W of power (in a 3DMark load, similar to a game power load), and TSMC N7P announced a 10% power reduction at iso-speed, or a 7% speed increase at iso-power. That means roughly 2090MHz at the same power, or 20W less at the same speed, from the node change alone. So I would expect the PS5 GPU to be able to hold ~2100MHz, but the GDDR6 increase and the CPU are also unknown quantities.
The biggest unknown, I guess, is the scaling to a 52-CU configuration. I also did not account for any improvements to power efficiency from the architecture alone, or from clocking alone (as well as IPC), because AMD hasn't exactly detailed those yet (I'm sure there are some). In the past, scaling was not exactly perfect, especially with binning a larger chip at 7nm; maybe it ends up a bit outside the curve.

Again, it may not be "lying" but more like omitting to mention a few things. More like:
  • we got warned by Mark Cerny that frequency would not hold 24/7, yet the performance estimate was based on boost clocks,
  • MS did not warn us about any clock drops, on the opposite "constant clocks", yet smartshift is precisely dynamic clocks, and the GPU is larger yet also quite fast.
Again you made some pretty good points, and it may be that their thermal solution is relatively efficient. I do believe it's not impossible to see some throttle on a XboX, especially since I don't account for any I/O hardware both consoles use, which should consume a fair bit of power too.
 
Last edited:

Pheonix

Banned
Dec 14, 2018
5,990
St Kitts
Where did you get this from? MS stated the word "fixed" several times during their presentation. So I still assume that those clocks will be achievable independent of the workload.



This is basically how today's parts operate. My 2080 Ti, when OCed, starts at 2100MHz but will quickly downclock to 1900MHz (losing performance) either because:

1 - A certain temperature threshold was achieved
2 - Power target limits


For example: Witcher 3 at 4K, even with a cooler GPU, will stay around 1900MHz rather than the initial 2100MHz boost; it lost more or less 10% of its frequency.
Where did I get that from? Common sense. And also from looking at just about any video benchmark out there. You are overcomplicating this.

The XSX's fixed frequency doesn't mean (as is the case with any computing device) that it will be running at that clock whenever any amount of load is placed on the chip. It fluctuates. It would be ridiculously inefficient to be running at max clocks ALL the time!!!

You are completely right that the clocks in the XSX can be reached independently of the workload, and that is exactly the problem Sony is trying to fix. Because while the clocks in the XSX can be reached independent of the workload, it's also doing that regardless of how much power is drawn and, in turn, how much heat it's generating while getting there.

The PS5's variable frequency is NOT and does NOT work like anything else out there with a variable frequency. I don't know how many times this needs to be explained. I really don't want to get into it because it's been explained multiple times in multiple threads, but think of it this way.

XSX = variable frequency up to a fixed (nominal) 1825MHz. Variable power, variable temp.

PC = variable frequency up to and beyond the nominal frequency. Beyond it is "boost mode", dependent on thermal headroom, or a manual frequency lock, which makes it behave like the XSX above. Variable power. Variable temp.

PS5 = variable frequency up to its nominal fixed (capped) 2230MHz. Fixed power. Fixed temp.
 

ShapeGSX

Member
Nov 13, 2017
5,228
PS5 = variable frequency up to its nominal fixed (capped) 2230MHz. Fixed power. Fixed temp.

It's not possible for the PS5 to have fixed power and fixed temperature. Different workloads will cause different power numbers, up to a capped power number based on the power model in the APU. Also, he never said that the temperature was fixed.
 

M3rcy

Member
Oct 27, 2017
702
The XSX's fixed frequency doesn't mean (as is the case with any computing device) that it will be running at that clock whenever any amount of load is placed on the chip. It fluctuates. It would be ridiculously inefficient to be running at max clocks ALL the time!!!

I think you're wrong about this, based on every way it's been described. It will have power states, so during media playback or when in the dashboard it may clock lower but it won't be continually variable according to load. When running a game power usage will be determined by load alone.

For that matter, the PS5 won't have massive variances in clock either, hence the "only a couple of percent difference" in clocks that was stated.
 

Cyborg

Banned
Oct 30, 2017
1,955
Can we compare the PS5 with the RX 5700 XT? If I understood correctly, the RX 5700 XT wasn't targeting 4K gaming.
 
Mar 22, 2020
87
Can we compare the PS5 with the RX 5700 XT? If I understood correctly, the RX 5700 XT wasn't targeting 4K gaming.
You might expect a similar power draw, but AMD included a lot of new functionalities and we don't know how they all impact performance:
  • VRS (wasn't mentioned, should be a part of RDNA2), probably fairly big impact over RDNA1,
  • Sony's cache scrubbers, if I understood properly, are meant to free as much of the VRAM as possible by eliminating useless or duplicated assets? It's very important; AMD over-utilizing VRAM is usually the first reason you see compute-unit stalls. I think they said the Xbox doesn't have that?
  • SmartShift, apparently adding up to 10% performance, regardless of which of the CPU/GPU needs more power.
  • clocking, power consumption changes a bit from TSMC N7P (~10% power headroom for higher frequencies, probably)
  • It's still unclear if AMD made additional improvements to IPC and general efficiency. They did claim 1.5x perf-per-watt over RDNA 1, which is a big claim. I have to wonder how they calculated it (it's definitely a real value, but is it for the overall system? If they used a console as the reference, it is bound to be quite efficient compared to a PC).
Also, consoles might do a bit of upscaling from 1800p to run at 4K@60fps; Contrast Adaptive Sharpening on existing Navi GPUs makes that work pretty well. I wouldn't be surprised to see developers tuning the level of detail as well.
 

McFly

Member
Nov 26, 2017
2,742
But that number will vary according to frequency. You cannot compare a fixed frequency with a variable max frequency, since the latter will not be sustained for 100% of workloads; otherwise it would be fixed.
That does not change the system's peak capabilities. A fixed clock does not mean that the system is performing at its peak capacity; that varies based on the work being done. The clock is constant, and power draw varies because the workload varies. You're not doing 12TF worth of FP ops every second; there are tons of other things the GPU is doing that are not FP ops. This introduces the possibility of the system thermal throttling and shutting down if the cooling system is not up to the task of keeping up with the power being drawn. Allowing the system to vary its clock means you never pass the point the cooling system cannot handle, because the clock automatically drops a few points to lower the power draw by a greater degree.