Reposting from last thread
Indeed. But is there any chance we can derive the number of stream processors from the die size? (estimation)
Or do we believe the GPU sports 3072 stream processors?
We can try!
Unfortunately we have two unknowns: clocks and CU count. For reference, I'll be using Strange Brigade benchmarks from here.
The knowns are:
RTX 2070 + 10% performance in Strange Brigade at 4K. This puts it within a few % of Vega 64, so let's call them equal for simplicity's sake.
Architecture gain of 1.25x per clock based on a suite of 30 benchmarks at 4K. This is a good comparison because it's more likely to stress any memory bandwidth disparities.
Perf/Watt gain of 1.5x over GCN at 14nm. I'll assume this is Vega 64 and immediately discard the metric. Why? Because we already know Vega 20 enjoys a 1.25x perf/Watt boost over Vega 64, so this is AMD admitting Navi is running at some clock where there are no additional perf/Watt advantages.
I think we should assume a minimum of 40 CUs based on the various leaks, and no more than 52 based on AdoredTV's numbers.
Vega 64 has a 1250MHz base clock and 1550MHz boost. To draw equal, Navi must use clocks to make up any CU deficit not covered by the 1.25x factor. At 40CUs, the 1.25x factor gives an effective 50CUs, so clocks need to rise by 64/50 = 1.28x.
40CUs: 1600MHz base, 1984MHz boost. These don't seem totally far-fetched given AMD says Navi clocks better, and Nvidia designs can clock that high.
44CUs: 1450MHz base, 1800MHz boost.
48CUs: 1333MHz base, 1650MHz boost.
52CUs: 1230MHz base, 1525MHz boost. (Essentially no change)
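The clock scaling above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in this post: the constant and function names are made up for illustration, and the 1.25x factor is treated as a straight multiplier on CU count, which is the simplifying assumption the estimate relies on.

```python
# Inputs taken from the post; all names here are illustrative only.
VEGA64_CUS = 64
VEGA64_BASE_MHZ = 1250
VEGA64_BOOST_MHZ = 1550
ARCH_GAIN = 1.25  # assumed per-clock gain, applied as a CU multiplier


def required_clocks(navi_cus):
    """Return (base, boost) in MHz needed for Navi to match Vega 64."""
    effective_cus = navi_cus * ARCH_GAIN   # e.g. 40 CUs act like 50
    scale = VEGA64_CUS / effective_cus     # e.g. 64/50 = 1.28
    return VEGA64_BASE_MHZ * scale, VEGA64_BOOST_MHZ * scale


for cus in (40, 44, 48, 52):
    base, boost = required_clocks(cus)
    print(f"{cus}CUs: {base:.0f}MHz base, {boost:.0f}MHz boost")
```

The unrounded results land within a few MHz of the figures listed above. The 52CU case comes out slightly under Vega 64's own clocks, since 52 x 1.25 = 65 effective CUs is just over 64.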
What's also interesting to me is CU sizing. If CUs have grown a lot, that points to a lot of architecture rework. Vega 20 fits 64 CUs and a 4096-bit HBM2 controller in 332mm^2. I think it has more negative (unused) die area than strictly needed. My personal belief is that this is partly because it was a hurried refresh serving as a pipe cleaner, and partly because current and power density meant it couldn't be shrunk further without IR drop and heat density problems. Navi is sporting a 255mm^2 die.
Navi has a 256-bit GDDR6 interface, and we know that a 128-bit GDDR6 interface is 1.5-1.75x larger than a 1024-bit HBM2 interface. That makes Navi's 256 bits worth 3-3.5 of those 1024-bit HBM2 interfaces against Vega 20's four, or roughly 12-25% smaller. Let's instead assume Navi's 256-bit and Vega 20's 4096-bit interfaces are roughly equal in area. I do this because I assume Navi will have less negative die area.
That means the memory interfaces cancel out, and we can do an approximate CU sizing by comparing die area per CU (332mm^2 / 64 CUs for Vega 20 against 255mm^2 / N CUs for Navi).
40 CUs: Navi CUs are 23% larger than Vega 20.
44 CUs: 12%
48 CUs: 2%
52 CUs: -5%
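For anyone who wants to check the sizing percentages, here is a rough sketch of the calculation. It assumes the comparison is simply total die area divided by CU count, which is what reproduces the numbers above; the names are illustrative and nothing here accounts for real floorplans.

```python
# Die sizes and CU counts as quoted in the post; names are illustrative.
VEGA20_DIE_MM2 = 332
VEGA20_CUS = 64
NAVI_DIE_MM2 = 255


def cu_size_delta(navi_cus):
    """Fractional change in die area per CU, Navi relative to Vega 20."""
    vega_per_cu = VEGA20_DIE_MM2 / VEGA20_CUS   # about 5.19 mm^2 per CU
    navi_per_cu = NAVI_DIE_MM2 / navi_cus
    return navi_per_cu / vega_per_cu - 1.0


for cus in (40, 44, 48, 52):
    print(f"{cus} CUs: {cu_size_delta(cus) * 100:+.0f}%")
```

Because this divides the whole die by CU count, any difference in uncore or memory interface area gets folded into the "CU size", so treat the results as a ceiling on how much the CUs themselves could have changed.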
In any event, 255mm^2 is a good sign for consoles being able to include a full Navi 10 die along with a 70mm^2 Zen 2 design, with some spare room for glue logic and misc. IO. If that leaked dev kit PS5 rumor is true (312mm^2), we're clearly dealing with a cut-down Navi 10 (or a LITE version with smaller CUs?).
Which outcome is better for consoles? I would argue the smaller CU is actually better, because it makes the clock situation a lot more favorable. I think the 52CU scenario is infeasible because there's no way AMD would market a GPU with clocks that low and make the statements they did. I think we are likely looking at 44CUs for Navi, since that gives us an 1800MHz boost, which lines up perfectly with Radeon VII and gives us the 0% perf/Watt advantage of Navi over Vega 20 that we expect. Of course, 40CUs is the best case if you believe the Gonzalo leak, because it would tell us that console GPU clocks are actually 10% under desktop GPU boost clocks.
RX 580 clocks are 1257MHz/1340MHz, which means Xbox One X comes within 7% of desktop base clocks and 13% of boost clocks, and so I think Gonzalo clocks are completely believable for a 40CU Navi with ~2000MHz boost clock. They are far-fetched for anything beyond 44CUs.
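The console-versus-desktop clock comparison works out as below. Note the Xbox One X GPU clock of 1172MHz is not stated in this post, so it is an assumption brought in from the console's published specs; the function names are illustrative too.

```python
# RX 580 clocks are from the post. The Xbox One X clock is an assumption
# (published spec, not stated in the post).
RX580_BASE_MHZ = 1257
RX580_BOOST_MHZ = 1340
XBOX_ONE_X_MHZ = 1172


def deficit(console_mhz, desktop_mhz):
    """Fraction by which the console clock sits under the desktop clock."""
    return 1.0 - console_mhz / desktop_mhz


def console_clock(desktop_boost_mhz, fraction_under):
    """Console clock implied by a given deficit under desktop boost."""
    return desktop_boost_mhz * (1.0 - fraction_under)


print(f"vs base:  {deficit(XBOX_ONE_X_MHZ, RX580_BASE_MHZ):.1%}")
print(f"vs boost: {deficit(XBOX_ONE_X_MHZ, RX580_BOOST_MHZ):.1%}")
# A console clock 10% under the 40CU boost estimate of 1984MHz:
print(f"{console_clock(1984, 0.10):.0f}MHz")
```

A 10% deficit sits inside the 7-13% range the Xbox One X shows against the RX 580, which is the basis for calling the Gonzalo clocks believable in the 40CU case.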
And drum roll, teraflop time!
Given the clocks are scaled based on CU count, all of the above configurations have the same throughput: 8.2TF base, about 10.2TF boost. This puts us right in the TF band we expect for consoles (if a bit on the low side). I suspect the RX 5700 is not the top-end Navi SKU though (I would expect an 8 or 9 in the name), and there's probably a full-die version with 4-8 more CUs enabled, meaning all the above calculations are going to move up. Given consoles will most likely disable CUs for yield, this may absolutely still be a comparable situation. Conclusion: I remain team 10TF, but they punch like 12.5TF (10TF x the 1.25 architecture gain, in Vega terms).
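The teraflop figures follow from the usual peak FP32 formula. The sketch below assumes Navi keeps the GCN layout of 64 stream processors per CU and 2 FLOPs per stream processor per clock (one fused multiply-add); neither is confirmed for Navi here, and the names are illustrative.

```python
# Assumptions carried over from GCN, not confirmed for Navi in this post.
SP_PER_CU = 64
FLOPS_PER_SP_PER_CLOCK = 2   # one fused multiply-add per clock
ARCH_GAIN = 1.25             # per-clock gain used throughout the post


def tflops(cus, clock_mhz):
    """Peak FP32 throughput in TFLOPS for a given CU count and clock."""
    return cus * SP_PER_CU * FLOPS_PER_SP_PER_CLOCK * clock_mhz * 1e6 / 1e12


base = tflops(40, 1600)
boost = tflops(40, 1984)
print(f"base:  {base:.2f} TF")
print(f"boost: {boost:.2f} TF")
print(f"Vega-equivalent boost: {boost * ARCH_GAIN:.1f} TF")
```

Every CU count gives the same result because the clocks were derived by holding CUs x clock constant, so the product inside `tflops` never changes. The 12.5TF figure in the conclusion comes from applying the 1.25x gain to a round 10TF.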