I skimmed for over 3 hours from page 74 to 265.
And that was already quite a ride.
Well, the changes to the graphics and compute array to support RPM (Rapid Packed Math) were the most significant architectural change Vega brought.
And tbh, even Vega didn't launch with the "complete" package of features it was announced with... so by this logic no current Vega chip is actually Vega.
I would argue that the DSBR (Draw Stream Binning Rasterizer), the ROPs becoming L2$ clients, and the HBCC (High Bandwidth Cache Controller) were much more significant changes.
Actually, XB1's design was vanilla GCN2, from before the GCA (graphics and compute array) was arranged into 4 Shader Engines.
Each SE has 2 ACEs, so going with fewer than 4 SEs means fewer ACEs and thus lower overall asynchronous compute performance... I doubt they'd want that.
So I'm pretty sure the physical CU count (i.e. active plus deactivated) needs to be divisible by four for GCN4 and above. (Polaris 10, for example, has 36 physical CUs: 9 per SE.)
Edit:
I misremembered. GCN2 introduced Shader Engines. But I think AMD does still maintain a ratio of 2 ACEs per SE.
SE isn't just a name. It's an architectural arrangement. You're missing the point here. GCN2 didn't "add more SEs", it introduced the arrangement of SEs which wasn't present in GCN1.
It is just semantics with regard to the SEs.
AMD didn't change the internal hardware scaling with GCN2; it has been like that since GCN1.
The kernel driver in the Linux software stack uses the same nomenclature and parameters for every GCN chip:
case CHIP_TAHITI:
    adev->gfx.config.max_shader_engines = 2;        /* Shader Engines (SEs) */
    adev->gfx.config.max_tile_pipes = 12;           /* tiling pipes */
    adev->gfx.config.max_cu_per_sh = 8;             /* CUs per shader array (SH) */
    adev->gfx.config.max_sh_per_se = 2;             /* shader arrays per SE -> 2 x 2 x 8 = 32 CUs */
    adev->gfx.config.max_backends_per_se = 4;       /* render backends (RBs) per SE */
    adev->gfx.config.max_texture_channel_caches = 12; /* TCCs: L2 cache slices */
    adev->gfx.config.max_gprs = 256;
    adev->gfx.config.max_gs_threads = 32;           /* max geometry shader threads */
    adev->gfx.config.max_hw_contexts = 8;           /* hardware GFX contexts */
https://github.com/RadeonOpenComput.../master/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
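To make the arithmetic explicit, here is a minimal sketch (my own illustration, reusing the driver's field names and Tahiti's values) of how the total CU count follows from those per-SE parameters:

#include <stdio.h>

/* Illustration only: derive the total CU count from the per-SE
 * parameters quoted above. Values are Tahiti's. */
int main(void)
{
    int max_shader_engines = 2; /* SEs */
    int max_sh_per_se = 2;      /* shader arrays per SE */
    int max_cu_per_sh = 8;      /* CUs per shader array */

    /* 2 SEs x 2 SHs x 8 CUs = 32 CUs, matching Tahiti/HD 7970 */
    printf("total CUs: %d\n",
           max_shader_engines * max_sh_per_se * max_cu_per_sh);
    return 0;
}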
In addition "ACEs" don't scale with the number of SEs.
According to bridgman and the Linux drivers AMD actually counts and scales MECs (Micro Engine Compute).
Since GCN2 there is one block with 4 compute pipes (ACEs).
Kabini (a low-cost APU with 128 GCN2 shaders, i.e. 2 CUs) had 1 MEC and as such supported 32 compute queues.
It had just one Shader Engine.
Kaveri (a mainstream APU with 512 GCN2 shaders, i.e. 8 CUs) has 2 MECs, support for 64 compute queues, and also just one Shader Engine.
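The queue counts fall straight out of that MEC arrangement; a small sketch (my own illustration, borrowing amdgpu-style names, assuming 4 pipes per MEC and 8 queues per pipe as described above):

#include <stdio.h>

/* Toy model of the compute-queue math described above. The struct
 * fields are modeled on amdgpu's nomenclature; the per-chip numbers
 * are the ones from this post. */
struct mec_config {
    const char *chip;
    int num_mec;            /* MEC blocks on the chip */
    int num_pipe_per_mec;   /* compute pipes ("ACEs") per MEC */
    int num_queue_per_pipe; /* hardware queues per pipe */
};

int main(void)
{
    struct mec_config chips[] = {
        { "Kabini", 1, 4, 8 }, /* 1 MEC  -> 32 queues */
        { "Kaveri", 2, 4, 8 }, /* 2 MECs -> 64 queues */
    };

    for (int i = 0; i < 2; i++)
        printf("%s: %d compute queues\n", chips[i].chip,
               chips[i].num_mec * chips[i].num_pipe_per_mec *
               chips[i].num_queue_per_pipe);
    return 0;
}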
I remember, a year and more ago, people were thinking PS5 and the next Xbox consoles would use Vega GPUs. Of course, that was absurd, especially when you realize that Vega was originally due around mid 2016 (then early 2017) but didn't reach Radeon gaming cards until autumn 2017. Vega was very, very late. This GPU had been anticipated as far back as early 2015, when it was known as Greenland, AMD's next flagship GPU of the Arctic Islands family and successor to Fiji. There were supposed to be Greenland desktop GPUs as well as high-core-count Zen + Greenland + HBM APUs, etc.
March 2015 Fudzilla article -
AMD Greenland HBM graphics coming next year
April 2015 - AMD x86 16-core Zen APU detailed
Anyway, to keep things simple, Greenland = Vega 10.
The Arctic Islands series was meant to consist of 3 GPUs: Baffin, Ellesmere and Greenland. But things got split up into two GPU families and Arctic Islands was no more: Ellesmere became Polaris 10, Baffin became Polaris 11, and Greenland became Vega 10.
Imagine consoles launching in fall 2020 using Vega/Greenland GPUs, which would be 3-to-4-year-old technology at that point, depending on how you wanna look at Vega's development and delayed release. I've always believed that next-gen consoles would use Navi or Next Gen, or a custom blend of both IPs.
In the beginning I also thought that Greenland was renamed to Vega 10, like Ellesmere and Baffin were renamed to Polaris 10 and 11, but that's obviously not what happened.
Greenland was supposed to support half-rate FP64, internal SRAM ECC and GMI for HPC APUs.
AMD cancelled the project and went with Vega 10 and Vega 20 instead, to address those markets more specifically.
[...]
The pseudo-DX12 support on Win7 that was introduced with World of Warcraft a little earlier in the year is a bit of a hack to get multithreaded rendering on DX11.
Well it's not DX11.
It should really be the DX12 API and runtime retrofitted to work with the older WDDM model of Windows 7.
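For what it's worth, the application-side code is plain D3D12 either way; a minimal sketch (my own illustration, not WoW's actual code; link against d3d12.lib and dxguid.lib). On Win7 the call is serviced by the d3d12.dll that Microsoft lets games redistribute, which maps D3D12 onto the older WDDM model, rather than by an OS-provided runtime:

#include <windows.h>
#include <d3d12.h>

/* Create a D3D12 device on the default adapter. The source code is
 * identical on Windows 7 and Windows 10; only the origin of d3d12.dll
 * differs (shipped with the game vs. part of the OS). */
int create_device(ID3D12Device **out_device)
{
    HRESULT hr = D3D12CreateDevice(NULL, /* default adapter */
                                   D3D_FEATURE_LEVEL_11_0,
                                   &IID_ID3D12Device,
                                   (void **)out_device);
    return SUCCEEDED(hr) ? 0 : -1;
}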
PS: IIRC, C1/Xenos from the X360 used an architecture branch which ATI had been developing for quite some time with a unified shader architecture in mind.
But instead of bringing it to the PC around the 2005 time frame, they developed the R300/R400 base further and served the market with the R500 series, until the R600 and its derivatives brought ATI's first unified shader architecture to the PC.
The R600 had some crucial differences in comparison to the C1/Xenos.
The R600 was a VLIW5 architecture where every "slot" could process a data element of its own if no dependencies were present, which marketing back in the day also liked to call "superscalar".
C1/Xenos used Vec4 + 1 scalar ALUs, so it was less flexible and efficient.
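A toy issue-rate model (my own, heavily simplified assumption, not real ISA behavior) of why the VLIW5 arrangement is the more flexible one for independent scalar work:

#include <stdio.h>

/* Given n mutually independent scalar ops:
 * - VLIW5 (R600): each of the 5 slots can take any independent scalar
 *   op, so up to 5 retire per instruction group.
 * - Vec4+1 (Xenos): the vec4 unit issues ONE operation across a
 *   4-component vector, so unrelated scalar ops can only use the single
 *   scalar slot; worst case, one op per issue. */
static int groups_vliw5(int n)        { return (n + 4) / 5; }
static int groups_vec4p1_worst(int n) { return n; }

int main(void)
{
    int n = 10;
    printf("%d independent scalar ops: VLIW5 needs %d groups, "
           "Vec4+1 worst case %d\n",
           n, groups_vliw5(n), groups_vec4p1_worst(n));
    return 0;
}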