Differences Between RDNA and Previous Devices
These architectural changes affect how code is scheduled for performance:
Single cycle instruction issue
Previous generations issued one instruction per wave once every 4 cycles, but now instructions are issued every cycle.
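To make the issue-rate difference concrete, the sketch below counts the cycles needed to issue a run of back-to-back instructions from a single wave under each model. It is an idealized illustration, not a timing model: the function and constant names are hypothetical, and it assumes no stalls, no dependencies, and no competition from other waves.

```python
# Idealized issue-rate comparison for a single wave.
# Assumption: every instruction issues at the nominal rate with no stalls.

PREVIOUS_CYCLES_PER_ISSUE = 4  # one instruction per wave every 4 cycles
RDNA_CYCLES_PER_ISSUE = 1      # one instruction per wave every cycle


def issue_cycles(num_instructions: int, cycles_per_issue: int) -> int:
    """Return the cycles needed to issue `num_instructions` from one wave."""
    if num_instructions < 0 or cycles_per_issue < 1:
        raise ValueError("instruction count must be >= 0 and issue interval >= 1")
    return num_instructions * cycles_per_issue


if __name__ == "__main__":
    n = 10
    print("previous:", issue_cycles(n, PREVIOUS_CYCLES_PER_ISSUE), "cycles")
    print("RDNA:    ", issue_cycles(n, RDNA_CYCLES_PER_ISSUE), "cycles")
```

Under these assumptions a dependent chain of instructions in one wave is issued in a quarter of the cycles, which is why latency hiding depends less on having many waves in flight.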
Wave32
Previous generations used a wavefront size of 64 threads (work items). This generation supports wavefront sizes of both 32 and 64 threads.
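The effect of the wavefront size on a dispatch can be sketched with simple arithmetic: the number of waves needed to cover a workgroup, and how many lanes in the last wave are left idle. The helper names below are hypothetical and only illustrate the rounding involved.

```python
# Waves needed to cover a workgroup for a given wavefront size.
# Assumption: work items are packed into waves in order, so only the
# last wave can be partially filled.

SUPPORTED_WAVE_SIZES = (32, 64)


def waves_per_workgroup(work_items: int, wave_size: int) -> int:
    """Return ceil(work_items / wave_size) using integer arithmetic."""
    if wave_size not in SUPPORTED_WAVE_SIZES:
        raise ValueError("wave size must be 32 or 64")
    if work_items < 0:
        raise ValueError("work item count must be non-negative")
    return -(-work_items // wave_size)


def idle_lanes(work_items: int, wave_size: int) -> int:
    """Return the number of unused lanes across the allocated waves."""
    return waves_per_workgroup(work_items, wave_size) * wave_size - work_items


if __name__ == "__main__":
    for size in SUPPORTED_WAVE_SIZES:
        print(size, waves_per_workgroup(96, size), idle_lanes(96, size))
```

For a 96-item workgroup, wave32 fills three waves exactly, while wave64 needs two waves and leaves 32 lanes unused.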
Workgroup Processors
Previously, the shader hardware was grouped into "compute units" ("CUs"), each of which contained ALU, LDS, and memory access. Now the "workgroup processor" ("WGP") replaces the compute unit as the basic unit of computing. This allows significantly more compute power and memory bandwidth to be directed at a single workgroup.
Programming Model Changes
- FLAT_SCRATCH and XNACK_MASK are no longer in SGPRs
They are in dedicated hardware registers accessed via S_GETREG_B32 and S_SETREG_B32
- Added a scalar source enum: NULL (reads zero and writes nothing).
- Image operations add a DIMension field
- Memory operations gain DLC bit (Device Level Coherence) to control level-1 caching
- Buffer clamping rules in MUBUF/MTBUF are explicitly controlled by the buffer resource
- Separated dependency counters for vector memory loads from stores
- Moved POPS_PACKER from mode to a hardware register accessed via S_GETREG_B32 and S_SETREG_B32
- SGPRs are no longer allocated: every wave gets a fixed number of SGPRs
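The behavior of the new NULL scalar source (reads zero, writes nothing) can be modeled in a few lines. This is a toy software model only: the `ScalarRegs` class, the `NULL` sentinel, and the register count are hypothetical and do not reflect the hardware encoding of the operand.

```python
# Toy model of a scalar register file with a NULL operand.
# Assumption: NULL is represented here by a sentinel object; the real
# operand encoding is not modeled.

NULL = object()       # hypothetical stand-in for the NULL scalar source
MASK32 = 0xFFFFFFFF   # scalar registers are modeled as 32 bits wide


class ScalarRegs:
    def __init__(self, count: int = 8):
        self._sgpr = [0] * count

    def read(self, src):
        """Reading NULL always yields zero."""
        if src is NULL:
            return 0
        return self._sgpr[src]

    def write(self, dst, value: int) -> None:
        """Writing to NULL discards the result."""
        if dst is NULL:
            return
        self._sgpr[dst] = value & MASK32


if __name__ == "__main__":
    regs = ScalarRegs()
    regs.write(0, 5)
    regs.write(NULL, 99)          # discarded
    print(regs.read(0), regs.read(NULL))
```

A typical use of such an operand is as a destination for an instruction whose result is not needed, or as a source of constant zero.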
Instruction Changes
• DS_PERMUTE/DS_BPERMUTE are limited to 32-lane permutation
• DPP (renamed to DPP16) is limited to 16-lane access
• VALU ops can use two SGPR inputs instead of just one
• VALU VOP3 format can use a literal constant
• VALU V_CMPX writes only EXEC, not also an SGPR
• VALU Add & Sub instructions have changed names to clarify carry-in and carry-out
• VALU all float-16 math uses FMA instead of MAD
• T# and V# (resource constants) have some bit changes
• Added SALU ops to quickly set float round & denormal modes
• Removed:
◦ S_SET_GPR_IDX family of instructions (use V_MOVREL for GPR indexing)
◦ CBRANCH_FORK and CBRANCH_JOIN
◦ All non-reverse VALU V_SHIFT opcodes
◦ VSKIP
◦ Removed non-volatile instruction control
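With the non-reverse shift opcodes removed, code written for earlier devices must use the reverse forms, which take the shift amount as the first source and the value as the second. The sketch below models that operand order for a single lane. The Python function names mirror the reverse-shift opcode mnemonics (V_LSHLREV_B32, V_LSHRREV_B32, V_ASHRREV_I32); the model assumes the shift count comes from the low 5 bits of the first operand and is meant as an illustration rather than a full definition of the instructions.

```python
# Single-lane model of the reverse-operand 32-bit shifts.
# Assumption: shift count = low 5 bits of the first operand.

MASK32 = 0xFFFFFFFF


def v_lshlrev_b32(shift: int, value: int) -> int:
    """Logical shift left: result = value << shift[4:0]."""
    return (value << (shift & 31)) & MASK32


def v_lshrrev_b32(shift: int, value: int) -> int:
    """Logical shift right: result = value >> shift[4:0], zero-filled."""
    return (value & MASK32) >> (shift & 31)


def v_ashrrev_i32(shift: int, value: int) -> int:
    """Arithmetic shift right: result = value >> shift[4:0], sign-filled."""
    value &= MASK32
    if value & 0x80000000:   # reinterpret as a signed 32-bit integer
        value -= 1 << 32
    return (value >> (shift & 31)) & MASK32


if __name__ == "__main__":
    print(hex(v_lshlrev_b32(4, 1)))
    print(hex(v_lshrrev_b32(4, 0x100)))
    print(hex(v_ashrrev_i32(4, 0x80000000)))
```

When porting, swapping the two source operands of each former non-reverse shift is the only change this model implies; the result is otherwise the same.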