" For compute my expectation is also in the 40-50 TFLOPS range. I hope I'm wrong but I doubt the industry will be able to keep up as they have over next generation for reasonable amounts of $$$, and next-gen transistors like CNTs etc probably won't be ready yet. The really interesting question here I think is what kind of architecture will be used. Is the UE5 demo a sign we're moving away from GPUs and towards GPGPU solutions? If any devs or programmers have insight or opinions about this I'd be all ears. " - cjn83
I too would love to hear some insight on this.
Electrical engineer / programmer (not game dev) here, but I think I can shed some light on this topic.
GPGPU is kind of a misnomer, at least from a modern gamer's perspective. It seems self-descriptive, "general-purpose computing on a graphics processing unit", right? It also gets described as the GPU offloading CPU tasks. Not so much.
The term GPGPU first came around in the early 2000s, and from the view of a data scientist or engineer it was pretty accurate.
GPU vs CPU is more about the design philosophy of semiconductor engineering and what hardware built around each design is capable of. When designing a device with any given number of transistors, the main goals are a) to make the most efficient use of those transistors, and b) to not break away from previous designs so abruptly that existing software stops working.
These different design philosophies each take their own approach, and that's what produced the distinct strengths of the various components we've gotten over the years.
CPU:
When it comes to the design of a CPU, the fundamental driving factor is speed. We all remember, or have at least heard of, the old clock speed wars between AMD and Intel. The truth is that wasn't the end of it, and I'm not just talking about the shift from MHz to GHz. Switching frequencies were always going to top out at roughly whatever a given process node allows. CPUs are so focused on doing a single thing as fast as possible that, as designers were handed exponentially more transistors while still being capped on frequency, they decided the best use for them was predicting what the program will do two, three, four, etc. clock cycles from now, and then X cycles after that, and speculatively working ahead on that guess. This is known as branch prediction (paired with speculative execution).
The vast majority of the time a transistor's output is exactly the same as it was the previous cycle; on average a transistor only switches once every 7+ cycles, iirc. That's a lot of calculation happening for nothing to change, but knowing what will happen upwards of 20 cycles ahead of time has been invaluable to the advancement of CPUs. Effectively this offsets the latency of actually waiting for the system to perform a task.
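If you want to see the effect of branch prediction for yourself, here's a minimal toy sketch (my own illustration, not anything from a real product): the exact same loop runs dramatically faster once its data is sorted, because the branch inside becomes predictable and the CPU can speculate past it instead of waiting.

```cpp
// Toy branch prediction demo (illustrative sketch only).
// The same loop runs much faster on sorted data because the "if" becomes
// predictable and the CPU can speculatively execute past it.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <vector>

long long sum_over_threshold(const std::vector<int>& data) {
    long long sum = 0;
    for (int v : data)
        if (v >= 128) sum += v;   // the branch the predictor has to guess
    return sum;
}

int main() {
    std::vector<int> data(1 << 24);
    for (int& v : data) v = std::rand() % 256;   // random values 0..255

    auto time_run = [&](const char* label) {
        auto t0 = std::chrono::steady_clock::now();
        volatile long long s = sum_over_threshold(data);
        auto t1 = std::chrono::steady_clock::now();
        (void)s;
        std::printf("%-8s %lld ms\n", label,
            (long long)std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count());
    };

    time_run("unsorted");               // branch outcome ~random -> lots of mispredicts
    std::sort(data.begin(), data.end());
    time_run("sorted");                 // branch outcome predictable -> much faster
    return 0;
}
```

Compile without aggressive auto-vectorization (e.g. -O1) if you want the gap to show clearly; some compilers will turn that branch into branchless code at higher optimization levels.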
Thus, CPUs are good at performing tasks that a) rarely change, b) are always being performed, and c) absolutely have to produce a result immediately or everything falls out of place. This is practically the computing definition of resource management.
GPU:
The GPU takes a completely different approach to design. GPUs are massively parallel. They perform the simplest of calculations, but they can perform trillions of them per second. This is essentially linear algebra, i.e. matrix math. In fact they can do so much of it that, given large enough datasets, they can build abstract, high-dimensional mathematical models ahead of time and then use those models in real time to estimate what the result of a calculation should be, whenever the situation is similar enough to what has been seen before (or enough data has been processed beforehand to cover the unusual cases). That also happens to be the fundamental idea behind AI, neural networks, and the things built on them, like DLSS.
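To make that concrete, here's a minimal CUDA sketch (my own illustrative example, not anything from a shipping engine): one trivial operation, y[i] = a*x[i] + y[i], written so that millions of lightweight GPU threads each handle a single element. No individual thread is fast; the throughput comes from how many of them run at once.

```cpp
// Minimal CUDA kernel sketch: one tiny calculation per thread.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // which element am I?
    if (i < n)
        y[i] = a * x[i] + y[i];                      // the entire "job" of this thread
}

// Launched as enough 256-thread blocks to cover all n elements:
//   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
```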
GPGPU:
This actually has nothing to do with the design of the hardware. The idea of GPGPU is to perform certain "traditionally" CPU tasks on the GPU. Believe it or not, given enough time and specialized coding, CPUs and GPUs can both ultimately perform the same operations. Fun fact: you can now run the original Crysis on a $4k server CPU by itself; the same can't be said for a GPU on its own. That said, it is absolutely not efficient to try to fully reverse their roles. The "traditional" CPU tasks worth moving are the ones a GPU is extremely well suited for (e.g. matrix math). A GPU is not going to be effective at branch prediction, just like a CPU is not going to be effective at DLSS.
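As a rough sketch of what "GPGPU" looks like in practice (reusing the toy saxpy kernel from above; the names here are illustrative, not from any particular engine or library): the same math can be written as an ordinary CPU loop or shipped off to the GPU with a few CUDA runtime calls, and you simply run it wherever it's more efficient.

```cpp
#include <cuda_runtime.h>

// Same computation, CPU style: one core walking the array in order.
void saxpy_cpu(int n, float a, const float* x, float* y) {
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// Same computation, GPGPU style: copy the data over, let thousands of
// GPU threads chew through it, copy the result back.
void saxpy_gpu(int n, float a, const float* x, float* y) {
    float *d_x = nullptr, *d_y = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_x, x, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, n * sizeof(float), cudaMemcpyHostToDevice);

    saxpy<<<(n + 255) / 256, 256>>>(n, a, d_x, d_y);   // kernel from the sketch above

    cudaMemcpy(y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
}
```

For a handful of elements the copies dominate and the CPU version wins; for millions of elements the GPU version wins by a wide margin, which is the whole point of moving the work.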
Background info:
For quite a long time CPUs were stuck doing the matrix math, because anyone wanting to use a GPU for general-purpose programming had to go through OpenGL or DirectX, which imposed arbitrary limits tied to graphics primitives. Scientists and engineers were trying to do this for highly complex physical simulations back around 2000. Over the following eight years or so there was a massive move towards general-purpose GPU programming frameworks, notably CUDA in 2006 and later OpenCL.
In the seven or so years after that, scientists and engineers were rapidly advancing simulations with all this newfound GPU power, while video games were mostly just pushing for higher poly counts and resolutions.
By the start of the 8th console generation we had really started to hit diminishing returns on the resolution and poly count front. It made a lot of sense for game programmers to start homing in on well-known simulation models to improve interactivity and graphical fidelity.
What better way to do that than running GPGPU code on the very GPU already in your hardware? GPGPU is more a testament to how the capabilities and balance of CPUs and GPUs were, and still are, shifting than an actual indication that the GPU is performing the CPU's job. It is literally just software running on the piece of hardware where it runs most efficiently. A GPU is never going to replace a CPU, and if one ever does, it will deserve some new classification.
Looking forward:
As hardware continues to progress, you can expect the CPU/GPU balance to keep following what scientists and engineers are demanding right now, which is swinging heavily in favor of GPUs, massive datasets, and AI-based workloads.
If you can logically bridge these two design philosophies from opposite ends of the spectrum, improve performance across the board, and not break every piece of code written in the last 10 years, you have a trillion-dollar design on your hands.
We went from ~35 MTr/mm^2 at the 14 nm node to ~100 MTr/mm^2 at the 7 nm node. We may eventually end up with a single design that does both jobs well, purely because of the sheer number of transistors available and no more efficient use for them, but that's a long way off from a design standpoint.
With regard to the physical limit of transistor size, I believe there is a well-understood path down to roughly 10 atoms across, which would be about a 6x reduction from where we are. I also expect plenty of changes besides size in that time frame that will improve efficiency in other ways. Single-atom and subatomic transistors (whatever those would be called) are an entirely different ballpark, and I don't know if we'd even go that route.