Lukas Taves

Banned
Oct 28, 2017
5,713
Brazil
MS/Sony consoles are AMD, and this is NVIDIA technology (and powered by NVIDIA-specific hardware, namely the tensor cores). We don't know for sure yet, but the consoles quite possibly do not have the hardware needed to provide the boosts discussed for this technology (or actually for a similar, non-NVIDIA technology that AMD has yet to develop/announce).
This is Nvidia, but it's still machine learning, and the consoles will for sure support it.

Heck, MS has even already ported a similar algorithm from Nvidia to DirectML.

And the consoles do have the hardware for it: MS talked about adding instructions to the CUs instead of using dedicated cores last year, and at the Series X reveal they announced that's what they did with AMD for RDNA 2.
 
OP
ILikeFeet

DF Deet Master
Banned
Oct 25, 2017
61,987
I still don't understand this magic.

I expect the next Nintendo console to use this so badly, and games to look insane.
My assumption: devs submit super-high-resolution images so it's learning information that can't be resolved at 1080p (like that Wolfenstein 16K-resolution ground-truth image Nvidia showed).
 

SiG

Member
Oct 25, 2017
6,485
This is extremely cool.

I do think people talking about DLSS and RTX on the next Switch are dreaming too high though. You do know that the 2060 alone has more than 10x the power draw of the entire Tegra X1 APU, right? Not even the jump from 14nm to 5nm is enough to cover that difference.
The Xavier line of Tegra chips has built-in tensor cores, so it isn't that far-fetched considering Nvidia has been at it with mobile tech for automated cars.
 

Utherellus

Banned
Mar 31, 2020
181
This is revolutionary. If it can become an industry-wide standard, PC performance will never be the same. You will basically invest in an RTX card to get a free 150%+ fps boost and identical (if not better) image quality in supported games.

Megaton indeed. It can change the rules of the mobile/low/mid/high-end segments simultaneously.

 

Xando

Member
Oct 28, 2017
27,314
Thank you, so the "Shame this won't be on consoles." does not apply. It will have the same effect in the sense that developers will use it to run their games at a lower resolution to increase performance and then upscale them using machine learning.
Of course it still applies, as having dedicated hardware and just running it on your standard pipeline are two completely different things.
 

Vimto

Member
Oct 29, 2017
3,714
My assumption: devs submit super-high-resolution images so it's learning information that can't be resolved at 1080p (like that Wolfenstein 16K-resolution ground-truth image Nvidia showed).
I still don't understand this magic.

I expect the next Nintendo console to use this so badly, and games to look insane.

It's because with native res you are still using TAA, but DLSS is also a form of AA, so it replaces TAA.

And DLSS performs better than TAA in motion (much more stable).

This is what I understood anyway.
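
To picture the temporal part, here's a toy sketch of the reprojection/accumulation step that both TAA and DLSS 2.0 build on (illustrative only, nothing to do with Nvidia's actual network; the blend factor and helpers are made up):

import numpy as np

def reproject(prev_frame, motion_vectors):
    # Warp last frame's pixels to where they land this frame (toy nearest-
    # neighbour gather; real TAA/DLSS filter and validate the history sample).
    h, w, _ = prev_frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs - motion_vectors[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys - motion_vectors[..., 1]).astype(int), 0, h - 1)
    return prev_frame[src_y, src_x]

def temporal_accumulate(current, prev_frame, motion_vectors, blend=0.1):
    # TAA essentially stops at a fixed blend like this; DLSS 2.0 instead feeds
    # the same inputs (current frame, history, motion vectors) to a network
    # that decides per pixel how much history to trust, which is why it holds
    # up better in motion.
    history = reproject(prev_frame, motion_vectors)
    return blend * current + (1.0 - blend) * history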
 

Alucardx23

Member
Nov 8, 2017
4,713
Yes, but the code needs to be continuously tied to a back end that keeps it updated. That was the problem with DLSS 1.0: it was on the devs to do the updating, and they had no general pool to learn from.


Aye, as far as I can find, the tensor cores add 1.25 mm² per SM (this is comparing a TU106 with a TU116).

I recommend you view the video below. With DLSS 1.9 they had to train the neural network per game, and this was all done by Nvidia, not the developer. With DLSS 2.0 they are now able to train the neural network in a more general way; this means that what it learns can be applied to basically any game, and it will improve in general over time with more training.

GTC 2020: DLSS 2.0 - Image Reconstruction for Real-time Rendering with Deep Learning

In this talk, Edward Liu from NVIDIA Applied Deep Learning Research delves into the latest research progress on Deep Learning Super Sampling (DLSS), which uses deep learning and the NVIDIA Tensor Cores to reconstruct super sampled frames in real-time. He discusses and demonstrates why scaling...
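
To make the per-game vs. generalized distinction concrete, a throwaway sketch (every helper and path below is made up; the real training is offline work on Nvidia's clusters against very high resolution ground-truth frames):

from typing import Callable, Dict, List

Frame = List[float]  # stand-in for an image buffer

def train_network(dataset_path: str) -> Callable[[Frame], Frame]:
    # Stand-in "trainer" that returns an upscaler; a real one learns to
    # reconstruct high-resolution ground truth from low-resolution inputs.
    def upscale(frame: Frame) -> Frame:
        return frame  # placeholder
    return upscale

# DLSS 1.x style: Nvidia trains one network per supported title.
per_game: Dict[str, Callable[[Frame], Frame]] = {
    "game_a": train_network("captures/game_a"),
    "game_b": train_network("captures/game_b"),
}

# DLSS 2.0 style: one generalized network trained on a broad pool of content,
# shipped with the driver and improved over time as training continues.
general_model = train_network("captures/general_pool")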
 

Lukas Taves

Banned
Oct 28, 2017
5,713
Brazil
Extra work from the shaders is exactly the problem though, with no tensor cores. So you won't be seeing anything like the performance increases of DLSS, as it'll be taking away from rendering performance just to do the AI upscaling. So kind of misses the point.

The 2080Ti has 113.8 Tensor TFLOPS btw.
You are looking at this the wrong way. Tensor cores allow the workloads to be independent of the shaders, but they take up significant die space, which takes away from the overall processing power of the GPU.

Even for RTX, that was the issue that made the cards look unappealing performance-wise when they first launched, and on consoles you don't have that luxury with limited die space.

For consoles, having a CU capable of doing all kinds of workloads is much better than using dedicated cores (and I think in general as well, because then the extent of each feature is limited only by the game, and devs have more flexibility).
 

Vimto

Member
Oct 29, 2017
3,714
The tensor cores are just hardware support to deal with int matrix math. Both consoles have hardware for that even if not dedicated cores.
OK, they still won't get DLSS 2.0 since the consoles are using AMD tech.

It remains to be seen if AMD have an equivalent tech, but they have not announced anything so far.
 

lukeskymac

Banned
Oct 30, 2017
992
I don't think anyone expects Switch 2 to be the equivalent of an RTX 2060 in performance or to have ray tracing etc., but it could have the hardware to do DLSS at its supported resolutions, giving you better image quality and performance overall.

Nvidia most likely has a pile of patents on this, which might make it difficult to bring to consoles in an alternate version. I don't know if there is a possibility of MS/Sony/AMD licensing it and having Nvidia port it to work on the console hardware.
The Xavier line of Tegra chips has built-in tensor cores, so it isn't that far-fetched considering Nvidia has been at it with mobile tech for automated cars.
Yeah, I could see DLSS happening if they go all in on it and dedicate a much larger relative area of the die to tensor cores compared to what we have on Turing. Apparently the tensor cores on the 2060 sum up to ~22 mm², which is 18% of the die area of the X1, so it's feasible.
 

eebster

Banned
Nov 2, 2017
1,596
So I finished watching the video now.
Wow, DLSS 2.0 is insane.

But it raises the question: what is the point of high-end GPUs anymore when you can hit 4K60 with DLSS on ultra with a midrange card (I assume the 2070/2070 Super can roughly do that with a 1080p internal resolution)? I guess those high-end cards could be marketed for 120+ fps.

The quote that you linked shows no mention of hardware acceleration; you got confused by the "leverages unprecedented hardware performance" bit, but it's just talking about the performance of the console. It supports DirectML, but it doesn't have specialized hardware to accelerate it; basically, like DLSS 1.9, the ML calculations are done via the shader cores. RTX cards dedicate a significant portion of their GPU die to the tensor cores.

Widespread implementation of DLSS will allow devs to use those extra fps elsewhere. The tech is still super new; if in the future both AMD and Nvidia cards have DLSS and all games support it, devs could go ham on the graphics so that only the highest-end cards with DLSS can do 60 fps.
 

-Le Monde-

Avenger
Dec 8, 2017
12,613
This is revolutionary. If it can become an industry-wide standard, PC performance will never be the same. You will basically invest in an RTX card to get a free 150%+ fps boost and identical (if not better) image quality in supported games.

Megaton indeed. It can change the rules of the mobile/low/mid/high-end segments simultaneously.
:o

I hope this tech makes its way to the next nintendo switch system.
 

Alexandros

Member
Oct 26, 2017
17,811
If this becomes standard in games it will give Nvidia a massive advantage against next-gen consoles and AMD graphics cards.
 

lukeskymac

Banned
Oct 30, 2017
992
For consoles, having a CU capable of doing all kinds of workloads is much better than using dedicated cores (and I think in general as well, because then the extent of each feature is limited only by the game, and devs have more flexibility).

Not when you know you're always using said dedicated cores (which is the case here). Dedicated hardware always in use is always superior.
 

Utherellus

Banned
Mar 31, 2020
181
Nintendo can benefit from this very much indeed. Imagine a handheld device running BOTW at 1440p with the help of DLSS. The future is almost now.
 

Deleted member 11276

Account closed at user request
Banned
Oct 27, 2017
3,223
You are looking at this the wrong way. Tensor cores allow the workloads to be independent of the shaders, but they take up significant die space, which takes away from the overall processing power of the GPU.

Even for RTX, that was the issue that made the cards look unappealing performance-wise when they first launched, and on consoles you don't have that luxury with limited die space.

For consoles, having a CU capable of doing all kinds of workloads is much better than using dedicated cores (and I think in general as well, because then the extent of each feature is limited only by the game, and devs have more flexibility).

You're forgetting that the tensor cores are much, much more efficient at doing these kinds of ML workloads than shader cores. The amount of die space taken up is not relevant in this case. And they don't take up as much die space as you think; I believe the number was 7%, but I could be wrong.
 

GrrImAFridge

ONE THOUSAND DOLLARYDOOS
Member
Oct 25, 2017
9,674
Western Australia
OK, now you are being dishonest here. Why are you ignoring where it says "So we added special hardware support for this specific scenario."? If I ask you whether the XSX has special hardware to accelerate machine learning code, is your answer "no, it doesn't"?

Have you read the full quote yourself? That's referring to the shader cores being able to run 8-bit and 4-bit operations at that level of precision, which is why the TOPS figures are what they are.

They added the hardware to the shader cores. Not only that, they have said since last year that DirectML makes use of any ML hardware available, including the tensor cores.

Again, that's not what the quote is saying. No hardware was added; the shader cores were tweaked to run low-precision workloads more efficiently, not unlike "Rapid Packed Math" (AMD's buzzword for using a single FP32 operation to run two FP16 operations).
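
For anyone who hasn't run into the packed-math idea before, the register-width intuition is simple (numpy byte counting only, not real GPU code; the exact throughput still depends on the hardware):

import numpy as np

# Two FP16 values occupy the same 4 bytes as one FP32 value...
assert 2 * np.dtype(np.float16).itemsize == np.dtype(np.float32).itemsize

# ...so a 32-bit-wide math path can be fed two FP16 operations per pass
# ("rapid packed math"), four INT8 operations, or eight INT4 operations,
# which is where the doubling of the headline TOPS number at each lower
# precision comes from.
n = 1024
assert np.zeros(n, np.float32).nbytes == np.zeros(2 * n, np.float16).nbytes
assert np.zeros(n, np.float32).nbytes == np.zeros(4 * n, np.int8).nbytes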
 

Alucardx23

Member
Nov 8, 2017
4,713
Of course it still applies, as having dedicated hardware and just running it on your standard pipeline are two completely different things.

Read the quote below and tell me whether the XSX will have special hardware to accelerate machine learning code or whether it is just running on the standard pipeline. This will be my last answer to you on this one. I remind you that this started with you saying "Shame this won't be on consoles." I have provided proof that, at least with the XSX, there will be special hardware to accelerate the same type of machine learning code that enables machine learning upscaling, and this will allow developers to use it on consoles. Please don't move the goalposts to "Yes, but Nvidia GPUs will have better performance"; the second you say "yes", it invalidates "Shame this won't be on consoles." It will be on consoles, using specialized hardware to run it.

"We knew that many inference algorithms need only 8-bit and 4-bit integer positions for weights and the math operations involving those weights comprise the bulk of the performance overhead for those algorithms," says Andrew Goossen. "So we added special hardware support for this specific scenario. The result is that Series X offers 49 TOPS for 8-bit integer operations and 97 TOPS for 4-bit integer operations. Note that the weights are integers, so those are TOPS and not TFLOPs. The net result is that Series X offers unparalleled intelligence for machine learning."

www.eurogamer.net

Inside Xbox Series X: the full specs

This is it. After months of teaser trailers, blog posts and even the occasional leak, we can finally reveal firm, hard …
 

Dictator

Digital Foundry
Verified
Oct 26, 2017
4,931
Berlin, 'SCHLAND
The tensor cores are just hardware support to deal with int matrix math. Both consoles have hardware for that even if not dedicated cores.
We still do not know if PS5 supports increased rate int8.
XSX does.
XSX is like 49 int8 TOPs - RTX 2080 Ti is 220 at ca. 1500mhz (most 2080 Tis run at 1800-1950 mhz in real life though).
 
Last edited:

Zojirushi

Member
Oct 26, 2017
3,297
OK, they still won't get DLSS 2.0 since the consoles are using AMD tech.

It remains to be seen if AMD have an equivalent tech, but they have not announced anything so far.

Don't Sony already use their own reconstruction thingy on PS4Pro to upscale to 4k? Maybe they'll just iterate & improve on that?
 

Zedark

Member
Oct 25, 2017
14,719
The Netherlands
Thank you, so the "Shame this won't be on consoles." does not apply. It will have the same effect in the sense that developers will use it to run their games at a lower resolution to increase performance and then upscale them using machine learning.
The difference between having separate hardware (NVIDIA's tensor cores) and having hardware support inside the shader cores (XSX approach) is that in the former scenario, you don't have to compromise on GPU compute power for native rendering in order to support a DLSS-like feature, while in the XSX's case, you need to share resources between native rendering and the upscaling AI algorithm. For example: the RTX 2060 has 100 TOPS of tensor core performance. If we assume that we need 25 TOPS to do upscaling from 1080p to 4K, then it can do that on the tensor cores, and use its full 6 TFLOPS of ALU power to render the native 1080p image. The XSX, on the other hand, needs to apply 25 TOPS of shader compute power out of its 49 TOPS total, or roughly half of its GPU, to the DLSS-like algorithm, leaving only half of its GPU for native 1080p rendering. If it takes half the GPU to perform the DLSS algorithm, you lose quite a bit of the potential benefit.

As Elfotografoalocado mentioned, we don't know if the Tensor cores need anywhere near 50 TOPS for DLSS, and we don't know if the tensor cores can run at full performance at the same time as the ALUs and RT cores are running at full performance. If they can, then the RTX support for DLSS is inevitably stronger than what XSX (and PS5) have. If not, then it is probably a better strategy still, but XSX would see better relative performance.

As you mention, you might just be discussing whether it is supported at all. It is, and it's good of MS to have included support for INT4 and INT8. But it's quite probable that the benefits are not nearly as pronounced as they will be on the NVIDIA GPUs due to there not being dedicated hardware for it.
Aye, as far as I can find, the tensor cores add 1.25 mm² per SM (this is comparing a TU106 with a TU116).
Oh, very interesting! Looks like RT and Tensor combined are only about 20% of the GPU die size, so that in itself shouldn't be the major issue with including that hardware I think.
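
To put rough numbers on that sharing argument (the 25 TOPS cost is the same made-up assumption as in my example above, and the best case assumes the tensor cores run fully concurrently with the ALUs):

upscale_cost_tops = 25.0      # assumed cost of the ML upscaling pass (made up)

rtx2060_tensor_tops = 100.0   # dedicated tensor throughput on the 2060
xsx_int8_tops = 49.0          # XSX shader-based INT8 throughput

# On the 2060 the pass fits comfortably on the tensor cores alone:
print(upscale_cost_tops / rtx2060_tensor_tops)  # 0.25 of the tensor budget, none of the shader ALUs

# On the XSX the same pass eats into the one shared pool:
fraction_spent = upscale_cost_tops / xsx_int8_tops
print(f"fraction of the XSX's shader throughput spent on upscaling: {fraction_spent:.0%}")  # ~51%
print(f"fraction left for native rendering: {1 - fraction_spent:.0%}")                      # ~49%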
 
Last edited:

Black_Stride

Avenger
Oct 28, 2017
7,388
for the 10th time, this will NOT be in any next gen console.

It remains to be seen if they have something similar, but nothing so far. And without dedicated h/w? Doubt it.

The XSX has INT8 and INT4 acceleration built in.
And MS have been testing and working on DirectML Super Resolution for years now.
It's safe to assume that with all their data centers and supercomputers they could have an algorithm that rivals DLSS.
There's a very strong chance they built the XSX planning on utilizing those accelerators to do Super Resolution stuff as well as other ML techniques.

And there is also a strong chance the Super Switch will have a few tensor cores inside it and will be able to do DLSS.
So saying "Will not be in ANY next gen consoles" is jumping ahead a bit.


The RDNA 2 architecture used in Series X does not have tensor core equivalents, but Microsoft and AMD have come up with a novel, efficient solution based on the standard shader cores. With over 12 teraflops of FP32 compute, RDNA 2 also allows for double that with FP16 (yes, rapid-packed math is back). However, machine learning workloads often use much lower precision than that, so the RDNA 2 shaders were adapted still further.

"We knew that many inference algorithms need only 8-bit and 4-bit integer positions for weights and the math operations involving those weights comprise the bulk of the performance overhead for those algorithms, so we added special hardware support for this specific scenario. The result is that Series X offers 49 TOPS for 8-bit integer operations and 97 TOPS for 4-bit integer operations. Note that the weights are integers, so those are TOPS and not TFLOPs. The net result is that Series X offers unparalleled intelligence for machine learning."


- Andrew Goossen
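
For what it's worth, those quoted figures fall straight out of the headline FP32 number by doubling at each lower precision (assuming the ~12.15 TFLOPS FP32 figure MS gave; this is just arithmetic, not an official breakdown):

fp32_tflops = 12.15            # Series X headline FP32 compute
fp16_tflops = fp32_tflops * 2  # rapid packed math    -> ~24.3
int8_tops = fp16_tflops * 2    # the quoted INT8 path -> ~48.6 (49 in the quote)
int4_tops = int8_tops * 2      # the quoted INT4 path -> ~97.2 (97 in the quote)
print(fp16_tflops, int8_tops, int4_tops)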
 

Vimto

Member
Oct 29, 2017
3,714
Don't Sony already use their own reconstruction thingy on PS4Pro to upscale to 4k? Maybe they'll just iterate & improve on that?

Sure they have, but what Nvidia have here is vastly superior to all other reconstruction techniques.

It produces a more accurate picture and gives more performance.
 

Alucardx23

Member
Nov 8, 2017
4,713
Have you read the full quote yourself? That's referring to the shader cores being able to run 8-bit and 4-bit operations at that level of precision, which is why the TOP figures are what they are.

What? I ask you: did you read the whole article? They are very clearly talking about special hardware to accelerate machine learning code.

1- "Machine learning is a feature we've discussed in the past, most notably with Nvidia's Turing architecture and the firm's DLSS AI upscaling.

2- "Microsoft and AMD have come up with a novel, efficient solution based on the standard shader cores.

3- "However, machine learning workloads often use much lower precision than that, so the RDNA 2 shaders were adapted still further.

4- "We knew that many inference algorithms need only 8-bit and 4-bit integer positions for weights and the math operations involving those weights comprise the bulk of the performance overhead for those algorithms,"
says Andrew Goossen. "So we added special hardware support for this specific scenario. The result is that Series X offers 49 TOPS for 8-bit integer operations and 97 TOPS for 4-bit integer operations."
 

Utherellus

Banned
Mar 31, 2020
181
We still do not know if PS5 supports increased rate int8.
XSX does.
XSX is like 49 int8 TOPs - RTX 2080 Ti is 212 at 1500mhz (most 2080 Tis run at 1800-1950 mhz in real life though).

Hello Alex. I wonder one thing: can DLSS be implemented in the Nvidia Control Panel in the future, to use with every game?

I just fear that DLSS will fade away just like Gameworks due to lack of dev support...
 

Dictator

Digital Foundry
Verified
Oct 26, 2017
4,931
Berlin, 'SCHLAND
The difference between having separate hardware (NVIDIA's tensor cores) and having hardware support inside the shader cores (XSX approach) is that in the former scenario, you don't have to compromise on GPU compute power for native rendering in order to support a DLSS-like feature, while in XSX case, you need to share resources between native rendering and the upscaling AI algorithm. For example: the RTX2060 has 57 TOPS of tensor core performance. If we assume that we need 50 TOPS to do upscaling from 1080p to 4K, then it can do that on the tensor cores, and use its full 6 TFLOPS of ALU power to render the native 1080p image. The XSX, on the other hand, needs to apply 50 TOPS of shader compute power out of its 97 TOPS total, or roughly half of its GPU, to the DLSS-like algorithm, leaving only half of its GPU for native 1080p rendering. If it takes half the GPU to perform the DLSS algorithm, you lose quite a bit of the potential benefit.

As Elfotografoalocado mentioned, we don't know if the Tensor cores need anywhere near 50 TOPS for DLSS, and we don't know if the tensor cores can run at full performance at the same time as the ALUs and RT cores are running at full performance. If they can, then the RTX support for DLSS is inevitably stronger than what XSX (and PS5) have. If not, then it is probably a better strategy still, but XSX would see better relative performance.

Oh, very interesting! Looks like RT and Tensor combined are only about 20% of the GPU die size, so that in itself shouldn't be the major issue with including that hardware I think.
You have some things incorrect here. The RTX 2060 is around 100 TOPS; the XSX is 49 for INT8. INT4 TOPS is double for both, so around 200 for the RTX 2060 and 97 for the XSX.
Hello Alex. I wonder one thing: can DLSS be implemented in the Nvidia Control Panel in the future, to use with every game?

No, it requires the developer to implement it. It requires access to certain screen information that is not really accessible at the driver level.
 

Altair

Member
Jan 11, 2018
7,901
Nvidia ahead of the curve as usual. Hope all devs implement this in future games.
 
Apr 4, 2019
524
Extra work from the shaders is exactly the problem though, with no tensor cores. So you won't be seeing anything like the performance increases of DLSS, as it'll be taking away from rendering performance just to do the AI upscaling. So kind of misses the point.

The 2080Ti has 113.8 Tensor TFLOPS btw.

It happens as an extra step within the rendering of the frame (just before the final post-processing); it's not done async. So yes and no, as the idea is to reduce the main rendering load significantly.

Even so, Anaconda has roughly half the amount of ALU compared to the 2060's tensor core throughput for this sort of thing.

edit: I'm not sure if nV has revealed which precision mode they've used for DLSS, but perhaps MS could implement their own iteration at a lower precision than DLSS to increase theoretical math throughput, at a lower quality.
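
Roughly where that step sits in a frame, as I understand it (a conceptual sketch only; every function here is a made-up stand-in for a GPU pass):

def rasterize(scene, resolution):
    return "colour", "depth", "motion_vectors"   # stand-in

def ml_reconstruct(colour, depth, motion_vectors, history, output_resolution):
    return "reconstructed_frame"                 # stand-in

def post_process(frame):
    return f"post_processed({frame})"            # stand-in

def render_frame(scene, history):
    # 1. Render at a reduced internal resolution (e.g. 1080p for a 4K target).
    colour, depth, motion_vectors = rasterize(scene, (1920, 1080))
    # 2. The ML reconstruction runs inline here, before post-processing --
    #    it is not an async pass running alongside the next frame.
    upscaled = ml_reconstruct(colour, depth, motion_vectors, history, (3840, 2160))
    # 3. Post-processing and UI are done at the output resolution.
    return post_process(upscaled), upscaled  # the result feeds the next frame's history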
 
Last edited:

gofreak

Member
Oct 26, 2017
7,736
This is Nvidia, but it's still machine learning, and the consoles will for sure support it.

Heck, MS has even already ported a similar algorithm from Nvidia to DirectML.

And the consoles do have the hardware for it: MS talked about adding instructions to the CUs instead of using dedicated cores last year, and at the Series X reveal they announced that's what they did with AMD for RDNA 2.

The consoles may have mixed precision support in their shader ALUs, but DLSS here is running on tensor cores. So on the consoles, without further hardware, they'll be trading native rendering performance against reconstruction performance/time. Whether you wind up with a net benefit to performance as in DLSS would be an open question. It certainly wouldn't be as beneficial.

I wonder also how useful lower-precision support in shader ALUs will be for image reconstruction, below FP16 anyway, before you start introducing new problems. Not sure what the answer is there.

I think it's more likely we will see different reconstruction techniques on the consoles that are lighter on processing time than a DLSS-alike running on shader ALUs would be, but better than what we had before on consoles. And maybe even using some neural network processing, but perhaps trading generality (as in DLSS) for lower overhead if running on shaders.
 

GrrImAFridge

ONE THOUSAND DOLLARYDOOS
Member
Oct 25, 2017
9,674
Western Australia
What? I ask you: did you read the whole article? They are very clearly talking about special hardware to accelerate machine learning code.

1- "Machine learning is a feature we've discussed in the past, most notably with Nvidia's Turing architecture and the firm's DLSS AI upscaling.

2- "Microsoft and AMD have come up with a novel, efficient solution based on the standard shader cores.

3- "However, machine learning workloads often use much lower precision than that, so the RDNA 2 shaders were adapted still further.

4- "We knew that many inference algorithms need only 8-bit and 4-bit integer positions for weights and the math operations involving those weights comprise the bulk of the performance overhead for those algorithms,"
says Andrew Goossen. "So we added special hardware support for this specific scenario. The result is that Series X offers 49 TOPS for 8-bit integer operations and 97 TOPS for 4-bit integer operations."

No, they're very clearly talking about DirectML running on shader cores and how they were adapted to run low-precision workloads more efficiently. You're zeroing in on "special hardware support" and ignoring points 2 and 3, both of which explicitly state that DirectML relies on shader cores.

Again, nobody is disputing that the XSX is capable of DirectML-based reconstruction, but tweaked shader cores don't equal dedicated hardware, as those cores are still general-purpose -- there are no cores whose express purpose is to run DirectML code.
 

Alucardx23

Member
Nov 8, 2017
4,713
The difference between having separate hardware (NVIDIA's tensor cores) and having hardware support inside the shader cores (XSX approach) is that in the former scenario, you don't have to compromise on GPU compute power for native rendering in order to support a DLSS-like feature, while in XSX case, you need to share resources between native rendering and the upscaling AI algorithm. For example: the RTX2060 has 57 TOPS of tensor core performance. If we assume that we need 50 TOPS to do upscaling from 1080p to 4K, then it can do that on the tensor cores, and use its full 6 TFLOPS of ALU power to render the native 1080p image. The XSX, on the other hand, needs to apply 50 TOPS of shader compute power out of its 97 TOPS total, or roughly half of its GPU, to the DLSS-like algorithm, leaving only half of its GPU for native 1080p rendering. If it takes half the GPU to perform the DLSS algorithm, you lose quite a bit of the potential benefit.

I know the difference, and I have even repeated several times that it is likely that this generation of RTX GPUs will have higher performance than the XSX running the same machine learning code. The point was to make clear that the XSX does have specialized hardware to run machine learning code, and this means that Microsoft is turning this into a usable option for developers. For example, let's say that the XSX won't give good results upscaling a 540p image to 1080p due to the low sample amount, but it will be able to do a good job upscaling a 1440p image to 4K, enabling developers to increase performance relative to running the same game natively at 4K. We have yet to see what will be possible, and if the improvement from DLSS 1.9 to 2.0 on the same hardware has shown anything, it is that there is room to grow on the software side as well.
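
Just to put the "low sample amount" point in numbers (simple arithmetic, nothing more):

def megapixels(w, h):
    return w * h / 1e6

# Information the reconstruction has to work from:
print(megapixels(960, 540))    # ~0.52 MP of input for a 540p -> 1080p upscale
print(megapixels(2560, 1440))  # ~3.69 MP of input for a 1440p -> 4K upscale

# And how many output pixels each input pixel has to account for:
print((1920 * 1080) / (960 * 540))    # 4.0x  for 540p -> 1080p
print((3840 * 2160) / (2560 * 1440))  # 2.25x for 1440p -> 4K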
 

Zedark

Member
Oct 25, 2017
14,719
The Netherlands
You have some things incorrect here. The RTX 2060 is around 100 TOPS; the XSX is 49 for INT8. INT4 TOPS is double for both, so around 200 for the RTX 2060 and 97 for the XSX.


No, it requires the developer to implement it. It requires access to certain screen information that is not really accessible at the driver level.
Oh I see, I accidentally looked up the Founders Edition and used the half-precision tensor core number. Will update my post to reflect it!
 
Oct 26, 2017
6,572
This could potentially solve any and all Image Quality issues in a Switch 2. Insane how it resolves so much accurate detail from such low res.
 

JaggedSac

Member
Oct 25, 2017
2,988
Burbs of Atlanta
We still do not know if PS5 supports increased rate int8.
XSX does.
XSX is like 49 int8 TOPs - RTX 2080 Ti is 220 at ca. 1500mhz (most 2080 Tis run at 1800-1950 mhz in real life though).

In your opinion, on the XSX (or the rumored Lockhart), would there be a net benefit in compute resource usage to rendering at a lower resolution and performing AI upscaling, as opposed to rendering natively (or using other upscaling tech)? Even with the INT8/INT4 changes, would there be a benefit to using it for this? As in: 1440p -> 4K with AI upscaling vs 4K native vs 1440p -> 4K with other upscaling tech, what do you think would perform best?
 

Edgar

User requested ban
Banned
Oct 29, 2017
7,180
It blows my mind how much they can do with a 540p image. You can barely make out the lampposts in the non-DLSS one.
I like how TAA is still one of the worst temporal AAs out there for the majority of games, lol. Even Nvidia had their own TXAA, which was short-lived.