This.
It's annoying that devs are wasting their GPU budget on RTX.
There's definitely a time vs. performance trade-off here. You can let your designers take the time to bake all the GI lighting into an environment and painstakingly produce and align cube maps to achieve great reflections a la TLoU2, or you can adopt real-time ray-tracing implementations and probably suffer a drop in performance. Each approach has its pros and cons, but leaning on ray-tracing definitely leads to shorter development time up front and possibly more time spent optimizing performance at the tail end of production.
There is no inherent performance cost to ray-tracing. You can use it in ways that don't hit performance any harder than the techniques it visually replaces.
Do you have any examples?
I wonder if raytracing will eventually become the new "pop-in" on console versions. Like you move 5ft away from something and it switches from raytracing to a cube map or something.
You might be interested in checking out CryEngine's software-based RT implementation. When RT-enabled objects are close to the viewer/camera, ray-traced implementations of popular effects such as reflections are used; as the camera moves farther away, other, less accurate techniques like SSR take over. Think of it like LoD transitions, except for lighting effects.
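For a rough idea of what that kind of distance-based switching could look like, here's a minimal sketch. The cutoff distances, struct names, and technique names are all made up for illustration; they're not CryEngine's actual API.

```cpp
// Hypothetical sketch: choosing a reflection technique per object based on
// distance from the camera, analogous to LoD selection for meshes.
#include <cmath>
#include <cstdio>

enum class ReflectionTechnique { RayTraced, ScreenSpace, CubeMapProbe };

struct Vec3 { float x, y, z; };

float Distance(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Near objects get ray-traced reflections, mid-range falls back to SSR,
// and distant objects use a prebaked cube-map probe.
ReflectionTechnique SelectReflectionTechnique(const Vec3& cameraPos,
                                              const Vec3& objectPos,
                                              float rtCutoff  = 10.0f,   // metres, arbitrary
                                              float ssrCutoff = 40.0f) { // metres, arbitrary
    float d = Distance(cameraPos, objectPos);
    if (d < rtCutoff)  return ReflectionTechnique::RayTraced;
    if (d < ssrCutoff) return ReflectionTechnique::ScreenSpace;
    return ReflectionTechnique::CubeMapProbe;
}

int main() {
    Vec3 camera{0, 1.7f, 0};
    Vec3 nearCar{3, 0, 4};
    Vec3 farWindow{30, 5, 80};
    std::printf("near object -> %d\n", static_cast<int>(SelectReflectionTechnique(camera, nearCar)));
    std::printf("far object  -> %d\n", static_cast<int>(SelectReflectionTechnique(camera, farWindow)));
    return 0;
}
```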
The DLSS 2.0 process occurs in series with the rest of the rendering pipeline, not in parallel. Having dedicated hardware doesn't mean you can simultaneously use other resources elsewhere. The GPU processes the scene, the tensor cores do the upscaling work, then the GPU does its post-processing work.
MS' solution will have the same workflow, but will be a bit slower.
The point is that you have dedicated hardware to offload the work to. Even if you have to render everything in series, there are plenty of pipelining solutions (e.g. work queues) you can make use of to take advantage of the extra shader-core budget.
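As a purely conceptual CPU-side sketch of that idea (not real GPU code): even if each frame's shading and upscaling happen in series, consecutive frames can be pipelined through a work queue so frame N+1's shading overlaps frame N's upscale. The names and fake timings below are invented for illustration.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

std::queue<int> shadedFrames;   // frames that finished shading, awaiting upscale
std::mutex m;
std::condition_variable cv;
bool doneShading = false;

void ShadeFrames(int frameCount) {
    for (int frame = 0; frame < frameCount; ++frame) {
        std::this_thread::sleep_for(std::chrono::milliseconds(8)); // pretend shading work
        {
            std::lock_guard<std::mutex> lock(m);
            shadedFrames.push(frame);
        }
        cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> lock(m);
        doneShading = true;
    }
    cv.notify_one();
}

void UpscaleFrames() {
    for (;;) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return !shadedFrames.empty() || doneShading; });
        if (shadedFrames.empty()) break;   // producer finished and queue drained
        int frame = shadedFrames.front();
        shadedFrames.pop();
        lock.unlock();
        std::this_thread::sleep_for(std::chrono::milliseconds(4)); // pretend upscale work
        std::printf("upscaled frame %d while later frames keep shading\n", frame);
    }
}

int main() {
    std::thread shader(ShadeFrames, 5);
    std::thread upscaler(UpscaleFrames);
    shader.join();
    upscaler.join();
    return 0;
}
```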
Like anything in real-time rendering, it's all about compromising to reach what appears to be a better visual output; modern-day ray-tracing is no exception.
Ray-tracing as we see it in games is true ray-tracing, like what we see in path-traced renderers such as V-Ray: it calculates light hitting surfaces the way it happens in the real world, based on physical values. But even my RTX 2080 Ti doesn't have the power to do this with an unlimited number of samples or bounces.
Typically, ray-traced reflections in a real-time renderer like Unreal Engine 4 would only do a ray-tracing calculation for a pixel if the material of the surface had a roughness of something like 0.7 or lower; all pixels above that would fall back to screen-space reflections or prebaked reflection probes. Another way to improve speed is to test whether the pixel's reflection is already present in the screen-space reflection map that's been calculated, and use the SSR result instead of calculating an RT reflection for that pixel.
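A minimal sketch of that per-pixel decision, assuming a roughness cutoff around 0.7 and some way of knowing whether SSR already found valid data (the struct and helper names are invented, not any engine's real API):

```cpp
#include <cstdio>

struct PixelSurface {
    float roughness;   // 0 = mirror, 1 = fully rough
    bool  ssrHit;      // did the screen-space reflection trace find valid data?
};

enum class ReflectionSource { RayTraced, ScreenSpace, ReflectionProbe };

ReflectionSource ChooseReflectionSource(const PixelSurface& p,
                                        float maxRoughnessForRT = 0.7f) {
    if (p.roughness > maxRoughnessForRT)
        return ReflectionSource::ReflectionProbe;   // too rough, cheap fallback is fine
    if (p.ssrHit)
        return ReflectionSource::ScreenSpace;       // reuse data already on screen
    return ReflectionSource::RayTraced;             // smooth surface, no SSR data: trace a ray
}

int main() {
    PixelSurface chrome{0.05f, false};
    PixelSurface wetRoadOnScreen{0.3f, true};
    PixelSurface concrete{0.9f, false};
    std::printf("chrome   -> %d\n", static_cast<int>(ChooseReflectionSource(chrome)));
    std::printf("wet road -> %d\n", static_cast<int>(ChooseReflectionSource(wetRoadOnScreen)));
    std::printf("concrete -> %d\n", static_cast<int>(ChooseReflectionSource(concrete)));
    return 0;
}
```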
Now let's presume that we do need an RT calculation for that pixel: how many reflection bounces should we do if the surface we hit in the reflection also has a roughness value below 0.7? The more bounces we need, the more compute power we need.
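In other words, the bounce budget is what caps the recursion. A toy sketch of that cap (TraceRay here is a stand-in, not a real API; the numbers are arbitrary):

```cpp
#include <cstdio>

struct Hit {
    bool  valid;
    float roughness;
};

// Stand-in for an actual ray cast against the scene (hypothetical).
Hit TraceRay() {
    // Pretend every bounce keeps hitting a fairly smooth surface.
    return Hit{true, 0.2f};
}

int CountReflectionBounces(int maxBounces, float maxRoughnessForRT = 0.7f) {
    int bounces = 0;
    while (bounces < maxBounces) {
        Hit hit = TraceRay();
        if (!hit.valid || hit.roughness > maxRoughnessForRT)
            break;     // rough surface or a miss: stop spending rays here
        ++bounces;     // smooth surface hit: another bounce is "wanted"
    }
    return bounces;
}

int main() {
    // In a hall of mirrors, the only thing stopping us is the budget itself.
    std::printf("bounces used with budget 1: %d\n", CountReflectionBounces(1));
    std::printf("bounces used with budget 4: %d\n", CountReflectionBounces(4));
    return 0;
}
```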
Now, it's also really computationally heavy to do this for every pixel on screen, so what if we only calculate a limited number of RT pixels every frame and temporally reconstruct them over 3-5 frames to make it even faster, at the cost of ghosting artifacts if the viewport changes significantly?
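A toy sketch of that temporal reconstruction idea: each frame only one pixel in four gets a fresh ray-traced value and the rest keep what they accumulated previously, so the buffer converges over about four frames. Everything here (buffer size, interleave pattern, the RayTracePixel stand-in) is illustrative, and real implementations also reproject history to fight ghosting.

```cpp
#include <cstdio>
#include <vector>

const int kWidth = 8, kHeight = 2;
const int kFramesToCover = 4;   // trace 1/4 of the pixels per frame

// Stand-in for an expensive per-pixel ray trace (hypothetical).
float RayTracePixel(int x, int y) { return float(x + y); }

void AccumulateFrame(std::vector<float>& history, int frameIndex) {
    for (int y = 0; y < kHeight; ++y) {
        for (int x = 0; x < kWidth; ++x) {
            // Interleave: this pixel is only re-traced on 1 out of every 4 frames.
            if ((x + y * kWidth) % kFramesToCover == frameIndex % kFramesToCover)
                history[y * kWidth + x] = RayTracePixel(x, y);
            // Otherwise keep the last traced value (the "temporal" part).
        }
    }
}

int main() {
    std::vector<float> history(kWidth * kHeight, 0.0f);
    for (int frame = 0; frame < kFramesToCover; ++frame) {
        AccumulateFrame(history, frame);
        std::printf("after frame %d, pixel(5,1) = %.1f\n", frame, history[1 * kWidth + 5]);
    }
    return 0;
}
```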
However developers choose to implement RT features, it will always be about compromising until they reach the best performance-to-visual-output ratio. Upscaling algorithms like DLSS are just another way of compromising; RT can be implemented in many, many ways.
This post details quite a few of the things to take into consideration when balancing performance vs. quality with regard to ray-tracing.
Personally, I would be more inclined to maximize the number of bounces to produce more accurate lighting. To try to keep performance cost low, you would want to perform this at a much lower resolution than native 4K and then make use of an image-upscaling solution like DLSS.
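Just to put rough numbers on it: rendering at a fraction of 4K and upscaling cuts the number of pixels (and therefore rays) you shade per frame dramatically. The 0.5 scale below is an assumption mirroring DLSS-style "performance" modes, not a quote of any particular setting.

```cpp
#include <cstdio>

int main() {
    const int nativeW = 3840, nativeH = 2160;   // native 4K
    const float renderScale = 0.5f;             // assumed internal render scale

    int internalW = int(nativeW * renderScale);
    int internalH = int(nativeH * renderScale);

    long long nativePixels   = 1LL * nativeW * nativeH;
    long long internalPixels = 1LL * internalW * internalH;

    std::printf("internal resolution: %dx%d\n", internalW, internalH);
    std::printf("pixels shaded: %lld vs %lld (%.0f%% of native)\n",
                internalPixels, nativePixels,
                100.0 * internalPixels / nativePixels);
    return 0;
}
```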
If extra bounces are too expensive, then you would probably want to look at image-denoising solutions to remedy the loss in accuracy of lighting data gathered from fewer bounces. The suggestion of using a temporal approach is pretty neat to think about, too.
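The simplest temporal flavour of that is just an exponential moving average: blend each frame's noisy ray-traced sample into a running history. The blend factor here is arbitrary, and real denoisers also reproject and reject stale history, which this sketch skips.

```cpp
#include <cstdio>

float AccumulateSample(float history, float noisySample, float alpha = 0.1f) {
    // Low alpha = smoother result but slower response (more ghosting).
    return history * (1.0f - alpha) + noisySample * alpha;
}

int main() {
    float history = 0.0f;
    const float noisySamples[] = {1.3f, 0.7f, 1.1f, 0.9f, 1.0f, 1.0f};
    for (float s : noisySamples) {
        history = AccumulateSample(history, s);
        std::printf("history = %.3f\n", history);
    }
    // The history converges toward the true value (~1.0) as frames accumulate.
    return 0;
}
```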
Another approach is to try to minimize the cost of testing for ray-triangle intersections; maybe there are ways to further minimize the number of objects to take into account for on-the-fly construction of BVHs. Tree-traversal is incredibly fast, but there may be faster ways of pruning branches of the tree from the get-go.
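One cheap version of that pruning idea is to filter the object list before the per-frame BVH build with an inexpensive test (distance here, but it could be frustum or contribution based), so the tree is smaller and traversal touches fewer nodes. The struct names and thresholds are hypothetical.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

struct SceneObject {
    float x, y, z;
    float boundingRadius;
};

// Keep only objects close enough to plausibly matter for ray-traced effects.
std::vector<SceneObject> CullForBVH(const std::vector<SceneObject>& objects,
                                    float camX, float camY, float camZ,
                                    float maxDistance) {
    std::vector<SceneObject> kept;
    for (const SceneObject& o : objects) {
        float dx = o.x - camX, dy = o.y - camY, dz = o.z - camZ;
        float dist = std::sqrt(dx * dx + dy * dy + dz * dz) - o.boundingRadius;
        if (dist <= maxDistance)
            kept.push_back(o);
    }
    return kept;
}

int main() {
    std::vector<SceneObject> scene = {
        {2, 0, 3, 1}, {50, 0, 90, 2}, {10, 5, 8, 1}, {300, 0, 400, 5}};
    std::vector<SceneObject> forBVH = CullForBVH(scene, 0, 0, 0, 60.0f);
    std::printf("objects fed to BVH build: %zu of %zu\n", forBVH.size(), scene.size());
    // A real engine would then pass forBVH into its acceleration-structure build.
    return 0;
}
```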