The biggest reason is that, unlike the PlayStations and the Xbox 360 before it, the X1 / XS don't use a purpose-built, dedicated UI layer for the console. Instead they use a **very** general purpose UI layer (UWP XAML) that is optimised for general UI development and doesn't - and won't - have any specific low-level, case-by-case optimisation.
It means it's less efficient overall in terms of performance and RAM usage, but it also means it's ridiculously easier and quicker for Microsoft to iterate and change their UI, because they're using a mature and fully fledged UI stack with a lot of tooling, documentation and flexibility, and their developers have a lot of experience with it. (The Windows 10 shell is even built with UWP XAML these days.)
And every layer / element on the UI gets its own render texture rather than being flattened on top of each other. So even a simple game tile has a BGRA texture for the background image, an A8 texture for the text label and a BGRA texture for the colour behind the label, which get placed on top of each other rather than being combined into a single composite texture.
Which makes sense for a general purpose UI system, because it doesn't know what needs to be redrawn when, and you don't want the cost of constantly redrawing entire cascading trees. But it means that when you increase resolution, you can get a big potential memory increase - though not even close to 1GB from XAML elements alone. The biggest increase is probably literally from the resolution of the bitmap image assets like game art and full screen background art - and then, of course, all the apps would also render in 4K with their own increased memory usage, taking up parts of the memory pool allocated to the Windows side.
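To put rough numbers on that, here's a back-of-the-envelope sketch in Python. The only hard facts in it are the pixel formats: BGRA at 8 bits per channel is 4 bytes per pixel and A8 is 1 byte per pixel. The tile dimensions (300x300 art with a 300x60 label strip at 1080p) are made up for illustration, and it assumes uncompressed textures with no mipmaps, which may not match what the dashboard actually does.

```python
# Bytes per pixel for the two texture formats mentioned above.
BYTES_PER_PIXEL = {"BGRA": 4, "A8": 1}


def texture_bytes(width, height, fmt):
    """Size of one uncompressed texture with no mipmaps (assumption)."""
    return width * height * BYTES_PER_PIXEL[fmt]


def tile_bytes(scale):
    """Memory for one hypothetical game tile made of three separate layers.

    The 300x300 art and 300x60 label sizes are illustrative guesses at a
    1080p layout; `scale` is 1 for 1080p and 2 for 4K.
    """
    width = 300 * scale
    art_height = 300 * scale
    label_height = 60 * scale
    return (
        texture_bytes(width, art_height, "BGRA")      # background image
        + texture_bytes(width, label_height, "A8")    # text label
        + texture_bytes(width, label_height, "BGRA")  # colour behind the label
    )


def background_bytes(width, height):
    """Memory for one full screen BGRA background image."""
    return texture_bytes(width, height, "BGRA")


if __name__ == "__main__":
    for name, scale, (w, h) in [("1080p", 1, (1920, 1080)), ("4K", 2, (3840, 2160))]:
        print(name, "tile:", tile_bytes(scale), "bytes")
        print(name, "background:", round(background_bytes(w, h) / 2**20, 1), "MiB")
```

Under those assumptions a single tile goes from 450,000 bytes to 1,800,000 bytes, and one full screen background goes from about 7.9 MiB to about 31.6 MiB. Everything scales by 4x because the pixel count does, and the large bitmaps are where most of the absolute growth comes from.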
It is, however, literally a case of them flipping a switch to get it to happen for the most part. XAML was designed, even back in the old WPF days, to scale perfectly with DPI / resolution, so if they told it to render at 4K right now, it would, with very little work.