Ok I have some free time on lunchbreak and maybe we can clear a few things up.
Edit: an error wiped out a 25 minute response here, so my reply is going to be more brief than I would prefer.
First- let's clear up some terms:
There's a range of complexity for spatial audio:
Stereo audio is the most basic spatial audio. It's recorded in discrete left and right channels. In headphones, you'd be able to easily place sounds on a two-dimensional axis, from left to right.
Surround sound audio — in most cases — relies on engineers to mix multiple audio channels (e.g. 5.1, 7.1) for playback on numerous speakers that literally surround an audience. You've probably heard surround sound in movie theaters, where it's presented by companies like DTS, THX and Dolby.
Binaural audio delivers a fully 360-degree soundscape through a specially-encoded stereo file that has to be experienced through headphones. It models the way sound reflects around the head and within the folds of the ear. In fact, it is often recorded with a microphone that mimics the size and shape of a human head! As demonstrated in this video, you can hear in every direction, but the audio is not responsive to user input; if you move your head, the audio doesn't change accordingly. The industry refers to this as "head-locked" audio.
Ambisonics or 3D audio delivers a fully 360-degree soundscape that is responsive to a visual field. When you move your head in one direction or another, the audio changes to reflect that movement. This is the type of spatial audio we're most interested in experimenting with as part of the J360 grant.
A team at NPR is experimenting with immersive video and audio — and has tips on recording, editing, building a rig and more.
training.npr.org
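To make the "responds to head movement" part concrete, here's a rough sketch of the idea using first-order Ambisonics. The function names are mine, the W/X/Y/Z layout is the traditional B-format convention, and angles are counter-clockwise from straight ahead. This is a textbook illustration, not anything taken from Sony's or NPR's implementation.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into traditional B-format (W, X, Y, Z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)               # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)  # front/back
    y = sample * math.sin(az) * math.cos(el)  # left/right
    z = sample * math.sin(el)                 # up/down
    return (w, x, y, z)

def rotate_for_head_yaw(bformat, head_yaw_deg):
    """Counter-rotate the sound field when the listener turns their head."""
    w, x, y, z = bformat
    a = math.radians(head_yaw_deg)
    x_rot = x * math.cos(a) + y * math.sin(a)
    y_rot = -x * math.sin(a) + y * math.cos(a)
    return (w, x_rot, y_rot, z)  # W and Z are unaffected by yaw

def apparent_azimuth_deg(bformat):
    """Horizontal direction the encoded source appears to come from."""
    _, x, y, _ = bformat
    return math.degrees(math.atan2(y, x))

# A source directly to the listener's left (90 degrees).
field = encode_first_order(1.0, azimuth_deg=90.0)
# The listener turns 90 degrees to face it, so it should now be dead ahead.
turned = rotate_for_head_yaw(field, head_yaw_deg=90.0)
print(round(apparent_azimuth_deg(turned), 6))
```

Head-locked binaural audio skips the rotation step entirely, which is why it doesn't change when you move your head.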
In terms of spatial audio complexity, from least to most complex: Stereo < Surround < Binaural < Ambisonics. This will be important later.
That's not how things work.
Sound devices operate on large numbers of channels internally. An old Sound Blaster Live! from the '90s was using 32 channels of sound. The Audigy upgraded that to 64 channels, and the X-Fi supported 128.
That is basically the limit for how many sounds can be played, processed, and mixed, at once.
Why are we talking about how sound technology worked in the '90s? We already know directly from Cerny that the Tempest Engine supports hundreds of simultaneous sound objects.
Sony's plans for 3D audio are expansive and ambitious - unprecedented, even. Put simply, PlayStation 5 sees the platform holder pushing surround significantly beyond anything we've seen in the gaming space before, comprehensively out-speccing Dolby Atmos in the process by theoretically processing hundreds of discrete sound sources in 3D space, not just the 32 in the Atmos spec.
Immediately after that quote from Cerny, Dolby put out PR to clarify that Atmos is also capable of handling hundreds of discrete audio sources simultaneously, though there are reasons why you would not do this.
What you described is actually how Atmos/DTS:X work.
Those do output the channels to your receiver along with positional metadata, which is then processed to the appropriate speaker layout; whether that's 5.1.2, 7.2.4 or something else.
But that doesn't mean games are limited to 32 channels of audio. The XSX could process hundreds of channels which are then mixed down to a 32-channel output for your Atmos-capable receiver.
Ok, I'm aware this is how Atmos and DTS:X work (as do Auro 3D and a few others), and Tempest is a competing format to these. Sony has been explicit that the PS5 will not support Dolby Atmos, because Tempest serves the same function. Any object-based surround solution is going to be processed and output by Tempest.
I also never said anything about 32 channels of audio. CERNY did, but his quote was dealing with the current limitations of Atmos as it exists on Xbox and Windows. The XSX *cannot* currently process hundreds of objects the way Tempest can, because the tools Dolby provides only support up to 32 objects. In THEORY this limit is higher, but Dolby has not yet made those tools available. They likely will eventually, but at present? It's not possible.
Sony are not supporting additional discrete channels for things like height with Tempest audio.
They are planning to use virtualized channels instead. You won't get anything more than a 7.1 LPCM output.
Ok- to be as clear as possible, object-based audio doesn't use discrete channels the way surround sound does. It is all virtualized.
Object-based audio is different from older surround systems, which send audio signals through a set number of channels to the speakers that are positioned at particular points in a room. It's also distinct from simulated object-based audio formats like DTS Virtual:X, which use just a few speakers to give listeners the sense that sounds are coming from different directions.
3D audio formats instead create discrete audio objects, drawing on as many surround sound speakers as your AVR can support to immerse you in rich sound from all directions, including overhead. The result is a highly flexible approach to a home theater layout that provides exact positioning of sounds and a far deeper level of detail than previously possible.
Object-based 3D audio will use as many speakers as your receiver can support to create the 3D sound field. It is ALL virtualized, and the concept of discrete channels for 5, 6, or 7 speakers is no longer valid. If you have two dozen speakers, it can support them. *Atmos* requires height speakers to create these sound fields, but that isn't a requirement of the approach in general: DTS:X as well as Tempest can create these fields without them.
www.definitivetechnology.com
So moving on, it's important to note that the PS5 is using Ambisonics in creating its 3D audio:
The Tempest engine is also compatible with Ambisonics, which is effectively a virtual speaker system which maps on to physical speakers. An enhanced feeling of presence is generated because any given sound can be rendered at one of 36 volume levels per speaker and it is likely to be represented at some level on all speakers. Discrete audio tends to 'lock' to physical speakers and may not be represented at all on some of them. Ambisonics is available on PlayStation 4 and PSVR right now, but with fewer virtual speakers, so there's already a big upgrade in precision via the Tempest engine - and it can be matched with Sony's more precise localisation too.
And Ambisonics defined:
Ambisonic technology is a method to render 3D sound fields in a spherical format around a particular point in space. It is conceptually similar to 360 video except the entire spherical sound field is audible and responds to changes in head rotation. There are many ways of rendering to an Ambisonic field, but all of them rely on decoding to a binaural stereo output to allow the user to perceive the spatial audio effect over a normal pair of headphones.
Ambisonic audio itself can be of n- orders comprising of various channels. More channels results in higher spatial quality, although there is a limit to the perceived difference in sound quality as one goes beyond 3rd order Ambisonics (16 channels of audio). Regardless of the number of channels used for encoding the original signal, the decoded binaural audio output will always be to two channels. As the listener moves their head the content of the decoded output stream shifts and changes accordingly, providing a 3D spatial effect.
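For reference, the channel counts in that quote follow the standard rule for full-sphere Ambisonics: an order-N signal has (N + 1)^2 channels. A quick sketch (the function name is just mine):

```python
def ambisonic_channel_count(order):
    """Number of channels in a full-sphere Ambisonic signal of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2

for order in range(4):
    print(f"order {order}: {ambisonic_channel_count(order)} channels")
# order 0: 1 channels
# order 1: 4 channels
# order 2: 9 channels
# order 3: 16 channels
```

So 3rd order lines up with the 16 channels mentioned above, and whatever the order, the decoded binaural output is still two channels.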
So Ambisonic audio- which the PS5 is using- uses an open-ended number of channels (it comes in n orders), with spatial quality increasing with the number of channels used, and once decoded by hardware that can handle it, the result is a binaural stream. This is, once again- NOT stereo. It's important to note, though, that binaural output is not a technical limit. Sony has already committed to a solution that creates an object-oriented 3D soundfield using 5.1, 7.1, or higher, but this will not come at launch due to the complexity of the software algorithm involved:
"Once we're satisfied with our solution for these two channel systems we will turn to the issue of 5.1 and 7.1 systems," adds Cerny. "For now, though the 5.1 and 7.1 channel systems get a solution that approximates what we have now on PS4, which is to say the locations of the sound objects determine to what degree their sounds come out of each speaker. Note that 5.1 and 7.1 channel support is going to have its own special issues, in my talk I mentioned that with two channel systems the left ear can hear the right speaker and vice versa - it's even more complex with six or eight channels! Also note that if a developer is interested in using the Tempest engine power to support six or eight channels, game code is aware of the speaker setup so bespoke support is quite possible."
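The "locations of the sound objects determine to what degree their sounds come out of each speaker" part is basically amplitude panning. Here's a minimal sketch of one common way to do it: constant-power panning between the two speakers nearest the object. The speaker angles and names are hypothetical values I picked for illustration, and this is a generic technique, not Sony's actual PS4/PS5 algorithm.

```python
import math

# Hypothetical 5-speaker ring; azimuths in degrees, counter-clockwise from front.
SPEAKERS = {"C": 0.0, "L": 30.0, "Ls": 110.0, "Rs": 250.0, "R": 330.0}

def pan_gains(source_az_deg, speakers=SPEAKERS):
    """Return a per-speaker gain for a sound object at the given azimuth."""
    src = source_az_deg % 360.0
    ring = sorted(speakers.items(), key=lambda kv: kv[1])
    gains = {name: 0.0 for name in speakers}
    for i, (name_a, az_a) in enumerate(ring):
        # Pair each speaker with its neighbour, wrapping around the ring.
        name_b, az_b = ring[(i + 1) % len(ring)]
        span = (az_b - az_a) % 360.0
        offset = (src - az_a) % 360.0
        if offset <= span:
            # Constant-power crossfade between the two neighbouring speakers.
            frac = offset / span
            gains[name_a] = math.cos(frac * math.pi / 2)
            gains[name_b] = math.sin(frac * math.pi / 2)
            break
    return gains

# An object halfway between centre and front-left.
print(pan_gains(15.0))
```

Note how every other speaker gets a gain of exactly zero, which is the "discrete audio tends to lock to physical speakers" behaviour described in the Ambisonics quote earlier.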
So no- your TOSLINK cable is not capable of supporting this kind of solution. It is far too bandwidth-intensive, because Tempest and object-oriented 3D audio solutions like it need a high-bandwidth connection capable of handling a ton of channels before they can decode that into a binaural sound field, EVEN IF that's just going to headphones. And eventually we'll be looking at decoding to not just a binaural output, but to 8 or more speakers if the dev wants it. There is no hard limit to the number of speakers it can support, because it's an adaptable, virtualized solution.
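To give a feel for how raw data rate scales with channel count, here's some back-of-the-envelope arithmetic. It assumes uncompressed LPCM at 48 kHz / 24-bit, which is my assumption for illustration, not a figure Sony has published, and it ignores any framing or protocol overhead.

```python
def pcm_bitrate_mbps(channels, sample_rate_hz=48_000, bit_depth=24):
    """Raw (uncompressed) LPCM data rate in megabits per second."""
    return channels * sample_rate_hz * bit_depth / 1_000_000

for label, channels in [("stereo", 2), ("7.1", 8), ("3rd-order Ambisonics", 16)]:
    print(f"{label}: {pcm_bitrate_mbps(channels):.3f} Mbit/s")
```

The rate grows linearly with the number of channels, so a 16-channel stream needs eight times the raw bandwidth of a stereo one at the same sample rate and bit depth.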