The why: along with 360 video comes 360 sound. Ambisonic, rather than surround, as it has height as well as direction – it is a sphere. Not much point unless you have speakers above and below your head – unlikely to become popular in the home on that account. More likely it will be consumed on headphones as a binaural image – there are many technical problems with that, to which I’ll return.
In VR the sound image should track/rotate with the visual image. Hasn’t been a concern until VR goggles came back into style – speakers remain fixed as your head turns, but now the image has to compensate for head movements. Some artistic issues here about composing for speakers versus phones – obviously it’s easier to just compose the one image (e.g. strings always to your front right) than to have it rotate around your head.
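The head-tracking compensation described above can be sketched for the simplest case, a first-order signal and a head that only turns left or right. This is a minimal illustration, not how any particular player does it. The function name `rotate_yaw` and the axis convention (X forward, Y left, Z up, positive yaw meaning a turn to the left) are assumptions made for the example.

```python
import math

def rotate_yaw(w, x, y, z, head_yaw_deg):
    """Counter-rotate one first-order sample for a head turned by head_yaw_deg.

    Assumed convention: X points forward, Y left, Z up, and a positive
    yaw means the listener has turned to the left.
    """
    # The sound field must turn the opposite way to the head.
    a = math.radians(-head_yaw_deg)
    x_r = x * math.cos(a) - y * math.sin(a)
    y_r = x * math.sin(a) + y * math.cos(a)
    # W (omnidirectional) and Z (height) are unaffected by a pure yaw.
    return w, x_r, y_r, z

# A source straight ahead, heard after the listener turns 90 degrees left,
# should end up on the listener's right (negative Y).
print(rotate_yaw(0.707, 1.0, 0.0, 0.0, 90))
```

Only X and Y change, which is why a yaw-only rotation is cheap. Tilting or nodding the head would also mix Z into the result.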
As with all media there’s a confusing plethora of formats.
Orders: the number of divisions into which the sphere is cut. More divisions means more precision – you get better imaging if you add ‘higher’ order subdivisions. Each division needs its own sound channel.
- 0 order – mono – requires one channel
- 1st order – front-back, left-right, up-down – requires a total of four channels
- 2nd order – in between the 1st order – requires a total of nine channels
- 3rd order – in between the 2nd order – requires a total of 16 channels
… and so on
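The channel counts in the list follow a simple rule: a full-sphere signal of order n needs (n + 1)² channels. A small sketch to check a track layout against that rule (the function name `channels_for_order` is made up for this example):

```python
def channels_for_order(order):
    """Number of channels a full-sphere ambisonic signal of this order needs."""
    if order < 0:
        raise ValueError("order must be 0 or greater")
    return (order + 1) ** 2

for n in range(4):
    print(n, channels_for_order(n))
```

This is why higher orders get expensive so quickly: 5th order already needs 36 channels on one track.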
Most DAWs cannot handle more than 5.1 on a track: Cubase and Logic are no good; Reaper and Premiere are good. Vegas?
The next problem is how these channels are arranged and of course different people have mucked this up. It means that you have to choose one version and stick with it, or spend your life translating.
Traditional B Format: a 1st order, four channel version that’s a standard. W+XYZ, where W is the omnidirectional (overall pressure) signal and X, Y, Z are the front-back, left-right and up-down components on a right handed coordinate system. This becomes complex as you start adding orders.
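To make W+XYZ concrete, here is a minimal sketch of panning one mono sample into traditional B Format. The equations are the standard first-order ones, with W reduced by 3 dB. The function name `encode_b_format` and the angle convention (azimuth measured anticlockwise from straight ahead, elevation upwards, both in degrees) are assumptions for the example.

```python
import math

def encode_b_format(sample, azimuth_deg, elevation_deg):
    """Pan a mono sample to traditional first-order B Format (W, X, Y, Z).

    Assumed convention: azimuth 0 is straight ahead and increases to the
    left; elevation 0 is ear level and increases upwards.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2)                   # omnidirectional, 3 dB down
    x = sample * math.cos(az) * math.cos(el)    # front-back
    y = sample * math.sin(az) * math.cos(el)    # left-right
    z = sample * math.sin(el)                   # up-down
    return w, x, y, z

# A full-scale sample placed hard left at ear level.
print(encode_b_format(1.0, 90, 0))
```

A source hard left puts all of its directional energy in Y, and one directly overhead puts it all in Z.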
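Furse-Malham continues to add lettered channels on the same right handed coordinates: W, X, Y, Z for 1st order, then R, S, T, U, V for 2nd and K, L, M, N, O, P, Q for 3rd.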
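ACN is a cleaner format that numbers the channels by a sorting formula (channel number = l² + l + m, for degree l and index m running from -l to l), and is therefore extensible to any order.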
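The ACN numbering is short enough to write out. Each channel is identified by a degree l (0, 1, 2, … up to the order) and an index m from -l to l, and its channel number is l² + l + m. The sketch below uses that to show where the four traditional B Format channels land in ACN order. The names `acn_index`, `B_FORMAT_LM` and `to_acn_order` are invented for the example.

```python
def acn_index(l, m):
    """ACN channel number for degree l and index m (-l <= m <= l)."""
    if abs(m) > l:
        raise ValueError("m must lie between -l and l")
    return l * l + l + m

# (l, m) pairs of the four traditional B Format channels.
B_FORMAT_LM = {"W": (0, 0), "X": (1, 1), "Y": (1, -1), "Z": (1, 0)}

def to_acn_order(names):
    """Sort B Format channel letters into ACN order."""
    return sorted(names, key=lambda n: acn_index(*B_FORMAT_LM[n]))

print(to_acn_order(["W", "X", "Y", "Z"]))
```

So a W, X, Y, Z file has to be reshuffled to W, Y, Z, X before it is treated as ACN, which is the sort of translation the post warns about.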
The next problem is normalisation. Choose from maxN, where each channel is scaled to peak at 1; SN3D, where no higher channel is louder than the mono signal; or N3D, where the higher channels are louder than in SN3D.
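Translating between normalisations comes down to a gain per channel. As a rough sketch: going from SN3D to N3D multiplies every channel of degree l by √(2l + 1), and at 1st order the only difference between traditional B Format levels and SN3D is that W is 3 dB lower in B Format. The function names below are invented for the example, and the second one only covers 1st order.

```python
import math

def sn3d_to_n3d_gain(l):
    """Gain that turns an SN3D channel of degree l into N3D."""
    return math.sqrt(2 * l + 1)

def b_format_levels_to_sn3d(w, x, y, z):
    """Rescale one traditional B Format sample to SN3D levels (1st order only).

    Channel order is left untouched; only W changes, by +3 dB.
    """
    return w * math.sqrt(2), x, y, z

for l in range(4):
    print(l, round(sn3d_to_n3d_gain(l), 3))
```

The mono channel (l = 0) has a gain of 1 either way, so SN3D and N3D only disagree from 1st order upwards.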
Common Versions and implementations.
- AMB is 1st order, Furse-Malham, maxN (according to Wiki) but also higher order (according to http://www.ambisonic.net/fileformats.html)
- AmbiX is n-order, ACN, SN3D http://www.matthiaskronlachner.com/?p=2015
- Blue Ripple “Higher Order Ambisonics” is 3rd Order, Furse-Malham, SN3D
- Harpex uses B Format, and is skeptical of HOA channels. It uses phase to tease out direction from the standard 4 channels. Unfortunately quite expensive. http://harpex.net/about.html
- Ambisonic Toolkit B-Format http://www.ambisonictoolkit.net/
UHJ is a horizontal only muxed version of B Format. https://en.wikipedia.org/wiki/Ambisonic_UHJ_format
YouTube uses 1st order, ACN, SN3D and is based on AmbiX. This tends to indicate it will become the standard. https://support.google.com/youtube/answer/6395969
The Google tool is Jump. https://vr.google.com/jump/
Oculus uses 1st order, ACN, SN3D based on AmbiX. Again, a good sign that this will be a standard. https://developer3.oculus.com/documentation/audiosdk/latest/concepts/audiosdk-features/#audiosdk-features-supported
Headphones: the ambisonic image is rendered binaurally, so that each sound reaches the two ears at slightly different times and levels. But different shaped heads get different results, so it is unreliable. There is a database of measured heads available in AES69 (SOFA) format, which are usually averaged into an approximation. Google assumes that the head is symmetrical (probably true of Oculus as well). BR Rapture can load these files.
The home of the AES69 format is here: https://www.sofaconventions.org/mediawiki/index.php/Main_Page
Microphones: the Zoom H2 is the easiest solution but has no height information. This is the model I am using alongside my spherical camera.
Binaural Microphones: https://www.roland.com/us/products/cs-10em/
Some sources of expertise: http://www.brucewiggins.co.uk