Conversion from Ambisonics to TBE format

(and back)

This is a new version of this web page, rewritten after discovering a severe bug in the current implementation of the TBE format.

The old version is here for archiving.

Ambisonics is a widely employed format for capturing, transmitting and decoding spatial audio, employing 1, 4, 9, 16 or 25 channels. The Two Big Ears (TBE) format is an 8-channels format employed by Facebook's Spatial Workstation, and it is part of the "spatial audio" 8+2 channels available for Facebook's 360 videos.
As will be shown below, TBE is basically 2nd-order Ambisonics in which two of the nine channels have been (wrongly) mixed together, to reduce the number of channels to 8.

This web page provides information about the conversion between High Order Ambisonics (typically 3rd order, 16 channels) and the TBE format. It allows users of the excellent Ambisonics plugin suites currently available, such as Ambix by Matthias Kronlachner or O3A by Richard Furse, to convert their mixes into soundtracks for the Facebook platform with minimal loss of spatial information, circumventing the severe bug affecting Facebook's Spatialiser module.

Some info is also provided on techniques employed for converting back the TBE signals to horizontal-only 2nd-order Ambisonics, allowing for monitoring of the soundtrack of a Facebook 360 video employing a standard Ambisonics decoder feeding a loudspeaker rig, instead of employing headphones.

Ambisonics signals

An Ambisonics multichannel sound track is not based on the concept of "position of the loudspeakers", as in a stereo (Left-Right) or "surround 5.1" (Left, Right, Center, Sub, Left Surround, Right Surround) formats.

Instead, the complete spatial information is represented by a number of "spherical harmonics" signals, at increasing orders. The number of signals for each order is increasing, and the corresponding polar patterns of the equivalent "virtual microphones" are shown here:

An Ambisonics stream is said to be "of order n" when it contains all the signals of orders 0 to n. For example, an Ambisonics signal of order 3 (the most widely used, nowadays) contains 16 channels: 1 of order 0, 3 of order 1, 5 of order 2 and 7 of order 3. Please note that, following the Ambix convention, numbering of Ambisonics channels starts from 0 instead of 1.
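The channel counts quoted above follow from a simple rule: order m alone contributes 2m + 1 signals, so a full stream of order n has (n + 1)^2 channels. The short Python sketch below only illustrates this arithmetic; the function names are made up for this page, and the ACN index formula (order^2 + order + degree) is the standard Ambix channel numbering.

```python
def channels_in_order(m: int) -> int:
    """Number of spherical-harmonic signals belonging to order m alone."""
    return 2 * m + 1


def channels_up_to_order(n: int) -> int:
    """Total channels of a full Ambisonics stream of order n, i.e. (n + 1)^2."""
    return sum(channels_in_order(m) for m in range(n + 1))


def acn_index(order: int, degree: int) -> int:
    """ACN channel number (starting from 0) for a given order and degree."""
    if abs(degree) > order:
        raise ValueError("degree must lie between -order and +order")
    return order * order + order + degree


# Print the total channel count for orders 0 to 4.
for n in range(5):
    print(n, channels_up_to_order(n))
```

With this numbering, the first-order vertical channel Z (order 1, degree 0) is channel 2, and the second-order channel R (order 2, degree 0) is channel 6.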

Recording Ambisonics signals

An ambisonics stream of order 3 can be easily recorded, employing a suitable microphone system, such as the Eigenmike(TM) or other microphone arrays of different shape. Here you can see an Eigenmike and the RAI-AIDA cylindrical microphone array built by dismounting an Eigenmike and remounting capsules and electronics inside a cylindrical body, for providing increased accuracy in azimuthal information:


Eigenmike (Left) and RAI-AIDA Cylindrical Microphone Array

Creating synthetic Ambisonics signals

A 16-channels Ambisonics stream (3rd order) can also be created synthetically, starting from a mono track and employing a software tool (typically a VST plugin) known as an "encoder" or a "panner".

Two excellent and free tools are widely used nowadays, which are included in the plugin suites named Ambix and O3A:

Recently O3A was updated to include a "graphical" panner, which allows using a background equirectangular image for showing the position of the sound source being panned around:

Downstepping Ambisonics signals

Due to the current availability of excellent microphone arrays and panning tools capable of full 3rd order, it is nowadays considered completely obsolete and low-quality to employ the crappy, old 1st-order Ambisonics format (4 channels), either for recording or for mastering.

Furthermore, as will be shown, the Facebook Spatial Workstation contains a "Spatialiser" module affected by a severe bug, hence it is strongly recommended NOT to use it!

It is better to employ a complete workflow entirely at third order (16 channels), and to downsize it to a smaller number of channels only if needed for delivery, while keeping the master at the highest possible spatial resolution.

Ambisonics is hierarchical, so downsizing to 2nd or 1st order is trivial: just remove the last 7 channels to get a 9-channels 2nd-order soundtrack. Discard a further 5 channels, and you get a 4-channels 1st-order soundtrack. This is true both with the old-style channel numbering and scaling, called Furse-Malham (FuMa), which is not used anymore nowadays, and with the modern Ambisonics channel numbering (ACN) and scaling (SN3D), called Ambix.
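As a minimal illustration of this truncation, the sketch below keeps only the first (n + 1)^2 channels of one multichannel frame. It assumes the frame is a plain Python list with one entry per channel (a sample value or a whole channel buffer); the function name is hypothetical.

```python
def downstep(frame, target_order):
    """Return the frame truncated to a full Ambisonics stream of target_order.

    frame: list with one entry per channel, in hierarchical channel order.
    """
    keep = (target_order + 1) ** 2
    if len(frame) < keep:
        raise ValueError("input frame has fewer channels than the target order needs")
    return frame[:keep]


# A 16-channels (3rd-order) frame, here filled with the channel numbers themselves.
third_order = list(range(16))
second_order = downstep(third_order, 2)  # drops the last 7 channels
first_order = downstep(third_order, 1)   # drops a further 5 channels
print(len(second_order), len(first_order))
```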

But converting a 3rd-order (or 2nd-order) Ambisonics soundtrack to the "proprietary" TBE format (8 channels) is not so trivial, and hence the need for the information contained in this web page.

The TBE 8-channels format

First of all: why does the TBE format employ only 8 channels? This is for compatibility with a number of old, but still widely employed, Digital Audio Workstation programs (DAW), such as Avid Protools.

Of course, modern DAWs, such as Reaper, do not suffer from such an old limitation, and can easily manage soundtracks containing 32 or even 64 channels.

However, Protools is still the most widely employed DAW in professional studios, although Reaper is growing rapidly, not only for its superior technology, but also for its more affordable price.

The Facebook Spatial Workstation is mostly based on two VST plugins: the first is called Spatialiser, and operates as a panner for a mono source:

Up to 7 mono sources can be panned around with this tool.

The second plugin is called Control, and operates as a virtual room simulator for an already-encoded 8-channels TBE stream:

The Control plugin can also work as a realtime binaural rendering tool, staying in sync with an equirectangular video being played on an Oculus Rift device attached to the computer. Its main function, indeed, is to allow the Video Player module to feed the background video image to the Spatialiser plugin, making it possible to see where each input is panned (same function as O3A Panner Large).

The following Plogue Bidule setup is a typical workflow for encoding two sound sources into an 8-channels TBE soundtrack, which is saved to an 8-channels file:

However, currently no microphone system is capable of delivering directly a ready-to-go 8-channels TBE stream, and most sound engineers skilled in Ambisonics (not last-minute arrivals to Ambisonics scene) prefer to master in 16-channels 3rd Order Ambisonics, instead of in 8-channels TBE.

Disclosing the TBE format

The TBE format is not a standard Ambisonics format; instead, it is some sort of "mixed order" Ambisonics, in which two of the 9 channels of 2nd order (namely, the channels named Z and R) have been summed together, reducing the channel count from 9 to 8.

The conversion formulas from Ambix to TBE, currently implemented in the Spatialiser module, are the following:

TBE(1) = 0.486968 * Ambix(0)

TBE(2) = -0.486968 * Ambix(1)

TBE(3) = 0.486968 * Ambix(3)

TBE(4) = 0.344747 * Ambix(2) + 0.445656 * Ambix(6)

TBE(5) = -0.630957 * Ambix(8)

TBE(6) = -0.630957 * Ambix(4)

TBE(7) = -0.630957 * Ambix(5)

TBE(8) = 0.630957 * Ambix(7)

This is what is currently implemented in the Spatialiser module by the inventors of this format (Two Big Ears). But it is a very bad choice, and here we explain why.

In practice, the information of the vertical position (elevation) of a sound source is contained just in channel 4, which is a mixture of Ambisonics channels Z and R.

But this causes the mapping between source elevation and amplitude of channel TBE(4) to be the following:

While the original Z channel increases monotonically with elevation, and hence at each elevation corresponds one and only one value of Z, the TBE(4) signal has an oscillating behaviour, making it impossible to reconstruct the elevation of the sound source from the amplitude of the TBE(4) channel.
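The oscillating behaviour can be checked numerically. The sketch below assumes a single unit-amplitude source and the SN3D scaling used by Ambix, for which Z = sin(elevation) and R = (3 sin^2(elevation) - 1) / 2; it takes TBE(4) as the mixture of Z and R described above, with the two gains of the Spatialiser formula. The function names are hypothetical.

```python
import math


def ambix_z(elev_deg):
    """SN3D first-order vertical channel Z (ACN 2) for a unit source."""
    return math.sin(math.radians(elev_deg))


def ambix_r(elev_deg):
    """SN3D second-order channel R (ACN 6) for a unit source."""
    s = math.sin(math.radians(elev_deg))
    return (3.0 * s * s - 1.0) / 2.0


def tbe4_spatialiser(elev_deg):
    """TBE(4) as a mixture of Z and R, with the gains of the Spatialiser formula."""
    return 0.344747 * ambix_z(elev_deg) + 0.445656 * ambix_r(elev_deg)


def strictly_increasing(values):
    """True when every value is larger than the one before it."""
    return all(b > a for a, b in zip(values, values[1:]))


elevations = list(range(-90, 91, 5))
z_values = [ambix_z(e) for e in elevations]
tbe4_values = [tbe4_spatialiser(e) for e in elevations]

print("Z monotonic:     ", strictly_increasing(z_values))     # True
print("TBE(4) monotonic:", strictly_increasing(tbe4_values))  # False
print("TBE(4) minimum at", min(elevations, key=tbe4_spatialiser), "degrees")
```

Under these assumptions TBE(4) first decreases from the nadir down to a minimum at roughly -15 degrees of elevation and only then increases, so two different elevations can produce the same TBE(4) amplitude.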
The choice to include the Ambisonics channel R appears to have been a last-minute decision of the guys at Two Big Ears, as most of their framework in reality expects TBE channel n. 4 to contain only Z. This becomes quite evident when performing listening tests with the demo soundtracks created for testing the behaviour of Facebook's rendering software for the Samsung Gear VR (by side-loading them in Oculus Video), which can be downloaded here:

http://www.angelofarina.it/Public/Jump-Videos/TBE-elevation-error/Oculus-Video-Facebook/

The contents of the 4 demo tracks are explained here:

Test-Ambix-4ch-TBE_360.mp4               4-channels Ambix soundtrack imported in 8-ch TBE format by the Spatialiser (channel 4 equals Z, channels 5 to 8 are silent)
Test-Ambix-4ch-Youtube_360.mp4           4-channels Ambix soundtrack directly merged with the video by the Encoder (a single 4-channels audio track is saved inside the MP4 container)
Test-FB-spatialiser-8ch-TBE_360.mp4      8-channels TBE soundtrack created using the Spatialiser - channel TBE(4) contains a mixture of Z and R, and is rendered wrongly
Test-Ambix-8ch-OK-TBE_360.mp4            8-channels TBE soundtrack created from a 16-channels 3rd-order Ambix, employing just Z for channel TBE(4)

The last two sound samples have also been loaded to Facebook:

https://www.facebook.com/angelo.farina.1958/videos/1520288121328070/ Spatialiser output (wrong)

https://www.facebook.com/angelo.farina.1958/videos/1520291077994441/ 3rd order Ambix properly converted to TBE using X-volver

These experiments showed that the Spatialiser module produces a TBE signal which is not properly decoded by Oculus Video, nor by Facebook itself.
Instead, if the spatial audio information is first created with proper tools (O3A, Ambix) in high-quality 3rd-order Ambisonics (16 channels), then it is possible to convert it to "proper" TBE 8-channels format by using these modified formulas:

TBE(1) = 0.488704 * Ambix(0); W
TBE(2) = -0.488603 * Ambix(1); Y
TBE(3) = 0.488603 * Ambix(3); X
TBE(4) = 0.488603 * Ambix(2); Z
TBE(5) = -0.630783 * Ambix(8); U
TBE(6) = -0.630783 * Ambix(4); V
TBE(7) = -0.630783 * Ambix(5); T
TBE(8) = 0.630783 * Ambix(7); S

And these formulas can be implemented in Plogue Bidule, as follows:

It can be seen that Ambix channel R is not employed anymore, resulting in channel TBE(4) containing only Z.
If the Spatialiser plugin also worked this way, then it could be used for creating "correct" TBE signals.
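For readers who prefer code to a patch diagram, the modified formulas can be sketched in a few lines of Python. This is only an illustration of the eight gains and the channel reordering listed above, applied to one frame (one sample per channel); the table and function names are hypothetical, and the gain values are copied from the formulas as given.

```python
# One row per TBE channel, in order TBE(1)..TBE(8): (gain, Ambix ACN channel, name)
CORRECTED_TBE_MAP = [
    (0.488704, 0, "W"),
    (-0.488603, 1, "Y"),
    (0.488603, 3, "X"),
    (0.488603, 2, "Z"),
    (-0.630783, 8, "U"),
    (-0.630783, 4, "V"),
    (-0.630783, 5, "T"),
    (0.630783, 7, "S"),
]


def ambix_to_tbe(ambix_frame):
    """Convert one Ambix frame (9 or more channels, ACN order) to 8 TBE channels."""
    if len(ambix_frame) < 9:
        raise ValueError("at least 2nd-order Ambix (9 channels) is required")
    return [gain * ambix_frame[acn] for gain, acn, _name in CORRECTED_TBE_MAP]


# A 16-channels frame where only Z (ACN 2) and R (ACN 6) are non-zero.
frame = [0.0] * 16
frame[2] = 1.0
frame[6] = 1.0
print(ambix_to_tbe(frame))  # only the 4th value is non-zero: R is discarded
```

Channel R (ACN 6) and all the 3rd-order channels (ACN 9 to 15) simply do not appear in the table, which is how they get discarded.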

Unfortunately, the Spatialiser plugin still outputs a TBE(4) channel contaminated by the Ambisonics channel R, and hence creates a TBE signal which provides wrong elevation. This happens even with the latest Facebook Spatial Workstation, version 2.1. We hope that the guys at Two Big Ears will fix this error very soon...

Hence the current recommendation is: do NOT use the Spatialiser plugin!

Instead, use TOA Panner or Ambix Encoder for positioning your sources in the spherical horizon, and then convert the resulting 16-channels 3rd-order Ambix soundtrack to TBE format as explained in the following chapter. Of course, using these 3rd-order tools requires that your DAW supports 16-channels tracks, which is perfectly feasible in Plogue Bidule, Audio Mulch, Pure Data or Reaper, but is not feasible in Protools HD. Time to leave PT and switch to a better DAW, in my opinion...

So we have now a nice 16-channels Ambix recording:

And we want to convert it to TBE 8-channels. The solution, of course, is NOT to drop the last 8 channels!

We can use Plogue Bidule as shown earlier, applying 8 gains and reordering 8 of the 16 channels, discarding the others. But doing this can be tricky in most DAWs, so here we have another solution:

Using X-volver for 3rd-order Ambisonics to TBE conversion

If your DAW does not make it easy to enter the 8 gain factors and to reorder the channels, you can always use X-volver with a proper 16x8 FIR filter matrix, as shown here:

The FIR filter matrix and the Plogue Bidule configuration file can be downloaded here:
http://www.angelofarina.it/Public/Xvolver/Filter-Matrices/TwoBigEars-encoder-OK/
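The reason a convolution matrix can replace a set of gains is that convolving with a scaled unit impulse is the same as multiplying by that scale factor. The sketch below demonstrates this idea only: it assumes every one of the 16x8 filters is either silent or a single scaled impulse, with an arbitrary filter length of 4 taps. It does not describe the layout of the downloadable WAV file, nor how X-volver loads it, and all names are hypothetical.

```python
N_IN, N_OUT, TAPS = 16, 8, 4  # TAPS is an arbitrary length chosen for this sketch

# (Ambix ACN input, TBE output counted from 0) -> gain, from the modified formulas
GAINS = {
    (0, 0): 0.488704, (1, 1): -0.488603, (3, 2): 0.488603, (2, 3): 0.488603,
    (8, 4): -0.630783, (4, 5): -0.630783, (5, 6): -0.630783, (7, 7): 0.630783,
}


def build_fir_matrix():
    """Return matrix[input][output] = list of taps; only the first tap can be non-zero."""
    matrix = [[[0.0] * TAPS for _ in range(N_OUT)] for _ in range(N_IN)]
    for (src, dst), gain in GAINS.items():
        matrix[src][dst][0] = gain
    return matrix


def convolve(signal, taps):
    """Plain direct-form linear convolution."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(taps):
            out[n + k] += x * h
    return out


def apply_fir_matrix(inputs, matrix):
    """Filter N_IN input channels through the matrix and sum into N_OUT outputs."""
    n_samples = len(inputs[0])
    outputs = []
    for dst in range(N_OUT):
        acc = [0.0] * (n_samples + TAPS - 1)
        for src in range(N_IN):
            for n, value in enumerate(convolve(inputs[src], matrix[src][dst])):
                acc[n] += value
        outputs.append(acc[:n_samples])  # drop the convolution tail
    return outputs


# Two samples per channel; channel c carries the values (c + 1) and 0.5.
ambix = [[float(c + 1), 0.5] for c in range(N_IN)]
tbe = apply_fir_matrix(ambix, build_fir_matrix())
print(tbe[3])  # Z (ACN 2) scaled by 0.488603
```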

From TBE back to Ambisonics (to surround)

In theory it is impossible to go back from 8-channels TBE to 9-channels 2nd-order Ambisonics, as one channel is missing.

However, the loss of information only affects the vertical information.

This means that from the TBE signal it is possible to recover a perfect 2nd-order horizontal-only Ambisonics stream (5 channels), which is perfect for driving a "surround" horizontal-only loudspeaker rig, such as a "standard" 5.1 system, or, better, a perfect centered octagon rig (8 channels, with loudspeaker #1 in front of the listener):


A number of excellent 2nd-order Ambisonics decoders are available for free: for example, for a 5.1 loudspeaker layout, the Wigware decoder by Bruce Wiggins provides excellent performance:

For an Octagon, I would recommend the Ambisonics Bidules by Aristotel Digenis.

For feeding a standard 2nd-order horizontal-only decoder, typically a set of just 5 channels is required, with FuMa channel ordering and scaling.

The following Plogue Bidule setup retrieves these 5 channels from the 8-channels TBE, and feeds the 5.0 Ambisonics decoder, also providing the Subwoofer signal:

It can be seen that the gains required for converting back from TBE to Ambisonics (FuMa) are the following:

W = 1.446968601 * TBE(1)
X = 2.047502048 * TBE(3)
Y = -2.047502048 * TBE(2)
U = -1.839587932 * TBE(5)
V = -1.839587932 * TBE(6)
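The same five gains can be written as a small Python sketch, operating on one 8-channels TBE frame (one sample per channel). The table and function names are hypothetical; the gain values are copied from the list above, and TBE channels are counted from 1 as in the formulas.

```python
# (FuMa channel name, gain, TBE channel counted from 1)
TBE_TO_FUMA_HORIZONTAL = [
    ("W", 1.446968601, 1),
    ("X", 2.047502048, 3),
    ("Y", -2.047502048, 2),
    ("U", -1.839587932, 5),
    ("V", -1.839587932, 6),
]


def tbe_to_fuma_horizontal(tbe_frame):
    """Return the 5 horizontal 2nd-order FuMa channels from one 8-channels TBE frame."""
    if len(tbe_frame) != 8:
        raise ValueError("a TBE frame must contain exactly 8 channels")
    return {name: gain * tbe_frame[ch - 1] for name, gain, ch in TBE_TO_FUMA_HORIZONTAL}


# TBE(4), TBE(7) and TBE(8) carry height-related information and are not used.
frame = [1.0, 0.5, -0.5, 0.9, 0.1, 0.2, 0.3, 0.4]
print(tbe_to_fuma_horizontal(frame))
```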

Again, if your DAW does not make it easy to implement the required set of gains and channel reordering, you can always use X-volver with a proper 8x6 FIR filter matrix, as shown here (please note that the Wigware Ambisonics decoder and the bass management for LFE are also included in this filter matrix):

It must be noted that the CPU usage with X-volver is much smaller than using Wigware (0.29% instead of 13.25%), albeit providing exactly the same results.

Monitoring TBE on a 7.1 system

Following the same lines explained in the previous chapter, here we show how it is possible to set up a live monitoring system to be used when working with the Facebook Spatial Workstation, using a 7.1 horizontal-only loudspeaker system.
Only a few 2nd-order Ambisonics decoders are available for the 7.1 layout, and most of them are not really compliant with the 7.1 layout specifications.

The one used here is TOA Decoder - 7.1. The following table shows the mismatch between "standard" 7.1 channels and channel assignments of the outputs of this decoder plugin:

N.	Standard Name	Standard Azimuth (deg)	TOA Name	TOA Azimuth (deg)
1	L	30	Front Left	30
2	R	-30	Front Right	-30
3	C	0	Center	0
4	LFE	0	Bass (LFE Only - muted)	0
5	LS	110	Back Left	150
6	RS	-110	Back Right	-150
7	LB	150	Side Left	90
8	RB	-150	Side Right	-90

As we do not have anything better, we employed this 7.1 decoder by rewiring its outputs so that at least the channel ordering is correct, as shown here:
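In code, such a rewiring is just a permutation of the eight decoder outputs. The exact permutation below is an assumption inferred from the table above, not a description of the actual patch: the decoder's side outputs (+/-90 degrees) are sent to the standard LS/RS slots, and its back outputs (+/-150 degrees) to the standard LB/RB slots. All names are hypothetical.

```python
# Output order of the decoder plugin, as listed in the table above.
TOA_OUTPUTS = [
    "Front Left", "Front Right", "Center", "Bass",
    "Back Left", "Back Right", "Side Left", "Side Right",
]

# Standard 7.1 channel order, as listed in the table above.
STANDARD_ORDER = ["L", "R", "C", "LFE", "LS", "RS", "LB", "RB"]

# For each standard slot, the index (from 0) of the decoder output feeding it.
# ASSUMPTION: sides go to LS/RS and backs go to LB/RB.
REWIRE = [0, 1, 2, 3, 6, 7, 4, 5]


def rewire(decoder_frame):
    """Reorder one 8-channels decoder frame into the standard 7.1 order."""
    if len(decoder_frame) != 8:
        raise ValueError("expected 8 decoder outputs")
    return [decoder_frame[i] for i in REWIRE]


for slot, source in zip(STANDARD_ORDER, rewire(TOA_OUTPUTS)):
    print(slot, "<-", source)
```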

Recreating this setup in any DAW less versatile than Plogue Bidule is probably impossible. So we sampled the FIR filter matrix of this processing network, and stored it in a nice 8x8 WAV file, ready to be used with X-volver, as shown here:

So, just employing X-volver on an 8-channels bus (which is available in any DAW) it becomes possible to monitor on a 7.1 loudspeaker system, preserving the sharpness of full 2nd order Ambisonics.

Downloads

The FIR filter matrices and the Plogue Bidule configuration files can be downloaded here:
http://www.angelofarina.it/Public/Xvolver/Filter-Matrices/TwoBigEars-encoder-OK/