Conversion from Ambisonics to TBE format

(and back)

 

This is an even newer version of this web page, rewritten on 22 August 2017 after the release of the Facebook Spatial Workstation 2.2, which fixes the bug discovered in previous implementations of the TBE plugins.
It was further updated on 10 December 2017, taking into account the release of version 3.0 of the Facebook Spatial Workstation, which finally supports full standard 2nd-order Ambisonics (9 channels).

 The older versions are here and here for archiving.

 

Ambisonics is a widely employed format for capturing, transmitting and decoding spatial audio, employing 1, 4, 9, 16 or 25 channels. The Two Big Ears (TBE) format is an 8-channels format employed by Facebook's Spatial Workstation, and is part of the "spatial audio" 8+2 channels available for Facebook's 360 videos.
As shown below, TBE is basically 2nd-order Ambisonics in which one of the 2nd-order channels (R) has been removed, to reduce the number of channels to 8. TBE is currently considered a "fading out" format, as Facebook Spatial Workstation 3.0 and newer also support full 2nd-order Ambisonics (9 channels). So in the near future no one should use the TBE 8-channels format anymore...

This web page provides information about the conversion between High Order Ambisonics (typically 3rd order, 16 channels) and TBE format, allowing users of the excellent Ambisonics suites of plugins currently available such as Ambix by Matthias Kronlachner, O3A by Richard Furse, Ambipan by Noisemakers, 360Pan Suite by Audioease, etc., to convert their mixes for creating soundtracks for the Facebook platform with minimal loss of spatial information.

Some info is also provided on techniques employed for converting back the TBE signals to horizontal-only 2nd-order Ambisonics, allowing for monitoring of the soundtrack of a Facebook 360 video employing a standard Ambisonics decoder feeding a loudspeaker rig, instead of employing headphones.
If instead you want to leave the Ambisonics arena, and convert TBE to the new, emerging Spatial PCM Sampling (SPS) approach, have a look here: http://www.angelofarina.it/SPS-conversion.htm


Ambisonics signals

An Ambisonics multichannel soundtrack is not based on the concept of "position of the loudspeakers", as happens in the stereo (Left-Right) or "surround 5.1" (Left, Right, Center, Sub, Left Surround, Right Surround) formats.

Instead, the complete spatial information is represented by a number of "spherical harmonics" signals, at increasing orders. The number of signals for each order is increasing, and the corresponding polar patterns of the equivalent "virtual microphones" are shown here:

An Ambisonics stream is said to be "of order n" when it contains all the signals of orders 0 to n. For example, an Ambisonics signal of order 3 (the most widely used, nowadays) contains 16 channels: 1 of order 0, 3 of order 1, 5 of order 2 and 7 of order 3. Please note that, following the Ambix convention, numbering of Ambisonics channels starts from 0 instead of 1.
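The channel counts quoted above follow from the fact that order m contributes 2m+1 signals, so a stream of order n carries (n+1)^2 channels in total. A minimal Python sketch of this arithmetic (the function names are just illustrative):

```python
def channels_in_order(m):
    """Number of spherical-harmonic signals belonging to order m alone."""
    return 2 * m + 1


def total_channels(n):
    """Total channels of an Ambisonics stream of order n (orders 0..n)."""
    return sum(channels_in_order(m) for m in range(n + 1))  # same as (n + 1) ** 2


for n in range(5):
    print(n, total_channels(n))
```

For orders 0 to 4 this gives the 1, 4, 9, 16 and 25 channels mentioned at the top of this page.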

 

Recording Ambisonics signals

An Ambisonics stream of order 3 can be easily recorded, employing a suitable microphone system, such as the Eigenmike(TM) or other microphone arrays of different shape. Here you can see an Eigenmike and the RAI-AIDA cylindrical microphone array, built by dismounting an Eigenmike and remounting its capsules and electronics inside a cylindrical body, for providing increased accuracy in azimuthal information:

Eigenmike (Left) and RAI-AIDA Cylindrical Microphone Array

 

Creating synthetic Ambisonics signals

A 16-channels Ambisonics stream (3rd order) can also be created synthetically, starting from a mono track and employing a software tool (typically a VST plugin) known as an "encoder" or a "panner".

Two excellent and free tools are widely used nowadays, which are included in the plugin suites named Ambix and O3A:

  

Recently O3A was updated, including a "graphical" panner, which allows using a background equirectangular image for showing the position of the sound source being panned around:

 

Downstepping high order Ambisonics signals

Due to the current availability of excellent microphone arrays and panning tools, capable of full 3rd order, nowadays it is considered completely obsolete and low-quality to employ the crappy, old 1st-order Ambisonics format (4-channels) for either recording or mastering. Even 2nd order is currently obsolete.

Better to employ a complete workflow all at third order (16 channels), and eventually downsize it to a smaller number of channels for delivery, but always keeping the master at the highest possible spatial resolution.

Ambisonics is hierarchical, so downstepping to 2nd or 1st order is trivial: just remove the last 7 channels for getting a 9-channels 2nd-order soundtrack. Further discard the last 5 channels, and you get a 4-channels 1st order. This holds both with the old-style choice for channel numbering and scaling, called Furse-Malham (FuMa), not used anymore nowadays, and with the modern version of Ambisonics channel numbering (ACN) and scaling (SN3D), called Ambix.
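As a minimal Python sketch of this downstepping, one frame of samples (one value per channel) can simply be truncated to the first (n+1)^2 channels. The function name and the list-based frame representation are assumptions made here for illustration only:

```python
def downstep(frame, target_order):
    """Keep only the first (target_order + 1)**2 channels of one multichannel frame."""
    keep = (target_order + 1) ** 2
    if keep > len(frame):
        raise ValueError("frame has fewer channels than the target order needs")
    return frame[:keep]


# Placeholder 3rd-order frame: each sample value is just its channel index.
third_order = list(range(16))
second_order = downstep(third_order, 2)  # drops the last 7 channels
first_order = downstep(third_order, 1)   # drops 5 more channels
print(len(second_order), len(first_order))
```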

But converting a 3rd-order (or 2nd-order) Ambisonics soundtrack to the "proprietary" TBE format (8 channels), or going backwards, is not so trivial, and hence the need for the information contained in this web page.

 

The TBE 8-channels format

First of all: why does the TBE format only employ 8 channels? This is for compatibility with a number of old, but still employed, Digital Audio Workstation programs (DAW), such as old versions of Avid Protools.

However, ProTools has been updated (as of November 2017) to a new version fully supporting true 3rd order Ambisonics (16 channels in a single WAV file, in a single track or in a single bus), so this 8-channels limitation is simply anachronistic. In fact, starting with version 3.0 of the Facebook Spatial Workstation, the full 2nd-order Ambisonics format (9 channels) is also supported (and full 3rd order is announced), hence I do not see any reason for employing this obsolete 8-channels TBE format anymore.

The Facebook Spatial Workstation is mostly based on two VST plugins: the first is called Spatialiser, and operates as a panner for one or more sound sources:

Up to 7 mono sources can be panned around with this tool. It is still not easy to understand why the plugin cannot be used with 8 mono sound sources, as it would make sense, as often the plugin is employed on an 8-channels bus.

The second plugin is called Control, and operates as a virtual room simulator for an already-encoded 8-channels TBE stream:

The Control plugin can also work as a realtime binaural rendering tool, staying in sync with an equirectangular video being played on an Oculus Rift or HTC Vive device attached to the computer. Its main function, indeed, is to allow the Video Player module to feed the background video image to the Spatialiser plugin, making it possible to see where each input is panned (same function as O3A Panner Large).

The following Plogue Bidule setup is a typical workflow for encoding two sound sources into an 8-channels TBE soundtrack, which is saved to an 8-channels file:

However, currently no microphone system is capable of delivering directly a ready-to-go 8-channels TBE stream, and most sound engineers skilled in Ambisonics prefer to master in 16-channels 3rd Order Ambisonics, instead of in 8-channels TBE.


 

Disclosing the TBE format

The TBE format is not a standard Ambisonics format; instead, it is a sort of "mixed order" Ambisonics, in which one of the 9 channels of 2nd order (namely, channel R) has been removed, to reduce the channel count from 9 to 8.
These are the encoding formulas, providing the gains to be applied to a mono soundtrack to encode it in TBE format, depending on the Azimuth (a) and Elevation (e) of the sound source:

TBE(1) =  0.488603
TBE(2) = -0.488603*sin(a)*cos(e)
TBE(3) =  0.488603*cos(a)*cos(e)
TBE(4) =  0.488603*sin(e)
TBE(5) = -0.546274*cos(2*a)*(cos(e))^2
TBE(6) = -0.546274*sin(2*a)*(cos(e))^2
TBE(7) = -0.546274*sin(a)*sin(2*e)
TBE(8) =  0.546274*cos(a)*sin(2*e)
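The eight formulas above translate directly into code. The following Python sketch returns the eight TBE gains for a given direction; the function name and the choice of passing angles in degrees are assumptions of this example, while the coefficients are the ones listed above:

```python
import math


def tbe_encode_gains(azimuth_deg, elevation_deg):
    """Return the gains TBE(1)..TBE(8) for a mono source at the given direction."""
    a = math.radians(azimuth_deg)
    e = math.radians(elevation_deg)
    return [
        0.488603,                                        # TBE(1)
        -0.488603 * math.sin(a) * math.cos(e),           # TBE(2)
        0.488603 * math.cos(a) * math.cos(e),            # TBE(3)
        0.488603 * math.sin(e),                          # TBE(4)
        -0.546274 * math.cos(2 * a) * math.cos(e) ** 2,  # TBE(5)
        -0.546274 * math.sin(2 * a) * math.cos(e) ** 2,  # TBE(6)
        -0.546274 * math.sin(a) * math.sin(2 * e),       # TBE(7)
        0.546274 * math.cos(a) * math.sin(2 * e),        # TBE(8)
    ]


# Encoding a mono sample just multiplies it by each of the eight gains.
sample = 0.5
frame = [sample * g for g in tbe_encode_gains(90, 0)]
print([round(x, 6) for x in frame])
```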

Comparing the above formulas with the standard Ambisonics formulas (as published here), we find the transcoding formulas from Ambisonics (Ambix version) to TBE:

TBE(1) =  0.488603 * Ambix(0); W
TBE(2) = -0.488603 * Ambix(1); Y
TBE(3) =  0.488603 * Ambix(3); X
TBE(4) =  0.488603 * Ambix(2); Z
TBE(5) = -0.630783 * Ambix(8); U
TBE(6) = -0.630783 * Ambix(4); V
TBE(7) = -0.630783 * Ambix(5); T
TBE(8) =  0.630783 * Ambix(7); S
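A minimal Python sketch of this transcoding, operating on one frame of Ambix samples (a list indexed by ACN channel number). The helper sn3d_second_order is not taken from this page: it uses the usual SN3D encoding formulas up to 2nd order, and is included only as a sanity check that the transcoded result agrees with the direct TBE encoding formulas. All names are illustrative assumptions.

```python
import math

# (ACN index, gain) for TBE(1)..TBE(8), copied from the transcoding table above.
AMBIX_TO_TBE = [
    (0, 0.488603),   # W
    (1, -0.488603),  # Y
    (3, 0.488603),   # X
    (2, 0.488603),   # Z
    (8, -0.630783),  # U
    (4, -0.630783),  # V
    (5, -0.630783),  # T
    (7, 0.630783),   # S
]


def ambix_to_tbe(frame):
    """Transcode one frame of Ambix samples (9 channels or more) to 8 TBE samples."""
    if len(frame) < 9:
        raise ValueError("need at least 2nd-order Ambix (9 channels)")
    return [gain * frame[acn] for acn, gain in AMBIX_TO_TBE]


def sn3d_second_order(azimuth_deg, elevation_deg):
    """Assumed SN3D encoding gains in ACN order, orders 0 to 2 (sanity check only)."""
    a, e = math.radians(azimuth_deg), math.radians(elevation_deg)
    k = math.sqrt(3) / 2
    return [
        1.0,                                    # ACN 0, W
        math.sin(a) * math.cos(e),              # ACN 1, Y
        math.sin(e),                            # ACN 2, Z
        math.cos(a) * math.cos(e),              # ACN 3, X
        k * math.sin(2 * a) * math.cos(e) ** 2, # ACN 4, V
        k * math.sin(a) * math.sin(2 * e),      # ACN 5, T
        0.5 * (3 * math.sin(e) ** 2 - 1),       # ACN 6, R (not used by TBE)
        k * math.cos(a) * math.sin(2 * e),      # ACN 7, S
        k * math.cos(2 * a) * math.cos(e) ** 2, # ACN 8, U
    ]


tbe = ambix_to_tbe(sn3d_second_order(90, 0))
print([round(x, 6) for x in tbe])
```

Under these assumptions, the factor 0.630783 is simply 0.546274 divided by the SN3D 2nd-order factor sqrt(3)/2.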

 

You can see that transcoding is not just a matter of gains and channel order: some signs are also reversed.

This is due to the fact that the TBE format employs a "wrong" polarity for azimuth, not compliant with current international standards.

Please note that the current ISO 2631 standard defines the XYZ and azimuth-elevation coordinate systems as follows:

As shown in the figure, the X axis points in front of the listener, the Y axis points to the listener's left ear and the Z axis points up to the sky.

So azimuth is 0 in front of the listener, going left it assumes positive values (the left ear is at a=+90 degrees), and going right it assumes negative values (the right ear is at a=-90 degrees).

Elevation is 0 on the horizontal plane (equator), becomes positive going up towards the North Pole, and assumes negative values going down towards the South Pole.
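The convention just described (X front, Y left, Z up, azimuth positive to the left, elevation positive upwards) can be sketched in Python as a conversion from azimuth and elevation to a unit vector. The function name and the use of degrees are assumptions of this example:

```python
import math


def direction_to_xyz(azimuth_deg, elevation_deg):
    """Unit vector for a direction, with X = front, Y = left, Z = up."""
    a = math.radians(azimuth_deg)
    e = math.radians(elevation_deg)
    x = math.cos(a) * math.cos(e)  # positive in front of the listener
    y = math.sin(a) * math.cos(e)  # positive towards the left ear
    z = math.sin(e)                # positive towards the sky
    return x, y, z


for name, az, el in [("front", 0, 0), ("left ear", 90, 0), ("right ear", -90, 0), ("up", 0, 90)]:
    print(name, [round(v, 3) for v in direction_to_xyz(az, el)])
```

A quick way to check any tool against this convention is to pan a source fully to the left and verify that the Y component comes out positive.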

The above-described reference system has been standardised by the International Organization for Standardization (ISO) and is described in the ISO 2631 standard. This reference system has usually been correctly implemented in Ambisonics software and hardware for more than 30 years.

Despite this standardisation, some software is currently not compliant with the ISO standard, for example the GUI of the Ambix plugins and of the Facebook Spatial Workstation, which apparently employ azimuth with reversed polarity. Nevertheless, these software tools internally use the correct convention, and the resulting Ambisonics-encoded WAV files have correct polarity of the Y-related channels (positive polarity for sources on the left of the listener).

However, as shown in the tables above, when exporting TBE signals the Facebook Spatial Workstation reverses the polarity of the Y-related channels, and this has caused a lot of trouble to people who were not aware of the contents of this web page.

Eigenstudio (the software accompanying the Eigenmike) instead has correct azimuth, but uses an elevation counting "down" from the North Pole: 0 at the North Pole, 90 at the Equator and 180 at the South Pole.
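Going by the three reference values just given, converting such a "counting down" angle to the ISO-style elevation amounts to subtracting it from 90 degrees. A minimal Python sketch (the function name is an assumption of this example, not part of Eigenstudio):

```python
def eigenstudio_to_iso_elevation(polar_deg):
    """Convert an angle counted down from the North Pole to ISO-style elevation."""
    if not 0 <= polar_deg <= 180:
        raise ValueError("expected an angle between 0 and 180 degrees")
    return 90 - polar_deg


for polar in (0, 90, 180):
    print(polar, eigenstudio_to_iso_elevation(polar))
```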

So be careful: always check that the software you are using adheres (both internally and in its GUI) to the official ISO 2631 standard, as explained here. If your software is not ISO-compliant, it is warmly suggested to complain to the author, so as to have it fixed. Ambisonics is the de-facto standard for Spatial Audio and VR, so it is very important that all developers adhere strictly to the standard, ensuring interoperable operation and avoiding problems for their users.
 


Conversion from Ambix to TBE

The transcoding formulas shown above can be implemented in Plogue Bidule, as follows:

It can be seen that Ambix channel R is not employed at all.

Please note that, before the release of the Spatial Workstation version 2.2, the Spatialiser plugin was outputting a TBE(4) channel contaminated by the Ambisonics channel R, and hence created a TBE signal with wrong elevation.

So it is highly recommended to re-encode in TBE format all previously released recordings that were created with earlier versions of the Facebook Spatial Workstation, as the resulting TBE signals were wrong. The re-encoding should be done using Facebook Spatial Workstation version 2.2 or higher, or the conversion methods from TOA to TBE presented in this web page.

The recommended workflow is currently to use TOA Panner or Ambix Encoder for positioning your sources in the spherical horizon, and then convert the resulting 16-channels 3rd-order Ambix soundtrack to TBE format as shown above. Of course, using these 3rd-order tools requires that your DAW supports 16-channels tracks, which is well feasible in Plogue Bidule, Audio Mulch, Pure Data, Reaper or in the latest version of Protools HD.

So we start with a nice 16-channels Ambix recording:

And we want to convert it to TBE 8-channels. The solution, of course, is NOT to drop the last 8 channels!

We can use Plogue Bidule as shown earlier, applying 8 gains and reordering 8 of the 16 channels, discarding the others. But doing this can be tricky in most DAWs, so here we have another solution:

 

Using X-volver for 3rd order Ambisonics to TBE conversion

If your DAW does not let you easily enter the 8 gain factors and reorder the channels, you can always use X-volver with a proper 16x8 FIR filter matrix, as shown here:

The FIR filter matrix and the Plogue Bidule configuration file can be downloaded here:
http://www.angelofarina.it/Public/Xvolver/Filter-Matrices/TwoBigEars-encoder-OK/
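Since this conversion only involves gains and channel routing, the 16x8 matrix is mostly zeros, with a single non-zero coefficient for each of the 8 outputs. The Python sketch below builds such a gain matrix from the transcoding gains given earlier on this page and applies it to one frame. The matrix[input][output] layout and all names are assumptions of this sketch; it says nothing about the actual WAV layout that X-volver expects, for which the downloadable file should be used.

```python
N_IN, N_OUT = 16, 8

# (Ambix ACN input channel, gain) feeding TBE(1)..TBE(8), from the transcoding table.
ROUTES = [
    (0, 0.488603), (1, -0.488603), (3, 0.488603), (2, 0.488603),
    (8, -0.630783), (4, -0.630783), (5, -0.630783), (7, 0.630783),
]


def build_gain_matrix():
    """Return a 16x8 matrix (rows = Ambix inputs, columns = TBE outputs)."""
    matrix = [[0.0] * N_OUT for _ in range(N_IN)]
    for out_ch, (in_ch, gain) in enumerate(ROUTES):
        matrix[in_ch][out_ch] = gain
    return matrix


def apply_matrix(matrix, frame):
    """Mix one 16-channel frame down to 8 channels through the gain matrix."""
    return [sum(frame[i] * matrix[i][o] for i in range(N_IN)) for o in range(N_OUT)]


matrix = build_gain_matrix()
unused_inputs = [i for i, row in enumerate(matrix) if not any(row)]
print(unused_inputs)  # inputs that never reach any TBE output
```

Each coefficient can be thought of as a one-sample filter, which is why a static gain matrix fits into a convolution-matrix tool.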

 


From TBE back to Ambisonics (to surround)

In theory it is impossible to go back from 8-channels TBE to 9-channels 2nd-order Ambisonics, as one channel is missing.

However, the loss of information only affects the vertical information.

This means that from the TBE signal it is possible to recover an exact 2nd-order horizontal-only Ambisonics stream (5 channels), which is well suited for driving a "surround" horizontal-only loudspeaker rig, such as a "standard" 5.1 system, or, better, a centered regular octagon rig (8 channels, with loudspeaker #1 in front of the listener):


A number of excellent 2nd-order Ambisonics decoders are available for free: for example, for a 5.1 loudspeaker layout, the Wigware decoder by Bruce Wiggins provides excellent performance:

For a centered Octagon, or other "unconventional" speaker layouts, I recommend Rapture3D by Richard Furse (Blue Ripple Sound).

For feeding a standard 2nd-order horizontal-only decoder, typically a set of just 5 channels is required, with FuMa channel ordering and scaling.

The following Plogue Bidule setup retrieves these 5 channels from the 8-channels TBE, and feeds the 5.0 Ambisonics decoder, also providing the Subwoofer signal:

It can be seen that the gains required for converting back from TBE to Ambisonics (FuMa) are the following:

W =  1.446968601 * TBE(1)
X =  2.047502048 * TBE(3)
Y = -2.047502048 * TBE(2)
U = -1.839587932 * TBE(5)
V = -1.839587932 * TBE(6)
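A minimal Python sketch of this back-conversion, applied to one frame of 8 TBE samples. The gains are copied verbatim from the list above; the function name and the dictionary-based output are assumptions of this example:

```python
# FuMa channel name -> (TBE channel number, 1-based; gain), from the list above.
TBE_TO_FUMA_HORIZONTAL = {
    "W": (1, 1.446968601),
    "X": (3, 2.047502048),
    "Y": (2, -2.047502048),
    "U": (5, -1.839587932),
    "V": (6, -1.839587932),
}


def tbe_to_fuma_horizontal(tbe_frame):
    """Recover the 5 horizontal FuMa channels from one 8-channel TBE frame."""
    if len(tbe_frame) != 8:
        raise ValueError("a TBE frame must have exactly 8 channels")
    return {
        name: gain * tbe_frame[channel - 1]
        for name, (channel, gain) in TBE_TO_FUMA_HORIZONTAL.items()
    }


# TBE gains of a frontal source (a = 0, e = 0), from the TBE encoding formulas.
front = [0.488603, 0.0, 0.488603, 0.0, -0.546274, 0.0, 0.0, 0.0]
print({k: round(v, 3) for k, v in tbe_to_fuma_horizontal(front).items()})
```

TBE channels 4, 7 and 8 carry the vertical information and are simply left out.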

Again, if your DAW does not let you easily implement the required set of gains and channel reordering, you can always use X-volver with a proper 8x6 FIR filter matrix, as shown here (please note that the Wigware Ambisonics decoder and the bass management for LFE are also included in this filter matrix):

It must be noted that the CPU usage with X-volver is much smaller than using Wigware (0.29% instead of 13.25%), albeit providing exactly the same results.


Monitoring TBE on a 7.1 system

Following the same lines explained in the previous chapter, here we show how it is possible to set up a live monitoring system to be used when working with the Facebook Spatial Workstation, using a 7.1 horizontal-only loudspeaker system.
Only a few 2nd-order Ambisonics decoders are available for the 7.1 layout, and most of them are not really compliant with the 7.1 layout specifications.

The one used here is TOA Decoder - 7.1. The following table shows the mismatch between "standard" 7.1 channels and channel assignments of the outputs of this decoder plugin:

N.  Standard Name  Standard Azimuth (deg)  TOA Name                  TOA Azimuth (deg)
1   L              30                      Front Left                30
2   R              -30                     Front Right               -30
3   C              0                       Center                    0
4   LFE            0                       Bass (LFE Only - muted)   0
5   LS             110                     Back Left                 150
6   RS             -110                    Back Right                -150
7   LB             150                     Side Left                 90
8   RB             -150                    Side Right                -90
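Reading the table, the decoder outputs 5-6 sit at the azimuths of the standard channels 7-8, while outputs 7-8 (at +/-90 degrees) are the closest available match for the standard side channels at +/-110 degrees. The Python sketch below shows one plausible rewiring based on this reading: it is an assumption made here for illustration, and not necessarily the exact wiring used in the actual setup.

```python
STANDARD = ["L", "R", "C", "LFE", "LS", "RS", "LB", "RB"]
TOA_OUTPUTS = [
    "Front Left", "Front Right", "Center", "Bass",
    "Back Left", "Back Right", "Side Left", "Side Right",
]

# Assumed pairing: each standard channel takes the decoder output closest in azimuth.
ASSUMED_SOURCE = {
    "L": "Front Left", "R": "Front Right", "C": "Center", "LFE": "Bass",
    "LS": "Side Left", "RS": "Side Right", "LB": "Back Left", "RB": "Back Right",
}


def rewire(toa_frame):
    """Reorder one frame of decoder outputs into standard 7.1 channel order."""
    if len(toa_frame) != len(TOA_OUTPUTS):
        raise ValueError("expected 8 decoder outputs")
    return [toa_frame[TOA_OUTPUTS.index(ASSUMED_SOURCE[name])] for name in STANDARD]


print(rewire(TOA_OUTPUTS))
```

With this pairing the back channels land exactly on 150 degrees, while the side channels remain 20 degrees off (90 instead of 110).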

As we do not have anything better, we employed this 7.1 decoder by rewiring its outputs so that at least the channel ordering is correct, as shown here:

Recreating this setup in any DAW less versatile than Plogue Bidule is probably impossible. So we sampled the FIR filter matrix of this processing network, and stored it in a nice 8x8 WAV file, ready to be used with X-volver, as shown here:

So, just employing X-volver on an 8-channels bus (which is available in any DAW) it becomes possible to monitor on a 7.1 loudspeaker system, preserving the sharpness of full 2nd order Ambisonics.


Downloads

The FIR filter matrices and the Plogue Bidule configuration files can be downloaded here:
http://www.angelofarina.it/Public/Xvolver/Filter-Matrices/TwoBigEars-encoder-OK/


All the contents are Copyright by Angelo Farina, 2017