Final Methodology | Synthesized 3D Audio – Stage 1 (Development in progress)

DAW: Reaper


The idea behind the encoding system was to simulate object-based audio by synthesizing a 2nd order ambisonics signal, which does not depend on a particular speaker array, using a virtual microphone and a binaural panner that allow the signal to be positioned with high accuracy in three-dimensional space.
To create the simulation, the following chain of plug-ins was used:

FB360 Converter:


The FB360 Converter received the monophonic signal and spread it across eight channels using the Spatial Workstation 8ch algorithm, a hybrid Ambisonics format. Facebook discontinued this format in recent updates in favour of the ambiX format, which uses the channel ordering W, Y, Z, X (with SN3D normalisation) instead of the FuMa (B-Format) ordering W, X, Y, Z. Version 2.2.1 of the FB360 Converter still offers the Spatial Workstation format.
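
The difference between the two channel orderings can be illustrated with a minimal sketch for first order material only. It assumes the usual convention that FuMa stores W attenuated by 1/sqrt(2) while SN3D stores it at unity gain; the function name is hypothetical and this is not the FB360 Converter's own code.

```python
import numpy as np

def fuma_to_ambix_first_order(fuma):
    """Convert first order FuMa (W, X, Y, Z) to ambiX (W, Y, Z, X, SN3D).

    fuma: array of shape (4, n_samples).
    Assumption: FuMa W carries a 1/sqrt(2) factor that SN3D does not.
    """
    fuma = np.asarray(fuma, dtype=float)
    if fuma.shape[0] != 4:
        raise ValueError("expected 4 channels ordered W, X, Y, Z")
    w, x, y, z = fuma
    # Undo the FuMa W attenuation, then reorder to W, Y, Z, X.
    return np.stack([w * np.sqrt(2.0), y, z, x])
```

Higher orders need additional per-channel scaling factors, which this sketch does not cover.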

IEM Room Encoder:


The IEM Room Encoder received the eight-channel signal from the FB360 Converter and placed it in a virtual room in which the depth, width and height, the listener position in the horizontal and vertical planes, and the number of reflections can be controlled. The signal is then converted into 2nd order ambisonics with SN3D normalization, including the room information.
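
As a rough illustration of what a 2nd order SN3D encoding looks like, the sketch below computes the encoding gains for a single direct sound in ACN channel order. It is not the IEM Room Encoder's algorithm (there are no room reflections), and the function names and angle convention (azimuth counter-clockwise from the front, elevation upwards) are assumptions. A full 2nd order set has nine components; how the plug-ins in this chain pack them into eight channels is not covered here.

```python
import numpy as np

def encode_second_order_sn3d(azimuth_deg, elevation_deg):
    """Return the 9 SN3D gains (ACN order) for one source direction."""
    az = np.radians(azimuth_deg)
    el = np.radians(elevation_deg)
    k = np.sqrt(3.0) / 2.0
    return np.array([
        1.0,                                   # ACN 0: W
        np.sin(az) * np.cos(el),               # ACN 1: Y
        np.sin(el),                            # ACN 2: Z
        np.cos(az) * np.cos(el),               # ACN 3: X
        k * np.sin(2 * az) * np.cos(el) ** 2,  # ACN 4: V
        k * np.sin(az) * np.sin(2 * el),       # ACN 5: T
        0.5 * (3 * np.sin(el) ** 2 - 1),       # ACN 6: R
        k * np.cos(az) * np.sin(2 * el),       # ACN 7: S
        k * np.cos(2 * az) * np.cos(el) ** 2,  # ACN 8: U
    ])

def encode_mono(signal, azimuth_deg, elevation_deg):
    """Spread a mono signal across 9 ambisonic channels."""
    gains = encode_second_order_sn3d(azimuth_deg, elevation_deg)
    signal = np.asarray(signal, dtype=float)
    return gains[:, None] * signal[None, :]
```

Calling `encode_mono(signal, 90, 0)` would place the source directly to the left under the stated angle convention.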

Visual Virtual Microphone:

The VV Microphone was used to simulate the OctoMic, a 2nd order ambisonics microphone with eight cardioid capsules, four facing up and four facing down. This type of microphone can capture a more detailed and accurate spatial signal than a first order microphone.
The virtual microphone was set to simulate the OctoMic's capsule type and positioning.

Visual Virtual Tetra VST:


This plug-in was used to simulate the physical tetra microphone. Although the actual microphone has only four capsules, the received signal contains eight channels, which makes it clearer and more accurate.

My Byno:


This is the most important plug-in in the chain. It takes the eight-channel ambisonics signal simulated by the virtual microphone, which also contains the room information, and converts it into a binaural signal using HRTF measurement presets. Because the synthesized ambisonics signal contains so much simulated spatial data, once it is converted into binaural it behaves like object-based audio, resulting in very accurate and detailed positioning of the signal in three-dimensional space.
My Byno can position the signal at a single point in space or spread it across a 9.1 Auro 3D speaker array.
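
The general principle of HRTF-based binaural rendering can be sketched as follows: each virtual speaker feed is convolved with a left-ear and a right-ear impulse response (HRIR) and the results are summed per ear. This is only an illustration of the technique, not My Byno's implementation; the function name and array layout are assumptions, and real HRIRs would come from a measurement set.

```python
import numpy as np

def render_binaural(speaker_feeds, hrirs):
    """Sum virtual speaker feeds into a 2-channel binaural signal.

    speaker_feeds: shape (n_speakers, n_samples)
    hrirs:         shape (n_speakers, 2, hrir_len), [speaker][ear][tap],
                   where ear 0 is left and ear 1 is right (assumed layout).
    """
    speaker_feeds = np.asarray(speaker_feeds, dtype=float)
    hrirs = np.asarray(hrirs, dtype=float)
    n_speakers, n_samples = speaker_feeds.shape
    if hrirs.shape[0] != n_speakers or hrirs.shape[1] != 2:
        raise ValueError("need one left/right HRIR pair per speaker")
    out = np.zeros((2, n_samples + hrirs.shape[2] - 1))
    for feed, (h_left, h_right) in zip(speaker_feeds, hrirs):
        out[0] += np.convolve(feed, h_left)   # left ear
        out[1] += np.convolve(feed, h_right)  # right ear
    return out
```

Full convolution is used here for clarity; a real-time plug-in would more likely use partitioned FFT convolution.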

O3A Panner – Two Channel:


Because My Byno turned the signal into a two-channel binaural signal, the O3A Panner – Two Channel was added to the chain, with no settings changed, to take the binaural signal and spread it back into eight channels. The O3A Panner algorithm does not affect the binaural information during the conversion.

Other Plug-ins:

The signal becomes weak after all the processing, so the O3A Gain was added to boost it, as it supports multichannel audio.
For filtering, the mcfx filter was inserted, again because it supports multichannel audio.



The reasoning behind the decoding system was to simulate classic channel-based audio. To do so, the signal needs to be decoded using a virtual speaker array.
Combining the encoding and decoding systems results in a simulation of Dolby Atmos.
For the decoding, the following chain of plug-ins was used:

Surround IO:

Surround IO simulates the inputs and outputs of an Auro 3D speaker array. The encoded eight-channel signal was mapped onto fourteen virtual speakers, including the height channels.
There was no fixed rule in the mapping procedure. The channels were routed based on the information available on each specific channel. As a result, for example, channel three was mapped to the right speaker as well as the left and right surrounds. Likewise, channels seven and eight, which both contain height information, were mapped to all the height speakers.
The mapping is very unusual; nonetheless, combining the channels this way meant the three-dimensional information was not lost. One possible explanation is that the phase relationships between the overlapping signals kept the 3D space intact.

O3A FuMa Injector and Decoder:


This stage of the decoding is still a mystery. The problem faced here was that the mix was predominantly on the left side.
The decoding format needed was FuMa, an extension of the classic B-Format, in 2nd order ambisonics, which is compatible with all the streaming platforms and in theory can keep all the three-dimensional information on any platform after export. ( 2018)
The O3A Decoder was inserted on the track to convert the ambisonics signal into FuMa. The problem was that the polarity of the signal was flipped, and the signal became predominant on the right side. Several other decoders were tested; however, all failed.
While testing different plug-ins, the O3A Injector was inserted in the chain ahead of the O3A Decoder by accident. At that point, the three-dimensional image was back to normal (same as before the decoding process) but in FuMa format.
The only theory regarding this situation, so far unconfirmed, is that the O3A Injector, as the name suggests and according to its official website, “applies some gentle processing to first order ambisonics material that sometimes helps when injecting it into a third order O3A mix” ( 2018). One way or another, the first four channels of the received signal were modified and, combined with the O3A Decoder processing, this resulted in correct signal routing, with only a minor effect on the three-dimensional space.

Ambix Decoder:


The previous decoding process reduced the spatiality of the three-dimensional signal. To fix this, a decoder was inserted that simulates a 22.2 speaker array, as explained in the literature review. This decoder exaggerates the three-dimensional space so that, when the signal goes into the last decoding process, where most of the 3D image would otherwise be lost, it retains an amount of 3D spatialization similar to that of the encoding process.

Surround Phone 3D:


This is the last plug-in inserted into the decoding process and completes the chain of plug-ins used for this methodology.
The Surround Phone 3D receives the signal from the Ambix Decoder, converts it into a 9.1 Auro 3D format and outputs the complete three-dimensional audio in stereo using HRTF measurements.
As it narrows the sound too much, it was blended into the mix at only 15%. This way, the three-dimensional image remains clear and is not lost in the exporting process.
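
The 15% blend can be read as a simple wet/dry mix. The sketch below assumes a linear crossfade between the unprocessed and processed signals; how the plug-in's own mix control behaves internally is not known, and the function name is hypothetical.

```python
import numpy as np

def blend(dry, wet, wet_amount=0.15):
    """Linear wet/dry mix: wet_amount=0.15 keeps 85% of the dry signal."""
    if not 0.0 <= wet_amount <= 1.0:
        raise ValueError("wet_amount must be between 0 and 1")
    dry = np.asarray(dry, dtype=float)
    wet = np.asarray(wet, dtype=float)
    return (1.0 - wet_amount) * dry + wet_amount * wet
```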

Sound Design and Mixing

The sound design process used classic methodologies. Most of the sound effects were taken from royalty-free databases and personal libraries. Sounds such as footsteps, clothing movements and ambiences were recorded specifically for this project using an sE Electronics Laser Pro microphone and a Tascam DR-22WL. Due to the limited time left for the sound design process, it was not possible to create an entirely new library for this demonstration video.
The mixing process was based on surround sound methodology, the only difference being the full three-dimensional space, which allows more accurate placement of sounds in the mix. It was done scene by scene and, in the mastering process, all the scenes were mixed together and brought to the standard loudness for film (-24 LUFS).
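
Bringing a scene to the target loudness comes down to a single gain offset once its integrated loudness has been measured. The sketch below assumes the measured value comes from a loudness meter in the DAW (it does not implement the measurement itself), and the function name is hypothetical.

```python
def gain_to_target(measured_lufs, target_lufs=-24.0):
    """Return (gain in dB, linear gain factor) needed to reach the target.

    measured_lufs: integrated loudness reported by a meter (assumed input).
    """
    gain_db = target_lufs - measured_lufs
    linear = 10.0 ** (gain_db / 20.0)
    return gain_db, linear
```

For example, a scene measured at -18 LUFS would need 6 dB of attenuation to reach -24 LUFS.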


The tests and survey results for this second and final methodology demonstrate that, by using the available tools (not necessarily in the way they were intended to be used or combined), an immersive sound can be achieved with freeware plug-ins.
The biggest difficulty of this methodology is the large number of plug-ins needed on each track for the encoding process, which results in CPU and RAM overload. The computer used to develop the methodology was an Intel quad-core i7 3770 (4 cores, 8 threads) with 8 GB of RAM and an Nvidia GTX 670 graphics card, built in 2015. Considering the age of the computer and its older-generation components, a newer computer would probably not overload so easily.
Another difficulty with this methodology is the lack of sound in front. Adding the two-dimensional bus brought a small improvement but did not solve the problem.
Overall, the methodology fulfilled its purpose of synthesizing 3D audio for headphones, including the elevation plane, using only freeware plug-ins and software together with standard mono or stereo audio sources. At this stage it can be used for small projects, although there is room for improvement.

Further Development

A closer look at how the plug-ins used in the methodology are built, together with a good programmer, could lead to software and plug-ins capable of converting any surround mix into an immersive experience, or to a simpler and more efficient suite of plug-ins based on this methodology.

I would like to offer special thanks to all the participants who helped with the testing.

NOTE: A video with the final implementation will be posted soon!

References and Bibliography



Anon., 2005. Ambisonics: File Formats [online]. Available from: [Accessed 15 Feb 2018].

Anon., 2005. University of York: Music Technology Group: Ambisonic hints and tips [online]. Available from: [Accessed 13 Feb 2018].

Anon., 2005. University of York: Music Technology Group: Sound in Space [online]. Available from: [Accessed 13 Feb 2018].

Anon., 2015. 3D Spatial/Binaural Audio in film • r/binaural [online]. reddit. Available from: [Accessed 12 Feb 2018].

Anon., 2017. Ambisonics Explained: A Guide for Sound Engineers | Waves [online]. Available from: [Accessed 10 Dec 2017].

Anon., 2017. Oculus VST Spatializer for DAWs Integration Guide [online]. Available from: [Accessed 27 Nov 2017].

Anon., 2018. Dolby Atmos for Mobile Devices [online]. Available from: [Accessed 13 Feb 2018].

Anon., 2018. Dolby Atmos in the Cinema [online]. Available from: [Accessed 13 Feb 2018].

Anon., 2018. HOA Technical Notes – SN3D B-Format [online]. Blue Ripple Sound. Available from: [Accessed 20 Apr 2018].

Anon., 2018. Introduction — Facebook Audio 360 documentation [online]. Available from: [Accessed 26 Mar 2018].

Anon., 2018. Pro Tools | Ultimate Subscriptions and Upgrades – Music Software | Avid [online]. Available from: [Accessed 12 May 2018].

Anon., 2018. REAPER | User Guide [online]. Available from: [Accessed 12 Mar 2018].

Anon., 2018. Theatrical Releases in Dolby Vision and Dolby Atmos [online]. Available from: [Accessed 10 Feb 2018].

Anon., 2018. [online]. Available from: [Accessed 28 Mar 2018].

Osborne-Walker, S., 2018. What is Dolby Atmos? All you need to know [online]. Available from: [Accessed 17 Mar 2018].

Pike, C., 2017. Sounding Special: Doctor Who in Binaural Sound – BBC R&D [online]. Available from: [Accessed 11 Feb 2018].

Robjohns, H., 2001. Surround Sound Explained: Part 3 | [online]. Available from: [Accessed 10 Dec 2017].

Soetendorp, R. and Meletti, B., 2018. Education – CopyrightUser [online]. CopyrightUser. Available from: [Accessed 17 Mar 2018].

Trusted Reviews, 2017. [image] Available from: [Accessed 25 Oct 2017].

Ganz, C., 2008. The 1933 Chicago World’s Fair : A Century of Progress [online]. 1st ed. ebook. Illinois: University of Illinois Press. Available from: [Accessed 9 Jan 2018].

Roginska, A. and Geluso, P., 2018. Immersive Sound : The Art and Science of Binaural and Multi-Channel Audio [online]. 1st ed. ebook. New York: Taylor & Francis Group. Available from: [Accessed 11 Mar 2018].

Potisk, T., 2015. Head-Related Transfer Function [online]. ebook. Ljubljana: University of Ljubljana Faculty of Mathematics and Physics. Available from: [Accessed 14 Mar 2018].

Schnupp, J., Nelken, I. and King, A., 2011. Auditory neuroscience. Cambridge, Mass.: MIT Press. pp.177-189.

van Opstal, J., 2016. The Auditory System and Human Sound-Localization Behavior [online]. 1st ed. ebook. London: Elsevier. Available from: [Accessed 21 Mar 2018].

Wightman, F. and Kistler, D., 1997. Monaural sound localization revisited. The Journal of the Acoustical Society of America [online], 101 (2), 1050-1063. Available from: [Accessed 13 Feb 2018].


