We'll start with the three-dimensional capture issues.

What the heck is B Format anyway?

Imagine a sculpture of a cloud. It has all three dimensional components in evidence: we see depth, width and height. If the sculptor were highly skilled, the sculpture might look like a real cloud sitting on the table. We could probably mistake a really accurate sculpture for a real cloud. The same is true in audio. High-resolution, two-dimensional audio (stereo) can be very good indeed, but it does not come close to describing the real, live event. You can confirm the "three dimension" concept with your own ears. When you hear a natural acoustical event, such as a bird singing in a tree, you can almost instantly locate the source as "in front of you, to the left and up there". If you heard the event in only two dimensions, you might know the bird was singing in front of you to the left, but you would not know whether it was "up there" or "down there". Without the vertical information, you couldn't be sure exactly where the sound was coming from. B Format compares to stereo the way our three-dimensional cloud sculpture compares to a flat, two-dimensional picture of a cloud. Instead of defining the world of sound in just two dimensions, B Format defines it in three. It contains information to define front to back (the first dimension), left to right (the second) and up and down (the third). There is one more concept needed to understand B Format and three-dimensional descriptions of audio events. It's called the "reference". Imagine that you are standing between two trumpets, both being played at the same time. Sound emanates in all three dimensions from each trumpet (front to back, left to right and up and down), but we have two different "points of origin" of sound. The two trumpets are at different locations relative to our own location, which can be thought of as the reference point for these two sounds.
To fully describe the real event of hearing the two trumpets to someone else, we need to describe all three dimensions of the sound of each trumpet. We also need to describe the location of the two trumpets relative to where we were standing (the reference). So we really need to describe four elements: the three dimensional components of the trumpets' sound and the reference point from which the trumpets were heard. Any attempt to define the real acoustical event of standing between two trumpets playing at the same time will fall short of being accurate if it uses fewer than three dimensions or omits a central reference point. The cloud and trumpet analogies give us a good understanding of the technical value of B Format and how it works. It delivers four channels of audio containing the three dimensional components plus a reference. A particularly good scientific description of B Format can be found in Ron Streicher's "The New Stereo Soundbook", which discusses a wide range of microphone methods and recording practices. In section 13.11, Ron says: "Gerzon's process focused on the acoustical components of the SoundField at a particular point in space: the absolute sound pressure [the reference] and the three pressure gradients [the three dimensions of sound] that specify the cardinal directions (left/right, fore/aft, and up/down). By accurately preserving these four components, all information needed to recreate that point in the SoundField could be recorded and later reproduced precisely".
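To make the four components concrete, here is a minimal Python sketch of how a single mono sample arriving from a known direction maps onto W, X, Y and Z. It assumes the traditional first-order B Format convention: W is the pressure signal scaled by 1/√2, X points to the front, Y to the left and Z up, and azimuth is measured counterclockwise from straight ahead. The names `BFormat` and `encode` are hypothetical and are not part of any SoundField product or library.

```python
import math
from typing import NamedTuple


class BFormat(NamedTuple):
    """One sample of traditional first-order B Format."""
    w: float  # omnidirectional pressure: the "reference"
    x: float  # front/back pressure gradient
    y: float  # left/right pressure gradient
    z: float  # up/down pressure gradient


def encode(sample: float, azimuth_deg: float, elevation_deg: float) -> BFormat:
    """Encode a mono sample arriving from one direction.

    Azimuth is measured counterclockwise from straight ahead (90 = left);
    elevation is positive upward. W carries the 1/sqrt(2) weighting
    assumed here for traditional B Format.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return BFormat(
        w=sample / math.sqrt(2.0),
        x=sample * math.cos(az) * math.cos(el),
        y=sample * math.sin(az) * math.cos(el),
        z=sample * math.sin(el),
    )


# The bird example: front-left and "up there" vs. front-left and "down there".
up = encode(1.0, 45.0, 30.0)
down = encode(1.0, 45.0, -30.0)
```

In this model the two birds produce the same W, X and Y values, and only the sign of Z differs. That difference is exactly the up/down information a horizontal-only (two-dimensional) representation throws away.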
More about the B Format "W" Reference and "Point Source"

When you set up a recording microphone on a stand, the location of its capsule within the microphone becomes the "reference" point at which the audio is picked up. As long as only one microphone is picking up audio, you have only one reference point and the microphone is a true single "point source". The moment we set up two microphones, we have two pick-up points and two references (two "W"s), and we no longer have a point source. Whenever a single audio event is picked up at two different locations, you introduce error if the audio from those two locations is ever mixed together. In addition, because there is no central reference, we do not have a realistic image. It is like trying to explain to someone else how the trumpets sounded without referring to your own position and the trumpets' positions relative to it. With two microphones, the sound arrives at each microphone at a different time, so the two signals are not phase coherent with each other. Suppose the sound you are recording is not very compact (like a single voice), or the two microphones are not exactly equidistant from the source. In either case you will have trouble avoiding phase cancellation and building a realistic image. Even with equidistant mike placement, reflections from the environment can cause phase cancellation that reduces realism. A choir or orchestra is a good example of a source that invites error, because many people would think to set up two, three or more microphones to record such a large source. Sound from one "end" of such a large source cannot be prevented from reaching a live microphone at the other "end", and there may be a considerable distance between the microphones. A single mono microphone avoids this problem, but it does not help us much when we want a realistic image. The ideal scenario for stereo imaging would have all the microphones in exactly the same location (i.e. the same "reference point"), which we know is physically impossible.
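The spaced-pair phase problem can be put into rough numbers. The sketch below uses a simplified far-field model. It assumes a plane wave, two ideal omni microphones and a speed of sound of 343 m/s; these are all assumptions, not measurements. The code computes the arrival-time difference for a spaced pair and the frequencies that cancel completely when the two signals are mixed together. The function names are hypothetical.

```python
import math
from typing import List

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C


def arrival_delay(spacing_m: float, angle_deg: float,
                  c: float = SPEED_OF_SOUND_M_S) -> float:
    """Arrival-time difference (seconds) between two spaced microphones.

    Far-field (plane-wave) model: angle_deg is measured from straight ahead
    of the pair, so 0 degrees means the source is equidistant from both.
    """
    path_difference_m = spacing_m * math.sin(math.radians(angle_deg))
    return abs(path_difference_m) / c


def summed_gain(delay_s: float, freq_hz: float) -> float:
    """Level of the two equal signals mixed together, relative to 0 Hz.

    |1 + exp(-j*2*pi*f*tau)| / 2 == |cos(pi*f*tau)|
    """
    return abs(math.cos(math.pi * freq_hz * delay_s))


def notch_frequencies(delay_s: float, max_hz: float) -> List[float]:
    """Frequencies (Hz) up to max_hz that cancel when the pair is summed."""
    if delay_s == 0.0:
        return []  # coincident or equidistant: no comb filtering
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2.0 * delay_s)  # odd multiples of 1 / (2 * tau)
        if f > max_hz:
            break
        notches.append(f)
        k += 1
    return notches


tau = arrival_delay(0.5, 30.0)  # 0.5 m spacing, source 30 degrees off-axis
print(notch_frequencies(tau, 4000.0))
```

Take a 0.5 m spacing and a source 30° off-axis. The path difference is 0.25 m, so the first notch falls near 686 Hz, with further notches at odd multiples (about 2058 Hz and 3430 Hz). A source straight ahead (0°) or a coincident pair (0 m spacing) gives no delay and therefore no notches. That is the motivation for keeping every capsule at a single reference point.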
Perhaps now you can see the value of SoundField point-source technology. Only a SoundField presents a "single point source" multi-capsule microphone that delivers stereo. Single-point recording solves many phase-related and imaging-related problems in audio, because it has a single reference. In stereo mode, a SoundField provides two "virtual" microphones placed at exactly the same location. The same is true in mid/side and in mono recording. In multi-channel work, the B Format components X, Y, Z and W define just about everything we need to know about an acoustical event. Once we have this information, we can use the B Format signals to define outputs for any number of points on the three-dimensional B Format sphere. This means we can derive any kind of direction-related information to yield any kind of multi-channel surround with precise localization.

Other B Format decoding

One example of surround with vertical localization is a B Format to 10.1 system designed by Thomas Chen, which we saw at AES in San Francisco in 1998. Basically it is an "upper" 5-channel and "lower" 5-channel system. It offers horizontal localization similar to traditional surround systems but adds vertical components for precise up/down localization. Thomas Chen has written a few pages about this system, which will be posted on this site in the near future. He will present this information on B Format/16.1 at AES 99 in NYC. Soon we will also post a Third Party Development page on this site for other ideas and information on multi-channel decoding and consumer formats derived from B Format.

Further Reading: www.ambisonic.net
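A single coincident recording can yield stereo, mid/side, mono and surround feeds by pulling "virtual" microphones out of it. That idea can be sketched with the standard first-order virtual-microphone formula. The Python below assumes traditional first-order B Format: W carries the pressure signal scaled by 1/√2, with X pointing front, Y left and Z up. A pattern parameter runs from 1.0 (omni) through 0.5 (cardioid) to 0.0 (figure-8). The helper names, the ±45° stereo angle and the ±30° elevations are illustrative choices. They are not SoundField's own processing and not the design of the 10.1 system described above.

```python
import math
from typing import NamedTuple, Tuple


class BFormat(NamedTuple):
    """One sample of traditional first-order B Format."""
    w: float  # pressure (omni), scaled by 1/sqrt(2)
    x: float  # front/back gradient
    y: float  # left/right gradient
    z: float  # up/down gradient


def _unit(azimuth_deg: float, elevation_deg: float) -> Tuple[float, float, float]:
    """Unit vector for a direction: X front, Y left, Z up."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el))


def encode(sample: float, azimuth_deg: float, elevation_deg: float) -> BFormat:
    """Encode a mono sample from one direction (used here to test decoding)."""
    ux, uy, uz = _unit(azimuth_deg, elevation_deg)
    return BFormat(sample / math.sqrt(2.0), sample * ux, sample * uy, sample * uz)


def virtual_mic(b: BFormat, azimuth_deg: float, elevation_deg: float = 0.0,
                pattern: float = 0.5) -> float:
    """First-order virtual microphone aimed at (azimuth, elevation).

    pattern = 1.0 gives omni, 0.5 cardioid, 0.0 figure-8.
    """
    dx, dy, dz = _unit(azimuth_deg, elevation_deg)
    directional = b.x * dx + b.y * dy + b.z * dz
    return pattern * math.sqrt(2.0) * b.w + (1.0 - pattern) * directional


def stereo_pair(b: BFormat, half_angle_deg: float = 45.0) -> Tuple[float, float]:
    """Left/right cardioids derived from one point: a coincident pair."""
    return virtual_mic(b, +half_angle_deg), virtual_mic(b, -half_angle_deg)


def mid_side(b: BFormat) -> Tuple[float, float]:
    """Forward-facing cardioid as mid, left-facing figure-8 as side."""
    return virtual_mic(b, 0.0), virtual_mic(b, 90.0, pattern=0.0)


# A source hard left, and a source in front and well above the listener.
left_source = encode(1.0, 90.0, 0.0)
above_source = encode(1.0, 0.0, 60.0)
print(stereo_pair(left_source), mid_side(left_source))
print(virtual_mic(above_source, 0.0, 30.0), virtual_mic(above_source, 0.0, -30.0))
```

Every output here is computed from the same four signals recorded at one point, so the virtual microphones share a single reference. Aiming a virtual microphone above or below the horizontal (`elevation_deg`) is one way a decoder feeding upper and lower loudspeaker rings could separate sounds vertically.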