The essential component of the field recording process is the microphone. The choice of recording equipment inevitably affects the recorded audio and, ultimately, the listener’s perception of the soundscape. This study compares a few common stereo field recording techniques in terms of tonal quality, stereo image, and sense of space/envelopment.
This study adopts a two-stage procedure to gather the acoustic data needed to address the research objectives. The first stage identified candidate equipment (recorders and microphones), selected a location, and collected the sound recordings. The second stage analysed these recordings by pinpointing differences in tonal quality, stereo image, and sense of envelopment.
Equipment
The first step of this study is to determine which types of recording equipment the research will focus on. Various microphone configurations are used in the field recording community, including XY, omni pairs, cardioid pairs, and Ambisonics.
-XY
XY microphones appear on various handheld recorders, including popular models such as the Zoom H5 and Sony PCM-D100, and Tascam’s latest release, the Portacapture X8. The X8 is one of the first handheld recorders on the market to offer 32-bit audio recording with a built-in microphone, and it has a lower noise floor than its competitors. For these reasons, the Tascam Portacapture X8 is chosen to represent the XY configuration.
-Omni
The omni polar pattern is one of the most popular choices among field recordists, providing a transparent and wide audio image that resembles the real world. The most sought-after model on the market, the LOM Uši Pro, is the go-to omni microphone for the field recording community because of its exceptionally low noise and high sensitivity, but its low availability makes it extremely difficult to obtain. However, microphones from other independent manufacturers, including the Clippy EM272 made by Micbooster, use the same Primo EM272 electret condenser capsule found in the LOM Uši Pro and are easy to purchase, while providing the same low noise and high sensitivity. The Micbooster Clippy EM272 is chosen to represent the omni polar pattern in this study.
-Cardioid Pair
The cardioid microphone is the most popular choice in the music recording industry, but in field recording it may not be everyone’s go-to option. Several stereo techniques can be used when placing the microphones, including NOS, ORTF, and DIN. The NOS technique places two cardioid microphones 30 cm apart, angled at 90 degrees from one another, and produces a realistic stereo effect. It is chosen for this research because, in terms of stereo image, it sits in the middle between XY and omni.
As for the microphone, Haun is not a well-known manufacturer, but it is often praised for offering a tonal quality similar to that of much more expensive counterparts such as Schoeps, at an affordable price point. The Haun MBC660 is chosen to represent the cardioid pair.
-Ambisonics
In addition to the horizontal plane, the full-sphere surround sound format known as Ambisonics also includes sound sources above and below the listener. Recording engineers, sound engineers, and composers are becoming more interested in Ambisonics as sophisticated digital signal processing becomes more widely available. However, the price of ambisonic recording equipment can be a barrier. Some microphones, such as the Rode NT-SF1, cost up to USD$1000 and require an external 4-channel recorder.
The most approachable solution is the Zoom H3-VR recorder, which has a built-in ambisonic microphone. Its ease of use and low price make it the most popular ambisonic recording equipment. The downside of the H3-VR is that its noise floor is a little higher than expected, and its gain setting is not flexible enough, so the microphone can easily clip in a loud environment. Nonetheless, its price of only USD$200 makes it the most suitable equipment for this study.
(The AmbiX B-format ambisonic recording will be converted into two stereo recordings, emulating omni and cardioid virtual microphones.)
Location
To test the similarities and differences between the microphones in depth, a place with rich and varied sound is required. An urban location with multiple modes of transport and a large amount of human activity generating man-made noise fulfils these requirements.
The chosen recording location is a crossroads outside a housing estate named Grandeur Terrace, on the north side of Tin Shui Wai, where the Light Rail meets road traffic, a fresh food stand, and a pedestrian crossing. It produces a very rich soundscape with an excellent mix of elements, making it a fantastic place to record samples.
(Tin Shui Wai is located in the north-west of Hong Kong, housing more than 290,000 residents with a population density of 66,000/km². It has a unique transportation system called Light Rail, which runs through the whole of Tin Shui Wai.)
(GPS location: 22.468895, 114.000079)
Setup
To reduce the variance caused by distance differences, the recorders and microphones are mounted on the same tripod in close proximity facing the same direction, without interfering with each other.
-The Omni pair is spaced 30cm apart.
-The cardioid pair is positioned in NOS stereo technique (spaced 30 cm apart and angled at 90°).
-The built-in XY and ambisonic microphones record directly into their respective recorders.
-The Haun MBC660 and Micbooster Clippy EM272 are connected to the external XLR inputs of the Tascam Portacapture X8.
-The Tascam Portacapture X8 recorded all tracks in WAV 32-bit/96 kHz format.
-The Zoom H3-VR recorded in WAV 24-bit/96 kHz format.
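The 30 cm spacing and 90° angle listed above determine how much time and level difference each pair can produce between its two channels. The following is a minimal sketch of that geometry, not a measurement from this study. It assumes a distant source (plane wave), a speed of sound of 343 m/s, and an ideal cardioid pattern of 0.5 + 0.5·cos(θ); all function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 °C


def time_difference_ms(spacing_m, source_angle_deg):
    """Arrival-time difference between two spaced capsules, in milliseconds.

    source_angle_deg is measured from straight ahead (0) towards the right (+90).
    Assumes the source is far enough away to be treated as a plane wave.
    """
    path_difference = spacing_m * math.sin(math.radians(source_angle_deg))
    return 1000.0 * path_difference / SPEED_OF_SOUND


def cardioid_gain(off_axis_deg):
    """Ideal cardioid sensitivity: 1.0 on-axis, 0.0 directly behind."""
    return 0.5 + 0.5 * math.cos(math.radians(off_axis_deg))


def nos_level_difference_db(source_angle_deg, pair_angle_deg=90.0):
    """Level difference (right minus left, in dB) for an ideal cardioid pair."""
    half = pair_angle_deg / 2.0
    right = cardioid_gain(source_angle_deg - half)  # right capsule aims at +half
    left = cardioid_gain(source_angle_deg + half)   # left capsule aims at -half
    # A tiny floor avoids log(0) when a source sits exactly in a capsule's null.
    floor = 1e-12
    return 20.0 * math.log10(max(right, floor) / max(left, floor))
```

Under these assumptions, `time_difference_ms(0.30, 90.0)` gives about 0.87 ms for a source fully to one side, and `nos_level_difference_db(90.0)` gives roughly 15 dB. An ideal omni pair at the same spacing would show only the time difference, with no level difference for a distant source.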
Post Processing
A field recording session was conducted in August 2022 at the selected location. Four 15-minute tracks were recorded simultaneously.
The sound recordings were subsequently processed in Avid Pro Tools. The same processing chain was applied to each track, and the tracks were gain-matched to around -28 LUFS.
-Low shelf EQ at 120 Hz to reduce the unwanted low end.
-Limiter at -0.5 dB to prevent clipping after gain matching.
-Tracks are exported in WAV 24-bit/96 kHz format.
-All recordings are complete, uncut and unretouched.
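The gain-matching and limiter steps above come down to simple decibel arithmetic. The sketch below illustrates it and is not the Pro Tools workflow itself. It assumes the integrated loudness and peak level of each track have already been measured with a loudness meter. The function names and the example figures in the usage note are hypothetical.

```python
def db_to_linear(db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def matching_gain_db(measured_lufs, target_lufs=-28.0):
    """Gain (in dB) that moves a track from its measured loudness to the target."""
    return target_lufs - measured_lufs


def needs_limiting(peak_dbfs, gain_db, ceiling_dbfs=-0.5):
    """True if the track's peak would exceed the limiter ceiling after the gain."""
    return peak_dbfs + gain_db > ceiling_dbfs
```

For example, a hypothetical track measured at -35.2 LUFS needs `matching_gain_db(-35.2)`, that is +7.2 dB. If its peak sat at -6 dBFS before the gain, `needs_limiting(-6.0, 7.2)` returns `True`, so the limiter at -0.5 dB would engage on that peak.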
H3-VR only
-The AmbiX B-format recording is converted to stereo tracks using the Soundfield by RØDE plugin.
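The conversion above can be illustrated with a first-order virtual microphone. This is a minimal sketch and not a description of what the Soundfield by RØDE plugin does internally. It assumes the audio is already in first-order AmbiX (channel order W, Y, Z, X with SN3D normalisation, azimuth positive to the left). The function names and the `pattern` parameter are hypothetical.

```python
import math


def encode_plane_wave(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one sample arriving from a direction into an AmbiX frame (W, Y, Z, X)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    x = sample * math.cos(az) * math.cos(el)
    return (w, y, z, x)


def virtual_mic(frame, azimuth_deg, elevation_deg=0.0, pattern=0.5):
    """Point a virtual first-order microphone at a direction.

    pattern = 1.0 is omni, 0.5 is cardioid, 0.0 is figure-of-eight.
    """
    w, y, z, x = frame
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    directional = (x * math.cos(az) * math.cos(el)
                   + y * math.sin(az) * math.cos(el)
                   + z * math.sin(el))
    return pattern * w + (1.0 - pattern) * directional


def decode_stereo(frames, half_angle_deg=45.0, pattern=0.5):
    """Decode AmbiX frames to a coincident virtual pair aimed at +/- half_angle."""
    left = [virtual_mic(f, +half_angle_deg, 0.0, pattern) for f in frames]
    right = [virtual_mic(f, -half_angle_deg, 0.0, pattern) for f in frames]
    return left, right
```

With `pattern=0.5` and `half_angle_deg=45.0`, this gives a coincident cardioid pair angled at 90°. A pure first-order omni (`pattern=1.0`) ignores the aiming direction, so both channels come out identical here; the sketch therefore does not model how the plugin produces its omni stereo output.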
Recordings
Tascam Portacapture X8 – XY
Micbooster Clippy EM272 – Omni
Haun MBC660 – Cardioid
Zoom H3-VR – Ambisonics converted to Omni
Zoom H3-VR – Ambisonics converted to Cardioid
Comparison and Discussion
The recordings are compared against the following criteria: tonal quality, stereo image, and sense of space/envelopment.
Tonal Quality
-XY
The X8’s built-in XY microphone sounds the dullest of the five examples in terms of frequency response. It lacks crispness in the 5-10 kHz range, resulting in a less than ideal representation of an open space. Several elements, such as a passing car or the Light Rail, can sound muddy, as if captured in an enclosed place with a lot of low-frequency build-up. This issue may be mitigated by boosting the frequency range from 5 kHz to 10 kHz.
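A boost of this kind is commonly implemented as a peaking (bell) equaliser. The sketch below uses the peaking-filter formulas from the widely used "Audio EQ Cookbook" by Robert Bristow-Johnson. The center frequency (about 7071 Hz, the geometric mean of 5 kHz and 10 kHz), the Q of 1.41 (roughly one octave wide) and the +4 dB gain are illustrative assumptions, not settings used in this study.

```python
import cmath
import math


def peaking_eq_coefficients(sample_rate, center_hz, gain_db, q):
    """Biquad coefficients (b, a) for a peaking EQ, normalised so a[0] == 1."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp]
    return [c / a[0] for c in b], [c / a[0] for c in a]


def magnitude_db(b, a, freq_hz, sample_rate):
    """Magnitude response of the biquad at one frequency, in dB."""
    z_inv = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)
    numerator = b[0] + b[1] * z_inv + b[2] * z_inv * z_inv
    denominator = a[0] + a[1] * z_inv + a[2] * z_inv * z_inv
    return 20.0 * math.log10(abs(numerator / denominator))


def apply_biquad(samples, b, a):
    """Filter a list of samples with the biquad (direct form I)."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

With `b, a = peaking_eq_coefficients(96000, 7071.0, 4.0, 1.41)`, the response at the center frequency equals the requested 4 dB, while frequencies far below the band, such as 100 Hz, are left essentially untouched.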
-Omni
The Clippy EM272 recording has the crispest sound of the five. In contrast to the X8’s built-in XY microphone, it is particularly rich in the 5-10 kHz range, which allows certain elements, such as the pedestrian traffic signal, to become the dominant sound. At 06:39 in the omni recording, the pedestrian traffic signal sits on top of the passing Light Rail, whereas in the XY recording it is buried by the passing Light Rail.
-Cardioid Pair
The cardioid pair has the most balanced tone of the five recordings, neither overly dull nor overly sharp in terms of frequency response. It accurately reproduces how things sound in real life: at 13:10 in the recording, the human voice sounds very natural. The response slopes gently downward above 10 kHz, yet there is still a good amount of content above 20 kHz, which is inaudible but beneficial to sound design work involving pitch shifting.
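The benefit for pitch shifting can be shown with a short calculation. The sketch below assumes a simple pitch shift that scales every frequency by the same ratio (as in varispeed or resampling), and that the capsule really does capture usable ultrasonic content. The function names are hypothetical.

```python
def shifted_frequency(freq_hz, semitones):
    """Frequency after shifting by a number of semitones (negative = down)."""
    return freq_hz * 2.0 ** (semitones / 12.0)


def highest_content_after_shift(sample_rate, semitones):
    """Where the Nyquist frequency of a recording lands after the shift."""
    nyquist = sample_rate / 2.0
    return shifted_frequency(nyquist, semitones)
```

Shifting down one octave (`semitones=-12`) moves content recorded at 40 kHz to 20 kHz, the top of the audible range. A 96 kHz recording with content up to its 48 kHz Nyquist limit still reaches 24 kHz after that shift, whereas a 48 kHz recording would stop at 12 kHz and sound noticeably dull.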
-Ambisonics (Converted to Stereo)
Both virtual microphone recordings converted from Ambisonics are very sharp around 10 kHz, and this is especially audible in the omni version. This characteristic can become annoying and cause listening fatigue over long periods. At 04:00 in the recording, the high-frequency sound of a bike braking is emphasized by the microphone’s excessive high-end response.
Stereo Image & Sense of Envelopment
-XY
The XY setup has the narrowest, flattest stereo image of all the recordings. It has a two-dimensional feel, and the sense of envelopment is very weak. It sounds like a mono recording, yet there is a large amount of separation between the left and right channels: content appears on one side only, with no coherence with the other channel. Listeners may not be able to feel the depth and sense of space of the environment.
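Coherence between the two channels, judged by ear above, can also be expressed as a number: the correlation coefficient between left and right. The sketch below is a hypothetical illustration and was not part of this study's procedure. Values near +1 indicate near-mono content, and values near 0 indicate channels with little in common.

```python
import math


def channel_correlation(left, right):
    """Pearson correlation between two channels given as lists of samples."""
    n = min(len(left), len(right))
    if n == 0:
        raise ValueError("channels must contain at least one sample")
    mean_l = sum(left[:n]) / n
    mean_r = sum(right[:n]) / n
    cov = sum((l - mean_l) * (r - mean_r) for l, r in zip(left[:n], right[:n]))
    var_l = sum((l - mean_l) ** 2 for l in left[:n])
    var_r = sum((r - mean_r) ** 2 for r in right[:n])
    if var_l == 0.0 or var_r == 0.0:
        return 0.0  # a silent or constant channel has no defined correlation
    return cov / math.sqrt(var_l * var_r)
```

In practice the coefficient would be computed over short windows of each recording, so that passages where content sits on one side only can be told apart from passages that are effectively mono.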
-Omni
The omni setup has the best sense of envelopment; it successfully reproduces a three-dimensional sensation on a normal stereo playback system. However, because of the nature of the omni polar pattern, both microphones receive similar content, and the directional information of a sound source is lost in the recording. For example, the pedestrian traffic signal, which is positioned to the right of the recorders, appears in the center of the omni recording. A Stereo Microphone Array/Parallel Boundary Array could be placed between the two omni microphones to resolve this issue and produce a realistic stereo image.
Reference: https://www.trackseventeen.com/mic_rigs.html
-Cardioid Pair
The NOS cardioid setup has the most natural stereo image, offering a good amount of directional information while maintaining coherence between the two sides. It delivers a reasonable level of envelopment and a natural feeling for listeners. Because it picks up little sound from behind the microphones, certain sounds, for instance the Light Rail passing behind at 06:38 in the recording, fade more quickly than they do in the real world and do not reflect the actual scene.
-Ambisonics to Stereo
The major advantage of the ambisonic recording is the flexibility to alter the stereo width and microphone polar pattern to suit the circumstances. The AmbiX-to-omni example shows that it can generate a stereo recording with an even wider stereo image than its real-world equivalent, while the AmbiX-to-cardioid example produces a close emulation of a genuine cardioid microphone. Requiring only a few minor tweaks, this flexibility can be quite useful when equipment is limited.
Conclusion
While the emergence and formation of the auditory environment, whether urban or natural, is beyond our control, how a soundscape is captured and reproduced depends on the recordist’s judgement. There is no single approach or piece of equipment that is absolutely right or wrong for a field recording session. It is crucial to concentrate on the recording subject and the range of sounds to be collected; with the appropriate decisions, an evocative and compelling audio recording can be made.