Audio Global Clari-fi's Enhancement for Headsets
Another Rant by O. Gadfly Hertz, Jr. (perceptual gadabout)
A few years ago we were challenged by our peers to join a panel at AES 135 on the subject of "listening fatigue". There we joined various emissaries of the audio industry in an attempt to generate a standard model of listening fatigue, define it and design scientific methodologies to alleviate it for the purposes of improving Quality of Experience (QoE) as measured by Time Spent Listening (TSL).
We brought to bear research on different types of fatigue scenarios, what their causes were and what would have to happen to the compressed content after decoding to reduce fatigue. What happened during and after our lecture on the panel is the subject of this white paper.
The "Big Four" Problems that Cause Listening Fatigue
Our research set out to define real-world, implementable algorithm stacks that serve as predictors, modeling the characteristics that best explain what causes fatigue for the naive listener in a real-world scenario. While there was no "meta-theory" or perfect answer for quantifying listening fatigue per se, the research and methodology we presented at AES 135 pointed to the four most glaring shortcomings of compressed audio. They fall into the following categories:
I) Spectrum sparsity in music and damaged hearing (which everybody has)
Kinocilia Damage: sensory/motile cells within the ear's Organ of Corti are missing/damaged/dead along several frequency places within the cochlea
Time Varying Spectrum Zeros in the L-R parse of the stereo multiplex of compressed music
Quantization Noise (a natural part of the compression process) that degrades instantaneous SNR and worsens as bit rates drop
II) Narrowed image perception
Codecs narrow image width as a function of bitrate (lower bitrate = narrower width)
Age-related experience (through aural cortex plasticity) reduces our expectation of lateral-axis stimulus; the result is a reduced perceived width of music.
Narrower expected/perceived image increases inter-parse masking, degrading separation of mix elements
III) Mask/Artifact decoupling
Artifacts become more noticeable as a result of spatial decoupling during audition through less than optimal stereo presentation
Artifacts are concentrated in the surround channels during upmixing/rendering
IV) Over Equalization in response to failed expectation
Over Equalization exacerbates downward adaptation of the human hearing system. This natural, adaptive equalization feature of human hearing alters the gain of spectrum groups. Because it is an active musculature process, the motile kinocilia will eventually tire of (unmet expectation driven) perceptual over-compensation and fatigue to failure.
6 dB of equalization (very typical) quadruples the power and headroom demanded of both the amplifier and the transducers, leading to excessive DC (long-term clipping) and possible premature driver failure due to cyclic thermal stress.
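The "quadruples" figure follows directly from the definition of the decibel. As a quick sanity check (a minimal sketch; the function names are ours, purely for illustration), a boost in dB converts to a power ratio of 10^(dB/10) and a voltage ratio of 10^(dB/20):

```python
def power_ratio_from_db(gain_db: float) -> float:
    """Power ratio corresponding to a gain expressed in decibels."""
    return 10 ** (gain_db / 10)


def voltage_ratio_from_db(gain_db: float) -> float:
    """Voltage (amplitude) ratio corresponding to a gain in decibels."""
    return 10 ** (gain_db / 20)


if __name__ == "__main__":
    # Illustrative boost values only; 6 dB is the case discussed in the text.
    for boost_db in (3, 6, 12):
        print(
            f"{boost_db:>2} dB boost: "
            f"{power_ratio_from_db(boost_db):.2f}x power, "
            f"{voltage_ratio_from_db(boost_db):.2f}x voltage"
        )
```

A 6 dB boost works out to very nearly four times the power (and twice the voltage swing), which is where the headroom goes.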
There are several problems to consider, let alone address. Applying a best possible fit for continuous correction of these shortcomings (not of waveforms or the original spectrum) could theoretically improve the user's QoE dramatically, reduce battery drain (due to reduced power demand) and save/enhance the hearing experience for all users.
A Small Eureka
So the question was asked: "What if we overcorrected the best possible fit as determined by the four predictors?" Would this result in an even greater level of enhancement to the listening experience? Would it increase the TSL of compressed musical content?
Changing Fatigue to Intrigue
Over the next few months our team took the predictors we had designed for measuring human response to the shortcomings of compressed audio and configured them into a tunable, real-time method for parametrically modifying stereo content: measuring the compressed audio and perceptually shaping it toward what the listener expects to hear.
Our predictors are so good, in fact, that applying over-unity best possible fit (BPF) compensation to compressed audio results in even more improvement in QoE as represented by TSL.
An analytical version of the technology allows us to analyze the "e-sense" (what the individual model of wearable excels at) of various popular wearables, intra/supra aural headphones and earbuds/earpods and apply our powerful psychoacoustic predictors in a way that augments the perceptual coupling of that unique wearable and the wearer. The result is a listening experience that is far closer (by a large margin) to the expectations of the listener than just the wearables and available content alone.
This contextual/perceptual approach to applying these audio modifiers will in no way correct or make whole what was lost or distorted. To say otherwise is ridiculous.
The Real World
At first it isn't obvious. The idea of altering content for any use just seems to go against everything we believe in, but the truth is that it already happens in several places, and how the content is altered there is far more egregious than anything we could ever conceive of in the control room. Here's a short list of what typically affects the perfect audio that artists and engineers worked so hard to produce:
How compressed the song is
Technology origin bias (caused by era and genre of the song)
The playback medium (headphones, automobile, dock, soundbar, home)
Background environment contamination
The condition of the listener's hearing
The listener's expectations
Think about it. The song is compressed. But what if it's really compressed? What if the recording technology of the song's era was less than stellar? What if the playback environment is different from what the song was mixed and mastered on? What if your listening environment is mobile and affected by local noise contamination that isn't stationary? What if the listener has damaged hearing? What if the listener just isn't hearing what he/she wants?
The scientists at Psyx Research and Audio Global LLC dba MindMagic®Audio realized that there has to be psychoacoustic analysis performed on the user with their wearable of choice (this analytical pass must be incorporated into the music player's processing predictors for each unique brand and model of wearable). Without the listener and environment in the psychoacoustic analysis pass, it would be pointless to include any processing to the music in the signal path because it would be without purpose...just different.
MindMagic®Audio allows the listener to audition content with their wearable environment. Engaging and disengaging MindMagic®Audio trains the user's expectation while allowing them to experience the drastic difference between the raw and augmented experience. At the end of selection the user knows the best choice has been made by them personally with their wearable of choice.
MindMagic®Audio modifies the following parameters:
Broadband Spectrum Extension
Front Image Management
Back Image Management
Low Frequency Baseline
Low Frequency Modulation
Low Frequency Persistence
Mid Frequency Baseline
Mid Frequency Modulation
Mid Frequency Persistence
High Frequency Baseline
High Frequency Modulation
High Frequency Persistence
These parameters act on different domains of the content and wearable to bring the highest QoE of the unique song with its unique compression to the unique user who is listening to it through their unique playback apparatus under unique circumstances.
The powerful predictors of MindMagic®Audio are made even more powerful by including the listener in the quality of experience determination. The "Big Four Problems" have been addressed effectively and now solutions for every single instance of the problem are engaged.
The Big Four Solutions
Recognizing the problem, designing the solution basis and including the listener "in the loop" results in the following QoE corrective processes:
1. Kinocilic Recruitment
This is a method of parametrically creating shaped double-sideband modulation spectrum to increase frequency density. The kinocilia (hair cells) within the cochlea neurally group together, filling in spectrum gaps along the Organ of Corti.
Benefit: everybody's hearing improves
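To see why double-sideband modulation increases frequency density, consider the textbook case. The sketch below is an illustration under loud assumptions, not our shaped, parametric process: it applies plain amplitude modulation to a single tone and counts the occupied DFT bins before and after. The frame size, bin numbers and modulation depth are arbitrary illustrative values.

```python
import cmath
import math

N = 64            # samples per analysis frame (illustrative value)
CARRIER_BIN = 16  # the existing spectral component (illustrative value)
MOD_BIN = 2       # the low-rate modulator (illustrative value)
DEPTH = 0.5       # modulation depth (illustrative value)


def dft_magnitudes(samples):
    """Naive DFT; returns normalized magnitudes for bins 0 .. N/2 - 1."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        mags.append(abs(acc) / n)
    return mags


def occupied_bins(samples, floor=1e-6):
    """Indices of the DFT bins whose magnitude exceeds a small floor."""
    return [k for k, mag in enumerate(dft_magnitudes(samples)) if mag > floor]


carrier = [math.cos(2 * math.pi * CARRIER_BIN * i / N) for i in range(N)]
modulator = [math.cos(2 * math.pi * MOD_BIN * i / N) for i in range(N)]

# Plain double-sideband AM: the carrier is kept and a sideband appears
# on each side of it, spaced by the modulator frequency.
dsb = [c * (1 + DEPTH * m) for c, m in zip(carrier, modulator)]

if __name__ == "__main__":
    print("occupied bins before:", occupied_bins(carrier))
    print("occupied bins after: ", occupied_bins(dsb))
```

One spectral component becomes three: the original plus a sideband on either side. How those sidebands are shaped and placed is where the parametric work lies, and this sketch makes no attempt to model it.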
2. Adaptive width recovery/management
Codecs narrow image width. Age-driven learned response reduces the aural cortex's expectation of lateral stimulus, reducing perceived width. Adaptive width recovery (AWR) fixes the problem by widening the content just enough to plastically restore perceived width.
Benefit: Larger image area reduces inter-parse masking, improves separation of mix elements
3. The Auxiliary mask
Perceptual artifact reduction in the L-R parse of the stereo multiplex through an adaptive width synthetic aperture in the L+R parse of the stereo multiplex. Artifacts that become more noticeable as a result of the adaptive width recovery are effectively masked, allowing for better control of the soundstage.
Benefit: Artifact free stereo (when used with compressed source), Fuller center image, no artifacts in surrounds when used with up-mixing systems.
4. Adaptive, derivative based equalization
Equalization that drastically reduces/eliminates downward adaptation of the human hearing system. This adaptive equalization is applied as a function of the inverse derivative of the transient, not by simply boosting or cutting frequencies, which is what leads to natural adaptation by our hearing system.
Benefit: Great for extending bass and brightness. Saves power, reduces transducer temp, saves/improves hearing and wins the war for the Allies.
Our Worthy Competition
So, you see...it's kind of difficult for us to sit back and watch our competitors take wild swings at the very processes we pioneered while making impossible claims and otherwise cheapening the market we created.
The information we freely disseminated at AES 135 has become guidance for our competitors. Some things they've got kind of right, and the rest...well they just flat missed the mark entirely. Here's an example of what they think we're doing. Only the names of the manufacturers have been hidden...the rest of it is verbatim. We'll address their widely distributed claims one at a time.
"Rebuilds Lost Details"
Now hold on. Unless they're using deep machine learning (we're sure they're not...unless they've got an exaFLOP of processing power lying around) they can't effectively predict which details have been lost, much less "rebuild" them. We're still scratching our heads on that one.
"Cl****** intelligently corrects waveform deficiencies"
Now one thing we can't have is deficient waveforms. What is a deficient waveform anyway? Could someone explain to us what constitutes a waveform that is deficient? What does it look like? What does it do?
I think we can all agree that we cannot restore what we don't understand to be missing. The term "High Fidelity" refers to the exact replication of the original source. Without knowing the pedigree of the source, restoring might just be guessing. We would love to see their prediction models.
"Recaptures the missing highs and lows"
We didn't know they got away...lol. "Recapturing" is right in there with "Restoring". Perceptual compression doesn't cause low frequencies to escape. It does, however, trade off high frequency bandwidth for less compression in the lower octaves...which is a good trade-off. High frequency synthesis is effective in the hands of an expert, but it may not be effective without an understanding of the original artistic intent of the piece of compressed music. In fact, it may create more annoyance.
"Restores the vocals to their natural tone"
Again, when is tone natural?? What is meant by tone?? Is it the intent of this product to pass judgment on what is or is "not natural," followed by the infliction of some effect on the already damaged content?...not cool.
"Returns to a true stereo sound"
At what point does stereo turn from a false sound into a true sound? Seriously, are we going to have this argument again? We need to stop "correcting" things we have no measure for (stereo width is defined in dB as the difference/sum power ratio of a stereo multiplex...that has a reference)
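For the record, that measure is easy to compute. A minimal sketch, assuming equal-length lists of float samples for the left and right channels (the function name and the handling of the edge cases are ours, for illustration only):

```python
import math


def stereo_width_db(left, right):
    """Difference/sum power ratio of a stereo pair, in dB.

    0 dB means equal power in the difference (L-R) and sum (L+R) parses;
    more negative values mean a narrower image.
    """
    if len(left) != len(right) or not left:
        raise ValueError("channels must be non-empty and of equal length")

    sum_power = sum((l + r) ** 2 for l, r in zip(left, right))
    diff_power = sum((l - r) ** 2 for l, r in zip(left, right))

    if sum_power == 0 and diff_power == 0:
        return math.nan    # digital silence: width is undefined
    if diff_power == 0:
        return -math.inf   # pure mono: nothing in the difference parse
    if sum_power == 0:
        return math.inf    # fully out of phase: nothing in the sum parse
    return 10 * math.log10(diff_power / sum_power)


if __name__ == "__main__":
    # Illustrative test signal: a short tone with varying inter-channel level.
    tone = [math.sin(2 * math.pi * 5 * i / 100) for i in range(100)]
    for right_gain in (1.0, 0.5, 0.0):
        right = [right_gain * x for x in tone]
        print(f"right gain {right_gain}: {stereo_width_db(tone, right):.2f} dB")
```

Any claim to "return" a stereo image somewhere can be checked against a number like this one, measured before and after processing.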
"Removes unwanted distortions and artifacts"
You CAN'T remove artifacts (or distortion for that matter). You CAN, however, mask artifacts through image re-parsing...which makes sure that there is uncorrupted masker spectrum in the same image sector and in sufficient temporal alignment with the offending artifacts.
We celebrate great ideas and their effective execution from anywhere, even if it is our competition. There is nothing wrong with losing out to a superior technology. This, however, is not the case. Here we see a shoddily prepared, rushed to market application that doesn't even come close to matching the hyperbole with which it is promoted.
"First to market at any cost" is an antiquated, tired philosophy that has no place in the fast paced and highly competitive app market we live in.
The source of this product and the promotional "documentary" (which is as flawed as the product itself) should rethink their strategy and get in sync with the rest of us.
Bullshit is destroying the reputation and efficacy of this business.
Preservation, dissemination and protection of music and audio art is important... important enough for tiny companies like us to stand up and call out the giants when they do something stupid. It costs everybody real money and real time to recover from the market being flooded with yet another partly baked idea. The giants may be able to afford it, but the result is a dumbed down audience with 50,000 downloaded songs they didn't pay for and never listen to.
Whatever happened to goose bumps and actual emotional impact (remember that?); music being used for something other than noise masking or wallpaper?
What the musician, engineer and producer do.
What the world of music appreciating consumers do.
We're doing everything we can to assure ever-improving quality of experience. We want to see this industry restored to its original integrity. Why don't they??
We work very hard to improve or solve seemingly impossible scenarios in content capture, editing, distribution and consumer productization. Too hard, in fact, to stand silently by while someone (even a very large company) makes a mockery of the science of perception and its uses in improving the state of the art of audio in general.
Responses are welcome...