Double blind tests to prove a difference between two 'sounds' are often seen by people as
a way to prove me and my colleagues wrong.
Let me give you my take on it.
Let me first define double blind.
A test where the subject is unaware of what he or she is judging is called a blind test.
This is a risky method, for the tester, when in sight, might give away what is played
through non-verbal or other cues.
Therefore the double blind test has been designed where both the subject and the tester are
unaware of what stimulus is offered to the subject.
These test principles are seen as good practice in many fields of science and research although
exceptions do occur.
There are cases where it is considered impractical or unethical to apply, like having children
grow up while being deprived of any physical contact with their parents or caretakers.
Although such an experiment was done a few centuries ago, supervised by a French king if I
remember well, today - fortunately - it is considered unethical and a violation of human rights.
Double blind tests are often performed on a larger group of subjects and the outcome
is statistically analysed.
To keep the outcome conclusive, only one single identifiable item is tested at a time, across
a larger test panel.
If it is a medicine against a medical condition, one half of the test panel is given the real
medicine and the other half a placebo that looks the same but doesn't contain that
- or any - medicine.
In the end the result in both groups is evaluated and if the difference is statistically in
favour of the group that did get the medicine, it is considered to be effective.
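To make "statistically in favour" a bit more concrete, here is a minimal Python sketch of one common way to compare two such groups: a pooled two-proportion z-test. The function name and the patient numbers are made up for illustration, and a real trial would use whatever analysis was planned beforehand, so treat this as a sketch under those assumptions and not as the method any particular study used.

```python
import math

def two_proportion_p_value(improved_a, n_a, improved_b, n_b):
    """One-sided p-value that group A's improvement rate exceeds group B's.

    Uses the pooled two-proportion z-test (normal approximation),
    which is only reasonable for fairly large groups.
    """
    p_a = improved_a / n_a
    p_b = improved_b / n_b
    pooled = (improved_a + improved_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence either way
    z = (p_a - p_b) / se
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical numbers: 60 of 100 improved on the medicine, 45 of 100 on the placebo.
p = two_proportion_p_value(60, 100, 45, 100)
print(f"one-sided p-value: {p:.4f}")
```

A small p-value says that a difference this large would rarely show up if the medicine did nothing; where to draw the line is a choice made before the test starts.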
Things would become a lot more difficult if patients had several diseases and were undergoing
several treatments at the same time.
But it still can be done since the result is reasonably objectively measurable.
It just takes a lot more effort and a very large group of subjects.
I bring this up since it comes closer to audio evaluation.
I have mentioned this before: my wife loves to cook - hence my physique - and when we
dine out, she always evaluates the food: she describes how the lemon zest was a clever
move to freshen up an otherwise too sweet dish - or things along this line.
I just enjoy the food and am not able to analyse the dish in that way.
My wife has many years of learning, trying, making errors and so on behind her and since
she loves to cook, I never cook.
But when it comes to audio she is the music lover who enjoys music to the full without
analysing the music or the audio in any way.
That is my speciality and I have many years of…..
Well you get the point.
But although my wife can evaluate food in any restaurant immediately, I can only detect
major flaws in an audio system when I am not in my trusted environment with my own review
references - which, by the way, are listed on theHBproject.com/en/about.
Evaluating audio is done by subjectively quantifying a large number of sound properties.
A number of properties are easily identifiable, like too much or too little bass, sharp highs, nasal
sound and some others.
You don't need to be an expert to hear these properties.
Just as too much or too little salt or pepper is easily identifiable.
Or when food is burned - not that this ever happens to my wife's food, of course.
More subtle properties take more time.
And then there is the adaptation of the brain to situations.
I used to love a rare steak but lost my appetite for it a number of years ago.
When I visited a small town in Croatia a few years ago I ordered a steak to find that what
my brain had stored as 'a steak' was that steak that was kept frozen for a long time
and traveled half the world to my supermarket and that the fresh, locally produced steak
in Croatia was so much better.
What if I had been asked to join a test panel for frozen globetrotting steaks prior to Croatia?
I would have been far less critical and might have helped that producer to a good rating
for a far from good steak.
Many people today grew up with far from good audio, positioned far from ideally and fed with
far from ideal MP3s.
What if we have these people judge audio equipment?
It would give a perfect picture of how the average consumer would judge the mainstream
stereo.
The tools for a proper evaluation have to be taught to them.
And even trained listeners like me and my colleagues always need to train when not in
familiar settings with known equipment and known music.
The difference is that we can do that more quickly, or easily decide to reject the test conditions
- which I did more than once.
As we have seen before both the subject - or subjects - and the tester should not know
what is playing.
This means that changing the setting from situation A to B and back needs to be done
remotely.
When testing sources, this could be done using the two sources, connected to one amplifier
that is remotely controllable.
To be sure the component or components between the sources and the amplifier have no influence,
they should exchange places halfway through the test program.
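The switching plan described here can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `make_schedule`, "input 1", "source A" and so on are hypothetical, and the idea is that a third person wires the sources and keeps the key sealed, while the person operating the remote only sees which amplifier input to select.

```python
import random

def make_schedule(n_trials, seed=None):
    """Return the operator's schedule and the sealed key for an A/B test.

    schedule: (trial_number, amplifier_input) pairs, all the operator sees.
    key: (trial_number, source) pairs, opened only after the test.
    """
    rng = random.Random(seed)
    half = n_trials // 2
    # Which source sits on which input is drawn at random and swapped halfway,
    # so that the components in each path cannot consistently favour one source.
    first = {"input 1": "source A", "input 2": "source B"}
    if rng.random() < 0.5:
        first = {"input 1": "source B", "input 2": "source A"}
    second = {"input 1": first["input 2"], "input 2": first["input 1"]}

    schedule, key = [], []
    for trial in range(1, n_trials + 1):
        mapping = first if trial <= half else second
        chosen = rng.choice(["input 1", "input 2"])
        schedule.append((trial, chosen))
        key.append((trial, mapping[chosen]))
    return schedule, key

schedule, key = make_schedule(10, seed=2024)
for trial, amp_input in schedule:
    print(f"trial {trial}: select {amp_input}")
```

The fixed seed is only there to make the example repeatable; in a real test the schedule would be drawn without anyone in the listening room seeing the key.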
The sources might - depending on the type of source - be placed outside the listening
room to prevent audible cues as to which player is started.
It should also be evaluated whether switching from input A to B and back is identifiable
by acoustic noise.
If so, the amp and the sources should be placed in another room.
But then the long loudspeaker cables might pick up stray magnetic fields that might get
in the feedback loop of the amp to have it correct for errors that were never present
in the amp.
Reviewing loudspeakers at audiophile level this way is even more difficult since you
don't want two pairs of speakers placed next to each other, for the playing speaker
will excite the non-playing speaker.
And in all cases there needs to be someone present in the room with the subject and someone
else to do the changes.
For you don't want to use extra switches in-between the equipment.
For experienced listeners - experienced in evaluating home stereo equipment in this case
- things become easier when the 'fingerprint' of artefacts is known.
I would identify jitter relatively easily since I know its fingerprint, which by the way doesn't
mean that I will always identify all types of jitter.
But identifying MQA still is difficult since I don't yet have the identifiable fingerprint
of MQA - if there is any.
Perhaps there are more fingerprints, or perhaps the influence of the MQA mastering settings
confuses me.
So if I were asked to do a double blind test on MQA, I would demand the time to discover
what those fingerprints are.
Professor Ad Houtsma once developed a scheme that evaluates the progression of the subject
and ends the test when the score doesn't further improve.
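The details of that scheme are not given here, so the following Python sketch is not Professor Houtsma's method. It only illustrates the general idea of a stopping rule: score the listener per block of trials and end the session once the score has stopped improving. The function name, the `patience` and `min_gain` parameters and the scores are all assumptions made for the example.

```python
def blocks_until_plateau(block_scores, patience=2, min_gain=0.0):
    """Return how many blocks are run before the score stops improving.

    block_scores: fraction correct per block of trials, in order.
    patience: consecutive blocks without improvement that end the test.
    min_gain: smallest increase over the best score so far that counts.
    """
    best = float("-inf")
    stale = 0
    for count, score in enumerate(block_scores, start=1):
        if score > best + min_gain:
            best = score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return count
    return len(block_scores)

# Hypothetical learning curve of one listener.
scores = [0.50, 0.60, 0.70, 0.70, 0.65, 0.80]
print(blocks_until_plateau(scores))  # prints 5
```

Note that with these made-up numbers the rule stops just before a late improvement, which shows how much the outcome depends on how patient the rule is.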
These tests were performed for Philips for their DCC codec and the subjects were sound
engineers from Philips Classics, amongst others.
Funnily enough I found the end result of the DCC codec to be agreeable while my colleague
at that time didn't like it at all.
And that's the next problem.
When I am asked to do a double blind test and would have agreed to do that, the only
thing you then know is my opinion using my auditory system with its far from standard
training and experience.
And as I mentioned before, even someone with the same level of training but with another
physique and another history, like the colleague I mentioned, can have a different opinion.
That is why double blind tests are preferably performed using a larger panel of subjects.
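To show what a panel result could look like in numbers, here is a small Python sketch that asks how likely each listener's score would be by pure guessing, and then does the same for the pooled panel. It assumes a simple two-choice test where a guess is right half of the time; the listeners and their scores are invented for the example.

```python
from math import comb

def chance_probability(correct, trials, p_guess=0.5):
    """Probability of scoring at least `correct` out of `trials` by guessing alone."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# Hypothetical panel: (correct answers, trials) per listener.
panel = {"listener 1": (14, 16), "listener 2": (9, 16), "listener 3": (8, 16)}

for name, (correct, trials) in panel.items():
    print(f"{name}: p = {chance_probability(correct, trials):.3f}")

pooled_correct = sum(c for c, _ in panel.values())
pooled_trials = sum(t for _, t in panel.values())
print(f"pooled: p = {chance_probability(pooled_correct, pooled_trials):.3f}")
```

Pooling can blur the result of one listener who does hear a difference among others who don't, so it may be worth looking at both the individual and the pooled figures.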
The limitation here is the quality of the people in that panel.
If you want to advise audiophiles, you had better use audiophiles in the panel.
If you want to advise people who have no particular interest in audio, use a panel
of this group of people.
There are many more arguments against using double blind testing for audio evaluation.
Dolby used such a panel to prove that Dolby Digital - in essence a kind of 5 channel MP3
- sounded so good that the panel could not distinguish it from linear PCM.
It makes you wonder why they later developed Dolby TrueHD if Dolby Digital was already
perfect.
I would also like to mention the test panel that judged the DAB+ codecs to be equal to
MP3 128 kb/s.
You can download the report on ebu.ch.
It is far better to consider the work of me and my colleagues as the work of a good restaurant
critic.
There are - of course - critics that like every restaurant for a free meal - but there
are also well respected restaurant critics.
Some might like what you like while others have a taste that differs greatly from yours.
Once you have found out which food critic you are compatible with, he can guide you to
nice restaurants and food.
The same goes for the work I do.
If you don't like my conclusions, that's fine.
I encourage you to find a reviewer that does and stick with him.
But if you find yourself to be compatible with my taste, my findings might help you
further.
For those that don't agree and want to have me do a double blind test: if you transfer
$50,000 to me, I will hire specialists and facilities from the Dutch research institute
TNO and do a double blind test.
For all reasonable people with ears: if you want to be informed when someone does transfer
50 grand, subscribe to this channel, my newsletter or follow me on Twitter, Facebook or Google+.
See the show notes for the links.
If you liked this video, please consider supporting the channel through Patreon so I can remain
independent.
As a bonus you get access to super exclusive videos too.
The link is in the show notes.
And don't forget to tell your friends on the web about this channel.
I am Hans Beekhuyzen, thank you for watching and see you in the next show or on theHBproject.com.
And whatever you do, enjoy the music.