(HOA LORANGER) Here's part of Jakob Nielsen's keynote address at the NN/g UX conference in Las Vegas.
(JAKOB NIELSEN) Now we can turn to the second topic, which is how we maximize the insight we get out of doing user research. There are really two things that one typically looks at when doing research: the reliability and the validity of the study. Reliability is the question: if I do the same study a second time, will I get the same result? If I do it a third time, do I still get the same result? If I do it 10 times, do I get the same result all 10 times?
Yeah, this is really important, because if you get a different result every time you try, then it's just random. We might as well flip a coin, and that's no good. So we do want reliability, but we also want validity. Validity asks whether the findings mean anything for the real world, not just for the research world. Reliability we can operationalize based on probabilities; people who are going to the Measuring User Experience course will get a lot of the details about how to do this, but we have very good formulas and ways to operationalize reliability, to say whether something is statistically significant or just random.
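To make that concrete, here is a minimal sketch of the kind of significance check this refers to. The talk doesn't spell out the specific formulas from the course, so this is just one standard test (a two-proportion z-test) applied to made-up task-success numbers.

```python
from math import sqrt, erf

# Hypothetical usability-test results, not numbers from the talk:
# Design A: 14 of 20 users completed the task; Design B: 6 of 20 did.
success_a, n_a = 14, 20
success_b, n_b = 6, 20

# Two-proportion z-test: is the observed difference bigger than
# what random variation between two samples would plausibly produce?
p_a, p_b = success_a / n_a, success_b / n_b
p_pool = (success_a + success_b) / (n_a + n_b)
std_err = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / std_err

def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Two-sided p-value: probability of seeing a difference this large by chance.
p_value = 2 * (1 - normal_cdf(abs(z)))
print(f"z = {z:.2f}, p = {p_value:.3f}")
# By convention, p < 0.05 is called statistically significant,
# i.e. the difference is unlikely to be just random noise.
```

With these made-up numbers the test comes out well below 0.05, which is the sense in which a result can be called reliable; it says nothing about whether acting on the finding would actually pay off, which is the validity question.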
Validity, on the other hand, we don't really have formulas for. I would like to operationalize validity by asking: if we make a business decision based on this research recommendation, are we actually going to make more money as a company? Is it actually going to work, does it move the needle, to use one of those clichés? I think that is at least as important as reliability, because if you study something that doesn't translate to the real world, it counts for nothing. There's so much attention paid to reliability, I think, because we do have great formulas for calculating it, and so people forget about validity. And I think this is very similar
to this old anecdote about the drunk guy
who's looking for his lost car keys
under the streetlight. A police officer comes around and asks, what happened? Oh, I lost my car keys. So the police officer tries to help him find the keys, and they can't find them. Finally the police officer turns to the drunk guy and says, are you sure you lost your car keys here under the streetlight? And the drunk guy says, no... I lost them over at the park, but you know, it's easier to look here, because there's light.
I think it's the same here: it is easier to think about reliability, because we have a very firm handle on that concept and really great formulas for working with it, and people often forget about validity. Particularly when you think about the way reliability is usually treated, it's usually treated as something like statistical significance, where we like to say, well, if p, the probability, is less than five percent, then we'll call this a good research finding. But think about what that really means: p is the probability that the result is just a random outcome.
So this means that if I do 20 studies, 19 of them produce the right finding and one of them the wrong finding. Well, if I'm doing it for my website or my project, those are pretty good odds, right? 19 times out of 20 studies I'm going to give you the correct recommendation for what to do; one time out of 20 I'm going to give you the wrong recommendation. That is still so much better than just guessing, which would mean giving you the right recommendation half the time and the wrong recommendation half the time. So for any individual problem those odds are good, and that's why 5 percent turned into the recognized number you use for reliability.
But if you think about the world, not just your product but the whole world, there are thousands upon thousands of user research studies being done every year; people do them confidentially, and people also do them for publication in various places. That means that one out of every 20 research papers you read is a bogus finding, and now it's much worse. And it's actually even worse than that, because of what you hear about. All the people who do a research study whose finding comes out the same as what we already know and expect? You're not going to hear about that. Let's return to, say, banner blindness: if somebody does a study about banner blindness and says, oh, banner blindness is still there, that is not going to get hyped up and covered everywhere and get millions of tweets, right? You don't hear about those studies. But you do hear about the studies that come out with some weird, unknown, or previously unexpected finding, which is very often wrong.
Take, for example, the question of response times, how fast your web pages appear. Basically every study ever done shows that response times matter: the faster the website, the more business you do. E-commerce sites can see that sales go up, Google can see that people do more searches, and almost everybody who has studied this finds the same thing. So if one more place does a response time study and says faster webpages are better, you're probably not going to hear about it. However, about 10 years ago somebody did a study that said response times don't matter, and that got a lot of attention at the time. That's one of the problems with relying only on the statistical analysis. If the p number says it would only very rarely be a bogus finding, fine, that works if it's only your own project; but if you're looking at the whole world, at tens of thousands of studies, there are going to be a lot of bogus findings in there, and you're often going to hear about exactly the wrong ones.
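To put rough numbers on that, here is the back-of-the-envelope arithmetic, using the talk's simplified framing that a p < 0.05 result is wrong about one time in 20; the worldwide study count is an assumed figure, just for scale.

```python
alpha = 0.05  # the conventional 5 percent significance threshold

# Your own project: a handful of studies, so very few misleading results.
my_studies = 20
print(my_studies * alpha)      # 1.0 expected bogus finding

# The whole field (assumed figure, only for scale): tens of thousands of
# studies a year means thousands of bogus findings in circulation, and the
# surprising ones are the ones most likely to get noticed and repeated.
world_studies = 20_000
print(world_studies * alpha)   # 1000.0 expected bogus findings
```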
So my recommendation for how to deal with that: imagine one of those old balance scales. On one side you put all the research findings that say one thing, say all the findings that say response times are important, that big pile. On the other side of the balance you put that one study that says response times don't matter, and let the scale show you: whoa, this side carries so much heavier evidence. That is the evidence you should trust, not the one little tiny thing. So that's one way of dealing with it, and it shows how easily you can be led astray if you think only about statistical significance.
So let me ask you about what I just told you, the study we did about websites in 2016. What makes that an interesting study? Why did I tell you about it? Is it because we had more than 1,000 data points? Is it because we had more than 200 tasks, or 43 websites, or because we tested in 2 different countries? Which of these numbers makes this a good study? Well, the 1,000 data points mean that our statistics and numbers are reasonably tight, so that's good,
but what I think really makes the results interesting is actually the 43 websites and the 215 different tasks, because it's diversity that makes this generalizable. It's not just that on one website, for some weird unknown reason, people couldn't find what they were looking for; it's across a broad set of websites from different industries, with people trying many different things. They didn't just try one thing and fail to find it; they tried many things and often couldn't find them. That's what makes it believable. So I really want to encourage more diversity in research, trying different things, not just trying one thing with an enormous, huge N. N is typically the number of users, and for a lot of people that's almost the only thing they care about. I think it's the least important. You want to look at how many different things you're trying, how many different designs.
Another example, from another project we did: five different studies of mobile text comprehension. Kate Meyer led that project. So which of these studies is the best? A remote study with 20 users and 4 articles, another remote study with 25 users and 6 other articles, a lab study with 37 users, 8 focus groups, or another remote study with more than 200 users? Well, this is a bit of a trick question, because what actually makes this the best is all of them together, because they all came out with the same result. And the result was that text comprehension is actually either the same or slightly better on mobile than on regular computers, which was a big surprise to us, because back in 2010 other research had found that text comprehension was much worse when reading from mobiles than when reading from desktop screens. So when we did our first study, the one with 20 users, and the results came back the complete opposite of the old research, I just said, I don't trust this. Let's go and do another study with some other articles; maybe there was something weird about those 4 articles. Let's do another study. OK, same result.
Now, we did those first two studies with UserZoom, which is a platform for remote studies that we use a lot and have had good results with. But that said, I was thinking, well, maybe there is something wrong with UserZoom for text comprehension; maybe it works for e-commerce studies but not for text comprehension studies. So we dragged people into the lab and watched them while they read the articles. It was a rather boring study, but we watched them, and we got the same result. Then the focus groups were more about talking to people about how they read, which gave us a sense of, oh, it's really nice to read a book, I usually do that, I like reading from my phone. And then finally, a much bigger sample size to see if we could nail down the statistics, and again, the same result.
Now, why is this? Well, I think one big explanation is to consider what a phone was in 2010 and what a phone is now. Current phones have about six and a half times more pixels on the screen, so in one case people were reading on a little crummy screen, and in the other on a screen that is still not big, but much bigger and much better. So that's at least one possible explanation.
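As a quick sanity check on that pixel figure, here is the arithmetic with typical screen resolutions from each era; the specific resolutions are my assumption, not numbers from the talk.

```python
# Assumed typical phone resolutions, not figures from the talk:
pixels_2010 = 320 * 480    # a common smartphone screen around 2010
pixels_now = 750 * 1334    # a common smartphone screen in the mid-2010s

print(pixels_now / pixels_2010)   # roughly 6.5 times more pixels
```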
In any case, my general point is that our result was the opposite of the old research, and you should not just trust that. When something used to be A, and you do a study and now it says B, that is usually wrong.
Usually you'll be in one of those 5 percent of cases where you happen to be wrong, but it does happen from time to time that things really have changed. So if you do a study and it comes out the same as you expect, then you can say, OK, let's move on to the next thing, because you can't spend endless resources on every single problem you have. But if you do a study and it comes out the opposite of what all previous studies show, then worry that maybe something is wrong with your study, and, as I said, do it again, do it in a slightly different way, and see if you still get the same result. After a while you'll know: yes, we can trust that things have changed in the world. But not the very first time you do a study that comes out different from everything else.
OK, so I wanted to emphasize this point about diversification in research: that we don't just do one thing and go by that. There are a variety of ways we can diversify. One of them is simply different people, and we do want that, because we know people are different. If I just take one person and have them use a design, can we go by what happened in that one session? Probably not; they could just be a weirdo. That happens. So you want more than one person, but cranking it up to hundreds or thousands of people is usually not worth doing. It is more important to have different personas than just different individuals, in other words different types of people. And what we say, if anybody went to the personas class that Kim taught today, is that it's more important to have behavioral differentiation, to have personas defined by people's behavioral characteristics rather than their demographic characteristics, because the things we look at in user experience usually don't differ that much across demographics such as age, gender, or location, whether someone lives in one city or another. For other types of market research those can be important, but for our type of interaction research, behavioral characteristics such as their job, their computer experience, and their technical skills tend to matter more.
But also diversify what you're testing, not just who the users are. Test different designs, not just your one best idea but your five best ideas. Have five different designers each come up with their best idea and test those. That's a much better way of getting insight into what the best solution to your design problem is going to be. Try to have people do different things, not just one thing but a variety of things. And finally, try to employ different methods. I'm, as you know, a super advocate of usability testing, of sitting a user down in front of the computer to see what they do. Yes, but there are also a variety of other methods we want to use: measurement methods, qualitative methods, a bunch of different things. Together they will give you much richer insight than doing the same thing again and again. So rather than pouring all the gold into this one pot, this one approach, try to spread it out a little and get a few more different types of insight.
It is also worthwhile to be good at the methodology. Here is a chart from a project I did a while ago where we had 20 different teams do the same study, which is not something we usually do, but in this case 20 different teams all tested the same software product, which had 8 serious design problems in it. The dots show, on the Y-axis, how many usability problems each team found, and on the X-axis we rated how well they ran the study in terms of the recommended ways of doing usability testing, the kind of thing people would have heard if they went to the usability testing course we had a few days ago. Anyway, what you can see is that there's a pretty strong correlation: the better you run the study, the more you follow the methodology advice, the more you'll actually learn about your software.
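If you wanted to quantify that kind of relationship from your own data, a correlation coefficient is the usual measure. Here is a minimal sketch with entirely made-up numbers; these are not the data behind the chart.

```python
import numpy as np

# Made-up illustrative data, NOT the 20 teams from the chart:
# a methodology score (0 to 1) and how many of the 8 problems each team found.
methodology_score = np.array([0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
problems_found = np.array([2, 3, 3, 4, 5, 6, 6, 8])

r = np.corrcoef(methodology_score, problems_found)[0, 1]
print(f"Pearson r = {r:.2f}")   # values near 1 indicate a strong positive link
```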
That said, it's not a 100 percent correlation, and even the worst teams, which only did about one-fifth of what's recommended (that's a really bad study), still found 2 of the 8 problems; they still found a quarter of the things. And if you think about it, if you're doing product development and you can remove a quarter of the really bad things in your design, that's still worth doing. It's certainly more worthwhile to run a better study, which is why we have courses on how to do usability testing, but even a bad study will still be worthwhile and still find something, so really, go and do this stuff.
go and do this stuff. So to conclude I
want to show you a painting that I saw in London
here, we had our conference in London a few weeks ago. The
title of the painting is "Man Proposes,
God Disposes" and it's from 1864 and
it shows what the artist imagined happened
to Sir John Franklin's polar expedition and
so that was lost and never heard from
again
and so the artist imagined, you see that the
polar bears are like gnawing on the remains of Sir
John and so that's what happened so the
man, in this case Sir John,
proposes we'll go and explore the
Arctic regions but God says no my polar bear
should have something to eat.
So that's a nice little bit moral lesson
there for you from the Victorian age, but
I want to use this sort of as a little
metaphor to think about our world
because our world, you know
the similar slogan would be designers
propose, but users dispose. So you
propose something and say this is gonna be such
a wonderful design yay! And then users say
man I can't just got to go elsewhere and
so we still have that
kind of reality check of facing up with nature.
In this case, nature being the users not the polar
regions and we have to design for
reality. So designers proposed, but
users dispose. So better do your user
testing or your design will be eaten by polar
bears