♪ I love it when you call me Big Data ♪
Welcome to The Dr. Data Show!
I'm Eric Siegel.
Data science, big data, what the hell do these buzzwords
really specifically mean?
Are they just cockamamie,
intentionally vague jargon that overhypes and overpromises?
Or are these terms actually helpful?
Do they somehow designate, like, the most profound impact
of the Information Age?
Well, I'll start with the vague and overhyping side
and then circle back to why these buzzwords
may matter after all.
It's time for the Dr. Data Buzzword Smackdown.
There are a lotta problems with these words.
First, data scientist is redundant.
It's like calling a librarian a book librarian.
If you're doing science, it involves data, duh.
Furthermore, and don't tell anyone I said this,
but real sciences like physics and chemistry
don't have science in their name.
Your science is trying too hard
if it has to call itself a science.
Social science, political science, data science,
and I gotta say, even though I have three degrees in it
and was a professor of it,
computer science is an arbitrarily defined field.
It's just the amalgam of everything to do with computers,
as a concept and as an appliance,
from the engineering of how to build them
and the deep mathematics about their theoretical limitations
to how to make them more user friendly,
and even business strategies
for managing a team of programmers.
Universities might as well also have
a toaster science department,
which covers the engineering of better toasters
as well as the culinary arts on how to best cook with them.
But I digress.
Okay, next buzzword, big data.
First of all, it's just grammatically incorrect.
It's like looking at the Pacific Ocean and going big water.
It should be a lotta data or plenty of data.
But the real problem with big data
is that it emphasizes the size.
'Cause what's exciting about data isn't
how much of it there is per se.
It's about how quickly it's growing
which is amazing, by the way.
There's always so much more data today
than there was yesterday.
So we're gonna run out of adjectives really quickly.
Big data, bigger data, even bigger data, the biggest data.
Actually, there's been a long-running conference called the
International Conference on Very Large Data Bases since 1975.
I'm not joking.
That's before the first Star Wars movie came out.
Now, in some cases, people use the terms data science
and big data just to refer to machine learning,
i.e. when computers learn from the experience
encoded in data.
That's the topic of most episodes of this program,
The Dr. Data Show.
It's a show about machine learning,
which is a well-defined field
and, by the way, is also often called predictive analytics,
especially when you're talking about its deployment
in the private or public sector.
I would urge folks to use the well-defined terms
machine learning or predictive analytics
if in fact that's what you're specifically talking about.
But as for data science and big data,
in their general usage they suffer from
a terrible case of vagueness.
They have a wide range of subjective definitions,
which compete and conflict.
Basically, they're often used to mean nothing more specific
than some clever use of data.
The terms don't necessarily refer
to any particular technology, method, or value proposition.
They're just plain subjective.
You can use them to mean whichever technology you'd like.
Machine learning, data visualization,
or even just basic reporting.
But much worse than that, this vagueness often serves
to mislead and misrepresent by alluding to capabilities
that don't exist.
For example, the popular press,
as well as certain analytics vendors,
sometimes use data science to denote
some whole collection of methods
that includes machine learning
as well as some other advanced methods.
The problem is, those other advanced methods are implied
but often actually don't really exist.
They're vaporware.
This confusion is sometimes inadvertent,
such as when journalists aren't fully knowledgeable
of the topic yet want it to sound as powerful as possible,
but, either way, the end result is souped-up hype
that overpromises and circulates misinformation.
All these issues, by the way,
also apply to the older-school term data mining,
also totally subjective.
Besides, calling it data mining is like
instead of gold mining, saying dirt mining.
Malfunction, failed analogy.
'Cause we aren't searching for data,
we're searching within data.
So now you're probably asking yourself,
how could Dr. Data come down so hard on these words
if he loves data so much?
Well, no, Dr. Data doesn't hate these words,
only the misleading ways in which they're often used.
Dr. Data's love for data is fully intact.
After all, he named himself after it.
Anyway, let's talk data for a moment.
These buzzwords are all data this and data that.
So what exactly is all the fuss about data?
I mean, most people couldn't be less interested in data.
The non-geeks out there think it's the driest,
most boring word ever.
The word data is a deal-killer at cocktail parties.
I know from personal experience.
I have the data.
And data just grows like a weed anyway.
It's so indiscriminately collected and warehoused,
like some bland, uninteresting residue
that companies dump into the cloud
as they transactionally churn away endlessly.
But, no, that's wrong.
Actually, let me make a correction.
It isn't indiscriminate.
The stuff logged into all these memory banks
are exactly the things that matter.
That's why they're being recorded.
People think data's boring because they're overlooking
the fact that data is experience.
It's a long list of prior events
from which it's possible to analytically learn.
In fact, we could say that data is powerful
and all-encompassing for the very same reason
that it's misconstrued as boring,
which is that it's very abstract.
Data can mean anything and everything.
In its most abstract, it means nothing in particular,
but in the particular, it always means
something valuable and interesting.
Every medical diagnosis, medical procedure,
credit application, phone call, Facebook post,
movie viewing, ad click, fraudulent transaction,
spammy e-mail, traffic camera passed, flight taken,
earthquake, purchase, successful or failed sales call,
each positive and negative outcome of any significance
is encoded as data somewhere.
There are quintillions and quintillions of bytes.
That's my Carl Sagan impersonation.
Data grows by an estimated 2.5 quintillion bytes per day.
A quintillion is a one with 18 zeros after it.
And here's the big win.
We can improve everything
with this data.
All the main functions and day-to-day operational decisions
of companies and governments are exactly
what these data streams are recording.
Therefore, data records exactly the right,
relevant experience to apply predictive analytics
where it's needed most.
We have just the right data for this technology
to learn how to streamline the major operations
behind financial risk management,
fraud detection, marketing, law enforcement,
healthcare, and manufacturing.
Boom!
This is major.
We're witnessing an epic, fundamental shift
in how technology integrates with, alters,
and improves society and its functions.
And so data isn't the most boring after all.
In fact, it's the most
sexy?
The Harvard Business Review declared data scientist
the sexiest job of the 21st century.
I mean, really?
Data people are the most sexy?
That's great news!
Geek is the new chic.
It's hip to be square.
You know, I had always assumed the sexiest profession
was firefighters, but who knows.
Maybe it's just the hard hat.
This is a picture of me dressed up
as a data miner for Halloween.
Actually, the New York City Fire Department
uses predictive analytics to triage
and prioritize the inspections of buildings
with the highest risk of fire.
Yet another priceless application of machine learning.
Anyway, we actually produced a rap music video
about predictive analytics and how being a data geek
affects your social life.
It's the best ever educational predictive analytics
rap music video ever created ever, period.
And also the only one.
Just three and a half minutes long.
You can check it out at PredictThis.org.
In conclusion, there's a lot to be excited about
when it comes to the data explosion
and what we can do with it.
The buzzwords are kinda inane when viewed up close.
Perhaps an equally appropriate and less misleading buzzword
for all this would be datapalooza,
but, in any case, the terms really allude to a culture
of smart people doing creative things
to make value of all this data.
Today's totally historic advent of having data
about everything and using data for everything
is mind-blowingly profound and important.
I'm Eric Siegel, thanks for watching.
Hit like and share this video if you think your friends
were also wondering what the hell data science
and big data really mean.
And for access to the entire web series,
go to TheDoctorDataShow.com.
♪ Who's your data? ♪
♪ Provide me the data to improve ♪
♪ And I'll apply the computation ♪
♪ I love it when you call me Big Data ♪
♪ Predictive analytics can help you with decisions ♪
♪ You can call, mail, credit, or hire with precision ♪
♪ On law, love, and life, you can prognosticate ♪
♪ Whom to investigate, incarcerate, ♪
♪ Set up on a date, or medicate ♪
♪ Charlie Brown never gets his kicks ♪
♪ That's why every old dog needs a brand new trick ♪
♪ If you get sick of chasing sticks ♪
♪ Or clicks with just a quick fix ♪
♪ You need to learn to predict ♪
♪ I can predict your every move ♪
♪ Just gimme all your information ♪
♪ Who's your data? ♪
♪ Provide me the data to improve ♪
♪ And I'll apply the computation ♪
♪ I love it when you call me Big Data ♪