Friday, April 27, 2018

YouTube daily report, Apr 27 2018

[ INTRO ]

Earlier this week, you might've noticed headlines about how having asthma or hay fever

can increase your risk of developing a mental illness.

But you don't have to worry and run off to your doctor.

These headlines are reporting on a study published this week in the journal Frontiers in Psychiatry.

And if you take a closer look at that paper, the results aren't as clear — or dramatic

— as they might seem.

Your body is one giant interconnected system, so it makes sense that some diseases might

be related or have the same root cause.

Researchers have spent a lot of time looking into this idea, especially when it comes to

psychiatric disorders and allergies.

Allergies are when your immune system overreacts to something harmless, like animal dander,

and treats it like a threatening invader.

And some studies have suggested that the stress or inflammation from how our bodies fight

allergens could throw off brain chemistry.

Admittedly, the research is ongoing, and a lot of things are up for debate.

But if it's true, and if we understood how that connection worked, we could maybe prevent

some mental illnesses.

So far, studies have mostly focused on links between specific conditions — like schizophrenia

and being allergic to things like pollen.

What's special about this new paper is that it looked at the likelihood of someone with

certain allergies developing any psychiatric disorder.

Specifically, this team looked at patients who had three of the most common allergic

diseases: asthma; allergic rhinitis, which is sometimes called hay fever; and atopic

dermatitis, which is a kind of eczema.

Using a national database from Taiwan, the team collected 15 years of health data from

over 180,000 patients.

Almost 47,000 had these allergic diseases, and around 140,000 didn't.

Then, the team calculated how many people with allergies were /also/ diagnosed with

a psychiatric disorder over that decade and a half — something like anxiety or depression.

And the results were kind of surprising.

They found that almost 11% of those with allergies developed some kind of psychiatric disorder,

while only about 7% of the control group did.

In general, that meant the people with allergies were statistically about 1.66 times more likely

to develop a mental illness over those 15 years.

Now, just because these data suggest that these conditions might be related doesn't

mean that one causes the other, or that they definitely come as a pair.

For instance, the team also found that those who just had eczema, or who had hay fever

and eczema together, actually had a lower risk of developing psychiatric disorders than

the control group.

Also, the data from this study came from only one population, so it might not be representative

of everyone.

Besides, a relative risk of 1.66 really isn't that high, especially if you compare it to

the risks associated with things like genetics.

So it's not actually that alarming.

Not to mention, scientists still have to figure out the mechanisms that could link allergies

to mental illnesses — which this study didn't look at.

So if you get sniffly in the springtime, don't worry about your health any more than you

normally would.

On a much smaller scale, a team of researchers announced on Monday in the journal Nature

Chemistry that they've found a new form of DNA hiding in your cells!

It's called an i-motif.

And it's not, like, changing how we think about humans as a species or anything.

But it can help us understand how our DNA gets processed in our cells.

When you think of DNA, you probably think of the textbook double-helix structure, which

was discovered in the 1950s.

But for a while, scientists have known that strands of DNA come in at least a couple other

shapes.

Instead of looking like a twisted ladder, the i-motif looks like one strand of DNA got

tangled in a knot.

To make that happen, DNA's building blocks — called nucleotides — pair together in

a weird way.

Normally, cytosine bases always pair up with guanine.

But in the i-motif form, cytosine pairs with other cytosines.

I-motifs form in parts of DNA that are cytosine-rich.

Part of one DNA strand folds over itself and gets bunched up, so the other strand is alone.

It kind of looks like one side of a zipper got twisted into a knot.

This happens because one of the cytosines gets an extra proton from its environment

and becomes slightly positively charged, while the other stays neutral.

So they can bind.

Researchers had observed the i-motif structure before in test tubes, but now, we've seen

it in a few kinds of living human cells!

Like epithelial cells, which do things like line your organs.

To track down this elusive DNA, the team designed a new antibody fragment from pre-existing

molecules.

Antibodies are proteins your immune system uses to track down invaders, so they're

really good at binding to specific substances.

This one was made to cling to i-motif's funny structure, and to glow under certain

lights so the scientists could know where it was.

And it worked like a charm!

After tracking the i-motifs for a bit, the team also realized why they might've been

hard to find: these structures don't exist all the time.

Mainly, i-motifs form during the G1 phase of the cell cycle, when DNA is being read

by other molecules to make proteins and get ready for cell division.

Once that phase is over, most i-motifs seem to go away.

Although it's not clear if the DNA changes shape by itself, or if another molecule helps

untangle it.

Either way, this suggests that i-motifs are mainly involved in helping dividing cells

express certain genes and make certain proteins.

So even though it's been over 60 years since we first discovered the main structure of

DNA, we've still got so much to learn about it.

Thanks for watching this episode of SciShow!

We're able to bring you new science news every week thanks to support from our Patreon

patrons, like Matthew Brant, our president of space!

If you'd like to be president of space, submit questions for us to answer, or join

us in our monthly hangout, you can go to patreon.com/scishow.

[ OUTRO ]

-------------------------------------------

Taskar Memorial Lecture 2018: M. Jordan (UC Berkeley)

- It's my pleasure to introduce Mike Jordan as the

invited speaker for the 2018 Taskar Memorial Lecture.

This memorial lecture recognizes the late Ben Taskar,

who was an exceptional machine learning

researcher, colleague, and friend.

Ben made contributions in many aspects of the field,

including probabilistic modeling, computer vision,

and natural language processing.

Today, we also celebrate the Taskar Center for

Accessible Technology that's led by Anat Caspi.

One of the focuses of this center is on the translation

and deployment of open source technology to improve

the lives of people with various abilities, or disabilities.

On the academic side, the center focuses on

the integration of universal design principles

into the engineering curriculum.

And as an example of something that the center has done,

they've created something called access maps

that are really popular ways of automating

pedestrian routing, based on people's abilities.

I also wanna acknowledge Ben's family,

who's come here today from the Bay Area.

So it's exciting to have Mike giving today's lecture.

Mike was Ben's post-doc advisor at Berkeley,

and Mike is one of the most distinguished researchers

in machine learning and statistics.

Mike's won tons of awards that I won't attempt to list,

but these include being inducted into the National Academy

of Sciences, National Academy of Engineering,

and the American Academy of Arts and Sciences.

And today, Mike is gonna be talking on aspects of

optimization, which have connections to some of the work

that Ben did with Mike on structure predictions.

(applause)

- Thank you very kindly, it's always a pleasure to be here.

I've been in this room several times and I love coming here,

it's such an amazing crowd, and there are so many of you.

Thanks to Emily, she was really a post-doc with me.

She came unofficially, but she'd come from MIT

to visit us at Berkeley, and did some really great work

that helped spur the field forward.

So I wanna thank, obviously, Anat and family for inviting me

but there's really a deeper thanks I wanna give to them

which is, thank you so much for doing all the activities

you've done in honor of Ben, and keeping him with us.

Because we miss him,

and the fact that you have the center,

that you have this event, it gives us a place

and a grounding that we can come back to and remember Ben,

and feel like he is still with us.

So I wanna thank you profoundly that you didn't do

just something for your family, but you did something

really for all of us, so I really appreciate that.

So if you get to my age in life, you have lost many people,

and so in fact my parents are gone, and some great friends

like Ben are gone, and you get used to that feeling.

It's not a great feeling, but it is part of life.

But I particularly notice it when something happens

that I wanna tell somebody about.

So with my mother, it's like if something happens

with my children, I wanna tell somebody, and I think,

ah, my mom's not there, that's too bad.

Well, with Ben, he was probably the number one person

when there was an academic idea that either I'd thought of

or someone else had thought of, I wanted to tell Ben.

And so in fact, this talk is really for Ben,

this is material that I've been working on for the

last few years that I would wanna tell Ben about.

And so in my own mind I'll be giving the talk to him.

It's somewhat technical, and I normally don't give

quite as technical a talk, it's kinda math-y.

But that's the kind of things

that Ben and I would talk about.

We both loved math, both the

beauty of it and the fun of it.

So I certainly remember being around Ben and having this

little playfulness, you know, twinkle in the eyes, little

smile kind of always on the face when we'd talk about math.

He understood the playfulness of being an academic,

and the beauty of that, and the fun of it.

And the humor associated with it.

So anyway, I wanna thank all of you for bringing me here

to participate in this really, celebration of

all of us, but particularly Ben.

So in fact, Ben taught me a lot about optimization,

and so optimization is the theme of this talk.

A lot of people are talking about

AI and machine learning these days.

Frankly, I don't think we're there yet, I don't think those

fields have yet done anything really that serious.

So it's amazing how much hype we have now about it.

Most of the ideas have been discovered in statistics,

optimization, operations research,

areas of computer science, economics and so on.

And machine learning is basically an integrative discipline,

it brings these ideas together to solve some problems.

It hasn't really created a whole lot of new ones,

per se, frankly.

And the fact that it's being rebranded now as AI,

as if we've solved the big problem of intelligence,

which we haven't even begun to solve.

Maybe in my lifetime we'll start to perceive it,

but we're not perceiving it yet.

And so the fact that we're talking in this kind of

exalted language is alarming to me.

And so I, as the kind of wave is pushed in that direction,

of all the hype and all the anxiety,

I've headed in the exactly opposite direction,

which I typically do in my career,

I go back to the nuts and bolts and try to do some

more theory, and try to figure out what won't work,

and get more guarantees, and so on and so forth.

So for the younger people in the audience, I wanna

encourage you, this is not the big era of AI.

It is just not, all right, it's not gonna work

in the way that people are talking about it.

We're just kinda learning some mappings from some data

and there are some problems associated,

there was a nice panel this morning on the issues

that are now rightfully coming up.

And it's not because we thought we were gonna solve

the big problems and suddenly we see they're problems.

It's rather, this is an interesting discipline,

it's gonna take decades to roll out, just like

chemical engineering, just like civil engineering.

And I wasn't around in those eras, but in the 40s and 50s,

when those were rolling out, I bet they didn't have

the level of hype that we have now.

So I have a lot of respect for what they did,

they changed the world, so I think that this,

we're building in some sense the first

engineering discipline for humans.

Chemical engineering and civil engineering weren't

really about humans, I mean, we've built buildings

that people lived in, but this is really something

that's really all about humans, and their

interaction with science and technology.

And so that's an exciting but

extremely challenging thing to do.

So we need to be sober and take the decades

it needs to be taken, to do this.

So I think I've said enough about that, but just to give you

a little of the context to what I'm thinking about.

So optimization is a wonderful playground,

but it's also a lingua franca, it's a place where

many people come to and talk together.

So it came out of many many fields and many have

contributed, but I think not that many people

went into optimization as their first field.

They kinda end up there later in life when they realize

how useful it is, and how communicative

it becomes to work in that field.

Okay, so, let me start then with

some of the actual content here.

Okay, so it's certainly played an increasingly

central role in statistical machine learning.

It can't really be the whole story, because machine learning

is also about uncertainty, and so you need sampling ideas

and all that, just optimizing can't be the whole story.

But it has really been very, very central.

It supplies algorithms, that's great,

but it also supplies lower bounds

and therefore fundamental understanding.

So let me say something about that.

When I learned optimization, it was

in the 1980s, I was a student.

And I took some classes and read some books,

and it was presented to me as a field that was

dead and done, and a big success.

You had simplex method, you had conjugate gradients,

you had Newton's method and so on.

And they really worked, and there were rates of convergence

and there was good software, and there's just

a tool to be used by everybody.

And there was this famous book by Press et al,

that you just picked up and used.

And that was in the 1980s.

So great.

That was false, it was not dead and done at all.

So there were a couple of revolutions happening around that

time, and mostly associated with the former Soviet Union.

And that's another connection to Ben.

I think that may be why Ben was so interested in

optimization, he'd been trying to pick that up.

And one of the revolutions was

called the interior point method.

It was developed also in the US, but the Russians

really, really took it much, much further.

And it was just an older, but a revitalized perspective

on how to do optimization, and it's had a huge impact.

But the other one was gradient based optimization.

And stochastic gradients and stochastic approximation

and these ideas that went back to the 50s.

But they were picked up in that era, and taken

to the, really, a beautiful level.

And in particular, lower bounds were discovered.

And that's a conceptual leap,

most fields don't have lower bounds.

Statistics has lower bounds, computer science has very few.

And I think it's a sign of the maturity of the field

that it has lower bounds, and optimization finally got

lower bounds in about, circa 1970.

And so this was all done in the Russian school,

and once you get lower bounds you start

to get fundamental understanding.

You see the best you can possibly do.

And so that strips away a lot of the details

of specific algorithms, it tells you about

what's the best you could possibly do?

All right?

Now that's an interesting concept,

because we're doing optimization, here,

so we're developing optimization algorithms.

So we're asking for the best optimization algorithm.

The optimal optimization algorithm.

So already, that's kinda conceptually neat,

what could that possibly mean?

So that kind of thought inspires a lot of my work here,

and the work of others in this field,

and that's the kind of thing that I,

again, say, Ben would have loved.

This kind of recursivity and so on.

So, let's get into that.

Now, once that happened in the 80s,

as is often the case, brand new issues came out

and it was realized that this field is not done at all,

there's a lot of open problems.

And now as people with statistical interest have

moved into this field and started to use it,

it's even more clear how many problems are open.

So I hope by the end of this talk you see that

I'm not actually trying to show you any real results,

I'm just trying to show you some

cool problems and some open areas.

Okay, so let's talk about saddle points a little bit.

So this is gonna be some work that's actually joint

with Sham Kakade in the audience, you have a lot of

good people in optimization here at UW,

and Sham is a particularly noteworthy person

who has kinda helped, from whom I've learned quite a bit

from his papers and from talking with him.

So we used to think, at least in the 1980s,

that bad local minima were the main problem,

kind of learning about these from the complexity

theorists that told us that that was

the source of all the complexities,

and physicists with spin glasses and so on.

And as time has gone on, it's sort of not been

the case that it's been the local minima.

In a lot of problems there are no local minima,

but there are a lot of saddle points.

Saddle points, in two or three dimensions, don't look

like a big deal, there's kind of like a little saddle,

and you just kind of move your way around it.

But in a hundred thousand dimensions,

saddle points can be a very, very big deal.

It might be that almost all the dimensions are bad ones.

You stay stuck, and it's hard to find

the dimension that gets you out.

And if you have lots of saddle points, you'll get stuck

for a long time, then you'll move to the next one,

you'll get stuck for a long time, I mean you'll get

kinda stuck for way too long, for months.

In fact, if you train a neural net, and I won't say the word

neural net ever again in this talk, but (laughter)

if you train one, the error goes down,

and then it flattens out.

And then after a while it goes

down again, and it flattens out.

Those are saddle points.

And it's probably a ring of saddle points and then

another ring, and it's kind of going past those rings.

Again, hundred thousand dimensions, or million dimensions.

So understanding the behavior at saddle points is actually

really, probably a good chunk of the problem here.

Okay, so what ingredients are we gonna use?

What's gonna be in our toolbox.

Well, again, we're gonna be mathematicians

in this talk, we're gonna simplify.

So we're gonna use a few ingredients.

Gradient descent, critically,

stochastics, critically, partly for algorithmic speed

reasons, but also 'cause we are statisticians, too,

and we wanna always have probability in mind.

And then we'll talk a bit about acceleration,

'cause acceleration, you wanna move as fast as possible.

We're trying to move as fast as possible so we have

to have some notion that you're accelerating.

That's one of the reason why we have those ingredients,

the other is just that experience has taught us that the

simpler the methods, the better they tend to work and scale.

And so as computer scientists, we're interested critically

in scaling and a robustness in working.

And the more complex methods with second order and Hessians

and so on, are a little bit deprecated right now because

they don't scale, and they don't tend to be very robust.

Now that will end, the pendulum will swing back,

but these three ingredients are already

getting us quite far in our thinking.

So here's a saddle point.

Look at the saddle shape.

We're gonna be talking about strict saddle points,

where the Hessian has an eigenvalue which is

strictly negative, and our question is gonna be

how to escape these things efficiently.

Now here's a computer science word.

We don't want it to take exponential time to escape

a saddle point, if exponential is exponential

in the number of dimensions, for example.

That would be a disaster in a hundred thousand dimensions.

So can we do some theory and develop some algorithms

that assure that this doesn't happen?

Okay, so here is this first little line of work,

I'm gonna put together a whole bunch of little lines of work

and this one'll probably be one of the longer ones,

and again, Sham is on the list there, but also

Chi Jin, student at Berkeley, Rong Ge, and

Praneeth Netrapalli, so it's a very geographically

distributed collection of people.

All right, so a few facts from some recent work that

several people have been doing on gradient descent.

Well, this is an old algorithm, it goes back to at least Newton,

and some historian can tell me how much further it goes

back, but you're going down the steepest descent direction.

Now, in discrete time, it's kind of

hard to analyze some of these things.

So it's often useful to go into continuous time

and develop a differential equation.

When you go into continuous time with gradients,

you get something called gradient flow.

And that's been studied by the

mathematicians for a long, long time.

It's just a differential equation

that flows along the gradient.
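
For reference, here are the two dynamics in standard textbook notation (this is generic, not specific to the work being described):

    x_{k+1} = x_k - \eta \nabla f(x_k)    (gradient descent, step size \eta > 0)

    \dot{X}(t) = -\nabla f(X(t))          (gradient flow, its continuous-time limit)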

And there's something called Morse theory,

an area of mathematics that tells you that with

probability one, you will not end up at a saddle point,

if you're doing gradient flow.

But it had been an open problem for the discrete version,

this gradient descent: if I started gradient descent

in an arbitrary place, could I get sucked in,

sucked in, sucked in, and just kinda asymptotically

end up at a saddle point?

And we did some kind of Morse theory kind of arguments

to show that that's not the case.

But that is an asymptotic argument.

So no efficiency statement.

Can you say something about efficiency?

Well, there's a negative result.

We had a follow up paper which showed that gradient descent,

even though asymptotically it'll never end at a saddle point

it can take exponential time to escape saddle points.

So it does get sucked in for kind of a long time,

and then sucked in for a long time and sucked in

for a long time, and then you set that up in a

not too weird way, such that it'll take exponential time

to actually finally get past all the saddle points.

So that was a negative result.

But now there's an important paper, Ge et al.,

that showed it turns out if you just add noise, you do

stochastic gradients, which we all love for other reasons,

it can actually escape saddle points in polynomial time.

So we went from exponential to polynomial,

the thing that computer scientists love to do.

Now their polynomial was not necessarily good.

Polynomial is not good enough for us, we want fast.

We want linear or sub-linear.

Logarithmic is where we are typically trying to get.

But nonetheless, that, as usual, opened up

the question: oh, maybe one could go faster.

Okay, so let's set up a little structure here,

let's do convex optimization at first.

So we have a minimum of a convex

function in a d-dimensional space.

Here is gradient descent, which

Isaac Newton would have recognized.

And I should say the date on this talk was January 15th,

'cause I gave this talk at the Newton Institute in

Cambridge, and as I was giving the talk, there was this

huge portrait of Newton sitting right there, and I realized,

as you can see, there's several other ingredients

that Newton would recognize in this talk.

And I kept kinda turning to him and saying, see?

(laughter)

So I left that date on here just

'cause it's kinda stamped into me.

So anyway, there's Newton's algorithm.

So it converges, provably, to a global minimum,

and here's the critical part, is that the number of

iterations that it takes to get to the global minimum

is independent of dimensionality.

And I don't think that's widely enough appreciated.

Everyone thinks gradient descent is good,

because the steepest descent is cheap and all that,

but its number of iterations is independent of dimension,

if you do it in infinite dimensions, then it's just as fast

in terms of number of iterations, which is amazing.

Now, but what about the world that we mostly

really care about, these non-convex problems?

So now we can have local minima, saddle points,

and local maxima, and all that.

Now here's a list of problems that are actually non-convex.

But they don't have any local minima, provably.

So they have a bunch of saddle points,

but it turns out moreover, all those saddle points

are strict saddle points, so they don't have

these kind of flat, bad regions that could get you

really stuck, and you couldn't say much.

So these are really interesting problems.

There's not neural nets up here, but these are other areas

if you're a machine learning person, that you need to know

about, they're really working in real world applications.

So a lot of them, phase retrieval, matrix sensing,

these are, a lot of practical applications of these.

So notice: no bad local minima in any of these (mumbling).

So if you can avoid these things efficiently,

you're done, you solved the problem,

you get to the global optimum.

Okay, so let's set up some structure

to talk about convergence.

So we need a bit of smoothness

so we can say anything at all.

So here's the standard assumption, we assume that we have

a gradient which is Lipschitz, that if you move between

two points, x and y, that the gradient doesn't change much,

relative to the distance.

And then we now talk about a first order stationary point.

If the gradient were zero, that would be the classical

notion of a stationary point from calculus.

So we put a little epsilon ball around that, and we say

how long does it take to hit that epsilon ball?

And we do that for various values of

epsilon, so now we get a rate.
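
In symbols, these are the standard definitions, with L the Lipschitz constant of the gradient:

    \|\nabla f(x_1) - \nabla f(x_2)\| \le L \|x_1 - x_2\|    (gradient Lipschitz, i.e., L-smooth)

    \|\nabla f(x)\| \le \epsilon    (\epsilon-first-order stationary point)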

All right, and here's the theorem.

Let's say it's due to Nesterov.

Which is that gradient descent, with a certain choice

of step size, the one over the Lipschitz constant,

finds one of these guys in a number of iterations

which is proportional to one over epsilon squared.

So this is somewhat of a slow rate, but not too bad.

And that's true for any smooth function.

Any Lipschitz-gradient, smooth function.

We have a dependence on the initial suboptimality, how far f at the start is above the optimum;

if you start really far away, you're just gonna

take a long time to get there, fine.

And then you have a very simple

two times L sort of expression, here.

But then again, here is the key,

which is that this is independent of dimension,

number of iterations.
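
Written out, the theorem being described takes its standard form: for an L-smooth function f, gradient descent with step size \eta = 1/L finds an \epsilon-first-order stationary point within

    K = 2 L (f(x_0) - f^*) / \epsilon^2

iterations, and the dimension d appears nowhere in the bound.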

So that's the best one could kind of possibly hope for

for a gradient-style algorithm.

All right, so now what are we gonna try to do?

Well we're gonna try to now go to beyond that,

we're gonna try to avoid saddle points,

that algorithm will get stuck at saddle points, potentially.

So now we're gonna define a second order stationary point.

So first of all, we need a little bit more smoothness

assumption, we need a Hessian to exist,

and we need it to be Lipschitz.

If we move around and the Hessian changes a lot,

then we wouldn't know how to get

out of the saddle point, really.

So we're gonna make that assumption,

we'll have now a parameter rho.

The second order stationary point is again that we have this

first order stationarity, plus the minimum eigenvalue of the

Hessian has gotta be greater than or equal to some small negative number; if that were

zero, that would be your classical notion of not being at a saddle

point, but we give ourselves a little slop again to have a

little ball around the solution so we can get a rate.

And that ball happens to be parametrized by

the square root of rho times epsilon,

it's just a standard parametrization, and that's it.

But it's proportional to rho, the Hessian Lipschitz bound.
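
In symbols, the standard definitions, with \rho the Hessian Lipschitz constant:

    \|\nabla^2 f(x_1) - \nabla^2 f(x_2)\| \le \rho \|x_1 - x_2\|    (Hessian Lipschitz)

    \|\nabla f(x)\| \le \epsilon  and  \lambda_{min}(\nabla^2 f(x)) \ge -\sqrt{\rho \epsilon}    (\epsilon-second-order stationary point)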

All right, now we're gonna analyze it, extremely simple

algorithm, and it's a mathematician's algorithm,

one that you can analyze and say something about.

But it also is something you can implement in practice.

So it is just gradient descent.

But from time to time we're gonna add noise

from a ball of radius R, selected uniformly.

Now, time to time means that we're

gonna not let this happen too much.

And so this perturbation condition is that

if the gradient happens to be really small

where you're currently at, and you haven't

added perturbations too recently.

So a little bit of hysteresis, if you will.

If that's true, now you add some noise.

And the noise is isotropic.

So that's a dumb little bit of stochasticity

that we've added to gradient descent.

We needed something like that 'cause gradient descent

itself will take exponential time, that would be bad.

So we've done this little bit of extra noise.
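
As a minimal sketch of the algorithm just described, in Python; the parameter names, default values, and the exact form of the perturbation condition here are illustrative placeholders, not the constants from the paper:

    import numpy as np

    def perturbed_gradient_descent(grad_f, x0, eta=0.1, grad_thresh=1e-3,
                                   radius=1e-2, wait=10, n_iters=1000, seed=0):
        # Plain gradient descent, plus occasional isotropic noise: if the
        # gradient is small and no perturbation was added recently, add a
        # point drawn uniformly from a ball of the given radius.
        rng = np.random.default_rng(seed)
        x = np.array(x0, dtype=float)
        last_perturb = -wait
        for t in range(n_iters):
            g = grad_f(x)
            if np.linalg.norm(g) <= grad_thresh and t - last_perturb > wait:
                u = rng.normal(size=x.shape)
                u /= np.linalg.norm(u)                       # uniform direction
                r = radius * rng.random() ** (1.0 / x.size)  # uniform radius in ball
                x = x + r * u
                last_perturb = t
                g = grad_f(x)                                # re-evaluate after the kick
            x = x - eta * g
        return x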

And here are the results, so here is the convergence rate now,

now we're looking for second order stationary points,

Nesterov did first order stationarity.

So we have again these Hessian Lipschitz assumptions.

We set the learning rate to a certain choice,

and now we're gonna find a second order

stationary point and a number of iterations,

well that looks exactly the same as before.

And it is, except for that little tilde sign right there.

But before we get to that, it is

one over epsilon squared, still.

So we're able to escape saddle points just as fast

as gradient descent in the convex case,

this is the non-convex case.

That's a pretty striking fact about gradient descent, which

is it doesn't seem to be too troubled by saddle points.

Moreover, we have the Lipschitz constant there, and

everything is the same, and so what does that tilde mean,

that's where we're hiding the dimension dependence.

But the dimension dependence

is logarithmic, poly-logarithmic.

It's the logarithm of D, to the fourth power.

And D could be a hundred thousand, that's not

a bad penalty to pay to avoid saddle points.
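
Putting the pieces together, the result has the shape: perturbed gradient descent finds an \epsilon-second-order stationary point in

    \tilde{O}( L (f(x_0) - f^*) / \epsilon^2 )

iterations, where \tilde{O} hides the poly-logarithmic factor in d (log^4 d in the stated version).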

So if you wanna pump this result up to the most you can,

you would say that the perturbed form of gradient descent goes

through the saddle points as if they're almost not there.

And so this, to me, is one of the first kind of proofs that

establishes why gradient descent is so effective in things

like neural nets, oops, I said it twice.

In high dimensions.

All right, so here just to summarize what we got,

so Nesterov for convex problems got this rate,

we get exactly the same rate up to

this penalty of log to the fourth.

Now, is that four real?

Probably not, probably that's an artifact of the proof

technique, and I suspect log is a lower bound, but I don't

think we know a lower bound, currently, for this problem.

Now, here's a little bit of the history

just briefly, I already alluded to this.

But here are a bunch of different methods

that do find second order stationary points.

We've been talking about the ones based on gradients,

but you could also go to Hessians, and this is

a little more of a classical literature.

If you give me a Hessian, maybe you can get a Hessian

cheaply, but in a lot of problems you can't,

but if you did, then I can compute the eigen structure

of it, and I could find the negative eigenvalue

direction, and I could just hop out.

And so if you do the analysis there, you get a faster rate,

it's one over epsilon to the one point five.

But these algorithms are not being used,

at least in most of the problems I'm aware of.

There is an intermediate set of things where you allow

yourself a Hessian vector product, and this is interesting.

Because that is something you can do realistically

in lots of problems, and you get a faster rate here, 1.75,

and you get this nice log D dependence.

So this is, a bunch of people, mostly at Stanford,

and I think it's very interesting to compare.

This will kinda play out over the years to come

as to which of these approaches wins out.
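
To recap the comparison as stated, in terms of the accuracy epsilon and dimension d:

    Gradient only, perturbed:   1/epsilon^2 iterations, log^4 d dimension dependence

    Hessian-vector products:    1/epsilon^1.75 iterations, log d dimension dependence

    Full Hessian (classical):   1/epsilon^1.5 iterations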

Anyway, we're up in gradient land here,

there was an original Ge et al work and it was

just polynomial, so, nice, but

not that strong of an assertion.

There was then a paper by Levy,

who was able to prove D to the third,

and still polynomial in whatever epsilon.

And then this work, which goes all the way down to

one over epsilon squared, a good polynomial,

and then does logarithm to the fourth,

starting to get to the lowest it could probably possibly be.

Just a little word about proof technique here,

so really this is differential geometry.

This is just, we're trying to prove something about

the geometry of saddle points and how they interact with

dynamics, a particular kind of gradient dynamics.

So you'd like to analyze around a saddle point,

where is the kind of the bad region, where if you're

in that, it's gonna take you a long time to get out.

And this is a probabilistic statement because

I'm gonna be adding noise from time to time,

and the noise will maybe get me out of that region.

And if I'm out of the region then kind of just the

power method will get me out, I have a negative eigenvalue,

boom, it'll just shoot me out.

So I need to know how thick that region is.

And the argument that's used in this proof

is a coupling kind of argument.

You don't know what that region looks like,

you could probably do some differential

geometry to figure out more about it.

But what you do is you start a coupled pair of processes,

that's the probability argument: you start a coupled pair

of processes on both sides of the region,

with R apart, where you don't know what R is.

And now you adjust R until these two things

start to flow quickly apart from each other,

when a phase transition occurs.

Now you know you're at the right value of R,

and you choose that as your R.

And that R is capturing now a pancake-like

shaped object that's pretty thin.

And that gives you this rate of logarithm of D.

The previous work had replaced this whole argument

with a flat slab, and the slab was too thick.

And so in high dimensions especially,

that's where you get the D cubed.

Okay, so I always like it, again, I think

Ben would love the probability coming together,

the differential geometry coming together, the algorithms.

All right.

Okay, so that was the first line of

work that I wanted to talk about.

Right now let's talk about, let's go further,

we're gonna talk about saddle points.

Can we get past the saddle points even faster?

So there's this notion of Nesterov acceleration

which if you know about optimization is one of

the big discoveries of the last 40 years

about what's the fastest you can go.

It kinda gives you insight into that.

And so do those techniques allow us

to move past saddle points quickly?

What other kinds of stochastic models

can we use to escape saddle points?

There's just this ball of

noise; are there other, better ways?

And how do these two kind of ideas interact?

So this is kinda the last year of research.

And to get there, we're gonna need to have

a deeper understanding of acceleration

than has been available in the literature to date.

So we just started to say what it is.

The deeper understanding is based on going into continuous

time, and talking about differential equations

and stochastic differential equations.

And this'll be a shock to the computer scientists

in the room, they thought they could develop their

whole career outside of differential equations.

And I'm gonna tell you that's wrong,

and it's because these optimizations,

these iterates that hop around in a space,

they don't have any real topology.

So it's hard to talk about notions like acceleration

and going faster, 'cause you're just hopping along

a sequence of points; you need some embedding into a continuum.

And that embedding leads to insights,

there's gonna be phase transitions in the continuous world,

which you don't see in the discrete world.

And so it's phase transitions which are

giving us the difference in speeds.

So I'll get there in a minute.

All right, so this work is one of my favorite lines of work

in the last few years, and it's principally been led by

two of my favorite grad students, Andre, well they're

all my favorite, that's not fair, but (laughter),

I'm pumping their work right now, so Andre Wibisono

and Ashia Wilson have really mightily contributed to this

in the last several years, now, and then Mike Betancourt

from Columbia's joined us in the last little phase

of this research, to help us along a bit.

Okay, so let me be now really provocative

and grandiose sounding, which I normally am not.

I'm provocative but not grandiose.

In mature fields, ones that mathematically really seem deep

and mature, at least in my experience, there's a really

deep interplay between integration and differentiation.

You can call that interplay the

fundamental theorem of calculus,

also Talagrand's work, and...

Quantum field theory does a lot of this.

Really mature fields you sometimes work

with the integrals and sometimes work

with derivatives, and you go back and forth.

And these can be very generalized notions of these objects.

So physics certainly is the poster child of this,

there was Newton who wrote down F equals MA,

it's based on derivatives, it's a differential equation.

And wrote down the laws of nature using derivatives.

Then Lagrange came around about 100 years later,

and said, I can rewrite all of that using integrals.

So I'm gonna integrate an energy, and moreover

I'm gonna optimize other paths under those integrals,

so he introduced also optimization, not just integrals.

And then he gets out the same physics that Newton got out

using this other formulism, it's called variational.

And as you probably know, that led to then Hamilton's

perspectives, and that led to a lot of the 20th century

physics was based on that, but a good physicist

goes back and forth all the time.

A good probabilist goes back and forth all the time.

Statistics is full of these kind of arguments.

If you're doing Laplace expansions or saddle point

expansions, that's a derivative in the complex plane

for the purpose of doing integrals.

The numerical disciplines: finite elements takes a PDE

and turns it into a bunch of little integrals, patches,

that you put together in a variational way.

Monte Carlo, hybrid Monte Carlo, the best existing

Monte Carlo algorithms are based on taking a derivative

to get fast speed as you move along in some direction.

So I could go on, but I think this is true that fields that

kinda go on both sides of this are really somewhat deep.

I would argue that a lot of our computational fields

of this century are not so deep in that regard.

In particular, I'd argue that optimization is not.

Maybe surprisingly.

It's all full of derivatives.

Everywhere you see, in every paper, it's tons of derivatives

and Hessians and higher-order derivatives or whatever.

But you almost never see an integral sign.

And there's not a variational perspective

as fundamental in that field.

So I'm not gonna argue this, but I'm gonna put it out there,

so I think there's an opportunity to take

a variational perspective on optimization.

To ask, what is the best optimization algorithm.

That's, in essence, a variational question,

and it's gonna require the other side of the coin.

All right so let's go back now to gradient descent again.

So here's gradient descent, I'm now

using K as my variable and not N.

So again, known for hundreds of years.

What's maybe not so known is it has a convergence rate.

So now we're starting to talk

about competitional complexity.

So we're doing convex optimization in this part of the talk,

and what this convergence rate is, is that after you've

gone through K steps, no matter what the convex function is,

I can guarantee you'll be within a

ball of size one over K of the optimum.

So that is not an old idea,

that's maybe in the last few decades.

It's kind of two or three pages of proof

using Lipschitz, convexity, Jensen's inequalities and so on.

But anyway, it's a nice fact that

we can analyze the rate of this.
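
In symbols, the classical statement: for a convex, L-smooth function f and step size 1/L,

    f(x_k) - f(x^*) \le L \|x_0 - x^*\|^2 / (2k) = O(1/k).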

So already we're starting to do something

a little more computer science-y to have an actual rate

over a class of functions, not just for a specific function.

Okay, now at some point, various of the Russian school,

Nemirovski is the most well-known name having done this

in the 70s, started asking, can I get a complexity theory

of optimization?

And I think that was a really important moment.

He didn't just go to Turing and say, well,

the complexity theory already exists,

the computer science gave it to me.

'Cause that complexity model's just too strong.

You can do anything with a Turing computer.

So you can't say much.

And I think that's, with all due apologies to the complexity

theorists in the audience, you can't say much, right?

So Nemirovski said, no, I'm gonna have a different

computational model, it's gonna be an oracle model.

And so for example, if I'm looking at optimization of

convex functions, what I'm gonna do is, the computer

is allowed to know the function value, the gradient

at any point that it wants, and it's allowed to take its

steps within the linear span of all previous gradients.

So that's a computer, that's a computational model.

It's a pretty strong one, but it's not

everything, it's not Turing complete.

But you can prove things with it now.

And in fact, you see nowadays in complexity theory like,

fine grained complexity and kind of relative complexity

starting to emerge, it has this flavor.

So he said, well, given that computation model,

what's the fastest you can go?

And this framework is rich enough to actually give an

answer, and the answer turned out to be one over K squared.

So it's faster than gradient descent.

Which is maybe a little surprising, because gradient

descent is steepest descent, if you're only allowed

to look at gradients, you better go down the

steepest direction and go as far as you can,

and we're giving you the optimal step size.

So I think probably, I wasn't around in that era, they

were a little surprised that there could be a better rate.

So probably people said, well

that lower bound's just too low.

We need to get a better lower bound that's a little higher.

And it was maybe a shock to everybody when Nesterov,

a student of Nemirovski's, wrote down this pair of equations

that's called Nesterov acceleration.

There's now two variables, not just one.

One of them does a gradient descent step, but taken

at the y variable, and then the y variable

is kind of a combination, an extrapolation,

a rotation, if you will, of the last two x variables.

So it has a little bit of a momentum flavor,

or a heavy ball flavor, it has a lot of intuitions.

But the amazing thing is that then Nesterov not just

invented this, but he proved that it has a rate of

one over K squared, beating gradient descent.

Which is pretty astonishing given that

it's only using gradients, and it's not

going down the steepest descent direction.
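
For reference, here is the scheme in one of its standard written forms (the constants vary slightly across presentations):

    x_k = y_{k-1} - (1/L) \nabla f(y_{k-1})

    y_k = x_k + ((k - 1)/(k + 2)) (x_k - x_{k-1}),

with the guarantee f(x_k) - f(x^*) \le O(1/k^2).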

Moreover, this algorithm will occasionally go back uphill.

Which is amazing, if you're trying to get down this hill

as fast as possible, how can that possibly be?

So I think it's safe to say that a lot of people have worked

on this, that there's a lot of math, but there's not a huge

amount of understanding to this day about the phenomenon.

Where's it coming from, why is this the optimal algorithm,

why is it reaching the lower bound, and so on.

So I'm gonna argue that the best way, at least I think,

to understand it is to go into continuous time.

That just in discrete time you're

not gonna see the phenomenon.

So yes, in the last 40 years there's been all kinds of

accelerations of classical methods, they're all really

interesting, they're based on kind of the algebra

associated with Nesterov, but not really,

there's not an underlying generative principle,

if you will, for this class of methods.

Okay, so let's now go into continuous time.

So there is gradient flow at the top, it's just

a differential equation, it takes away and it

goes downhill, you don't need a step size.

If you discretize that equation,

you get out steepest descent.

Well it depends, there's two different ways,

several different ways to discretize it.

If you discretize it with the explicit Euler method

then you get out gradient descent.

If you discretize it with implicit Euler,

you get out a prox method.
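
Concretely, for gradient flow \dot{X} = -\nabla f(X) with step size \eta:

    x_{k+1} = x_k - \eta \nabla f(x_k)        (explicit Euler: gradient descent)

    x_{k+1} = x_k - \eta \nabla f(x_{k+1})    (implicit Euler: the proximal point method,
                                               x_{k+1} = argmin_x { f(x) + \|x - x_k\|^2 / (2\eta) })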

Now, that's cool, but also you can analyze the

convergence rate of this in continuous time.

Using the same kind of tools we did in

discrete time, actually a little easier.

And the convergence rate turns out to be one over T.

Maybe not a big surprise, 'cause gradient descent

was one over K, this is now one over T.

Sort of similar rate.

Okay, in a very nice paper, Su, Boyd, and Candes at Stanford

said, what if we take the Nesterov pair of equations

and do the same thing, take the step size to zero,

out will pop some differential equation,

and there is the differential equation that pops out,

so not surprisingly it's second order

because we had two equations.

It's non-homogeneous, and there's a mysterious three

sitting up there, and then you have the gradient.
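
That equation, as written in the Su, Boyd, and Candes paper, is

    \ddot{X}(t) + (3/t) \dot{X}(t) + \nabla f(X(t)) = 0,

with the 3/t damping coefficient being the mysterious three.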

So it looks kinda like a little bit of an

oscillatory thing driven by a gradient.

You can't solve it, but you can analyze it

using Bessel functions and understand

a little bit about how it moves around.

And so that was very nice, very interesting.

We looked at that and were inspired by this,

and said, is there an underlying principle,

where did that differential equation come from?

Why that differential equation, why is it the best one?

Is there an underlying generative principle?

So I think of this as much like Lagrange looking at

Newton's equations, F equals MA and says, okay,

that's a nice equation and it seems to work,

but where's it come from, why that equation?

Is there an underlying principle?

So here's our attempt to deliver such a principle.

So I've just written it down here, and I'll

take a couple minutes trying to unpack this.

So we call this a Bregman Lagrangian.

This is a function of the state space, x where you are,

we also have a derivative, x dot,

and it's also a function of time,

this will be all time varying.

It's got two main pieces to it,

is this part right here, and this part right here.

So let's focus first of all on this.

First of all, all these alphas, betas and gammas,

are just time scaling functions that are

degrees of freedom of the space of algorithms.

We don't just want one algorithm, we want a whole space,

and so we give ourselves this reasonable freedom.

They're not the important part.

The important part is these two pieces.

This piece right here is the function you're trying to go

downhill on, so the F, scaled in some way.

So that's kind of like a potential

energy, you wanna go down.

We also have a term here which is gonna be kinda like a

kinetic energy, which is now known as a Bregman divergence.

So this is a Bregman divergence

between a point x and an extrapolated point,

so what is a Bregman divergence?

It's this little equation here, if you want,

it's based on an auxiliary function, H,

and it measures distance between x and y using H.

So this picture might be better

than the equation for some of you.

If I wanna measure distance between x and y on this axis,

what I do is I put down a function H,

and I take the linear approximation to H at x,

and go out there and get a discrepancy between

the linear approximation and the actual function H,

and that red distance is the distance between

x and y using this auxiliary function H,

it's called the Bregman divergence between x and y.
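
In symbols, the standard definition, with H the auxiliary function:

    D_H(y, x) = H(y) - H(x) - \langle \nabla H(x), y - x \rangle.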

If H is just quadratic, this is a fancy way of writing down

the squared Euclidean distance between x and y,

it's a variational way of doing that.

But if H is logarithmic, you'd get a Kullback-Leibler

divergence and H can be all kinds of other things,

and has been studied widely in convex optimization.

Bregman, by the way, I think is Russian-Israeli,

I don't remember, so anyone happen to know?

Again, I like to make connections, the Russians

were behind a lot of this work that came out

in the 80s, for obvious kind of reasons.

So that's the Bregman divergence,

now how are we using the Bregman divergence?

In an interesting way, I think.

We're measuring, the Bregman divergence is measuring between the

point x and x plus a time-scaled derivative.

So it's a little bit of a, like elasticity, if you will.

How much energy are you paying if you move in the

direction that the momentum seems to be carrying you?

So we measure that, and let's call that a kinetic energy.

And in fact, if H is chosen to be quadratic,

this whole thing reduces to just one half x dot squared,

which is kinetic energy.

So this becomes literally a Lagrangian

that happens to be time varying.
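
For reference, the Bregman Lagrangian as it appears in that line of work (up to notational conventions) is

    L(x, \dot{x}, t) = e^{\alpha_t + \gamma_t} ( D_H(x + e^{-\alpha_t} \dot{x}, x) - e^{\beta_t} f(x) ),

with the Bregman divergence playing the kinetic-energy role and the scaled f the potential.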

So, now why do I think this is an

interesting object to be studying?

Well first of all, what do you do with Lagrangians?

You put them inside of an integral,

and then you optimize over the paths.

And that will give you the particular path

that we argue you should be following

if you're trying to optimize in the optimal way.

So let's go to that.

Here's what Lagrange told us how to do, we take our

Lagrangian and we put it inside of an integral,

and then we optimize over paths and maybe

we picked out this particular path, here.

And then the mathematics is called calculus of variations.

You just, in essence, set all partials as equal to zero,

and that's, in functional analysis,

called the Euler-Lagrange equations.

And so it just takes a bunch of partial derivatives

of this thing and a total time derivative.

And once you do that, you'll get out a differential equation

you'll get some xs and some x dots and some x double dots,

and so let's do that.

It's a two line little derivation, and so here you get out

this master differential equation.

It looks awful, maybe, to you, but I love it,

and so it looks beautiful to me.

It's got second order, it's got a non-homogeneous

component, and then here's a whole bunch of geometry.

Just, that's geometry to me.

And then there's the gradient.

If H is quadratic, if we pick alpha and beta

in a particular way as P log T,

and if, what else do we do...

Yeah, if we just do all that, then that differential

equation reduces to the Su, Boyd and Candes thing.

But moreover, this differential equation,

on particular settings of alpha, gamma, and H, and beta,

comes up with all the known, at least the acceleration

methods that I know of, in the literature,

their continuous time versions.

So it seems to be a generator of accelerated algorithms.

So that's already kind of cool that we were able

to unify them in one differential equation.

But now, what could you get more out of this?

What mathematical other facts can you learn

from having written all this stuff down?

Well, first thing you might wanna do is get a rate.

You'd like to know, can I get a general rate

once and for all, not having to do with algorithm

specific rates, which the literature has been doing

for all the discrete time algorithms.

And so those rates are (mumbles).

It's not that painful, but it's two or three

pages, typically, of calculation.

The amazing thing here is you get a rate

with just one line proof, for this entire family of things,

and I'm gonna show you that proof on the next page.

So here is the rate.

The rate is a big O of E to the minus beta T.

So beta T is that degree of freedom

that you have as a designer.

So here it's already kind of interesting,

you get to pick your rate.

So we're gonna return to that in a minute.

You pick the rate.

So anyway, it's this, and here's the proof.

You just write down what's called a Lyapunov function,

here's the Lyapunov function, involving the

Bregman divergence, and involving the optimum.

You take its first time derivative,

you ask that that be negative, so you go downhill

in this particular Lyapunov function, and this happens

if and only if this occurs, so it's a one line proof.
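
The Lyapunov function in question, in the form it appears in that line of work, is

    E_t = D_H(x + e^{-\alpha_t} \dot{x}, x^*) + e^{\beta_t} (f(x) - f(x^*)),

and showing \dot{E}_t \le 0 immediately gives f(x(t)) - f(x^*) \le O(e^{-\beta_t}).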

Ashia Wilson, who I mentioned earlier, has done another

couple of papers that are quite beautiful on other

Lyapunov functions, exploring the whole space of

Lyapunov functions, both in discrete and continuous time

and linking them, so that's a very nice little magnum opus.

Anyway, that's already kind of interesting,

looks like we've learned something about this.

But, can we learn anything more?

And then something really interesting happened,

and this where I think it finally starts to get

interesting, maybe it's been tedious til now, but anyway.

So let's suppose that I, so as I said,

I get to pick the rate, and that seems odd.

Because aren't they just kind of

given to us by the algorithm?

So let's suppose I pick a rate, which is,

I set beta T to be the logarithm of T.

And so when I take the exponential of minus that, I get one over T,

which is a rate I know I can achieve with gradient flow,

for example.

I know I can do that.

So anyway, I set beta T equal to that value, and then

there's a particular way to set alpha and gamma,

they're actually determined by that,

I'm not getting into that.

And then I set H to whatever.

And then I plug this in, I get actually gradient flow,

it just reduces to gradient flow.

Yes.

- [Male] Quick question.

Why do not just set it to infinity, and converge instantly?

- Good, you're asking the right question.

I'm gonna bring you into the conversation here in a moment.

So suppose I set it to logarithm

and I get this known algorithm, great.

Suppose that Sham, he's a little smarter than I, he says,

I'm gonna set it to two times the logarithm of T.

Now when I take the exponential,

he gets out a rate of one over T squared.

And we know there exists such an algorithm,

it's the accelerated version of Nesterov which does that.

Now here's the interesting part, in this space, this x,

x dot space, he and I will follow exactly the same path.

That's surprising.

What'll happen is that he'll move

faster along the path than I am.

And all that means is that he's using

a different clock than I'm using.

That's not interesting.

The path is what's interesting,

and we're following the same path.

So this Lagrangian has a mathematical

property, it's called covariant.

And so that means that for any of these choices

of beta, we're all following the same path.

So now let's suppose that Ed says, no, I'm gonna move

exponentially, guys, I'm gonna set beta T equal to T.

And so I'm moving exponentially fast.

And he will now design with his machinery,

his other Lagrangian equations, he'll get an algorithm,

and it'll move exponentially as fast.

He will, but it'll move along our same path.

So he's really not doing anything new,

he's just using a different clock.

And like Einstein said, the clock doesn't matter,

it's not the interesting part of this problem.

And our young man here in front, Pedro says,

I'm gonna use a super, infinitely fast clock,

and I'm gonna go as fast as...

And yes, you can, but you will move along

our same path, just using your weird clock, so.

So in continuous time, in some sense

the phenomenon of acceleration evaporates.

But that's not what really happened.

What happened is that acceleration is not really

all about acceleration, it's about the path.

There is an optimal path to follow for optimization.

And you wouldn't have seen that in discrete time.

All right, now, what's then interesting, and is

the whole story at this point, is how you discretize.

How do you go back into computer science land?

Because we know in computer science land, we can't

discretize at, Ed can't discretize his differential

equation, something's gonna break

with his exponentially fast thing.

Similarly, you won't be able to do it.

But we will be able to, 'cause we

picked these less ambitious rates.

Now why do we know that,

well, we just have these oracle bounds.

But, there must be some more mathematics

that actually tells us that.

So let me now move on to the next part of the story,

which to me is the most interesting.

So these are the mysteries about a year ago.

Why can't we just discretize when we are using

exponentially fast clocks, or super fast clocks?

What happens when we arrive at clock

speeds where we can discretize?

What happens in some mathematical sense?

Is there a signal, a flag that's been raised,

of something happening? And then once we

arrive at that point where we can

discretize, how do we discretize?

And so I'm gonna claim these are all solvable,

these problems, and the secret is a technique,

or not technique, a beautiful area of mathematics

called symplectic integration.

How many of you know what symplectic integration is?

Okay.

The MIT person over here knows.

Yeah, it's not widely appreciated, it's one of the major

areas of mathematics, I'm sure your mathematics department

has several symplectic geometers here.

When I put it in my abstracts nowadays

and go give talks, a few of these people show up.

And they usually have gray beards, and they're sitting there

in the back looking bored, but then they lighten up

when I get to this part of the talk (laughter).

All right, so what is symplectic integration,

I think I have a slide that says something.

Yes.

All right so, this was developed in the late 1800s

by some people that you will recognize, Hamilton,

Poincaré, Jacobi, a bunch of these kind of people.

What were they trying to do?

Well, at that era, they had a whole bunch of

interesting differential equations.

They had Newton's equations, but they had Maxwell's

equations, and a whole bunch of others, fluid equations.

And they were saying, they were very prescient,

they were saying, how could we discretize

these differential equations?

They didn't have computers yet, but somehow

they're already thinking ahead.

And they asked the following question, well,

the true flow of these differential equations

gives us certain paths that have

certain mathematical properties.

They, for example, conserve energy.

They conserve momentum.

There are physical laws, there are certain

invariances along these paths.

If I discretize, I'm gonna hop along a discrete collection

of points, and I'll have an approximation error, and that's fine,

we have to tolerate that.

But if I hop in a certain way that I'm losing momentum

or losing energy, I'm not doing the real physics anymore,

and that would be just wrong.

And they were very prescient, because now, if you were to

do that nowadays, with, say, simulating the galaxies,

the cosmos, or simulating the climate, or protein folding,

and you did it in the wrong way, you bleed away energy

and momentum and you just get bad answers, just bad physics.

So people don't do that.

Why, because these guys figured out

a way to do it the right way.

And the right way to do it is that there is a class

of methods that will exactly preserve energy and momentum

even as they're doing a discrete approximation

of the differential equation.

Those are called symplectic integrators.

Roughly speaking, well...

What is the idea, so I knew about this back at MIT,

I sat in some differential equations courses

when I was a young assistant professor.

And I heard about this a little bit

and I just kind of remembered it.

I remember the following idea, which is that

if you think about a classic differential equation,

it's a vector field, and you're

flowing along the vector field.

So at any given point, there's an arrow leaving that point,

and you follow that arrow for a little bit over time.

The symplectic perspective is different, at every given point

there's a little triple of vectors.

And that defines a little volume element, and if you take

the determinant of those vectors then you get a volume.

And so now, as that volume element

flows forward in the flow,

it's gonna shear in some way, and

I want its volume to stay constant.

And the volume is just a determinant, so

that's just a little piece of mathematics.

And so you work out an integrator, which as it moves along

discretely, preserves exactly these volumes.

And then you can generalize this to volumes in phase space,

and then to exterior forms in differential geometry.

In fact, this drove a lot of the early

development of differential geometry.
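
(In symbols, the property being described for the discrete update map \Phi_h, stated as a sketch rather than a quote from the talk: volume preservation means the Jacobian determinant is one, and full symplecticity is the stronger condition involving the canonical form J:)

    \left|\det \frac{\partial \Phi_h(z)}{\partial z}\right| = 1, \qquad \left(\frac{\partial \Phi_h}{\partial z}\right)^{\!\top} J \, \frac{\partial \Phi_h}{\partial z} = J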

Okay, now the actual algorithms turn out to be

kind of interesting, they're not that surprising.

So if you'll remember, when you discretize differential

equations, you'll take like maybe the explicit method,

which looks at where you currently are, and the

gradient where you are, and then it moves forward.

Another, the implicit approach, takes where you currently

are, looks at the gradient of where you're going to move to,

and uses that gradient.

And mathematically, this is

just looking one way or the other.

For the latter, you have to solve a little

implicit equation, but you can often do it.

Neither one of those is symplectic,

they will bleed away energy or momentum.

But if you interlace them in a particular way,

some terms will cancel, and you will exactly

conserve these things.
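
(A minimal sketch of that interlacing on a toy harmonic oscillator; this is illustrative code, not the speaker's, and the step size and horizon are made up. Explicit Euler uses the old state for both updates and its energy drifts; the interlaced, symplectic variant updates p first and then uses the new p, and its energy stays bounded.)

    def explicit_euler(q, p, h):
        # both updates read the old state: not symplectic, energy drifts up
        return q + h * p, p - h * q

    def symplectic_euler(q, p, h):
        # interlace: update p first, then use the *new* p to update q
        p_new = p - h * q
        return q + h * p_new, p_new

    def energy(q, p):
        # Hamiltonian of the unit harmonic oscillator
        return 0.5 * (p * p + q * q)

    q1, p1 = 1.0, 0.0
    q2, p2 = 1.0, 0.0
    h = 0.1
    for _ in range(10000):
        q1, p1 = explicit_euler(q1, p1, h)
        q2, p2 = symplectic_euler(q2, p2, h)
    print(energy(q1, p1))  # grows without bound as energy bleeds in
    print(energy(q2, p2))  # oscillates within O(h) of the initial 0.5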

So I remember learning about this, and then I learned

another interesting fact: because these symmetries are

being preserved, these little triples,

this has much better error control properties,

much less error is accrued as you move along

'cause it's kind of a stability thing,

a lot of things are canceling.

And so you can move much farther, the step size

can be turned up by a large factor.

And that's critical for these large scale applications.

So no one who does serious chemistry or physics nowadays

uses anything other than symplectic integrators,

at least to my understanding.

So, why did I think about looking into this a little bit?

Well, because, if you can move as fast as possible with

these integrators, that's what acceleration somehow is.

So it must be relevant.

I'm gonna now kinda cut to the chase a little bit.

There is a standard recipe for doing this,

you start with a Lagrangian and the Lagrangian's time

varying, we need to move into the Hamiltonian framework,

there's something called a Legendre transform,

which is you just do it, and you

move into Hamiltonian variables,

that's a bit of conjugate duality, if you will.

I guess this is Rockafellar's university,

so you all know about that.
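
(For the record, the Legendre transform in question is the standard convex-conjugate construction; sketching it in symbols:)

    H(x, p, t) = \sup_{v} \big[ \langle p, v \rangle - L(x, v, t) \big]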

And so we get a new set of variables that include time

as an explicit variable, you now have a Hamiltonian,

which is time invariant, and there's a standard approach

to applying symplectic integrators to that,

there's not just one symplectic integrator,

there's a whole family of them.

So we picked a particular one,

here's a slide of actually doing this.

So this is a particular high dimensional quadratic,

and so we write down the quadratic, that's the function F,

and we make these other choices that I mentioned, and

turn that into the Euler-Lagrange equations, or sorry,

turn that into the Hamiltonian, which turns into a symplectic

integrator, and we put that on the computer.

So we made no algorithmic choices of any kind,

the mathematics just drove us along,

and here is the actual convergence.

And so that is the oracle rate for this particular function,

and so it is doing as advertised.

Moreover, it's oscillating.

It's not just going downhill, it's

going up and down and up and down.

All right.

So that's really cool.

Okay, so let me spend the rest of my talk sort of,

I could go on about that, but we just wrote a paper

last week called 'On Symplectic Optimization'

which is on my homepage, so if you're curious

you can see some more details.

I also wanted to put a slide in here, which I forgot to,

which is that if you compare to Nesterov,

Nesterov in this thing also achieves the oracle rate,

and then near the bottom it actually

turns down a little bit faster.

And that's an interesting phenomenon

that's not part of the dynamics.

There's a couple of interesting mysteries there.

So in some sense, Nesterov's a little bit faster

than us if you set the step sizes equal.

But, this is a symplectic integrator,

you can turn up the step size that much more.

So when we do that, we really beat Nesterov.

So this algorithm actually beats Nesterov,

which is I think a real achievement.

Okay, so real briefly, a followup project, here.

Acceleration and saddle points.

So this has been an open problem.

As I'm moving near a saddle point,

this has put two ideas together now,

acceleration is this kind of momentum I start to build up

that allows me to get downhill as fast as possible.

If I've got a lot of momentum and there's a saddle point

sitting there, presumably that might help me.

I might be able to get past that saddle point,

blow by it, I don't see it as much.

But that's that intuition, as usual

you have to try to prove something, here.

It could be false, that intuition.

Anyway, Chi Jin and Praneeth Netrapalli

and I have worked together on this.

And so just really briefly, there has been some

existing work showing that, for example, perturbed

gradient descent finds, in this non-convex setting,

epsilon-second-order stationary points at a rate of one over epsilon squared.

There are some nested loop algorithms where you do

something like acceleration inside of another loop.

So if you know about interior point methods,

they kind of have a Newton inside of a nested loop,

and there are algorithms where that's reasonable.

In our field, these nested loops, I just don't

really believe in them, I think we really need

to have kind of just a simple algorithm.

So anyway, but faster rates have been achieved,

not only one over epsilon squared,

but one over epsilon to the 1.75, if

you allow yourself the nesting.

So we asked, can we do this just with basically

accelerated gradient descent, not being more fancy.

Same assumptions as before: a Hessian

that's Lipschitz and a gradient that's Lipschitz.

And the goal is the same as before,

we wanna find a second-order stationary point.

Here is our algorithm and the runtime,

I'm not gonna go through it, but basically

this is just accelerated gradient descent,

that's what steps four and five are right there.

This is now the same perturbation condition as before,

and this is something called negative curvature exploitation,

which I will mention in a minute.
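
(A schematic sketch of the loop structure being described; the function name, constants, and the stubbed-out NCE branch here are illustrative, not the paper's exact pseudocode.)

    import numpy as np

    def perturbed_agd(grad, x0, eta, theta, g_thresh, r, steps, seed=0):
        # schematic perturbed accelerated gradient descent (illustration only)
        rng = np.random.default_rng(seed)
        x_prev = x = np.asarray(x0, dtype=float)
        for _ in range(steps):
            if np.linalg.norm(grad(x)) <= g_thresh:
                # perturbation condition: near a first-order stationary
                # point, add a small random kick to escape saddles
                x = x + r * rng.standard_normal(x.shape)
            y = x + (1 - theta) * (x - x_prev)   # momentum (lookahead) step
            x_prev, x = x, y - eta * grad(y)     # gradient step at the lookahead
            # a full method would also include a negative-curvature-exploitation
            # (NCE) step here, resetting the momentum in certain situations
        return x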

So anyway, here is the result.

Now hopefully you're getting used to being theoreticians

and just looking at and understanding these results.

This is the convergence rate, it's not one over epsilon squared,

it's one over epsilon to the seven fourths.

So, da-da, proof that acceleration can

accelerate you past saddle points.

Moreover, we just have the Lipschitz constant and a rho up in

the top and the usual thing, and then we have that tilde

again, and that tilde here is logarithmic in D, again.

It's logarithm to the sixth power, now, and not the fourth,

but again, I think that's just the proof technique.
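
(Writing the claim in clean notation, with constants hedged; this is the shape of the bound, not a verbatim quote from the slide:)

    \tilde{O}\!\left( \frac{\ell^{1/2}\,\rho^{1/4}\,\big(f(x_0)-f^\star\big)}{\epsilon^{7/4}} \right) \text{ iterations, where } \tilde{O}(\cdot) \text{ hides polylogarithmic factors in } d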

Okay, I'll skip over that.

So how does this proof go?

Well, you have to use, it turns out, a Hamiltonian.

The Hamiltonian allows you to handle the fact that

in this non-convex setting,

you're not always doing a descent.

So with the Hamiltonian, if you just look

at the function value itself,

it's not descending, you're going up and down.

If you have the Hamiltonian, you're mostly going down.

It takes out a little bit of that, but there's

occasional times where you can still go back up.

And that's what that NCE step is designed to handle.

So it's a bit of a patch, if you will.

There may be a generic Hamiltonian here,

but this patch is what allows us to prove this theorem.

So I'm gonna skip over the details here.

All right, then two more minutes on

follow-on work. So, a lot of that work was several months ago;

what we're mostly doing nowadays is trying to ask, what if

we put all this sort of stuff together with stochastics?

Instead of just doing a perturbation condition,

what if we do real stochastic gradients, or real diffusions?

So if you work in discrete time,

those things are called stochastic gradient methods,

when you take them into continuous time, they're called

diffusions, or more generally, Markovian processes.

And they have rates of convergence,

they probably have optimality principles,

we don't know what they are yet.

And so there's a huge number of open problems about,

what's the fastest way to diffuse?

What's the fastest way to sample and diffuse?

And how does the dimension dependence go?

What about the non-convex case?

So I've got a large group now

and everybody's working on this.

And I know that people here are working on it, too,

there's a lot of people working on it.

This project was with one of the students, Xiang Cheng,

and another student, Niladri Chatterji, and then

Peter Bartlett joined us for this project.

So, can we accelerate a diffusion?

What is a diffusion, again, it's a

continuous time version of stochastic gradient.

So think Brownian motion with a gradient.

There have been some negative results.

But, they've focused on what's called overdamped diffusions.

So just briefly, the classical diffusion that people have

been analyzing here is called Langevin diffusion.

Langevin diffusion is just what you might expect,

it's the way you descend: it takes a

gradient and it adds Brownian motion.

So if you're a gradient loving person,

it's the first thing you would start with.
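
(Written out, at unit temperature, this is the standard overdamped Langevin stochastic differential equation:)

    dX_t = -\nabla f(X_t)\, dt + \sqrt{2}\, dB_t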

Okay, but, from all the work we've been talking about today,

we know that acceleration and getting momentum into these

things lets you move faster, so probably

we can diffuse faster if we allow momentum.

So instead of doing overdamped diffusion,

which makes you sluggish, allow yourself some underdamping,

so you can kinda hop around a little bit more.

And so you now get two equations,

and make an underdamped Langevin,

and you can do that analysis.
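
(One standard way to write the underdamped pair, with friction parameter \gamma; exact scalings vary from paper to paper:)

    dv_t = -\gamma v_t\, dt - \nabla f(x_t)\, dt + \sqrt{2\gamma}\, dB_t, \qquad dx_t = v_t\, dt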

So can we accelerate that, and I think

this is now actually my last slide.

Here are the results for overdamped Langevin, and this is

a very recent result, from mostly people in France

who have been working on this.

The convergence rate is one over epsilon squared,

and it has a factor of D in the numerator,

and our new result shows that with underdamped Langevin,

more of a momentum kind of method, you can go

all the way down to one over epsilon much faster,

and the dimension dependence improves to square root of D.
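
(Summarizing the two rates as stated, to reach accuracy \epsilon in an appropriate distance between distributions:)

    \text{overdamped: } \tilde{O}\!\left(\frac{d}{\epsilon^2}\right) \text{ steps}, \qquad \text{underdamped: } \tilde{O}\!\left(\frac{\sqrt{d}}{\epsilon}\right) \text{ steps}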

Okay, so I am done with everything I really wanted to say,

let me just now bump back up to the top level.

I hope you appreciate that there

are tons of open problems, here.

And if you have some math background, especially some

physics, but not just, there are just all kinds of things

you can do, and many many problems that are open.

And these are ones that I think need to be answered

if we're gonna have a serious engineering science

that gives some guarantees that says this will happen

under this situation, these error bars I'm getting are

calibrated, and blah blah blah blah blah.

We really kind of got stuck

because computer science and statistics,

and I think of myself as both,

really were very, very separate.

And the computer scientists gave

really strong guarantees with no noise.

And the statisticians gave guarantees but with

no kind of serious computational thinking.

No speed of convergence in any serious way.

And when you put the two together,

you can start to solve real world problems.

Thank you.

(applause)

- Thank you very much.

We have time for a couple of questions from the audience.

- All the way back in the back.

- Your thoughts on the continuous time approach,

what are your thoughts on the continuous time approach

for non-linear constraints, like a

problem with non-linear constraints?

- So a very good question about, he asked about

the constraints, and specifically, non-linear constraints.

So I talked almost entirely about unconstrained

problems here, as you may have noticed.

And it's another, one of my list of many open problems

to do a lot of these things with constrained optimization.

There has been a growth of kind of conditional gradient

ideas in the last few years, noticing that there are

really, really simple ways to impose constraints

that scale really well in dimension,

and so on my list of things to do is kinda

redo some of this literature, but with

conditional gradient instead of gradient.

And do it in continuous time.

In fact, in continuous time, you can have the constraint

not be just a hard wall that says you can't go past this,

you can have the constraint be a force

on your second order differential equation.

So you'll start to get there and the force pushes you back.

I think computationally that's gonna be an

actually better way to go for constraints

than the way we're used to in discrete time.

And non-linear, linear is not really the big issue here.

These tools really are a bit

agnostic to linear, non-linear.

- So ideally, you'd like to be neither overdamped

nor underdamped, right, because if you're underdamped,

you oscillate, and that slows it down.

Is there an optimum?

- No, you don't...

So, no.

You wanna oscillate.

The proof shows that the oscillation

is what gets you the fast speed.

It's not a perfectly...

- But does it give it the maximum speed?

(muffled speech)

- Critically damped is slower

than the oscillatory.

Yeah, it's not quite the same.

Your physics intuition helps, but it's not exactly right.

Anything else?

- Not to be basic, but have you compared this to

things like Adam, that optimize neural networks?

- Yeah, so he's asking about Adam,

a particular method applied to neural nets.

And maybe Sham could kinda help me, here,

but we're in the very early days of

optimizing really, really large problems

that are non-convex and stochastic, and all that.

And I think that Adam is just kinda one little step

along the way, it's a somewhat ad-hoc algorithm

for which there's not a lot of theory,

and there are theoretically well-grounded methods which are

roughly as good, or maybe even better,

and I think, at least in my experience,

the theoretically-motivated stuff wins out.

And either Adam will be shown to be,

the descent on some Hamiltonian...

- [Offscreen] I think you should just merge them together.

- You can...

Merge them together, Mike, okay.

There's a nice... (murmuring)

Merge them together, Republicans and Democrats

let's all come together. (laughter)

But no, it's a good point, there's not gonna be

one answer to everything, there's gonna be

various ingredients that all come together.

This particular algorithm has one particular insight,

this has another insight.

And a good user of these, it won't be

the computer just delivering answers.

It'll always be the human sitting in there thinking,

how do I put that ingredient together with this,

as a good engineer should be.

A good chemical engineer will build a chemical factory,

which didn't develop all by itself.

In fact, I should say I've written an op-ed on AI

over the last few days, and I'm gonna put it out there

at some point, and it's a little polemical.

I am unhappy with all the hype over AI,

and it says something like, you know,

when chemical engineering and civil engineering

were developed, people didn't say, the best way

to do this is to build an artificial chemist,

and then that chemist will know how to build the factory.

(laughter)

And for some reason that's how we're

thinking about what we're doing here.

Let's build an artificial general intelligence,

and it'll solve all our problems,

or kill us, one of the two.

Instead of just thinking about, what are the ingredients,

how do we as engineers put them together,

how do we use them for useful purposes, and so on.

Is that a good place to stop?

Well, one more.

- Your point of saying you're presenting mathematical

formulas and these equations for a computer scientist.

How and what kind of work are you doing in the area

of knowledge representation, where you take a problem

and you represent it in a way that computer scientists

can sort of begin to understand it better.

- I love your question, it was about knowledge

representation, computer science and math.

Some part of it I love, and some part I don't.

Computer scientists should be just as aware

of uncertainty and dynamics and

optimization as everybody else.

And don't call yourself computer scientist

and wall yourself off from that, please.

But also, all the other people out there

should be aware of knowledge representation.

And ontologies, and relational things,

and first order things, and reasoning.

And, I think so many people in the audience

know this, I love natural language processing, which I

think is a field that brings all these things together.

And actually someone asked me if I had a lot of money what

I would do with it, I'd spend it on solving NLP problems.

'Cause it does, you have to think about representation,

and don't get me started, but here they are,

they're loving me again, so. (laughter)

All right, get me a beer afterwards.

So just to say this even more provocatively,

even though it's being filmed,

I do not believe that neural nets

just with huge amounts of data

will solve serious problems like

machine language translation or dialog.

Period. Ever.

They will make useful artifacts that

we can use for some simple purposes.

But they won't, and it's gonna be humans, in the loop,

thinking about human sides of these things,

including knowledge representation.

But, you know, it's been 40, 60 years of thinking about

knowledge representation, and it

didn't get that far, that fast.

So it takes time, it's hard.

And it requires annotations, it

requires knowing why you're annotating what.

But don't read the media too much about a lot of this stuff,

this could take decades to do this field well.

And for people thinking that oh, five years has gone by

and we don't have knowledge representation solving

all the NLP problems, or, it's just,

that is such a hopelessly bad way to think.

- Great.

And with that, let's please (laughter)

thank our speaker. - Thank you.

(applause)

For more information >> Taskar Memorial Lecture 2018: M. Jordan (UC, Berkeley) - Duration: 1:02:02.

-------------------------------------------

VocoPro ProjectorOke - Duration: 1:20.

The ProjectorOke is a compact, lightweight karaoke player with a built-in LED projector.

The projector can swivel and tilt to project an image as large as 10 feet.

Turn the focus ring to bring the image into focus.

The disc player will play CDs, CD+Gs and MP3+G discs.

Or, you can play songs from a USB flash drive, or from a device using the one-eighth-inch aux

input jack, or connect your Bluetooth device.

The speakers have LED lights that will dance along with the music.

You can connect up to two wired mics with quarter-inch plugs, and apply an echo effect

to the mic audio, and each mic channel has its own volume control.

When not in use, they can be stored in the folding mic holders.

The AVC Automatic Voice Control will automatically switch off the vocal track of a multiplex

karaoke disc when a singer sings into the mic, and will switch the vocals back on when

the mic is not in use.

If you want to use an additional TV along with the projector, or just want to use a

tv instead of the projector, you can use the RCA AV outputs to connect to a TV.

The projectorOke is a fun addition for anyone looking for a compact versatile karaoke player.

For more information about this and other vocopro products, please visit vocopro.com.

For more information >> VocoPro ProjectorOke - Duration: 1:20.

-------------------------------------------

Come si fa a cacciare con il falco? - Duration: 2:18.

For more information >> Come si fa a cacciare con il falco? - Duration: 2:18.

-------------------------------------------

Come sono addestrati i cani antidroga? - Duration: 2:02.

For more information >> Come sono addestrati i cani antidroga? - Duration: 2:02.

-------------------------------------------

Sonic Mania Plus - Trailer | Switch - Duration: 1:38.

For more information >> Sonic Mania Plus - Trailer | Switch - Duration: 1:38.

-------------------------------------------

★★HiyamaXAoi★★ 「AMV」CADMIUM | Melody Ft. Jon Becker | - Duration: 4:25.

For more information >> ★★HiyamaXAoi★★ 「AMV」CADMIUM | Melody Ft. Jon Becker | - Duration: 4:25.

-------------------------------------------

Dinosaur Event Near You

For more information >> Dinosaur Event Near You

-------------------------------------------

First Ten Lives of Saturn I - Duration: 28:01.

This is the beginning of a voyage only fantasized when most of us were born, a voyage of exploration

into space.

Saturn, sailing down river to a Florida cape, a chapter of a legend in the making even now,

a legend being made by many dedicated Americans in all businesses and professions by the National

Aeronautics and Space Administration, the Marshall Space Flight Center, other NASA centers,

and American industry, such as the Chrysler Corporation, manufacturer of the Saturn I

first stage, and the Douglas Aircraft Company, who developed and built the second stage.

This structure, the vertical assembly building, stands on an island off a Florida cape once

called Canaveral, but since renamed for John Fitzgerald Kennedy.

"I believe that this nation should commit itself to achieving the goal before this decade

is out of landing a man on the Moon and returning him safely to the Earth."

Launch Complex 39 at Kennedy Space Center is one of many launch sites developed, built,

and used by NASA to launch manned and unmanned peaceful explorations of space, a complicated

business which presents scientific and engineering problems unheard of a few years ago.

The liftoff of Saturn V and later rocket-launched journeys into space from this launch pad will

one day seem almost commonplace events to some who are living even now.

This seems strange, considering how little time has passed since we began the rocket

research and development program that will make these journeys possible.

On launch Complex 34 at Cape Kennedy, not far from the Apollo Saturn V spaceport, the

first Saturn I countdown began not too long ago on October 27, 1961.

Ten unmanned flights have been made by Saturn I launch vehicles.

Each of Saturn I's ten missions took another cautious, necessary step toward extensive

manned space exploration, not just to the Moon, but even deeper into space.

Only sixteen years before, the research and development program responsible for Saturn

I's success was just beginning in this country.

And so before the drama of the first Saturn launch begins, it might be well to understand

its full significance, to see how deeply the roots of Saturn lie in the history of rocket

flight.

In these early years, there were several groups doing research to develop useful rocket vehicles.

Even as today, the information gained was freely exchanged, rapidly expanding the base

of our knowledge, developing new technology, creating experience and skills from which

we might ultimately move to space flight.

From one of these groups came the first true spaceflight in the world.

On February 24, 1949, a WAC Corporal placed a payload outside of the Earth's atmosphere.

As these groups continued to expand this country's rocket technology, more room was required

and rocket researchers began to spread over the United States.

In 1950, the Army research group moved to Redstone Arsenal, Huntsville, Alabama.

Here, the successful Redstone and Jupiter missiles were developed by the United States

Army.

The Redstone and Jupiter, the grandfather and father of the Saturn, for during their

development, the scientific base and technological know-how necessary for Saturn I were obtained.

In 1957, this knowledge was already being used as men were sketching a vehicle, designed

from the ground up, for lifting the tons of payload required for scientific space exploration.

Predicting the need for a booster with a million and a half pounds of thrust, these rocket

scientists said this need could be met with a cluster of existing Redstone and Jupiter

tanks and Jupiter rocket engines, plus many hours of engineering work to tie them together.

In the fall of that year, the Soviets launched Sputniks I and II.

Long-range projects were temporarily shelved.

On January 31, 1958, only six weeks after Sputnik II, the first American satellite,

Explorer I, was placed in orbit by Juno I, a modified Redstone with a thrust of 83,000

pounds.

Its payload discovered the Van Allen Radiation Belt.

With our own capability in space an established fact, on July 26, 1958, ARPA, the Advanced

Research Projects Agency, approved the program for a clustered engine booster with 1.5 million

pounds of thrust.

A single engine powerful enough to produce this thrust was under development by American

industry in 1958, but neither this engine nor its tanks would be available in time.

Besides, with eight well-proven, properly clustered smaller engines, the failure of

one would not mean the failure of the booster to lift its heavy load.

The clustering of already designed Jupiter and Redstone hardware would save both time

and money as we moved ahead.

On August 15, 1958, the Saturn Project was assigned to the experienced group that had

developed Redstone, Jupiter-C, Juno I, and launched the Explorer satellites.

It was a team geared to conduct its own research, development, and production.

Its highly successful products were literally custom-made in its own laboratories, shops,

and offices, or produced to specification by a relatively few major industrial contractors.

Where only pounds had been lifted before, tons of weight would be blasted into orbit

to explore space, in which we had only begun to pioneer.

The lightweights were having their day.

Soon, it would be time for the heavyweights.

It was time for Saturn I, and in July 1958, NASA was born.

Six months later, the Saturn Project, along with other Army space-related missions became

part of the NASA program.

The size and complexity of Saturn I, the size of the booster alone, and the complexity of

its components presented an enormous technical and scientific challenge.

Another challenge would be presented by the larger complexity of the management and organization

required to sustain a number of other space projects already underway in 1958, and yet,

move on beyond the frontiers of science and engineering.

To produce a series of huge Saturn I rocket boosters in a mere handful of years and to

ensure success would create a revolution in both management and engineering development.

Early in 1959, it was decided to use the liquid propellant H-1 engine, an improved version

of the S-3D Jupiter and Thor engine developed by Rocketdyne, as the basic element in the

Saturn engine cluster.

Fabrication and assembly of a ground test first stage to prove out the cluster tank

and engine concept was started in mid-1959.

Translating this concept into a working vehicle required proving out new ideas, for there

were many engineers and scientists who had doubts that eight rocket engines could be

fired at the same time in close proximity or that nine propellant tanks could be fastened

together so they would not shake apart during flight.

The second stage, designed and developed by the Douglas Aircraft Company, would be powered

by six liquid hydrogen-oxygen engines.

Selected for its greater thrust per pound of fuel, the super cold liquid hydrogen would

require NASA and industry to develop a new technology.

Much is always made of space technology, the hardware is glamorous, but little is known

of the management program that makes a number of complex projects simultaneously possible.

Three main projects ran like nerves through the Saturn Program, development of the launch

vehicle, development of ground support equipment, and the development of launch facilities.

Each demanded sustained and intensive effort.

Each was a separate project, yet interrelated with the other two.

Through 1960 and into 1961, the Saturn Program continued, developing not only the first flight

vehicle, its engines, equipment and instrumentation, its plumbing, and its circuitry, but also

providing for support and launch facilities.

While the booster was being developed, modification began to a static test stand at Huntsville,

Alabama to captive fire the first stage.

It took a year to convert the 177 foot high structure.

When it was finished, a mockup was installed to check the fit and test out methods for

servicing the booster during static tests.

These static tests began in March 1960, first with two,

then with four,

finally, many weeks later, with all eight engines running.

Then came another change in the management of the Saturn I Program.

In September 1960, that part of Redstone Arsenal transferred to NASA was formally dedicated

as the George C. Marshall Space Flight Center.

President Eisenhower had long ago stated his belief that outer space should be used for

peaceful purposes.

Aeronautical and space scientific activities sponsored by the United States should be,

he said, under the direction of a civilian agency.

Now this was an accomplished fact, and the president had this to say, "We are propelled

in these efforts by ingenuity and industry, by courage to overcome disappointment and

failure, by free-ranging imagination, by insistence upon excellence.

In this fact is proof again that hard work, toughness of spirit, and self-reliant enterprise

are not mere catchwords in an era dead and gone.

They remain the imperatives for the fulfillment of America's dream, not pushbuttons, nor

electronic devices, but superlative human qualities have brought success and fame to

this place."

In that fall of 1960, the planning and construction continued, new test stands at Marshall and

at contractor's facilities, launch and ground support construction at the Cape.

Meanwhile, transporters to haul the bulky boosters from place to place, aircraft to

ferry the bulging second stage and other cargo from coast to coast, barges to float the first

stage down the river and through the Gulf, these and thousands of other items large and

small had to be provided for Saturn I.

With so much happening, new information flowed almost constantly into the Saturn project.

New technical fields were explored, new techniques devised for assembly, fabrication, and other

processes, new decisions were made.

In early 1961, the Saturn I prime mission was changed from launching scientific satellites

to supporting the growing manned spaceflight program.

The first stage, therefore, had to be redesigned to include fins for additional stability,

larger propellant tanks and uprated engines for heavier payloads, payloads of about ten

tons as compared to the twenty pounds of Explorer I. Guidance and control instrumentation was

also moved at this time from the separate stages and combined into a single instrument

unit.

As new technology advancements were made and as scientific breakthroughs were achieved,

change was a constant factor all through the years of Saturn I development and flight,

for these were the years in which valves, pumps, tubing, turbines, tanks, engines, gyroscopes,

accelerometers, and computers all adapted to the rocket environments were developed

and produced.

This created a vast array of instruments, pressure and strain gauges, transducers, telemeters,

beacons and recorders.

The combination of high stress, heat, and vibration demanded creating new alloys of

steel and aluminum, increased the use of other metals, beryllium, titanium, molybdenum, and

tungsten.

The same physical properties making these metals suitable for Saturn also required development

of new methods of working them.

So, new tools based on the latest engineering theory and scientific discoveries were developed.

Magnetic forming,

chemical milling,

electron beam welding, to name a few.

Ground service equipment was developed and produced for test stands and launch sites

springing up like wildcat wells.

Liquid fuels, some commonplace, but others once considered rare and exotic chemicals,

were produced by the ton: kerosene, liquid oxygen, and hydrogen.

Liquid hydrogen technology, the development of techniques for the manufacture, delivery,

storage, and use of the super cold liquid hydrogen, with its minus 423 degree boiling

point, reached a scale never attempted before.

The engineering and research lay on the edge of the state of the art throughout the program,

and the skills involved were more diversified than usual.

No single government agency, or even the entire government, could command all the knowledge

and skills needed.

Rather, a combination of government, educational institutions, and private industry was needed.

This led to increasing industrial support for the Saturn I program.

More and more corporate names began to appear on tools and equipment, on containers, on

small parts and components developed in industrial plants scattered across the United States.

More and more names appeared on office doors and building walls, on the smocks of engineers

and technicians.

At the same time, there was another shift in Marshall's philosophy of vehicle development.

A shift in the basic concept of testing vehicles in flight to the idea of testing not only

all components, but the vehicles themselves in ground test facilities, even though these

also had to be designed and built.

The reasons for the shift from flight to ground testing were mainly economic.

Flight testing required a large number of expendable flight vehicles.

The Jupiter Program alone had involved almost fifty flight test vehicles.

The high cost of each Saturn I set a definite limit on the number of these that could be

launched simply for test purposes.

Everything happened everywhere at once.

In Florida, machinists formed parts for a liquid hydrogen rocket engine,

While in Tennessee, wind tunnel blasts roared past a scale model of the Saturn vehicle,

While in California, the dome of a Saturn fuel tank turned on a welding fixture,

And in Huntsville, at the same time, a full-sized Saturn first stage shook in its test stand.

The use of mockups, special stages, flight stages, computers, special test stands, and

other ground test facilities reached the point in the Saturn I program where the success

factor for any given vehicle was exceptionally high.

Success in flight was almost guaranteed, in fact, by the continuous growing and cumulative

effect of testing on the ground.

Research, design, develop, and test, assemble and test and modify, test, reassemble and

test, and mate and test and modify and test again, this was the pattern resulting in Saturn's

reliability, a pattern that flowed through months and even years of constant change and

growth and new research that often began the cycle over again.

Finally, not somehow, but as the result of management grounded in science and technical

competence, the reliable parts, systems, and stages of Saturn I were joined.

In August 1961, the first Saturn flight vehicle was ready for shipment from Marshall to the

Cape.

Its stages and payload body moved from Marshall shops to docking facilities on the Tennessee

River for loading aboard the barge Palaemon.

On August 5, the barge began its slow trip down river, through the Gulf of Mexico, and

around the Florida peninsula to the Cape.

There, the stages were unloaded.

Assembly of the first flight vehicle began in the middle of August, towering 162 feet

on its launch pedestal at Launch Complex 34.

Loaded with propellants, its weight on liftoff totaled 460 tons.

Before that happened, however, other matters were attended to.

On September 13, 1961, contracts were let for the Army Corps of Engineers to build Launch

Complex 37, an additional Saturn I launch complex.

At Marshall, designs were being drawn of bigger and more advanced Saturn vehicles to launch

manned explorations of the Moon and deeper space.

In October, NASA acquired Michoud Ordnance Plant at New Orleans.

Here in this manufacturing facility, Chrysler would build the last two Saturn I first stages,

and Chrysler and other NASA stage contractors would build first stages for more advanced

Saturn vehicles.

Out west, an upper stage manufacturing and assembly site for advanced Saturn vehicles

was planned for Huntington Beach, California by Douglas Aircraft Corporation.

Such planning extended into the years ahead.

With the launch of the first Saturn vehicle from Launch Complex 34 on October 27, 1961,

the Saturn I and advanced Saturn flight programs were only beginning.

The first Saturn launch was almost flawless.

None of the ten flights presented significant problems.

The prime mission of the first four Saturn flights was to verify first stage design and

prove out launch facilities.

There were some distinctive differences, however, between these early flights.

A secondary mission was assigned to SA-2 and -3, Project Highwater.

Ninety-five tons of water ballasting the dummy second stage was deliberately exploded in

space to see what effect this would have on the ionosphere.

The explosion caused only a temporary disturbance.

SA-3 also carried the first full propellant load of 750,000 pounds.

A secondary mission for SA-4 was the testing of its engine-out capability.

One engine was cut off in flight to see if the vehicle would operate on the remaining

seven.

This it did, scoring the fourth Saturn I success.

On January 29, 1964, NASA launched the fifth Saturn I.

The liquid-hydrogen-fueled second stage, flight-tested for the first time, functioned perfectly.

First stage engines shut off as planned.

Onboard cameras recorded second stage separation and ignition.

The stage's engines burned for eight minutes; with attached instrument unit and sand-filled

nosecone, it attained orbit as an Earth satellite.

This almost nineteen ton satellite was the heaviest ever orbited by the free world.

SA-6, launched five months later on May 28, 1964, was the first to put into orbit an Apollo

boilerplate command and propulsion module.

The Apollo spacecraft is being developed by the Manned Spacecraft Center, Houston, Texas.

There was another first for SA-6, activation of a new guidance system. This system, developed

at Marshall, was unique and complex, fully exploiting the state of the art.

It was designed to constantly monitor the flight path and compute completely new trajectories

as the flight progresses.

This guidance system, active for the first time on the sixth flight, successfully corrected

an unexpected deviation from the planned trajectory caused by the premature shutdown of one engine.

As a result of this kind of excellent performance, SA-6 was the last Saturn I R&D vehicle instead

of SA-10, as originally scheduled.

The rest will be operational vehicles.

On September 8, 1964, SA-7 launched the second Apollo boilerplate spacecraft into orbit,

testing its structure and design and demonstrating its compatibility with the launch vehicle.

All other major test objectives of this flight were also met.

SA-9 placed in orbit the first Pegasus, a meteoroid technology satellite, on February

16, 1965.

Pegasus, developed for NASA's Office of Advanced Research and Technology by Fairchild-Hiller

Corporation under the management of Marshall, was shrouded by the Apollo boilerplate command

and propulsion modules.

Onboard television recorded shroud ejection after second stage shutdown.

Then, Pegasus extended its 100-foot aluminum-plated electronic wings and, like a moth emerging

from its cocoon, moved on, its wings spread to measure the hazards of meteoroids in space.

Data from Pegasus confirmed that the Apollo spacecraft design is adequate to protect astronauts

from meteoroids.

SA-8, launched at night, orbited the second Pegasus satellite.

SA-10, the final Saturn I flight vehicle, launched Pegasus 3.

The ten vehicle Saturn I Program was without precedent in the history of spaceflight.

Each vehicle of the program established new levels of rocket reliability far above those

of sixteen years earlier.

A program providing the foundation for the follow-on advanced Saturn programs.

The uprated Saturn I

and Saturn V.

That's the story of the first ten lives of Saturn I, each one an essential step in

the development of large launch vehicle science and technology, each flight a vital step toward

the future of man's extensive exploration and utilization of the universe for the benefit

of the entire human race.

For more information >> First Ten Lives of Saturn I - Duration: 28:01.

-------------------------------------------

Citroën Grand C4 Picasso 1.6 E-HDI LIGNE BUSINESS 7P Luxe uitvoering / Diesel / Automaat Staat in H - Duration: 1:12.

For more information >> Citroën Grand C4 Picasso 1.6 E-HDI LIGNE BUSINESS 7P Luxe uitvoering / Diesel / Automaat Staat in H - Duration: 1:12.

-------------------------------------------

Justin Bieber vs Lionel Messi | Transformation From 1 To 30 Years Old -2018 | russia cup 2018 live - Duration: 5:42.


For more information >> Justin Bieber vs Lionel Messi | Transformation From 1 To 30 Years Old -2018 | russia cup 2018 live - Duration: 5:42.

-------------------------------------------

Cristiano Ronaldo vs Neymar Jr | Transformation From 1 To 33 Years Old -1028 | russia world cup 2018 - Duration: 4:30.


For more information >> Cristiano Ronaldo vs Neymar Jr | Transformation From 1 To 33 Years Old -1028 | russia world cup 2018 - Duration: 4:30.

-------------------------------------------

Michael Jackson vs Chris Brown | Transformation From 1 To 50 Years Old - 2018 | chris brown 2018 - Duration: 7:33.


For more information >> Michael Jackson vs Chris Brown | Transformation From 1 To 50 Years Old - 2018 | chris brown 2018 - Duration: 7:33.

-------------------------------------------

[FREE] J Cole Type Beat 2018 - ''Drama'' | K.O.D l Instrumental - Duration: 3:31.

For more information >> [FREE] J Cole Type Beat 2018 - ''Drama'' | K.O.D l Instrumental - Duration: 3:31.

-------------------------------------------

The Universe Planets - Death of the Milky Way - Duration: 42:30.

For more information >> The Universe Planets - Death of the Milky Way - Duration: 42:30.

-------------------------------------------

J'aime Disney - Duration: 0:32.

For more information >> J'aime Disney - Duration: 0:32.

-------------------------------------------

Mercedes-Benz SLK-Klasse SLK200 K. Automaat LEDER AIRSCARF STOELVERW. CRUISE '07 - Duration: 1:12.

For more information >> Mercedes-Benz SLK-Klasse SLK200 K. Automaat LEDER AIRSCARF STOELVERW. CRUISE '07 - Duration: 1:12.

-------------------------------------------

Soy Luna 3 | The guys ask Matteo to sing with them at the Open (ep.9) (Eng. subs) - Duration: 1:49.

Very good.

That sounded really good.

Thanks for making me feel part of a band, although it was just for a minute.

What would happen if it wasn't just for a moment?

What are you saying?

I was backstage and I heard you.

I imagine we can play with the four of us at the Open Music.

With a special guest in the Roller Band.

The Roller Band featuring Matteo.

So you're asking me to sing with you at the Open?

It seems like a really good idea.

We only have to know what you say.

I say yes.

Me too.

We all agree, we just have to know

what you think.

Will you sing with us?

For more information >> Soy Luna 3 | The guys ask Matteo to sing with them at the Open (ep.9) (Eng. subs) - Duration: 1:49.

-------------------------------------------

Selectmen 26apr18 - Duration: 36:18.

For more information >> Selectmen 26apr18 - Duration: 36:18.

-------------------------------------------

Will - Back to the Future! - Duration: 7:13.

Hi everyone! Today we are going to talk about how to use "will" in our sentences.

So let's learn when we should use "will" at first.

1)We can use "will" when we are giving or asking information about the future.

Here I have two examples.

As you can see, we are giving information about the future!

1) We can use "will" when we are giving or asking for information about the future.

For example, you might make a quick decision and say "Wait! I will help you." after you see a lost child.

3) We can use "will" when we are making predictions about the future.

We are predicting that!

4) We can use "will" to offer to do something.

5) We can use "will" to agree to do something.

6) We can use "will" to promise to do something.

7) We can use "will" to make requests.

8) We can use "will" to refuse to do something.

When we want to refuse to do something,

we can say that WE WILL NOT do that thing.

"Will not" is shortened as "won't".

So "won't" is the shortened form of will not,

and the negative form of "will".

As you can see, sentences with "will" can be used for many reasons!

So it's really important to know how to build sentences by using "will".

So, here are our pronouns.

In your sentences you can write will right after the pronouns.

It's usage is not different for each pronoun.

It's always will.

For she, you can write she will.

For I, you can write I will.

For they, you can write they will.

See? It doesn't change.

After that, you should continue with a verb.

If you want to ask questions by using "will", you can do it like this:

"will" can be the first word of your sentence, followed by a pronoun.

Let's see some examples.

"Will" is the first word,

followed by he, which is a pronoun,

and then prepare, which is a verb.

"Will" is again the first word,

and then it, which is a pronoun,

and then rain, which is a verb.

"Will" is the first word,

she is a pronoun,

accept is a verb.

"Will" is the first word,

they is a pronoun,

and then be, which is a verb.

Or, you might want sentences with specific words that are making questions, like

who, what, when, where...

Let's see some examples.

So as you can see, when you are using where or when,

it will be directly followed by "will"

and then the pronoun,

and then the verb.

But when you are creating a question with "who", it will be followed by "will"

and then the verb,

because this question is ASKING FOR the pronoun!

If you want to use "will not", you can simply write "won't". After that, there should be a verb.

Sleep is the verb here.

Let is the verb.

Care is the verb.

Now we know how to build a sentence by using will.

Let's learn about the contractions as well.

Instead of writing I will, you can write I'll.

you will can be shortened like this, and so on.

While writing a negative sentence, there is a point that you should not miss:

If you write,

this will be correct.

However, if you write:

this will be incorrect.

"Not" should come right after "will" in case of a negative sentence. Please pay attention to this.

I hope this topic is more understandable for you now. Thanks for watching my video!

For more information >> Will - Back to the Future! - Duration: 7:13.

-------------------------------------------

UPSTAIRS INFERNO (2015) - Trailer [HD] (www.UpstairsInferno.com) - Duration: 2:27.

The Up Stairs was my safe haven.

It was the one I loved most of all.

It was the usual Beer Bust:

Everybody's laughin' and cuttin' up and talkin',

and you know, and dancin'.

All of a sudden, you heard people screaming.

From laughter to screaming.

For years, it was the "Worst Mass Murder of Gays and Lesbians in American History".

I could see the glow of the flames,

and all of a sudden, the fire just ripped across the Up Stairs Lounge.

Bodies burned so bad that there was

nothing left but the bones and they were seared to the floor.

I saw the Reverend's body hanging halfway out the window.

I knew him...

I never got to grieve.

On the street after the fire,

it was like just chaos!

The smell of a burning building.

The smell of burning flesh.

And I just kept looking through the crowd, hoping to see Reggie and Adam.

Everybody was trying to find out who had made it out,

who had perished.

I didn't just lose my lover, I lost 32 of my best friends too.

We were damaged emotionally.

I tried to sleep at night.

I'd wake up with nightmares and see those people burning.

I kept expecting him to come through the door.

I'd get up and fix breakfast and

lay his clothes out for work and everything, like he was going to get up and go to work.

I don't think I ever felt loneliness in my life until that point.

New Orleans has robbed me of my innocence and my childhood!

I would tell God, "thank you! You saved my life."

"Thank you for not letting me burn up."

And I'd say, "but next time, just let me die."

It shows what can happen through hate.

Somebody was angry and full of hate and he took it out on everybody in there.
