I am the new King
I already beheaded King Luceo
and your Father
Adolfo! You'll pay for this
and your beloved Laura
has agreed to exchange vows with me
Now your best days are behind you
-(Borj) Let me help you -(Roni) Really?
-------------------------------------------
G-Mik Season 3: Basty and Borj make amends - Duration: 1:21.
You know guys, no matter how long I can endure this
I think I need to talk to both of you
Please, before the play starts, I hope you guys are on good terms by then
You know, Roni that's a piece of cake
I want Basty to know that I have no intentions of hurting Roni
and I'm your friend
You know, dude whoever that may be, I will always be ready to help
I know that I became impulsive
It's nothing Dude
I feel ashamed of what I did to you
I feel ashamed as well
I also made a mistake
I forgot to tell the Director about what you should do
I'm sorry bro
It's ok dude
-------------------------------------------
G-Mik Season 3: Borj and Roni (#4) - Duration: 0:59.
Did you know that I've been looking for you all over the place
So that we can rehearse
I grabbed some newspapers so that I can help Tonsy with his business
But he got exhausted
Me and Missy also helped Tonsy with his savings
by selling those newspapers
You could've told us about it so we can help you
Don't worry, I have extra money with me so
I'll help you
Really?
Let me bring the bag
Sure thing
Can you carry it?
-------------------------------------------
G-Mik Season 3: Basty interrupts the rehearsal - Duration: 1:05.
Ok, Roni this is a scene where Adolfo will take advantage of you
Maybe I can't take it anymore if a villain shows up here, along with your relatives, and protests
Are you ok with it?
Now do well in this rehearsal
You won't escape this time
No one will hear you scream
You traitor!
You're mine now, Laura!
Let go of me
What are you doing?
You're not even in this scene, what are you doing? Do you even know how to read a script?
You're crazy!
Get down!
I said get down!
What's with you?
Let's rehearse again
-------------------------------------------
Mazda 3 Skyactive-G 120pk Aut navi/cruise/clima! - Duration: 1:05.
UW Allen School Colloquium: Research in Data Management - Duration: 55:40.
DAN SUCIU: Hello.
Welcome everyone to the computer science colloquium.
Today it's my great pleasure to introduce six speakers
from the Database Group who will tell you
about their projects in data management.
So in case you haven't noticed this,
big data is changing the world today.
It helps to make scientific discoveries.
It helps cars drive themselves.
It helps save the environment, and it helps make discoveries
in the medical field.
So the students in our group conduct research
that addresses the new challenges in big data
management.
We have a secret sauce, and the secret sauce
that exists in databases and data management in general
is a separation of the what from the how.
What is a declarative level.
This is a logic, theoretical level
where we express how we want the data to be managed and stored
and queried.
The how is a physical systems level, algorithmic level
where we design the effective methods for storing
and managing the big data.
So today you will hear six talks from students and postdocs
in our group.
The first four talks are more at the border
between the what and the how.
This is the beautiful stuff where
the two extremes are combined.
The last two talks are about the what.
They show you the power that we get from a high level
declarative approach to data management.
I should mention that Babak Salimi has prepared
a video because he's
stuck in Canada without a visa.
He's waiting for his visa renewal.
And I will also ask you to withhold your questions
until the end of these talks.
So the first talk will be by Brandon Haynes on LightDB.
BRANDON HAYNES: Hey, everyone.
So I'm Brandon Haynes.
Today I'll be talking about LightDB, a data management
system that targets virtual, augmented, and mixed reality
video applications.
This is work in conjunction with Amrita
in the architecture lab, Armin, who's
over at Oculus, along with Magda, Luis, and Alvin
who are all faculty here at CSE.
So you might be wondering why I didn't give a talk at the VR
session a couple of weeks ago.
It's actually not the case that I was kicked out
of the graphics lab, but we conceptualize these sorts
of video applications as being increasingly a data management
problem.
So we're seeing a lot of really exciting things come out
of these communities, and they're actually
being delivered to real users.
So we need to think about things like scaling,
being able to work on heterogeneous hardware,
and other aspects that we in the data management community
are really good at solving.
So there's a whole taxonomy of these sorts of VR and AR style
applications out there.
You know, this is kind of a very rough taxonomy
I have on the screen here where we have--
hopefully you guys are familiar with a couple of these types.
I'll just quickly talk about each one
in a little bit of detail.
So here we have a 360 video.
These are videos that are captured
at a fixed point in space, and then
when you go and view that video you can, of course,
adjust your orientation.
But you can't move away from the point in which that video was
captured.
And you can see here I'm using the mouse
to rotate my orientation, but in the real world,
we would have a headset on which would have tracking software
with it that would adjust our orientation automatically.
So these are all over the place, and these are pretty widely
deployed, even on mobile devices and things like that now.
But far more exciting are these things
called light field videos, a really exciting area
at the cutting edge of VR.
And here, it's a little bit difficult to tell,
but you can actually translate around in space.
And so you can see the user just moved
outside of the area that was captured in this light field
video.
Modern light field video cameras can
capture about a cubic meter of information,
which allows for a good amount of translation in space.
And this greatly increases immersion,
as you might imagine.
You can do things like experience parallax
for distant objects.
Mirrors reflect when you move, and you
can see the reflection adjust.
Iridescent things shine.
Super important for immersion and a really exciting area.
On the AR, MR side--
so here's a prototypical example of a mixed reality video where
we mix synthetic video with real world objects
like a table or a couch.
And then finally, here's kind of a Hello World augmented reality
application where we're in a pet store.
We've captured some kitten videos.
We're running some sort of classifier on top of that video
and overlaying bounding boxes.
So you may notice this is, in fact, not an elephant or a dog.
We in the data management community
consider that to be the ML people's problem and not ours.
So you can talk to them about that.
So as you might imagine, developing applications
of a space is truly terrifying.
So 360 videos are an order of magnitude
larger than their 2D Netflix counterpart
because you have to deliver that entire sphere of video
rather than just the part that you're looking at.
Light field video, another order of magnitude beyond that.
That cubic meter of light field camera I mentioned a moment ago
puts off about a half terabyte of video data
per second, which is a formidable challenge
to keep up with.
This leads to programs that are extremely difficult to program
and optimize.
They can only be developed by domain experts who are hand
crafting optimizations.
And perhaps the most concerning thing
is that as new innovations roll off the research pipeline,
shoehorning them into these giant brittle applications
is a substantial software engineering challenge.
So with that background in mind, we've
introduced a system we call LightDB, a data management
platform for these sorts of virtual, augmented, and mixed
reality video applications.
LightDB is a full stack data management
system where you give the system a declarative program.
You tell us what you want in your application.
And we leave it up to LightDB to figure out
an efficient execution plan to realize that intent.
This leads to programs that are much smaller and simpler
to express.
We'll see an example of that in a moment.
And we think it runs faster than everything except the most
expert programs out there.
And so just to motivate this in the couple of minutes
I have left, let's go back to this Hello World
augmented reality application and think about how
we might develop this.
So in today's world what would we do?
We would open up our favorite editor.
We might decide we want to use something
like FFmpeg, which is a popular video processing framework.
And we start writing some code.
And we start writing more code, and more code, and more code.
And this is the reality of these applications today.
And this is a minimal viable example
we're seeing scroll by here.
On the other hand, in LightDB you'd
write something that looks about like this,
well, actually, looks exactly like that.
Where it-- this is almost a one-to-one correspondence
between how we might describe solving the problem
and solving it, where we decode some inputs
from a remote source or from on disk.
We apply some sort of detection algorithm on top of it.
We union the resulting bounding boxes with the original input
video, and then we write it to disk or send it to a client.
Very succinct program, especially
compared to the imperative version.
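The four-step declarative program just described can be mimicked with a toy Python sketch. The names below (decode, detect, union, save) are illustrative stand-ins for LightDB's operators, not its real API, and plain integers stand in for video frames:

```python
# Toy stand-ins for LightDB-style operators; names and behavior are
# invented for illustration, and integers play the role of video frames.

def decode(source):
    # pretend to decode a video from disk or a remote source
    return list(source)

def detect(frames):
    # pretend to run a bounding-box classifier over each frame
    return [f"box({f})" for f in frames]

def union(video, boxes):
    # overlay the bounding boxes onto the original frames
    return list(zip(video, boxes))

def save(frames):
    # pretend to encode to disk or stream to a client
    return frames

video = decode(range(3))
result = save(union(video, detect(video)))
```

The point of the declarative style is that a composition like `save(union(video, detect(video)))` states intent; the system, not the programmer, chooses the execution strategy.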
And LightDB is able to actually efficiently execute this.
So compared to the imperative version you saw on the left,
you get about a 3x performance benefit
out of LightDB.
And you get all that for free by letting LightDB
convert this declarative program into an optimized execution
plan.
So how does LightDB perform that conversion?
Here is kind of a Hello World virtual reality
application I'm going to use to go through that
at a very high level.
And so here what we're doing is we're taking a 360 video,
overlaying a watermark, converting to grayscale,
and then writing to disk.
What LightDB does is it converts this
into a logical execution plan drawn from its rather
modest set of operators.
And here there's a one-to-one correspondence
between the invocations in the declarative version
and the logical plan, but that's not always the case.
And it does what you'd expect, decode some input videos,
scan the watermark from disk-- it's a little bit hard to see
there--
union them together, transform it to grayscale, and encode it.
And now LightDB is going to take this logical version
and convert-- and explore a space of physical execution
plans and hopefully choose the most efficient one.
And so here's one potential execution plan
it could choose where it moves everything over to a GPU,
decodes everything in a GPU, runs some CUDA kernels,
and then re-encodes.
And that's a reasonable plan.
In fact, this is exactly what FFmpeg does.
On the other hand, LightDB can explore a much richer space
of potential plans, such as this one, which I'm not going
to go over in great detail.
But the idea here is that if this Sharks video has
a temporal index associated with it, which is almost universally
the case, what we can do is we can
take small pieces of that video and shuffle it
to multiple GPUs, and then essentially run
a similar pipeline in parallel where we decode,
we run a CUDA kernel, and we re-encode.
On the other hand, the watermark is super tiny.
It's the same frame extended over an infinite time span.
So what we can do is we can just broadcast that to all the GPUs
and bring those things together.
In fact, we do that at the top, where
we concatenate the resulting chunks together.
As for the performance of this: with double the GPUs, we actually
get double the performance out of it,
by allowing LightDB to explore this space of plans,
relative to not using the index or using FFmpeg.
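A minimal sketch of that partitioned plan, under heavy simplification: integers stand in for frames, a doubling function stands in for the per-frame GPU kernel, and the chunks are processed one after another here where real hardware would run them in parallel:

```python
# Split the video on its temporal index, process each chunk
# independently (one GPU each, in the real system), broadcast the tiny
# watermark to every chunk, and concatenate the results in order.

def kernel(frame):
    # stand-in for a per-frame CUDA kernel (e.g., grayscale conversion)
    return frame * 2

def process_chunk(chunk, watermark):
    # decode -> run kernel -> overlay broadcast watermark -> encode
    return [kernel(f) + watermark for f in chunk]

def run(frames, watermark, n_chunks=2):
    size = (len(frames) + n_chunks - 1) // n_chunks
    chunks = [frames[i:i + size] for i in range(0, len(frames), size)]
    results = [process_chunk(c, watermark) for c in chunks]
    out = []
    for r in results:  # concatenate chunk results back together, in order
        out.extend(r)
    return out
```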
So I only have a few seconds left.
The architecture of LightDB is a rather straightforward data
management system.
I'm just going to highlight a couple of things.
So of course, we know we have this planner that
converts the logical plans to physical execution plans.
We know we have an optimizer that
draws from a rich set of physical operators
that target various hardware, et cetera.
And we have a storage manager that I don't have time
to talk about today that achieves
a considerable compression ratio for stored videos.
So just to summarize--
OK, so this has been LightDB, a data management
system for virtual, augmented, mixed reality videos.
Smaller declarative programs, free optimization,
more efficient execution than the state of the art
programs out there.
Thanks for listening.
DAN SUCIU: The next speaker is Cong Yan.
She will tell us why the internet is slow,
and what you can do about it.
CONG YAN: Thanks for the introduction, Dan.
Hi, I'm Cong, and I work with Alvin Cheung.
So today I'm going to talk about designing in-memory data
storage for web applications.
So we use web applications every day.
We read news online and work with other people online.
But don't you hate it when they take forever
to load, like this?
So this web application is slow, not because of the network.
The network is usually fast.
It is slow because of its interaction
with the database.
So the web app usually uses a three-tier architecture.
The front end is usually a web browser
that sends HTTP requests to the application server,
the middle tier.
And the back end is a database that
stores the data persistently.
Since the application logic is often
developed using an object-oriented language like Python or Java,
it often uses an object-relational mapping (ORM)
framework to generate SQL queries
and translate relational data back to objects.
We profiled 12 open source web applications
to see how well they perform.
So these applications are popular with many Github stars.
Some of them are pretty well known
like GitLab and OpenStreetMap.
So for these applications, even with a small amount of data,
less than one gigabyte, over three pages
take more than two seconds to load.
And these slow pages spend
over 80% of their time in the application server, especially the ORM
framework, and in the back-end database.
Then we look into why this is slow.
So there are two major causes.
The first is how the queries are written, because these ORM
frameworks usually provide similar APIs with very
different performance.
For example, here is a forum application
developed using Ruby on Rails.
and then calculates a count of the stores.
So it first issues a SQL query to retrieve the stories
and then store them in this favorable s.
Then, if the application writes s.size,
the count of the stories is computed
using in-memory objects.
However, if the application writes s.count,
it performs the same thing, but the count
is done by issuing a COUNT query to the database.
So these two APIs have the same functionality,
but the count is less efficient than size because
of this extra query issued.
So these APIs are usually very confusing
and they result in inefficient queries.
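The same pitfall can be reproduced outside Rails. In this sqlite3 sketch, `len()` over rows already fetched into memory plays the role of `s.size`, while a separate `COUNT(*)` query plays the role of `s.count` and costs an extra database round trip:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stories (id INTEGER, title TEXT)")
db.executemany("INSERT INTO stories VALUES (?, ?)",
               [(i, f"story {i}") for i in range(5)])

# like s = Story.all -- the rows are now in application memory
rows = db.execute("SELECT * FROM stories").fetchall()

in_memory_count = len(rows)  # like s.size: counted in memory, no query
query_count = db.execute(
    "SELECT COUNT(*) FROM stories").fetchone()[0]  # like s.count: extra query
```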
The second reason is how the data is stored.
For example, say the app wants to show a list of users,
each containing its stories.
So the back end stores two tables,
the user table and the story table.
To answer this query, it first performs
a join on these two tables.
And then converts this relational result
into nested objects.
So if there are a large number of users and stories,
both the join and the deserialization
can be very slow.
So if we can precompute and store
this join result in memory, we save the time to do the join.
And further, if we can store these nested objects
directly, we save the time to deserialize.
However these nested objects may not
be optimal for all the queries in the application, for example
a query that selects only the stories.
So it is challenging to figure out the best storage model
for the whole application.
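A toy sketch of the trade-off just described, with invented user and story tables: the join plus deserialization into nested objects is paid once up front, so a later "users with their stories" query just returns the precomputed objects:

```python
# Normalized tables, invented for illustration.
users = [(1, "ann"), (2, "bob")]                      # (user_id, name)
stories = [(10, 1, "a"), (11, 1, "b"), (12, 2, "c")]  # (story_id, user_id, title)

def join_and_deserialize():
    # nested-loop join plus conversion into nested objects; this is the
    # work paid on every request in the naive design
    return [{"id": uid, "name": name,
             "stories": [t for (_, su, t) in stories if su == uid]}
            for (uid, name) in users]

# precompute once, keep in memory
precomputed = join_and_deserialize()

def query_users():
    # later reads skip both the join and the deserialization
    return precomputed
```

The catch, as the talk notes, is that this nested layout is tuned for one query; a query over stories alone would prefer a different layout.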
To solve this challenge we built Chestnut, an in-memory storage
designer to optimize the overall query performance
subject to a memory bound.
Instead of the low-level, inefficient interaction
used by today's ORMs, Chestnut proposes a new language.
This declarative language is designed
to cleanly and easily express common object queries.
The query results are objects directly,
instead of relational data.
Even better, this language can be leveraged
for optimization, with customized in-memory storage.
And the interaction with the database
is more efficient because the read queries can
be answered using this in-memory storage,
while only the write queries go to the back-end database.
This optimization is challenging because you
need to design a new search space and a new search
algorithm.
Chestnut proposes a new search space
that explores both relational and non-relational storage
options.
So it considers using normalized table and indexes
in the relational world, and the nested objects and the pointers
in the non-relational world.
It devises a new search algorithm
that uses bounded verification to find
the storage for each individual query,
and then formulates an ILP to find the sharing of data
structures among queries.
To use Chestnut, a developer simply
needs to provide classes and queries declared
using the Chestnut language, as well as a memory bound.
And Chestnut will generate C++ code for both the storage
and the queries.
We evaluated Chestnut on three open source applications
originally built with Ruby on Rails,
and used MySQL as a back-end.
These applications include Kandan, a HipChat-like chat
application; Redmine, a GitHub-like project management
application; and Lobsters, a Hacker News-like forum
application.
This figure
shows the average query time of these applications.
The query time is composed of the time
to retrieve the data for the query,
and the time to deserialize that data into Ruby objects.
So as we can see from this figure,
Chestnut is able to speed up queries by up to 26x.
And it's faster in both the query answering
and the deserialization.
And for all of these applications,
it takes less than an hour to find out the best storage.
Next I'll show an example of what
data structures Chestnut is able to find
for an individual query.
So this is a query that shows a list of tags
and then counts the number of stories
written by a particular user for each tag.
To answer this query, the original application
needs to join three tables: the tag table, the mapping
table that says which story has which tag, and then
the story table.
And then it performs a group-by and then an aggregation.
It takes 0.8 seconds to finish this query.
Instead, Chestnut stores only the active tags.
And inside each tag it stores the user ID and the story count
of that user.
So to answer this query it only needs to scan the tags
and then do a binary search on the user
ID to figure out the story count.
And it only takes seven milliseconds to finish.
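A rough sketch of that specialized structure, assuming each tag keeps a sorted array of user IDs with a parallel array of story counts; the data and exact layout here are invented for illustration:

```python
import bisect

# tag -> (sorted user_ids, parallel story counts); invented data
tags = {
    "python": ([1, 4, 7], [3, 1, 2]),
    "rust":   ([2, 4],    [5, 2]),
}

def story_counts_for_user(user_id):
    # scan the tags and binary-search each tag's sorted user-ID array;
    # no three-way join and no group-by is needed
    result = {}
    for tag, (user_ids, counts) in tags.items():
        i = bisect.bisect_left(user_ids, user_id)
        if i < len(user_ids) and user_ids[i] == user_id:
            result[tag] = counts[i]
    return result
```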
So to conclude, we show that many web applications
have performance issues due to the mismatch between object
and relational data model.
So we built Chestnut, an in-memory storage designer.
Chestnut has a new declarative language,
and uses a new search algorithm to explore both relational
and non-relational storage options.
And it is able to significantly improve the query performance.
Thanks.
DAN SUCIU: Thank you, Cong.
The next speaker is Maaz Ahmad.
He will tell us how to automatically generate
very difficult programs on distributed data.
MAAZ AHMAD: Thank you, Dan.
Hi everyone.
So I'm Maaz.
I also work with Alvin.
And the problem that I want to talk about today
is-- basically, the high level problem that we want to fix
is that we have all these new data processing frameworks
and DSLs being developed for new hardware or new abstractions,
and we want people to be able to update or adapt
these systems more easily.
Specifically, I'm going to be talking about Casper, which
is a tool for generating MapReduce applications
from sequential Java implementations.
So the first question is, why do we even
want to do this sort of translation?
So imagine if you have a sequential application
or legacy application that exists,
and you use this application to do some data processing,
maybe build some visualizations.
Now as the amount of data increases,
that application might become too slow or might stop working.
Of course, you can scale your application
by running it in a parallel and distributed setting.
The good news is the database community
has built a lot of tools that can run your applications
in this distributed setting.
The problem is, for your sequential application
to leverage these optimizations, it
must be rewritten using the framework's provided API.
So in order to use these frameworks,
you then have to rewrite your applications.
The first option, of course, is that you could manually
sit down and rewrite all of your code.
This is, of course, time consuming.
It's tedious.
It requires expertise not only to understand the input code
but also the target frameworks that you want to use.
And of course, you could introduce bugs into your system
when you do this sort of translation.
So what we want to do is we want to just completely automate
this process by building a compiler that
does this translation for you.
Why is it difficult to build compilers in this manner?
So traditionally, whenever we're building compilers,
we use syntax directed rules.
What this means is the compiler will scan the input code
looking for code patterns.
For example, here's a code pattern it might look for.
And every time it finds a code fragment
that syntactically matches this pattern,
it will simply rewrite
it using some operators from the target language.
So for example, in Spark, it will use filter and the union
operators.
For code fragments that are simple, it's easy.
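A toy example of such a rewrite: a sequential loop that appends matching elements, and the filter-operator version a syntax-directed rule might emit for it (Python's built-in filter standing in for an operator like Spark's):

```python
def sequential(xs):
    # the input-code pattern the rule matches: loop + conditional append
    out = []
    for x in xs:
        if x > 0:
            out.append(x)
    return out

def rewritten(xs):
    # what the rule emits: the same logic via a filter operator
    return list(filter(lambda x: x > 0, xs))
```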
However, as the data processing algorithms get really, really
large and they get more complex, the set
of rules that you need to do the translation become unobvious
and in fact reasoning about these rules
becomes almost impossible.
So the reason we believe it's hard to use rewrite rules
to do this translation is that the specifications
for the code you want to generate
are provided to you in terms of legacy code, which
is written in low-level, messy languages that
are very complex.
What if, instead, the specifications
were provided in a cleaner, simpler, high-level
language?
For example, what if the specification for the programs
were provided in a functional language
that just had map and reduce primitives?
Now generating the Spark code or Hadoop
or Flink code from this specification
is much, much easier.
However, we don't have the specification in that cleaner
form.
So essentially, what we need is a way
to convert the legacy programs and extract the specifications
out of them.
And the way we want to do this is by using program synthesis.
So a quick primer on what program synthesis is:
if you have some piece of code and you
want to use program synthesis to do the translation, the idea
is to consider the space of all possible MapReduce programs.
So say this shape represents all possible programs
or specifications that you can write
using MapReduce primitives.
And within this space maybe there
are a few programs that actually produce the same outputs given
the same inputs, for all inputs.
In program synthesis, we simply search for these programs
that have the same outputs.
If we can find these programs, and if we
can verify that for all inputs they produce the same output,
we can essentially treat those programs
as a translation of the input code.
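A toy sketch of this search idea, not Casper itself: enumerate a tiny space of (map, reduce) specifications and keep one that agrees with the sequential program on random inputs. A real synthesizer would then formally verify the match rather than merely test it:

```python
import random
from functools import reduce

def legacy(xs):
    # the sequential "legacy" program: sum of squares
    total = 0
    for x in xs:
        total += x * x
    return total

# a tiny, hand-enumerated space of (map_fn, reduce_fn) specifications
candidates = [
    (lambda x: x,     lambda a, b: a + b),
    (lambda x: x * x, lambda a, b: a + b),
    (lambda x: x * x, lambda a, b: a * b),
]

def find_spec(trials=50):
    random.seed(0)  # deterministic inputs, just for this sketch
    tests = [[random.randint(-5, 5) for _ in range(8)] for _ in range(trials)]
    for m, r in candidates:
        # keep the first spec that agrees with legacy() on every test input
        if all(reduce(r, map(m, xs), 0) == legacy(xs) for xs in tests):
            return m, r
    return None
```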
So the big question with this sort of approach is,
how can we make this scale?
Of course, the size of the search space that you are
considering is extremely large.
So how can you actually do this in a reasonable amount of time?
So I'm just going to show some quick high level
ideas for how you might make the search manageable.
The first and probably the most important idea
is to design, very carefully, an API for expressing
the specifications.
Since we're doing the search, you
can imagine doing the search directly
in the target framework's API.
So for example, if you want to generate Spark code,
you could synthesize the program or search for the program
written in that API.
However, those APIs were not designed
with synthesis in mind.
So for example, Spark has over 80 high-level operators,
and searching in them is very expensive.
In this instance, for example, we
were able to design an API that could
capture the same semantics using only three operators.
And that reduces the search space significantly.
Secondly, you want to use some static program analysis.
What this essentially means is that you look at the program code
automatically and try to generate heuristics about it.
So you might specialize the search
by saying that if the input type of the program
is a list of integers, you don't want
to consider any MapReduce program that
operates on a different type.
Similarly, you might want to specialize
or bias the search by saying
to search first for programs that
use the same operators, like addition.
Third, you can use incremental search.
And the idea here is that, instead of searching
the entire space at the same time, you break the space down
and search incrementally.
And this becomes really powerful when you
introduce cost-based pruning.
And the idea here is that you order these subspaces
by the cost of the programs that they generate.
So you can then search for more efficient programs first.
And if you can find one of those,
you can just prune away the rest of the search space.
So this technique that I've explained so far,
we actually implemented in a tool called Casper.
Casper takes as input unannotated,
sequential Java code
and is able to generate code in three MapReduce frameworks:
Spark, Hadoop, and Flink.
For evaluation, we accumulated a bunch of benchmarks, about 55,
from prior works and open source implementations
of different algorithms.
These include common statistical and mathematical functions,
as well as big data workloads that are popular.
Within these 55 benchmarks, about 100 code fragments
were found that were doing data processing.
And of those 100, Casper
was able to translate about 82 completely automatically.
The ones that failed were either because it was taking too long
to search or because the API that we had
was not expressive enough.
So the first evaluation that we did
was, of course, to compare with a rule-based compiler.
This is a prior work compiler called MOLD.
On the x-axis, as you can see, is a subset
of the benchmarks that we've picked.
And on the y-axis you see the improvement
after compiling the code using Casper or the MOLD compiler.
The first thing you'll notice is that Casper,
which is in yellow, can translate
way more benchmarks than the rule-based compiler can.
This is not very surprising.
What's more interesting is that even for the benchmarks that
MOLD can compile, Casper finds more efficient implementations.
However, the ultimate sort of benchmark that we want to test
is against a human.
So how does Casper compare against a human?
We hired online developers through a freelancing platform.
We had them manually rewrite these applications.
And then we compared the performance
of the handwritten implementations
versus what we generated.
As you can see, in most cases the performance
was extremely competitive.
There were a few cases like the 3D histogram
where the user or the manual developer
exploited some domain specific knowledge to optimize
the program better than what Casper could do.
It takes Casper about 10 minutes on average
to translate one code fragment.
And the median time is even lower.
So for really simple programs, it can actually translate them
really, really fast.
And this does not count any of the overhead involved
in hiring these developers and making sure
that they do the work on time.
So the key takeaways are, you can actually
use compilers to do this sort of high-level transformations.
And with a speedup of about 16x,
Casper is even competitive with hand-written translations.
So here's a link if you want to read the paper or use the demo.
I'll take your questions at the end.
Thank you for listening.
DAN SUCIU: Thanks, Maaz.
The next speaker is Jennifer Ortiz.
She looks at the very difficult question
of how to convert from the what to the how using deep learning.
JENNIFER ORTIZ: Thanks, Dan.
So today I'll be presenting a project called DeepQuery,
where we're learning subquery representations
for query optimization.
And this is joint work with my advisor Magda Ballazinska
Johannes Gehrke, and Sathiya Keerthi.
So generally, whenever we have questions about our data
we rely on our database systems to help answer these questions.
And these systems help us store, maintain, and manage our data.
But the setup of systems are generally
constrained by allocated resources.
And with this limitation, the database system
must come up with what we call an optimal query plan.
And you can think of this query plan
as essentially a program that is able to efficiently fetch data,
run computations on the data, before providing
the final result.
So just as a quick example, say we have,
for example, three relations, the Customers, Orders,
and Regions table.
We have this query where we, say,
we want to fetch all the customers
from Arizona that have orders within the last 10 years.
Now, to run this type of query, we
would need to join across all of these three relations.
And when the optimizer looks at this query plan,
it needs to consider, well, in what order should I actually
be running this join?
For example, we can first join the Customers and Orders table,
which will get us some intermediate result,
and follow that with a join with the Regions table.
Or we could first join the Customers and Regions table,
which could give us a smaller intermediate result,
and then follow that with a join with the Orders table.
As you can imagine, depending on your query or the number
of relations that you have in your data set,
you can have really complex query plans,
which makes it even harder for the optimizer
to come up with an efficient plan.
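The effect of join order on intermediate-result size can be shown with a toy in-memory join; the table sizes and the single-region filter are invented for illustration:

```python
# (cust_id, region_id); 30 customers spread over 3 regions
customers = [(c, c % 3) for c in range(30)]
# (order_id, cust_id); 300 orders
orders = [(o, o % 30) for o in range(300)]
# only one region survives the "Arizona" filter
regions = [(0, "AZ")]

def join(left, right, lk, rk):
    # naive nested-loop join on left[lk] == right[rk]
    return [l + r for l in left for r in right if l[lk] == r[rk]]

# Plan A: customers JOIN orders first -> large intermediate result
plan_a_mid = join(customers, orders, 0, 1)
# Plan B: customers JOIN regions first -> small intermediate result
plan_b_mid = join(customers, regions, 1, 0)
```

Both plans finish with the remaining join, but plan B carries a 10-row intermediate result into it where plan A carries 300 rows.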
And essentially, the idea of query optimization
has been a core problem in the database community
for several years.
And this is especially hard because these optimizers
have to come up with these good plans with really
limited understanding about the underlying data.
And these plans need to be efficient with respect
to the resource consumption and also the runtime.
This problem essentially
boils down to what we call the cardinality estimation
problem, where cardinality estimation is
the process of estimating the number of rows
that are returned by a query.
And this is essential for producing
these optimal join orders.
And, unfortunately, because the optimizer needs to be efficient
it makes really simplifying assumptions about the data.
So for example, if we have this query where
we're joining across all these three relations
and we're filtering the Customers and Regions table,
it might make some assumptions about the correlations
of these columns in these relations, which
results in some inaccurate cardinality estimation for some
of the intermediate results, which then provides us
with a suboptimal query plan.
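The independence assumption can be illustrated in a few lines: on a table where two columns are perfectly correlated, estimating the combined selectivity as the product of the individual selectivities undercounts by 10x here (data invented for illustration):

```python
# 1000 rows where column a always equals column b (perfectly correlated)
rows = [(i % 10, i % 10) for i in range(1000)]

sel_a = sum(1 for r in rows if r[0] == 3) / len(rows)    # 0.1
sel_b = sum(1 for r in rows if r[1] == 3) / len(rows)    # 0.1

# optimizer-style estimate under independence: |R| * sel_a * sel_b
estimated = len(rows) * sel_a * sel_b                     # about 10 rows

# true cardinality of "a = 3 AND b = 3"
actual = sum(1 for r in rows if r[0] == 3 and r[1] == 3)  # 100 rows
```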
So there must be some other approach
that we can use to solve this problem.
So I likely don't need to motivate why deep learning has
been useful, but we've seen how it's been successful
in various applications such as image processing and also
natural language processing.
So in this work in particular, we
want to look at, well, what can deep learning do
but in the context of data management.
So our vision in this work is to ask:
can we rethink query optimization
in the context of deep learning?
So say we have as input some data set and some query that we
want to run, can we use a model to describe
different properties of the query?
For example, can we have the model give us an optimal query
plan, describe the resource consumption of the query,
or estimate the query's cardinality?
Now, as a first step in this project,
we're just focusing on estimating
the cardinality of the query.
And one thing that we want to consider
is, well, how do we encode our inputs?
How can we encode the data and the query?
So for these models, essentially, the input
is what you can think of as a long vector
where we encode information about our inputs,
in this case our data and our query.
One approach to encode the data would
be to provide the model with really basic statistics
about the data.
For example, we can provide the model
with simple one dimensional histograms
across some of the columns that exist in our data.
And now for the query, one way
to encode it would be to provide
the model with the possible join predicates
or possible selections that
could exist on this data set.
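As a toy illustration of this flat encoding (the column, bucket count, and predicate slots are all invented), one might build the input vector like this:

```python
import numpy as np

# Toy sketch of the "flat vector" encoding described above.
ages = np.array([23, 35, 35, 47, 52, 61, 68, 70])

# Data side: a one-dimensional equi-width histogram per column.
hist, _ = np.histogram(ages, bins=4, range=(20, 80))
data_vec = hist / hist.sum()              # normalized bucket frequencies

# Query side: one slot per predicate the model might see,
# e.g. [age < 40, age < 60, state = 'AZ'] -- set a slot to 1 if present.
query_vec = np.array([1.0, 0.0, 1.0])     # encodes "age < 40 AND state = 'AZ'"

model_input = np.concatenate([data_vec, query_vec])
```

The weakness noted next in the talk is visible even here: the query side needs one slot per expressible predicate, so the vector grows with the space of possible queries.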
Now the problem with this approach
is that these vectors need to be really
long to encode all possible queries that we
could write on this data set.
So essentially what we need is a model
that will give us this ability to encode all possible queries
that we can run on this data.
So the intuition behind our approach
is to, instead, view this query plan
as a sequence of operations.
So for example, we can first filter the Customers relation,
then filter the Regions relation,
and then combine these with a join, and so on.
And we can iterate through all the operations
until we've completed the query plan.
So essentially what we want to do
is to learn what we call these subquery representations where
we still want to use deep learning,
but we want to use it to describe specific properties
of these intermediate results.
So just as a really quick example,
say we have this query where we're joining tables a and b,
and we have two operations.
We have a join and a selection.
So the idea is that we can start with an encoding that
shows basically a representation over our data set
and one of the operations, in this case a selection.
Now given these inputs, we can use a deep learning model
to come up with a representation of the following intermediate
results.
So in this case, we applied a filter
and now we have some encoding of this subquery.
Now from this representation we can
extract information such as the distribution of the data
at that subquery or even the cardinality.
Now given this representation we can chain another operation,
in this case a join, and still use our deep learning function
to come up with a subsequent representation
of that subquery.
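The chaining just described can be sketched with untrained stand-in weights; the representation width, operation encodings, and the single-matrix "network" below are illustrative simplifications, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                                    # representation width (illustrative)

# Untrained stand-ins for the learned weights; in the actual work these
# would be trained so the representation predicts cardinality, etc.
W_state = rng.normal(size=(D, D))
W_op = rng.normal(size=(D, D))

def step(representation, op_encoding):
    """One application of the recursive function: fold the next query
    operation (a filter or join, encoded as a vector) into the current
    subquery representation."""
    return np.tanh(W_state @ representation + W_op @ op_encoding)

h = rng.normal(size=D)                    # encoding of the base data set
for op in [rng.normal(size=D),            # e.g. a selection on table a
           rng.normal(size=D)]:           # e.g. the join of a and b
    h = step(h, op)

# A small "head" reads properties off the final representation,
# e.g. a scalar cardinality estimate.
w_card = rng.normal(size=D)
cardinality_estimate = float(w_card @ h)
```

The key point is that the same `step` function is reused at every operation, so any query plan, however deep, maps to a chain of identical folds.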
So essentially what we're learning is this recursive
function that takes as input some encoding,
or some subquery representation, along with a query operation,
and comes up with the representation
of the following result. But just to give you
a sense of what these models can do, we wanted to experiment
and see how well these models are
able to estimate the cardinalities compared
to a commercial database engine.
And for these experiments we use the IMDB data set,
which has some interesting correlations
across these columns.
So here, for example, in this graph,
we're showing the percentage error
in terms of cardinality on the y-axis.
And on the x-axis we're showing the database system
compared to the neural network.
For these queries, we only varied the selection predicate
for one of the columns on one of the relations.
So you can see that the database system generally
does pretty well, but our neural network is
able to get slightly more accurate cardinality estimation
results.
Now if we make this problem a little bit harder
by selecting across several more of these columns,
the model actually needs to learn
the distribution and correlations across those columns,
and it becomes harder for both the network
and the database system.
But in the cases now where we're starting to chain more
of these representations together, for example,
in this case we are applying selections
on two different tables and also tying that with a join,
it becomes a lot harder for the system
and also for the network itself.
So what we're experimenting with now,
to improve these results,
is what it actually means to chain
and tie these representations together along
with these more complex query operations.
So just to conclude, we've talked
about why cardinality estimation is a difficult problem
and how we should start looking at other new approaches.
And so in this work we propose this model
of learning subquery representations
through a recursive function using a deep learning network.
DAN SUCIU: Thanks, Jennifer.
So the next talk is by Babak Salimi.
He is in Vancouver.
He's waiting for his Visa.
He sent us a video of his talk.
He also has a cool demo.
It's about how to understand when your queries are not
returning what you think they are returning,
and how to fix them.
BABAK SALIMI: Hi, everyone.
I'm Babak.
Today I'll be talking about bias in the context of decision
support SQL queries.
Decision support SQL queries are used today by knowledge workers
to make better and faster decisions.
However, inexperienced workers can easily
write SQL queries that are biased for decision making,
unless they're trained as statisticians, which
is typically not the case.
As we will see throughout this talk,
biased SQL queries can lead to wrong business decisions.
Let's go through a motivating example.
Suppose a company has many business travels from four
particular airports and would like
to choose between business travel programs offered
by either American or United Airline,
depending on which one has a better performance in terms
of on-time flights.
To find out, the knowledge worker
does some simple data analysis and writes
a SQL query that computes the average delay once for American
and once for United Airline.
The bar diagram on the left shows the result of this query.
You may recognize that this scenario is repeated over
and over in industry today.
Users write declarative queries, explore the results,
and make decisions based on the insights obtained
from these results.
In our example, the knowledge worker looks at the bar chart
and observes that American looks significantly better,
so she decides to contract with American.
However, any trained statistician
would recognize that to make decisions regarding
the performance of the two airlines,
one needs to look into potential confounding
factors, such as how frequently these airlines operate
at different airports,
simply because one airline may have many flights
from an airport with a high rate of weather-related flight
delays.
This clearly makes the comparison
between the two airlines with that SQL query
biased and unfair.
This is actually the case in our example.
Once we break down the delay by each airport,
it turns out that in each of the airports
it is actually United Airlines that has a better performance.
This trend reversal is known as Simpson's paradox in statistics
and happens because of overlooking the confounding
variables.
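The reversal is easy to reproduce with a few lines of SQL; the airports and delay numbers below are invented, but the pattern mirrors the talk's scenario:

```python
import sqlite3

# Synthetic flight data: United flies mostly out of a delay-prone airport,
# so its overall average looks worse even though it beats American at
# every individual airport. All numbers are invented.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flights(carrier TEXT, origin TEXT, delay REAL)")
con.executemany("INSERT INTO flights VALUES (?,?,?)", [
    *[("UA", "ROC", 30.0)] * 90, *[("UA", "MFE", 5.0)] * 10,
    *[("AA", "ROC", 35.0)] * 10, *[("AA", "MFE", 8.0)] * 90,
])

# The naive decision-support query: average delay per carrier.
overall = dict(con.execute(
    "SELECT carrier, AVG(delay) FROM flights GROUP BY carrier"))

# Breaking the delay down by airport reverses the trend.
per_airport = {(c, o): d for c, o, d in con.execute(
    "SELECT carrier, origin, AVG(delay) FROM flights GROUP BY carrier, origin")}
```

On this data the naive query favors American overall, while the per-airport breakdown shows United winning at both airports.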
In this case, what the company really wants
is the causal effect of choosing American or United Airlines
on the delay of its travelers.
But the simple SQL query fails to answer that question
because it is biased.
Similar to the other projects that we have seen in this talk
so far, we leverage the declarative nature of SQL
and perform this complex causal analysis automatically
on top of the structure of the query.
Specifically, we propose HypDB, the first database
system which detects, explains, and removes bias
from SQL queries.
Instead of discussing the technical contribution,
I will demonstrate HypDB with two examples.
OK, so HypDB accepts as input a SQL group by query
and assumes that the query is being
used for making decisions.
Here with the simple group by query,
the goal is to find out the effect
of choosing one of the two carriers or airlines
on flight delays.
As we have discussed, the answers to this query
suggest that American Airlines performs significantly better.
However, HypDB takes advantage of the declarative nature
of SQL and performs some deep analysis on the SQL query
and automatically detects confounding factors
that are detected today only by trained statisticians.
These confounding factors make the query
biased for decision making.
Specifically, first HypDB warns the user
that they're grouping by some of the confounding factors, which
radically changes the insight obtained by the original SQL
query.
As you see in this case, HypDB automatically
identified origin airport as a confounding factor,
such that further grouping by origin
totally reverses the trend obtained by the original query.
The next step is to remove the bias from the query.
HypDB does this by rewriting the query
to control for the confounding variables.
The query shown right here in this box
is the rewritten query associated to the group
by query we started with.
Please refer to the paper for more details
about this query rewriting, but for the moment note
that, under certain assumptions, the insights obtained
from the answers to this rewritten query
are unbiased and support decision making.
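One standard way such a rewrite can control for a confounder (a sketch, not HypDB's exact SQL) is to average within each carrier-origin stratum and then weight the strata by the overall airport mix, so both carriers are compared against the same mix of airports. The data below is invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flights(carrier TEXT, origin TEXT, delay REAL)")
con.executemany("INSERT INTO flights VALUES (?,?,?)", [
    *[("UA", "ROC", 30.0)] * 90, *[("UA", "MFE", 5.0)] * 10,
    *[("AA", "ROC", 35.0)] * 10, *[("AA", "MFE", 8.0)] * 90,
])

# Average delay per (carrier, origin) stratum, then weight each stratum
# by the overall frequency of that origin airport.
adjusted = dict(con.execute("""
    SELECT s.carrier, SUM(s.avg_delay * w.frac) AS adj_delay
    FROM (SELECT carrier, origin, AVG(delay) AS avg_delay
          FROM flights GROUP BY carrier, origin) AS s
    JOIN (SELECT origin,
                 COUNT(*) * 1.0 / (SELECT COUNT(*) FROM flights) AS frac
          FROM flights GROUP BY origin) AS w
      ON s.origin = w.origin
    GROUP BY s.carrier
"""))
```

After this adjustment, United comes out ahead of American on the synthetic data, matching the per-airport breakdown rather than the biased overall average.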
The bar diagram here shows the answers
to the rewritten query right here on the left.
The answers reveal that it is actually
United Airline which performs better than American Airline.
So there is no Simpson's Paradox anymore.
Finally, HypDB generates two kinds of explanations
for the biased query.
We argue that these explanations are crucial for decision making
and reveal illuminating insight about the domain and the data
collection process.
HypDB generates coarse-grained explanations
by ranking the detected confounding factors
in terms of their responsibility for making the query biased.
In this case, as you observe here,
origin airport has the highest responsibility.
And the fine-grained explanations generated by HypDB
are essentially a ranking that
drills down into a particular confounding factor
and explains how the interaction of the ground
levels of the attributes involved in the SQL query
contributes to the bias.
In this case, the top four fine-grained explanations
for origin airport indicate that United Airlines frequently
flies from airports such as Rochester,
which has a lot of weather-related flight delays,
whereas American Airlines frequently
flies from airports such as McAllen Miller,
which has fewer delays.
And this simply explains why the initial SQL
query that we started with was biased
toward American Airlines.
OK, for the second example I'm going
to use the famous adult income data set from the UCI Machine
Learning Repository.
Actually using this data set, several prior works
on discrimination discovery and fairness
have reported gender discrimination
in favor of males.
To replicate these experiments, let's use
HypDB to compute the effect of gender
on income with this sample group by query, which essentially
computes the average of males and females with high income.
As you see right here the results of this SQL query,
indeed, suggest a strong disparity with respect
to females' income.
However, not surprisingly, HypDB detects
that this query is biased and identifies
attributes such as marital status
as confounding variables.
To remove the bias from the initial SQL query,
HypDB rewrites the query to control for the confounding
variables.
Here you see the rewritten query associated with the naive one
that we started with.
In this bar diagram you see the answers to this rewritten query.
As you observe, the rewritten query's answers
suggest that the disparity between males and females
is not nearly as drastic as suggested by the naive SQL
query.
To see what's going on here, let's check out
the explanations generated by HypDB.
The coarse-grained explanations
show that marital status accounts for most of the bias,
followed by occupation.
Let's take a deeper look into marital status
and see why it makes the query biased.
The top four fine grained explanations
reveal a surprising fact.
They say there are more married males
in adult data than married females,
and marriage has a strong positive association
with high income.
To understand why this is actually
the case in the adult data, we checked the provenance of this data
set, and it turns out that the income attribute
in the adult data reports the adjusted gross income
as indicated on the individual's tax forms,
which depends on filing status
and could actually be household income.
Therefore the adult data is essentially inconsistent
and should not be used for investigating
gender discrimination.
With this I will conclude.
We have shown that SQL queries can be biased and misleading.
We've proposed HypDB as a system to detect, explain, and resolve
bias in SQL queries.
We've shown that HypDB is useful for making
causal analysis accessible for nonstatisticians,
avoiding false discoveries, and detecting
errors in data collection.
Thank you for listening, and I hope
to see you soon in Seattle.
DAN SUCIU: And I thank Babak.
He's online watching and he can take questions at the end.
The last talk is by Shumo Chu, who's
going to tell us how to check if two SQL queries are equivalent,
a fundamental task in query optimization.
SHUMO CHU: Hello everyone.
My name is Shumo, and today I'm going
to talk about automated reasoning of database queries.
And this is joint work with a few folks from our Database
Group and PL Group.
So first I will show you a figure.
So this is a survey by Stack Overflow
that shows language popularity.
Actually it tells you JavaScript is the best programming
language in the world, while SQL is second,
and PHP is far behind.
So, getting to the actual content:
SQL is a really popular language that's
supported by all the relational database systems
and now by almost all the big data systems.
And SQL is great because it is
a restricted abstraction, which enables
powerful optimizations.
The database community has spent more than 30 years
developing many powerful optimizations
based on semantics-preserving SQL rewrites.
The problem here is that we are really
lacking tools that can actually reason about SQL equivalence.
So here I will use a self-driving car analogy.
It's really embarrassing that we have
self-driving cars right now, but we don't have an automated solver
for SQL.
And what is an automated solver for SQL?
It means that we can check whether two SQL queries
are semantically equivalent, that is,
whether they return the same result on every possible input.
And this has many applications.
For example, it can be used to verify
the correctness of a query rewrite
to make your query optimizer more reliable.
It can also be used to build a semantic caching
layer for big data systems.
And what's more, it can be used for automated grading of SQL
assignments at scale in MOOCs.
Well, apparently this is a very challenging problem.
First, by a theoretical result from 50 years ago,
deciding whether two relational queries are equivalent
is actually undecidable.
In addition, if you think about SQL,
you might think just select from where,
but actually SQL has rich language features.
So you have aggregation and grouping.
You have indexes and integrity constraints,
and you can write correlated subqueries
using EXISTS.
So how can we solve this problem?
Well, there are two observations we can build on.
First, from a classic result in logic, even though a problem is undecidable
in general, that doesn't mean there is no proof for a particular instance.
And in fact, you can use an interactive theorem
prover to validate mechanized proofs.
The second observation is
that counterexample models for inequivalent SQL queries
are usually not very large.
In fact this is well known in the formal methods community,
and it partially explains why constraint solvers nowadays
are extremely fast.
So we can use a constraint solver to model check SQL.
So based on these two observations
we built Cosette, the first automated solver for SQL,
by combining interactive theorem proving and constraint solving.
While it took me almost three years to build this tool,
it can be summarized in this one simple slide.
So for a given pair of SQL queries, we do two things.
First, we compile the SQL queries
to propositions that can be checked
by the interactive theorem prover,
and we developed a proof search, an automated proof
generation procedure inside the theorem prover,
to try to find a mechanized proof of the equivalence.
Second, we compile the SQL queries
to constraints and build a model checker
to search for counterexamples
that disprove the equivalence.
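The constraint-solving half can be caricatured as bounded model checking: enumerate all small database instances and look for one on which the two queries disagree. The sketch below brute-forces tiny instances of a one-column table through SQLite; Cosette itself compiles the queries to a real constraint solver rather than enumerating:

```python
import sqlite3
from itertools import product

def run(query, rows):
    """Evaluate `query` against a one-column table t(x) holding `rows`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t(x INTEGER)")
    con.executemany("INSERT INTO t VALUES (?)", [(r,) for r in rows])
    return sorted(con.execute(query).fetchall())   # compare as bags

def find_counterexample(q1, q2, max_rows=2, domain=(0, 1)):
    """Bounded model checking: try every small instance of t and return
    the first one on which the two queries disagree, or None."""
    for n in range(max_rows + 1):
        for rows in product(domain, repeat=n):
            if run(q1, rows) != run(q2, rows):
                return rows
    return None

# DISTINCT is not a no-op under bag semantics, so these two differ:
cex = find_counterexample("SELECT x FROM t", "SELECT DISTINCT x FROM t")
```

A table with a duplicated value is the smallest witness that the two queries disagree, which matches the observation above that counterexample models tend to be tiny.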
Without actually showing the code,
I will show you a demo.
So this is the tool that we built for checking equivalence
of SQL queries.
You can see that you can specify a schema for the tables,
and then you basically write two SQL queries.
While you can see the first SQL query
joins the employee and payroll using the employee ID.
The second query does something similar,
but there is another join,
and the employee table appears twice.
It is really tricky to reason about whether
the two queries are equivalent.
So that's why we need our tool.
The solving takes some time, but that is really
the formal methods researchers' fault.
So you can see that these two SQL queries are indeed equivalent.
It's hard to explain exactly why,
but intuitively it's that there is
a join in the second query that's kind of redundant.
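That intuition can be spot-checked in a few lines (the schema and data below are invented): joining `employee` a second time on its key column does not change the result on this instance. Of course, one instance is evidence, not a proof, which is exactly why the theorem-proving half of the tool exists.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employee(employee_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE payroll(employee_id INTEGER PRIMARY KEY, salary INTEGER);
    INSERT INTO employee VALUES (1, 'ada'), (2, 'alan');
    INSERT INTO payroll VALUES (1, 100), (2, 90);
""")

q1 = """SELECT e.name, p.salary
        FROM employee e JOIN payroll p ON e.employee_id = p.employee_id"""

# Same query plus a self-join of employee on its key: the extra join is
# redundant because employee_id uniquely determines the matched row.
q2 = """SELECT e1.name, p.salary
        FROM employee e1
        JOIN employee e2 ON e1.employee_id = e2.employee_id
        JOIN payroll p ON e2.employee_id = p.employee_id"""

same = sorted(con.execute(q1)) == sorted(con.execute(q2))
```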
So let's show you another example.
So in this example, similarly, you
can see there are two SQL queries.
They're quite involved, because there is a join,
there is a group by, and there are subqueries.
So are these two SQL queries equivalent?
Well, they're actually not, and the tool returns a counterexample.
If you run these two SQL queries
on this counterexample, you will see exactly why.
But to give you the short story: people
actually claimed these two SQL queries were equivalent
and published a paper in a top database journal
in 1982.
Three years later, others found
this was actually incorrect.
And of course they published another paper to say,
this is incorrect.
But the high-level reason is that the second query
ignores the corner case when the group is empty.
So it took the database researchers three years
to find this, but it takes our tool less than 10 seconds.
To conclude, I showed you that SQL is the best programming
language next to JavaScript, and we
built Cosette, the first practical SQL
solver, based on the integration of interactive theorem proving
and constraint solving techniques.
And I would say that automated reasoning, brought
by the integration of formal methods
and domain-specific semantics, will
lead to more reliable and more optimized
future database systems.
DAN SUCIU: OK.
So let's thank Shumo.
And I would like to ask all the five
speakers to come to the front, and we have
like two minutes for questions.
So any questions?
Yes, please.
AUDIENCE: I had a question for Cong.
Really cool talk.
I'm curious, you mentioned that only rights end up
having to go to the database for a web application.
I know some web applications, multiple people
may be writing it at once.
Is there some easy mechanism for knowing when somebody else has
done something?
Or is that sort of out of scope here?
CONG YAN: Yeah, so basically a write
query both updates the in-memory storage
and the back-end database.
And the back-end database has a lot
of mechanisms to handle concurrent writes, basically
using transactions.
And similarly, in our model, even
though I haven't implemented it yet,
we can use similar in-memory transaction
processing algorithms to make sure the updates on all the data
structures are atomic.
So when different people are doing different writes
on different data structures, I make
sure that they're not interfering with one another.
DAN SUCIU: Yes, one more?
AUDIENCE: I have a question for Brandon.
So you have-- there's this--
you were talking about all these new methods
for capturing video in different ways from 360 or these cubes.
And I was wondering, are the like standards
for how to store--
the exact file formats, are they standardized at this point?
Are they changing?
Is your tool like designed to change with file formats?
Or is it pretty specific to one type?
BRANDON HAYNES: Right.
So that's a really good question.
So for anybody that didn't hear.
So the comment was that there's many, many formats
and projections, and all sorts of different VR and AR things
out there.
There's a giant zoo.
And are there standards?
Or how quickly is that changing?
And how do we handle that?
So when you run a SQL query you don't think about
whether your relation is stored as a B-tree, or as a heap,
or a flat file.
You let the database engine figure out how to manage it.
And so we do the same thing in LightDB.
So there is, as you suggested, a very large set
of formats and video codecs,
and 360 video versus light fields,
and all sorts of ways to store these data.
And they're constantly evolving.
And some are more efficient for some types
of queries than others.
And we do our best a, to abstract
that so the user doesn't ever have to think about it.
So you don't worry about whether your video is H.264 encoded.
You just load it from the catalog.
And we'll actually make modifications as necessary
to minimize the storage footprint,
if you store it later on.
And if you run subsequent queries
it may run faster with those conversions.
So we try to pay attention to that.
I mean, of course, we don't at this time
cover every possible video codec, for example.
But the idea is that we could extend
that to support whatever is out there, if there's
a user demand for it.
It's a really good question.
DAN SUCIU: OK, so I think this is all the time we have.
So let's thank the speakers again.
And thanks, everyone, for coming.
-------------------------------------------
UW Allen School Colloquium: UW Reality Lab - Duration: 47:01.
BRIAN CURLESS: Welcome, everybody,
so-- to our next installment of the CSE fall colloquium
where we're bringing to you research talks from groups
across the department.
So today the UW Reality Lab gets to own the colloquium.
So what is the reality lab?
We just launched this year, in January,
with a focus on doing research and teaching
in augmented and virtual reality.
We have a web page.
We exist.
So, basically, the goals of the lab
are to create the technologies to power the future of AR
and VR, to do the things beyond what the companies are
doing for their shorter-term bottom lines, that is,
the longer-range research, and to educate
the future researchers, who are going to be faculty, basically,
doing research and teaching in AR and VR,
as well as developers who are going to go out
into industry at a lot of these companies,
and creators making artistic products that
will fit into AR and VR.
So we're currently funded by Facebook, Google, and Huawei.
So this is an industry-funded effort.
It makes a lot of sense to do this in Seattle.
We're kind of-- I hate to use this word--
the epicenter of AR and VR.
With Facebook has Oculus, but Oculus research,
now Facebook Reality Labs--
coincidentally, we got there first with the name,
by the way--
is in Redmond.
Google has a major presence in AR and VR in Seattle.
Huawei has a research group in Bellevue
doing a lot of work in the AR/VR space
for mobile, in particular.
But, also, Microsoft HoloLens was done here.
Valve Vive was done here in the Seattle area.
Magic Leap has a big presence here.
Anyway, this is really a great place
to be doing this kind of research.
So the leadership of the lab is myself, Professor Ira
Kemelmacher-Shlizerman, and Professor Steve Seitz.
So we're the co-directors of this lab.
We also have our core members.
Barbara Mones has been leading the effort for animation production
for years, and now she is the director
of the Reality Studio, bringing that kind of storytelling
into VR.
producing already exciting films in this space.
And our director of research and education
is our very own Aditya Sankar, who may or may not be here.
Anyway, Aditya is leading a lot of the efforts
for undergrad education and research
in particular.
And our program manager is David Kessler,
who kind of keeps everything working for the lab.
He's also the general manager for the Ig--
Ig Nobels, which is kind of-- anyway,
you should check that out if you haven't already, it's
sort of a Nobel for interesting research
that is off the beaten track, often very humorous.
We, also, have an advisory board from all of the companies--
really, luminaries in the fields of computer graphics, vision,
virtual reality.
We're doing a lot of research.
We're rolling out courses: the capstone that Aditya is
teaching right now is ongoing.
Barbara's Reality Studio, creating films
in virtual reality.
We have a lecture series.
You're probably getting a lot of e-mails about that.
But we can bring in really world-class thinkers,
researchers, visionaries, to come teach us
all about what's going on in AR and VR
and what the future looks like.
And, finally, an incubator for undergrads
to go beyond a class and develop
ideas that can be prototyped and put out
into the wild, released through Steam and so on.
So, finally, getting to the research,
we actually are funding research with a bunch of groups, not
just our own group.
So we had a call for proposals, and 20 proposals
came into the lab for research.
And we've managed to fund 15 of them,
involving 14 different faculty
whose students are currently funded to do research
for the Reality Lab.
And the board basically
helped us make those decisions.
So, today, you're going to see six different projects
from a variety of different groups.
Some of them were kind of foundational projects
that, basically, raised our profile
and helped establish the reality lab
and got the sponsors excited about what we were doing.
And some of them are projects that were just funded in March,
which is basically when we launched the funding process.
And you'll get to hear kind of the latest,
greatest from a bunch of CSE postdocs and students,
and one EE student, as well.
That EE student is up right now.
ELYAS BAYATI: Hello, everyone.
My name is Elyas Bayati.
I'm a PhD student from the EE department
and the Arco [INAUDIBLE] Lab.
Today, I propose a design for the next generation of visors,
which can open a new window to optical advancements
in this field.
Near-eye visors are one of the most vital components
of head-mounted displays, as we can see.
People have been designing these near-eye visors
using freeform optics and waveguide-based optics.
However, as you can see, these structures
are bulky, complex, and heavy, and their field of view
is limited when we place them close to the eye.
This comes from the fact that they
rely on geometric reflection and refraction
to bend the light close to the eye.
So what we want to do is make these AR
visors smaller and lighter, and, even when
they are placed close to the eye, still
have a large field of view, like ordinary glasses.
So we developed a new type of visor, a combination
of reflective visors and waveguide-based [INAUDIBLE],
which uses nano-scatterers.
These nano-scatterers are cylinders,
as you can see, on the order of 100 times thinner
than a human hair.
We can change their diameter, thickness, and even
their periodicity in a square lattice.
Their material is silicon, which is like glass:
it is transparent, so we can see through it.
And by changing their diameter,
we can get different phase and amplitude responses.
So using this technique, we can convert any optical freeform
shape, or any phase distribution,
into a group of nano-scatterers, as you can see,
on a flat surface.
We call these flat surfaces metasurfaces.
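As a rough sketch of what such a phase distribution looks like for a simple flat lens (the wavelength, focal length, and aperture below are illustrative, not the talk's design), one computes the standard hyperbolic lens phase and wraps it into one 2π period; each wrapped sample would then be realized by a nano-scatterer of the corresponding diameter:

```python
import numpy as np

# Illustrative parameters, not from the talk.
wavelength = 532e-9                            # green light, meters
focal_length = 1e-3                            # 1 mm focal length
x = np.linspace(-0.5e-3, 0.5e-3, 501)          # 1 mm aperture, 1D cut

# Hyperbolic phase profile a flat lens must impose to focus at f.
phase = (2 * np.pi / wavelength) * (
    focal_length - np.sqrt(x**2 + focal_length**2))

# Wrap into one 2*pi period: this is the pattern the scatterers encode.
phase_wrapped = np.mod(phase, 2 * np.pi)
```

The required phase is zero at the center and grows toward the edge of the aperture, so the wrapped pattern forms the familiar concentric Fresnel-zone rings.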
As you can see, they are really small,
and we can fabricate them.
Usually we fabricate them at a smaller size,
but we can similarly fabricate them
over a large area and at a larger scale.
As you can see, their size is comparable to our hands,
so we can fabricate them for AR glasses
or even for ordinary glasses.
So last year we developed the first metasurface-based visor.
Using a metasurface, we can get light from the display
to the eye with a large bending angle.
This feature helps us get a larger field of view.
The field of view we could get
is comparable with the HoloLens One or the Magic Leap One,
the currently famous AR visors:
we could get around 77 degrees in simulation.
We designed these visors with numerical simulations
and ray optic simulations.
This is the phase value, or phase mask,
that we could get for this visor,
and using the same technique
we can convert it to a group of nano-scatterers.
The problem we get with this visor
is the see-through quality.
Because we use just a single metasurface,
the see-through quality that we get is not that good.
Currently, people use another prism
to solve the see-through quality problem.
We want to be able to view the environment
as we would with regular protective goggles.
This requires imaging rays coming from infinity to our eye
without any distortion, [INAUDIBLE], or power loss.
We use a similar idea to solve the see-through quality
problem.
We use another metasurface to correct any distortion caused
by the first metasurface.
So when I say metasurface, I mean, group of nano-scatterers
that I mentioned before.
So we proposed a stack of metasurface
to solve this problem.
And for designing the phase of the second metasurface,
we used the same idea, numerical simulation,
and ray optic simulation and optimization
to get the better see-through quality.
As you can see here, this is the phase of the second metasurface.
For comparison, we run the simulations
and measure the wavefront error.
We use wavefront error because it
shows how much our imaging rays
differ from the ideal case, which here
is the parallel case, as you can see.
The RMS error we could get for the metasurface
is around three times that of current visors, which
is not that bad, but we are currently working on it.
An ideal imaging system is around a quarter
of [INAUDIBLE], and we are trying to do better.
Currently, we are working on optimizing this phase.
However, all we have done so far is a single color.
One of the longest-standing problems
that diffractive optics such as metasurfaces have
is chromatic aberration,
which is famous as the rainbow effect.
It means that optics behave differently
for different colors.
So here I bring an example for you.
We use metasurface lens to do the imaging for this image.
As you can see, we got different focus for different colors.
We have the same problem with AR visors using metasurface.
As you can see, we designed the AR visor for green light,
and the visor bends red and blue light differently.
So, fortunately, there are some solutions
for that in the metasurface field.
One of them is dispersion engineering.
It means that we can change the diameter
of these nano-scatterers in a way
that achieves the desired phase profile
for different colors--
red, green, and blue.
So we want to use this idea as future work for our AR visors.
If we use
dispersion engineering, we can bend light
in a similar fashion for three different wavelengths.
However, this is not our ideal case.
We want full-color, broadband operation.
Because with dispersion engineering,
our display would work for just three discrete wavelengths.
So, recently, our group demonstrated
the first full-color imaging using
a metasurface, in combination with computational imaging.
The idea was this:
instead of using a lens that focuses at a single plane, we use
a type of metasurface lens that focuses along the optical axis.
We do that for three different colors,
find where the focal regions overlap for the different colors,
and do computational reconstruction by deconvolution.
So as you can see, the result is considerably better,
and we have largely solved the chromatic aberration problem
for metasurfaces.
We're going to use the same idea for AR visors in the future.
And this is our future work.
That concludes my talk, and I would be happy
to answer any questions.
The time was limited, so--
thanks.
EDWARD ZHANG: Right.
Hi, everyone.
I'm Edward, and I'm going to talk to you about the work
that we presented at SIGGRAPH Asia almost two years ago.
And it's called Emptying, Refurnishing, and Relighting
Indoor Spaces.
The idea with this work--
I'm just going to set up a theoretical example.
Let's say you wanted to rent an apartment.
So, usually, what you do is you would go visit it,
and it would probably be filled with its current tenant's
furniture.
But the point of you visiting isn't
to see what the current tenant's,
you know, bed looks like.
It's-- you know, you want to be there to imagine what it would
be like for you to live there.
So, you know, you have your desk in the corner,
your Mona Lisa on the wall, your shelf in the corner.
You might want to say, OK, if I change the decor,
how would it look?
Or, you know, you're visiting in the morning but, you know,
it's got a west-facing window.
How would it look in the evening?
So, more concretely, what we want to do
is, say, take an image, or several images, of a scene.
So this image over here.
And then be able to make modifications
to it for the purpose of visualizing what would it
be like for you to physically, actually move this bed out
and put in your couch over there.
So how do you do this?
So your first thought might be, OK, I've
heard of this tool called Photoshop.
It's got this great content-aware fill.
I can download some images of furniture from Google.
So that's what this poor misguided real estate
agent did.
This is pulled from some-- like BuzzFeed list
of 20 worst real estate photos.
Anyway, so you can tell there--
so what happened here is this is like an empty house
and they decided, oh, you know, we
want to make it look more attractive
so we're going to Photoshop stuff in.
And it doesn't look realistic at all.
So, in contrast, we want really visual realism
in the scene edits.
So like the reason is-- you know, it's pretty obvious.
The whole point of this kind of application
is for aesthetic evaluation of, you know,
what it would be like for you to live there.
And so you want to think about what it is
that makes this Photoshop job unrealistic
compared with good scene editing.
And to answer that question, we have
to remember that realism is really about,
as I said before--
if you actually physically made the changes to the room,
what would it look like?
And so that points to, you know, how do we physically
see things?
So brief overview, you know, you're
going to have a light emitter, because we're
talking about light rays.
So the light emitter has some color, some intensity.
It's going to shoot out some light rays.
All those light rays are going to bounce off of an object.
They might get scattered by other objects.
And, finally, those light rays will hit your eye.
So with that very brief description,
we can kind of identify four elements
that are important for the physical reason why
you perceive things.
The first one is the light emitter properties itself.
So where does the light come from?
What directions is it going?
How bright is it?
What colors?
There is the 3D geometry.
So where do these light rays bounce off of surfaces?
Where are the surfaces?
One really important one is the materials.
What are the surface scattering
properties of the surfaces that they hit?
So, you know, when a light-- when a photon hits an object,
it will get absorbed, and then probably re-emitted,
maybe at a different wavelength, in some particular direction.
And then, finally, there's the properties of the viewer.
Where is the viewer?
What light rays do they perceive to make up
the images that we're going to be producing?
How sensitive are they to particular wavelengths,
and so on.
So with these four elements in mind,
I can talk a little bit about how this works.
The first thing that happens is that we
take an RGBD scan of the room.
So this is a Google Tango sensor.
And then from that we perform a bit
of processing to get this whole 3D model of the room.
So this has the full geometry of the entire room and, also,
the appearance of the room in high dynamic range.
And this is really important, because in a single camera
image, if you took a picture of outside the window,
either the window would be blown out
or, you know, the area under your desk
would be completely dark and you couldn't see anything.
So you really need this high dynamic range.
From this model, then we can do some simple geometric
processing to get semantic scene geometry.
So what does the empty room look like?
Where are the walls or ceiling?
Where is the window?
And where are the doors?
And then, finally, we also care about the materials,
as I mentioned before, and the light emitters.
So I won't go into too much detail about how we did this.
But I will focus on the fact that the realistic lighting--
light modeling is maybe the most important part of this work.
In particular, it differentiates it
from a lot of the previous image manipulation types
of works in the literature.
So traditional graphics lighting models
are these simple idealized models,
like perfect rectangular area emitters,
or spotlights or point lights.
But if you think about real light sources,
they're a lot more varied and complicated, particularly,
in their angular distribution, as you
can see in this large set of examples over here.
In our work, we set out to be able to model these
in a realistic way, such that they could represent accurately
the appearance of light sources that we'd
see in real world scenes.
So, for example, we can model point lights
that have varying directional distribution.
So in this example-- so on the left
is one of the original input images.
On the right is our reconstructed empty room
version of the scene.
And you can see that this standing
light shoots most of its light upwards
but a little bit downwards.
And we've managed to capture that, as well, in this image.
Similarly, with line lights like the fluorescent tube lights
like, you know, these lights up here, they're lines,
they're not just points.
But, similarly, they also have directional distribution.
So, again, a lot of lights throw most of their light upwards.
In fact, if you look at the lights in this room,
they do that.
And the room is actually lit by the reflection of that light
off of the ceiling.
And then, finally, you also have things
like windows and area lights.
So in order to do all this modeling,
we're not actually taking a camera,
sticking it on top of the light and measuring how much light
there is coming off there.
We're actually looking at all of the indirect reflections that
are going around in the scene.
And it's, basically, a large non-linear optimization problem
to figure out the distributions and the properties
of these light sources.
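Since light transport is linear in the emitter intensities, the core of such an inverse problem can be illustrated with a toy linear least-squares solve. The transport matrix and intensities below are fabricated, and the actual system is a larger non-linear optimization that also recovers directional distributions:

```python
import numpy as np

# Toy inverse-lighting problem: observed radiance at surface patches is
# linear in the unknown emitter intensities, observed = T @ intensities
# for some transport matrix T. With a simulated T we can recover the
# intensities by least squares. All values here are made up.
rng = np.random.default_rng(0)
T = rng.random((50, 3))              # 3 unknown emitters -> 50 observed patches
true_intensity = np.array([2.0, 0.5, 1.2])
observed = T @ true_intensity        # noiseless simulated observations

est, *_ = np.linalg.lstsq(T, observed, rcond=None)
print(est)
```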
So I'm just going to show you a few more results now.
So these are the original input images
from four different scenes.
And now, from the same camera angle, the empty room version.
I'll just flip back and forth one more time.
So it's, obviously, not exactly the same,
but it captures most of the feel of being in that room.
And, you know, you can stick in whatever furniture or spheres
or graphics models.
You can do other things than just modifying
the geometry, of course.
You can change the materials, you
can change where the light sources are
and their properties, because we have
the full 3D model of the room.
I'll just show you one more example
of one of these fade in, fade out sequences.
And I think these really highlight
the strength of our method in the aesthetic sense:
despite all of these changes we make in the room,
it really feels like you're sitting in the same room.
And then, finally, just to say something about the camera
angle thing, since we were constructing a full 3D model,
we're not constrained to the original input images.
So you could do something like visualize this in AR.
So walk around this empty room in augmented reality sense,
or a refurnished version of it.
BRIAN CURLESS: Thanks.
JEONG JOON PARK: Hi.
I'm Jeong Joon Park, and I'm excited to present
our project, Surface Light Field Fusion, which
was recently published at 3DV 2018,
and was joint work with Richard Newcombe and Steve Seitz.
We present the surface light field fusion
that can produce photorealistic appearance reconstruction
from a hand-held commodity RGB-D sensor.
This video of synthetic renderings accurately models
the high quality textures, specular highlights,
and even the interreflections and shadows induced
by global illumination effects.
Our system takes as input RGB-D and IR video
streams, and outputs an appearance model of the scene.
Our goal is realistic reconstruction
of target scenes.
And besides sharp textures and nice specular highlights,
we try to achieve global illumination effects,
such as the interreflections,
where the white paper appears reddish because of the object,
and the soft and sharp shadows highlighted with green circles.
Although these global effects are
critical to perceptual realism, modeling
them has traditionally been very difficult.
Our key idea is that unlike traditional BRDF methods that
enable scene relighting, we can trade off
relighting for the ability
to capture diffuse global illumination effects.
And we achieve this by posing appearance reconstruction
as a surface light field estimation problem.
Prior works on appearance modeling
can be split into two categories.
One is parametric BRDF estimation.
And the other one is nonparametric image-based
methods.
Each of them has pros and cons.
The model-based methods involve modeling
the complex global light transport.
And because inverting global light transport
is extremely difficult, most previous works
simply ignore these effects.
On the other side of the spectrum,
image-based representation naturally
incorporates the global effects, but it requires dense view
sampling on the hemisphere, which is
unsuitable for casual scanning.
In contrast, surface light field fusion
is located in the middle of the spectrum,
combining the benefits of both model-based and image-based
methods.
Specifically, we achieve this by considering the factorization
of the surface light field into view independent
and wavelength independent components.
And we model the components with image-based and
model-based representations, respectively.
Please refer to our paper published at 3DV
for further details.
And the resulting system reconstructs the surface light
field without the dense view sampling, while capturing
the global effects.
We note that, however, because we do not recover the full
BRDF, our captured models cannot be rendered under new lighting.
We now go over the implementation of our theory.
Our system takes an RGB-D input stream,
reconstructs geometry using KinectFusion,
and fuses a high resolution texture in real time.
Next, our system segments the scene into regions
with similar materials.
And for each material segment, surface specularities
are measured in the IR channel.
Finally, with captured environment lighting,
a realistic novel view of the scene
can be synthetically generated.
To estimate the material properties of the scene,
our system starts by segmenting the scene
into regions with shared reflectance properties using
deep convolutional neural networks.
The material segmentation is necessary
because it is infeasible to recover
a BRDF for every single surface point.
Given material segmentation, and estimated camera pose,
we use the IR projector and camera
attached to the depth camera.
And given the light source model and IR intensity images,
we optimize for BRDF parameters.
The estimation works robustly in practice
for a wide variety of surfaces.
For example, for the brushed metal surface in the image,
the right image shows the IR image.
And we can see the specular highlight from the projector.
The specular highlight in the IR image is vertically
elongated due to anisotropy.
And we can fit a reflectance model from the IR signal,
and our fit is quite close to the ground truth.
And the same for the wooden surface in the bottom
row, where we can accurately model the specular
highlights.
And rendering the recovered material clearly
shows the vertically elongated specular
highlights that characterize the appearance of brushed metals.
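As an illustration of fitting a reflectance lobe to intensity observations, here is a toy version using a Blinn-Phong lobe with a single unknown specular scale. The talk's system fits a richer (anisotropic) model to real IR measurements, so treat this only as a sketch of the idea; all numbers are invented:

```python
def blinn_phong(n_dot_h, kd, ks, shininess):
    """Toy reflectance model: a diffuse term plus a Blinn-Phong specular
    lobe evaluated at the half-vector dot product n_dot_h."""
    return kd + ks * (n_dot_h ** shininess)

def fit_ks(samples, kd, shininess):
    """Closed-form least-squares fit of the specular scale ks from
    (n_dot_h, observed_intensity) pairs, with kd and shininess fixed."""
    num = sum((i - kd) * n ** shininess for n, i in samples)
    den = sum((n ** shininess) ** 2 for n, _ in samples)
    return num / den

# Simulate observations from a known material, then recover ks.
truth = dict(kd=0.2, ks=0.8, shininess=10)
samples = [(n, blinn_phong(n, **truth)) for n in (0.5, 0.7, 0.9, 1.0)]
print(fit_ks(samples, kd=0.2, shininess=10))
```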
Our final scene is one with multiple objects.
Our surface light field renderings of the scene
are virtually indistinguishable from the ground truth.
Notice how our system captures the rich global light transport
effects, including the soft shadows and occlusions,
which cannot be modeled with pure BRDF-based methods.
So we introduced the first casual surface
light field fusion reconstruction framework.
Our system effectively captures global illumination effects
and specular highlights.
And we show that by factoring the surface light
field into view-independent and wavelength-independent
components, we arrive at a representation that
can be robustly estimated with an RGB-D camera,
and achieves high quality results
across a wide variety of scenes.
Thank you very much, and that's it.
JUNHA ROH: Hello, I'm Junha Roh.
And I'm working with Andrzej Pronobis, and Ali Farhadi,
and Dieter Fox.
The title of the project is Two-player Game
for Real-time Natural Human-Robot Communication.
So what we are trying to do is to train a robot that follows
human speech directions like, go straight, turn
left, so that the car follows those directions.
So in order to achieve this, we
set up a two-player driving game, which
requires collaborative manipulation and navigation
between the two players.
And we thought virtual reality
is a good environment for natural interaction
from which to extract this information.
So this is a demo video of actual game.
So we asked two people to play a driving game.
One person acts as the operator,
who can see the view and drive the car in VR,
while the other person acts as the commander.
The commander can see the operator's view
and, also, the entire map and the specific target.
So he can send speech
commands to the operator, and the operator
reacts to that speech signal.
So from this game we would like to collect control
signals and speech signals.
And before moving on to the real game,
we built a simple environment with a maze.
So this is a 2D maze simulation, and the input
was a simulated 1D laser sensor.
And the output is the actual velocity-- linear
and angular velocity.
And we reduced the command dimension
from complex delta commands to three
types: go straight, turn left, and turn right.
So we analyzed the structure of the maze
and randomly initialized the agent.
And we tested in the maze environment,
and also on real scan data of the CSE building.
And we found that our agent navigated it well.
So we built a new city environment for longer
driving sequences.
And, also, we added a gaming wheel and pedals
for a realistic experience.
So in the middle of the screen, you
can see the actual instance of driving game.
I don't think you can hear the actual voice,
but one of the people
is giving commands.
And I manually annotated these into straight, left, and right
cases for labeling.
And this is the current result, trained on this environment.
So it's a little bit noisy, but it follows the directions
as the labels change.
So you can see some labels in the upper left side
of the video.
So, currently, we are working on language input.
So instead of using the direct speech signal,
we transcribe the speech signal into sentences.
So we give a stream of words,
like, turn left, and some pauses, and we changed
our network to a sequence-to-sequence model
to react properly.
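The interface of that command-following step can be sketched with a rule-based stand-in. The real system is a learned sequence-to-sequence model; the keyword set and velocity values below are invented for illustration:

```python
# Rule-based stand-in for the speech-to-control step: a stream of
# transcribed words in, velocity commands out. The keywords and the
# (linear, angular) velocity values are made up.
COMMANDS = {
    "straight": (1.0, 0.0),   # (linear velocity, angular velocity)
    "left":     (0.5, 0.6),
    "right":    (0.5, -0.6),
}

def follow(words):
    """Yield a velocity command for each recognized keyword."""
    for w in words:
        w = w.lower().strip(".,!")
        if w in COMMANDS:
            yield COMMANDS[w]

print(list(follow("go straight then turn left".split())))
```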
So thank you.
CHUNG-YI WENG: Hey.
Hello, everyone.
I'm Chung-Yi Weng, a PhD student in the [INAUDIBLE] and Reality Lab.
I'm excited to be here to present my research
work, called Photo Wake-Up: 3D Character
Animation from a Single Photo.
The work was done in collaboration with Professor Brian Curless
and Ira Kemelmacher.
So let's start.
OK.
So this is Stephen Curry.
People know him.
And, actually, the image captured the moment
when Stephen Curry made an important three-point shot.
And time was frozen when the image was captured.
And our goal is that we want to wake up the photo--
to bring Stephen Curry back to life
by animating his body.
So here's the result. Quite fun.
And, actually, our technique can be applied
to a large variety of photos.
So this is a [INAUDIBLE] graffiti
we downloaded from the internet.
And this is our result.
OK.
So another case.
People may know this cartoon character
from the comic Dragon Ball, and this is an image of him.
And this is our result. OK.
So the last one is one of my favorites.
This is a Picasso painting.
And we can animate that.
So check this out.
This is our result. So this is one of my favorites.
OK.
So with the technology behind this,
we actually rebuild a 3D human model of the character.
So if we have an augmented reality device,
like HoloLens, we can in some way
bring the character into the real 3D world.
So I demonstrate in this video.
So here I virtually hang the painting
on the wall of a building and bring it into
the real city.
So, actually, to rephrase the goal: in order
to wake up a photo, our goal is, given that photo,
we hope we can rebuild a 3D model of the human character.
And, also, we want to make it animatable, because that
is important in our scenario.
So the high-level idea is that we leverage a morphable body
model, which is trained on lots of real body
scans and allows the user to control the shape and the pose
with parameters, which is good in our scenario.
But, unfortunately, the morphable body model is naked.
So to tackle the problem, we propose
two steps to solve it.
First, we fit the body model based on the body
pose to get an approximate body model.
Then we warp the mesh, the body mesh,
to match the person's silhouette.
And, hopefully, after warping we still
keep reasonable mesh geometry and skinning weights, which
will be used for animation.
So here we go through our pipeline at a very high-level
concept.
Given a single image as our input,
we detect the 2D pose and segment the human mask.
Then, based on the pose, we regress
the parameters of the morphable body model
to get an approximate body mesh.
Then we extract the skinning map and the depth
map from the body mesh.
The depth map represents the mesh geometry.
And the skinning map represents the rigging information,
which tells us how to move the vertices of the mesh
when the pose is changed.
Then we warp both maps to match the segmented human mask
to get new maps.
Then we rebuild the front part of our mesh
by combining these two maps.
And, also, we repeat the same process
to rebuild the back part of our mesh,
by virtually rendering the back view of the body model.
Then, finally, we rebuild the texture,
and we inpaint the occluded part of the background image
and put the mesh on the background.
Then we apply a motion sequence from our database,
like running, to finish the wake-up illusion.
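The skinning map above encodes per-vertex bone weights, and animation then moves vertices by linear blend skinning. A minimal sketch with toy data (the body model in the talk works the same way in principle):

```python
import numpy as np

def linear_blend_skinning(vertices, weights, transforms):
    """Move each vertex by a weighted blend of per-bone rigid transforms
    (linear blend skinning). vertices: (V, 3), weights: (V, B) with rows
    summing to 1, transforms: (B, 4, 4) homogeneous matrices."""
    homo = np.hstack([vertices, np.ones((len(vertices), 1))])   # (V, 4)
    per_bone = np.einsum('bij,vj->vbi', transforms, homo)       # (V, B, 4)
    blended = np.einsum('vb,vbi->vi', weights, per_bone)        # (V, 4)
    return blended[:, :3]

# Two bones: identity and a +1 translation in x. A vertex weighted
# half-and-half between them moves halfway.
T = np.stack([np.eye(4), np.eye(4)])
T[1, 0, 3] = 1.0
verts = np.array([[0.0, 0.0, 0.0]])
w = np.array([[0.5, 0.5]])
print(linear_blend_skinning(verts, w, T))
```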
So here are some more results.
So LeBron James, Messi, and an unknown girl.
So this is a movie poster.
And the Beatles.
And [INAUDIBLE].
And we can take you back in time to the moon,
to have a moonwalk.
The same with Wukong.
And Iron Man.
Oh, sorry.
Another painting.
And we can also do something like this.
The last part is about the augmented reality results.
We can see some of the results. This video was
downloaded from a HoloLens.
This is the Beatles' album, "Help!"
And this is a Picasso painting.
Here we demonstrate: we start a motion,
then look around the mesh, and later we make it run again.
This is the CSE building.
I think people know that.
OK.
So that's all.
Thank you.
KONSTANTINOS REMATAS: Hello, everyone.
I'm Konstantinos Rematas, and I'm a postdoc
at [INAUDIBLE] UW Reality Lab.
And today I will talk about our project,
Watching Soccer in Augmented Reality.
And this project is together with Ira, Brian, and Steve.
So the goal of-- the idea behind this project
is to take the experience of watching
a soccer game live in a stadium and bring it
into our living room.
And we proposed to do it with augmented reality.
So, now, we want to take the whole experience of the game
and put it in our space.
And so the game is represented as a hologram where
we can watch it with an augmented reality device,
such as HoloLens or Magic Leap, or any of these.
And, traditionally, the way of capturing sports
is done by going to the stadium; you
put cameras around, then you
have to synchronize them, because you need frame accuracy
for the reconstruction.
And you feed all the data from all these cameras
to a cluster, where you estimate the full volumetric
representation of the game.
However, this is very expensive.
And it's also very difficult to have a setup like this,
or to get data from companies that
do something similar.
And then we thought: can we just do it from a single video?
How can we achieve a holographic representation
of a soccer game
by just looking at a single video?
And, in fact, this is what we did:
here we have a YouTube video--
a single YouTube video-- and this is the actual hologram
of the game that we see through a HoloLens.
Back to our approach: our goal is
to convert video to 3D,
so, as I said before, our input is just a single YouTube video.
And on the right, you can see the reconstruction
that we have from that video.
We want to have a learning-based approach, since we
have a limited amount of input.
And for a learning-based approach, we need data.
And we found out that it's very difficult to acquire real data,
but it's very easy to get data from video games.
So we had to play, and if you passed through [INAUDIBLE]
and saw us playing, it was for science,
not for fun.
And, yeah, essentially, the game gives us
the texture and the depth buffer.
So it means that for every player,
we know how they look, but also
the depth of that particular player.
And we can use that for our 3D shape representation.
Essentially, we have pairs
of player images and depth maps.
And knowing the camera parameters,
we can convert the depth maps into meshes.
So we have about 12,000 of those pairs.
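The depth-map-to-geometry step is standard pinhole back-projection; a sketch with made-up intrinsics (triangulating neighboring points would then give the mesh):

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift a per-player depth map (meters) to a 3D point cloud with the
    pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth.
    Triangulating neighboring points then yields a mesh."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.dstack([x, y, depth])     # (h, w, 3) point cloud

# Toy 2x2 depth map of 1 meter with made-up intrinsics.
pts = backproject(np.ones((2, 2)), fx=500.0, fy=500.0, cx=0.5, cy=0.5)
print(pts.shape)
```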
So the first thing that comes to mind
is, of course, training a deep neural network.
And we did that.
We used a stacked hourglass model that takes
as input the RGB image and the mask of the player,
and outputs its depth.
So the depth is represented as an offset with respect
to a plane that passes through the middle
of the player, which pretty much says that the darker
the pixel, the further away from the camera,
and the brighter the pixel,
the closer it is to the camera.
Now, since we have a method for estimating the 3D shape
of a player,
we have to go back to the YouTube videos
and build a big pipeline for actually finding out
which pixels correspond to each player.
So we start with an input frame.
And we estimate the camera parameters for that frame,
taking advantage of the known dimensions of the soccer
field.
We perform player detection and pose estimation,
and, also, we track the players.
Additionally, we have an instance segmentation
for separating the players from the background,
but also from each other.
And then we are able to use our network
for estimating the depth for every player
and for every frame.
So, now, we have a 3D mesh for--
from our YouTube video.
And we take, also, the pixels that come directly
from the video and apply them as textures.
And here are some results.
So on the bottom right, you can see the original input
video from YouTube.
And this is the actual reconstruction from our system.
Luckily it was accepted at CVPR; otherwise, we
would have had to redo it with different banners in--
in the background.
So, yeah, this paper appeared on CVPR this year,
and we have also code and network models on GitHub.
So you can try it for your favorite team.
So, yeah.
Now, I would like to talk a little bit
about work in progress--
what we have been working on right now.
So here you can see our reconstruction is pretty nice,
but something is missing.
So where is the ball?
And this is, actually, a difficult problem.
Because in the beginning, we were thinking, OK, just--
just track the ball.
So we ran
a state-of-the-art ball
detector, and we got something like this.
So the ball sometimes is occluded,
and sometimes the detector misses it.
So it's not very accurate, and we cannot really rely on detection
systems.
But even if we say, OK, I will go
and manually annotate every frame-- where is the ball?--
essentially, we get something like this,
which doesn't tell us much about where the ball is in 3D.
Because, unlike the players-- for the players
we made the assumption that they are always on the ground--
for the ball, we cannot make the same assumption,
because you have [INAUDIBLE], you have long [INAUDIBLE].
So it is part of the action, and it is important to model, also,
when the ball is in the air.
So the main problem is that you have this ambiguity.
So, for example, these two trajectories, the red
and the blue, are different in 3D.
But once you project them into the image,
they have the same 2D trajectory.
So there is an ambiguity there.
However, we can use physics to estimate a more reliable
trajectory in 3D.
So pretty much what we want to do
is find the points where we know the 3D
location--
for example, when the ball is on the ground,
or when it is being kicked.
And then we can connect these two points
just using physics.
So we can use projectile or linear motion
to connect the points for short or longer shots.
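Connecting two known 3D points with projectile motion has a closed form: solve for the initial velocity, then evaluate the arc. A sketch with invented numbers:

```python
import numpy as np

G = np.array([0.0, 0.0, -9.81])   # gravity, z-up (m/s^2)

def ballistic(p0, p1, T, t):
    """Position at time t on the ballistic arc that starts at p0 and
    reaches p1 after T seconds: solve p1 = p0 + v0*T + 0.5*G*T^2 for the
    initial velocity v0, then evaluate p(t) = p0 + v0*t + 0.5*G*t^2."""
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    v0 = (p1 - p0 - 0.5 * G * T * T) / T
    return p0 + v0 * t + 0.5 * G * t * t

# A kick between two known ground points: the arc passes above the
# straight line connecting them.
mid = ballistic([0, 0, 0], [10, 0, 0], 2.0, 1.0)
print(mid)
```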
And how are we going to estimate these critical points?
So if we have some trajectory like this,
you may think that we can just use simple heuristics
to find where the velocity changes,
and try to fit parabolas in between.
And this is true when the data is very clean, but when
you have real, noisy observations,
this doesn't apply anymore.
So we experimented, and we found that
such heuristics were not very reliable
for [INAUDIBLE].
However, we found out that we are
able to use image trajectories.
So pretty much we generate an image
of how the trajectory looks over time.
So you can imagine it as a long exposure of the ball
trajectory, and we use color to encode time.
And then we formulated it as a localization problem.
So we want to know where a force is applied
along a particular trajectory.
And we trained a network for this.
And the training data came from a game engine:
we used the physics engine from Unity,
and we set up a box where the ball
has random forces applied to it.
So we have all the necessary 2D and 3D information
and forces for training such a system.
And, actually, then we managed to get something like this.
So now we're able to have the ball and, of course,
there are some smaller problems that need to be addressed.
For example, I can play it again, and you
will see that for the second bounce,
the player is far away from the ball.
But this is something where you can
use the [INAUDIBLE] information to associate
the critical points with the correct place in space.
And, finally, another aspect of our original
system is that it was very slow.
So now we're working on a real-time
version of the system.
We hope to achieve real-time performance
by scaling to many--
many cores and many GPUs.
And this can be done fairly easily
by using Scanner, which is a program from Stanford.
It enables you to generate the computational graph of
part of your program and run it on multiple machines and cores.
So, yeah, this from my side.
And, yeah, thank you.
BRIAN CURLESS: So I guess that kind of wraps up--
were there any questions about any of the projects?
We kind of saved them for the end.
I tell you what-- oh, yeah, go ahead.
AUDIENCE: [INAUDIBLE] from his word
when he [INAUDIBLE] because of the image
going out of the [INAUDIBLE] house, back over construction?
PRESENTER: Just depends on background.
I mean, [INAUDIBLE]
BRIAN CURLESS: Well, I think you can keep the speakers
around for a few more minutes so you can come and ask them
more questions.
But for now, let's thank them for all their work.
-------------------------------------------
Mazda MX-5 2.0 Skyactiv-G 160pk i-ELOOP GT-M - Duration: 1:07.
Prophet SÁO Predicts the Result of Vietnam vs Philippines, AFF Cup 2018 Semifinal Second Leg at Mỹ Đình !!! - Duration: 10:45.
SignalBooster.com = Clear Cell Phone Calls + Fast Mobile Internet (Indoors and in Vehicles)
MLB BASEBALLS SUCK! - Baseballs from every level! (MLB, MiLB, D1, JUCO, HS, TRAVEL, LITTLE LEAGUE) - Duration: 6:05.
What's going on guys?
Coach Madden, YouGoProBaseball.com and we're talking baseballs today.
I've got baseballs from Major League, Minor League, Division 1 College, Junior College,
Summer League College, High School, Travel Baseball, Little League, I think that's it,
I don't know.
But let's start with Little League and we'll work our way up to the Big Leagues and we'll
talk about what's different and kind of how you have to transition through each ball as
you move up from level to level because the balls are very very different and it's crazy
when you get to the Major Leagues you're going to see the ball here that we used and it's
a totally different ball than what you're used to using.
So first off I've got a Little League ball here you can see.
Little League, and I thought I was going to be the man so I threw my autograph on there.
But you can see these laces are real high.
This ball is definitely not real leather.
It feels kind of hard in there but it's also deformed a little bit.
The laces are pretty high here.
Obviously not the best ball in the world.
You guys know what Little League baseballs are.
This is a USSSA baseball so if you guys play travel baseball, you probably use a ball similar
to this.
This is definitely better quality.
It says it's cushioned cork center wool wound core.
You know, this is a better ball.
It feels, I don't know if this is real leather or not but it feels pretty good.
It looks like it would hold up pretty well.
It definitely held up better than that Little League ball.
Now this is a ball that I used in a high school game.
I threw a shutout, 12 K's, my bad!
But this doesn't even have any markings.
No markings at all.
I guess we were just using whatever the hell we could find.
But the ball held up actually not too bad.
It feels smaller for some reason than the other baseballs.
It's pretty hard, laces are pretty tight, but overall not a horrible ball.
Now when we get into college, this is a Junior College baseball.
Junior College baseball right here.
The laces on this ball are a lot more raised, they're a lot thicker, I feel like I can get
a much better grip.
This is definitely leather.
You can see there are some scuffs on this one.
This one has been used in a game.
Obviously it looks like I gave up a hit...garbage.
It's a Rawlings ball but now we're starting to get into the better quality but the laces
again, the laces are the main difference.
The laces are really high on this one, you can really get a good grip, you can feel it
off of your fingers.
Now as we go forward, this is a Virginia Valley League baseball.
So I played in the Virginia Valley League.
It's a summer collegiate baseball league.
It's kind of like Cape Cod just not as high of a level.
It's a pretty good ball here, again, leather ball.
The laces on this one are much flatter.
Look at the difference in these two.
So if you look here, I don't know if you can tell that but the laces here are much flatter
against the ball than these here.
Let's see if I can get a good camera angle.
I don't know if you can really tell but it's a big difference.
There's a big difference right here on these laces.
So that's a big difference there.
Now as we get into the SEC ball, these laces are kind of back raised up again.
So these are a little bit higher laces, much better grip on here.
It's a Rawlings baseball.
This is what we used when I played at Auburn University, SEC.
Actually I think this was when we went to the regionals.
So we didn't use these all year but the NCAA ball is kind of what we used I think.
Actually no, this is an ACC ball.
So we must have been playing Georgia Tech or something or whoever the hell was in the
ACC back then.
It's a Rawlings ball.
Kind of the same deal as the other one and then you've got your minor league ball.
Now, here's where another difference is, laces again, a lot tighter to the ball.
This is a Northwest League baseball.
This was when I was in Triple A, I believe with the Padres.
A Rawlings ball.
Great leather.
Obviously stitched up really nice.
But again it's a lot smoother, the ball is a lot smoother which I didn't like.
I like the collegiate balls a lot better than the pro baseballs.
And then you've got your Major League ball which is the worst of them all in my opinion.
It's so tight.
The laces are so smooth to the ball.
There's not much friction that you get.
So if you're gripping your slider here there's not a lot to grip with and the ball is so
smooth.
It's very very high quality, no doubt.
I think if you were going to buy these they are like $20 a pop just for one ball, you
know.
So they are very high quality don't get me wrong.
It's just pitching wise I would much rather pitch with the ball we used in the SEC.
Let me show you the difference.
Look at the difference in the laces here.
Now you can see it, right?
These are so much more, the college baseball, the laces stick out so much more.
They're thicker, they're higher.
This is like smooth, like there's not, look at that, let me see if I can get this ball
like that.
See the difference there?
Such a big difference.
So that's the biggest difference between a Major League Baseball, a College Baseball,
and then obviously going down the chain.
Let's get a thumbnail in real quick.
That thumbnail is getting us millions of views!
If you guys like this video, leave a thumbs up down below.
Leave me a comment if you want to see some other videos, I'll make other videos, whatever
you want.
I'm at like 115,000 subscribers right now.
I appreciate all your love and support.
Again, leave me a comment down below and let me know.
If you're not subscribed, get subscribed!
Check out some of the other videos!
I'll talk to you guys there!
-------------------------------------------
How to Install New Software on Continuum Systems - Duration: 2:42.
-------------------------------------------
Amazing Beautiful Manufactured Home Creekside Manor 6622L from Champion Homes - Duration: 3:30.
-------------------------------------------
Teach Your Dog This Trick! | Kritter Klub - Duration: 1:00.
Good job!
(The one trick you can try at home.. If you have a pet..)
Good job!
(slo-mo)
(now with a cracker)
I was watching videos of owners overseas
I thought it would be hard, but it wasn't
Good job!
(full of encouragement)
(Yes, we're being extra with the slo-mo)
(What a precious dog^^)
(Time to teach my dog that trick)
-------------------------------------------
not everyone is going to like you || 12 days of Vlogmas || mermaidqueenjude [CC] - Duration: 6:03.
hi mermaids welcome back to my channel if you're new here my name is Jude, hello
if you hear that my voice is gross sorry I might be getting sick,
all right, again I'm not doing a full vlogmas this year but I am trying to do 12
videos this month so we'll see how that goes
what we're talking about today is
the idea that everybody has to like you
not everyone has to like you this is a
lesson that was really hard for me to learn as I know it was hard for many
others to learn, and it's because we have, I guess, grown up with this ideal that if
someone doesn't like you it means you're a bad person, and that's entirely not
true. I think that as I've gotten older it's gotten easier for me to be like oh
they don't like me okay cool but I think that still somewhere in the back of my
mind I still think that someone not liking me equals me being a bad person
and I just have to unpack that that's something that I have to unpack
for myself why I think that way why I feel that way
it's unrealistic for me to
believe that everyone is going to like me considering the work that I do
my work is not for everyone I'm not for all markets and that's okay that's
completely fine for me not to be for all markets and I just have to accept that
the work that I do sometimes makes people feel a certain way because they
haven't dealt with the demons in their own life and unfortunately I don't know
how to perform an exorcism and I can't expel those demons for them right so
it's not my job to make everyone like me that's not my job
my job is to educate
and to make people feel better about themselves when they are typically
marginalized that's my job, my job is to make fat people feel good about
expelling diet culture, my job is to help people who've had eating disorders,
my job is to make sure that we all have the same language in order to communicate
and advocate for ourselves those are my jobs
making sure that people like me is not my job not everyone's gonna like you and that's
fine this was something that I learned in high school because one of my
teachers had said to me for every three people that like something that you're
doing there's going to be one person who doesn't like what you're doing and
they're going to say really mean hurtful things about it and so I think that that
message has kind of stayed with me so I wanted to impart that wisdom onto you
I'm just trying my best to make sure
that we can all advocate for ourselves in the way that we need to advocate for
ourselves in a society that doesn't care about us
you know what I mean? it's just
something that's been on my mind lately because the more that I become ingrained
in Fat activism and the more that I become ingrained in the fat positivity
movement the more hate that I get, the more times I say that diets aren't real
I'm met with so much hostility because of people's unresolved demons which I'm
gonna go into it in another video I'm gonna actually make another video this
month about unresolved pain and how that manifests itself on the Internet
I think that it's important especially if you're doing this kind of work to realize that
not everyone is going to like you and that's completely okay I think that when
people try to make themselves liked by all people that is then when they become
disingenuous or false I've never been the one to kiss people's asses and
pretend that I like them
That's not me I've also never been the person to put
my moral values on the back burner because I want someone to like me or
because I want to advance my career in some way that's just not me and I think
that those were very important lessons for me to learn and those are very
important facets of my personality to understand that I had
because of the work that I do, I think that moving forward especially in 2019 I'm going to
continue to think about the way that I'm still holding these unhealthy notions of
everybody has to like me and examining them and I guess I don't know, like I was
listening to no lies detected and if you haven't heard no lies detected go
listen to no lies detected, it's a podcast with Mishal Moore and Meghan Tonjes
where they answer questions for people and I think it was Mishal who
said something about you're not gonna be for all people or not all people are
gonna be for you or something like that and that's stuck with me because I am
someone who's very selective with the people that I choose to have in my
life and I think that my personality can come off as abrasive and that's
off-putting to some people but I'm tired of apologizing for it I'm tired of
making myself smaller in order to make other people comfortable I've made a
video about this it's called taking up space I'll put a card up here I have
always been abrasive I was voted the most opinionated in high school like
I've always been this way
I think that it's unfair that I have to play this
everyone has to like me game in order for me to succeed because that's just
not a fun game to play and I refuse to play it
I have people in my life who love me and appreciate me and I have people on here
and on Twitter and on Instagram that are appreciating me and my work and my work
is helping them and you know what that's enough for me if my work has touched you
or helped you get to a different point or helped you unpack your internalized
fatphobia then my job is done, I've done what I needed to do and so I don't need
everyone to like me in order to accomplish what I'm meaning to
accomplish and all of this circles back into what we think success is which is
probably another video that I'm going to make but success for me is being able to
support myself financially just doing this or doing acting or doing
photography and also changing people's lives like those are those are the
things that I want and so for me I'm already partially successful because of
the conversations that I've been having on the internet for the past six months
not everyone's gonna like you and that's okay that's fine you don't have to be
liked by everyone in order to accomplish what you need to accomplish not everyone
is for you and people drift away, people come into your life and you like them
for a little bit and then you realize that you don't-- you're not actually
compatible and then you guys split ways and that's okay too
if you want to join the mermaid kingdom, there are social links down in the description
please follow me on other social media so that you have a say in
what I'm doing over here on this channel
if you want to subscribe there's a button down below
if you want to ring the notification bell to know whenever I upload, that's also down below
if you want to give a like to this video go
ahead and give it a thumbs up I really appreciate it, if you have anything you
want to share in the comments go ahead and do so down below I think that's
everything I'm probably forgetting something but that's okay all right
just keep swimming, sigue nadando, I will see you sometime later this week bye