>> About future speakers on that site and by following us on Twitter,
and our handle is @NCI_NCIP.
Today, I'm delighted to welcome Dr. Adam Resnick.
Dr. Resnick has many titles.
I'm just going to go through these quickly, but he is the director of the Center
for Data-Driven Discovery in Biomedicine
at the Children's Hospital of Philadelphia.
He's also on the faculty of the Abramson Cancer Center at the University
of Pennsylvania, Director of the Children's Brain Tumor Tissue Consortium
at CHOP, Director of the CHOP Penn Department
of Neurosurgery Brain Tumor Tissue Biorepository,
Director for Neurosurgical Translational Research at CHOP,
and a Stokes investigator at CHOP as well.
He's a leader in pediatric neuro-oncology.
His laboratory is working to understand molecular and genetic underpinnings
of pediatric brain tumors, in an effort to identify
and develop targeted therapies.
And he also leads a number of open data initiatives focusing
on genomic data sharing and infrastructure
to support collaborative research efforts.
The title of his presentation today is "Innovation Through Collaboration:
New Models Emerging in Pediatrics
for an Integrated Data-driven Healthcare Ecosystem."
With that I'll turn the floor over to Dr. Resnick.
>> So, it's a great pleasure to be here today.
And I'm thrilled to be able to present on some of our efforts here today.
I'm here also on behalf of the Pacific Pediatric Neuro-Oncology Consortium
and the Children's Brain Tumor Tissue Consortium.
These are consortia who are dedicated to data-driven efforts on behalf
of pediatrics, and I'm the scientific chair for them.
And much of the work I'll be talking about today really has its birthplace
in these consortia-led initiatives.
And what today I'll go through is some of our lessons learned and experiences
and hopefully be able to describe some points of engagement for the rest
of the community as we move forward.
So, it's actually even more fortuitous that I am here today,
because May is actually brain tumor awareness month.
And every year this ends up being a galvanizing month for many
in the oncology space, specifically focusing on brain tumors.
All the way from the pediatric enterprise to the adult enterprise.
And by and large it serves to both reflect on our past successes,
but also particularly in the context of engagement with foundations
and with patient advocacy groups, there is a very palpable sense of urgency
and mandate to step up our game, essentially,
on behalf of the patients we represent.
I think this year ended up being even more meaningful
for these pediatric families.
In part because over the past year,
brain cancer became the leading cancer killer of kids,
and actually the leading cause of disease-related death in children.
And this is of course no title that any of these families want to maintain.
And for us, this creates a challenge,
in ways that many of you here in the audience understand.
Specifically, as I was coming here today,
I was reflecting about this intersection between accelerated discovery,
exponential growth of data.
Many of you have seen slides exactly like this
where we're representing the scaling of growth of genomic data in the world.
Essentially, on the right-hand side you see the size of data that's going
to be collected; on the left-hand side
of the graph you see the number of genomes collected.
And of course, this is the large scale that is exponential in nature.
And as I was thinking about this, you know, particularly in the context
of the pediatric environment,
what it feels like every single time one of us uses these types of slides is
that the pediatric community has been challenged to really be
on this exponential growth phase.
But this intersection between growth of data, growth of discovery
and how the scientific enterprise works,
is something that we've been spending a lot of time thinking about.
Exactly for the very same reason, that we in the pediatric community,
we're failing to really harness this as well as we thought we should be able to.
And so, as an entry point I was thinking about an article I read
about the 150th anniversary of essentially the discovery of genetics
and Mendel's work, and I was reflecting that Mendel's work,
when he first published it, was really left undiscovered for more
than 30 years with only three citations until it actually came to light
that he discovered genetics in many ways.
And there have been many articles written about this sort of failure
of communication of discovery and why it happened
and you know people have hypothesized, you know what if Darwin had met Mendel,
or if he had actually read his articles.
You know 30 years is a significant chunk of time for many disease spaces.
But, for us, this ends up being a paradigm that we think hasn't changed as much
as it should, given where we are today.
You know a recent example that came out that highlights the discrepancy
between the flow of data in a scientific enterprise and its communication
versus the flow of data in the financial side is something
that I think might be worthwhile to reflect upon.
So, this is a picture, I got this from Kevin Slavin's talk,
a picture of New York and the pointer actually pointing to a location next
to the Carrier Hotel, where much
of the internet actually gets piped into New York.
And intriguingly over the several years,
more and more financial trading companies began buying buildings closer
and closer to that X essentially.
And the reason they began doing that is
because they actually began moving their servers, their trading servers closer
and closer to the point of entry.
But what that means is, you know, [inaudible],
that difference in distance in space reflects,
you know, on the order of a few microseconds of time for the transmission
of information that actually provides preferential advantage to the traders
who are closer to that X. So, the advancement in delivery
and transfer of information is at such a rate that essentially they are trying
to mitigate lightspeed here, right?
Information is traveling across these wires very rapidly,
5 microseconds' worth is actually the determinant for them, right?
That means that almost every other aspect is maximized,
now they're going for distance as a determining factor
for the delivery of information.
For us this would be the dream, right?
That microseconds as a determinant of the communication of information
in a scientific enterprise.
And for us I think it's also been an opportunity in the pediatric arena.
And in part it's because we've been reflecting on some failures
that we've examined within our own community for opportunity that has emerged,
based in the fundamentals of biology.
So, I'll give you one example, again from brain tumors in this context.
Diffuse intrinsic pontine gliomas are these horrific tumors
that occur in the pons of children.
As a result, they're inoperable.
And by and large every single patient dies within 9 months or so.
Now, it's a rare disease and that's another layer
in the pediatric enterprise that's worth focusing on.
Only about 300 patients a year, but virtually every single one of them succumbs
to their disease in less than 1 year in time.
Because they're inoperable, for a long period
of time we actually had very little information
about the molecular underpinning of this disease.
And that really prevented most of us in our community
from making any headway in discovering its cause.
But a large number of investigators,
around 2011 or so focused on describing the molecular underpinning
of some bio specimens that they had managed to collect.
And really through amazing work, discovered that histone mutations
essentially underpin at least one of the causes of these tumors.
And of course, histones, now we know, really, you know, comprise
the molecular machinery that winds and unwinds DNA,
and onco-histones are a ripe field of research,
largely given birth by these discoveries in these rare tumors.
Several years later, it turns out that 1/5 of those patients,
in addition to having histone mutations for which we didn't have any drugs
to target or evaluate approaches for therapeutic intervention,
1/5 of the patients have another mutation called ACVR1.
This is actually a kinase that can be targeted
and there are actually compounds available and actually clinical trials
for targeting ACVR1, which again is a great narrative of opportunity.
We still don't know if ACVR1 and its pathway for the BMP receptor family
of receptors is going to be a critical intervention point.
But the fact that it occurs in 1/5 of patients was fairly significant.
Those are essentially fundamental and amazing discovery efforts
through the collaboration of a large number of pediatric researchers.
But for us it also represents the missed opportunity.
Because it actually turns out that the exact same mutations that are found
in ACVR1 in 1/5 of those patients also underpin another rare disease,
called fibrodysplasia ossificans progressiva.
This is even more rare than DIPG,
but the underpinning for FOP was discovered in 2006.
And since 2006 until 2011, that distance in time
from when the first sequencing studies emerged for the DIPGs,
the FOP community managed to really define the mutation that causes it,
that there's biology behind them, that they activate the pathway.
But still, it took from 2011 or 2012 to 2014
for the pediatric oncology community to make that second discovery,
even though that mutation was present
in every single sequencing file that they had.
And this represents a missed opportunity for data connectivity, right?
The fact that they couldn't find it is representative of the recognition
that if they had only actually intersected rare disease data,
right, from bone and oncology, right, it would have been fairly evident,
right from a developmental biology perspective
that the same exact amino acid mutation that causes errors
in cell differentiation in one disease might also be important
for another disease.
That three-year period, or two-and-a-half-year period, for us is equivalent
to a 30-year gap, you know, in Mendel's work, right,
and it's a missed opportunity because there are 300 kids a year
who have this disease, 1/5 of whom have this mutation, which means
if we had just begun three years earlier,
something like 180 kids might have been spared in one way or another.
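The arithmetic behind that estimate can be written out as a quick check. This is a minimal sketch that uses only the round numbers quoted in the talk (about 300 patients a year, one in five carrying the ACVR1 mutation, a three-year delay); the variable names are illustrative and not from the talk.

```python
# Round figures quoted in the talk; variable names are illustrative.
patients_per_year = 300   # approximate number of DIPG patients per year
acvr1_one_in = 5          # "1/5 of patients" carry the ACVR1 mutation
years_of_delay = 3        # gap before the ACVR1 finding was made

# Patients per year who carry the mutation, then summed over the delay.
acvr1_patients_per_year = patients_per_year // acvr1_one_in
affected_during_delay = acvr1_patients_per_year * years_of_delay

print(acvr1_patients_per_year)  # 60
print(affected_during_delay)    # 180
```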
I think that's one of the most exciting things that has taken place
within the precision medicine efforts, right,
the immediacy of application in our field.
Even if they had not intersected with the rare disease group like FOP,
if they had just intersected with the TCGA, right,
if the pediatric data had just actually been intersected with adult data
in the same disease space, they would have seen that stomach
and uterine cancers also have the exact same amino acid mutation take place.
And those, in rare subjects, and of course the uterine
and stomach cancer folks are not paying attention
to that very tail edge of the subject population.
But again, when you have recurrences across different diseases,
biology emerges as a conserved force.
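The kind of cross-disease intersection described here can be sketched in a few lines. This is only an illustration under stated assumptions: the cohort names, gene names, and amino acid changes below are hypothetical placeholders rather than real findings, and `recurrent_variants` is an invented helper, not part of any tool mentioned in the talk.

```python
from collections import defaultdict

def recurrent_variants(cohorts, min_cohorts=2):
    """Return variants seen in at least `min_cohorts` cohorts,
    mapped to the sorted names of the cohorts that carry them."""
    seen_in = defaultdict(set)
    for cohort_name, variants in cohorts.items():
        for variant in variants:
            seen_in[variant].add(cohort_name)
    return {
        variant: sorted(names)
        for variant, names in seen_in.items()
        if len(names) >= min_cohorts
    }

# Hypothetical placeholder data: (gene, amino acid change) pairs per cohort.
cohorts = {
    "pediatric_tumor": {("GENE_A", "p.X1Y"), ("GENE_B", "p.X2Y")},
    "rare_bone_disease": {("GENE_A", "p.X1Y")},
    "adult_tumor": {("GENE_A", "p.X1Y"), ("GENE_C", "p.X3Y")},
}

shared = recurrent_variants(cohorts)
print(shared)
```

With these placeholder inputs, only the variant present in all three cohorts is reported. The point is that the recurrence is visible only when the datasets can be queried together.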
And so, we spent a lot of time thinking about how it is that one,
we can support discovery within our community,
what are the challenges that we face within that arena and why do they arise.
And for many of us, this was an interstitial practice.
This is actually a paper from my own laboratory looking
at fusion gene biology in low-grade glioma.
And this is a project I began virtually the day I set foot on the University
of Pennsylvania's campus and CHOP around 2006, 2007.
And it took all the way until 2016 to actually end at this point.
And it wasn't because it took that long to do the science.
It took that long to bring published data together with new,
emerging data to coordinate efforts among the very long list of authors
who had distinct needs and requirements for participating
in collaborative research, oftentimes independent of their own wishes, right?
So, independent of their own requirements,
you know they were mandated to participate in certain ways.
One collaborator, you know was lamenting that if there were more than 10 authors
on a publication, that didn't count for his promotion, right?
These are the challenges we essentially ended up encountering,
just to be able to do collaborative research in such a fashion.
And for those of us who are in the rare disease community,
this ends up being a bottleneck.
Because the rare requires coordination beyond your local environment, right?
And it took me, let's see, another week after stepping foot on CHOP's campus
to recognize that no matter how hard I worked,
or how hard I attempted to do anything within my own laboratory,
I would not succeed to essentially bring my own investigative efforts in parity
with what I saw happening around me, right?
And what was happening around me is what's happening right now, right?
This is the current basic model that most of us engage
in from bio-specimen driven research, right?
We collect bio-specimens, we sequence, generate large amounts of data,
we then create model specimens that investigate our hypothesis
for the cause of the mutation.
We publish in our fancy journal
and then we of course succeed eminently in acquiring grants.
And then continue in that prospect.
And for us in the pediatric environment, this was challenging,
right because we could never bring as many specimens together as rapidly
as a melanoma, or a prostate, or a lung cancer.
We could never really harness commercial interests
to drive those efforts as rapidly.
So, more and more as the exponential curve of data grew, right,
the pediatric community, who is always behind in some fashion,
felt like they were falling behind more and more,
right, because of exponential growth across time.
And for us, this arrival of new therapies was something
that we were still feeling some advantage of and began looking at other models.
So, the other models, and this is something
that has really transformed the cancer landscape,
were the efforts of the NIH itself, right.
So, the NIH and The Cancer Genome Atlas,
this was a wealth of data that essentially primed the cancer community
independent of the existential threat of promotion or anything else.
And those efforts of generating these large amounts of data
across 11,000 patients, 33 different diseases.
And it was the underpinning for a huge amount of scientific efforts
within the community, all of which of course you, here in this room,
participated in and empowered.
The other effort in parity with the TCGA efforts was
of course the TARGET initiative.
For those of us who were working on brain tumors,
you know, we recognize the value of TARGET, but brain tumors were not represented
in TARGET and so we still had to find an alternative mechanism
to participate in those efforts.
But even as we began looking at these alternative mechanisms,
we evaluated the platforms in existence that supported a pediatric community.
And so, the first question that we actually asked was how was pediatric data
being used differently than adult data?
Data that was out there in the domain and what might be the challenges.
So, these are essentially page hits from cBioPortal, right?
A portal that can really link to the TCGA data during the TCGA grant period.
And you can see that on the top graph you have more
than 50 million page hits across 3 years.
This is from 2013 to 2015, representing robust activity, right?
Around the TCGA data space and this really represents the success,
right, of the TCGA in many ways.
But if you look at the TARGET data over the same amount of time you didn't see
that growth, right, you see peaks of implementation or software implementation,
but hardly anybody is using that data.
And that's not so surprising in retrospect, right?
There just aren't as many pediatric oncologists in the community.
There aren't as many drivers supporting the pediatric enterprise in many ways.
But, more importantly, the TARGET data was separable from the adult data.
So, when somebody was querying from a gene-specific approach,
or from a pathway-specific approach within the TCGA datasets,
they weren't looking at the TARGET data, and that's a missed opportunity, right?
Given the conserved biology that we already described.
In our field, when we saw that you know it really solidified the recognition
that one of the main jobs we actually have is as a data steward.
Our job is to pull people into our field, right?
Other expertise and other domains
because there'll never be enough pediatric oncologists working in this domain
to support the discovery efforts that are necessary.
But for those of us who are in the data enterprise, that, you know,
makes sense; you and everybody always understand that more data, harmonized
in a way, is always better, independent of disease or source.
So, we began thinking about a different paradigm largely derived
from the other spaces of the information economy, right?
We essentially began noticing that other enterprises didn't have
to own their space, but still were able to outcompete,
right, those who own things.
Right? So, when Uber, Airbnb can outcompete, right,
the Westin, who owns all the hotels, without owning a hotel,
they're competing in a different kind of economy.
An economy of information, actually.
An economy of connectivity, of users to what they want, independent
of what it is that is the subsidiary for the traditional economy.
And we thought that actually this is where the information economy
within healthcare could also benefit from exploring,
particularly in the context of the pediatric enterprise.
And so, we've designed a number of different efforts that span
from bio-specimens, to cell lines, to clinical phenotypic data,
large-scale molecular data, and hypothesis testing and pre-clinical models
as a paradigm that we could differentiate from existing approaches
that were investigator-driven to consortia-based efforts.
We had to test and vet
what it would take to actually support this within our community.
what would it take to actually support this within our community.
So, we began a consortium that, in the first one, and this was when I say we,
several of us across four different hospitals.
And the first consortium we began looking
at is the Children's Brain Tumor Tissue Consortium.
This was an agreement amongst four institutions to essentially commit
to localized and centralized bio-specimen collection in ways that didn't give
up ownership, but that essentially empowered joint querying
of those bio-specimens.
And that's important, right?
Because it was the first entry point of getting four institutions
with all their bio-specimens in one location.
Now, that's actually fairly challenging to convince people to do that.
And it required building architectures of what we call radical transparency,
that anybody could actually look and see what bio-specimens are there,
who's using them, with what format.
But we also learned from the TCGA and many of the other efforts
that if you don't actually standardize the way specimens are collected,
what kind of clinical data, you know, you typically collect,
there's a lot of work that ends up having to be done after the fact,
and you potentially lose a large number of specimens if you don't do that.
So, early on we learned from those efforts,
established SOPs that are uniform in nature.
Every single institution collects specimens exactly the same way
in a large number of different formats, paired samples of frozen
and germline to empower really data generation.
And then we linked it to a clinical trial consortium,
that itself was focused on generating clinical genomic data for patients,
who are entered on such trials.
And this was the beginning of our efforts.
We began with four institutions, we asked more than four,
but only four agreed to initially join.
But once we began building the informatic architecture
that embraced radical transparency, others wanted to join.
Today we're 15 institutions strong, more than 20 across the two consortia,
expanded to Italy and China and then beyond.
And everybody is practicing the exact same protocols, the exact same processes.
And importantly, all of them have signed a constitution
that aligns all the way back to the Bermuda Principles
and Fort Lauderdale Agreement of data empowerment, right?
There's an agreement upfront that we're going to share bio-specimens,
and then any large-scale data will become available in real time,
to the community pre-publication and without any embargo.
And that wasn't as hard to convince our community to do as you might think.
And that's because when you have,
you know a community that faces a nine-month median survival time
for a child, right?
Requiring a six-month embargo,
or any type of embargo represents a significant portion
of those patients' lives.
Furthermore, it wasn't so challenging to do that because we actually had
to engage the patient partner themselves, right?
And so, for us, patients and community organizers
in the rare disease space are a very vocal group, right?
Because when you have a rare disease, you go home, you log onto Facebook,
you find somebody, they connect you, they tell you not to talk to Adam,
to go talk to somebody else.
They know as much as anybody.
And so, if there's ever a hint of any institutional practice
that is preventing accelerated discovery,
they'll be on Twitter the next moment, right?
And that's a very challenging stick to avoid in our community.
So, the combination of patient engagement and shared understanding
of mandate really drove our consortium's growth.
And importantly, drove its growth in a way that was empowered by patient consent
that maximized the use of those specimens, right?
It's as broad a consent as you can possibly derive, created
to empower discovery in that context.
Of course, many of you here know that those patients also understand
that they actually own their clinical data, right?
And they come to us asking what do I need to do to get my data from here to you?
Or from here to somewhere else?
And that's an amazing partner to have at your disposal, right?
Advocate groups who are driving to do this.
These are, I think part of the reason I'm telling you this is I think this will
differentiate and, in weird ways, allow the pediatric and rare community
to solve some of the hardest problems
that have faced the data sharing community at large.
We began with something like 174 samples in 2011,
centralized in a large bio repository at CHOP that's robotically managed
and staffed, and that can hold more than 2 million samples.
And over the past several years,
we've grown to more than 17 thousand bio specimens across more
than 2000 subjects representing every single one
of the brain tumor tissue pathologies.
And again, in order to do this we had to engage trust.
So, not only can you look at what bio specimens are there,
you can log on and see all the clinical and phenotypic data.
And anybody can actually make proposals.
And that was another differentiator in our efforts in the consortia.
The consortium is not only there to serve consortia members.
Anybody in the community, including everybody here
in the audience can actually make a request for the bio specimen or a proposal.
That proposal gets reviewed in a very transparent way.
And specimens can be given to anybody outside of the consortium.
The only mandate is return of data in alignment
with the constitution's real-time data deposition and access.
Longitudinal data was another component that we again,
by assessing some of the successes and failures historically
in the cancer community, essentially elevated as a mandate.
Not something that you see in the noncancer community taking place all the time.
And again, the pediatric cancer community actually is a rare community
environment that's really in proximity to other rare diseases.
Other rare diseases that have not focused on genomics as much have been,
for a long time collecting clinical longitudinal data
as a phenotype discovery space.
And we modeled many of those efforts within our community and collect essentially
similar data on every one of the subjects in these cohorts;
that longitudinal clinical phenotypic data is connected
to the bio specimen data and available in real-time for everybody.
I think this is a great, you know, review piece from the Harvard Business School,
with the two axes that you have here,
where the y axis is the data shared in the public domain,
versus data that's not shared in a public domain and only among partners.
The x axis is point-in-time data versus longitudinal data.
And for us, that top right hand quadrant of open data that is rich
in longitudinal clinical phenotypic data, empowered by genomics,
is a ripe competitive environment actually, right?
If you can actually do that better than anybody else,
you become the anchor for how that is structured.
And that was our message to our community.
That we actually are in a very strong position to compete on behalf
of our patients for discovery if we empower those spaces.
In addition to being able to query and look into the clinical
and phenotypic data to create patient cohorts for your research,
you can also of course do clinical research based
on much of the data that's there.
We bring in pathology reports, op reports, slide images,
and very soon also MRI images.
And these are also separately empowered
for additional discovery and findability.
But most of these portals are not just push portals.
They're not portals that just provide information to the community.
Another mandate was to create a layer on top of them
for communication among the participants themselves, right?
We wanted to begin exploring how it is that the sort of Facebooking
of activities can actually be layered on top of research in the same way
that we actually do normally in the wet lab right?
When you do research, almost every single scientist will confirm that some
of the most transformative moments have been those chance interactions among
human beings, as opposed to chance interaction with data, right?
And we wanted to empower that within this landscape as an enterprise.
We then also addressed, you know one
of the earliest lessons we highlighted, right?
The silo-ing of data.
We were very fortunate to partner with MSK, Dana-Farber,
and Princess Margaret in the further development of the cBioPortal
and launched a PedcBioPortal.
But again, we learned from the lessons of the past and not only brought in all
of the pediatric data that we were generating,
you can see there's a child button.
But we also brought in all of the adult data, right?
So, every single adult case that's available from TCGA
and the NIH cBioPortal is also available for query in PedcBioPortal.
So, in many ways, when you do a query,
the largest empowered arena for searching adult
and childhood data is actually in PedcBioPortal.
And for us, I think this was one of the first spaces
where we began engaging external community members, right?
People who care about adult cancers, but looking for additional discovery
in any other domain, right?
A new effort that we just launched, it will be made public soon,
I just wanted to introduce it.
It's called DiseaseXpress, where again, we're engaging this adult pediatric
and now normal data environment.
And here we essentially have harmonized TCGA data, TARGET, CBTTC,
and PNOC data in a common environment for looking at expression profiling.
And this will be launched fully later this year.
As all of you here in the room and I think many
who are watching this presentation know,
analyzed data in PedcBioPortal is one entry point.
But, in addition, it's clear that genomic data,
unlike many other point-in-time data in the past within our domain,
is a rich renewable data resource, right?
You can ask multiple questions, multiple ways, re-analyze, re-integrate.
And when the NCI launched the Genomic Data Commons
and also the Cancer Cloud Initiative, we began asking how can we,
as a community, partner with that environment.
You know we recognize, right?
That the movement in data, use of data,
access to data for re-analysis and secondary use was something we needed to empower.
And so, we reached out to all the cloud pilots for potential collaboration,
and all of them were very eager to collaborate.
And we ended up actually forming a strong partnership
with Seven Bridges Genomics who agreed
to essentially help create with us CAVATICA.
Cavatica is the last name of Charlotte from "Charlotte's Web,"
so it harnesses for us of course the narrative of you know
through intelligent design and true partnerships across a very diverse community
of farm animals, you can save the life of an innocent
who is destined for an alternate course.
And so, CAVATICA launched officially last year as part of the Cancer Moonshot.
And it really is an environment
that recapitulates the Cancer Genomics Cloud environment,
but with additional features that really layer in that human interaction space
and findability and usability component that we need
in the pediatric rare disease community.
So, it really sits outside of the NCI Genomic Data Commons
and cloud environment, but uses the same data models and can actually via APIs,
interoperate with the Seven Bridges environment.
So, if you have dbGaP access that allows you to see data
in the Cancer Genomics Cloud environment of Seven Bridges,
you can also see it in CAVATICA.
And again, that was one of our mandates, is to ensure that anybody who cares
about any data that's empowered within the adult environment,
can also see that data in the pediatric environment,
but then can also see any data that we deposited
and brought into that environment.
So, our goal was to convert, right?
The analog space of the bio specimen into the digital space,
big data and then empower it for usability and access.
And like many of you here, we built multiple different applications
that can answer the question in different ways,
whether you're interested in analyzed data, the clinical phenotypic data,
the imaging data, or of course if you want
to re-analyze the data within CAVATICA.
CAVATICA has just recently undergone another iteration of development
where we're really layering in much of our clinical phenotypic data, again,
modeled after the TCGA data ontology.
So, there's harmony between those efforts.
And you can now begin creating cohorts within CAVATICA, not only on TCGA data,
but on data that's emerging within our domain.
Importantly you can also identify your friends,
and neighbors who you want to work with, right,
and want to participate in that environment.
And of course, everything that we do and that we analyze,
we make available in real-time, including the applications,
and of course many of the other public applications are also there
within that environment, and of course, now there's also new connotation
of a genomic data browser that really further empowers the use
of that landscape.
One of the biggest challenges we faced was beginning to look
at differential data access models, right?
In CAVATICA we have to begin solving this. Unlike the Cancer Genomics Cloud,
where you just have dbGaP data, in this environment
we're telling people if you have dbGaP access you can bring that data
into that environment in alignment with your dbGaP access policy.
You can bring that data from your own laboratory,
you can access CBTTC or PNOC data.
These are very complex workflows with differential access requirements.
But, we've spent a lot of time thinking about how to do
that and how to create that.
And importantly, we implemented user group environment and projects environment
that permit joining into that project in ways
that retain the correct alignment with access policies.
But more importantly, every single project that's initiated on CAVATICA,
independent of the fact that you might have access
to the data is queryable, right?
So, you get to actually ask to join a project,
and know what the project is about.
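The access model described here, where every project is discoverable but data access still follows each dataset's policy, can be sketched roughly as follows. This is a hypothetical illustration, not CAVATICA's actual implementation or API: the `Project` and `User` classes, the authorization labels, and both functions are assumptions made up for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Project:
    name: str
    description: str
    required_access: frozenset  # authorizations the project's data requires
    members: set = field(default_factory=set)

@dataclass
class User:
    name: str
    authorizations: frozenset   # authorizations this user already holds

def describe(project):
    """Project metadata is visible to everyone, regardless of data access."""
    return {
        "name": project.name,
        "description": project.description,
        "required_access": sorted(project.required_access),
    }

def request_to_join(project, user):
    """Admit the user only if they hold every required authorization.
    Returns (admitted, missing_authorizations)."""
    missing = project.required_access - user.authorizations
    if missing:
        return False, sorted(missing)
    project.members.add(user.name)
    return True, []

# Hypothetical project and users.
project = Project(
    name="example-project",
    description="Hypothetical cross-cohort analysis",
    required_access=frozenset({"dbgap:study-1", "consortium:data-use"}),
)
alice = User("alice", frozenset({"dbgap:study-1", "consortium:data-use"}))
bob = User("bob", frozenset({"consortium:data-use"}))

summary = describe(project)
alice_result = request_to_join(project, alice)
bob_result = request_to_join(project, bob)
print(summary["name"], alice_result, bob_result)
```

In this sketch anyone can read what a project is about, while joining is decided purely by whether the requester's existing authorizations cover what the data requires, which keeps a refusal tied to a stated policy rather than to discretion.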
And again, radical transparency makes it very difficult to say no, right,
because in a cloud based environment, if somebody's asking somebody else
for access into a project and they say no,
not for a good reason, people know, right?
And so, transparency around activities, right,
aligned with the mission is something that we try to essentially build
into every single component.
And over the past year, we vetted how it is that we can do
that across the different data types and models that are in existence.
Whether you're bringing data in from the EGA or dbGAP, from your own laboratory,
or from any other environment,
or essentially connecting via API to the TCGA data in the Cancer Genomics Cloud.
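Editor's illustration: the access model described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not CAVATICA's actual implementation. Every name here (`User`, `Project`, `can_read_data`, the source labels) is hypothetical. It assumes that every project's description is visible to anyone, while reading the data requires project membership plus an approval for each data source the project contains.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    # Data sources this user is approved for (hypothetical labels).
    approvals: set = field(default_factory=set)


@dataclass
class Project:
    title: str
    description: str
    sources: set  # data sources used in the project
    members: set = field(default_factory=set)
    join_requests: list = field(default_factory=list)

    def summary(self):
        # Every project is discoverable, whether or not the viewer
        # is allowed to read its data.
        return {"title": self.title, "description": self.description}

    def request_to_join(self, user):
        # Requests are recorded, so accepting or declining is visible.
        self.join_requests.append(user.name)

    def can_read_data(self, user):
        # Membership alone is not enough: the user must also hold an
        # approval for every data source used in the project.
        return user.name in self.members and self.sources <= user.approvals
```

In this sketch a member without approval for one of the project's sources can still see what the project is about and ask to join, but cannot read the data, which is one way to keep a shared project aligned with each source's access policy.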
Now, one of the most exciting things that we wanted to look at is
to subvert the lesson we've learned in the context of DIPG.
Right, so we never wanted to again be in the position
where another rare disease has already found out an answer and we missed it.
And so, very early on, in addition to cancer data
that we generated and put in here,
we began bringing in autism data, epilepsy data,
and birth defect data into that environment.
And that's because we, in our community,
already knew that this made sense, right?
We already know that if you've got a cleft palate
or cardiac birth defects you're three times more likely to have a cancer.
This is the power, or essentially the landscape of developmental biology.
Kids don't get cancer because they smoke, or they're out in the sun,
or any other type of traditional, or common component of most cancers
in the human landscape, they get it because of developmental biology,
and these are shared mechanisms, right that we oftentimes have missed.
And again, we see this as our burden, right?
It's very challenging for any other community to bring
such different communities together, right?
Even within the NIH, it can be very challenging to grow
across institutes in such a fashion.
And so, we really see this as something that we have to do and own.
Of course, I'm standing here in front of you really as a basic scientist,
who you know stepped on the campus of the University of Pennsylvania,
and the Children's Hospital of Philadelphia, wanting to do brain tumor research.
And out of an existential need of not being able to do it,
had to actually engage much smarter people within my community,
much more talented scientists and data scientists
that essentially comprise now the Center
for Data-Driven Discovery in Biomedicine.
So, they actually do almost all of the work that's here.
I just get to make really nice Prezis about their work.
But, I also get to actually ask the right questions now
within a field that's required our scientific input
for a very long period of time.
Here in the audience I also just want
to recognize Jenna Lilly [assumed spelling] and Alison Hieff [assumed spelling],
who are two members of our team and again,
they have done much more of the work than I have.
And of course, you know our team of 50
within the center is just a starting point
because what we really want is you as part of our team.
And that's because right now there's more than I think 1200 BAM files
that have been uploaded in real time into CAVATICA, prepublication,
available for people to actually request access to and use
and make discoveries even before I have looked at them.
And so, that's where I would like to compete, right?
The competition shouldn't be who can do it first,
it should be who can do it fast, right?
And those are not the same, right?
Because you can do something first by just holding on to your data, right?
That's an easy trick.
But if you can do it first and fast in publicly available data,
you're actually serving the need of the community, right?
You're actually accelerating discovery in a way
that you know maybe Mendel didn't have that in his sort of motivational space
when he was looking at those peas, but I think everybody here today does.
And I think that's the space we're in today.
And I look forward to working with you and anybody else who's watching this.
And of course, we're always looking to do it better, right.
And so, we're looking for additional opportunities to learn from the community,
to empower discovery in such a fashion.
And so, I thank you today, and I think I'm almost on time, which is great.
And I'm happy to take questions, or review anything else.
Thanks.
[ Applause ]
>> Thanks, Adam for a fantastic presentation.
If there are questions in the room, please raise your hand
and Eve will bring the microphone to you.
For those of you on the WebEx use the raise hand feature in the WebEx tool
and we'll unmute your mic for you.
Questions here in the room?
>> It's a little ironic that it's Adam and Eve working together [laughter].
>> Two angels, that sounds great.
Fantastic presentation, truly.
Just curious is your data available in the GDC now, or?
>> That's the next step.
And that's something we committed to very early on,
both in the GDC and the ICGC, right?
So, as soon as we can get it in there, it's something that we want to do.
So, first we're sequencing asynchronously, so we're sequencing both germline
and tumor, but not always the matched pair.
Those will get matched up as they come in there.
So, every matched pair then gets prepared for deposition
in other environment communities.
We definitely, that was our mandate from very early on, right,
as many locations that can empower discovery will be utilized for this purpose.
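Editor's illustration: the pairing step described in this answer, where germline and tumor samples arrive at different times and only complete pairs move on to deposition, could look something like the sketch below. The function name, the tuple layout, and the `"germline"` / `"tumor"` labels are assumptions made for illustration and do not come from the talk.

```python
from collections import defaultdict


def matched_pairs(samples):
    """Return the patient IDs that have both a germline and a tumor sample.

    `samples` is an iterable of (patient_id, sample_type) tuples, where
    sample_type is "germline" or "tumor" (hypothetical labels).
    """
    seen = defaultdict(set)
    for patient_id, sample_type in samples:
        seen[patient_id].add(sample_type)
    # A patient is ready for deposition only once both types have arrived.
    return sorted(p for p, types in seen.items() if {"germline", "tumor"} <= types)
```

Re-running the function as new samples come in would pick up pairs that were incomplete on an earlier pass.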
>> If I may, just very quickly, for any of this that you actually uploaded from GDC
to your servers, at what level was it, were they raw data,
or were they already analyzed data?
>> When? Say that again when?
>> From GDC to your?
>> Oh, so to be clear, so when we actually connect
to the Cancer Genomics Cloud environment,
it's via an API that utilizes the Cancer Genomics Cloud environment
within Seven Bridges.
But, we have piloted bringing in dbGAP approved data or EGA approved data.
One of our lessons was exactly this and it took us almost nine months
to move a pediatric dataset from the EGA into our environment.
And we only had one year to use the data, per the agreement.
At the end of one year, we were required to destroy the data, right?
And they started counting when we started downloading right.
But that was a use case.
Right? That's what the community is used to, right?
So, moving data.
And we launched as many Amazon instances as we could to maximize the pipe
out of the EGA to move the data into CAVATICA for use.
But we saw it as a use case to really learn.
How long would it actually take?
What are the challenges?
When in 2015 the NIH permitted, via dbGAP access, moving data
into an approved cloud environment,
for us this was strategically aligned, right, to do this.
Even within our institution, when we review the number of times
within one institution people have actually downloaded the exact same datasets,
right?
For different projects, it's surprising how often that takes place
and how little most institutions recognize that they are actually spending money
and time and storage costs in a duplicative manner in one institution,
whereas by dbGAP rules you can actually very easily add someone to a project,
right so within the same institution.
So, we did model this, bringing in dbGAP data into CAVATICA, when you do that,
it goes into a project with a PI who is in charge of that data,
just like they would be for any other dbGAP approved access.
>> Other questions in the room?
>> Any critiques?
No.
>> You have one, okay.
>> We have is it, Denise?
Did you say?
Denise, go ahead.
>> Oh, sorry, I put myself on mute.
Hi, yes this is Denise Worzel [assumed spelling] very interesting presentation,
thank you.
I'm calling in from Colorado, but I'm part of the CBIIT staff there.
I have a question and maybe you said this and I might have missed it.
You said you were using the GDC data model is that correct?
>> Correct, so one of the things when we were looking at how
to model the clinical and the specific data, this is something that we've been,
actually go ahead and ask the question before I answer it.
>> Yeah, so my question was what,
I'm presuming that you may have some additional data that's not
in the GDC data model.
And if that's true what, you know how do you categorize that data,
what types of information did you need that weren't there?
And then if you're planning to put the data back into the GDC,
have you thought about how you'll handle those data types
that are not already supported?
Thank you.
>> Yeah, so I think that's a great question
and I think that this is still an open question for us.
So, even actually using GDC model doesn't fully support the pediatric cancer
environment all the time.
And so, for us the way we've engaged this is through communication
and actually engaging the stakeholders themselves.
So, early on we began by working with clinicians and the clinical domains
to decide how they think about the data itself and iterate around those models.
But it's still an open question and we really see it
as an open community question.
So, we'd be happy to of course share the way we're doing it with you.
Everything we do is really open access.
But we need more stakeholders to actually have
that discussion in this environment.
Alison has been leading these efforts, maybe I should give her the mic,
maybe she can also further answer those questions.
Partly because Alison was also instrumental in the GDC environment, so.
>> Yeah, I mean it's been a really interesting experience going
through the CBTTC data, and a lot of it is very clinically driven.
And, you know, in terms of mapping, the bio specimen
and the genomic data types have been very easy to map toward the GDC data model.
I think there are gaps that we realized when we were doing the GDC
that are becoming apparent trying to map some of this data, longitudinal data.
Different kinds of treatment types,
for example surgery is a very important treatment type for brain tumors
and the GDC data model doesn't have that modeled very well.
TCGA doesn't have that modeled very well, it's very much drug treatments
and radiation more so than surgery.
So, you know there've been kind of these iterations that we're working through
and I think it's going to inform both sides in terms of how
to make this data harmonize across these large datasets.
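Editor's illustration: one simple way to surface the kind of gaps described here is to split each clinical record into the fields a target data model supports and the fields it does not. This is a sketch under stated assumptions only. The function name and all field names are hypothetical and are not taken from the GDC data model or from the consortium's data.

```python
def split_by_model(record, supported_fields):
    """Split a clinical record into mapped and unmapped parts.

    `record` is a dict of field name to value, and `supported_fields`
    is the set of field names the target model can hold.
    """
    mapped = {k: v for k, v in record.items() if k in supported_fields}
    unmapped = {k: v for k, v in record.items() if k not in supported_fields}
    return mapped, unmapped
```

Collecting the unmapped parts across many records gives a concrete list of data types, such as surgical or longitudinal details, to bring to a discussion about extending the model.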
>> Yeah, and we actually see that as the next domain, right.
In the genomic space, right, you have really four degrees of freedom let's say,
but in the clinical and phenotypic space, there's sort of infinite degrees
of freedom or harmonization.
And we are looking at ways to build new tools that you know accelerate
that process and connectivity between the human architecture and how to pose
in fact that information on the data architecture.
>> How do you keep up with the changes that are happening in terms
of the analytical pipelines?
I mean do you really?
Because we know as the tools are becoming available we can really get a much
better, you know whether its alignment or variant calls so on and so forth.
>> Right. I think the answer is that you can't keep up, right?
And so, I think, I personally can't.
Maybe some people on my team can.
But I think that's the necessity for a cloud based environment, right?
And even more so, that's a necessity for the deposition
of pipelines in the environment.
So, within CAVATICA, and also the GDC
and I know the Genomic Data Commons also aligns with this, right?
Dockerized pipelines that can be made available,
right for the community is something that you know we're very much aligned with.
All of ours are written in Common Workflow Language.
And so, the use and reuse and the deposition of pipelines.
And for one person to say my pipeline is better than yours, right?
That should happen in an open environment for that competition.
I think also, another component which isn't fully launched yet,
but something that we have seen is definitely needed.
You know, when we wrote, like when I wrote that paper that I showed you right?
The limited space of recognition of who's in the beginning
and who's at the end often does not do service to those in the middle.
And by and large, I think oftentimes the bioinformatics community,
unless you're writing a paper about bioinformatics,
ends up being in the middle, right?
In ways that I don't actually think represents a true contribution.
And so, we'll begin exploring actually DOI assignment, we're now a DOI minter.
DOI assignment to pipelines, the staging files as well as the results in ways
that permit attribution to those components.
And again, I think that will promote, I think a space of authentic competition
and usability, because we can track, right?
Which DOI assigned pipeline should be first place or last.
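Editor's illustration: the attribution idea described in this answer, where pipelines carry identifiers so that use can be tracked and credited, might be sketched as below. The record layout, the `pipeline_doi` key, and the function names are hypothetical, and the identifiers used with it would be placeholders rather than real DOIs.

```python
from collections import Counter


def usage_by_pipeline(results):
    """Count how many results cite each pipeline identifier.

    `results` is a list of dicts with a "pipeline_doi" key
    (hypothetical schema).
    """
    return Counter(r["pipeline_doi"] for r in results)


def credit(results, contributors):
    """Map each contributor to the number of results made with their pipelines.

    `contributors` maps a pipeline identifier to a list of names.
    """
    usage = usage_by_pipeline(results)
    totals = Counter()
    for doi, people in contributors.items():
        for person in people:
            # A pipeline that was never used contributes zero.
            totals[person] += usage[doi]
    return dict(totals)
```

Ranking the counts from `usage_by_pipeline` is one way to see which identified pipeline is used most, and `credit` shows how that use could flow back to the people who built it.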
>> Tanya Davidson, CBIIT.
So, I'm curious about your repository, your brain cancer repository.
First, where is it located and where did you get the funding for it?
>> That's certainly an important question
and you know it actually represents a missed opportunity on my part.
Almost all of what you see here has been funded by foundations.
We have more than 60 foundations who have supported these efforts.
And I say they are our true partners.
And so, they have been supporting this from the very beginning.
The bio repository is located at The Children's Hospital of Philadelphia
on the A level where we have a large robotic managed facility.
And it houses both cancer and noncancer bio specimens
and all of those will continue to be empowered in a very similar manner.
But if it wasn't for the commitment of foundations it wouldn't have happened.
And that's something I think is important to recognize.
Right, when a patient consents to the use of their specimen
for research there is an implicit contract, right?
That you're going to maximize the opportunity from that bio specimen on behalf
of the disease that they are you know committing to.
And so that premise is underpinning everything we do.
And importantly, the purpose of the architecture that you saw and the applications
that we build is not just to do the data science, right.
You always want to be able to go back to the bio specimen.
So, cell lines, xenografts, and additional bio specimens
to test and vet your digital understanding.
That's the goal, right?
To go from the analog of the patient to the digital of the data back
to the analog of the bio specimen of the patient, right?
That's the workflow that we wanted to create.
>> Thanks very much.
Fantastic presentation.
I have a question.
How much was sort of the altruistic objectives of putting transparency
and patient value before scientific what is it,
reputation and recognition and career building.
Was this in part enabled by the funding mechanism,
and potentially all the reasons,
I don't want to constrain you to just this one question.
>> Yeah, yeah.
You know so, I think when altruism and self-interest align,
that's the perfect for [inaudible] centesis.
Right?
It's not always easy to get those two to align, but I think you know funding,
right is something that's important.
And by and large, if you look at most foundations and how they fund research,
it's changing, but early on most of the discussions that we had were the type,
I, the foundation, want to fund my favorite investigator to allow them
to get preliminary data so that they can compete for an R01
in the disease's interest, right?
But the NIH has actually done a great job of describing success rates
and failure rates of R01 acquisition and what are the determinants for those.
And there's very little evidence that preliminary data funded
by foundations is the most deterministic component of R01 acquisitions.
And so, when we began talking to foundations in such a manner
and when we began educating foundations about the longevity
of empowerment associated with data access, right?
So, when they fund data generation and if it's implicit in their contract
that data has to be shared, they get to have credit
as a foundation much, much longer, right?
So, it became something that was good for them, right?
More people use that data, the more credit they could actually get
as a foundation in alignment with both the altruism and self-interest.
And so, I think incorporating data access requirements within that is key.
And I think that's going to be what topples the wall because more
and more foundations, they actually assumed
that data just gets shared right away.
Right? That's what scientists do, right, in many ways.
And so, I think education and communications with people is the first step,
but I think the next generation actually is going to be patients
who really even go around the scientists entirely, right?
And begin empowering the clinical
and genomic data that's being generated, you know robustly.
Most of the hospitals, most of the hospitals
that we have here are generating clinical genomic data, right?
And that is something patients have complete control over.
>> Or maybe too, the rare disease example that you showed us was a real eye opener.
And you and I have chatted about this a little bit before,
about all of it and you mentioned in one of your slides about bringing in all
of these other data types from other diseases, or conditions, autism,
birth defects, and those sort of things.
What do you see as the biggest challenge, you've been,
with this CBTCC this has been basically pediatric community.
Pediatric oncology community.
What do you see as the biggest challenges in bringing
in other communities even though they may be working in the pediatric space,
but they're not working in oncology.
What are the challenges for getting them on board with this model
and getting them to share the data?
>> Yeah, so.
There are a couple different challenges that are faced.
One, I think when we tell our story, they very much connect with and recognize
that they face the very same challenges that we face.
The first issue is one of, you know, is the platform going
to support the capacity to still maintain identity and visibility, right?
Nobody wants to be swallowed up by some large platform.
And you actually want that right?
You want that identity maintained,
because it ends up being a focus, a pull right?
When there's a foundation that's dedicated
to a rare disease, they are the node, right?
So, you don't want that node to disappear.
You just have to connect that node to your node, right?
So, once we begin showing other foundations and consortia that they can do that,
that's the first effort and it's actually self-serving for us, right?
It serves our purpose to maintain their identity, support their identity,
because then they bring in more data that we connect to our data, right?
The other challenge ends up being funding, right?
Even though we end up essentially reducing duplication of efforts, so there,
any other foundation or consortia that has begun conversation with joining us,
they don't have to spend the same money that we spent twice, but scaling, right?
And everybody here in the room knows that scaling costs money.
So, that's the other component.
And then the last piece connected to that is that it's still very difficult
for most efforts to recognize how much it costs
and how difficult it is operationally to do this work, right?
Infrastructure is not a sexy thing to support.
Right? And having that conversation is again something that I think is important
to convey and seeing we still have room to grow
and how do we convey the importance of infrastructure, right?
And how does that connect, ultimately, to the clinical trial.
Right? That distance between infrastructure
and the clinical trials seems so large sometimes.
But I think now we have enough case stories that we can connect the thread.
And then the last piece is actually,
you know demonstrating to them that there's actually evidence
that they will empower discovery within their own domains.
And so, by and large most of these efforts have been very willing to do that.
The hardest part is going to be some
of the clinical phenotypic data harmonization across different diseases.
Right? How do you find what you want in those spaces,
or differential data types?
We've been strategic in our efforts by beginning with brain tumors.
Because it's a very easy segue into the central nervous system,
which is a robust environment for developmental biology.
Right? Twenty to forty percent
of brain tumor patients have seizures and epilepsy.
So, that's an environment already intersected.
Autism also intersects with it.
So, the brain as a developmental biology arena is already a point of assessment
for us that I think is very well aligned with the neuro oncology space,
but it expands beyond that and so we're already looking at craniofacial,
and cardiac birth defects as the next wave.
>> Great, thanks for that.
We're just about out of time.
So, I just want to make a brief announcement about our next presentation,
which will be on Wednesday, May 24th,
and should be a really interesting presentation by Brad Erickson
from the Mayo Clinic and Eliot Siegel from the University of Maryland
who are going to have what I understand as a debate-style presentation
on the use of, on automation versus human analysis of radiological images.
So, that should be a really interesting one.
>> I can come too.
>> Yeah, come on down.
That'd be great.
And thanks to everybody who's joined today, both here in the room
and on the WebEx and once again let's thank our speaker
for a terrific discussion [applause].
>> Thank you so much.