
dkernohan

#opened14 - third day keynote from John Willbanks


John Willbanks currently works at Sage Bionetworks. He was asked to speak about open science and open data.

He started by cautioning against "open silos": different campaigns using common tools and approaches but not speaking to each other. Science affects education, and both are affected by wider culture, and the culture of prediction.

Yogi Berra - "predictions are hard. especially about the future". (It is now easy to find older books to source quotations via web searches, though nothing from the last 25 years.)

It *was* really hard to make predictions about the future. But predictions are increasingly accurate - especially predictions about ourselves. Every single website is trying to sell you the same thing - it's not like they know you, they literally know you. Mining things like email data to make predictions has exploded over the past 10 years.

This is about probability. And this is basic mathematics.

Increasingly fields are, or can be, data driven. Biology used to be a narrative science; now, with the advent of cheap shared data, it is a predictive science. He gave the example of services like "23andMe", consumer genetics. Or Science Exchange - eBay for university science services.

It now costs $200 per sample to do RNA microarray. Tools for science and analysis are cheaper.

Not just hard science. In archaeology there are huge amounts of archive data. Even in etymology, we can find the origin of quotes.

Everything is text. So every field has a data wave coming. Everything is increasingly measurable and indexable.

So probabilistic analysis is going to be the academic coin of the realm. And advertising is making the methods and tools more accessible.

Probability changes every time we add new information to the model. This changes educational culture, and changes the needs for training and skills. He said that current pedagogy is failing - there is no continuing education for sciences. So it is hard for academics to deal with the data flow.
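The point about probability shifting with each new piece of information can be made concrete with a small Bayesian update. This is only an illustrative sketch, not something shown in the talk: the function names (`update_beliefs`, `posterior_mean`), the uniform starting prior and the toy observations are all assumptions made here for the example.

```python
from fractions import Fraction


def update_beliefs(alpha, beta, observations):
    """Update a Beta(alpha, beta) belief one observation at a time.

    Each observation is 1 (the event happened) or 0 (it did not).
    Returns the (alpha, beta) pair after every observation, so the
    drift in the estimate can be followed step by step.
    """
    history = []
    for outcome in observations:
        if outcome:
            alpha += 1  # one more "success" seen
        else:
            beta += 1   # one more "failure" seen
        history.append((alpha, beta))
    return history


def posterior_mean(alpha, beta):
    """Current best estimate of the event probability under Beta(alpha, beta)."""
    return Fraction(alpha, alpha + beta)


if __name__ == "__main__":
    # Hypothetical data: start with no opinion (uniform prior), then see four results.
    for a, b in update_beliefs(1, 1, [1, 1, 0, 1]):
        print(a, b, posterior_mean(a, b))
```

With these made-up observations the estimate moves from 2/3 to 3/4, drops to 3/5 after the one negative result, then returns to 2/3. No single number is final; it is only ever the best estimate given the data seen so far.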

In the sharing economy, a larger market makes for a better economy. Though these are rental economies, not good for labour or conditions. And service owners don't want you to be a buyer and a seller - in science, we want to be able to be both.

These markets are better (for buyers) than the terrible status quo. But this isn't good enough. Open multi-sided platforms allow individual actors to have multiple roles.

In the open movement, we don't focus on adding users. We need lightweight ways to move people between roles, like shifting from being a Wikipedia viewer to a Wikipedia contributor. And getting value from both sides - increasingly as more people are involved.

It is not about the assets (or the license choices), it is about the users. And these may be people who don't agree with us philosophically. He gave the example of open source - methodologically and economically it succeeded. The philosophy is great, but that wasn't what drove growth.

Willbanks asked of any "open" activity - "does it create more value than a closed version?". Openness is a methodology that gets assets and data in front of people.

So selling value rather than philosophy is selling a practice change. Work at Merck on cancer is open, via a non profit operation. It allows anyone to use genetic data.

Analytic tools to analyse this data need to be used alongside experience - so can we create an open multi-sided market to bring these together, involving not just solo labs (as the natural unit of science) but communities? Government funding now works to foster collaboration, and open approaches can simply play into this (e.g. the TCGA Pan-Cancer Consortium).

In this example, open methods allowed the consortium to analyse data collaboratively, by instigating a culture of sharing clear information. So science practice has improved via open approaches, such as version control for annotations and metadata. This allows researchers to see every stage, and allows us to be confident in probabilistic analysis.

And for researchers not used to these ways of working, this practice sucks. It is new, and slow. But the value realised in terms of academic activity (papers etc) is immense. And this led it to gain users from across TCGA.

This was a community that was required to work together, but what about those that are not? In colon cancer we saw 4 (or more!) simultaneous papers postulating different genetic subtypes for the disease. Open approaches allowed groups to test their methods across all of the 13 data sets. So a consensus subtype, with high probabilistic confidence, emerged.

The approach is now exploding across research groups. And it makes it possible to run challenges that widen communities - more eyes on the problem. For example, computing the probability of cancer relapse. The winner (with the competition as peer review) gets a guaranteed high-impact journal publication, but code sharing is required to be eligible.

The winner actually got a cover, an opinion piece, a methods paper and a results paper. And an entire suite of tools was generated (even from outside medicine) for others attacking the problem - the winning entry was from the lab that invented the mp3 codec.

If you have an open player in the market, it changes and improves the market. Less immoral. Less asshole-y.

So we need to think about our practice - how do we govern open platforms? How do we design and cost them? Willbanks felt that the biggest challenge the open movement faced was platform design, to drive engagement. The iPhone was not designed around the idea of a closed ecosystem - it was designed around value to the user.

With an open platform, you are not just a buyer or a seller. You are a citizen. You are a member. And good design means you are the priority.

Licenses like CC BY and CC0 give users more value. And a winning design can embed this into places where open had not been previously considered. He gave the example of informed consent (which reminded me of early UK work on the consent commons), claiming that better designed forms would make it easier to find research participants, allowing for larger scale (and thus more probabilistically confident) findings.

This led to collections of noun, verb and sentence icons and animations, and storyboard templates, put into the public domain. Allowing the simple creation of stories that can properly inform consent. Using mobile technology and sensors to gather and analyse research data (for example gyroscopic sensors to measure hand tremors in Parkinson's patients).

As a fully open tool, these informed consent approaches can be used in a variety of contexts. Allowing other people to do things that the product creators cannot do, or had not considered. Again, an open method creates more value. Economic value, educational value.

In probability, adding more data refines the model. But what we "know" becomes less stable as more data is added, so pedagogy needs to change to reflect this emerging ontological instability. So the right to reuse becomes the right to be current, and to get better, and to create value.

And value is not just economic in open systems - it is social value and knowledge value.