Developer's workshop meeting focuses on the formation of an open source community

Open source is the best approach for leveraging distributed thinking to achieve synergistic advancements in innovation.

You could say that open source is one of the best ideas for innovation in the past 20 years, but it would not be true.

More than 100 years ago, Henry Ford entered the automobile manufacturing business. He was met almost immediately with a problem that would become a lightning rod for his ire.

A patent.

Yes, Henry Ford found himself face to face with a patent troll. A man by the name of George Selden held a patent for a motorized carriage, which he claimed gave him rights over any automobile being produced. This patent was administered by 11 car manufacturers that made up the Association of Licensed Automobile Manufacturers (ALAM).

When Henry Ford went to ALAM for a license to produce automobiles, he was told he was only an assembler, not a true manufacturer. He was more than an assembler, that’s for sure. More importantly, he was persistent and fought a patent battle for 8 years. A battle that ended on quite an ironic note.

The case was finally decided when a judge ordered the Selden car be built. It did not work. So, a patent for a car that did not function almost blocked one of the greatest industrialists ever.

Did Ford have patents himself?

He did. Did he ever use them to block competitors?

No. Despite holding some 92 patents on engines, the Ford Motor Company never collected any royalties from them. Ford’s early experience colored his view on patents. He believed you should compete to produce the best possible automobiles and work together on engines. Even today car manufacturers work together on developing engines, and Ford has made some of its hardware and software open source.

The ‘engine’ of translational research

tranSMART is in reality the translational research equivalent of the motors Ford developed. There certainly has been a significant amount of recent interest in tranSMART.

Many companies and some projects have implemented their own versions of tranSMART.  There has even been the formation of a community around tranSMART. However, up until now you could only describe this community as a ‘nascent’ community.

That changed last week.

Last week the 2nd tranSMART Developers Hackathon was hosted by the TraIT consortium in Amsterdam. It marks a major milestone.

It marks the conclusion of the first collaborative coding effort, which began with the first hackathon held at Imperial College in London. As Kees Van Bochove, the CEO of The Hyve, put it, “Today we have a chance to write history.”

Gerrit Meijer, the principal investigator of TraIT from VUmc, opened the hackathon with images of an engine. His point was that the tranSMART community should not fall into the trap of focusing entirely on the technology and too little on the outcome. He went on to highlight the importance of tranSMART in improving patient care. Yet, his engine analogy channels Ford’s open source mentality. A mentality that drove the entire industrial revolution.

Perseverative Inefficiency

Van Bochove nicely illustrated how both industry and academia perseverate their knowledge management solutions.

Pharma companies all have similar knowledge management desires. They all want, for example, the ability to:

  • Support microarray data analysis
  • Load public microarray data from GEO
  • Store and retrieve saved analyses
  • Search on gene name or disease name
  • Load TCGA studies we have access to
  • Load 1000 Genomes data

Traditionally bioinformatics and IT departments would implement this all on their own, or sit and look dreamily out their windows waiting for the day that they were given large enough budgets to do so. On the whole, a highly inefficient approach.

Is academia any better?

You might think so. Just look at the number of authors on a typical academic publication. Academia seems to be an orgy of collaboration. Not so fast.

Van Bochove described what is known as the not invented here (NIH) syndrome: the tendency to rebuild a solution in-house rather than adopt one that already exists elsewhere.

tranSMART is an opportunity to stop perseverating and say something meaningful. A chance, as Meijer points out, to focus more on what really matters - better patient outcomes.

Open source is not only for developing automobile engines.

Van Bochove described a number of wildly successful open source projects.

Projects like Galaxy for bioinformatics, Plone for content management, and Drupal for building websites. And perhaps two of the most ubiquitous ones in bioinformatics are R and Bioconductor. How did the R developers achieve this?

Van Bochove quoted Brian Ripley, an R Core member:

"The R project is governed by a self perpetuating oligarchy, a group with a lot of power. R was principally developed for the benefit of the core team."

The point being that to get tranSMART moving forward does not require a lot of governance, “we just need to get people working on it”. That is precisely what has been happening and not just on the core code base.

Companies are developing enhancements to tranSMART that they are bringing to the community.

Pfizer has contributed GWAS upload, VCF data storage and analysis, and enhanced data export capabilities. Sanofi developed a cleaner user interface, a metadata layer for all concepts and categorization, and file management for studies and programs.

Add these contributions to all the previous work and what do you get?

“A monolithic code base” with multiple unconnected branches.

From monoliths to bazaars

It is not unlike what Eric Steven Raymond describes as a ‘bazaar-like’ development process. This contrasts with the ‘cathedral-like’ projects of traditional development. Can bazaars deliver big projects?

Van Bochove points out that what all the wildly successful open source projects have in common is that they are not making one “silver bullet” that solves all problems. They are not building a cathedral. “They build a community, have a stable core, and have good options for modularization”.

This is not unlike what Raymond claims makes ‘bazaar’ type approaches like his own Fetchmail and Linux successful. Emphasis is placed on a stable core that is not something entirely new and on the inclusion of personalities capable of building a community.

There is no doubt that the tranSMART community has the requisite personalities in people like Brian Athey, Terry Weymouth, Yi-Ke Guo and Kees Van Bochove. The spirit of the Hackathons is a testament to that.

In their talk, Brian Athey and Yi-Ke Guo emphasized that the focus remains very much on generating a stable core, including developing a robust panel of test scripts.

Van Bochove highlighted the importance of the API development. It will provide the necessary degree of modularization. This is the key to making use of all the different development contributions and to enabling the use of different databases. Indeed, the current phase of development, in which the API is being built, is critical. Without it, it will be difficult to build a thriving community.
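To make the modularization point concrete, here is a minimal sketch of how an API layer can decouple analysis code from the database behind it. Everything in it is hypothetical and chosen only for illustration: the names StudyStore, InMemoryStore and mean_observation, and the study and concept identifiers, are assumptions and are not taken from tranSMART's actual API.

```python
from __future__ import annotations

from typing import Protocol


class StudyStore(Protocol):
    """Hypothetical storage interface (an assumption, not tranSMART's real API)."""

    def observations(self, study_id: str, concept: str) -> list[float]:
        ...


class InMemoryStore:
    """Toy backend standing in for one possible database implementation."""

    def __init__(self, data: dict[tuple[str, str], list[float]]) -> None:
        self._data = data

    def observations(self, study_id: str, concept: str) -> list[float]:
        # Return a copy so callers cannot mutate the stored values.
        return list(self._data.get((study_id, concept), []))


def mean_observation(store: StudyStore, study_id: str, concept: str) -> float:
    """Analysis code that depends only on the interface, not on the backend."""
    values = store.observations(study_id, concept)
    if not values:
        raise ValueError("no observations found")
    return sum(values) / len(values)


# Made-up example data, purely illustrative.
store = InMemoryStore({("STUDY1", "age"): [40.0, 50.0, 60.0]})
result = mean_observation(store, "STUDY1", "age")
```

Any other backend that offers the same observations method could be swapped in without touching mean_observation, which is the kind of independence between contributions that a stable API is meant to allow.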


Flourish or flounder?

Open source approaches are not something new. What is new is the scale at which they can be implemented.  Furthermore, the complexity of many projects almost demands an open source approach.

Although the mass production of automobiles had an enormous impact on society, today’s open source projects such as tranSMART have the potential to have an even greater impact.

What is clear from the recent tranSMART Developer’s Hackathon is that this phase of development will determine if tranSMART will be able to become a ubiquitous solution like R or simply an option in a sea of solutions.


Scott Wagers, MD is the founder and CEO of BioSci Consulting and blogs about collaborative research at Assembled Chaos. He co-leads both the management and dissemination work packages in eTRIKS and is the chief editor of the eTRIKS blog.