
Concept To Clinic tools highlight: FOSSA

Complying with open source licenses is difficult. Today we're talking about a tool that has made it easier for the Concept to Clinic challenge.

Isaac Slavitt
Co-founder

Pull requests, commits, code reviews, and issues—these are the beating heart of open source projects. Without these transactions of code patches and technical discussion, the Concept to Clinic challenge wouldn't exist.

But it also wouldn't exist without the concept of free and open source software (FOSS) itself.

Today we want to highlight one of the less glamorous aspects of open source, but an aspect that goes to the heart of what "open source" actually means: licenses.

Software licenses: choose your own adventure

The choice between MIT, BSD, Apache, GPLv2, AGPL, and LGPL can be daunting. These are not the most friendly or inviting acronyms, and none of the actual documents are what you might call "thrilling and plot-driven." Most software developers are also not lawyers, so some developers view licenses as a necessary evil at best, old-fashioned gobbledygook at worst, or even decide to ignore licenses completely.

While some are vocal about free software and have strong preferences for which license to use on their projects, others are much less interested. After all, it's not like the Free Software police burst in and put you in handcuffs if you use some code from a repository without a license. But (lack of) enforcement is not the point.


Licenses are important because they make explicit the agreement between individual and community, between end user and contributor, and between contributors and corporations. We should therefore care about selecting the right licenses for our projects, and about ensuring that the way we use software respects the license under which the code was released.

Not just a concern for hobbyists

That was a lofty argument for being explicit about licenses and about complying with license terms, but for any formal organization—from the largest corporation to the smallest non-profit—employees spend a lot of time and resources worrying about compliance and legal liability. Software licenses matter to these people.

What many hobbyists don't realize about professional software development is that free software is often viewed with suspicion rather than excitement by non-developers. This is particularly true in larger, more established companies that have staff specifically concerned with legal compliance. It gets even more fraught in companies that work in heavily regulated industries such as healthcare or banking.

Instead of a cool piece of technology or a time-saving asset, free software is often treated as a liability. Common questions from managers include:

  • What if we need support?
  • What if we adopt the technology and they decide to start charging money for this?
  • Can we trust the software quality?

These are reasonable questions to ask, and they also have pretty satisfying answers when the open source model is explained. But the biggest question non-developers tend to worry about is this:

How do we know if we are legally allowed to use this software?

This is where the legal language of rights and obligations embodied in OSS licenses is important. The generally recognized OSS licenses use well-understood, standard legal terms of art to lay out permissions and responsibilities clearly.

Although there hasn't been much case law in the United States, these licenses are at least well characterized enough that organizations know the tradeoffs involved when they use packages released under, for example, the GPLv3 as opposed to BSD licenses.

Dependencies all the way down

A medium or large project can have tens or hundreds of dependencies, each of which has dependencies of its own, and so on recursively. Given that organizations generally need to see a well-known license before they can use each individual package, the next question is obvious:

How do we keep track of all the licenses that govern our use of free software?

In some organizations, a large spreadsheet or central tracker is used and reviewed periodically. Other organizations only evaluate software licenses when a new dependency is added. In both cases, there is a review process to determine what licenses are attached to proposed libraries. Traditionally, this involved several stages:

  1. Every time a package was added to the project, engineers would have to remember to check its license.
  2. Then they would have to decide whether the license was compatible or not. If they were lucky, there would be a preordained list of acceptable licenses, but in reality this usually meant that non-experts were interpreting what it means to "distribute" software.
  3. Finally, they would need to manually investigate all of that new dependency's own dependencies, recursively, until everyone was so tired of the process that they gave up on writing software and decided to become vegetable farmers.

If only there were some type of machine that could traverse graph structures recursively and execute a repetitive checking operation more efficiently than a human developer...
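To make the manual process above concrete, here is a minimal sketch of the three stages in Python: look up each package's license, compare it against a list of acceptable licenses, and repeat for every transitive dependency. The package names, the licenses assigned to them, and the allowlist are all made up for illustration, and a real tool would read this information from package metadata rather than from a hard-coded table.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical dependency data: package -> (declared license, direct dependencies).
PACKAGES: Dict[str, Tuple[str, List[str]]] = {
    "my-app": ("MIT", ["web-lib", "img-lib"]),
    "web-lib": ("BSD-3-Clause", ["http-core"]),
    "img-lib": ("Apache-2.0", ["codec"]),
    "http-core": ("MIT", []),
    "codec": ("GPL-3.0-only", ["http-core"]),
}

# Hypothetical "preordained list" of licenses an organization has approved.
ALLOWED_LICENSES: Set[str] = {"MIT", "BSD-3-Clause", "Apache-2.0"}


def find_license_violations(
    root: str,
    packages: Dict[str, Tuple[str, List[str]]],
    allowed: Set[str],
) -> List[Tuple[str, str]]:
    """Walk the dependency graph from `root` and return every
    (package, license) pair whose license is not in `allowed`."""
    violations: List[Tuple[str, str]] = []
    seen: Set[str] = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # shared dependencies only need to be checked once
        seen.add(name)
        license_id, dependencies = packages[name]
        if license_id not in allowed:
            violations.append((name, license_id))
        stack.extend(dependencies)  # queue the next level of dependencies
    return sorted(violations)


if __name__ == "__main__":
    for name, license_id in find_license_violations(
        "my-app", PACKAGES, ALLOWED_LICENSES
    ):
        print(f"{name}: {license_id} is not on the allowlist")
```

The sketch uses an explicit stack and a `seen` set instead of literal recursion, so a dependency shared by several packages is checked once and a circular dependency cannot cause an endless loop.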

Enter FOSSA

For this project, we wanted contributors to be able to focus on writing code, but we needed the project to stay compliant with the licenses of its dependencies. Concept to Clinic is an open source project, but the same desire holds for virtually every professional software development project.

Thankfully, a colleague recommended FOSSA, a service that can "continuously scan and comply with open source licenses without slowing down development."

In addition to automatically figuring out the whole dependency tree, FOSSA guesses each dependency's license, flags possible conflicts, and automatically notifies maintainers, who can manually review and resolve any issues. It also does deep code scanning, checking inside files for code released under a different license than the project's.
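As a rough illustration of the idea behind checking inside files, the sketch below looks for `SPDX-License-Identifier` tags in file contents and reports files whose tag differs from the project's license. This is not FOSSA's implementation, which we have not seen; it is a toy version under simplifying assumptions. The file paths and contents are invented, files are passed in as an in-memory dictionary to keep the example self-contained, and only a single identifier per file is recognized (compound SPDX expressions and untagged copied code are ignored).

```python
import re
from typing import Dict, List, Tuple

# Matches tags such as "SPDX-License-Identifier: MIT" and captures the identifier.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+\-]+)")


def find_mismatched_files(
    files: Dict[str, str], project_license: str
) -> List[Tuple[str, str]]:
    """Return (path, license) pairs for files whose SPDX tag
    names a license other than `project_license`."""
    mismatches: List[Tuple[str, str]] = []
    for path, text in files.items():
        match = SPDX_TAG.search(text)
        # Files without a tag are skipped here; a real scanner would
        # need other evidence (for example, matching known source code).
        if match and match.group(1) != project_license:
            mismatches.append((path, match.group(1)))
    return sorted(mismatches)


# Hypothetical file contents, keyed by path.
SAMPLE_FILES: Dict[str, str] = {
    "src/app.py": "# SPDX-License-Identifier: MIT\nprint('hello')\n",
    "src/vendor/codec.py": "# SPDX-License-Identifier: GPL-3.0-only\n",
    "src/util.py": "def helper():\n    return 1\n",
}

if __name__ == "__main__":
    for path, license_id in find_mismatched_files(SAMPLE_FILES, "MIT"):
        print(f"{path} is tagged {license_id}, but the project is MIT")
```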

We're grateful to Kevin and the rest of the FOSSA team for sponsoring the Concept to Clinic project. Head over to fossa.io and check them out!
