
Concept To Clinic tools highlight: FOSSA

Complying with open source licenses is difficult. Today we're talking about a tool that has made it easier for the Concept to Clinic challenge.

Isaac Slavitt
Co-founder

Pull requests, commits, code reviews, and issues—these are the beating heart of open source projects. Without these transactions of code patches and technical discussion, the Concept to Clinic challenge wouldn't exist.

But it also wouldn't exist without the concept of free and open source software (FOSS) itself.

Today we want to highlight one of the less glamorous aspects of open source, but an aspect that goes to the heart of what "open source" actually means: licenses.

Software licenses: choose your own adventure

The choice among MIT, BSD, Apache, GPLv2, AGPL, and LGPL can be daunting. These are not the most friendly or inviting acronyms, and none of the actual documents are what you might call "thrilling and plot-driven." Most software developers are also not lawyers, so some view licenses as a necessary evil at best and old-fashioned gobbledygook at worst, while others decide to ignore them completely.

While some developers are vocal about free software and have strong preferences about which license to use on their projects, others are much less interested. After all, it's not as if the Free Software police will burst in and put you in handcuffs if you use some code from a repository without a license. But (lack of) enforcement is not the point.


Licenses are important because they make explicit the agreement between individual and community, between end user and contributor, and between contributors and corporations. That is why we should care about selecting the right licenses for our projects, and about ensuring that the way we use software respects the license under which the code was released.

Not just a concern for hobbyists

That is a lofty argument for being explicit about licenses and complying with their terms. But in any formal organization, from the largest corporation to the smallest non-profit, employees spend a lot of time and resources worrying about compliance and legal liability. Software licenses matter to these people.

What many hobbyists don't realize about professional software development is that non-developers often view free software with suspicion rather than excitement. This is particularly true in larger, more established companies that have staff dedicated to legal compliance, and it gets even more fraught in companies that work in heavily regulated industries such as healthcare or banking.

Instead of a cool piece of technology or a time-saving asset, free software is often treated as a liability. Common questions from managers include:

  • What if we need support?
  • What if we adopt the technology and they decide to start charging money for this?
  • Can we trust the software quality?

These are reasonable questions to ask, and they also have pretty satisfying answers when the open source model is explained. But the biggest question non-developers tend to worry about is this:

How do we know if we are legally allowed to use this software?

This is where the legal language of rights and obligations embodied in OSS licenses matters. The generally recognized OSS licenses use well-understood, standard legal terms of art to lay out permissions and responsibilities very clearly.

Although there hasn't been much case law in the United States, these licenses are well enough characterized that organizations know the tradeoffs involved when they use packages released under, for example, the GPLv3 as opposed to a BSD license.

Dependencies all the way down

Bearing in mind that a medium-sized or large project can have tens or hundreds of dependencies, each of which has dependencies of its own, and so forth recursively, and given that organizations generally need to see a well-known license before they can use each individual package, the next question is obvious:

How do we keep track of all the licenses that govern our use of free software?

In some organizations, a large spreadsheet or central tracker is used and reviewed periodically. Other organizations only evaluate software licenses when a new dependency is added. In both cases, there is a review process to determine which licenses are attached to proposed libraries. Traditionally, this involved several stages:

  1. Every time a package was added to the project, engineers would have to remember to check its license.
  2. Then they would have to decide whether the license was compatible. If they were lucky, there would be a preapproved list of acceptable licenses, but in reality this step usually meant non-experts interpreting what it means to "distribute" software.
  3. Finally, they would need to manually investigate each of the new dependency's own dependencies, recursively, until everyone was so tired of the process that they gave up on writing software and decided to become vegetable farmers.

If only there were some kind of machine that could traverse graph structures recursively and execute a repetitive checking operation more efficiently than a human developer...
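To make the idea concrete, here is a minimal Python sketch of that repetitive check. It is not how FOSSA works internally; it assumes the dependency metadata has already been collected into a dictionary, and every package name, license assignment, and the allowlist itself are hypothetical. It walks the transitive dependency graph breadth-first, visits each package once, and reports any package whose license is not pre-approved.

```python
from collections import deque

# Hypothetical dependency metadata: package -> (declared license, direct dependencies).
# Real tools gather this from package manifests and registries; these names are made up.
DEPENDENCIES = {
    "my-app": ("MIT", ["web-framework", "image-utils"]),
    "web-framework": ("BSD-3-Clause", ["template-engine"]),
    "template-engine": ("MIT", []),
    "image-utils": ("Apache-2.0", ["codec-lib"]),
    "codec-lib": ("GPL-3.0-only", []),
}

# Hypothetical allowlist of licenses an organization has pre-approved.
ALLOWED_LICENSES = {"MIT", "BSD-3-Clause", "Apache-2.0"}


def find_license_issues(root, graph, allowed):
    """Visit every transitive dependency of `root` exactly once and return
    (package, license) pairs whose license is not on the allowlist."""
    issues = []
    seen = {root}
    queue = deque([root])
    while queue:
        package = queue.popleft()
        # Packages with no metadata are treated as having no known license.
        license_id, deps = graph.get(package, ("NOASSERTION", []))
        if license_id not in allowed:
            issues.append((package, license_id))
        for dep in deps:
            if dep not in seen:  # shared dependencies are only checked once
                seen.add(dep)
                queue.append(dep)
    return issues


if __name__ == "__main__":
    for package, license_id in find_license_issues("my-app", DEPENDENCIES, ALLOWED_LICENSES):
        print(f"needs review: {package} ({license_id})")
```

With the sample data above, the only package flagged is `codec-lib`, two levels below the application, which is exactly the kind of dependency a manual review is likely to miss. Tracking visited packages keeps the walk linear in the size of the graph even when many packages share the same dependencies.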

Enter FOSSA

For this project, we wanted contributors to be able to focus on writing code, but we also needed the project to stay license-compliant. Concept to Clinic is an open source project, but the same desire holds for virtually every professional software development project.

Thankfully, a colleague recommended FOSSA, a service that can "continuously scan and comply with open source licenses without slowing down development."

In addition to automatically mapping the whole dependency tree, FOSSA infers each dependency's license, flags possible conflicts, and notifies maintainers, who can then review and resolve any issues manually. It also does deep code scanning, checking inside files for code released under a different license than the project's.
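As a rough illustration of what "checking inside files" can mean, the Python sketch below looks for SPDX license tags (lines such as `SPDX-License-Identifier: MIT`, a real convention for marking a file's license) and reports files whose tag differs from the project's license. This is a toy, not FOSSA's method: real scanners also match license text and code fingerprints. The file names, contents, and the assumption that the project is MIT-licensed are all hypothetical, and SPDX expressions are compared as plain strings.

```python
import re

# Matches the rest of the line after an SPDX short-form license tag.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")

# Hypothetical project files (filename -> contents); the names are made up.
FILES = {
    "src/app.py": "# SPDX-License-Identifier: MIT\nprint('hello')\n",
    "src/vendor/fast_math.c": "/* SPDX-License-Identifier: GPL-2.0-only */\nint add(int a, int b);\n",
    "README.md": "No license tag in this file.\n",
}


def files_with_other_licenses(files, project_license):
    """Return {filename: sorted list of SPDX expressions} for files that carry
    a license tag different from `project_license`."""
    mismatches = {}
    for name, text in files.items():
        found = set()
        for match in SPDX_TAG.finditer(text):
            # Drop a trailing C-style comment terminator and surrounding spaces.
            expression = match.group(1).strip().removesuffix("*/").strip()
            if expression != project_license:
                found.add(expression)
        if found:
            mismatches[name] = sorted(found)
    return mismatches


if __name__ == "__main__":
    # Assumes, for illustration only, that the project itself is MIT-licensed.
    print(files_with_other_licenses(FILES, "MIT"))
```

Here the vendored C file is reported because it declares `GPL-2.0-only` inside an otherwise MIT-tagged codebase. Files without any tag are simply skipped, which is one reason a real scanner cannot rely on tags alone.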

We're grateful to Kevin and the rest of the FOSSA team for sponsoring the Concept to Clinic project. Head over to fossa.io and check them out!
