The challenge
Inland water bodies provide a variety of critical services for both human and aquatic life, including drinking water, recreational and economic opportunities, and aquatic habitats. A significant challenge water quality managers face is the formation of harmful algal blooms (HABs). Cyanobacteria blooms are one of the major types of HABs. HABs can produce toxins that are poisonous to humans and their pets, and they threaten aquatic ecosystems by blocking sunlight and depleting oxygen.
While there are established methods for using satellite imagery to detect cyanobacteria in larger water bodies like oceans, detection in small inland lakes and reservoirs remains a challenge. Manual water sampling is accurate but too time-intensive and difficult to perform continuously. Satellite imagery and other remote sensing data can enable faster, more comprehensive monitoring.
The approach
DrivenData hosted the Tick Tick Bloom challenge to rapidly test a wide variety of possible data sources, model architectures, and features. Over 1,300 participants competed to detect cyanobacteria blooms in small, inland water bodies using publicly available satellite, climate, and elevation data.
The competition produced critical research code and a proof of concept for detection. DrivenData built on these results in several ways to turn them into an actionable tool in the hands of end users:
- User interviews: Conducted human-centered design (HCD) interviews to better understand on-the-ground user needs. Understanding current workflows is crucial to technical decisions such as the prediction format, the most relevant performance metrics, and compute constraints.
- Model experimentation: Combined and iterated on the most useful pieces from competition-winning models to determine which approaches were the most robust, accurate, and generalizable outside of the competition setting.
- Code organization: Simplified and restructured code to create a more efficient, configurable, and deployable pipeline.
The results
DrivenData developed an open source tool, CyFi (Cyanobacteria Finder), which enables satellite-based detection of HAB outbreaks in lakes, reservoirs, and rivers. In a benchmark comparison, we found that CyFi performs at least as well as Sentinel-3-based tools while covering ten times as many lakes across the U.S. Our work helps water quality managers better allocate resources for in situ sampling, make more informed decisions about when to issue public health warnings, and ultimately keep the people and aquatic life that rely on small inland water bodies safe and healthy.

The landing page for CyFi's documentation
CyFi is written to reflect best practices in open and reproducible data science, and anyone can use the package or contribute to the code on GitHub. A paper published in the SciPy conference proceedings documents the full process of creating CyFi in detail.
As part of the Tick Tick Bloom competition, DrivenData aggregated manual cyanobacteria labels from 14 data providers across the U.S., creating a unique, nationally representative ground truth dataset. The full dataset of 23,570 measurements is now publicly available for anyone to learn from.
Partners
The Tick Tick Bloom competition was created on behalf of NASA, with collaboration from NOAA, EPA, USGS, DoD's Defense Innovation Unit, Berkeley AI Research, and Microsoft AI for Earth.