
Protecting endangered beluga whales with computer vision

DrivenData designed and hosted a computer vision challenge that produced state-of-the-art machine learning models to automatically identify and match individual endangered beluga whales from aerial photography.

The organizations

The Bureau of Ocean Energy Management and the National Marine Fisheries Service (NOAA Fisheries) are responsible for the management and conservation of the United States' marine ecosystems. This includes monitoring Cook Inlet belugas, an endangered population now numbering fewer than 300 whales.

The challenge

To track the health of the Cook Inlet belugas, NOAA Fisheries conducts annual aerial photographic surveys when the whales gather around river mouths near Anchorage, Alaska, in the spring and summer months. The photographs are then reviewed to identify individual whales based on their color, marks, scarring, and other physical features. This process is called photo-identification: a noninvasive technique for identifying and tracking individuals in a wild animal population over time.

Manual photo-identification is slow and labor-intensive. Given a new photo of an individual whale, reviewers must carefully compare subtle physical features against those in a large database of past photos to determine which individual the new photo matches. A tool for making this process more efficient must not only solve this difficult computer vision problem to accurately surface likely matches, but also be interpretable, so that reviewers can quickly decide which candidate photo is the right match.

The approach

DrivenData hosted the Where's Whale-do? challenge to rapidly experiment with different approaches to applying computer vision to this problem. Our data scientists worked closely with wildlife computer vision experts from Wild Me, the organization behind Flukebook, the data platform NOAA Fisheries uses, to design a rigorous evaluation procedure spanning ten different query-and-database image scenarios, ensuring solutions would be robust and generalizable. The challenge also featured an explainability bonus round, in which top performers were invited to submit explainability solutions for their models.
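Under the hood, this kind of matching is typically framed as image retrieval: each photo is embedded as a feature vector, and database images are ranked by similarity to the query. The sketch below illustrates only that ranking step, using NumPy and cosine similarity; the toy embeddings and the `rank_matches` helper are illustrative assumptions, not the winning solution.

```python
import numpy as np

def rank_matches(query_emb, db_embs, top_k=3):
    """Rank database embeddings by cosine similarity to a query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = db @ q                      # cosine similarity to each database image
    order = np.argsort(-sims)[:top_k]  # indices of the most similar images
    return order, sims[order]

# Toy example: 4 database "whales" with 5-dimensional embeddings
rng = np.random.default_rng(0)
db = rng.normal(size=(4, 5))
query = db[2] + 0.05 * rng.normal(size=5)  # a query close to whale 2
idx, scores = rank_matches(query, db)
print(idx[0])  # index of the closest database image
```

A reviewer-facing tool would then display the top-ranked candidates alongside the query photo, rather than returning a single automated decision.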

The results

The challenge received over 1,000 submissions over the course of two months. The winning solutions combined performant machine learning models with training techniques that had previously reported success in facial recognition tasks. Further analysis by challenge partners at Wild Me found that the winners outperformed previous state-of-the-art solutions by 15-20% across key accuracy metrics.

Following the challenge, engineers at Wild Me adapted techniques from the winning solutions into a new algorithm, MIEW-ID, which they incorporated into the image analysis pipeline of their wildlife data platform. The new functionality includes explanatory visualizations that show users which parts of an image the model found influential, using the Grad-CAM approach developed in the challenge's explainability bonus round.

[Figure: Example Grad-CAM explanation visualization from the competition-winning solution, produced in the challenge's explainability bonus round]
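The Grad-CAM computation behind these visualizations is compact: average the gradients of the target score over each activation map to get per-channel weights, take the weighted sum of the maps, and keep only the positive part. Below is a framework-free NumPy sketch of that core step; in practice the activations and gradients come from hooks on a trained network, so the random arrays and the `grad_cam` helper here are stand-ins.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap from a conv layer's activations and gradients.

    activations: (C, H, W) feature maps from the chosen conv layer
    gradients:   (C, H, W) gradients of the target score w.r.t. those maps
    """
    # Global-average-pool the gradients: one importance weight per channel
    weights = gradients.mean(axis=(1, 2))  # shape (C,)
    # Weighted sum of activation maps, then ReLU to keep positive evidence only
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0)
    # Normalize to [0, 1] for display as a heatmap
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example with random stand-in activations and gradients
rng = np.random.default_rng(1)
cam = grad_cam(rng.normal(size=(8, 4, 4)), rng.normal(size=(8, 4, 4)))
print(cam.shape)  # a (4, 4) heatmap, values in [0, 1]
```

The resulting heatmap is upsampled and overlaid on the original photo, which is what lets a reviewer see which markings drove a suggested match.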

The impact has not been limited to beluga whale conservation. The resulting system has been successfully applied to a dozen other cetacean species, such as bottlenose dolphins, and has even seen success with terrestrial species like African lions and African leopards.

Our real-world impact

Partners: Max Planck Institute for Evolutionary Anthropology, Arcus Foundation, WILDLABS

Automating wildlife identification for research and conservation

Detected wildlife in images and videos, automatically and at scale, by building the winning algorithm from a DrivenData competition into an open source Python package and a web application running models in the cloud.

Partners: Private sector, social sector

Building LLM solutions

Built solutions using LLMs for multiple real-world applications, across tasks including semantic search, summarization, named entity recognition, and multimodal analysis. Work has spanned from research on state-of-the-art models tuned for specific use cases to production-ready retrieval-augmented AI applications.

Partners: The World Bank, The Conflict and Environment Observatory

Identifying crop types using satellite imagery in Yemen

Used satellite imagery to identify crop extent, crop types, and climate risks to agriculture in Yemen, informing World Bank development programs in the country after years of civil war.

Partners: Bureau of Ocean Energy Management, NOAA Fisheries, Wild Me

Protecting endangered beluga whales with computer vision

Designed and administered a computer vision challenge that produced state-of-the-art machine learning models to identify and match individual endangered beluga whales from photo surveys.

Partners: EverFree

A production application to support survivors of human trafficking

Built the Freedom Lifemap platform, a digital tool designed to support survivors of human trafficking on their journey toward reintegration and independence.

Partners: ReadNet

Crowdsourcing solutions for AI-assisted early literacy screening

Ran a machine learning challenge to develop automatic scoring methods for audio clips from literacy screener exercises. Automated scoring can help teachers quickly and reliably identify children in need of early literacy intervention.

Partners: IDEO.org

Illuminating mobile money experiences in Tanzania

Analyzed millions of mobile money records to uncover patterns in behavior, and then combined these insights with human-centered design to shape new approaches to delivering mobile money to low-income populations in Tanzania.

Partners: Insecurity Insight, Physicians for Human Rights

Tracking attacks on health care in Ukraine

Built a real-time, interactive map to visualize attacks on the Ukrainian health care system since the Russian invasion began in February 2022. The map supports partner efforts to provide aid, hold aggressors accountable in court, and increase public awareness.

Partners: Wellcome

Addressing algorithmic bias in medical research

Conducted a literature review to understand the current state of bias identification and mitigation in mental health research, and synthesized recommended best practices from the field of machine learning.

Partners: CABI Plantwise

Mining chat messages with plant doctors using language models

Automated recognition of agricultural entities (such as crops, pests, diseases, and chemicals) in WhatsApp and Telegram messages among plant doctors, enabling new ways to surface emerging trends and improve science-based guidance for smallholder farmers.

Partners: NASA

Monitoring water quality from satellite imagery

Created an open source package to detect harmful algal blooms using machine learning and satellite imagery. The work included running a machine learning competition, conducting end-user interviews, and engineering a robust, deployable pipeline.

Partners: Data science company foundation

Matching students with schools where they are likely to succeed

Used machine learning to match students with higher education programs where they are more likely to get in and graduate based on their unique profile, with a focus on students from backgrounds traditionally less likely to attend college or apply to more competitive programs.

Partners: Fair Trade USA

Mapping fair trade products from source to shelf

Visualized the flow of fair trade coffee products from the farms where they are grown to the stores where they are sold, connecting the nodes in supply chain transactions and increasing transparency for customers and auditors.

Partners: University of Maryland

Processing multimodal tutoring data

Built well-engineered data pipelines to extract machine learning features from audio, video, and transcript data collected from online tutoring sessions, enabling a team at the University of Maryland to study how relationship-building affects student outcomes.

Partners: The World Bank, Angaza, GOGLA, Lighting Global

Developing performance indicators and repayment models in off-grid solar

Analyzed repayment behaviors across dozens of pay-as-you-go (PAYG) solar energy companies serving off-grid populations throughout Africa, and developed KPIs to facilitate standardized reporting for PAYG portfolios.

Partners: Haystack Informatics

Modeling patient pathways through hospitals

Mapped out the probabilistic patient journeys through hospitals based on tens of thousands of patient experiences, giving hospitals a better view into the timing of the activities in their departments and how they relate to operational efficiency.

Partners: Yelp, Harvard University, City of Boston

Predicting public health risks from restaurant reviews

Flagged public health risks at restaurants by combining Yelp reviews with open city data on past inspections. An algorithmic approach discovers 25% more violations with the same number of inspections.

Partners: Education Resource Strategies

Smart auto-tagging of K-12 school spending

Built algorithms that put apples-to-apples labels on school budget line items so that districts understand how their spending stacks up and where they can improve, saving months of manual processing each year.

Partners: Love Justice

Building data tools to fight human trafficking in Nepal

Aided anti-trafficking efforts at border crossings and airports by combining data across locations and surfacing insights that give interviewers greater intelligence about the right questions to ask and how to direct them.

Partners: GO2 Foundation for Lung Cancer

Putting AI into the hands of lung cancer clinicians

Translated advances in machine learning research to practical software for clinical settings, building an open source application through a new kind of data challenge.

Work with us to build a better world

Learn more about how our team is bringing the transformative power of data science and AI to organizations tackling the world's biggest challenges.