The organization
CABI's mission is to improve people’s lives worldwide by providing information and applying expertise to solve problems in agriculture and the environment. Plantwise is a global programme led by CABI that helps farmers lose less of what they grow to plant health problems by providing timely, appropriate, and actionable advice.
CABI has established a global plant clinic network, run by trained plant doctors, where farmers can find practical plant health advice. Farmers visit with samples of their crops, and plant doctors diagnose the problem and make science-based recommendations on ways to manage it. Plant doctors have access to the Plantwise Knowledge Bank, which includes diagnostic resources and best-practice pest management advice.
The challenge
Plant doctors use WhatsApp and Telegram chats to communicate. These messages contain valuable real-time information on plant health and crop issues. Yet without systematic, automated analysis, it is very difficult to identify places where farmers are receiving bad advice or to surface important patterns like emerging pests and trends in agricultural issues.
The approach
This project focused on prototyping methods for entity extraction: the process of automatically identifying and digesting the agricultural entities that users are discussing, such as plants and pests.

Example hand-labeled message containing a crop, a pathogen, and many symptoms.
To accomplish this objective, the DrivenData team worked with CABI to:
- Collect data sources into an entity knowledge base
- Create "gold labels" based on subject matter knowledge for evaluation
- Develop a baseline pattern matching model
- Train a statistical named entity recognition (NER) model
- Visualize results in interactive dashboards
- Hand off a reproducible pipeline
Two thousand chat messages were hand-labeled by the team to identify entities (crop, pest, chemical, pathogen, fungus, and symptom) to provide the corpus for evaluation.
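To make the baseline pattern-matching step concrete, here is a minimal sketch of dictionary lookup against an entity knowledge base. It is an illustration under stated assumptions, not the project's actual implementation: the `KNOWLEDGE_BASE` contents, the function names, and the sample message are all hypothetical.

```python
import re

# Hypothetical miniature knowledge base mapping surface forms to entity labels.
# The real knowledge base was assembled from CABI data sources; these entries
# are invented for illustration only.
KNOWLEDGE_BASE = {
    "maize": "crop",
    "tomato": "crop",
    "fall armyworm": "pest",
    "mancozeb": "chemical",
    "yellowing leaves": "symptom",
}


def build_pattern(terms):
    """Compile one case-insensitive regex that matches any known term."""
    # Longest terms first, so multi-word names win over shorter overlaps.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)


def extract_entities(message, kb=KNOWLEDGE_BASE):
    """Return (start, end, text, label) for every knowledge-base hit."""
    pattern = build_pattern(kb)
    return [
        (m.start(), m.end(), m.group(0), kb[m.group(0).lower()])
        for m in pattern.finditer(message)
    ]


message = "Fall armyworm seen on maize, with yellowing leaves."
entities = extract_entities(message)
for start, end, text, label in entities:
    print(f"{label:8s} {text!r} [{start}:{end}]")
```

A lookup like this only finds exact surface forms, so misspellings and local names common in chat messages are missed. That limitation is one reason to also train a statistical NER model.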
The results¶
The prototyped NER model correctly identified 69% of entities in chat messages. In 93% of cases where the entity text is correct, the label is correct too. The best-performing categories were crops, fungus, and chemicals. A proof-of-concept comparison of automated extraction from chats with clinic records reflected the expected spike in fall armyworm mentions due to the outbreak in 2017.
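As a rough illustration of how figures like these can be computed against gold labels, the sketch below scores predictions on a toy example. It assumes one plausible reading of the two metrics (the share of gold entities recovered with the correct span and label, and label accuracy among spans whose text matches). The source does not spell out the exact definitions, and the data and function name here are hypothetical.

```python
def score(gold, predicted):
    """Score predicted entities against gold labels.

    Both arguments are iterables of (message_id, start, end, label) tuples.
    """
    gold_by_span = {(m, s, e): label for m, s, e, label in gold}
    pred_by_span = {(m, s, e): label for m, s, e, label in predicted}

    # Spans where the predicted entity text lines up exactly with a gold span.
    matched = gold_by_span.keys() & pred_by_span.keys()
    # Of those, spans where the predicted label also agrees.
    label_ok = sum(gold_by_span[k] == pred_by_span[k] for k in matched)

    return {
        "entity_recall": label_ok / len(gold_by_span) if gold_by_span else 0.0,
        "label_accuracy_given_text": label_ok / len(matched) if matched else 0.0,
    }


# Toy data, invented for illustration; not the project's evaluation corpus.
gold = [
    (1, 0, 13, "pest"),
    (1, 22, 27, "crop"),
    (2, 5, 11, "crop"),
    (2, 20, 32, "symptom"),
]
predicted = [
    (1, 0, 13, "pest"),      # correct text and label
    (1, 22, 27, "crop"),     # correct text and label
    (2, 5, 11, "pathogen"),  # correct text, wrong label
]

metrics = score(gold, predicted)
print(metrics)
```

Exact span matching is a strict choice. A more lenient variant could count partially overlapping spans as text matches, which would raise both numbers.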

Diagram of the named entity recognition (NER) pipeline.
This work demonstrates the ability to extract entities that can enable trend-level analysis from plant doctor messages and help guide interventions and early action.