Weakly supervised bird-flock counting in wetlands based on multimodal optical image perception
Published 19 June, 2025
Wetland birds are crucial bioindicators of ecosystem health, and monitoring their populations is a critical component of wetland management and conservation. However, traditional counting methods, such as point counts and line transects, are time-consuming, costly, and prone to human error. Optical image-based counting makes large-scale bird surveys possible, but target detection and accurate counting remain challenging under complex environmental conditions.
To address these challenges, a team of researchers in China presents an avian population estimation approach that requires no location-level annotation. It integrates optical characteristics with visual semantics and uses only count (quantity) annotations to achieve weakly supervised counting, significantly reducing labeling costs.
The study is published in the KeAi journal Watershed Ecology and the Environment.
“Building upon enhanced optical image features, we constructed a multimodal perception model incorporating learnable feature adapters,” shares corresponding author Chang Liu, professor at the Institute of Applied Mathematics, Beijing Information Science and Technology University. “The model employs visual prompts to focus on counting-relevant features and utilizes residual connections to address challenges posed by pose variations and complex backgrounds.”
The count regression problem was transformed into a classification task by embedding ordered numerical sequences (e.g., “0 birds”, “5 birds”, … “100+ birds”) as semantic category labels. The text template “There are [class] birds in the picture” leads the model to align the numerical semantics of the text with image features, enabling accurate counting without explicit object localization. In addition, to handle multi-scale variations in bird flocks, the researchers designed a cross-scale information interaction module that propagates visual prompts across different feature scales, generating semantically rich fused representations.
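The count-to-class framing described above can be illustrated with a minimal Python sketch. Only the template text and the example labels (“0”, “5”, “100+”) come from the article. The bin spacing of 5, the rule for assigning a count to a bin, and all function names are assumptions made for illustration, and the similarity scores are assumed to come from an image-text model that is not shown here.

```python
from bisect import bisect_right

# Assumed bin starts: the article shows only "0", "5", ... "100+",
# so a uniform step of 5 is a guess made for illustration.
BIN_STARTS = list(range(0, 101, 5))  # 0, 5, ..., 100


def class_label(index):
    """Return the text label for a bin; the last bin is open-ended."""
    start = BIN_STARTS[index]
    return f"{start}+" if index == len(BIN_STARTS) - 1 else str(start)


def count_to_class(count):
    """Map a bird count to the bin with the largest start value <= count."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return bisect_right(BIN_STARTS, count) - 1


def prompt_for(index):
    """Fill the text template quoted in the article with a class label."""
    return f"There are {class_label(index)} birds in the picture"


# One text prompt per ordered count class.
PROMPTS = [prompt_for(i) for i in range(len(BIN_STARTS))]


def predict_label(similarities):
    """Pick the class whose prompt scores highest against the image.

    `similarities` is a hypothetical list of image-text similarity scores,
    one per prompt, produced by a model outside this sketch.
    """
    if len(similarities) != len(PROMPTS):
        raise ValueError("expected one score per prompt")
    best = max(range(len(similarities)), key=lambda i: similarities[i])
    return class_label(best)
```

In this framing an image needs only its total count as a training label: the count selects a class, and no bounding boxes or point annotations are involved.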
“We compiled and released Wetland-Bird-Count, a novel optical image dataset specifically designed for avian population assessment in the coastal wetlands of the Yellow River Delta, filling a critical gap in ecological monitoring resources,” adds Liu. “Experimental results on the Wetland-Bird-Count dataset show that the proposed method achieves an MAE of 45.2 and an MSE of 54.2, outperforming existing weakly supervised and unsupervised methods and achieving results comparable to those of fully supervised methods.”
The study shows that weakly supervised counting of bird clusters, guided by visual cues in optical images, can improve counting accuracy with only lightweight annotation, providing a reliable quantitative tool for image-based ecological monitoring.
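For readers unfamiliar with the reported error metrics, the sketch below shows how MAE and MSE are commonly computed over per-image counts. The article does not define either metric. Counting studies often report the square root of the mean squared error under the name MSE, so both forms are included here. The function names and the sample numbers in the test values are illustrative assumptions, not data from the study.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and true per-image counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mse(predicted, actual):
    """Mean squared error between predicted and true per-image counts."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root of the mean squared error, often labeled MSE in counting work."""
    return math.sqrt(mse(predicted, actual))
```

Lower values are better for all three. The squared forms penalize large errors on individual images more heavily than MAE does.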

Contact author: Chang Liu, Institute of Applied Mathematics, Beijing Information Science and Technology University, Beijing, China, liu.chang.cn@ieee.org
Funder: This research was supported by the National Natural Science Foundation of China (Nos. 61931003 and 62171044).
Conflict of interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
See the article: FENG S, LYU M, HAN X, et al. Weakly supervised bird-flock counting in wetlands based on multimodal optical image perception. Watershed Ecology and the Environment, 2025. https://doi.org/10.1016/j.wsee.2025.05.006