Session: Government Agency Student Posters
Paper Number: 173035
Memory-Efficient Replay for Domain-Incremental Gesture Recognition in Human–Robot Interaction
Natural, reliable hand-gesture communication is a cornerstone of next-generation human-robot interaction (HRI), where robots must seamlessly interpret a human user's intent while operating in visually diverse workplaces such as factory floors, warehouses, or open-air tarmacs. In practice, these settings exhibit frequent domain shifts, such as changes in lighting, background clutter, camera viewpoints, and individual motion styles, that systematically degrade a model's accuracy and trigger catastrophic forgetting whenever it is simply fine-tuned on new data. To address this challenge, we formulate gesture recognition as a domain-incremental learning (DIL) problem and introduce a replay-based strategy that achieves high adaptability without compromising prior knowledge. Existing rehearsal and regularization methods either require large buffers or still forget earlier domains. Our framework combines three technical components. (i) Memory-efficient exemplar replay: instead of archiving full videos, we compress each gesture video into an abstract latent embedding with a frozen encoder, reducing the memory footprint by more than 75% while retaining the information needed for rehearsal. (ii) Clustering-driven sample selection: K-Means in the latent space maintains a balanced, diverse buffer of just 25% of each domain's samples, chosen by proximity to cluster centroids, so that salient variations, rather than merely common ones, are preserved. (iii) Balanced rehearsal: every gradient step draws an equal number of exemplars from the previous domain and new samples from the incoming domain, ensuring plasticity toward the new domain while preserving knowledge of earlier experience. Together, these strategies form a lightweight domain-incremental learning framework that updates only the task-specific learning module while the frozen encoder supplies the abstract latent embeddings.

We evaluate the method on a two-stage benchmark that mimics realistic deployment: Stage 1 trains on 2,076 RGB and skeleton videos from our in-house air-marshalling data (domain d_{t-1}); Stage 2 adapts to 5,364 videos of the same six gestures drawn from the public NATOPS set (domain d_t), whose different recording conditions create a substantial domain shift. A model trained only on d_{t-1} attains 87% accuracy on its home domain but collapses to 24% on the new domain d_t (NATOPS). Simple fine-tuning reverses the pattern: 74% on d_t but only 26% on d_{t-1}, revealing a 61% loss of prior knowledge. In contrast, our replay-based DIL recovers 74% on the old domain and pushes performance to 91% on the new one; combined-domain accuracy improves from the fine-tuning baseline of 66% to 89%, and measured forgetting drops to a modest 13%. Average and harmonic-mean scores likewise improve to 82.5% and 81.6%, respectively, confirming a better stability-plasticity trade-off.

These results demonstrate that the proposed exemplar-replay pipeline delivers robust, adaptable gesture recognition tailored to dynamic HRI scenarios while maintaining an efficient memory footprint. Future work will integrate conformal prediction-based detection of new-domain samples and extend the method to multi-domain scenarios to further enhance trustworthiness in long-term robot deployment.
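For illustration, the following minimal Python sketch shows the two buffer operations summarized above: clustering-driven exemplar selection in the frozen-encoder latent space and balanced old/new mini-batch construction. The function names, the use of scikit-learn's KMeans, and all default values are assumptions made for this sketch and are not drawn from the authors' implementation.

```python
# Hypothetical sketch of the replay buffer operations described in the abstract.
# Names, defaults, and the choice of scikit-learn KMeans are illustrative only.
import numpy as np
from sklearn.cluster import KMeans


def select_exemplars(latents, keep_ratio=0.25, n_clusters=6, seed=0):
    """Keep ~keep_ratio of a domain's latent embeddings, chosen per cluster
    by proximity to the K-Means centroids so diverse variations survive."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(latents)
    # Distance of every embedding to the centroid of its assigned cluster.
    dists = np.linalg.norm(latents - km.cluster_centers_[km.labels_], axis=1)
    selected = []
    for j in range(n_clusters):
        members = np.where(km.labels_ == j)[0]
        n_keep = max(1, int(round(keep_ratio * len(members))))
        # Samples nearest to the centroid are treated as most representative.
        selected.extend(members[np.argsort(dists[members])[:n_keep]])
    return np.asarray(selected)


def balanced_batch(buf_z, buf_y, new_z, new_y, batch_size=64, rng=None):
    """Draw half of every mini-batch from the exemplar buffer (old domain)
    and half from the incoming domain, as in the balanced rehearsal step."""
    rng = rng or np.random.default_rng()
    half = batch_size // 2
    bi = rng.choice(len(buf_z), size=half, replace=len(buf_z) < half)
    ni = rng.choice(len(new_z), size=batch_size - half,
                    replace=len(new_z) < batch_size - half)
    return (np.concatenate([buf_z[bi], new_z[ni]]),
            np.concatenate([buf_y[bi], new_y[ni]]))
```

In this sketch, only the latent embeddings and labels are stored in the buffer, consistent with keeping the encoder frozen; the task-specific module would then be trained on the concatenated old/new batches.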
Presenting Author: Kanchon Kanti Podder, Kennesaw State University
Presenting Author Biography: Kanchon Kanti Podder is a Ph.D. candidate in the Department of Electrical and Computer Engineering at Kennesaw State University, where he is part of the Embodied Intelligence Lab and serves as a Graduate Research Assistant. His research centers on lifelong robot learning, computer vision, and deep learning, with a particular focus on continual hand-gesture and sign-language recognition for human-robot interaction. Before joining KSU, he earned an M.Sc. in Biomedical Physics & Technology from the University of Dhaka and a B.Sc. in Electrical & Electronic Engineering from Chittagong University of Engineering & Technology. He has co-authored more than 18 peer-reviewed publications spanning robotic perception, medical imaging, and sign-language technology. He has also served as a Research Assistant with Qatar University’s Machine Learning Group and collaborates widely on AI tools that expand accessibility for people with disabilities. His long-term goal is to create trustworthy, adaptive robots that can collaborate with people in dynamic, real-world environments.
Authors:
Kanchon Kanti Podder, Kennesaw State University; Jian Zhang, Kennesaw State University
Memory-Efficient Replay for Domain-Incremental Gesture Recognition in Human–Robot Interaction
Paper Type
Government Agency Student Poster Presentation
