Automated Active Learning for Large Scale Ecommerce Product Categorization
Rizky Agung Dwi Putranto, Santosh Yadaw, Diamond Ravi, and 3 more authors
2023
IP.com Technical Disclosure
Large ecommerce marketplaces such as Tokopedia host hundreds of millions of product listings, with new products uploaded frequently by millions of merchants. It is therefore essential to build reliable categorisation engines that are as automated as possible, scale to millions of listings, and attain high classification accuracy. Issues such as taxonomy changes, poor data quality, class imbalance, and data drift cause model performance to drop, so frequent data labelling and model retraining are required to maintain accuracy. Active learning can be employed to optimise the data labelling effort. This paper discusses the implementation of a novel automated active learning loop that monitors model performance, uses active learning for data collection, integrates with data labelling tools, and triggers model retraining, which can improve productivity severalfold.
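
As an illustration of the kind of loop the abstract describes, the following is a minimal sketch, not the authors' implementation. The abstract does not say which acquisition strategy or retraining criterion is used, so entropy-based uncertainty sampling and a fixed accuracy threshold are assumptions here, and the names `entropy`, `select_for_labelling`, and `should_retrain` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labelling(
    predictions: Dict[str, Sequence[float]], budget: int
) -> List[str]:
    """Pick the `budget` products whose predictions are most uncertain.

    `predictions` maps a product id to the model's class probabilities.
    Assumption: uncertainty is measured by entropy; the source does not
    specify the acquisition function.
    """
    ranked = sorted(
        predictions, key=lambda pid: entropy(predictions[pid]), reverse=True
    )
    return ranked[:budget]


def should_retrain(monitored_accuracy: float, threshold: float) -> bool:
    """Trigger retraining when monitored accuracy falls below a threshold.

    Assumption: a simple fixed threshold stands in for whatever
    monitoring rule the real system uses.
    """
    return monitored_accuracy < threshold


if __name__ == "__main__":
    # Hypothetical predictions over three categories for three listings.
    preds = {
        "a": [0.98, 0.01, 0.01],  # confident
        "b": [0.34, 0.33, 0.33],  # highly uncertain
        "c": [0.60, 0.30, 0.10],  # moderately uncertain
    }
    to_label = select_for_labelling(preds, budget=2)
    print("send to labelling tool:", to_label)
    print("retrain now:", should_retrain(monitored_accuracy=0.88, threshold=0.90))
```

In a full loop, the selected listings would be sent to the labelling tool, the returned labels merged into the training set, and retraining started when the monitoring check fires.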