Han, Qiwei; Justus, Adrien Robert
2025-01-03; 2025-01-03; 2024-01-19; 2024-01-19
http://hdl.handle.net/10362/176971
Using state-of-the-art models for text (BERT-based) and image (ResNet, VGG16, ViT) analysis, this study develops a multimodal approach that leverages the strengths of both domains. Our results show that combining knowledge from the text and image domains yields the best classification framework, which achieves a macro-F1 score of 98% at all levels. This approach significantly improves classification accuracy and efficiency in e-commerce. In addition, the study explores self-supervised learning and introduces a detailed taxonomy. This research highlights the superiority of a multimodal strategy for improving product understanding.
eng
Self-supervised learning; Unsupervised learning; Taxonomy; SimCLR; Hierarchical clustering
Navigating e-commerce product taxonomy challenges and label ambiguities through self- and unsupervised learning
master thesis
203681916