Ducho: A Unified Framework for Multimodal Feature Extraction in AI-Powered Recommendations
Abstract and 1 Introduction and Motivation
2 Architecture and 2.1 Dataset
2.2 Extractor
2.3 Runner
3 Extraction Pipeline
4 Ducho as Docker Application
5 Demonstrations and 5.1 Demo 1: visual + textual items features
5.2 Demo 2: audio + textual items features
5.3 Demo 3: textual items/interactions features
6 Conclusion and Future Work, Acknowledgments and References
6 CONCLUSION AND FUTURE WORK
In this paper, we propose Ducho, a framework for extracting high-level features for multimodal-aware recommendation. Our main purpose is to provide a unified and shared tool to support practitioners and researchers in processing and extracting multimodal features used as side information in recommender systems. Concretely, Ducho involves three main modules: Dataset, Extractor, and Runner. The multimodal extraction pipeline can be highly customized through a Configuration component that allows the setup of the modalities involved (i.e., audio, visual, textual), the sources of multimodal information (i.e., items and/or user-item interactions), and the pre-trained models along with their main extraction parameters. To show how Ducho works in different scenarios and settings, we propose three demos accounting for the extraction of (i) visual/textual items features, (ii) audio/textual items features, and (iii) textual items/interactions features. They can be run locally, on Docker (as we also dockerize Ducho), and on Google Colab.
As future directions, we plan to: (i) adopt all available backends (i.e., TensorFlow, PyTorch, and Transformers) to extract features for all modalities; (ii) implement a general extraction model interface allowing the users to follow the same naming/indexing scheme for all pre-trained models and their extraction layers; (iii) integrate the extraction of low-level multimodal features.
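To make the role of the Configuration component more concrete, the following is a minimal sketch of how the three customization axes summarized above (modality, source, and pre-trained models with their extraction parameters) might be represented and validated. It is not Ducho's actual configuration schema or API: the class names `ModelSpec` and `ExtractionConfig`, the `plan` helper, and the model and layer strings are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout: this is NOT Ducho's actual configuration schema.
MODALITIES = {"audio", "visual", "textual"}
SOURCES = {"items", "interactions"}


@dataclass
class ModelSpec:
    name: str            # identifier of a pre-trained model (placeholder value)
    output_layer: str    # layer whose activations are extracted (placeholder value)
    backend: str = "torch"  # assumed default, for illustration only


@dataclass
class ExtractionConfig:
    modality: str
    source: str
    models: list = field(default_factory=list)

    def __post_init__(self):
        # Reject settings outside the modalities and sources named in the text.
        if self.modality not in MODALITIES:
            raise ValueError(f"unknown modality: {self.modality}")
        if self.source not in SOURCES:
            raise ValueError(f"unknown source: {self.source}")
        if not self.models:
            raise ValueError("at least one pre-trained model is required")


def plan(configs):
    """List the (modality, source, model name) extraction jobs implied by the configs."""
    return [(c.modality, c.source, m.name) for c in configs for m in c.models]


# Mirrors the setting of the first demo: visual and textual item features.
demo1 = [
    ExtractionConfig("visual", "items", [ModelSpec("some-image-model", "avgpool")]),
    ExtractionConfig("textual", "items", [ModelSpec("some-text-model", "pooler")]),
]
jobs = plan(demo1)
```

Under these assumptions, `jobs` holds one entry per (modality, source, model) combination. Validating when each object is constructed surfaces a misconfigured modality or source before any model is loaded, which is one plausible reason to keep configuration separate from extraction.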
ACKNOWLEDGMENTS
This work was partially supported by the following projects: Secure Safe Apulia, MISE CUP: I14E20000020001 CTEMT – Casa delle Tecnologie Emergenti Comune di Matera, CT_FINCONS_III, OVS Fashion Retail Reloaded, LUTECH DIGITALE 4.0, KOINÈ.