:::info Authors:
(1) Daniele Malitesta, Politecnico di Bari, Italy (daniele.malitesta@poliba.it), corresponding author;
(2) Giuseppe Gassi, Politecnico di Bari, Italy (g.gassi@studenti.poliba.it), corresponding author;
(3) Claudio Pomo, Politecnico di Bari, Italy and claudio.pomo@poliba.it;
(4) Tommaso Di Noia, Politecnico di Bari, Italy and tommaso.dinoia@poliba.it.
:::
Abstract and 1 Introduction and Motivation
2 Architecture and 2.1 Dataset
5 Demonstrations and 5.1 Demo 1: visual + textual items features
5.2 Demo 2: audio + textual items features
5.3 Demo 3: textual items/interactions features
6 Conclusion and Future Work, Acknowledgments and References
6 CONCLUSION AND FUTURE WORK
In this paper, we propose Ducho, a framework for extracting high-level features for multimodal-aware recommendation. Our main purpose is to provide a unified and shared tool to support practitioners and researchers in processing and extracting multimodal features used as side information in recommender systems. Concretely, Ducho involves three main modules: Dataset, Extractor, and Runner. The multimodal extraction pipeline can be highly customized through a Configuration component that allows the setup of the modalities involved (i.e., audio, visual, textual), the sources of multimodal information (i.e., items and/or user-item interactions), and the pre-trained models along with their main extraction parameters. To show how Ducho works in different scenarios and settings, we propose three demos accounting for the extraction of (i) visual/textual items features, (ii) audio/textual items features, and (iii) textual items/interactions features. They can be run locally, on Docker (as we also dockerize Ducho), and on Google Colab. As future directions, we plan to: (i) adopt all available backends (i.e., TensorFlow, PyTorch, and Transformers) to extract features for all modalities; (ii) implement a general extraction model interface allowing the users to follow the same naming/indexing scheme for all pre-trained models and their extraction layers; (iii) integrate the extraction of low-level multimodal features.
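To make the role of the Configuration component more concrete, below is a minimal, hypothetical sketch of what a configuration-driven extraction run could look like. The names used here (ExtractionConfig, run_extraction) and the example model identifiers are illustrative assumptions and do not reflect Ducho's actual API or configuration schema; please refer to the official repository for the real interface.

```python
# Hypothetical sketch of a configuration-driven multimodal extraction run.
# Class/function names and parameter values are illustrative only and are
# NOT Ducho's actual API; they merely mirror the concepts described above
# (modalities, sources, pre-trained models, extraction layers, backends).
from dataclasses import dataclass


@dataclass
class ExtractionConfig:
    """Illustrative stand-in for one entry of the Configuration component."""
    modality: str                   # "visual", "textual", or "audio"
    source: str                     # "items" or "interactions"
    model: str                      # pre-trained backbone, e.g. "ResNet50"
    output_layer: str = "avg_pool"  # layer whose activations are exported
    backend: str = "torch"          # "tensorflow", "torch", or "transformers"


def run_extraction(configs: list[ExtractionConfig]) -> None:
    """Pretend pipeline: Dataset -> Extractor -> Runner, once per configuration."""
    for cfg in configs:
        print(f"[{cfg.backend}] extracting {cfg.modality} features "
              f"from {cfg.source} with {cfg.model} (layer: {cfg.output_layer})")


if __name__ == "__main__":
    run_extraction([
        ExtractionConfig(modality="visual", source="items", model="ResNet50"),
        ExtractionConfig(modality="textual", source="interactions",
                         model="sentence-transformers/all-mpnet-base-v2",
                         backend="transformers"),
    ])
```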
ACKNOWLEDGMENTS
This work was partially supported by the following projects: Secure Safe Apulia, MISE CUP: I14E20000020001 CTEMT - Casa delle Tecnologie Emergenti Comune di Matera, CTFINCONSIII, OVS Fashion Retail Reloaded, LUTECH DIGITALE 4.0, KOINÈ.
REFERENCES
[1] Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia, Eugenio Di Sciascio, Antonio Ferrara, Daniele Malitesta, and Claudio Pomo. 2022. Reshaping Graph Recommendation with Edge Graph Collaborative Filtering and Customer Reviews. In DL4SR@CIKM (CEUR Workshop Proceedings, Vol. 3317). CEUR-WS.org.
[2] Yashar Deldjoo, Tommaso Di Noia, Daniele Malitesta, and Felice Antonio Merra. 2021. A Study on the Relative Importance of Convolutional Neural Networks in Visually-Aware Recommender Systems. In CVPR Workshops. Computer Vision Foundation / IEEE, 3961–3967.
[3] Yashar Deldjoo, Tommaso Di Noia, Daniele Malitesta, and Felice Antonio Merra. 2022. Leveraging Content-Style Item Representation for Visual Recommendation. In ECIR (2) (Lecture Notes in Computer Science, Vol. 13186). Springer, 84–92.
[4] Daniele Malitesta, Giandomenico Cornacchia, Claudio Pomo, and Tommaso Di Noia. 2023. Disentangling the Performance Puzzle of Multimodal-aware Recommender Systems. In EvalRS@KDD (CEUR Workshop Proceedings, Vol. 3450). CEUR-WS.org.
[5] Weiqing Min, Shuqiang Jiang, and Ramesh C. Jain. 2020. Food Recommendation: Framework, Existing Solutions, and Challenges. IEEE Trans. Multim. 22, 10 (2020), 2659–2671.
[6] Sergio Oramas, Oriol Nieto, Mohamed Sordo, and Xavier Serra. 2017. A Deep Multimodal Approach for Cold-start Music Recommendation. In DLRS@RecSys. ACM, 32–37.
[7] Aghiles Salah, Quoc-Tuan Truong, and Hady W. Lauw. 2020. Cornac: A Comparative Framework for Multimodal Recommender Systems. J. Mach. Learn. Res. 21 (2020), 95:1–95:5.
[8] Zixuan Yi, Xi Wang, Iadh Ounis, and Craig MacDonald. 2022. Multi-modal Graph Contrastive Learning for Micro-video Recommendation. In SIGIR. ACM, 1807–1811.
:::info This paper is available on arXiv under a CC BY 4.0 DEED license.
:::