A New Way to Extract Features for Smarter AI Recommendations

Ducho’s architecture consists of three key modules—Dataset, Extractor, and Runner—designed for efficient multimodal feature extraction. It supports TensorFlow, PyTorch, and Transformers, allowing flexible dataset processing, model selection, and YAML-based configuration for seamless integration into recommendation systems.



:::info Authors:

(1) Daniele Malitesta, Politecnico di Bari, Italy, daniele.malitesta@poliba.it (corresponding author);

(2) Giuseppe Gassi, Politecnico di Bari, Italy, g.gassi@studenti.poliba.it (corresponding author);

(3) Claudio Pomo, Politecnico di Bari, Italy and claudio.pomo@poliba.it;

(4) Tommaso Di Noia, Politecnico di Bari, Italy and tommaso.dinoia@poliba.it.

:::

Abstract and 1 Introduction and Motivation

2 Architecture and 2.1 Dataset

2.2 Extractor

2.3 Runner

3 Extraction Pipeline

4 Ducho as Docker Application

5 Demonstrations and 5.1 Demo 1: visual + textual items features

5.2 Demo 2: audio + textual items features

5.3 Demo 3: textual items/interactions features

6 Conclusion and Future Work, Acknowledgments and References

2 ARCHITECTURE

Ducho’s architecture is built upon three main modules, namely, Dataset, Extractor, and Runner, where the first two provide different implementations depending on the specific modality (i.e., audio, visual, or textual) taken into account. Among the auxiliary components, the Configuration one is also worth mentioning. The architecture is designed to be highly modular, making it possible to integrate new modules or customize the existing ones. In the following, we dive deep into each outlined module/component.

2.1 Dataset

The Dataset module manages the loading and processing of the input data provided by the user. Starting from a general schema shared across all available modalities, this module provides three separate implementations: the Audio, Visual, and Textual Datasets. Following common practice in the literature, the Audio and Visual Datasets require the path to the folder from which audio/image files are loaded, while the Textual Dataset works through a TSV file mapping all the textual characteristics to the inputs.

Notably, and differently from other existing solutions, Ducho can handle each modality in two fashions, depending on whether the modality describes the items (e.g., product descriptions) or the interactions between users and items (e.g., reviews [1]). Concretely, while items are mapped to their unique ids (extracted from the filename or the TSV file), interactions are mapped to the user-item pair (extracted from the TSV file) they refer to. Although the pre-processing and extraction phases do not change between the items and interactions levels (see later), we believe this schema may suit novel multimodal-aware recommender systems with modalities describing every type of input source (even users).
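To make the two mappings concrete, here is a hypothetical TSV layout for both cases (column names and ids are illustrative, not Ducho’s actual schema). At the items level, each row is keyed by a unique item id; at the interactions level, each row is keyed by a user-item pair:

```tsv
item_id	description
I001	Wireless noise-cancelling headphones with 30-hour battery life

user_id	item_id	review
U042	I001	Great sound, and the battery really does last for days
```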

Another important task for the Dataset module is handling the pre-processing stage of the input data. Depending on the specific modality involved, Ducho offers the possibility to (a code sketch of these steps follows the list):

• audio: load the input audio by extracting the waveform and sample rate, and re-sample it according to the sample rate the pre-trained model was trained on;

• visual: convert input images to RGB and resize/normalize them to align with the pre-trained extraction model;

• textual: (optionally) clean the input texts to remove or modify noisy textual patterns such as punctuation and digits.
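The sketch below illustrates these three steps with common defaults (16 kHz audio, 224x224 ImageNet-normalized images, regex-based text cleaning); it mirrors what the Dataset module does conceptually but is not Ducho’s internal code:

```python
# Minimal per-modality pre-processing sketch; target values are illustrative.
import re

import torchaudio
from PIL import Image
from torchvision import transforms


def preprocess_audio(path, target_sr=16_000):
    # Load the waveform and its native sample rate, then re-sample to the
    # rate the pre-trained audio model expects (e.g., 16 kHz for wav2vec 2.0).
    waveform, sr = torchaudio.load(path)
    if sr != target_sr:
        waveform = torchaudio.functional.resample(waveform, sr, target_sr)
    return waveform


def preprocess_image(path):
    # Convert to RGB, then resize/normalize with ImageNet statistics,
    # as most pre-trained visual backbones assume.
    pipeline = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    return pipeline(Image.open(path).convert("RGB"))


def preprocess_text(text):
    # Optionally strip noisy patterns such as punctuation and digits.
    return re.sub(r"[^\w\s]|\d", "", text).strip()
```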

After the extraction phase (see later), the Dataset module is finally in charge of saving the generated multimodal features in NumPy array format, following the file naming scheme derived from the previous mapping.
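In practice, that naming scheme might translate into something like the following (folder and file names are assumptions for illustration, reusing the ids from the TSV example above):

```python
import os

import numpy as np

# Items-level features are keyed by item id; interactions-level features
# are keyed by the user-item pair they refer to.
os.makedirs("visual_embeddings", exist_ok=True)
os.makedirs("textual_embeddings", exist_ok=True)

item_feature = np.random.rand(2048).astype(np.float32)   # e.g., a ResNet50 embedding
np.save(os.path.join("visual_embeddings", "I001.npy"), item_feature)

pair_feature = np.random.rand(768).astype(np.float32)    # e.g., a BERT embedding
np.save(os.path.join("textual_embeddings", "U042_I001.npy"), pair_feature)
```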


2.2 Extractor

The Extractor module builds an extraction model from a pre-trained network and works on each loaded/pre-processed input sample to extract its multimodal features. Similarly to the Dataset module, the Extractor provides three different implementations, one for each modality: the Audio, Visual, and Textual Extractors. Ducho exposes a wide range of pre-trained models from three main backends: TensorFlow, PyTorch, and Transformers. The following modality/backend combinations are currently available:

• audio: PyTorch (Torchaudio) and Transformers;

• visual: TensorFlow and PyTorch (Torchvision);

• textual: Transformers (and SentenceTransformers).

To perform the feature extraction, Ducho takes as input the (list of) extraction layers for any pre-trained model. Since each backend handles the extraction of a network’s hidden layers differently, we follow the guidelines provided in the official documentation, assuming that the user follows the same naming/indexing scheme of the layers and knows the structure of the selected pre-trained model in advance. The interested reader may refer to the README[2] under the config/ folder on GitHub for an exhaustive explanation of how to set the extraction layer in each modality/backend setting.
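As an example of what naming an extraction layer means in practice for the PyTorch (Torchvision) backend, one can pull a hidden layer’s output by its node name; the sketch below uses torchvision’s own feature-extraction utility, not Ducho’s API:

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights
from torchvision.models.feature_extraction import create_feature_extractor

# Grab the output of the penultimate layer ("avgpool") of a pre-trained
# ResNet50. Layer names follow torchvision's own naming scheme, which is
# why the user must know the model's structure in advance.
model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
extractor = create_feature_extractor(model, return_nodes={"avgpool": "feat"})

with torch.no_grad():
    features = extractor(torch.rand(1, 3, 224, 224))["feat"].flatten(1)
print(features.shape)  # torch.Size([1, 2048])
```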

Finally, for the textual case, the user can also specify the task the pre-trained model should have been trained on (e.g., sentiment analysis), as each pre-trained network may come in different versions depending on the training strategy.
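For instance, with the Transformers backend the same base architecture can be loaded either as a generic encoder or as a task-specific checkpoint (the model names below are examples from the Hugging Face hub, not defaults of Ducho):

```python
from transformers import AutoModel, AutoModelForSequenceClassification

# Generic encoder: hidden states reflect plain masked-language-model pre-training.
generic = AutoModel.from_pretrained("bert-base-uncased")

# Task-specific variant: the same architecture fine-tuned for sentiment analysis,
# so the extracted hidden states carry task-oriented semantics.
sentiment = AutoModelForSequenceClassification.from_pretrained(
    "nlptown/bert-base-multilingual-uncased-sentiment"
)
```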


2.3 Runner

The Runner module is the orchestrator of Ducho, whose purpose is to instantiate, call, and manage all the modules described above. Through its API methods, this module can trigger the complete extraction pipeline (see later) for one single modality or for all the involved modalities simultaneously.
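In the demo scripts shipped with Ducho, triggering the whole pipeline boils down to a couple of calls; the import path and method names below follow those demos at the time of writing and should be double-checked against the repository:

```python
from ducho.runner.Runner import MultimodalFeatureExtractor

# Instantiate the orchestrator from a YAML configuration file
# (the path is illustrative) and run every configured extraction.
extractor = MultimodalFeatureExtractor(config_file_path="./config/config.yml")
extractor.execute_extractions()
```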

The Runner module is conveniently customized through an auxiliary Configuration component, which stores and exposes all the parameters used to configure the extraction pipeline. Even though a default configuration is already made available for the user’s sake, Ducho allows the user to override some (or all) of its parameters through an external configuration file (in YAML format) and/or through key-value pairs passed as input arguments when running the scripts from the command line. Once again, we suggest the reader refer to the README under the config/ folder on GitHub to understand the general schema of the YAML configuration file.
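For orientation, a pared-down configuration might look like the sketch below; the field names are assumptions meant to convey the shape of the file, and the README under config/ remains the authoritative schema:

```yaml
# Illustrative sketch only -- consult the README under config/ for the real schema.
dataset_path: ./local/data/demo1
visual:
  items:
    input_path: images
    output_path: visual_embeddings
    model:
      - name: ResNet50
        backend: torch
        output_layers: avgpool
textual:
  items:
    input_path: descriptions.tsv
    model:
      - name: nlptown/bert-base-multilingual-uncased-sentiment
        backend: transformers
        output_layers: 1
```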


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::




