Andrea Bianchi, Antinisca Di Marco, Francesca Marzi, Giovanni Stilo, Cristina Pellegrini, Stefano Masi, Alessandro Mengozzi, Agostino Virdis, Marco Salvatore Nobile, Marta Simeoni. "Trustworthy Machine Learning Predictions to Support Clinical Research and Decisions." 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS), L'Aquila, Italy, 2023.

Physicians today have access to a huge amount of data produced by a wide range of diagnostic and instrumental tests, integrated with data obtained from high-throughput technologies. If such data were appropriately linked and analysed, they could be used to strengthen predictions, thereby improving prevention and time-to-diagnosis, reducing health-system costs, and bringing out hidden knowledge. Machine learning is currently the principal technique used to leverage data and extract useful information. However, its use raises various challenges, such as improving the interpretability and explainability of the predictive models employed and integrating expert knowledge into the final system. Addressing these challenges is of paramount importance for enhancing the trust of both clinicians and patients in the system's predictions. To this end, in this paper we propose a software workflow that addresses the trustworthiness aspects of machine learning models while handling a multitude of heterogeneous data and models.

A. Bianchi, A. Di Marco and C. Pellegrini, "Comparing HISAT and STAR-based pipelines for RNA-Seq Data Analysis: a real experience," 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS), L'Aquila, Italy, 2023.

One of the first steps in RNA-Sequencing (RNA-Seq) data analysis consists of aligning Next Generation Sequencing reads to a reference genome. In the literature, there are several tools implemented by practitioners and researchers for the alignment step. However, two tools are the de facto standard used by bioinformatics researchers in their pipelines: HISAT (version 2) and STAR (version 2). The aim of this study is to determine the impact of the alignment tool on RNA-Seq analysis in terms of the biological relevance of the results and computational time. The two implemented pipelines return different results on the biological side. This is due to the assumptions made by the tools used and to the specific characteristics of the underlying (statistical) models. The study provides valuable insights for researchers interested in optimizing their RNA-Seq pipelines and making informed decisions about which pipeline to use. As a lesson learned, we suggest that bioinformatics researchers use multiple pipelines when running experiments, to reduce the prediction errors induced by the assumptions of a specific tool or method.