Using Guided Transfer Learning to Predispose AI Agent to Learn Efficiently from Small RNA-sequencing Datasets

17 Nov 2023 · Kevin Li, Danko Nikolić, Vjekoslav Nikolić, Davor Andrić, Lauren M. Sanders, Sylvain V. Costes

Given the increasing availability of RNA-seq data and its complex, heterogeneous nature, there has been growing interest in applying AI/machine learning methods to this data modality. However, because omics data is characterized by high dimensionality and low sample size (HDLSS), current attempts at integrating AI in this domain require significant human guidance and expertise to mitigate overfitting. In this work we examine how transfer learning can be improved to learn from small RNA-seq datasets without significant human intervention. The strategy is to acquire general prior knowledge about a particular data domain (e.g. RNA-seq data) by pre-training on a general task with a large aggregate of data, and then to fine-tune on various specific downstream target tasks in the same domain. Because previous attempts have shown traditional transfer learning failing on HDLSS data, we propose to improve performance by using Guided Transfer Learning (GTL). In collaboration with Robots Go Mental, we deploy an AI that learns not only good initial parameters during pre-training but also inductive biases that affect how it learns downstream tasks. In this approach, we first pre-trained on recount3 data, a collection of over 400,000 mouse RNA-seq samples sourced from thousands of individual studies. With such a large collection, patterns of expression among the ~30,000 genes in mammalian systems could be learned in advance. These patterns were sufficient for the pre-trained AI agent to learn new downstream tasks efficiently from RNA-seq datasets with very low sample sizes, and the agent performed notably better on few-shot learning tasks than the same model without pre-training.
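The abstract does not say how GTL represents the inductive biases it learns during pre-training. As a minimal sketch, assuming those biases take the form of per-parameter guide values in [0, 1] that scale each gradient step during fine-tuning, the idea can be illustrated on a tiny linear model. Here the guide values are derived from hypothetical pre-training "scout" runs (how far each parameter moved from its starting value). The names `guide_from_scouts`, `mse_grad`, `guided_sgd_step`, and `fine_tune` are illustrative assumptions, not the authors' implementation, and the numbers are toy values rather than RNA-seq data.

```python
from typing import List

Vector = List[float]


def guide_from_scouts(start: Vector, scout_finals: List[Vector]) -> Vector:
    """Hypothetical guide values (an assumption, not the paper's method): mean
    squared displacement of each parameter across pre-training 'scout' runs,
    scaled so the largest value is 1."""
    n_params = len(start)
    disp = [
        sum((final[i] - start[i]) ** 2 for final in scout_finals) / len(scout_finals)
        for i in range(n_params)
    ]
    top = max(disp) or 1.0  # avoid dividing by zero if no parameter moved
    return [d / top for d in disp]


def mse_grad(w: Vector, X: List[Vector], y: Vector) -> Vector:
    """Gradient of mean squared error for a linear model y_hat = w . x."""
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
        for j, xj in enumerate(xi):
            grad[j] += 2.0 * err * xj / len(X)
    return grad


def guided_sgd_step(w: Vector, grad: Vector, guide: Vector, lr: float) -> Vector:
    """Gradient step in which each parameter's update is scaled by its guide value."""
    return [wi - lr * gi * gr for wi, gr, gi in zip(w, grad, guide)]


def fine_tune(w0: Vector, guide: Vector, X: List[Vector], y: Vector,
              lr: float = 0.1, steps: int = 200) -> Vector:
    """Few-shot fine-tuning from pre-trained weights w0 using guided updates."""
    w = list(w0)
    for _ in range(steps):
        w = guided_sgd_step(w, mse_grad(w, X, y), guide, lr)
    return w


# Toy usage: parameter 1 never moved during scouting, so its guide value is 0
# and fine-tuning leaves it at its pre-trained value.
start = [0.0, 0.0, 0.0]
scouts = [[1.0, 0.0, 0.1], [0.8, 0.0, -0.1]]
guide = guide_from_scouts(start, scouts)

X = [[1.0, 1.0, 0.0], [2.0, 1.0, 0.0]]  # two samples, three features
y = [2.0, 4.0]
w = fine_tune(start, guide, X, y)
print(guide)
print(w)
```

A guide value of 0 freezes a parameter, while a value of 1 leaves plain gradient descent unchanged. In an HDLSS setting, shrinking the updates of parameters that pre-training marked as unimportant is one way such a learned bias could limit overfitting when only a handful of samples are available.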
