Beyond “Junk DNA”: Re-exploring Pseudogene Annotation and Functional Analysis with Artificial Intelligence and Machine Learning
Abstract
Pseudogenes, once regarded as nonfunctional genomic relics, are now recognized as important contributors to gene regulation, chromatin remodeling, transcriptional modulation, and disease-associated pathways. Traditional annotation pipelines, built primarily on heuristic, mutation-based criteria and sequence similarity, frequently misclassify pseudogenes and overlook subtle but biologically significant functions. Recent advances in artificial intelligence (AI) and machine learning (ML) have enabled the integration of multi-omics datasets, allowing deeper and more accurate functional inference for pseudogenes. Deep learning architectures such as convolutional neural networks (CNNs), long short-term memory networks (LSTMs), transformers, and graph neural networks (GNNs) have shown considerable promise in modeling genomic sequences, identifying hidden open reading frames (ORFs), reconstructing regulatory networks, and predicting pseudogene-mediated effects on virulence and immunity. This review synthesizes recent developments in AI-driven pseudogene annotation, describes emerging multi-omics approaches, highlights the link between pseudogenes and virulence, and outlines future directions toward comprehensive, mechanistic pseudogene catalogs.

