Once seen merely as an intermediary from DNA to protein, RNA is now recognized as a dynamic, structurally sophisticated class of molecules that regulates gene expression and complex cellular processes. Its functional versatility depends on precise three-dimensional folding, making accurate RNA structure modelling crucial for biology and medicine, and positioning RNA engineering as a key driver of biotechnological innovation. Today, generative artificial intelligence (AI) models offer powerful tools for designing RNA sequences, and DNA language models trained on large genomic datasets show promise for predicting gene regulation and RNA structure. However, while the rise of these models feels revolutionary, such advances in generative RNA biology build on decades of prior research, long before transformers entered the scene.
In the early 1990s, the introduction of stochastic context-free grammars (SCFGs) and covariance models marked a landmark transition in RNA sequence analysis. Rather than treating RNA sequences as linear strings, these models provided rigorous probabilistic frameworks to capture the structural grammar of RNA. The foundational formalization of this approach was published in 1994 by Sean R. Eddy and Richard Durbin in a paper titled ‘RNA sequence analysis using covariance models’. In this paper, the authors introduced covariance models as a class of profile SCFGs that turn a multiple alignment of an RNA family into a probabilistic model of sequence conservation and consensus secondary structure. Their key innovation was capturing covariant substitutions in base-paired positions, enabling structural homology searches far beyond what sequence comparison alone could achieve.
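Covariance models score this covariation within a full probabilistic grammar, but the underlying signal they exploit can be illustrated with a much simpler, model-free statistic: the mutual information between two columns of a multiple alignment. At base-paired positions, compensatory substitutions (for example, a G–C pair replaced by an A–U pair) keep the columns strongly correlated even when neither column is conserved. A minimal sketch, with a toy alignment invented for illustration (this is not the paper's method, only the intuition behind it):

```python
from collections import Counter
from math import log2

def column_mi(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    High MI at two positions suggests covariant substitutions of the kind
    a base pair would produce; independent columns give MI near zero.
    """
    n = len(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi

# Toy alignment: columns 0 and 4 always form a Watson-Crick pair
# (G-C, C-G, A-U, U-A), while column 2 is invariant.
aln = ["GAAGC",
       "CUACG",
       "AGAAU",
       "UCAUA"]

print(column_mi(aln, 0, 4))  # → 2.0 (perfect covariation)
print(column_mi(aln, 0, 2))  # → 0.0 (no covariation signal)
```

In practice, mutual information alone is noisy for small families and ignores indels and nesting constraints; covariance models address exactly these gaps by embedding pairwise emission probabilities in a profile SCFG, so that alignment, scoring, and structure are handled in one coherent framework.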