Scientists from the Agency for Science, Technology and Research (A*STAR)’s Genome Institute of Singapore (GIS) have developed a new tool, named Bambu, which uses artificial intelligence to identify and characterize new genes, enabling adaptable analysis across various species and samples. By revealing which genes are expressed in a sample and how, it offers a clearer picture of how cells function.
Bambu is a long-read RNA sequencing tool that can be used in both clinical and research settings to discover novel transcripts encoded in DNA and to quantify them. The tool is named after the bamboo plant, whose extremely long reeds are analogous to the long reads that Bambu uses. A study detailing the methodology and evaluation of Bambu was published recently in Nature Methods.
The human genome, which comprises 3.2 billion base pairs, is dwarfed by the lungfish genome with 43 billion, and even more so by that of the Japanese flower Paris japonica, with 149 billion base pairs. Although the human genome is comparatively small, it has over 140,000 unique transcripts, and given the complexity of the body’s organs, life stages, and responses to perturbations such as diseases, it is estimated that many more are yet to be identified. This is where Bambu comes in.
How does Bambu work?
Bambu employs a machine learning model to rank candidate transcripts by how likely they are to represent biologically relevant products. It can identify new transcripts and quantify them with high precision and sensitivity, providing a more comprehensive understanding of an organism's genetic makeup.
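To make the idea of ranking candidates concrete, here is a minimal sketch in Python. It is not Bambu's actual model or interface: the features (supporting read count, fraction of supported splice junctions), the weights, and the names Candidate, score, and rank_candidates are all hypothetical, chosen only to illustrate how a probability-like score can be used to order candidate transcripts and discard unlikely ones.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    read_count: int          # long reads supporting the candidate (assumed feature)
    junction_support: float  # fraction of splice junctions with support, 0..1 (assumed feature)


# Hypothetical weights picked by hand for illustration; a real model would learn them from data.
WEIGHTS = {"bias": -4.0, "log_reads": 1.2, "junction_support": 2.5}


def score(candidate: Candidate) -> float:
    """Return a probability-like score in (0, 1) from a simple logistic model."""
    z = (
        WEIGHTS["bias"]
        + WEIGHTS["log_reads"] * math.log1p(candidate.read_count)
        + WEIGHTS["junction_support"] * candidate.junction_support
    )
    return 1.0 / (1.0 + math.exp(-z))


def rank_candidates(candidates, threshold=0.5):
    """Keep candidates scoring at or above the threshold, highest score first."""
    scored = [(c.name, score(c)) for c in candidates]
    kept = [(name, s) for name, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    candidates = [
        Candidate("tx_A", read_count=50, junction_support=1.0),
        Candidate("tx_B", read_count=2, junction_support=0.5),
        Candidate("tx_C", read_count=10, junction_support=0.9),
    ]
    for name, s in rank_candidates(candidates):
        print(f"{name}\t{s:.3f}")
```

In this toy setup, a candidate backed by many reads and fully supported junctions ranks first, while one with only a couple of supporting reads falls below the threshold and is treated as a likely artifact.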
This will allow researchers to identify new players in their field of research, such as genes, proteins, and other elements, and empower them to explore less-studied organisms. Furthermore, the discovery of new genes, especially from clinical samples, can lead to the identification of biomarkers for the early detection of diseases or of targets for therapeutics.
“It is fascinating to see that scientists are still discovering new genes even in genomes that have been studied for many years, such as the human or mouse genome. However, the key question is if these transcripts are relevant, or they could be artifacts. To address this, Bambu quantifies the probability that a transcript is real, making transcript and gene discovery much more reliable,” said Jonathan Göke, PhD, group leader of the laboratory of Computational Transcriptomics at A*STAR’s GIS and the corresponding author of the study. He added, “By providing such a measure of confidence, Bambu can more reliably be applied to find new genes that play a role in human diseases such as cancer.”
An early release of Bambu has been benchmarked in independent preprint studies, in which it was shown to be a top performer among comparable tools.
Andre Sim, PhD, a postdoctoral fellow at A*STAR’s GIS and co-first author of the study, remarked, “Identifying new transcript models requires numerous decisions. Bambu simplifies this process using its machine learning model, making this task more accessible to the scientific community.”
- This press release was supported by the Agency for Science, Technology and Research (A*STAR)