What is hidden in the darkness? Deep-learning assisted large-scale protein family curation uncovers novel protein families and folds
Discussion paper / Internet publication
 
ID 4665269
Digital Object Identifier DOI 10.1101/2023.03.14.532539
Author(s) Durairaj, Janani; Waterhouse, Andrew M.; Mets, Toomas; Brodiazhenko, Tetiana; Abdullah, Minhal; Studer, Gabriel; Akdel, Mehmet; Andreeva, Antonina; Bateman, Alex; Tenson, Tanel; Hauryliuk, Vasili; Schwede, Torsten; Pereira, Joana
Author(s) at UniBasel Schwede, Torsten; Durairaj, Janani; Waterhouse, Andrew; Studer, Gabriel
Year 2023
Month and day 03-19
Title What is hidden in the darkness? Deep-learning assisted large-scale protein family curation uncovers novel protein families and folds
Pages 23
Publisher / Institution Cold Spring Harbor Laboratory
Abstract Driven by the development and upscaling of fast genome sequencing and assembly pipelines, the number of protein-coding sequences deposited in public protein sequence databases is increasing exponentially. Recently, the dramatic success of deep learning-based approaches applied to protein structure prediction has done the same for protein structures. We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database. These models cover most of the catalogued natural proteins, including those difficult to annotate for function or putative biological role based on standard, homology-based approaches. In this work, we quantified how much of such "dark matter" of the natural protein universe was structurally illuminated by AlphaFold2 and modelled this diversity as an interactive sequence similarity network that can be navigated at https://uniprot3d.org/atlas/AFDB90v4 . In the process, we discovered multiple novel protein families by searching for novelties from sequence, structure, and semantic perspectives. We added a number of them to Pfam, and experimentally demonstrate that one of these belongs to a novel superfamily of toxin-antitoxin systems, TumE-TumA. This work highlights the role of large-scale, evolution-driven protein comparison efforts in combination with structural similarities, genomic context conservation, and deep-learning based function prediction tools for the identification of novel protein families, aiding not only annotation and classification efforts but also the curation and prioritisation of target proteins for experimental characterisation.
edoc-URL https://edoc.unibas.ch/94398/
Full Text on edoc Available
ISI-Number PPRN:46915313
 
   
