Embedding-based alignment: combining protein language models and alignment approaches to detect structural similarities in the twilight-zone
Discussion paper / Internet publication
 
ID 4665271
Digital Object Identifier DOI 10.1101/2022.12.13.520313
Author(s) Pantolini, Lorenzo; Studer, Gabriel; Pereira, Joana; Durairaj, Janani; Schwede, Torsten
Author(s) at UniBasel Pantolini, Lorenzo; Studer, Gabriel; Durairaj, Janani; Schwede, Torsten
Year 2022
Month and day 12-20
Title Embedding-based alignment: combining protein language models and alignment approaches to detect structural similarities in the twilight-zone
Pages 9
Publisher / Institution Cold Spring Harbor Laboratory
Abstract Language models are now routinely used for text classification and generative tasks. Recently, the same architectures were applied to protein sequences, unlocking powerful tools in the bioinformatics field. Protein language models (pLMs) generate high-dimensional embeddings on a per-residue level and encode the "semantic meaning" of each individual amino acid in the context of the full protein sequence. Multiple works use these representations as a starting point for downstream learning tasks and, more recently, for identifying distant homologous relationships between proteins. In this work, we introduce a new method that generates embedding-based protein sequence alignments (EBA), and show how these capture structural similarities even in the twilight zone, outperforming both classical sequence-based scores and other approaches based on protein language models. The method shows excellent accuracy despite the absence of training and parameter optimization. We expect that the association of pLMs and alignment methods will soon rise in popularity, helping the detection of relationships between proteins in the twilight zone.
edoc-URL https://edoc.unibas.ch/94399/
Full Text on edoc Available
ISI-Number PPRN:35848118
 
   
