Recovering accuracy methods for scalable consistency library
Lladós Segura, Jordi
Multiple sequence alignment (MSA) is crucial for high-throughput next-generation sequencing applications, which require large-scale alignments of thousands of sequences. However, the alignment quality of current MSA tools decreases sharply when the number of sequences grows to several thousand. This accuracy degradation can be mitigated by using global consistency information, as the T-Coffee MSA tool does through its consistency library. However, consistency-based methods do not scale well because of the computational resources required to calculate and store the consistency information, which grows quadratically with the number of sequences. In this paper, we propose an alternative method for building the consistency library. To allow unlimited scalability, part of the consistency information must be discarded so that the library does not exceed the available memory. Our first approach deals with this memory limitation by identifying and keeping the most important entries, those that provide the best consistency. This method achieves scalability, although at some cost in accuracy. The second proposal aims to reduce this accuracy degradation, and we present three different methods for obtaining better alignments.
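The two ideas in the abstract, quadratic growth and a memory-capped library that keeps only its most important entries, can be illustrated with a small sketch. It is not the authors' implementation. Everything here is a hypothetical stand-in: the names `num_sequence_pairs` and `BoundedLibrary`, the entry key `(seq_i, res_i, seq_j, res_j)`, and the assumption that "importance" means accumulated weight. The abstract does not state the paper's actual selection criterion.

```python
# Hypothetical sketch: not the paper's method or T-Coffee's API.
# Entry key (assumed): residue res_i of sequence seq_i aligned with
# residue res_j of sequence seq_j.
Key = tuple[int, int, int, int]


def num_sequence_pairs(n: int) -> int:
    """Pairwise alignments needed for a full library: n * (n - 1) / 2."""
    return n * (n - 1) // 2


class BoundedLibrary:
    """Consistency-style library capped at `capacity` entries.

    When the cap is exceeded, the entry with the lowest accumulated weight
    is discarded. This is an assumed proxy for "least important entry".
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.weights: dict[Key, int] = {}

    def add(self, key: Key, weight: int) -> None:
        # Repeated evidence for the same residue pair accumulates weight.
        self.weights[key] = self.weights.get(key, 0) + weight
        if len(self.weights) > self.capacity:
            # O(n) scan for the weakest entry; kept simple for illustration.
            weakest = min(self.weights, key=self.weights.__getitem__)
            del self.weights[weakest]

    def weight(self, key: Key) -> int:
        # Discarded or never-seen pairs contribute no consistency support.
        return self.weights.get(key, 0)


if __name__ == "__main__":
    print(num_sequence_pairs(1000), num_sequence_pairs(2000))  # 499500 1999000
    lib = BoundedLibrary(capacity=2)
    lib.add((0, 1, 1, 1), 50)
    lib.add((0, 2, 1, 3), 10)
    lib.add((0, 1, 1, 1), 20)  # same pair again: weight becomes 70
    lib.add((1, 4, 2, 4), 30)  # exceeds capacity: weight-10 entry is evicted
    print(lib.weights)
```

Doubling the number of sequences roughly quadruples the number of pairwise alignments that feed the library. A fixed `capacity` is one simple way to trade some of the consistency information (and so potentially some accuracy) for a bounded memory footprint.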
Is part of: Journal of Supercomputing, 2015, vol. 71, no. 5, p. 1833-1845
Except where otherwise noted, this item's license is described as cc-by (c) Lladós Segura, Jordi et al., 2015
Showing items related by title, author, creator and subject.
High Performance computing improvements on bioinformatics consistency-based multiple sequence alignment tools. Orobitg Cortada, Miquel; Guirado Fernández, Fernando; Cores Prado, Fernando; Lladós Segura, Jordi; Notredame, Cedric (Elsevier, 2014-10-08). Multiple Sequence Alignment (MSA) is essential for a wide range of applications in Bioinformatics. Traditionally, the alignment accuracy was the main metric used to evaluate the goodness of MSA tools. However, with the ...
Orobitg Cortada, Miquel; Guirado Fernández, Fernando; Notredame, Cedric; Cores Prado, Fernando (Springer Verlag, 2011). Multiple Sequence Alignment (MSA) constitutes an extremely powerful tool for important biological applications such as phylogenetic analysis, identification of conserved motifs and domains and structure prediction. In ...
Lladós Segura, Jordi; Cores Prado, Fernando; Guirado Fernández, Fernando (Springer, 2019). With the advent of new high-throughput next-generation sequencing technologies, the volume of genetic data processed has increased significantly. It is becoming essential for these applications to achieve large-scale alignments ...