Universitat de Lleida
Repositori Obert UdL

Optimization of Consistency-Based Multiple Sequence Alignment using Big Data technologies

Postprint (346.4Kb)
Issue date
2019
Author
Lladós Segura, Jordi
Cores Prado, Fernando
Guirado Fernández, Fernando
Suggested citation
Lladós Segura, Jordi; Cores Prado, Fernando; Guirado Fernández, Fernando (2019). Optimization of Consistency-Based Multiple Sequence Alignment using Big Data technologies. The Journal of Supercomputing, 2019, vol. 75, p. 1310–1322. https://doi.org/10.1007/s11227-018-2424-4.
Abstract
With the advent of high-throughput next-generation sequencing technologies, the volume of genetic data to be processed has increased significantly. Applications increasingly need large-scale alignments involving thousands of sequences or even whole genomes. However, all current multiple sequence alignment (MSA) tools exhibit scalability issues as the number of sequences increases. The main drawback of these methods is that errors made in early pairwise alignments propagate to the final result, affecting the accuracy of the global alignment. Using consistency information improves the final result and makes its accuracy more stable. However, consistency-based methods are severely limited by the memory required to store the consistency information. In previous work, the authors analyzed the structure and distribution of the data stored in the constraint library and showed that the library could be reduced without losing accuracy, which makes it possible to increase the number of sequences that can be aligned. However, the execution time needed to compute the constraint library also increases greatly as the number of sequences grows. In the present paper, the authors apply Big Data technologies to exploit the high degree of parallelism provided by the MapReduce paradigm and considerably reduce the library calculation time. Moreover, the Big Data infrastructure provides a distributed storage system to improve the scalability of the library, and machine-learning algorithms to enhance the consistency selection policies.
URI
http://hdl.handle.net/10459.1/69309
DOI
https://doi.org/10.1007/s11227-018-2424-4
Is part of
The Journal of Supercomputing, 2019, vol. 75, p. 1310–1322
Collections
  • Articles publicats (Informàtica i Enginyeria Industrial) [935]
  • Publicacions de projectes de recerca del Plan Nacional [2684]

Contact Us | Send Feedback | Legal Notice
© 2022 BiD. Universitat de Lleida