Scalable Consistency in T-Coffee Through Apache Spark and Cassandra Database

dc.contributor.author Lladós Segura, Jordi
dc.contributor.author Cores Prado, Fernando
dc.contributor.author Guirado Fernández, Fernando
dc.date.accessioned 2020-07-14T08:45:03Z
dc.date.available 2020-07-14T08:45:03Z
dc.date.issued 2018
dc.description.abstract Next-generation sequencing, also known as high-throughput sequencing, has increased the volume of genetic data processed by sequencers. In bioinformatics, highly rated multiple sequence alignment tools such as MAFFT, ProbCons, and T-Coffee (TC) compute probabilistic consistency as a step prior to the progressive alignment stage in order to improve final accuracy. However, such methods are severely limited by the memory required to store the consistency information. Big data processing and persistence techniques are used to manage and store the huge amounts of information generated, yet, despite their significant advantages, few biological applications have adopted them. In this article, a novel approach named big data tree-based consistency objective function for alignment evaluation (BDT-Coffee) is presented. BDT-Coffee integrates into TC consistency information that has previously been generated with the MapReduce processing paradigm and stored in a Cassandra database, enabling large data sets to be processed and improving the performance and scalability of the original algorithm.
dc.description.sponsorship This work has been supported by the MEyC-Spain under contract Nos. TIN2014-53234-C2-2-R and TIN2017-84553-C2-2-R.
dc.identifier.doi https://doi.org/10.1089/cmb.2018.0084
dc.identifier.idgrec 027254
dc.identifier.issn 1066-5277
dc.identifier.issn 1557-8666
dc.identifier.uri http://hdl.handle.net/10459.1/69308
dc.language.iso eng
dc.publisher Mary Ann Liebert
dc.relation info:eu-repo/grantAgreement/MINECO//TIN2014-53234-C2-2-R/ES/PENSAMIENTO COMPUTACIONAL E INGENIERIA DEL RENDIMIENTO PARA APLICACIONES DE CIENCIAS DE LA VIDA Y MEDIOAMBIENTALES - UDL/
dc.relation info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-84553-C2-2-R/ES/APROVECHANDO LOS NUEVOS PARADIGMAS DE COMPUTO PARA LOS RETOS DE LA SOCIEDAD DIGITAL - UDL/
dc.relation.isformatof Postprint version of the document published at https://doi.org/10.1089/cmb.2018.0084
dc.relation.ispartof Journal of Computational Biology, 2018, vol. 25, no. 8, p. 894-906
dc.rights (c) Mary Ann Liebert, 2018
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.subject Cassandra
dc.subject Hadoop
dc.subject Large-scale alignments
dc.subject MSA
dc.subject Spark
dc.subject T-Coffee
dc.title Scalable Consistency in T-Coffee Through Apache Spark and Cassandra Database
dc.type info:eu-repo/semantics/article
dc.type.version info:eu-repo/semantics/acceptedVersion
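The abstract does not spell out how the consistency information is computed. As a rough illustration only, the sketch below follows the triplet library extension described for the original T-Coffee, in which the extended weight of a residue pair (x, y) is the direct weight W(x, y) plus, for every intermediate residue z in a third sequence, min(W(x, z), W(z, y)). The computation is split into a map step driven by the intermediate residue and a reduce step keyed by the residue pair. This is plain Python, not Spark, Hadoop, or Cassandra code; the function names, the residue encoding as (sequence_id, position), and the library layout are hypothetical, and it is not BDT-Coffee's actual implementation.

```python
from collections import defaultdict
from itertools import combinations


def neighbours(library):
    """Index a symmetric primary library as residue -> {residue: weight}.

    Hypothetical layout: keys are pairs of residues (sequence_id, position)
    taken from different sequences; values are integer weights W(x, y).
    """
    adj = defaultdict(dict)
    for (x, y), w in library.items():
        adj[x][y] = w
        adj[y][x] = w
    return adj


def map_phase(adj):
    """Emit ((x, y), contribution) records, one intermediate residue z at a time.

    For each pair of neighbours x, y of z that lie in two different sequences,
    neither being z's own sequence, the contribution is min(W(x, z), W(z, y)).
    """
    for z, nbrs in adj.items():
        for x, y in combinations(sorted(nbrs), 2):
            if x[0] != y[0] and z[0] not in (x[0], y[0]):
                yield (x, y), min(nbrs[x], nbrs[y])


def reduce_phase(library, emitted):
    """Sum the contributions per residue pair and add the direct weight W(x, y)."""
    extended = defaultdict(int)
    for (x, y), w in library.items():
        extended[tuple(sorted((x, y)))] += w  # normalise pair order
    for key, w in emitted:
        extended[key] += w  # keys from map_phase are already sorted
    return dict(extended)


# Toy primary library for three sequences A, B and C (made-up weights).
library = {
    (("A", 0), ("B", 0)): 80,
    (("A", 0), ("C", 0)): 60,
    (("C", 0), ("B", 0)): 70,
    (("A", 1), ("C", 1)): 50,
}
extended = reduce_phase(library, map_phase(neighbours(library)))
```

With these weights, the pair (A0, B0) ends with 80 + min(60, 70) = 140, and (A1, C1) keeps its direct weight of 50 because no third sequence links them. Keying the summed records by residue pair is what would let a distributed framework group them independently, and such per-pair records are the kind of data a key-value store could persist; the actual Cassandra schema used by BDT-Coffee is not described in this record.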
Files
Original bundle
Name: 027254.pdf
Size: 398.68 KB
Format: Adobe Portable Document Format
Description: Postprint
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission
Description: