Design and implementation of an Algorithm for an Author Disambiguation problem
Echeverria Rovira, Lluís
Person name disambiguation is essential for distinguishing people who share the same name when no unique identifiers are defined. The problem is common in many domains, including digital libraries and publication databases, where the same name can refer to several distinct authors. To attribute work correctly, these databases must be disambiguated. This project proposes a possible solution by designing and implementing an algorithm for name disambiguation. Distributed computing tools such as Spark or Hadoop will be used during development to improve the efficiency of the process. The more than 8 million publications in the AGRIS (International System for Agricultural Science and Technology) repository will serve as the base data set for the disambiguation process.
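The abstract does not describe the algorithm itself. Purely as an illustration of a common pattern for this problem, and not as the project's method, the sketch below partitions records into blocks by a normalized name key and then clusters records within each block. It assumes records shaped as `(record_id, author_name, coauthors)`, a hypothetical `blocking_key` rule (surname plus the first initial of the given name, accents stripped) and a hypothetical merge rule (two records in the same block belong to the same person if they share at least one coauthor). The grouping step plays the role of a "map" phase and the per-block comparison the role of a "reduce" phase.

```python
import unicodedata
from collections import defaultdict


def blocking_key(name: str) -> str:
    """Map 'Surname, Given Names' to a coarse key: surname + first initial.

    Hypothetical normalization rule, chosen for illustration only.
    """
    # Strip accents so that 'Lluís' and 'Lluis' produce the same key.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    surname, _, given = ascii_name.partition(",")
    given = given.strip()
    initial = given[0].lower() if given else ""
    return f"{surname.strip().lower()}_{initial}"


def disambiguate(records):
    """Return {record_id: cluster_id} for (record_id, author_name, coauthors) triples.

    Assumed merge rule: records in the same block that share at least one
    coauthor are treated as the same person (deliberately simple).
    """
    # "Map" step: group record indices by their blocking key.
    blocks = defaultdict(list)
    for idx, (_, name, _) in enumerate(records):
        blocks[blocking_key(name)].append(idx)

    # Union-find over record indices to build clusters.
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # "Reduce" step: compare only records that fall in the same block.
    for members in blocks.values():
        for pos, a in enumerate(members):
            for b in members[pos + 1:]:
                if set(records[a][2]) & set(records[b][2]):
                    union(a, b)

    # Cluster id = blocking key plus the index of the cluster's root record.
    return {
        records[i][0]: f"{blocking_key(records[i][1])}#{find(i)}"
        for i in range(len(records))
    }


if __name__ == "__main__":
    sample = [
        ("p1", "García, Ana", ["Pérez, J."]),
        ("p2", "Garcia, A.", ["Pérez, J.", "López, M."]),
        ("p3", "García, Ana", ["Smith, K."]),
        ("p4", "García, Luis", ["Pérez, J."]),
    ]
    print(disambiguate(sample))
```

With the sample data, `p1` and `p2` end up in the same cluster because they share a coauthor, `p3` stays separate despite the identical name, and `p4` is never compared with the others because it falls in a different block. Blocking matters because the pairwise comparison is quadratic within each block; on Spark, the blocks would map naturally onto a pair RDD grouped by key (for example with `groupByKey`), so that each block can be processed on a separate worker.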
Showing items related by title, author, creator and subject.
Romeu Farré, Carolina (2019-09). Uses big data technology to implement the algorithm that BLAST uses to align genetic sequences, which requires processing large amounts of data, and connects to a NoSQL database (Apache Cassandra) to store the data.
MPI-based implementation of an enhanced algorithm to solve the LPN problem in a memory-constrained environment. Teixidó Torrelles, Ivan; Sebé Feixas, Francesc; Conde Colom, Josep; Solsona Tehàs, Francesc (Elsevier, 2014). In recent years, several lightweight cryptographic protocols whose security lies in the assumed intractability of the learning parity with noise (LPN) problem have been proposed. The LPN problem has been shown to be ...
PPCAS: Implementation of a Probabilistic Pairwise Model for Consistency-Based Multiple Alignment in Apache Spark. Lladós Segura, Jordi; Guirado Fernández, Fernando; Cores Prado, Fernando (Springer, 2017). Large-scale data processing techniques, currently known as Big Data, are used to manage the huge amount of data that are generated by sequencers. Although these techniques have significant advantages, few biological ...