Utilize este identificador para referenciar este registo: http://hdl.handle.net/10362/5308
Título: Burrows-wheeler transform in secondary memory
Autor: Pereira, Sérgio Miguel Cachucho
Orientador: Russo, Luís
Palavras-chave: Suffix arrays
External sorting
Pattern matching
Data de Defesa: 2010
Editora: Faculdade de Ciências e Tecnologia
Resumo: A suffix array is an index, a data structure that allows searching for sequences of characters. Such structures are of key importance for a large set of problems related to sequences of characters. An especially important use of suffix arrays is to compute the Burrows-Wheeler Transform, which can be used for compressing text. This procedure is the base of the UNIX utility bzip2. The Burrows-Wheeler transform is a key step in the construction of more sophisticated indexes. For large sequences of characters, such as DNA sequences of about 10 GB, it is not possible to calculate the Burrows-Wheeler transform in an average computer without using secondary memory. In this dissertation we will study the state-of-the-art algorithms to construct the Burrows-Wheeler transform in secondary memory. Based on this research we propose an algorithm and compare it against the previous ones to determine its relative performance. Our algorithm is based on the classical external Heapsort. The novelty lies in a heap that is especially designed for suffix arrays, which we call String Heap. This algorithm aims to be space-conscious, while trying to handle the disk access dominance over main memory access. We divide our solution in two parts, splitting and merging suffix arrays, the latter is the main application of the String Heap. The merging part produces the BWT, as a side effect of merging a set of partial suffix arrays of a text. We also compare its performance against the other algorithms. We also study a second version of the algorithm that accesses secondary memory in blocks.
Descrição: Master’s Thesis in Computer Engineering
URI: http://hdl.handle.net/10362/5308
Aparece nas colecções:FCT: DI - Dissertações de Mestrado

Ficheiros deste registo:
Ficheiro Descrição TamanhoFormato 
Pereira_2010.pdf1,4 MBAdobe PDFVer/Abrir

FacebookTwitterDeliciousLinkedInDiggGoogle BookmarksMySpace
Formato BibTex MendeleyEndnote 

Todos os registos no repositório estão protegidos por leis de copyright, com todos os direitos reservados.