Development of Deduced Protein Database Using Variable Bit Binary Encoding

B. Parvathavarthini; B. Rajesh Kanna; L. Rajeswaridevi

doi:10.3844/jcssp.2008.467.473

Research Article Open Access


Abstract

A large amount of biological data is semi-structured and stored in one of the following file formats: flat files, XML, or relational files. These databases must be integrated with the structured data available in relational or object-oriented databases. Sequence matching is difficult in such file formats because string comparison is costly in both computation and time. To reduce the storage size of amino acid sequences in a protein database, a novel probability-based variable bit length encoding technique is introduced. The probability value of each amino acid is derived from the number of triplet codons that map to it. A binary tree is then constructed to assign a unique binary code to each amino acid. This derived bit pattern replaces the existing fixed-byte representation of the amino acid. A proof of the reduction in protein database space is discussed, and the reduction is found to be between 42.86% and 87.17%. To validate the method, we collected a few amino acid sequences of major organisms, such as sheep and lambda phage, from NCBI and represented them using the proposed method. The comparison shows that the minimum and maximum reductions in storage space are 43.30% and 72.86%, respectively. In future, the biological data can be reduced further by applying lossless compression to this deduced data.
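The encoding idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the binary tree is built, so the sketch assumes a standard Huffman construction as a stand-in, weights each amino acid by its codon count in the standard genetic code, and leaves out stop codons. The names `CODON_COUNTS`, `build_codes`, `encode` and `decode` are hypothetical, and the resulting code lengths need not match the reductions reported in the paper.

```python
import heapq
from itertools import count

# Number of codons mapping to each amino acid in the standard genetic code
# (one-letter symbols; stop codons excluded, which is an assumption here).
CODON_COUNTS = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "C": 2, "H": 2, "Q": 2,
    "N": 2, "K": 2, "D": 2, "E": 2,
    "M": 1, "W": 1,
}

def build_codes(weights):
    """Build a prefix-free binary code with a Huffman tree (assumed construction)."""
    tiebreak = count()  # unique counter so the heap never compares tree nodes
    heap = [(w, next(tiebreak), sym) for sym, w in sorted(weights.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # An internal node is a (left, right) tuple; a leaf is a symbol string.
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes

def encode(sequence, codes):
    """Concatenate the bit pattern of each residue into one bit string."""
    return "".join(codes[residue] for residue in sequence)

def decode(bits, codes):
    """Recover the residues; this works because no code is a prefix of another."""
    inverse = {code: sym for sym, code in codes.items()}
    residues, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            residues.append(inverse[current])
            current = ""
    return "".join(residues)

if __name__ == "__main__":
    codes = build_codes(CODON_COUNTS)
    sample = "MLSRAGLV"  # made-up sequence, for illustration only
    bits = encode(sample, codes)
    saving = 100 * (1 - len(bits) / (8 * len(sample)))
    print(f"{len(bits)} bits vs {8 * len(sample)} bits ({saving:.1f}% smaller)")
```

Under these assumptions, amino acids with more codons receive codes no longer than those with fewer codons, so a sequence occupies fewer bits than a one-byte-per-residue representation.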

Journal of Computer Science

Volume 4 No. 6, 2008, 467-473

DOI: https://doi.org/10.3844/jcssp.2008.467.473

Submitted On: 1 September 2008 Published On: 30 June 2008

How to Cite: Parvathavarthini, B., Kanna, B. R. & Rajeswaridevi, L. (2008). Development of Deduced Protein Database Using Variable Bit Binary Encoding. Journal of Computer Science, 4(6), 467-473. https://doi.org/10.3844/jcssp.2008.467.473

Copyright: © 2008 B. Parvathavarthini, B. Rajesh Kanna and L. Rajeswaridevi. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Keywords

Binary tree
protein sequence
amino acid