Research Article Open Access

Development of Deduced Protein Database Using Variable Bit Binary Encoding

B. Parvathavarthini, B. Rajesh Kanna and L. Rajeswaridevi


A large amount of biological data is semi-structured and stored in one of the following file formats: flat, XML or relational files. These databases must be integrated with the structured data available in relational or object-oriented databases. Sequence matching is difficult in such file formats because string comparison is costly in both computation and time. To reduce the storage size of amino acid sequences in a protein database, a novel probability-based variable bit length encoding technique is introduced. The probability value of each amino acid is derived from the number of triplet codons that map to it. A binary tree is then constructed to assign a unique binary code to each amino acid. This derived bit pattern replaces the existing fixed-byte representation of the amino acid. A proof of the reduction in protein database space is discussed, and the reduction is found to lie between 42.86% and 87.17%. To validate the method, a few amino acid sequences of major organisms, such as sheep and lambda phage, were collected from NCBI and represented using the proposed method. The comparison shows that the minimum and maximum reductions in storage space are 43.30% and 72.86%, respectively. In future, the biological data could be reduced further by applying lossless compression to this deduced data.
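The general idea of a codon-count-weighted binary tree can be illustrated with a minimal Python sketch. It is not the authors' construction: the abstract does not specify how their tree is built, so a standard Huffman tree is swapped in here as an assumption, and the resulting code lengths need not match the reduction figures quoted above. The weights are the codon counts of the standard genetic code (61 sense codons), and the names `CODON_COUNTS`, `build_codes`, `encode` and `decode` are hypothetical.

```python
import heapq
from itertools import count

# Number of codons mapping to each amino acid in the standard genetic code
# (one-letter codes; the 3 stop codons are excluded, leaving 61 sense codons).
CODON_COUNTS = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "H": 2, "K": 2, "F": 2, "Y": 2,
    "M": 1, "W": 1,
}


def build_codes(weights):
    """Build prefix-free binary codes with a Huffman tree (an assumed stand-in
    for the paper's tree): heavier symbols never get longer codes."""
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(w, next(tie), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code beneath them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]


def encode(seq, codes):
    """Concatenate the variable-length code of each residue."""
    return "".join(codes[aa] for aa in seq)


def decode(bits, codes):
    """Recover the residues; unambiguous because the codes are prefix-free."""
    inverse = {c: s for s, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out)


if __name__ == "__main__":
    codes = build_codes(CODON_COUNTS)
    seq = "MKWLLSRAG"  # arbitrary illustrative sequence, not from the paper
    bits = encode(seq, codes)
    print(len(bits), "bits versus", 8 * len(seq), "bits at one byte per residue")
```

In this sketch, amino acids with more codons, such as leucine, receive codes no longer than those of tryptophan or methionine, which conveys the flavour of a probability-based variable bit length scheme. Real sequences would also need handling for ambiguous residue symbols and for packing the bit string into bytes.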

Journal of Computer Science
Volume 4 No. 6, 2008, 467-473


Submitted On: 1 September 2008 Published On: 30 June 2008

How to Cite: Parvathavarthini, B., Kanna, B. R. & Rajeswaridevi, L. (2008). Development of Deduced Protein Database Using Variable Bit Binary Encoding. Journal of Computer Science, 4(6), 467-473.




  • Binary tree
  • protein sequence
  • amino acid