Friday, June 14, 2019
Efficiency of Clustering algorithms for mining large biological data Research Paper
Clustering algorithms for biological sequences are categorized into partitioning, hierarchical and graph-based techniques. Of the three, the graph-based and hierarchical techniques are the most widely used. Partitioning techniques, although used in other disciplines, are less common in gene sequence clustering, and as such there is no substantial theory on whether they are efficient for this task. This study analyzes four clustering mining algorithms using four large protein sequence data sets. The analysis highlights the weaknesses and shortcomings of the four and proposes a new algorithm based on those shortcomings.

Introduction

Today, more than one million protein sequences are known (Sasson et al., 2002), and as such bioinformatics needs ways of identifying meaningful patterns in them for the purpose of understanding their functions. For a long time, protein and gene sequences have been analyzed, compared and grouped using alignment methods. According to Cai et al. (2000), alignment methods are algorithms constructed to arrange RNA, DNA and protein sequences so as to detect similarities that may be a result of evolutionary, functional or structural relationships between the sequences. Mount (2002) asserts that comparing and clustering sequences is done using pair-wise alignment methods, which are of two types, global and local. The local alignment algorithm proposed by Smith and Waterman (Bolten et al., 2001) is used to identify amino acid patterns that have been conserved in protein sequences. The global alignment algorithm proposed by Needleman and Wunsch (Bolten et al., 2001) tries to align as many characters of the entire sequences as possible. The pair-wise alignment method is, however, expensive when it comes to comparing and clustering a large protein data set.
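The difference between global and local pair-wise alignment can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the study: the function names and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for readability, whereas real protein alignment normally uses a substitution matrix and more elaborate gap penalties. Only the alignment score is computed, not the alignment itself.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch style score: align all of `a` against all of `b`.

    The scoring values are toy assumptions, not those of any cited paper.
    """
    # Row 0: aligning an empty prefix of `a` with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a[:i] against an empty prefix of `b`
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman style score: best-scoring pair of substrings."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are clamped at zero, so an alignment may start anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    # Toy strings, not real protein sequences.
    print(global_alignment_score("HEAGAW", "HEAGAW"))
    print(local_alignment_score("XXXHEAGAWXXX", "HEAGAW"))
```

Both functions fill a table with one cell per pair of positions, so a single comparison of two sequences of lengths m and n takes on the order of m × n steps. The local variant finds the conserved stretch even when it is surrounded by unrelated residues, which is why it suits the search for conserved amino acid patterns.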
The expense arises because a very large number of comparisons are performed during computation: every single protein in a data set is compared with all the other proteins in the set (Bolten et al., 2001). This raises the question of how efficient pair-wise alignment methods are for comparing and clustering large protein data sets. Pair-wise alignment methods, both local and global, do not take into consideration the size of the data set, especially data sets so large that they may overwhelm the computer's memory.

Han & Kamber (2000) argue that unsupervised learning aims to identify, from a data set, a valid partition or a natural pattern with the help of a distance function. Biology and the life sciences have used clustering techniques extensively in sequence analysis to classify similar sequences into protein or gene families (Galperin & Koonin, 2001). Currently, protein sequences can be classified into similar patterns using various readily available sequencing and clustering methods. As mentioned earlier, these methods can be grouped as graph-based, partitioning and hierarchical methods. These methods, especially the graph-based and hierarchical ones, have been used consecutively or together to complement each other, as argued by Sasson et al. (2002), Sperisen & Pagni (2005), Essoussi & Fayech (2007) and Enright & Ouzounis (2000). In the field of protein comparison and sequence clustering, there are very few instances in which partitioning techniques have been used. For instance, Guralnik & Karypis (2001) proposed an algorithm or sequencing method-on the
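The cost of all-against-all comparison, and the idea of forming a partition with a distance function, can be made concrete with a short Python sketch. Everything here is an illustrative assumption rather than a method from the cited papers: `pair_count`, `kmer_distance` and `greedy_partition` are hypothetical names, the distance is a simple Jaccard distance over 2-letter substrings instead of an alignment score, and the greedy "first representative within a threshold" rule is only one simple way of producing a partition.

```python
def pair_count(n):
    """Number of distinct pairs among n sequences (all-against-all)."""
    return n * (n - 1) // 2


def kmer_set(seq, k=2):
    """Set of all length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=2):
    """Jaccard distance between k-mer sets: 0.0 identical, 1.0 disjoint.

    A cheap stand-in for an alignment-based distance (assumption).
    """
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def greedy_partition(seqs, threshold=0.5, distance=kmer_distance):
    """Put each sequence in the first cluster whose representative
    (its first member) lies within `threshold`; otherwise open a new one.
    """
    clusters = []
    for s in seqs:
        for cluster in clusters:
            if distance(s, cluster[0]) <= threshold:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters


if __name__ == "__main__":
    # One million sequences imply roughly 5 * 10**11 pair-wise comparisons.
    print(pair_count(1_000_000))
    # Toy strings, not real protein sequences.
    print(greedy_partition(["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC"]))
```

Because each sequence is compared only with cluster representatives, this kind of partitioning avoids the full quadratic number of comparisons, at the price of a result that depends on the input order and on the chosen threshold.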