Virulence factors of influenza
Chew, Chery Li Ting
Date of Issue: 2017-04-25
School of Computer Science and Engineering
Influenza viruses have caused epidemic outbreaks throughout history, resulting in widespread death and economic losses in the commercial poultry industry. Much ongoing research aims to determine the virulence factors of the influenza virus. Given the death toll caused by multiple outbreaks of low-pathogenicity influenza viruses on poultry farms, results from animal experiments indicating the virulence of an influenza virus strain provide more concrete information for determining its pathogenicity than sequence analysis alone. However, because biological experiments are usually expensive and time-consuming, they are not feasible in every study. By integrating the results of animal experiments indicating the virulence of influenza viruses into a single dataset, computational analysis can be carried out to identify patterns in the sequences of the virus strains. Part of the work in this project was therefore to integrate these results into a single dataset, and the pathogenicity level of each sequence was annotated based on the results of animal testing. The goal of this project was to classify the protein sequences of influenza A virus subtypes H5N1, H1N1, H3N2 and H5N7 isolated from avian species according to the pathogenicity of the virus strains. The project follows the Knowledge Discovery and Data Mining (KDD) process: data collection, data pre-processing, data mining (here, classification), and evaluation and analysis of the results. Data were mainly collected from the NCBI database, and the sequences were then annotated with the pathogenicity of each virus strain in different animals, as reported by animal experiments in research papers.
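The annotation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual procedure: the accessions, host names, outcome strings and the merging rule (a sequence counts as virulent in a host if any experiment in that host reported a lethal outcome) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (sequence accession, host animal, outcome reported in a paper).
# Accessions, hosts and outcome wording are illustrative placeholders only.
RECORDS = [
    ("SEQ_A", "chicken", "lethal"),
    ("SEQ_A", "mouse", "non-lethal"),
    ("SEQ_B", "chicken", "non-lethal"),
    ("SEQ_B", "chicken", "lethal"),
    ("SEQ_C", "mouse", "lethal"),
]


def annotate(records):
    """Merge per-experiment outcomes into one host -> label mapping per sequence.

    Assumed (hypothetical) rule: a sequence is labelled "virulent" in a host
    if at least one experiment in that host reported a lethal outcome,
    otherwise "avirulent".
    """
    # Group all reported outcomes by accession, then by host animal.
    outcomes = defaultdict(lambda: defaultdict(list))
    for accession, host, outcome in records:
        outcomes[accession][host].append(outcome)

    dataset = {}
    for accession, by_host in outcomes.items():
        # Exact string match, so "non-lethal" does not count as "lethal".
        dataset[accession] = {
            host: ("virulent" if "lethal" in results else "avirulent")
            for host, results in by_host.items()
        }
    return dataset


if __name__ == "__main__":
    for accession, labels in annotate(RECORDS).items():
        print(accession, labels)
```

Keeping one label per host animal makes the same table usable for both a binary target (virulent in a chosen host or not) and a multi-label target (the set of hosts in which the strain is virulent).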
The dataset was then pre-processed, and multiple sequence alignment of the viral protein sequences was performed using the MAFFT tool. Binary and multi-label classification were performed on the aligned sequences; the classification results were then analysed and the performance of the classifiers was evaluated.
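The classification stage can be illustrated with a small sketch. The abstract does not name the classifiers used, so this example swaps in a 1-nearest-neighbour classifier over one-hot encoded aligned sequences, evaluated by leave-one-out accuracy. The toy sequences, the 0/1 pathogenicity labels and all function names are hypothetical, and the input is assumed to be already aligned (for example, MAFFT output) so that every sequence has the same length.

```python
# 20 standard amino acids plus "-" for alignment gaps.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot(aligned_seq):
    """Flatten an aligned protein sequence into a 0/1 feature vector.

    Each alignment column contributes len(ALPHABET) features; residues not
    in ALPHABET (e.g. "X") become an all-zero block.
    """
    vec = []
    for residue in aligned_seq.upper():
        vec.extend(1 if residue == symbol else 0 for symbol in ALPHABET)
    return vec


def distance(u, v):
    """Number of differing features between two equal-length vectors."""
    return sum(a != b for a, b in zip(u, v))


def predict_1nn(train, query):
    """Return the label of the nearest training example (first one on ties)."""
    features, label = min(train, key=lambda item: distance(item[0], query))
    return label


def leave_one_out_accuracy(examples):
    """Classify each example using all the others as training data."""
    correct = 0
    for i, (features, label) in enumerate(examples):
        rest = examples[:i] + examples[i + 1:]
        correct += predict_1nn(rest, features) == label
    return correct / len(examples)


# Hypothetical aligned sequences with labels: 1 = high, 0 = low pathogenicity.
ALIGNED = [
    ("MKT-LV", 1),
    ("MKT-LI", 1),
    ("MRAQLV", 0),
    ("MRAQLI", 0),
]
EXAMPLES = [(one_hot(seq), label) for seq, label in ALIGNED]

if __name__ == "__main__":
    print("leave-one-out accuracy:", leave_one_out_accuracy(EXAMPLES))
```

One-hot encoding keeps every alignment column as a separate group of features, so two sequences that differ at one column (both residues in ALPHABET) are at distance 2. For the multi-label setting, one common approach, binary relevance, trains one such binary classifier per label, for instance one per host animal.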
Final Year Project (FYP)
Nanyang Technological University