Ontology applications in systems biology: a machine learning approach
Keywords
Bioinformatics
Ontology
Machine learning
Formal concept analysis
Systems biology
Propositional learning
Concept lattice
Visual analytics
First order ontology learning
Online Access
http://handle.unsw.edu.au/1959.4/53749
Abstract
Biology is flooded with an overwhelming accumulation of data, and biologists require methods to apply their knowledge to explain biological networks of interacting genes or proteins in comprehensible terms. The focus of modern bioinformatics has therefore shifted towards systems-wide analysis to understand mechanisms such as those underlying important diseases. Knowledge acquisition from such exponentially growing, inherently noisy and unstructured data is only likely to be achieved by combining bioinformatics, machine learning and semantic technologies such as ontologies. The major contribution of this thesis is a set of novel ontology applications that integrate complex multi-relational data towards learning models of biological systems. First, we examined machine learning using ontology annotations to integrate heterogeneous data on systems biology. A series of propositional learning tasks to learn predictive models of intracellular expression showed that feature construction and selection improved performance. Learning to predict phenotype is harder than predicting protein or gene expression, since identifying systems responses requires the integration of multiple potential causes and effects. In this thesis we applied Formal Concept Analysis (FCA) to integrate multiple experiments and identify subsets of genes that share common systemic behaviour. Visual analytics was then applied to enable users to navigate concept lattices and generate training sets for further analysis by Inductive Logic Programming (ILP). This showed that rules learned with biological background knowledge contained potentially interesting relations when validated. However, these rules are not always verifiable by humans. To address this issue, a novel method called “visual closure”, by analogy to the closure of formal concepts, was implemented.
Rules, viewed as concepts, can be expanded by conversion to Datalog queries, which are then used to search for additional knowledge in biological databases. The visual closure technique is then applied to complete these expanded concepts for visualization by domain specialists. This thesis has demonstrated novel ontology applications in systems biology. However, the question of how to acquire ontologies remains. Ontologies in systems biology often require relational representations due to the importance of network structures. Therefore, as our final step, an initial version of automated ontology construction in a first order representation is demonstrated.
Date
2014
Type
Thesis
Identifier
oai:unsworks.unsw.edu.au:1959.4/53749
http://handle.unsw.edu.au/1959.4/53749