Integrating Information in Biological Ontologies and Molecular Networks to Infer Novel Terms
-- The Unification of Discordant Networks (UNICORN) algorithm and supplementary files
Le Li and Kevin Y. Yip.
Department of Computer Science and Engineering
The Chinese University of Hong Kong
1. The Unicorn program
The Unicorn pipeline has three parts:
- The Unicorn algorithm: produces the integrated network;
- CliXO: takes the integrated network and infers an ontology from it;
- Unicorn_measurement: automatically measures the performance of the inferred ontology (recall, precision and F-score).
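As a rough illustration of the three measures reported in the last step, the Python sketch below computes precision, recall and F-score from simple counts. It uses the standard definitions only. How Unicorn_measurement actually decides that an inferred term matches a reference term is not described here, so the function name and the count-based inputs are assumptions made for the example.

```python
def precision_recall_f(num_matched, num_inferred, num_reference):
    """Standard precision, recall and F-score from term counts.

    num_matched   -- inferred terms that match a reference term (assumed input)
    num_inferred  -- total number of terms in the inferred ontology
    num_reference -- total number of terms in the reference ontology
    """
    # Guard against empty ontologies to avoid division by zero.
    precision = num_matched / num_inferred if num_inferred else 0.0
    recall = num_matched / num_reference if num_reference else 0.0
    if precision + recall == 0:
        f_score = 0.0
    else:
        # F-score is the harmonic mean of precision and recall.
        f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p, r, f = precision_recall_f(8, 10, 16)
    print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

The counts in the usage example are made up; the actual values come from running Unicorn_measurement on an inferred ontology.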
Here we provide the Matlab code for Unicorn and Unicorn_measurement. Part 2 (the CliXO algorithm) must be downloaded separately from https://github.com/mhk7/clixo_0.3.
The readme file describes the algorithms in detail and explains how to run the program.
The reference paper of the Unicorn algorithm is:
2. Supplementary files
a. The potential novel terms suggested by all the Unicorn ontologies.
Here we provide the supplementary files of the above reference paper: lists of novel terms discovered by Unicorn that have either a parent term or a child term aligned to a GO term with an alignment score > 0.8. We inferred ontologies using different training data and parameters, and all of the results are included here.
The files in the package follow a uniform naming scheme. For example, for the file 'found_subontologies_(0071944_0.02_0.5_JRR_newterm_minscore_0.8.txt':
- '0071944' is the term index of the training part (GO:0071944). There are 33 such cases in total;
- '0.02' is the alpha parameter of the CliXO algorithm. Possible values include 0.005, 0.01, 0.015 and 0.02;
- '0.5' is the beta parameter of the CliXO algorithm, which is fixed;
- 'minscore_0.8' states that the minimum alignment score of a novel term is 0.8.
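When working with many of these files, the naming scheme above can be decoded programmatically. The following Python sketch is one possible way to do it and is not part of the Unicorn package. The class name, function name and regular expression are assumptions based only on the example file name. The pattern tolerates the "(" that appears in the example, and it skips the middle part of the name ('JRR_newterm'), which is not explained here.

```python
import re
from typing import NamedTuple


class ResultFileName(NamedTuple):
    go_term: str      # e.g. "GO:0071944"
    alpha: float      # CliXO alpha parameter
    beta: float       # CliXO beta parameter
    min_score: float  # minimum alignment score of a novel term


# Pattern inferred from the single example file name; other files may differ.
_PATTERN = re.compile(
    r"found_subontologies_\(?(?P<term>\d{7})"
    r"_(?P<alpha>\d+(?:\.\d+)?)"
    r"_(?P<beta>\d+(?:\.\d+)?)"
    r"_.*minscore_(?P<minscore>\d+(?:\.\d+)?)\.txt$"
)


def parse_result_file_name(name: str) -> ResultFileName:
    """Extract the GO term and parameters encoded in a result file name."""
    match = _PATTERN.search(name)
    if match is None:
        raise ValueError(f"unrecognised file name: {name!r}")
    return ResultFileName(
        go_term="GO:" + match.group("term"),
        alpha=float(match.group("alpha")),
        beta=float(match.group("beta")),
        min_score=float(match.group("minscore")),
    )


if __name__ == "__main__":
    example = "found_subontologies_(0071944_0.02_0.5_JRR_newterm_minscore_0.8.txt"
    print(parse_result_file_name(example))
```

Grouping the parsed results by go_term or alpha makes it easier to compare the lists produced under different training data and parameter settings.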
The novel terms on these lists, although not verified, can be treated as candidates for new terms in the Gene Ontology.
b. The ontology (and annotation) samples inferred by Unicorn.
In addition, for each subtype (Biological Process, Cellular Component and Molecular Function), we selected the best ontology among the many ontologies inferred by Unicorn. These three ontologies can be used for further research or analysis. Note that they do not cover all of the novel cases listed above.
If you have any questions about the algorithm or the supplementary files, please contact us.