Ninety graph-theoretic indices were calculated for a diverse set of 3692 chemicals to test how well such indices capture the similarity of chemicals in a large, diverse database of structures. Principal component analysis was used to reduce the 90-dimensional space to a 10-dimensional subspace that explains 93% of the variance. The distance between chemicals in this 10-dimensional space was used as the measure of similarity. To test this approach, ten chemicals were chosen at random from the set of 3692, and the five nearest neighbors of each of these ten target chemicals were determined. The results show that this measure of similarity reflects intuitive notions of chemical similarity.
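The procedure described above (principal component analysis, then nearest neighbors by distance in the reduced space) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the random matrix `X` merely stands in for the real index data, the function names `pca_scores` and `nearest_neighbors` are hypothetical, and both the column standardization and the choice of Euclidean distance are assumptions that the abstract does not confirm.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project the rows of X onto the leading principal components.

    Returns the component scores and the fraction of variance they explain.
    """
    # Assumption: each index (column) is standardized before PCA.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    Z = (X - mean) / std

    # PCA via singular value decomposition of the standardized data.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Z @ Vt[:n_components].T
    return scores, float(explained[:n_components].sum())


def nearest_neighbors(scores, target, k):
    """Indices and distances of the k rows closest to row `target`."""
    # Assumption: Euclidean distance in the reduced space.
    dist = np.linalg.norm(scores - scores[target], axis=1)
    dist[target] = np.inf  # a chemical is not its own neighbor
    order = np.argsort(dist)[:k]
    return order, dist[order]


# Hypothetical stand-in data: 200 "chemicals" by 90 "indices".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 90))

scores, frac = pca_scores(X, n_components=10)

# Ten randomly chosen targets, five nearest neighbors each.
targets = rng.choice(X.shape[0], size=10, replace=False)
neighbors = {int(t): nearest_neighbors(scores, int(t), k=5)[0] for t in targets}
```

With real index data in place of the random matrix, `frac` would report the share of variance retained by the ten components, and `neighbors` would map each target chemical to its five closest structures.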
Bibliographical note
Funding Information:
This research was supported by cooperative agreements (CR-810824-01 and CR-811981-01) between the U.S. Environmental Protection Agency and the University of Minnesota, Duluth. The authors are appreciative of the efforts of Cynthia Frane, Greg Grunwald, Mark Rosen and Jane Zeleznikar for their assistance in the project.