Background: Network Component Analysis (NCA) has proven effective at discovering regulators and inferring transcription factor activities (TFAs) when both microarray data and ChIP-on-chip data are available. However, an NCA scheme is not applicable to many biological studies because the available network topology information is limited, for example when ChIP-on-chip data are lacking. We propose a new approach, motif-directed NCA (mNCA), that integrates motif information and gene expression data to infer regulatory networks.

Results: We develop mNCA to incorporate motif information into NCA for regulatory network inference. Although motif information is readily available from knowledge databases, it is a "noisy" source of network topology information that contains many false positives. To overcome this problem, we develop a stability analysis procedure, embedded in mNCA, that resolves the inconsistency between motif information and gene expression data and enables the identification of stable TFAs. The mNCA approach has been applied to a time-course microarray data set of muscle regeneration. The experimental results show that the inferred TFAs are not only numerically stable but also biologically relevant to the muscle differentiation process. In particular, several inferred TFAs, such as those of MyoD, myogenin and YY1, are well supported by biological experiments.

Conclusion: A novel computational approach, mNCA, has been developed to integrate motif information and gene expression data for regulatory network reconstruction. Specifically, motif analysis is used to obtain the initial network topology, and stability analysis is applied within mNCA to extract stable TFAs. Experimental results on muscle regeneration microarray data demonstrate that mNCA is a practical and reliable computational method for regulatory network inference and pathway discovery.
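The general idea can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes the usual NCA model in which the expression matrix E (genes × samples) is approximated by A·P, where A (genes × TFs) may be nonzero only where a topology mask Z allows a connection and P (TFs × samples) holds the TFAs. The alternating least-squares fit, the random removal of a fraction of connections as the perturbation, and the names `nca_als`, `tfa_stability`, `drop_frac` and `n_trials` are all assumptions made for illustration. Because A and P are determined only up to a scale factor per TF, refitted TFAs are compared with the reference by absolute correlation.

```python
import numpy as np


def nca_als(E, Z, n_iter=200, seed=0):
    """Fit E ~ A @ P by alternating least squares.

    E: (genes x samples) expression matrix.
    Z: (genes x TFs) boolean mask; A is forced to zero where Z is False.
    Returns A (genes x TFs) and P (TFs x samples).
    """
    Z = np.asarray(Z, dtype=bool)
    rng = np.random.default_rng(seed)
    # Random start that already respects the topology mask.
    A = np.where(Z, rng.standard_normal(Z.shape), 0.0)
    for _ in range(n_iter):
        # Update the TFAs with A held fixed (ordinary least squares).
        P = np.linalg.lstsq(A, E, rcond=None)[0]
        # Update each gene's connection strengths with P held fixed,
        # using only the TFs that the mask allows for that gene.
        for g in range(Z.shape[0]):
            idx = np.flatnonzero(Z[g])
            if idx.size:
                A[g, idx] = np.linalg.lstsq(P[idx].T, E[g], rcond=None)[0]
    return A, P


def _abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    x = x - x.mean()
    y = y - y.mean()
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return 0.0 if denom == 0 else abs(float(x @ y)) / denom


def tfa_stability(E, Z, n_trials=20, drop_frac=0.1, seed=0):
    """Hypothetical stability score per TF (higher means more stable).

    Each trial removes a random fraction of the allowed connections,
    refits the model, and compares every TFA profile with the profile
    obtained from the unperturbed topology.
    """
    Z = np.asarray(Z, dtype=bool)
    rng = np.random.default_rng(seed)
    _, P_ref = nca_als(E, Z)
    scores = np.zeros((n_trials, Z.shape[1]))
    for t in range(n_trials):
        Z_pert = Z & (rng.random(Z.shape) >= drop_frac)
        _, P_t = nca_als(E, Z_pert, seed=t)
        for j in range(Z.shape[1]):
            scores[t, j] = _abs_corr(P_ref[j], P_t[j])
    return scores.mean(axis=0)
```

In such a sketch, TFs whose score stays close to 1 under perturbation would be treated as stable, while low-scoring TFs would be candidates whose motif-derived connections disagree with the expression data. Any threshold for "stable" is left open here, since the abstract does not state one.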