Policy gradient methods for approximate optimal control and reinforcement learning fix a parameterized form of the controller and then perform gradient descent on the cost-to-go function. In reinforcement learning for stochastic state-feedback problems, it has been shown that the natural gradient of the cost-to-go function can be approximated from samples of the state and step-cost, using no information about the plant model. There, the natural gradient is the gradient with respect to the Riemannian metric defined by the Fisher information matrix of the controller parameters. We give a general method for approximating the natural gradient for nonlinear output-feedback stochastic control problems with dynamic controllers. For linear systems, we give explicit formulas for computing the natural gradient when the plant matrices are known, in both the state- and output-feedback cases.
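As a rough illustration of the sample-based natural-gradient idea described above, the following sketch estimates the vanilla policy gradient and the Fisher information matrix from samples of states, actions, and step-costs, then preconditions the gradient by the inverse Fisher matrix. The linear-Gaussian policy, cost, and step size are illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical policy pi(a|s) = N(theta^T s, 1); its score function is
# grad_theta log pi(a|s) = (a - theta^T s) * s.
def score(theta, s, a):
    return (a - theta @ s) * s

theta = np.zeros(3)

# Synthetic samples of states, actions, and step-costs (no plant model used).
states = rng.standard_normal((500, 3))
actions = np.array([theta @ s + rng.standard_normal() for s in states])
costs = np.array([(a - 1.0) ** 2 for a in actions])  # toy step-cost

# Score for each sample, likelihood-ratio gradient estimate, and sample Fisher matrix.
scores = np.array([score(theta, s, a) for s, a in zip(states, actions)])
grad_J = (costs[:, None] * scores).mean(axis=0)
F = (scores[:, :, None] * scores[:, None, :]).mean(axis=0)

# Natural gradient: the gradient in the Riemannian metric defined by F
# (small ridge term for numerical stability).
nat_grad = np.linalg.solve(F + 1e-6 * np.eye(3), grad_J)

# One descent step on the cost-to-go in controller-parameter space.
theta_new = theta - 0.1 * nat_grad
```

The key point is that both `grad_J` and `F` are sample averages of quantities computable from the controller alone, so the update needs no plant model.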