Over the last year, we have adapted our multifluid PPM code to run well at scale on the Blue Waters machine at NCSA and on networks of Intel Xeon Phi coprocessors. The Blue Waters work was done in collaboration with Cray, and the work on Intel's MIC coprocessors in collaboration with Intel. Our starting point was a version of the code developed to run well at scale on the Los Alamos Roadrunner machine, so we began with an implementation already designed to take advantage of heterogeneous processor systems. In this paper, we discuss the scaling issues we encountered on Blue Waters and the issues we encountered with Intel's MIC coprocessors. We present the code structure developed in this work, beginning with its parallel implementation using heterogeneous MPI processes and proceeding to its parallel implementation on a single multi- or many-core CPU. We also present a sampling of results from a Blue Waters simulation on a 1.18-trillion-cell grid that ran at a sustained rate of 1.5 Pflop/s in 32-bit precision.