A parallel implantation of the direct simulation Monte Carlo (DSMC) method is presented. The implementation uses a hierarchical three-level Cartesian grid for the flow field discretization which is adaptively refined to the local mean-free-path using non-binary refinement. The finest cells act as collision cells where each is refined to the local value of the mean free-path and employs its own time step step set to a fraction of the local mean-collision-time. Cutcell algorithms are used to embed arbitrary triangulated surface geometries from the background flow grid. All particles on each partition can then be moved for an entire time step even passing through portions of the grid owned by neighboring partitions. The resulting parallel implementation guarantees a single and efficient synchronization step per partition at each time step. An idealized problem of free stream flow is simulated to demonstrate how grid partitioning relative to the bulk flow direction and load balancing can negatively influence parallel scalability. Good strong scaling is demonstrated for this idealized case when as few as 80,000 particles are contained on each partition. Finally, 3D flow solutions are presented for hypersonic flow over a planetary probe geometry which highlight a number of the features of the parallel implementation as well as the challenges in performing scalable DSMC calculations for strong-gradient flows over complex geometries.