Results are presented showing that, for a system whose first objective is the sustained detection of adults and young children as they move and interact in a normal preschool setting, direct application of the straightforward RGB-D innovations described here significantly outperforms far more algorithmically advanced methods that rely solely on images. The project's use of multiple RGB-D sensors for depth-aware object localization economically resolves many issues that regularly frustrate earlier vision-only detection and human surveillance methods, including occlusions, illumination changes, unexpected postures, atypical morphologies, erratic or unanticipated motions, reflections, and misleading textures and colorations. This multi-sensor RGB-D installation forms the front end of a multi-step pipeline, the first portion of which seeks to isolate, in situ, 3D renderings of classroom occupants sufficient for later analysis of their behaviors and interactions. Toward this end, a voxel-based approach to foreground/background separation and an effective adaptation of supervoxel clustering for 3D were developed, and 3D and image-only methods were tested and compared. The project's setting is highly challenging, but so are its longer-term goals: the automated detection of early childhood precursors, often very subtle, to a number of increasingly common developmental disorders.