Current computer vision algorithms can process video sequences and perform key low-level functions such as motion detection, motion tracking, and object classification. These capabilities motivate activity detection (e.g., recognizing people's behavior and intent), which is becoming increasingly important. However, all of these algorithms have severe performance limitations when applied across a wide range of applications: they suffer from high false detection rates, high missed detection rates, and loss of track due to partial occlusion, among other problems. In addition, activity detection is limited to the 2D image domain and is confined to qualitative activities (such as a car entering a region of interest). Adding 3D information will improve the performance of these computer vision algorithms and of the activity detection system. In this paper, we propose a unique approach that creates a 3D site model by fusing data from a laser range finder and a single camera; the model can then convert the symbolic (pixel-based) features of each object into physical features expressed in real-world units (e.g., feet or yards). We present experimental results that demonstrate our 3D site model.