This study evaluated three widely used statistical downscaling tools (SDSM, SVM, and LARS-WG) and their model-averaging ensembles under diverse moisture conditions, focusing on their ability to reproduce both the extremes and the mean behavior of precipitation. Daily observed precipitation and NCEP reanalysis data were collected for 30 stations across China for the period 1961-2000. Model parameters were calibrated for each season at each site, with 1961-1990 as the calibration period and 1991-2000 as the validation period. A flexible multi-criteria model-averaging framework was established in which model weights were optimized by the shuffled complex evolution algorithm. Model performance was compared in terms of the optimized objective function and nine more specific metrics. Results indicate that the downscaling methods show different strengths and weaknesses in simulating various precipitation characteristics under different circumstances. SDSM was the most adaptable, achieving better overall performance at a majority of the stations, while LARS-WG was more accurate for most individual metrics, especially the extreme indices. SVM was more useful under drier conditions but had less skill in capturing temporal patterns. Optimized model averaging, targeted at specific objective functions, can yield a promising ensemble, but at the cost of greater model complexity and computation. However, the variation in performance among the methods highlighted the tradeoff among different criteria, which compromised the ensemble forecast in terms of individual metrics. Because superiority over single models cannot be guaranteed, model averaging should be used cautiously in precipitation downscaling.
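The weight-optimization idea behind multi-criteria model averaging can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the study's implementation: the data are synthetic, the two-term objective (RMSE plus the error in the 95th percentile, blended by a hypothetical `alpha`) stands in for the study's criteria, and a plain random search over the weight simplex replaces the shuffled complex evolution algorithm.

```python
import math
import random


def rmse(sim, obs):
    """Root-mean-square error between two equal-length series."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def percentile(values, q):
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def ensemble(weights, sims):
    """Weighted average of the member simulations at each time step."""
    return [sum(w * s[t] for w, s in zip(weights, sims))
            for t in range(len(sims[0]))]


def objective(weights, sims, obs, alpha=0.5):
    """Illustrative multi-criteria objective (assumed, not from the study):
    blends a mean-behavior term (RMSE) with an extreme-value term
    (absolute error in the 95th percentile). Lower is better."""
    ens = ensemble(weights, sims)
    mean_term = rmse(ens, obs)
    extreme_term = abs(percentile(ens, 95) - percentile(obs, 95))
    return alpha * mean_term + (1 - alpha) * extreme_term


def random_simplex_point(k, rng):
    """Draw non-negative weights that sum to 1."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def optimize_weights(sims, obs, n_trials=2000, seed=0):
    """Random search over the weight simplex (a simple stand-in for SCE).
    Single-model weight vectors are included as candidates, so the result
    is never worse than the best single model on this objective."""
    rng = random.Random(seed)
    k = len(sims)
    candidates = [[1.0 if i == j else 0.0 for i in range(k)] for j in range(k)]
    candidates += [random_simplex_point(k, rng) for _ in range(n_trials)]
    return min(candidates, key=lambda w: objective(w, sims, obs))


# Synthetic "observed" daily precipitation and three imperfect simulations.
rng = random.Random(42)
obs = [max(0.0, rng.gauss(2.0, 4.0)) for _ in range(365)]
sims = [
    [max(0.0, o + rng.gauss(0.5, 2.0)) for o in obs],        # wet bias
    [max(0.0, 0.7 * o + rng.gauss(0.0, 1.0)) for o in obs],  # damped extremes
    [max(0.0, o + rng.gauss(-0.3, 3.0)) for o in obs],       # noisy
]

weights = optimize_weights(sims, obs)
print("weights:", [round(w, 3) for w in weights])
print("ensemble objective:", round(objective(weights, sims, obs), 3))
```

Note that the guarantee in this sketch holds only for the blended objective on the data used to fit the weights. Scores on any single metric, or on an independent validation period, may still be worse than those of the best individual model, which is the tradeoff the abstract describes.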