In this paper, we systematically compare the forecasting accuracy of hypothesis testing procedures with that of a model combining algorithm. Testing procedures are commonly used in applications to select a model, on which forecasts are then based. However, besides the well-known difficulty of dealing with multiple tests, the testing approach has a potentially serious drawback: controlling the probability of Type I error at a conventional level (e.g., 0.05) often excessively favors the null, which can be problematic for the purpose of forecasting. In addition, as shown in this paper, testing procedures can be very unstable, which results in high variability in the forecasts. Selecting a candidate forecast by testing and combining forecasts are both useful, but in complementary situations. Currently, there seems to be little guidance in the literature on when combining should be preferred to selecting. We propose instability measures that help a forecaster gauge the difficulty of selecting a single optimal forecast. Based on empirical evidence and theoretical considerations, we advocate the use of forecast combining when there is considerable instability in model selection by testing procedures. On the other hand, when there is little instability, testing procedures could work well, or even better than forecast combining, in terms of forecast accuracy.