Travel time estimation is an important component in modern transportation applications. The state of the art techniques for travel time estimation use GPS traces to learn the weights of a road network, often modeled as a directed graph, then apply Dijkstra-like algorithms to find shortest paths. Travel time is then computed as the sum of edge weights on the returned path. In order to enable time-dependency, existing systems compute multiple weighted graphs corresponding to different time windows. These graphs are often optimized offline before they are deployed into production routing engines, causing a serious engineering overhead. In this paper, we present STAD, a system that adjusts-on the fly-travel time estimates for any trip request expressed in the form of origin, destination, and departure time. STAD uses machine learning and sparse trips data to learn the imperfections of any basic routing engine, before it turns it into a full-fledged time-dependent system capable of adjusting travel times to real traffic conditions in a city. STAD leverages the spatio-Temporal properties of traffic by combining spatial features such as departing and destination geographic zones with temporal features such as departing time and day to significantly improve the travel time estimates of the basic routing engine. Experiments on real trip datasets from Doha, New York City, and Porto show a reduction in median absolute errors of 14% in the first two cities and 29% in the latter. We also show that STAD performs better than different commercial and research baselines in all three cities.