In recent years, reinforcement learning (RL) algorithms have been successfully applied to energy management strategies (EMS) for hybrid electric vehicles (HEVs) and extended range electric vehicles (EREVs) operating on standard driving cycles and fixed driving routes. For many real-world applications, however, such as last-mile package delivery, vehicles may traverse the same region while the actual distance and energy intensity differ significantly from day to day. Such variation renders existing RL approaches less effective for optimizing energy consumption, because vehicle velocity trajectories and routes are not known a priori. This paper presents an actor-critic based RL framework with continuous action output that optimizes, in real time and under uncertainty, a rule-based (RB) parameter in the engine control logic during each trip. The EMS is tested on an in-use delivery EREV equipped with two-way vehicle-to-cloud connectivity. The algorithm was trained on 52 historical trips to learn a generalized strategy for trips of varying distance and energy intensity. An average fuel efficiency improvement of 21.8% in miles per gallon gasoline equivalent was demonstrated on 51 previously unseen test trips made by the same vehicle, with distances ranging from 31 to 54 miles. The framework can be extended to other RB methods and EREV applications such as transit buses and commuter vehicles.
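The core mechanism described above, an actor-critic agent with a continuous action output that tunes a single scalar control parameter per trip, can be illustrated with a minimal sketch. This is not the paper's implementation: the toy environment, reward shape, state features, and all variable names below are hypothetical assumptions for illustration. A linear Gaussian policy (the actor) outputs a continuous engine-control threshold conditioned on trip context, and a linear critic serves as a baseline for the policy-gradient update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trip context: [normalized trip distance, initial battery SOC].
def sample_state():
    return rng.uniform(0.0, 1.0, size=2)

# Hypothetical reward: assume the best engine-on threshold rises with trip
# distance and falls with initial SOC; reward penalizes squared deviation.
def reward(state, action):
    target = 0.3 + 0.5 * state[0] - 0.2 * state[1]
    return -(action - target) ** 2

# Linear Gaussian actor (mean = w.s + b, fixed std) and linear critic baseline.
w_actor = np.zeros(2); b_actor = 0.5; std = 0.1
w_critic = np.zeros(2); b_critic = 0.0
lr_actor, lr_critic = 0.05, 0.1

for episode in range(5000):
    s = sample_state()
    mu = w_actor @ s + b_actor
    a = mu + std * rng.standard_normal()   # sample a continuous action
    r = reward(s, a)
    v = w_critic @ s + b_critic            # critic's value estimate
    adv = r - v                            # advantage (one-step episode)
    # Critic update: gradient step on squared value error.
    w_critic += lr_critic * adv * s
    b_critic += lr_critic * adv
    # Actor update: policy gradient; d/d_mu log N(a|mu,std) = (a - mu)/std^2.
    g = adv * (a - mu) / std**2
    w_actor += lr_actor * g * s
    b_actor += lr_actor * g

# After training, the policy mean should track the state-dependent target:
# for state [0.8, 0.5] the hypothetical optimum is 0.3 + 0.4 - 0.1 = 0.60.
print(round(w_actor @ s + b_actor, 2))
```

In the real setting described in the abstract, the reward would instead come from measured fuel and electricity consumption reported over the vehicle-to-cloud link, and the state would summarize trip history rather than a known distance.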