Thread-level parallelism at the chip level is critical in overcoming some of the challenges that have been ushered in through the advent of modern multicore processors (CMP). Extracting speculatively parallel threads from sequential applications and executing these threads on multicore processors is a promising technique to speed up these applications on multicore systems. However, the potential degradation in energy efficiency associated is an important factor that hinders the deployment of this technique. For multicore systems that integrate same-ISA heterogeneous cores, it is possible to judiciously allocate speculative threads to achieve energy-efficient performance improvement. In this paper, we examine multicore systems with multiple same-ISA heterogeneous cores, some of which supporting simultaneous multithreading. In this environment, we propose thread-allocation mechanisms that dynamically determine how speculative threads are allocated. The proposed mechanisms can potentially allow heterogeneous multicore systems to aim to achieve significant performance improvement with moderate energy increase. At run time, for each segment of speculative parallel execution and sequential execution, the thread-allocation mechanisms make the following three decisions: (i) whether the speculative parallel threads should be deployed to a single core with SMT support or to multiple cores each supporting a single thread of execution; (ii) whether the parallel/sequential threads should utilize more powerful cores with a high issue width or a less powerful core with low issue width; (iii) whether the L1 caches should be fully activated or partially activated. The proposed thread-allocation mechanisms migrate threads and/or re-size L1 caches to maximize energy efficiency (measured in ED2P), based on these decisions. Throttling mechanisms have been incorporated in the proposed system to suppress thread management operations when the performance/energy benefit of these operations cannot justify the associated overhead. By evaluating speculatively parallelized benchmarks from SPEC CPU 2006 and 2000, we found that the proposed heterogeneous multicore system with dynamic thread management is 13% more energy efficient, in terms of ED2P, than the most energy-efficient homogeneous system. This corresponds to 4% performance improvement and 6% reduction in energy consumption. When compare to a four-issue superscalar core that execute the unmodified sequential program with a fixed L1 cache size, the proposed system is 44% more energy efficient, in terms of ED2P. This corresponds to a 38% performance improvement with 6% increase in energy consumption.