Polar codes have emerged as important channel codes because of their capacity-achieving property. For low-complexity polar decoding, hardware architectures for the successive cancellation (SC) algorithm have been investigated in prior works. However, belief propagation (BP)-based architectures have not been explored in detail. This paper begins with a review of the min-sum (MS) approximated BP algorithm and then proposes a scaled MS (SMS) algorithm with improved decoding performance. To address the long critical path of the SMS algorithm, we then propose an efficient critical path reduction approach. Because it is general, this optimization can be applied to both the SMS and MS algorithms. Compared with the state-of-the-art MS decoder, the proposed (1024, 512) SMS design achieves an extra 0.5 dB of decoding gain with the same hardware performance. In addition, the proposed optimized MS architecture increases throughput and hardware efficiency by more than 30% and 80%, respectively.
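
As a minimal sketch of the idea behind the MS and SMS approximations, the snippet below compares the exact BP combination of two log-likelihood ratios with its min-sum form and a scaled min-sum form. The function names and the scaling factor `alpha = 0.9375` are illustrative assumptions, not values taken from this abstract; it shows only the general technique, not the proposed architecture.

```python
import math

def boxplus_exact(x, y):
    """Exact BP combination of two LLRs: 2*atanh(tanh(x/2)*tanh(y/2)).
    Intended for moderate LLR magnitudes; very large inputs can saturate tanh."""
    return 2.0 * math.atanh(math.tanh(x / 2.0) * math.tanh(y / 2.0))

def boxplus_ms(x, y):
    """Min-sum approximation: product of signs times the smaller magnitude."""
    sign = math.copysign(1.0, x) * math.copysign(1.0, y)
    return sign * min(abs(x), abs(y))

def boxplus_sms(x, y, alpha=0.9375):
    """Scaled min-sum: the MS output multiplied by a factor alpha < 1.
    alpha = 0.9375 is an assumed, hardware-friendly value (1 - 1/16)."""
    return alpha * boxplus_ms(x, y)

if __name__ == "__main__":
    x, y = 2.0, -3.0
    print("exact:", boxplus_exact(x, y))
    print("MS:   ", boxplus_ms(x, y))
    print("SMS:  ", boxplus_sms(x, y))
```

The MS form overestimates the magnitude of the exact result, and scaling by a factor below one pulls the estimate back toward it, which is the usual motivation for a scaled variant.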