Shared memory is a common inter-processor communication paradigm for on-chip multiprocessor SoC (MPSoC) platforms. The latency overhead of switch-based interconnection networks plays a critical role in shared memory MPSoC designs. In this paper, we propose a directory-cache embedded switch architecture with distributed shared cache and distributed shared memory. It is able to reduce the number of home node cache accesses, which results in a reduction in the inter-cache transfer time and the total execution time. Simulation results verify that the proposed methodology can improve performance substantially over a design in which directory caches are not embedded in the switches.