In this paper, we developed a deep learning-based compression model to reduce the data rate of multichannel action potentials in neural recording experiments. The proposed model is built upon a deep compressive autoencoder (CAE) with discrete latent embeddings. The encoder network of the CAE uses residual transformations to extract representative features from spikes; these features are mapped into the latent embedding space and updated via vector quantization (VQ). The indices of the VQ codebook are then entropy coded to form the compressed signal. The decoder network reconstructs spikes with high quality from the latent embeddings. Experimental results on both synthetic and in-vivo datasets show that the proposed model consistently outperforms conventional methods that rely on hand-crafted features and/or signal-agnostic transformations, achieving much higher compression ratios (20-500×) with better or comparable signal reconstruction accuracy. Furthermore, we have estimated the hardware cost of the CAE model and shown the feasibility of its on-chip integration with neural recording circuits. The proposed model can reduce the data transmission bandwidth required in large-scale recording experiments while maintaining good signal quality, which will be helpful for designing power-efficient and lightweight wireless neural interfaces.
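The quantization step summarized above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work: the codebook size (256), the latent dimension (4), the random data standing in for encoder outputs, and the function names `vq_encode`, `vq_decode`, and `index_entropy_bits` are all assumptions made for illustration. The sketch maps each latent vector to its nearest codeword, looks the codeword back up as a decoder would, and uses the empirical entropy of the indices as an estimate of the bits an ideal entropy coder would need per index.

```python
import numpy as np


def vq_encode(latents, codebook):
    """Return, for each latent vector, the index of its nearest codeword."""
    # Squared Euclidean distance between every latent and every codeword.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def vq_decode(indices, codebook):
    """Look up the codewords that the decoder network would receive."""
    return codebook[indices]


def index_entropy_bits(indices, codebook_size):
    """Empirical entropy of the index stream, in bits per index."""
    counts = np.bincount(indices, minlength=codebook_size)
    probs = counts[counts > 0] / indices.size
    return float(-(probs * np.log2(probs)).sum())


# Hypothetical sizes: 256 codewords of dimension 4, 1000 latent vectors.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 4))
latents = rng.normal(size=(1000, 4))  # stand-in for encoder outputs

indices = vq_encode(latents, codebook)
quantized = vq_decode(indices, codebook)
bits_per_index = index_entropy_bits(indices, len(codebook))
```

With a 256-entry codebook, a fixed-length code would spend 8 bits per index; entropy coding can spend fewer whenever some codewords are used more often than others, which is the reason the indices are entropy coded rather than transmitted directly.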