Population structure analyses and genome-wide association studies (GWAS) conducted on crop germplasm collections provide valuable information on the frequency and distribution of alleles governing economically important traits. The value of these analyses is substantially enhanced when the accession numbers can be increased from ~1,000 to ~10,000 or more. In this research, we conducted the first comprehensive analysis of population structure on the collection of 14,000 soybean accessions [Glycine max (L.) Merr. and G. soja Siebold & Zucc.] using a 50K- SNP chip. Accessions originating from Japan were relatively homogenous and distinct from the Korean accessions. As a whole, both Japanese and Korean accessions diverged from the Chinese accessions. The ancestry of founders of the American accessions derived mostly from two Chinese subpopulations, which reflects the composition of the American accessions as a whole. A 12,000 accession GWAS conducted on seed protein and oil is the largest reported to date in plants and identified single nucleotide polymorphisms (SNPs) with strong signals on chromosomes 20 and 15. A chromosome 20 region previously reported to be important for protein and oil content was further narrowed and now contains only three plausible candidate genes. The haplotype effects show a strong negative relationship between oil and protein at this locus, indicating negative pleiotropic effects or multiple closely linked loci in repulsion phase linkage. The vast majority of accessions carry the haplotype allele conferring lower protein and higher oil. Our results provide a fuller understanding of the distribution of genetic variation contained within the USDA soybean collection and how it relates to phenotypic variation for economically important traits.