DeepSeek Coder comprises a series of code language models trained from scratch on a corpus of 87% code and 13% natural language in English and Chinese, with each model pre-trained on 2T tokens. We provide code models in various sizes, ranging from 1.3B to 33B parameters. Each model is pre-trained on a repo-level code corpus with a 16K window size and an extra fill-in-the-blank task, yielding the foundational models (DeepSeek-Coder-Base). We further fine-tune the base models on 2B tokens of instruction data to obtain the instruction-tuned models, named DeepSeek-Coder-Instruct.
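The fill-in-the-blank (infilling) objective mentioned above can be illustrated with a minimal sketch: a middle span is cut out of a document, and the model is asked to predict it from the surrounding prefix and suffix. The sentinel strings below are placeholders for illustration only, not the model's actual special tokens.

```python
# Illustrative sketch of a fill-in-the-middle training example.
# The sentinel strings are hypothetical placeholders, NOT the
# tokenizer's real special tokens.
PREFIX, SUFFIX, MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def make_fim_example(text: str, start: int, end: int) -> tuple:
    """Cut text[start:end] out as the span to infill; return
    (model_input, target) where the model must generate `target`
    after seeing the prefix and suffix."""
    prefix, middle, suffix = text[:start], text[start:end], text[end:]
    model_input = f"{PREFIX}{prefix}{SUFFIX}{suffix}{MIDDLE}"
    return model_input, middle
```

At inference time the same layout lets the model complete code in the middle of a file, given both what comes before and after the cursor.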

  • Pretrained on 2 Trillion tokens over more than 80 programming languages.
  • Various model sizes (1.3B, 5.7B, 6.7B and 33B) to support different requirements.
  • A 16K window size, supporting project-level code completion and infilling.
  • State-of-the-Art performance among open code models.
  • Open source and free for research and commercial use.


We evaluate DeepSeek Coder on various coding-related benchmarks. The results show that DeepSeek-Coder-Base-33B significantly outperforms existing open-source code LLMs. Compared with CodeLlama-34B, it leads by 7.9%, 9.3%, 10.8% and 5.9% on HumanEval Python, HumanEval Multilingual, MBPP and DS-1000, respectively. Surprisingly, our DeepSeek-Coder-Base-7B reaches the performance of CodeLlama-34B. After instruction tuning, DeepSeek-Coder-Instruct-33B outperforms GPT-3.5-turbo on HumanEval and achieves comparable results to GPT-3.5-turbo on MBPP.
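Benchmarks such as HumanEval and MBPP typically report pass@k: the probability that at least one of k sampled completions passes the test suite. A standard unbiased estimator (introduced in the Codex paper) can be sketched as follows; this is background on how such scores are computed, not DeepSeek's specific evaluation harness.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: given n sampled completions of
    which c pass the tests, the probability that a random draw of
    k samples contains at least one passing completion is
    1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k failing samples: any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 2 samples of which 1 passes, pass@1 is 0.5; scores in the figures below are averages of such per-problem estimates.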


Fig. State-of-the-art Performance on various Coding Benchmarks and Multilingual HumanEval

(1) Performance of different Code LLMs on Multilingual HumanEval Benchmark


(2) Performance of different Code LLMs on MBPP Benchmark


(3) Performance of different Code LLMs on DS-1000 Benchmark


(4) Performance of different Code Models on Math-Reasoning Tasks.


How to Use DeepSeek Coder
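
A minimal sketch of local inference with Hugging Face `transformers` is shown below. The checkpoint name follows the model naming in this README (`deepseek-ai/deepseek-coder-1.3b-base`, the smallest base model); verify the exact name on the Hugging Face Hub, and note that generation settings here are illustrative assumptions.

```python
def complete_code(prompt: str,
                  model_name: str = "deepseek-ai/deepseek-coder-1.3b-base",
                  max_new_tokens: int = 128) -> str:
    """Generate a completion for `prompt` with a DeepSeek-Coder base
    checkpoint. Imports are kept inside the function because loading
    transformers and the model weights is heavy."""
    from transformers import AutoTokenizer, AutoModelForCausalLM
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(complete_code("# write a quick sort algorithm in Python\ndef quick_sort(arr):"))
```

For the instruction-tuned models (DeepSeek-Coder-Instruct), apply the tokenizer's chat template to your messages instead of passing raw code prompts.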

Contact Us

If you have any questions, please raise an issue or contact us at agi_code@deepseek.com.