What You Need to Know About DeepSeek’s R1
DeepSeek is a Chinese artificial intelligence company based in Hangzhou, Zhejiang, founded in 2023 by Liang Wenfeng, who also serves as its CEO. The company specializes in developing open-source large language models (LLMs) and is owned and funded by the Chinese hedge fund High-Flyer.
In January 2025, DeepSeek released its first free chatbot app, based on the DeepSeek-R1 model, for iOS and Android. By January 27, 2025, the app had surpassed ChatGPT as the most-downloaded free app on the U.S. iOS App Store. This rapid ascent shook the tech industry, contributing to a roughly 17% single-day drop in Nvidia’s share price.
DeepSeek’s models have been noted for their efficiency, achieving performance comparable to leading OpenAI models such as GPT-4o and o1 at a fraction of the cost and computing power. The company relies on techniques such as the “mixture of experts” (MoE) approach, which activates only a small subset of the model’s parameters for each input, enhancing efficiency.
Comparison between OpenAI’s ChatGPT and DeepSeek’s R1
Use Case Comparison
Cost Comparison
DeepSeek’s R1 model is, at its core, extremely cost efficient. The secret sauce is its “mixture of experts” (MoE) architecture.
How MoE Works:
The Mixture of Experts (MoE) model operates by dividing tasks among specialized sub-models, known as experts, to improve efficiency and performance. The process begins with input processing, where input data, such as text or images, is fed into the model.
A router (or gating network) then analyzes the input and determines which experts are best suited to handle it. This router plays a critical role in assigning weights to each expert based on the input, which dictate how much each expert contributes to the final output. To optimize computational efficiency, only the top-k experts (e.g., the top 1 or 2) with the highest weights are activated, ensuring that resources are focused on the most relevant tasks.
Each expert in the MoE model is a specialized neural network trained to handle specific types of inputs or tasks. Once the router selects the appropriate experts, they process the input independently and generate their respective outputs. These outputs are then combined based on the weights assigned by the router, resulting in a weighted combination that forms the final output of the MoE model. This dynamic routing and combination process allows the model to leverage the strengths of multiple specialized networks while maintaining computational efficiency.
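To make the routing and weighted combination concrete, here is a minimal, illustrative sketch of a top-k MoE layer in PyTorch. The class, parameter names, and sizes (MoELayer, num_experts, top_k, d_model) are placeholders chosen for this example, not DeepSeek’s actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    """Illustrative top-k mixture-of-experts layer (names and sizes are made up for this sketch)."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router (gating network): scores every expert for each token.
        self.router = nn.Linear(d_model, num_experts)
        # Experts: small feed-forward networks, each free to specialize.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        scores = self.router(x)                            # (batch, seq, num_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)   # keep only the top-k experts per token
        top_w = F.softmax(top_w, dim=-1)                   # normalize their contribution weights

        # Sparse gate: zero everywhere except the selected experts.
        gates = torch.zeros_like(scores).scatter_(-1, top_idx, top_w)

        # For clarity, every expert runs on every token and is masked out afterwards;
        # a real implementation dispatches each token only to its selected experts.
        expert_outs = torch.stack([expert(x) for expert in self.experts], dim=-1)  # (batch, seq, d_model, num_experts)
        return (expert_outs * gates.unsqueeze(-2)).sum(dim=-1)                     # weighted combination


# Quick usage check with dummy data.
layer = MoELayer(d_model=64, d_hidden=256)
y = layer(torch.randn(4, 16, 64))   # -> shape (4, 16, 64)
```

The key design point is the sparse gate: most expert weights are zero for any given token, which is what lets a production system skip the unselected experts entirely.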
The training of the MoE model involves jointly training the router and the experts, typically using techniques like backpropagation and gradient descent. This joint training ensures that the router learns to effectively assign inputs to the most appropriate experts, optimizing the overall performance of the model.
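A minimal sketch of that joint training, reusing the hypothetical MoELayer above with dummy data and a plain MSE objective; real MoE training typically also adds a load-balancing term so tokens do not all collapse onto a few experts.

```python
import torch
import torch.nn.functional as F

# Toy joint-training loop: router and experts are optimized together.
model = MoELayer(d_model=64, d_hidden=256, num_experts=8, top_k=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

for step in range(100):
    x = torch.randn(8, 16, 64)        # dummy input batch
    target = torch.randn(8, 16, 64)   # dummy regression targets
    loss = F.mse_loss(model(x), target)

    optimizer.zero_grad()
    loss.backward()    # gradients flow into both the router and the experts
    optimizer.step()   # so routing decisions and expert weights improve together
```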
Through this collaborative process, the MoE architecture achieves a balance between specialization, scalability, and computational efficiency, making it a powerful approach for handling complex tasks in large-scale AI systems.
In summary, the benefits of MoE are:
✅ More Efficient: Activates only a fraction of the total model per token (DeepSeek-R1, for example, is reported to have 671B total parameters but activate only about 37B per token), reducing computing power.
✅ Faster & Cost-Effective: Runs large models with lower costs, making AI more accessible.
✅ Scalability: Can scale to very large models without exponentially increasing resource demands.
In Summary
ChatGPT is a polished, enterprise-friendly AI with deep safety measures and business integrations.
DeepSeek is a cost-effective, open-source alternative disrupting the AI space with its efficiency and accessibility.