Mon Jan 27, 2025 8:02am PST
Ask HN: Could you explain DeepSeek R1 to me?
I want to gain a deeper understanding of this topic. I’ve read the paper and watched a few videos, but I lack the background and I’d really appreciate insights from someone with a solid grasp of it.

From what I understand, the key contribution lies in its cost efficiency during training. However, does this refer to the entire training pipeline (including pre-training), or just the reinforcement learning stage?

Additionally, it seems the cost savings come primarily from how the rewards are computed. The paper gives two examples: for math problems, the model's answer is checked against a fixed reference answer to produce a reward score, and for LeetCode-style problems, they rely on a compiler and test execution.
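To make sure I understand the idea, here is a minimal sketch of what such rule-based reward checks might look like. This is purely illustrative, assuming an "Answer:" prompt convention and an exec-based test harness; it is not DeepSeek's actual implementation.

```python
def math_reward(model_output: str, ground_truth: str) -> float:
    """Reward 1.0 if the model's final answer matches the fixed reference.

    Assumes the model is prompted to state its final answer after an
    "Answer:" marker (an illustrative convention, not DeepSeek's format).
    """
    answer = model_output.rsplit("Answer:", 1)[-1].strip()
    return 1.0 if answer == ground_truth.strip() else 0.0


def code_reward(model_code: str, test_snippet: str) -> float:
    """Reward 1.0 if the generated code runs and passes the given tests.

    Stands in for the "compiler" check: compile/run the candidate code,
    then execute assertion-style tests against it.
    """
    try:
        namespace: dict = {}
        exec(model_code, namespace)    # compile and define the solution
        exec(test_snippet, namespace)  # run assertions as the test harness
        return 1.0
    except Exception:
        return 0.0
```

The point, as I read the paper, is that neither function involves a learned reward model: the reward is a cheap, deterministic check, which is only possible when a problem has a verifiable answer.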

However, these examples cover only a narrow set of problem types. Not all logical challenges fall under math or coding. Can a model trained mainly on math and coding problems generalize well to other types of logical reasoning tasks?
