ModelForge

Wed Dec 18, 2024 1:36pm PST

Karma:

332

submitted

Wed Dec 31, 2025 3:40pm PST

The State of LLMs 2025: Progress, Problems, and Predictions

@ModelForge

3

Tue Nov 4, 2025 3:00pm PST

A Researcher's Field Guide to Non-Standard LLM Architectures

@ModelForge

2

Mon Nov 3, 2025 4:59pm PST

Explanation of Gated DeltaNet (Qwen3-Next and Kimi Linear)

@ModelForge

3

Mon Oct 27, 2025 3:40pm PST

The Core Components of Modern LLMs and the Models Beyond Transformers [video]

@ModelForge

3

Wed Oct 15, 2025 2:17pm PST

Popular Attention Alternatives: GQA, MLA, SWA

@ModelForge

4

Mon Oct 13, 2025 6:24pm PST

Multi-Head Latent Attention

@ModelForge

4

Sat Oct 11, 2025 7:57pm PST

Thinking Machines Lab Co-Founder Departs for Meta

@ModelForge

7

Fri Oct 10, 2025 8:41pm PST

OpenAI's internal Slack messages could cost it billions in copyright suit

@ModelForge

1

1

8

Sun Oct 5, 2025 3:55pm PST

LLM Evaluation from Scratch: Multiple Choice, Verifiers, Leaderboards, LLM Judge

@ModelForge

4

Wed Aug 20, 2025 2:01pm PST

Gemma 3 270M re-implemented in pure PyTorch for local tinkering

@ModelForge

14

57

417

Sun Aug 10, 2025 3:06pm PST

GPT-OSS vs. Qwen3 and a detailed look at how things evolved since GPT-2

@ModelForge

16

97

490

Wed Dec 18, 2024 2:17pm PST

LLM Research Papers: The 2024 List

@ModelForge

5

Wed Dec 18, 2024 1:37pm PST

Scaling Test-Time Compute with Open LLM Models

@ModelForge

3