Mon Feb 2, 2026 1:10am PST

Show HN: Democlean – Score robot demos by motion quality

I built a CLI that scores robot demonstration episodes using mutual information between states and actions.

The problem: robot learning datasets contain bad demos (jerky movements, hesitation, inconsistent timing). Training on these hurts policy performance. Manual review doesn't scale.

    pip install democlean
    democlean analyze lerobot/pusht

democlean scores each episode by how predictable the actions are given the states. Smooth, purposeful motion scores high. Jerky, inconsistent motion scores low.
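
For readers who want to see the shape of the computation, here is a minimal, dependency-free sketch of a KSG (Kraskov-Stögbauer-Grassberger) mutual information estimate between per-timestep states and actions. It is an illustration only, not democlean's actual code: the function names, the brute-force O(n²) neighbor search, and the default k=3 are all assumptions made for this example.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def chebyshev(a, b):
    """Max-norm distance between two equal-length coordinate tuples."""
    return max(abs(p - q) for p, q in zip(a, b))


def ksg_mutual_information(states, actions, k=3):
    """KSG (algorithm 1) estimate of I(states; actions) in nats.

    states, actions: sequences of coordinate tuples, one per timestep.
    Hypothetical helper for illustration; brute force, so only suitable
    for short episodes.
    """
    n = len(states)
    if n != len(actions) or n <= k:
        raise ValueError("need matching lengths and more than k timesteps")

    total = 0.0
    for i in range(n):
        dx = [chebyshev(states[i], states[j]) for j in range(n)]
        dy = [chebyshev(actions[i], actions[j]) for j in range(n)]
        # Distance to the k-th nearest neighbor in the joint space (max norm).
        joint = sorted(max(dx[j], dy[j]) for j in range(n) if j != i)
        eps = joint[k - 1]
        # Count marginal neighbors strictly closer than eps, excluding i itself.
        nx = sum(1 for j in range(n) if j != i and dx[j] < eps)
        ny = sum(1 for j in range(n) if j != i and dy[j] < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)

    return digamma_int(k) + digamma_int(n) - total / n
```

Actions that follow tightly from the state should give a clearly positive estimate, while unrelated states and actions should give a value near zero (the estimate can dip slightly below zero on finite samples).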

Validation: I correlated MI scores with motion metrics on lerobot/pusht (human teleoperation data). High-MI episodes had 12% lower jerk (p=0.02) and 24% higher state-action correlation (p=0.03). I did not train policies to measure downstream improvement.
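
A rough sketch of this kind of check, for anyone who wants to try it on their own data. The jerk definition here (mean squared third finite difference of position) and the median split into high and low score groups are assumptions for illustration; the post does not specify the exact metric or grouping used, and no significance test is included.

```python
from statistics import mean, median


def mean_squared_jerk(positions, dt=1.0):
    """Mean squared third finite difference of a trajectory.

    positions: sequence of equal-length coordinate tuples, one per timestep.
    dt: time between samples, in whatever unit the dataset uses.
    """
    if len(positions) < 4:
        raise ValueError("need at least 4 timesteps to estimate jerk")
    squared = []
    for t in range(3, len(positions)):
        window = zip(positions[t - 3], positions[t - 2],
                     positions[t - 1], positions[t])
        for p0, p1, p2, p3 in window:
            jerk = (p3 - 3 * p2 + 3 * p1 - p0) / dt ** 3
            squared.append(jerk * jerk)
    return mean(squared)


def relative_jerk_difference(scores, jerks):
    """Percent difference in mean jerk: high-score half vs. low-score half.

    Negative values mean the high-score episodes are smoother.
    Raises statistics.StatisticsError if either half is empty.
    """
    cut = median(scores)
    high = [j for s, j in zip(scores, jerks) if s > cut]
    low = [j for s, j in zip(scores, jerks) if s <= cut]
    return 100.0 * (mean(high) - mean(low)) / mean(low)
```

Compute `mean_squared_jerk` once per episode, then pass the per-episode MI scores and jerk values to `relative_jerk_difference`.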

Limitations I want to be upfront about:

- MI correlates with episode length (r≈0.8). Longer episodes score higher.

- This measures motion smoothness, not task success.

- Works best with 50+ episodes from a single task.

- Inspired by DemInf (Hejna et al., RSS 2025), but uses raw KSG estimation instead of their VAE pipeline. That is simpler, but probably less accurate for high-dimensional observations.
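
The length confound in the first limitation is easy to check on your own scores. The sketch below computes the Pearson correlation between scores and episode lengths, then subtracts a least-squares linear trend on length. The detrending step is a hypothetical post-processing idea, not something democlean is stated to do, and both function names are made up for this example.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def length_adjusted_scores(scores, lengths):
    """Residuals of a least-squares line fit of score on episode length.

    Assumes a roughly linear relationship, which may not hold in practice.
    """
    mx, my = mean(lengths), mean(scores)
    slope = (sum((x - mx) * (y - my) for x, y in zip(lengths, scores))
             / sum((x - mx) ** 2 for x in lengths))
    return [s - (my + slope * (l - mx)) for s, l in zip(scores, lengths)]
```

If `pearson_r(lengths, scores)` is high on your dataset, ranking by the adjusted scores instead of the raw ones is one way to avoid simply selecting the longest episodes.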

Complements score_lerobot_episodes, which catches visual issues (blur, lighting). democlean catches behavioral issues.