Mon May 20, 2024 3:33pm PST
26× Faster Inference with Layer-Condensed KV Cache for Large Language Models
@georgehill