Mon May 20, 2024 3:33pm PST
26× Faster Inference with Layer-Condensed KV Cache for Large Language Models