Background (please correct me if I am wrong): inference cost grows with context size (roughly linearly per generated token when using a KV cache, and quadratically when processing the full prompt), so it is costly to accumulate information in the context. Fine-tuning the model regularly would instead let it "learn" and adapt to a specific user or task. One could, for example, fine-tune overnight using otherwise idle compute. Thanks.
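To illustrate the scaling claim, here is a minimal back-of-the-envelope sketch of the quadratic attention term; the head width `d` and the context lengths are illustrative assumptions, not values from any particular model:

```python
def attention_flops(n, d):
    """Rough per-layer self-attention cost: the QK^T and
    attention-times-V matmuls each take about n^2 * d multiply-adds
    (projection layers and other linear-in-n terms are ignored)."""
    return 2 * n * n * d

d = 128  # illustrative head width; real models differ
for n in (1024, 2048, 4096):
    print(f"context {n}: ~{attention_flops(n, d):.2e} FLOPs")
```

Doubling the context quadruples this term, which is why stuffing ever more user history into the prompt gets expensive compared with baking it into the weights.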