Mon Oct 13, 2025 2:38pm PST
This question is inspired by the work done in https://transformer-circuits.pub/2025/attribution-graphs/biology.html . When we give a prompt such as the one in https://blog.fsck.com/2025/10/05/how-im-using-coding-agents-in-september-2025/#:~:text=are%20the%20fixes%20they%20propose%20the%20correct%20ones (asking whether the fixes a coding agent proposes are the correct ones), are we really expecting the LLM to have built an adequate representation of whatever world model is required, and the reasoning on top of it, to answer whether the PR is correct? Is there any research on this?