Fri Jan 6, 2023 6:54pm PST
Why do Internet Archive archived pages take so long to load?
I love using the Internet Archive and really appreciate its mission to archive data that too often disappears from the web. Using it recently, I got curious: why is it often very slow (7-10 seconds) to load archived pages, or even to find out from a search that a page is not archived?

I understand that, given the size of the data stored, the archive may serve pages from cheaper/colder storage to keep storage costs down, at the price of slower retrieval for rarely requested pages.

I'm curious to read more about their architecture[0], and about whether temporary CDN-based static page caching might be feasible for hotter pages, etc.
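
To make the caching idea concrete, here is a rough sketch of what I mean by keeping hot pages in a small, short-lived cache in front of slow storage. This is purely illustrative and not how the Internet Archive actually works: the class name HotPageCache, the fetch_from_origin callback, and the size/TTL defaults are all made up for the example.

```python
import time
from collections import OrderedDict
from typing import Callable


class HotPageCache:
    """Small LRU cache with a TTL, sitting in front of a slow origin fetch.

    Hypothetical sketch: `fetch_from_origin` stands in for whatever slow
    lookup retrieves an archived page from cold storage.
    """

    def __init__(
        self,
        fetch_from_origin: Callable[[str], str],
        max_entries: int = 1000,
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expires_at, body); order tracks recency of use
        self._entries: "OrderedDict[str, tuple[float, str]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            # Fresh entry: serve it and mark it as most recently used.
            self._entries.move_to_end(key)
            self.hits += 1
            return entry[1]

        # Missing or expired: pay the slow origin cost once, then cache.
        self.misses += 1
        body = self._fetch(key)
        self._entries[key] = (now + self._ttl, body)
        self._entries.move_to_end(key)

        # Evict least recently used entries beyond the size limit.
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)
        return body
```

The key here would be something like the snapshot timestamp plus the original URL. Only the first request for a page (or the first one after the TTL expires) pays the slow lookup; repeat requests for popular pages come straight from memory. A real CDN would add a lot on top of this, but the hit/miss behavior is the part I am asking about.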

I suspect archive lookups would feel qualitatively different if many of the pages loaded as fast as modern sites do.

[0]: There are

https://help.archive.org/help/archive-org-site-architecture-and-glossary/

and

https://web.archive.org/web/20090709135157/http://www.cs.huji.ac.il/~kirk/IAArchitecture.pdf

but I wonder how out of date the latter is.

Update: I came across https://twitter.com/textfiles/status/1217202059163979776?lang=en which suggests the slow part is the server set / hard drive lookup step.
