Tech Note - Permanent Versions
last edit: 2022-12-07, Pete document status: draft, would like to expand the bullet sections to prose
Use Case and Overview
Wikis and their pages are meant to be edited and evolve over time. However, sometimes we wish to publish a page to the larger web in a way that is fixed and unchanging.
From the wiki side, wiki authors want to present the page as it was on the date of publication, even though the original page may change or be deleted as its home wiki changes.
From the web side, people who want to link to the wiki page from their web pages want to know that their future readers will find the same page content as when the link was originally created.
In terms of longevity, we wish for "permanent" versions to be findable, linkable, and readable in both the near term, and the far future: decades, centuries, millennia from present -- in other words, practically forever.
To satisfy these needs, we introduce the "permanent versions" pattern.
"Permanent Versions" folder
To implement permanent versions, create a top-level folder called "permanent versions". Change the case and separator character as appropriate for your wiki. Translate "permanent versions" to your home language as appropriate.
File Formats
For each published page, choose whether you want to publish the permanent version in HTML, Markdown, or even another format such as plain text. For the sake of simplicity and longevity, it is recommended to choose just one publication format for each page rather than presenting, for example, a Markdown page together with its HTML rendering as two separate files.
HTML will usually be the preferred format for publication to the HTML web.
Non-content Sections
Styling and site navigation can be rendered in the HTML version, although any non-content additions should be simple and it easily separable from the main page content -- perhaps even more so than in the usual rendered parts of the wiki.
Above the content, there should be a link to another page in the wiki about the permanent versions pattern, which explains that pages in the permanent versions folder have been frozen and published, and can be relied to not to change, even though they are presented within a wiki where other pages do change. Readers should be encouraged to use the title and body of the frozen page to search for updated versions in the rest of the wiki.
Content Hash in Filenames
To demonstrate and confirm that the page content is unchanged, it is recommended that the filename of the published page (and therefore, the resulting URL when posted to the web) be constructed with a content hash such as SHA-256. The hash can be computed over the whole published page, or just the page content. If computed over just the page content, there should be obvious markers in the page where the hashed content begins and ends. Note that automated hash confirmations will be somewhat easier if the whole page is hashed, rather than just the content.
Do Not Edit Permanent Versions
It is recommended that published pages not be edited. If they need to be changed or removed, the content can be removed and replaced with a short message that links to a revised page in the permanent versions folder, or a message that says that the published page has been deleted.
The message should be obviously not the intended content, and of course, it will fail a hash confirmation calculation.
Other Names Considered
- "permanent version"; but the plural conveys there may be more than one version, which is good
- just "permanent"; but adding "versions" that there may be other versions in the wiki, which is good
- "permapage" or "permapages"; recalls "permalink" from blogs, but we prefer plain language over jargon
- "forever"; but doesn't improve on "permanent", and "permanent" is more often used for documents and writing
Content Hash in Filename
- use SHA-256
- use hex representation ("9a131c562ea2ffa8a24df1de03f243c6419bf775c1980aaa5e4ecccf67d1b288.html")
- no preference between uppercase and lowercase; use the case appropriate for your wiki
- base 36 representation ("2unb9wug81t8k4hesf40tvplr0zu8c3pfp18vcvsry12goxs69.html") was considered, because it's shorter and perhaps a little more human friendly. however, it doesn't save many characters (50 vs. 64), and it's less obviously generated by a well-known hash function
- filename can be just the hash ("9a131c562ea2ffa8a24df1de03f243c6419bf775c1980aaa5e4ecccf67d1b288.html"), or some title words and then the hash ("dark and stormy night 9a131c562ea2ffa8a24df1de03f243c6419bf775c1980aaa5e4ecccf67d1b288.html")
- don't use multibase because it adds complication; use a bog-standard hash instead
- we assume that detecting and confirming SHA-256 will require computing the hash over the content, but that's reasonably inexpensive, and based on the format of the hash, SHA-256 will be one of the first hashes to guess
- IPFS CID was considered, but it's complicated, and requires care to generate unambiguously
- IPFS CID Reproducibility - Stack Overflow
- When adding or hashing a file manually via the api... - Stack Overflow ... "there are options to alter how files are chunked into blocks, how those blocks are linked together, and how the blocks are hashed. If any option values differ then the resulting hash and the CID that contains it will be different, even if the input file is the same."
Implementations
More information about using Massive Wiki Builder to generate a permanent version of a particular file will go here.
Discussion
WLA (2022-12-11): created an airtable for constructing Permanent Versions property metrics: https://airtable.com/appQ9KYTm2YxgCR2A/tblpTrYtVSqa7p5Xl/viwyiBVcKmR1HlBdc?blocks=hide
- the contents of the fields need discussion and clarification.
- it seems that we probably want to truncate the SHA-256 hash to the first 128 bits (32 hex characters); 64 characters take up a lot of room on a line in an email or a web page
- A similar but different way to do it, use the regular filename and filepath as a page, but save a fixed version suffixed with a date stamp: "/example folder/example page 20221212T183227.123Z.html" or similar. Cons: doesn't integrate a hash to guarantee no changes, different fixed pages will end up in different directories; original page name may disappear so preserving the filename may not help you find the current version anyway.
- A similar but different way to do it: Tech Note - Permanent HTML Snapshots