Schema on write is better to live by
Thoughts on a decade of personal knowledge management

Let me take you with me on my walk of shame. It's over a decade old, and it includes hundreds of browser tabs that I kept on life support by never, ever restarting my machine. It includes bookmarks bars that I can no longer scroll to the end of in any reasonable time. Multiple knowledge management systems, one paper that I'm now glad I never sent in for peer review, and thousands of pieces of information float around me, doomed to never perish - but to never be remembered again. Until a new civilisation finds the remains of what was once our society, unearths my devices, and intelligently decides that it isn't worth their time to find structure or intent in this jumble of interesting.

I have a playlist of things on YouTube that I found interesting or relevant, or that I wanted to watch later - it's literally YT's Watch Later playlist - that is now too long for me to ever work through. I've tried every piece of mind-mapping software in the last 10 years. I can no longer open my tabs in Chrome on Android. Within days of getting a phone, I cross the point where it gives you a nihilistic :D in place of the number of tabs, and by now the tabs are too numerous to count.

If any of this rings true - even a little bit - maybe you'll like this. Or maybe you won't, but it'll still be fun thinking about it.

I have been obsessed with trying to remember more for as long as I can remember. The digital explosion of access to the amazing world we live in has only made it worse, to the point where I remember being obsessed with why I had an obsession with remembering and archiving. I initially diagnosed myself as a digital hoarder, someone who hangs out on r/DataHoarder and buys hard drives faster than a Chia miner.

Except I wasn't - and am not. My obsession, I've come to find, has a lot to do with interest and a desire to complete. I find things that lead to threads of thought that lead into even more interesting places, and I'm often interrupted - and I'd just like to finish. On top of this, I'd like the things I find to help me in the future, but beyond the limits of my own memory. As a developer and an engineer, I've enjoyed the freedom and space of mind that comes from only having to store pointers to the things I remember, knowing where to find them or the right keywords to invoke them but not the actual things.

Why then am I so bad at it? Why do I have this digital knapsack of shame? Why does everyone else? How can we do better?

§The Problem is Structure

Let us define the problem here - at least as I personally experience it. I find things and think about things that I would like preserved for later. I'd like to remember how to find them later, and be successful in doing so. This is all I want. I'd like swap space for my mind. For a stretch goal, I'd like to preserve mental spaces - the context surrounding thoughts and the links that connect them - in a way I can revisit later, and embed myself into, and continue the thought process therein.

Why is this so hard? I have found and heard of many different solutions, or many different framings of this problem in ways that suggest a solution. "The problem is having a good enough search engine" is one I hear often and believe at least partially. "The problem is having links that connect things" argue Dendron and Roam, modern versions of the mind map. "The problem is finding a new modality to represent the mind as it thinks" argued I at one point - trying to find something that freed us from the constraints of desktops and windows and HTTP.

These are all problems, but I've come to believe that the problem lies in Schema and the enforcement of it. Not just in the psychological meaning of it, but in the computing definition of it as a Database Schema. Schema is the structure of a database, defining the nature of the things contained by a database, the relationships between them, and what information is permitted. Information without schema - implicit or explicit - is entropy.

You can get really philosophical with this concept. A star can be considered information organised through the schema of gravity, the strong and weak nuclear forces, and the underlying rules of the universe. A consciousness observing a probabilistic event can be considered an imposition of schema. Your mum cleaning your room can be considered a conversion of entropy to information - enforced through the ultimate database engine: your mum.

I apologise - that got away from me. Looking at information - especially the information that surrounds you and that you would like to preserve - the importance of structure becomes clear. From creation to introduction, to storage, retrieval and finally consumption, structure needs to be imposed at some point along the way if something is to become useful. Even though we're only interested here in the part of this lifecycle that goes from storage to retrieval, we can see this structure transform through the entirety.

Consider an article that you like and would like to make use of in the future. Upon creation it is granted the schema of an article, with somewhat rigid guidelines imposed on it by the form. Moving to publication, the medium and the publisher impose their own structure, sometimes transforming the piece simply through this act - as anyone involved in publishing a book will tell you. If you find it online, you find it through the overall cage imposed on it by the search engine and your browser, at which point you peruse it and reorganise it based on your personal context and the things that are important to you. Upon making later use of it, you once again impose a new structure - perhaps through a different personal context - into something else.

Most of the payment I have received from society has been for the reorganisation of information, so this is pretty important to me. Looking at storage and retrieval through the lens of schema makes something clear: there is a spectrum that stretches from schema-on-write to schema-on-read. On one side, you have a laundry basket that holds all your clothes to be washed, and a wardrobe that holds clean clothes: separated by function, form and the manner in which they are preserved best. On the other, you have a room that simply contains clothes, in a mix of washed, unwashed, and perhaps in between. The former is schema-on-write. As you deposit things into the room, you impose structure that best reflects what each item is. The latter, unfortunately, is schema-on-read: you sniff things as you pick them up, trying to deduce the nature of the item you are holding so that you can impose structure once again and make use of it.
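To make the two ends of that spectrum concrete, here is a minimal Python sketch, assuming a saved item is nothing more than a URL plus whatever I bothered to note down about it. The names (`Bookmark`, `save_on_write`, `guess_on_read`) are hypothetical and exist only for illustration; this is not how any particular tool works.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Bookmark:
    """Hypothetical record: the structure captured at save time."""
    url: str
    topic: str
    why_saved: str


def save_on_write(store: list, url: str, topic: str, why_saved: str) -> Bookmark:
    """Schema-on-write: refuse the item unless structure is supplied up front."""
    if not topic.strip() or not why_saved.strip():
        raise ValueError("a topic and a reason are required at save time")
    item = Bookmark(url=url, topic=topic.strip().lower(), why_saved=why_saved.strip())
    store.append(item)
    return item


def guess_on_read(raw_urls: list) -> dict:
    """Schema-on-read: the pile holds bare URLs, so structure has to be
    guessed at retrieval time. The only clue left here is the hostname."""
    groups = {}
    for url in raw_urls:
        host = urlparse(url).netloc or "unknown"
        groups.setdefault(host, []).append(url)
    return groups
```

The first function is the wardrobe: it costs a little effort on the way in, and the item keeps its topic and its reason for existing. The second is the sniff test: all it can recover is where something came from, not why I kept it.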

I observe the same thing with my personal library. The things I dump into a playlist or a bookmarks bar without much fanfare lose most of their structure, and very quickly their retrieval depends entirely on my diminishing memory of them. Chances are that when I need them again I will have forgotten their existence. Without the physical world to impose warning signs like a lack of space or that smell I need to investigate, my digital knapsacks are happy to hold infinitely many items without complaining. They are as good as lost.

§The solution is structure as early as possible

It seems to me that the only way forward is to impose as much structure as I conveniently can as things go into storage. It is a preponement of the work I will have to do, but it also makes this work much, much smaller. This means using standardised keywords, tagging, and hierarchical organisation that isn't simply connective but also restrictive in what it allows for. It means an examination of whether something must be stored in its entirety or whether it's better cut up into the parts that best fit the structure. It is a lot of work.
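As a rough sketch of what "restrictive" could look like in practice, assume a small fixed vocabulary of tags and a fixed set of places where things are allowed to live. Everything here (`ALLOWED_TAGS`, `ALLOWED_PATHS`, the `Library` class) is a hypothetical illustration in Python, not a description of any tool I use.

```python
# Hypothetical controlled vocabulary and hierarchy; both are assumptions.
ALLOWED_TAGS = {"database", "memory", "tooling"}
ALLOWED_PATHS = {("reference",), ("reference", "articles"), ("projects",)}


class Library:
    """A store that rejects anything that does not fit the declared structure."""

    def __init__(self):
        self._items = {}  # maps a path tuple to a list of (title, tags)

    def add(self, title, path, tags):
        path = tuple(path)
        tags = frozenset(tags)
        if path not in ALLOWED_PATHS:
            raise ValueError(f"no such place in the hierarchy: {path}")
        if not tags:
            raise ValueError("at least one tag is required")
        unknown = tags - ALLOWED_TAGS
        if unknown:
            raise ValueError(f"tags outside the vocabulary: {sorted(unknown)}")
        self._items.setdefault(path, []).append((title, tags))

    def find(self, tag):
        """Retrieval becomes a plain lookup because writes were restricted."""
        return [
            title
            for items in self._items.values()
            for title, tags in items
            if tag in tags
        ]
```

The work moves to `add`, which is exactly the point: an item with a made-up tag or no home in the hierarchy is turned away instead of quietly joining the knapsack.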

There is still a lot of structure that is present at the moment of storage that I may not be able to reasonably convert to schema or metadata. Things like my personal context at the moment of consumption, expected intents I'm trying to cover for at the point of recovery, connective links to concepts in my head that haven't fully materialised yet, among other things. I have been attempting to solve those through technology, through the fact that the machines we use know a lot more about us if we only know to ask them the right way, but that's a discussion I will leave for another time.

To me, this is more of an understanding that the systems I use need a little more help from me to be better at retrieval, not just storage. This may sound like common sense, but it's taken me a lot of failed attempts to accept. It has always been easier to build a solution or to try to change my environment than to change myself - or even accept that I should.

It's also me understanding how I can make these systems better. Most of my efforts so far have been focused on improving retrieval, on cobbling together structure at the point of use rather than the point of storage. How can we change things if we refocus instead on making storage as strongly imposing of structure as we can, instead of shifting this technological and mental load to retrieval?

For one, I've begun preferring less open platforms. I still use Notion, but it is tightly controlled into fixed pages. I make use of playlists and bookmarks, but I resist the urge to have a knapsack folder I throw things into. I try to accept that going into the knapsack is as good as being lost to time, and if I don't have the time to write down the schema it exists under, I should skip this one.

It's also helped me relax a little, and remember that no matter how strong your efforts, how precious the memory, how interesting something is or how much potential it has for change, a lot of things will be lost to the winds of time, washed away like tears in rain. It's not worth making a movie about - maybe because it's already been done - but it's important for me to remember this.

Hrishi Olickel
21 Aug 2021