It never had to be like this: the git “index”

Hot on the heels of another Git-related article that was making the rounds recently, I was reminded of Git’s own structure and how it influences user experience. Specifically, much of what we assume is part of how Git works is actually part of the porcelain (in Git speak, the user interface and the commands that back it). As someone developing a Git client, I find it interesting that Git’s user experience could be significantly different with a different interface, particularly because people’s mental model of Git is shaped by the default interface. That influence is strong enough that libgit2’s API emulates the porcelain’s semantics, in-process.

The index, stage, or cache (whatever you want to call it) is the middleman between the working tree and your commits, recording the state of files. Confusingly, all the names are arguably valid:

  • It’s the index because it lists the files Git tracks, along with their states during operations like merges.
  • It’s the stage because files are added to and removed from it before being committed; that is, it’s the “staging area”.
  • It’s the cache because it keeps track of metadata used to check if a file has changed. The index format records a bunch of stuff, like device and inode numbers, that never ends up in a commit.
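To make the “cache” role concrete, here is a rough Python sketch that reads the fixed-size fields of each entry in a version-2 index file. It is an illustration under stated assumptions, not a full parser: the names `parse_index`, `IndexEntry`, and `ENTRY_HEAD` are made up for this example, other index versions are rejected, paths longer than 4095 bytes are not handled, and the trailing extensions and checksum are ignored.

```python
import struct
from typing import List, NamedTuple

# Fixed-size part of a version-2 entry: ten 32-bit fields, a 20-byte SHA-1,
# and 16 bits of flags (62 bytes in total, big-endian).
ENTRY_HEAD = struct.Struct(">10I20sH")


class IndexEntry(NamedTuple):  # hypothetical name, for illustration only
    ctime: int
    mtime: int
    dev: int
    ino: int
    mode: int
    uid: int
    gid: int
    size: int
    sha1: str
    path: str


def parse_index(data: bytes) -> List[IndexEntry]:
    """Parse the entries of a version-2 index; extensions are ignored."""
    signature, version, count = struct.unpack_from(">4sII", data, 0)
    if signature != b"DIRC" or version != 2:
        raise ValueError("this sketch only handles version-2 index files")
    entries = []
    offset = 12  # the header is 12 bytes long
    for _ in range(count):
        (ctime, _ctime_ns, mtime, _mtime_ns, dev, ino, mode,
         uid, gid, size, sha1, flags) = ENTRY_HEAD.unpack_from(data, offset)
        name_len = flags & 0x0FFF  # low 12 bits hold the path length
        start = offset + ENTRY_HEAD.size
        path = data[start:start + name_len].decode("utf-8")
        entries.append(IndexEntry(ctime, mtime, dev, ino, mode,
                                  uid, gid, size, sha1.hex(), path))
        # Each entry is padded with 1 to 8 NUL bytes to a multiple of 8.
        offset += (ENTRY_HEAD.size + name_len + 8) & ~7
    return entries
```

Feeding it the bytes of a repository’s `.git/index` should list every tracked path next to its stat data, provided the file happens to be version 2. Of all those fields, only the mode, the SHA-1, and the path have any counterpart in a commit; the rest exists purely so Git can tell cheaply whether a file might have changed.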

It should be noted that the index is not actually what gets written out as part of a commit. While the state recorded in the index is used to build the commit, it is the directory structure, not the index, that gets stored, in the form of tree objects. The format of the tree entries is much simpler than the index’s. The index is more a convenience for tracking changes made in the checked-out working tree.
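How much simpler? A tree entry is just a mode, a name, and the raw 20-byte object ID of the thing it points to. The following Python sketch serializes a tree and hashes it the way Git hashes any object, assuming SHA-1 repositories; the helper names `hash_object` and `tree_body` are invented for this example.

```python
import hashlib
from typing import Iterable, Tuple


def hash_object(kind: str, body: bytes) -> str:
    """Object ID = SHA-1 over '<kind> <length>' + NUL + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: Iterable[Tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex object ID) tuples as a tree object body."""
    def sort_key(entry):
        mode, name, _sha = entry
        # Git orders subtrees as if their names ended in '/'.
        return name.encode() + (b"/" if mode == "40000" else b"")

    out = b""
    for mode, name, sha_hex in sorted(entries, key=sort_key):
        out += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(sha_hex)
    return out


empty_blob = hash_object("blob", b"")
root = hash_object("tree", tree_body([("100644", "README", empty_blob)]))
print(root)
```

There are no timestamps, no device or inode numbers, and no flags here: everything in the tree is content that ends up in history.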

This underlines my point: the index itself isn’t really innate to how Git actually stores things. A custom Git tool could completely omit the concept of the index and simply have another way to signal changes to trees, which are fundamental structures. One example of this is git9, a Git implementation for Plan 9, which has an alternative implementation of the index that omits the concept of a staging area altogether and simply indicates if files are tracked or not.
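As a thought experiment, here is a small Python sketch of what “no staging area, just a tracked set” could look like: it walks a working directory and computes the root tree ID directly from whichever files are marked as tracked. This is not how git9 is implemented; `snapshot_tree` and `git_hash` are hypothetical names, SHA-1 object IDs are assumed, symlinks and submodules are ignored, and nothing is written to the object store.

```python
import hashlib
import os
from typing import Optional, Set


def git_hash(kind: str, body: bytes) -> str:
    """Object ID = SHA-1 over '<kind> <length>' + NUL + body."""
    return hashlib.sha1(f"{kind} {len(body)}".encode() + b"\0" + body).hexdigest()


def snapshot_tree(root: str, tracked: Set[str], prefix: str = "") -> Optional[str]:
    """Return the tree ID for the tracked files under root/prefix.

    `tracked` holds '/'-separated paths relative to `root`. Returns None
    when nothing below this directory is tracked, since Git does not
    record empty directories.
    """
    entries = []
    for name in os.listdir(os.path.join(root, prefix)):
        if name == ".git":
            continue
        rel = f"{prefix}/{name}" if prefix else name
        full = os.path.join(root, rel)
        if os.path.isdir(full):
            sub = snapshot_tree(root, tracked, rel)
            if sub is not None:
                # Subtrees sort as if their names ended in '/'.
                entries.append((name + "/", "40000", name, sub))
        elif rel in tracked:
            with open(full, "rb") as fh:
                blob = git_hash("blob", fh.read())
            executable = os.stat(full).st_mode & 0o111
            entries.append((name, "100755" if executable else "100644", name, blob))
    if not entries:
        return None
    body = b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(sha)
        for _key, mode, name, sha in sorted(entries)
    )
    return git_hash("tree", body)
```

Calling `snapshot_tree(".", {"README", "src/main.c"})` would give the ID of the tree a commit of exactly those two files would point at, with untracked files simply skipped. The only user-facing state is the set of tracked paths; there is no intermediate copy of file contents to get out of sync with the working directory.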

People talk about how confusing Git’s terminology is, let alone the entire concept of the index, but we should remember that the index isn’t some concept fundamental to Git’s core workings. Instead, it’s only fundamental to the most popular Git implementation and things trying to be like it; the fact that it’s the most popular doesn’t make it the only solution to the problem. The toilet isn’t just the porcelain, it’s the whole sewer; there’s room for bidets in source control.

2 thoughts on “It never had to be like this: the git “index”

  1. Jan August 19, 2021 / 1:31 am

    gitless was another take at this, providing a git compatible porcelain with nicer semantics
