Ronaldo Vitto Lewerissa

Software engineering learning documentation.

Git Behind The Scene: Objects

To understand Git, we first need to know that it stores four distinct but related types of object.

These objects are:

  • commit
  • tree
  • blob
  • tag

They are saved in the .git/objects/ directory.

Each object is stored compressed, using a library called zlib (when a repo gets large, Git also bundles objects into packfiles).

So if you try to open one directly in a text editor, you won't see anything readable. You need to decompress it first.
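To make the decompression step concrete, here is a minimal Python sketch. It assumes a loose (not packed) object stored at objects/<first 2 hex characters>/<remaining 38>, which is how Git lays out loose objects. The names read_loose_object and demo are made up for this post, and the demo builds a throwaway object store in a temp directory instead of touching a real repo.

```python
import hashlib
import os
import tempfile
import zlib


def read_loose_object(git_dir, sha):
    """Return (object_type, content_bytes) for a loose object."""
    # Loose objects live at objects/<first 2 hex chars>/<remaining 38>.
    path = os.path.join(git_dir, "objects", sha[:2], sha[2:])
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    # The decompressed bytes are "<type> <size>\0<content>".
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


def demo():
    """Write one hand-made blob into a temporary store and read it back."""
    with tempfile.TemporaryDirectory() as git_dir:
        store = b"blob 13\0test content\n"
        sha = hashlib.sha1(store).hexdigest()
        os.makedirs(os.path.join(git_dir, "objects", sha[:2]))
        with open(os.path.join(git_dir, "objects", sha[:2], sha[2:]), "wb") as f:
            f.write(zlib.compress(store))
        return read_loose_object(git_dir, sha)


print(demo())  # ('blob', b'test content\n')
```

Against a real repo you would pass the path to its .git directory and a full 40-character hash. Objects that have already been moved into a packfile will not be found this way.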

Git is very reliable and rarely deletes an object. Even git reset --hard does not delete anything right away: it only moves the branch pointer. The commits, trees, and blobs that become unreachable as a result stay in .git/objects/ until garbage collection (git gc) eventually prunes them, and objects that earlier snapshots still reference are kept.

Hashing

A hash function converts an input of arbitrary size into a fixed-length output. It has to be deterministic: the same input always gives the same output. Ideally every input would have its own output, but since the output length is fixed, two different inputs can in principle share one. We call that a collision, and a good hash function makes collisions practically impossible to find.

As far as I know, the hash function Git uses is SHA-1.

Every Git object is identified by the hash of its contents.
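The properties above are easy to check with Python's standard hashlib module. This is only an illustration of SHA-1 itself, not of how Git calls it internally, and sha1_hex is a helper name made up for this post.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as 40 hexadecimal characters."""
    return hashlib.sha1(data).hexdigest()


# Fixed length: a 3-byte input and a 300,000-byte input both give 40 hex chars.
print(len(sha1_hex(b"abc")), len(sha1_hex(b"abc" * 100_000)))  # 40 40

# Deterministic: hashing the same bytes twice gives the same digest.
print(sha1_hex(b"abc") == sha1_hex(b"abc"))  # True

# A one-character change produces a different digest.
print(sha1_hex(b"abc") == sha1_hex(b"abd"))  # False
```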

Commit Object

The commit object is probably the most familiar one to developers who have used Git before.

It holds information about the author of the snapshot, when it was saved, a description of why it was saved, and most importantly: a reference to the snapshot itself (the tree).

To hash a commit object, you are required to provide the following information:

  • commit message
  • committer
  • commit date
  • author
  • author date
  • tree hash (snapshot)
  • parent commit hash(es)

These are the fields that get hashed.

What's the format?

commit {size}\0{content}  

Where,

  • {size}: the number of bytes in {content}

  • {content}:

tree {tree_sha}  
{parents}
author {author_name} <{author_email}> {author_date_seconds} {author_date_timezone}  
committer {committer_name} <{committer_email}> {committer_date_seconds} {committer_date_timezone}

{commit message}

So what does it look like? Well, it's just a plain string, although not every character in it is printable: it contains characters like NUL and line feed.

tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579  
author Scott Chacon <schacon@gmail.com> 1243040974 -0700  
committer Scott Chacon <schacon@gmail.com> 1243040974 -0700

first commit  
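Putting the format and the example together, a commit object could be assembled and hashed roughly like this in Python. The function names and the parent_shas parameter are made up for this post, and the sketch assumes the message is passed with its trailing newline already attached. I have not checked the resulting hash against a real repository, so take it as an illustration of the layout only.

```python
import hashlib


def build_commit(tree_sha, author, committer, message, parent_shas=()):
    """Return the full bytes of a commit object: header, NUL, content."""
    lines = [f"tree {tree_sha}"]
    # Zero parents for a root commit, one for a normal commit, more for a merge.
    lines += [f"parent {p}" for p in parent_shas]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the header lines from the commit message.
    content = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    # {size} is the number of bytes in {content}.
    return f"commit {len(content)}".encode("ascii") + b"\0" + content


def object_sha(store: bytes) -> str:
    """SHA-1 of the whole object, header included."""
    return hashlib.sha1(store).hexdigest()


ident = "Scott Chacon <schacon@gmail.com> 1243040974 -0700"
store = build_commit(
    "d8329fc1cc938780ffdd9f94e0d364e0ea74f579", ident, ident, "first commit\n"
)
print(store.split(b"\0", 1)[0])  # the "commit {size}" header
print(object_sha(store))
```

Notice that the hash covers the header as well as the content, so changing any field (even just a timestamp) yields a different commit hash.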

Tree Object

Format:

tree [content size]\0[Entries having references to other trees and blobs]  

Where,

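  • [Entries having references to other trees and blobs]:
[mode] [file/folder name]\0[SHA-1 of referencing blob or tree]

Inside the object, each SHA-1 is stored as 20 raw bytes, not as 40 hexadecimal characters.

What does it look like? (The hashes below are written in hexadecimal, with a space after each \0, only for readability.)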

tree 192\0  
40000 octopus-admin\0 a84943494657751ce187be401d6bf59ef7a2583c  
40000 octopus-deployment\0 14f589a30cf4bd0ce2d7103aa7186abe0167427f  
40000 octopus-product\0 ec559319a263bc7b476e5f01dd2578f255d734fd  
100644 pom.xml\0 97e5b6b292d248869780d7b0c65834bfb645e32a  
40000 src\0 6e63db37acba41266493ba8fb68c76f83f1bc9dd  
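As a sanity check on the format, here is a small Python sketch that serializes the entries from the example above. It assumes the entries are already in the order Git expects and stores each hash as 20 raw bytes. build_tree and ENTRIES are names made up for this post.

```python
import hashlib


def build_tree(entries):
    """Return the full bytes of a tree object.

    entries: iterable of (mode, name, sha_hex), assumed to be pre-sorted.
    """
    content = b""
    for mode, name, sha_hex in entries:
        # "<mode> <name>\0" followed by the 20 raw bytes of the SHA-1.
        content += f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(sha_hex)
    return f"tree {len(content)}".encode("ascii") + b"\0" + content


ENTRIES = [
    ("40000", "octopus-admin", "a84943494657751ce187be401d6bf59ef7a2583c"),
    ("40000", "octopus-deployment", "14f589a30cf4bd0ce2d7103aa7186abe0167427f"),
    ("40000", "octopus-product", "ec559319a263bc7b476e5f01dd2578f255d734fd"),
    ("100644", "pom.xml", "97e5b6b292d248869780d7b0c65834bfb645e32a"),
    ("40000", "src", "6e63db37acba41266493ba8fb68c76f83f1bc9dd"),
]

store = build_tree(ENTRIES)
print(store.split(b"\0", 1)[0])  # b'tree 192'
print(hashlib.sha1(store).hexdigest())
```

The content size comes out as 192 bytes, matching the header in the example, which only works out if each hash takes 20 bytes rather than 40.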

Blob Object

Format:

blob [content size]\0[content]  
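The blob format is simple enough to reproduce in a few lines of Python. blob_sha is a helper name made up for this post. The result should match what echo 'test content' | git hash-object --stdin prints, since echo appends the trailing newline.

```python
import hashlib


def blob_sha(data: bytes) -> str:
    """Hash data the way a blob object is hashed: header, NUL, then content."""
    store = f"blob {len(data)}".encode("ascii") + b"\0" + data
    return hashlib.sha1(store).hexdigest()


print(blob_sha(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

A blob holds only the file's content. The file name and mode live in the tree entry that points at the blob, which is why two files with identical content share one blob.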

Written by Ronaldo Vitto Lewerissa