Git Behind the Scenes: Objects

We need to understand that Git is built on four distinct but related types of object.

These objects are:

  • commit
  • tree
  • blob
  • tag

They are saved in the .git/objects/ directory.

Each object is stored compressed, using a library called zlib (when the repo gets large, Git also packs many objects together into what it calls packfiles).

So if you try to open one directly in a text editor, it won't be readable. You need to decompress it first.
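As a rough sketch of that decompression step, the Node.js snippet below deflates and inflates an object in memory with the built-in zlib module. The helper names (looseObjectPath, encodeLooseObject, decodeLooseObject) are made up for this note, and the "<type> <size>\0<content>" layout it assumes is the one described in the sections further down. It only covers loose objects, not packfiles.

```javascript
const zlib = require("zlib");
const path = require("path");

// Loose objects live at <gitDir>/objects/<first 2 hex chars>/<remaining 38>.
function looseObjectPath(gitDir, sha) {
  return path.join(gitDir, "objects", sha.slice(0, 2), sha.slice(2));
}

// Build the compressed bytes that would be written for a loose object.
function encodeLooseObject(type, content) {
  const body = Buffer.from(content);
  const header = Buffer.from(`${type} ${body.length}\0`);
  return zlib.deflateSync(Buffer.concat([header, body]));
}

// Inflate the bytes, then split "<type> <size>\0<content>" at the first NUL.
function decodeLooseObject(compressed) {
  const raw = zlib.inflateSync(compressed);
  const nul = raw.indexOf(0);
  const [type, size] = raw.subarray(0, nul).toString().split(" ");
  return { type, size: Number(size), content: raw.subarray(nul + 1) };
}

const decoded = decodeLooseObject(encodeLooseObject("blob", "hello\n"));
console.log(decoded.type, decoded.size, JSON.stringify(decoded.content.toString()));
```

To inspect a real object, the same decodeLooseObject could be fed the bytes of fs.readFileSync(looseObjectPath(".git", someSha)), assuming that object has not been moved into a packfile.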

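Git is very reliable: it is unlikely to delete an object, even when you do a reset --hard. A hard reset only moves the branch pointer, so the commits you reset past, along with the trees and blobs that only they reference, become unreachable rather than being deleted right away. Git removes them later, once their reflog entries have expired and garbage collection (git gc) prunes them. Objects that are still reachable from other snapshots are never deleted.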

Hashing

A hash function converts an input of arbitrary size into a fixed-length output. It has to be deterministic: the same input always produces the same output. Ideally every input would also have its own output, but because there are more possible inputs than outputs, two different inputs can in principle share one (we call that a collision); a good hash function just makes collisions practically impossible to find.

As far as I know, the hash function Git uses by default is SHA-1.

Every Git object is hashed, and the hash is used as its identifier.
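A small sketch of those properties, using Node.js's built-in crypto module (sha1Hex is just a helper name picked for this note): the output always has the same length, the same input gives the same output, and a slightly different input gives a different one.

```javascript
const crypto = require("crypto");

// Hex-encoded SHA-1 digest of a string or Buffer.
function sha1Hex(input) {
  return crypto.createHash("sha1").update(input).digest("hex");
}

const a = sha1Hex("hello");
const b = sha1Hex("hello");
const c = sha1Hex("hello!");

// Length in hex characters, determinism, and sensitivity to the input.
console.log(a.length, a === b, a === c); // 40 true false
```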

Commit Object

The commit object is probably the most familiar one to developers who have used Git before.

It holds information about the author of the snapshot, when it was saved, a description of why it was saved, and most importantly, a reference to the snapshot itself (the tree).

To hash a commit object, you are required to provide the following information:

  • commit message
  • committer
  • commit date
  • author
  • author date
  • tree hash (snapshot)
  • parent commit hash(es), if any

These are the fields that get hashed.

What's the format?

commit {size}\0{content}  

Where,

  • {size}: the number of bytes in {content}

  • {content}:

tree {tree_sha}  
{parents}
author {author_name} <{author_email}> {author_date_seconds} {author_date_timezone}  
committer {committer_name} <{committer_email}> {committer_date_seconds} {committer_date_timezone}

{commit message}

So what does it look like? Well, it's just a plain string, although not all of it is printable, since it contains characters like NUL and line feed. Here is the {content} part of a first commit, which has no parents:

tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579  
author Scott Chacon <schacon@gmail.com> 1243040974 -0700  
committer Scott Chacon <schacon@gmail.com> 1243040974 -0700

first commit  
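The format above can be sketched in Node.js as follows. This is a minimal illustration rather than Git's own code: buildCommitObject and objectId are hypothetical helper names, and it assumes the commit message is terminated by a single line feed and that each parent appears as its own "parent {sha}" line.

```javascript
const crypto = require("crypto");

// Assemble "commit {size}\0{content}" from the fields listed above.
function buildCommitObject({ tree, parents = [], author, committer, message }) {
  const lines = [`tree ${tree}`];
  for (const parent of parents) lines.push(`parent ${parent}`);
  lines.push(`author ${author}`, `committer ${committer}`, "", message);
  // Assumption: the message ends with exactly one line feed.
  const content = Buffer.from(lines.join("\n") + "\n");
  return Buffer.concat([Buffer.from(`commit ${content.length}\0`), content]);
}

// The object's identifier is the SHA-1 of header plus content.
function objectId(objectBytes) {
  return crypto.createHash("sha1").update(objectBytes).digest("hex");
}

const who = "Scott Chacon <schacon@gmail.com> 1243040974 -0700";
const commit = buildCommitObject({
  tree: "d8329fc1cc938780ffdd9f94e0d364e0ea74f579",
  author: who,
  committer: who,
  message: "first commit",
});
console.log(objectId(commit));
```

Changing any field, even the timestamp, changes the resulting hash, which is why the same message committed twice gives two different commit ids.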

Tree Object

Format:

tree [content size]\0[Entries having references to other trees and blobs]  

Where,

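  • [Entries having references to other trees and blobs]: zero or more entries, back to back, each of the form
[mode] [file/folder name]\0[SHA-1 of the referenced blob or tree, as 20 raw bytes]

What does it look like? In the listing below, the SHA-1s are written in hex and each entry is put on its own line for readability; in the real object every SHA-1 is 20 raw bytes that follow the NUL directly, with nothing between entries. That is how the content adds up to 192 bytes.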

tree 192\0  
40000 octopus-admin\0 a84943494657751ce187be401d6bf59ef7a2583c  
40000 octopus-deployment\0 14f589a30cf4bd0ce2d7103aa7186abe0167427f  
40000 octopus-product\0 ec559319a263bc7b476e5f01dd2578f255d734fd  
100644 pom.xml\0 97e5b6b292d248869780d7b0c65834bfb645e32a  
40000 src\0 6e63db37acba41266493ba8fb68c76f83f1bc9dd  
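To check the byte count, here is a minimal Node.js sketch that rebuilds the tree above with each SHA-1 packed as 20 raw bytes. The helper names treeEntry and buildTreeObject are invented for this note, and the sketch assumes the caller passes the entries already in the order Git expects.

```javascript
const crypto = require("crypto");

// One entry is "<mode> <name>\0" followed by the 20 raw bytes of the SHA-1.
function treeEntry(mode, name, shaHex) {
  return Buffer.concat([Buffer.from(`${mode} ${name}\0`), Buffer.from(shaHex, "hex")]);
}

// Concatenate the entries and prefix them with "tree <size>\0".
function buildTreeObject(entries) {
  const content = Buffer.concat(entries.map((e) => treeEntry(e.mode, e.name, e.sha)));
  return Buffer.concat([Buffer.from(`tree ${content.length}\0`), content]);
}

const tree = buildTreeObject([
  { mode: "40000", name: "octopus-admin", sha: "a84943494657751ce187be401d6bf59ef7a2583c" },
  { mode: "40000", name: "octopus-deployment", sha: "14f589a30cf4bd0ce2d7103aa7186abe0167427f" },
  { mode: "40000", name: "octopus-product", sha: "ec559319a263bc7b476e5f01dd2578f255d734fd" },
  { mode: "100644", name: "pom.xml", sha: "97e5b6b292d248869780d7b0c65834bfb645e32a" },
  { mode: "40000", name: "src", sha: "6e63db37acba41266493ba8fb68c76f83f1bc9dd" },
]);

console.log(tree.subarray(0, tree.indexOf(0)).toString()); // tree 192
console.log(crypto.createHash("sha1").update(tree).digest("hex"));
```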

Blob Object

Format:

blob [size of content in bytes]\0[content]
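A minimal sketch of hashing a blob by hand in Node.js (blobId is a helper name chosen for this note). The result should match what git hash-object prints for the same bytes; the empty blob is a handy check because its id is well known.

```javascript
const crypto = require("crypto");

// SHA-1 over "blob <size>\0" followed by the raw content bytes.
function blobId(data) {
  const body = Buffer.from(data);
  const header = Buffer.from(`blob ${body.length}\0`);
  return crypto.createHash("sha1").update(Buffer.concat([header, body])).digest("hex");
}

console.log(blobId("")); // e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
console.log(blobId("hello world\n"));
```

Note that the file name is not part of the hashed bytes: two files with identical content share one blob, and the names live in the tree entries.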


Types of Operating System

There are roughly two types of operating system kernel in use nowadays: monolithic and microkernel.

Basically, they differ in where the operating system's services are located.

Monolithic OS

A monolithic kernel has all of its services in the kernel itself, sharing the kernel's address space, which means a crash in any one of them can be catastrophic: it can halt the entire machine.

Monolithic kernels can be built to be more modular, meaning that modules can be inserted into, and run from, the same space that handles core functionality (kernel space).

An example of this kind of operating system is Linux.

Microkernel OS
  • Advantage: small, failed service can easily be restarted.
  • Disadvantage: performance overhead from the constant system calls and context switches.

Taken from Wikipedia: a microkernel is a minimal computer operating system kernel which, in its purest form, provides no operating system services at all, only the mechanisms needed to implement such services, such as:

  • low-level address space management

  • thread management

  • inter-process communication (IPC).

So other OS services, like the file system, process management, or network protocols, run as user-level programs.

For a user-level program to talk to one of these services, for instance the file system, it has to go through the IPC (inter-process communication) mechanism provided by the kernel.

Because of this, what requires a single system call in a monolithic kernel may require multiple system calls and context switches in a microkernel.

To elaborate: when a program needs to use the disk, it makes a system call to send an IPC message to the file system. The file system, which is itself a user-level program, then needs to make several system calls of its own before it can get access to the disk.

macOS is often given as an example, although its kernel (XNU) is really a hybrid: it is built on the Mach microkernel but runs its BSD services in kernel space. MINIX 3 and QNX are closer to pure microkernels.

Unit Testing and TDD

After some research and experience on this topic, this is my current understanding, and a brief note, on unit testing and test-driven development.

Unit testing is a form of testing that checks a small piece of code (a unit) in isolation.

Test only units with complex logic.

It can also serve as a design tool. A unit that is hard to test is a solid indication of bad code. Too many dependencies means a lot of mocking, which points to high coupling.

TDD is a preference, and its job is to guide you. It can also promote YAGNI (you ain’t gonna need it), which helps you write only the code that is necessary.

Operating System Protection Boundary and System Call

Modern CPUs have multiple protection rings (privilege levels). Normal (user-mode) programs are restricted in the CPU instructions they can issue, especially those that directly access hardware like RAM, disk, GPU, etc.

Programs with kernel privileges have no such limitation. The operating system, for instance, is a program that runs in kernel mode.

For a normal application to access the underlying hardware, it needs to ask the operating system to do it on its behalf.

The operating system exposes an API just for that purpose, and each request is referred to as a “system call”. Because the API is predefined, programs are constrained in what they can do, which makes it much harder for an application to break the entire system.

In short: programs running in user mode can't directly access any hardware on their own; they need to ask the OS to do it for them through a system call. This is for safety, and it is one of the main reasons operating systems exist in the first place.

Services provided by the OS include the scheduler, memory manager, block device drivers, file system, and more. The long list of system calls can be grouped by these services.

Examples of the API and the corresponding services (on Windows, where these are Win32 functions that wrap the underlying system calls):

File System:

  • CreateFile()
  • ReadFile()
  • WriteFile()

Device Manipulation:

  • SetConsoleMode()
  • ReadConsole()
  • WriteConsole()
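From a high-level language you rarely call these functions yourself, but the runtime ends up calling them for you. As a rough sketch, the Node.js snippet below opens, writes, reads, and closes a file with the built-in fs module. The temporary directory prefix and file name are arbitrary, and the mapping in the comments is my understanding of what the runtime does underneath, not something the code itself shows.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Work in a fresh temporary directory so nothing else is touched.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "syscall-"));
const file = path.join(dir, "note.txt");

// Each call asks the OS to do the work: roughly open/write/read/close
// system calls on Linux, or CreateFile/WriteFile/ReadFile on Windows.
const fd = fs.openSync(file, "w+");
const written = fs.writeSync(fd, "hello");
const buf = Buffer.alloc(written);
fs.readSync(fd, buf, 0, written, 0); // read back from offset 0
fs.closeSync(fd);

console.log(written, buf.toString());
```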

Other things to note:

The compiler generates native machine code that can run straight on the processor. The executable file you get from the compiler, however, contains both the code and other needed data, for example, instructions on where to load the code in memory.

When you make a system call, it is just a special instruction in the machine code (syscall on x86-64, for example) that hands control to the OS.

React's .createElement()

I'm trying to dig into some of React's API. This is one of them. I refer to Paul O'Shannessy's talk, Building React from Scratch.

Thanks to JSX we rarely write this ourselves, but know that JSX compiles to vanilla JavaScript which uses this API.

function createElement(type, config, children /*, ...moreChildren */) {
  // Clone the passed in config (props). In React we move some special props off of this object (keys, refs).
  let props = Object.assign({}, config);

  // Build props.children. We'll make it an array if we have more than 1.
  let childCount = arguments.length - 2;
  if (childCount === 1) {
    props.children = children;
  } else if (childCount > 1) {
    props.children = [].slice.call(arguments, 2);
  }

  return {
    type,
    props
  };
}

In this simplified version, the type parameter is a string such as 'div', config is an object (or null), and each child is either a string or the result of another createElement() call.

The children end up under the props key of the returned object, as a single value when there is one child and as an array when there are several.
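Here is a short usage sketch of the simplified function. It is repeated in a compact form (with rest parameters instead of arguments) so the snippet runs on its own, and it is not React's real implementation. The nested calls are roughly what the classic JSX transform would produce for a small list; newer React versions compile JSX to a different runtime function.

```javascript
// Compact version of the simplified createElement, not React's actual code.
function createElement(type, config, ...children) {
  const props = Object.assign({}, config);
  if (children.length === 1) {
    props.children = children[0];
  } else if (children.length > 1) {
    props.children = children;
  }
  return { type, props };
}

// Roughly: <ul id="list"><li>one</li><li>two</li></ul>
const element = createElement(
  "ul",
  { id: "list" },
  createElement("li", null, "one"),
  createElement("li", null, "two")
);

console.log(JSON.stringify(element, null, 2));
```

The ul ends up with props.children as an array of two li objects, while each li has props.children set to a plain string.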

Introduction to Operating System

I have been really curious about operating systems. Since I left school before I got to this subject, I figured I could just learn it by myself. A couple of days ago I took the course on Udacity, taught by lecturers from Georgia Tech, which is also available on YouTube for free.

An operating system sounds really fancy. But it turns out to be just another computer program, like those we software engineers build day to day.

What differs is the purpose.

An operating system is a program that manages other programs.

It manages how each of our programs uses the available resources, such as the disk, memory, or processor.

It provides isolation and protection among applications, and hides hardware complexity behind system calls (a sort of API through which the OS uses the hardware on your behalf).

Examples of operating systems are macOS and Linux, both of which are Unix-like. Others include Android, iOS, and Symbian on mobile and embedded platforms.

A Little Question on FPS

Rendering optimization for web applications requires us to know how to keep the FPS (frames per second) up.

A display can run at 60 Hz, 120 Hz, 144 Hz, or some other value (known as its refresh rate).

60 Hz means that the display refreshes the image it shows 60 times a second. So we can put up 60 images each second (60 fps); any more than that won't be displayed. If we have fewer images, say 30 fps, then every two refreshes will display the same frame.
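The arithmetic behind this is simple enough to write down. In the sketch below, frameBudgetMs and refreshesPerFrame are names made up for this note, and the second function assumes the frame rate divides the refresh rate evenly.

```javascript
// Time available to produce one frame, in milliseconds.
function frameBudgetMs(refreshRateHz) {
  return 1000 / refreshRateHz;
}

// How many display refreshes each rendered frame stays on screen.
// Frames beyond the refresh rate are never shown, hence the Math.min.
function refreshesPerFrame(refreshRateHz, fps) {
  return refreshRateHz / Math.min(fps, refreshRateHz);
}

console.log(frameBudgetMs(60).toFixed(2)); // 16.67
console.log(refreshesPerFrame(60, 30)); // 2
console.log(refreshesPerFrame(60, 120)); // 1
```

So on a 60 Hz display, all the work for one frame has to fit in about 16.67 ms to hit 60 fps.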

The question I have been wondering about is: what if I'm just looking at a static web page, without any events being triggered on it?

Is the GPU still producing 60 fps of the exact same image, or does it not produce any image at all?

If we take a look at Wikipedia's definition on GPU:

A graphics processing unit (GPU), occasionally called visual processing unit (VPU), is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device.

I suppose the display refreshes its screen by showing the image from the frame buffer.

If no events are triggered, then the GPU does not need to alter the frame buffer.

What is the frame buffer anyway?

Well, go back to Wikipedia and we will find out that it is:

A framebuffer (frame buffer, or sometimes framestore) is a portion of RAM containing a bitmap that is used to refresh a video display from a memory buffer containing a complete frame of data.

It all makes sense now.

Another Browser Rendering Nitty Gritty

I wrote a post before on how browser rendering works, but I feel it was not enough.

There were certainly gaps in my head I wanted to fill, and I came across this conference talk, which I find really informative. Not the best presentation, but it's decent.

Check it out for yourself right here:

There are several key points I'd like to make about this video if you prefer not to watch it.

  • When the HTML parser reaches a <script> tag (one without async or defer), it halts parsing, fetches the script, and executes it before continuing to parse.

  • When a <script> tag is found and parsing is halted, the browser uses a separate thread to scan ahead for external images and CSS so it can fetch them in parallel. It also looks for further <script> tags to download, but it only downloads them early; they are still executed in order once the parser reaches them.

  • On initial render, it needs the whole Parse Tree to be finished before it proceeds to build the DOM, and it needs all DOM nodes and the CSSOM to be complete before they are combined into a Render Tree, and so on.

  • JavaScript can interfere with the DOM and CSSOM on initial render.

  • For subsequent renders, the browser reflows and repaints at a regular interval. Every time we mutate the DOM, the Render Tree is still altered immediately, but the browser does not immediately proceed to the next stage (layout). Altered nodes in the Render Tree are marked dirty, and at a regular interval a batch traverses the tree and finds all dirty nodes, so multiple dirty nodes can be reflowed and repainted in a single pass.

  • An immediate reflow occurs on several actions, such as changing the font size, resizing the browser, and accessing certain properties like node.offsetHeight!
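The last two points can be illustrated with a toy model that runs outside the browser. Everything here is a stand-in invented for this note: fakeNode imitates a DOM node, writing its height marks layout as dirty, and reading offsetHeight while layout is dirty counts as one forced reflow. Real browsers are far more involved, so treat the numbers as an illustration only.

```javascript
let reflows = 0;
let layoutDirty = false;

// Stand-in for a DOM node (an assumption of this sketch, not a real API).
function fakeNode(height) {
  return {
    _height: height,
    get offsetHeight() {
      // Reading layout while it is dirty forces an immediate reflow.
      if (layoutDirty) {
        reflows += 1;
        layoutDirty = false;
      }
      return this._height;
    },
    set height(px) {
      this._height = px;
      layoutDirty = true;
    },
  };
}

// Interleaved: every read that follows a write forces a reflow.
function doubleInterleaved(nodes) {
  for (const node of nodes) node.height = node.offsetHeight * 2;
}

// Batched: all reads first, then all writes.
function doubleBatched(nodes) {
  const heights = nodes.map((node) => node.offsetHeight);
  nodes.forEach((node, i) => {
    node.height = heights[i] * 2;
  });
}

function countReflows(fn) {
  reflows = 0;
  layoutDirty = false;
  fn([fakeNode(10), fakeNode(20), fakeNode(30)]);
  return reflows;
}

console.log(countReflows(doubleInterleaved), countReflows(doubleBatched)); // 2 0
```

In the batched version the layout is still dirty at the end, but it can wait for the browser's next regular reflow instead of forcing one in the middle of the script.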

Some Performance Insights
  • Do all reading in one go, then all writing in one go.