Git Behind The Scene: Objects

We need to understand that git consists of four distinct but related object.

These objects are:

  • commit
  • tree
  • blob
  • tag

They are saved in .git/objects/ directory.

We refer them to as objects because they are being compressed -- using a library called zlib (when repo gets large, it uses something called as packfiles).

So if you try to open it directly using text editor or something, it won't work. You need to decompress it first.

Git is very reliable, it is unlikely for Git to delete an object, unless you do a reset --hard. If you did, it will delete the commits, trees, and blobs that you create for the commits you specified in the reset (it won't delete objects created on previous snapshots).

Hashing

Hashing is a function which converts an arbitrary input size to a unique fixed-length output. The idea is that it has to be deterministic, and every input has to have it's own unique output, so there can't be two inputs with the same output, we refer them to as collision.

As far as I know, the hashing function that is being used is SHA-1.

Every Git object is being hashed for identification.

Commit Object

A commit object has to be the most familiar among developers who have used Git previously.

It holds information about the author of the snapshot, when it was saved, a description of why it was saved, and most importantly: reference to the snapshot itself (tree).
Commit Object

To hash a commit object, you are required to provide the following information:

  • commit message
  • committer
  • commit date
  • author
  • author date
  • tree hash (snapshot)
  • parent tree hash

Those are the stuff that are going to be hashed.

What's the format?

commit {size}\0{content}  

Where,

  • {size}: the number of bytes in {content}

  • {content}:

tree {tree_sha}  
{parents}
author {author_name} <{author_email}> {author_date_seconds} {author_date_timezone}  
committer {committer_name} <{committer_email}> {committer_date_seconds} {committer_date_timezone}

{commit message}

So how does it look like? Well, it's just a plain string. Not all the string is printable since it contains hidden characters like NUL, line feed, etc.

tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579  
author Scott Chacon <schacon@gmail.com> 1243040974 -0700  
committer Scott Chacon <schacon@gmail.com> 1243040974 -0700

first commit  

Tree Object

Format:

tree [content size]\0[Entries having references to other trees and blobs]  

Where,

  • [Entries having references to other trees and blobs]:
[mode] [file/folder name]\0[SHA-1 of referencing blob or tree]

What does it look like?

tree 192\0  
40000 octopus-admin\0 a84943494657751ce187be401d6bf59ef7a2583c  
40000 octopus-deployment\0 14f589a30cf4bd0ce2d7103aa7186abe0167427f  
40000 octopus-product\0 ec559319a263bc7b476e5f01dd2578f255d734fd  
100644 pom.xml\0 97e5b6b292d248869780d7b0c65834bfb645e32a  
40000 src\0 6e63db37acba41266493ba8fb68c76f83f1bc9dd  

Blob Object

Format:

blob [size of string] NUL [string]  

References

Unit Testing and TDD

After all research and experience on this topic, this is currently my latest understanding and brief note on unit testing and test driven development.

Unit testing is a form of test.

Test only units with complex logic.

It can also be used as a design purpose. Unit that is hard to test is a solid indication of bad code. Too much dependency means a lot of mocking, meaning high coupling.

TDD is a preference, and it’s job is as a guidance. It can also promotes YAGNI (you ain’t gonna need it), which will helps you to write only the codes which are necessary.

Another Browser Rendering Nitty Gritty

I did a post before on how browser rendering works previously though, but I feel it was not enough.

There are certainly gaps in my head I wanted to fill, and I came across this conference talk which I find really informative. Not the best presentation but it's decent.

Check your self right here:

There are several key points I'd like to make about this video if you prefer not to watch it.

  • When HTML parser reaches <script> tag, it will halt parsing, fetch the script and execute it before continuing to parse.

  • When a <script> tag is found and parsing is halted, the browser will create a new thread in a new process with the browser to search for external images and CSS to fetch in parallel. It probably will also look for another <script> tag to download, but I'm not sure about executing it.

  • On initial render, it needs all the Parse Tree to finished before it proceeds to making a DOM, and it needs all DOM nodes and CSSOM to be completed before being combined into a Render Tree, and so on.

  • JavaScript can interfere with the DOM and CSSOM on initial render.

  • For subsequent rendering, it will set at a regular interval when it will reflow and repaint. Every time we mutate the DOM, it will still be immediately alter the Render Tree, but then it will not immediately proceed to the next stage (layout). Altered nodes in Render Tree will be marked dirty, the batch will traverse the tree and find all dirty trees at a regular interval, so multiple dirty nodes can be reflow and repaint on a single flow.

  • Immediate reflow occur on several actions, such as doing a font size change, resizing the browser, and accessing several properties like node.offsetHeight!

Some Performance Insights
  • Do all reading in one go, writing in one go.

Clean Function

I have learnt for a couple of days prior about how to write a clean function from Robert C. Martin or otherwise known as Uncle Bob.

The whole idea in my opinion always embarks on how expressive it needs to be. If your code is readable and easy to understand, then you have written a clean code.

Sometimes we as a programmer wants to show our co-workers and the readers of our code how smart we are. We tend to provide fancy ways of doing stuff. Making everything as one-liner as it can. If they can't comprehend what we have written, well, they are just plain dumb!

Uncle Bob said that this is a work of an amateur. Professional work the other way around.

Professional want their code to be expressive, understandable, and readable. It needs to be as concise and simple as much as possible!

Why? Because we work as a team, we want to ship fast avoid bugs! If you are the only person responsible for that part of code you written (because nobody understand it except you), then what happen if you leave?

Got the idea?

Let's dive in.

Naming

Naming function is critical to your program.

It should be a verb and it needs to clearly explain what is the intention of the function you are writing, so fellow programmers does not need to dive inside your function to see what it does, a glance from the name itself should be enough!

Do not present hidden implementation (side effects) that goes beyond the function name. Don't say that your function does A, but turns out it also does B.

Small

The first rule he aligns is that function has to be small! Functions needs to be two, three, or four lines long.

Do one thing

Function has to have one responsibility only! It needs to focus on one specific thing and abstract other implementation into a separate function.

Extract big function into smaller chunks, but do not over extract function if a restatement is the only effect it yields.

Sometimes it's hard to do it all from the beginning. Rule of thumb is, write all your implementation function first, then when you are finished, refactor it.

First make it work, then make it right!

Big function that polished into small abstracted steps seems to do more than one thing. It might look as follows:

function foo() {  
  doA();
  doB();
  doC();
}

This little abstractions made the function look like as if it does a lot.

So does it does one thing or three things?

One level of abstraction per function

Well, Uncle Bob state that if the statements within our functions are all at the same level of abstraction, they are definitely do "one thing".

What does it mean abstraction, anyway?

Well, each function you extract from the big function is an abstraction, because you are hiding it's implementation detail by wrapping it to a new function.

Making use of the function does not require you to know how it does it, only what it does (from the name).

So, what is level of abstraction?

appendToArray() and parseHTML() is two different level of abstraction. appendToArray() has simpler implementation therefore lower abstraction level. You certainly know how it does the way it does. Meanwhile parseHTML() is much more complex, more magic is going on, you are not entirely sure how it parse HTML.

Making sure your function has the same level of abstraction is key. One way to look level of abstraction is through LOC (lines of code) it has.

Another way is that a function and all it's extracted method should read well like a paragraph.

Here's the snippet from Clean Code book by Uncle Bob.

To include the setups and teardown, we include setups, then we include the test page content, and then we include the teardown.

To include the setups, we include the suite setup if this is a suite, then we include the regular setup.

To include the suite setup, we search the parent hierarchy for the “SuiteSetUp” page and add an include statement with the path of that page.

To search the parent. . .

Arguments

The best number of argument for a function is zero. More arguments means more confusion. You need to remember what to pass, the order of what to pass, and more testing.

Less is better.

Except, if the system you are trying to create has a natural order and requirements. For instance is a coordinate number setCoordinate(x, y).

Exceptions

Exceptions are beneficial since it's able to simplify error handling.

But handling error itself is a specific one thing of what a function does.

Uncle Bob wants you to extract your try/catch block.

function foo() {  
  try {
    doBar();
  } catch (err) {
    logError(err);
  }
}

Single Responsibility Principle

This is a brand new concept I just learnt. Having this concept in mind will generally make us able to design a more robust software architecture.

Single Responsibility Principle is deeply related to Coupling and Cohesion, which also relates to the idea of separating concerns (The separation of Concerns).

Uncle Bob states that SRP is the idea in which:

Each software modules should have one and only one reason to change.

Well, I did not grasp it instantly the first time I hear about it. Reason to change? What the heck it's suppose to mean?

Turns out it's not that complicated though. Theoretically.

Single Responsibility Principle is all about people.

Who do you think will likely to care about the method you wrote on class Foo?

Who do you think will likely to request changes on the method you wrote on class Bar?

If more than a single person or -- as stated by Uncle Bob -- a single tightly coupled group of people representing a single narrowly defined business function, it is likely that you violate Single Responsibility Principle!

For instance (example from here):

public class Employee {  
  public Money calculatePay();
  public void save();
  public String reportHours();
}

The financial department will likely to care about calculatePay() method.

The IT department will likely to care about save() method.

The operational department will likely to care about reportHours() method.

If, for instance, financial department request changes on calculatePay() and accidentally breaks reportHours(), whose fault is this!?

If each methods otherwise be separated to it's own class, this sort of accident would not have happened!

Concise Cohesion and Coupling

Cohesion and coupling is rather a common concept among programming languages. It derives several principles such as Single Responsibility Principle.

Cohesion

Cohesion simply means specific on what it's doing, or in other word, focus on one thing only!

It refers to all software entities: modules, classes, functions.

Cohesion is the degree to which elements of a whole belong together. Methods and fields in a single class and classes of a component should have high cohesion. High cohesion in classes and components results in simpler, more easily understandable code structure and design. - Robert C. Martin

Rule of thumb: high cohesion and low coupling.

But you will notice that low coupling and high cohesion are contradict to one and another. When you're trying to decouple an entity, you will find yourself having a low cohesive entity.

On classes or modules, it rather easier to identify either they are doing single or multiple things.

Low Cohesive:

public class Staff() {  
  public void checkEmail() {}
  public void sendEmail() {}
  public void emailValidate() {}
  public void printLetter() {}
}

Staff is responsible on multiple task, which violate cohesiveness.

High Cohesive:

public class Staff() {  
  private double salary;
  public void setSalary() {}
  public double getSalary() {}
  public void printLetter() {}

On a function, it's a bit tricky to see if a function is focused.

Large function that has been extracted to smaller pieces turns into function comprises set of steps.

It might seen as follows:

function foo() {  
  doA();
  doB();
  doC();
}

Clearly the function does three things.

According to Uncle Bob, we can say a function is focused if each steps contains the same level of abstraction.

What it means briefly is that the function has to be equal in terms of implementation detail! For instance, sum('2', '5') is a much lower abstraction level than renderPage(). Fastest way to notice abstraction level of a function is too see LOC (lines of code) it has. It does not matter if the next function also comprises of a set of steps as long as each steps has it's abstraction level equal to it's caller abstraction level.

So continuing the previous example, doA() might look like:

function doA() {  
  doD();
  doE();
  doF();
}

Another trick is, if you can't extract another function from it (nor extracting merely as a restatement*), then your function is focused.

*such as extracting an if statement to it's own function.

A function that receives a flag or ones which has a switch statement are likely to do a lot of things.

function receivesFlag(let isTrue) {  
  if (isTrue) {
    doSomething();
  } else {
    doSomethingElse();
  }
}
function handleCases(let case) {  
  switch(case) {
    case A: {
      doFoo();
    }
    case B: {
      doBar();
    }
  }
}

Coupling

Coupling is a concept for classes, modules, and component.

Two classes, components or modules are coupled when at least one of them uses the other. The less these items know about each other, the looser they are coupled. A component that is only loosely coupled to its environment can be more easily changed or replaced than a strongly coupled component. - Robert C. Martin

It refers to how related classes/modules and how dependent they are on each other, and the difficulty on replacing either entity.

High coupling would make your code being difficult to perform any changes (maintain) as well as being replaceable.

This is due to the fact that when you change your dependencies, it is likely that you need to update code that depends on it.

This is true for me, there are times when I import a community-driven module for my codebase, and when that module was being updated, it breaks my current code, therefore requiring my code to adapt to the latest version.

As for the replaceability issue, your code functionality depends on many of your dependencies. Changing your code functionality as a whole would be difficult without changing the dependencies itself, therefore hard to replace.

Low coupling is the preferred methodology.

Tightly coupled code experience much harder maintenance (because changing your dependencies means changing your code) and difficult to replace.

Some guy on the internet made an analogy on this using iPhone an another typical smartphone:

On an iPhone, if you ever find yourself having a broken battery, the cost to replace the battery is so expensive such that you're better to replace the entire iPhone itself, whereas on other smartphone you can easily replace it with a new battery.

Contributing to Open Source

I've learn from a couple of articles & tips here and there might help you to contribute to open source as well:

A tip from Dan Abramov:

Open source tip: you’ll learn A LOT from taking a single project you actively use, “watching” it on GitHub and reading every issue and PR.

It won’t make a lot of sense at first but stick with it.