Workflow, Github Global Pull Requests

Posted by Justin Reagor Sat, 28 Feb 2009 17:14:00 GMT

Just got this question and thought I'd make a blog post out of it.

why would someone want me to pull his changes into MY fork of this project?

This happens all the time, usually on popular projects. There are a few reasons, which lend themselves nicely to a discussion of Git[hub] distributed workflows.

The Short Answer

Git's distributed architecture can ensure that no single entity is the "central repository". Thus when someone feels they have important changes that all the forks and mirrors should utilize, they send out a global pull request to all or some of the forks on Github.

An Even Shorter Answer

It's a personal preference, one of Git's many available workflows.

Hardcore Forking

Personally, I have never worked that way on Github. I have done two things in maintaining forks.

  • Fork fork; my fork is an entirely separate entity with a separate development cycle and process (or even end result)

  • Support fork; my fork is a fork of the "sanctioned" main repo, and I am doing support work for the main project. I send pull requests to the "gate keeper" of that project (the main author or project maintainer).

Now, if I really felt something was urgent for all the forks, I might do a global pull request. That would be the case if the main repo author didn't like my commit or had gone AWOL, and I still felt the change was important (security, etc).

Otherwise, fork maintainers should keep a local branch tracking the remote repository of each other fork maintainer they care about. This is what I do to track only the people I think have good commits that I can pull and include in my fork. Branches in this respect provide the links to other "node" repositories in the distributed glory which is Git.
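Here's a rough sketch of that setup, played out in a throwaway sandbox. The local repos `upstream`, `alice`, `bob` and `mine` are made-up stand-ins for forks on Github, and the `alice-master` style of branch name is just one possible convention, not anything Git requires.

```shell
#!/bin/sh
# Sandbox sketch: local repos stand in for Github forks (all names hypothetical).
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"

# Wrapper so commits work even without a global git identity.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# The "sanctioned" main repo with a single commit.
git init -q upstream
(
  cd upstream
  git symbolic-ref HEAD refs/heads/master   # pin the branch name for the demo
  echo "base" > README
  git add README
  g commit -q -m "Initial commit"
)

# Three forks of it: two other maintainers and me.
git clone -q upstream alice
git clone -q upstream bob
git clone -q upstream mine

# In my fork, add each maintainer I trust as a remote and
# keep a local branch tracking their master.
cd mine
for fork in alice bob; do
  git remote add "$fork" "../$fork"
  git fetch -q "$fork"
  git branch -q --track "${fork}-master" "$fork/master"
done

git remote -v      # the "node" repositories I follow
git branch -vv     # local branches and what they track

# Which remote branch does alice-master follow?
tracked=$(git rev-parse --abbrev-ref "alice-master@{upstream}")
echo "$tracked"
```

With real forks the remote URLs would point at each maintainer's Github repository instead of a sibling directory; everything else stays the same.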

The Git man pages and docs suggest making the development decision to only pull in new commits and never push out. Tracking remote repos demonstrates the beauty of following this "pull only" workflow.

Just Pull It

For shits and giggles, let's build an example to demonstrate why "pull only" might be such a great idea for distributed project development.

I can think of no better example than Roy Fielding and the IETF's HTTP standard.

The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless, protocol which can be used for many tasks beyond its use for hypertext...

We are going to map the following HTTP technologies to git commands.

  • git-push maps to Comet, a neologism to describe a web application model in which a long-held HTTP request allows a web server to push data to a browser, without the browser explicitly requesting it.

  • git-pull maps to Ajax, an acronym which describes a web browser technology capable of requesting only the content that needs to be updated, thus drastically reducing bandwidth usage and load time.

Comet, in general, uses the web server to push updates to the browser when they are available. Handy at times, but it works against HTTP's stateless request/response model. It requires the server side to maintain a connection and known state, as the server needs to know who the client is and where to send the new data. As we all know, that's not very efficient in the grand scheme of HTTP. State is tough to keep synchronized. It limits the ease of going on and offline, as well as the client moving around across the internet (changing IPs, etc).

In contrast, Ajax requests are much lighter. The client decides when to request new content, and so it also decides how to handle that content, what it is and where it's coming from. When you are only doing git-pull, you are choosing what to add and where to add it. There is more control over your own repository.
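A minimal sketch of what "pull only" might look like in practice: fetch another fork's commits, look them over, and merge only once you are happy. As an assumption for the demo, the repos `upstream`, `alice` and `mine` are throwaway local directories standing in for Github forks, and the commit message is invented.

```shell
#!/bin/sh
# Pull-only sketch: nobody pushes; I fetch, review, then merge (names hypothetical).
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"

# Wrapper so commits work even without a global git identity.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Shared starting point, then two forks of it.
git init -q upstream
(
  cd upstream
  git symbolic-ref HEAD refs/heads/master   # pin the branch name for the demo
  echo "base" > README
  git add README
  g commit -q -m "Initial commit"
)
git clone -q upstream alice
git clone -q upstream mine

# Alice commits a fix in her own fork. She pushes nothing to me.
(
  cd alice
  echo "fix" > fix.txt
  git add fix.txt
  g commit -q -m "Fix a bug"
)

# I ask for her changes when I choose to.
cd mine
git remote add alice ../alice
git fetch -q alice

# Review step: what would I be taking on?
git log --oneline master..alice/master
incoming=$(git rev-list --count master..alice/master)

# Accept step: merge only after the review.
g merge -q alice/master
merged=$(git log -1 --format=%s)
echo "merged $incoming commit(s), tip is now: $merged"
```

The split between `git fetch` and `git merge` is the point here: `git pull` does both at once, while doing them separately leaves room to inspect or reject the incoming commits first.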

Summing it all Up

In a distributed environment everyone becomes the gate keeper to their own repository.

This is great for open source projects because the code is always kept free from central dictatorship.

Power is gained through this autonomous workflow when you allow anyone, at any moment, to add changes and bug fixes to the project or pull down someone else's updates for review and merging.

when timeouts aren't timeouts

Posted by Colin A. Bartlett Tue, 10 Feb 2009 00:11:00 GMT

Last Friday, a good chunk of our team spent a while debugging a problem afflicting some sites on one of our shared hosting machines.

After a bunch of red herrings, some TCP dumps determined that the sites in question were all hung up on numerous DNS requests. That led us to the call to a web service we use on the sites in question. As it turns out, the IP geocode service being called, HostIP.info, was completely down. (It’s free; you get what you pay for.)

We assumed this couldn’t be the case, since those calls are wrapped in Ruby Timeout calls with a maximum of 2 seconds each (Timeout.timeout(2)). Well, apparently, Ruby Timeout just doesn’t work sometimes.

Big props to Andy for sleuthing that down. And bigger props to Philippe Hanrigou for writing that very detailed article where he says, specifically:

“In particular, initiating network connections and/or a broken or slow DNS server will typically block the whole Ruby process while the call completes.”

Bingo! We’re going to try Philippe’s library on these projects and others in the future when we need guaranteed timeouts.