February 26, 2010

Git – check your config

Posted in General at 8:46 pm by Srinivasan

If you work on several projects, say a work project plus a side project or an open-source project, it’s important to get the author name and author email id right when you commit.

You don’t want marty@localhost or xxx1337.AT.yahoo.COM showing up as the author id in the corporate repository. Some open-source projects reject commits that don’t come from a registered email id.

You probably set the git global config to a username/email id once and forgot about it. So the tips are:

1) Set the global config to the name/id that matches the primary purpose of the machine: your personal id on a home machine, your office id on an office machine.

git config --global user.name "My name is..."
git config --global user.email my_email@domain.com

2) Use git local config on sensitive projects to override the global config settings. The easiest way to override the config is to run
git config user.name "My name for this project is..." in your local project folder. Override user.email the same way.
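
To see the override in action without touching your real settings, here is a minimal sketch. It assumes git is installed; the function name demo_local_override, the scratch directories, and both email addresses are made up for illustration. It points HOME at a throwaway directory so that the "global" config it writes is not your own.

```shell
#!/bin/sh
# Sketch: a repository-local user.email overriding the global one.
# HOME points at a scratch directory, so your real ~/.gitconfig is untouched.
demo_local_override() {
    scratch=$(mktemp -d) || return 1
    (
        export HOME="$scratch/home"
        unset XDG_CONFIG_HOME   # keep --global writes inside the scratch HOME
        mkdir -p "$HOME" "$scratch/project" &&
        git config --global user.email "me@home.example" &&
        cd "$scratch/project" &&
        git init -q . &&
        git config user.email "me@work.example" &&
        echo "global: $(git config --global --get user.email)" &&
        echo "effective: $(git config --get user.email)"
    )
    status=$?
    rm -rf "$scratch"
    return $status
}

# Run the demo only when git is available.
if command -v git >/dev/null 2>&1; then
    demo_local_override
fi
```

The second line of output is the one that matters: inside the project folder the local value is reported, because local settings take precedence over global ones.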

You can edit the whole config file in vi (or whichever command-line editor git is set up to use) with
git config --global -e
git config -e (to edit local config)

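Finally, run git config -l in the project folder to see the merged config information that will be used when you commit. When a key is listed more than once, the last value wins, so local settings override global ones.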
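
If you would rather not rely on remembering to check, a pre-commit hook can do the check for you. The following is only a sketch: the helper name email_matches_domain and the domain example.com are assumptions, and a real project would adapt the policy to its own rules.

```shell
#!/bin/sh
# email_matches_domain EMAIL DOMAIN
# Succeeds (exit 0) only if EMAIL is a non-empty local part followed by @DOMAIN.
email_matches_domain() {
    case "$1" in
        ?*@"$2") return 0 ;;
        *)       return 1 ;;
    esac
}

# Intended use inside .git/hooks/pre-commit (example.com is a placeholder):
#
#   email=$(git config user.email)
#   if ! email_matches_domain "$email" "example.com"; then
#       echo "Refusing to commit as '$email'; expected an example.com address." >&2
#       exit 1
#   fi
```

Hooks live in .git/hooks and are not copied by git clone, so each clone of the project needs its own copy of the hook.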

N.B.
Install bash completion for git if you are not using it already.

http://justamemo.com/2009/02/09/bash-completion-along-with-svn-and-git-tab-completion/

December 25, 2009

It’s time to git

Posted in General, Software Development, Uncategorized at 5:50 pm by Srinivasan

Every now and then a new technology comes along, but few gather momentum and finally get adopted by the masses. Git is certainly on the right track, and GitHub has certainly fueled its adoption.

Git is most effective, and fastest, when used at the command line. There are efforts to build UIs around it, such as an Eclipse plugin, but they aren’t complete yet. I am more comfortable at the terminal, so I haven’t checked the UI progress lately.

With agile practices like pair programming combined with distributed development, people want a distributed source control system that is snappier and comes with tools.

A couple of interesting things I like about git are:

Git-Daemon: The git-daemon utility bundled with the git release is a quick way to share your code across the network. Say you are at a barcamp or a coffee meetup with a friend. He can share his local git repositories over the network just by running

git daemon --base-path=parent_path_to_the_repo --export-all

And you can clone his repository to your machine with

git clone git://server-location/repo

Git-SVN: If you use SVN as the production repository for your source code, you can still use git locally. git-svn lets you sync your local commits to SVN directly. This is another reason to start using git locally: you get all its benefits and still check into SVN as the corporate setup requires.

Git-Stash: Git stash can be thought of as a saved coding context. Say you have modified a couple of files to fix bug 121. You can stash those changes, which saves them and reverts the working tree to the HEAD (clean) state, so you can attack bug 75 and commit it before bringing back the changes for bug 121. Stashes are easy to create and can be labeled with a message, which is convenient when you have several.
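
The bug 121 / bug 75 scenario can be sketched in a throwaway repository as follows. This is an illustration under a few assumptions: git 2.13 or later (for git stash push), and made-up file names, messages, and identity. The function name demo_stash is hypothetical too.

```shell
#!/bin/sh
# Sketch: park unfinished work with git stash, commit another fix, resume.
demo_stash() {
    scratch=$(mktemp -d) || return 1
    (
        # Isolate the demo from your own git configuration.
        export HOME="$scratch/home" GIT_CONFIG_NOSYSTEM=1
        unset XDG_CONFIG_HOME
        mkdir -p "$HOME" "$scratch/repo" &&
        cd "$scratch/repo" &&
        git init -q . &&
        git config user.name "Demo User" &&
        git config user.email "demo@example.com" &&
        echo "v1" > app.txt &&
        git add app.txt &&
        git commit -q -m "Initial commit" &&
        # Start on bug 121, then get interrupted.
        echo "half-done fix for bug 121" >> app.txt &&
        git stash push -q -m "bug 121" &&
        # The working tree is clean again, back at HEAD.
        [ -z "$(git status --porcelain)" ] &&
        # Fix and commit bug 75 on the clean tree.
        echo "fix for bug 75" > other.txt &&
        git add other.txt &&
        git commit -q -m "Fix bug 75" &&
        # Bring the bug 121 changes back.
        git stash pop -q &&
        tail -n 1 app.txt
    )
    status=$?
    rm -rf "$scratch"
    return $status
}

# Run the demo only when git is available.
if command -v git >/dev/null 2>&1; then
    demo_stash
fi
```

The last line printed is the unfinished bug 121 change, restored on top of the bug 75 commit. With several stashes in flight, git stash list shows them along with their messages.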

The Dilemma:

For those still saying, ‘Yeah, git is cool, but with the whole distributed thing, isn’t there a chance that I lose control of the code my developers write for me? How do I track them?’

Checking code into the repository often is a matter of discipline; the problem can arise with any repository. With git you can ask your new developer to share his local git repository so you can review his work, rather than waiting until he gets access to the central repository and checks in his code. In fact, git lets you pull code and give feedback before anything gets checked in centrally.

Getting a developer access to the central repository is normally a long process in any corporation. Instead of waiting, the developer can start coding, and as a lead you can keep track of progress.

The presentation at http://www.slideshare.net/err/git-machine is worth a look for those seeking patterns to control the repository effectively: starting from slide no. 72, the author points out several patterns (Anarchy, Blessed, Lieutenant, & Centralized) to manage the repository.

With all that said, it’s time everyone considered a distributed source control system: it enables developers, and with the pattern you choose to control your repository it’s a win-win.

More Links:

  • How Git Index/Staging Area simplifies commit – http://plasmasturm.org/log/gitidxpraise/
  • A Git Branching Model – http://nvie.com/archives/323

November 6, 2009

    Serialization/Streaming Protocols: What we got?

    Posted in Software Development at 5:40 pm by Srinivasan

    It takes a huge effort to build a friendly API and a community around it. But once you have a popular service API, the next challenge is handling the traffic. It doesn’t have to be an external API; it can be your web front-end posting requests to the backend service layer.

    As the user base explodes, every bit saved is bandwidth and money saved. This applies to mobile clients as well. With things hosted in the cloud these days, it matters how much bandwidth you use and how few resources you consume.

    Two things magnify the problem:

    1) User base: if the user base is really large, then even transferring 1MB per user over the wire is going to hit the wall. Imagine 1 million users trying to access your webpage.

    2) Amount of data transferred: if you are transferring huge amounts of data, say your website is a cloud-based storage system or an online cloud database, then again performance is going to hit the wall soon.
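
To put a rough number on the first point, here is a back-of-the-envelope sketch. The helper name total_transfer_gb is made up, and it uses decimal units (1 GB = 1000 MB).

```shell
# total_transfer_gb USERS MB_PER_USER
# Prints the total transfer in GB, using decimal units (1 GB = 1000 MB).
total_transfer_gb() {
    echo $(( $1 * $2 / 1000 ))
}

total_transfer_gb 1000000 1    # 1 million users x 1MB each
```

That works out to 1000 GB, about a terabyte, for a single page view per user.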

    So to move your objects from server to client, you need to look at several serialization options. I will start with some standard ones, and then list some recent ones that sound interesting.

    XML:

    Human-readable and machine-parseable at the same time, but probably the most verbose serialization option we have. The human-readable advantage also fades quickly as the size of the XML file goes up.

    JSON:

    JSON (pronounced like Jason) stands for JavaScript Object Notation. It’s pretty popular with AJAX and JavaScript-based web libraries. It keeps the data compact and saves us from the verbosity of XML. The JSON format supports only text data and doesn’t have native support for binary data.
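
As a rough illustration of the difference in verbosity, the sketch below encodes the same made-up record (an id and a name) both ways and counts the characters. The record and its field names are invented for the example, and real payloads will differ.

```shell
# The same hypothetical record, once as XML and once as JSON.
xml='<user><id>42</id><name>Ada</name></user>'
json='{"id":42,"name":"Ada"}'

# ${#var} expands to the length of the string in characters.
echo "XML: ${#xml} characters"
echo "JSON: ${#json} characters"
```

Here the XML form comes to 40 characters and the JSON form to 22, largely because every XML field name is written twice.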

    Hessian:

    Hessian has been around for a while, and it is quite popular in the J2ME world because of its small dependencies and efficient binary protocol. Starting from the Hessian 1.0 spec, it has now reached Hessian 2.0. The Hessian 2.0 spec seems quite comparable with the more recently released protocols.

    Protocol Buffers:

    Coming from Google, it can be expected to offer good scalability and performance. It supports both a text and a binary format; the binary format is what gets sent across the wire. You first create an interface file (.proto) describing the fields, and compile it to classes in Java or any other supported language. Then you can serialize and deserialize between the binary format and objects in your language. The main drawback is that you have to specify the interface and compile it to classes, but having things statically compiled gives you some performance advantages. It also supports binary data in the message structure.
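
The define-compile-use cycle might look like the sketch below. The message name User, its fields, and the file name messages.proto are all made up for illustration; the protoc invocation is shown as a comment because it needs the Protocol Buffers compiler installed.

```shell
# Print a small, hypothetical interface definition in proto2 syntax.
print_user_proto() {
    cat <<'EOF'
syntax = "proto2";

message User {
  required int64  id     = 1;
  optional string name   = 2;
  optional bytes  avatar = 3;  // binary data is a native field type
}
EOF
}

# Define: write the interface file.
#   print_user_proto > messages.proto
# Compile: generate Java classes from it.
#   protoc --java_out=. messages.proto
```

The generated classes are then what your service and client code use to build, serialize, and parse User messages.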

    Apache Thrift:

    Thrift was originally created and used at Facebook, and later released as an Apache open-source project. It is pretty similar to Protocol Buffers, with the same define-compile-use cycle: you define the message structure in a .thrift file, compile it with the thrift compiler, and use the generated code in your services and clients. Apache Thrift has poor documentation compared to the other protocols.

    Apache Avro:

    This is one of the sub-projects of Apache Hadoop, a Java framework inspired by Google’s MapReduce. Yahoo! contributes heavily to the project and is said to use it extensively in its infrastructure. One of Avro’s design goals is to support dynamic typing, that is, to be able to exchange information without the compile-use cycle. The schema of the data structure is defined in JSON and exchanged during the initial interaction; for the rest of the transfers, the client uses that schema to read the data.
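
For comparison, an Avro schema is plain JSON. The record below is a made-up example (the name User, the namespace, and the fields are assumptions), printed from a shell function only so that it can be saved to a file such as user.avsc.

```shell
# Print a small, hypothetical Avro record schema.
print_user_schema() {
    cat <<'EOF'
{
  "type": "record",
  "name": "User",
  "namespace": "example",
  "fields": [
    {"name": "id",     "type": "long"},
    {"name": "name",   "type": "string"},
    {"name": "avatar", "type": ["null", "bytes"], "default": null}
  ]
}
EOF
}

print_user_schema
```

Because a schema like this is exchanged up front, a reader can decode the records without generated classes.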

    BERT & BERT-RPC:

    BERT stands for Binary ERlang Term. It is based on Erlang’s binary serialization format. The author of this format is a founder of GitHub. The GitHub team posted an article on how they improved the performance of their site using this new protocol. Their main reason for not using Protocol Buffers or Thrift is the mundane define-compile-use cycle. Instead they created a protocol that supports dynamic data format definition: the data itself contains meta-information about its structure, which the client can read on the go. With GitHub being a huge repository of open-source projects, and people forking, checking in, and checking out huge code bases, we can imagine the traffic they handle; BERT must compare well to be a viable alternative to Protocol Buffers and Thrift.

    Let’s see what improvements and comparison reports the future brings for these protocols.

    Links:

    Click on a protocol name in the article above to go to the relevant page. Some more links are below.

    http://hessian.caucho.com/doc/hessian-serialization.html#anchor2

    http://github.com/blog/531-introducing-bert-and-bert-rpc
