11.06.09

Serialization/Streaming Protocols: What have we got?

Posted in Software Development at 5:40 pm by Srinivasan

It takes a huge effort to build a friendly API and a community around it. But once you have a popular service API, the next challenge is handling the traffic. It doesn’t have to be an external API; it can be your own web front end posting requests to the backend service layer.

As the user base explodes, every bit saved is bandwidth and money saved. This applies to mobile clients as well. With so much hosted in the cloud these days, it matters how much bandwidth you use and how few resources you consume.

Two things magnify the problem:

1) User base: if the user base is really large, then even transferring 1 MB per user over the wire is going to hit a wall. Imagine 1 million users trying to access your web page.

2) Amount of data transferred: if you are transferring huge amounts of data, say your website is a cloud-based storage system or an online cloud database, then performance is going to hit a wall just as quickly.
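To put a number on the first point, here is a rough back-of-the-envelope sketch. The 1 MB payload and the 1 million users come from the text; the 1 Gbit/s link speed is purely an assumption for illustration, and the class and method names are made up for this example.

```java
final class BandwidthEstimate {
    // Total bytes sent if every user downloads the same payload once.
    static long totalBytes(long users, long bytesPerUser) {
        return Math.multiplyExact(users, bytesPerUser);
    }

    // Seconds needed to push that many bytes through a link of the given speed.
    static double secondsOnLink(long totalBytes, long linkBitsPerSecond) {
        return (totalBytes * 8.0) / linkBitsPerSecond;
    }

    public static void main(String[] args) {
        long users = 1_000_000L;          // figure from the text
        long payload = 1_000_000L;        // 1 MB, taken here as 10^6 bytes
        long link = 1_000_000_000L;       // assumed 1 Gbit/s uplink (hypothetical)

        long total = totalBytes(users, payload);
        System.out.println(total + " bytes in total");
        System.out.println(secondsOnLink(total, link) + " seconds at full link speed");
    }
}
```

Even under these generous assumptions the total is on the order of a terabyte, which is why shaving bytes off each response adds up quickly.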

So, to move your objects from server to client, you need to look at several serialization options. I will start with some standard ones and then list some recent ones that sound interesting.

XML:

Human-readable and machine-parseable at the same time, but probably the most verbose serialization option we have. The human-readable advantage also goes down very quickly as the size of the XML file goes up.

JSON:

JSON (pronounced “Jason”) stands for JavaScript Object Notation. It’s pretty popular with AJAX and JavaScript-based web libraries. It keeps the data compact and saves us from the verbosity of XML. JSON supports only text data and has no native support for binary data, so binary values have to be encoded as text (for example, as Base64).
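A small sketch can make the two claims above concrete: the same record is shorter in JSON than in XML, and binary data grows when it has to travel as text. The record (an id and a name) and the helper names are hypothetical, and the JSON builder is deliberately naive (no escaping); only java.util.Base64 from the JDK is used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class PayloadSize {
    // Hand-built XML for a hypothetical two-field record.
    static String asXml(int id, String name) {
        return "<user><id>" + id + "</id><name>" + name + "</name></user>";
    }

    // Hand-built JSON for the same record (naive: no escaping of the name).
    static String asJson(int id, String name) {
        return "{\"id\":" + id + ",\"name\":\"" + name + "\"}";
    }

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Binary data has to be turned into text before it can sit in a JSON string.
    static String base64Text(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        System.out.println("XML bytes:  " + utf8Length(asXml(42, "Srini")));
        System.out.println("JSON bytes: " + utf8Length(asJson(42, "Srini")));

        byte[] blob = new byte[300]; // stand-in for some binary attachment
        System.out.println("raw bytes: " + blob.length
                + ", Base64 chars: " + base64Text(blob).length());
    }
}
```

Base64 turns every 3 bytes into 4 characters, so binary payloads carried inside JSON grow by roughly a third.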

Hessian:

Hessian has been around for a while, and it is quite popular in the J2ME world because of its small dependencies and efficient binary protocol. It started with the Hessian 1.0 spec and has now reached Hessian 2.0. The Hessian 2.0 spec seems quite comparable with any of the more recent protocols.

Protocol Buffers:

Coming from Google, we can reasonably assume it offers great scalability and performance. It supports both a text and a binary format; the binary format is what goes across the wire. You first have to create an interface file (.proto) describing the fields and compile it to classes in Java or any other supported language. Then you can serialize your objects to the binary format and deserialize them back. The main drawback is that you have to specify the interface and compile it, but having things statically compiled gives you some performance advantages. It also supports binary data in the message structure.
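Part of what makes such a binary format compact is that small numbers take few bytes. The Protocol Buffers wire format stores integers as base-128 “varints”; the sketch below is a hand-rolled illustration of that idea, not the library’s API, and the class and method names are made up for this example.

```java
import java.io.ByteArrayOutputStream;

final class VarintSketch {
    // Encode a value as a base-128 varint: 7 payload bits per byte,
    // least significant group first, high bit set on every byte except the last.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Reverse of encode: stitch the 7-bit groups back together.
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // last byte of the varint
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] encoded = encode(300);
        StringBuilder hex = new StringBuilder();
        for (byte b : encoded) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println("300 as varint: " + hex.toString().trim());
        System.out.println("decoded back:  " + decode(encoded));
    }
}
```

The value 300 fits in two bytes here, whereas a fixed-width long would always take eight, and the same number written as text inside XML tags would take more still.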

Apache Thrift:

Thrift was originally created and used within Facebook, and later released as an Apache open source project. It is pretty much similar to Protocol Buffers, with the same define-compile-use cycle. You define the message structure in a .thrift file, compile it with the Thrift compiler, and use the generated code in your services and clients. Apache Thrift has poor documentation compared to the other protocols.

Apache Avro:

This is one of the sub-projects of Apache Hadoop, a framework for Java inspired by Google’s MapReduce. Yahoo! contributes heavily to this project, and they are said to use it extensively in their infrastructure. One of Avro’s design goals is to support dynamic typing, that is, to be able to exchange information without the compile-use cycle. The schema of the data structure is defined in JSON and exchanged during the initial interaction; for the rest of the transfers, the client uses that schema to read the data.

BERT & BERT-RPC:

BERT stands for Binary ERlang Term. It is based on Erlang’s binary serialization format. The author of this format is a founder of GitHub. The GitHub team posted an article on how they improved the performance of their site using this new protocol. Their main reason for not using Protocol Buffers or Thrift is that you have to go through the mundane define-compile-use cycle. Instead they created this protocol, which supports dynamic data format definition: the data itself carries meta-information about its structure, so the client can read it on the go. With GitHub being a huge repository of open source projects, with people forking branches and checking huge code bases in and out, we can imagine the traffic they handle; BERT must perform comparably to be a credible alternative to Protocol Buffers and Thrift.

Let’s see what improvements and comparison reports the future brings for these protocols.

Links:

Click on a protocol name in the article above to go to the relevant page. Some more links are below.

http://hessian.caucho.com/doc/hessian-serialization.html#anchor2

http://github.com/blog/531-introducing-bert-and-bert-rpc

07.09.08

Using Spring Web Flow 2

Posted in Software Development at 9:44 pm by Srinivasan

I recently got the opportunity to work with Spring Web Flow 2 on a project; here I share my personal views on it.

Let me first tell you all the nice things about the recent Spring stack (Spring 2.5 and above). Two things that improved a lot with the recent releases are annotation support and specific namespaces.

Annotations let you spend more of your time writing code than wiring components through XML. Of course, Spring fails fast if you have messed up a configuration, but annotations are still a lot better at avoiding that in the first place. With the improved @Repository, @Service and @Component, it’s easy to configure beans with specific responsibilities by default.

Namespace improvements help keep the XML configuration minimal and free of typos. Schema definitions help validate your configuration as you type, and with the convention-over-configuration approach they have reduced the lines of XML we need to wire up objects. If you want to replace a component with your custom implementation, sometimes it’s easy using the auto-wire option; sometimes you have to configure it the old way (i.e. using the beans namespace and manually declaring most of the configuration), which is more painful once you have gotten used to the new way.

With the Spring Test framework it’s fairly easy to write integration test cases. With a simple annotation, Spring will automatically load the application context on test start-up. With @Timed you can even clock your test method and make it fail if it exceeds the specified time. It also supports transactional tests with automatic rollback by default, so you can write tests that don’t dirty up the database.

Let’s come back to the original topic, Spring Web Flow. Spring Web Flow works as advertised: it is for applications that have a natural business flow behind them, where the UI acts as a way to capture input for the flow and to display something back. It is not for applications with requirements other than that.

Everything is a flow. Each flow has a starting point and an end point, and can have any number of transitions in between. As part of a transition you can go to a sub-flow and come back to the original flow later, but these transitions can only happen at predefined places in the flow. It will be tough to implement a free-flow (random browse) kind of application with it.

It serializes all the information you add to the flow context and restores it as you resume a flow after UI interaction, so every object you put there (entities, repositories, and so on) should implement Serializable. This restricts what you can share in the flow context.

Most of the decisions for transitions can easily be handled in the flow definition, which avoids creating Action classes that just return the outcome.
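The Serializable restriction can be seen with plain Java serialization, without Spring at all. The sketch below is only an illustration using the JDK; the Order, OrderRepository and BadState classes are hypothetical stand-ins for whatever you might put into flow state.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class FlowStateSketch {
    // Hypothetical entity that is safe to keep in serialized state.
    static class Order implements Serializable {
        private static final long serialVersionUID = 1L;
        String item = "book";
    }

    // Hypothetical helper that does NOT implement Serializable.
    static class OrderRepository {
    }

    // Hypothetical state holder that drags the repository along with it.
    static class BadState implements Serializable {
        private static final long serialVersionUID = 1L;
        Order order = new Order();
        OrderRepository repository = new OrderRepository();
    }

    // Returns true if the whole object graph can be written out.
    static boolean canSerialize(Object candidate) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            return false; // some object in the graph is not Serializable
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Order alone:        " + canSerialize(new Order()));
        System.out.println("Order + repository: " + canSerialize(new BadState()));
    }
}
```

One non-serializable field anywhere in the graph is enough to make the whole write fail, which is why repositories and similar helpers are awkward to keep in the flow context.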

In the JSF UI:

<h:commandButton action="save" />

In the flow definition:

<view-state … >
    <!-- the transition runs only if the expression returns true -->
    <transition on="save">
        <evaluate expression="validator.validate(model)" />
    </transition>
</view-state>

As you can see, you don’t need an Action class that returns the outcome ‘save’; you can specify the transition directly on the command button. Now you could ask: what if ‘save’ should be returned only on a certain condition (say, only after validation passes on the entity)? For that you can have an expression executed on the transition. The transition will execute only if the validator returns true; if the validator returns false, it comes back to the same view. The expression accepts any EL method expression, not just a validator, so you can run any action before the transition.

In effect, the method executions in the Action class are moved to the flow definition. This looks elegant only if the number of calls made at a transition is small, or if your application is well thought out and designed to share little information in state, keeping the method calls down. (Basically this is a nice feature, but it would go awry for huge apps, and for apps that have no definite business flow behind them.)

Spring Web Flow also supports inheritance of flows, so you can inherit common transition rules from a parent flow. This is a nice feature for keeping the definitions as DRY as possible.

What makes a flow definition look ugly? A large number of mere actions called in the transitions just to set a variable, or to retrieve a variable from flowScope and set it in viewScope, and so on. One thing I had to do multiple times in flow definitions was to transform a List into a dataModel for the UI, so I could use listName.selectedRow to identify the item selected by the user.

Adding this kind of non-business method execution and transformation to the flow definitions makes them bulky, and also keeps the flow from resembling the business definition. This defeats the very purpose of having a flow definition.

Web Flow provides convenient default variables such as resourceBundle, currentUser and messageContext in the flow context, which you can refer to directly in the flow definition, pass as arguments to bean action methods, or call actions on.

When a root flow ends, all its information is discarded. This is nice for cleaning unwanted data out of memory, but it also means that you cannot share anything with the user after the flow has ended. Suppose I would like to tell the user that they have successfully placed an order at the end of the flow: I cannot do that! You could ask why not keep the confirmation as part of the flow. Well, it depends on when you commit the changes to the database and how you share a persistence context; and even if it’s just an end message, there should be no further interaction from the view needed to end the flow.

It’s like redirecting to the home page after successfully placing the order and showing a banner saying “Thank you for shopping with us!”, which is just not possible.

One last point: with a UrlMapper definition in the configuration you can make a simple URL the starting point of a flow, but otherwise you generally can’t use a RESTful GET URL to reach a page in the flow.

What’s your experience with Spring Web Flow?

06.09.08

Quick Groovy Scripting

Posted in Software Development at 6:51 pm by Srinivasan

Recently I had to port some data from a mainframe database to a SQL-based database for testing purposes. I started with some text report files generated from the mainframe. I am fond of using Unix awk and grep for this kind of data munging, and I have also used Perl and Ruby for some scripting in the past. But given that I had to do this on Windows, and with my fading knowledge of Perl, I thought of getting it done with Groovy. Since Eclipse also supports Groovy, it was easy to get started.

I got something running that spits out SQL statements (using println) for every line of the input. Soon my Eclipse console started eating the output because of the console buffer size I had in my settings! Though I had a huge monolithic script that worked fine, I was not able to get the output in a single shot. I had to rerun it in parts to get the final collective output, which slowed me down in tweaking the final script. Given that we didn’t have much refactoring support for Groovy in Eclipse, I couldn’t easily extract things into functions as I could in Java. But I was able to use a more powerful tool: define a closure on the spot and redirect the input of the println statements to a File, without many changes to the original script.

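// Added at the top of the script: send println output to a file instead of the console.
def file = new File("C:/output.txt")
def println = { line -> file.append(line + System.lineSeparator()) }

// Existing statement, unchanged in the script body: it now calls the closure above.
println "insert into table_name (col1, col2, col3) values (${col1}, '${col2}', ${col3})"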

Just adding these two lines saved me a lot of time. Now I can also switch very easily between seeing the output on the command line and capturing it in a file.

Another thing that helped me get things done quickly is the ability to refer to variables directly inside a string, as in "'${col2}'". This is especially useful where I have to wrap string-typed columns in quotes; otherwise I would need endless escaping and + concatenations!

Also, for the next script I wrote small classes rather than a single file, which made things easier to change at the last minute. Another gotcha for Groovy beginners is the use of ‘==’. Remember that in Groovy ‘==’ is actually converted to this.equals(that) before execution. I ran into endless self-recursive calls because I used == for reference comparison (for example inside an equals method), as we do in Java.
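For contrast, this is the Java habit that causes the trouble, sketched with a hypothetical Point class. In Java the == on the first line of equals is a plain reference check, so it is safe; written the same way in Groovy, that == would call equals again and recurse. The Groovy counterpart for a reference check is the is() method.

```java
final class EqualitySketch {
    // Hypothetical value class used only for this illustration.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object other) {
            // Java: reference comparison. In Groovy this line would call equals()
            // again and never return; there you would write this.is(other) instead.
            if (this == other) {
                return true;
            }
            if (!(other instanceof Point)) {
                return false;
            }
            Point that = (Point) other;
            return x == that.x && y == that.y;
        }

        @Override
        public int hashCode() {
            return 31 * x + y;
        }
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println("same reference: " + (a == b));
        System.out.println("equal by value: " + a.equals(b));
    }
}
```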

Once I got the script completed, there were a lot of duplicate SQL statements in the output. Since duplicates cause errors due to integrity constraints in the database, I had to find some way to remove them. In Unix, I normally use `uniq` to get this done. Since I had to get it done quickly, I just looped through the output file, added each line to a Set, and dumped it back out to remove the duplicates.

Having used Perl and Ruby in the past, I know their library support is far larger than Groovy’s. But given that I have been using Java for the past years and had to work on Windows, Groovy was a life saver!

N.B. No data conversion is possible without effective use of regular expressions. I did use regular expressions to format the input files before running the Groovy scripts against them. I used TextPad to do find/replace with regular expressions. The regular expression support in the Eclipse editor’s find/replace tool still needs improvement before it can be really useful.
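The Set trick looks roughly like this in plain Java (the same classes are available from Groovy). This is a sketch, not the original script: the class name and sample statements are made up, and a LinkedHashSet is assumed so that the statements keep their original order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

final class DedupeLines {
    // Drop repeated lines while keeping the first occurrence of each in place.
    static List<String> unique(List<String> lines) {
        return new ArrayList<>(new LinkedHashSet<>(lines));
    }

    public static void main(String[] args) {
        // Hypothetical generated statements with a repeat in the middle.
        List<String> statements = Arrays.asList(
                "insert into t (id) values (1)",
                "insert into t (id) values (2)",
                "insert into t (id) values (1)");

        for (String line : unique(statements)) {
            System.out.println(line);
        }
    }
}
```

Unlike `uniq`, which only collapses adjacent repeats unless the input is sorted first, a set removes duplicates wherever they appear in the file.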
