Friday, October 31, 2014


I was working on a Storm project where the worker nodes run in a tightly controlled grid environment. Only certain ports are open for inbound and outbound traffic, so direct RDBMS access through traditional mechanisms is prohibited. I ended up writing a quick-and-dirty HTTP proxy that exposes certain MySQL database operations as REST API calls.

Then I found out that this is actually a common use case, and several people have written generic HTTP servers that expose REST APIs for database access.

One such project is jdbc-http-server, which is really handy. I will see if I can replace my own version with it.
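To make the idea concrete, here is a minimal sketch of such a proxy using only Python's standard library. It is not my original proxy and not jdbc-http-server's API. sqlite3 stands in for MySQL, and the URL scheme GET /<table>/<id>, the ALLOWED_TABLES whitelist, and port 8080 are all hypothetical choices. Table names come only from the whitelist, and ids are bound as query parameters, so callers cannot inject arbitrary SQL.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse

# Hypothetical whitelist: only these tables are reachable over HTTP.
ALLOWED_TABLES = {"users"}


def handle_get(conn, path):
    """Map GET /<table>/<id> to a single-row SELECT; return (status, body)."""
    parts = [p for p in urlparse(path).path.split("/") if p]
    if len(parts) != 2 or parts[0] not in ALLOWED_TABLES:
        return 404, {"error": "not found"}
    table, raw_id = parts
    try:
        row_id = int(raw_id)
    except ValueError:
        return 400, {"error": "id must be an integer"}
    # The table name comes from the whitelist; the id is a bound parameter.
    cur = conn.execute(f"SELECT * FROM {table} WHERE id = ?", (row_id,))
    row = cur.fetchone()
    if row is None:
        return 404, {"error": "no such row"}
    columns = [d[0] for d in cur.description]
    return 200, dict(zip(columns, row))


def make_handler(conn):
    """Build a request handler class bound to one database connection."""

    class ProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = handle_get(conn, self.path)
            payload = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return ProxyHandler


if __name__ == "__main__":
    db = sqlite3.connect(":memory:", check_same_thread=False)
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("INSERT INTO users (name) VALUES ('alice')")
    # Bind to a port the grid firewall allows (8080 is an assumption).
    HTTPServer(("0.0.0.0", 8080), make_handler(db)).serve_forever()
```

With the server running, a request to /users/1 should return {"id": 1, "name": "alice"} as JSON. Keeping the routing logic in handle_get, separate from the HTTP plumbing, lets you test it without opening a socket.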

Thursday, August 28, 2014

Friday, March 14, 2014

Twisting Maven pom.xml for your legacy code

Recently, I have been working on a legacy project that does not follow the standard Maven conventions: source code and test code live in separate paths, and the directory structure does not follow the standard layout either.

Now we are adding more unit tests and integrating them into our CI pipeline. The code base is ~200MB including everything. Instead of restructuring the whole project layout, which could block active development and introduce unpredictable bugs, we decided to twist the pom as best we could. Here are some nice tips I learned along the way (I am a Maven newbie).

-1. Refactor Your Code

Testing should not be an afterthought. It should be considered alongside the initial code design and implementation. Apply design patterns, use modular designs, and use other techniques so that writing tests becomes possible in the first place.
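As a small illustration of designing for testability, here is a Python sketch of dependency injection (our project is Java, and ReportBuilder and fixed_clock are hypothetical names). The time source is passed into the constructor instead of being called directly, so a test can substitute a fixed time without any mocking framework.

```python
from datetime import datetime


class ReportBuilder:
    """Hypothetical class; the time source is injected, not hard-coded."""

    def __init__(self, clock=datetime.now):
        self._clock = clock

    def header(self, title):
        # Calls whatever clock was injected instead of datetime.now() directly.
        return f"{title} ({self._clock():%Y-%m-%d})"


def fixed_clock():
    """A deterministic stand-in for datetime.now, for use in tests."""
    return datetime(2014, 3, 14, 9, 30)


if __name__ == "__main__":
    print(ReportBuilder(clock=fixed_clock).header("Nightly build"))
```

Running it prints "Nightly build (2014-03-14)". Production code simply uses the default clock, while tests pass fixed_clock.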

0. Writing Unit Tests

There are many unit testing frameworks. Two popular ones that we use at the company are JUnit and TestNG. Here is a nice StackOverflow comparison of them. Note that if you are using Eclipse, you will need to install the TestNG plugin, while JUnit support is built in. Other IDEs like IntelliJ and NetBeans support both as well.

There are several mocking frameworks as well: Mockito, PowerMock, EasyMock, and jMock, just to name a few. We are using Mockito and PowerMock. Here is a nice quick guide on how to use them. If you are wondering why we need both, it's because PowerMock adds several features missing from Mockito, such as mocking static methods.
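Mockito and PowerMock are Java libraries, so the following is only a Python analogue of the static-method case, not their API. It uses the standard library's unittest.mock.patch.object to replace a static method for the duration of a test. RateService, fetch_rate, and convert are hypothetical names.

```python
from unittest import mock


class RateService:
    """Hypothetical class whose static method would hit a remote service."""

    @staticmethod
    def fetch_rate(currency):
        raise RuntimeError("network access is not allowed in unit tests")


def convert(amount, currency):
    # The code under test calls the static method directly: the case that
    # plain Mockito cannot stub in Java and PowerMock is used for.
    return amount * RateService.fetch_rate(currency)


def test_convert_with_mocked_static_method():
    with mock.patch.object(RateService, "fetch_rate", return_value=2.0) as fake:
        assert convert(10, "EUR") == 20.0
        fake.assert_called_once_with("EUR")


if __name__ == "__main__":
    test_convert_with_mocked_static_method()
    print("ok")
```

The patch is undone when the with block exits, so the real static method is restored for other tests.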

Vogella has several short but useful guides on unit testing, which I highly recommend.

1. Maven Surefire Plugin with both JUnit and TestNG tests

This is the plugin that runs the unit tests and publishes the test results. The plugin is well documented on its website, so I will not repeat the details here. Quick summary:

  • It can easily include or exclude tests, or skip them (think twice before you do), etc.
  • It supports JUnit, TestNG, and plain POJO tests. The report format is compatible with JUnit output, so it integrates easily with CI tools.
  • It also supports parallel test runs.
  • etc.
Here is a list of all the configuration options for the Surefire plugin, which is very useful.

Normally, you would pick a testing framework and stick to it. But in our case, we have tests written in both JUnit and TestNG under the same test directory. So how do we support that?

Luckily, Marcin has already found a way to let JUnit and TestNG tests live happily together; see his post here for details. He also has a sample pom that you can use as boilerplate. Basically, you declare the dependencies inside the Surefire plugin:




2. Maven compiler plugin

The Maven compiler plugin is used to compile your source code. Here are the configuration options for its compiler:compile and compiler:testCompile goals:

Our problem is that the compiler plugin assumes the source code and the tests live under the same root source directory, which is not true in our case. I tried various options, such as "testSource" and "testIncludes", to specify the test path, but without luck.

Finally, I found the build-helper-maven-plugin, which lets you add custom source and test directories through its add-source and add-test-source goals. It worked like a charm (example borrowed from its website):

                <source>some directory</source>

3. Maven Clover Plugin for Code Coverage

To generate code coverage reports, you can use the Maven Clover plugin. Note that Clover is free for non-commercial use; for commercial use, you will need to obtain a license and configure the plugin to point to the license file.

4. Maven FindBugs Plugin

FindBugs is a code analysis tool that finds potential bugs in your code. It has nice IDE integration and also integrates well with Maven. You can set it up as a step in CI so that any bugs it reports fail the build. I actually scanned our code base and found several severe bugs (one was a switch statement missing a break, similar to Apple's recent SSL bug).

Please see the plugin website to set it up; it is quite straightforward.

5. "One Last Thing"

Another useful tip: some plugins, e.g. those for tests, code coverage, and FindBugs, require a fair amount of memory when they run. Take Surefire for example: depending on your configuration, it forks separate JVMs and threads to run the tests. In several cases, the JVM exited abruptly for this reason.

Refer to the configuration options above to add custom JVM options. For example, for the Maven Surefire plugin:

<argLine>-Xmx2048m -XX:MaxPermSize=1024m</argLine>

For the Maven compiler plugin:


Friday, December 27, 2013

hadoop shell commands auto-completion

For many users, including myself, one of the nicest features of Bash is tab completion, which saves a lot of typing. So when I switched to the Hadoop shell, it felt very inconvenient, since there are many commands and options to remember.

Then I searched around and found a hadoop completion script in Facebook's hadoop-20 GitHub repo, but the script did not work with my Hadoop installed via Homebrew.

So I modified it to make it work. You can try it out from my hadoop-completion repo on GitHub; all the installation instructions are there.

BTW, Bash-Completion includes a collection of similar auto-completion scripts, and I highly recommend it. Your life with the git CLI, ssh, etc. will be much easier.


- Programmable Bash Completion Builtins (for the compgen and complete commands)

- Write your own Bash Completion Function (how to write a customized Bash completion script)

- Get Bash Completion for Mac OS X (a set of built-in scripts for commonly used tools: svn, make, gzip, ssh, git, etc.; note that you will need to install the git CLI with Homebrew to get the git completion scripts)
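At its core, a Bash completion function such as one built on compgen -W filters a list of candidate words by the prefix under the cursor. The sketch below shows the same idea in Python with the standard readline module; it is not the hadoop-completion script, and HADOOP_FS_OPTIONS is a hypothetical subset of hadoop fs options chosen for illustration.

```python
# Hypothetical subset of `hadoop fs` options, for illustration only.
HADOOP_FS_OPTIONS = [
    "-cat", "-chmod", "-copyFromLocal", "-copyToLocal",
    "-du", "-get", "-ls", "-mkdir", "-put", "-rm", "-tail",
]


def complete(text, state):
    """readline completer: return the state-th option that starts with text."""
    matches = [opt for opt in HADOOP_FS_OPTIONS if opt.startswith(text)]
    return matches[state] if state < len(matches) else None


if __name__ == "__main__":
    import readline

    # '-' is a default word delimiter in readline; keep it part of the word.
    readline.set_completer_delims(" \t\n")
    readline.set_completer(complete)
    if "libedit" in (readline.__doc__ or ""):
        # Python linked against libedit (common on OS X) uses another syntax.
        readline.parse_and_bind("bind ^I rl_complete")
    else:
        readline.parse_and_bind("tab: complete")
    print("you typed:", input("hadoop fs "))
```

readline calls complete repeatedly with state 0, 1, 2, ... until it returns None, which is why the function indexes into the match list rather than returning all matches at once.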

Thursday, June 13, 2013

Node.js debugging with Theseus and Brackets

Brackets is an open source Web development editor from Adobe. It is built with HTML, CSS and JavaScript, with features designed to make Web app development enjoyable rather than painful:

- live edit and preview: changes are reflected in the browser immediately without a reload (currently only Chrome is supported)

- inline view and edit referenced resources: you can open the CSS definition for a tag without leaving the HTML code you are working on

- smart code hints: for JavaScript, it uses the Tern engine, one of the most advanced code analysis engines for JavaScript

- quick docs from inline documentation for various elements such as tags, values, etc.

- extensible architecture: it already has a fast-growing set of extensions, and you can write extensions in the same languages you use to write Web apps

- best of all, starting with the Sprint 21 release, it has a built-in Node.js process, which opens up a whole new world of features for JavaScript development

So if you are currently using TextMate, Sublime Text 2/3, or Vim, you should really give Brackets a try; it is a pleasure to use for Web apps.

Here is a quick demo video from the nettuts+ article "A Peek At Brackets":

There is a nice extension, Theseus, that makes debugging JavaScript Web apps and Node.js apps much easier. You can watch a demo of debugging a JavaScript Web app first:

Follow the installation guide in the project's GitHub README:

  • Install Brackets
  • Use the Brackets Extension Manager (File -> Extension Manager, or the Lego-like icon on the right-hand toolbar) to install Theseus: click "Install from URL", then enter the Amazon S3 zip file URL, currently
  • Install Node.js if you haven't already, then use npm to install node-theseus: npm install -g node-theseus

Now write your app.js file, run it with node-theseus app.js, and open the file in Brackets; you should see the debug info presented in a very nice visual way.

To see it in action, watch the author's screencast here.


Monday, March 18, 2013

Set up Python 3 development tools on Mac

The default Python shipped with OS X Mountain Lion 10.8.x is Python 2.7.2. There are many nice posts about setting up the Python tools for 2.7.2, but I had a hard time finding any that cover Python 3. So this post serves as notes on my Python 3 setup on OS X 10.8.x.

As a side note, the Python toolchain seems really messy, and it is quite difficult to find an up-to-date, coherent source of truth for state-of-the-art developer tools, especially after Python 3. This definitely falls short of my experience with RubyGems for Ruby and npm for Node.js, where specifications are clearly defined, actively documented, and well supported by the community.

Anyway, here are the steps I went through:

1. Install the latest Python 3. As of this writing, the latest stable Python 3 release is v3.3.0.

Simply download it from the official Python website and install it: download links.

Another way is to install it with Homebrew, the best/missing package manager for OS X. However, the latest python3 formula available to me was v3.2.x.
brew install python3
Type python3 and you will enter the Python 3 REPL:
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 01:25:11)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
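With the system Python 2.7.2 and Python 3 installed side by side, it is easy to run a script with the wrong interpreter. A tiny sanity check, as a sketch (at_least is a hypothetical helper name, and 3.3 is just the version installed above):

```python
import sys


def at_least(minimum=(3, 3)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    # Shows which interpreter actually ran, e.g. python vs. python3.
    print(sys.executable, sys.version.split()[0])
    print("OK" if at_least() else "Not Python 3.3+; try running it with python3")
```

Running the same file with python and with python3 makes the difference between the two installations obvious.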

2. Install distribute and pip

Basically, distribute is a replacement for setuptools, and pip is a replacement for easy_install. Both provide better Python 3 support.
$ curl | python3
$ curl | python3
Update your PATH:
Now you can use pip to install Python 3 packages. See the pip documentation for more information about how to use pip; the cookbook section is especially useful.

3. Install virtualenv and virtualenvwrapper

In practice, pip is most useful together with virtualenv, which is handy when you need multiple development environments with different configurations.

To install virtualenv, just run:
$ pip install virtualenv
To create a virtual environment, run the following. It creates a virtual environment named "py3" and installs distribute instead of setuptools in it (the --distribute option tells it to do so). Under the hood, it creates a directory "py3" in the current directory with all the required tools and libraries.
$ virtualenv --distribute py3
Using base prefix '/Library/Frameworks/Python.framework/Versions/3.3'
New python executable in py3/bin/python3
Also creating executable in py3/bin/python
Installing distribute......................................................................................................................................................................................................................................................................................................................................................................................................done.
Installing pip................done.
To use the newly created environment, just run the activate script installed in it. It basically changes your $PATH, putting the virtual environment's bin directory in front of the existing $PATH. It also changes the command line prompt to show the name of the environment. You can run deactivate to restore the previous $PATH (it is a function defined and exported in the activate script).
$ source py3/bin/activate
(py3) $ pip install pgmagick
(py3) $ deactivate
To remove the environment, simply remove its directory:
$ rm -rf py3
For more details about virtualenv, please see its documentation.
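Python 3.3 also ships the venv module in its standard library (PEP 405), which covers the basic use case without installing virtualenv. Below is a sketch of the same create/activate cycle with venv swapped in for virtualenv. create_env and activated_path are hypothetical helpers; activated_path only mimics the $PATH change the activate script makes (the real script also sets VIRTUAL_ENV and the prompt), and the with_pip argument needs Python 3.4 or later.

```python
import os
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir):
    """Create a bare virtual environment (no pip) using the stdlib venv module."""
    venv.create(env_dir, with_pip=False)
    return Path(env_dir)


def activated_path(env_dir, old_path):
    """Mimic the $PATH change made by `source <env>/bin/activate`."""
    bin_name = "Scripts" if sys.platform == "win32" else "bin"
    return os.pathsep.join([str(Path(env_dir) / bin_name), old_path])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = create_env(os.path.join(tmp, "py3"))
        print((env / "pyvenv.cfg").read_text())
        print(activated_path(env, os.environ.get("PATH", "")))
```

Because the environment's bin directory comes first in the new $PATH, its python and pip shadow the system ones, which is all "activation" really means.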

To create virtual environments with different Python interpreters, just use the -p option (e.g. virtualenv -p python2.7 py27).

virtualenvwrapper is a set of extensions that makes it easier to manage and work across different virtual environments. It is not necessary, but it makes working with virtual environments much more comfortable and effective. However, according to its official project page, it does not support Python 3.3 yet, and I could not install it successfully either. Let's wait for Python 3.3 support.

If you are running Python 3.2 or earlier, simply install it with:
$ pip install virtualenvwrapper

An alternative to virtualenv is pythonbrew.

4. Install a boilerplate project skeleton generation tool

Paste and PasteScript provide a nice tool to generate a project skeleton from different templates. However, they have not been ported to Python 3 yet. If you are running Python 2.x, you should try them out. The project is actively developed, so Python 3 support should be around the corner.

After digging around, I found several alternatives:
- skeleton (last commit was 3 years ago)
- mr.bob (active development)
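Under the hood, these tools mostly render a directory of templates with your project's name and version filled in. A minimal sketch of that idea using only string.Template and pathlib, assuming a hypothetical four-file layout; this is not how PasteScript, skeleton, or mr.bob are implemented or invoked.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical project template: relative path -> file body, both with $placeholders.
SKELETON = {
    "setup.py": (
        "from setuptools import setup\n\n"
        "setup(name='$name', version='$version', packages=['$name'])\n"
    ),
    "$name/__init__.py": "__version__ = '$version'\n",
    "tests/test_$name.py": "import $name\n",
    "README.txt": "$name $version\n",
}


def render_skeleton(target, **params):
    """Write every template file under `target`; return the created paths."""
    root = Path(target)
    for rel_path, body in SKELETON.items():
        # Placeholders are substituted in both the file path and its contents.
        path = root / Template(rel_path).substitute(params)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(Template(body).substitute(params))
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for created in render_skeleton(tmp, name="demo", version="0.1"):
            print(created)
```

Template.substitute raises KeyError if a placeholder has no value, which is a reasonable failure mode for a generator: a half-filled skeleton is worse than none.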

5. Documentation with Sphinx

Sphinx is the de facto documentation system for Python code. It has also started to support C/C++ projects. Many projects, including Python itself, use Sphinx. See the Sphinx documentation for how to use it.

To publish Sphinx documentation online, you can use Read the Docs. You host your project on GitHub, and every commit triggers a build of your documentation, which is published to Read the Docs automatically.
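Sphinx's autodoc extension (sphinx.ext.autodoc) can pull documentation straight from docstrings. Here is a sketch of a docstring written with Sphinx's :param:/:returns: field lists; mean is a hypothetical example function, and an .rst page would include it with a directive such as .. autofunction:: yourmodule.mean, where yourmodule is a placeholder.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    :param values: the numbers to average
    :type values: sequence of int or float
    :returns: the mean of ``values``
    :rtype: float
    :raises ValueError: if ``values`` is empty

    >>> mean([1, 2, 3, 4])
    2.5
    """
    if not values:
        raise ValueError("mean() of an empty sequence")
    return sum(values) / len(values)


if __name__ == "__main__":
    import doctest

    # Runs the >>> example embedded in the docstring.
    doctest.testmod(verbose=True)
```

The >>> example doubles as documentation and as a test, so the docs cannot silently drift from the code.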

6. Advanced build system and continuous integration

Buildout is an advanced build system recommended by many projects and discussions online. Its documentation is here.

For continuous integration support, Travis CI is highly recommended. You can simply hook it up to your GitHub project.


1. Sergey Karayev: Setting up a development environment on Mac OS X 10.8 Mountain Lion, it uses the default Python 2.7.2 installation.

2. Stack Overflow: how to use pip with Python 3.x besides Python 2.x

3. Stack Overflow: Python 3.2 import issue

4. Stack Overflow: Alternatives to Python Pastescript's paster create

5. The Hitchhiker's Guide to Python

Wednesday, January 23, 2013

Two books

1. Async JavaScript by Trevor Burnham

This book is very concise and focused on various async topics in JavaScript. It is probably one of the best JavaScript books I have ever read. It also has many pointers and links for readers to explore and try out on their own.

He also wrote a CoffeeScript book, which is definitely worth checking out as well.

2. Mining of Massive Datasets by Anand Rajaraman and Jeff Ullman

I just started reading it, but judging from the table of contents, it covers many interesting areas of data mining with real-world applications. A free version is also available for download.