Friday, March 14, 2014

Twisting Maven pom.xml for your legacy code

Recently, I have been working on a legacy project that does not follow the standard Maven conventions: source code and test code live under separate root paths, and the directory structure does not follow the standard layout either.

Now we are starting to add more unit tests and integrate them with our CI pipeline. The code base is about 200 MB including everything. Instead of restructuring the whole project layout, which could block active development and introduce unpredictable bugs, we decided to twist the pom as best we can. Here are some tips I learned along the way (I am a Maven newbie).

-1. Refactor Your Code

Testing should not be an afterthought. It should be considered alongside the initial code design and implementation. Apply design patterns, modular design and other techniques so that writing tests becomes possible in the first place.

0. Writing Unit Tests

There are many unit testing frameworks. Two popular ones that we use at the company are JUnit and TestNG, and there is a nice StackOverflow comparison of the two. Note that if you are using Eclipse, you will need to install the TestNG plugin, while JUnit support is built in. Other IDEs such as IntelliJ and Netbeans support both.

There are several mocking frameworks as well: Mockito, PowerMock, EasyMock and jMock, to name a few. We are using Mockito and PowerMock; here is a nice quick guide on how to use them. If you are wondering why we need both, it is because PowerMock adds several features missing from Mockito, such as mocking static methods.

Vogella has several short but useful guides on unit testing; highly recommended.


1. Maven Surefire Plugin with both JUnit and TestNG tests

This is the plugin that runs the unit tests and publishes the test results. The plugin is well documented on its website, so I will not repeat the details here. Quick summary:

  • It can easily include or exclude tests, or skip them entirely (think twice before you do that).
  • It supports JUnit, TestNG and plain POJO tests. The report format is compatible with JUnit output, so it integrates easily with CI tools.
  • It also supports parallel test runs.
  • etc.
Here is the list of all configuration options for the Surefire plugin, very useful.
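Because Surefire writes its results as JUnit-style XML (by default one TEST-*.xml file per test class under target/surefire-reports), any CI tool or script that understands that format can consume them. As a minimal sketch, assuming that default report directory and the usual tests/failures/errors/skipped attributes on the <testsuite> root element, here is a small Python script that totals the counts:

```python
import glob
import os
import xml.etree.ElementTree as ET

# Counters Surefire records as attributes on each <testsuite> root element.
COUNTERS = ("tests", "failures", "errors", "skipped")


def summarize_suite(xml_text):
    """Return the counters from one JUnit-style <testsuite> XML document."""
    suite = ET.fromstring(xml_text)
    return {key: int(suite.get(key, "0")) for key in COUNTERS}


def summarize_dir(report_dir="target/surefire-reports"):
    """Add up the counters of every TEST-*.xml report found in report_dir."""
    totals = dict.fromkeys(COUNTERS, 0)
    for path in sorted(glob.glob(os.path.join(report_dir, "TEST-*.xml"))):
        # Read bytes so the parser honors the encoding in the XML declaration.
        with open(path, "rb") as report:
            counts = summarize_suite(report.read())
        for key in COUNTERS:
            totals[key] += counts[key]
    return totals


if __name__ == "__main__":
    totals = summarize_dir()
    print(totals)
    if totals["failures"] or totals["errors"]:
        # A non-zero exit status lets a CI step fail the build.
        raise SystemExit(1)
```

Run it from the project root after mvn test; the directory argument is only an assumption about the default layout, so adjust it if your pom configures a different reports directory.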

Normally, you would pick a testing framework and stick to it. But in our case, we have tests written in both JUnit and TestNG under the same test directory. So, how do we support that?

Luckily, Marcin has already found a way to let JUnit and TestNG tests live happily together; see his post for details. He also has a sample pom that you can use as boilerplate. Basically, you declare both test providers as dependencies of the Surefire plugin:

...
<properties>
    <surefire.version>2.16</surefire.version>
</properties>

...

<plugin>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>${surefire.version}</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.maven.surefire</groupId>
            <artifactId>surefire-junit47</artifactId>
            <version>${surefire.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.maven.surefire</groupId>
            <artifactId>surefire-testng</artifactId>
            <version>${surefire.version}</version>
        </dependency>
    </dependencies>
</plugin>
...

2. Maven Compiler Plugin

The Maven compiler plugin compiles your source code. Its website lists the configuration options for the compiler:compile and compiler:testCompile goals.

The problem is that, by default, Maven expects the main and test code under its standard directory layout (src/main/java and src/test/java), which is not the case for us. I tried various options such as "testSource" and "testIncludes" to point it at the test path, but without luck (in hindsight, "testSource" sets the Java language level for the test compiler, not a directory).

Finally, I found build-helper-maven-plugin, which can add custom source and test directories, and it worked like a charm (example borrowed from its website):

<project>
  ...
  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>build-helper-maven-plugin</artifactId>
        <version>1.8</version>
        <executions>
          <execution>
            <id>add-test-source</id>
            <phase>generate-test-sources</phase>
            <goals>
              <goal>add-test-source</goal>
            </goals>
            <configuration>
              <sources>
                <source>some directory</source>
                ...
              </sources>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

3. Maven Clover Plugin for Code Coverage

To generate code coverage reports, you can use the Maven Clover plugin. Note that Clover is free for non-commercial use; for commercial use, you will need to obtain a license and configure the plugin to point to the license file.


4. Maven FindBugs Plugin

FindBugs is a static analysis tool that finds potential bugs in your code. It has nice IDE integration and also integrates well with Maven. You can set it up as a CI step so that any bugs it finds fail the build. I actually scanned our code base and found several severe bugs (one was a switch statement without a break, similar in spirit to Apple's recent SSL bug).

Please see the plugin website for setup instructions; it is quite straightforward.


5. "One Last Thing"

Another useful tip: some plugins, e.g. those for tests, code coverage and FindBugs, require a fair amount of memory. Take Surefire for example: depending on your configuration, it forks separate JVMs and threads to run the tests. In several cases I had the JVM exit abruptly for this reason.

Refer to the configuration options mentioned above to add custom JVM options. For example, for the Surefire plugin, set argLine, which is passed to the forked test JVMs:

<argLine>-Xmx2048m -XX:MaxPermSize=1024m</argLine>

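For the Maven compiler plugin, JVM memory settings only take effect when it forks a separate javac process, so enable fork and set meminitial/maxmem rather than passing JVM flags as compiler arguments:

<fork>true</fork><meminitial>2048m</meminitial><maxmem>2048m</maxmem>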

Friday, December 27, 2013

hadoop shell commands auto-completion

For many users, including myself, one of the nicest features of Bash is tab completion, which saves so much typing. So when I switched to the Hadoop shell, it felt very inconvenient, since there are many commands and options to remember.

I searched around and found a Hadoop completion script in Facebook's hadoop-20 GitHub repo, but the script did not work with my Hadoop installed via Homebrew.

So I modified it to make it work. You can try it out from my hadoop-completion repo on GitHub; all the installation instructions are there:

https://github.com/guozheng/hadoop-completion

BTW, bash-completion includes a collection of similar auto-completion scripts; highly recommended. Your life with the git CLI, ssh, etc. will be much easier.


References:

- Programmable Bash Completion Builtins (for the compgen and complete commands)

- Write your own Bash Completion Function (how to write a customized Bash completion script)

- Get Bash Completion for Mac OS X (a set of built-in scripts for commonly used tools, svn, make, gzip, ssh, git, etc., note that you will need to install git CLI using Homebrew to install the git completion scripts)

Thursday, June 13, 2013

Node.js debugging with Theseus and Brackets

Brackets is an open source Web development editor from Adobe. It is built with HTML, CSS and JavaScript, with features designed to make Web app development enjoyable rather than painful:

- live edit and preview: changes are reflected in the browser immediately without a reload (currently only Chrome is supported)

- inline view and edit referenced resources: you can open the CSS definition for a tag without leaving the HTML code you are working on

- smart code hint: for JavaScript, it is using the Tern engine, one of the most advanced code analysis engines for JavaScript

- quick docs from webplatform.org: inline documentation for various elements such as tags, values, etc.

- extensible architecture: it already has a fast-growing set of extensions, and you write extensions in the same languages you use to write Web apps

- best of all, starting with the sprint 21 release, it has a built-in Node.js process, which opens up a whole new world of features for JavaScript development

So if you are currently using TextMate, Sublime Text 2/3 or Vim, you should really give Brackets a try; it is a pleasure to work with on Web apps.

Here is a quick demo video from the nettuts+ article "A Peek At Brackets":



There is a nice extension, Theseus, which makes debugging JavaScript Web apps and Node.js apps much easier. You can first watch a demo of debugging a JavaScript Web app:


Follow the installation guide in the project's GitHub README:

  • Install Brackets
  • Use the Brackets Extension Manager (File->Extension Manager, or the Lego-like icon on the right-hand toolbar) to install Theseus: click "Install from URL", then enter the Amazon S3 zip file URL, currently https://s3.amazonaws.com/theseus-downloads/theseus-0.2.13.zip
  • Install Node.js if you haven't yet, then use npm to install node-theseus: npm install -g node-theseus

Now write your app.js file, run it with node-theseus app.js, then open that file in Brackets, and you should see the debug info presented in a very nice visual way.

To see it in action, watch the author's screencast.

Enjoy!


Monday, March 18, 2013

Set up Python 3 development tools on Mac

The default Python shipped with OS X Mountain Lion 10.8.x is Python 2.7.2. There are many nice posts about setting up the Python tools for 2.7.2, but I had a hard time finding ones that cover Python 3. So this post serves as notes on my Python 3 setup on OS X 10.8.x.

As a side note, the Python tool chain seems to be really messy, and it is quite difficult to find an up-to-date, coherent source of truth for state-of-the-art developer tools, especially for Python 3. This definitely falls short of my experience with RubyGems for Ruby and npm for Node.js, where specifications are clearly defined, actively documented and well supported by the community.

Anyway, here are the steps I went through:


1. Install the latest Python 3. As of this writing, the latest stable Python 3 is v3.3.0.

Simply download the installer from the download page of the official Python website and install it.

Another way is to install it using Homebrew, the best/missing package manager for OS X. However, the latest Python 3 formula available to me was v3.2.x.
brew install python3
Type python3 and you will enter the Python 3 REPL:
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 01:25:11)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.


2. Install distribute and pip

Basically, distribute is a replacement for setuptools, and pip is a replacement for easy_install. Distribute and pip provide better Python 3 support.
$ curl http://python-distribute.org/distribute_setup.py | python3
$ curl https://raw.github.com/pypa/pip/master/contrib/get-pip.py | python3
Update your PATH:
PATH=/Library/Frameworks/Python.framework/Versions/3.3/bin:$PATH
Now you can use pip to install Python 3 packages. See the pip documentation for more on how to use pip; the cookbook section is especially useful.


3. Install virtualenv and virtualenvwrapper

In practice, pip is most useful together with virtualenv, which is handy when you need multiple development environments with different configurations.

To install virtualenv, just run:
$ pip install virtualenv
To create a virtual environment, run the following. It creates a virtual environment named "py3" and installs distribute instead of setuptools in the environment (the --distribute option tells virtualenv to do so). Under the hood, it creates a directory "py3" in the current directory with all the required tools and libraries.
$ virtualenv --distribute py3
Using base prefix '/Library/Frameworks/Python.framework/Versions/3.3'
New python executable in py3/bin/python3
Also creating executable in py3/bin/python
Installing distribute......................................................................................................................................................................................................................................................................................................................................................................................................done.
Installing pip................done.
To use the newly created environment, source the activate script installed in it. This puts the environment's bin directory in front of your existing $PATH and changes the command-line prompt to show the name of the environment. Run deactivate to restore the previous $PATH (it is a shell function defined by the activate script).
$ source py3/bin/activate
(py3) $ pip install pgmagick
(py3) $ deactivate
$
To remove the environment, simply remove its directory:
$ rm -rf py3
For more details about virtualenv, please see its documentation.

To create virtual environments with different Python interpreters, use the -p option (for example, virtualenv -p python3 py3).
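To double-check which interpreter and environment a script actually runs under (for example, after activating py3 or creating an environment with -p), a small Python 3 snippet like the one below can help. It is a best-effort sketch resting on two assumptions: classic virtualenv releases set sys.real_prefix, while the venv module added in Python 3.3 makes sys.prefix differ from sys.base_prefix inside an environment.

```python
import sys


def in_virtualenv():
    """Best-effort check for whether this interpreter runs inside a virtual environment."""
    # Classic virtualenv releases record the original prefix in sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # Python 3.3+ keeps the base installation in sys.base_prefix; inside a
    # venv-style environment sys.prefix points somewhere else.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Python %d.%d.%d" % sys.version_info[:3])
print("prefix:", sys.prefix)
print("inside a virtual environment:", in_virtualenv())
```

Running it once with the plain python3 and once after source py3/bin/activate should show the prefix switching to the py3 directory.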

virtualenvwrapper is a set of extensions that make it easier to manage and switch between virtual environments. It is not necessary, but it makes working with virtual environments much more comfortable and effective. However, according to its official project page, it does not support Python 3.3 yet, and I could not install it successfully either. Let's wait for Python 3.3 support.

If you are running Python 3.2 or earlier, simply install it with:
$ pip install virtualenvwrapper

An alternative to virtualenv is pythonbrew.

4. Install a project skeleton (boilerplate) generation tool

Paste and PasteScript provide a nice tool to generate project skeletons from different templates. However, they have not been ported to Python 3 yet. If you are running Python 2.x, you should try them out. The project is actively developed, so Python 3 support should be around the corner.

After digging around, I found several alternatives:
- skeleton (last commit was 3 years ago): https://pypi.python.org/pypi/skeleton/
- mr.bob (active development): http://mrbob.readthedocs.org/en/latest/


5. Documentation with Sphinx

Sphinx is the de facto documentation system for Python code, and it now also supports C/C++ projects. Many projects, including Python itself, use Sphinx. See the Sphinx documentation for how to use it.
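As a small illustration of what Sphinx consumes, here is a hypothetical function with a reStructuredText docstring using Sphinx's :param:, :returns: and :raises: info fields. Assuming the sphinx.ext.autodoc extension is enabled in conf.py and the function lives in a module named, say, imagetools (a made-up name), a directive such as .. autofunction:: imagetools.scale_size would render it as formatted API documentation.

```python
def scale_size(width, height, factor=1.0):
    """Scale an image size by a constant factor.

    :param width: original width in pixels
    :param height: original height in pixels
    :param factor: multiplier applied to both dimensions; must be positive
    :returns: the scaled ``(width, height)`` pair, truncated to integers
    :raises ValueError: if ``factor`` is not positive
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return int(width * factor), int(height * factor)


print(scale_size(640, 480, 0.5))  # (320, 240)
```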

To publish Sphinx documentation online, you can use Read the Docs. You host your project on GitHub, and every commit triggers a build of your documentation, which is then published to Read the Docs automatically.


6. Advanced build system and continuous integration

Buildout is an advanced build system recommended by many projects and discussions online. Its documentation is here.

For continuous integration, Travis CI is highly recommended; you can simply hook it up to your GitHub project.


Reference:

1. Sergey Karayev: Setting up a development environment on Mac OS X 10.8 Mountain Lion, it uses the default Python 2.7.2 installation.

2. Stackoverflow: how to use pip with Python 3.x besides Python 2.x

3. Stackoverflow: Python 3.2 import issue

4. Stackoverflow: Alternatives to Python Pastescript's paster create

5. The Hitchhiker's Guide to Python



Wednesday, January 23, 2013

Two books

1. Async JavaScript by Trevor Burnham

This book is very concise and focused on various async topics in JavaScript. It is probably one of the best JavaScript books I have ever read, and it has many pointers and links for readers to explore and try out on their own.

He also wrote a CoffeeScript book, which is definitely worth checking out as well.

2. Mining of Massive Datasets by Anand Rajaraman and Jeff Ullman

I just started reading it, but judging from the table of contents, it covers many interesting areas of data mining with real-world applications. A free version is also available for download.


Thursday, December 20, 2012

Set up Java 7, Eclipse and Netbeans on Retina Macbook Pro

Updates:

- Bad news: Netbeans 7.4 now requires Java 7 to run, which means the hack to force the IDE to use Apple's Java 6 no longer works.

- Good news: According to Bug 215141, the Retina display issue is finally fixed in JDK 7u40+ and the early access builds of JDK 8! I just installed JDK 7u40 and Netbeans 7.4 RC2, and I can confirm that the Retina display finally works!!!

I can happily announce that you can skip the rest of the post now. Simply download the latest JDK 7 (7u40+) or JDK 8, then install Eclipse or Netbeans, and the Retina display should work as expected. No more blurriness!!!

My generous employer just upgraded my laptop to a Retina MacBook Pro (rMBP). It is the highest-performance laptop/computer I have ever used, and the most expensive one too.

Here are the steps to set up Java 7 (J2SE 7u10), Eclipse (4.2.1) and Netbeans (7.2.1) on the rMBP as of this writing.

1. Install Java 7.


Starting with Mountain Lion, Java is not installed by default, so when you run a Java application, you might be prompted to install Apple Java 6, as this screenshot shows:


Unless you have Java applications that must use Java 6, I really don't see the point of installing both Java 6 and 7, especially now that Oracle provides J2SE 7 for Mac. There was a time when Oracle had not officially released J2SE 7 for Mac, and I had to install OpenJDK to try out Java 7's new features.

So simply go to the Oracle Java 7 download page and download "Mac OSX x64"; as of this writing it is jdk-7u10-macosx-x64.dmg. Install it, and it will be installed to:

/Library/Java/JavaVirtualMachines/jdk1.7.0_10.jdk 
To verify that you have installed it successfully, simply type "java -version" and you should see the following:

$ java -version
java version "1.7.0_10"
Java(TM) SE Runtime Environment (build 1.7.0_10-b18)
Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
You also need to set the JAVA_HOME environment variable by adding this line to ~/.bash_profile:

export JAVA_HOME=`/usr/libexec/java_home -v 1.7`
java_home is a command that returns the Java home directory for the current user; -v 1.7 restricts the result to Java 1.7 versions.

2. Install Eclipse.


Installing Eclipse without installing Java 6 is quite tricky. I am surprised that Eclipse does not support Java 7 out of the box yet. If you haven't installed Java 6, Eclipse will simply show the above error message and won't start at all. There are bugs filed for this issue (bug 382972, bug 374212, etc.), and it is very disappointing that this problem has not been solved yet.

After some searching online, here is the hacky way:

- Download and install Eclipse; I got Eclipse 4.2.1.

- Create a symbolic link with the Java 6 name to fool Eclipse into thinking that you have Java 6:

sudo mkdir -p /System/Library/Java/JavaVirtualMachines
sudo ln -s /Library/Java/JavaVirtualMachines/jdk1.7.0_10.jdk /System/Library/Java/JavaVirtualMachines/1.6.0.jdk
After this hack, Eclipse will start correctly (you will need to use the Ctrl+click trick, since it is not downloaded from the App Store and is considered to come from an untrusted source).


The above screenshot shows the Installed JREs in Eclipse Preferences; funnily enough, the symbolic link also shows up ;-)

To be frank, I am really disappointed by this Eclipse installation process. First, it should be installable right from the App Store. Second, it should be bundled with JDK/JRE 7 directly (there are actually bugs for that, e.g. bug 374791, but they are not fixed yet).

3. Install Netbeans.


The installation process for Netbeans is much smoother than Eclipse's. You simply download and install it, and it automatically picks up the Java 7 installed in /Library/Java/JavaVirtualMachines/ and works out of the box! No wonder lots of people have switched from Eclipse to Netbeans.

4. The Retina Fix.

Note: currently, this Retina fix only works with Apple Java 6, not Oracle Java 7. Hopefully a similar fix for Java 7 will be released soon. In the meantime, if you are running Apple Java 6, you can try this fix now. There is a bug tracking Oracle's progress on this issue.

Maybe because the rMBP is too expensive, many applications do not yet support the Retina display out of the box. For Eclipse and Netbeans, you need to hack their Info.plist files a bit; here is how:

- First, install Apple JDK 6: remove the symbolic link created in step 2, then download Apple JDK 6 from Apple and install it.

- Locate Eclipse.app and Netbeans.app, the ones you double-click to run. Eclipse.app should be in the directory extracted from the Eclipse tar.gz package. Netbeans installs into /Applications/Netbeans by default.

- Right-click the .app and choose "Show Package Contents". Use a text editor to edit Contents/Info.plist, and insert these two lines before the closing </dict></plist> tags:

...
    <key>NSHighResolutionCapable</key>
    <true/>
  </dict>
</plist>

- Now we need to make Eclipse and Netbeans run on Apple JDK 6 instead of the default Oracle Java 7.

To make Eclipse use Apple Java 6 instead of Java 7, update eclipse.ini: show the package contents of Eclipse.app in the same way, edit Contents/MacOS/eclipse.ini and add the -vm option, with the path on its own line (it must appear before -vmargs):

-vm
/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home/bin/java
An alternative is to edit Contents/Info.plist and add the following inside <key>Eclipse</key><array>...</array>:
<string>-vm</string><string>/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java</string>
To verify that Eclipse started with the correct JDK/JRE, go to "About Eclipse" -> "Installation Details" -> "Configuration" in Eclipse and check "java.runtime.version".

Possibly because OS X caches the previous Eclipse.app settings, changing Contents/Info.plist does not take effect immediately. Make a copy of Eclipse.app and name it something like Eclipse-retina.app, then double-click Eclipse-retina.app and enjoy Eclipse in Retina.

Here is a comparison of the info for Eclipse.app and Eclipse-retina.app (right-click, then select "Get Info"); notice that for Eclipse-retina.app, the "Open in Low Resolution" check box is unchecked!




Similarly, to make Netbeans use Apple Java 6, edit the Netbeans config file: show the package contents of Netbeans.app, edit Contents/Resources/Netbeans/etc/netbeans.conf and set the netbeans_jdkhome option:

netbeans_jdkhome="/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home"
After saving the edit, restart your Mac and Netbeans, and you should see the Retina display working with Netbeans. I have no idea why Netbeans does not need a copied app to pick up the change, but it is certainly much easier than Eclipse.


Monday, October 1, 2012

Use HTTP PATCH method for partial updates

I had an interesting problem at work: designing a RESTful API for partial updates of resources. We want an API to partially update an existing resource, e.g. some existing properties get updated or deleted, some new properties get added, etc. The request body is in JSON format.

There are a few choices:

  • Use POST, add a query parameter to capture the operation (delete, add, update), and put the affected properties in the POST body.
  • Use PUT, similar to the above POST design.
  • Use POST, but instead of using a new query parameter, design the body to capture the operation and the data for the operation. For example, use delete, add, and update as the first-level properties, then put the actual properties under the corresponding operations.
  • Use PUT, similar to the above POST design.
Then a colleague suggested using PATCH for partial updates to make it truly RESTful. Yeah, I vaguely remembered there is an HTTP PATCH method, but what does it do and why should we care?

First, let's look at the difference between POST and PUT. A common misconception is that POST is for creating a resource and PUT is for updating one. Actually, both POST and PUT can be used for creation. Here are the major points from my study:
  • PUT is idempotent, POST is not. This means you can send the same PUT request multiple times and the result should be the same as sending it only once.
  • A PUT URL uniquely identifies the representation in the request body, while a POST URL identifies the service that processes the request body. For example, a PUT URL is like the address on a piece of regular mail, which uniquely identifies where that mail goes, whereas a POST URL is like the address of the post office, which identifies the service that processes the mail. Handling a POST request on the server side does not necessarily create new representations/resources that can be identified by a URL.
  • PUT responses are not cacheable; in addition, a PUT response should invalidate cached copies of the representation identified by the PUT URL in any intermediate caches it passes through. POST responses, however, are cacheable if they include explicit freshness information (Cache-Control or Expires headers). A POST can also return 303 (See Other), whose Location header directs the user agent to retrieve a cacheable resource.
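To make the idempotency point concrete, here is a toy, in-memory sketch (not a real HTTP server; the class and method names are made up for illustration) in which put() stores a representation under a client-chosen URL, while post() hands the representation to a "service" that creates a new resource on each call:

```python
import itertools


class ToyResourceStore:
    """Hypothetical in-memory stand-in for a server, keyed by URL path."""

    def __init__(self):
        self.resources = {}
        self._next_id = itertools.count(1)

    def put(self, url, representation):
        # PUT: the URL names the representation itself, so repeating the
        # same request just overwrites it with identical content.
        self.resources[url] = dict(representation)
        return url

    def post(self, collection_url, representation):
        # POST: the URL names a processing service; this one creates a new
        # resource with a server-chosen URL on every call.
        url = "%s/%d" % (collection_url, next(self._next_id))
        self.resources[url] = dict(representation)
        return url


store = ToyResourceStore()
for _ in range(3):
    store.put("/users/alice", {"name": "Alice"})
print(len(store.resources))  # 1: same state as sending the PUT once

for _ in range(3):
    store.post("/users", {"name": "Bob"})
print(len(store.resources))  # 4: each POST created another resource
```

A real POST handler is free to do something idempotent too; the point is only that HTTP does not promise it, whereas it does for PUT.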
So a PUT URL identifies a complete representation to be updated or created; for example, you can use PUT to overwrite an entire representation. What about partial updates, which are more common? According to the HTTP spec, to do a partial update with PUT you need to use a different URL that identifies the partial content and send that partial content as the request body. Alternatively, you can use the PATCH method (defined in RFC 5789).

Mark Nottingham has a nice piece explaining why PATCH is good for a RESTful design. He is also working on a JSON Patch draft (rev 05 was updated days ago), which defines patch operations in JSON format, exactly what we are looking for ;-) Note that it introduces a new content type, "application/json-patch".

The PATCH request body should contain a list of operations, each naming the operation (such as add, remove, replace, etc.), a path (a JSON Pointer relative to the document identified by the request URL) and, where needed, a value. Here is an example from the JSON Patch draft:

[
       { "op": "test", "path": "/a/b/c", "value": "foo" },
       { "op": "remove", "path": "/a/b/c" },
       { "op": "add", "path": "/a/b/c", "value": [ "foo", "bar" ] },
       { "op": "replace", "path": "/a/b/c", "value": 42 },
       { "op": "move", "path": "/a/b/c", "to": "/a/b/d" },
       { "op": "copy", "path": "/a/b/c", "to": "/a/b/e" }
]
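To show how a server might interpret such a body, here is a minimal Python sketch of a patch applier. It assumes the field names from the draft example above ("to" as the destination of move and copy; the published RFC 6902 later switched to "from" for the source and "path" for the destination), handles only object members and array indices, and applies the operations to a copy of the document so that a failing operation leaves the original untouched. It is an illustration, not a complete or spec-compliant implementation.

```python
import copy


def _tokens(pointer):
    """Split a JSON Pointer such as "/a/b/c" into unescaped reference tokens."""
    if not pointer.startswith("/"):
        raise ValueError("pointer must start with '/' in this sketch: %r" % pointer)
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def _parent(doc, pointer):
    """Walk to the container holding the target; return (container, last_token)."""
    tokens = _tokens(pointer)
    node = doc
    for token in tokens[:-1]:
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node, tokens[-1]


def _get(doc, pointer):
    node, key = _parent(doc, pointer)
    return node[int(key)] if isinstance(node, list) else node[key]


def _add(doc, pointer, value):
    node, key = _parent(doc, pointer)
    if isinstance(node, list):
        # "-" means "append to the end of the array".
        node.insert(len(node) if key == "-" else int(key), value)
    else:
        node[key] = value


def _remove(doc, pointer):
    node, key = _parent(doc, pointer)
    return node.pop(int(key) if isinstance(node, list) else key)


def apply_patch(doc, operations):
    """Apply operations in order and return a patched copy of doc."""
    result = copy.deepcopy(doc)
    for op in operations:
        kind, path = op["op"], op["path"]
        if kind == "test":
            if _get(result, path) != op["value"]:
                raise ValueError("test failed at %s" % path)
        elif kind == "add":
            _add(result, path, copy.deepcopy(op["value"]))
        elif kind == "remove":
            _remove(result, path)
        elif kind == "replace":
            _remove(result, path)  # the target must already exist
            _add(result, path, copy.deepcopy(op["value"]))
        elif kind == "move":
            _add(result, op["to"], _remove(result, path))
        elif kind == "copy":
            _add(result, op["to"], copy.deepcopy(_get(result, path)))
        else:
            raise ValueError("unsupported op: %s" % kind)
    return result


original = {"a": {"b": {"c": "foo"}}}
patched = apply_patch(original, [
    {"op": "test", "path": "/a/b/c", "value": "foo"},
    {"op": "replace", "path": "/a/b/c", "value": 42},
    {"op": "copy", "path": "/a/b/c", "to": "/a/b/e"},
])
print(patched)   # {'a': {'b': {'c': 42, 'e': 42}}}
print(original)  # {'a': {'b': {'c': 'foo'}}}
```

Note that the draft's sample body above is a catalog of operations rather than one patch meant to run in sequence (for instance, its copy reads /a/b/c right after the move has removed it), which is why the usage example applies only a few operations.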

In fact, PATCH is going to be the primary method for updates starting with Rails 4.0. However, the PATCH method may not be as well supported as POST or PUT; if your framework or server does not support it, you will probably have to fall back to one of the choices mentioned at the beginning of the post.

References: