Friday, December 27, 2013

Hadoop shell commands auto-completion

For many users, myself included, one of the nicest features of Bash is tab completion, which saves a lot of typing. So when I switched to the Hadoop shell, it felt inconvenient, since there are many commands and options to remember.

I searched around and found a Hadoop completion script in Facebook's hadoop-20 GitHub repo, but the script did not work with my Hadoop installed via Homebrew.

So I modified it to make it work. You can try it out from my hadoop-completion repo on GitHub; all the installation instructions are there.

BTW, Bash-Completion includes a collection of similar auto-completion scripts, and I highly recommend it. It makes life with the git CLI, ssh, etc. much easier.


- Programmable Bash Completion Builtins (for the compgen and complete commands)

- Write your own Bash Completion Function (how to write a customized Bash completion script)

- Get Bash Completion for Mac OS X (a set of ready-made scripts for commonly used tools such as svn, make, gzip, ssh and git; note that you need to install the git CLI with Homebrew to get the git completion scripts)
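To make the complete and compgen builtins less abstract, here is a minimal sketch of a completion function. It is not the script from my hadoop-completion repo: the function name _hadoop_sketch and the short command and option lists are hypothetical, abbreviated examples chosen only to show the mechanism (Bash fills COMP_WORDS and COMP_CWORD, the function fills COMPREPLY).

```shell
#!/usr/bin/env bash
# Minimal sketch of a programmable Bash completion function for "hadoop".
# The function name and the word lists are hypothetical and abbreviated;
# they are not the full set of commands of any Hadoop release.

_hadoop_sketch() {
  local cur prev
  cur="${COMP_WORDS[COMP_CWORD]}"      # word currently being completed
  prev="${COMP_WORDS[COMP_CWORD-1]}"   # word just before it
  COMPREPLY=()

  if [[ $COMP_CWORD -eq 1 ]]; then
    # First argument: offer top-level commands matching the typed prefix.
    COMPREPLY=( $(compgen -W "fs jar job version dfsadmin" -- "$cur") )
  elif [[ $prev == "fs" ]]; then
    # After "hadoop fs": offer file system shell options matching the prefix.
    COMPREPLY=( $(compgen -W "-ls -cat -put -get -mkdir -rm -du -tail" -- "$cur") )
  fi
}

# Ask Bash to call _hadoop_sketch when completing arguments of "hadoop".
complete -F _hadoop_sketch hadoop
```

To try it, source the file from ~/.bashrc, open a new shell and type hadoop fs -c followed by Tab; it should complete to -cat. Supporting more commands is a matter of adding more branches and word lists.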

Thursday, June 13, 2013

Node.js debugging with Theseus and Brackets

Brackets is an open source Web development editor from Adobe. It is built with HTML, CSS and JavaScript, with features designed to make Web app development enjoyable rather than painful:

- live edit and preview: changes are reflected in the browser immediately without a reload (currently only Chrome is supported)

- inline view and edit referenced resources: you can open the CSS definition for a tag without leaving the HTML code you are working on

- smart code hints: for JavaScript, it uses the Tern engine, one of the most advanced code analysis engines for JavaScript

- quick docs: inline documentation for various elements such as tags, values, etc.

- extensible architecture: there is already a fast-growing set of extensions, and you write extensions in the same languages you use to write Web apps

- best of all, starting with the Sprint 21 release, it has a built-in Node.js process, which opens up a whole new world of features for JavaScript development

So if you currently use TextMate, Sublime Text 2/3 or Vim, you should really give Brackets a try; it is very pleasant to work with on Web apps.

There is a quick demo video in the nettuts+ article "A Peek At Brackets".

There is a nice extension, Theseus, which makes debugging JavaScript Web apps and Node.js apps much easier. It is worth watching a demo of debugging a JavaScript Web app first.

Follow the installation guide in the project's GitHub README:

  • Install Brackets
  • Use the Brackets Extension Manager (File->Extension Manager, or the Lego-like icon on the right-hand toolbar) to install Theseus: click "Install from URL", then enter the Amazon S3 zip file URL, currently
  • Install Node.js if you haven't already, then use npm to install node-theseus: npm install -g node-theseus

Now write your app.js file, run it with node-theseus app.js, then open that file in Brackets; you should see the debug info presented in a nice visual way.

To see it in action, watch the author's screencast.


Monday, March 18, 2013

Set up Python 3 development tools on Mac

The default Python shipped with OS X Mountain Lion 10.8.x is Python 2.7.2. There are many nice posts about setting up Python tools for 2.7.2, but I had a hard time finding ones that cover Python 3. So this post serves as my notes on setting up Python 3 on OS X 10.8.x.

As a side note, the Python tool chain seems to be really messed up, and it is quite difficult to find an up-to-date, coherent source of truth for state-of-the-art developer tools, especially since Python 3. This falls well short of my experience with RubyGems for Ruby and npm for Node.js, where specifications are clearly defined, actively documented and well supported by the community.

Anyway, here are the steps I went through:

1. Install the latest Python 3. As of this writing, the latest stable Python 3 is v3.3.0.

Simply download the installer from the official Python website and run it.

Another way is to install it using Homebrew, the best/missing package manager for OS X. However, the latest python3 formula available to me is v3.2.x.
$ brew install python3
Type python3 and you will enter the Python 3 REPL:
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 01:25:11)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

2. Install distribute and pip

Basically, distribute is a replacement for setuptools, and pip is a replacement for easy_install. Both provide better Python 3 support.
$ curl | python3
$ curl | python3
Update your PATH:
Now you can use pip to install Python 3 packages. See the pip documentation for more on how to use pip; the cookbook section is especially useful.

3. Install virtualenv and virtualenvwrapper

In practice, pip is most useful together with virtualenv, which is handy when you need multiple development environments with different configurations.

To install virtualenv, just run:
$ pip install virtualenv
To create a virtual environment, run the following. It creates a virtual environment named "py3" and installs distribute instead of setuptools in it (the --distribute option tells it to install distribute). Under the hood, it creates a directory "py3" in the current directory with all the required tools and libraries.
$ virtualenv --distribute py3
Using base prefix '/Library/Frameworks/Python.framework/Versions/3.3'
New python executable in py3/bin/python3
Also creating executable in py3/bin/python
Installing distribute..............................done.
Installing pip................done.
To use the newly created environment, source the activate script installed in it. It basically changes your $PATH, putting the virtual environment's bin directory in front of the existing $PATH, and it changes the command-line prompt to show the name of the environment. Run deactivate to restore the previous $PATH (deactivate is a shell function defined by the activate script).
$ source py3/bin/activate
(py3) $ pip install pgmagick
(py3) $ deactivate
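The activate/deactivate mechanism described above can be illustrated with a much simplified sketch. This is not virtualenv's actual activate script, which handles more cases; the names sketch_activate, sketch_deactivate, _OLD_PATH and _OLD_PS1 are hypothetical. It also shows why activate has to be sourced: it modifies variables of the current shell, which a script run as a separate process cannot do.

```shell
#!/usr/bin/env bash
# Simplified, hypothetical sketch of what an "activate" script does.
# Not virtualenv's real script; names here are made up for illustration.

sketch_activate() {
  local env_dir="$1"
  # Remember the current values so deactivation can restore them.
  _OLD_PATH="$PATH"
  _OLD_PS1="${PS1-}"
  # Put the environment's bin directory in front of the existing PATH,
  # so its python and pip are found first.
  PATH="$env_dir/bin:$PATH"
  export PATH
  # Show the environment name in the prompt.
  PS1="($(basename "$env_dir")) ${PS1-}"

  # Defined on activation and removed on deactivation, like deactivate.
  sketch_deactivate() {
    PATH="$_OLD_PATH"
    export PATH
    PS1="$_OLD_PS1"
    unset _OLD_PATH _OLD_PS1
    unset -f sketch_deactivate
  }
}
```

After source-ing this file, running sketch_activate "$PWD/py3" prefixes the prompt with (py3) and makes py3/bin the first PATH entry; sketch_deactivate puts both back.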
To remove the environment, simply remove its directory:
$ rm -rf py3
For more details about virtualenv, please see its documentation.

To create virtual environments with different Python interpreters, just use the -p option.
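The -p option takes the interpreter to use for the new environment. As a small illustration, here is a wrapper around it; mkenv_for is a hypothetical helper, not part of virtualenv, and the interpreter paths in the usage note are assumptions about what is installed on your machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of virtualenv): create a virtual environment
# for a given interpreter via virtualenv's -p option. Without a second
# argument, the directory is named after the interpreter, e.g. env-python2.7.
mkenv_for() {
  local interpreter="$1"
  local name="${2:-env-$(basename "$interpreter")}"
  virtualenv -p "$interpreter" "$name"
}
```

For example, mkenv_for /usr/bin/python2.7 runs virtualenv -p /usr/bin/python2.7 env-python2.7, and mkenv_for python3 py3 is the same as virtualenv -p python3 py3.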

virtualenvwrapper is a set of extensions that make it easier to manage and work across different virtual environments. It is not necessary, but it makes working with virtual environments much more comfortable and effective. However, according to its official project page, it does not support Python 3.3 yet, and I could not install it successfully either. Let's wait for Python 3.3 support.

If you are running Python 3.2 or earlier, simply install it with:
$ pip install virtualenvwrapper

An alternative to virtualenv is pythonbrew.

4. Install a boilerplate project skeleton generation tool

Paste and PasteScript provide a nice tool to generate a project skeleton from different templates. However, they have not been ported to Python 3 yet. If you are running Python 2.x, you should try them out. The project is actively developed, so Python 3 support should be around the corner.

After digging around, I found a couple of alternatives:
- skeleton (last commit was 3 years ago)
- mr.bob (active development)

5. Documentation with Sphinx

Sphinx is the de facto documentation system for Python code, and it has started to support C/C++ projects as well. Many projects, including Python itself, use Sphinx. See the Sphinx documentation for how to use it.

To publish Sphinx documentation online, you can use Read The Docs. You host your project on GitHub, and every commit triggers a build of your documentation, which is then published to Read The Docs automatically.

6. Advanced build system and continuous integration

Buildout is an advanced build system recommended by many projects and online discussions. See its documentation for details.

For continuous integration, Travis CI is highly recommended; you can simply hook it up to your GitHub project.


1. Sergey Karayev: Setting up a development environment on Mac OS X 10.8 Mountain Lion (uses the default Python 2.7.2 installation)

2. Stack Overflow: how to use pip with Python 3.x besides Python 2.x

3. Stack Overflow: Python 3.2 import issue

4. Stack Overflow: Alternatives to Python Pastescript's paster create

5. The Hitchhiker's Guide to Python

Wednesday, January 23, 2013

Two books

1. Async JavaScript by Trevor Burnham

This book is very concise and focused on various async topics in JavaScript. It is probably one of the best JavaScript books I have ever read, and it has many pointers and links for readers to explore and try out on their own.

He also wrote a CoffeeScript book; you should definitely check it out as well.

2. Mining of Massive Datasets by Anand Rajaraman and Jeff Ullman

I just started reading it, but judging from the table of contents, it covers many interesting areas of data mining with real-world applications. A free version is also available for readers to download.