Utility classes to extend and generalize Spark's ML pipeline framework

SQL code is messy and quickly gets hard to manage, debug and interpret. Whether or not you're training a machine learning model, organizing your code into a modular, configurable framework makes it much easier to manage, generalize and share. These are all good things, especially for production-quality codebases.

In particular, these utility classes add the following features:

* support for Transformer-only pipelines (i.e. ETL pipelines)
* support for aggregations as pipeline stages, and multi-aggregation pipelines
* support for windowing functions as pipeline stages, and pipelines of multiple windowing functions
* support for "exploding" transformers (i.e. using the explode function to expand rows in a DataFrame)
* support for running multiple pipelines in parallel, and then re-joining their results based on common columns
* a few handy transformers that expand upon what's already provided in the Spark ML API, including:
    * column selection, dropping and renaming
    * wrapping any function into a transformer stage (the current ML framework only provides a 1-to-1, i.e. Unary, transformer)
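
The idea behind several of the features above (function-wrapping stages, exploding stages, aggregation stages, and pipelines made only of transformers) can be illustrated with a small sketch. This is plain Python with no Spark dependency, not this library's API: `FunctionStage`, `run_pipeline`, `explode_tags` and `count_by_tag` are hypothetical names invented for the illustration, and a list of dicts stands in for a DataFrame.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List

# Stand-in for a DataFrame: a list of rows, each row a dict (assumption).
Row = Dict[str, Any]
Table = List[Row]


class FunctionStage:
    """Hypothetical stage that wraps any table-to-table function."""

    def __init__(self, fn: Callable[[Table], Table]):
        self.fn = fn

    def transform(self, table: Table) -> Table:
        return self.fn(table)


def run_pipeline(stages: Iterable[FunctionStage], table: Table) -> Table:
    """Apply each stage in order; there is no fitting step."""
    for stage in stages:
        table = stage.transform(table)
    return table


def explode_tags(table: Table) -> Table:
    # "Exploding" stage: one output row per element of the "tags" list,
    # so the row count can grow.
    return [{"user": r["user"], "tag": t} for r in table for t in r["tags"]]


def count_by_tag(table: Table) -> Table:
    # Aggregation stage: many input rows collapse into one row per tag.
    counts = defaultdict(int)
    for r in table:
        counts[r["tag"]] += 1
    return [{"tag": tag, "n": n} for tag, n in sorted(counts.items())]


data = [
    {"user": "a", "tags": ["x", "y"]},
    {"user": "b", "tags": ["y"]},
]
stages = [FunctionStage(explode_tags), FunctionStage(count_by_tag)]
result = run_pipeline(stages, data)
print(result)
```

Neither stage here maps one input row to one output row, which is the case a 1-to-1 (Unary) transformer cannot express. With Spark, each wrapped function would take and return a DataFrame instead of a list of dicts.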


