Apache Crunch is a Java library for writing, testing, and running Hadoop MapReduce pipelines, based on Google's FlumeJava. Its goal is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run.
Artifact coordinates (group:artifact:version): org.apache.crunch:crunch-examples:0.10.0
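As a minimal sketch, assuming the quoted string is a Maven coordinate in group:artifact:version form, a project could declare this artifact in its pom.xml roughly as follows. The source does not specify a scope, a repository, or the rest of the POM, so none are shown.

```xml
<!-- Hypothetical placement: goes inside the <dependencies> element of a pom.xml -->
<dependency>
  <groupId>org.apache.crunch</groupId>
  <artifactId>crunch-examples</artifactId>
  <version>0.10.0</version>
</dependency>
```

In Gradle's short notation the same coordinate would be written as the single string 'org.apache.crunch:crunch-examples:0.10.0'.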