How to enable LZO compression on HDInsight
This blog post explains how to enable LZO compression on an HDInsight cluster.
ARM Template
In the ARM template, edit the configurations section under clusterDefinition:
- Add a core-site section that lists the available compression codecs and sets the LZO codec class
- Add a mapred-site section that enables map output compression and sets the compression codec class
"properties": {
  "clusterVersion": "[parameters('clusterVersion')]",
  "osType": "Linux",
  "clusterDefinition": {
    "kind": "spark",
    "configurations": {
      "gateway": {
        "restAuthCredential.isEnabled": true,
        "restAuthCredential.username": "[parameters('clusterLoginUserName')]",
        "restAuthCredential.password": "[parameters('clusterLoginPassword')]"
      },
      "core-site": {
        "io.compression.codecs": "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec,com.hadoop.compression.lzo.LzopCodec",
        "io.compression.codec.lzo.class": "com.hadoop.compression.lzo.LzoCodec"
      },
      "mapred-site": {
        "mapreduce.map.output.compress": "true",
        "mapreduce.map.output.compression.codec": "com.hadoop.compression.lzo.LzoCodec"
      },
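The io.compression.codecs value is one long comma-separated string, so a typo is easy to miss. The following is a minimal sketch of a check that a given codec class is present in such a list. The function name codec_in_list is hypothetical and not part of Hadoop or HDInsight, and the list is simply copied from the core-site section above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the codec class in $2 appears in the
# comma-separated list in $1 (the format used by io.compression.codecs).
codec_in_list() {
  local list="$1" wanted="$2" codec
  local -a codecs
  IFS=',' read -r -a codecs <<< "$list"
  for codec in "${codecs[@]}"; do
    # Class names never contain whitespace, so strip any stray blanks.
    codec="${codec//[[:space:]]/}"
    if [[ "$codec" == "$wanted" ]]; then
      return 0
    fi
  done
  return 1
}

# Value copied from the core-site section of the template.
codecs="org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec,com.hadoop.compression.lzo.LzopCodec"

if codec_in_list "$codecs" "com.hadoop.compression.lzo.LzopCodec"; then
  echo "LzopCodec is registered"
else
  echo "LzopCodec is missing"
fi
```

On a running cluster node, and assuming the hdfs command is on the PATH, the effective list could be read with hdfs getconf -confKey io.compression.codecs and passed to the same function in place of the hard-coded string.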
Install compression libraries on cluster nodes
You will also need to install the compression libraries on the cluster nodes:
apt install -y liblzo2-2 liblzo2-dev hadooplzo hadoop-lzo hadooplzo-native
If you are also using Snappy, install the Snappy compression libraries with:
apt install -y libsnappy1 libsnappy-dev
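Because the packages have to be installed on every node, it can help to wrap the two commands above in a single script. This is only a sketch: the package names are taken from the commands above, while the WITH_SNAPPY and APPLY environment variables and the package_list function are names assumed for illustration. By default the script only prints the command it would run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Package names copied from the apt commands in this post.
LZO_PACKAGES=(liblzo2-2 liblzo2-dev hadooplzo hadoop-lzo hadooplzo-native)
SNAPPY_PACKAGES=(libsnappy1 libsnappy-dev)

# Print one package name per line.
# $1: "yes" to include the Snappy packages as well (assumed switch).
package_list() {
  local with_snappy="${1:-no}"
  local -a packages=("${LZO_PACKAGES[@]}")
  if [[ "$with_snappy" == "yes" ]]; then
    packages+=("${SNAPPY_PACKAGES[@]}")
  fi
  printf '%s\n' "${packages[@]}"
}

mapfile -t packages < <(package_list "${WITH_SNAPPY:-no}")

# Dry run unless APPLY=1 is set; installing requires root on the node.
if [[ "${APPLY:-0}" == "1" ]]; then
  apt install -y "${packages[@]}"
else
  echo "apt install -y ${packages[*]}"
fi
```

Running it with WITH_SNAPPY=yes APPLY=1 as root on a node would install both the LZO and the Snappy packages in one step.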