How to enable LZO compression on HDInsight

This blog post explains how to enable LZO compression on a HDInsight cluster.

ARM Template 

You will need to modify the ARM template configuration and under the clusterDefinition, configuration section:

  •  Add core-site section and specify the codecs and compression codec class
  • Add a mapred-site enable map output compression and the compression codec class

"properties": {
                "clusterVersion": "[parameters('clusterVersion')]",
                "osType": "Linux",
                "clusterDefinition": {
                    "kind": "spark",
                    "configurations": {
                        "gateway": {
                            "restAuthCredential.isEnabled": true,
                            "restAuthCredential.username": "[parameters('clusterLoginUserName')]",
                            "restAuthCredential.password": "[parameters('clusterLoginPassword')]"
                        "core-site": {
                            "io.compression.codecs": ",,,com.hadoop.compression.lzo.LzopCodec",
                            "io.compression.codec.lzo.class": "com.hadoop.compression.lzo.LzoCodec"
                        "mapred-site": {
                            "": "true",
                            "": "com.hadoop.compression.lzo.LzoCodec"

Install compression libraries on cluster nodes

You will also need to install the compression libraries on the cluster nodes.

apt install -y liblzo2-2 liblzo2-dev hadooplzo hadoop-lzo hadooplzo-native

On the point of compression libraries, if you are using snappy you will need to install the snappy compression libraries with:

apt install -y libsnappy1 libsnappy-dev