Building a Kerberised (via AD) and TLS/SSL Enabled Apache Kafka Cluster

I recently had to design an secure Apache Kafka cluster that was secured with Kerberos for user/service authentication, SSL/TLS for encryption of communications and ACLs. This wasn't exactly straightforward, especially the Kerberos part, as I was hoping it was going to be. This blog post documents how I designed and tested the cluster - hopefully this will be of use to others.
This article has been cross posted to the GI Architects Blog.

What is Kafka?

To quote the project website (https://kafka.apache.org/):
Apache Kafka is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
You may also see the following definition:
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log

Kafka Components

Kafa utilises Apache ZooKeeper for cluster co-ordination services; specifically it is used to store metadata about the Kafka cluster and consumer client information. In ZooKeeper terminology, a cluster of ZooKeeper nodes is known as an ensemble. A ZooKeeper ensemble must contain an odd number of nodes in order to avoid split-brain in the event of a network partition.

Kafka nodes themselves are referred to as brokers. Brokers receive messages from Producers, assigs offsets to the messages and commits the messages to disk storage. Brokers also service consumers, responding with the requested messages from a given offset.

In Kafka messages are categorised into topics. Topics can be further split into partitions that can reside on different servers. The default number of partitions for the Kafka cluster. Partitioning provides fault-tolerance as well as the ability to scale throughput.

It is important to note that Kafka only provides time ordering of messages within a single partition. Time ordering is not provided across an entire topic that has multiple partitions.

A Producer application reads data from a source system, then write the data to a Kafka topic. A Consumer application then subscribes to one or more topics and read messages from the topic partitions in the order they were produced.

Consumers in the same consumer group will read messages from the same topic but different partitions thereby increasing read throughput. Consumers keep track of offsets in either Kafka itself or ZooKeeper, which allows consumers to recover after a failure and continue from the last message read.

MirrorMaker is a tool that can be used to replicate data to a remote cluster. MirrorMaker essentially implements a consumer that reads data from a Kafka topic in one cluster and a producer to republish the messages to a topic in a different cluster. If high availability is required for this component, two servers can run one instance of MirrorMaker each configured in the same consumer group. Each instance will get half of the traffic mirroring to the other data centre since they have consumers in the same group. If a MirrorMaker becomes unavailable, the other instance will automatically pick up the load.

Design Considerations

ZooKeeper

A cluster of Zookeeper nodes, known as an ensemble, is used by Kafka for cluster coordination and to store metadata about brokers, topics and partitions. If you have a Hadoop cluster then you will already have ZooKeeper as many Hadoop ecosystem tools utilise it. However, there are a number of reasons why it may not be a good idea to use the ZooKeeper ensemble used by your Hadoop cluster, including: Using the Hadoop Zookeeper ensemble introduces an unnecessary coupling between Hadoop and Kafka.
  • Sharing an ensemble results in maintenance operations on a Hadoop cluster having the potential to impact Kafka.
  • Kafka can be negatively impacted by Zookeeper latency and timeouts; using the Hadoop Zookeeper ensemble increases the risk of increased latency and timeouts.
In order to avoid split-brain scenarios an odd number of nodes in the Zookeeper cluster are required. The minimum number is 3 nodes, which would allow for one failure, 5 nodes are deployed as this allows for failures and maintenance activities. However, 5 nodes increase write latency as more nodes are involved. If you plan to have multiple clusters across different data centres, it would be advisable to use separate ZooKeeper clusters -as per the recommendations in the ZooKeeper documentation: 

"Forming a ZK ensemble between two datacenters is a problematic endeavour as the high variance in latency between the datacenters could lead to false positive failure detection and partitioning. However, if the ensemble runs entirely in one datacenter, and the second datacenter runs only Observers, partitions aren't problematic as the ensemble remains connected. Clients of the Observers may still see and issue proposals"

Data formats

Kafka stores and transfers messages as a byte array and so the structure of the message is transparent to Kafka brokers. However, it is recommended that structure is imposed on messages as a consistent data format allows the writing and reading of messages to be decoupled. At present Apache Avro, a serialisation framework developed for Hadoop, is the most widely used with Kafka as it provides a number of features including:
  • A compact serialisation format
  • Separation of schemas from the message payload
  • Strong typing and schema evolution

Delivery Semantics

It should be noted that the default behaviour of Kafka provides at-least-once delivery semantics (one or more deliveries of the message). In this case Kafka guarantees that no messages are lost but duplicates are possible. By default the Kafka consumer auto-commits offsets (enable.auto.commit is set to true), whereby the Consumer will commit at intervals defined by auto.commit.interval.ms property (default is 60 seconds). The probability of duplicate messages can be reduced by lowering the auto.commit.interval.ms property and at the same time using consumer.commitSync to also commit offsets at intervals that make sense to the application.

Example of how to build the cluster (development environment)

To simplify things I'm going to show the steps for a 3 node development cluster where ZooKeeper and Kafka brokers are running on the same servers but you can easily extend this to an environment where they reside on separate infrastructure. The instructions assume the servers are domain joined (Active Directory).

ZooKeeper Install

Ensure the server has been domain joined and the krb5-workstation package is installed

Install JDK 1.8 u51

export JDK_VERSION="jdk-8u51-linux-x64" 
export JDK_RPM="$JDK_VERSION.rpm" 
wget -O /tmp/$JDK_RPM --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u51-b16/$JDK_RPM 
rpm -ivh /tmp/$JDK_RPM

Install the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files

wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip" 
unzip jce_policy-8.zip mv /usr/java/default/jre/lib/security/local_policy.jar /usr/java/default/jre/lib/security/local_policy.jar.backup mv /usr/java/default/jre/lib/security/US_export_policy.jar /usr/java/default/jre/lib/security/US_export_policy.jar.backup mv UnlimitedJCEPolicyJDK8/*.jar /usr/java/default/jre/lib/security/ rm -f jce_policy-8.zip

Download ZooKeeper 3.4.6 from http://zookeeper.apache.org/releases.html#download Extract the gzip file

tar -zxf zookeeper-3.4.6.tar.gz 

Move the ZooKeeper directory to /usr/local and create a symlink Creating a symlink makes it easier to switch between different ZooKeeper versions.

mv zookeeper-3.4.6 /usr/local/ 
ln -s /usr/local/zookeeper-3.4.6 /usr/local/zookeeper 

Create a user account under which ZooKeeper will run

export ZOOKEEPER_USER='zookeeper' 
export ZOOKEEPER_GROUP='zookeeper' 
/usr/sbin/groupadd -r $ZOOKEEPER_GROUP &>/dev/null 
id $ZOOKEEPER_USER &>/dev/null || useradd -s /sbin/nologin -r -M $ZOOKEEPER_USER -g $ZOOKEEPER_GROUP -c "Zookeeper Server" 
usermod $ZOOKEEPER_USER -d /zookeeper 

Note the /zookeeper directory should be a mount on a separate disk.
Create a file named myid under the path /zookeeper/data/ containing the id of the server (refer to the Blueprint for information on the zookeeper id). The id must be unique in the cluster.
Any easy way to generate an id is to use the suffix of the server hostname.

export VMSuffix=$(hostname -s | rev | cut -d"-" -f1 | rev) 
cat > /zookeeper/data/myid <<EOF 
$VMSuffix 
EOF 

Create a file named zoo.cfg under /usr/local/zookeeper/conf with the following content

touch /usr/local/zookeeper/conf/zoo.cfg 
cat > /usr/local/zookeeper/conf/zoo.cfg <<EOF 
tickTime=2000 
initLimit=10 
syncLimit=5 
dataDir=/zookeeper/data 
dataLogDir=/zookeeper/logs 
server.X=<replace with fqdn of node 1>:2888:3888 
server.X=<replace with fqdn of node 2>:2888:3888 
server.X=<replace with fqdn of node 3>:2888:3888 authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider requireClientAuthScheme=sasl 
jaasLoginRenew=3600000 
kerberos.removeHostFromPrincipal=true 
kerberos.removeRealmFromPrincipal=true 
EOF

This file contains the ZooKeeper server configuration information. The above configuration assumes a 3 node ZooKeeper ensemble. Note the "x" in server.x will need to be replaced with the id of the server.

Create chroot path for Kafka

mkdir -p /zookeeper/<country code>clst<nn>kakfa

You should use a naming scheme that makes sense in your environment, the one here assumes: Country code e.g. UK, US etc.
nn is a numeric cluster identifier, e.g. 01

Create a file named java.env under the path /usr/local/zookeeper/conf with the following content.
This file is used to pass in extra flags when starting ZooKeeper via environment variables. In this case we specify a Java Authentication and Authorization Service (JAAS) configuration file that tells ZooKeeper how to authenticate. The second parameter enables debug mode for Kerberos, which cause ZooKeeper to include information on Kerberos related errors (e.g. authentication failed etc.).

export SERVER_JVMFLAGS="-Djava.security.auth.login.config=/usr/local/zookeeper/conf/zookeeper_jaas.conf -Dsun.security.krb5.debug=true"

Modify the log4.properties file under the path /usr/local/zookeeper/conf, to replace the first 9 lines which are:

# Define some default values that can be overridden by system properties zookeeper.root.logger=INFO, CONSOLE 
zookeeper.console.threshold=INFO 
zookeeper.log.dir=. 
zookeeper.log.file=zookeeper.log 
zookeeper.log.threshold=DEBUG 
zookeeper.tracelog.dir=. 
zookeeper.tracelog.file=zookeeper_trace.log

With
# Define some default values that can be overridden by system properties zookeeper.root.logger=INFO, CONSOLE, ROLLINGFILE 
zookeeper.console.threshold=INFO 
zookeeper.log.dir=/var/log/zookeeper 
zookeeper.log.file=zookeeper.log 
zookeeper.log.threshold=INFO 
zookeeper.tracelog.dir=/var/log/zookeeper 
zookeeper.tracelog.file=zookeeper_trace.log

Once ZooKeeper is built and tested the logging can be turned down e.g. replace the INFO (in red above) with WARN.

Create a systemd unit file for ZooKeeper /etc/systemd/system/zookeeper.service

[Unit] 
Description=Zookeeper distributed coordination server Documentation=http://zookeeper.apache.org 
After=network.target
[Service]
Type=forking
User=zookeeper
Group=zookeeper
SyslogIdentifier=zookeeper
WorkingDirectory=/zookeeper/data
ExecStart=/usr/local/zookeeper/bin/zkServer.sh start /usr/local/zookeeper/conf/zoo.cfg ExecStop=/usr/local/zookeeper/bin/zkServer.sh stop ExecReload=/usr/local/zookeeper/bin/zkServer.sh restart
TimeoutSec=30
Restart=always
 [Install]
WantedBy=multi-user.target

The command below enables the ZooKeeper service

systemctl enable zookeeper

Do not start the service until you have created the AD user and Kerberos keytabs, otherwise the service will fail to start (the process to do this is described in the next section).

Create AD User and Kerberos Keytab for ZooKeeper Server

Each ZooKeeper server requires an account in AD to be created and a keytab created on the server itself.

Create accounts in Active Directory for ZooKeeper using the following format that takes elements from the server naming convention.
  • SamAccountName: zkCCDDDSSSNNNN, where
  • CC is the country code e.g. GB, US
  • DDD is the data centre identifier
  • SSS is the server type e.g. SVV, SVB, SVR
  • NNNN is the numeric suffix that
  • UserPrincipalName: zookeeper/<server fqdn>@<REALM> , where
  • Server FQDN is the fully qualified domain name of the server
  • REALM is the AD domain, in this example we use MYDOMAIN
The naming scheme should be modified to suit your needs.

$SecPaswd= ConvertTo-SecureString -String 'AComplex20CharPasswordWithNumLettersAndPunctuation' -AsPlainText -Force New-ADUser -Name "ZooKeeper Server GB DC1" -SamAccountName zkGBDCSVV0001 -UserPrincipalName zookeeper/gb-dc1-svv-0001.mydomain.co.uk@MYDOMAIN.CO.UK -Path "OU=kafka,DC=MYDOMAIN,DC=co,DC=uk" -Enabled $true -AccountPassword $SecPaswd

The password should be set to never expire:

Set-ADUser -Identity zkGBWATSVV0001 -PasswordNeverExpires $true

Then create a SPN, this step is important, if it is skipped it will lead to Kerberos errors stating that the server does not exist in the database.

get-aduser zkGBDC1SVV0001 | set-aduser -ServicePrincipalNames @{Add=zookeeper/ gb-dc1-svv-0001.mydomain.co.uk} -Server GB-DC1-SVV-1101.mydomain.co.uk

Ensure that there are forward (e.g. hostname to IP) and reverse (IP to hostname) DNS records in AD DNS. This is another common cause of Kerberos errors.

SSH to the ZooKeeper server

Create the directory keytabs under /etc/security

Note, since this is a development cluster where Kafka and ZooKeeper are installed on the same server, we create subdirectories under /etc/security/keytabs for Kafka and ZooKeeper; and modify the permissions accordingly.

E.g. the /etc/security/keytabs directory will need to be owned by the user kafka and the group zookeeper.
mkdir -p /etc/security/keytabs/ 
cd /etc/security/keytabs 
chown -R kafka:kafka /etc/security/keytabs 
chmod 750 /etc/security/keytabs

Create the keytab for the AD principal that you created in step 1, entering the password when prompted

ktutil addent -password -p zookeeper/gb-dc1-svv-0001.mydomain.co.uk@MYDOMAIN.CO.UK -k 1 -e rc4-hmac wkt zookeeper_server.keytab quit

Modify the permissions of the keytab

chown zookeeper:zookeeper zookeeper_server.keytab chmod 400 zookeeper_server.keytab

Create JAAS Configuration File for ZooKeeper Server

The Java Authentication and Authorization Service (JAAS) was introduced as an optional package (extension) to the Java 2 SDK. JAAS implements a Java version of the standard Pluggable Authentication Module (PAM) framework. Kafka and ZooKeeper use JAAS for authentication to authenticate against Kerberos.

After having created the user accounts in AD and a keytab file, we must now create a JAAS configuration file and configure ZooKeeper to use it.In the directory /usr/local/zookeeper/conf, create a file named zookeeper_jaas.conf, ensure the file is owned by the user and group zookeeper Enter the following in the file, changing the part in read (the hostname and realm) accordingly:

Server { 
com.sun.security.auth.module.Krb5LoginModule required debug=true 
useKeyTab=true 
storeKey=true 
doNotPrompt=true 
useTicketCache=false 
keyTab="/etc/security/keytabs/zookeeper/zookeeper_server.keytab" 
principal="zookeeper/gb-dc1-svv-0001.mydomain.co.uk@MYDOMAIN.CO.UK"; 
};

Note the semicolon should only appear on the last line within the curly braces and after the curly braces. Here is a brief explanation of the options, although most of these should be self-explanatory:
  • The first line specifies Kerberos is used for authentication and "debug=true" enables debug mode for Kerberos which is useful for troubleshooting authentication issues.
  • The line "useKeyTab=true" specifies that a keytab should be used to authenticate (as opposed to prompting for a username and password).
  • The next line specifies that the principal's key will be stored in the subject's private credentials.
  • The next line prevents prompting for credentials in the event the keytab does not work or the
  • credentials cannot be obtained from the Kerberos cache.
  • The next line specifies that the TGT should not be fetched from cache
  • The next line specifies the path to the keytab file.
  • The last line specifies the name of the principal that should be used for authentication

Create AD User and Kerberos Keytab for ZooKeeper Client

This user account's keytab will be used when using the command line tools to connect to ZooKeeper for administration and troubleshooting purposes - this is not for application use. Create one client account per ZooKeeper server.

Create accounts in Active Directory for ZooKeeper using the following format that takes elements from the server naming convention.
  • SamAccountName: ZkClientCCDDDNNNN, where
  • CC is the country code e.g. GB, US
  • DDD is the data centre identifier e.g. DC1, DC2, etc
  • NNNN is the numeric suffix that
  • UserPrincipalName: ZkClient/<server fqdn>@<REALM> , where
  • Server FQDN is the fully qualified domain name of the server
  • REALM is MYDOMAIN.CO.UK
The password should be set to never expire.

Then create a SPN, this step is important, if it is skipped it will lead to Kerberos errors stating that the server does not exist in the database.

$SecPaswd= ConvertTo-SecureString -String 'AComplex20CharPasswordWithNumLettersAndPunctuation' -AsPlainText -Force 

New-ADUser -Name "ZooKeeper Client GB DC1" -SamAccountName ZkClientGBDC10001 -UserPrincipalName ZkClient/gb-dc1-svv-0001.mydomain.co.uk@MYDOMAIN.CO.UK -Path "OU=kafka,DC=mydomain,DC=co,DC=uk" -Enabled $true -AccountPassword $SecPaswd 

Set-ADUser -Identity ZkClientGBDC10001 -PasswordNeverExpires $true 
get-aduser ZkClientGBDC10001 | set-aduser -ServicePrincipalNames @{Add=ZkClient/ gb-dc1-svv-0001.mydomain.co.uk} -Server GB-DC1-SVV-1101.mydomain.co.uk

Ensure that there are forward (e.g. hostname to IP) and reverse (IP to hostname) DNS records in AD DNS. This is another common cause of Kerberos errors.

SSH to the ZooKeeper server

Create the keytab for the AD principal that you created in step 1, entering the password when prompted

Ktutil addent -password -p ZkClient/gb-dc1-svv-0001.mydomain.co.uk@MYDOMAIN.CO.UK -k 1 -e rc4-hmac wkt zookeeper_client.keytab quit

Modify the permissions of the keytab
chown zookeeper:zookeeper zookeeper_client.keytab 
chmod 400 zookeeper_client.keytab

Modify the JAAS Configuration File for ZooKeeper Server

Having created the ZooKeeper client credentials we must modify the JAAS configuration file on the ZooKeeper server.

SSH to the ZooKeeper server

Edit the file zookeeper_jaas.conf

Add a section for the client configuration, an example is given below (don't delete the existing content), changing the part in read (the hostname and realm) accordingly:

Client {
com.sun.security.auth.module.Krb5LoginModule required debug=true
useKeyTab=true
storeKey=true
doNotPrompt=true
useTicketCache=false
keyTab="/etc/security/keytabs/zookeeper_client.keytab"
principal=" ZkClient/gb-dc1-svv-0001.mydomain.co.uk@MYDOMAIN.CO.UK";
};

Testing ZooKeeper after Installation and Enabling Kerberos Authentication

ZooKeeper is shipped with command line tool named zkCli.sh, this can be used to test connectivity to your Kerberised ZooKeeper ensemble.

Run the tool with the -server parameter and a comma separated list of your ZooKeeper and port.

/usr/local/zookeeper/bin/zkCli.sh -server ZooKeeperServer1:2181, ZooKeeperServer2:2181, ZooKeeperServer3:2181


Example output is shown with comments inserted.

[root@DSO-DHI-ZK-0001 config]# /usr/local/zookeeper/bin/zkCli.sh -server DSO-DHI-ZK-0001:2181,DSO-DHI-ZK-0002:2181,DSO-DHI-ZK-0003:2181 Connecting to DSO-DHI-ZK-0001:2181,DSO-DHI-ZK-0002:2181,DSO-DHI-ZK-0003:2181 2016-07-02 14:32:08,678 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT 2016-07-02 14:32:08,688 [myid:] - INFO [main:Environment@100] - Client environment:host.name=dso-dhi-zk-0001.dh.local 2016-07-02 14:32:08,688 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.8.0_51 2016-07-02 14:32:08,697 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation 2016-07-02 14:32:08,697 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/java/jdk1.8.0_51/jre 2016-07-02 14:32:08,697 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/usr/local/zookeeper/bin/../build/classes:/usr/local/zookeeper/bin/../build/lib/*.jar:/usr/local/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/local/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/local/zookeeper/bin/../lib/netty-3.7.0.Final.jar:/usr/local/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/local/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/local/zookeeper/bin/../zookeeper-3.4.6.jar:/usr/local/zookeeper/bin/../src/java/lib/*.jar:/usr/local/zookeeper/bin/../conf: 2016-07-02 14:32:08,698 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib 2016-07-02 14:32:08,698 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp 2016-07-02 14:32:08,698 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA> 2016-07-02 14:32:08,698 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux 2016-07-02 14:32:08,698 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64 2016-07-02 14:32:08,698 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.10.0-327.18.2.el7.x86_64 2016-07-02 14:32:08,698 [myid:] - INFO [main:Environment@100] - Client environment:user.name=root 2016-07-02 14:32:08,698 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/root 2016-07-02 14:32:08,698 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/usr/local/kafka_2.11-0.10.0.0/config 2016-07-02 14:32:08,706 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=DSO-DHI-ZK-0001:2181,DSO-DHI-ZK-0002:2181,DSO-DHI-ZK-0003:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@3eb07fd3 Welcome to ZooKeeper! Debug is true storeKey true useTicketCache true useKeyTab true doNotPrompt true ticketCache is null isInitiator true KeyTab is /etc/security/keytabs/zookeeper_client.keytab refreshKrb5Config is false principal is ZkClient/dso-dhi-zk-0001.dh.local tryFirstPass is false useFirstPass is false storePass is false clearPass is false JLine support is enabled Acquire TGT from Cache Credentials are no longer valid Principal is ZkClient/dso-dhi-zk-0001.dh.local@DH.LOCAL null credentials from Ticket Cache [zk: DSO-DHI-ZK-0001:2181,DSO-DHI-ZK-0002:2181,DSO-DHI-ZK-0003:2181(CONNECTING) 0] principal is ZkClient/dso-dhi-zk-0001.dh.local@DH.LOCAL Will use keytab Commit Succeeded 2016-07-02 14:32:09,351 [myid:] - INFO [main-SendThread(dso-dhi-zk-0001.dh.local:2181):Login@293] - successfully logged in. 2016-07-02 14:32:09,353 [myid:] - INFO [Thread-1:Login$1@127] - TGT refresh thread started. 2016-07-02 14:32:09,363 [myid:] - INFO [main-SendThread(dso-dhi-zk-0001.dh.local:2181):ZooKeeperSaslClient$1@285] - Client will use GSSAPI as SASL mechanism. 2016-07-02 14:32:09,388 [myid:] - INFO [Thread-1:Login@301] - TGT valid starting at: Sat Jul 02 14:32:09 UTC 2016 2016-07-02 14:32:09,388 [myid:] - INFO [Thread-1:Login@302] - TGT expires: Sun Jul 03 00:32:09 UTC 2016 2016-07-02 14:32:09,388 [myid:] - INFO [Thread-1:Login$1@181] - TGT refresh sleeping until: Sat Jul 02 22:53:43 UTC 2016 2016-07-02 14:32:09,392 [myid:] - INFO [main-SendThread(dso-dhi-zk-0001.dh.local:2181):ClientCnxn$SendThread@975] - Opening socket connection to server dso-dhi-zk-0001.dh.local/10.1.3.4:2181. Will attempt to SASL-authenticate using Login Context section 'Client' 2016-07-02 14:32:09,403 [myid:] - INFO [main-SendThread(dso-dhi-zk-0001.dh.local:2181):ClientCnxn$SendThread@852] - Socket connection established to dso-dhi-zk-0001.dh.local/10.1.3.4:2181, initiating session 2016-07-02 14:32:09,426 [myid:] - INFO [main-SendThread(dso-dhi-zk-0001.dh.local:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server dso-dhi-zk-0001.dh.local/10.1.3.4:2181, sessionid = 0x15597ee807f02fd, negotiated timeout = 30000 [COMMENT: At this point we have a SASL/Kerberos authenticated session.] WATCHER:: WatchedEvent state:SyncConnected type:None path:null WATCHER:: WatchedEvent state:SaslAuthenticated type:None path:null [ COMMENT: You may have to press enter at this point in order to see the line below] [zk: DSO-DHI-ZK-0001:2181,DSO-DHI-ZK-0002:2181,DSO-DHI-ZK-0003:2181(CONNECTED) 0] [ COMMENT: List znodes] [zk: DSO-DHI-ZK-0001:2181,DSO-DHI-ZK-0002:2181,DSO-DHI-ZK-0003:2181(CONNECTED) 1] ls / [zookeeper, gbclst01kakfa;DSO-DHI-ZK-0002:2181, gbclst01kakfa] [controller_epoch, controller, brokers, admin, isr_change_notification, consumers, config] [ COMMENT: List gbclst01kakfa znode] [zk: DSO-DHI-ZK-0001:2181,DSO-DHI-ZK-0002:2181,DSO-DHI-ZK-0003:2181(CONNECTED) 3] ls /gbclst01kakfa/brokers [ids, topics, seqid] [ COMMENT: List the broker ids - we have three Kafka brokers connected to this ZooKeeper ensemble] [zk: DSO-DHI-ZK-0001:2181,DSO-DHI-ZK-0002:2181,DSO-DHI-ZK-0003:2181(CONNECTED) 4] ls /gbclst01kakfa/brokers/ids [1, 2, 3] [zk: DSO-DHI-ZK-0001:2181,DSO-DHI-ZK-0002:2181,DSO-DHI-ZK-0003:2181(CONNECTED) 5] [COMMENT: type "help" to see a list of commands.] [zk: DSO-DHI-ZK-0001:2181,DSO-DHI-ZK-0002:2181,DSO-DHI-ZK-0003:2181(CONNECTED) 12] help ZooKeeper -server host:port cmd args stat path [watch] set path data [version] ls path [watch] delquota [-n|-b] path ls2 path [watch] [COMMENT: type "close" to close your session.] [zk: DSO-DHI-ZK-0001:2181,DSO-DHI-ZK-0002:2181,DSO-DHI-ZK-0003:2181(CONNECTED) 13] close 2016-07-02 14:42:53,105 [myid:] - INFO [main:ZooKeeper@684] - Session: 0x15597ee807f02fd closed [zk: DSO-DHI-ZK-0001:2181,DSO-DHI-ZK-0002:2181,DSO-DHI-ZK-0003:2181(CLOSED) 14] 2016-07-02 14:42:53,105 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@512] - EventThread shut down [COMMENT: type "quit" to exit the zkCli shell.] [zk: DSO-DHI-ZK-0001:2181,DSO-DHI-ZK-0002:2181,DSO-DHI-ZK-0003:2181(CLOSED) 14] quit Quitting... [root@DSO-DHI-ZK-0001 config]#

Kafka Brokers

Once we have ZooKeeper configured we can install and configure Kafka brokers
Kafka software dependencies:
  • Java Development Kit
  • Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files

Installation


Ensure the server has been domain joined and the krb5-workstation package is installed
Install JDK 1.8 u51

export JDK_VERSION="jdk-8u51-linux-x64"
export JDK_RPM="$JDK_VERSION.rpm"
wget -O /tmp/$JDK_RPM --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u51-b16/$JDK_RPM
rpm -ivh /tmp/$JDK_RPM


Install the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files
wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip"
unzip jce_policy-8.zip
mv /usr/java/default/jre/lib/security/local_policy.jar /usr/java/default/jre/lib/security/local_policy.jar.backup
mv /usr/java/default/jre/lib/security/US_export_policy.jar /usr/java/default/jre/lib/security/US_export_policy.jar.backup
mv UnlimitedJCEPolicyJDK8/*.jar /usr/java/default/jre/lib/security/
rm -f jce_policy-8.zip


Download Kafka 0.10.0.0 from http://kafka.apache.org/downloads.html wget http://www-eu.apache.org/dist/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz
wget https://dist.apache.org/repos/dist/release/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz.md5
cat kafka_2.11-0.10.0.0.tgz.md5 | cut -d":" -f2 | tr -d ' ' | awk '{print tolower($0)}'
md5sum kafka_2.11-0.10.0.0.tgz


Extract the gzip file
tar -zxf kafka_2.11-0.10.0.0.tgz

Move the Kafka directory to /usr/local and create a symlink
mv kafka_2.11-0.10.0.0 /usr/local/
ln -s /usr/local/kafka_2.11-0.10.0.0 /usr/local/kafka


Creating a symlink makes it easier to switch between different Kafka versions.

Create the Kafka data directories as per the blueprint, in this example we are create 3 data directories.

These should be mount points for separate disks to improve performance.
export KAFKA_DIR='/kafka'
mkdir -p $KAFKA_DIR/data{1,2,3}
mkdir -p $KAFKA_DIR/logs


Create a user account under which Kafka will run
export KAFKA_USER='kafka'
export KAFKA_GROUP='kafka'
export KAFKA_HOME='/usr/local/kafka'
/usr/sbin/groupadd -r $KAFKA_GROUP &>/dev/null
id $KAFKA_USER &>/dev/null || useradd -s /sbin/nologin -r -M $KAFKA_USER -g $KAFKA_GROUP -c "Kafka Broker"
usermod $KAFKA_USER -d "$KAFKA_DIR"


Ensure the OS tuning guidelines are applied as per the blueprint, an example is provided below for how to apply these settings.
touch /etc/sysctl.d/kafka.conf
echo 0 > /proc/sys/vm/swappiness
sysctl -w vm.swappiness=0
echo vm.swappiness=0 >> /etc/sysctl.d/kafka.conf
echo "vm.dirty_ratio=5" >> /etc/sysctl.d/kafka.conf
sysctl -w vm.dirty_ratio=5
echo "vm.dirty_background_bytes=60" >> /etc/sysctl.d/kafka.conf
sysctl -w vm.dirty_background_bytes=60
echo "Setting Auto-tuning read/write TCP buffer limits"
sysctl -w net.ipv4.tcp_rmem='4096 65536 2048000'
echo 'net.ipv4.tcp_rmem = 4096 65536 2048000' >> /etc/sysctl.d/kafka.conf
sysctl -w net.ipv4.tcp_wmem='4096 65536 2048000'
echo 'net.ipv4.tcp_rmem = 4096 65536 2048000' >> /etc/sysctl.d/kafka.conf
echo "Setting Socket buffer defaults and limits"
sysctl -w net.core.wmem_default=131072
echo 'net.core.wmem_default=131072' >> /etc/sysctl.d/kafka.conf
sysctl -w net.core.rmem_default=131072
echo 'net.core.rmem_default=131072' >> /etc/sysctl.d/kafka.conf
sysctl -w net.core.wmem_max=2097152
echo 'net.core.wmem_max=2097152' >> /etc/sysctl.d/kafka.conf
sysctl -w net.core.rmem_max=2097152
echo 'net.core.rmem_max=2097152' >> /etc/sysctl.d/kafka.conf
echo "Setting net.ipv4.tcp_window_scaling to 1 (enabling)"
sysctl -w net.ipv4.tcp_window_scaling=1
echo 'net.ipv4.tcp_window_scaling = 1' >> /etc/sysctl.d/kafka.conf
sysctl -w net.ipv4.tcp_wmem='4096 65536 2048000'
echo 'net.ipv4.tcp_rmem = 4096 65536 2048000' >> /etc/sysctl.d/kafka.conf
echo "Setting the TCP SYN (half-open) connection backlog to 4096"
sysctl -w net.ipv4.tcp_max_syn_backlog=4096
echo 'net.ipv4.tcp_max_syn_backlog = 4096' >> /etc/sysctl.d/kafka.conf
sysctl -w net.core.netdev_max_backlog=30000
echo 'net.core.netdev_max_backlog = 30000' >> /etc/sysctl.d/kafka.conf


In the directory /usr/local/kafka/config/, create a file named server.properties, this file will contain the Kafka configuration. Refer to the blueprint for the contents of the configuration file and the properties to be set, an example is provided below.

This example uses the numeric suffix of the VM for the broker id. Note that fully qualified server names are used at all times. In addition, we specify SASL_PLAINTEXT, which signifies an authenticated but unencrypted connection. Later in this document we will outline how to enable SSL.
export VMFQDN=$(hostname -f)
export VMSuffix=$(hostname -s | rev | cut -d"-" -f1 | rev)
cat > /usr/local/kafka/config/server.properties <<EOF
broker.id=$VMSuffix
listeners = SASL_PLAINTEXT://$VMFQDN:9092
advertised.listeners=SASL_PLAINTEXT://$VMFQDN:9092
security.inter.broker.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka
zookeeper.connect=<zookeeperServerFQDN1>:2181, <zookeeperServerFQDN2>:2181, <zookeeperServerFQDN3>:2181/gbclst01kakfa
log.dirs=/kafka/data1,/kafka/data2,/kafka/data3
num.reocvery.threads.per.data.dir=4
auto.create.topics=false
num.partitions=1
log.retention.ms=86400000
log.segment.bytes=1073741824
max.message.bytes=1000000
message.max.bytes=1000000
replica.fetch.max.bytes=1000000
delete.topic.enable=true
controlled.shutdown.enable=true
secure=false
reserved.broker.max.id=3000
EOF
Create a krb5.conf file for your domain and place it under /usr/local/kafka/config. Ensure the file contains a section named "[domain_realm]" [domain_realm]
mydomain.co.uk = MYDOMAIN.CO.UK
Where the AD domain on the left must be in lowercase and the realm in upper case on the right. Modify the log4.properties file under the path /usr/local/kafka/config, and after the comments add the line in red below to modify the location of Kafka log files. # Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

kafka.logs.dir=/var/log/kafka
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n

Once Kafka is built and tested the logging can be turned down e.g. replace the INFO with WARN in the file above.
Create a systemd unit file for Kafka etc/systemd/system/kafka.service
[Unit] Description=Kafka publish-subscribe messaging system Documentation=http://kafka.apache.org/documentation.html Requires=zookeeper.service After=network.target zookeeper.service [Service] User=kafka Group=kafka SyslogIdentifier=kafka WorkingDirectory=/kafka ExecStart=/usr/bin/java \ -Xmx1G -Xms1G -server \ -XX:+UseCompressedOops \ -XX:+UseParNewGC \ -XX:+UseConcMarkSweepGC \ -XX:+CMSClassUnloadingEnabled \ -XX:+CMSScavengeBeforeRemark \ -XX:+DisableExplicitGC \ -Djava.awt.headless=true \ -Xloggc:/var/log/kafka/kafkaServer-gc.log \ -verbose:gc \ -XX:+PrintGCDetails \ -XX:+PrintGCDateStamps \ -XX:+PrintGCTimeStamps \ -Dcom.sun.management.jmxremote \ -Dcom.sun.management.jmxremote.authenticate=false \ -Dcom.sun.management.jmxremote.ssl=false \ -Djava.security.auth.login.config=/usr/local/kafka/config/kafka_jaas.conf \ -Dkafka.logs.dir=/var/log/kafka \ -Dlog4j.configuration=file:/usr/local/kafka/config/log4j.properties \ -Djava.security.krb5.conf=/usr/local/kafka/config/krb5.conf \ -cp /usr/local/kafka/libs/* \ kafka.Kafka \ /usr/local/kafka/config/server.properties ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh TimeoutSec=15 Restart=on-failure [Install] WantedBy=multi-user.target
NOTE: The dependency on ZooKeeper in the systemd unit file is only required on servers where you host Kafka and ZooKeeper on the same servers.
The command below enables the Kafka service
Do not start the service until you have created the AD user and Kerberos keytabs, otherwise the service will fail to start (the process of this is described in the next section).
systemctl enable kafka
Create AD User and Kerberos Keytab for Kafka Broker
Each Kafka server requires an account in AD to be created and a keytab created on the server itself.
Create accounts in Active Directory for Kafka using a naming scheme that is appropriate for your environment.

  • SamAccountName: kfCCDDDSSSNNNN, where
  • CC is the country code e.g. GB, US
  • DDD is the data centre identifier e.g. DC1, DC2, etc
  • SSS is the server type e.g. SVV, SVB, SVR
  • NNNN is the numeric suffix that
  • UserPrincipalName: kafka/<server fqdn>@<REALM> , where
  • Server FQDN is the fully qualified domain name of the server
  • REALM is MYDOMAIN.CO.UK
The password should be set to never expire: $SecPaswd= ConvertTo-SecureString -String 'AComplex20CharPasswordWithNumLettersAndPunctuation' -AsPlainText -Force
New-ADUser -Name "Kafka Server GB DC1" -SamAccountName kfGBDC1SVV0001 -UserPrincipalName kafka/gb-dc1-svv-0001.mydomain.co.uk@MYDOMAIN.CO.UK -Path "OU=kafka,DC=mydomain,DC=co,DC=uk" -Enabled $true -AccountPassword $SecPaswd
Set-ADUser -Identity kfGBDC1SVV0001 -PasswordNeverExpires $true
Then create a SPN, this step is important, if it is skipped it will lead to Kerberos errors stating that the server does not exist in the database.
get-aduser kfGBDC1SVV0001 | set-aduser -ServicePrincipalNames @{Add=kafka/ gb-dc1-svv-0001.mydomain.co.uk} -Server GB-DC1-SVV-1101.mydomain.co.uk
Ensure that there are forward (e.g. hostname to IP) and reverse (IP to hostname) DNS records in AD DNS. This is another common cause of Kerberos errors.
SSH to the Kafka server
Create the directory keytabs under /etc/security
mkdir -p /etc/security/keytabs/
cd /etc/security/keytabs
chown -R kafka:kafka /etc/security/keytabs
chmod 750 /etc/security/keytabs
Note, since this is a development cluster where Kafka and ZooKeeper are installed on the same server, we create subdirectories under /etc/security/keytabs for Kafka and ZooKeeper; and modify the permissions accordingly.
Create the keytab for the AD principal that you createad in step 1, entering the password when prompted
ktutil
addent -password -p kafka/gb-dc1-svv-0001.mydomain.co.uk@MYDOMAIN.CO.UK -k 1 -e rc4-hmac
wkt kafka_server.keytab
quit

Modify the permissions of the keytab
chown kafka:kafka kafka_server.keytab
chmod 400 kafka_server.keytab

Create JAAS Configuration File for Kafka
After having created the user accounts in AD and a keytab file, we must now create a JAAS configuration file and configure Kafka to use it.
In the directory /usr/local/kafka/config, create a file named kafka_jaas.conf, ensure the file is owned by the user and group kafka.
Enter the following in the file, changing the part in red (the hostname and realm) accordingly:
KafkaServer {
com.sun.security.auth.module.Krb5LoginModule required debug=true
useKeyTab=true
storeKey=true
doNotPrompt=true
useTicketCache=false
serviceName=kafka
keyTab="/etc/security/keytabs/kafka/kafka_server.keytab"
principal="kafka/<KafkaServerFQDN>@MYDOMAIN.CO.UK ";
};
// ZooKeeper client authentication
Client {
com.sun.security.auth.module.Krb5LoginModule required debug=true
useKeyTab=true
storeKey=true
doNotPrompt=true
useTicketCache=false
serviceName=kafka
keyTab="/etc/security/keytabs/kafka/kafka_server.keytab"
principal="kafka/<KafkaServerFQDN>@MYDOMAIN.CO.UK";
};



Testing Kafka after Installation and Enabling Kerberos Authentication


Kafka is shipped with command line tool named kafka-topics.sh, this can be used to test Kafka (and therefore Kafka - ZooKeeper communications).
Run the tool with the -zookeeper parameter and specify one of the zookeeper servers and port, the name after the slash is the chroot path; and lastly add the -list parameter, to list all topics. In a new cluster there will be no topics but this will still test that Kafka and ZooKeeper are functioning. You can specify a comma separated list of zookeeper servers, if you do the script will try to connect to each zookeeper in turn if it cannot connect to the first.
/usr/local/kafka/bin/kafka-topics.sh --zookeeper <ZooKeeperServer>:2181/<chroot path> --list,
In the next example we use the same tool to create topic named test, use the describe parameter to view its properties and then delete it.
[root@DSO-DHI-ZK-0001 config]# /usr/local/kafka/bin/kafka-topics.sh --create --zookeeper DSO-DHI-ZK-0001:2181/gbclst01kakfa --replication-factor 1 --partitions 1 --topic test Created topic "test". [root@DSO-DHI-ZK-0001 config]# /usr/local/kafka/bin/kafka-topics.sh --zookeeper DSO-DHI-ZK-0001:2181/gbclst01kakfa --describe --topic test Topic:test PartitionCount:1 ReplicationFactor:1 Configs: Topic: test Partition: 0 Leader: 2 Replicas: 2 Isr: 2 [root@DSO-DHI-ZK-0001 config]# [root@DSO-DHI-ZK-0001 config]# /usr/local/kafka/bin/kafka-topics.sh --zookeeper DSO-DHI-ZK-0001:2181/gbclst01kakfa --delete --topic test Topic test is marked for deletion. Note: This will have no impact if delete.topic.enable is not set to true.

Enabling SSL for Kafka


SSL certificates will need to be created for each Kafka server.
SSH to the Kafka server
Create a directory to store the SSL certificates and change to this directory
mkdir -p /kafka/ssl/certs/
cd /kafka/ssl/certs/

Create a keystore using the keytool command. Choose a long and complex password.
export VMFQDN=$(hostname -f)
echo $VMFQDN

# validity is in days, 730 days = 2 years, 1095 days = 3 years
keytool -keystore kafka.server.keystore.jks -alias $VMFQDN -validity 1095 -genkey -dname "CN= ${VMFQDN}, OU=TechDept, O=acme ltd, L=London, ST=London, C=GB"

Generate a Certificate Signing Request (CSR) using the keytool

export VMFQDN=$(hostname -f)
keytool -certreq -keystore kafka.server.keystore.jks -alias $VMFQDN -keyalg rsa -file "${VMFQDN}_kafka_broker.csr" -dname "CN=${VMFQDN}, OU=TechDept, O=acme ltd, L=London, ST=London, C=GB"
The CSR then must be provided to your CA to generate a certificate
Import the certificate into the keystore
keytool -trustcacerts -keystore kafka.server.keystore.jks -alias $VMFQDN -import -file <certificate file> You will need the Root and Intermediate certificates that are used to generate the certificates for the Kafka broker. These certificates will need to be imported into the truststore.

The command below shows how to export the certificate from the Root CA:
certutil -ca.cert mydomain-ca-cert.crt
The next command shows how to export the certificate from the Intermediate CA:
certutil -ca.cert mydomain-intermediate-ca-cert.crt

The next example shows how to import the Root CA and Intermediate CA into the truststore.

# Import the CA & Intermediate certificate into the truststore
cp mydomain-ca-cert.crt /kafka/ssl/certs/
cp mydomain-intermediate-ca-cert.crt /kafka/ssl/certs/
keytool -keystore kafka.server.truststore.jks -alias CARoot -import -file /kafka/ssl/certs/mydomain-ca-cert.crt
keytool -keystore kafka.server.truststore.jks -alias CAItermediate -import -file /kafka/ssl/certs/mydomain-intermediate-ca-cert.crt

You will be prompted to set a password, choose a long and complex password.

Running the commands below should show the CARoot, CAIntermediate and the broker certificate in the truststore and keystore respectively.

keytool -list -keystore kafka.server.truststore.jks
keytool -list -keystore kafka.server.keystore.jks

The Kafka broker configuration must now be updated. First create a backup of the previous configuration file.

export DATETS=$(date +'%F-%R' | tr -d ':')
cp /usr/local/kafka/config/server.properties /usr/local/kafka/config/server.properties.${DATETS}.backup
Make the changes indicated to configuration file indicated below.

Change
listeners = SASL_PLAINTEXT://$VMFQDN:9092
To
listeners = SASL_SSL://$VMFQDN:9092

Where $VMFQDN is the fully qualified domain name of the server
Change
advertised.listeners=SASL_PLAINTEXT://$VMFQDN:9092
To
advertised.listeners=SASL_SSL://$VMFQDN:9092

Where $VMFQDN is the fully qualified domain name of the server

Change
security.inter.broker.protocol=SASL_PLAINTEXT
To
security.inter.broker.protocol=SASL_SSL


Add the following properties
ssl.keystore.location=/kafka/ssl/certs/kafka.server.keystore.jks
ssl.keystore.password=<password>
ssl.key.password==<password>
ssl.truststore.location=/kafka/ssl/certs/kafka.server.truststore.jks
ssl.truststore.password=<password>

Replace <password> with the actual keystore, truststore or SSL cert password as appropriate.

Once ALL Kafka brokers been configured restart them using the systemctl command.
systemctl stop kafka
systemctl status kafka -l
systemctl start kafka
systemctl status kafka -l

To check quickly if the server keystore and truststore are setup properly you can run the following command on each Kafka broker
openssl s_client -debug -connect "$(hostname -I)":9092 -tls1

There may be warnings or errors in the openssl command output about the use of self-signed certificates, this is because openssl does not use the Java truststore we created for verifying certificates. However, you can solve this on CentOS 7.x by placing the certificates to be trusted (in PEM format) in /etc/pki/ca-trust/source/anchors/ and run sudo update-ca-trust.

cd /kafka/ssl/certs
cp mydomain-ca-cert.crt /etc/pki/ca-trust/source/anchors/
cp mydomain-intermediate-ca-cert.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust


Enabling Kafka ACLs


In order to enable Kafka ACLs we must modify the server.properties file on each Kafka broker.
Backup the configuration file first

export DATETS=$(date +'%F-%R' | tr -d ':')
cp /usr/local/kafka/config/server.properties /usr/local/kafka/config/server.properties.${DATETS}.backup


To enable ACLs, we need to configure an authorizer. Kafka provides a simple authorizer implementation, and to use it, you can add the following to server.properties:

authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
zookeeper.set.acl=true

#Change the line
secure=false
#To
secure=true

The line "zookeeper.set.acl=true" ensures that only brokers will be able to modify the corresponding znodes for the Kafka metadata stored in ZooKeeper.

One can also add super users in server.properties like the following (note that the delimiter is semicolon since SSL user names may contain comma):
super.users=User:Bob;User:Alice

Refer to the documentation on the Kafka website here http://kafka.apache.org/documentation.html#security_authz_cli for more information on how to manage ACLs.

Comments