In this tutorial, you will learn how to create your first Kafka cluster by starting three Kafka servers in KRaft mode. If you want to run just one server, read the “Start Apache Kafka Server in KRaft mode” tutorial instead.
Starting more than one server in a cluster allows you to take advantage of Kafka’s features such as replication, fault tolerance, and load balancing. You will also learn how to generate a cluster UUID, format the log directories, test the cluster functionality, and stop the servers. This tutorial is intended for absolute beginners who want to get started with Kafka and KRaft mode on a local machine for development and testing purposes.
Before you start this tutorial, make sure you meet the following requirements:
- A basic knowledge of Kafka and its concepts, such as topics, partitions, brokers, and controllers. If you are new to Kafka, you can check out the Kafka tutorials for beginners page.
- Java 8 or higher installed.
- Apache Kafka installed. You can download and install Kafka from this page: Download Apache Kafka.
Step 1: Prepare Server Configuration Properties
To run a cluster of Kafka servers, you need to create and customize a separate configuration file for each server. In this section, you will learn how to do that.
In this tutorial, I will start the Kafka servers in KRaft mode. The server configuration files for KRaft mode are located in the <Kafka Home Folder>/config/kraft folder.
The configuration file you will work with is called server.properties. It is a text file that contains settings for the Kafka server, such as the process roles, the node id, the listeners, the controller quorum voters, and the log directories. You can modify these settings to change the behavior and performance of the Kafka server.
To start a cluster of three Kafka servers, you will need to create three configuration files: server-1.properties, server-2.properties, and server-3.properties. Each file will have its own node id, listener ports, advertised listener, and log directory; all other settings, including controller.quorum.voters, will be the same in all three files.
To create server configuration files, follow these steps:
- Open a terminal window and change directory to the <Kafka Home Folder>/config/kraft folder.
- Duplicate server.properties three times:
cp server.properties server-1.properties
cp server.properties server-2.properties
cp server.properties server-3.properties
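The three cp commands can also be written as a loop. The sketch below wraps that loop in a small bash function; the name copy_server_configs is hypothetical, and the function assumes you pass it the path of the config/kraft folder.

```shell
# Hypothetical helper: copy server.properties to server-1.properties,
# server-2.properties, and server-3.properties inside the given directory
# (for example <Kafka Home Folder>/config/kraft).
copy_server_configs() {
  local dir="$1" i
  for i in 1 2 3; do
    cp "$dir/server.properties" "$dir/server-$i.properties" || return 1
  done
}
```

For example, from inside the config/kraft folder you could run copy_server_configs . to create the same three files as the commands above.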
Configure the node.id property
Now, let’s tailor each server’s properties, starting with node.id. The node.id property must be unique for each server so that the server can be identified within the cluster. In server-1.properties, set:
node.id=1
Each server must have a different node.id, regardless of whether it acts as a broker, a controller, or both. Kafka uses the node.id to identify servers in core operations such as leader election, replication, and partition assignment. In KRaft mode, node.id takes the place of the broker.id property used in ZooKeeper mode.
Repeat the process for server-2.properties and server-3.properties, setting node.id=2 and node.id=3 respectively. This ensures each Kafka server has a distinct identity in the cluster.
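Editing three files by hand is error-prone. As a minimal sketch, assuming the properties files use plain key=value lines, a hypothetical bash function set_prop can set node.id (or any other property) in each file. It writes to a temporary file instead of using sed -i, because that option behaves differently in GNU sed and BSD sed.

```shell
# Hypothetical helper: set KEY=VALUE in a .properties file.
# Replaces an existing uncommented "KEY=" line, or appends one if none exists.
# VALUE must not contain "|", "&", or "\", which are special to sed here.
set_prop() {
  local file="$1" key="$2" value="$3" key_re
  # Escape dots so that "node.id" only matches a literal "node.id".
  key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -q "^${key_re}=" "$file"; then
    sed "s|^${key_re}=.*|${key}=${value}|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

Run from the config/kraft folder, for i in 1 2 3; do set_prop "server-$i.properties" node.id "$i"; done gives each file its own node.id.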
Next, update the listeners property. It defines the network addresses the Kafka server uses for communication. For server 1, set it to listen on port 9092 for broker traffic and port 9093 for controller traffic:
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
A listener is a combination of a listener name, a host, and a port, such as PLAINTEXT://:9092. Each listener name is mapped to a security protocol (PLAINTEXT, SSL, SASL_PLAINTEXT, or SASL_SSL) that determines whether connections are encrypted and authenticated. The host is the IP address or hostname of the server; when it is left empty, as in PLAINTEXT://:9092, Kafka binds to the default network interface. The port is the socket port that the server listens on.
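To make the name://host:port structure concrete, here is a small bash function, hypothetical and for illustration only, that splits a listener string into its three parts using parameter expansion. It assumes a hostname or IPv4 address; bracketed IPv6 addresses are not handled.

```shell
# Hypothetical helper: split a listener such as PLAINTEXT://:9092 into its
# name, host, and port. Prints "name host port", with "-" for an empty host.
parse_listener() {
  local listener="$1"
  local name="${listener%%://*}"     # everything before "://"
  local hostport="${listener#*://}"  # everything after "://"
  local host="${hostport%:*}"        # everything before the last ":"
  local port="${hostport##*:}"       # everything after the last ":"
  printf '%s %s %s\n' "$name" "${host:--}" "$port"
}
```

For example, parse_listener PLAINTEXT://:9092 prints PLAINTEXT - 9092, and parse_listener CONTROLLER://localhost:9093 prints CONTROLLER localhost 9093.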
In this configuration, there are two listeners: PLAINTEXT://:9092 and CONTROLLER://:9093, so the Kafka server listens on ports 9092 and 9093. The PLAINTEXT listener uses the PLAINTEXT security protocol, which means that connections are neither encrypted nor authenticated. CONTROLLER is a listener name rather than a separate protocol: the controller.listener.names=CONTROLLER setting marks it as the listener for KRaft controller traffic, and listener.security.protocol.map in the default configuration maps it to PLAINTEXT. The controllers are the servers that store and manage the metadata of the cluster; they use the Raft consensus protocol to elect a leader and keep their state in sync. The CONTROLLER listener carries the communication between the controllers and the brokers, and it is not intended for external clients.
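The listener-name-to-protocol mapping works like a simple lookup table. As an illustration only, a hypothetical bash function can resolve a listener name against a listener.security.protocol.map value, which is a comma-separated list of NAME:PROTOCOL pairs.

```shell
# Hypothetical helper: print the security protocol mapped to a listener name
# in a listener.security.protocol.map value. Returns 1 if the name is absent.
protocol_for() {
  local map="$1" name="$2" pair
  local IFS=','   # split the map on commas; local keeps the caller's IFS intact
  for pair in $map; do
    if [ "${pair%%:*}" = "$name" ]; then
      printf '%s\n' "${pair#*:}"
      return 0
    fi
  done
  return 1
}
```

For a map such as CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT, looking up CONTROLLER returns PLAINTEXT, which is why controller traffic in this setup is unencrypted.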
For server 2 and server 3, shift both ports up by two for each successive server so that they don’t clash:
# For server-2.properties
listeners=PLAINTEXT://:9094,CONTROLLER://:9095
# For server-3.properties
listeners=PLAINTEXT://:9096,CONTROLLER://:9097
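The port numbers follow a simple pattern: server N uses port 9090 + 2N for the broker and 9091 + 2N for the controller. A hypothetical bash helper that applies this tutorial’s scheme could look like this:

```shell
# Hypothetical helper: print the listeners line for server N (1, 2, or 3)
# using this tutorial's port scheme.
listeners_for() {
  local n="$1"
  local broker_port=$((9090 + 2 * n))      # 9092, 9094, 9096
  local controller_port=$((9091 + 2 * n))  # 9093, 9095, 9097
  printf 'listeners=PLAINTEXT://:%d,CONTROLLER://:%d\n' "$broker_port" "$controller_port"
}
```

For example, listeners_for 3 prints the listeners line for server-3.properties.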
A server can have multiple listeners, but each one needs a unique name and port. Clients connect through the addresses published in the advertised.listeners property, which is covered below.
Controller Quorum Voters
The controller.quorum.voters property specifies the “quorum voters” in the cluster. You’ll list all server nodes that act as controllers using the format node.id@host:port.
This property specifies the list of controllers in the Kafka cluster, which are the servers that store and manage the metadata of the cluster. The controllers use the Raft consensus protocol to elect a leader and keep their state in sync. The leader, or active, controller applies metadata changes such as creating topics, assigning partitions, and changing configurations; brokers forward these requests from clients to it.
The controller.quorum.voters property is a comma-separated list of quorum voters, where each voter is identified by a node id and a network address. Include every controller in the list, and make sure that each controller has a different node id and, when controllers share a host, a different port.
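For this cluster, set the same value in all three configuration files:
controller.quorum.voters=1@localhost:9093,2@localhost:9095,3@localhost:9097
This means that the cluster has three quorum voters, with node ids 1, 2, and 3 and network addresses localhost:9093, localhost:9095, and localhost:9097. All the servers in the cluster (controllers and brokers) use this property to discover and communicate with the controllers.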
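Because every server must list the same voters, it can help to generate the line once instead of typing it three times. A minimal sketch, assuming all controllers run on localhost with the port scheme from this tutorial (the function name quorum_voters is hypothetical):

```shell
# Hypothetical helper: build controller.quorum.voters for servers 1..COUNT on
# localhost, where server i's CONTROLLER listener uses port 9091 + 2i.
quorum_voters() {
  local count="$1" i=1 voters=""
  while [ "$i" -le "$count" ]; do
    # Add a comma separator only after the first entry.
    voters="${voters:+$voters,}${i}@localhost:$((9091 + 2 * i))"
    i=$((i + 1))
  done
  printf 'controller.quorum.voters=%s\n' "$voters"
}
```

For three servers, quorum_voters 3 produces controller.quorum.voters=1@localhost:9093,2@localhost:9095,3@localhost:9097.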
The advertised.listeners property is what the Kafka broker communicates to clients. It needs to reflect the corresponding listeners port for each server:
# For server-1.properties
advertised.listeners=PLAINTEXT://localhost:9092
# For server-2.properties
advertised.listeners=PLAINTEXT://localhost:9094
# For server-3.properties
advertised.listeners=PLAINTEXT://localhost:9096
The advertised.listeners property specifies the externally visible addresses for the listeners, which may differ from the actual listener addresses because of network topology or security considerations. For example, if your Kafka server is behind a firewall or a proxy, you need to advertise the address that the clients can reach, not the internal address of the server.
The advertised.listeners property is a comma-separated list of listeners with their host/IP and port. This is the metadata passed back to clients when they request information about the cluster; the clients then use these addresses to connect to the brokers and read and write data.
You need to make sure that the advertised.listeners property matches the listeners property, except for the host/IP part, which may be different. You also need to make sure that the advertised.listeners property is reachable by the clients, and that the ports are open and not blocked by any firewall or network rules.
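A quick way to catch a mismatch between the two properties is to compare their ports. The hypothetical bash check below extracts the PLAINTEXT port from listeners and from advertised.listeners in one file and fails if they differ. It assumes both settings sit on single lines and contain a PLAINTEXT listener; it does not cover setups where a proxy or NAT legitimately maps to a different port.

```shell
# Hypothetical check: succeed only if the PLAINTEXT port in listeners equals
# the PLAINTEXT port in advertised.listeners within one properties file.
check_advertised_port() {
  local file="$1" listeners advertised lport aport
  listeners=$(grep '^listeners=' "$file" | cut -d= -f2-)
  advertised=$(grep '^advertised\.listeners=' "$file" | cut -d= -f2-)
  # Capture the digits that follow "PLAINTEXT://<host>:" in each value.
  lport=$(printf '%s\n' "$listeners" | sed -n 's|.*PLAINTEXT://[^:,]*:\([0-9]*\).*|\1|p')
  aport=$(printf '%s\n' "$advertised" | sed -n 's|.*PLAINTEXT://[^:,]*:\([0-9]*\).*|\1|p')
  [ -n "$lport" ] && [ "$lport" = "$aport" ]
}
```

For example: for i in 1 2 3; do check_advertised_port "server-$i.properties" || echo "server-$i: port mismatch"; done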
Finally, each server should store its logs in a separate directory to avoid conflicts. Assign unique log directories for each server:
# For server-1.properties
log.dirs=/tmp/server-1/kraft-combined-logs
# For server-2.properties
log.dirs=/tmp/server-2/kraft-combined-logs
# For server-3.properties
log.dirs=/tmp/server-3/kraft-combined-logs
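Since two servers writing to the same directory would conflict, a hypothetical bash check can list any log.dirs value that appears in more than one file. It compares whole lines, so it will not notice overlaps inside comma-separated lists of directories.

```shell
# Hypothetical check: print every log.dirs line that occurs in more than one
# of the given properties files; print nothing when all values are unique.
duplicate_log_dirs() {
  grep -h '^log\.dirs=' "$@" | sort | uniq -d
}
```

Run from the config/kraft folder as duplicate_log_dirs server-*.properties; empty output means each server has its own log directory.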
The log.dirs property specifies the directories where the server stores its data on disk, including topic messages. In KRaft mode, unless metadata.log.dir is set, the cluster metadata log is stored in the first of these directories as well. Each directory has a well-defined layout that reflects Kafka’s storage architecture.
Each topic is divided into one or more partitions, and each partition is stored as its own directory, named <topic>-<partition>, inside a log directory. A partition’s log is split into segments, which are files that messages are appended to sequentially. Each segment has a data file plus an index file and a time index file that map offsets and timestamps to positions in the segment.
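Segment files are named after the segment’s base offset, that is, the offset of its first message, zero-padded to 20 digits; a partition’s first segment is therefore 00000000000000000000.log, alongside .index and .timeindex files with the same name. The hypothetical bash function below prints those names for a given partition directory and base offset; the topic name orders used in the example is made up.

```shell
# Hypothetical helper: print the data, offset index, and time index file names
# of one log segment, given the partition directory and the segment's base offset.
segment_files() {
  local partition_dir="$1" base_offset="$2" stem ext
  stem=$(printf '%020d' "$base_offset")   # base offset zero-padded to 20 digits
  for ext in log index timeindex; do
    printf '%s/%s.%s\n' "$partition_dir" "$stem" "$ext"
  done
}
```

For example, segment_files orders-0 0 lists the three files of the first segment of partition 0 of a topic called orders.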
The log.dirs property can be a comma-separated list of directories, in which case Kafka distributes the partitions across them. Placing the directories on separate disks balances disk usage and I/O, and a single disk failure then affects only the partitions stored on that disk.
Step 2: Generate a Cluster UUID
In the previous section, you prepared the configuration files for the three Kafka servers in the cluster. In this section, you will learn how to generate a cluster UUID and format the storage with the generated UUID.
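A cluster UUID, also called the cluster ID, is a unique and permanent identifier for the Kafka cluster. Every server in the cluster must have its storage formatted with the same cluster UUID; a server whose storage was formatted with a different one cannot join the cluster, which protects against accidentally mixing servers from different clusters. KRaft mode is a way of running Kafka without ZooKeeper: the cluster metadata is managed by Kafka’s own Raft-based controller quorum, which simplifies the Kafka architecture because there is no separate ZooKeeper ensemble to run.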
To generate a cluster UUID, you can use the kafka-storage tool, one of the command-line tools that Kafka provides. The same tool formats the storage: for each log directory listed in a server’s configuration file, it writes a meta.properties file that records the cluster UUID and the server’s node.id.
To use the kafka-storage tool, open a terminal window and navigate to the bin directory of your Kafka installation. For example, if Kafka is installed in the /Users/me/kafka folder, use the following command:
cd /Users/me/kafka/bin
Then, use the following command to generate a cluster UUID and store it in a variable named KAFKA_CLUSTER_ID:
KAFKA_CLUSTER_ID="$(./kafka-storage.sh random-uuid)"
This command creates a cluster UUID: a 16-byte UUID encoded as a 22-character, URL-safe base64 string, such as p8fFEbKGQ22B6M_Da_vCBw. You can print the cluster UUID with the echo command:
echo $KAFKA_CLUSTER_ID
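If you want to sanity-check the variable before formatting, a hypothetical bash function can test whether a string has the shape of a cluster UUID: exactly 22 characters from the URL-safe base64 alphabet. It checks the format only, not whether the value came from kafka-storage.sh random-uuid.

```shell
# Hypothetical check: succeed if the argument looks like a cluster UUID,
# i.e. 22 characters of A-Z, a-z, 0-9, "_" or "-" (16 bytes of URL-safe
# base64 without "=" padding).
looks_like_cluster_id() {
  printf '%s' "$1" | grep -Eqx '[A-Za-z0-9_-]{22}'
}
```

For example: looks_like_cluster_id "$KAFKA_CLUSTER_ID" && echo "looks valid"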
Next, you can use the following command to format the storage with the cluster UUID and the configuration file for the first server:
./kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c ../config/kraft/server-1.properties
This command formats the storage using the cluster UUID stored in the KAFKA_CLUSTER_ID variable and the configuration file passed with the -c option. Adjust the path if your configuration files are in a different location.
You need to repeat this command for the other two servers, using their respective configuration files. For example, you can use the following commands:
./kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c ../config/kraft/server-2.properties
./kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c ../config/kraft/server-3.properties
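After formatting, each log directory contains a meta.properties file that records the cluster.id and the server’s node.id. A minimal sketch of a bash check, with the hypothetical name check_cluster_ids, that confirms all the directories carry the same cluster UUID:

```shell
# Hypothetical check: print the cluster.id found in each directory's
# meta.properties and fail if any is missing or differs from the first one.
check_cluster_ids() {
  local dir id first=""
  for dir in "$@"; do
    id=$(grep '^cluster\.id=' "$dir/meta.properties" 2>/dev/null | cut -d= -f2-)
    if [ -z "$id" ]; then
      echo "missing cluster.id in $dir" >&2
      return 1
    fi
    echo "$dir: $id"
    if [ -z "$first" ]; then
      first="$id"
    elif [ "$id" != "$first" ]; then
      echo "cluster.id mismatch in $dir" >&2
      return 1
    fi
  done
}
```

With the log.dirs values used in this tutorial, you would run check_cluster_ids /tmp/server-1/kraft-combined-logs /tmp/server-2/kraft-combined-logs /tmp/server-3/kraft-combined-logs.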
You have now generated a cluster UUID and formatted the storage of the three Kafka servers in the cluster with it. In the next section, you will learn how to start the servers using their configuration files.
Step 3: Start the Kafka Servers
In the previous sections, you generated a cluster UUID and formatted the storage of the three Kafka servers with it. In this section, you will start the servers using their configuration files. You don’t pass the cluster UUID again at startup; each server reads it from the meta.properties file written during formatting.
To start the Kafka servers, use the kafka-server-start tool, another command-line tool that Kafka provides. Because the configuration files you copied from server.properties set process.roles=broker,controller, each server runs as both a broker and a controller in a single process.
A broker is a server that stores data and serves it to the clients, and a controller is a server that stores and manages the metadata of the cluster. Running both roles in one process, known as combined mode, is suitable for local development and testing, but not for production environments.
Starting the first Kafka server
To start the first Kafka server, use the configuration file for the first server:
./kafka-server-start.sh ../config/kraft/server-1.properties
This starts the Kafka server using the configuration file passed as the argument. Adjust the path if your configuration files are in a different location.
Starting the other Kafka servers
You need to repeat this command for the other two servers, using their respective configuration files. However, you need to open a new terminal window for each server, so that you can run them simultaneously. For example, you can use the following commands in different terminal windows:
./kafka-server-start.sh ../config/kraft/server-2.properties
./kafka-server-start.sh ../config/kraft/server-3.properties
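As an alternative to three terminal windows, kafka-server-start.sh accepts -daemon as its first argument, which runs the server in the background. The bash function below is a hypothetical sketch that starts all three servers that way, assuming the same relative paths used in this tutorial. It also sets the LOG_DIR environment variable, which, as I understand it, kafka-run-class.sh (called by kafka-server-start.sh) uses for the server’s log files, so that the three servers don’t share one logs folder; the /tmp/server-N/logs location is an arbitrary choice. You can stop the servers later with kafka-server-stop.sh.

```shell
# Hypothetical convenience: start servers 1-3 in the background.
# Arguments (optional): the Kafka bin directory and the config/kraft directory.
start_all_servers() {
  local bin_dir="${1:-.}" conf_dir="${2:-../config/kraft}" i
  for i in 1 2 3; do
    # A separate LOG_DIR per server keeps each server's log files apart.
    LOG_DIR="/tmp/server-$i/logs" \
      "$bin_dir/kafka-server-start.sh" -daemon "$conf_dir/server-$i.properties" || return 1
  done
}
```

Run from the bin directory, start_all_servers with no arguments uses ../config/kraft/server-1.properties through server-3.properties.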
You have now started three Kafka servers in a cluster using KRaft mode. In the next section, you will learn how to test the cluster functionality and verify its performance.
In this tutorial, you learned how to start three Kafka servers in a cluster using KRaft mode. You learned how to generate a cluster UUID, format the storage with the cluster UUID, start the servers using their configuration files, and test the cluster functionality and performance. You also learned some basic concepts and terms related to Kafka and KRaft mode, such as topics, partitions, brokers, controllers, listeners, and node ids.
This tutorial was intended for absolute beginners who want to get started with Kafka and KRaft mode on a local machine for development and testing purposes. However, there is much more to learn about Kafka and its features, such as replication, fault tolerance, load balancing, security, and streaming. If you want to learn more about Apache Kafka and how to use it for various scenarios, you can check out this page: Apache Kafka tutorials for beginners. There, you will find many helpful and practical tutorials that will guide you through different aspects and applications of Kafka.
Thank you for following this tutorial and I hope you enjoyed it. If you have any feedback or questions, please feel free to leave a comment below.
Happy learning! 🙌