Tuesday, November 3, 2015

Fun with Apache Kafka


Apache Kafka is a high-throughput messaging system created by LinkedIn in 2011 and written in Scala. Kafka will pretty much handle whatever load you throw at it, so you can use it as a buffer for back pressure or as a spooling mechanism.

Nowadays people use Kafka a lot with other Big Data technologies like Apache Storm, Apache Hadoop or even Apache Spark, and there is a common pattern: you read data from Kafka, process it in Spark/Storm/Hadoop and in the end store it in a NoSQL database like Cassandra, for instance. People end up using Kafka for analytics (that's what I meant by big data), monitoring, activity logging and sometimes as a building block in bigger architectures/systems. You can read more about LinkedIn's view on this post. Kafka is durable (messages are persisted on disk), very fast and scales to a great deal of load, like 800 billion messages per day at LinkedIn. Besides LinkedIn, Twitter, Netflix, Spotify, Mozilla and others also use Kafka.


Kafka Overview

Kafka is pretty simple to understand, not so simple to tune :-) It basically has the concepts of message Producers, components that write data into Kafka, and message Consumers, components that read data from Kafka. Kafka does not implement the Java Message Service spec (a.k.a. JMS).
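
Just to give an idea of what a producer looks like in code, here is a minimal sketch using the Kafka Java client API (a newer client than the 0.8.2 build installed later in this post). The broker address and topic name match the console examples below, and the key is only there to show how messages get routed to partitions.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // broker started later in this post
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<>(props);
        // the key (when present) is hashed to pick which partition the message lands in
        producer.send(new ProducerRecord<>("test", "my-key", "hello kafka"));
        producer.close();
    }
}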

Kafka runs as a cluster as well, and it uses Apache Zookeeper in order to distribute and coordinate the cluster work. Apache Zookeeper is mature, well tested, totally battle tested. If you use Zookeeper by yourself it's recommended to use recipes like the ones in the Curator project in order to avoid common issues.
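
If you ever need to talk to Zookeeper yourself, a minimal sketch of a Curator recipe, a distributed lock in this case, could look like the code below; the lock path is just an example and the connection string matches the local Zookeeper started later in this post.

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CuratorLockExample {
    public static void main(String[] args) throws Exception {
        // same Zookeeper instance Kafka uses in this post
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Curator recipe: a distributed lock backed by a Zookeeper path
        InterProcessMutex lock = new InterProcessMutex(client, "/locks/example");
        lock.acquire();
        try {
            System.out.println("doing work while holding the lock");
        } finally {
            lock.release();
            client.close();
        }
    }
}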

Data is stored in Topics in Apache Kafka, and these topics are split into multiple partitions. The partitions are replicated across the cluster.

When you write into a Topic you can write into multiple Partitions at the same time.

One great thing is that you can read from the beginning of the Partition or from the current moment of the Partition. It's also possible to have consumers at different offsets of the same partition.
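
As a sketch, a consumer using the newer Kafka Java client could look like the code below; the group id is made up for this example, and auto.offset.reset=earliest is what makes a brand-new consumer group start reading from the beginning of the partitions instead of the current moment.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group");          // each consumer group tracks its own offsets
        props.put("auto.offset.reset", "earliest"); // a new group starts from the beginning
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}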

Kafka keeps the Partition ORDERED and IMMUTABLE, and new messages are APPENDED at the END. It's also possible to configure the number of partitions of a topic, which gives you control of the maximum parallelism for a consumer group.

Replicas are pretty much backups of the partitions. This concept exists to prevent data loss, so you never read or write directly to a replica.
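
To see which broker is the leader of each partition and which ones only hold replicas, you can describe the topic. The sketch below assumes the AdminClient from newer Kafka Java clients (it does not exist in the 0.8.2 release installed below), using the same test topic.

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class DescribeTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription topic = admin.describeTopics(Collections.singletonList("test"))
                    .all().get().get("test");
            for (TopicPartitionInfo partition : topic.partitions()) {
                // clients only ever talk to the leader; the replicas are the backups
                System.out.printf("partition=%d leader=%s replicas=%s isr=%s%n",
                        partition.partition(), partition.leader(),
                        partition.replicas(), partition.isr());
            }
        }
    }
}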


Having Fun with Kafka

Installing Kafka
sudo wget http://www.eu.apache.org/dist//kafka/0.8.2.0/kafka_2.10-0.8.2.0.tgz
tar -xzf kafka_2.10-0.8.2.0.tgz
cd kafka_2.10-0.8.2.0
Start Zookeeper and Kafka Server
sudo nohup bin/zookeeper-server-start.sh config/zookeeper.properties &
sudo nohup bin/kafka-server-start.sh config/server.properties &

Create a Topic and List all Topics

sudo bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
sudo bin/kafka-topics.sh --list --zookeeper localhost:2181
Create and send messages to the Kafka topic
sudo bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Start a consumer and consume all messages from the beginning
sudo bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning


Cheers,
Diego Pacheco

