Skip to content

vraja2/flink-sessionization

Repository files navigation

flink-sessionization

Description

A streaming pipeline using Apache Flink to process an input visitor clickstream from Kafka and output sessions back into Kafka. A session is terminated after 30 minutes of inactivity, but we flush incomplete sessions every minute to support real-time analysis on event data. Session state is maintained using RocksDB which is natively supported by Flink.

Running Locally

  1. Install Kafka
    brew install kafka

  2. Run Kafka and Zookeeper
    zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties & kafka-server-start /usr/local/etc/kafka/server.properties

  3. Install Flink
    brew install apache-flink

  4. Run the Flink cluster
    /usr/local/Cellar/apache-flink/1.6.0/libexec/bin/start-cluster.sh

  5. Build a shadowJar
    ./gradlew shadowJar

  6. Start the Sessionization streaming job
    flink run -c com.vigneshraja.sessionization.Sessionization ./build/libs/flink-sessionization-all.jar

  7. Verify the job is running with the Flink dashboard alt tag

  8. Use the GenerateEvents utility to generate clickstream data into the Kafka input topic raw-events

  9. Run Kafka console consumer on the sessionized-events topic
    kafka-console-consumer --topic sessionized-events --bootstrap-server localhost:9092

    Every minute, you will see open sessions being flushed to the sessionized-events topic. After 30 minutes of inactivity, you will see the sessions transition to CLOSED

    ...
    {"events":[{"timestamp":1537744379953,"userId":"userId14","eventType":"PAGE_VIEW"},{"timestamp":1537744865639,"userId":"userId14","eventType":"PAGE_VIEW"},{"timestamp":1537744868709,"userId":"userId14","eventType":"PAGE_VIEW"},{"timestamp":1537744871440,"userId":"userId14","eventType":"PAGE_VIEW"}],"lastEventTimestamp":1537744871440,"status":"CLOSED","id":"f0ba0425-5c48-43c1-ac6a-c994731b6753"}
    ...
    

About

A real-time sessionization pipeline written using Apache Flink

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy