Building Python Real-Time Applications with Storm
By Kartik Bhatnagar and Barry Hart
About This Book
- Learn to use Apache Storm and the Python Petrel library to build distributed applications that process large streams of data
- Explore sample real-time applications and analyze their output in the popular NoSQL databases MongoDB and Redis
- Discover how to apply software development best practices to improve performance, productivity, and quality in your Storm projects
Who This Book Is For
This book is intended for Python developers who want to benefit from Storm’s real-time data processing capabilities. If you are new to Python, you’ll benefit from the attention to key supporting tools and techniques such as automated testing, virtual environments, and logging. If you’re an experienced Python developer, you’ll appreciate the thorough and detailed examples.
What You Will Learn
- Install Storm and learn about the prerequisites
- Get to know the components of a Storm topology and how to control the flow of data between them
- Ingest Twitter data directly into Storm
- Use Storm with MongoDB and Redis
- Build topologies and run them in Storm
- Use an interactive graphical debugger to debug your topology as it’s running in Storm
- Test your topology components outside of Storm
- Configure your topology using YAML
In Detail
Big data is a trending concept that everyone wants to learn about. With its ability to process all kinds of data in real time, Storm is an important addition to your big data “bag of tricks.”
At the same time, Python is one of the fastest-growing programming languages today. It has become a top choice for both data science and everyday application development. Together, Storm and Python enable you to build and deploy real-time big data applications quickly and easily.
You will begin with some basic command tutorials to set up Storm and learn about its configuration in detail. You will then work through the requirements for creating a Storm cluster. Next, you’ll be provided with an overview of Petrel, followed by an example Twitter topology and persistence using Redis and MongoDB. Finally, you will build a production-quality Storm topology using development best practices.
Style and approach
This book takes an easy-to-follow, practical approach to help you understand all the concepts related to Storm and Python.
Table of Contents
Building Python Real-Time Applications with Storm
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Getting Acquainted with Storm
Overview of Storm
Before the Storm era
Key features of Storm
Storm cluster modes
Developer mode
Single-machine Storm cluster
Multimachine Storm cluster
The Storm client
Prerequisites for a Storm installation
Zookeeper installation
Storm installation
Enabling native (Netty only) dependency
Netty configuration
Starting daemons
Playing with optional configurations
Summary
2. The Storm Anatomy
Storm processes
Supervisor
Zookeeper
The Storm UI
Storm-topology-specific terminologies
The worker process, executor, and task
Worker processes
Executors
Tasks
Interprocess communication
A physical view of a Storm cluster
Stream grouping
Fault tolerance in Storm
Guaranteed tuple processing in Storm
XOR magic in acking
Tuning parallelism in Storm – scaling a distributed computation
Summary
3. Introducing Petrel
What is Petrel?
Building a topology
Packaging a topology
Logging events and errors
Managing third-party dependencies
Installing Petrel
Creating your first topology
Sentence spout
Splitter bolt
Word counting bolt
Defining a topology
Running the topology
Troubleshooting
Productivity tips with Petrel
Improving startup performance
Enabling and using logging
Automatic logging of fatal errors
Summary
4. Example Topology – Twitter
Twitter analysis
Twitter's Streaming API
Creating a Twitter app to use the Streaming API
The topology configuration file
The Twitter stream spout
Splitter bolt
Rolling word count bolt
The intermediate rankings bolt
The total rankings bolt
Defining the topology
Running the topology
Summary
5. Persistence Using Redis and MongoDB
Finding the top n ranked topics using Redis
The topology configuration file – the Redis case
Rolling word count bolt – the Redis case
Total rankings bolt – the Redis case
Defining the topology – the Redis case
Running the topology – the Redis case
Finding the hourly count of tweets by city name using MongoDB
Defining the topology – the MongoDB case
Running the topology – the MongoDB case
Summary
6. Petrel in Practice
Testing a bolt
Example – testing SplitSentenceBolt
Example – testing SplitSentenceBolt with WordCountBolt
Debugging
Installing Winpdb
Adding a Winpdb breakpoint
Launching and attaching the debugger
Profiling your topology's performance
Split sentence bolt log
Word count bolt log
Summary
A. Managing Storm Using Supervisord
Storm administration over a cluster
Introducing supervisord
Supervisord components
Supervisord installation
Configuration of supervisord.conf
Configuration of supervisord.conf on 172-31-19-62
Summary
Index
Building Python Real-Time Applications with Storm
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing and its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2015
Production reference: 1261115
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78439-285-7
www.packtpub.com
Credits
Authors
Kartik Bhatnagar
Barry Hart
Reviewers
Oscar Campos
Pavan Narayanan
Commissioning Editor
Usha Iyer
Acquisition Editor
Larissa Pinto
Content Development Editor
Anish Sukumaran
Technical Editor
Tanmayee Patil
Copy Editor
Vikrant Phadke
Project Coordinator
Izzat Contractor
Proofreader
Safis Editing
Indexer
Rekha Nair
Production Coordinator
Aparna Bhagat
Cover Work
Aparna Bhagat
About the Authors
Kartik Bhatnagar loves nature and likes to visit picturesque places. He is a technical architect in the big data analytics unit of Infosys. He is passionate about new technologies. He is leading development work with Apache Storm and MarkLogic NoSQL for a leading bank. Kartik has a total of 10 years of experience in software development for Fortune 500 companies in many countries. His expertise also includes the full Amazon Web Services (AWS) stack and modern open source libraries. He is active on the StackOverflow platform and is always eager to help young developers with new technologies. Kartik has also reviewed Elasticsearch Blueprints, Packt Publishing. In the future, he wants to work on predictive analytics.
Barry Hart began using Storm in 2012 at AirSage. He quickly saw Storm's potential but was held back by the limitations of the basic storm.py module it provides. In response, he developed Petrel, the first open source library for developing Storm applications in pure Python. He also contributed some bug fixes to the core Storm project.
When it comes to development, Barry has worked on a little of everything: Windows printer drivers, logistics planning frameworks, OLAP engines for the retail industry, database engines, and big data workflows.
Barry is currently an architect and senior Python/C++ developer at Pindrop Security, helping fight phone fraud in banking, insurance, investment, and other industries.
I want to thank my wonderful wife, Beth, for all her love and support. I would also like to thank my two little boys, who keep me young and make every day special.
About the Reviewers
Oscar Campos has been working with Python since early 2007. He is the author of the famous Anaconda Python IDE package for Sublime Text 3, available as free software at http://github.com/DamnWidget/anaconda.
He currently works as a senior software engineer on EXADS, programming high-concurrency backend system applications in Golang.
Oscar has also reviewed PySide GUI Application Development, Packt Publishing.
I want to thank my wife, Lydia, for all her support in every aspect of my life; without you, nothing would be possible.
Pavan Narayanan is a blogger at DataScience Hacks (https://datasciencehacks.wordpress.com), experienced in developing mathematical programming and data analytics solutions. He has used Apache Storm to develop a real-time analytics prototype, and his interests include exploring problem-solving techniques, from industrial mathematics to machine learning. He can be reached at
Pavan has also reviewed Apache Mahout Essentials, Learning Apache Mahout Classification, and Mastering Machine Learning with R, all by Packt Publishing.
I would like to thank my family and God almighty for all the strength and endurance, and the folks at Packt Publishing for the opportunity to work on this book.
www.PacktPub.com
Support files, eBooks, discount offers, and more
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and