- From www.udacity.com
Real-Time Analytics with Apache Storm
- Self-paced
- Free Access
- 2 Sequences
- Introductory Level
Course details
Syllabus
Lesson 1
Join instructor Karthik Ramasamy and the first Udacity-Twitter Storm Hackathon to cover the motivation and practice of real-time, distributed, fault-tolerant data processing. Dive into basic Storm Topologies by linking to a real-time d3 Word Cloud Visualization using Redis, Flask, and d3.
Lesson 2
Explore Storm basics by programming Bolts, linking Spouts, and finally connecting to the live Twitter API to process real-time tweets. Explore open source components by connecting a Rolling Count Bolt to your topology to visualize Rolling Top Tweeted Words.
Lesson 3
Go beyond Storm basics by exploring multi-language capabilities to download and parse real-time Tweeted URLs in Python using Beautiful Soup. Integrate complex open source bolts to calculate Top-N words to visualize real-time Top-N Hashtags. Finally, use stream grouping concepts to create a streaming join that connects and dynamically processes multiple streams.
Lesson 4
Work on your final project while we cover additional questions and topics raised by Hackathon participants. Explore Vagrant, VirtualBox, Redis, Flask, and d3 further if you are interested!
Final Project: Construct a Storm Topology
Design a Storm Topology and a new bolt that uses streaming joins to dynamically calculate Top-N Hashtags and display real-time tweets that contain trending Top Hashtags. Post your visualization to the forum and tweet it to your Twitter followers.
Project Extensions
Use additional features of the real-time Twitter sample stream or use any data source to drive your real-time d3 visualization.
Prerequisite
Instructors
- Karthik Ramasamy - Karthik is the engineering manager and technical lead of Storm, the real-time analytics engine at Twitter. He has two decades of experience working in parallel databases, big data infrastructure, and networking. He co-founded Locomatix, a company specializing in real-time stream processing on Hadoop and Cassandra using SQL, which was acquired by Twitter. Prior to Locomatix, Karthik was at Juniper Networks and Greenplum. At the University of Wisconsin, he worked extensively on parallel database systems, query processing, scale-out technologies, storage engines, and online analytical systems. Several of these research projects were spun off into a company that was later acquired by Teradata. He is the author of several publications and patents, and of one of the best-selling books, "Network Routing: Algorithms, Protocols and Architectures". He has a Ph.D. in Computer Science from the University of Wisconsin, Madison.
Editor
Twitter is an online social networking service that enables users to send and read short 140-character messages called "tweets". Registered users can read and post tweets, but those who are unregistered can only read them. Users access Twitter through the website interface, SMS, or a mobile device app. Twitter Inc. is based in San Francisco and has more than 25 offices around the world.
Twitter was created in March 2006 by Jack Dorsey, Evan Williams, Biz Stone, and Noah Glass and launched in July 2006. The service rapidly gained worldwide popularity: in 2012, more than 100 million users were posting 340 million tweets a day, and the service was handling 1.6 billion search queries per day. In 2013, it was one of the ten most-visited websites, and it has been described as "the SMS of the Internet". As of March 2016, Twitter had more than 310 million monthly active users.
Platform
Udacity is a for-profit educational organization founded by Sebastian Thrun, David Stavens, and Mike Sokolsky offering massive open online courses (MOOCs). According to Thrun, the origin of the name Udacity comes from the company's desire to be "audacious for you, the student". While it originally focused on offering university-style courses, it now focuses more on vocational courses for professionals.