Model Building and Validation
8 sequences
Level: Intermediate
Language: English
1 point

Key Information

Free access

About the content

This course will teach you how to start from scratch in answering questions about the real world using data. Machine learning happens to be a small part of this process. The model building process involves setting up ways of collecting data, understanding and paying attention to what is important in the data for the questions you are asking, and finding a statistical, mathematical, or simulation model to gain understanding and make predictions. All of these things are equally important, and model building is a crucial skill to acquire in every field of science. The process stays true to the scientific method, making what you learn through your models useful both for understanding whatever you are investigating and for making predictions that hold up when tested. We will take you on a journey through building various models. This process involves asking questions, gathering and manipulating data, building models, and ultimately testing and evaluating them.



Lesson 1 - Introduction to the QMV Process

Learn about the Question, Modeling, and Validation (QMV) process of data analysis. Understand the basics behind each step and apply the QMV process to analyze how Udacity employees choose candies!

Lesson 2 - Question Phase

We will drill down into the question phase of the QMV process. We’ll teach you how to turn a vague question into a statistical one that can be answered with statistics and machine learning. You will also analyze a Twitter dataset and try to predict when a person will tweet next!

Lesson 3 - Modeling Phase

Building upon Lesson 2, you will learn how to build rigorous mathematical, statistical, and machine learning models so you can make accurate predictions. You will look through the recently released U.S. Medicare dataset for anomalous transactions.

Lesson 4 - Validation Phase

So how do you tell if your model is doing well? In this lesson, we will teach you some of the fundamental metrics that you can use to grade the performance of the models that you’ve built. You will analyze the AT&T connected cars data set and see if you can tell which driver is which by analyzing their driving patterns.
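The listing does not say which metrics the lesson covers. As an illustration only, the sketch below computes three commonly used classification metrics (accuracy, precision, and recall) for binary labels. The function names and the sample labels are hypothetical and are not taken from the course material.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, and recall; a ratio with an empty denominator is 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels purely for illustration (not course data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Which metric matters most depends on the question being asked: when identifying a driver, for example, a false match and a missed match may carry different costs.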

Final Project - Identify Hacking Attempts from Network Flow Logs

You will create a program that examines log data of net flow traffic and produces a score, from 1 to 10, describing the degree to which the logs suggest a brute-force attack is taking place on a server.
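The course page does not describe the log format or the scoring method, so the following is only a rough sketch of what such a scorer could look like. It assumes each flow record has already been reduced to a hypothetical `(src_ip, dst_port)` tuple, counts flows per source toward one monitored port, and maps the busiest source onto 1 to 10 on a logarithmic scale. The port number and the `saturation` threshold are arbitrary assumptions, not values from the project.

```python
import math
from collections import Counter


def brute_force_score(flows, port=22, saturation=1000):
    """Score from 1 to 10 how strongly the flows suggest a brute-force attack.

    flows: iterable of (src_ip, dst_port) tuples (an assumed, simplified format).
    port: the monitored service port (assumption: SSH on port 22).
    saturation: flow count from one source that already earns the top score
                (an arbitrary illustrative threshold).
    """
    counts = Counter(src for src, dst_port in flows if dst_port == port)
    if not counts:
        return 1  # no traffic to the monitored port at all
    worst = max(counts.values())
    # Log scale: 1 flow maps to 1, `saturation` flows or more map to 10.
    fraction = min(math.log(worst) / math.log(saturation), 1.0)
    return 1 + round(9 * fraction)


# Synthetic example: one source hammering port 22, plus one unrelated flow.
noisy = [("10.0.0.5", 22)] * 1000 + [("10.0.0.9", 443)]
print(brute_force_score(noisy))
```

A real solution would likely also consider timing, packet and byte counts, and many sources attacking in parallel. Repeated flows from one source is only the simplest signal.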


  • Don Dini - Don M. Dini has been practicing, teaching, and writing about data science for over ten years. He studied computer science and artificial intelligence at the University of Illinois at Urbana-Champaign and the University of Southern California. While at USC he was a lecturer in computer science and worked on applying AI to various real-world problems, such as understanding city populations through simulation and building systems to provide security against unknown attackers, which have since been used at LAX and the US Coast Guard, among other institutions. Today Don is a data scientist at AT&T, where he works on creating the next generation of communication networks and on creating models to understand human communication. In addition, Don is an instructor of Kung Fu and teaches classes in Palo Alto, CA.
  • Rishi Pravahan - Rishiraj Pravahan is a data scientist working for AT&T. Prior to joining AT&T, Rishiraj worked for the ATLAS experiment at CERN, where he was part of the team that discovered the Higgs boson. While at CERN, he worked on constructing, commissioning, and calibrating the ATLAS detector, as well as on software techniques to analyze the massive dataset from the Large Hadron Collider to search for new physics. He has also been a passionate teacher and advocate for science through public talks and seminars in the US, Europe, India, and Latin America. His current work involves understanding networks; privacy and security of customer data; collection, storage, and analysis of sensor data; and making advances in the frontiers of statistics and machine learning. In his spare time he loves to read, play pool, cook, travel, and learn about other cultures.



Udacity is a for-profit educational organization founded by Sebastian Thrun, David Stavens, and Mike Sokolsky offering massive open online courses (MOOCs). According to Thrun, the origin of the name Udacity comes from the company's desire to be "audacious for you, the student". While it originally focused on offering university-style courses, it now focuses more on vocational courses for professionals.
