Key Information
About the content
Learn both theory and application for basic methods that have been invented either for developing new concepts – principal components or clusters, or for finding interesting correlations – regression and classification. This is preceded by a thorough analysis of 1D and 2D data.
Syllabus
Week 1. Intro: Examples of data and data analysis problems; visualization.
Week 2. 1D analysis. Feature scales. Histogram. Two common types of histograms: Gaussian and Power Law. Central values. Minkowski distance and data recovery view. Validation with Bootstrap.
Week 3-4. 2D analysis cases:
(Both quantitative: Scatter-plot, linear regression, correlation and determinacy coefficients: meaning and properties. Both nominal: Contingency table, Quetelet index, Pearson chi-squared coefficient, its double meaning and visualization).
Week 5-6. Learning multivariate correlations
(Bayes approach and Naïve Bayes classifier with a Bag-of-words text model; Decision trees and criteria for building them.)
Week 7. Principal components (PCA) and SVD
(SVD model behind PCA: student marks as the product of subject factor scores and subject loadings. Application to deriving a hidden underlying factor. Data visualization with PCA. Conventional PCA and data normalization issues.)
Week 8. Clustering with k-means
(K-Means iterations and K-Means features
K-Means criterion. Anomalous clusters and intelligent K-Means.)
Instructors
- Boris Mirkin - Department of Data Analysis and Artificial Intelligence
Content Designer

Platform

Coursera is a digital company offering massive open online course founded by computer teachers Andrew Ng and Daphne Koller Stanford University, located in Mountain View, California.
Coursera works with top universities and organizations to make some of their courses available online, and offers courses in many subjects, including: physics, engineering, humanities, medicine, biology, social sciences, mathematics, business, computer science, digital marketing, data science, and other subjects.