57  Introduction

The material in this module covers basic topics needed to understand the mathematical and statistical aspects of data science. The topics are treated as review rather than full exposition and are brought together here as a reference.

For a graduate-level course in data science, the material in this module is considered a prerequisite.

Chapter 58 discusses the concepts of probability and random variables. It also covers expected values of random variables and important univariate probability distributions.

In probability, the mechanism that generates events in sample spaces is presumed known and we calculate the probabilities of certain sets of outcomes. In statistics, we start with the data and ask questions about possible data-generating mechanisms (describing, estimating, testing). Chapter 59 reviews descriptive and inferential statistics (point and interval estimation), hypothesis testing, and basic linear and logistic regression.
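The contrast between the two directions can be sketched in a few lines of Python. This is only an illustration under assumed values: the coin-flip setting, the sample size, the seed, and the names `binomial_pmf`, `true_p`, and `p_hat` are hypothetical and do not come from the chapters themselves. The interval is the familiar normal-approximation (Wald) interval, chosen for brevity.

```python
import math
import random


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability: the mechanism is known (10 flips of a fair coin),
# and we compute the probability of a set of outcomes.
prob_at_least_8 = sum(binomial_pmf(k, 10, 0.5) for k in range(8, 11))

# Statistics: we only see data and ask about the mechanism.
# true_p is an assumed value used to simulate the data; the
# analysis below pretends it is unknown.
rng = random.Random(42)
true_p = 0.3
flips = [1 if rng.random() < true_p else 0 for _ in range(1000)]

# Point estimate of the success probability.
p_hat = sum(flips) / len(flips)

# Approximate 95% interval estimate (normal approximation).
se = math.sqrt(p_hat * (1 - p_hat) / len(flips))
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)

print(f"P(at least 8 heads in 10 fair flips) = {prob_at_least_8:.4f}")
print(f"estimate of p = {p_hat:.3f}, approx. 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The first calculation needs no data at all, while the second uses nothing but the data and a model family.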

Linear algebra is key to representing statistical models and machine learning models. Chapter 60 introduces basic operations with vectors and matrices that are important in manipulating models. The expected value concept from Chapter 58 is extended to vectors of random variables. Projection matrices are used to demonstrate basic operations in linear models.

When building a statistical model we make decisions about the model structure and about how to estimate the model parameters. Chapter 61 discusses important estimation principles such as ordinary, weighted, and generalized least squares and maximum likelihood.

The final chapter, Chapter 62, brings together the basic concepts in this module in the context of the classical linear model

\[ \textbf{Y} = \textbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \]

under the usual assumptions about \(\boldsymbol{\epsilon}\).
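
The assumptions are not spelled out in this overview. One common formulation, stated here as an assumption rather than as the definition used in Chapter 62, is that the errors have mean zero, constant variance \(\sigma^2\), and are uncorrelated:

\[ \text{E}[\boldsymbol{\epsilon}] = \textbf{0}, \qquad \text{Var}[\boldsymbol{\epsilon}] = \sigma^2 \textbf{I} \]

Under this formulation, and provided \(\textbf{X}\) has full column rank, the ordinary least squares estimator of \(\boldsymbol{\beta}\) is

\[ \widehat{\boldsymbol{\beta}} = (\textbf{X}^\prime\textbf{X})^{-1}\textbf{X}^\prime\textbf{Y} \]

Adding the assumption that \(\boldsymbol{\epsilon}\) is Gaussian makes this estimator coincide with the maximum likelihood estimator of \(\boldsymbol{\beta}\).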