The Art of Chance

A Beginner’s Guide to Probability

Authors

Dennis Sun, Gene Kim, and Anav Sood

Published

October 29, 2024

Preface

This textbook was designed for a one-quarter undergraduate course in probability at Stanford University. Its goal is to introduce the concepts and applications of probability to a broad audience of scientists, social scientists, engineers, mathematicians, statisticians, and data scientists.

Why this Book?

There are already many introductory probability textbooks. Why another one?

The main reason we wrote this book was to make the subject accessible to more people. Most probability textbooks assume multivariate calculus and linear algebra, putting the subject out of reach for many practitioners who want to learn probability. This book assumes only single-variable calculus, which means that it is accessible even to an advanced high-school student.

Another reason was the reality of the quarter system. We found it difficult to cover in 10 weeks the same material that schools on the semester system cover in 15 weeks. We needed to skip some topics while still ensuring that the curriculum was coherent. Many existing textbooks are designed to be covered linearly, so it is not easy to skip around. This book represents a set of topics that we felt were essential to a first course on probability. We have tried to write this book in a modular way so that other instructors can skip around if they wish.

Although we have omitted some technical topics, such as moment generating functions and limit theorems, for the sake of accessibility and time, we believe that this book still conveys the essence of probability. The intuition, the problem-solving techniques, and the many applications are all here, as are the puzzles and paradoxes that make the subject lively.

We also made specific pedagogical decisions in the presentation of the material:

  • We first cover all of discrete probability, followed by all of continuous probability (although the book does not need to be read this way; see below). This means that every concept is treated twice, once for discrete random variables and again for continuous random variables.
    • We find that difficult concepts, such as joint distributions and conditional expectation, are easier to grasp if they are first introduced for discrete random variables, without the added complication of calculus.
    • One concern is that by separating discrete and continuous random variables, learners may fail to see the connections between them. To help make these connections, the structure of the continuous chapters mirrors the structure of the discrete chapters, with explicit signposting in the continuous chapters directing the reader to the corresponding results for discrete random variables. Also, the final chapters, 26  From Conditionals to Marginals and 27  Conditional Expectations, feature examples that mix discrete and continuous random variables, preparing learners to apply concepts in both settings.
  • Calculus is de-emphasized in favor of arguments that offer more statistical intuition. For example:
    • The famous continuous families (uniform, exponential, normal) can all be derived from location-scale transformations of a single representative. So we only need to derive the expectation and variance of a single representative, and the general formula can be obtained using properties of expectation.
    • Although we write some double integrals for the sake of completeness, we show how geometry, symmetry, and conditioning allow us to avoid double integrals (or calculus altogether)! See 23  Joint Distributions and 24  Expectations Involving Multiple Random Variables for examples.
  • We favor models that are specified hierarchically (i.e., by specifying first the distribution of \(X\), then the conditional distribution of \(Y | X\)), rather than jointly. We dedicate two entire chapters, 16  From Conditionals to Marginals and 26  From Conditionals to Marginals, to the use of the Law of Total Probability for such models, a topic that many textbooks omit or treat as an afterthought.
    • Hierarchical specifications are more common in statistics, especially in Bayesian statistics.
    • Hierarchical models provide a more natural way to describe “mixed” distributions that are neither discrete nor continuous.
  • Code snippets in the programming language R are integrated into the exposition of the book.
    • R is used to do simulations that motivate particular concepts.
    • R is used to perform calculations that are impractical to do by hand. In the online version of this book, the R snippets can even be run in the browser.
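
As a rough illustration of two of the ideas above (location-scale transformations and hierarchically specified models), here is a minimal R sketch. The specific distributions, parameter values, and variable names are assumptions chosen for illustration; they are not taken from the chapters of this book.

```r
# Illustrative sketch only: all parameter values and names below are assumed.
set.seed(1)
n <- 100000

# Location-scale transformation:
# if U ~ Uniform(0, 1), then X = a + (b - a) * U ~ Uniform(a, b).
a <- 2
b <- 5
u <- runif(n)
x <- a + (b - a) * u

# Properties of expectation give E[X] = a + (b - a) * E[U] and
# Var[X] = (b - a)^2 * Var[U], where E[U] = 1/2 and Var[U] = 1/12.
mean_x_theory <- a + (b - a) * (1 / 2)
var_x_theory <- (b - a)^2 * (1 / 12)
print(c(simulated = mean(x), theory = mean_x_theory))
print(c(simulated = var(x), theory = var_x_theory))

# Hierarchical model: first N ~ Poisson(lambda), then Y | N ~ Binomial(N, p).
lambda <- 4
p <- 0.3
n_draws <- rpois(n, lambda)
y <- rbinom(n, size = n_draws, prob = p)

# Law of Total Probability:
# P(Y = 0) = sum over k of P(N = k) * P(Y = 0 | N = k).
# The sum is truncated at k = 100, where the Poisson(4) tail is negligible.
k <- 0:100
p_y0_total <- sum(dpois(k, lambda) * dbinom(0, size = k, prob = p))
print(c(simulated = mean(y == 0), total_probability = p_y0_total))
```

In each printed pair, the simulated value should be close to the value computed from the formula. Try changing `a`, `b`, `lambda`, or `p` to see how the two values move together.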

This book was written to prepare readers to think like statisticians, but it is purely a probability textbook. Unlike some probability textbooks at this level, we do not discuss estimation, sample moments, Bayesian priors and posteriors, Q-Q plots for checking goodness of fit, or linear regression. In our experience, introducing statistical concepts at this stage leads to confusion about the relationship between probability and statistics. Instead, we lay the probabilistic groundwork for students to pick up these statistical concepts later.

How to Use this Book?

For Instructors

Each chapter can be covered thoroughly in a 90-minute lecture or outlined in a 60-minute lecture. At Stanford, we cover this material in a 10-week quarter with three 50-minute lectures per week. Schools on a 15-week semester system would be able to cover this material more completely.

We have designed the book to be modular so that chapters can be read in any (reasonable) order. For example, one pedagogical decision is whether joint distributions should be covered before or after expected value. We have written those chapters so that they can be read in either order. The diagram below illustrates the dependencies between chapters in the discrete and continuous sections, in case you wish to cover the chapters in an order different from ours.

For Students

This book is meant to be read. We have tried to choose examples that you might find interesting.

Definitions, theorems, and examples are all in boxes. Proofs are included inside the box for the corresponding theorem. When a proof is not important, it is collapsed in the online version. We do not recommend that you read proofs that are collapsed, especially on a first reading, unless you are interested.

If you are reading this book online, you should run the code that is provided. Try modifying the code to see what it does.

Acknowledgements

Our colleagues, including John Duchi, Trevor Hastie, and Timothy Sun, provided feedback that improved this book.

Several students, including Ricky Rojas, also provided useful feedback.

The influence of teachers and colleagues who have shaped the way we teach probability is unmistakable. Thank you to Joe Blitzstein, Matt Carlton, Kevin Ross, and Allan Rossman.

We also acknowledge the support of a Curriculum Transformation Seed Grant from the Stanford Vice Provost for Undergraduate Education and Center for Teaching and Learning, which funded the writing of this book.