The Art of Chance
A Beginner’s Guide to Probability
Preface
This textbook was designed for a one-quarter undergraduate course in probability at Stanford University. Its goal is to introduce the concepts and applications of probability to a broad audience of scientists, social scientists, engineers, mathematicians, statisticians, and data scientists.
Why this Book?
There are already many introductory probability textbooks. Why another one?
The main reason we wrote this book was to make the subject accessible to more students. Most probability textbooks assume multivariate calculus and linear algebra, putting the subject out of reach for many practitioners who want to learn probability. This book assumes only single-variable calculus, which means that it is accessible even to an advanced high-schooler.
Another reason was the reality of the quarter system. We found it difficult to cover in 10 weeks the same material that schools on the semester system cover in 15 weeks. We needed to cut some topics, while still ensuring that the curriculum was coherent. This book represents a consensus curriculum that can be comfortably covered in one quarter.
Although we have omitted some technical aspects of probability for the sake of accessibility, we have not sacrificed any of its essence. The intuition, the problem solving techniques, and the many applications are here, as are the puzzles and paradoxes that make the subject lively.
We also made specific pedagogical decisions in the presentation of the material:
- We first cover all of discrete probability, followed by all of continuous probability (although the book does not need to be read this way; see our comments below). This means that every concept is treated twice, once for discrete random variables and again for continuous random variables.
  - We find that difficult concepts, such as joint distributions and conditional expectation, are easier to grasp if they are first introduced for discrete random variables, without the added complication of calculus.
  - One concern is that by separating discrete and continuous random variables, learners may fail to see the connections between them. To help make these connections, the structure of the continuous chapters mirrors the structure of the discrete chapters, with explicit signposting in the continuous chapters directing the reader to the corresponding results for discrete random variables. Also, the final chapters, 26 From Conditionals to Marginals and 27 Conditional Expectations, feature examples that mix discrete and continuous random variables, preparing learners to apply concepts in both settings.
- Calculus is de-emphasized in favor of arguments that offer more statistical intuition. For example:
  - The famous continuous families (uniform, exponential, normal) can all be derived from location-scale transformations of a single representative. So we only need to derive the expectation and variance of a single representative, and the general formula can be obtained using properties of expectation.
  - Although we write some double integrals for the sake of completeness, we show how geometry, symmetry, and conditioning allow us to avoid double integrals (or calculus altogether)! See 23 Joint Distributions and 24 Expectations Involving Multiple Random Variables for examples.
- We favor models that are specified hierarchically (i.e., by specifying first the distribution of \(X\), then the conditional distribution of \(Y | X\)), rather than jointly. We dedicate two entire chapters, 16 From Conditionals to Marginals and 26 From Conditionals to Marginals, to the use of the Law of Total Probability for such models, a topic that many textbooks omit or treat as an afterthought.
  - Hierarchical specifications are more common in statistics, especially in Bayesian statistics.
  - Hierarchical models provide a more natural way to describe “mixed” distributions that are neither discrete nor continuous.
- Code snippets in the programming language R are integrated into the exposition of the book.
  - R is used to do simulations that motivate particular concepts.
  - R is used to perform calculations that are impractical to do by hand. In the online version of this book, the R snippets can even be run in the browser.
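To make two of the points in the list above concrete, here is a brief sketch. The symbols \(Z\), \(U\), \(a\), and \(b\) are our own illustrative notation, not something fixed by the rest of this preface. If \(X = a + bZ\) is a location-scale transformation of a representative \(Z\), then properties of expectation and variance give

\[
\mathrm{E}[X] = a + b\,\mathrm{E}[Z], \qquad \mathrm{Var}[X] = b^2\,\mathrm{Var}[Z].
\]

For instance, taking the standard uniform \(U\) on \((0, 1)\) as the representative, with \(\mathrm{E}[U] = 1/2\) and \(\mathrm{Var}[U] = 1/12\), the random variable \(X = a + (b - a)U\) is uniform on \((a, b)\), so

\[
\mathrm{E}[X] = a + \frac{b - a}{2} = \frac{a + b}{2}, \qquad \mathrm{Var}[X] = \frac{(b - a)^2}{12},
\]

with no integration beyond what was needed for \(U\) itself.

Similarly, when a model is specified hierarchically through the distribution of \(X\) and the conditional distribution of \(Y | X\), the Law of Total Probability recovers the marginal distribution of \(Y\). In the discrete case,

\[
P(Y = y) = \sum_{x} P(X = x)\, P(Y = y \mid X = x),
\]

and in the continuous case, writing \(f\) for the relevant densities,

\[
f_Y(y) = \int_{-\infty}^{\infty} f_X(x)\, f_{Y \mid X}(y \mid x)\, dx.
\]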
This book was written to prepare readers to think like statisticians. However, it is purely a probability textbook. Unlike some probability books at this level, we do not discuss estimation, Bayesian priors and posteriors, or linear regression. In our experience, mixing these statistical topics into a first probability course increases the already considerable confusion about the relationship between probability and statistics. Instead, we lay the groundwork for students to understand these statistical concepts later.
How to Use this Book?
For Instructors
Each chapter can be covered thoroughly in a 90-minute lecture or sketched in a 60-minute lecture. At Stanford, we cover this material in a 10-week quarter with three 50-minute lectures per week. Schools on a 15-week semester system would be able to cover this material more thoroughly.
We have designed the book to be modular so that chapters can be read in any (reasonable) order. For example, one pedagogical decision is whether joint distributions should be covered before or after expected value. We have written the chapters so that they can be read in either order. If you wish to cover the chapters in an order other than the one we have laid out, the diagram below illustrates the dependencies between chapters in the discrete and continuous sections.
Acknowledgements
Our colleagues, including Trevor Hastie and Timothy Sun, provided feedback that improved this book.
Several students, including Ricky Rojas, also provided useful feedback.
The influence of teachers and colleagues who have shaped the way we teach probability is unmistakable. Thank you to Joe Blitzstein, Matt Carlton, Kevin Ross, and Allan Rossman.
We also acknowledge the support of a Curriculum Transformation Seed Grant from the Stanford Vice Provost for Undergraduate Education and Center for Teaching and Learning, which funded the writing of this book.