Readings for the Data Science Seminar

Project maintained by dlsun

This is a seminar for the Cal Poly Data Science Fellows. We will meet weekly over Zoom on Thursdays. There are two sessions: one at 10 AM and one at 3 PM. You only need to attend one of these sessions. Each week there will be readings relevant to data science, which we will discuss for 30 minutes. Afterwards, individual fellows will give updates on their research projects.

Please enroll in STAT 400 for 1 unit, on a CR/NC grading basis.

To earn a CR grade, you are expected to:

- attend all seminars
- complete readings and post a question or comment about the reading to the Slack channel

The theme of the readings for Fall 2020 is **the past**.

No reading

Hypothesis testing is a core part of many statistics classes. But where did ideas such as the p-value, Type I error, and power come from? This reading reviews the chaotic history of hypothesis testing in the 20th century.

- Chapters 10 and 11 from Salsburg, D. (2001).
*The lady tasting tea: How statistics revolutionized science in the twentieth century*. Macmillan.

We will look at how hypothesis testing has influenced other disciplines and the controversy this has caused.

- Cohen, J. (1994). The earth is round (p < .05). *American Psychologist*, 49(12), 997.
- Gill, J. (1999). The insignificance of null hypothesis significance testing.
*Political research quarterly*, 52(3), 647-674.

We will look at the Frequentist vs. Bayes debate. Next week, you will be randomly assigned to defend either frequentism or Bayesianism. Please come prepared to defend both, although we want you to take a side in your Slack post this week.

- New York Times Article: The Odds, Continually Updated
- Frequentist and Bayesian Approaches in Statistics
- Efron, B. (2005). Bayesians, frequentists, and scientists.
*Journal of the American Statistical Association*, 100(469), 1-5.
- xkcd comic on Frequentists vs. Bayesians
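To make the debate concrete before next week's discussion, here is a minimal sketch (our own illustration, not taken from the readings; the data are hypothetical) contrasting a frequentist point estimate with a Bayesian posterior mean for a coin's probability of heads:

```python
# Hypothetical data: 7 heads in 10 flips of a coin with unknown bias.
heads, flips = 7, 10

# Frequentist estimate: the maximum-likelihood estimate of the heads
# probability is simply the sample proportion.
mle = heads / flips

# Bayesian estimate: start from a uniform Beta(1, 1) prior on the heads
# probability and update on the data. The posterior is
# Beta(1 + heads, 1 + tails), whose mean is (1 + heads) / (2 + flips).
posterior_mean = (1 + heads) / (2 + flips)

print(mle)             # 0.7
print(posterior_mean)  # ~0.667
```

Note how the prior shrinks the Bayesian estimate toward 1/2; with more data, the two estimates converge. Which answer is "right" is, of course, the question the readings debate.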

Here are some additional readings that might be of interest.

- MIT Lecture Notes: Comparison of frequentist and Bayesian inference.
- Efron, B. (1986). Why isn’t everyone a Bayesian?
*The American Statistician*, 40(1), 1-5.
- Andrew Gelman’s blog post: Why I Don’t Like Bayesian Statistics. (This is an April Fools’ joke written by a famous Bayesian statistician. However, it contains some good ideas.)

A survey of the history of the connection between early statistics and the eugenics movement.

- “Scientific Priestcraft,” chapter 3 of *Superior: The Return of Race Science*

Bonus reading from last week: A metaphor for the difference between randomness and unknown-ness, and the consequences in Bayesian analysis.

- The Boxer, the Wrestler, and the Coin Flip: A Paradox of Robust Bayesian Inference and Belief Functions. Andrew Gelman, *The American Statistician*, May 2006, Vol. 60, No. 2

We will examine the history of AI winters.

- Analyzing the Prospect of an Approaching AI Winter. Sebastian Schuchman, May 3, 2019

We will discuss the Turing Test and whether machines can be intelligent.

- Computing Machinery and Intelligence. Alan Turing, *Mind*, October 1950, Vol. 59, No. 236
- Is the Brain a Good Model for Machine Intelligence? Rodney Brooks, Demis Hassabis, Dennis Bray, and Amnon Shashua, *Nature*, February 2012, Vol. 482

Questions to consider:

- Is the Turing Test a good test of whether a machine is intelligent?
- Are today’s systems (like IBM Watson, Google’s image recognition systems, etc.) intelligent? Could a convolutional neural network be intelligent?
- In trying to achieve machine intelligence, should we try to mimic the brain or should we apply a pure engineering approach?
- Is the brain just a computer – could its functionality be replicated by a machine?