
Data Science Seminar

Grading

Please enroll in DATA 472 for 1 unit, on a CR/NC grading basis.

To earn a CR grade, you are expected to:

Readings

The theme of the readings for Fall 2021 is the past.

Week 1: September 22

Week 2: September 29

Hypothesis testing is a core part of many statistics classes. But where did ideas such as the p-value, Type I error, and power come from? This reading reviews the turbulent history of hypothesis testing in the 20th century.

Required Reading:

Additional Resources:

Assignment:

In your assigned group of 6 students:

Week 3: October 6

In the early to mid 1900s, eugenics (the idea that some groups or people are inherently genetically inferior) was a mainstream and well-respected scientific pursuit. Many of the foundational ideas of classical statistics were developed in conjunction with eugenics applications. Now that these ideas have been rejected as racist and classist, how should we regard the influential people and ideas that came out of that movement?

Required Reading:

Assignment:

We ask you to think carefully about the practice of re-contextualizing scientific contributions in light of modern ethics. Questions to consider:

In your assigned group of 6 students:

Week 4: October 13

We will look at the Frequentist vs. Bayesian debate! This debate pertains not only to how you think about and do statistics and data science, but also to some of your everyday ways of thinking! Please come prepared to defend both sides, although you will be asked to take a side in class.

Required Reading:

Here are some additional readings that might be of interest.

Assignment:

We ask you to think carefully about the two sides: Frequentist and Bayesian. Questions to consider:

In your assigned group of 6 students:

Week 5: October 20

We will read a classic paper by Leo Breiman entitled “Statistical Modeling: The Two Cultures.” The PDF linked below also contains comments by leading statisticians and data scientists which will give you more ideas as you prepare your presentations.

Required Reading:

This paper by Efron can be seen as an update to Breiman’s paper 20 years later – it is not required reading but might be interesting for you to read over:

Assignment:

In your assigned group of 6 students:

Week 6: October 27

We will discuss the past and future of Artificial Intelligence (AI).

Required Reading:

Assignment:

In your assigned group of 6 students:

Week 7: November 3

We are starting a three-week stretch devoted to the ideas and technologies behind working with big data. Our first discussion is about relational databases, their history, and their role in making it possible to work with large quantities of data. To that end, you will read two sets of articles.

The first set of articles gives you some historical perspective on the development of the relational data model and relational databases. The articles in this set are:

The Codd papers provide historical descriptions of the ideas behind modern relational databases. The blog post puts some of the information contained in these papers in a broader context.

The second set of papers comes from a sequence of meetings held by the database research and industry community over the course of the late 20th and early 21st centuries. The meetings served as the community's reflection points on the progress of the field of relational databases (and the field of databases in general) over the years. They also attempted to identify future challenges that database technology and the database community needed to meet. The papers are co-authored by a who’s who of the database management systems field. The papers in this series are:

This is a lot of reading. Please read the instructions below carefully.

The roles for this week’s assignment are:

Week 8: November 10

We will discuss the rise and “fall” of Hadoop… and the future of Hadoop.

Required Reading:

Everyone should read in detail the following:

Assignment:

The “decline” of Hadoop is documented, so it is not my intention to have you discuss why it happened. It is an interesting topic, but it is not our focus. I’ll share a bit of my own experience with Hadoop at the beginning of the seminar, since it came about during my graduate school days and I’ve been along for the ride ever since.

In your assigned group of 6 students:

Week 9: November 17

Everyone needs to read the following:

In your assigned group of 6 students:

Everyone: be sure to look up any terms and products in the readings that you are not familiar with (some may be old or discontinued). Teams need to be able to explain the terms in the papers that relate to their subject, if asked during their presentations.