DATASCI 112: Principles of Data Science

Dennis Sun, Stanford University, Winter 2023


Course Info

Staff

Professor: Dennis Sun (dlsun@stanford)
  • Office: Sequoia 124
  • Office Hours (in person and on Zoom):
    • Mon 3 - 4 PM
    • Wed 11:30 AM - 12:30 PM
Teaching Assistants:
  • Amber Hu (amberhu@stanford)
    • Section (02/03): Tues, Thurs 9:30 - 10:20 AM in 380-380Y
    • Office Hour: Wed 1:30 - 2:30 PM in Sequoia 207 (Bowker)
  • Rahul Kanekar (rkanekar@stanford)
    • Section (04): Tues, Thurs 10:30 - 11:20 AM in Gates B12
    • Office Hour: Tues 1:30 - 2:30 PM in Sequoia 105 (Library)
  • Ran Xie (ranxie@stanford)
    • Section (05/08): Tues, Thurs 10:30 - 11:20 AM in 380-380Y
    • Office Hour: Mon 9:30 - 10:30 AM in Sequoia 207 (Bowker)
  • Ben Seiler (bbseiler@stanford)
    • Section (06): Tues, Thurs 4:30 - 5:20 PM in 160-326
    • Office Hour: Fri 1 - 2 PM in Sequoia 220 (Fishbowl)
  • Michael Howes (mhowes@stanford)
    • Section (07/09): Tues, Thurs 4:30 - 5:20 PM in 50-51P
    • Office Hour: Tues 3:15 - 4:15 PM in Sequoia 220 (Fishbowl)
  • Sophia Lu (sophialu@stanford)
    • Office Hours:
      • Thurs 1:30 - 2:30 PM in Sequoia 207 (Bowker)
      • Fri 9:30 - 10:30 AM in Sequoia 220 (Fishbowl)

Course Description

A hands-on introduction to the principles and methods of data science. This course is designed to equip you with tools to begin extracting insights and making decisions from data in the real world, as well as to prepare you for further study in statistics, machine learning, and artificial intelligence. We will analyze and visualize data of different shapes and sizes (e.g., tabular, textual, hierarchical, geospatial). We will discuss common patterns and pitfalls of data analysis. We will build and evaluate machine learning models, focusing on general concepts (rather than specific methods), such as supervised vs. unsupervised learning, training vs. testing error, hyperparameter tuning, and ensemble methods. The focus will be on intuition and implementation, rather than theory and math. Implementation will be in Python and Jupyter notebooks, using libraries such as pandas and scikit-learn. This course culminates in a project where you apply the ideas to a data science problem of your choosing.

This course satisfies the WAY-AQR requirement.

Who Should Take This Class?

This course is designed for undergraduates, particularly freshmen and sophomores.

  • This is a hands-on class where you will be analyzing real data. To do this, you need to know how to code (preferably in Python), so CS 106A is a prerequisite. You don't need to be a very experienced programmer; if you know how to write a for loop and use a dict in Python, you'll likely be fine. But we won't review basic programming concepts.
  • You need to be comfortable doing basic math. We'll be aggregating, transforming, and comparing numbers every day in this class. But there will be no calculus and no linear algebra. We will focus on understanding the intuition behind data science methods, rather than the math.
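
As a rough self-check of the programming level described above, here is a small sketch that uses a for loop and a dict to count values. The word list is made up for illustration; it is not course material. If you can read this and predict what `counts` holds at the end, your Python background is probably sufficient.

```python
# Hypothetical data, chosen only for illustration.
words = ["data", "science", "data", "python", "science", "data"]

# Count how many times each word appears, using a plain dict.
counts = {}
for word in words:
    # dict.get returns 0 when the word has not been seen yet.
    counts[word] = counts.get(word, 0) + 1

print(counts)
```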

This course is a good fit for the following audiences:

  • freshmen or sophomores who are considering majoring in Data Science
  • students who want to know what Data Science is and how it applies to the real world
  • students needing to pick up practical Data Science skills for an internship
  • undergraduates looking to fulfill the WAY-AQR requirement

This course is likely not a good fit for students who already have experience in data science and/or machine learning. You'll almost certainly pick up some useful skills, but you might find the pace of the class slow. Consider taking STATS 216 instead.

Class Structure

There will be class every weekday:

  • On Mondays, Wednesdays, and Fridays, we will meet as an entire class for lecture. In lecture, the professor will introduce data science concepts, do some coding demos, and assign some exercises for you to try at home.
  • On Tuesdays and Thursdays, you will meet with a TA in small sections. In section, you will present and discuss solutions to the exercises posed in lecture the day before.
    • If you have a one-time conflict with your section and want to attend another section, please e-mail both your section leader and the leader of the section you want to attend.
    • If you are unable to come to campus one day (e.g., illness, sports travel), please e-mail your section leader with documentation.

Attendance is an essential part of the learning experience. Therefore, it counts towards your participation grade (see below).

Grading

  • Weekly Labs: 15%
  • 2 Exams: 35% (15% each, with 5% added to your higher exam)
  • Final Project: 40%
  • Participation: 10%
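
The weights above can be sketched as a small calculation. This is an illustration only, under two assumptions that the syllabus does not spell out: every component is scored on a 0-100 scale, and "5% added to your higher exam" means the higher exam is weighted 20% and the lower exam 15%. The function name `course_grade` and the sample scores are hypothetical.

```python
def course_grade(labs, exam1, exam2, project, participation):
    """Combine component scores (each assumed to be on a 0-100 scale)."""
    higher = max(exam1, exam2)
    lower = min(exam1, exam2)
    return (
        0.15 * labs             # Weekly Labs: 15%
        + 0.20 * higher         # higher exam: 15% + 5% (assumed reading)
        + 0.15 * lower          # lower exam: 15%
        + 0.40 * project        # Final Project: 40%
        + 0.10 * participation  # Participation: 10%
    )

# Made-up scores: the exam with 80 gets the extra 5% weight.
example = course_grade(labs=90, exam1=80, exam2=70, project=85, participation=100)
print(round(example, 2))  # 84.0
```

Under this reading, the order of the two exams does not matter; only which score is higher.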

About the Participation Grade: There are several ways to earn participation credit:

  1. presenting your solutions in section
  2. answering other students' questions on the Ed Discussion board