Courtesy of Amit Mizrahi

August 29, 2017

Cornell Data Science Updates its Student-Led Training Course


Cornell Data Science’s formerly unofficial, completely student-led training course is now a fully accredited class with enticing updates.

Last January, the Cornell Data Science project team launched an unofficial, student-taught and student-conceived course in statistical methods and programming. This fall, the course returns with major improvements.

The course puts students first — both as teachers and learners — and offers an immersive experience in the “buzz word” world of data science, Chase Thomas ’19 told The Sun.

In keeping with the course’s student-centered focus, student evaluations drove a new wave of changes for this semester’s training course, now formally titled INFO 1998: Introduction to Data Science and Machine Learning.

According to Thomas, the Cornell Data Science student president, the first training course was “a huge success.”

“We expected no more than 20 or 30 enrollments and almost 150 students added the course,” he said. “There’s clearly a lot of data science demand that wasn’t being met.”

Despite the course’s strong student interest and positive feedback, however, Thomas and his team wanted to do better for Cornellians.

At its core, the Cornell Data Science team works with the “real” (real data, real programs, real problems) on a digital level. Its course updates reflect this effort to stay in touch with the data science industry and its students’ needs.

“Data Science is such a huge and vast topic,” Thomas said. “We tried to tackle too much in one semester. We weren’t able to answer every question. We’re students too, we got a bit overwhelmed.”

For the second semester, the Data Science team reimagined its approach and format.

Foremost, the student group increased teacher and tutor access: the course now employs two lecturers rather than one and six TAs rather than four.

“Mainly, we realized we needed to offer a lot more support,” said Dae Won Kim, grad, a course instructor. “We needed to expand office hours.”

Last semester’s session passed a dedicated group of 120 students. In response to that demand, the Data Science team will quadruple office hours this fall.

“The main goal of the updates is to help students intuitively understand data science in an approachable way,” Kim said. “We want students to leave with a starter tool kit to implement programming models.”

In another major change, the Data Science project team switched the course’s programming language from R to Python.

The switch makes the course basics a bit more accessible to its typical enrollees, computer science majors, Kim said.

“We found that a huge proportion of our students came from computer science backgrounds where they’re more familiar with an object oriented system like Python rather than the R programming language,” Kim told The Sun.

According to Jared Lim ’20, Cornell Data Science’s Education lead, “Python is more popular and extendable.”

Lim hopes to help students emerge from the program prepared to take on real-world projects.

Beyond simple slide rearrangements and increased office hours, the updated course offers a new approach to the study of data science.

“We’ve reduced the overall course material so that students can really know what they learn,” Lim said. “We go more in depth into the when and why of programming techniques rather than focusing only on the how. We want students to know how to do it but it’s important to understand why we do it.”

As with last semester, students, even those with no prior experience, begin coding on day one. After a semester of practice manipulating data, interpreting trends and implementing algorithms, each student gains what Kim calls “data science intuition.”

The Cornell Data Science Training Program is a one-credit, 12-week course. It meets every Wednesday night in Gates Hall.