Teaching Data Science

Data science is an interdisciplinary field that combines (advanced) maths, statistics, programming, and specific domain knowledge. Besides knowledge in these areas, successful data scientists should also possess the ability to think with data — asking the right questions, framing problems properly, breaking down complex problems into manageable analyses, and more generally, having good data intuition.

How might we help students master critical skills for data scientists?

The Rise of Active Learning

Let’s face it. Traditional lectures aren’t always effective.

Most of us took a statistics class in college, but can you still explain important concepts like 95% confidence intervals, let alone use them in your daily job? (And by the way, you should!) Many aspiring data scientists have completed a number of online machine learning courses taught by world-renowned professors, yet they struggle to produce useful models from real data that provide valuable insights, address important business questions, or create promising data products.

Research shows that students learn better by doing (aka Active Learning).

“[…] students must do more than just listen: They must read, write, discuss, or be engaged in solving problems. Most important, to be actively involved, students must engage in such higher-order thinking tasks as analysis, synthesis, and evaluation”

— Active Learning: Creating Excitement in the Classroom

It’s been almost two years since I quit my job as a data scientist at Facebook and started teaching data science in Thailand. I have been experimenting with my instructional strategies in search of approaches that best meet the learning needs of my students.

In this blog post, I’d like to discuss some of the active learning activities I have tried or created for my classes and workshops.

1. Codelabs

As a Google Developer Expert, I use Google Codelabs quite a lot in my workshops.

Google Developers Codelabs provide a guided, tutorial, hands-on coding experience. Most codelabs will step you through the process of building a small application, or adding a new feature to an existing application.

To create my own poor man’s codelabs, I usually use Jupyter Notebook, which lets me add Markdown text to describe syntax or provide instructions. Jupyter Notebook is already considered the gold standard in the data science community. It supports both Python and R, and you can find tons of tutorials in Jupyter Notebook format.

A codelab I prepared for Data Science Practicum class at Chulalongkorn Business School

These codelabs are great when they are short and not too difficult. Students can work at their own pace and will feel accomplished as they finish something all by themselves.

However, students need to be highly motivated and dedicated. Because codelabs require a lot of reading, they can get boring very quickly. If the solutions are provided in the notebooks, students will start running all the code to see the final output without following the instructions or trying to understand how the code works.

Lastly, if the code is somewhat complicated, make sure you have TAs or facilitators walking around to assist with any issues that may come up. These could be anything from unclear instructions to an environment that is not properly set up (which can be really complicated to resolve, especially for Windows users).

Pro tip: Google recently launched Google Colaboratory, a Jupyter notebook environment that requires no setup and runs entirely in the cloud. It even lets you run TensorFlow computations on a GPU! Most of the commonly used packages are readily available, and I no longer have to worry about my students not having a proper environment set up before class.

A codelab on Google Colaboratory I prepared for Data Science and Data Engineering class at Chulalongkorn Engineering School

2. Live Coding Lessons

Live coding lessons are great for learning new commands or syntax. Students can play around and get real-time feedback on whether they got the right answers or what they might have missed. Through those hints, they can learn from their mistakes. Students usually find these live coding lessons very enjoyable.

There are many online platforms providing live coding lessons, but DataCamp stands out when it comes to data-related courses.

DataCamp’s free R course that I translated into Thai for my students to study before class. https://www.datacamp.com/courses/2589/ (Update: DataCamp has updated their system, and this class is no longer functioning.)

3. Interactive Data Visualizations

A picture is worth a thousand words, but it’s not easy to draw and animate things in a PowerPoint or Keynote presentation. (Well, it’s not too hard I’d say, but teachers simply don’t have the time!) There are many interactive visualizations on the Internet that explain complex concepts in statistics and machine learning. I provide here two examples. (Let me know your favourites in the comments!)

Students usually don’t read instructions carefully and may miss some key learning steps, so make sure you have a debrief session or give them some quizzes to complete while playing with the visualizations.

4. Interactive Playgrounds

Long text and equations generally discourage students from learning, and one way to help students establish intuition is to let them explore the problem themselves. Interactive playgrounds are great for this. You give your students a mission to complete and let them go through a process of trial and error.

One of the best examples is Google’s TensorFlow playground, which lets you tinker with a neural network right in your web browser. You can find a list of guided playground exercises from Google’s Machine Learning Crash Course here.

For my class, I built a simple command-line tool that encourages students to think about feature engineering.

How might we distinguish posts from two popular pages in Thailand, อีเจี๊ยบ เลียบด่วน and Drama-addict?

A playground I created for Computer Engineering Essentials class at Chulalongkorn Engineering School
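
The idea behind such a playground can be sketched in a few lines of Python. This is not the actual tool: the toy posts, the page labels A and B, and every function name below are invented for illustration. Students propose a feature (a function from a post to a value), and the script reports how well that single feature separates the two pages.

```python
from collections import Counter

# Invented toy posts as (text, page label); real posts would replace these.
POSTS = [
    ("Traffic jam on the expressway again this morning!!!", "A"),
    ("Expressway toll goes up next month, plan ahead", "A"),
    ("Stuck behind an accident near the toll gate", "A"),
    ("Doctor explains why this viral health tip is wrong", "B"),
    ("Is this viral drama real? Here is the evidence", "B"),
    ("Another scam warning: do not share your bank password", "B"),
]


def feature_accuracy(feature, posts):
    """Accuracy of the best single-feature rule.

    Posts are grouped by feature value, and each group is assigned its
    majority label. The score is the share of posts labeled correctly.
    """
    groups = {}
    for text, label in posts:
        groups.setdefault(feature(text), Counter())[label] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return correct / len(posts)


# Two hypothetical features a student might try.
def has_exclamation(text):
    return "!" in text


def mentions_road(text):
    lowered = text.lower()
    return "expressway" in lowered or "toll" in lowered


if __name__ == "__main__":
    for feature in (has_exclamation, mentions_road):
        score = feature_accuracy(feature, POSTS)
        print(f"{feature.__name__}: {score:.2f}")
```

Scoring one feature at a time keeps the feedback loop short: a student changes one function, reruns the script, and immediately sees whether the new idea helps.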

5. Group Activities

Group activities are fun! They help students internalize the concepts through hands-on experience. Students also benefit from discussions with their teammates along the way.

To teach sampling and confidence intervals to students without a statistics background, I asked them to estimate the proportion of red M&M’s from samples of various sizes and calculate the 95% confidence intervals. In the end, they would find that roughly 95% of all the confidence intervals generated contain the true parameter and (hopefully) remember this correct interpretation of the 95% confidence interval.

M&M counting activity for Intro to Data Science class, a general elective at Chulalongkorn University
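
The same experiment can be simulated instead of counted by hand. Below is a minimal sketch in Python, assuming the common normal-approximation (Wald) interval, p̂ ± 1.96 · sqrt(p̂(1 − p̂)/n). The true proportion of 0.2, the sample size of 100, and the function names are assumptions for illustration, not figures from the actual activity.

```python
import math
import random


def wald_interval(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(true_p, n, trials, seed=0):
    """Share of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw n candies; each is red with probability true_p.
        reds = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(reds, n)
        hits += low <= true_p <= high
    return hits / trials


if __name__ == "__main__":
    # Assumed values: 20% red candies, 100 candies per sample.
    share = coverage(true_p=0.2, n=100, trials=2000)
    print(f"Intervals containing the true proportion: {share:.1%}")
```

Each simulated sample plays the role of one group of students. With this simple interval, the share that captures the true proportion is close to 95% but often falls slightly below it for small samples, which can itself be a useful discussion point.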

In another class, to emphasize the importance of “good data” in machine learning, students were asked to collect data (as video clips), prepare data (as image files), label images, and use transfer learning to build an image classifier. Most found that their classifiers didn’t work well the first time and that they needed to collect more images from the missing angles to improve performance.

Collecting data

Preparing data for training a model

Making predictions from the trained image classifier

The only drawback of group activities is that they could be quite time-consuming.


Besides the active learning activities presented above, working on actual projects is definitely a must. It’s the only way for students to connect all the dots and gain expertise. Just make sure you are a good “coach” helping students think through their problems, as opposed to simply telling them what to do.

If you teach data science, let me know what you think or share any fun activities you use in your classes.

[1] Bonwell, C. & Eison, J. (1991). Active Learning: Creating Excitement in the Classroom. ERIC Clearinghouse on Higher Education, Washington, D.C. http://ericae.net/db/edo/ED340272.htm

Ta Virot Chiraphadhanakul
Google Developer Expert in Machine Learning. A data nerd. A design geek. A changemaker.  —  Chula Intania 87, MIT Alum, Ex-Facebooker
