Diary of a Data Science Student

How to Avoid Cabin Fever in the Age of COVID-19

Motivation + Primary Questions of Interest

It is difficult to reflect upon the age of COVID-19, as we have not yet left it. Regardless, this time will surely be known as a global turning point. As a college student, my semester was abruptly disrupted when I was told to leave campus and finish the semester remotely from home. Since I attend Amherst College, a small liberal arts institution that emphasizes the importance of small, in-person classes for fostering an intimate, intellectual environment, this was a difficult transition. I function very differently in my school environment and my home environment, so I knew that I would have to make some changes at home in order to retain my motivation and focus for my schoolwork. I decided that creating a strong sense of structure would be most effective for my learning, since my days on campus were very structured. To maintain this structure, I decided that I would create a schedule every morning when I woke up, and try my best to stick to it. I was also curious about how well I would stick to this “intended schedule”, so I also kept an “actual schedule”, in which I logged each activity after I had completed it. I did this for one week. Afterwards, I compared how I intended to spend my time with how I actually spent it.

Data collection

I collected data by . . .

Creating two separate Google calendars, one titled “Intended Schedule”, and one titled “Actual Schedule”.
Logging the activity for my “intended schedule” for each day every morning when I woke up.
Logging the activity for my “actual schedule” every time I finished completing an activity.

I wrangled the data by . . .

Importing the calendars as data frames from Google Calendar into RStudio.
Creating variables for length of time spent on activities, category of activity, etc.
Assigning specific activities to broader categories for classification.
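
The wrangling steps above can be sketched roughly as follows. This is a minimal illustration rather than my exact script: the three rows are made up, the start and end columns are assumed to match what the ical package returns on import, and every keyword other than “HW” is a hypothetical stand-in.

```r
library(dplyr)
library(stringr)

# Made-up rows standing in for an imported calendar. A real import might look like
# ical::ical_parse_df("intended.ics"), where the file name is hypothetical.
intended <- tibble(
  summary = c("Stats HW", "Morning run", "Watercolor painting"),
  start = as.POSIXct(c("2020-04-06 09:00", "2020-04-06 11:00", "2020-04-07 10:00"), tz = "UTC"),
  end = as.POSIXct(c("2020-04-06 10:30", "2020-04-06 11:45", "2020-04-07 12:00"), tz = "UTC")
)

intended <- intended %>%
  mutate(
    date = as.Date(start),
    # Length of each activity in minutes
    length_min = as.numeric(difftime(end, start, units = "mins")),
    # Keyword-based categories; only "HW" comes from the project, the rest are illustrative
    category = case_when(
      str_detect(summary, "HW|class") ~ "School/Work",
      str_detect(summary, "run|yoga") ~ "Exercise",
      str_detect(summary, "paint|sketch") ~ "Art",
      TRUE ~ "Other"
    )
  )

# Total minutes per day and category
daily <- intended %>%
  group_by(date, category) %>%
  summarise(length_min_daily = sum(length_min), .groups = "drop")
```

Because case_when() uses the first condition that matches, the order of the keyword rules matters whenever a title could fit more than one category.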

Results

And here are my results . . .

Initial round of visualizations

How much time did I intend to spend each day on each activity?

How much time did I actually spend each day on each activity?

During my initial round of visualizations, I was mainly concerned with the breakdown of time spent on each of the categories for every day of the week.

I first created bar charts for each data set with date on the x-axis and total time in minutes on the y-axis. I set fill to equal the category variable so that the colors of the bars would correspond to the amount of time spent on each kind of activity.

I then did the same thing with the addition of the coord_polar() function, to turn the bar charts into circles.

The third group of visualizations at this step also consisted of bar charts, but they were now faceted by category, rather than “filling” by category. This allows you to see the time spent on individual categories alone.
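
The three chart styles described above could be built along these lines with ggplot2. This is only a sketch: the data frame name daily, its column names, and the numbers in it are assumptions for illustration, not my real data.

```r
library(ggplot2)

# Made-up daily totals: one row per date and category
daily <- data.frame(
  date = as.Date(rep(c("2020-04-06", "2020-04-07"), each = 3)),
  category = rep(c("School/Work", "Art", "Exercise"), times = 2),
  length_min = c(300, 60, 45, 240, 120, 30)
)

# 1. Stacked bar chart: date on x, minutes on y, fill by category
stacked <- ggplot(daily, aes(x = date, y = length_min, fill = category)) +
  geom_col() +
  labs(x = "Date", y = "Total time (minutes)", fill = "Category")

# 2. The same chart drawn in polar coordinates
polar <- stacked + coord_polar()

# 3. One panel per category instead of a fill color
faceted <- ggplot(daily, aes(x = date, y = length_min)) +
  geom_col() +
  facet_wrap(~ category) +
  labs(x = "Date", y = "Total time (minutes)")
```

Printing stacked, polar, or faceted in an R session draws the corresponding chart.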

Intended Time Spent on Individual Categories as barplots

Actual Time Spent on Individual Categories as barplots

Second Round of Visualizations

For the next set of visualizations, I decided to calculate and graph proportions. I wanted to see the proportion of time spent on certain activities out of the total time in a week (for both the intended schedule and the actual schedule). I calculated these proportions and placed them into new data sets. I then combined the intended and actual data sets by using the bind_rows() function. This is how I generated visuals comparing the proportions for the intended schedule vs. the actual schedule. I first generated two different data tables and corresponding bar charts to show the proportions within each category (for intended vs. actual). I was mostly interested in the proportion of time spent on school/work and exercise, so these are the categories I decided to focus on. For these bar plots, the x-axis is the type of schedule and the y-axis is the proportion of time spent on the designated category of activity. I chose to group the bars instead of stacking them because I felt that this was easier to view.
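
As a rough sketch of this step, the code below computes weekly proportions for each schedule, combines them with bind_rows(), and draws grouped bars. The minute totals are invented, the helper weekly_prop() is a hypothetical name, and taking “total time in a week” to mean all 7 * 24 * 60 minutes is an assumption.

```r
library(dplyr)
library(ggplot2)

# Made-up weekly minutes per category for each schedule
intended <- tibble(category = c("School/Work", "Exercise", "Art"),
                   length_min = c(1800, 300, 300))
actual <- tibble(category = c("School/Work", "Exercise", "Art"),
                 length_min = c(1500, 300, 600))

minutes_per_week <- 7 * 24 * 60  # assumed denominator

# Hypothetical helper: share of the week spent on each category
weekly_prop <- function(df) {
  df %>%
    group_by(category) %>%
    summarise(total_min = sum(length_min), .groups = "drop") %>%
    mutate(prop = total_min / minutes_per_week)
}

# Stack the two tables; .id records which schedule each row came from
combined <- bind_rows(
  intended = weekly_prop(intended),
  actual = weekly_prop(actual),
  .id = "schedule"
)

# Keep the two categories of interest and draw grouped (dodged) bars
focus <- filter(combined, category %in% c("School/Work", "Exercise"))
grouped <- ggplot(focus, aes(x = schedule, y = prop, fill = category)) +
  geom_col(position = "dodge") +
  labs(x = "Schedule", y = "Proportion of week", fill = "Category")
```

Setting position = "dodge" places the bars side by side; leaving it out would stack them.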

Summary

For this project, I was most interested in studying how I intend to spend my time during quarantine compared to how I actually spend it. To collect the data, I first created two separate calendars using Google Calendar and logged my activity for one week. Every morning during this week, I created my intended schedule for the day. Then, whenever I completed an activity, I logged it on the “actual schedule”. The title of each activity was eventually listed under the “summary” variable in the data set.

When the week was over, I imported the two calendar data sets into RStudio by using the ical package and began to wrangle them. Because I was interested in studying how I spend my time across many categories, assigning each activity to a category was the most important step of the wrangling. I chose enough categories that each one was specific (and distinguishable from the others), but not so many that they became hard to compare. The case_when() and str_detect() functions were the most helpful during this process. I used the str_detect() function to identify keywords in the “summary” variable, like “HW”, and the case_when() function to place the matching observations into a category, like “School/Work”. The category “School/Work” encompasses activities like classes, homework, and work for on-campus jobs that I still hold.

Once the data was wrangled, I began to generate proportions and visualizations. Since my main question is “How do I actually spend my time compared to how I intend to spend my time?”, I first created a simple bar chart with date on the x-axis, time (in minutes) on the y-axis, and fill by category, for the intended data set and the actual data set separately. I then compared them side by side to see the breakdown of my time for each day of the week across all of the categories (told apart by color).
I then decided to turn these two bar charts into circles using the coord_polar() function. Just by studying these charts, it appeared that I spent more time creating art than I intended to, and that I spent less time on school/work than I intended to. Still, I was not satisfied by the bar charts alone, especially because the categories were stacked, making them a bit harder to read. Therefore, I decided to compute proportions. Since I was mostly interested in how the time spent on exercise and school/work differed between the intended schedule and the actual schedule, these are the categories I chose to focus on.

I generated these proportions by creating new data sets and then joining them by binding their rows. For example, to compute the school/work proportions, I first filtered the category data sets that I was previously working with (intended and actual) so that only the observations classified as “school/work” were included. I then divided the length_min_daily variable (how many minutes per day were devoted to this activity) by 1440, the total number of minutes in a day. I followed the same procedure for the exercise observations. In the end, I had one data set of school/work proportions covering both the intended schedule and the actual schedule, and another for the exercise proportions. I used these data sets to plot bar charts with date on the x-axis, proportion on the y-axis, and fill by schedule type. After studying these tables and charts, it seems that for most days of the week, I spent less time on school/work than I meant to. As for exercise, there are some days on which I exercised more than I intended to, and some days on which I exercised less.
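
The daily proportion calculation described above comes down to a filter and a division by 1440. Here is a small worked sketch with invented numbers; the data frame names are hypothetical, and it binds the two schedules before filtering rather than after, which gives the same rows.

```r
library(dplyr)

# Made-up daily totals per category for each schedule
intended_daily <- tibble(
  date = as.Date(c("2020-04-06", "2020-04-06", "2020-04-07")),
  category = c("School/Work", "Art", "School/Work"),
  length_min_daily = c(360, 60, 288)
)
actual_daily <- tibble(
  date = as.Date(c("2020-04-06", "2020-04-06", "2020-04-07")),
  category = c("School/Work", "Art", "School/Work"),
  length_min_daily = c(288, 120, 216)
)

school_props <- bind_rows(intended = intended_daily, actual = actual_daily,
                          .id = "schedule") %>%
  filter(category == "School/Work") %>%
  # 1440 = 24 * 60, the number of minutes in a day
  mutate(prop = length_min_daily / 1440)
```

With these numbers, 360 intended minutes is 360 / 1440 = 0.25 of the day, while 288 actual minutes is 288 / 1440 = 0.2.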
After studying my calculations and visualizations, I began to think about potential reasons for the results that I found. For example, the data revealed that I actually spent more time on art than I intended to, but less time on school/work than I intended to. Given the stress of the global pandemic and remote learning, I believe that I unintentionally spent more time creating art because it was more stress-relieving and therapeutic than completing school/work. Nonetheless, school/work was still the most represented category in the actual schedule data set. If I were to continue this study, I would be interested in seeing how I spend my time when I am not trying to follow an intended schedule. I think that because I was creating an intended schedule every morning, I felt more inclined to follow this schedule in order to feel productive and accomplished. Therefore, I would predict that if I were to fill out another calendar (with no prior planning) it would differ greatly from the intended and actual calendars used for this project. I would be very interested to see these results.

References

Thank you to Albert Kim (Smith College) and Johanna Hardin (Pomona College) for the Google Calendar project idea. They credit Roger Peng’s and Hilary Parker’s Not So Standard Deviations podcast titled “Compromised Shoe Situation” (http://nssdeviations.com/size/5/?search=shoe), in which they discuss a data science design challenge on getting to work on time, for the inspiration.
Thank you to Professor Correia (Amherst College) who inspired this project and helped me throughout the course of its completion!

Welcome to my Data Science Diary!

By Stephanie Masotti