Description

Goals

The goals of the course project are to:

  1. Collaborate with peers to demonstrate mastery of course goals by crafting a complete data story from research questions and datasets of your choosing.
  2. Seek out a challenge, eg, explore new territory or delve deeper into a concept you’ve already encountered.

Expectations

Below are some expectations for the project:

Team Size

You are expected to work in a team of 3 students (or 2 if a team of 3 is not possible).

Domain Area

Each team is free to select any domain area for their project, ranging from agriculture and biology to education and e-sport.

Report Structure

Your report should include the following sections, all within context.

Motivation

  • What is the story behind the project?
  • Why were you interested in the project?
  • What was your motivation?
  • Why is the project important?

Research Question

  • What research question(s) are you trying to answer?
  • What are the key goals of the project?
  • What ideas do you most want to communicate?

Background

  • What background information is necessary for the target audience to understand?
  • What assumptions, terms, and/or acronyms need to be clarified?

Data

  • Data collection
    • What was collected?
    • When was it collected?
    • Why was it collected?
    • How was it collected originally?
    • Who collected it?
  • Data acquisition
    • Where / how did you get the data?
    • What is the source?
  • Data understanding
    • How much data do you have?
    • What types of measurements?
    • Anything you needed to clean before getting started?

Data Insights

It’s your job to explicitly identify and discuss key insights. Don’t simply present the audience with some code and output and expect them to do that work. Specifically address the following questions:

  • What are the important takeaways from the data? What was interesting?
  • Why do these takeaways matter?
  • Was there anything surprising?
  • Overall, what do you want the audience to walk away with? What do you want them to understand about your data and research questions?

Conclusions / Big Picture

  • How do the insights connect to answer your research question(s)?

Limitations and Future Work

  • What limitations or weaknesses does your data / analysis suffer from?
  • What improvements might someone make to your analysis?

Due Dates

Check the class schedule linked on the main page.

Where to Submit

Most of the deliverables are submitted via the project GitHub Classroom Assignment.

Important: Project Group GitHub Classroom Assignment

The project is a group GitHub Classroom Assignment, which means all team members share a single online repository. This setup is prone to merge conflicts when multiple people edit the same file and attempt to push changes simultaneously. To prevent conflicts, I strongly recommend that each team member work in a separate file, eg, amin.qmd (a few empty files are provided). The team must then coordinate to integrate the necessary pieces from each individual file into the final project submission. The GitHub collaboration process can sometimes be challenging. If your team encounters any issues you cannot quickly resolve on your own, please notify the instructor immediately to prevent project delays.
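The one-file-per-member recommendation can be tried safely before using it on the real project. The following is a minimal sketch, not the course's official procedure: it builds a throwaway sandbox in which a local bare repository stands in for the shared GitHub repository, and two teammates each edit only their own file. The teammate name sam, the helper clone_as, the file contents, and the commit messages are all hypothetical.

```sh
#!/bin/sh
# Dry run in a temporary directory; nothing here touches your real project.
set -eu

sandbox=$(mktemp -d)
cd "$sandbox"

# A local bare repository stands in for the shared GitHub Classroom repository.
git init --quiet --bare shared.git

# Hypothetical helper: clone the shared repository and set a commit identity.
clone_as() {
  git clone --quiet shared.git "$1" 2>/dev/null
  git -C "$1" config user.name "$1"
  git -C "$1" config user.email "$1@example.com"
}

# Seed the repository with one empty file per member, like the provided files.
clone_as template
: > template/amin.qmd
: > template/sam.qmd
git -C template add amin.qmd sam.qmd
git -C template commit --quiet -m "Add empty per-member files"
git -C template push --quiet origin HEAD

# Each teammate works in a separate clone.
clone_as amin
clone_as sam

# Amin edits only amin.qmd, then commits and pushes.
echo "Amin's EDA notes" >> amin/amin.qmd
git -C amin add amin.qmd
git -C amin commit --quiet -m "Draft EDA notes in amin.qmd"
git -C amin push --quiet

# Sam edits only sam.qmd. Sam must pull Amin's commit before pushing;
# the pull applies cleanly because the two commits touch different files.
echo "Sam's EDA notes" >> sam/sam.qmd
git -C sam add sam.qmd
git -C sam commit --quiet -m "Draft EDA notes in sam.qmd"
git -C sam pull --quiet --rebase
git -C sam push --quiet

# Amin pulls and now has both files, with no conflict to resolve.
git -C amin pull --quiet --rebase
git -C amin log --oneline
```

In the real repository, the same rhythm applies: pull, edit your own file, commit, pull again if a teammate pushed in the meantime, then push. Delete the sandbox directory when you are done experimenting.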

Important: Publishing Website

Once you publish your project for the first time, the manual re-publishing procedure via quarto publish gh-pages --no-browser is no longer necessary. GitHub Actions will automatically re-publish your website every time you push your code to GitHub. However, you might need to return to the manual procedure if the automatic publishing fails. For automatic publishing to function, you must ensure your project’s virtual environment (the lockfile) is up-to-date (RStudio Packages pane → Snapshot). In addition to enabling automatic publishing, the virtual environment allows teammates and replicators to easily install missing packages (RStudio Packages pane → Restore).

Milestones

Milestone 0.0: Brainstorming Project Ideas

Students brainstorm ideas in the Project Notebook, categorize them, and declare their specific interests.

Milestone 0.1: Team Formation

Students browse the brainstormed ideas and categories in the Project Notebook and connect with classmates who share the same interests. After forming a team, the team members record their basic information in the Project Notebook.

Milestone 1: Proposal + Dataset

After creating their GitHub repository using the link in the Where to Submit section, each team should use the proposal.qmd file to submit their project proposal as an appendix in their project website. The proposal should include the following:

  • a title
  • team member names
  • a short description of the project
  • 2-3 broad research questions the team is interested in exploring
  • the reasons/inspirations behind choosing the topic
  • one potential data set that can be used to answer the research questions
  • communication plan, ie, primary communication channel (eg, Google Chat, Slack, Email), expected response time (eg, 24 hours during the week), meeting time (eg, “we will meet 7-8pm every Wednesday”)
  • roles–everyone is expected to contribute to the project but having leads helps with accountability. In particular, determine (1) point of contact–the main person to email the instructor/preceptors, (2) technical lead–the person who will oversee the code base and website, (3) style lead–the person who will make sure the code is following the COMP/STAT 112/212: Code Styling Guide
  • conflict resolution, ie, if a team member misses a deadline, how will the team handle it? How will you make decisions if the team is split 50/50 on something?
  • a rough implementation plan, ie, list of tasks with due date and roles–refer to the list of deliverables when building your list. The plan should be added as GitHub Issues within the project repository–see the callout box below for details.

Tip: Tracking Project Progress

To track your project progress, utilize the GitHub Issues feature by creating a set of milestones and tasks (issues in GitHub terminology) inside each. Below are the instructions for how to add a task into a milestone.

  • Go to your project GitHub repository on the web
  • Select Issues tab
  • Click Milestones –> click New milestone –> type a title, pick a due date, and type a description then hit Create milestone
  • Click New issue –> type a title, add a description, assign members (from the right list), and hit Create

Repeat the steps above to add new milestones and new tasks.

Milestone 2: Effective Teamwork

As individuals, not as a team, each member should read an article, listen to a podcast, or watch a video about effective teamwork then summarize it in the corresponding sheet in their progress tracker.

Milestone 3: Case Study

As a team, members should study one of the professional reports linked in the Expectations section (or something of comparable quality) then record their observations about its data story structure, ie, how the report unfolds the story. The analysis should be added to the appx/case-study.qmd file. Make sure to cite the referenced resources.

Milestone 4: EDA

As individuals, not as a team, each member should perform an exploratory data analysis (EDA) on one of the project datasets and add the results to their respective Quarto file under the eda folder of the project repository.

Tip: Packages for Quick EDA

In a blog post, Michael Clark demos eight R packages that can help explore data quickly and effectively. The demoed packages are: arsenal, DataExplorer, dataReporter (dataMaid previously), gtsummary, janitor, SmartEDA, summarytools, and visdat.

Milestone 5: Progress Presentation

As a team, members should present their preliminary findings to the class to solicit feedback.

Milestone 6: Initial Presentations

Before recording the final presentation video as described below, teams should present their (almost) complete data story to the class to solicit feedback. The teams must then incorporate that feedback into their final submission. The presentation should include a demonstration of the key findings followed by an explanation of the corresponding technical details.

Milestone 7: Deliverables + Evaluation

Code

Each team should push the code of their project to GitHub. Each repository should include a README.md file that answers the following questions:

  • What is this GitHub repository all about?
  • What software (with version numbers) needs to be installed to run the code contained in this GitHub repository, eg, R version 4.4.3+ and RStudio 2024.12.1 Build 563?
  • What steps need to be taken to run the code contained in this GitHub repository? Think about the steps you did at the beginning of the semester to prepare your machine for class.
  • What does the expected output look like? You can use one or more screenshots of the main features.

Tip: Examples of Well-Structured README Files

The README.md file is the landing page for your GitHub repository, so it must be inviting and informative. The README.md file is a dynamic document that you need to keep up-to-date. See the GitHub basic writing and formatting syntax page for tips on how to format your README file. For inspirational README.md examples and a list of useful resources, see the awesome README GitHub page. The Aimeos TYPO3 extension project and SegwayJump-Game are good examples.

Data Story Report

The data story report should be the rendered version of the landing page of the project website, ie, the file named index.qmd. When rendering the data story report, make sure to:

  • Hide the code chunks
  • Hide the warning and error messages
  • Avoid printing any unnecessary output
  • Format printed tables, if any, nicely. There are many packages such as gt and kableExtra that can help achieve this.

See the Report Structure section for details about what to include in the data story report.
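The first two items of the rendering checklist above can be set once for the whole report instead of chunk by chunk. A minimal sketch, assuming index.qmd uses Quarto's standard execution options in its YAML front matter; merge these keys with whatever your file already contains rather than replacing it:

```yaml
# index.qmd front matter (sketch; keep the keys your file already has)
execute:
  echo: false     # hide the code chunks in the rendered report
  warning: false  # hide warnings emitted while the chunks run
```

These settings apply to every chunk in the file. A single chunk can still override them with its own options, which is useful if one chunk prints output you do not want in the story.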

Presentation Video

After soliciting feedback from peers, teams are expected to adjust their data story report accordingly and then record a video presentation (5-10 minutes if possible) explaining the main findings in their data story and pointing interested viewers to the full report if more information is needed. The presentation video should be uploaded to the project GitHub repository (if possible) or a video streaming service such as YouTube or Loom then embedded in the report.

Tip: Embedding Videos

See this Quarto webpage for how to embed videos.

Presentation Slides

The presentation slides used in the presentation video (or an altered version) should be embedded in the report, preferably after uploading them to the project GitHub repository.

Tip: Embedding Presentations

Evaluation and Reflection

Each team member will be asked to evaluate and reflect on their own performance, as well as the performance of each of their teammates. Additionally, each student will be asked to evaluate the projects of other teams.

Important Notes

Teamwork

Working in a team is a chance to improve your communication skills and get to know your teammates better. However, working in a team sometimes poses challenges. To ensure a successful project outcome, below are a few expectations:

  • Active participation–be present, attend classes, do your work, keep your team informed about any unexpected events
  • Active listening–show interest in other team members’ ideas
  • Inclusive environment–invite teammates to participate
  • Each member should be involved in all main aspects of the project, including coding, reporting, and presentation. It is not acceptable for a single member to solely handle one aspect, such as coding, while another focuses solely on the report, and another solely on the presentation.

Code Backup

When working on your data analysis, ensure that you commit your code frequently to GitHub, accompanied by meaningful commit messages, and push your changes regularly. This practice helps prevent any unforeseen issues or loss of progress.

Inspirations

For inspirational data science projects, see the Data Engineering Zoomcamp lists of projects for 2022 (cohort #1 and cohort #2) and 2023 (cohort #1 and cohort #2).