Datasets
This appendix provides guidance on how to select a dataset for the course project.
Collect Data
Unfortunately, collecting your own data in this course (for example, via surveys) is not feasible: it requires a lot of paperwork, time, and effort.
Have Dataset
If you learned about or collected a dataset in another course, internship, etc., you can use it as long as you have not used it in any other courses.
No Project Idea
If you are looking for a project idea, explore the dataset community/aggregation repositories below for inspiration:
- Tidy Tuesday
- Awesome Public Datasets (includes links to other data repositories at the end)
- Kaggle
- Data World (sign-in required)
- Data is Plural
- Our World in Data
- Pro Publica
- Sports Data Sets by Ohio State U
- Trello board with large collection of repositories and datasets
- Data Engineering Zoomcamp collection of repositories and datasets
- Preserving Public Access to Government Datasets blog
- Police Data
- DataHub Dataset Collection
- Project Gutenberg
- Common Crawl
- Stanford Large Network Dataset Collection
- arXiv Dataset Collections
Have Project Idea
If you already have a project idea but would like to find a suitable dataset, check the search engines below:
- Google Dataset Search
- U.S. Government’s Open Data
- U.S. Census Bureau
- U.S. National Institutes of Health (NIH)
- U.S. Centers for Disease Control and Prevention (CDC)
- U.S. Department of Education
- U.S. Environmental Protection Agency (EPA)
- U.S. Institute of Education Sciences (IES)
- Kaggle
- Data World (sign-in required)
- Our World in Data
- Gapminder
- OpenML
- Five Thirty Eight Dataset Repository: GitHub, Website
- BuzzFeedNews Dataset Repository
- Open Corporates
- Typical search engines with search operators such as
  - `filetype:csv`: return only `csv` files that match the keywords searched; common file types that can be easily processed in R are `.csv`, `.tsv`, `.xls`, `.xlsx`, and `.rds`.
  - `site:data.gov`: limit results to those from `data.gov`
- European Data Search
- Nature Research Data
- Harvard Dataverse
- Open Science Framework (OSF)
- FigShare
- CERN Open Data Portal
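As a rough illustration of the file types listed under the `filetype:` search operator above, the sketch below reads `.csv`, `.tsv`, and `.rds` files with base R functions. The `scores` data frame and the temporary file paths are made up for the example so that it runs on its own. In practice you would point the read functions at the file you downloaded. `.xls` and `.xlsx` files are not handled by base R; one common option is the readxl package, assumed here to be installed separately, so that line is left as a comment.

```r
# Small made-up data frame standing in for a downloaded dataset
scores <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

# Write the same data in three formats to temporary files
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")
rds_path <- tempfile(fileext = ".rds")
write.csv(scores, csv_path, row.names = FALSE)
write.table(scores, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)
saveRDS(scores, rds_path)

# Read each file back with the matching base R function
from_csv <- read.csv(csv_path)    # comma-separated values
from_tsv <- read.delim(tsv_path)  # tab-separated values
from_rds <- readRDS(rds_path)     # R's native single-object format

# Excel files need an add-on package, for example (hypothetical path):
# from_xlsx <- readxl::read_excel("path/to/file.xlsx")

# Inspect the structure of one of the imported data frames
str(from_csv)
```

Whichever format you find, checking the result with `str()` right after import helps catch columns that were read with an unexpected type.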