ABSTRACT Monte Carlo simulations (MCSs) provide important information about statistical phenomena that would be impossible to assess otherwise. This article introduces MCS methods and their applications to research and statistical pedagogy using a novel software package for the R Project for Statistical Computing constructed to lessen the often steep learning curve when organizing simulation code. A primary goal of this article is to demonstrate how well-suited MCS designs are to classroom demonstrations, and how they provide a hands-on method…

Abstract This article describes a collaborative project across three institutions to develop, implement, and evaluate a series of tutorials and case studies that highlight fundamental tools of data science (such as visualization, data manipulation, and database usage) that instructors at a wide range of institutions can incorporate into existing statistics courses. The resulting materials are flexible enough to serve both introductory and advanced students, and aim to provide students with the skills to experiment with data, find their own patterns, and ask their own questions. In this article, we discuss a tutorial on data visualization and a case study synthesizing data wrangling and visualization skills in detail, and provide references to additional class-tested materials. R and R Markdown are used for all…

ABSTRACT This article describes a dataset on pop songs that charted on the Billboard Top 40 and/or at one or more of five radio stations, three in Chicago, Illinois, and two in Grand Rapids, Michigan, from the early 1960s through 1970. The dataset includes 5746 observations and 26 variables. In the body of the article, we describe how the cleaned version of the dataset can be used in an introductory or second-level statistics course to investigate questions of race and gender bias and the role of radio consultants in Top 40 radio airplay in the 1960s. The richness of the dataset requires students to think about relationships among multiple variables. In an appendix, we briefly describe how a…

Abstract We performed an empirical study of the perceived quality of scientific graphics produced by beginning R users in two plotting systems: the base graphics package ("base R") and the ggplot2 add-on package. In our experiment, students taking a data science course on the Coursera platform were randomized to complete identical plotting exercises using either base R or ggplot2. This exercise involved creating two plots: one bivariate scatterplot and one plot of a multivariate relationship that necessitated using color…

Abstract We designed a sequence of courses for the DataCamp online learning platform that approximates the content of a typical introductory statistics course. We discuss the design and implementation of these courses and illustrate how they can be successfully integrated into a brick-and-mortar class. We reflect on the process of creating content for online consumers, ruminate on the pedagogical considerations we faced, and describe an R package for statistical inference that became a by-product of this development process. We discuss the pros and cons of creating the course sequence and express our view that some aspects were particularly problematic. The issues raised should be relevant to nearly all statistics instructors. Supplementary materials for this article are available online.

ABSTRACT The proliferation of large and complex datasets has challenged universities to keep up with the demand for graduates trained in both the statistical and the computational skills required to effectively plan, acquire, manage, analyze, and communicate the findings of such data. To keep up with this demand, attracting students early on to data science as well as providing them a solid foray into the field becomes increasingly important. We present a case study of an introductory undergraduate course in data science that is designed to address these needs. Offered at Duke University, this course has no prerequisites and serves a wide audience of aspiring statistics and data science…

Abstract As the demand for skilled data scientists has grown, university-level statistics and data science courses have become more rigorous in training students to understand and utilize the tools that their future careers will likely require. However, the mechanisms to assess students' use of these tools while they are learning to use them are not well defined. As such, a framework to assess statistical computing actions was created. Using task-based interviews of students who completed a second course in statistics, the framework was used to determine the ways in which students utilize statistical computing tools, specifically R, while going through problem-solving phases. Patterns that emerged are discussed.

Abstract Over the last 20 years, statistics preparation has become vital for a broad range of scientific fields, and statistics coursework has been readily incorporated into undergraduate and graduate programs. However, a gap remains between the computational skills taught in statistics service courses and those required for the use of statistics in scientific research. Ten years after the publication of "Computing in the Statistics Curriculum," the nature of statistics continues to change, and computing skills are more necessary than ever for modern scientific researchers. In this paper, we describe research on the design and implementation of a suite of data science workshops for environmental science graduate students, providing students with the skills necessary to retrieve, view, wrangle, visualize, and…