R Programming Basics for Data Science

R is one of the most widely used programming languages for statistical computing and data analysis. If you're studying statistics, learning R lets you clean data, run analyses, and draw plots from a single script that you can rerun at any time.

1. Getting Started: Vectors and Data Frames

In R, everything revolves around objects. The most basic object is a vector: an ordered set of values that all share one type. A data frame is a collection of equal-length vectors, one per column, which makes it similar to a table in an Excel spreadsheet.

# Creating a vector of ages
ages <- c(22, 25, 21, 23)

# Creating a basic data frame (each argument becomes a column)
my_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = ages,
  Passed = c(TRUE, FALSE, TRUE, TRUE)
)
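
As a quick illustration of how these two objects behave, the sketch below recreates the objects from above and inspects them with base R only. The names mean_age, ages_next_year, and passed_names are made up for this example.

```r
# Recreate the objects from the example above
ages <- c(22, 25, 21, 23)
my_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = ages,
  Passed = c(TRUE, FALSE, TRUE, TRUE)
)

# Most operations on vectors are applied element by element
mean_age <- mean(ages)
ages_next_year <- ages + 1

# Show the type of each column and the first few values
str(my_data)

# Count rows and columns
nrow(my_data)
ncol(my_data)

# Pull out one column as a vector with $
my_data$Age

# Use a logical vector to keep only the matching elements
passed_names <- my_data$Name[my_data$Passed]
```

Because the Passed column is already a logical vector, it can be used directly as an index without comparing it to TRUE.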

2. Data Wrangling with dplyr

The dplyr package is one of the most widely used tools for manipulating data in R. Once you install it with install.packages("dplyr"), you can filter rows with filter(), select columns with select(), and create new variables with mutate().

library(dplyr)

# Filter for students older than 22 who passed
# (conditions separated by commas are combined with AND)
filtered_data <- my_data %>%
  filter(Age > 22, Passed)
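
Selecting columns and creating new variables work the same way as filtering. The following is a minimal sketch, assuming dplyr is installed. The column name AgeNextYear and the object summary_data are invented for this example.

```r
library(dplyr)

# Same data frame as in section 1
my_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(22, 25, 21, 23),
  Passed = c(TRUE, FALSE, TRUE, TRUE)
)

summary_data <- my_data %>%
  mutate(AgeNextYear = Age + 1) %>%        # create a new variable
  select(Name, AgeNextYear, Passed) %>%    # keep only these columns
  arrange(desc(AgeNextYear))               # sort from oldest to youngest

summary_data
```

Each step takes a data frame and returns a new one, so the original my_data is left unchanged.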

3. Plotting with ggplot2

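Visualizing data is crucial. The ggplot2 package builds a graph out of "layers": you start with the data and the aesthetic mappings (which column goes on which axis), then add geoms, themes, and labels with the + operator.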

library(ggplot2)

# Create a simple bar plot
# geom_col() draws bars whose heights are the y values (same as geom_bar(stat = "identity"))
ggplot(my_data, aes(x = Name, y = Age, fill = Passed)) +
  geom_col() +
  theme_minimal() +
  labs(title = "Student Ages", y = "Age in Years")
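
To make the layering idea more concrete, the sketch below builds a similar plot one step at a time and stores it in an object. It assumes ggplot2 is installed. The object name p and the file name in the commented-out ggsave() line are placeholders for this example.

```r
library(ggplot2)

# Same data frame as in section 1
my_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(22, 25, 21, 23),
  Passed = c(TRUE, FALSE, TRUE, TRUE)
)

# Step 1: data and axis mappings only (this alone draws an empty panel)
p <- ggplot(my_data, aes(x = Name, y = Age))

# Step 2: add a layer of bars, coloured by whether the student passed
p <- p + geom_col(aes(fill = Passed))

# Step 3: add a theme and labels
p <- p + theme_minimal() +
  labs(title = "Student Ages", y = "Age in Years")

# Draw the plot
print(p)

# Optionally write it to a file (uncomment to use)
# ggsave("student_ages.png", plot = p, width = 6, height = 4)
```

Storing the plot in an object is useful when you want to reuse the same base plot with different layers or save it to disk.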

Next Steps

Once you are comfortable with these basics, you can start running statistical tests such as t-tests and ANOVAs directly in R with the t.test() and aov() functions.
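
As a rough sketch of what those two functions look like in use, the example below runs them on mtcars and PlantGrowth, two datasets that ship with R. The choice of datasets and the object names tt and fit are only for illustration.

```r
# Two-sample t-test: does fuel economy (mpg) differ between
# automatic (am = 0) and manual (am = 1) cars?
# By default t.test() does not assume equal variances (Welch test).
tt <- t.test(mpg ~ am, data = mtcars)
tt

# The p-value is stored in the result object
tt$p.value

# One-way ANOVA: does plant weight differ across the three groups?
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)
```

Both functions use R's formula syntax, outcome ~ group, which reads as "outcome explained by group".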