R Language Data Frames And Tibbles Complete Guide
Understanding the Core Concepts of R Language Data Frames and Tibbles
R Language Data Frames and Tibbles: Explained in Detail with Important Info
Overview
Data Frames: The Traditional Tabular Structure
Construction
- Function: data.frame()
- Columns: Can have different data types (numeric, character, factor, etc.)
- Rows: Each row is a distinct record or observation
- Example:
df <- data.frame( Name = c("Alice", "Bob"), Age = c(25, 30), stringsAsFactors = FALSE )
Key Features
- Subsetting: You can access subsets of data frames using indexing or logical conditions.
- Combining: Use rbind() to add rows and cbind() to add columns.
- Factors: In R versions before 4.0.0, data.frame() converts string columns to factors unless stringsAsFactors = FALSE is set. From R 4.0.0 onward, FALSE is the default.
- Printing: Displays in a default tabular format.
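The subsetting and combining features can be sketched with the small df from the Construction example. This is a minimal illustration only: the extra "Carol" row and the City column are made-up values, not part of the original data.

```r
# Rebuild the small data frame from the Construction example
df <- data.frame(
  Name = c("Alice", "Bob"),
  Age = c(25, 30),
  stringsAsFactors = FALSE
)

# Subsetting with a logical condition: keep rows where Age is above 25
older <- df[df$Age > 25, ]

# rbind() appends rows; the new row must use the same column names
# ("Carol" and 35 are illustrative values)
df_rows <- rbind(
  df,
  data.frame(Name = "Carol", Age = 35, stringsAsFactors = FALSE)
)

# cbind() appends columns; the new column needs one value per row
# (the City column is hypothetical)
df_cols <- cbind(df, City = c("Paris", "Berlin"))

print(older)
print(df_rows)
print(df_cols)
```

Both functions return a new data frame rather than modifying df in place, so the result has to be assigned if you want to keep it.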
Common Manipulations
- Viewing Structure: str(df)
- Summarizing Data: summary(df)
- Filtering Rows: Use a logical condition inside square brackets [ ]:
df[df$Age > 25, ]
- Selecting Columns: Use $ or [ ]:
df$Name
df[, "Name"]
Tibbles: A Modern Enhancement
Introduction
Tibbles are a modern version of data frames introduced by the tibble package in R. They offer an enhanced user experience by providing more intuitive behavior and better defaults.
Construction
- Function: tibble() from the tibble package
- Columns: Like data frames, tibbles support columns of different data types
- Rows: Represent observations. Indexing and subsetting work much as they do for data frames, except that [ ] always returns a tibble and $ does not partially match column names
- Example:
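A minimal sketch of tibble construction, mirroring the data.frame() example from the Data Frames section. It assumes the tibble package is already installed; the variable name tbl is chosen here for illustration.

```r
library(tibble)

# Same data as the data.frame() example, built with tibble()
tbl <- tibble(
  Name = c("Alice", "Bob"),
  Age = c(25, 30)
)

# Strings stay character vectors; tibble() has no stringsAsFactors argument
print(tbl)
print(class(tbl$Name))
```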
Step-by-Step Guide: How to Implement R Language Data Frames and Tibbles
Introduction
R is a powerful language for statistical computing and data analysis. The two most common data structures for storing datasets in R are Data Frames and Tibbles. While they have similarities, tibbles (from the tibble package, part of the tidyverse) offer some enhancements that make them more user-friendly for data manipulation.
Example 1: Creating Data Frames
Step 1: Create Vectors
First, we'll create some vectors which will be used as columns in our Data Frame.
# Create vectors
name <- c("Alice", "Bob", "Charlie")
age <- c(25, 30, 45)
weight <- c(55.75, 60.50, 85.75)
Step 2: Combine Vectors into a Data Frame
Now we can combine these vectors using the data.frame() function.
# Combine vectors into a dataframe
people_df <- data.frame(name = name, age = age, weight = weight)
# Print the dataframe
print(people_df)
Output:
name age weight
1 Alice 25 55.75
2 Bob 30 60.50
3 Charlie 45 85.75
Step 3: Add Row Names to the Data Frame (Optional)
You can also set row names if needed:
# Set row names
row.names(people_df) <- c("Person_1", "Person_2", "Person_3")
# Print dataframe with row names
print(people_df)
Output:
name age weight
Person_1 Alice 25 55.75
Person_2 Bob 30 60.50
Person_3 Charlie 45 85.75
Example 2: Creating Tibbles
Tibbles are an enhanced version of Data Frames provided by the tibble package, which is part of the tidyverse. They can be created using the tibble() function.
Step 1: Install and Load Tidyverse Package
We first need to install and load the tidyverse package, which includes the tibble package and other useful packages.
# Install the tidyverse package
install.packages("tidyverse")
# Load the tidyverse package
library(tidyverse)
Step 2: Create Vectors
Similar to Example 1, we'll create some vectors.
# Create vectors
name <- c("Alice", "Bob", "Charlie")
age <- c(25, 30, 45)
weight <- c(55.75, 60.50, 85.75)
Step 3: Combine Vectors into a Tibble
We can now use the tibble() function to create a Tibble from our vectors.
# Combine vectors into a tibble
people_tbl <- tibble(name = name, age = age, weight = weight)
# Print the tibble
print(people_tbl)
Output:
# A tibble: 3 × 3
name age weight
<chr> <dbl> <dbl>
1 Alice 25 55.8
2 Bob 30 60.5
3 Charlie 45 85.8
Note how Tibbles display additional helpful information by default, such as the dimensions and the type of each variable. For larger data, a tibble also prints only the first 10 rows and the columns that fit on screen.
Example 3: Manipulating Data Frames
Step 1: Adding a New Column to a Data Frame
Suppose you want to add a new column indicating whether each person's weight is below the average weight of all people in the dataset.
# Calculate the average weight
average_weight <- mean(weight)
# Add a new column - is weight below average?
people_df$bellow_avg_wt <- ifelse(weight < average_weight, "Yes", "No")
# Print the updated dataframe
print(people_df)
Output:
name age weight bellow_avg_wt
Person_1 Alice 25 55.75 Yes
Person_2 Bob 30 60.50 Yes
Person_3 Charlie 45 85.75 No
Step 2: Filtering Rows in a Data Frame
We can filter rows based on certain conditions using the subset() function for Data Frames.
# Filter rows where age is greater than 30
older_than_30 <- subset(people_df, age > 30)
# Print the filtered dataframe
print(older_than_30)
Output:
name age weight bellow_avg_wt
Person_3 Charlie 45 85.75 No
Alternatively, the dplyr package from the tidyverse provides the filter() function, which reads well in a pipeline.
# using dplyr to filter
older_than_30_dplyr <- people_df %>% filter(age > 30)
# Print the filtered dataframe
print(older_than_30_dplyr)
Output:
name age weight bellow_avg_wt
1 Charlie 45 85.75 No
Example 4: Manipulating Tibbles
Step 1: Adding a New Column to a Tibble
Adding columns to Tibbles works similarly to Data Frames but is often easier to read, especially when chaining operations. Here's how you would do it using dplyr's mutate() function.
# Load dplyr library
library(dplyr)
# Calculate the average weight
average_weight <- mean(weight)
# Add a new column
people_tbl <- people_tbl %>% mutate(bellow_avg_wt = if_else(weight < average_weight, "Yes", "No"))
# Print the updated tibble
print(people_tbl)
Output:
# A tibble: 3 × 4
name age weight bellow_avg_wt
<chr> <dbl> <dbl> <chr>
1 Alice 25 55.8 Yes
2 Bob 30 60.5 Yes
3 Charlie 45 85.8 No
Step 2: Filtering Rows in a Tibble
Filtering rows in a Tibble using dplyr is straightforward and easy to read.
# Filter rows where age is greater than 30
older_than_30_tbl <- people_tbl %>% filter(age > 30)
# Print the filtered tibble
print(older_than_30_tbl)
Output:
# A tibble: 1 × 4
name age weight bellow_avg_wt
<chr> <dbl> <dbl> <chr>
1 Charlie 45 85.8 No
Example 5: Inspecting Data Frames and Tibbles
You can inspect your data using functions like str(), summary(), and head().
Step 1: Inspect the Structure of a Data Frame
The str() function helps you understand the structure of the data.
# Inspect structure of data frame
str(people_df)
Output:
'data.frame': 3 obs. of 4 variables:
$ name : chr "Alice" "Bob" "Charlie"
$ age : num 25 30 45
$ weight : num 55.8 60.5 85.8
$ bellow_avg_wt: chr "Yes" "Yes" "No"
Step 2: Get a Summary of a Data Frame
The summary() function provides basic statistics about each variable in your data frame.
# Get summary statistics about data frame
summary(people_df)
Output:
name age weight bellow_avg_wt
Length:3 Min. :25.00 Min. :55.75 Length:3
Class :character 1st Qu.:27.50 1st Qu.:58.12 Class :character
Mode :character Median :30.00 Median :60.50 Mode :character
Mean :33.33 Mean :67.33
3rd Qu.:37.50 3rd Qu.:73.12
Max. :45.00 Max. :85.75
Step 3: Display the First Few Rows of a Data Frame
The head() function displays the first few rows (six by default) and is useful during exploratory analysis.
# Display first few rows of data frame
head(people_df)
Output:
name age weight bellow_avg_wt
Person_1 Alice 25 55.75 Yes
Person_2 Bob 30 60.50 Yes
Person_3 Charlie 45 85.75 No
Step 1: Inspect the Structure of a Tibble
Again, str() is handy for understanding the structure.
# Inspect structure of tibble
str(people_tbl)
Output:
Classes 'tbl_df', 'tbl' and 'data.frame': 3 obs. of 4 variables:
$ name : chr "Alice" "Bob" "Charlie"
$ age : num 25 30 45
$ weight : num 55.8 60.5 85.8
$ bellow_avg_wt: chr "Yes" "Yes" "No"
Step 2: Get a Summary of a Tibble
Just like Data Frames, you can get a summary of a Tibble.
# Get summary statistics about tibble
summary(people_tbl)
Output:
name age weight bellow_avg_wt
Length:3 Min. :25.00 Min. :55.75 Length:3
Class :character 1st Qu.:27.50 1st Qu.:58.12 Class :character
Mode :character Median :30.00 Median :60.50 Mode :character
Mean :33.33 Mean :67.33
3rd Qu.:37.50 3rd Qu.:73.12
Max. :45.00 Max. :85.75
Step 3: Display the First Few Rows of a Tibble
Again, head() displays the first few rows.
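A short sketch of head() on a tibble. It rebuilds the tibble from Example 2 so that it runs on its own, and it assumes the tibble package is installed; the name first_two is used here only for illustration.

```r
library(tibble)

# Rebuild the tibble from Example 2 so this snippet is self-contained
people_tbl <- tibble(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 45),
  weight = c(55.75, 60.50, 85.75)
)

# head() returns the first n rows (6 by default); the result is still a tibble
first_two <- head(people_tbl, n = 2)
print(first_two)
```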
Top 10 Interview Questions & Answers on R Language Data Frames and Tibbles
1. What are Data Frames in R?
Answer: Data frames in R are used to store data tables. They are essentially lists of vectors of equal length. Each vector represents a column which may be of a different mode (numeric, character, etc.), and each row represents an observation or record. Data frames are particularly useful for data analysis and statistical modeling.
Example:
df <- data.frame(name = c("Alice", "Bob", "Charlie"),
age = c(25, 30, 35),
salary = c(50000, 60000, 70000))
2. What are Tibbles in R?
Answer: Tibbles are a modern take on data frames, provided by the tibble package within the tidyverse. They print in a more user-friendly format, never change variable names, and never convert column types (for example, strings are not turned into factors). Tibbles also behave more predictably during data manipulation.
Example:
library(tibble)
tibble_df <- tibble(name = c("Alice", "Bob", "Charlie"),
age = c(25, 30, 35),
salary = c(50000, 60000, 70000))
3. What is the difference between a data frame and a tibble?
Answer: While both store tabular data, the primary differences are:
- Printing: Tibbles print a limited number of rows and columns and show the data type of each column.
- Column Names: Tibbles do not adjust column names that contain special characters or spaces.
- Recycling: Tibbles recycle only length-one vectors when columns are created, which prevents silent recycling errors.
- Subsetting: Subsetting a tibble with [ ] always returns a tibble, even for a single column, whereas a data frame may return a vector.
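The subsetting, naming, and recycling differences listed above can be checked with a small sketch. It assumes the tibble package is installed; the object names and the "first name" column are hypothetical and chosen only for this illustration.

```r
library(tibble)

df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))
tbl <- tibble(name = c("Alice", "Bob"), age = c(25, 30))

# Subsetting: a single column selected with [ ] drops to a vector for a
# data frame, but stays a tibble
df_col <- df[, "age"]
tbl_col <- tbl[, "age"]
print(class(df_col))
print(class(tbl_col))

# Column names: data.frame() makes names syntactic by default,
# while tibble() keeps them exactly as given
df_names <- names(data.frame(`first name` = "Alice"))
tbl_names <- names(tibble(`first name` = "Alice"))
print(df_names)
print(tbl_names)

# Recycling: data.frame() recycles a length-2 vector across 4 rows;
# tibble() only recycles length-1 vectors and signals an error otherwise
recycled_df <- data.frame(x = 1:4, y = 1:2)
tbl_result <- tryCatch(tibble(x = 1:4, y = 1:2), error = function(e) e)
print(recycled_df)
print(inherits(tbl_result, "error"))
```

Wrapping the tibble() call in tryCatch() keeps the script running so that the error can be inspected rather than stopping execution.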
4. How do you create a data frame in R?
Answer: You can create a data frame using the data.frame() function by passing vectors of equal length.
Example:
df <- data.frame(name = c("Alice", "Bob", "Charlie"),
age = c(25, 30, 35),
salary = c(50000, 60000, 70000))
5. How do you create a tibble in R?
Answer: Create a tibble using the tibble() function from the tibble package.
Example:
library(tibble)
tibble_df <- tibble(name = c("Alice", "Bob", "Charlie"),
age = c(25, 30, 35),
salary = c(50000, 60000, 70000))
6. How do you select specific columns from a data frame or tibble?
Answer: Use the dollar sign ($) or double square brackets [[ ]] to extract a single column, and single square brackets [ ] for more complex selections.
Example for Data Frame:
# Select just the age column from a data frame
age_col <- df$age
Example for Tibble:
# Select just the age column from a tibble
age_col <- tibble_df[["age"]]
7. How do you add a new column to a data frame or a tibble?
Answer: You can add a new column by assigning values to a new column name with $, or by using mutate() from dplyr.
Example:
# Adding a new column to a data frame
df$bonus <- df$salary * 0.1
# Adding a new column to a tibble (mutate() and %>% come from dplyr)
library(dplyr)
tibble_df <- tibble_df %>%
mutate(bonus = salary * 0.1)
8. How can you filter rows in a data frame or tibble?
Answer: Use the subset() function for data frames or the filter() function from dplyr for tibbles.
Example for Data Frame:
# Filter rows where age is greater than 28
subset_df <- subset(df, age > 28)
Example for Tibble:
library(dplyr)
# Filter rows where age is greater than 28
filtered_tibble <- tibble_df %>%
filter(age > 28)
9. How do you combine two data frames or tibbles by rows?
Answer: Use rbind() to combine data frames by rows, or bind_rows() from dplyr for tibbles.
Example:
# Combine two data frames by rows
new_df <- rbind(df1, df2)
# Combine two tibbles by rows
library(dplyr)
new_tibble <- bind_rows(tibble_df1, tibble_df2)
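The snippet above leaves df1, df2, tibble_df1, and tibble_df2 undefined. A self-contained sketch, with illustrative values that are assumptions rather than data from the original, might look like this (it assumes dplyr and tibble are installed):

```r
library(dplyr)
library(tibble)

# Two small data frames with identical columns (values are illustrative)
df1 <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))
df2 <- data.frame(name = "Charlie", age = 35)

# rbind() requires both inputs to have the same column names
new_df <- rbind(df1, df2)

# bind_rows() matches columns by name and fills missing ones with NA
tibble_df1 <- tibble(name = c("Alice", "Bob"), age = c(25, 30))
tibble_df2 <- tibble(name = "Charlie", salary = 70000)
new_tibble <- bind_rows(tibble_df1, tibble_df2)

print(new_df)
print(new_tibble)
```

The second pair deliberately has mismatched columns to show where bind_rows() is more forgiving than rbind(), which would stop with an error in that case.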
10. How do you combine two data frames or tibbles by columns?
Answer: Use cbind() to combine by columns.
Example:
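A minimal sketch of column-wise combination, assuming dplyr and tibble are installed. The data values are illustrative, and bind_cols() is shown as the dplyr counterpart even though the answer above names only cbind().

```r
library(dplyr)
library(tibble)

# Two data frames with the same number of rows (values are illustrative)
df1 <- data.frame(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 35))
df2 <- data.frame(salary = c(50000, 60000, 70000))

# cbind() pairs rows by position, so both inputs need the same row count
new_df <- cbind(df1, df2)

# bind_cols() from dplyr does the same for tibbles and returns a tibble
new_tibble <- bind_cols(as_tibble(df1), as_tibble(df2))

print(new_df)
print(new_tibble)
```

Because rows are matched purely by position, both inputs should already be in the same order; when they share a key column, a join is usually the safer choice.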