R Language Data Frames And Tibbles Complete Guide

 Last Update: 2025-06-22     Difficulty-Level: beginner

Understanding the Core Concepts of R Language Data Frames and Tibbles

R Language Data Frames and Tibbles: Explained in Detail with Important Info

Overview

Data Frames: The Traditional Tabular Structure

Construction
  • Function: data.frame()
  • Columns: Can have different data types (numeric, character, factor, etc.)
  • Rows: Each row is a distinct record or observation
  • Example:
    df <- data.frame(
      Name = c("Alice", "Bob"),
      Age = c(25, 30),
      stringsAsFactors = FALSE
    )
    
Key Features
  • Subsetting: You can access subsets of data frames using indexing or logical conditions.
  • Combining: Use rbind() to add rows and cbind() to add columns.
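  • Factors: Before R 4.0.0, data.frame() converted string columns to factors unless stringsAsFactors = FALSE was set. Since R 4.0.0 the default is FALSE, so strings stay character vectors.
  • Printing: print() shows the rows and columns in a plain tabular format, without column type information.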
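
The features above can be sketched in base R. The snippet sets stringsAsFactors explicitly in both directions so the result does not depend on the installed R version. The object names (df_chr, df_fct, new_row and so on) and the City values are illustrative assumptions.

```r
# Character columns stay character when stringsAsFactors = FALSE
df_chr <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30),
                     stringsAsFactors = FALSE)
# ...and become factors when stringsAsFactors = TRUE
df_fct <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30),
                     stringsAsFactors = TRUE)

class(df_chr$Name)  # "character"
class(df_fct$Name)  # "factor"

# rbind() appends rows; the new row must have matching column names
new_row <- data.frame(Name = "Carol", Age = 41, stringsAsFactors = FALSE)
df_rows <- rbind(df_chr, new_row)

# cbind() appends columns; the new column needs one value per row
df_cols <- cbind(df_rows, City = c("Oslo", "Lima", "Pune"))

dim(df_cols)  # 3 rows, 3 columns
```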
Common Manipulations
  • Viewing Structure: str(df)
  • Summarizing Data: summary(df)
  • Filtering Rows: Logical statements within square brackets [ ]
    df[df$Age > 25, ]
    
  • Selecting Columns: Use $ or []
    df$Name
    df[, "Name"]
    

Tibbles: A Modern Enhancement

Introduction

Tibbles are a modern version of data frames introduced by the tibble package in R. They offer an enhanced user experience by providing more intuitive behavior and better defaults.

Construction
  • Function: tibble() from the tibble package
  • Columns: Similar to data frames, supports different data types
  • Rows: Represents observations and works identically in terms of indexing and subsetting
  • Example:
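
A minimal sketch mirroring the data.frame() example above, assuming the tibble package is installed. The name tbl is illustrative.

```r
library(tibble)  # assumes the tibble package is installed

tbl <- tibble(
  Name = c("Alice", "Bob"),
  Age = c(25, 30)
)

class(tbl)      # "tbl_df" "tbl" "data.frame"
is_tibble(tbl)  # TRUE
```

Because "data.frame" is among its classes, a tibble can be passed to most functions that expect a data frame.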

Step-by-Step Guide: How to Implement R Language Data Frames and Tibbles

Introduction

R is a powerful language for statistical computing and data analysis. The two most common data structures for storing datasets in R are Data Frames and Tibbles. While they have similarities, tibbles (from the tibble package, which is part of the tidyverse) offer some enhancements that make them more user-friendly for data manipulation.

Example 1: Creating Data Frames

Step 1: Create Vectors

First, we'll create some vectors which will be used as columns in our Data Frame.

# Create vectors
name <- c("Alice", "Bob", "Charlie")
age <- c(25, 30, 45)
weight <- c(55.75, 60.50, 85.75)

Step 2: Combine Vectors into a Data Frame

Now we can combine these vectors using the data.frame() function.

# Combine vectors into a dataframe
people_df <- data.frame(name = name, age = age, weight = weight)

# Print the dataframe
print(people_df)

Output:

     name age weight
1   Alice  25  55.75
2     Bob  30  60.50
3 Charlie  45  85.75

Step 3: Add Row Names to the Data Frame (Optional)

You can also set row names if needed:

# Set row names
row.names(people_df) <- c("Person_1", "Person_2", "Person_3")

# Print dataframe with row names
print(people_df)

Output:

           name age weight
Person_1   Alice  25  55.75
Person_2     Bob  30  60.50
Person_3 Charlie  45  85.75
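
Row names can also be used to look up records. The following is a small sketch under the same data as above; the variable names bob and charlie_age are illustrative.

```r
people_df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 45),
  weight = c(55.75, 60.50, 85.75)
)
row.names(people_df) <- c("Person_1", "Person_2", "Person_3")

# Look up one record by its row name
bob <- people_df["Person_2", ]

# Look up a single cell by row name and column name
charlie_age <- people_df["Person_3", "age"]  # 45

# Reset to the default integer row names
row.names(people_df) <- NULL
row.names(people_df)  # "1" "2" "3"
```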

Example 2: Creating Tibbles

Tibbles are an enhanced version of Data Frames provided by the tibble package, which is part of the tidyverse. They can be created using the tibble() function.

Step 1: Install and Load Tidyverse Package

We first need to install and load the tidyverse, a collection of packages that includes tibble, dplyr, and other useful packages.

# Install the tidyverse package
install.packages("tidyverse")

# Load the tidyverse package
library(tidyverse)

Step 2: Create Vectors

Similar to Example 1, we'll create some vectors.

# Create vectors
name <- c("Alice", "Bob", "Charlie")
age <- c(25, 30, 45)
weight <- c(55.75, 60.50, 85.75)

Step 3: Combine Vectors into a Tibble

We can use the tibble() function now to create a Tibble from our vectors.

# Combine vectors into a tibble
people_tbl <- tibble(name = name, age = age, weight = weight)

# Print the tibble
print(people_tbl)

Output:

# A tibble: 3 × 3
  name      age weight
  <chr>   <dbl>  <dbl>
1 Alice      25   55.8
2 Bob        30   60.5
3 Charlie    45   85.8

Note how Tibbles display additional helpful information by default, such as the dimensions and the type of each variable. A tibble also prints only the first 10 rows and as many columns as fit on screen.
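
Since a tibble is a data frame with extra classes, the two convert freely. The sketch below assumes the tibble package is installed and uses as_tibble() and as.data.frame(); the column name "id" chosen for the former row names is an illustrative assumption.

```r
library(tibble)  # assumes the tibble package is installed

people_df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 45),
  row.names = c("Person_1", "Person_2", "Person_3")
)

# Tibbles do not keep row names; rownames = "id" stores them in a column
people_tbl <- as_tibble(people_df, rownames = "id")
names(people_tbl)  # "id" "name" "age"

# Convert back to a plain data frame
back_df <- as.data.frame(people_tbl)
class(back_df)  # "data.frame"

# Control how many rows a tibble prints
print(people_tbl, n = 2)
```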

Example 3: Manipulating Data Frames

Step 1: Adding a New Column to a Data Frame

Suppose you want to add a new column indicating whether each person's weight is below the average weight of all people in the dataset.

# Calculate the average weight
average_weight <- mean(people_df$weight)

# Add a new column - is weight below average?
people_df$bellow_avg_wt <- ifelse(people_df$weight < average_weight, "Yes", "No")

# Print the updated dataframe
print(people_df)

Output:

            name age weight bellow_avg_wt
Person_1   Alice  25  55.75           Yes
Person_2     Bob  30  60.50           Yes
Person_3 Charlie  45  85.75            No

Step 2: Filtering Rows in a Data Frame

We can filter the rows based on certain conditions using the subset() function for Data Frames.

# Filter rows where age is greater than 30
older_than_30 <- subset(people_df, age > 30)

# Print the filtered dataframe
print(older_than_30)

Output:

            name age weight bellow_avg_wt
Person_3 Charlie  45  85.75            No

Alternatively, the dplyr package from the tidyverse offers the filter() function, which reads well when chained with the pipe operator (%>%).

# using dplyr to filter
older_than_30_dplyr <- people_df %>% filter(age > 30)

# Print the filtered dataframe
print(older_than_30_dplyr)

Output:

     name age weight bellow_avg_wt
1 Charlie  45  85.75            No
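
The same filter can also be written with base bracket indexing, which is what the Key Features section calls subsetting by logical condition. A minimal sketch; the names keep, older_bracket and light_or_old are illustrative.

```r
people_df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 45),
  weight = c(55.75, 60.50, 85.75)
)

# Logical vector: TRUE for rows to keep
keep <- people_df$age > 30  # FALSE FALSE TRUE

# Row filter with brackets; the empty column slot keeps all columns
older_bracket <- people_df[keep, ]

# subset() selects the same row
older_subset <- subset(people_df, age > 30)
older_subset$name  # "Charlie"

# Combine conditions with & (and) or | (or)
light_or_old <- people_df[people_df$weight < 60 | people_df$age > 40, ]
light_or_old$name  # "Alice" "Charlie"
```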

Example 4: Manipulating Tibbles

Step 1: Adding a New Column to a Tibble

Adding columns to Tibbles works similarly to Data Frames but is often easier to read, especially when chaining operations. Here’s how you would do it using dplyr's mutate() function.

# Load dplyr library
library(dplyr)

# Calculate the average weight
average_weight <- mean(weight)

# Add a new column 
people_tbl <- people_tbl %>% mutate(bellow_avg_wt = if_else(weight < average_weight, "Yes", "No"))

# Print the updated tibble
print(people_tbl)

Output:

# A tibble: 3 × 4
  name      age weight bellow_avg_wt
  <chr>   <dbl>  <dbl> <chr>        
1 Alice      25   55.8 Yes          
2 Bob        30   60.5 Yes          
3 Charlie    45   85.8 No           

Step 2: Filtering Rows in a Tibble

Filtering rows in a Tibble using dplyr is straightforward and easy to read.

# Filter rows where age is greater than 30
older_than_30_tbl <- people_tbl %>% filter(age > 30)

# Print the filtered tibble
print(older_than_30_tbl)

Output:

# A tibble: 1 × 4
  name      age weight bellow_avg_wt
  <chr>   <dbl>  <dbl> <chr>        
1 Charlie    45   85.8 No           

Example 5: Inspecting Data Frames and Tibbles

You can inspect your data using functions like str(), summary(), and head().

Step 1: Inspect the Structure of a Data Frame

The str() function helps you understand the structure of the data.

# Inspect structure of data frame
str(people_df)

Output:

'data.frame':	3 obs. of  4 variables:
 $ name         : chr  "Alice" "Bob" "Charlie"
 $ age          : num  25 30 45
 $ weight       : num  55.8 60.5 85.8
 $ bellow_avg_wt: chr  "Yes" "Yes" "No"

Step 2: Get a Summary of a Data Frame

The summary() function provides basic statistics about each variable in your data frame.

# Get summary statistics about data frame
summary(people_df)

Output:

     name                age            weight      bellow_avg_wt     
 Length:3           Min.   :25.00   Min.   :55.75   Length:3          
 Class :character   1st Qu.:27.50   1st Qu.:58.12   Class :character  
 Mode  :character   Median :30.00   Median :60.50   Mode  :character  
                    Mean   :33.33   Mean   :67.33                     
                    3rd Qu.:37.50   3rd Qu.:73.12                     
                    Max.   :45.00   Max.   :85.75                     
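
For a numeric column, summary() reports the minimum, the quartiles, the mean and the maximum. A short sketch that recomputes these values for the age column with quantile() and mean(); the name age_summary is illustrative.

```r
age <- c(25, 30, 45)

quantile(age)  # 0%: 25, 25%: 27.5, 50%: 30, 75%: 37.5, 100%: 45
mean(age)      # 33.33333

# summary() reports the same six numbers for a numeric vector
age_summary <- summary(age)
age_summary[["Median"]]  # 30
```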

Step 3: Display the First Few Rows of a Data Frame

The head() function displays the first few rows and is useful during exploratory analysis.

# Display first few rows of data frame
head(people_df)

Output:

            name age weight bellow_avg_wt
Person_1   Alice  25  55.75           Yes
Person_2     Bob  30  60.50           Yes
Person_3 Charlie  45  85.75            No

Step 1: Inspect the Structure of a Tibble

Again, str() is handy for understanding the structure.

# Inspect structure of tibble
str(people_tbl)

Output:

Classes 'tbl_df', 'tbl' and 'data.frame':	3 obs. of  4 variables:
 $ name         : chr  "Alice" "Bob" "Charlie"
 $ age          : num  25 30 45
 $ weight       : num  55.8 60.5 85.8
 $ bellow_avg_wt: chr  "Yes" "Yes" "No"

Step 2: Get a Summary of a Tibble

Just like Data Frames, you can get a summary of a Tibble.

# Get summary statistics about tibble
summary(people_tbl)

Output:

       name               age            weight     bellow_avg_wt   
 Length:3           Min.   :25.00   Min.   :55.75   Length:3          
 Class :character   1st Qu.:27.50   1st Qu.:58.12   Class :character  
 Mode  :character   Median :30.00   Median :60.50   Mode  :character  
                    Mean   :33.33   Mean   :67.33                     
                    3rd Qu.:37.50   3rd Qu.:73.12                     
                    Max.   :45.00   Max.   :85.75                     

Step 3: Display the First Few Rows of a Tibble

Again, head() displays the first few rows.
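
A minimal sketch of head() on a tibble, assuming the tibble package is installed. The tibble is rebuilt here so the snippet runs on its own; first_two and last_one are illustrative names.

```r
library(tibble)  # assumes the tibble package is installed

people_tbl <- tibble(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 45),
  weight = c(55.75, 60.50, 85.75)
)

# head() returns the first n rows (default n = 6) and keeps the tibble class
first_two <- head(people_tbl, n = 2)
first_two$name  # "Alice" "Bob"

# tail() is the counterpart for the last rows
last_one <- tail(people_tbl, n = 1)
last_one$name  # "Charlie"
```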

Top 10 Interview Questions & Answers on R Language Data Frames and Tibbles

1. What are Data Frames in R?

Answer: Data frames in R are used to store data tables. They are essentially lists of vectors of equal length. Each vector represents a column which may be of a different mode (numeric, character, etc.), and each row represents an observation or record. Data frames are particularly useful for data analysis and statistical modeling.

Example:

df <- data.frame(name = c("Alice", "Bob", "Charlie"),
                 age = c(25, 30, 35),
                 salary = c(50000, 60000, 70000))
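
The "list of equal-length vectors" structure can be checked directly with a few base R calls. A small illustrative sketch using the same data:

```r
df <- data.frame(name = c("Alice", "Bob", "Charlie"),
                 age = c(25, 30, 35),
                 salary = c(50000, 60000, 70000))

is.list(df)         # TRUE: a data frame is a list of columns
length(df)          # 3: one list element per column
sapply(df, length)  # every column has the same length (3)
sapply(df, class)   # column types can differ
```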

2. What are Tibbles in R?

Answer: Tibbles are a modern take on data frames, provided by the tibble package within the tidyverse. They print in a more user-friendly format, never change variable names, and never convert input types (for example, strings are never converted to factors). Tibbles also behave more predictably during data manipulation.

Example:

library(tibble)
tibble_df <- tibble(name = c("Alice", "Bob", "Charlie"),
                     age = c(25, 30, 35),
                     salary = c(50000, 60000, 70000))

3. What is the difference between a data frame and a tibble?

Answer: While both store tabular data, the primary differences are:

  • Printing: Tibbles print a limited number of rows and columns and show the data types of each column.
  • Column Names: Tibbles do not adjust column names if they contain special characters or spaces.
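  • Recycling: Tibbles only recycle vectors of length 1. data.frame() also recycles any shorter vector whose length divides the row count evenly, which can hide mistakes.
  • Subsetting: With single brackets, a tibble always returns a tibble, even for one column, whereas a data frame returns a vector for df[, "x"]. Tibbles also never partially match column names with $.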
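
These differences can be observed side by side. The sketch below assumes the tibble package is installed; the objects df, tbl, ok and bad are illustrative, and tryCatch() is used only to show that the failing call raises an error.

```r
library(tibble)  # assumes the tibble package is installed

df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))
tbl <- tibble(name = c("Alice", "Bob"), age = c(25, 30))

# Single-column [ subsetting: data frame drops to a vector, tibble does not
class(df[, "age"])   # "numeric"
class(tbl[, "age"])  # "tbl_df" "tbl" "data.frame"

# Column names: data.frame() rewrites non-syntactic names, tibble() keeps them
names(data.frame(`first name` = "Alice"))  # "first.name"
names(tibble(`first name` = "Alice"))      # "first name"

# Recycling: tibble() only recycles length-1 inputs
ok <- tibble(id = 1:4, group = "a")  # fine, "a" is repeated
bad <- tryCatch(tibble(id = 1:4, group = c("a", "b")),
                error = function(e) "error")
bad  # "error"; data.frame() would silently recycle c("a", "b")
```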

4. How do you create a data frame in R?

Answer: You can create a data frame using the data.frame() function by passing vectors of equal length.

Example:

df <- data.frame(name = c("Alice", "Bob", "Charlie"),
                 age = c(25, 30, 35),
                 salary = c(50000, 60000, 70000))

5. How do you create a tibble in R?

Answer: Create a tibble using the tibble() function, from the tibble package.

Example:

library(tibble)
tibble_df <- tibble(name = c("Alice", "Bob", "Charlie"),
                     age = c(25, 30, 35),
                     salary = c(50000, 60000, 70000))

6. How do you select specific columns from a data frame or tibble?

Answer: Use the dollar sign ($) or double square brackets [[ ]] to extract a single column as a vector, and single square brackets [ ] to select one or more columns by name or position.

Example for Data Frame:

# Select just the age column from a data frame
age_col <- df$age

Example for Tibble:

# Select just the age column from a tibble
age_col <- tibble_df[["age"]]
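
A few more selection forms, sketched on a plain data frame in base R. The variable col is an illustrative name.

```r
df <- data.frame(name = c("Alice", "Bob", "Charlie"),
                 age = c(25, 30, 35),
                 salary = c(50000, 60000, 70000))

df$age                   # one column as a vector
df[["age"]]              # same, by name; also works with a variable
col <- "salary"
df[[col]]                # 50000 60000 70000

df[c("name", "salary")]  # several columns, result is a data frame
df[, -2]                 # drop the second column (age)
```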

7. How do you add a new column to a data frame or a tibble?

Answer: You can add a new column by simply assigning a value to a new column name.

Example:

# Adding a new column to a data frame
df$bonus <- df$salary * 0.1

# Adding a new column to a tibble (mutate() and %>% come from dplyr)
library(dplyr)
tibble_df <- tibble_df %>%
  mutate(bonus = salary * 0.1)

8. How can you filter rows in a data frame or tibble?

Answer: Use the subset() function from base R or the filter() function from dplyr. Both work on data frames and tibbles.

Example for Data Frame:

# Filter rows where age is greater than 28
subset_df <- subset(df, age > 28)

Example for Tibble:

library(dplyr)
# Filter rows where age is greater than 28
filtered_tibble <- tibble_df %>%
  filter(age > 28)

9. How do you combine two data frames or tibbles by rows?

Answer: Use rbind() to combine data frames by rows. For tibbles, dplyr's bind_rows() does the same and fills any missing columns with NA.

Example:

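# Combine two data frames by rows
# (df1 and df2 are assumed to be existing data frames with the same column names)
new_df <- rbind(df1, df2)

# Combine two tibbles by rows
# (tibble_df1 and tibble_df2 are assumed to be existing tibbles)
library(dplyr)
new_tibble <- bind_rows(tibble_df1, tibble_df2)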
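
A self-contained sketch of row binding in base R. The data frames df1, df2 and df3 are made up for illustration, and tryCatch() is used only to show that mismatched column names raise an error.

```r
df1 <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))
df2 <- data.frame(name = "Charlie", age = 35)

# rbind() requires both inputs to have the same column names
new_df <- rbind(df1, df2)
nrow(new_df)  # 3

# A mismatch in column names is an error in base R
df3 <- data.frame(name = "Dana", years = 41)
res <- tryCatch(rbind(df1, df3), error = function(e) "error")
res  # "error"
```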

10. How do you combine two data frames or tibbles by columns?

Answer: Use cbind() to combine by columns. Both inputs must have the same number of rows.

Example:
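
A minimal sketch of column-wise combination in base R, assuming two small data frames with the same number of rows. The names df1, df2 and new_df are illustrative.

```r
df1 <- data.frame(name = c("Alice", "Bob", "Charlie"))
df2 <- data.frame(age = c(25, 30, 35), salary = c(50000, 60000, 70000))

# cbind() places the columns side by side; row counts must match
new_df <- cbind(df1, df2)
names(new_df)  # "name" "age" "salary"
dim(new_df)    # 3 3
```

For tibbles, dplyr provides bind_cols() as the counterpart to cbind().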
