Welcome! This module covers the basics of data manipulation, the tools every data scientist needs. We will focus on the "Holy Trinity" of Python data science:

  1. NumPy for numerical computing.
  2. Pandas for data manipulation.
  3. Matplotlib for basic visualization.

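NumPy underpins most data libraries in Python, including Pandas. It provides an array type with fast, element-wise (vectorized) operations, so arithmetic applies to a whole array at once without an explicit loop.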

import numpy as np

# Create a 1D array
arr = np.array([1, 2, 3, 4, 5])

# Vectorized operation: multiply every element by 2
print(arr * 2)  # [ 2  4  6  8 10]
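
To make "fast operations on arrays" more concrete, here is a minimal sketch that compares a plain Python loop with the equivalent vectorized expression, then shows a few other element-wise operations. The variable names (`doubled_loop`, `doubled_vec`, `pairwise_sum`, `large_values`) are illustrative and not part of the original tutorial.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Loop-based version: handle one element at a time in Python
doubled_loop = [x * 2 for x in arr.tolist()]

# Vectorized version: NumPy applies the multiplication to the whole array
doubled_vec = arr * 2

# Element-wise arithmetic between two arrays of the same shape
pairwise_sum = arr + arr[::-1]   # arr added to its own reverse

# Aggregations reduce an array to a single value
mean_value = arr.mean()

# Boolean masks select the elements that satisfy a condition
large_values = arr[arr > 2]

print(doubled_vec.tolist() == doubled_loop)  # both approaches agree
print(pairwise_sum, mean_value, large_values)
```

Both versions produce the same numbers; the vectorized form is shorter and, on large arrays, usually much faster because the loop runs inside NumPy rather than in Python.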

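Pandas introduces the DataFrame, a labeled, two-dimensional table whose columns can each hold a different data type. It is the standard tool for handling tabular data in Python.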

# Loading a Dataset

import pandas as pd

# Creating a simple dataframe
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Paris']
}
df = pd.DataFrame(data)
print(df)
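
In practice, a DataFrame is usually loaded from a file rather than typed in by hand. The following is a minimal sketch, assuming a small CSV held in a string (`csv_text`, a hypothetical stand-in for a real file on disk) that contains the same three rows as above. It uses `pd.read_csv` to load the data, then selects a column and filters rows.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,London
Charlie,35,Paris
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Select a single column (returns a Series)
ages = df['Age']

# Filter rows with a boolean condition
over_28 = df[df['Age'] > 28]

# Compute a summary statistic on a column
mean_age = ages.mean()

print(over_28)
print(mean_age)
```

With a real dataset, the `io.StringIO(csv_text)` argument would be replaced by the path to the CSV file.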

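Matplotlib turns data into charts. The example below draws a simple line plot from two lists of values.

import matplotlib.pyplot as plt

# x values first, then the matching y values
plt.plot([1, 2, 3, 4], [10, 20, 25, 30])
plt.title("Simple Growth Chart")

# Open a window (or render inline in a notebook) with the chart
plt.show()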
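
The same chart can also be built through Matplotlib's object-oriented interface, which makes it easier to add axis labels. This is a sketch rather than the tutorial's own method; the axis names "Step" and "Value" are assumptions chosen for illustration, since the original data has no stated units.

```python
import matplotlib.pyplot as plt

x_values = [1, 2, 3, 4]
y_values = [10, 20, 25, 30]

# Create a figure and a single set of axes
fig, ax = plt.subplots()

# Draw the line, marking each data point with a circle
ax.plot(x_values, y_values, marker="o")

# Title and axis labels (label text is hypothetical)
ax.set_title("Simple Growth Chart")
ax.set_xlabel("Step")
ax.set_ylabel("Value")
```

Call `plt.show()` afterwards to display the figure, exactly as in the shorter example.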

To get hands-on experience, we have prepared an interactive Marimo notebook. This allows you to run code directly in your browser without installing anything.

🚀 Action Required

Launch the Interactive Notebook: Click the link below to open the lab in a new window. Once finished, return here to continue the tutorial.

👉 Open Marimo Notebook

You've successfully:

  1. Learned how to use NumPy arrays.
  2. Created your first Pandas DataFrame.
  3. Visualized basic data points.

Next Step: Head over to Course 2: Visual Storytelling to learn how to make these charts look professional.