Mastering Pandas

An interactive infographic for the 80/20 of Data Manipulation in Python.

🐼 Pandas Master Reference Table

Your quick lookup guide. Click any function or method to go directly to the official Pandas documentation.

📚 Key Documentation Links

Pandas API Reference → The main index page for all public methods and functions.

General Functions → Top-level functions such as `merge`, `concat`, `pivot`, and `date_range`.

User Guide: Essential Basic Functionality → Describes core methods for selection, statistics, and more.

Important Functions (Supporting Roles)

Frequently used to support the critical functions.

sort_values(), rename(), drop(), astype(), sum(), mean(), count(), std(), apply(), concat(), to_datetime(), set_index(), reset_index()

Utility Functions (Specialized Tools)

Valuable for edge cases and specific applications.

pivot_table(), melt(), stack(), unstack(), cut(), qcut(), duplicated(), drop_duplicates(), resample(), to_excel(), read_sql(), pipe(), explode()

Recommended Learning Path

Follow this path for maximum return on your learning investment.

1. Data Structures & I/O: Series, DataFrame, read_csv
2. Inspection & Selection: head, info, describe, loc, iloc
3. Grouping & Merging: groupby, merge, value_counts
4. Cleaning & Advanced: dropna, fillna, apply, pivot_table

Full A-Z Quick Reference

A searchable list of all covered pandas functions.


*(Note: This is a subset; pandas has hundreds of functions.)*

The Critical 12 Functions (80% of Use Cases)

Master these to handle the vast majority of your data analysis tasks.

1. Data Structures & I/O

pd.DataFrame()

The primary data structure: a 2D table with labeled axes (rows and columns).

df = pd.DataFrame({'A': [1,2], 'B': [3,4]})

Visualization: the Python dict {'A': [1, 2], 'B': [3, 4]} becomes a two-row table with columns A and B.
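A minimal, runnable sketch of the idea above (the column names and values are made up):

```python
import pandas as pd

# A dict of lists: keys become column names, lists become column data.
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

print(df.shape)           # (2, 2)
print(list(df.columns))   # ['A', 'B']
```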

pd.read_csv()

The most common way to load data from a file into a DataFrame.

df = pd.read_csv('data.csv')

Visualization: the file data.csv is parsed into a DataFrame with columns col1 and col2.

df.to_csv()

Save a DataFrame to a CSV file, completing the I/O cycle.

df.to_csv('output.csv')

Visualization: the DataFrame is written out to output.csv.

2. Inspection & Selection

df.head() / info()

Quickly inspect the first few rows, data types, and null counts of your data.

df.head(2)

Visualization (head): only the first 2 of N rows are returned.

df.loc[]

Select data by labels (row index names, column names, or boolean conditions).

df.loc[df['A'] > 1]

Visualization: the filter A > 1 keeps only the rows where column A exceeds 1.

df.iloc[]

Select data by integer position (e.g., the first row, last three columns).

df.iloc[0:2, 1]

Visualization: rows 0 and 1 of the column at position 1 are selected.
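A small sketch contrasting the two selectors; the index labels and data are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

# .loc selects by label; a boolean mask also works.
by_label = df.loc['y', 'B']        # the value at row 'y', column 'B'
filtered = df.loc[df['A'] > 1]     # rows 'y' and 'z'

# .iloc selects by integer position; the end of a slice is excluded.
by_position = df.iloc[0, 1]        # row 0, column 1
first_two = df.iloc[0:2]           # rows 'x' and 'y'
```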

3. Manipulation & Analysis

df.groupby()

The core of analysis: split data into groups, apply a function, and combine results.

df.groupby('category').sum()

Visualization: rows are split into groups A and B, each group is summed (Σ), and the results are combined.
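The split-apply-combine pattern can be sketched on made-up sales data:

```python
import pandas as pd

df = pd.DataFrame({'category': ['A', 'B', 'A'],
                   'sales': [100, 200, 150]})

# Split by category, sum each group, combine into one Series.
totals = df.groupby('category')['sales'].sum()
```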

pd.merge()

Combine datasets by linking rows using one or more keys (like a SQL join).

pd.merge(df1, df2, on='key')

Visualization: DF1 and DF2 are joined row by row on the shared key.
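A minimal sketch of a key-based join (the frames and key values are hypothetical):

```python
import pandas as pd

customers = pd.DataFrame({'key': [1, 2], 'name': ['Ann', 'Bob']})
orders = pd.DataFrame({'key': [1, 1, 2], 'amount': [10, 20, 30]})

# Inner join on 'key': each order row picks up the matching customer name.
merged = pd.merge(customers, orders, on='key')
```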

df.value_counts()

Quickly count the number of occurrences of each unique value in a column.

df['category'].value_counts()

Visualization: the column [A, B, A] yields counts A: 2, B: 1.
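The same counting, as a runnable sketch:

```python
import pandas as pd

s = pd.Series(['A', 'B', 'A'])

# Counts of each unique value, most frequent first.
counts = s.value_counts()
```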

4. Data Cleaning

df.dropna()

Remove missing values (NaN) from a DataFrame, essential for clean analysis.

df.dropna()

Visualization: the row containing NaN is dropped; the remaining rows are kept.

df.fillna()

Fill in missing values with a specified value or method (e.g., the mean).

df.fillna(0)

Visualization: [1, NaN, 3] becomes [1, 0, 3].
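Both cleaning operations side by side, on a toy column with one missing value:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'x': [1.0, np.nan, 3.0]})

dropped = df.dropna()   # the NaN row is removed entirely
filled = df.fillna(0)   # NaN is replaced with 0
```

Note that `dropna` shrinks the frame while `fillna` preserves its shape.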

df.astype()

Convert a column to a different data type (e.g., object to integer or datetime).

df['col'].astype('int')

Visualization: the string "123" (object dtype) becomes the number 123 (integer dtype).

Why Are These Critical?

Core Workflow: They represent the entire data analysis pipeline: loading data (`read_csv`), inspecting it (`head`, `info`), selecting subsets (`loc`, `iloc`), analyzing it (`groupby`), combining it (`merge`), and cleaning it (`dropna`, `fillna`).

Universal Need: Virtually every data science project using Python and pandas will rely heavily on this specific set of functions. Mastering them provides the foundation for everything else.

The Important Functions (Supporting Roles)

Frequently used to support critical functions and handle common data transformations.

1. Aggregation & Summarization

sum() / mean()

Basic aggregations that calculate the total or average of a column's values.

df['sales'].sum()

Visualization (sum): 100 + 200 + 150 = 450.

describe()

Generate a quick statistical summary (count, mean, std, etc.) of numerical columns.

df.describe()

Visualization: the DataFrame is summarized as count: 3.0, mean: 150.0, std: 50.0, and so on.

drop_duplicates()

A common cleaning operation to remove entire duplicate rows from a DataFrame.

df.drop_duplicates()

Visualization: [Row A, Row B, Row B] becomes [Row A, Row B].

2. Transformation & Reshaping

apply()

Apply a custom function along an axis (rows or columns). Powerful but can be slow.

df['col'].apply(lambda x: x*2)

Visualization: [1, 2] passed through f(x) = x*2 becomes [2, 4].

map()

Substitute each value in a Series with another value (from a dict or function).

df['col'].map({'A': 'Apple'})

Visualization: [A, B] mapped through {'A': 'Apple'} becomes [Apple, NaN]; unmapped values become NaN.
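A runnable sketch contrasting the two, with made-up values:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
doubled = s.apply(lambda x: x * 2)   # custom function, element by element

labels = pd.Series(['A', 'B'])
mapped = labels.map({'A': 'Apple'})  # 'B' has no mapping, so it becomes NaN
```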

pd.concat()

Stack pandas objects (like DataFrames) vertically or horizontally.

pd.concat([df1, df2])

Visualization: DF1 and DF2 are stacked into a single DataFrame.

set_index() / reset_index()

Promote a column to be the DataFrame's index, or demote the index to a column.

df.set_index('ID')

Visualization (set_index): the ID column moves out of the data and becomes the row index.

pivot_table()

Reshape data by creating a spreadsheet-style pivot table based on columns.

df.pivot_table(index='A', columns='B')

Visualization: long-format data is reshaped into wide-format data.
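A small long-to-wide sketch (columns A, B, and v are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'x', 'y'],
                   'B': ['p', 'q', 'p'],
                   'v': [1, 2, 3]})

# Rows indexed by A, one column per value of B, mean of v in each cell.
wide = df.pivot_table(index='A', columns='B', values='v', aggfunc='mean')
```

Cells with no matching rows (here, A='y' with B='q') come out as NaN unless fill_value is given.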

3. Filtering & Selection

isin()

Filter a DataFrame where a column's value is present in a given list of items.

df[df['col'].isin(['A', 'C'])]

Visualization: [A, B, C] filtered by membership in ['A', 'C'] becomes [A, C].

query()

An alternative way to filter rows using a boolean expression as a string.

df.query('sales > 100 and region == "West"')

Visualization: the DataFrame filtered by the string expression "sales > 100" yields a filtered DataFrame.
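Both filtering styles, sketched on hypothetical sales data:

```python
import pandas as pd

df = pd.DataFrame({'region': ['West', 'East', 'West'],
                   'sales': [150, 90, 50]})

by_membership = df[df['region'].isin(['West'])]          # two West rows
by_query = df.query('sales > 100 and region == "West"')  # only the first row
```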

Why Are These Important?

Bridge to Advanced Analysis: These functions are the workhorses that prepare your data for critical operations. You use isin() or query() to select data *before* feeding it to groupby().

Flexibility and Power: Functions like apply() and pivot_table() provide immense flexibility to perform complex, custom transformations that are not possible with basic aggregations alone.

The Utility Functions (Specialized Tools)

Valuable for edge cases, convenience, or specific applications.

1. Statistical Analysis & Visualization

corr() / var()

Calculate the correlation matrix between columns or the variance of a series.

df.corr()

Visualization (corr): a symmetric matrix with 1.0 on the diagonal and pairwise correlations (e.g., 0.9) off the diagonal.

quantile()

Return value at the given quantile (e.g., 0.5 for the median).

df['col'].quantile(0.5)

Visualization (0.5 quantile): the median of [10, 20, 100] is 20.0.

nunique()

Count the number of distinct/unique elements in a Series.

df['col'].nunique()

Visualization: [A, B, A] contains 2 unique values.

plot()

A wrapper for Matplotlib to quickly generate plots from a DataFrame or Series.

df['sales'].plot(kind='line')


2. Selection, Reshaping & Conversion

astype()

Cast a pandas object (like a column) to a specified data type.

df['col'].astype(float)

Visualization: the string "123" (object dtype) becomes 123.0 (float dtype).

transpose() or .T

Reflect the DataFrame over its main diagonal by swapping rows and columns.

df.T

Visualization: a 2×3 DataFrame becomes a 3×2 DataFrame.

sample()

Return a random sample of items from an axis of an object.

df.sample(n=2)

Visualization: the full DataFrame yields a random sample of rows.

filter()

Subset rows or columns of a DataFrame according to labels in a specified index.

df.filter(like='_id')

Visualization (filter columns): user_id, sale, item_id is reduced to user_id, item_id.

to_dict() / to_numpy()

Convert the DataFrame to a Python dictionary or a NumPy array.

df.to_dict('records')

Visualization: the DataFrame becomes [{'A': 1}, {'A': 2}].

Why Are These "Utility"?

Niche Use Cases: Many of these functions solve specific problems that don't appear in every project. For example, .plot() is for quick visualization, and .filter() is for label-based selection, which is less common than conditional filtering.

Convenience & Interoperability: Some are helpers that offer convenient syntax (.T) or are used for converting data to other formats for use with other libraries (to_numpy(), to_dict()).

🐼 Pandas Series & DataFrame Reference

The two fundamental data structures that power everything in Pandas.

1. Pandas Series

A Series is a one-dimensional labeled array, like a single column in a spreadsheet with an index.

✅ How to Create a Series

From a List

The most basic way, with an auto-generated integer index.

s = pd.Series([10, 20, 30])

Visualization: the list [10, 20, 30] receives the auto-generated index 0, 1, 2.

With a Custom Index

Label your data points with meaningful keys.

s = pd.Series([1,2], index=['a','b'])

Visualization: data [1, 2] with index ['a', 'b'] pairs a with 1 and b with 2.

From a Dictionary

Keys become the index, values become the data.

s = pd.Series({'a':100, 'b':200})

Visualization: {'a': 100, 'b': 200} becomes a Series with index a, b and values 100, 200.
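The three creation patterns above, as a runnable sketch:

```python
import pandas as pd

from_list = pd.Series([10, 20, 30])              # auto integer index 0, 1, 2
labeled = pd.Series([1, 2], index=['a', 'b'])    # custom labels
from_dict = pd.Series({'a': 100, 'b': 200})      # keys become the index
```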

✅ How to Access Elements

By Position

Use an integer position via `.iloc` (plain `s[0]` still works for a default integer index, but positional access with `[]` is deprecated in recent pandas).

s.iloc[0]  # Access first element

Visualization: position 0 of the Series [10, 20, 30] returns 10.

By Label

Use the custom index labels for direct access.

s['b'] # Access element with label 'b'

Visualization: label 'b' in the Series {a: 1, b: 2, c: 3} returns 2.

Slicing

Select a range of elements. Works with positions and labels.

s[:2] # Select first two elements

Visualization: the first two elements (10, 20) of [10, 20, 30] are selected.

✅ Useful Series Functions & Attributes

.head() / .tail()

Preview the first or last 5 elements.

.index / .values

Get the index or the data as a NumPy array.

.dtype / .size

Check data type and number of elements.

.describe()

Get quick summary statistics (count, mean, std, etc.).

.unique()

Get an array of the unique values in the Series.

.value_counts()

Count the occurrences of each unique value.

.mean() / .sum()

Calculate the mean or sum of the values.

.min() / .max()

Find the minimum or maximum value.

2. Pandas DataFrame

A DataFrame is a 2D labeled table with columns of potentially different types, like a full spreadsheet.

✅ How to Create a DataFrame

From a Dictionary of Lists

Dictionary keys become column headers, lists become the column data.

data = {'name': ["A", "B"], 'age': [25, 30]}
df = pd.DataFrame(data)

Visualization: {'name': ["A", "B"], 'age': [25, 30]} becomes a table with rows (A, 25) and (B, 30).

From a List of Dictionaries

Each dictionary in the list becomes a row in the DataFrame.

data = [{'name':'A','age':25}, {'name':'B'}]
df = pd.DataFrame(data)

Visualization: the second dict is missing 'age', so the rows come out as (A, 25.0) and (B, NaN).

✅ How to Access Data

Access a Column
df['age']
Returns the 'age' column as a Series: [25, 30].

Access a Row (.iloc)
df.iloc[0]
Returns the first row: name A, age 25.

Access a Cell (.at)
df.at[0, 'age']
Returns the single value 25.

✅ Useful DataFrame Functions & Attributes

.head() / .tail()

Preview the first or last 5 rows.

.shape / .columns

Get `(rows, cols)` tuple or list of column names.

.info() / .dtypes

Get a full summary or check column data types.

.describe()

Summary statistics for numerical columns.

.T (Transpose)

Swap rows and columns.

.sort_values()

Sort the DataFrame by one or more columns.

.drop()

Remove rows or columns by label.

.isnull().sum()

Count missing values in each column.

.fillna()

Replace missing (NaN) values with a specified value.

Column Ops

Run functions like `.mean()` or `.max()` on a single column.

.groupby()

Group rows based on a column to perform aggregate functions.

📂 Importing & Exporting Data in Pandas

Your visual guide to moving data between Pandas and various file formats.

| Format    | Import Function      | Export Function    |
| --------- | -------------------- | ------------------ |
| CSV       | pd.read_csv()        | df.to_csv()        |
| Excel     | pd.read_excel()      | df.to_excel()      |
| JSON      | pd.read_json()       | df.to_json()       |
| SQL       | pd.read_sql()        | df.to_sql()        |
| HTML      | pd.read_html()       | df.to_html()       |
| Parquet   | pd.read_parquet()    | df.to_parquet()    |
| Pickle    | pd.read_pickle()     | df.to_pickle()     |
| Feather   | pd.read_feather()    | df.to_feather()    |
| Clipboard | pd.read_clipboard()  | df.to_clipboard()  |
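A quick round-trip sketch of the import/export pairing; an in-memory StringIO buffer stands in for a real file so the example is self-contained:

```python
import pandas as pd
from io import StringIO

df = pd.DataFrame({'price': [10, 20], 'quantity': [3, 4]})

# Export to CSV text, then import it back.
csv_text = df.to_csv(index=False)
back = pd.read_csv(StringIO(csv_text))
```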

📌 Common File Formats

1. CSV Files

Comma-Separated Values are the most common flat file format.

# Read from CSV
df = pd.read_csv("data.csv")
# Write to CSV
df.to_csv("output.csv", index=False)

2. Excel Files

Read from and write to Microsoft Excel spreadsheets.

# Read from Excel
df = pd.read_excel("data.xlsx")
# Write to Excel
df.to_excel("output.xlsx", index=False)

3. JSON Files

JavaScript Object Notation is common for web APIs.

# Read from JSON
df = pd.read_json("data.json")
# Write to JSON
df.to_json("output.json", orient="records")

4. SQL Databases

Execute a query and load results directly into a DataFrame.

# Read from SQL
df = pd.read_sql("SELECT * FROM users", conn)
# Write to SQL
df.to_sql("new_table", conn)

5. Parquet

An efficient, columnar storage format for big data.

# Read from Parquet
df = pd.read_parquet("data.parquet")
# Write to Parquet
df.to_parquet("output.parquet")

6. HTML Tables

Read tables directly from an HTML page (a local file or a URL).

# Read tables from URL
dfs = pd.read_html("page.html")
# Write to HTML
df.to_html("output.html")


📌 Specialized & Binary Formats

Pickle

Saves any Python object, including DataFrames with complex types.

df.to_pickle("data.pkl")
df = pd.read_pickle("data.pkl")
HDF5

Hierarchical format for storing multiple datasets in one file.

df.to_hdf("data.h5", key="my_data")
df = pd.read_hdf("data.h5", "my_data")
Feather

A fast, lightweight binary format for sharing data.

df.to_feather("data.feather")
df = pd.read_feather("data.feather")

📊 Example Quick Use: Read, Process, Save

A common workflow is to load data from one format, perform some calculations, and save the result in another.

# Read CSV
df = pd.read_csv("sales.csv")
# Process Data
df["total"] = df["price"] * df["quantity"]
# Save to Excel
df.to_excel("sales_report.xlsx", index=False)

Workflow Visualization: sales.csv → process in a DataFrame → sales_report.xlsx

🐼 Pandas Data Selection, Filtering & Indexing

Visually mastering how to access, query, and isolate your data.

Sample DataFrame for Examples

All examples in this section will use the following DataFrame, df.

name age city
0 Alice 25 NY
1 Bob 30 LA
2 Charlie 35 SF
3 David 40 TX

📌 1. Selecting Columns

Single Column (returns a Series)
df['name']

Visualization: the name column (Alice, Bob, ...) is pulled out on its own.

Multiple Columns (returns a DataFrame)
df[['name', 'age']]

Visualization: the name and age columns are kept; city is dropped.

📌 2. Selecting Rows

By Position (.iloc)
df.iloc[0]
Returns the first row: Alice, 25, NY.

Slicing by Position (.iloc)
df.iloc[1:3]
Returns rows 1 and 2, Bob and Charlie (the end position is excluded).

By Condition
df[df['age'] > 30]
Returns the rows where age exceeds 30: Charlie and David.

📌 3. Selecting Specific Cells

By Position (.iloc)
df.iloc[0, 1]
Returns the value at row 0, column 1: 25.

By Label (.loc, .at)
df.loc[0, 'age']
Returns the value at row label 0, column 'age': 25.

📌 4. Boolean Filtering

Multiple Conditions (&, |)
df[(df['age'] > 25) & (df['city'] == "LA")]
Returns only Bob (age 30, city LA).

Multiple Values (.isin())
df[df['city'].isin(["NY", "LA"])]
Returns Alice (NY) and Bob (LA).

Not Null (.notnull())
df[df['city'].notnull()]
Filters out rows with missing values (NaN).

📌 5. Indexing Concepts

Set & Reset Index
df.set_index('name')
The name column becomes the row index; reset_index() moves it back to a regular column.

MultiIndex (Hierarchical)

Create an index with multiple levels for grouping.

              score  subject
Alice  Test1     85  Math
       Test2     90  Science
Bob    Test1     78  Math
       Test2     88  Science

📌 6. Useful Indexing Functions

.index / .columns

Get the index or column labels.

.sample(n)

Get `n` random rows from the DataFrame.

.nlargest(n, 'col')

Get the top `n` rows based on a column's values.

.nsmallest(n, 'col')

Get the bottom `n` rows based on a column's values.

🐼 Pandas Slicing Concepts

Visually extracting subsets of your data with precision and ease.

Sample DataFrame for Examples

All examples in this section use the following DataFrame, df.

name age city score
0 Alice 25 NY 88
1 Bob 30 LA 92
2 Charlie 35 SF 95
3 David 40 TX 70
4 Eva 28 NV 85
5 Frank 33 NY 90

📌 1. Row & Column Slicing

Row Slice by Position (.iloc)
df.iloc[1:4]
Returns rows 1-3: Bob, Charlie, David (the end position, 4, is excluded).

Row Slice by Label (.loc)
df.loc[0:2]
Returns rows 0-2: Alice, Bob, Charlie.

Note: .loc is inclusive, so index 2 is included.

Column Slice by Label (.loc)
df.loc[:, 'age':'score']
Returns the age, city, and score columns for every row (label slices are inclusive at both ends).

Row & Column Combined
df.iloc[1:4, 0:2]
Returns rows 1-3 of the first two columns (name, age): Bob, Charlie, David.

Step Slicing
df.iloc[::2]
Returns every second row: Alice, Charlie, Eva.

Negative Indexing
df.iloc[-2:]
Returns the last two rows: Eva and Frank.

📌 2. Conditional Slicing & Filtering

Simple Condition
df[df['age'] > 30]
Returns Charlie (35), David (40), and Frank (33).

List Membership (.isin())
df[df['city'].isin(['NY', 'LA'])]
Returns Alice (NY), Bob (LA), and Frank (NY).

Query Method (.query())
df.query('age > 30 & score > 80')
Returns Charlie (35, 95) and Frank (33, 90); David is excluded by his score of 70.

📌 3. MultiIndex Slicing

MultiIndex DataFrame Structure

An index with multiple levels for grouping.

              score
Alice  Test1     88
       Test2     92
Bob    Test1     90
       Test2     85
Slicing by Index Level
df_multi.loc['Alice']

Result: all of Alice's rows, indexed by the second level:

        score
Test1      88
Test2      92
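A self-contained sketch of this MultiIndex selection (df_multi is rebuilt here so the snippet runs on its own):

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('Alice', 'Test1'), ('Alice', 'Test2'),
     ('Bob', 'Test1'), ('Bob', 'Test2')],
    names=['name', 'test'])
df_multi = pd.DataFrame({'score': [88, 92, 90, 85]}, index=idx)

# Selecting by the first index level returns all of Alice's rows,
# indexed by the remaining level.
alice = df_multi.loc['Alice']
```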

⚡ Summary of Slicing Techniques

| Method                | Example               | Notes                                           |
| --------------------- | --------------------- | ----------------------------------------------- |
| Row slice by position | df.iloc[1:4]          | End index is excluded.                          |
| Row slice by label    | df.loc[0:3]           | End label is included.                          |
| Step slicing          | df.iloc[::2]          | Selects every Nth element.                      |
| Conditional slice     | df[df['age']>30]      | Uses a boolean mask.                            |
| Query-based slice     | df.query('age>30')    | A readable string-based alternative.            |
| MultiIndex slice      | df_multi.loc['Alice'] | Selects all entries from the first index level. |

🐼 Pandas Grouping & Aggregation

Visually summarizing, transforming, and filtering your data by category.

Sample DataFrame for Examples

All examples in this section use the following DataFrame, df.

department employee salary age gender
0 HR Alice 50000 25 F
1 IT Bob 60000 30 M
2 HR Charlie 55000 28 M
3 IT David 65000 35 M
4 Finance Eva 70000 40 F
5 HR Frank 52000 32 M
6 Finance Grace 72000 38 F
7 IT Hannah 68000 29 F

📌 1. Basic Grouping & Aggregation

Simple Aggregation
df.groupby('department')['salary'].sum()

department
Finance    142000
HR         157000
IT         193000

Multiple Aggregations
df.groupby('department')['salary'].agg(['sum', 'mean', 'max'])

department     sum     mean    max
Finance     142000  71000.0  72000
HR          157000  52333.3  55000
IT          193000  64333.3  68000

Multi-level Grouping
df.groupby(['department', 'gender'])['salary'].mean()

department  gender
Finance     F         71000.0
HR          F         50000.0
            M         53500.0
IT          F         68000.0
            M         62500.0
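The aggregation above can be sketched on a trimmed-down version of the sample data (only a few rows, to keep it short):

```python
import pandas as pd

df = pd.DataFrame({'department': ['HR', 'IT', 'HR'],
                   'salary': [50000, 60000, 55000]})

# Several summary statistics per group in one call.
stats = df.groupby('department')['salary'].agg(['sum', 'mean', 'max'])
```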

📌 2. Group Operations (Iterate, Select, Apply)

Iterating Over Groups
grouped = df.groupby('department')
for dept, group in grouped: ...

Example: first iteration (dept='Finance'):

employee  salary
Eva        70000
Grace      72000

Selecting a Specific Group
grouped.get_group('HR')

employee  salary
Alice      50000
Charlie    55000
Frank      52000

Applying Custom Functions
def salary_range(x): ...
grouped['salary'].apply(salary_range)

department
Finance    2000
HR         5000
IT         8000

📌 3. Transform & Filter

Transform & Add Aggregated Values
df['dept_avg_salary'] = df.groupby('department')['salary'].transform('mean')

department  employee  salary  dept_avg_salary
HR          Alice      50000          52333.3
IT          Bob        60000          64333.3
HR          Charlie    55000          52333.3
IT          David      65000          64333.3

Filter Groups Based on Condition
grouped.filter(lambda x: x['salary'].mean() > 60000)

department  employee  salary
IT          Bob        60000
IT          David      65000
Finance     Eva        70000
Finance     Grace      72000
IT          Hannah     68000
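A compact runnable sketch of transform vs. filter on a four-row subset of the sample data:

```python
import pandas as pd

df = pd.DataFrame({'department': ['HR', 'IT', 'HR', 'IT'],
                   'salary': [50000, 60000, 55000, 65000]})

# transform broadcasts the group mean back onto every original row.
df['dept_avg_salary'] = df.groupby('department')['salary'].transform('mean')

# filter drops whole groups whose mean salary fails the condition.
high_paid = df.groupby('department').filter(lambda g: g['salary'].mean() > 60000)
```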

📌 4. Reshaping & Frequency Tables

Pivot Table
pd.pivot_table(df, values='salary', index='department', columns='gender', aggfunc='mean')

gender            F        M
department
Finance     71000.0      NaN
HR          50000.0  53500.0
IT          68000.0  62500.0

Crosstab (Frequency Table)
pd.crosstab(df['department'], df['gender'])

gender      F  M
department
Finance     2  0
HR          1  2
IT          1  2

Cumulative Operations
df.groupby('department')['salary'].cumsum()

department  salary  cumulative
HR           50000       50000
HR           55000      105000
HR           52000      157000
IT           60000       60000

Note: the cumulative sum restarts for each group and runs in the original row order.

⚡ Summary of Grouping Methods

| Method                  | Description                                               |
| ----------------------- | --------------------------------------------------------- |
| .groupby('col')         | Groups DataFrame by a column.                             |
| .agg(['sum', 'mean'])   | Computes multiple summary statistics.                     |
| .apply(func)            | Applies a custom function to each group.                  |
| .transform('mean')      | Broadcasts a group calculation back to the original rows. |
| .filter(lambda x: ...)  | Removes entire groups based on a condition.               |
| pd.pivot_table(...)     | Creates an Excel-style pivot table.                       |
| pd.crosstab(...)        | Computes a frequency table of factors.                    |

🐼 Pivot Tables & Cross Tabulation

Visually summarizing and analyzing relationships between variables.

Sample DataFrame for Examples

All examples in this section use the following DataFrame, df.

  department employee  salary  age gender
0         HR    Alice   50000   25      F
1         IT      Bob   60000   30      M
2         HR  Charlie   55000   28      M
3         IT    David   65000   35      M
4    Finance      Eva   70000   40      F
5         HR    Frank   52000   32      M
6    Finance    Grace   72000   38      F
7         IT   Hannah   68000   29      F

📌 1. Pivot Tables

Basic Pivot (Single Aggregation)
pd.pivot_table(df, values='salary', index='department', aggfunc='mean')

department    salary
Finance      71000.0
HR           52333.3
IT           64333.3

Pivot with Columns
pd.pivot_table(df, values='salary', index='department', columns='gender', aggfunc='mean', fill_value=0)

gender          F      M
department
Finance     71000      0
HR          50000  53500
IT          68000  62500

Pivot with Multiple Aggregations
pd.pivot_table(df, values='salary', index='department', aggfunc=['sum', 'mean', 'max'])

               sum     mean    max
department
Finance     142000  71000.0  72000
HR          157000  52333.3  55000
IT          193000  64333.3  68000

Advanced Pivot (Multiple Values & Columns)
pd.pivot_table(df, values=['salary', 'age'], index='department', columns='gender', aggfunc='mean')

             age           salary
gender         F     M          F        M
department
Finance     39.0   NaN    71000.0      NaN
HR          25.0  30.0    50000.0  53500.0
IT          29.0  32.5    68000.0  62500.0

📌 2. Cross Tabulation (crosstab)

Basic Crosstab (Frequency Count)
pd.crosstab(df['department'], df['gender'])
gender F M
department
Finance 2 0
HR 1 2
IT 1 2
Normalized Crosstab (Percentage)
pd.crosstab(df['department'], df['gender'], normalize='index')
gender F M
Finance 1.00 0.00
HR 0.33 0.67
IT 0.33 0.67
Crosstab with Margins (Totals)
pd.crosstab(df['department'], df['gender'], margins=True)
gender F M All
Finance 2 0 2
HR 1 2 3
IT 1 2 3
All 4 4 8
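A self-contained crosstab sketch on a small made-up frame, covering both the plain count and the margins=True variant:

```python
import pandas as pd

df = pd.DataFrame({'department': ['HR', 'IT', 'HR', 'Finance'],
                   'gender': ['F', 'M', 'M', 'F']})

counts = pd.crosstab(df['department'], df['gender'])
with_totals = pd.crosstab(df['department'], df['gender'], margins=True)
```

Cells with no observations (here, Finance/M) come out as 0, and margins=True adds an 'All' row and column of totals.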

📌 3. When to Use Pivot vs. Crosstab

| Feature        | Pivot Table                | Crosstab                         |
| -------------- | -------------------------- | -------------------------------- |
| Purpose        | Summarize numerical data   | Count/summarize categorical data |
| Aggregation    | mean, sum, max, min, etc.  | Mainly count (can also do %)     |
| Input          | A DataFrame                | Series or array-like objects     |
| Margins/Totals | Supported via margins=True | Supported via margins=True       |