Creating DataFrames
DataFrames — Your Data Table
What is a DataFrame?
In formal terms, a DataFrame is a two-dimensional, size-mutable, heterogeneous tabular data structure with labeled axes (rows and columns). Modeled on the data.frame of the R programming language and implemented in Python by the pandas library, it is the primary data structure for manipulating tabular data: essentially a fully programmable spreadsheet or an in-memory SQL table.
Each column in a DataFrame is a Series (a labeled 1D array), and all columns share the same row index. Because each column carries its own data type, you can mix types across columns (integers, strings, floats, booleans) while operations on a column stay vectorized and fast.
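A minimal sketch of the column-as-Series idea. The table below is made up for illustration; its column names and values are assumptions, not part of any real dataset.

```python
import pandas as pd

# Illustrative data only; names and values are assumptions for this sketch.
df = pd.DataFrame({
    "item": ["pen", "notebook"],
    "qty": [2, 5],
    "in_stock": [True, False],
})

col = df["qty"]                     # selecting one column returns a Series
print(type(col).__name__)           # Series
print(col.index.equals(df.index))   # True: the column shares the frame's row index
print(df.dtypes)                    # each column keeps its own dtype

doubled = col * 2                   # vectorized: applied to every element at once
print(doubled.tolist())             # [4, 10]
```

Selecting a column with square brackets is the usual way to get at the underlying Series; the arithmetic on it runs over the whole column without an explicit loop.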
Creating from a Dictionary
The most common way is to pass a dictionary: keys become column names, and values become the column data:
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [92.5, 88.0, 95.3],
})
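As a quick check of how the dictionary maps onto the table, the sketch below rebuilds the same frame and looks at its column and row labels. The mismatched-length case at the end is an added illustration, not something the original example covers.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [92.5, 88.0, 95.3],
})

print(list(df.columns))  # ['name', 'age', 'score'] (dictionary key order)
print(list(df.index))    # [0, 1, 2] (default integer row labels)
print(df)

# Every list must have the same length; otherwise pandas raises ValueError.
try:
    pd.DataFrame({"name": ["Alice", "Bob"], "age": [25]})
except ValueError as exc:
    print("ValueError:", exc)
```

When no index is given, pandas assigns integer row labels starting at 0, which is why the rows above are labeled 0, 1, 2.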
Key Inspection Methods
df.shape       # (rows, columns) → (3, 3)
df.dtypes      # data type of each column
df.head(5)     # first 5 rows (5 is the default)
df.tail(5)     # last 5 rows (5 is the default)
df.info()      # column dtypes, non-null counts, and memory usage
df.describe()  # summary statistics for numeric columns
Rule of thumb: Always run df.info() and df.describe() first when you receive a new dataset.
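The inspection calls above can be tried end to end on the three-row frame from the dictionary example. This is a small sketch rather than a full workflow; the variable name summary is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [92.5, 88.0, 95.3],
})

print(df.shape)      # (3, 3)
print(df.dtypes)     # one dtype per column
print(df.head(2))    # first 2 rows
print(df.tail(2))    # last 2 rows

df.info()            # prints its report directly and returns None

summary = df.describe()  # by default covers only the numeric columns
print(summary)
print(summary.loc["mean", "age"])  # 30.0
```

Note that df.info() writes its report to standard output itself, so wrapping it in print() would only add a trailing None. df.describe() returns a new DataFrame, which is why it can be stored and indexed like any other.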