Benchmarking & Optimization

Performance & Optimization

Timing Code with timeit

Always measure before and after optimization:

import timeit

# Time a statement 1000 times; timeit.timeit returns the total time in seconds
seconds_per_op = timeit.timeit('x**2', 'x = 5', number=1000) / 1000
print(f"Time per operation: {seconds_per_op * 1e6:.2f} microseconds")
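A single timing run can be skewed by whatever else the machine is doing. The sketch below is one common way to reduce that noise: repeat the measurement with timeit.repeat and keep the fastest run. The repeat and number values are arbitrary choices for illustration, not recommendations from this page.

```python
import timeit

NUMBER = 100_000  # executions per run (illustrative value)

# Run the whole measurement 5 times; each entry is the total seconds for one run
runs = timeit.repeat("x**2", setup="x = 5", repeat=5, number=NUMBER)

# The minimum is usually the least noisy estimate; slower runs include system jitter
best_per_op = min(runs) / NUMBER
print(f"Best time per operation: {best_per_op * 1e6:.3f} microseconds")
```

Use the same approach before and after a change so the two numbers are comparable.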

Vectorization vs Loops

The golden rule: replace Python-level loops over array elements with vectorized NumPy operations

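import numpy as np

# SLOW: Python loop, one interpreter step per element
result = []
for i in range(1000000):
    result.append(i**2)

# FAST: NumPy vectorization (often 10-100x faster; measure on your machine)
# dtype=np.int64 avoids overflow on platforms where the default integer is 32-bit
result = np.arange(1000000, dtype=np.int64)**2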
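The speedup depends on the array size, the operation, and the hardware, so it is worth measuring rather than assuming. Here is a minimal sketch that times both versions with timeit. The helper names squares_loop and squares_vectorized are made up for this example, and N is smaller than the 1,000,000 used above only to keep the run short.

```python
import timeit
import numpy as np

N = 100_000  # illustrative size

def squares_loop(n):
    """Square 0..n-1 with a plain Python loop."""
    result = []
    for i in range(n):
        result.append(i**2)
    return result

def squares_vectorized(n):
    """Square 0..n-1 with one vectorized NumPy expression."""
    return np.arange(n, dtype=np.int64) ** 2

# Confirm both versions agree before comparing their speed
assert squares_loop(N) == squares_vectorized(N).tolist()

# Best of 3 runs, 5 calls per run, reported per call
loop_s = min(timeit.repeat(lambda: squares_loop(N), repeat=3, number=5)) / 5
vec_s = min(timeit.repeat(lambda: squares_vectorized(N), repeat=3, number=5)) / 5
print(f"loop: {loop_s * 1e3:.2f} ms, vectorized: {vec_s * 1e3:.2f} ms, "
      f"ratio: {loop_s / vec_s:.0f}x")
```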

Memory Layout: C vs Fortran Order

NumPy arrays can be row-major (C) or column-major (Fortran):

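import numpy as np

arr_c = np.array([[1, 2], [3, 4]], order='C')  # Row-major: each row is contiguous
arr_f = np.array([[1, 2], [3, 4]], order='F')  # Column-major: each column is contiguous

# The 2x2 arrays are for illustration; the difference only matters on large arrays

# Iterate rows (C-order faster)
for row in arr_c:
    row.sum()  # ndarray.sum(), not the built-in sum(), which loops in Python

# Iterate columns (Fortran-order faster)
for col in arr_f.T:
    col.sum()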
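To see which layout an array actually has, you can inspect its flags and strides. The sketch below assumes 1000x1000 float64 arrays (8 bytes per element), chosen only so the stride values are easy to read.

```python
import numpy as np

arr_c = np.zeros((1000, 1000), order='C')  # float64 by default
arr_f = np.asfortranarray(arr_c)           # same values, column-major copy

print(arr_c.flags['C_CONTIGUOUS'], arr_c.flags['F_CONTIGUOUS'])  # True False
print(arr_f.flags['C_CONTIGUOUS'], arr_f.flags['F_CONTIGUOUS'])  # False True

# strides: bytes to step to reach the next element along each axis
print(arr_c.strides)  # (8000, 8)
print(arr_f.strides)  # (8, 8000)

# A row of the C-ordered array is contiguous in memory; a column of it is not
print(arr_c[0].flags['C_CONTIGUOUS'])     # True
print(arr_c[:, 0].flags['C_CONTIGUOUS'])  # False
```

A stride equal to the item size along an axis means neighbouring elements on that axis sit next to each other in memory, which is the direction that is cheap to traverse.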

Common Performance Traps

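  1. Type mismatch: mixing float64 and float32 upcasts the result to float64 and allocates a converted temporary
  2. Axis ambiguity: specify axis explicitly; without it, reductions such as sum() collapse the whole array
  3. Copying vs views: .copy() allocates new memory, while basic slicing returns a view of the same data
  4. Function call overhead: NumPy built-ins beat custom Python loops, which pay interpreter overhead per element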
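The first three traps are easy to check interactively. The following is a small sketch with made-up arrays; the variable names are illustrative only.

```python
import numpy as np

# 1. Type mismatch: float32 combined with float64 is upcast to float64
a32 = np.ones(4, dtype=np.float32)
b64 = np.ones(4, dtype=np.float64)
print((a32 * b64).dtype)  # float64

# 2. Axis ambiguity: without axis, sum() reduces over every element
m = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
print(m.sum())        # 15
print(m.sum(axis=0))  # [3 5 7]   (column sums)
print(m.sum(axis=1))  # [ 3 12]   (row sums)

# 3. Views vs copies: a basic slice shares memory, .copy() does not
base = np.arange(5)
view = base[1:4]
copied = base[1:4].copy()
print(np.shares_memory(base, view))    # True
print(np.shares_memory(base, copied))  # False

view[0] = 99    # writes through to base
print(base[1])  # 99
```

Views are cheap but writes propagate to the original; a copy costs memory and time but is independent.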

Pro tip: Use NumPy's built-in functions (they're optimized in C). Avoid np.apply_along_axis for large arrays: it calls your function once per 1-D slice from Python, so it behaves like a loop.
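As a rough illustration of that tip, here is the same per-row computation written both ways. The max-minus-min spread per row is an arbitrary example chosen for this sketch, and the array shape is made up.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((1000, 50))

# Slow pattern: the lambda is called once per row from Python
spread_slow = np.apply_along_axis(lambda row: row.max() - row.min(), 1, data)

# Built-in reductions with an explicit axis do the same work in compiled code
spread_fast = data.max(axis=1) - data.min(axis=1)

# Both approaches produce the same values
assert np.allclose(spread_slow, spread_fast)
print(spread_fast.shape)  # (1000,)
```

When a computation can be expressed with reductions that accept an axis argument, that form is usually the better choice on large arrays; timing both on your own data will show by how much.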
