Pablo Rodriguez

Gaussian Distribution

The Gaussian distribution is also called the normal distribution; the two terms mean exactly the same thing. Because of the shape of its curve, it is also known as the “bell-shaped distribution.”

If x is a random variable with a Gaussian distribution:

  • Mean parameter: μ (center of curve)
  • Variance parameter: σ²
  • Standard deviation: σ (width of curve)

p(x) = (1 / √(2π)) * (1/σ) * e^(-(x-μ)²/(2σ²))

Where:

  • π ≈ 3.14159 (ratio of circle’s circumference to diameter)
  • e ≈ 2.71828 (Euler’s number, the base of the exponential function)
  • μ = mean parameter
  • σ = standard deviation parameter

The resulting curve has these properties:

  • Center: Located at mean μ
  • Width: Determined by standard deviation σ
  • Shape: Symmetric bell curve
  • Area under curve: Always equals 1 (probability requirement)
  • Called “bell-shaped” because it resembles the shape of a classic tower bell
  • Example: The top portion of the Liberty Bell roughly follows this curve shape
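
The formula and two of the properties above (symmetry about μ, total area of 1) can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `gaussian_pdf` and `area_under_curve`, the integration range of ±10σ, and the step count are illustrative choices, not part of the original notes.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Gaussian density p(x) with mean mu and standard deviation sigma."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def area_under_curve(mu, sigma, half_width=10.0, steps=20000):
    """Midpoint Riemann sum of p(x) over [mu - 10*sigma, mu + 10*sigma].

    The range and step count are assumptions chosen so that the part of
    the curve left out is negligibly small.
    """
    lo = mu - half_width * sigma
    dx = (2.0 * half_width * sigma) / steps
    total = sum(gaussian_pdf(lo + (i + 0.5) * dx, mu, sigma) for i in range(steps))
    return total * dx


area = area_under_curve(mu=0.0, sigma=1.0)

# Symmetry: points the same distance left and right of mu get the same density.
left = gaussian_pdf(-1.5, 0.0, 1.0)
right = gaussian_pdf(1.5, 0.0, 1.0)

print(f"area ≈ {area:.6f}")
print(f"p(-1.5) = {left:.6f}, p(1.5) = {right:.6f}")
```

The area comes out very close to 1 for any choice of μ and σ, which is the probability requirement listed above.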

σ = 1 (μ = 0):

  • Standard normal distribution
  • Moderate width curve

σ = 0.5 (μ = 0):

  • Narrower curve (less variance)
  • Taller peak (area still = 1)
  • σ² = 0.25 (variance)

σ = 2 (μ = 0):

  • Wider curve (more variance)
  • Shorter peak (area still = 1)
  • σ² = 4 (variance)

Different μ values:

  • Changing μ shifts the distribution left or right
  • It does not change the shape or width
  • Width is still determined by σ
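
The effect of σ and μ on the curve can be made concrete by evaluating the density at its center. Since the exponent is zero at x = μ, the peak height is 1 / (σ√(2π)). The sketch below (standard library only; `gaussian_pdf` is an illustrative helper name) computes the peak for the three σ values discussed above and for a shifted mean.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Gaussian density p(x) with mean mu and standard deviation sigma."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# Peak height at the center (x = mu = 0) for each standard deviation.
peaks = {sigma: gaussian_pdf(0.0, 0.0, sigma) for sigma in (0.5, 1.0, 2.0)}
for sigma, peak in peaks.items():
    print(f"sigma = {sigma}: variance = {sigma ** 2}, peak height = {peak:.4f}")

# Moving the mean to 3 shifts the curve but leaves the peak height unchanged.
shifted_peak = gaussian_pdf(3.0, 3.0, 1.0)
print(f"mu = 3, sigma = 1: peak height = {shifted_peak:.4f}")
```

The peak heights are approximately 0.7979, 0.3989, and 0.1995 for σ = 0.5, 1, and 2: halving σ doubles the peak and doubling σ halves it, which is how the area stays equal to 1.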

Given a training set of m examples x⁽¹⁾, x⁽²⁾, …, x⁽ᵐ⁾, the parameters are estimated as follows:

μ = (1/m) * Σ(i=1 to m) x⁽ⁱ⁾

Calculation: Average of all training examples

σ² = (1/m) * Σ(i=1 to m) (x⁽ⁱ⁾ - μ)²

Calculation: Average of squared differences from mean

  • These formulas are the maximum likelihood estimates of μ and σ²
  • Some statistics classes use 1/(m-1) instead of 1/m, which gives the unbiased estimate of the variance
  • In practice, the difference between 1/m and 1/(m-1) is negligible unless m is very small
  • Using 1/m is more common in machine learning
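
The two estimation formulas above translate directly into code. This is a minimal sketch in plain Python; the function name `fit_gaussian` and the eight data points are made up for illustration. It also computes the 1/(m-1) variant so the two can be compared.

```python
def fit_gaussian(xs):
    """Return (mu, variance) using the 1/m formulas from the notes."""
    m = len(xs)
    mu = sum(xs) / m                           # average of all examples
    var = sum((x - mu) ** 2 for x in xs) / m   # average squared difference from mean
    return mu, var


# Made-up training examples, chosen so the arithmetic is easy to follow.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
m = len(xs)

mu, var_mle = fit_gaussian(xs)

# The 1/(m-1) variant rescales the same sum of squared differences.
var_unbiased = var_mle * m / (m - 1)

print(f"mu = {mu}")
print(f"variance with 1/m     = {var_mle:.4f}")
print(f"variance with 1/(m-1) = {var_unbiased:.4f}")
```

For this data the mean is 5 and the squared differences sum to 32, so the 1/m estimate is 32/8 = 4 and the 1/(m-1) estimate is 32/7 ≈ 4.57. With only 8 examples the gap is visible; the ratio between the two is m/(m-1), which approaches 1 as m grows.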

If you drew:

  • 100 numbers from this distribution → histogram approximates bell curve
  • 1,000 numbers → closer approximation
  • Infinitely many numbers with very fine bins → histogram converges to the exact bell curve

Interpreting p(x):

  • High p(x): Example likely normal (near center)
  • Low p(x): Example likely anomalous (far from center)
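
The claim that a histogram of samples approaches the bell curve as more numbers are drawn can be tried out directly. The sketch below uses `random.gauss` from the Python standard library; the seed, the sample sizes, the range [-4, 4], the 16 bins, and the helper names are all illustrative assumptions. It rescales each histogram so its bars have total area of about 1 and reports the largest gap between a bar and p(x) at the bar's center.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    """Gaussian density p(x) with mean mu and standard deviation sigma."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def histogram_density(samples, lo, hi, n_bins):
    """Histogram over [lo, hi) scaled to density units (count / (n * bin width))."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        if lo <= s < hi:
            # min() guards against a floating-point index equal to n_bins.
            counts[min(int((s - lo) / width), n_bins - 1)] += 1
    return [c / (len(samples) * width) for c in counts], width


random.seed(0)  # fixed seed so repeated runs give the same numbers
mu, sigma = 0.0, 1.0
lo, hi, n_bins = -4.0, 4.0, 16

errors = {}
for n in (100, 1000, 100000):
    samples = [random.gauss(mu, sigma) for _ in range(n)]
    density, width = histogram_density(samples, lo, hi, n_bins)
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    # Largest gap between a histogram bar and the curve at the bar's center.
    errors[n] = max(abs(d - gaussian_pdf(c, mu, sigma)) for d, c in zip(density, centers))
    print(f"n = {n}: largest gap to p(x) = {errors[n]:.4f}")
```

With a fixed bin width the gap does not shrink all the way to zero, because each bar averages p(x) over its bin; the exact curve is recovered only as the bins also become finer.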

With fitted Gaussian distribution:

  • Example near center: High probability, considered normal
  • Example far from center: Low probability, considered anomalous
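
Putting the pieces together, a fitted Gaussian can be used to flag examples whose p(x) is low. The sketch below is one way to do this, under stated assumptions: the training values are made up, and the cutoff `epsilon` is a hypothetical threshold introduced here for illustration (the notes do not specify how to choose it).

```python
import math


def fit_gaussian(xs):
    """Estimate mu and sigma with the 1/m formulas."""
    m = len(xs)
    mu = sum(xs) / m
    var = sum((x - mu) ** 2 for x in xs) / m
    return mu, math.sqrt(var)


def gaussian_pdf(x, mu, sigma):
    """Gaussian density p(x) with mean mu and standard deviation sigma."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# Made-up measurements representing normal behavior.
train = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
mu, sigma = fit_gaussian(train)

# Hypothetical threshold: examples with p(x) below it are flagged.
epsilon = 0.05


def is_anomalous(x):
    """True when x falls in a low-density region of the fitted Gaussian."""
    return gaussian_pdf(x, mu, sigma) < epsilon


for x in (10.1, 11.0):
    label = "anomalous" if is_anomalous(x) else "normal"
    print(f"x = {x}: p(x) = {gaussian_pdf(x, mu, sigma):.6f} -> {label}")
```

Here 10.1 sits near the center and is treated as normal, while 11.0 is more than five standard deviations from the mean and is flagged. Note that p(x) is a density, not a probability, so values near the center can exceed 1 when σ is small.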

Understanding the Gaussian distribution is essential for anomaly detection as it provides a principled way to model normal behavior and identify deviations from expected patterns.