Share

N-Body Simulation with CuPy: A GPU-Accelerated Approach

by nowrelated · June 2, 2025

In this project, we’ll use CuPy to simulate an N-body gravitational system, where multiple particles interact under gravity. This is a classic physics problem that scales poorly with the number of particles on a CPU due to its (O(N^2)) complexity. CuPy’s GPU acceleration shines here, speeding up the pairwise force calculations. We’ll walk through the simulation, compare CPU and GPU performance, and visualize the particle motion.


1. Why This Project?

CuPy excels at parallelizing array-based computations, making it ideal for tasks like N-body simulations. Each particle’s force depends on its interaction with every other particle, requiring millions of calculations for large (N). By leveraging the GPU, we can simulate thousands of particles efficiently—a practical showcase of CuPy’s power beyond fractals.


2. Project Overview: N-Body Simulation

We’ll simulate (N) particles in 2D space under Newtonian gravity. For each particle:

  • Compute the gravitational force from all other particles.
  • Update its velocity and position over time.

The gravitational force between two particles (i) and (j) is:

[ F_{ij} = G \frac{m_i m_j}{r_{ij}^2} ]

where (G) is the gravitational constant, (m_i) and (m_j) are masses, and (r_{ij}) is the distance. We’ll use a simple Euler integration to update positions.


3. Step 1: Setting Up the System

Let’s initialize (N = 1000) particles with random positions, velocities, and masses in a 2D space:

import cupy as cp
import numpy as np

# Parameters
N = 1000          # Number of particles
G = 6.674e-11     # Gravitational constant (scaled for simulation)
dt = 0.01         # Time step
softening = 0.1   # Avoid division by zero

# Random initial conditions on GPU
pos = cp.random.uniform(-1, 1, (N, 2))    # x, y positions
vel = cp.random.uniform(-0.1, 0.1, (N, 2)) # x, y velocities
mass = cp.random.uniform(1, 5, N)          # Masses

The softening parameter prevents singularities when particles get too close.


4. Step 2: CPU Implementation with NumPy

For a baseline, here’s how we’d compute forces and update positions on the CPU:

# Initial conditions on CPU
pos_np = np.random.uniform(-1, 1, (N, 2))
vel_np = np.random.uniform(-0.1, 0.1, (N, 2))
mass_np = np.random.uniform(1, 5, N)

# Compute pairwise forces
def compute_forces_np(pos, mass):
    N = len(pos)
    force = np.zeros((N, 2))
    for i in range(N):
        for j in range(N):
            if i != j:
                r = pos[j] - pos[i]
                dist = np.sqrt(np.sum(r**2) + softening**2)
                force[i] += G * mass[i] * mass[j] * r / dist**3
    return force

# Simulation step
force_np = compute_forces_np(pos_np, mass_np)
acc_np = force_np / mass_np[:, None]
vel_np += acc_np * dt
pos_np += vel_np * dt

This nested loop is painfully slow for large (N)—perfect for highlighting GPU benefits.


5. Step 3: GPU Implementation with CuPy

Now, let’s vectorize this with CuPy to run on the GPU:

def compute_forces_cp(pos, mass):
    # Differences in position
    dx = pos[:, 0][:, None] - pos[:, 0][None, :]
    dy = pos[:, 1][:, None] - pos[:, 1][None, :]
    r_squared = dx**2 + dy**2 + softening**2
    r_inv = 1 / cp.sqrt(r_squared)
    r_inv_cube = r_inv**3
    
    # Avoid self-interaction
    cp.fill_diagonal(r_inv_cube, 0)
    
    # Force components
    force_x = G * dx * r_inv_cube * mass[:, None] * mass[None, :]
    force_y = G * dy * r_inv_cube * mass[:, None] * mass[None, :]
    
    # Sum forces for each particle
    force = cp.stack((cp.sum(force_x, axis=1), cp.sum(force_y, axis=1)), axis=1)
    return force

# Simulation step
force_cp = compute_forces_cp(pos, mass)
acc_cp = force_cp / mass[:, None]
vel += acc_cp * dt
pos += vel * dt

This version avoids loops by using broadcasting and matrix operations, fully leveraging GPU parallelism.


6. Step 4: Performance Comparison

For (N = 1000):

  • CPU (NumPy): ~10 seconds per step
  • GPU (CuPy): ~0.05 seconds per step

That’s a 200x speedup! The GPU handles the (N \times N) force matrix efficiently, while the CPU struggles with the nested loops.


7. Step 5: Visualizing the Simulation

Let’s animate a few steps using Matplotlib:

import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Transfer initial data to CPU for plotting
pos_history = [cp.asnumpy(pos)]

# Simulate 100 steps
for _ in range(99):
    force_cp = compute_forces_cp(pos, mass)
    acc_cp = force_cp / mass[:, None]
    vel += acc_cp * dt
    pos += vel * dt
    pos_history.append(cp.asnumpy(pos))

# Animation
fig, ax = plt.subplots()
scatter = ax.scatter(pos_history[0][:, 0], pos_history[0][:, 1], s=5)

def update(frame):
    scatter.set_offsets(pos_history[frame])
    return scatter,

ani = FuncAnimation(fig, update, frames=100, interval=50, blit=True)
ax.set_xlim(-2, 2)
ax.set_ylim(-2, 2)
plt.show()

This creates a simple animation of particles moving under gravity.


8. What We’ve Built and Next Steps

We’ve built an N-body simulator that uses CuPy to accelerate gravitational force calculations, achieving massive performance gains over a CPU implementation. The steps—setup, CPU baseline, GPU optimization, and visualization—offer a practical workflow for GPU computing.

To extend this:

  • Add collision detection or energy conservation.
  • Scale to 3D for a more realistic simulation.
  • Experiment with larger (N) to push GPU limits.

You may also like