Matrix Multiplication in Ruby

Ruby has a built-in Matrix class, which you should use in any serious application. In this article, we discuss how matrix multiplication can be performed efficiently. However, because our code is written in pure Ruby, even a very good implementation will run more slowly than the Matrix class.
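For reference, here is a minimal sketch of the built-in class in action (Matrix ships with Ruby's standard library and must be required first):

```ruby
require 'matrix'

# Matrix objects are immutable; * performs true matrix multiplication.
a = Matrix[[1, 2], [3, 4]]
b = Matrix[[5, 6], [7, 8]]
product = a * b
product.to_a  # => [[19, 22], [43, 50]]
```

We will compute this same product with our own code below, so the results can be compared.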

For the sake of simplicity, we confine ourselves to the multiplication of square matrices.

Representation of Matrices

We shall represent matrices as 2-dimensional arrays: each element of the outer array represents a row of the matrix.

We shall denote the element in the \(i\)th row and \(j\)th column of a matrix called \(C\) as \(c_{ij}\).
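For example, the 2 × 2 matrix with rows (1, 2) and (3, 4) is represented as follows. Note that the mathematical notation is 1-based while Ruby arrays are 0-based, so \(c_{ij}\) corresponds to `c[i-1][j-1]`:

```ruby
c = [[1, 2],
     [3, 4]]

# The element in the 1st row and 2nd column:
c[0][1]  # => 2
```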

The Naive Method

We can find the product \(C\) of two \(n \times n\) matrices \(A\) and \(B\) using simple row-by-column multiplication:
\[ c_{ij} = \sum\limits_{k=1}^n a_{ik} \cdot b_{kj} \]

def naive_matrix_mult(a, b)
  n = a.length
  # Create a new n x n matrix with all elements initialized to 0.
  c = Array.new(n) { Array.new(n, 0) }
  0.upto(n-1) do |i|
    0.upto(n-1) do |j|
      0.upto(n-1) do |k|
        c[i][j] += a[i][k] * b[k][j]
      end
    end
  end
  c
end

naive_matrix_mult([[1, 2], [3, 4]], [[5, 6], [7, 8]])
                       # => [[19, 22], [43, 50]]

Clearly, this will take \(\Theta(n^3)\) time to multiply two \(n \times n\) matrices.

Divide and Conquer

Another approach is to try to use the divide-and-conquer paradigm, and split each of the matrices \(A\), \(B\) and \(C\) into four \( \frac n2 \times \frac n2 \) matrices:

\[ A = \left( \begin{array}{cc} A_{11} & A_{12} \\\\ A_{21} & A_{22} \end{array} \right), \]\[ B = \left( \begin{array}{cc} B_{11} & B_{12} \\\\ B_{21} & B_{22} \end{array} \right), \]\[ C = \left( \begin{array}{cc} C_{11} & C_{12} \\\\ C_{21} & C_{22} \end{array} \right). \]

We can rewrite \( C=A \cdot B \) as

\[ \left( \begin{array}{cc} C_{11} & C_{12} \\\\ C_{21} & C_{22} \end{array} \right) =
\left( \begin{array}{cc} A_{11} & A_{12} \\\\ A_{21} & A_{22} \end{array} \right) \cdot
\left( \begin{array}{cc} B_{11} & B_{12} \\\\ B_{21} & B_{22} \end{array} \right). \]

Hence, we get the following four equations:

\[ C_{11} = A_{11} \cdot B_{11} + A_{12} \cdot B_{21}, \]\[ C_{12} = A_{11} \cdot B_{12} + A_{12} \cdot B_{22}, \]\[ C_{21} = A_{21} \cdot B_{11} + A_{22} \cdot B_{21}, \]\[ C_{22} = A_{21} \cdot B_{12} + A_{22} \cdot B_{22}. \]

This reduces the multiplication of two \(n \times n\) matrices to eight multiplications of two \(\frac n2 \times \frac n2\) matrices and four \(\frac n2 \times \frac n2\) matrix additions.

Each addition will take \(\Theta(n^2)\) time. If the time taken by this method to multiply two \(n \times n\) matrices is \(T(n)\), then we get the recurrence:

\[ T(n) = 8 T \left( \frac n2 \right) + \Theta(n^2). \]

By case 1 of the master theorem (since \(\log_2 8 = 3\) and \(n^2 = O(n^{3 - \epsilon})\)), the solution to this recurrence is \(\Theta(n^3)\), meaning that the divide-and-conquer method is no better than the naive method.

def partition(a)
  # Split a into its four quadrants.
  n = a.length
  lim = (n-1)/2
  a_11 = a[0..lim].map { |x| x[0..lim] }
  a_12 = a[0..lim].map { |x| x[(lim+1)..(n-1)] }
  a_21 = a[(lim+1)..(n-1)].map { |x| x[0..lim] }
  a_22 = a[(lim+1)..(n-1)].map { |x| x[(lim+1)..(n-1)] }
  [a_11, a_12, a_21, a_22]
end

def combine(a_11, a_12, a_21, a_22)
  # Reassemble four quadrants into a single matrix.
  n1 = a_11.length
  n2 = a_21.length
  c = Array.new(n1 + n2)
  for i in 0..(n1-1)
    c[i] = a_11[i] + a_12[i]   # Array#+ concatenates the row halves.
  end
  for i in n1..(n1+n2-1)
    c[i] = a_21[i-n1] + a_22[i-n1]
  end
  c
end

def add(a, b)
  # Element-wise addition of two n x n matrices.
  n = a.length
  c = Array.new(n) { Array.new(n, 0) }
  for i in 0..(n-1)
    for j in 0..(n-1)
      c[i][j] = a[i][j] + b[i][j]
    end
  end
  c
end

def recursive_matrix_mult(a, b)
  # Assumes a and b are square, with size a power of 2,
  # so that every quadrant produced by partition is square.
  n = a.length
  return [[]] if n == 0
  return [[a[0][0] * b[0][0]]] if n == 1
  a_11, a_12, a_21, a_22 = *partition(a)
  b_11, b_12, b_21, b_22 = *partition(b)
  c_11 = add(recursive_matrix_mult(a_11, b_11),
             recursive_matrix_mult(a_12, b_21))
  c_12 = add(recursive_matrix_mult(a_11, b_12),
             recursive_matrix_mult(a_12, b_22))
  c_21 = add(recursive_matrix_mult(a_21, b_11),
             recursive_matrix_mult(a_22, b_21))
  c_22 = add(recursive_matrix_mult(a_21, b_12),
             recursive_matrix_mult(a_22, b_22))
  combine(c_11, c_12, c_21, c_22)
end

recursive_matrix_mult([[1, 2], [3, 4]], [[5, 6], [7, 8]])
                       # => [[19, 22], [43, 50]]

Other Methods

Strassen’s algorithm also involves breaking matrices into smaller pieces. It can multiply two \(n \times n\) matrices in \(\Theta(n^{\lg 7}) \approx \Theta(n^{2.807})\) time. However, it is far more cumbersome to implement, and is not as numerically stable. Strassen’s method is used when numerical stability is not critical and the matrices are large enough to provide a speed advantage. (The constant factor is large enough that for smaller matrices the naive method will work better.) Implementations of Strassen’s method will typically have a crossover point, below which matrices are multiplied using the naive method.
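For completeness, here are the standard Strassen formulas (we omit a Ruby implementation). Using the same block partitioning as above, seven \(\frac n2 \times \frac n2\) products suffice:

\[ M_1 = (A_{11} + A_{22})(B_{11} + B_{22}), \]\[ M_2 = (A_{21} + A_{22}) B_{11}, \]\[ M_3 = A_{11} (B_{12} - B_{22}), \]\[ M_4 = A_{22} (B_{21} - B_{11}), \]\[ M_5 = (A_{11} + A_{12}) B_{22}, \]\[ M_6 = (A_{21} - A_{11})(B_{11} + B_{12}), \]\[ M_7 = (A_{12} - A_{22})(B_{21} + B_{22}), \]

from which the quadrants of \(C\) are recovered by additions alone:

\[ C_{11} = M_1 + M_4 - M_5 + M_7, \]\[ C_{12} = M_3 + M_5, \]\[ C_{21} = M_2 + M_4, \]\[ C_{22} = M_1 - M_2 + M_3 + M_6. \]

Replacing eight recursive multiplications with seven changes the recurrence to

\[ T(n) = 7\, T \left( \frac n2 \right) + \Theta(n^2), \]

whose solution, again by the master theorem, is \(\Theta(n^{\lg 7})\).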

The current theoretical optimum is the Coppersmith-Winograd algorithm which runs in \(O(n^{2.376})\) time. It is almost never used in practice because the constant factor is so large that matrices of the size required to obtain an advantage over Strassen’s method cannot be processed by current hardware.