---
title: "Exploring Finite Fields, Part 4: The Power of Forgetting"
description: |
  ...
format:
  html:
    html-math-method: katex
jupyter: python3
date: "2024-02-03"
date-modified: "2025-07-20"
categories:
  - algebra
  - finite field
  - haskell
---

The [last post](../3) in this series focused on understanding some small linear groups
and implementing them on the computer, over both a prime field and a prime power field.

The prime power case was particularly interesting.
First, we adjoined the roots of a polynomial to the base field, GF(2).
Rather than the traditional means of adding new symbols like *α*, we used companion matrices,
which behave the same arithmetically.
For example, for the smallest prime power field, GF(4), we use the polynomial $p(x) = x^2 + x + 1$
and map its symbolic roots (*α* and *α*^2^) to matrices over GF(2):

$$
\begin{gather*}
f : \mathbb{F}_4 \longrightarrow \mathbb{F}_2 {}^{2 \times 2}
\\ \\
\begin{gather*}
f(0) = {\bf 0} =
\left(\begin{matrix} 0 & 0 \\ 0 & 0 \end{matrix}\right)
& f(1) = I
= \left(\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}\right)
\\
f(\alpha) = C_p
= \left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix}\right)
& f(\alpha^2) = C_p {}^2
= \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix}\right)
\end{gather*}
\\ \\
f(a + b) = f(a) + f(b), \quad f(ab) = f(a)f(b)
\end{gather*}
$$

Finally, we constructed GL(2, 4) using matrices of matrices
-- not [block matrices](https://en.wikipedia.org/wiki/Block_matrix)!
This post will focus on studying this method in slightly more detail.
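The representation above is easy to spot-check on the computer. A minimal numpy sketch (the names `O`, `I`, `C` are mine, not from the post's Haskell library), verifying that the companion matrix really behaves like *α*:

```python
import numpy as np

# Images of GF(4) under f, as 2x2 matrices over GF(2)
O = np.zeros((2, 2), dtype=int)
I = np.eye(2, dtype=int)
C = np.array([[0, 1], [1, 1]])   # f(alpha): companion matrix of p(x) = x^2 + x + 1
C2 = C @ C % 2                   # f(alpha^2)

# alpha is a root of p, so alpha^2 + alpha + 1 = 0 and alpha^3 = 1
assert ((C2 + C + I) % 2 == O).all()
assert ((C @ C2) % 2 == I).all()
```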


Reframing the Path Until Now
----------------------------

In the above description, we already mentioned larger structures over GF(2),
namely polynomials and matrices.
Since GF(4) can itself be described with matrices over GF(2),
we can generalize *f* to give us two more maps:

- $f^*$, which converts matrices over GF(4) to double-layered matrices over GF(2), and
- $f^\bullet$, which converts polynomials over GF(4) to polynomials of matrices over GF(2)


### Matrix Map

We examined the former map briefly in the previous post.
More explicitly, we looked at a matrix *B* in SL(2, 4) with the property
that it is cyclic of order five.
Then, to work with it without relying on symbols, we simply applied *f* over the contents of the matrix.

$$
\begin{gather*}
f^* : \mathbb{F}_4 {}^{2 \times 2}
\longrightarrow
(\mathbb{F}_2 {}^{2 \times 2})^{2 \times 2}
\\[10pt]
B = \left(\begin{matrix}
0 & \alpha \\
\alpha^2 & \alpha^2
\end{matrix} \right)
\\
B^* = f^*(B)
= \left(\begin{matrix}
f(0) & f(\alpha) \\
f(\alpha^2) & f(\alpha^2)
\end{matrix} \right)
= \left(\begin{matrix}
\left(\begin{matrix} 0 & 0 \\ 0 & 0 \end{matrix} \right)
& \left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix} \right)
\\
\left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right)
& \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right)
\end{matrix} \right)
\end{gather*}
$$

We can do this because a matrix contains values in the domain of *f*, thus uniquely determining
a way to change the internal structure (what Haskell calls
a [functor](https://wiki.haskell.org/Functor)).
Furthermore, due to the properties of *f*, applying it and $f^*$ commutes with taking the determinant,
as shown by the following diagram:

$$
\begin{gather*}
f(\det(B)) = f(1) = I = \det(B^*) = \det(f^*(B))
\\[10pt]
\begin{CD}
\mathbb{F}_4 {}^{2 \times 2}
@>{\det}>>
\mathbb{F}_4
\\
@V{f^*}VV ~ @VV{f}V
\\
(\mathbb{F}_2 {}^{2 \times 2})^{2 \times 2}
@>>{\det}>
\mathbb{F}_2 {}^{2 \times 2}
\end{CD}
\end{gather*}
$$

It should be noted that the determinant strips off the *outer* matrix.
We could also consider the map **det**\*, where we apply the determinant
to the internal matrices (in Haskell terms, `fmap det`).
This map isn't as nice, though, since:

$$
\begin{align*}
\det {}^*(B^*)
&= \left(\begin{matrix}
\det \left(\begin{matrix} 0 & 0 \\ 0 & 0 \end{matrix} \right)
& \det \left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix} \right)
\\
\det \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right)
& \det \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right)
\end{matrix} \right)
= \left(\begin{matrix}
0 & 1 \\
1 & 1
\end{matrix} \right)
\\ \\
&\neq \left(\begin{matrix}
1 & 0 \\
0 & 1
\end{matrix} \right)
= \det(B^*)
\end{align*}
$$
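Both determinants can be spot-checked in a few lines of numpy (a sketch with names of my own choosing, not the post's Haskell code). The outer determinant is computed as $ad - bc$, which is legitimate here because the entries are all powers of $C_p$ and therefore commute:

```python
import numpy as np

O = np.zeros((2, 2), dtype=int)
I = np.eye(2, dtype=int)
C = np.array([[0, 1], [1, 1]])   # f(alpha)
C2 = C @ C % 2                   # f(alpha^2)

# B* as a 2x2 grid of 2x2 blocks: [[f(0), f(alpha)], [f(alpha^2), f(alpha^2)]]
blocks = [[O, C], [C2, C2]]
a, b, c, d = blocks[0][0], blocks[0][1], blocks[1][0], blocks[1][1]

# Outer determinant ad - bc over the (commutative) ring generated by C
det_outer = (a @ d - b @ c) % 2
assert (det_outer == I).all()    # det(B*) = f(det B) = f(1) = I

def det2(m):
    # 2x2 determinant over GF(2)
    return (m[0, 0] * m[1, 1] - m[0, 1] * m[1, 0]) % 2

# det*, the entrywise determinant, lands somewhere else entirely
det_star = np.array([[det2(m) for m in row] for row in blocks])
assert (det_star == C).all()     # [[0,1],[1,1]], not the identity
```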


### Polynomial Map

Much like how we can change the internal structure of matrices, we can do the same for polynomials.
For the purposes of demonstration, we'll work with $b = \lambda^2 + \alpha^2 \lambda + 1$,
the characteristic polynomial of *B*, since it has coefficients in the domain of *f*.
We define the extended map $f^\bullet$ as:

$$
\begin{gather*}
f^{\bullet} : \mathbb{F}_4[\lambda] \longrightarrow
\mathbb{F}_2 {}^{2 \times 2}[\Lambda]
\\
f^{\bullet} (\lambda) = \Lambda \qquad
f^{\bullet}(a) = f(a), \quad a \in \mathbb{F}_4
\\ \\
\begin{align*}
b^{\bullet}
= f^{\bullet}(b)
&= f^{\bullet}(\lambda^2)
&&+&& f^{\bullet}(\alpha^2)f^{\bullet}(\lambda)
&&+&& f^{\bullet}(1)
\\
&= \Lambda^2
&&+&& \left(\begin{matrix} 1 & 1 \\ 1 & 0\end{matrix}\right) \Lambda
&&+&& \left(\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}\right)
\end{align*}
\end{gather*}
$$

Since we're looking at the characteristic polynomial of *B*, we might as well also look
at the characteristic polynomial of *B*\*, its image under $f^*$.
We already looked at the determinant of this matrix, which is the constant term
of the characteristic polynomial (up to sign).
Therefore, it's probably not surprising that $f^\bullet$ and the characteristic polynomial commute
in a similar fashion to the determinant.

$$
\begin{gather*}
\begin{align*}
b^*
&= \text{charpoly}(f^*(B))
= \text{charpoly}
\left(\begin{matrix}
\left(\begin{matrix} 0 & 0 \\ 0 & 0 \end{matrix} \right) &
\left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix} \right) \\
\left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right) &
\left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right)
\end{matrix} \right)
\\
&= \Lambda^2 +
\left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right) \Lambda +
\left(\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix} \right)
= f^{\bullet}(\text{charpoly}(B))
= b^\bullet
\end{align*}
\\ \\
\begin{CD}
\mathbb{F}_4 {}^{2 \times 2}
@>{\text{charpoly}}>>
\mathbb{F}_4[\lambda]
\\
@V{f^*}VV ~ @VV{f^\bullet}V
\\
(\mathbb{F}_2 {}^{2 \times 2})^{2 \times 2}
@>>{\text{charpoly}}>
(\mathbb{F}_2 {}^{2 \times 2})[\Lambda]
\end{CD}
\end{gather*}
$$
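Cayley-Hamilton gives a concrete way to test this: *B*\* must be a root of its own characteristic polynomial. A numpy sketch (names mine), reading the matrix of matrices as an ordinary block matrix just to carry out the arithmetic -- the scalar coefficient $f(\alpha^2)$ acts blockwise, i.e., as a block-diagonal matrix:

```python
import numpy as np

O = np.zeros((2, 2), dtype=int)
C = np.array([[0, 1], [1, 1]])
C2 = C @ C % 2

# B* flattened block by block into a 4x4 matrix over GF(2)
Bhat = np.block([[O, C], [C2, C2]])

# b* = Lambda^2 + f(alpha^2) Lambda + f(1); substituting B* for Lambda
# must give zero, where f(alpha^2) acts blockwise (block-diagonal)
coeff = np.block([[C2, O], [O, C2]])
lhs = (Bhat @ Bhat + coeff @ Bhat + np.eye(4, dtype=int)) % 2
assert (lhs == 0).all()
```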

It should also be mentioned that **charpoly**\*, taking the characteristic polynomials
of the internal matrices, does *not* obey the same relationship.
For one, the type is wrong: the codomain is a matrix *containing* polynomials,
rather than a polynomial over matrices.

There *does* happen to be an isomorphism between the two structures
(one direction of which we'll discuss momentarily).
But even after converting to the proper type, we already have a counterexample in the constant term
from taking **det**\* earlier.

$$
\begin{align*}
\text{charpoly}^*(B^*)
&= \left(\begin{matrix}
\text{charpoly} \left(\begin{matrix} 0 & 0 \\ 0 & 0 \end{matrix} \right) &
\text{charpoly} \left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix} \right) \\
\text{charpoly} \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right) &
\text{charpoly} \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right)
\end{matrix} \right)
\\
&= \left(\begin{matrix}
\lambda^2 & \lambda^2 + \lambda + 1 \\
\lambda^2 + \lambda + 1 & \lambda^2 + \lambda + 1
\end{matrix} \right)
\\
&\cong
\left(\begin{matrix} 1 & 1 \\ 1 & 1 \end{matrix} \right) \Lambda^2
+ \left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix} \right) \Lambda
+ \left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix} \right)
\\ \\
&\neq f^{\bullet}(\text{charpoly}(B))
\end{align*}
$$


Forgetting
----------

Clearly, layering matrices has several advantages over how we usually interpret block matrices.
But what happens if we *do* "forget" about the internal structure?

$$
\begin{gather*}
\text{forget} : (\mathbb{F}_2 {}^{2 \times 2})^{2 \times 2}
\longrightarrow \mathbb{F}_2 {}^{4 \times 4}
\\ \\
\hat B = \text{forget}(B^*)
= \text{forget}\left(\begin{matrix}
\left(\begin{matrix} 0 & 0 \\ 0 & 0 \end{matrix} \right)
& \left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix} \right)
\\
\left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right)
& \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right)
\end{matrix} \right)
= \left(\begin{matrix}
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 \\
1 & 0 & 1 & 0
\end{matrix} \right)
\end{gather*}
$$

<details>
<summary>
Haskell implementation of `forget`
</summary>

<!-- TODO: run in jupyter -->
```{.haskell}
forget :: Matrix (Matrix a) -> Matrix a
-- Massively complicated point-free way to forget matrices:
-- 1. Convert internal matrices to lists of lists
-- 2. Convert the external matrix to a list of lists
-- 3. There are now four layers of lists. Transpose the second and third.
-- 4. Concat the new third and fourth layers together
-- 5. Concat the first and second layers together
-- 6. Convert the list of lists back to a matrix
forget = toMatrix . concat . fmap (fmap concat . transpose) .
         fromMatrix . fmap fromMatrix
```

To see why this is the structure, remember that we need to work with rows
of the external matrix at the same time.
We'd like to read across the whole row, but this involves descending into two matrices.
The `transpose` (applied under the first `fmap`) allows us to collect rows in the way we expect.
For example, for the above matrix, we get `[[[0,0],[0,1]], [[0,0],[1,1]]]` after the transposition,
which are the first two rows, grouped by the matrix they belonged to.
Then, we can finally get the desired rows by `fmap (fmap concat)`ing each group together.
Finally, we `concat` once more to undo the outer grouping.
</details>

Like *f*, `forget` preserves addition and multiplication, a fact already familiar from block matrices.
Further, by *f*, the internal matrices multiply the same way as elements of GF(4).
Hence, this shows us directly that GL(2, 4) is a subgroup of GL(4, 2).
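Part of that claim is easy to see numerically: since forgetting preserves multiplication, $\hat B$ must inherit *B*'s order five inside GL(4, 2). A numpy sketch (names mine):

```python
import numpy as np

O = np.zeros((2, 2), dtype=int)
C = np.array([[0, 1], [1, 1]])
C2 = C @ C % 2

Bhat = np.block([[O, C], [C2, C2]])    # forget(B*)

# Track which powers of Bhat hit the identity in GL(4, 2)
I4 = np.eye(4, dtype=int)
P = I4
hits = []
for k in range(1, 6):
    P = P @ Bhat % 2
    hits.append(bool((P == I4).all()))

# Only the fifth power returns to the identity: order five, same as B
assert hits == [False, False, False, False, True]
```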

However, an obvious difference between layered and "forgotten" matrices lies in
the determinant and characteristic polynomial:

$$
\begin{align*}
\det {B^*} &= \left(\begin{matrix}1 & 0 \\ 0 & 1\end{matrix}\right)
\\ \\
\det {\hat B} &= 1
\end{align*}
\qquad
\begin{align*}
\text{charpoly}(B^*)
&= \Lambda^2 +
\left(\begin{matrix}1 & 1 \\ 1 & 0 \end{matrix}\right)\Lambda +
\left(\begin{matrix}1 & 0 \\ 0 & 1\end{matrix}\right)
\\ \\
\text{charpoly}(\hat B)
&= \lambda^4 + \lambda^3 + \lambda^2 + \lambda + 1
\end{align*}
$$


### Another Path to the Forgotten

It's a relatively simple matter to move between the determinants, since it's straightforward
to identify 1 with the identity matrix.
However, a natural question to ask is whether there's a way to reconcile or coerce
the matrix polynomial into the "forgotten" one.

<!-- TODO: reorganize parts of second post? -->
First, let's formally establish a path from matrix polynomials to matrices of polynomials.
We need only use our friend from the [second post](../2) -- polynomial evaluation.
Simply evaluating a matrix polynomial at *λI* converts our matrix indeterminate (*Λ*) into a scalar one (*λ*).

$$
\begin{align*}
\text{eval}_{\Lambda \mapsto \lambda I}
&: (\mathbb{F}_2 {}^{2 \times 2})[\Lambda]
\rightarrow (\mathbb{F}_2[\lambda]) {}^{2 \times 2}
\\
&:: \quad
r(\Lambda) \mapsto r(\lambda I)
\\ \\
\text{eval}_{\Lambda \mapsto \lambda I}(\text{charpoly}(B^*))
&= (\lambda I)^2
+ \left(\begin{matrix}1 & 1 \\ 1 & 0 \end{matrix}\right)(\lambda I)
+ \left(\begin{matrix}1 & 0 \\ 0 & 1\end{matrix}\right)
\\
&= \left(\begin{matrix}
\lambda^2 + \lambda + 1 & \lambda \\
\lambda & \lambda^2 + 1
\end{matrix}\right)
\end{align*}
$$

Since a matrix containing polynomials is still a matrix, we can then take its determinant.
What pops out is exactly what we were after...

$$
\begin{align*}
\det(\text{eval}_{\Lambda \mapsto \lambda I}(\text{charpoly}(B^*)))
&= (\lambda^2 + \lambda + 1)(\lambda^2 + 1) - \lambda^2
\\
&= \lambda^4 + \lambda^3 + \lambda^2 + \lambda + 1
\\
&= \text{charpoly}(\hat B)
\end{align*}
$$

...and we can arrange our maps into another diagram:

$$
\begin{gather*}
\begin{CD}
(\mathbb{F}_2 {}^{2 \times 2})^{2 \times 2}
@>{\text{charpoly}}>>
(\mathbb{F}_2 {}^{2 \times 2})[\Lambda]
\\
@V{\text{id}}VV ~ @VV{\text{eval}_{\Lambda \mapsto \lambda I}}V
\\
-
@. (\mathbb{F}_2 [\lambda])^{2 \times 2}
\\
@V{\text{forget}}VV ~ @VV{\det}V
\\
\mathbb{F}_2 {}^{4 \times 4}
@>>{\text{charpoly}}>
\mathbb{F}_2[\lambda]
\end{CD}
\\ \\
\text{charpoly} \circ \text{forget}
= \det \circ ~\text{eval}_{\Lambda \mapsto \lambda I} \circ \text{charpoly}
\end{gather*}
$$
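Both legs of this diagram can be cross-checked with plain numpy, without any polynomial-over-a-ring machinery. The right leg is just coefficient arithmetic; for the left leg, rather than recomputing the characteristic polynomial, we check (via Cayley-Hamilton) that $\hat B$ is a root of the claimed result. A consistency check rather than a proof, with names of my own choosing:

```python
import numpy as np

Bhat = np.array([[0, 0, 0, 1],
                 [0, 0, 1, 1],
                 [1, 1, 1, 1],
                 [1, 0, 1, 0]])

# Right leg: det of the evaluated matrix, (l^2+l+1)(l^2+1) - l^2,
# via coefficient convolution (coefficients listed low degree first)
p1 = np.array([1, 1, 1])          # lambda^2 + lambda + 1
p2 = np.array([1, 0, 1])          # lambda^2 + 1
det_poly = np.convolve(p1, p2)
det_poly[2] -= 1                  # subtract lambda^2
det_poly %= 2
assert det_poly.tolist() == [1, 1, 1, 1, 1]

# Left leg, indirectly: Bhat must satisfy its own characteristic
# polynomial, so I + Bhat + Bhat^2 + Bhat^3 + Bhat^4 = 0 (mod 2)
P = np.eye(4, dtype=int)
acc = np.zeros((4, 4), dtype=int)
for _ in range(5):
    acc = (acc + P) % 2
    P = P @ Bhat % 2
assert (acc == 0).all()
```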

<details>
<summary>
Haskell demonstration of this commutation
</summary>
Fortunately, the implementation of `charpoly` using Laplace expansion already works with numeric matrices.
Therefore, we need only define the special eval:

```{.haskell}
toMatrixPolynomial :: Num a => Polynomial (Matrix a) -> Matrix (Polynomial a)
-- Collect our coefficient matrices into a single matrix of polynomials
toMatrixPolynomial (Poly ps) = Mat $ array rs values where
  -- Technically, we're always working with square matrices, but we should
  -- always use the largest bounds available.
  (is,js) = unzip $ map mDims ps
  rs = ((0,0),(maximum is - 1,maximum js - 1))
  -- Address a matrix. This needs defaulting to zero to be fully correct
  -- with respect to the range given by `rs`
  access b (Mat m) = m!b
  -- Build the value at an address by addressing over the coefficients.
  -- ps is already in rising coefficient order, so our values are too.
  values = map (\r -> (r, Poly $ map (access r) ps)) (range rs)
```

Now we can simply observe:

<!-- TODO: run in jupyter -->
```{.haskell}
field4 = [zero 2, eye 2, toMatrix [[0,1],[1,1]], toMatrix [[1,1],[1,0]]]

mB = toMatrix [[field4!!0, field4!!2], [field4!!3, field4!!3]]

-- >>> mapM_ print $ fromMatrix $ forget mB
-- -- [0,0,0,1]
-- -- [0,0,1,1]
-- -- [1,1,1,1]
-- -- [1,0,1,0]

-- >>> fmap (`mod` 2) $ charpoly $ forget mB
-- -- 1x^4 + 1x^3 + 1x^2 + 1x + 1
-- >>> fmap (`mod` 2) $ determinant $ toMatrixPolynomial $ charpoly mB
-- -- 1x^4 + 1x^3 + 1x^2 + 1x + 1
```
</details>

It should be noted that we do *not* get the same result by taking the determinant after
applying **charpoly**\*, indicating that the above method is the "correct" one.

$$
\begin{align*}
\text{charpoly}^*(B^*) &= \left(\begin{matrix}
\lambda^2 & \lambda^2 + \lambda + 1 \\
\lambda^2 + \lambda + 1 & \lambda^2 + \lambda + 1
\end{matrix}\right)
\\ ~ \\
\det( \text{charpoly}^*(B^*))
&= \lambda^2(\lambda^2 + \lambda + 1) - (\lambda^2 + \lambda + 1)^2
\\
&= \lambda^3 + 1 \pmod 2
\end{align*}
$$


### Cycles and Cycles

Since we can get $\lambda^4 + \lambda^3 + \lambda^2 + \lambda + 1$ in two ways,
it's natural to assume this polynomial is significant in some way.
In the language of the second post, the polynomial can also be written as ~2~31,
whose root we determined was cyclic of order 5.
This happens to match the order of *B* in GL(2, 4).

Perhaps this is unsurprising, since there are only so many polynomials of degree 4 over GF(2).
However, the reason we see it becomes more obvious if we look at the powers of scalar multiples of *B*.
First, recall that *f*\* takes us from a matrix over GF(4) to a matrix of matrices over GF(2).
Then define a map *g* that gives us degree 4 polynomials:

::: {layout="[[1],[1,1,1]]"}
$$
\begin{gather*}
g : \mathbb{F}_4^{2 \times 2} \rightarrow \mathbb{F}_2[\lambda]
\\
g = \text{charpoly} \circ \text{forget} \circ f^*
\end{gather*}
$$

$$
\begin{array}{}
& \scriptsize \left(\begin{matrix}
0 & \alpha \\
\alpha^2 & \alpha^2
\end{matrix}\right)
\\
B & \overset{g}{\mapsto} & 11111_\lambda
\\
B^2 & \overset{g}{\mapsto} & 11111_\lambda
\\
B^3 & \overset{g}{\mapsto} & 11111_\lambda
\\
B^4 & \overset{g}{\mapsto} & 11111_\lambda
\\
B^5 & \overset{g}{\mapsto} & 10001_\lambda
\end{array}
$$

$$
\begin{array}{}
& \scriptsize \left(\begin{matrix}
0 & \alpha^2 \\
1 & 1
\end{matrix}\right)
\\
\alpha B & \overset{g}{\mapsto} & 10011_\lambda
\\
(\alpha B)^2 & \overset{g}{\mapsto} & 10011_\lambda
\\
(\alpha B)^3 & \overset{g}{\mapsto} & 11111_\lambda
\\
(\alpha B)^4 & \overset{g}{\mapsto} & 10011_\lambda
\\
(\alpha B)^5 & \overset{g}{\mapsto} & 10101_\lambda
\end{array}
$$

$$
\begin{array}{}
& \scriptsize \left(\begin{matrix}
0 & 1 \\
\alpha & \alpha
\end{matrix}\right)
\\
\alpha^2 B & \overset{g}{\mapsto} & 11001_\lambda
\\
(\alpha^2 B)^2 & \overset{g}{\mapsto} & 11001_\lambda
\\
(\alpha^2 B)^3 & \overset{g}{\mapsto} & 11111_\lambda
\\
(\alpha^2 B)^4 & \overset{g}{\mapsto} & 11001_\lambda
\\
(\alpha^2 B)^5 & \overset{g}{\mapsto} & 10101_\lambda
\end{array}
$$
:::

The matrices in the middle and rightmost columns both have order 15 inside GL(2, 4).
Correspondingly, both 10011~λ~ = ~2~19 and 11001~λ~ = ~2~25 are primitive,
and so have roots of order 15 over GF(2).
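The order-15 claim is quick to confirm on the forgotten side. A numpy sketch (the name `aBhat` is mine) computing the order of the image of *αB* in GL(4, 2):

```python
import numpy as np

O = np.zeros((2, 2), dtype=int)
I2 = np.eye(2, dtype=int)
C2 = np.array([[1, 1], [1, 0]])        # f(alpha^2)

# alpha*B = [[0, alpha^2], [1, 1]] maps under forget . f* to:
aBhat = np.block([[O, C2], [I2, I2]])

# Find the order of aBhat in GL(4, 2) by repeated multiplication mod 2
I4 = np.eye(4, dtype=int)
P = I4
order = None
for k in range(1, 16):
    P = P @ aBhat % 2
    if (P == I4).all():
        order = k
        break
assert order == 15
```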


### A Field?

Since we have 15 matrices generated by the powers of one, you might wonder whether
they can correspond to the nonzero elements of GF(16).
And they can!
In a sense, we've "borrowed" the order-15 elements from this "field" within GL(4, 2).
However, none of the powers of this matrix are the companion matrix of either ~2~19 or ~2~25.

<details>
<summary>
Haskell demonstration of the field-like-ness of these matrices
</summary>

All we really need to do is test additive closure, since the powers trivially commute and include the identity matrix.

<!-- TODO: run in jupyter -->
```{.haskell}
hasAdditiveClosure :: Integral a => Int -> a -> [Matrix a] -> Bool
-- Check whether n x n matrices (mod p) have additive closure
-- Supplement the zero matrix, even if it is not already present
hasAdditiveClosure n p xs = all (`elem` xs') sums where
  -- Add in the zero matrix
  xs' = zero n:xs
  -- Calculate all possible sums of pairs (mod p)
  sums = map (fmap (`mod` p)) $ (+) <$> xs' <*> xs'


generatesField :: Integral a => Int -> a -> Matrix a -> Bool
-- Generate the powers of x, then test if they form a field (mod p)
generatesField n p x = hasAdditiveClosure n p xs where
  xs = map (fmap (`mod` p) . (x^)) [1..p^n-1]

alphaB = toMatrix [[zero 2, field4!!3],[eye 2, eye 2]]

-- >>> mapM_ print $ fromMatrix $ forget alphaB
-- -- [0,0,1,1]
-- -- [0,0,1,0]
-- -- [1,0,1,0]
-- -- [0,1,0,1]
--
-- >>> generatesField 4 2 $ forget alphaB
-- -- True
```
</details>

More directly, we might also observe that *α*^2^*B* is the companion matrix of
an irreducible polynomial over GF(4), namely $q(x) = x^2 - \alpha x - \alpha$.
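This is another claim that can be checked on the forgotten side, since forgetting preserves the arithmetic: $q(\alpha^2 B) = 0$ should survive as a 4×4 identity over GF(2), with *α* acting blockwise. A numpy sketch (names mine; recall that in characteristic 2, subtraction is addition):

```python
import numpy as np

O = np.zeros((2, 2), dtype=int)
I2 = np.eye(2, dtype=int)
C = np.array([[0, 1], [1, 1]])    # f(alpha)

# alpha^2 * B = [[0, 1], [alpha, alpha]] maps under forget . f* to:
M = np.block([[O, I2], [C, C]])
A = np.block([[C, O], [O, C]])    # alpha acting as a blockwise scalar

# q(x) = x^2 - alpha x - alpha, so M^2 + A M + A = 0 (mod 2)
assert ((M @ M + A @ M + A) % 2 == 0).all()
```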

Both the "forgotten" matrices and the aforementioned companion matrices lie within GL(4, 2).
A natural question to ask is whether we can make fields by the following process:

1. Filter out all order-15 elements of GL(4, 2)
2. Partition the elements and their powers into their respective order-15 subgroups
3. Add the zero matrix into each class
4. Check whether all classes are additively closed (and are therefore fields)

In this case, it happens to be true, but proving this in general is difficult, and I haven't done so.
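The process can at least be spot-checked for a single class: the 15 powers of the image of *αB*, plus the zero matrix, should be 16 distinct matrices closed under addition mod 2. A numpy sketch (names mine):

```python
import numpy as np

O = np.zeros((2, 2), dtype=int)
I2 = np.eye(2, dtype=int)
C2 = np.array([[1, 1], [1, 0]])        # f(alpha^2)

aBhat = np.block([[O, C2], [I2, I2]])  # image of alpha*B, order 15

# The candidate copy of GF(16): zero plus the fifteen powers
elems = [np.zeros((4, 4), dtype=int)]
P = np.eye(4, dtype=int)
for _ in range(15):
    P = P @ aBhat % 2
    elems.append(P.copy())
assert len({e.tobytes() for e in elems}) == 16   # all distinct

# Step 4 of the process: additive closure mod 2
closed = all(any(((x + y) % 2 == z).all() for z in elems)
             for x in elems for y in elems)
assert closed
```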


Expanding Dimensions
--------------------

Of course, we need not focus only on GF(4) -- we can just as easily work over GL(2, 2^*r*^) for *r* other than 2.
In this case, the internal matrices will be *r*×*r* while the external one remains 2×2.
But neither do we have to work exclusively with 2×2 matrices -- we can work over GL(*n*, 2^*r*^).
In either circumstance, the "borrowing" of elements of larger order still occurs.
This is summarized by the following diagram:

$$
\begin{CD}
\underset{
\scriptsize S \text{ (order $k$)}
}{
\text{SL}(n,2^r)
}
@>>>
\underset{
\scriptsize
\begin{matrix}
S \text{ (order $k$)} \\
T \text{ (order $2^{nr}-1$)}
\end{matrix}
}{
\text{GL}(n, 2^r)
}
@>{\text{forget} \circ f_{r}^*}>>
{\text{GL}(nr, 2)}
@<{f_{nr}}<<
\underset{
\scriptsize
\begin{matrix}
s \text{ (order $k$)} \\
t \text{ (order $2^{nr}-1$)}
\end{matrix}
}{
\mathbb{F}_{2^{nr}}
}
\end{CD}
$$

Here, *f*~*r*~ is our map from GF(2^*r*^) to *r*×*r* matrices, and *f*~*nr*~ is the analogous map for GF(2^*nr*^).
*r* must be greater than 1 for us to properly make use of matrix arithmetic.
Similarly, *n* must be greater than 1 for the leftmost GL.
Thus, *nr* is a composite number.
Here, *k* is a proper factor of 2^*nr*^ - 1.
In the prior discussion, *k* was 5 and 2^*nr*^ - 1 was 15.

Recall that primitive polynomials of degree *nr* over GF(2) have roots of order 2^*nr*^ - 1.
Since *nr* is composite, this number can *never* be prime: the only primes of the form
2^*m*^ - 1 (the Mersenne primes) require *m* itself to be prime.
Thus, a GL of prime dimension can never borrow from a GL over a field
of larger order with the same characteristic.
Conversely, GL(*nr* + 1, 2) trivially contains GL(*nr*, 2) by fixing a subspace.
So we do eventually see elements of order 2^*m*^ - 1 for either prime or composite *m*.
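The divisibility fact behind this is that 2^*d*^ - 1 divides 2^*m*^ - 1 whenever *d* divides *m*, so a composite exponent forces a composite value. A small self-contained check (the helper `is_prime` is mine):

```python
def is_prime(n):
    # Trial-division primality test; fine for small n
    return n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))

# If m is composite, so is 2^m - 1: 2^d - 1 divides it for any d | m
for m in range(2, 30):
    if not is_prime(m):
        assert not is_prime(2**m - 1)

# The converse fails: a prime exponent does not guarantee a Mersenne prime
assert is_prime(11) and not is_prime(2**11 - 1)   # 2047 = 23 * 89
```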


### Other Primes

This concern about prime dimensions is unique to characteristic 2.
For any other prime *p*, *p*^*m*^ - 1 is composite, since it is at the very least even.
All other remarks about the above diagram should still hold for any other prime *p*.

In addition, our earlier diagram, where we corresponded the order of an element in GL(2, 2^2^)
with the order of an element in GF(2^2×2^) via the characteristic polynomial, also generalizes.
Though I have not proven it, I strongly suspect the following diagram commutes,
at least in the case where *K* is a finite field:

$$
\begin{CD}
(K^{r \times r})^{n \times n}
@>{\text{charpoly}}>>
(K^{r \times r})[\Lambda]
\\
@V{\text{id}}VV ~ @VV{\text{eval}_{\Lambda \mapsto \lambda I}}V
\\
-
@. (K [\lambda])^{r \times r}
\\
@V{\text{forget}}VV ~ @VV{\det}V
\\
K^{nr \times nr}
@>>{\text{charpoly}}>
K[\lambda]
\end{CD}
$$

Over larger primes, the gap between GL and SL may grow ever larger,
but SL over a prime power field seems to inject into SL over a prime field.
If the above diagram does commute, then the prior statement follows.


### Monadicity and Injections

The action of forgetting the internal structure may sound somewhat familiar if you know your Haskell.
Remember that for lists, we can do something similar
-- converting `[[1,2,3],[4,5,6]]` to `[1,2,3,4,5,6]` is just a matter of applying `concat`.
But this is an instance in which we know lists to behave like a [monad](https://wiki.haskell.org/Monad).
Despite being an indecipherable bit of jargon to newcomers, it just means we:

1. can apply functions inside the structure (for example, to the elements of a list),
2. have a sensible injection into the structure (creating singleton lists, called `return`), and
3. can reduce two layers to one (`concat`, or `join` for monads in general).
   - Monads are traditionally defined using the operator `>>=`, but `join = (>>= id)`

Just comparing the types of `join :: Monad m => m (m a) -> m a`
and `forget :: Matrix (Matrix a) -> Matrix a` suggests that `Matrix` (meaning square matrices)
could be a monad, and further, one which respects addition and multiplication.
Of course, **this is only true when our internal matrices are all the same size**.
In the above diagrams, this restriction has applied, but it should be stated explicitly,
since no dimension is specified by `Matrix a`.

However, we run into difficulty at condition 2.
For one, only "numbers" (elements of a ring) can go inside matrices.
This restricts where monadicity can hold.
More importantly, we have a *lot* of freedom in what dimension we choose to inject into.
For example, we might pick a `return` that uses 1×1 matrices (which add no additional structure).
We might also pick `return2`, which scalar-multiplies its argument into a 2×2 identity matrix instead.

Unfortunately, there's no good answer.
At the very least, we can close our eyes and pretend that we have a nice diagram:

$$
\begin{gather*}
\begin{matrix}
& L\underset{\text{degree } r}{/} K
\\ \\
\small f
& \begin{matrix} | \\ \downarrow \end{matrix}
\\ \\
& K^{r \times r}
\end{matrix}
& \quad & \quad
& \begin{matrix}
& (L\underset{\text{degree } r}{/} K)^{n \times n}
\\ \\
\small f^* &
\begin{matrix} | \\ \downarrow \end{matrix}
& \searrow & \small \texttt{>>=} ~ f \qquad
\\ \\
& (K^{r \times r})^{n \times n}
& \underset{\text{forget}} {\longrightarrow}
& K {}^{nr \times nr}
\end{matrix}
\end{gather*}
$$

As one last note on the monadicity of matrices, I *have* played around with an alternative `Matrix`
type which includes scalars alongside proper matrices, which would allow for
a simple canonical injection.
Unfortunately, it complicates `join` -- we just move the responsibility of sizing the internal matrices
front-and-center, since we can correspond internal scalars with identity matrices.


Closing
-------

At this point, I've gone on far too long about algebra.
One nagging curiosity makes me wonder whether there are any diagrams like the following:

$$
\begin{matrix}
& (L\underset{\text{degree } r}{/} K)^{n \times n}
& & & & (L\underset{\text{degree } n}{/} K)^{r \times r}
\\ \\
\small f_1^*
& \begin{matrix} | \\ \downarrow \end{matrix}
& \searrow & & \swarrow
& \begin{matrix} | \\ \downarrow \end{matrix}
& f_2^*
\\ \\
& (K^{r \times r})^{n \times n}
& \underset{\text{forget}} {\longrightarrow}
& K {}^{nr \times nr}
& \underset{\text{forget}}{\longleftarrow}
& (K^{n \times n})^{r \times r}
\end{matrix}
$$

Or in English: whether "rebracketing" certain *nr*×*nr* matrices can be traced back to
not only a degree *r* field extension, but also one of degree *n*.

The mathematician in me tells me to believe in well-defined structures.
Matrices are one such structure, with myriad applications.
However, the computer scientist in me laments that the application of these structures is
buried in symbols and that layering them is at most glossed over.
There is clear utility and interest in doing so; otherwise, the diagrams shown above would not exist.

Of course, there's plenty of reason *not* to go down this route.
For one, it's plainly inefficient -- GPUs are *built* on matrix operations being as efficient as possible,
i.e., without the layering.
It's also inefficient to learn for people *just* learning matrices.
I'd still argue that the method is efficient for learning about more complex topics, like field extensions.