---
title: "Exploring Finite Fields, Part 4: The Power of Forgetting"
description: |
  Or: how I learned to stop worrying and appreciate the Monad.
format:
  html:
    html-math-method: katex
date: "2024-02-03"
date-modified: "2025-08-05"
categories:
  - algebra
  - finite field
  - haskell
---
```{haskell}
--| echo: false
:l Previous.hs
import Data.Array (ixmap, bounds, (!), array, range)
import Data.List (intercalate)
import IHaskell.Display (markdown)
import Previous (
    Matrix(Mat, unMat), Polynomial(Poly, coeffs),
    asPoly, evalPoly, synthDiv,
    companion, zero, eye, fromMatrix, toMatrix
  )

-- Explicit Matrix operations
(|+|) :: Num a => Matrix a -> Matrix a -> Matrix a
(|+|) = (+)

(|*|) :: Num a => Matrix a -> Matrix a -> Matrix a
(|*|) = (*)

-- Original determinant implementation, instead of the imported one
-- based on Faddeev-Leverrier
determinant :: (Num a, Eq a) => Matrix a -> a
determinant (Mat xs) = determinant' xs where
  -- Evaluate (-1)^i without repeated multiplication
  parity i = if even i then 1 else -1
  -- Map old array addresses to new ones when eliminating row 0, column i
  rowMap i (x,y) = (x+1, if y >= i then y+1 else y)
  -- Recursive determinant over the underlying Array
  determinant' xs
    -- Base case: 1x1 matrix
    | n == 0 = xs!(0,0)
    -- Sum of cofactor expansions
    | otherwise = sum $ map cofactor [0..n] where
        -- Produce the cofactor of row 0, column i
        cofactor i
          | xs!(0,i) == 0 = 0
          | otherwise = parity i * xs!(0,i) * determinant' (minor i)
        -- Furthest extent of the bounds, i.e., the size of the matrix
        (_,(n,_)) = bounds xs
        -- Build a new Array by eliminating row 0 and column i
        minor i = ixmap ((0,0),(n-1,n-1)) (rowMap i) xs

-- Characteristic polynomial
charpoly :: (Num a, Eq a) => Matrix a -> Polynomial a
charpoly xs = determinant $ eyeLambda |+| negPolyXs where
  -- Furthest extent of the bounds, i.e., the size of the matrix
  (_,(n,_)) = bounds $ unMat xs
  -- Negative of input matrix, after being converted to polynomials
  negPolyXs = fmap (\x -> Poly [-x]) xs
  -- Identity matrix times lambda (encoded as Poly [0, 1])
  eyeLambda = (\x -> Poly [x] * Poly [0, 1]) <$> eye (n+1)

-- Convert Polynomial to LaTeX string, with configurable variable name
-- and coefficient renderer
texifyPoly' :: (Num a, Eq a) => String -> (a -> String) -> Polynomial a -> String
texifyPoly' var f (Poly xs) = texify' $ zip xs [0..] where
  texify' [] = "0"
  texify' ((c, n):xs)
    | all ((==0) . fst) xs = showPow c n
    | c == 0 = texify' xs
    | otherwise = showPow c n ++ " + " ++ texify' xs
  showPow c 0 = f c
  showPow 1 1 = var
  showPow c 1 = f c ++ showPow 1 1
  showPow 1 n = var ++ "^{" ++ show n ++ "}"
  showPow c n = f c ++ showPow 1 n

-- Convert Polynomial to LaTeX string
texifyPoly :: (Num a, Eq a, Show a) => Polynomial a -> String
texifyPoly = texifyPoly' "x" show

-- Render a Polynomial in positional notation, with bars over negatives
texPolyAsPositional' x (Poly xs) = (++ "_{" ++ x ++ "}") $
  reverse xs >>= (\x -> if x < 0 then "\\bar{" ++ show (-x) ++ "}" else show x)
texPolyAsPositional = texPolyAsPositional' "x"

-- Convert matrix to LaTeX string
texifyMatrix' :: (a -> String) -> Matrix a -> String
texifyMatrix' f mat = surround mat' where
  mat' = intercalate " \\\\ " $ map (intercalate " & " . map f) $
    fromMatrix mat
  surround = ("\\left( \\begin{matrix}" ++) . (++ "\\end{matrix} \\right)")

texifyMatrix :: Show a => Matrix a -> String
texifyMatrix = texifyMatrix' show
```
The [last post](../3) in this series focused on understanding some small linear groups
and implementing them on the computer over both a prime field and prime power field.
The prime power case was particularly interesting.
First, we adjoined the roots of a polynomial to the base field, GF(2).
Rather than the traditional means of adding new symbols like *α*, we used companion matrices,
which behave the same arithmetically.
For example, for the smallest prime power field, GF(4), we use the polynomial $p(x) = x^2 + x + 1$
and map its symbolic roots (*α* and *α*^2^) to matrices over GF(2):
$$
\begin{gather*}
f : \mathbb{F}_4 \longrightarrow \mathbb{F}_2 {}^{2 \times 2}
\\ \\
\begin{gather*}
f(0) = {\bf 0} =
\left(\begin{matrix} 0 & 0 \\ 0 & 0 \end{matrix}\right)
& f(1) = I
= \left(\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}\right)
\\
f(\alpha) = C_p
= \left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix}\right)
& f(\alpha^2) = C_p {}^2
= \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix}\right)
\end{gather*}
\\ \\
f(a + b)= f(a) + f(b), \quad f(ab) = f(a)f(b)
\end{gather*}
$$
```{haskell}
--| code-fold: true
--| code-summary: "Equivalent Haskell"
data F4 = ZeroF4 | OneF4 | AlphaF4 | Alpha2F4 deriving Eq

field4 = [ZeroF4, OneF4, AlphaF4, Alpha2F4]

instance Show F4 where
  show ZeroF4 = "0"
  show OneF4 = "1"
  show AlphaF4 = "α"
  show Alpha2F4 = "α^2"

-- Addition and multiplication over F4
instance Num F4 where
  (+) ZeroF4 x = x
  (+) OneF4 AlphaF4 = Alpha2F4
  (+) OneF4 Alpha2F4 = AlphaF4
  (+) AlphaF4 Alpha2F4 = OneF4
  (+) x y = if x == y then ZeroF4 else y + x
  (*) ZeroF4 _ = ZeroF4
  (*) _ ZeroF4 = ZeroF4
  (*) OneF4 x = x
  (*) AlphaF4 AlphaF4 = Alpha2F4
  (*) Alpha2F4 Alpha2F4 = AlphaF4
  (*) AlphaF4 Alpha2F4 = OneF4
  (*) x y = y * x
  abs = id
  negate = id
  signum = id
  fromInteger = (cycle field4 !!) . fromInteger

-- Companion matrix of `p`, an irreducible polynomial of degree 2 over GF(2)
cP :: (Num a, Eq a, Integral a) => Matrix a
cP = companion $ Poly [1, 1, 1]

f ZeroF4 = zero 2
f OneF4 = eye 2
f AlphaF4 = cP
f Alpha2F4 = (`mod` 2) <$> cP |*| cP

field4M = map f field4
```
Finally, we constructed GL(2, 4) using matrices of matrices
-- not [block matrices](https://en.wikipedia.org/wiki/Block_matrix)!
This post will focus on studying this method in slightly more detail.
Reframing the Path Until Now
----------------------------
In the above description, we already mentioned larger structures over GF(2),
namely polynomials and matrices.
Since GF(4) can itself be described with matrices over GF(2),
we can generalize *f* to give us two more maps:
- $f^*$, which converts matrices over GF(4) to double-layered matrices over GF(2), and
- $f^\bullet$, which converts polynomials over GF(4) to polynomials of matrices over GF(2)
### Matrix Map
We examined the former map briefly in the previous post.
More explicitly, we looked at a matrix *B* in SL(2, 4) which had the property
that it was cyclic of order five.
Then, to work with it without relying on symbols, we simply applied *f* over the contents of the matrix.
```{haskell}
--| code-fold: true
-- Starred maps are instances of fmap composed with modding out
-- by the characteristic
fStar :: (Eq a, Num a, Integral a) => Matrix F4 -> Matrix (Matrix a)
fStar = fmap (fmap (`mod` 2) . f)

mBOrig = toMatrix [[ZeroF4, AlphaF4], [Alpha2F4, Alpha2F4]]
mBStar = fStar mBOrig

markdown $ "$$\\begin{gather*}" ++ concat [
    -- First row, type of fStar
    "f^* : \\mathbb{F}_4 {}^{2 \\times 2}" ++
      "\\longrightarrow" ++
      "(\\mathbb{F}_2 {}^{2 \\times 2})^{2 \\times 2}" ++
      "\\\\[10pt]",
    -- Second row, B
    "B = " ++ texifyMatrix' show mBOrig ++
      "\\\\",
    -- Third row, B*
    "B^* = f^*(B) = " ++
      texifyMatrix' (\x -> "f(" ++ show x ++ ")") mBOrig ++ " = " ++
      texifyMatrix' (texifyMatrix' show) mBStar
  ] ++
  "\\end{gather*}$$"
```
We can do this because a matrix contains values in the domain of *f*, thus uniquely determining
a way to change the internal structure (what Haskell calls
a [functor](https://wiki.haskell.org/Functor)).
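As a minimal standalone illustration of the functor idea, here is a hypothetical `Grid` newtype (a stand-in for the actual `Matrix` type from `Previous.hs`): mapping a function over it touches every entry but leaves the shape untouched.

```{haskell}
--| code-fold: true
-- `Grid` and `toSymbol` are illustrative only, not part of Previous.hs
newtype Grid a = Grid [[a]] deriving (Show, Eq)

-- Mapping over a Grid transforms every entry but preserves the shape
instance Functor Grid where
  fmap g (Grid rows) = Grid (map (map g) rows)

-- A tiny analogue of f: GF(2) scalars mapped to symbols
toSymbol :: Int -> String
toSymbol 0 = "zero"
toSymbol _ = "one"

example :: Grid String
example = fmap toSymbol (Grid [[0, 1], [1, 1]])
```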
Furthermore, due to the properties of *f*, it and *f*\* commute with the determinant,
as shown by the following diagram:
$$
\begin{gather*}
f(\det(B)) = f(1) = I =\det(B^*)= \det(f^*(B))
\\[10pt]
\begin{CD}
\mathbb{F}_4 {}^{2 \times 2}
@>{\det}>>
\mathbb{F}_4
\\
@V{f^*}VV ~ @VV{f}V
\\
(\mathbb{F}_2 {}^{2 \times 2})^{2 \times 2}
@>>{\det}>
\mathbb{F}_2 {}^{2 \times 2}
\end{CD}
\end{gather*}
$$
It should be noted that the determinant strips off the *outer* matrix.
We could also consider the map **det**\* , where we apply the determinant
to the internal matrices (in Haskell terms, `fmap determinant`).
This map isn't as nice though, since:
```{haskell}
--| code-fold: true
markdown $ "$$\\begin{align*}" ++ concat [
    -- First row, det* of B
    "\\det {}^*(B^*) &= " ++
      texifyMatrix' (("\\det" ++) . texifyMatrix' show) mBStar ++ " = " ++
      texifyMatrix ((`mod` 2) . determinant <$> mBStar) ++
      "\\\\ \\\\",
    -- Second row, determinant of B*
    -- Note how the commutation between `determinant` and <$> fails
    "&\\neq" ++
      texifyMatrix ((`mod` 2) <$> determinant mBStar) ++ " = " ++
      "\\det(B^*)",
    ""
  ] ++
  "\\end{align*}$$"
```
### Polynomial Map
Much like how we can change the internal structure of matrices, we can do the same for polynomials.
For the purposes of demonstration, we'll work with $b = \lambda^2 + \alpha^2 \lambda + 1$,
the characteristic polynomial of *B*, since it has coefficients in the domain of *f*.
We define the extended map $f^\bullet$ as:
```{haskell}
--| code-fold: true
-- Bulleted maps are also just instances of fmap, like the starred maps
fBullet :: (Eq a, Num a, Integral a) => Polynomial F4 -> Polynomial (Matrix a)
fBullet = fmap (fmap (`mod` 2) . f)
```
$$
\begin{gather*}
f^{\bullet} : \mathbb{F}_4[\lambda] \longrightarrow
\mathbb{F}_2 {}^{2 \times 2}[\Lambda]
\\
f^{\bullet} (\lambda) = \Lambda \qquad
f^{\bullet}(a) = f(a), \quad a \in \mathbb{F}_4
\\ \\
\begin{align*}
b^{\bullet}
= f^{\bullet}(b)
&= f^{\bullet}(\lambda^2)
&&+&& f^{\bullet}(\alpha^2)f^{\bullet}(\lambda)
&&+&& f^{\bullet}(1)
\\
&= \Lambda^2
&&+&& \left(\begin{matrix} 1 & 1 \\ 1 & 0\end{matrix}\right) \Lambda
&&+&& \left(\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}\right)
\end{align*}
\end{gather*}
$$
Since we're looking at the characteristic polynomial of *B*, we might as well also look
at the characteristic polynomial of *B*\*, its image under $f^*$.
We already looked at the determinant of this matrix, which is the constant term
of the characteristic polynomial (up to sign).
Therefore, it's probably not surprising that $f^\bullet$ and the characteristic polynomial commute
in a similar fashion to the determinant.
```{haskell}
--| code-fold: true
bStar = fmap (fmap (`mod` 2)) $ charpoly $ fStar mBOrig
bBullet = fmap (fmap (`mod` 2)) $ fBullet $ charpoly mBOrig

if bStar /= bBullet
  then markdown "$b^\\star$ and $b^\\bullet$ are not equal!"
  else markdown $ "$$\\begin{align*}" ++ concat [
      "b^* &= \\text{charpoly}(f^*(B)) = \\text{charpoly} " ++
        texifyMatrix' (texifyMatrix' show) mBStar ++
        "\\\\",
      "&= " ++
        texifyPoly' "\\Lambda" (texifyMatrix' show) bStar ++ " = " ++
        "f^\\bullet(\\text{charpoly}(B)) = b^\\bullet",
      ""
    ] ++
    "\\end{align*}$$"
```
$$
\begin{CD}
\mathbb{F}_4 {}^{2 \times 2}
@>{\text{charpoly}}>>
\mathbb{F}_4[\lambda]
\\
@V{f^*}VV ~ @VV{f^\bullet}V
\\
(\mathbb{F}_2 {}^{2 \times 2})^{2 \times 2}
@>>{\text{charpoly}}>
(\mathbb{F}_2 {}^{2 \times 2})[\Lambda]
\end{CD}
$$
It should also be mentioned that **charpoly**\*, taking the characteristic polynomials
of the internal matrices, does *not* obey the same relationship.
For one, the type is wrong: the codomain is a matrix *containing* polynomials,
rather than a polynomial over matrices.
There *does* happen to be an isomorphism between the two structures
(one direction of which we'll discuss momentarily).
But even by converting to the proper type, we already have a counterexample in the constant term
from taking **det**\* earlier.
```{haskell}
--| code-fold: true
markdown $ "$$\\begin{align*}" ++ concat [
    "\\text{charpoly}^*(B^*) &= " ++
      texifyMatrix' (("\\text{charpoly}" ++) . texifyMatrix' show) mBStar ++
      "\\\\",
    "&= " ++
      texifyMatrix' (texifyPoly' "\\lambda" show)
        (fmap (fmap (`mod` 2) . charpoly) mBStar) ++
      "\\\\",
    "&\\cong " ++
      -- Not constructing this by isomorphism yet
      texifyPoly' "\\Lambda" texifyMatrix
        (Poly [
          toMatrix [[0,1], [1,1]],
          toMatrix [[0,1], [1,1]],
          toMatrix [[1,1], [1,1]]
        ]) ++
      "\\\\ \\\\",
    "&\\neq f^\\bullet(\\text{charpoly}(B))"
  ] ++
  "\\end{align*}$$"
```
Forgetting
----------
Clearly, layering matrices has several advantages over how we usually interpret block matrices.
But what happens if we *do* "forget" about the internal structure?
```{haskell}
--| code-fold: true
--| code-summary: "Haskell implementation of `forget`"
import Data.List (transpose)

-- Massively complicated point-free way to forget double matrices:
-- 1. Convert internal matrices to lists of lists
-- 2. Convert the external matrix to a list of lists
-- 3. There are now four layers of lists. Transpose the second and third.
-- 4. Concat the new third and fourth layers together
-- 5. Concat the first and second layers together
-- 6. Convert the list of lists back to a matrix
forget :: Matrix (Matrix a) -> Matrix a
forget = toMatrix . concatMap (fmap concat . transpose) .
  fromMatrix . fmap fromMatrix

-- To see why this has the structure it does, remember that we need to work
-- with whole rows of the external matrix at once.
-- We'd like to read across the whole row, but this involves descending into
-- two matrices.
-- The `transpose` lets us collect rows in the way we expect.
-- For example, for the matrix above, we get `[[[0,0],[0,1]], [[0,0],[1,1]]]`
-- after the transposition, which are the first two rows, grouped by the
-- matrix they belonged to.
-- Then `fmap concat` joins the pieces of each row together, and the outer
-- `concatMap` undoes the row grouping.
mBHat = forget mBStar

markdown $ "$$\\begin{gather*}" ++ concat [
    "\\text{forget} : (\\mathbb{F}_2 {}^{2 \\times 2})^{2 \\times 2}" ++
      "\\longrightarrow \\mathbb{F}_2 {}^{4 \\times 4}" ++
      "\\\\[10pt]",
    "\\hat B = \\text{forget}(B^*) = \\text{forget}" ++
      texifyMatrix' (texifyMatrix' show) mBStar ++ " = " ++
      texifyMatrix mBHat,
    ""
  ] ++
  "\\end{gather*}$$"
```
Like *f*, `forget` preserves addition and multiplication, a fact already familiar from block matrices.
Further, by *f*, the internal matrices multiply the same way elements of GF(4) do.
Hence, this shows us directly that GL(2, 4) is (isomorphic to) a subgroup of GL(4, 2).
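We can spot-check the multiplicative half of this claim with a standalone sketch over plain lists of lists; the helpers `mulBlocks`, `forgetBlocks`, and the second matrix `aStarL` are inventions for this demonstration, not part of the post's library.

```{haskell}
--| code-fold: true
import Data.List (transpose)

type M = [[Int]]  -- a matrix over GF(2) as a list of rows

-- Matrix product mod 2
mul :: M -> M -> M
mul a b = [[sum (zipWith (*) row col) `mod` 2 | col <- transpose b] | row <- a]

-- Entrywise sum of a nonempty list of matrices, mod 2
msum :: [M] -> M
msum = foldr1 (zipWith (zipWith (\x y -> (x + y) `mod` 2)))

-- Block-matrix product: the same formula as `mul`, with entries
-- replaced by blocks
mulBlocks :: [[M]] -> [[M]] -> [[M]]
mulBlocks a b = [[msum (zipWith mul row col) | col <- transpose b] | row <- a]

-- The same flattening as `forget`, phrased over lists
forgetBlocks :: [[M]] -> M
forgetBlocks = concatMap (map concat . transpose)

-- The images of 0, 1, alpha, alpha^2 under f
z, i, cp, cp2 :: M
z   = [[0,0],[0,0]]
i   = [[1,0],[0,1]]
cp  = [[0,1],[1,1]]
cp2 = [[1,1],[1,0]]

-- B* from the post, and an arbitrary second matrix to multiply against
bStarL, aStarL :: [[M]]
bStarL = [[z, cp], [cp2, cp2]]
aStarL = [[i, cp2], [cp, z]]
```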
However, an obvious difference between layered and "forgotten" matrices is
the determinant and characteristic polynomial:
```{haskell}
--| code-fold: true
markdown $ "$$\\begin{align*}" ++ intercalate " \\\\ \\\\ " (
    map (intercalate " & ") [
      [
        "\\det B^* &= " ++
          texifyMatrix ((`mod` 2) <$> determinant mBStar),
        "\\text{charpoly} B^* &= " ++
          texifyPoly' "\\Lambda" texifyMatrix (fmap (`mod` 2) <$> charpoly mBStar)
      ], [
        "\\det \\hat B &= " ++
          show ((`mod` 2) $ determinant mBHat),
        "\\text{charpoly} \\hat B &= " ++
          texifyPoly' "\\lambda" show ((`mod` 2) <$> charpoly mBHat)
      ]
    ]) ++
  "\\end{align*}$$"
```
### Another Forgotten Path
It's a relatively simple matter to move between determinants, since it's straightforward
to identify 1 and the identity matrix.
However, a natural question to ask is whether there's a way to reconcile or coerce
the matrix polynomial into the "forgotten" one.
First, let's formally establish a path from matrix polynomials to a matrix of polynomials.
We need only use our friend from the [second post](../2) -- polynomial evaluation.
Simply evaluating a matrix polynomial *r* at *λI* converts our matrix indeterminate (*Λ*)
into a scalar one (*λ*).
$$
\begin{align*}
\text{eval}_{\Lambda \mapsto \lambda I}
&: (\mathbb{F}_2 {}^{2 \times 2})[\Lambda]
\rightarrow (\mathbb{F}_2[\lambda]) {}^{2 \times 2}
\\
&:: \quad
r(\Lambda) \mapsto r(\lambda I)
\end{align*}
$$
```{haskell}
--| code-fold: true
-- Function following from the evaluation definition above
-- Note that `Poly . pure` is used to transform matrices of `a`
-- into matrices of polynomials.
toMatrixPolynomial :: (Eq a, Num a) =>
  Polynomial (Matrix a) -> Matrix (Polynomial a)
toMatrixPolynomial xs = evalPoly eyeLambda $ fmap (fmap (Poly . pure)) xs where
  -- First dimensions of the coefficients
  (is, _) = unzip $ map (snd . bounds . unMat) $ coeffs xs
  -- Properly-sized identity matrix times a scalar lambda
  eyeLambda = eye (1 + maximum is) * toMatrix [[Poly [0, 1]]]

markdown $ "$$\\begin{align*}" ++
  "\\text{eval}_{\\Lambda \\mapsto \\lambda I}(\\text{charpoly}(B^*)) &=" ++
  texifyPoly' "(\\lambda I)" texifyMatrix
    (fmap (`mod` 2) <$> charpoly mBStar) ++
  "\\\\ &= " ++
  texifyMatrix' (texifyPoly' "\\lambda" show)
    (toMatrixPolynomial $ fmap (`mod` 2) <$> charpoly mBStar) ++
  "\\end{align*}$$"
```
Since a matrix containing polynomials is still a matrix, we can then take its determinant.
What pops out is exactly what we were after...
```{haskell}
--| code-fold: true
markdown $ "$$\\begin{align*}" ++
  "\\det(\\text{eval}_{\\Lambda \\mapsto \\lambda I}(" ++
  "\\text{charpoly}(B^*))) &=" ++
  "(1 + \\lambda + \\lambda^2)(1 + \\lambda^2) - \\lambda^2" ++
  "\\\\ &=" ++
  texifyPoly' "\\lambda" show
    (fmap (`mod` 2) <$> determinant $ toMatrixPolynomial $ charpoly mBStar) ++
  "\\\\ &= \\text{charpoly}{\\hat B}" ++
  "\\end{align*}$$"
```
...and we can arrange our maps into another diagram:
$$
\begin{gather*}
\begin{CD}
(\mathbb{F}_2 {}^{2 \times 2})^{2 \times 2}
@>{\text{charpoly}}>>
(\mathbb{F}_2 {}^{2 \times 2})[\Lambda]
\\
@V{\text{id}}VV ~ @VV{\text{eval}_{\Lambda \mapsto \lambda I}}V
\\
-
@. (\mathbb{F}_2 [\lambda])^{2 \times 2}
\\
@V{\text{forget}}VV ~ @VV{\det}V
\\
\mathbb{F}_2 {}^{4 \times 4}
@>>{\text{charpoly}}>
\mathbb{F}_2[\lambda]
\end{CD}
\\ \\
\text{charpoly} \circ \text{forget}
= \det \circ ~\text{eval}_{\Lambda \mapsto \lambda I} \circ\text{charpoly}
\end{gather*}
$$
It should be noted that we do *not* get the same results by taking the determinant after
applying **charpoly**\*, indicating that the above method is "correct".
```{haskell}
--| code-fold: true
markdown $ "$$\\begin{align*}" ++
  "\\text{charpoly}^*(B^*) &=" ++
  texifyMatrix' (texifyPoly' "\\lambda" show)
    (fmap (`mod` 2) <$> fmap charpoly mBStar) ++
  "\\\\ \\\\" ++
  "\\det(\\text{charpoly}^*(B^*)) &=" ++
  "\\lambda^2(1 + \\lambda + \\lambda^2) - (1 + \\lambda + \\lambda^2)^2" ++
  "\\\\ &= " ++
  texifyPoly' "\\lambda" show
    (fmap (`mod` 2) <$> determinant $ fmap charpoly mBStar) ++
  "\\end{align*}$$"
```
### Cycles and Cycles
Since we can get $\lambda^4 + \lambda^3 + \lambda^2 + \lambda + 1$ in two ways,
it's natural to assume this polynomial is significant in some way.
In the language of the second post, the polynomial can also be written as ~2~31,
whose root we determined was cyclic of order 5.
This happens to match the order of *B* in GL(2, 4).
Perhaps this is unsurprising, since there are only so many polynomials of degree 4 over GF(2).
However, the reason we see it is more obvious if we look at the powers of scalar multiples of *B*.
First, recall that *f*\* takes us from a matrix over GF(4) to a matrix of matrices of GF(2).
Then define a map *g* that gives us degree 4 polynomials:
$$
\begin{gather*}
g : \mathbb{F}_4^{2 \times 2} \rightarrow \mathbb{F}_2[\lambda]
\\
g = \text{charpoly} \circ \text{forget} \circ f^*
\end{gather*}
$$
```{haskell}
--| code-fold: true
--| layout-ncol: 3
g = fmap (`mod` 2) . charpoly . forget . fStar

showSeries varName var = "$$\\begin{array}{}" ++
    " & \\scriptsize " ++
    texifyMatrix var ++
    "\\\\" ++
    intercalate " \\\\ " [
      (if n == 1 then varName' else varName' ++ "^{" ++ show n ++ "}") ++
        "& \\overset{g}{\\mapsto} &" ++
        texPolyAsPositional' "\\lambda" (g $ var^n)
      | n <- [1..5]
    ] ++
    "\\end{array}$$" where
  varName' = if length varName == 1 then varName else "(" ++ varName ++ ")"

markdown $ showSeries "B" mBOrig
markdown $ showSeries "αB" (fmap (AlphaF4*) mBOrig)
markdown $ showSeries "α^2 B" (fmap (Alpha2F4*) mBOrig)
```
The matrices in the middle and rightmost columns both have order 15 inside GL(2, 4).
Correspondingly, both 10011~λ~ = ~2~19 and 11001~λ~ = ~2~25 are primitive,
and so have roots of order 15 over GF(2).
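We can verify these orders with a small standalone sketch, encoding GF(2) polynomials as integer bitmasks; the helpers `polyMul`, `polyMod`, and `orderOfX` are ad hoc for this check, not from the post's library.

```{haskell}
--| code-fold: true
import Data.Bits (xor, shiftL, testBit)

-- Multiply two GF(2) polynomials encoded as bitmasks
-- (bit k holds the coefficient of x^k; degrees here stay small)
polyMul :: Int -> Int -> Int
polyMul a b = foldr xor 0 [a `shiftL` k | k <- [0 .. 15], testBit b k]

-- Degree of a nonzero bitmask polynomial
degree :: Int -> Int
degree p = length (takeWhile (<= p) (iterate (* 2) 1)) - 1

-- Reduce mod m by cancelling the leading term with shifted copies of m
polyMod :: Int -> Int -> Int
polyMod m p
  | p == 0 || degree p < degree m = p
  | otherwise = polyMod m (p `xor` (m `shiftL` (degree p - degree m)))

-- Multiplicative order of x in GF(2)[x]/(m); m needs a nonzero
-- constant term so that x is invertible
orderOfX :: Int -> Int
orderOfX m = go (polyMod m 2) 1
  where
    go 1 k = k
    go p k = go (polyMod m (polyMul p 2)) (k + 1)
```

For example, `orderOfX 19` and `orderOfX 25` (the masks for ~2~19 and ~2~25) give 15, while `orderOfX 31` (for ~2~31) gives 5.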
### A Field?
Since we have 15 matrices generated by the powers of one, you might wonder whether or not
they can correspond to the nonzero elements of GF(16).
And they can!
In a sense, we've "borrowed" the order 15 elements from this "field" within GL(4, 2).
However, none of the powers of this matrix are the companion matrix of either ~2~19 or ~2~25.
<details>
<summary>
Haskell demonstration of the field-like-ness of these matrices
</summary>
All we really need to do is test additive closure, since the powers trivially commute and include the identity matrix.
```{haskell}
-- Check whether n x n matrices (mod p) have additive closure
-- Supplement the zero matrix, even if it is not already present
hasAdditiveClosure :: Integral a => Int -> a -> [Matrix a] -> Bool
hasAdditiveClosure n p xs = all (`elem` xs') sums where
  -- Add in the zero matrix
  xs' = zero n : xs
  -- Calculate all possible sums of pairs (mod p)
  sums = map (fmap (`mod` p)) $ (+) <$> xs' <*> xs'

-- Generate the powers of x, then test whether they form a field (mod p)
generatesField :: Integral a => Int -> a -> Matrix a -> Bool
generatesField n p x = hasAdditiveClosure n p xs where
  xs = map (fmap (`mod` p) . (x^)) [1..p^n-1]

print $ generatesField 4 2 $ forget $ fStar $ fmap (AlphaF4*) mBOrig
```
</details>
More directly, we might also observe that *α*^2^*B* is the companion matrix of
an irreducible polynomial over GF(4), namely $q(x) = x^2 - \alpha x - \alpha$.
Both the "forgotten" matrices and the aforementioned companion matrices lie within GL(4, 2).
A natural question to ask is whether we can make fields by the following process:
1. Filter out all order-15 elements of GL(4, 2)
2. Partition the elements and their powers into their respective order-15 subgroups
3. Add the zero matrix into each class
4. Check whether all classes are additively closed (and are therefore fields)
In this case, it happens to be true, but proving this in general is difficult, and I haven't done so.
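As a minimal standalone run of steps 2 through 4 on a single order-15 subgroup, we can take the one generated by the companion matrix of ~2~19 itself, represented as plain lists of rows; the helpers below are ad hoc for this sketch, not from `Previous.hs`.

```{haskell}
--| code-fold: true
import Data.List (transpose, nub)

type M4 = [[Int]]

-- Matrix product and sum over GF(2)
mmul :: M4 -> M4 -> M4
mmul a b = [[sum (zipWith (*) r c) `mod` 2 | c <- transpose b] | r <- a]

madd :: M4 -> M4 -> M4
madd = zipWith (zipWith (\x y -> (x + y) `mod` 2))

-- Companion matrix of x^4 + x + 1, i.e., 10011 = ~2~19
c19 :: M4
c19 = [ [0,0,0,1]
      , [1,0,0,1]
      , [0,1,0,0]
      , [0,0,1,0] ]

-- Step 2 (for this one subgroup): the powers c19^1 .. c19^15
powers :: [M4]
powers = take 15 (iterate (mmul c19) c19)

-- Step 3: adjoin the zero matrix
zeroM4 :: M4
zeroM4 = replicate 4 (replicate 4 0)

-- Step 4: test additive closure of the class
closed :: Bool
closed = all (`elem` cls) [madd x y | x <- cls, y <- cls]
  where cls = zeroM4 : powers
```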
Expanding Dimensions
--------------------
Of course, we need not only focus on GF(4) -- we can just as easily work over GL(2, 2^*r*^) for values of *r* other than 2.
In this case, the internal matrices will be *r*×*r* while the external one remains 2×2.
But neither do we have to work exclusively with 2×2 matrices -- we can work over GL(*n*, 2^*r*^).
In either circumstance, the "borrowing" of elements of larger order still occurs.
This is summarized by the following diagram:
$$
\begin{CD}
\underset{
\scriptsize S \text{ (order $k$)}
}{
\text{SL}(n,2^r)
}
@>>>
\underset{
\scriptsize
\begin{matrix}
S \text{ (order $k$)} \\
T \text{ (order $2^{nr}-1$)}
\end{matrix}
}{
\text{GL}(n, 2^r)
}
@>{\text{forget} \circ f_{r}^*}>>
{\text{GL}(nr, 2)}
@<{f_{nr}}<<
\underset{
\scriptsize
\begin{matrix}
s \text{ (order $k$)} \\
t \text{ (order $2^{nr}-1$)}
\end{matrix}
}{
\mathbb{F}_{2^{nr}}
}
\end{CD}
$$
Here, *f*~*r*~ is our map from GF(2^*r*^) to *r*×*r* matrices, and *f*~*nr*~ is the analogous map for GF(2^*nr*^).
*r* must be greater than 1 for us to properly make use of matrix arithmetic.
Similarly, *n* must be greater than 1 for the leftmost GL.
Thus, *nr* is a composite number.
Finally, *k* is a proper factor of 2^*nr*^ - 1;
in the prior discussion, *k* was 5 and 2^*nr*^ - 1 was 15.
Recall that primitive polynomials over GF(2^*nr*^) have roots of order 2^*nr*^ - 1.
Since *nr* is composite, this number can *never* be prime: the only primes of the form
2^*m*^ - 1 are the Mersenne primes, which require *m* itself to be prime.
Thus, in GL of prime dimension, we can never loan to a GL over a field
of larger order with the same characteristic.
Conversely, GL(*nr* + 1, 2) trivially contains GL(*nr*, 2) by fixing a subspace.
So we do eventually see elements of order 2^*m*^ - 1 for either prime or composite *m*.
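The divisibility underlying the Mersenne remark is that $2^a - 1$ divides $2^{ab} - 1$ (substitute $x = 2^a$ into $x - 1 \mid x^b - 1$), so composite exponents always yield composite 2^*m*^ - 1. A quick check:

```{haskell}
--| code-fold: true
-- For m = a*b with a > 1, 2^a - 1 is a nontrivial divisor of 2^m - 1
dividesMersenne :: Integer -> Integer -> Bool
dividesMersenne a m = (2^m - 1) `mod` (2^a - 1) == 0

-- Composite exponents paired with proper factors: 4, 6, and 10
compositeExamples :: [(Integer, Integer)]
compositeExamples = [(2, 4), (2, 6), (3, 6), (2, 10), (5, 10)]
```

In the earlier discussion this was the factorization 15 = (2^2^ - 1)(2^2^ + 1) = 3 × 5.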
### Other Primes
This concern about prime dimensions is unique to characteristic 2.
For any other prime *p*, *p*^*m*^ - 1 is composite, since it is even and greater than 2.
All other remarks about the above diagram should still hold for any other prime *p*.
In addition, the diagram where we found a correspondence between the orders of elements in
GL(2, 2^2^) and GF(2^2×2^) via the characteristic polynomial also generalizes.
Though I have not proven it, I strongly suspect the following diagram commutes,
at least in the case where *K* is a finite field:
$$
\begin{CD}
(K^{r \times r})^{n \times n}
@>{\text{charpoly}}>>
(K^{r \times r})[\Lambda]
\\
@V{\text{id}}VV ~ @VV{\text{eval}_{\Lambda \mapsto \lambda I}}V
\\
-
@. (K [\lambda])^{r \times r}
\\
@V{\text{forget}}VV ~ @VV{\det}V
\\
K^{nr \times nr}
@>>{\text{charpoly}}>
K[\lambda]
\end{CD}
$$
Over larger primes, the gap between GL and SL may grow ever larger,
but SL over a prime power field seems to inject into SL over a prime field.
If the above diagram is true, then the prior statement follows.
### Monadicity and Injections
The action of forgetting the internal structure may sound somewhat familiar if you know your Haskell.
Remember that for lists, we can do something similar
-- converting `[[1,2,3],[4,5,6]]` to `[1,2,3,4,5,6]` is just a matter of applying `concat`.
This is an instance in which we know lists to behave like a [monad](https://wiki.haskell.org/Monad).
Despite being an indecipherable bit of jargon to newcomers, it just means we:
1. can apply functions inside the structure (for example, to the elements of a list),
2. have a sensible injection into the structure (creating singleton lists, called `return`), and
3. can reduce two layers to one (`concat`, or `join` for monads in general).
- Monads are traditionally defined using the operator `>>=`, but `join = (>>= id)`
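For lists, the three formulations line up exactly:

```{haskell}
--| code-fold: true
import Control.Monad (join)

-- Flattening one layer of list structure, three equivalent ways
flattened :: [Int]
flattened = concat [[1,2,3],[4,5,6]]

viaJoin :: [Int]
viaJoin = join [[1,2,3],[4,5,6]]

viaBind :: [Int]
viaBind = [[1,2,3],[4,5,6]] >>= id
```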
Just comparing the types of `join :: Monad m => m (m a) -> m a`
and `forget :: Matrix (Matrix a) -> Matrix a` suggests that `Matrix` (meaning square matrices)
could be a monad, and further, one which respects addition and multiplication.
Of course, **this is only true when our internal matrices are all the same size**.
In the above diagrams, this restriction has applied, but should be stated explicitly
since no dimension is specified by `Matrix a`.
Condition 2 gives us some trouble, though.
For one, only "numbers" (elements of a ring) can go inside matrices, which restricts
where monadicity can hold.
More importantly, we have a *lot* of freedom in what dimension we choose to inject into.
For example, we might pick a `return` that uses 1×1 matrices (which add no additional structure).
We might also pick `return2`, which instead multiplies the 2×2 identity matrix by its scalar argument.
Unfortunately, there's no good answer.
At the very least, we can close our eyes and pretend that we have a nice diagram:
$$
\begin{gather*}
\begin{matrix}
& L\underset{\text{degree } r}{/} K
\\ \\
\small f
& \begin{matrix} | \\ \downarrow \end{matrix}
\\ \\
& K^{r \times r}
\end{matrix}
& \quad & \quad
& \begin{matrix}
& (L\underset{\text{degree } r}{/} K)^{n \times n}
\\ \\
\small f^* &
\begin{matrix} | \\ \downarrow \end{matrix}
& \searrow & \small \texttt{>>=} ~ f \qquad
\\ \\
& (K^{r \times r})^{n \times n}
& \underset{\text{forget}} {\longrightarrow}
& K {}^{nr \times nr}
\end{matrix}
\end{gather*}
$$
As one last note on the monadicity of matrices, I *have* played around with an alternative `Matrix`
type which includes scalars alongside proper matrices, allowing for a simple canonical injection.
Unfortunately, it complicates `join`: since internal scalars can be identified with identity matrices
of any size, it merely moves the responsibility of sizing the internal matrices front-and-center.
Closing
-------
At this point, I've gone on far too long about algebra.
One nagging curiosity makes me wonder whether there are any diagrams like the following:
$$
\begin{matrix}
& (L\underset{\text{degree } r}{/} K)^{n \times n}
& & & & (L\underset{\text{degree } n}{/} K)^{r \times r}
\\ \\
\small f_1^*
& \begin{matrix} | \\ \downarrow \end{matrix}
& \searrow & & \swarrow
& \begin{matrix} | \\ \downarrow \end{matrix}
& \small f_2^*
\\ \\
& (K^{r \times r})^{n \times n}
& \underset{\text{forget}} {\longrightarrow}
& K {}^{nr \times nr}
& \underset{\text{forget}}{\longleftarrow}
& (K^{n \times n})^{r \times r}
\end{matrix}
$$
Or in English, whether "rebracketing" certain *nr* × *nr* matrices can be traced back to
not only a degree *r* field extension, but also one of degree *n*.
The mathematician in me tells me to believe in well-defined structures.
Matrices are one such structure, with myriad applications.
However, the computer scientist in me laments that the application of these structures is
buried in symbols and that layering them is at most glossed over.
There is clear utility and interest in doing so, otherwise the diagrams shown above would not exist.
Of course, there's plenty of reason *not* to go down this route.
For one, it's plainly inefficient -- GPUs are *built* on matrix operations being
as efficient as possible, i.e., without the layering.
It's also a burden for people *just* learning matrices.
I'd still argue that the method is useful for learning about more complex topics, like field extensions.