diff --git a/posts/finite-field/1/index.qmd b/posts/finite-field/1/index.qmd
new file mode 100644
index 0000000..fb6a38a
--- /dev/null
+++ b/posts/finite-field/1/index.qmd
@@ -0,0 +1,513 @@
---
format:
  html:
    html-math-method: katex
---


Exploring Finite Fields: Preliminaries
======================================

[Fields](https://en.wikipedia.org/wiki/Field_%28mathematics%29) are one of the basic structures in abstract algebra. Roughly, a field is a collection of elements paired with two operations, addition and multiplication, along with particular rules about their interactions. The most important elements of a field are 0 (the additive identity), 1 (the multiplicative identity), and -1 (which forms additive inverses). Moreover, multiplicative inverses must exist.

Many people already know about some fields, such as the rational numbers $\mathbb{Q}$ and the complex numbers $\mathbb{C}$. Finite fields also exist, the most familiar being $\mathbb{F_2} = \text{GF}(2)$, the field with two elements. This field contains only the elements 0 and 1, with -1 being identical to 1. The addition and multiplication tables are consequently the simplest possible under the familiar rules.

| + | 0 | 1 |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 0 |

| × | 0 | 1 |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 0 | 1 |

This field expresses the parity of sums and products of two integers, since:

- even + even = even
- even + odd = odd
- odd + even = odd
- odd + odd = even

And

- even × even = even
- even × odd = even
- odd × even = even
- odd × odd = odd


In One's Prime
--------------

Two is not unique as a possible order of a finite field -- every prime number is also a candidate. In general, the field inherits the properties of integer arithmetic. The role of -1 is taken up by *p* - 1, where *p* is the order of the field, and the field is referred to as GF(*p*).

The additive inverse of an element *x* can be viewed in two ways:

- Multiplying -1 in the field with *x* (i.e., $(p - 1)x \mod p$)
- Counting backwards from zero, which is congruent to *p* (i.e., $p - x$)

The product of two nonzero elements in the field can never be a multiple of *p*, since multiples of *p* are all congruent to 0. If it were, then the order would share a factor with one of the two factors of the product. But the order is prime, so this is impossible. More strongly, multiplicative inverses [can be found algorithmically](https://en.wikipedia.org/wiki/Modular_multiplicative_inverse#Extended_Euclidean_algorithm), although it is a somewhat tricky task.


Polynomials
-----------

Before we look at other finite fields, we have to look at polynomials. For a given field *K*, we can also consider polynomials with coefficients drawn from the field, K\[*x*\]. The structure that polynomials fit into is called a [*ring*](https://en.wikipedia.org/wiki/Ring_%28mathematics%29). Rings are slightly weaker than fields, since multiplicative inverses do not generally exist. The zero polynomial and the constant polynomial 1 are the additive and multiplicative identities, respectively.

Since GF(*p*) has a finite number of elements to consider, there are only so many choices for polynomial coefficients. Each degree has a finite number of polynomials, so it's much easier to list them out than it would be for the integers.
Again, looking at polynomials over GF(2), we have: + +Degree | Polynomial *q(x)* | List of coefficients of *q(x)* (ascending) | *q*(2) | *q*(2) (Binary) +-------|-------------------|--------------------------------------------|--------|---------------- +1 | x | [0, 1] | 2 | 10 +1 | 1 + x | [1, 1] | 3 | 11 +2 | x^2 | [0, 0, 1] | 4 | 100 +2 | 1 + x^2 | [1, 0, 1] | 5 | 101 +2 | x + x^2 | [0, 1, 1] | 6 | 110 +2 | 1 + x + x^2 | [1, 1, 1] | 7 | 111 +3 | x^3 | [0, 0, 0, 1] | 8 | 1000 +3 | 1 + x^3 | [1, 0, 0, 1] | 9 | 1001 +3 | x + x^3 | [0, 1, 0, 1] | 10 | 1010 +3 | 1 + x + x^3 | [1, 1, 0, 1] | 11 | 1011 +... | ... | ... | ... | ... + + +### The Base-ics + +There is a very close correspondence between binary expansions and polynomials over GF(2). This is evident by comparing the list of coefficients in the polynomial (column 3) with the binary expansions of the polynomial evaluated at 2 (column 5). This gives a handy way of referring to polynomials (mod *p*) without having to write out each individual "x" or "+". In fact, this is commonly used to compactly compute with and refer to [polynomials used in cyclic redundancy checks](https://en.wikipedia.org/wiki/Mathematics_of_cyclic_redundancy_checks). + +Again, 2 is not unique among primes. Polynomials over any prime field GF(*p*) can be expressed as integers in base *p*. + +
+ +Haskell implementation of duality between polynomials mod *p* and base *p* expansions of integers + +
This implementation actually works for any base *b*, which is not necessarily prime. The only difference is that the coefficients lose "field-ness" for composite *b*.

```{haskell}
-- | eval: false

import Data.List (unfoldr)
import Data.Tuple (swap)

-- A polynomial is its ascending list of coefficients (of type a)
data Polynomial a = Poly { coeffs :: [a] }

-- Interpret a number's base-b expansion as a polynomial
asPoly :: Int -> Int -> Polynomial Int
-- Build a list with f, which returns either Nothing
-- or Just (next element of list, next argument to f)
asPoly b = Poly . unfoldr f where
  -- Divide x by b. Emit the remainder and recurse with the quotient.
  f x | x /= 0 = Just $ swap $ divMod x b
  -- If there's nothing left to divide out, terminate
      | otherwise = Nothing

-- Horner evaluation of a polynomial at the integer b
evalPoly :: Int -> Polynomial Int -> Int
-- Start with the highest coefficient
-- Multiply by b at each step and add the coefficient of the next term
evalPoly b (Poly p) = foldr (\y acc -> acc*b + y) 0 p

-- evalPoly n . asPoly n = id :: Int -> Int
```

An interesting detail here is that the duality is expressed through `foldr` (using multiplication and addition) and `unfoldr` (using `divMod`).
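
As a quick check of the round trip (a GHCi session, assuming the definitions above are in scope):

```{.haskell}
-- ghci> coeffs (asPoly 2 11)
-- [1,1,0,1]   -- 11 = 1011 in binary, i.e., 1 + x + x^3, matching the table above
-- ghci> evalPoly 2 (asPoly 2 11)
-- 11
```
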
### Mono, not Stereo

With respect to their roots (which will soon become of primary interest), polynomials are *projective*. More directly, any nonzero scalar multiple of a polynomial has the same roots. For GF(2), this is insignificant, but over GF(5), for example, the following polynomials have the same roots:

$$
\begin{align*}
x^2 + 2 \mod 5 \quad &\longleftrightarrow \quad {_5 27} \\
2x^2 + 4 \mod 5 \quad &\longleftrightarrow \quad {_5 54} \\
3x^2 + 1 \mod 5 \quad &\longleftrightarrow \quad {_5 76} \\
4x^2 + 3 \mod 5 \quad &\longleftrightarrow \quad {_5 103}
\end{align*}
$$

Only the first polynomial has a leading coefficient of 1, a condition which makes it a *monic* polynomial. It is preferable to work with monic polynomials, since the product of two monic polynomials is also monic.

An equivalent condition is that the integer the polynomial corresponds to falls between 5^2^ = 25 and 2×5^2^ - 1 = 49. In general, monic polynomials of degree *d* mod *p*, read as integers, fall in the range *p*^*d*^ to 2*p*^*d*^ - 1.
+ +Haskell implementation of monic polynomials mod *p* + + +Again, nothing about this definition depends on the base being prime. + +```{haskell} +-- | eval: false + +-- All monic polynomials of degree d with coefficients mod n +monics :: Int -> Int -> [Polynomial Int] +monics n d = map (asPoly n) [n^d..2*(n^d) - 1] + +-- All monic polynomials with coefficients mod n, ordered by degree +allMonics :: Int -> [Polynomial Int] +allMonics n = concat [monics n d | d <- [1..]] +``` +
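
For instance, the degree-2 monics over GF(2) are exactly the integers 4 through 7 (a GHCi session, assuming the definitions above):

```{.haskell}
-- ghci> map (evalPoly 2) (monics 2 2)
-- [4,5,6,7]   -- x^2, 1 + x^2, x + x^2, and 1 + x + x^2 from the earlier table
```
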
As an aside, one can also read off the monics by counting normally in base *p* using the digit alphabet {1, 0, -1, ..., -*p* + 2}. Unfortunately, these base-*p* expansions are more difficult to obtain algorithmically, and I'll leave this as an exercise for the reader.


### Irreducibles

Over the integers, we can factor a number into primes. To decide if a number is prime, we just divide it (using an algorithm like long division) by numbers less than it and check that we always get a nonzero remainder.

Similarly, we can factor polynomials into *irreducible* polynomials, which have no "smaller" polynomial factors other than constants. More precisely, by "smaller", we mean those of lesser degree. For example, over the integers, the polynomial $x^2 - 1$ (degree 2) factors into $(x + 1)(x - 1)$ (both degree 1), but $x^2 + 1$ is irreducible.

In general, a factorization of a polynomial over the integers implies a factorization of one over GF(*p*), since the coefficients of each factor may be taken mod *p*. However, the converse does not hold. Over GF(2),

$$
(x + 1)^2 = x^2 + 2x + 1 \equiv x^2 + 1 \mod 2
$$

...but as just mentioned, the right-hand side is irreducible over the integers.

Since we can denote polynomials by numbers, it may be tempting to freely switch between primes and irreducibles. However, irreducibles depend on the chosen modulus and do not generally correspond to the base *p* expansion of a prime.

Irreducible over GF(2), *q(x)* | *q*(2) ([OEIS A014580](https://oeis.org/A014580)) | Prime
-------------------------------|---------------------------------------------------|------
$x$ | 2 | 2
$x + 1$ | 3 | 3
$x^2 + x + 1$ | 7 | 5
$x^3 + x + 1$ | 11 | 7
$x^3 + x^2 + 1$ | 13 | 11
$x^4 + x + 1$ | 19 | 13
$x^4 + x^3 + 1$ | 25 | 17
$x^4 + x^3 + x^2 + x + 1$ | 31 | 19
$x^5 + x^2 + 1$ | 37 | 23
$x^5 + x^3 + 1$ | 41 | 29
$x^5 + x^3 + x^2 + x + 1$ | 47 | 31

One entry in column 2 (25) is not prime. Dually, several entries in column 3 (5, 17, 23, and 29) do not have binary expansions which correspond to irreducible polynomials over GF(2).


### Dividing and Sieving

Just like integers, we can use [polynomial long division](https://en.wikipedia.org/wiki/Polynomial_long_division) with these objects to decide if a polynomial is irreducible. [Synthetic division](https://en.wikipedia.org/wiki/Synthetic_division) is an alternative which is slightly easier to implement (especially mod 2, where it is, again, used in CRCs). It only works for monic divisors, but this is all we need.
Haskell implementation of synthetic division

The algorithm is similar to table-less algorithms for CRCs, but we don't have the luxury of working at the bit level with XOR for addition. We also have to watch out for negation and coefficients other than 1 when not working mod 2.

```{haskell}
-- | eval: false

-- Add two lists elementwise, keeping the tail of the longer one
zipAdd :: Num a => [a] -> [a] -> [a]
zipAdd (x:xs) (y:ys) = (x + y) : zipAdd xs ys
zipAdd xs [] = xs
zipAdd [] ys = ys

-- Divide the polynomial ps by qs (coefficients in descending order by degree)
synthDiv' :: (Eq a, Num a) => [a] -> [a] -> ([a], [a])
synthDiv' ps qs
  | head qs /= 1 = error "Cannot divide by non-monic polynomial"
  | otherwise = splitAt deg $ doDiv ps deg
  where
    -- Negate the denominator and ignore the leading term
    qNeg = map negate $ tail qs
    -- The number of quotient coefficients, based on the degrees of the
    -- numerator and denominator
    deg = max 0 (length ps - length qs + 1)
    -- Pluck off the head of the list and add a shifted and scaled version of
    -- qNeg to the tail of the list. Repeat this deg times
    doDiv xs 0 = xs
    doDiv (x:xs) d = x : doDiv (zipAdd xs $ map (*x) qNeg) (d - 1)

-- Use Polynomial (coefficients in ascending degree order) instead of lists
synthDiv :: (Eq a, Num a) => Polynomial a -> Polynomial a -> (Polynomial a, Polynomial a)
synthDiv (Poly p) (Poly q) = (Poly $ reverse quo, Poly $ reverse rest) where
  (quo, rest) = synthDiv' (reverse p) (reverse q)
```
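
As an example, dividing $x^2 + 1$ by $x + 1$ over the integers and reducing mod 2 recovers the factorization $(x + 1)^2 \equiv x^2 + 1$ from above (a GHCi session, assuming the definitions above):

```{.haskell}
-- ghci> let (q, r) = synthDiv (asPoly 2 5) (asPoly 2 3)   -- 5 ~ x^2 + 1, 3 ~ x + 1
-- ghci> (coeffs q, coeffs r)
-- ([-1,1],[2])   -- i.e., quotient x - 1 === x + 1 and remainder 0, mod 2
```
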
Then, using our list of monic polynomials, we can apply the same strategy used for sieving out primes to find (monic) irreducibles.
Haskell implementation of an irreducible polynomial (mod *p*) sieve

```{haskell}
-- | eval: false

-- All irreducible monic polynomials with coefficients mod n
irreducibles :: Int -> [Polynomial Int]
irreducibles n = go [] $ allMonics n where
  -- Divide the polynomial x by i, then take the remainder's coefficients mod n
  remModN x i = map (`mod` n) $ coeffs $ snd $ synthDiv x i
  -- Find remainders of x divided by every irreducible in "is".
  -- If any give the zero polynomial, then x is a multiple of an irreducible
  notMultiple x is = and [not $ all (==0) $ remModN x i | i <- is]
  -- Sieve out by notMultiple
  go is (x:xs)
    | notMultiple x is = x : go (x:is) xs
    | otherwise = go is xs
```
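
Running the sieve reproduces the second column of the table above (a GHCi session):

```{.haskell}
-- ghci> map (evalPoly 2) $ take 10 $ irreducibles 2
-- [2,3,7,11,13,19,25,31,37,41]   -- the start of OEIS A014580
```
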
Matrices
--------

Just like polynomials over a finite field, we can also look at matrices. The most interesting matrices are square ones, since the product of two square matrices is another square matrix. Along with the zero matrix ($\bf 0_n$) as the additive identity and the identity matrix ($\bf 1_n$) as the multiplicative identity, square matrices also form a ring over the field K, denoted $K^{n \times n}$.

Square matrices are associated with a [determinant](https://en.wikipedia.org/wiki/Determinant), which is an element of the underlying field. Determinants are nice, since the determinant of the product of two matrices is the product of the determinants. The determinant can be implemented using [Laplace expansion](https://en.wikipedia.org/wiki/Laplace_expansion), which is also useful for inductive proofs.
Haskell implementation of Laplace expansion

Laplace expansion is ludicrously inefficient compared to other algorithms, and is only shown here due to its "straightforward" implementation and use in proof. Floating-point computation is avoided to keep the arithmetic exact.

```{haskell}
-- | eval: false

import Data.Array

newtype Matrix a = Mat { unMat :: Array (Int, Int) a }

-- Arrays are Functors, so Matrix can be one too (fmap maps over all entries)
instance Functor Matrix where
  fmap f = Mat . fmap f . unMat

determinant :: (Num a, Eq a) => Matrix a -> a
determinant (Mat xs) = determinant' xs where
  -- Evaluate (-1)^i without repeated multiplication
  parity i = if even i then 1 else -1
  -- Map old array addresses to new ones when eliminating row 0, column i
  rowMap i (x,y) = (x+1, if y >= i then y+1 else y)
  -- Recursive determinant on the underlying Array
  determinant' xs
    -- Base case: 1x1 matrix
    | n == 0 = xs!(0,0)
    -- Sum of cofactor expansions
    | otherwise = sum $ map cofactor [0..n] where
        -- Produce the cofactor of row 0, column i
        cofactor i
          | xs!(0,i) == 0 = 0
          | otherwise = (parity i) * xs!(0,i) * (determinant' $ minor i)
        -- Furthest extent of the bounds, i.e., the size of the matrix
        (_,(n,_)) = bounds xs
        -- Build a new Array by eliminating row 0 and column i
        minor i = ixmap ((0,0),(n-1,n-1)) (rowMap i) xs
```
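
For example, the determinant of the 2×2 matrix that will reappear later for GF(4) (a GHCi session; `listArray` fills across rows):

```{.haskell}
-- ghci> determinant $ Mat $ listArray ((0,0),(1,1)) [0,1,1,1]
-- -1   -- i.e., 1 mod 2
```
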
+ + +### Back to Polynomials + +The [characteristic polynomial](https://en.wikipedia.org/wiki/Characteristic_polynomial) is a stronger invariant which follows from the determinant (and conveniently plays into the prior description of polynomials). It is defined as, for *λ* a scalar variable: + +$$ +\text{charpoly}(A) = p_A(\lambda) = \left| \lambda I - A \right| \\ ~ \\ += \left| +\begin{matrix*} +\lambda - a_{00} & -a_{01} & ... & -a_{0n} \\ +-a_{10} & \lambda - a_{11} & ... & -a_{1n} \\ +\vdots & \vdots & \ddots & \vdots \\ +-a_{n0} & -a_{n1} & ... & \lambda - a_{nn} \\ +\end{matrix*} +\right| +$$ + +Laplace expansion never gives *λ* a coefficient before recursing, so the characteristic polynomial is always monic. + +
Haskell implementation of the characteristic polynomial

Since `determinant` was defined for all `Num` and `Eq`, it can immediately be applied if these instances are defined for polynomials. Two small matrix helpers, elementwise addition (`|+|`) and an identity matrix builder (`eye`), are also needed.

```{haskell}
-- | eval: false
-- Num instance for polynomials omitted
-- instance (Num a, Eq a) => Num (Polynomial a) where
--   ...

instance Eq a => Eq (Polynomial a) where
  (==) (Poly xs) (Poly ys) = xs == ys

-- Elementwise matrix addition
(|+|) :: Num a => Matrix a -> Matrix a -> Matrix a
Mat xs |+| Mat ys = Mat $ listArray (bounds xs) $ zipWith (+) (elems xs) (elems ys)

-- The n x n identity matrix
eye :: Num a => Int -> Matrix a
eye n = Mat $ array ((0,0),(n-1,n-1))
              [((i,j), if i == j then 1 else 0) | i <- [0..n-1], j <- [0..n-1]]

charpoly :: Matrix Int -> Polynomial Int
charpoly xs = determinant $ eyeLambda |+| negPolyXs where
  -- Furthest extent of the bounds, i.e., the size of the matrix
  (_,(n,_)) = bounds $ unMat xs
  -- Negative of input matrix, after being converted to polynomials
  negPolyXs :: Matrix (Polynomial Int)
  negPolyXs = fmap (\x -> Poly [-x]) xs
  -- Identity matrix times lambda (encoded as Poly [0, 1])
  eyeLambda :: Matrix (Polynomial Int)
  eyeLambda = fmap (\x -> Poly [x] * Poly [0, 1]) $ eye (n+1)
```
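
Checking against the matrix from before (a GHCi session; the exact coefficient list depends on how the omitted `Num` instance is filled in):

```{.haskell}
-- ghci> coeffs $ charpoly $ Mat $ listArray ((0,0),(1,1)) [0,1,1,1]
-- [-1,-1,1]   -- lambda^2 - lambda - 1, which is lambda^2 + lambda + 1 mod 2
```
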
Computation using this definition is only good for illustrative purposes. The [Faddeev-LeVerrier algorithm](https://en.wikipedia.org/wiki/Faddeev%E2%80%93LeVerrier_algorithm) circumvents Laplace expansion entirely and happens to generate the determinant along the way. However, it has some problems:

- It inverts the order in which the determinant and characteristic polynomial are defined
- It introduces division, which makes it unsuitable over mod *p* matrices directly

Fortunately, we can just work with a mod *p* matrix over the integers and mod out at the end instead, as the following diagram conveys:

$$
\begin{matrix}
\mathbb{F}_p ^{n \times n} & \textcolor{green}{\hookrightarrow} &
\normalsize \mathbb{Z}^{n \times n} &
\overset{\mod p ~~}{\longrightarrow} &
\mathbb{F}_p^{n \times n} & \scriptsize \phantom{\text{charpoly}}
\\ \\ &
\scriptsize \textcolor{green}{\text{charpoly (FL)}} &
\textcolor{green}{\downarrow} & &
\textcolor{red}{\downarrow} & \scriptsize \textcolor{red}{\text{charpoly (LE)}}
\\ \\ & &
\mathbb{Z}[\lambda] &
\textcolor{green}{\underset{\mod p ~~}{\longrightarrow}} &
\mathbb{F}_p[\lambda] & \scriptsize \phantom{\text{charpoly}}
\end{matrix}
$$

The top row contains matrices and the bottom row polynomials. To get to the bottom-right, which contains the characteristic polynomials of mod *p* matrices, we can avoid the red arrow and follow the path in green instead.


### Friends Among Matrices

In the reverse direction, a matrix with a specific characteristic polynomial can be constructed from a polynomial. The matrix is called the [companion matrix](https://en.wikipedia.org/wiki/Companion_matrix), and is defined as

$$
p(\lambda) = \lambda^n + p_{n-1}\lambda^{n-1} + ... + p_1 \lambda + p_0 \\ ~ \\
C_{p(\lambda)} = \left( \begin{matrix}
0 & 1 & 0 & ... & 0 \\
0 & 0 & 1 & ... & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & ... & 1 \\
-p_0 & -p_1 & -p_2 & ... & -p_{n-1}
\end{matrix} \right) =
\left( \begin{matrix}
\overrightharpoon 0_{n-1} & \bold{1}_{n-1} \\
-p_0 & -(\overrightharpoon{p}_{1:n-1})^T
\end{matrix} \right) \\ ~ \\
\text{charpoly}(C_{p}) = p_{C_{p}}(\lambda) = p(\lambda)
$$

The definition of the companion matrix only depends on elements having an additive inverse, which is always true in a field. Therefore, there are always matrices over a field that have a given monic polynomial as their characteristic polynomial.

Proving that the companion matrix has the characteristic polynomial it was constructed from can be done via Laplace expansion:

$$
p_{0:n-1}(\lambda) = \left| \begin{matrix}
\textcolor{red}{\lambda} & -1 & 0 & ... & 0 \\
0 & \lambda & -1 & ... & 0 \\
\vdots &\vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & ... & -1 \\
\textcolor{green}{p_0} & p_1 & p_2 & ... & \lambda + p_{n-1}
\end{matrix} \right|
\\ ~ \\ =
\textcolor{green}{p_0} \cdot (-1)^{n-1}
\left| \begin{matrix}
-1 & 0 & ... & 0 \\
\lambda & -1 & ... & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & ... & -1
\end{matrix} \right|
+ \textcolor{red}{\lambda}
\left| \begin{matrix}
\lambda & -1 & ... & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & ... & -1 \\
p_1 & p_2 & ... & \lambda + p_{n-1}
\end{matrix} \right|
\\ ~ \\ =
\textcolor{green}{p_0} \cdot (-1)^{n-1} \cdot (-1)^{n-1} +
\textcolor{red}{\lambda} \cdot p_{1:n-1}(\lambda)
\\ ~ \\ =
p_0 + \lambda(p_1 + \lambda (...(p_{n-1} + \lambda)...))
$$

Pleasantly, this yields the Horner form, which was used above to evaluate polynomials.
Haskell implementation of the companion matrix

```{haskell}
-- | eval: false

-- Uses array and range from Data.Array

companion :: Polynomial Int -> Matrix Int
companion (Poly ps)
  | last ps' /= 1 = error "Cannot find companion matrix of non-monic polynomial"
  | otherwise = Mat $ array ((0,0), (n-1,n-1)) $ lastRow ++ shiftI where
      -- The degree of the polynomial, as well as the size of the matrix
      n = length ps' - 1
      -- Remove trailing 0s from ps
      ps' = reverse $ dropWhile (==0) $ reverse ps
      -- Address/value tuples for a shifted identity matrix:
      -- 1s on the diagonal just above the main diagonal, 0s elsewhere
      shiftI = map (\p@(x,y) -> (p, if y == x + 1 then 1 else 0)) $ range ((0,0),(n-2,n-1))
      -- Address/value tuples for the last row of the companion matrix:
      -- the negated coefficients of the polynomial, in ascending order
      lastRow = zipWith (\x y -> ((n-1, x), y)) [0..n-1] $ map negate ps'

-- (charpoly . companion) = id :: Polynomial Int -> Polynomial Int
```
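
A quick round trip over the integers, using $x^2 + x + 1$ (a GHCi session):

```{.haskell}
-- ghci> elems $ unMat $ companion $ Poly [1,1,1]
-- [0,1,-1,-1]   -- row-major [[0,1],[-1,-1]], i.e., [[0,1],[1,1]] mod 2
-- ghci> coeffs $ charpoly $ companion $ Poly [1,1,1]
-- [1,1,1]       -- charpoly . companion recovers the polynomial
```
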
Field Extensions
----------------

Aside from those of degree 1, the irreducible polynomials over a field cannot be factored into linear (degree 1) factors over the field. In other words, irreducibles have roots which do not exist as elements of the field. A *field extension* formalizes the notion by which one can make a larger field from another by adding the roots.

Using $x^2 + 1$ (over the integers) again, even over an actual field like $\mathbb{R}$, the polynomial is still irreducible. On the other hand, it can be factored into $(x + i)(x - i)$ over $\mathbb{C}$. We can construct the latter field from the former by supposing an extra number *i* exists alongside everything in $\mathbb{R}$ such that *i*^2^ = -1. Then, form all possible products and sums by taking linear combinations of powers of *i* less than the degree (in this case, 0 and 1).

The equation that *i* obeys can be rewritten as $i^2 + 1 = 0$, which is the original polynomial, evaluated at *i*. In order to refer explicitly to the construction of the bigger field from the polynomial, we write $\mathbb{R}[x] / (x^2 + 1) \cong \mathbb{C}$. *Technically*, the left hand side refers to something else (cosets of polynomials, from which we extract the canonical member *i*), but this description is good enough.


### The Power of Primes

We can extend a finite field in the same way. Over GF(2), the only irreducible of degree 2 is $x^2 + x + 1$. Using the same logic as before, we construct $\mathbb{F}_2[x] / (x^2 + x + 1) \cong \mathbb{F}_2[\alpha]$. The new element α is a root of the polynomial and obeys the relations:

$$
\alpha^2 + \alpha + 1 = 0 \\
\alpha^2 = -\alpha - 1 \equiv \alpha + 1 \mod 2 \\
\alpha^3 = \alpha^2 + \alpha = (\alpha + 1) + \alpha \equiv 1 \mod 2
$$

Just like with *i*, only powers of α with exponent less than 2 (again, 0 and 1) are necessary to express elements of the field. Skipping a few steps, we can accumulate all possible sums and products over this new field into two new tables:

| + | 0 | 1 | *α* | *α* + 1 |
|---------|---------|---------|---------|---------|
| 0 | 0 | 1 | *α* | *α* + 1 |
| 1 | 1 | 0 | *α* + 1 | *α* |
| *α* | *α* | *α* + 1 | 0 | 1 |
| *α* + 1 | *α* + 1 | *α* | 1 | 0 |

| × | 0 | 1 | *α* | *α* + 1 |
|---------|---------|---------|---------|---------|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | *α* | *α* + 1 |
| *α* | 0 | *α* | *α* + 1 | 1 |
| *α* + 1 | 0 | *α* + 1 | 1 | *α* |

As you might expect, the resulting field has 4 elements, so it's called $\mathbb{F}_4 = \text{GF}(4)$. In general, when adjoining a root of an irreducible of degree *d* to GF(*p*), the resulting field has *p*^*d*^ elements, naturally denoted $\mathbb{F}_{p^d} = \text{GF}(p^d)$. *p* is called the *characteristic* of the field, and denotes how many repeated additions of an element are needed to get to 0. From the above table, it's clear that the characteristic is 2, since 1 + 1 = α + α = (α + 1) + (α + 1) = 0.


### ...and beyond?

All of this is manageable when you're adjoining a root of a degree 2 polynomial like *α* or *i*, but things get difficult when you start to work with higher degrees. The powers of the root form the basis for a *d*-dimensional vector space over GF(*p*) (hence the order of the field being *p*^*d*^).
Proceeding as before, we'd have to be able to:

- recognize equality in the new field based on sums of powers of roots (times elements of the field)
- have a canonical method of expressing other elements after adjoining a root
  - ideally, handle both with an algorithm that gives canonical forms from noncanonical ones
- know when we've found every element of the new field

These problems make it difficult to study prime power fields on a computer without the use of a CAS like Maple or Mathematica. They're capable of taking care of these issues symbolically, working with the expressions in the same way we have (or at least appearing to do so). As someone who likes to do things himself, implementing a CAS from scratch seemed a little too cumbersome. Furthermore, even a more direct approach using the previously-mentioned "canonical members of cosets of polynomials" was more annoying than I was willing to put up with.

Fortunately, there's a detour that makes it much easier to dodge all of these problems, and it has some interesting consequences. Join me in [the next post]() for a direct, non-symbolic way to work with prime power fields.
diff --git a/posts/finite-field/2/index.qmd b/posts/finite-field/2/index.qmd
new file mode 100644
index 0000000..3afd69e
--- /dev/null
+++ b/posts/finite-field/2/index.qmd
@@ -0,0 +1,709 @@
---
format:
  html:
    html-math-method: katex
jupyter: python3
---


Exploring Finite Fields, Part 2: Matrix Boogaloo
================================================

In the [last post](), we discussed finite fields, polynomials and matrices over them, and the typical, symbolic way of extending fields with polynomials. This post will focus on circumventing symbolic means with numeric ones.


More about Matrices (and Polynomials)
-------------------------------------

Recall the definition of polynomial evaluation. Since a polynomial is defined with respect to a certain structure (e.g., the integers), we expect to only be able to evaluate the polynomial within that structure.

$$
K[x] \times K \overset{\text{eval}}{\longrightarrow} K
$$

However, there's nothing wrong with evaluating a polynomial at another polynomial, as long as they're defined over the same structure. After all, we can take powers of polynomials and scalar-multiply them with coefficients from *K*. The same holds for matrices, or any bigger structure *F* over *K* which has these properties.

$$
\begin{align*}
K[x] \times K[x] &\overset{\text{eval}_{poly}}{\longrightarrow} K[x] \\ ~ \\
K[x] \times K^{n \times n} &\overset{\text{eval}_{mat}}{\longrightarrow} K^{n \times n} \\ ~ \\
K[x] \times F(K) &\overset{\text{eval}_F}{\longrightarrow} F(K)
\end{align*}
$$

Essentially, this means we can extend a polynomial into new structures by evaluating it in certain ways:

$$
\begin{align*}
& \phantom{I} \begin{align*}
p &: K[x] \\
p(x) &= x^n + p_{n-1}x^{n-1} + ... + p_1 x + p_0 &
\end{align*} &
\text{$x$ is a scalar indeterminate} \\ ~ \\
& \begin{align*}
P &: (K[x])^{m \times m} \\
P(x I) &= (x I)^n + (p_{n-1})(x I)^{n-1} + ...
\\
\phantom{= p} & + p_1(x I) + p_0 I
\end{align*} &
\begin{align*}
\text{$x$ is a scalar indeterminate,} \\
\text{$P(x I) = p(x) I$ is a } \\
\text{matrix of polynomials in $x$}
\end{align*} \\
\\
& \begin{align*}
\hat P &: K^{m \times m}[X] \\
\hat P(X) &= X^n + (p_{n-1}I)X^{n-1} + ...
\\
& + (p_1 I) X + (p_0 I)
\end{align*} &
\begin{align*}
\text{$X$ is a matrix indeterminate} \\
\hat P(X) \text{ is a polynomial over matrices}
\end{align*}
\end{align*}
$$

Or by using types instead of the more abstract notation above, we can describe functions that convert $p$ to $P$ and $\hat P$.

```{.haskell}
asPolynomialMatrix :: Polynomial Int -> Matrix (Polynomial Int)
asMatrixPolynomial :: Polynomial Int -> Polynomial (Matrix Int)
```

### Cayley-Hamilton Theorem

When evaluating the characteristic polynomial of a matrix *at* that matrix, something strange happens. Continuing from the previous article, using $x^2 + x + 1$ and its companion matrix, we have:

$$
p(x) = x^2 + x + 1 \qquad C_{p} = C =
\left( \begin{matrix}
0 & 1 \\
-1 & -1
\end{matrix} \right) \\ ~ \\
\hat P(C) = C^2 + C + (1 \cdot I) =
\left( \begin{matrix}
-1 & -1 \\
1 & 0
\end{matrix} \right) +
\left( \begin{matrix}
0 & 1 \\
-1 & -1
\end{matrix} \right) +
\left( \begin{matrix}
1 & 0 \\
0 & 1
\end{matrix} \right) \\ ~ \\
= \left( \begin{matrix}
0 & 0 \\
0 & 0
\end{matrix} \right)
$$

The result is the zero matrix. This tells us that, at least in this case, the matrix *C* is a root of its own characteristic polynomial.

By the [Cayley-Hamilton theorem](https://en.wikipedia.org/wiki/Cayley%E2%80%93Hamilton_theorem), this is true in general, no matter the degree of *p*, no matter its coefficients, and importantly, no matter the choice of field. This is more powerful than it would otherwise seem. For one, factoring a polynomial "inside" a matrix turns out to give the same answer as factoring a polynomial over matrices.

:::: {layout-ncol="2"}
::: {}
$$
P(xI) = \left( \begin{matrix}
x^2 + x + 1 & 0 \\
0 & x^2 + x + 1
\end{matrix}\right) \\ ~ \\ = (xI - C)(xI - C') \\ ~ \\
=
\left( \begin{matrix}
x & -1 \\
1 & x + 1
\end{matrix} \right)
\left( \begin{matrix}
x - a & -b \\
-c & x - d
\end{matrix} \right) \\ ~ \\
\begin{align*}
x(x-a) + c &= x^2 + x + 1 \\
\textcolor{green}{x(-b) - (x - d)} &\textcolor{green}{= 0} \\
\textcolor{blue}{(x - a) + (x + 1)(-c)} &\textcolor{blue}{= 0} \\
(-b) + (x + 1)(x - d) &= x^2 + x + 1 \\
\end{align*}
\\ ~ \\
\textcolor{green}{(-b -1)x +d = 0} \implies b = -1, ~ d = 0 \\
\textcolor{blue}{(1 - c)x - a - c = 0} \implies c = 1, ~ a = -1
\\ ~ \\
C' =
\left( \begin{matrix}
-1 &-1 \\
1 & 0
\end{matrix} \right)
$$
:::

::: {}
$$
\hat P(X) = X^2 + X + 1 \cdot I \\ ~ \\
= (X - C)(X - C') \\ ~ \\
= X^2 - (C + C')X + CC'
\\ ~ \\ \implies \\ ~\\
C + C' = -I, ~ C' = -I - C \\ ~ \\
CC' = I, ~ C^{-1} = C' \\ ~ \\
C' = \left( \begin{matrix}
-1 & -1 \\
1 & 0
\end{matrix} \right)
$$
:::
::::

It's important to note that a matrix factorization is not unique. *Any* matrix with a given characteristic polynomial can be used as a root of that polynomial. Of course, choosing one root affects the other matrix roots.


### Moving Roots

All matrices commute with the identity and zero matrices. A less obvious fact is that all of the matrix roots *also* commute with one another. Expanding the factored form of $\hat P$ ([Vieta's formulas](https://en.wikipedia.org/wiki/Vieta%27s_formulas)) gives:

$$
\hat P(X) =
\prod_{[i]_n} (X - \Xi_i) =
(X - \Xi_0) (X - \Xi_1)...(X - \Xi_{n-1}) \\
= \left\{ \begin{align*} &
\phantom{+} X^n \\
& - (\Xi_0 + \Xi_1 + ... + \Xi_{n-1}) X^{n-1} \\
& + (\Xi_0 \Xi_1 + \Xi_0 \Xi_2 + ... + \Xi_0 \Xi_{n-1} + \Xi_1 \Xi_2 + ...
\Xi_{n-2} \Xi_{n-1})X^{n-2} \\
& \qquad \vdots \\
& + (-1)^n \Xi_0 \Xi_1 \Xi_2...\Xi_{n-1}
\end{align*} \right.
\\
= X^n -\sigma_1([\Xi]_n)X^{n-1} + \sigma_2([\Xi]_n)X^{n-2} + ... + (-1)^n \sigma_n([\Xi]_n)
$$

The product range \[*i*\]~*n*~ means that the terms are ordered from 0 to *n* - 1 over the index given. On the bottom line, the *σ* are [elementary symmetric polynomials](https://en.wikipedia.org/wiki/Elementary_symmetric_polynomial) and \[*Ξ*\]~*n*~ is the list of root matrices from *Ξ*~*0*~ to *Ξ*~*n-1*~.

By multiplying the factors with the roots in a different order, we get another factorization. It suffices to focus only on *σ*~2~, which has all pairwise products.

$$
\pi \in S_n \\ \qquad
\pi \circ \hat P(X) = \prod_{\pi ([i]_n)} (X - \Xi_i) \\ ~ \\
= X^n
- \sigma_1 \left(\pi ([\Xi]_n) \vphantom{^{1}} \right)X^{n-1}
+ \sigma_2 \left(\pi ([\Xi]_n) \vphantom{^{1}} \right)X^{n-2} + ...
+ (-1)^n \sigma_n \left(\pi ([\Xi]_n) \vphantom{^{1}} \right)
\\ ~ \\
\\ ~ \\
(0 ~ 1) \circ \hat P(X) = (X - \Xi_{1}) (X - \Xi_0)(X - \Xi_2)...(X - \Xi_{n-1})
\\
= X^n + ... + \sigma_2(\Xi_1, \Xi_0, \Xi_2, ...,\Xi_{n-1})X^{n-2} + ... \\
\\ ~ \\ ~ \\
\begin{array}{}
e & (0 ~ 1) & (1 ~ 2) & ... & (n-2 ~~ n-1) \\ \hline
\textcolor{red}{\Xi_0 \Xi_1} & \textcolor{red}{\Xi_1 \Xi_0} & \Xi_0 \Xi_1 & & \Xi_0 \Xi_1\\
\Xi_0 \Xi_2 & \Xi_0 \Xi_2 & \Xi_0 \Xi_2 & & \Xi_0 \Xi_2 \\
\Xi_0 \Xi_3 & \Xi_0 \Xi_3 & \Xi_0 \Xi_3 & & \Xi_0 \Xi_3 \\
\vdots & \vdots & \vdots & & \vdots \\
\Xi_0 \Xi_{n-1} & \Xi_0 \Xi_{n-1} & \Xi_{0} \Xi_{n-1} & & \Xi_{0} \Xi_{n-1}\\
\textcolor{green}{\Xi_1 \Xi_2} & \Xi_1 \Xi_2 & \textcolor{green}{\Xi_2 \Xi_1} & & \Xi_1 \Xi_2 \\
\vdots & \vdots & \vdots & & \vdots \\
\textcolor{blue}{\Xi_{n-2} \Xi_{n-1}} & \Xi_{n-2} \Xi_{n-1} & \Xi_{n-2} \Xi_{n-1} & & \textcolor{blue}{\Xi_{n-1} \Xi_{n-2}} \\
\end{array}
$$

The "[path swaps]()" shown commute only the adjacent elements. By contrast, the permutation (0 2) commutes *Ξ*~0~ past both *Ξ*~1~ and *Ξ*~2~. But since we already know *Ξ*~0~ and *Ξ*~1~ commute by the above list, we learn at this step that *Ξ*~0~ and *Ξ*~2~ commute. This can be repeated until we reach the permutation (0 *n*-1) to prove commutativity between all pairs.


### Matrix Fields?

The above arguments tell us that if *p* is irreducible, we can take its companion matrix *C*~*p*~ and work with its powers in the same way we would a typical root. Irreducible polynomials of degree greater than 1 cannot have a constant term of 0, since otherwise *x* could be factored out. The constant term is equal to the determinant of the companion matrix (up to sign), so *C*~*p*~ is invertible. We get commutativity for free, since it follows from associativity that all powers of *C*~*p*~ commute.

This narrows the ring of matrices down to a genuine field. Importantly, it absolves us from the need to symbolically render elements using a power of the root. Instead, the root can be adjoined by going from scalars to matrices. We can also find every element in the field arithmetically. Starting with a root, produce new elements by taking its matrix powers. Then, scalar-multiply them and add them to elements of the field which are already known. For finite fields, we can repeat this process with the new matrices until we have all *p*^*d*^ elements.


GF(8)
-----

This is all rather abstract, so let's look at an example before we proceed any further. The next smallest field of characteristic 2 is GF(8).
We can construct this field from the two irreducible polynomials of degree 3 over GF(2):

$$
q(x) = x^3 + x + 1 = 1011_x \sim {}_2 11 \qquad
C_q = \left( \begin{matrix}
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 1 & 0
\end{matrix} \right) \mod 2 \\ ~ \\
r(x) = x^3 + x^2 + 1 = 1101_x \sim {}_2 13 \qquad
C_r = \left( \begin{matrix}
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 0 & 1
\end{matrix} \right) \mod 2 \\
$$

Notice how the bit string of each of these polynomials is the other's, reversed. Arbitrarily, let's work with *C*~*r*~. The powers of this matrix, mod 2, are as follows:

$$
(C_r)^1 = \left( \begin{matrix}
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 0 & 1
\end{matrix} \right)
\quad
(C_r)^2 = \left( \begin{matrix}
0 & 0 & 1 \\
1 & 0 & 1 \\
1 & 1 & 1
\end{matrix} \right)
\quad
(C_r)^3 = \left( \begin{matrix}
1 & 0 & 1 \\
1 & 1 & 1 \\
1 & 1 & 0
\end{matrix} \right)
\\
(C_r)^4 = \left( \begin{matrix}
1 & 1 & 1 \\
1 & 1 & 0 \\
0 & 1 & 1
\end{matrix} \right) \quad
(C_r)^5 = \left( \begin{matrix}
1 & 1 & 0 \\
0 & 1 & 1 \\
1 & 0 & 0
\end{matrix} \right) \quad
(C_r)^6 = \left( \begin{matrix}
0 & 1 & 1 \\
1 & 0 & 0 \\
0 & 1 & 0
\end{matrix} \right)
\\
(C_r)^7 = \left( \begin{matrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{matrix} \right) = I = (C_r)^0 \quad
(C_r)^8 = \left( \begin{matrix}
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 0 & 1
\end{matrix} \right) = C_r
$$

As a reminder, these matrices are taken mod 2, so the elements can only be 0 or 1. The seventh power of *C*~*r*~ is just the identity matrix, meaning that the eighth power is the original matrix. This means that *C*~*r*~ is cyclic of order 7 with respect to self-multiplication mod 2. Along with the zero matrix, this fully characterizes GF(8).

If we picked *C*~*q*~ instead, we would have gotten different matrices. I'll omit writing them here, but we get the same result: *C*~*q*~ is also cyclic of order 7. Since every nonzero element of the field can be written as a power of the root, the root (and the polynomial) is termed [primitive](https://en.wikipedia.org/wiki/Primitive_polynomial_%28field_theory%29).


### Condensing

Working with matrices directly, as a human, is very cumbersome. While it makes computation explicit, it makes presentation difficult. One thing we know we should be interested in is the characteristic polynomial, since it is central to the definition and behavior of the matrices. Let's focus only on the characteristic polynomial for successive powers of *C*~*r*~:

$$
C_r = \left( \begin{matrix}
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 0 & 1
\end{matrix} \right) \mod 2 \\ ~ \\
\begin{array}{}
\text{charpoly}((C_r)^1) &=& \color{blue} x^3 + x^2 + 1 &=& \color{blue} 1101_x \sim {}_2 13 = r
\\
\text{charpoly}((C_r)^2) &=& \color{blue} x^3 + x^2 + 1 &=& \color{blue} 1101_x \sim {}_2 13 = r
\\
\text{charpoly}((C_r)^3) &=& \color{red} x^3 + x + 1 &=& \color{red} 1011_x \sim {}_2 11 = q
\\
\text{charpoly}((C_r)^4) &=& \color{blue} x^3 + x^2 + 1 &=& \color{blue} 1101_x \sim {}_2 13 = r
\\
\text{charpoly}((C_r)^5) &=& \color{red} x^3 + x + 1 &=& \color{red} 1011_x \sim {}_2 11 = q
\\
\text{charpoly}((C_r)^6) &=& \color{red} x^3 + x + 1 &=& \color{red} 1011_x \sim {}_2 11 = q
\\
\text{charpoly}((C_r)^7) &=& x^3 + x^2 + x + 1 &=& 1111_x \sim {}_2 15 = (x+1)^3
\end{array}
$$

Somehow, even though we start with one characteristic polynomial, the other manages to work its way in here. Both polynomials are of degree 3 and have 3 matrix roots (distinguished in red and blue).
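
Sequences like this one can be generated directly from the earlier Haskell pieces. Below is a minimal sketch; it assumes the commented-out `Num` instance for `Matrix` (supplying matrix multiplication over the integers), along with `companion`, `charpoly`, `coeffs`, `evalPoly`, and `asPoly` from the first post. The names `matPowers` and `charpolySeq` are introduced here for illustration.

```{.haskell}
-- Successive powers of a matrix, reduced mod p after each multiplication
matPowers :: Int -> Matrix Int -> [Matrix Int]
matPowers p m = iterate (fmap (`mod` p) . (* m)) m

-- Characteristic polynomials of C, C^2, ..., read off as base-p integers
charpolySeq :: Int -> Polynomial Int -> [Int]
charpolySeq p q =
  map (evalPoly p . Poly . map (`mod` p) . coeffs . charpoly) $
    matPowers p (companion q)

-- ghci> take 7 $ charpolySeq 2 $ asPoly 2 13
-- [13,13,11,13,11,11,15]
```
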
If we chose to use *C*~*q*~, we'd actually get the same sequence backwards (starting with ~2~11). It's beneficial to remember that 6, 5, and 3 can also be written as 7 - 1, 7 - 2, and 7 - 4. This makes it clear that the powers of 2 (the field characteristic) less than 8 (the order of the field) play a role with respect to both the initial and terminal items.


### Factoring

Intuitively, you may try using powers of *C*~*r*~ as the roots to factor the polynomials. This turns out to work:

$$
\hat R(X) \overset?= (X - C_r)(X - (C_r)^2)(X - (C_r)^4) \\
\hat Q(X) \overset?= (X - (C_r)^3)(X - (C_r)^5)(X - (C_r)^6) \\ ~ \\
\textcolor{red}{ \sigma_1([(C_r)^i]_{i \in [1,2,4]}) } = C_r + (C_r)^2 + (C_r)^4 = \textcolor{red}I \\
\textcolor{brown}{ \sigma_1([(C_r)^i]_{i \in [3,5,6]}) } = (C_r)^3 + (C_r)^5 + (C_r)^6 = \textcolor{brown}0 \\ ~ \\
\begin{align*}
\color{blue} \sigma_2([(C_r)^i]_{i \in [1,2,4]}) &= (C_r)(C_r)^2 + (C_r)(C_r)^4 + (C_r)^2(C_r)^4 \\
&= (C_r)^3 + (C_r)^5 + (C_r)^6 = \color{blue}0 \\
\color{cyan} \sigma_2([(C_r)^i]_{i \in [3,5,6]}) &= (C_r)^3(C_r)^5 + (C_r)^3(C_r)^6 + (C_r)^5(C_r)^6 \\
&= (C_r)^8 + (C_r)^9 + (C_r)^{11} \\
&= (C_r)^1 + (C_r)^2 + (C_r)^4 = \color{cyan} I \\
\end{align*}
\\ ~ \\
\textcolor{green}{ \sigma_3([(C_r)^i]_{i \in [1,2,4]}) } = (C_r)(C_r)^2(C_r)^4 = \textcolor{green}I \\
\textcolor{lightgreen}{ \sigma_3([(C_r)^i]_{i \in [3,5,6]}) } = (C_r)^3(C_r)^5(C_r)^6 = \textcolor{lightgreen}I
\\ ~ \\
\hat R(X) = X^3 + \textcolor{red}IX^2 + \textcolor{blue}0X + \textcolor{green}I \\
\hat Q(X) = X^3 + \textcolor{brown}0X^2 + \textcolor{cyan}IX + \textcolor{lightgreen}I
$$

We could have factored our polynomials differently if we used *C*~*q*~ instead. However, the effect of splitting both polynomials into linear factors is the same.


GF(16)
------

GF(8) is simple to study, but too simple for studying the sequence of characteristic polynomials on its own. Let's widen our scope to GF(16). There are three irreducible polynomials of degree 4 over GF(2).

$$
s(x) = x^4 + x + 1 = 10011_x \sim {}_2 19 \quad
C_s = \left( \begin{matrix}
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0
\end{matrix} \right) \mod 2
\\
t(x) = x^4 + x^3 + 1 = 11001_x \sim {}_2 25 \quad
C_t = \left( \begin{matrix}
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 1
\end{matrix} \right) \mod 2
\\
u(x) = x^4 + x^3 + x^2 + x + 1 = 11111_x \sim {}_2 31 \quad
C_u = \left( \begin{matrix}
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 1 & 1 & 1
\end{matrix} \right) \mod 2
$$

Again, *s* and *t* form a pair under the reversal of their bit strings, while *u* is palindromic. Both *C*~*s*~ and *C*~*t*~ are cyclic of order 15, so *s* and *t* are primitive polynomials. Using *s* = ~2~19 to generate the field, the powers of its companion matrix *C*~*s*~ have the following characteristic polynomials:

```{python}
#| echo: false

from IPython.display import Markdown
from tabulate import tabulate

charpolys = [19, 19, 31, 19, 21, 31, 25, 19, 31, 21, 25, 31, 25, 25, 17]
charpolyformat = lambda x: f"~2~{x}"

Markdown(tabulate(
    [[
        "charpoly((*C*~*s*~)^*m*^)",
        *[charpolyformat(charpoly) for charpoly in charpolys]
    ]],
    headers=["*m*", *[i + 1 for i in range(15)]],
))
```

The polynomial ~2~19 occurs at positions 1, 2, 4, and 8. These are exactly the powers of 2, the characteristic of the field.
Similarly, the polynomial *t* = ~2~25 occurs at positions 14 (= 15 - 1), 13 (= 15 - 2), 11 (= 15 - 4), and 7 (= 15 - 8). We'd get the same sequence backwards if we used *C*~*t*~ instead, just like in GF(8).


### Non-primitive

The polynomial *u* = ~2~31 occurs at positions 3, 6, 9, and 12 -- multiples of 3, which is a factor of 15. It follows that the roots of *u* are cyclic of order 5, so this polynomial is irreducible, but *not* primitive.

Naturally, $\hat U(X)$ can be factored using powers of (*C*~*s*~)^3^. We can also factor it more naively using powers of *C*~*u*~. Either way, we get the same sequence.

:::: {layout-ncol="2"}
::: {}
```{python}
#| echo: false
upowers = [31, 31, 31, 31, 17]

Markdown(tabulate(
    [[
        "charpoly((*C*~*s*~)^*3m*^)",
        *[f"~2~{charpoly}" for charpoly in charpolys[2::3]]
    ], [
        "charpoly((*C*~*u*~)^*m*^)",
        *[f"~2~{upower}" for upower in upowers]
    ]],
    headers=["*m*", *[i + 1 for i in range(5)]],
))
```

Both of the matrices in column 5 happen to be the identity matrix. It follows that this root is only cyclic of order 5.

The polynomials ~2~19 and ~2~25 are reversals of one another, and the sequences that their companion matrices generate are each other's reverses -- in this regard, they are dual. However, ~2~31 = 11111~x~ is a palindrome and its sequence is its own reverse, so it is self-dual.
:::

::: {width="33%"}
$$
(C_u)^1 =\left( \begin{matrix}
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 1 & 1 & 1
\end{matrix} \right)
\\ ~ \\
(C_u)^2 =\left( \begin{matrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 \\
1 & 0 & 0 & 0
\end{matrix} \right)
\\ ~ \\
(C_u)^3 =\left( \begin{matrix}
0 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0
\end{matrix} \right)
\\ ~ \\
(C_u)^4 =\left( \begin{matrix}
1 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\end{matrix} \right)
\\ ~ \\
(C_u)^5 =\left( \begin{matrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
\end{matrix} \right) \\
= I = (C_u)^0
$$
:::
::::


### Non-irreducible

In addition to the three irreducibles, a fourth polynomial, ~2~21 = 10101~x~, also appears in the sequence on entries 5 and 10 -- multiples of 5, which is also a factor of 15. Like ~2~31, this polynomial is palindromic. This polynomial is *not* irreducible mod 2, and factors as:

$$
{}_2 21 \sim 10101_x = x^4 + x^2 + 1 = (x^2 + x + 1)^2 \mod 2 \\ ~ \\
(X - (C_s)^5)(X - (C_s)^{10}) = X^2 + ((C_s)^5 + (C_s)^{10})X + (C_s)^{15} \\
= X^2 + IX + I
$$

Just like how the fields we construct are powers of a prime, this extra element is a power of a smaller irreducible. This is unexpected, but perhaps not surprising.

Something a little more surprising is that its companion matrix $C$ is cyclic of order *6*, rather than of order 3 like the companion matrix of $x^2 + x + 1$ itself. The powers of this matrix, mod 2, are:

$$
\textcolor{red}{(C)^1} = \left( \begin{matrix}
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0
\end{matrix} \right)
\quad
\textcolor{blue}{(C)^2} = \left( \begin{matrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1
\end{matrix} \right)
\quad
(C)^3 = \left( \begin{matrix}
0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 0 & 0 & 0
\end{matrix} \right)
\\
\textcolor{blue}{(C)^4} = \left( \begin{matrix}
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0
\end{matrix} \right)
\quad
\textcolor{red}{(C)^5} = \left( \begin{matrix}
0 & 1 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0
\end{matrix} \right)
\quad
(C)^6 = I = (C)^0
$$

We can think of the repeated sequence as ensuring that there are enough roots of ~2~21. The Fundamental Theorem of Algebra states that there must be 4 roots. For *numbers*, we'd allow duplicate roots with multiplicities greater than 1, but the matrix roots are all distinct.

Basic group theory tells us that as a cyclic group, the matrix's first and fifth powers (in red) are a pair of inverses. The constant term of the characteristic polynomial is the product of all four roots and, as a polynomial over matrices, must be some nonzero multiple of the identity matrix.
Since the red roots are a pair of inverses, the blue roots are, too.


GF(32)
------

GF(32) turns out to be special. There are six irreducible polynomials of degree 5 over GF(2). Picking one of them at random, ~2~37, and looking at the polynomial sequence it generates, we see:

```{python}
#| echo: false
gf32powers = [
    37, 37, 61, 37, 55, 61, 47, 37, 55, 55, 59, 61, 59, 47, 41,
    37, 61, 55, 47, 55, 59, 59, 41, 61, 47, 59, 41, 47, 41, 41, 51,
]
gf32format = lambda x: f"~2~{x}"

Markdown(tabulate(
    [[
        "charpoly((*C*~*37*~)^*m*^)",
        "-",
        *[gf32format(gf32power) for gf32power in gf32powers[:15]]
    ]],
    headers=["*m*", *[i for i in range(16)]],
))
```
```{python}
#| echo: false
Markdown(tabulate(
    [[
        "charpoly((*C*~*37*~)^*m*^)",
        *[gf32format(gf32power) for gf32power in gf32powers[:-17:-1]]
    ]],
    headers=["*m*", *[i for i in reversed(range(16, 32))]],
))
```

31 is prime, so we don't have any sub-patterns that appear on multiples of factors. In fact, all six irreducible polynomials are present in this table. They form pairs under reversal of their bit strings:
~2~37 and ~2~41,
~2~61 and ~2~47,
and ~2~55 and ~2~59.

Since their roots have order 31, these polynomials are exactly the distinct irreducible factors of *x*^31^ - 1 mod 2 (besides *x* - 1):

$$
x^{31} - 1 = (x-1)(x^{30} + x^{29} + ... + x + 1) \\
(x^{30} + x^{29} + ... + x + 1) = \left\{
\begin{align*}
&\phantom\cdot (x^5 + x^2 + 1) &\sim \quad {}_2 37 \\
&\cdot (x^5 + x^3 + 1) &\sim \quad {}_2 41 \\
&\cdot (x^5 + x^4 + x^3 + x^2 + 1) &\sim \quad {}_2 61 \\
&\cdot (x^5 + x^3 + x^2 + x + 1) &\sim \quad {}_2 47 \\
&\cdot (x^5 + x^4 + x^2 + x + 1) &\sim \quad {}_2 55 \\
&\cdot (x^5 + x^4 + x^3 + x + 1) &\sim \quad {}_2 59
\end{align*}
\right.
$$

This is a feature special to fields of characteristic 2: a power of an odd prime is odd, so the number just below it is even and divisible by 2, leaving only powers of 2 with room to sit just above a large prime. 31 is a [Mersenne prime](https://en.wikipedia.org/wiki/Mersenne_prime), so all positive integers less than 31 are coprime to it. Thus, there is no room for the "extra" entries we observed in GF(16), which occurred on factors of 15 = 16 - 1. No entry can be irreducible (but not primitive) or the power of an irreducible of lower degree. In other words, *every irreducible polynomial of degree* p *over GF(2) is primitive if* 2^p^ - 1 *is a Mersenne prime*.


### Counting Irreducibles

The remark about coprimality to 31 may inspire you to think of the [totient function](https://en.wikipedia.org/wiki/Euler%27s_totient_function). We have *φ*(2^5^ - 1) = 30 = 5⋅6, where 5 is the degree and 6 is the number of primitive polynomials. We also have *φ*(2^4^ - 1) = 8 = 4⋅2 and *φ*(2^3^ - 1) = 6 = 3⋅2. In general, it is true that there are *φ*(*p*^*m*^ - 1) / *m* primitive polynomials of degree *m* over GF(*p*).


Polynomial Reversal
-------------------

We've only been looking at fields of characteristic 2, where the meaning of "palindrome" and "reversed polynomial" is intuitive. Let's look at an example over characteristic 3.
One primitive of degree 2 is ~3~14, which gives rise to the following sequence over GF(9):

```{python}
#| echo: false
gf9powers = [14, 10, 14, 16, 17, 10, 17, 13]
gf9format = lambda x: f"~3~{x}"

Markdown(tabulate(
    [[
        "charpoly((*C*~*14*~)^*m*^)",
        *[gf9format(gf9power) for gf9power in gf9powers]
    ]],
    headers=["*m*", *[i + 1 for i in range(8)]],
))
```

The table suggests that ~3~14 = 112~x~ = x^2^ + x + 2 and ~3~17 = 122~x~ = x^2^ + 2x + 2 are reversals of one another. More naturally, you'd think that 112~x~ reversed is 211~x~. But remember that we prefer to work with monic polynomials. By multiplying the reversed polynomial by the multiplicative inverse of its leading coefficient (in this case, 2), we get 422~x~ ≡ 122~x~ mod 3. This is the rule that applies over larger characteristics in general.

Note that ~3~16 is 121~x~ = x^2^ + 2x + 1 and ~3~13 = 111~x~ = x^2^ + x + 1 = x^2^ - 2x + 1, both of which have factors over GF(3).


Power Graphs
------------

We can study the interplay of primitives, irreducibles, and their powers by converting our sequences into (directed) graphs. Each node in the graph represents a characteristic polynomial that appears over the field; call the one under consideration *a*. If the sequence of polynomials generated by *C*~*a*~ contains another polynomial *b*, then there is an edge from *a* to *b*.

We can do this for every GF(*p*^*m*^). Let's start with the first few fields of characteristic 2. We get the following graphs:

![]()

All nodes connect to the node corresponding to the identity matrix, since all roots are cyclic. Also, since all primitive polynomials are interchangeable with one another, they are all interconnected and form a [complete](https://en.wikipedia.org/wiki/Complete_graph) clique. This means that, excluding the identity node, the graphs for fields of order one more than a Mersenne prime are just the complete graphs.

Since all of the graphs share the identity node as a feature -- a node with incoming edges from every other node -- it's convenient to omit it. Here are a few more of these graphs after doing so, over fields of other characteristics:

:::: {}
::: {}
![]()
GF(9)
:::

::: {}
![]()
GF(25)
:::

::: {}
![]()
GF(49)
:::

::: {}
![]()
GF(121)
:::

::: {}
![]()
GF(27)
:::

::: {}
![]()
GF(125)
:::

::: {}
![]()
GF(343)
:::
::::


### Spectra

Again, since visually interpreting graphs is difficult, we can study an invariant. From these graphs of polynomials, we can compute *their* characteristic polynomials (to add another layer to this algebraic cake) and look at their spectra.

It turns out that removing a fully-connected node (like the one for the identity matrix) has a simple effect on the characteristic polynomial of a graph: it just removes a factor of *x*. Here are a few of the (identity-reduced) spectra, arranged into a table.

| Characteristic | Order | Spectrum | Remark
| ---------------|-------|--------------------------------------|----------
| 2 | 4 | 0 |
| | 8 | -1, 1 | Mersenne
| | 16 | 0^2^, -1, 1 |
| | 32 | -1^5^, 5 | Mersenne
| 3 | 9 | 0^2^, -1, 1 |
| | 27 | 0, -1^6^, 3^2^ | Pseudo-Mersenne?
+| 5 | 25 | 0^3^, -1^6^, 1^3^, 3 | +| | 125 | 0, -1^38^, 1, 9^2^, 19 | Prime power in spectrum +| 7 | 49 | 0^2, -1^17^, 1^4^, 3^2^, 7 | +| | 343 | 0, -1^106^, 1^4^, 5^2^, 11^2^, 35^2^ | Composite in spectrum +| 11 | 121 | 0^4^, -1^49^, 1^2^, 3^6^, 7^2^, 15 | Composite in spectrum + +Incredibly, all spectra shown are composed exclusively of integers, and thus, each of these graphs are integral graphs. Moreover, it does not appear that any integer sequences that one may try extracting from this table (for example, the multiplicity of -1) can be found in the [Online Encyclopedia of Integer Sequences](https://oeis.org/). + +From what I was able to tell, the following subgraphs were *also* integral over the range I tested: + +- the induced subgraph of vertices corresponding to non-primitives +- the complement of the previous graph with respect to the whole graph +- the induced subgraph of vertices corresponding only to irreducibles + +Unfortunately, proving any such relationship is out of the scope of this post (and my abilities). + + +Closing +------- + +This concludes the first foray into using matrices as elements of prime power fields. It is a subject which, using the tools of linear algebra, makes certain aspects of field theory more palatable and constructs some objects with fairly interesting properties. + +One of the most intriguing parts to me is the sequence of polynomials generated by a companion matrix. Though I haven't proven it, I suspect that it suffices to study only the sequence generated by a primitive polynomial. It seems to be possible to get the non-primitive sequences by looking at the subsequences where the indices are multiples of a factor of the length of the sequence. But this means that the entire story about polynomials and finite fields can be foregone entirely, and the problem instead becomes one of number theory. + +The [next post]() will focus on an "application" of matrix roots to other areas of abstract algebra. Diagrams made with Geogebra and NetworkX (GraphViz). diff --git a/posts/finite-field/3/index.qmd b/posts/finite-field/3/index.qmd new file mode 100644 index 0000000..f51aa19 --- /dev/null +++ b/posts/finite-field/3/index.qmd @@ -0,0 +1,900 @@ +--- +format: + html: + html-math-method: katex +jupyter: python3 +--- + + + + +Exploring Finite Fields, Part 3: Roll a d20 +=========================================== + +In the [previous post](), we focused on constructing finite fields using *n*×*n* matrices. These matrices came from from primitive polynomials of degree *n* over GF(*p*), and could be used to do explicit arithmetic over GF(*p*^*n*^). In this post, we'll look at a way to apply this in describing certain groups. + + +Weakening the Field +------------------- + +Recall the way we defined GF(4) in the first post. We took the irreducible polynomial *p*(*x*) = *x*^2^ + *x* + 1, called its root *α*, and created addition and multiplication tables spanning the four elements. After the second post, we can do this more cleverly by mapping *α* to the companion matrix *C*~*p*~ over GF(2). 
$$
f : \mathbb{F_4} \longrightarrow \mathbb{F}_2 {}^{2 \times 2}
\\ ~ \\
0 \mapsto \left(\begin{matrix}
0 & 0 \\ 0 & 0
\end{matrix}\right)
\quad
1 \mapsto \left(\begin{matrix}
1 & 0 \\ 0 & 1
\end{matrix}\right) = I
\quad
\alpha \mapsto \left(\begin{matrix}
0 & 1 \\ 1 & 1
\end{matrix}\right) = C_p
\\ ~ \\
\textcolor{red}{\alpha} + \textcolor{blue}{1} = \alpha^2 \mapsto
\left(\begin{matrix}
1 & 1 \\ 1 & 0
\end{matrix}\right) =
\textcolor{red} {
\left(\begin{matrix}
0 & 1 \\ 1 & 1
\end{matrix}\right)
}
+
\textcolor{blue}{
\left(\begin{matrix}
1 & 0 \\ 0 & 1
\end{matrix}\right)
}\mod 2
$$

In the images of *f*, the zero matrix has determinant 0 and all other elements have determinant 1. Therefore, the product of any two nonzero matrices always has determinant 1, and a nonzero determinant means the matrix is invertible. This means that the non-zero elements of the field form their own group with respect to multiplication. Here, they form a cyclic group of order 3, since *C*~*p*~^3^ = *I* mod 2. This is also true using symbols, since we already know that *α*^3^ = 1.


### Other Matrices

However, there are more 2×2 matrices over GF(2) than just these. There are two possible values in four locations, so there are 2^4^ = 16 matrices, or 12 more than we've identified.

$$
\begin{array}{c|c}
\#\{a_{ij} = 1\} & \det = 0 & \det = 1 \\
\hline
1 &
\left(\begin{matrix}
0 & 1 \\ 0 & 0
\end{matrix}\right)
\quad
\left(\begin{matrix}
1 & 0 \\ 0 & 0
\end{matrix}\right)
\quad
\left(\begin{matrix}
0 & 0 \\ 0 & 1
\end{matrix}\right)
\quad
\left(\begin{matrix}
0 & 0 \\ 1 & 0
\end{matrix}\right)
\\
2 &
\left(\begin{matrix}
1 & 1 \\ 0 & 0
\end{matrix}\right)
\quad
\left(\begin{matrix}
0 & 0 \\ 1 & 1
\end{matrix}\right)
\quad
\left(\begin{matrix}
0 & 1 \\ 0 & 1
\end{matrix}\right)
\quad
\left(\begin{matrix}
1 & 0 \\ 1 & 0
\end{matrix}\right)
& \textcolor{red}{
\left(\begin{matrix}
0 & 1 \\ 1 & 0
\end{matrix}\right)
}
\\
3 & &
\textcolor{red}{
\left(\begin{matrix}
1 & 1 \\ 0 & 1
\end{matrix}\right)
}
\quad
\textcolor{red}{
\left(\begin{matrix}
1 & 0 \\ 1 & 1
\end{matrix}\right)
}
\\
4 &
\left(\begin{matrix}
1 & 1 \\ 1 & 1
\end{matrix}\right)
\end{array}
$$

The matrices in the right column (in red) have determinant 1, which means they can *also* multiply with our field-like elements without producing a singular matrix. This forms a larger group, of which our field's multiplication group is a subgroup. However, it is *not* commutative, since matrix multiplication is not commutative in general.

The group of all six matrices of determinant 1 is called the [*general linear group*](https://en.wikipedia.org/wiki/General_linear_group) of degree 2 over GF(2), written GL(2, 2).
We can sort the elements into classes by their order, or the number of times we have to multiply an element by itself before getting back to the identity matrix (mod 2):

$$
\begin{array}{}
\text{Order 1} & \text{Order 2} & \text{Order 3} \\
\hline
\left(\begin{matrix}
1 & 0 \\
0 & 1
\end{matrix}\right)
&
\begin{align*}
\left(\begin{matrix}
1 & 1 \\
0 & 1
\end{matrix}\right)
\\
\left(\begin{matrix}
1 & 0 \\
1 & 1
\end{matrix}\right)
\\
\left(\begin{matrix}
0 & 1 \\
1 & 0
\end{matrix}\right)
\end{align*}
&
\begin{align*}
\left(\begin{matrix}
0 & 1 \\
1 & 1
\end{matrix}\right)
\\
\left(\begin{matrix}
1 & 1 \\
1 & 0
\end{matrix}\right)
\end{align*}
\end{array}
$$

If you've studied enough group theory, you know that there are two groups of order 6: the cyclic group of order 6, *C*~6~, and the symmetric group on three elements, *S*~3~. Since the former has an element of order 6 and this group does not, this group must be isomorphic to the latter. Since the group is small, it's not too difficult to construct an isomorphism between the two. Writing the elements of *S*~3~ in [cycle notation](), we have:

$$
e \mapsto \left(\begin{matrix}
1 & 0 \\
0 & 1
\end{matrix}\right)
\\ ~ \\
(1 ~ 2) \mapsto \left(\begin{matrix}
1 & 1 \\
0 & 1
\end{matrix}\right)
\qquad
(1 ~ 3) \mapsto \left(\begin{matrix}
1 & 0 \\
1 & 1
\end{matrix}\right)
\qquad
(2 ~ 3) \mapsto \left(\begin{matrix}
0 & 1 \\
1 & 0
\end{matrix}\right)
\\ ~ \\
(1 ~ 2 ~ 3) \mapsto \left(\begin{matrix}
0 & 1 \\
1 & 1
\end{matrix}\right)
\qquad
(1 ~ 3 ~ 2) \mapsto \left(\begin{matrix}
1 & 1 \\
1 & 0
\end{matrix}\right)
$$


Bigger Linear Groups
--------------------

Of course, there is nothing special about GF(2) in this definition. For any field *K*, the general linear group GL(*n*, *K*) is composed of the invertible *n*×*n* matrices under matrix multiplication.

For fields other than GF(2), a matrix can have a nonzero determinant other than 1. Since the determinant is multiplicative, the product of two determinant 1 matrices also has determinant 1. Therefore, the general linear group has a subgroup, the [*special linear group*](https://en.wikipedia.org/wiki/Special_linear_group) SL(*n*, *K*), consisting of these matrices.
+
+Haskell implementation of GL and SL for prime fields
+
+This implementation will be based on the `Matrix` type from the first post. Assume we have already defined matrix multiplication and addition, along with `determinant` and the base-expansion helpers `asPoly` and `coeffs` from the earlier posts.
+
+```{.haskell}
+import Data.Array (Array, listArray, bounds, elems)
+import Data.List (unfoldr)
+
+data Matrix a = Mat { unMat :: Array (Int, Int) a }
+
+-- instance Functor Matrix
+-- instance Num a => Num (Matrix a)
+
+-- Partition a list into lists of length n
+reshape :: Int -> [a] -> [[a]]
+reshape n = unfoldr (reshape' n) where
+  reshape' n x = if null x then Nothing else Just $ splitAt n x
+
+-- Convert list of lists to Matrix
+-- Abuses listArray working across rows, then columns
+toMatrix :: [[a]] -> Matrix a
+toMatrix l = Mat $ listArray ((0,0),(n-1,m-1)) $ concat l where
+  m = length $ head l
+  n = length l
+
+-- Convert Matrix to list of lists
+fromMatrix :: Matrix a -> [[a]]
+fromMatrix (Mat m) = let (_,(_,n)) = bounds m in reshape (n+1) $ elems m
+```
+
+With the helper functions out of the way, we can move on to generating all matrices (mod *n*) before filtering for matrices with nonzero determinant (in the case of GL) and determinant 1 (in the case of SL).
+
+```{.haskell}
+allMatrices :: Int -> Int -> [Matrix Int]
+-- All m x m matrices (mod n) built from nonzero rows; a matrix with a
+-- zero row is singular anyway, so GL and SL lose nothing
+allMatrices m n = map toMatrix $ sequence $ replicate m vectors where
+  -- Construct all nonzero vectors mod n using base-n expansions and padding
+  vectors = [pad $ coeffs $ asPoly n l | l <- [1..n^m-1]]
+  -- Pad xs to length m with zeros
+  pad xs = xs ++ replicate (m - length xs) 0
+
+-- All matrices, but paired with their determinants
+matsWithDets :: Int -> Int -> [(Matrix Int, Int)]
+matsWithDets m n = map (\x -> (x, determinant x `mod` n)) $ allMatrices m n
+
+-- Nonzero determinant
+mGL m n = map fst $ filter ((/= 0) . snd) $ matsWithDets m n
+-- Determinant is 1
+mSL m n = map fst $ filter ((== 1) . snd) $ matsWithDets m n
+```
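+
+Assuming the definitions above are in scope, the group orders come out to the standard count $|\text{GL}(n, p)| = (p^n - 1)(p^n - p) \cdots (p^n - p^{n-1})$ -- a reassuring sanity check:
+
+```{.haskell}
+-- |GL(2, 2)| = (4 - 1)(4 - 2) = 6, and over GF(2), GL = SL
+-- >>> length (mGL 2 2)
+-- 6
+-- >>> length (mSL 2 2)
+-- 6
+
+-- |GL(2, 3)| = (9 - 1)(9 - 3) = 48, half of which have determinant 1
+-- >>> length (mGL 2 3)
+-- 48
+-- >>> length (mSL 2 3)
+-- 24
+```
+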
+ + +### Projectivity + +Another important matrix group is the [*projective general linear group*](https://en.wikipedia.org/wiki/Projective_linear_group), PGL(*n*, *K*). In this group, two matrices are considered equal if one is a scalar multiple of the other. Equivalently, the elements *are* these equivalence classes, and the product of two classes is the set of all possible products of items from one class with items from the other. + +Both this and the determinant 1 constraint can apply at the same time, forming the *projective special linear group*, PSL(*n*, *K*). + +For GF(2), all of these groups are the same, since the only nonzero determinant and scalar multiple is 1. Therefore, it's beneficial to contrast SL and PGL with another example. + +Let's arbitrarily examine GL(2, 5). Since 4 squares to 1 (mod 5) and we're working with 2×2 matrices, the determinant is unchanged when a matrix is scalar-multiplied by 4. These multiples are identified in PSL. On the other hand, in PGL, there are classes of matrices with determinant 2 and 3, which do not square to 1. These classes are exactly the ones which are "left out" of PSL. + +$$ +\begin{matrix} +\boxed{ +\begin{gather*} +\large \text{GL}(2, 5) +\\ +\underset{\det = 4}{ +\left(\begin{matrix} +0 & 1 \\ +1 & 1 +\end{matrix} \right) +}, +\textcolor{red}{ +\underset{\det = 1}{ +\left(\begin{matrix} +0 & 2 \\ +2 & 2 +\end{matrix} \right) +}}, +\underset{\det = 2}{ +\left(\begin{matrix} +1 & 0 \\ +0 & 2 +\end{matrix} \right) +}, +\underset{\det = 3}{ +\left(\begin{matrix} +2 & 0 \\ +0 & 4 +\end{matrix} \right) +}, +... +\end{gather*} +} +& \twoheadrightarrow & +\boxed{ +\begin{gather*} +\large \text{PGL}(2,5) +\\ +\underset{\det = 1, ~4}{ +\textcolor{red}{ +\left\{ +\left(\begin{matrix} +0 & 1 \\ +1 & 1 +\end{matrix} \right), +\left(\begin{matrix} +0 & 2 \\ +2 & 2 +\end{matrix} \right), +... +\right\} +}} +\\ +\underset{\det = 2, ~ 3}{ +\left\{ +\left(\begin{matrix} +1 & 0 \\ +0 & 2 +\end{matrix} \right), +\left(\begin{matrix} +2 & 0 \\ +0 & 4 +\end{matrix} \right), +... +\right\} +} \\ +... +\end{gather*} +} +\\ ~ \\ +\boxed{ +\begin{gather*} +\large \text{SL}(2,5) +\\ +\textcolor{red}{ +\left(\begin{matrix} +0 & 2 \\ +2 & 2 +\end{matrix} \right) +}, +\left(\begin{matrix} +0 & 3 \\ +3 & 3 +\end{matrix} \right), +... +\end{gather*} +} +& \twoheadrightarrow & +\boxed{ +\begin{gather*} +\large \text{PSL}(2,5) +\\ +\textcolor{red}{ +\left\{ +\left(\begin{matrix} +0 & 2 \\ +2 & 2 +\end{matrix} \right), +\left(\begin{matrix} +0 & 3 \\ +3 & 3 +\end{matrix} \right), +... +\right\} +} +... +\end{gather*} +} +\end{matrix} +$$ + +
+ +Haskell implementation of PGL and PSL for prime fields + +PGL and PSL require special equality. It's certainly possible to write a definition which makes the classes explicit, as its own new type. We could then define equality on this type through `Eq`. This is rather inefficient, though, so I'll choose to work with the representatives instead. + +```{.haskell} +import Data.List (nubBy) + +scalarTimes :: Int -> Int -> Matrix Int -> Matrix Int +-- Scalar-multiply a matrix (mod p) +scalarTimes n k = fmap ((`mod` n) . (*k)) + +projEq :: Int -> Matrix Int -> Matrix Int -> Bool +-- Construct all scalar multiples mod n, then check if ys is any of them. +-- This is ludicrously inefficient, and only works for fields. +projEq n xs ys = ys `elem` [scalarTimes n k xs | k <- [1..n-1]] + +-- Strip out duplicates in GL and SL with projective equality +mPGL m n = nubBy (projEq n) $ mGL m n +mPSL m n = nubBy (projEq n) $ mSL m n +``` +
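+
+As a sanity check on these (admittedly slow, brute-force) definitions, the sizes for *p* = 5 come out to the expected |GL(2, 5)| = 480, |SL(2, 5)| = |PGL(2, 5)| = 120, and |PSL(2, 5)| = 60:
+
+```{.haskell}
+-- >>> length (mGL 2 5)
+-- 480
+-- >>> length (mSL 2 5)
+-- 120
+-- >>> length (mPGL 2 5)
+-- 120
+-- >>> length (mPSL 2 5)
+-- 60
+```
+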
+
+
+### Exceptional Isomorphisms
+
+When *K* is a finite field, the smaller PSLs turn out to specify some interesting groups. We've studied the case of PSL(2, 2) being isomorphic to *S*~3~ already, but it is also the case that:
+
+$$
+\begin{align*}
+&\text{PSL}(2,3) \cong A_4 & & \text{(order 12)}
+\\ ~ \\
+&\text{PSL}(2,4) \cong \text{PSL}(2,5) \cong A_5 & & \text{(order 60)}
+ \\ ~ \\
+&\text{PSL}(2,7) \cong \text{PSL}(3,2) & & \text{(order 168)}
+\end{align*}
+$$
+
+These relationships can be proven abstractly (and frequently are!). However, such proofs always left me wanting something more concrete. For PSL(2, 3) and *A*~4~, it's easy enough to assign elements to one another by hand. But *A*~5~ is already untenable, to say nothing of PSL(2, 7). In these circumstances, it's a good idea to leverage the computer.
+
+
+Warming Up: *A*~5~ and PSL(2, 5)
+--------------------------------
+
+*A*~5~, the alternating group on 5 elements, is composed of the [even](https://en.wikipedia.org/wiki/Parity_of_a_permutation) permutations of 5 elements. It also happens to describe the rotations of an icosahedron. Within the group, there are three kinds of elements:
+
+- The product of two 2-cycles, such as a = (1 2)(3 4)
+    - On an icosahedron, this corresponds to a 180 degree rotation (or more precisely, 1/2 of a turn) about an axis through the midpoints of two opposite edges
+- 5-cycles, such as b = (1 2 3 4 5)
+    - This corresponds to a 72 degree rotation (1/5 of a turn) about an axis through two opposite vertices, where five faces meet
+- 3-cycles, such as ab = (2 4 5)
+    - This corresponds to a 120 degree rotation (1/3 of a turn) about an axis through the centers of two opposite (triangular) faces
+
+It happens to be the case that all elements of the group can be expressed as products of *a* and *b* -- the two of them generate the group.
+
+
+### Mapping to Matrices
+
+To create a correspondence with PSL(2, 5), we need to identify permutations with matrices. Obviously, the identity permutation goes to the identity matrix. Then, since *a* and *b* generate the group, we can search for two matrices which obey the same relations (under projective equality, since we're working in PSL). One such correspondence is:
+
+$$
+\begin{array}{}
+\begin{gather*}
+A = \left(\begin{matrix}
+1 & 1 \\
+3 & 4
+\end{matrix} \right)
+\qquad
+A^2 = \left(\begin{matrix}
+4 & 0 \\
+0 & 4
+\end{matrix}\right) =
+4 \left(\begin{matrix}
+1 & 0 \\
+0 & 1
+\end{matrix}\right)
+\end{gather*}
+\\ ~ \\ \hline \\
+\begin{gather*}
+B = \left(\begin{matrix}
+0 & 2 \\
+2 & 2
+\end{matrix} \right)
+\qquad
+B^2 = \left(\begin{matrix}
+4 & 4 \\
+4 & 3
+\end{matrix}\right)
+\qquad
+B^3 = \left(\begin{matrix}
+3 & 1 \\
+1 & 4
+\end{matrix}\right)
+\\
+B^4 = \left(\begin{matrix}
+2 & 3 \\
+3 & 0
+\end{matrix}\right)
+\qquad
+B^5 = \left(\begin{matrix}
+1 & 0 \\
+0 & 1
+\end{matrix}\right)
+\end{gather*}
+\\ ~ \\ \hline \\
+\begin{gather*}
+(AB) = \left(\begin{matrix}
+2 & 4 \\
+3 & 4
+\end{matrix} \right)
+\qquad
+(AB)^2 = \left(\begin{matrix}
+1 & 4 \\
+3 & 3
+\end{matrix}\right)
+\qquad
+(AB)^3 = \left(\begin{matrix}
+4 & 0 \\
+0 & 4
+\end{matrix}\right) =
+4 \left(\begin{matrix}
+1 & 0 \\
+0 & 1
+\end{matrix}\right)
+\end{gather*}
+\end{array}
+$$
+
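+
+Before hunting for matrices, it's worth double-checking the corresponding relations on the permutation side. Here is a minimal sketch (not from the earlier posts), representing a permutation of {1, ..., 5} as its list of images:
+
+```{.haskell}
+-- A permutation, stored as [image of 1, ..., image of 5]
+type Perm = [Int]
+
+identity5 :: Perm
+identity5 = [1 .. 5]
+
+-- Compose left-to-right: apply p first, then q
+composeP :: Perm -> Perm -> Perm
+composeP p q = map (\i -> q !! (p !! (i - 1) - 1)) [1 .. 5]
+
+-- Order: combine p with itself until we return to the identity
+permOrder :: Perm -> Int
+permOrder p = (+ 1) $ length $ takeWhile (/= identity5) $ iterate (composeP p) p
+
+a, b :: Perm
+a = [2, 1, 4, 3, 5]   -- (1 2)(3 4)
+b = [2, 3, 4, 5, 1]   -- (1 2 3 4 5)
+
+-- The orders match those of A, B, and AB above
+-- (whichever composition order we pick, the orders agree)
+-- >>> (permOrder a, permOrder b, permOrder (composeP a b))
+-- (2,5,3)
+```
+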
+
+Haskell implementation using B as a generator to find candidates for A
+
+```{.haskell}
+orderWith :: Eq a => (a -> a -> a) -> (a -> Bool) -> a -> Int
+-- Repeatedly combine p with itself using f, until the predicate z
+-- (usually equality to some identity) becomes True.
+-- The order is one more than the number of combinations that fail z
+orderWith f z p = (+1) $ length $ takeWhile (not . z) $ iterate (f p) p
+
+-- Order with respect to PSL(2, 5): using matrix multiplication (mod 5)
+-- and projective equality to the identity matrix
+orderPSL25 = orderWith (\x -> fmap (`mod` 5) . (x |*|)) (projEq 5 $ eye 2)
+
+-- Only the order 2 elements of PSL(2, 5)
+psl25_order2 = filter ((==2) . orderPSL25) $ mPSL 2 5
+
+-- Start with B as a generator
+psl25_gen_B = toMatrix [[0,2],[2,2]]
+
+-- Find an order 2 element whose product with `psl25_gen_B` has order 3
+psl25_gen_A_candidates =
+  filter ((==3) . orderPSL25 . (psl25_gen_B |*|)) psl25_order2
+
+-- Candidate matrices:
+--
+-- [1,1]
+-- [3,4]
+--
+-- [1,3]
+-- [1,4]
+--
+-- [2,0]
+-- [0,3]
+--
+-- [2,0]
+-- [4,3]
+--
+-- [2,4]
+-- [0,3]
+```
+
+If you're unsatisfied with starting from *B*, realize that we could have filtered out only the order 5 elements of PSL(2, 5) (`filter ((==5) . orderPSL25) $ mPSL 2 5`), and picked any element from this list to start.
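+
+Assuming the definitions above, we can confirm that the first candidate really does behave like the pair (*a*, *b*) in *A*~5~:
+
+```{.haskell}
+psl25_gen_A = toMatrix [[1,1],[3,4]]
+
+-- The same relations as a and b: orders 2, 5, and 3
+-- >>> orderPSL25 psl25_gen_A
+-- 2
+-- >>> orderPSL25 psl25_gen_B
+-- 5
+-- >>> orderPSL25 (fmap (`mod` 5) (psl25_gen_A |*| psl25_gen_B))
+-- 3
+```
+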
+
+
+We now have a correspondence between three elements of *A*~5~ and three elements of PSL(2, 5). We can "run" both sets of generators until every element of one group has been associated with an element of the other. This is most visually appealing as a Cayley graph:
+
+::: {}
+![]()
+Cayley graph showing an isomorphism between *A*~5~ and PSL(2, 5).
+Order-2 elements are red, order-3 elements are green, and order-5 elements are blue.
+Purple arrows are order-5 generators, orange arrows are order-2 generators.
+:::
+
+
+PSL(2, 4)
+---------
+
+We could do the same for PSL(2, 4), but we can't just work modulo 4 -- remember, the elements of GF(4) are 0, 1, *α*, and *α*^2^. It follows that GL(2, 4) is composed of (invertible) matrices of those elements, and SL(2, 4) is composed of matrices with determinant 1.
+
+$$
+\begin{matrix}
+\boxed{
+\begin{gather*}
+\large \text{GL}(2, 4)
+\\
+\textcolor{red}{
+\underset{\det = 1}{
+\left(\begin{matrix}
+0 & 1 \\
+1 & 1
+\end{matrix} \right)
+}},
+\underset{\det = \alpha + 1}{
+\left(\begin{matrix}
+0 & \alpha \\
+\alpha & \alpha
+\end{matrix} \right)
+},
+\underset{\det = \alpha}{
+\left(\begin{matrix}
+1 & 0 \\
+0 & \alpha
+\end{matrix} \right)
+},
+\textcolor{red}{
+\underset{\det = 1}{
+\left(\begin{matrix}
+\alpha & 0 \\
+0 & \alpha^2
+\end{matrix} \right)
+}},
+...
+\end{gather*}
+}
+\\ ~ \\
+\boxed{
+\begin{gather*}
+\large \text{SL}(2,4)
+\\
+\textcolor{red}{
+\left(\begin{matrix}
+0 & 1 \\
+1 & 1
+\end{matrix} \right)
+},
+\textcolor{red}{
+\left(\begin{matrix}
+\alpha & 0 \\
+0 & \alpha^2
+\end{matrix} \right)
+},
+...
+\end{gather*}
+}
+\end{matrix}
+$$
+
+Scalar multiplication by *α* multiplies the determinant by *α*^2^; scalar multiplication by *α*^2^ multiplies it by *α*^4^ = *α*. Either way, a nontrivial scalar multiple changes the determinant, so no two distinct elements of SL(2, 4) are projectively equal. Thus, SL(2, 4) is also PSL(2, 4).
+
+Let's start by looking at an order-5 matrix in PSL(2, 4). We'll call this matrix *B*' to correspond with our order-5 generator in PSL(2, 5).
+
+$$
+\begin{gather*}
+  B' = \left(\begin{matrix}
+  0 & \alpha \\
+  \alpha^2 & \alpha^2
+  \end{matrix} \right)
+  \qquad
+  (B')^2 = \left(\begin{matrix}
+  1 & 1 \\
+  \alpha & \alpha^2
+  \end{matrix}\right)
+  \qquad
+  (B')^3 = \left(\begin{matrix}
+  \alpha^2 & 1 \\
+  \alpha & 1
+  \end{matrix}\right)
+  \\
+  (B')^4 = \left(\begin{matrix}
+  \alpha^2 & \alpha \\
+  \alpha^2 & 0
+  \end{matrix}\right)
+  \qquad
+  (B')^5 = \left(\begin{matrix}
+  1 & 0 \\
+  0 & 1
+  \end{matrix}\right)
+\\ ~ \\
+\det B' = 0\alpha^2 - \alpha^3 = 1
+\end{gather*}
+$$
+
+
+We need to be able to do four things over GL(2, 4) on a computer:
+
+- multiply matrices over GF(4),
+- compare those matrices,
+- compute their determinants, and
+- systematically write down all of them
+
+We could then repeat what we did with PSL(2, 5). But as I've said, working symbolically is hard for computers, and the methods described for prime fields do not work in general for prime power fields. Fortunately, we're amply prepared to find a solution.
+
+
+### Bootstrapping Matrices
+
+Recall that the elements of GF(4) can also be written as the zero matrix, the identity matrix, *C*~*p*~, and *C*~*p*~^2^ (where *C*~*p*~ is the companion matrix of *p*(*x*) and again, *p*(*x*) = *x*^2^ + *x* + 1). This means we can also write elements of GL(2, 4) as matrices *of matrices*. Arithmetic works exactly the same as it does symbolically -- we just replace all instances of *α* in *B*' with *C*~*p*~.
+
+$$
+f^* : \mathbb{F}_4 {}^{2 \times 2} \rightarrow (\mathbb{F}_2 {}^{2 \times 2})^{2 \times 2}
+\\ ~ \\
+\begin{align*}
+  \bar {B'} = f^*(B') &= \left(\begin{matrix}
+  f(0) & f(\alpha) \\
+  f(\alpha^2) & f(\alpha^2)
+  \end{matrix} \right) = \left(\begin{matrix}
+  {\bf 0} & C_p \\
+  C_p {}^2 & C_p {}^2
+  \end{matrix} \right) \\
+  &= \left(\begin{matrix}
+  \left(\begin{matrix} 0 & 0 \\ 0 & 0 \end{matrix} \right) &
+  \left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix} \right) \\
+  \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right) &
+  \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right)
+  \end{matrix} \right)
+  \\ ~ \\
+  (\bar {B'})^2 &= \left(\begin{matrix}
+  ({\bf 0})({\bf 0}) + C_p {}^3 & ({\bf 0})C_p +C_p {}^3 \\
+  ({\bf 0})C_p {}^2 + C_p {}^4 & C_p {}^3 + C_p {}^4
+  \end{matrix} \right) \\
+  &= \left(\begin{matrix}
+  I & I \\
+  C_p {} & C_p {}^2
+  \end{matrix} \right) = \left(\begin{matrix}
+  f(1) & f(1) \\
+  f(\alpha) & f(\alpha^2)
+  \end{matrix} \right) =
+  f^*((B')^2)
+\end{align*}
+$$
+
+Make no mistake, this is *not* a [block matrix](https://en.wikipedia.org/wiki/Block_matrix), at least not a typical one. Namely, the layering means that the determinant (which signifies its membership in SL) is another matrix:
+
+$$
+\begin{align*}
+  \det( f^*(B') ) &= {\bf 0} (C_p {}^2) - (C_p)(C_p {}^2) \\
+  &=
+  \left(\begin{matrix} 0 & 0 \\ 0 & 0 \end{matrix} \right)
+  \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right) -
+  \left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix} \right)
+  \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right) \\
+  &= \left(\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix} \right) \mod 2 \\
+  &= I = f(\det(B'))
+\end{align*}
+$$
+
+Since *B*' is in SL(2, 4), the determinant is unsurprisingly *f*(1) = *I*. The (matrix) determinants of *f*\* applied to other elements of GL(2, 4) could just as well be *f*(*α*) = *C*~*p*~ or *f*(*α*^2^) = *C*~*p*~^2^.
+
+
+### Implementation
+
+Using this method, we can implement PSL(2, 4) directly. All we need to do is find all possible 4-tuples of **0**, *I*, *C*~*p*~, and *C*~*p*~^2^, then arrange each into a 2×2 matrix. Multiplication follows from the typical definition, and the multiplicative identity is just *f*\*(*I*).
+
+
+
+Haskell implementation of PSL(2, 4)
+
+```{.haskell}
+import Data.List (findIndex)
+
+-- Matrices which obey the same relations as the elements of GF(4)
+zero_f4 = zero 2
+one_f4 = eye 2
+alpha_f4 = toMatrix [[0,1],[1,1]]
+alpha2_f4 = toMatrix [[1,1],[1,0]]
+
+-- Gathered into a list
+field4 = [zero_f4, one_f4, alpha_f4, alpha2_f4]
+
+-- Convenient show function for these matrices
+showF4 x = case findIndex (==x) field4 of
+  Just 0 -> "0"
+  Just 1 -> "1"
+  Just 2 -> "α"
+  Just 3 -> "α^2"
+  Nothing -> "N/A"
+
+-- Identity matrix over GF(4)
+psl_24_identity = toMatrix [[one_f4, zero_f4], [zero_f4, one_f4]]
+
+-- All possible matrices over GF(4):
+-- create a list of 4-lists of elements from GF(4), then
+-- shape them into 2x2 matrices
+f4_matrices = map (toMatrix . reshape 2) $ sequence $ replicate 4 field4
+
+-- Sieve out those which have a determinant of 1 in the field
+mPSL24 = filter ((== one_f4) . fmap (`mod` 2) . determinant) f4_matrices
+```
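+
+As a quick check (assuming the definitions above), the group comes out to the expected size -- the same order as *A*~5~ and PSL(2, 5):
+
+```{.haskell}
+-- >>> length mPSL24
+-- 60
+-- >>> map showF4 (concat (fromMatrix psl_24_identity))
+-- ["1","0","0","1"]
+```
+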
+
+Now that we can generate the group, we can finally repeat what we did with PSL(2, 5). All we have to do is filter for the order-2 elements, then further filter for those whose product with *B*' has order 3.
+
+
+Haskell implementation using *B*' as a generator to find candidates for *A*'
+
+```{.haskell}
+-- Order with respect to PSL(2, 4): using matrix multiplication (mod 2)
+-- and plain equality to the identity matrix (projective equality is
+-- unnecessary here, since PSL(2, 4) = SL(2, 4))
+orderPSL24 = orderWith (\x -> fmap (fmap (`mod` 2)) . (x*)) (== psl_24_identity)
+
+-- Only the order 2 elements of PSL(2, 4)
+psl24_order2 = filter ((==2) . orderPSL24) mPSL24
+
+-- Start with B' as a generator
+psl24_gen_B = toMatrix [[zero_f4, alpha_f4], [alpha2_f4, alpha2_f4]]
+
+-- Find an order 2 element whose product with `psl24_gen_B` has order 3
+psl24_gen_A_candidates = filter ((==3) . orderPSL24 . (psl24_gen_B*))
+  psl24_order2
+
+-- Candidate matrices:
+--
+-- ["0","1"]
+-- ["1","0"]
+--
+-- ["0","α^2"]
+-- ["α","0"]
+--
+-- ["1","0"]
+-- ["1","1"]
+--
+-- ["1","α^2"]
+-- ["0","1"]
+--
+-- ["α","1"]
+-- ["α","α"]
+```
+
+Finally, we can decide on an *A*', the order-2 generator with the properties we wanted.
+
+$$
+\begin{array}{}
+  \begin{gather*}
+  A' = \left(\begin{matrix}
+  0 & \alpha^2 \\
+  \alpha & 0
+  \end{matrix} \right)
+  \qquad
+  (A')^2 = \left(\begin{matrix}
+  1 & 0 \\
+  0 & 1
+  \end{matrix}\right)
+  \end{gather*}
+  \\ ~ \\ \hline \\
+  \begin{gather*}
+  A'B' =
+  \left(\begin{matrix}
+  \alpha & \alpha \\
+  0 & \alpha^2
+  \end{matrix} \right)
+  \qquad
+  (A'B')^2 =
+  \left(\begin{matrix}
+  \alpha^2 & \alpha \\
+  0 & \alpha
+  \end{matrix} \right)
+  \qquad
+  (A'B')^3 =
+  \left(\begin{matrix}
+  1 & 0 \\
+  0 & 1
+  \end{matrix} \right)
+  \end{gather*}
+\end{array}
+$$
+
+Then, we can arrange them on a Cayley graph in the same way as PSL(2, 5):
+
+
+::: {}
+![]()
+Cayley graph showing an isomorphism between *A*~5~ and PSL(2, 4).
+Colors indicate the same thing as in the previous diagram.
+:::
+
+
+Closing
+-------
+
+This post addresses my original goal in implementing finite fields, namely computationally finding an explicit map between *A*~5~ and PSL(2, 4). I believe the results are a little more satisfying than attempting to wrap your head around the group-theoretic proofs. That's not to discount the power of, and the astounding amount of work that goes into, the latter method. It does tend to leave things rather opaque, however.
+
+If you'd prefer a more interactive diagram showing the above isomorphisms, I've taken the liberty of creating a hoverable SVG:
+
+![]()
+
+This post slightly diverts our course from the previous one's focus on fields. The [next one]() will focus on more results regarding the treatment of layered matrices. The algebraic consequences of this structure are notable in and of themselves, and are entirely obfuscated by the usual interpretation of block matrices.
+
+Diagrams created with Geogebra and Inkscape.
diff --git a/posts/finite-field/4/index.qmd b/posts/finite-field/4/index.qmd
new file mode 100644
index 0000000..b4cb072
--- /dev/null
+++ b/posts/finite-field/4/index.qmd
@@ -0,0 +1,636 @@
+---
+format:
+  html:
+    html-math-method: katex
+jupyter: python3
+---
+
+
+Exploring Finite Fields, Part 4: The Power of Forgetting
+========================================================
+
+The [last post]() in this series focused on understanding some small linear groups and implementing them on the computer, over both a prime field and a prime power field.
+
+The prime power case was particularly interesting. First, we adjoined the roots of a polynomial to the base field, GF(2). Rather than the traditional means of adding new symbols like *α*, we used companion matrices, which behave the same arithmetically. For example, for the smallest prime power field, GF(4), we use the polynomial *p*(*x*) = *x*^2^ + *x* + 1, and map its symbolic roots (*α* and *α*^2^) to matrices over GF(2):
+
+$$
+f : \mathbb{F}_4 \longrightarrow \mathbb{F}_2 {}^{2 \times 2}
+\\ ~ \\
+\begin{gather*}
+f(0) = {\bf 0} = \left(\begin{matrix}0 & 0 \\ 0 & 0 \end{matrix}\right) &
+f(1) = I = \left(\begin{matrix}1 & 0 \\ 0 & 1 \end{matrix}\right) \\
+f(\alpha) = C_p = \left(\begin{matrix}0 & 1 \\ 1 & 1 \end{matrix}\right) &
+ f(\alpha^2) = C_p {}^2 = \left(\begin{matrix}1 & 1 \\ 1 & 0 \end{matrix}\right)
+\end{gather*}
+\\ ~ \\
+f(a + b)= f(a) + f(b), \quad f(ab) = f(a)f(b)
+$$
+
+Finally, we constructed GL(2, 4) using matrices of matrices -- not [block matrices](https://en.wikipedia.org/wiki/Block_matrix)! This post will focus on studying this method in slightly more detail.
+
+
+Reframing the Path Until Now
+----------------------------
+
+In the above description, we already mentioned larger structures over GF(2), namely polynomials and matrices. Since GF(4) can itself be described with matrices over GF(2), we can generalize *f* to give us two more maps:
+
+- *f*\*, which converts matrices over GF(4) to double-layered matrices over GF(2), and
+- *f*^*•*^, which converts polynomials over GF(4) to polynomials of matrices over GF(2)
+
+
+### Matrix Map
+
+We examined the former map briefly in the previous post. More explicitly, we looked at a matrix *B* in SL(2, 4) which had order five. Then, to work with it without relying on symbols, we simply applied *f* over the contents of the matrix.
+
+$$
+\begin{gather*}
+f^* : \mathbb{F}_4 {}^{2 \times 2} \longrightarrow
+  (\mathbb{F}_2 {}^{2 \times 2})^{2 \times 2}
+\\ ~ \\
+B = \left(\begin{matrix}
+  0 & \alpha \\ \alpha^2 & \alpha^2
+  \end{matrix} \right)
+\\
+B^* = f^*(B) = \left(\begin{matrix}
+  f(0) & f(\alpha) \\ f(\alpha^2) & f(\alpha^2)
+  \end{matrix} \right) =
+  \left(\begin{matrix}
+  \left(\begin{matrix} 0 & 0 \\ 0 & 0 \end{matrix} \right) &
+  \left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix} \right) \\
+  \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right) &
+  \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right)
+  \end{matrix} \right)
+\end{gather*}
+$$
+
+We can do this because a matrix contains values in the domain of *f*, thus uniquely determining a way to change the internal structure (what Haskell calls a [functor](https://wiki.haskell.org/Functor)). Furthermore, due to the properties of *f*, the two maps obey an important relationship with the determinant: the following square commutes.
+
+$$
+\begin{gather*}
+f(\det(B)) = f(1) = I = \det(B^*) = \det(f^*(B))
+\\ ~ \\
+\begin{matrix}
+& \mathbb{F}_4 {}^{2 \times 2} &
+\overset{\det} {\longrightarrow} &
+\mathbb{F}_4 \\
+\\
+\small f^* & | & & | & f\\
+& \downarrow & & \downarrow \\
+& & & & \\
+&
+(\mathbb{F}_2 {}^{2 \times 2})^{2 \times 2} &
+\underset{\det}{\longrightarrow} &
+\mathbb{F}_2 {}^{2 \times 2} \\
+\end{matrix}
+\end{gather*}
+$$
+
+It should be noted that the determinant strips off the *outer* matrix. We could also consider the map **det**\*, where we apply the determinant to the internal matrices (in Haskell terms, `fmap det`). This map isn't as nice though, since:
+
+$$
+\begin{align*}
+\det {}^*(B^*)
+&= \left(\begin{matrix}
+  \det \left(\begin{matrix} 0 & 0 \\ 0 & 0 \end{matrix} \right) &
+  \det \left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix} \right) \\
+  \det \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right) &
+  \det \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right)
+  \end{matrix} \right)
+= \left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix} \right)
+\\ ~ \\
+&\neq \left(\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix} \right)
+= \det(B^*)
+\end{align*}
+$$
+
+
+### Polynomial Map
+
+Much like how we can change the internal structure of matrices, we can do the same for polynomials. For the purposes of demonstration, we'll work with *b* = *λ*^2^ + *α*^2^*λ* + 1, the characteristic polynomial of *B*, since it has coefficients in the domain of *f*. We define the extended map *f*^*•*^ as:
+
+$$
+\begin{gather*}
+f^{\bullet} : \mathbb{F}_4[\lambda] \longrightarrow
+  \mathbb{F}_2 {}^{2 \times 2}[\Lambda] \\
+f^{\bullet} (\lambda) = \Lambda \qquad
+f^{\bullet}(a) = f(a), \quad a \in \mathbb{F}_4
+\\ ~ \\
+\begin{align*}
+b^{\bullet}
+= f^{\bullet}(b)
+&= f^{\bullet}(\lambda^2)
+&&+&& f^{\bullet}(\alpha^2)f^{\bullet}(\lambda)
+&&+&& f^{\bullet}(1) \\
+&= \Lambda^2
+&&+&& \left(\begin{matrix} 1 & 1 \\ 1 & 0\end{matrix}\right) \Lambda
+&&+&& \left(\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}\right)
+\end{align*}
+\end{gather*}
+$$
+
+Since we're looking at the characteristic polynomial of *B*, we might as well also look at the characteristic polynomial of *B*\*, its image under *f*\*. We already looked at the determinant of this matrix, which is the constant term of the characteristic polynomial (up to sign). Therefore, it's probably not surprising that *f*^*•*^ and the characteristic polynomial commute in a similar fashion to the determinant.
+
+$$
+\begin{align*}
+b^*
+&= \text{charpoly}(f^*(B))
+= \text{charpoly}
+  \left(\begin{matrix}
+  \left(\begin{matrix} 0 & 0 \\ 0 & 0 \end{matrix} \right) &
+  \left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix} \right) \\
+  \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right) &
+  \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right)
+  \end{matrix} \right) \\
+&= \Lambda^2
++ \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right) \Lambda
++ \left(\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix} \right)
+= f^{\bullet}(\text{charpoly}(B))
+= b^\bullet
+\end{align*}
+\\ ~ \\
+\begin{matrix}
+& \mathbb{F}_4 {}^{2 \times 2} &
+\overset{\text{charpoly}} {\longrightarrow} &
+\mathbb{F}_4[\lambda] \\
+\\
+\small f^* & | & & | & f^\bullet\\
+& \downarrow & & \downarrow \\
+& & & & \\
+&
+(\mathbb{F}_2 {}^{2 \times 2})^{2 \times 2} &
+\underset{\text{charpoly}}{\longrightarrow} &
+(\mathbb{F}_2 {}^{2 \times 2})[\Lambda] \\
+\end{matrix}
+$$
+
+It should also be mentioned that **charpoly**\*, taking the characteristic polynomials of the internal matrices, does *not* obey the same relationship. For one, the type is wrong: the codomain is a matrix *containing* polynomials, rather than a polynomial over matrices.
+
+There *does* happen to be an isomorphism between the two structures (the reverse direction of which we'll discuss momentarily). But even after converting to the proper type, we already have a counterexample in the constant term, from taking det\* earlier.
+
+$$
+\begin{align*}
+\text{charpoly}^*(B^*)
+&= \left(\begin{matrix}
+  \text{charpoly} \left(\begin{matrix} 0 & 0 \\ 0 & 0 \end{matrix} \right) &
+  \text{charpoly} \left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix} \right) \\
+  \text{charpoly} \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right) &
+  \text{charpoly} \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right)
+  \end{matrix} \right)
+\\
+&= \left(\begin{matrix}
+\lambda^2 & \lambda^2 + \lambda + 1 \\
+\lambda^2 + \lambda + 1 & \lambda^2 + \lambda + 1
+\end{matrix} \right) \\
+&\cong
+\left(\begin{matrix} 1 & 1 \\ 1 & 1 \end{matrix} \right) \Lambda^2
++ \left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix} \right) \Lambda
++ \left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix} \right)
+\\ ~ \\
+&\neq f^{\bullet}(\text{charpoly}(B))
+\end{align*}
+$$
+
+
+Forgetting
+----------
+
+Clearly, layering matrices has several advantages over how we usually interpret block matrices. But what happens if we *do* "forget" about the internal structure?
+
+$$
+\begin{gather*}
+  \text{forget} : (\mathbb{F}_2 {}^{2 \times 2})^{2 \times 2}
+\longrightarrow \mathbb{F}_2 {}^{4 \times 4}
+\\ ~ \\
+  \hat B = \text{forget}(B^*) =
+  \text{forget}\left(\begin{matrix}
+  \left(\begin{matrix} 0 & 0 \\ 0 & 0 \end{matrix} \right) &
+  \left(\begin{matrix} 0 & 1 \\ 1 & 1 \end{matrix} \right) \\
+  \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right) &
+  \left(\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \right)
+  \end{matrix} \right) =
+  \left(\begin{matrix}
+  0 & 0 & 0 & 1 \\
+  0 & 0 & 1 & 1 \\
+  1 & 1 & 1 & 1 \\
+  1 & 0 & 1 & 0
+  \end{matrix} \right)
+\end{gather*}
+$$
+
+
+Haskell implementation of `forget`
+
+```{.haskell}
+import Data.List (transpose)
+
+forget :: Matrix (Matrix a) -> Matrix a
+-- Massively complicated point-free way to forget matrices:
+-- 1. Convert internal matrices to lists of lists
+-- 2. Convert the external matrix to a list of lists
+-- 3. There are now four layers of lists. Transpose the second and third.
+-- 4. Concat the new third and fourth layers together
+-- 5. Concat the first and second layers together
+-- 6. Convert the list of lists back to a matrix
+forget = toMatrix . concat . fmap (fmap concat . transpose) .
+  fromMatrix . fmap fromMatrix
+```
+
+To see why this is the structure, remember that a row of the final matrix cuts across a whole row of *internal* matrices. We'd like to read across that whole row, but doing so involves descending into two matrices at once. The `transpose` (applied via `fmap`, once per external row) regroups the internal rows so that pieces of the same final row sit together. For the matrix above, the first external row becomes `[[[0,0],[0,1]], [[0,0],[1,1]]]` after the transposition -- the first two final rows, each still split between the two matrices they came from. The inner `fmap concat` then joins each of these pairs into a complete row, and the outer `concat` flattens away the grouping by external row.
+
+Like *f*, `forget` preserves addition and multiplication -- a fact already familiar from block matrices. Further, by *f*, the internal matrices multiply just like elements of GF(4). Hence, this shows us directly that GL(2, 4) is (isomorphic to) a subgroup of GL(4, 2).
+
+However, an obvious difference between layered and "forgotten" matrices shows up in the determinant and characteristic polynomial:
+
+$$
+\begin{align*}
+  \det {B^*} &= \left(\begin{matrix}1 & 0 \\ 0 & 1\end{matrix}\right)
+  \\ ~ \\
+  \det {\hat B} &= 1
+\end{align*}
+\qquad
+\begin{align*}
+  \text{charpoly}(B^*) &=
+  \Lambda^2
+  + \left(\begin{matrix}1 & 1 \\ 1 & 0 \end{matrix}\right)\Lambda
+  + \left(\begin{matrix}1 & 0 \\ 0 & 1\end{matrix}\right)
+  \\ ~ \\
+  \text{charpoly}(\hat B) &=
+  \lambda^4 + \lambda^3 + \lambda^2 + \lambda + 1\\
+\end{align*}
+$$
+
+
+### Another Path to the Forgotten
+
+It's a relatively simple matter to move between the two determinants, since it's straightforward to identify 1 with the identity matrix. However, a natural question to ask is whether there's a way to reconcile or coerce the matrix polynomial into the "forgotten" one.
+
+First, let's formally establish a path from matrix polynomials to matrices of polynomials. We need only use our friend from the [second post]() -- polynomial evaluation. Simply evaluating a matrix polynomial at *λI* converts our matrix indeterminate (*Λ*) into a scalar one (*λ*).
+
+$$
+\begin{align*}
+\text{eval}_{\Lambda \mapsto \lambda I} &:
+(\mathbb{F}_2 {}^{2 \times 2})[\Lambda]
+\rightarrow (\mathbb{F}_2[\lambda]) {}^{2 \times 2} \\ &:: \quad
+r(\Lambda) \mapsto r(\lambda I)
+\\ ~ \\
+  \text{eval}_{\Lambda \mapsto \lambda I}(\text{charpoly}(B^*)) &=
+  (\lambda I)^2
+  + \left(\begin{matrix}1 & 1 \\ 1 & 0 \end{matrix}\right)(\lambda I)
+  + \left(\begin{matrix}1 & 0 \\ 0 & 1\end{matrix}\right)
+  \\
+  &=
+  \left(\begin{matrix}
+  \lambda^2 + \lambda + 1 & \lambda \\
+  \lambda & \lambda^2 + 1
+  \end{matrix}\right)
+\end{align*}
+$$
+
+Since a matrix containing polynomials is still a matrix, we can then take its determinant. What pops out is exactly what we were after, and we can arrange our maps into another diagram:
+
+$$
+\begin{align*}
+  \det(\text{eval}_{\Lambda \mapsto \lambda I}(\text{charpoly}(B^*))) &=
+  (\lambda^2 + \lambda + 1)(\lambda^2 + 1) - \lambda^2 \\
+  &= \lambda^4 + \lambda^3 + \lambda^2 + \lambda + 1 \\
+  &= \text{charpoly}(\hat B)
+\end{align*} \\ ~ \\
+\begin{matrix}
+& (\mathbb{F}_2 {}^{2 \times 2})^{2 \times 2} &
+\overset{\text{charpoly}} {\longrightarrow} &
+(\mathbb{F}_2 {}^{2 \times 2})[\Lambda] \\
+\\
+& & &\downarrow & \small \text{eval}_{\Lambda \mapsto \lambda I} \\
+& | \\
+\small \text{forget}& | & & (\mathbb{F}_2 [\lambda])^{2 \times 2} \\
+& \downarrow \\
+& & & \downarrow & \det \\
+\\
+&
+\mathbb{F}_2 {}^{4 \times 4} &
+\underset{\text{charpoly}}{\longrightarrow} &
+\mathbb{F}_2[\lambda] \\
+\end{matrix} \\ ~ \\
+\text{charpoly} \circ \text{forget} =
+\det \circ ~\text{eval}_{\Lambda \mapsto \lambda I} \circ\text{charpoly}
+$$
+
+
+Haskell demonstration of this commutation
+
+Fortunately, the implementation of `charpoly` using Laplace expansion already works for matrices over any `Num` instance. Therefore, we need only define the special eval:
+
+```{.haskell}
+import Data.Array (array, range, (!))
+
+toMatrixPolynomial :: Num a => Polynomial (Matrix a) -> Matrix (Polynomial a)
+-- Collect our coefficient matrices into a single matrix of polynomials
+toMatrixPolynomial (Poly ps) = Mat $ array rs values where
+  -- Technically, we're always working with square matrices, but we should
+  -- always use the largest bounds available.
+  (is,js) = unzip $ map mDims ps
+  rs = ((0,0),(maximum is - 1,maximum js - 1))
+  -- Address a matrix. This needs defaulting to zero to be fully correct
+  -- with respect to the range given by `rs`
+  access b (Mat m) = m!b
+  -- Build the value at an address by addressing over the coefficients.
+  -- ps is already in rising coefficient order, so our values are too.
+  values = map (\r -> (r, Poly $ map (access r) ps)) (range rs)
+```
+
+Now we can simply observe:
+
+```{.haskell}
+field4 = [zero 2, eye 2, toMatrix [[0,1],[1,1]], toMatrix [[1,1],[1,0]]]
+
+mB = toMatrix [[field4!!0, field4!!2], [field4!!3, field4!!3]]
+
+-- >>> mapM_ print $ fromMatrix $ forget mB
+-- -- [0,0,0,1]
+-- -- [0,0,1,1]
+-- -- [1,1,1,1]
+-- -- [1,0,1,0]
+
+-- >>> fmap (`mod` 2) $ charpoly $ forget mB
+-- -- 1x^4 + 1x^3 + 1x^2 + 1x + 1
+-- >>> fmap (`mod` 2) $ determinant $ toMatrixPolynomial $ charpoly mB
+-- -- 1x^4 + 1x^3 + 1x^2 + 1x + 1
+```
+
+It should be noted that we do *not* get the same results by taking the determinant after applying charpoly\*, indicating that the above method is "correct".
+
+$$
+\begin{align*}
+  \text{charpoly}^*(B^*) &= \left(\begin{matrix}
+  \lambda^2 & \lambda^2 + \lambda + 1 \\
+  \lambda^2 + \lambda + 1 & \lambda^2 + \lambda + 1
+  \end{matrix}\right)
+\\ ~ \\
+  \det( \text{charpoly}^*(B^*)) &=
+\lambda^2(\lambda^2 + \lambda + 1) - (\lambda^2 + \lambda + 1)^2 \\
+&= \lambda^3 + 1 \mod 2
+\end{align*}
+$$
+
+
+### Cycles and Cycles
+
+Since we can get *λ*^4^ + *λ*^3^ + *λ*^2^ + *λ* + 1 in two ways, it's natural to assume this polynomial is significant in some way. In the language of the second post, the polynomial can also be written as ~2~31, whose roots we determined to have order 5. This happens to match the order of *B* in GL(2, 4).
+
+Perhaps this is unsurprising, since there are only so many polynomials of degree 4 over GF(2). However, the reason we see it becomes more obvious if we look at the powers of scalar multiples of *B*. First, recall that *f*\* takes us from matrices over GF(4) to matrices of matrices over GF(2). Then define a map *g* that gives us degree 4 polynomials:
+
+$$
+g : \mathbb{F}_4^{2 \times 2} \rightarrow \mathbb{F}_2[\lambda] \\
+g = \text{charpoly} \circ \text{forget} \circ f^*
+\\~ \\
+\begin{array}{ccc|ccc|ccc}
+& \scriptsize \left(\begin{matrix}
+0 & \alpha \\ \alpha^2 & \alpha^2
+\end{matrix}\right) & &
+& \scriptsize \left(\begin{matrix}
+0 & \alpha^2 \\ 1 & 1
+\end{matrix}\right) & &
+& \scriptsize \left(\begin{matrix}
+0 & 1 \\ \alpha & \alpha
+\end{matrix}\right)
+\\
+B &
+\overset{g}{\mapsto} &
+11111_\lambda &
+\alpha B &
+\overset{g}{\mapsto} &
+10011_\lambda &
+\alpha^2 B &
+\overset{g}{\mapsto} &
+11001_\lambda
+\\
+B^2 &
+\overset{g}{\mapsto} &
+11111_\lambda &
+(\alpha B)^2 &
+\overset{g}{\mapsto} &
+10011_\lambda &
+(\alpha^2 B)^2 &
+\overset{g}{\mapsto} &
+11001_\lambda
+\\
+B^3 &
+\overset{g}{\mapsto} &
+11111_\lambda &
+(\alpha B)^3 &
+\overset{g}{\mapsto} &
+11111_\lambda &
+(\alpha^2 B)^3 &
+\overset{g}{\mapsto} &
+11111_\lambda
+\\
+B^4 &
+\overset{g}{\mapsto} &
+11111_\lambda &
+(\alpha B)^4 &
+\overset{g}{\mapsto} &
+10011_\lambda &
+(\alpha^2 B)^4 &
+\overset{g}{\mapsto} &
+11001_\lambda
+\\
+B^5 &
+\overset{g}{\mapsto} &
+10001_\lambda &
+(\alpha B)^5 &
+\overset{g}{\mapsto} &
+10101_\lambda &
+(\alpha^2 B)^5 &
+\overset{g}{\mapsto} &
+10101_\lambda
+\end{array}
+$$
+
+The matrices in the middle and rightmost columns both have order 15 inside GL(2, 4). Correspondingly, both 10011~λ~ = ~2~19 and 11001~λ~ = ~2~25 are primitive, and so have roots of order 15 over GF(2).
+
+
+### A Field?
+
+Since we have 15 matrices generated by the powers of a single one, you might wonder whether or not they can correspond to the nonzero elements of GF(16). And they can! In a sense, we've "borrowed" the order 15 elements from this "field" within GL(4, 2). However, none of the powers of this matrix are the companion matrix of either ~2~19 or ~2~25.
+
+
+Haskell demonstration of the field-like-ness of these matrices
+
+
+All we really need to do is test additive closure, since the powers trivially commute and include the identity matrix.
+
+```{.haskell}
+hasAdditiveClosure :: Integral a => Int -> a -> [Matrix a] -> Bool
+-- Check whether n x n matrices (mod p) have additive closure
+-- Supplement the zero matrix, even if it is not already present
+hasAdditiveClosure n p xs = all (`elem` xs') sums where
+  -- Add in the zero matrix
+  xs' = zero n:xs
+  -- Calculate all possible sums of pairs (mod p)
+  sums = map (fmap (`mod` p)) $ (+) <$> xs' <*> xs'
+
+
+generatesField :: Integral a => Int -> a -> Matrix a -> Bool
+-- Generate the powers of x, then test if they form a field (mod p)
+generatesField n p x = hasAdditiveClosure n p xs where
+  xs = map (fmap (`mod` p) . (x^)) [1..p^n-1]
+
+alphaB = toMatrix [[zero 2, field4!!3],[eye 2, eye 2]]
+
+-- >>> mapM_ print $ fromMatrix $ forget alphaB
+-- -- [0,0,1,1]
+-- -- [0,0,1,0]
+-- -- [1,0,1,0]
+-- -- [0,1,0,1]
+--
+-- >>> generatesField 4 2 $ forget alphaB
+-- -- True
+```
+
+More directly, we might also observe that *α*^2^*B* is the companion matrix of a polynomial irreducible over GF(4), namely *q*(*x*) = *x*^2^ - *αx* - *α*.
+
+Both the "forgotten" matrices and the aforementioned companion matrices lie within GL(4, 2). A natural question to ask is whether we can make fields by the following process:
+
+1. Filter out all order-15 elements of GL(4, 2)
+2. Partition the elements and their powers into their respective order-15 subgroups
+3. Add the zero matrix into each class
+4. Check whether all classes are additively closed (and are therefore fields)
+
+In this case, it happens to be true, but proving this in general is difficult, and I haven't done so.
+
+
+Expanding Dimensions
+--------------------
+
+Of course, we need not focus only on GF(4) -- we can just as easily work over GL(2, 2^*r*^) for *r* other than 2. In this case, the internal matrices will be *r*×*r* while the external one remains 2×2. But neither do we have to work exclusively with 2×2 matrices -- we can work over GL(*n*, 2^*r*^). In either circumstance, the "borrowing" of elements of larger order still occurs. This is summarized by the following diagram:
+
+$$
+\begin{matrix}
+&& & \scriptsize \text{forget} \circ f_1^* & & \scriptsize f_2 \\
+\text{SL}(n,2^r) & \hookrightarrow &
+\text{GL}(n, 2^r) & \hookrightarrow &
+\text{GL}(nr, 2) &\hookleftarrow & \mathbb{F}_{2^{nr}} \\ ~ \\
+\underset{\text{order } k}S & &
+\underset{\text{order } k}S, \underset{\text{order } 2^{nr}-1}T &&&&
+\underset{\text{order } k}s, \underset{\text{order } 2^{nr}-1}{t}
+\end{matrix}
+$$
+
+Here, *f*~1~ is our map from GF(2^*r*^) to *r*×*r* matrices, and *f*~2~ is a similar map. *r* must be greater than 1 for us to properly make use of matrix arithmetic. Similarly, *n* must be greater than 1 for the leftmost GL. Thus, *nr* is a composite number. Here, *k* is a proper factor of 2^*nr*^ - 1. In the prior discussion, *k* was 5 and 2^*nr*^ - 1 was 15.
+
+Recall that primitive polynomials of degree *nr* over GF(2) have roots with order 2^*nr*^ - 1. This number can *never* be prime here: a number of the form 2^*m*^ - 1 can only be prime (a Mersenne prime) when *m* itself is prime, and *nr* is composite. Thus, a GL of prime dimension over GF(2) never plays host, in this way, to a GL over a larger field of the same characteristic. Conversely, GL(*nr* + 1, 2) trivially contains GL(*nr*, 2) by fixing a subspace. So we do eventually see elements of order 2^*m*^ - 1 for either prime or composite *m*.
+
+
+### Other Primes
+
+This concern about prime dimensions is unique to characteristic 2. For any other prime *p*, *p*^*m*^ - 1 is composite, since it is at the very least even. All other remarks about the above diagram should still hold for any other prime *p*.
+
+In addition, our earlier diagram, where we correspond the order of an element in GL(2, 2^2^) with the order of an element in GF(2^2•2^) via the characteristic polynomial, also generalizes.
+Though I have not proven it, I strongly suspect the following diagram commutes, at least in the case where *K* is a finite field:
+
+$$
+\begin{matrix}
+& (K^{r \times r})^{n \times n} &
+\overset{\text{charpoly}} {\longrightarrow} &
+(K^{r \times r})[\Lambda] \\
+\\
+& & &\downarrow & \small \text{eval}_{\Lambda \mapsto \lambda I} \\
+& | \\
+\small \text{forget}& | & & (K [\lambda])^{r \times r} \\
+& \downarrow \\
+& & & \downarrow & \det \\
+\\
+&
+K {}^{nr \times nr} &
+\underset{\text{charpoly}}{\longrightarrow} &
+K[\lambda] \\
+\end{matrix}
+\\ ~ \\
+\text{charpoly} \circ \text{forget} =
+\det \circ ~ \text{eval}_{\Lambda \mapsto \lambda I} \circ\text{charpoly}
+$$
+
+Over larger primes, the gap between GL and SL may grow ever larger, but SL over a prime power field seems to inject into SL over a prime field. If the above diagram does commute, then that statement follows.
+
+
+### Monadicity and Injections
+
+The action of forgetting the internal structure may sound somewhat familiar if you know your Haskell. Remember that for lists, we can do something similar -- converting `[[1,2,3],[4,5,6]]` to `[1,2,3,4,5,6]` is just a matter of applying `concat`. But this is an instance in which we know lists to behave like a [monad](https://wiki.haskell.org/Monad). Despite being an indecipherable bit of jargon to newcomers, it just means we:
+
+1. can apply functions inside the structure (for example, to the elements of a list),
+2. have a sensible injection into the structure (creating singleton lists, called `return`), and
+3. can reduce two layers to one (`concat` for lists, `join` for monads in general).
+    - Monads are traditionally defined using the operator `>>=`, but `join = (>>= id)`
+
+Just comparing the types of `join :: Monad m => m (m a) -> m a` and `forget :: Matrix (Matrix a) -> Matrix a` suggests that `Matrix` (meaning square matrices) could be a monad, and further, one which respects addition and multiplication. Of course, **this is only true when our internal matrices are all the same size**. In the above diagrams, this restriction has applied, but it should be stated explicitly, since no dimension is specified by `Matrix a`.
+
+However, we run into difficulty at condition 2. For one, only "numbers" (elements of a ring) can go inside matrices. This restricts where monadicity can hold. More importantly, we have a *lot* of freedom in what dimension we choose to inject into. For example, we might pick a `return` that uses 1×1 matrices (which add no additional structure). We might just as well pick a `return2`, which instead multiplies its argument onto a 2×2 identity matrix.
+
+Unfortunately, there's no good answer. At the very least, we can close our eyes and pretend that we have a nice diagram:
+
+$$
+\begin{gather*}
+\begin{matrix}
+& L\underset{\text{degree } r}{/} K
+\\ \\
+\small f &
+\begin{matrix}
+| \\ \downarrow
+\end{matrix}
+\\ \\
+& K^{r \times r}
+\end{matrix}
+& \quad & \quad &
+\begin{matrix}
+& (L\underset{\text{degree } r}{/} K)^{n \times n}
+\\ \\
+\small f^* &
+\begin{matrix}
+| \\ \downarrow
+\end{matrix}
+& \searrow & \small \texttt{>>=} ~ f \qquad
+\\ \\
+& (K^{r \times r})^{n \times n} &
+\underset{\text{forget}} {\longrightarrow} &
+K {}^{nr \times nr}
+\end{matrix}
+\end{gather*}
+$$
+
+As one last note on the monadicity of matrices, I *have* played around with an alternative `Matrix` type which includes scalars alongside proper matrices, which would allow for a simple canonical injection.
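+
+As a rough sketch of what I mean (the names here are hypothetical, not from the earlier posts):
+
+```{.haskell}
+-- A matrix type with a scalar layer alongside genuine matrices
+data MatS a = MScalar a | MMat [[a]]
+
+-- The injection (i.e., return) is now canonical: no dimension to pick
+inject :: a -> MatS a
+inject = MScalar
+
+-- A scalar stands for "that multiple of the identity, at whatever size
+-- the context demands" -- deciding that size is exactly join's problem
+promote :: Num a => Int -> MatS a -> [[a]]
+promote n (MScalar c) = [[if i == j then c else 0 | j <- [1..n]] | i <- [1..n]]
+promote _ (MMat rows) = rows
+```
+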
+Unfortunately, it complicates `join` -- though really, it just places the responsibility of sizing the internal matrices front and center, since internal scalars can stand for identity-matrix multiples of whatever size their neighbors demand.
+
+
+Closing
+-------
+
+At this point, I've gone on far too long about algebra. One nagging curiosity makes me wonder whether there are any diagrams like the following:
+
+$$
+\begin{matrix}
+& (L\underset{\text{degree } r}{/} K)^{n \times n} &&&
+& (L\underset{\text{degree } n}{/} K)^{r \times r}
+\\ \\
+\small f_1^* &
+\begin{matrix}
+| \\ \downarrow
+\end{matrix}
+& \searrow & & \swarrow &
+\begin{matrix}
+| \\ \downarrow
+\end{matrix} & f_2^*
+\\ \\
+& (K^{r \times r})^{n \times n} &
+\underset{\text{forget}} {\longrightarrow} &
+K {}^{nr \times nr} &
+\underset{\text{forget}}{\longleftarrow} &
+(K^{n \times n})^{r \times r}
+\end{matrix}
+$$
+
+Or in English: whether "rebracketing" certain *nr*×*nr* matrices can be traced back to not only a degree *r* field extension, but also one of degree *n*.
+
+The mathematician in me tells me to believe in well-defined structures. Matrices are one such structure, with myriad applications. However, the computer scientist in me laments that the application of these structures is buried in symbols, and that layering them is at most glossed over. There is clear utility and interest in doing so -- otherwise, the diagrams shown above would not exist.
+
+Of course, there's plenty of reason *not* to go down this route. For one, it's plainly inefficient -- GPUs are *built* on matrix operations being as efficient as possible, i.e., without the layering. It's also an inefficient way in for people *just* learning matrices. I'd still argue that the method is efficient for learning about more complex topics, like field extensions.