Jack's Blackboard, by Giacomo Alzetta<br />
<br />
<h3>
Pseudoprimes: from Fermat to the AKS test (2013-04-27)</h3>
<h3>
The definitions of pseudoprimes </h3>
<br />
The idea behind the AKS primality test is not an original idea of its authors; it can be traced back to <a href="http://en.wikipedia.org/wiki/Pierre_de_Fermat">Pierre de Fermat</a>, in the 1600s.<br />
<br />
The fundamental idea that Fermat had, and that is used in the AKS test, but also in the Miller-Rabin and Solovay-Strassen probabilistic tests, is the definition of a "pseudoprime in base \(b\)". Basically Fermat applied finite-group algebra to obtain information about the primality of a number, using the following definition:<br />
<br />
<blockquote class="tr_bq">
Let \(n \in \mathbb{N}\) be an odd, non-prime number and let \(a \in \mathbb{N}\) such that \(GCD(n, a) = 1\), then \(n\) is a pseudoprime in base \(a\) if:<br />
\[<br />
a^{n-1} \equiv 1 \bmod n<br />
\]</blockquote>
We can easily see that this definition is strongly related to Fermat's little theorem.<br />
<br />
This definition gives an immediate idea of how to check whether a number is prime: we simply take some bases \(a\) at random and check whether the given number is a pseudoprime in each of those bases. If it isn't, the number is not a prime; otherwise it <i>may</i> be a prime number. The more bases we test, the more confident we become that \(n\) is a prime number.<br />
<br />
Unfortunately this test doesn't work, since there exist the so-called <a href="http://en.wikipedia.org/wiki/Carmichael_number">Carmichael numbers</a>, composite numbers that are pseudoprimes in every base coprime to them.<br />
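As an illustration, here is a minimal sketch of the Fermat check (the function name is mine, not a standard API), together with a demonstration that the smallest Carmichael number, \(561 = 3 \cdot 11 \cdot 17\), fools it in every coprime base:

```python
from math import gcd
import random

def fermat_test(n, rounds=20):
    """Probabilistic Fermat test: False means surely composite,
    True means n was a pseudoprime in every base we tried."""
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        if gcd(a, n) != 1:
            return False          # a shares a factor with n: composite
        if pow(a, n - 1, n) != 1:
            return False          # a witnesses that n is composite
    return True

# 561 = 3 * 11 * 17 is composite, yet it is a Fermat pseudoprime
# in every base coprime to it:
liars = all(pow(a, 560, 561) == 1 for a in range(2, 561) if gcd(a, 561) == 1)
```

Note that a prime always passes this test, while a Carmichael number like 561 passes it for every coprime base, which is exactly why the definition has to be strengthened.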
<br />
The Solovay-Strassen and Miller-Rabin probabilistic tests follow exactly this idea. They give a stronger definition of pseudoprime and test if the given number is a pseudoprime with different bases.<br />
<br />
For example the Solovay-Strassen test uses this definition of pseudoprime, called <a href="http://en.wikipedia.org/wiki/Euler_pseudoprime">Euler's pseudoprime</a>:<br />
<br />
<blockquote class="tr_bq">
Let \(n \in \mathbb{N}\) be an odd, non-prime, number and let \(b \in \mathbb{N}\) be a number such that \(GCD(n,b) = 1\), then \(n\) is an Euler pseudoprime in base \(b\) if:<br />
\[<br />
b^{\frac{n-1}{2}} \equiv \left(\frac{b}{n}\right) \bmod n<br />
\]</blockquote>
<br />
Where \(\left(\frac{a}{b}\right)\) is the <a href="http://en.wikipedia.org/wiki/Jacobi_symbol">Jacobi Symbol</a>.<br />
<br />
It's quite easy to show that if \(n\) is an Euler pseudoprime in base \(b\) then it is also a pseudoprime in base \(b\); but, most importantly, the following theorem holds:<br />
<br />
<blockquote class="tr_bq">
Let \(n \in \mathbb{N}\) be an odd, non-prime, number. Let<br />
\[<br />
B = \left\{b \in \mathbb{N} \mid b < n \wedge GCD(b, n) = 1 \wedge b^{\frac{n-1}{2}} \equiv \left(\frac{b}{n}\right) \bmod n \right\}<br />
\]<br />
<br />
Then \(|B| \leq \frac{\phi(n)}{2}\).</blockquote>
<br />
From this theorem the following corollary immediately follows:<br />
<br />
<blockquote class="tr_bq">
Let \(n \in \mathbb{N}\) be an odd number and let \(m \in \mathbb{N}\) be positive. If \(n\) is an Euler pseudoprime for \(m\) different bases chosen at random in the range \(\{1, \dots, n-1\}\), then the probability that \(n\) is not a prime is less than or equal to \(2^{-m}\).</blockquote>
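Putting the pieces together, a Solovay-Strassen round checks the Euler congruence for a random base. This is my own sketch (function names are mine), including a Jacobi symbol helper computed via quadratic reciprocity so that the snippet is self-contained:

```python
from math import gcd
import random

def jacobi(a, n):
    # Jacobi symbol (a/n) for odd n > 0, via quadratic reciprocity
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):   # (2/n) = -1 when n = +-3 mod 8
                result = -result
        a, n = n, a               # reciprocity: flip sign iff both are 3 mod 4
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def solovay_strassen(n, rounds=50):
    # False means surely composite; True means n was an Euler
    # pseudoprime in every tested base (so probably prime)
    if n < 2 or n % 2 == 0:
        return n == 2
    for _ in range(rounds):
        b = random.randrange(2, n)
        if gcd(b, n) != 1:
            return False
        if pow(b, (n - 1) // 2, n) != jacobi(b, n) % n:
            return False
    return True
```

The `jacobi(b, n) % n` trick maps the symbol \(-1\) to \(n - 1\), so it can be compared directly with the modular exponentiation.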
<br />
By this corollary, simply testing about \(50\) or \(100\) bases already gives a really precise answer. The Miller-Rabin test uses an even stronger definition of pseudoprime, called <a href="http://en.wikipedia.org/wiki/Strong_pseudoprime">strong pseudoprime</a>:<br />
<br />
Let \(n \in \mathbb{N}\) be an odd, non-prime, number and let \(b < n\) be a natural number coprime with \(n\). Let \(s, t \in \mathbb{N}\) be such that \(n = 2^st + 1\) with \(t\) odd. Then \(n\) is a strong pseudoprime in base \(b\) if:<br />
\[<br />
b^t \equiv 1 \bmod n<br />
\]<br />
Or if:<br />
\[<br />
\exists r \in \mathbb{N}, r < s \mid b^{2^rt} \equiv -1 \bmod n<br />
\]<br />
<br />
It's easy to show that every strong pseudoprime is also an Euler (and hence a Fermat) pseudoprime; moreover, when \(n \equiv 3 \bmod 4\), every Euler pseudoprime in base \(b\) is also a strong pseudoprime in base \(b\). Hence this definition of pseudoprime is strictly stronger than the definition of Euler pseudoprime.<br />
<br />
In fact we can show that if \(n\) is an odd number and it is a strong pseudoprime for \(m\) different bases chosen at random, then the probability that \(n\) is not a prime is less than or equal to \(4^{-m}\).<br />
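A strong-pseudoprime check along these lines can be sketched as follows (a sketch of mine, not the post's code):

```python
def is_strong_pseudoprime(n, b):
    """Check whether odd n > 2 passes the strong test in base b
    (primes always pass; composites that pass are strong pseudoprimes)."""
    # write n = 2^s * t + 1 with t odd
    s, t = 0, n - 1
    while t % 2 == 0:
        s += 1
        t //= 2
    x = pow(b, t, n)
    if x == 1 or x == n - 1:      # b^t = 1 or b^(2^0 t) = -1 (mod n)
        return True
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:            # b^(2^r t) = -1 (mod n) for some r < s
            return True
    return False
```

The smallest strong pseudoprime in base \(2\) is \(2047 = 23 \cdot 89\) (because \(2^{11} \equiv 1 \bmod 2047\)), so a single base is never enough.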
<br />
<h3>
The AKS definition of pseudoprimes </h3>
The AKS test also uses this kind of pseudoprime definition. The difference between this test and the former algorithms is that the AKS pseudoprimes are related to a polynomial extension of Fermat's little theorem:<br />
<br />
<blockquote class="tr_bq">
Let \(a \in \mathbb{Z}_0, n \in \mathbb{N}, n \geq 2\) such that \(a\) and \(n\) are coprime. Then \(n\) is prime if and only if:<br />
\[<br />
(x + a)^n \equiv x^n + a \bmod n <br />
\] </blockquote>
<br />
The problem with this theorem is that the polynomial has an <i>exponential number of terms</i>, hence it doesn't provide an efficient way to check for primality.<br />
<br />
The idea of the AKS authors is to reduce the above congruence to a smaller polynomial ring. In particular, what they wanted to obtain was something like the following:<br />
<br />
<blockquote class="tr_bq">
Pseudo-theorem: Let \(a \in \mathbb{Z}_0, n \in \mathbb{N}, n \geq 2\) such that \(a\) and \(n\) are coprime and let \(r \in \mathbb{N}\) be a "big enough number" with \(r \leq \log^k{n}\) for some fixed \(k \in \mathbb{N}\). Then \(n\) is prime if and only if:<br />
\[<br />
(x + a)^n \equiv x^n + a \; \text{in} \; \mathbb{Z}_n[x]/(x^r - 1)<br />
\]</blockquote>
<br />
Unfortunately such a theorem doesn't hold. However, a similar theorem does hold where, instead of using a single base \(a\), it's enough to test a "small" number of bases.<br />
In particular this is the AKS theorem:<br />
<br />
<blockquote class="tr_bq">
Let \(n > 1 \in \mathbb{N}\) such that \(n\) is not a power of a prime with exponent bigger than \(1\). Let \(r < n \in \mathbb{N}\) such that:<br />
\[<br />
r \leq [16\log_2^5{n}] + 1, \qquad ord_r(n) > 4\log_2^2{n}<br />
\] <br />
If \(n\) doesn't have prime factors smaller or equal to \(r\), then \(n\) is prime if and only if:<br />
\[<br />
(X + a)^n \equiv X^n + a \; \text{in} \; \mathbb{Z}_n [X]/(X^r-1) \quad \forall a = 1, \dots, [2\sqrt{\phi(r)}\log_2{n}]<br />
\] </blockquote>
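The bound \(r \leq [16\log_2^5{n}] + 1\) guarantees that a suitable \(r\) can always be found by brute force. A possible sketch of such a helper (my own; the actual find_r used later may differ) computes the multiplicative order naively:

```python
import math

def find_r(n):
    """Smallest r coprime to n with ord_r(n) > 4 * log2(n)^2."""
    threshold = 4 * math.log(n, 2) ** 2
    r = 2
    while True:
        if math.gcd(n, r) == 1:
            # multiplicative order of n modulo r, computed naively
            k, x = 1, n % r
            while x != 1:
                x = x * n % r
                k += 1
            if k > threshold:
                return r
        r += 1
```

Since \(ord_r(n) < r\), any returned \(r\) is necessarily larger than \(4\log_2^2{n}\), yet still polylogarithmic in \(n\) by the theorem's bound.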
This theorem immediately suggests pseudocode for testing primality:<br />
<br />
<div class="highlight">
<pre><span class="k">def</span> <span class="nf">test_AKS</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="k">if</span> <span class="n">n</span> <span class="o"><</span> <span class="mi">2</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">False</span>
<span class="k">elif</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">True</span>
<span class="k">if</span> <span class="n">find_int_power</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">False</span>
<span class="n">r</span> <span class="o">=</span> <span class="n">find_r</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
<span class="k">for</span> <span class="n">divisor</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">r</span><span class="o">+</span><span class="mi">1</span><span class="p">):</span>
<span class="k">if</span> <span class="n">n</span> <span class="o">%</span> <span class="n">divisor</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">False</span>
<span class="k">if</span> <span class="n">n</span> <span class="o"><</span> <span class="n">r</span><span class="o">**</span><span class="mi">2</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">True</span>
<span class="n">a_upper_bound</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">phi</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="o">**</span> <span class="o">.</span><span class="mi">5</span> <span class="o">*</span> <span class="n">math</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span> <span class="o">+</span> <span class="mi">1</span>
<span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">a_upper_bound</span><span class="o">+</span><span class="mi">1</span><span class="p">):</span>
<span class="k">if</span> <span class="p">((</span><span class="n">x</span> <span class="o">+</span> <span class="n">a</span><span class="p">)</span> <span class="o">**</span> <span class="n">n</span><span class="p">)</span> <span class="o">!=</span> <span class="p">(</span><span class="n">x</span> <span class="o">**</span> <span class="n">n</span> <span class="o">+</span> <span class="n">a</span><span class="p">)</span> <span class="o">%</span> <span class="p">(</span><span class="n">x</span> <span class="o">**</span> <span class="n">r</span> <span class="o">-</span> <span class="mi">1</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">False</span>
<span class="k">return</span> <span class="bp">True</span> </pre>
</div>
(The condition in the <i>if</i> block inside the last loop is pseudocode: the polynomial modulus applies to both sides of the comparison. Remember: we <i>cannot</i> actually expand something like \((x + a)^n\).) <br />
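To make that step concrete, the congruence can be checked with plain coefficient lists, wrapping exponents modulo \(r\) and reducing coefficients modulo \(n\) at every step. This is a sketch of mine (the names are invented, and it assumes \(r \geq 2\)), using square-and-multiply so the intermediate polynomials never grow:

```python
def polymul_mod(p, q, r, n):
    # product of two coefficient lists of length r in Z_n[x]/(x^r - 1):
    # exponents wrap around modulo r, coefficients reduce modulo n
    res = [0] * r
    for i, pi in enumerate(p):
        if pi:
            for j, qj in enumerate(q):
                if qj:
                    res[(i + j) % r] = (res[(i + j) % r] + pi * qj) % n
    return res

def aks_congruence_holds(a, n, r):
    # does (x + a)^n = x^n + a hold in Z_n[x]/(x^r - 1)?  (assumes r >= 2)
    base = [0] * r
    base[0] = a % n
    base[1] = (base[1] + 1) % n   # the polynomial x + a
    result = [0] * r
    result[0] = 1                 # the polynomial 1
    e = n
    while e:                      # square-and-multiply exponentiation
        if e & 1:
            result = polymul_mod(result, base, r, n)
        base = polymul_mod(base, base, r, n)
        e >>= 1
    target = [0] * r              # x^n + a reduced modulo (x^r - 1, n)
    target[n % r] = 1
    target[0] = (target[0] + a) % n
    return result == target
```

Each multiplication here costs \(\mathcal{O}(r^2)\) coefficient operations, which is exactly why an efficient polynomial implementation matters so much in practice.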
<br />
The important fact about this algorithm is that it can be proved to run in \(\mathcal{O}(\log^{12}{n})\) time, and hence it proves that testing primality is a problem in \(\mathcal{P}\).<br />
<br />
The problem with this algorithm is the very large exponent in its complexity. Furthermore, the constants hidden in the big-O notation are quite large, due to the cost of operations on polynomials. Using an efficient implementation of polynomials is fundamental.<br />
<br />
<h3>
Determine powers of a prime and other helper functions (2012-12-30)</h3>
The first steps of the AKS primality test are used to detect <a href="http://en.wikipedia.org/wiki/Prime_power">prime-powers</a> with exponent bigger than one. This step can be done in many ways, and almost any algorithm is efficient enough to be used in the AKS test.<br />
The algorithm used here won't impact the running time of the AKS test on primes, since <i>all</i> the time is spent in the last loop, which checks the polynomial congruences.<br />
<br />
In my implementation of the AKS test I used the simplest algorithm I could think of. The number \(n\) can be, at most, a power with exponent \(\log_2{n}\), so we can loop over the exponents from \(2\) to \(\log_2{n}\) and, for every exponent \(e\), run a bisection search for the base. If there exists an integer base \(b\) such that \(b^e = n\), then \(n\) is a perfect power, hence not a prime. If no such \(b\) exists for any exponent, the number is not a perfect power.<br />
<br />
The resulting code is:<br />
<br />
<br />
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">math</span>
<span class="k">def</span> <span class="nf">find_int_power</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="sd">"""Return two integers m > 0, k > 1 such that n = m^k. (*n* > 1)</span>
<span class="sd"> If two such integers do not exist return None.</span>
<span class="sd"> """</span>
<span class="k">if</span> <span class="n">n</span> <span class="o"><</span> <span class="mi">2</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s">'This function requires n > 1.'</span><span class="p">)</span>
<span class="n">h</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span> <span class="o">+</span> <span class="mi">1</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">h</span> <span class="o">+</span> <span class="mi">1</span><span class="p">):</span>
<span class="n">a</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">n</span>
<span class="k">while</span> <span class="n">b</span> <span class="o">-</span> <span class="n">a</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
<span class="n">m</span> <span class="o">=</span> <span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="p">)</span> <span class="o">//</span> <span class="mi">2</span>
<span class="n">res</span> <span class="o">=</span> <span class="n">m</span> <span class="o">**</span> <span class="n">k</span> <span class="o">-</span> <span class="n">n</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">res</span><span class="p">:</span>
<span class="k">return</span> <span class="n">m</span><span class="p">,</span> <span class="n">k</span>
<span class="k">elif</span> <span class="n">res</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">m</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">a</span> <span class="o">=</span> <span class="n">m</span>
<span class="k">if</span> <span class="n">m</span> <span class="o">**</span> <span class="n">k</span> <span class="o">==</span> <span class="n">n</span><span class="p">:</span>
<span class="k">return</span> <span class="n">m</span><span class="p">,</span> <span class="n">k</span>
<span class="k">return</span> <span class="bp">None</span> </pre>
</div>
<br />
Most versions of the AKS test require computing Euler's \(\varphi\) function at the exponent \(r\) of the polynomial modulus. This again can be done in the naive way, without much impact on the algorithm's efficiency:<br />
<br />
<div class="highlight">
<pre><span class="k">def</span> <span class="nf">factorize</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="sd">"""Yield the factors of *n* with the given multiplicity."""</span>
<span class="n">exp</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">while</span> <span class="ow">not</span> <span class="n">n</span> <span class="o">&</span> <span class="mi">1</span><span class="p">:</span>
<span class="n">exp</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="n">n</span> <span class="o">//=</span> <span class="mi">2</span>
<span class="k">if</span> <span class="n">exp</span><span class="p">:</span>
<span class="k">yield</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">exp</span><span class="p">)</span>
<span class="k">for</span> <span class="n">div</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="nb">int</span><span class="p">(</span><span class="n">n</span><span class="o">**.</span><span class="mi">5</span> <span class="o">+</span> <span class="mf">1.5</span><span class="p">),</span> <span class="mi">2</span><span class="p">):</span>
<span class="n">exp</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">while</span> <span class="ow">not</span> <span class="n">n</span> <span class="o">%</span> <span class="n">div</span><span class="p">:</span>
<span class="n">exp</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="n">n</span> <span class="o">//=</span> <span class="n">div</span>
<span class="k">if</span> <span class="n">exp</span><span class="p">:</span>
<span class="k">yield</span> <span class="p">(</span><span class="n">div</span><span class="p">,</span> <span class="n">exp</span><span class="p">)</span>
<span class="k">if</span> <span class="n">n</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
<span class="k">yield</span> <span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">phi</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="sd">"""Calculates the euler's function at value *n*."""</span>
<span class="n">tot</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">for</span> <span class="n">div</span><span class="p">,</span> <span class="n">exp</span> <span class="ow">in</span> <span class="n">factorize</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="n">tot</span> <span class="o">*=</span> <span class="p">(</span><span class="n">div</span><span class="o">**</span><span class="p">(</span><span class="n">exp</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))</span> <span class="o">*</span> <span class="p">(</span><span class="n">div</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">return</span> <span class="n">tot</span>
</pre>
</div>
<br />
Later, in another post, we will see that the timings of these functions do not affect the total running time of the algorithm at all.<br />
<br />
<h3>
Zero-Suppressed Binary Decision Diagrams and Polynomials (2012-10-10)</h3>
Lately I have read a lot of articles about polynomial representation.<br />
I was interested in finding an efficient representation for polynomials, to be used in my AKS implementation.<br />
<br />
Most of these representations are quite straightforward, like using arrays of coefficients, arrays of <i>(coefficient, exponent)</i> pairs, or simply integers in which every <i>n</i> bits represent a coefficient.<br />
<br />
I decided to <a href="http://stackoverflow.com/questions/12396151/optimizing-polynomial-implementation">ask</a> on StackOverflow how I could optimize my current polynomial implementation, and a comment pointed out that <a href="http://www-alg.ist.hokudai.ac.jp/%7Eminato/">Minato</a> wrote an <a href="http://cecs.uci.edu/%7Epapers/compendium94-03/papers/1995/edt95/pdffiles/09a_3.pdf">article</a> in which he described a new way to represent polynomials.<br />
<br />
I've decided to give this idea a try and, even though I don't think it will get my AKS faster, I found it really interesting.<br />
<br />
Minato's representation is really different from all the "classic" representations. The idea is to represent the polynomial as a DAG (<a href="http://en.wikipedia.org/wiki/Directed_acyclic_graph">Directed Acyclic Graph</a>), in particular a special form of BDD (<a href="http://en.wikipedia.org/wiki/Binary_decision_diagram">Binary Decision Diagram</a>) called a <a href="http://en.wikipedia.org/wiki/Zero-suppressed_decision_diagram">Zero-Suppressed BDD</a>.<br />
The result is that it is possible to represent polynomials with millions of terms using only thousands of nodes (sometimes even fewer, depending on the polynomial).<br />
<br />
Since operations such as addition or equality have a complexity proportional to the <i>size</i> of the graph, and not to the number of terms, the representation seems promising (it can be thousands of times faster if the polynomials are particularly "regular").<br />
<br />
<h3>
Brief description of ZBDDs</h3>
<br />
Each node of a ZBDD has a label <i>v</i> and two outgoing edges, the 1-edge and the 0-edge (drawn with a dashed line in the images). The 1-edge connects to a child node called the <i>high child</i>, while the 0-edge connects to the <i>low child</i>. So we can represent a node \(A\) as a triplet \((v, low, high)\).<br />
<br />
The ZBDD is a <i>rooted</i> DAG: there is exactly one node with indegree 0, and from it there is a path to every other node in the graph. The order of the labels is fixed, so that in any path from the root different labels always appear in the same order.<br />
In a ZBDD there are also two special nodes, called <i>sinks</i>, labelled \(\top\) and \(\bot\). The first is also called the <i>true sink</i>, while the latter is called the <i>false sink</i>. These two nodes have outdegree 0. <br />
<br />
The graph must also abide by two reduction rules:<br />
<br />
<ol>
<li>In a graph \(G\) there must not be two distinct nodes \(A\) and \(B\) such that \(v_A = v_B\), \(low_A = low_B\) and \(high_A = high_B\) [no isomorphic subgraphs]</li>
<li>Every node whose high child is the false sink must be removed (attaching its low child directly to its parent).</li>
</ol>
It's quite simple to enforce these rules: we just need a procedure, which I'll call ZUNIQUE, that controls node creation. Whenever we want to create a node \((v, low, high)\) we call ZUNIQUE; this procedure checks whether an isomorphic node already exists (and returns it in that case) and, if the high child is the false sink, returns the low child instead.<br />
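A possible sketch of ZUNIQUE in Python (my own, for illustration; nodes are plain objects and the table of already-created nodes is a module-level dictionary):

```python
class Node(object):
    def __init__(self, label, low=None, high=None):
        self.label, self.low, self.high = label, low, high

TRUE = Node('T')    # the true sink
FALSE = Node('F')   # the false sink

_table = {}

def zunique(v, low, high):
    # rule 2: a node whose high child is the false sink is skipped
    if high is FALSE:
        return low
    # rule 1: never build two isomorphic nodes; reuse the existing one
    key = (v, id(low), id(high))
    if key not in _table:
        _table[key] = Node(v, low, high)
    return _table[key]
```

Because every node is created through this single choke point, structural equality of two ZBDDs reduces to pointer identity, which is what makes caching the operations so effective.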
<br />
An example of a ZBDD is the following:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-NF1i3hvK_e0/UG2oa8iNgTI/AAAAAAAAAC0/lpljruBMekc/s1600/example_zdd.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-NF1i3hvK_e0/UG2oa8iNgTI/AAAAAAAAAC0/lpljruBMekc/s1600/example_zdd.png" /></a></div>
Operations on these graphs are quite easy to write in a recursive fashion.<br />
Let us suppose that we want to apply a binary operation \(\diamond\) (which I'll call <i>meld</i>) between two ZBDDs. We know the results of \(\top \diamond \top, \top \diamond \bot, \bot \diamond \top, \bot \diamond \bot \).<br />
<br />
An algorithm, MELD(\(F\),\(G\)), to apply such generic operation to two ZBDDs \(F\) and \(G\) can be described by these steps:<br />
<br />
<ol>
<li>If \(F\) and \(G\) are sinks, return \(F \diamond G \), since it's a base case.</li>
<li>If \(v_F = v_G\) then return ZUNIQUE(\(v_F\), MELD(\(low_F\),\(low_G\)), MELD(\(high_F\), \(high_G\)))</li>
<li>else if \(v_F < v_G\), then return ZUNIQUE(\(v_F\), MELD(\(low_F\), \(G\)), MELD(\(high_F\), \(G\)))</li>
<li>otherwise return ZUNIQUE(\(v_G\), MELD(\(low_G\), \(F\)), MELD(\(high_G\), \(F\)))</li>
</ol>
By filling in the base cases appropriately we can produce the result of any boolean binary operation, such as AND, OR, or XOR.<br />
<br />
<br />
<h3>
Polynomial representation</h3>
Minato had a simple yet brilliant idea. Let us consider the polynomial \(x^4 + x^3 + x\). Since any natural number \(n\) can be written uniquely as a sum of distinct powers of two, we can rewrite it as \(x^4 + x^2 \cdot x^1 + x^1\).<br />
Now we can consider \(x^4, x^2\) and \(x^1\) as three different boolean variables. Grouping by \(x^1\) we get \(x^4 + x^1\cdot(x^2 + 1)\); if we now replace sums with 0-edges and products with 1-edges we obtain the following ZBDD:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-cBtrX5kZexE/UG2v9PvMKkI/AAAAAAAAADE/VNtym6ClCdI/s1600/simple_pol.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-cBtrX5kZexE/UG2v9PvMKkI/AAAAAAAAADE/VNtym6ClCdI/s1600/simple_pol.png" /></a></div>
If we consider the paths from the root \(x^1\) to the sinks, we can recover the original polynomial as follows: if two nodes are connected by a 1-edge, multiply their labels together; if they are connected by a 0-edge, simply skip that label. Then sum the results over all the paths.<br />
<br />
So we can see that we have the path \(x^1 - x^2 - \top \), which yields \(x^1 \cdot x^2 \cdot 1 = x^3\); the path \(x^1 - x^2 \cdots \top\), which yields \(x^1 \cdot 1 = x^1\); and the path \(x^1 \cdots x^4 - \top \), which yields \(x^4\). The path \(x^1 \cdots x^4 \cdots \bot\) yields \(0\). Summing these results together we get \(x^4 + x^3 + x^1 + 0\), which is our polynomial.<br />
<br />
To represent the integer coefficients we can use the same trick.<br />
Every natural number can be written uniquely as a sum of distinct powers of two, so every power of two can be treated as a boolean variable and used just like the \(x^i\) before.<br />
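A tiny helper makes this encoding concrete: it splits a term \(c \cdot x^e\) into the "combinations" of power-of-two variables that the ZBDD stores, one combination per power of two in \(c\). This helper is mine, for illustration only; the \(x\)-variables are represented by their exponents:

```python
def term_combinations(coeff, exp):
    # x^e is the product of the variables x^(2^i) for the set bits of e;
    # the coefficient contributes one combination per set bit of coeff
    x_vars = tuple(1 << i for i in range(exp.bit_length()) if exp >> i & 1)
    return [(1 << j, x_vars)
            for j in range(coeff.bit_length()) if coeff >> j & 1]
```

For example \(3x^3 = (1 + 2) \cdot x^1 \cdot x^2\) yields the two combinations \(\{1, x^1, x^2\}\) and \(\{2, x^1, x^2\}\).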
<br />
<h4>
Is the representation compact?</h4>
You may wonder whether this funny representation is compact and/or efficient.<br />
For example, if we take the polynomial \(24x^7 + 4x^6 + 3x^3 + 16x^2 + 15x\), the resulting ZBDD is:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-K-qZfKJA-J8/UG9GNBO2y0I/AAAAAAAAADU/AGU6WSqgGAY/s1600/complex_poly.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-K-qZfKJA-J8/UG9GNBO2y0I/AAAAAAAAADU/AGU6WSqgGAY/s320/complex_poly.png" height="320" width="256" /></a></div>
And it does not seem so compact: it has 15 nodes, while an array representation of this degree-7 polynomial would need only 8 coefficients (the first of which is zero). But if we consider a polynomial such as \(257 x^{55}+769 x^{54}+8 x^{52}+257 x^{43}+769 x^{42}+8 x^{40}+257 x^{23}+769 x^{22}+8 x^{20}+257 x^{11}+769 x^{10}+8 x^8\), then we obtain the following ZBDD:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-lM2g2RoGZYk/UG__j1MoAHI/AAAAAAAAAD0/Opsi-Qst16o/s1600/good_poly.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-lM2g2RoGZYk/UG__j1MoAHI/AAAAAAAAAD0/Opsi-Qst16o/s320/good_poly.png" height="320" width="282" /></a></div>
<br />
This ZBDD contains only 12 nodes and represents a polynomial of degree 55, with 12 terms. This polynomial would require 56 "slots" to be represented as an array of coefficients, and 12 <i>pairs</i> as an array of coefficient-exponent pairs, but in that case the operations would be slower. Also, we can modify it slightly to obtain something like this:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-0aK5C-vvbN0/UHADHupcZUI/AAAAAAAAAEE/5r7WZDeh-d8/s1600/good_poly2.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-0aK5C-vvbN0/UHADHupcZUI/AAAAAAAAAEE/5r7WZDeh-d8/s320/good_poly2.png" height="320" width="318" /></a></div>
Which represents \(257 x^{55}+769 x^{54}+8 x^{52}+257 x^{43}+769 x^{42}+8 x^{40}+16x^{37}+16x^{33}+16x^{32}+257 x^{23}+769 x^{22}+8 x^{20}+257 x^{11}+769 x^{10}+8 x^8+16x^5+16x+16\), a polynomial with 18 terms, using only 15 nodes.<br />
<br />
This may seem a small gain, but it becomes really important when we deal with big polynomials. For example, the polynomial \(\prod_{k=1}^{8}{(x_k + 1)^8}\), which has \(9^8 = 43046721\) terms, can be represented with only \(26279\) nodes, which is about four times the square root of the number of terms.<br />
<br />
<h3>
Operations on Polynomials</h3>
<br />
We will now see that it's pretty easy to devise algorithms that compute the sum or product of two polynomials represented as ZBDDs.<br />
<br />
First of all we have to devise an algorithm that multiplies a polynomial by a single variable. The algorithm MULVAR(\(F\),\(v\)) is pretty straightforward if we think carefully about this example:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-l1QCJcVQons/UHARTOgkqpI/AAAAAAAAAEU/w9YYl0YRY5s/s1600/mul_by_var.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-l1QCJcVQons/UHARTOgkqpI/AAAAAAAAAEU/w9YYl0YRY5s/s320/mul_by_var.png" height="196" width="320" /> </a> </div>
<div class="separator" style="clear: both; text-align: left;">
Basically
we have to notice that to multiply a polynomial whose root is labelled
\(v\) we simply have to swap its children and to multiply the new low
child(which previously was the high child) by \(v^2\). This allows us to
write a simple recursive function, like the following: </div>
<ol>
<li>If \(v < v_F\) then return ZUNIQUE(\(v\), false-sink, \(F\))</li>
<li>Else if \(v = v_F\) then return ZUNIQUE(\(v\), MULVAR(\(high_F\), \(v^2\)), \(low_F\))</li>
<li>Else return ZUNIQUE(\(v_F\), MULVAR(\(low_F\), \(v\)), MULVAR(\(high_F\), \(v\)))</li>
</ol>
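These steps can be sketched in a compact, self-contained way (my own code, repeating a tuple-based ZUNIQUE so the snippet stands alone). Nodes are tuples \((v, low, high)\), the sinks are the strings 'T' and 'F', and a label is the exponent of its variable, so squaring \(x^e\) means doubling the label:

```python
def zunique(v, low, high, _table={}):
    if high == 'F':               # reduction rule 2
        return low
    return _table.setdefault((v, low, high), (v, low, high))

def mulvar(F, v):
    # multiply the polynomial F by the variable with label v
    if F == 'F':
        return 'F'                    # 0 * v = 0
    if F == 'T':
        return zunique(v, 'F', 'T')   # 1 * v = v
    vF, lowF, highF = F
    if v < vF:
        return zunique(v, 'F', F)
    if v == vF:
        # swap the children, multiplying the former high child by v^2,
        # whose label is 2*v since labels are exponents of x
        return zunique(v, mulvar(highF, 2 * v), lowF)
    return zunique(vF, mulvar(lowF, v), mulvar(highF, v))
```

For instance, multiplying the polynomial \(x\) (the node \((1, \bot, \top)\)) by \(x\) yields the node for \(x^2\), exactly as the child-swapping rule predicts.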
<br />
<h4>
Addition</h4>
If we consider two polynomials \(F\) and \(G\) whose graphs do not share any subgraph, we can easily see that their sum is their union (which we can compute as \(F\) OR \(G\), using the ZBDD algorithm).<br />
If there are common terms, however, the union would count them only once.<br />
So what we have to do is write \(F + G\) as \(F \oplus G + 2 \times (F \wedge G) \), where \(\oplus\) is the XOR of the two ZBDDs, \(\wedge\) is their AND, and we compute \(2 \times F\) by multiplying \(F\) by the variable \(2\).<br />
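The identity \(F + G = F \oplus G + 2 \times (F \wedge G)\) is exactly the schoolbook identity behind binary addition: XOR gives the carry-less sum of the bits, AND gives the carries, which are shifted left and added back in. With plain integers playing the role of bit sets the same recursion reads:

```python
def add(f, g):
    # f + g = (f XOR g) + 2 * (f AND g); iterate until no carries remain
    while g:
        f, g = f ^ g, (f & g) << 1
    return f
```

Each iteration moves the carries one position to the left, so the loop always terminates.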
<br />
<br />
<h4>
Multiplication</h4>
Let \(F\) and \(G\) be polynomials such that \(v_F = v_G = v\); then we can compute their product with the relation:<br />
\[<br />
F \times G = (low_F \times low_G) + (high_F \times high_G) \times v^2 + ((high_F \times low_G) + (low_F \times high_G)) \times v<br />
\]<br />
<br />
We can already compute this expression, since it uses only additions and multiplications by a variable (plus the recursive calls).<br />
<br />
What happens if we have \(F \times G\) and \(v_F \neq v_G\)? Supposing \(v_F < v_G\), we simply multiply \(low_F\) and \(high_F\) by \(G\) and we are done.<br />
<br />
<h3>
An implementation in Python</h3>
Writing an implementation in Python is quite straightforward: it takes only about 130 lines of code. The only thing we have to decide is where to put the "caching system". We could create a "Polynomial" factory that returns unique polynomials, just like ZUNIQUE, while allowing more factories to be created, or we can just provide a single factory.<br />
<br />
My implementation uses a single factory, which is actually provided as the <span style="font-family: "Courier New",Courier,monospace;">__new__</span> method. All operations can be cached using a simple decorator (to avoid rewriting boilerplate code every time).<br />
<br />
<br />
<pre class="brush:python">def memoize(name=None, symmetric=True):
    """Cache a binary operation in Poly.CACHE under the given name."""
    def cached(meth):
        if name is None:
            name_ = meth.__name__.strip('_')
        else:
            name_ = name
        def decorator(self, other):
            try:
                return self.CACHE[name_][(self, other)]
            except KeyError:
                result = meth(self, other)
                self.CACHE[name_][(self, other)] = result
                if symmetric:
                    self.CACHE[name_][(other, self)] = result
                return result
        return decorator
    return cached


class Poly(object):
    CACHE = {
        'new': {},
        'and': {},
        'or': {},
        'xor': {},
        'add': {},
        'mul': {},
    }

    def __new__(cls, label, low=None, high=None):
        # Zero-suppression rule: a node whose high branch is the empty
        # polynomial collapses to its low branch.
        if high is not None and high.label is False:
            return low
        try:
            return cls.CACHE['new'][(label, low, high)]
        except KeyError:
            r = cls.CACHE['new'][(label, low, high)] = object.__new__(cls)
            return r

    def __init__(self, label, low=None, high=None):
        # __new__ may have returned an already-initialized node (from the
        # unique table or the zero-suppression rule): leave it untouched.
        if not hasattr(self, 'label'):
            self.label, self.low, self.high = label, low, high

    @memoize()
    def __and__(self, other):
        if other.is_terminal():
            f, g = other, self
        else:
            f, g = self, other
        if f.is_terminal():
            if f.label is False:
                return f
            # f is the polynomial 1: the intersection keeps only the
            # empty monomial, found at the end of g's low chain.
            return g if g.is_terminal() else f & g.low
        elif f.label == g.label:
            return Poly(f.label, f.low & g.low, f.high & g.high)
        # The top variable of one operand does not occur in the other,
        # so only the low branch can contribute to the intersection.
        return f.low & g if f.label < g.label else f & g.low

    @memoize()
    def __or__(self, other):
        if other.is_terminal():
            f, g = other, self
        else:
            f, g = self, other
        if f.is_terminal():
            if f.label is False:
                return g
            # f is the polynomial 1: the empty monomial must survive
            # the union, so push it down g's low chain.
            return f if g.is_terminal() else Poly(g.label, f | g.low, g.high)
        elif f.label == g.label:
            return Poly(f.label, f.low | g.low, f.high | g.high)
        return (Poly(f.label, f.low | g, f.high) if f.label < g.label else
                Poly(g.label, f | g.low, g.high))

    @memoize()
    def __xor__(self, other):
        if other.is_terminal():
            f, g = other, self
        else:
            f, g = self, other
        if f.is_terminal():
            return (Poly(f.label ^ g.label) if g.is_terminal() else
                    Poly(g.label, f ^ g.low, g.high))
        elif f.label == g.label:
            return Poly(f.label, f.low ^ g.low, f.high ^ g.high)
        return (Poly(f.label, f.low ^ g, f.high) if f.label < g.label else
                Poly(g.label, f ^ g.low, g.high))

    @memoize()
    def __mul__(self, other):
        if isinstance(other, Poly):
            if other.is_terminal():
                f, g = other, self
            else:
                f, g = self, other
            if f.is_terminal():
                return g if f.label is True else f
            elif f.label == g.label:
                # F * G = low*low + (high*high)*v^2 + (cross terms)*v
                return (f.low * g.low +
                        (f.high * g.high) * (f.label[0], f.label[1] * 2) +
                        (f.high * g.low + f.low * g.high) * f.label)
            return (Poly(f.label, f.low * g, f.high * g) if f.label < g.label else
                    Poly(g.label, f * g.low, f * g.high))
        else:
            # `other` is a single variable, given as a (name, degree) pair.
            if self.is_terminal():
                return (Poly(other, Poly(False), Poly(True)) if self.label is True
                        else self)
            elif self.label < other:
                return Poly(self.label, self.low * other, self.high * other)
            elif self.label == other:
                # (low + high*v) * v = (high * v^2) + low * v
                return Poly(self.label, self.high * (other[0], other[1] * 2), self.low)
            return Poly(other, Poly(False), self)

    @memoize()
    def __add__(self, other):
        if self.is_terminal() and other.is_terminal():
            if self.label is True and other.label is True:
                # 1 + 1 = 2: the monomial made of the variable '2' alone.
                return Poly(('2', 1), Poly(False), Poly(True))
            return self | other
        # F + G = (F XOR G) + 2 * (F AND G); the variable ('2', 1)
        # plays the role of the constant 2.
        xor = self ^ other
        intersect = self & other
        if intersect.is_terminal() and not intersect.label:
            result = xor
        else:
            result = xor + (intersect * ('2', 1))
        return result

    def is_terminal(self):
        return self.low is self.high is None
</pre>
<br />
<br />Giacomo Alzettahttp://www.blogger.com/profile/16994410213478654078noreply@blogger.com0tag:blogger.com,1999:blog-9122159695574112755.post-4860041860386558482012-09-13T16:30:00.000+02:002012-09-14T10:11:27.842+02:00Plans for future postsI probably won't have much time to post anything until the 15<sup>th</sup> of October.<br />
<br />
During the following week I plan to publish some interesting posts about my AKS algorithm implementation in python.<br />
<br />
This algorithm was a really good challenge, for different reasons:<br />
<br />
<ul>
<li>If you want to understand how it works you have to study some algebra (actually not a lot of algebra, since it uses only basic algebraic facts). I had never studied algebra at school, so this was a nice discovery</li>
<li>The algorithm is of great theoretical importance (being the first unconditional, deterministic polynomial-time primality test for general numbers), but it's actually really slow, and it's quite hard to get a decent implementation.</li>
<li>Trying to optimize this algorithm I came across some really nice techniques, such as window multiplication for polynomials.</li>
</ul>
Giacomo Alzettahttp://www.blogger.com/profile/16994410213478654078noreply@blogger.com0