Jekyll2018-06-24T20:16:23+00:00http://www.pwills.com/Peter WillsSource for pwills.comPeter Willspeter@pwills.comInverse Transform Sampling in Python2018-06-24T00:00:00+00:002018-06-24T00:00:00+00:00http://www.pwills.com/blog/posts/2018/06/24/sampling<p>When doing data work, we often need to sample random variables. This is easy to
do if one wishes to sample from a Gaussian, or a uniform random variable, or a
variety of other common distributions, but what if we want to sample from an
arbitrary distribution? There is no obvious way to do this within
<code class="highlighter-rouge">scipy.stats</code>. So, I built a small library, <a href="https://www.github.com/peterewills/itsample"><code class="highlighter-rouge">inverse-transform-sample</code></a>,
that allows sampling from arbitrary user-provided distributions. In use, it
looks like this:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="n">pdf</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="n">x</span><span class="o">**</span><span class="mi">2</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span> <span class="c"># unit Gaussian, not normalized</span>
<span class="kn">from</span> <span class="nn">itsample</span> <span class="kn">import</span> <span class="n">sample</span>
<span class="n">samples</span> <span class="o">=</span> <span class="n">sample</span><span class="p">(</span><span class="n">pdf</span><span class="p">,</span><span class="mi">1000</span><span class="p">)</span> <span class="c"># generate 1000 samples from pdf </span></code></pre></figure>
<p>The code is available <a href="https://www.github.com/peterewills/itsample">on GitHub</a>. In this post, I’ll outline the theory of
<a href="https://en.wikipedia.org/wiki/Inverse_transform_sampling">inverse transform sampling</a>, discuss computational details, and outline some
of the challenges faced in implementation.</p>
<h2 id="introduction-to-inverse-transform-sampling">Introduction to Inverse Transform Sampling</h2>
<p>Suppose we have a probability density function \(p(x)\), which has an
associated cumulative distribution function (CDF) \(F(x)\), defined as usual by</p>
<script type="math/tex; mode=display">F(x) = \int_{-\infty}^x p(s)ds.</script>
<p>Recall that the cumulative distribution function \(F(x)\) tells us <em>the probability
that a random sample from \(p\) is less than or equal to \(x\)</em>.</p>
<p>Let’s take a second to notice something here. If we knew, for some \(x\), that
\(F(x)=t\), then drawing \(x\) from \(p\) is in some way <strong>equivalent to
drawing \(t\) from a uniform random variable on \([0,1]\)</strong>, since the CDF for
a uniform random variable is \(F_u(t) = t\).<sup id="fnref:fnote1"><a href="#fn:fnote1" class="footnote">1</a></sup></p>
<p>That realization is the basis for inverse transform sampling. The procedure is:</p>
<ol>
<li>Draw a sample \(t\) uniformly from the interval \([0,1]\).</li>
<li>Solve the equation \(F(x)=t\) for \(x\) (invert the CDF).</li>
<li>Return the resulting \(x\) as the sample from \(p\).</li>
</ol>
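<p>The three steps above can be sketched directly in Python. This is a toy illustration, not the code in <code class="highlighter-rouge">inverse-transform-sample</code>; the finite bracket \([-8, 8]\) and the choice of <code class="highlighter-rouge">brentq</code> as the root-finder are assumptions made for the example.</p>

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

pdf = lambda x: np.exp(-x**2 / 2)  # unit Gaussian, not normalized

# Normalize so the CDF runs from 0 to 1; a wide finite bracket
# stands in for the real line here.
norm = quad(pdf, -8, 8)[0]
cdf = lambda x: quad(pdf, -8, x)[0] / norm

def sample_one(rng):
    t = rng.uniform()  # step 1: draw t ~ Uniform[0, 1]
    # steps 2-3: solve F(x) = t for x and return it
    return brentq(lambda x: cdf(x) - t, -8, 8)

rng = np.random.default_rng(0)
samples = [sample_one(rng) for _ in range(100)]
```

<p>Since the example PDF is a standard Gaussian, the resulting samples should be roughly centered at zero with unit spread.</p>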
<h2 id="computational-considerations">Computational Considerations</h2>
<p>Most of the computational work done in the above algorithm comes in at step 2,
in which the CDF is inverted.<sup id="fnref:fnote2"><a href="#fn:fnote2" class="footnote">2</a></sup> Consider Newton’s method, a typical
routine for finding numerical solutions to equations: the approach is iterative,
and so the function to be inverted, in our case the CDF \(F(x)\), is evaluated
many times. Since, in our case, \(F\) is a (numerically computed) integral
of \(p\), each evaluation of \(F\) requires a full run of our numerical
quadrature routine. Since we need <em>many</em> evaluations of \(F\)
for a single sample, this can lead to a significant slowdown in sampling.</p>
<p>Again, the pain point here is that our CDF \(F(x)\) is slow to evaluate,
because each evaluation requires numerical quadrature. What we need is an
approximation of the CDF that is fast to evaluate, as well as accurate.</p>
<h3 id="chebyshev-approximation-of-the-cdf">Chebyshev Approximation of the CDF</h3>
<p>I snooped around on the internet a bit, and found <a href="https://github.com/scipy/scipy/issues/3747">this feature request</a> for
scipy, which is related to this same issue. Although it never got off the
ground, I found an interesting link to <a href="https://arxiv.org/pdf/1307.1223.pdf">a 2013 paper by Olver & Townsend</a>, in
which they suggest using Chebyshev polynomials to approximate the PDF. The
advantage of this approach is that the integral of a series of Chebyshev
polynomials is known analytically - that is, if we know the Chebyshev expansion
of the PDF, we automatically know the Chebyshev expansion of the CDF as
well. This should allow us to rapidly invert the (Chebyshev approximation of
the) CDF, and thus sample from the distribution efficiently.</p>
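<p>To make this concrete, here is a sketch of the idea using <code class="highlighter-rouge">numpy</code>’s Chebyshev tools rather than Olver and Townsend’s own code; the interval \([-8, 8]\) and the fixed degree 64 are assumptions made for illustration.</p>

```python
import numpy as np
from numpy.polynomial import Chebyshev

pdf = lambda x: np.exp(-x**2 / 2)  # unit Gaussian, not normalized

# Fit a Chebyshev series to the PDF on a finite interval
cheb_pdf = Chebyshev.interpolate(pdf, deg=64, domain=[-8, 8])

# Its antiderivative is another Chebyshev series, known in closed form
antideriv = cheb_pdf.integ()

# Normalize to get a CDF running from 0 to 1 on the interval
def cdf(x):
    return (antideriv(x) - antideriv(-8.0)) / (antideriv(8.0) - antideriv(-8.0))
```

<p>Once the series is fit, every call to <code class="highlighter-rouge">cdf</code> is just a polynomial evaluation, with no quadrature in sight.</p>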
<h3 id="other-approaches">Other Approaches</h3>
<p>There are also less mathematically sophisticated approaches that immediately
present themselves. One might consider solving \(F(x)=t\) on a grid of \(t\)
values, and then building the inverse function \(F^{-1}(t)\) by interpolation. One
could even simply transform the provided PDF into a histogram, and then use the
functionality built in to <code class="highlighter-rouge">scipy.stats</code> for sampling from a provided histogram
(more on that later). However, due to time constraints,
<code class="highlighter-rouge">inverse-transform-sample</code> only includes the numerical quadrature and Chebyshev
approaches.</p>
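<p>For the curious, the grid-plus-interpolation idea might look like the following sketch, which is not part of the library; the grid size and bounds are arbitrary choices for the example.</p>

```python
import numpy as np
from scipy.integrate import quad

pdf = lambda x: np.exp(-x**2 / 2)  # unit Gaussian, not normalized
lo, hi = -8, 8

# Tabulate the CDF once on a grid of x values...
xs = np.linspace(lo, hi, 801)
norm = quad(pdf, lo, hi)[0]
cdf_vals = np.array([quad(pdf, lo, x)[0] for x in xs]) / norm

# ...then invert it by interpolation: F^{-1}(t) for uniform draws t
rng = np.random.default_rng(0)
t = rng.uniform(size=1000)
samples = np.interp(t, cdf_vals, xs)
```

<p>All the quadrature cost is paid once, up front; after that, each sample is a single interpolation lookup.</p>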
<h2 id="implementation-in-python">Implementation in Python</h2>
<p>The implementation of this approach is not terribly sophisticated, but in
exchange it exhibits that wonderful readability characteristic of Python
code. The complexity is highest in the methods implementing the
Chebyshev-based approach; those without a background in numerical analysis may
wonder, for example, why the function is evaluated on <a href="https://en.wikipedia.org/wiki/Chebyshev_nodes">that particularly strange
set of nodes</a>.</p>
<p>In the quadrature-based approach, both the numerical quadrature and the
root-finding are done via the <code class="highlighter-rouge">scipy</code> library (<code class="highlighter-rouge">scipy.integrate.quad</code> and
<code class="highlighter-rouge">scipy.optimize.root</code>, respectively). When using this approach, one can set the
boundaries of the PDF to be infinite, as <code class="highlighter-rouge">scipy.integrate.quad</code> supports
improper integrals. In the <a href="https://github.com/peterewills/itsample/blob/master/example.ipynb">notebook of examples</a>, we show that the samples
generated by this approach do, at least in the eyeball norm, conform to the
provided PDF. As we expected, this approach is slow - it takes about 7 seconds to generate
5,000 samples from a unit normal.</p>
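<p>A minimal sketch of this combination, with the whole real line as the support, might look as follows. This is an illustration, not the library’s actual code; the initial guess of zero is an assumption for the example.</p>

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import root

pdf = lambda x: np.exp(-x**2 / 2)  # unit Gaussian, not normalized

# quad supports improper integrals, so the bounds can be infinite
norm = quad(pdf, -np.inf, np.inf)[0]

def cdf(x):
    return quad(pdf, -np.inf, x)[0] / norm

def ppf(t, guess=0.0):
    # invert the CDF: solve F(x) = t, starting from an initial guess
    sol = root(lambda x: cdf(x[0]) - t, guess)
    return sol.x[0]

x_median = ppf(0.5)  # the median of a symmetric pdf centered at 0
```

<p>Each call to <code class="highlighter-rouge">ppf</code> triggers many quadrature runs inside the root-finder, which is exactly the slowdown described above.</p>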
<p>As with the quadrature and root-finding, pre-rolled functionality from <code class="highlighter-rouge">scipy</code> was
used to both compute and evaluate the Chebyshev approximants. When approximating
a PDF using Chebyshev polynomials, finite bounds must be provided. A
user-specified tolerance determines the order of the Chebyshev approximation;
however, rather than computing a true error, we simply use the size of the last
few Chebyshev coefficients as an error estimate. Since this approach differs
from the previous one only in the way that the CDF is constructed, we use the
same function <code class="highlighter-rouge">sample</code> for both approaches; an option
<code class="highlighter-rouge">chebyshev=True</code> will generate a Chebyshev approximant of the CDF, rather than
using numerical quadrature.</p>
<p>I hoped that the Chebyshev approach would improve on this by an order of
magnitude or two; however, my hopes were thwarted. The implementation of the
Chebyshev approach is faster by perhaps a factor of 2 or 3, but does not offer
the kind of improvement I had hoped for. What happened? In testing, a single
evaluation of the Chebyshev CDF was not much faster than a single evaluation of
the quadrature CDF. The advantage of the Chebyshev CDF comes when one wishes to
evaluate a long, vectorized set of inputs; in this case, the Chebyshev CDF is
orders of magnitude faster than quadrature. But <code class="highlighter-rouge">scipy.optimize.root</code> does not
appear to take advantage of vectorization, which makes sense - in simple
iteration schemes, the value at which the next iteration occurs depends on the
outcome of the current iteration, so there is not a simple way to vectorize the
algorithm.</p>
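<p>The vectorization point is easy to demonstrate. The sketch below, with the interval, degree, and grid size chosen arbitrarily, evaluates a Chebyshev CDF at many points in one vectorized call, and compares against running quadrature once per point.</p>

```python
import numpy as np
from numpy.polynomial import Chebyshev
from scipy.integrate import quad

pdf = lambda x: np.exp(-x**2 / 2)  # unit Gaussian, not normalized

# Chebyshev antiderivative of the PDF on [-8, 8]
antideriv = Chebyshev.interpolate(pdf, deg=64, domain=[-8, 8]).integ()

xs = np.linspace(-8, 8, 500)

# One vectorized call evaluates the Chebyshev CDF at every point at once...
cheb_vals = antideriv(xs) - antideriv(-8.0)

# ...while quadrature must be rerun from scratch for each point
quad_vals = np.array([quad(pdf, -8, x)[0] for x in xs])
```

<p>Both computations agree closely, but the quadrature loop does hundreds of integrations where the Chebyshev version does one polynomial evaluation over the whole array.</p>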
<h2 id="conclusion">Conclusion</h2>
<p>I suspect that the reason this feature is absent from large-scale libraries like
<code class="highlighter-rouge">scipy</code> and <code class="highlighter-rouge">numpy</code> is that it is difficult to build a sampler that is both fast
and accurate over a large enough class of PDFs. My approach sacrifices speed;
other approximation schemes may be very fast, but may not provide the accuracy
guarantees needed by some users.</p>
<p>What we’re left with is a library that is useful for generating small numbers
(less than 100,000) of samples. It’s worth noting that in the work of Olver &
Townsend, they seem to be able to use the Chebyshev approach to sample orders of
magnitude faster than my implementation, but sadly their Matlab code is nowhere
to be found in the Matlab library <a href="http://www.chebfun.org/"><code class="highlighter-rouge">chebfun</code></a>, which is the location
advertised in their work. Presumably they implemented their own root-finder, or
Chebyshev approximation scheme, or both. There’s a lot of space for improvement
here, but I simply ran out of time and energy on this one; if you feel inspired,
<a href="https://github.com/peterewills/itsample#contributing">fork the repo</a> and submit a pull request!</p>
<!-------------------------------- FOOTER ---------------------------->
<!-- Wish we could put this in _includes/scripts.html. But it doesn't run from -->
<!-- there. It needs to be run at the bottom of the file, rather than at the -->
<!-- top; perhaps that has something to do with it. Anyways, I'll just include -->
<!-- this chunk of HTML at the footer of all my posts, even though its fugly. -->
<div id="disqus_thread"></div>
<script>
/**
* RECOMMENDED CONFIGURATION VARIABLES: EDIT AND UNCOMMENT THE SECTION BELOW TO INSERT DYNAMIC VALUES FROM YOUR PLATFORM OR CMS.
* LEARN WHY DEFINING THESE VARIABLES IS IMPORTANT: https://disqus.com/admin/universalcode/#configuration-variables*/
/*
var disqus_config = function () {
this.page.url = PAGE_URL; // Replace PAGE_URL with your page's canonical URL variable
this.page.identifier = PAGE_IDENTIFIER; // Replace PAGE_IDENTIFIER with your page's unique identifier variable
};
*/
(function() { // DON'T EDIT BELOW THIS LINE
var d = document, s = d.createElement('script');
s.src = 'https://pwills-com.disqus.com/embed.js';
s.setAttribute('data-timestamp', +new Date());
(d.head || d.body).appendChild(s);
})();
</script>
<noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
<div class="footnotes">
<ol>
<li id="fn:fnote1">
<p>This is only true for \(t\in [0,1]\). For \(t<0\),
\(F_u(t)=0\), and for \(t>1\), \(F_u(t)=1\). <a href="#fnref:fnote1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:fnote2">
<p>The inverse of the CDF is often called the percent point function,
or PPF. <a href="#fnref:fnote2" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>The Meaning of Entropy2018-02-06T00:00:00+00:002018-02-06T00:00:00+00:00http://www.pwills.com/blog/posts/2018/02/06/entropy<p><strong>Entropy</strong> is a word that we see a lot in various forms. Its classical use
comes from thermodynamics: e.g. “the entropy in the universe is always
increasing.” With the recent boom in statistics and machine learning, the word
has also seen a surge in use in information-theoretic contexts: e.g. “minimize
the cross-entropy of the validation set.”</p>
<p>It’s been an ongoing investigation for me, trying to figure out just what the
hell this information-theoretic entropy is all about, and how it connects to
the notion I’m familiar with from statistical mechanics. Reading through the
wonderful book <a href="https://www.amazon.com/Data-Analysis-Bayesian-Devinderjit-Sivia/dp/0198568320">Data Analysis: a Bayesian Tutorial</a> by D. S. Sivia, I
found the first connection between these two notions that really clicked for
me. I’m going to run through the basic argument here, in the hope that
reframing it in my own words will help me understand it more thoroughly.</p>
<h2 id="entropy-in-thermodynamics">Entropy in Thermodynamics</h2>
<p>Let’s start with the more intuitive notion, which is that of thermodynamic
entropy. This notion, when poorly explained, can seem opaque or quixotic;
however, when viewed through the right lens, it is straightforward, and the law
of increasing entropy becomes a highly intuitive result.</p>
<h3 id="counting-microstates">Counting Microstates</h3>
<p>Imagine, if you will, the bedroom of a teenager. We want to talk about the
entropy of two different states: the state of being “messy” and the state of
being “clean.” We will call these <strong>macrostates</strong>; they describe the macroscopic
(large-scale) view of the room. However, there are also many different
microstates. One can resolve these on a variety of scales, but let’s just say
they correspond to the location/position of each individual object in the
room. To review:</p>
<table>
<thead>
<tr>
<th>Type</th>
<th>Definition</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr>
<td>Macrostate</td>
<td>Overall Description</td>
<td>“Messy”</td>
</tr>
<tr>
<td>Microstate</td>
<td>Fine-Scale Description</td>
<td>“Underwear on lamp, shoes in bed, etc.”</td>
</tr>
</tbody>
</table>
<h3 id="the-boltzmann-entropy">The Boltzmann Entropy</h3>
<p>One might notice an interesting fact: that there are many more possible
microstates that correspond to “messy” than there are microstates that
correspond to “clean.” <strong>This is exactly what we mean when we say that a messy
room has higher entropy.</strong> In particular, the entropy of a macrostate is <strong>the
log of the number of microstates that correspond to that macrostate.</strong> We call
this the Boltzmann entropy, and denote it by \(S_B\). If there are
\(\Omega\) possible microstates that correspond to the macrostate of being
“messy,” then we define the entropy of this state as<sup id="fnref:fnote2"><a href="#fn:fnote2" class="footnote">1</a></sup></p>
<script type="math/tex; mode=display">S_B(\text{messy}) = \log(\Omega).</script>
<p>This is essentially all we need to know here.<sup id="fnref:fnote1"><a href="#fn:fnote1" class="footnote">2</a></sup> The entropy tells us how many
different ways there are to get a certain state. A pyramid of oranges in a
supermarket has lower entropy than the oranges fallen all over the floor,
because there are many configurations of oranges that we would call “oranges all
over the floor,” but very few that we would call “a nicely organized pyramid of
oranges.”</p>
<p>In this context, the law of increasing entropy becomes almost tautological. If
things are moving around in our bedroom at random, and we call <em>most</em> of those
configurations “messy,” then the room will tend towards messiness rather than
cleanliness. We sometimes use the terms “order” and “disorder” to refer to
states of relatively low and high entropy, respectively.</p>
<h2 id="entropy-in-information-theory">Entropy in Information Theory</h2>
<p>One also frequently encounters a notion of entropy in statistics and information
theory. This is called the <em>Shannon entropy</em>, and the motivation for this post
is my persistent puzzlement over the connection between Boltzmann’s notion of
entropy and Shannon’s. Prior to reading <a href="https://www.amazon.com/Data-Analysis-Bayesian-Devinderjit-Sivia/dp/0198568320">D. Sivia’s manual</a>, I only knew
the definition of Shannon entropy, but his work presented such a clear
exposition of the connection to Boltzmann’s ideas that I felt compelled to share it.</p>
<h3 id="permutations-and-probabilities">Permutations and Probabilities</h3>
<p>We’ll work with a thought experiment.<sup id="fnref:fnote3"><a href="#fn:fnote3" class="footnote">3</a></sup> Suppose we have \(N\) subjects, whom
we organize into \(M\) groups, with \(N\gg M\). Let \(n_i\) indicate the
number of subjects that are in the \(i^\text{th}\) group, for
\(i=1,\ldots,M\). Of course,</p>
<script type="math/tex; mode=display">\sum_{i=1}^M n_i = N,</script>
<p>and if we choose a subject at random, the probability that they are in group
\(i\) is</p>
<script type="math/tex; mode=display">p_i = \frac{n_i}{N}.</script>
<p>The <strong>Shannon entropy</strong> of such a discrete distribution is defined as</p>
<script type="math/tex; mode=display">S = -\sum_{i=1}^M p_i\log(p_i)</script>
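<p>In code, this definition is straightforward to compute. The following is a small sketch, assuming natural logarithms and the usual convention that \(0\log 0 = 0\):</p>

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy -sum_i p_i log(p_i), with 0 * log(0) taken to be 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # drop empty groups: 0 * log(0) contributes nothing
    return -np.sum(p * np.log(p))

uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])  # maximal: log(4)
certain = shannon_entropy([1.0, 0.0, 0.0, 0.0])      # zero: no uncertainty
```

<p>The uniform distribution over \(M\) groups maximizes this entropy, at \(\log M\), while a distribution concentrated entirely in one group has entropy zero.</p>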
<p>But why? Why \(p\log(p)\)? Let’s look and see.</p>
<p>A macrostate of this system is defined by the sizes of the groups \(n_i\);
equivalently, by the probability distribution. A microstate specifies the group
of each subject: it says that subject number \(j\) is in group \(i\), for each
\(j=1,\ldots,N\). How many microstates correspond to a given macrostate? For the
first group, we must choose \(n_1\) members from among the \(N\) subjects, so
the number of ways of filling this group is</p>
<script type="math/tex; mode=display">{N\choose n_1} = \frac{N!}{n_1!(N-n_1)!}</script>
<p>For the second group, there are \(N - n_1\) remaining subjects, and we must assign
\(n_2\) of them, and so on. Thus, the total number of ways of arranging the
\(N\) subjects into the groups of size \(n_i\) is</p>
<script type="math/tex; mode=display">\Omega = {N\choose n_1}{N-n_1 \choose n_2}\ldots {N-n_1-\ldots-n_{M-1}\choose n_M}.</script>
<p>This horrendous list of binomial coefficients can be simplified down to just</p>
<script type="math/tex; mode=display">\Omega = \frac{N!}{n_1!n_2!\ldots n_M!}.</script>
<p>The Boltzmann entropy of this macrostate is then</p>
<script type="math/tex; mode=display">S_B = \log(\Omega) = \log(N!) - \sum_{i=1}^M \log(n_i!)</script>
<h3 id="from-boltzmann-to-shannon">From Boltzmann to Shannon</h3>
<p><strong>We will now show that the Boltzmann entropy is (approximately) a scaling of the
Shannon entropy</strong>; in particular, \(S_B \approx N\,S\). Things are going to get
slightly complicated in the algebra, but hang on. If you’d prefer, you can take
my word for it, and skip to the next section.</p>
<p>We will use the Stirling approximation \(\log(n!)\approx n\log(n)\)<sup id="fnref:fnote4"><a href="#fn:fnote4" class="footnote">4</a></sup>
to simplify:</p>
<script type="math/tex; mode=display">S_B \approx N\log(N) - \sum_{i=1}^M n_i\log(n_i)</script>
<p>Since the probability \(p_i=n_i/N\), we can re-express \(S_B\) in terms of
\(p_i\) via</p>
<script type="math/tex; mode=display">S_B \approx N\log(N)-N\sum_{i=1}^M p_i\log(Np_i)</script>
<p>Since \(\sum_ip_i=1\), we have</p>
<script type="math/tex; mode=display">S_B \approx -N\sum_{i=1}^M p_i\log(p_i) = N \, S.</script>
<p>Phew! So, the Boltzmann entropy \(S_B\) of having \(N\) subjects in \(M\)
groups of sizes \(n_i\) is (approximately) \(N\) times the Shannon
entropy.</p>
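<p>This relationship is easy to check numerically. The sketch below (with made-up group sizes) computes the exact \(\log\Omega\) via the log-gamma function, and compares it to \(N\) times the Shannon entropy:</p>

```python
import math

def log_multinomial(ns):
    """Exact log(Omega) = log( N! / (n_1! ... n_M!) ), via log-gamma."""
    N = sum(ns)
    return math.lgamma(N + 1) - sum(math.lgamma(n + 1) for n in ns)

def shannon(ns):
    """Shannon entropy of the distribution p_i = n_i / N."""
    N = sum(ns)
    return -sum((n / N) * math.log(n / N) for n in ns if n > 0)

ns = [400, 300, 200, 100]      # N = 1000 subjects in M = 4 groups
S_B = log_multinomial(ns)      # exact Boltzmann entropy log(Omega)
NS = sum(ns) * shannon(ns)     # N times the Shannon entropy
```

<p>For \(N=1000\) the two agree to within a few percent, and the agreement improves as \(N\) grows, since the terms dropped from Stirling’s formula grow only logarithmically in \(N\).</p>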
<h2 id="who-cares">Who Cares?</h2>
<p>Admittedly, this kind of theoretical revelation will probably not change the way
you deploy cross-entropy in your machine learning projects. Cross-entropy is
primarily used because its gradients behave well, which is important in the
stochastic gradient-descent algorithms favored by modern deep-learning
architectures. However, I personally dislike using tools that I don’t have a
theoretical understanding of; hopefully you now have a better grip on the
theoretical underpinnings of cross-entropy, and its relationship to
statistical mechanics.</p>
<div class="footnotes">
<ol>
<li id="fn:fnote2">
<p>Often a constant will be included in this definition, so that
\(S=k_B \log(\Omega)\). This constant is arbitrary, as it simply rescales
the units of our entropy, and it will only serve to get in the way of our
analysis, so we omit it. <a href="#fnref:fnote2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:fnote1">
<p>All we need to know for the purpose of establishing a connection
between thermodynamic and information-theoretic entropy; of course there is
much more to know, and there are many alternative ways of conceptualizing
entropy. However, none of these have ever been intuitive to me in the way
that Boltzmann’s definition of entropy is. <a href="#fnref:fnote1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:fnote3">
<p>We have slightly rephrased Sivia’s presentation to fit our purposes here. <a href="#fnref:fnote3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:fnote4">
<p>The most commonly used form of Stirling’s approximation is the more
precise \(\log(n!)\approx n\log(n)-n\), but we use a coarser form here. <a href="#fnref:fnote4" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>A Website is Born!2017-12-20T00:00:00+00:002017-12-20T00:00:00+00:00http://www.pwills.com/blog/posts/2017/12/20/website<p>I learned a lot while building this website; I hope to share it so that it might
be helpful for anyone trying to do the same. I’m sure you’ll notice that I’m far
from an expert in the subjects we’re going to explore here; this is my first
foray into web development. If you have any corrections, or notice anything I’ve
misunderstood, I’d love to hear about it! Just post a comment.</p>
<p>The site is built using <a href="https://jekyllrb.com/">Jekyll</a>, using the theme <a href="https://mmistakes.github.io/minimal-mistakes/">Minimal Mistakes</a>. I
host it on <a href="https://pages.github.com/">Github pages</a>, and purchased and manage my domain through
<a href="https://domains.google/#/">Google Domains</a>. We’ll go through each of these steps in detail. I’ll assume
that you have the up-to-date versions of Ruby and Jekyll on your local
machine. I’m going through all this in macOS, which may affect some of the shell
commands I give, but translating to Windows shouldn’t be too hard.</p>
<h2 id="making-a-site-with-minimal-mistakes">Making a site with Minimal Mistakes</h2>
<p>The website for Minimal Mistakes includes a great quick-start guide; I
recommend the <a href="https://mmistakes.github.io/minimal-mistakes/docs/quick-start-guide/#starting-from-jekyll-new">Starting with <code class="highlighter-rouge">jekyll new</code></a> section as a place to
start. Using this you should be able to establish a base site with some
simple demonstration content.</p>
<h3 id="enabling-mathjax">Enabling MathJax</h3>
<p>In order to enable <a href="https://www.mathjax.org">MathJax</a>, which renders the mathematical equations you see in
my posts, you’ll need to edit the file <code class="highlighter-rouge">scripts.html</code> contained in the folder
<code class="highlighter-rouge">_includes/</code> to include a line enabling MathJax. However, you’ll want to avoid
overwriting the contents of the default <code class="highlighter-rouge">scripts.html</code>.</p>
<p>So, we need to find where <code class="highlighter-rouge">bundle</code> is storing the Gem for Minimal Mistakes. To
find this, do</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bundle show minimal-mistakes-jekyll
</code></pre></div></div>
<p>If you just want to navigate directly to that directory, do</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd $(bundle show minimal-mistakes-jekyll)
</code></pre></div></div>
<p>Now you can copy the default <code class="highlighter-rouge">scripts.html</code> into your site:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cp _includes/scripts.html /path/to/site/_includes/scripts.html
</code></pre></div></div>
<p>Open the copied <code class="highlighter-rouge">scripts.html</code> in your editor of choice,<sup id="fnref:fnote1"><a href="#fn:fnote1" class="footnote">1</a></sup> and add the
following lines at the end:</p>
<figure class="highlight"><pre><code class="language-html" data-lang="html">
{% if page.mathjax %}
<span class="nt"><script </span><span class="na">type=</span><span class="s">"text/javascript"</span> <span class="na">async</span>
<span class="na">src=</span><span class="s">"https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML"</span><span class="nt">></span>
<span class="nt"></script></span>
{% endif %}
</code></pre></figure>
<p>And you’re done!<sup id="fnref:fnote2"><a href="#fn:fnote2" class="footnote">2</a></sup> Now, you can type <code class="highlighter-rouge">$$x_1$$</code> to see <script type="math/tex">x_1</script>, and so
on. The <code class="highlighter-rouge">$$...$$</code> syntax will generate inline math if used inline, and will
generate a display equation if used on its own line. So, if one enters</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$$ f(a) = \frac{1}{2\pi i} \oint_\gamma \frac{f(z)}{z-a} dz $$
</code></pre></div></div>
<p>Then the rendered equation appears as so:</p>
<script type="math/tex; mode=display">f(a) = \frac{1}{2\pi i} \oint_\gamma \frac{f(z)}{z-a} dz</script>
<h3 id="customize-font-sizes">Customize Font Sizes</h3>
<p>I found the fonts a bit oversized, so I wanted to change the size for the
posts. In order to do this, you need to copy <strong>the entire folder</strong> which
contains all the relevant scss files. In order to do this, do</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd $(bundle show minimal-mistakes-jekyll)
cp -r _sass /path/to/site
</code></pre></div></div>
<p>Now, after much digging through the GitHub issues,<sup id="fnref:fnote3"><a href="#fn:fnote3" class="footnote">3</a></sup> I found that the
file to edit here is <code class="highlighter-rouge">_sass/_reset.scss</code>. In my site, the relevant chunk of text
looks like</p>
<figure class="highlight"><pre><code class="language-scss" data-lang="scss">  @include breakpoint($medium) {
font-size: 13px;
}
@include breakpoint($large) {
font-size: 15px;
}
@include breakpoint($x-large) {
font-size: 18px;
}</code></pre></figure>
<p>Once this file has been edited, you should see the font size reduced in your
page.</p>
<h2 id="getting-it-on-github-pages">Getting it on GitHub Pages</h2>
<p>Okay, now we write a bunch of nonsense, find some beautiful pictures at
<a href="https://unsplash.com">Unsplash</a> to use as headers, and we’re ready to publish the thing on GitHub
Pages. I’ll first go through as though we don’t want to use a custom domain, so
that the website will be exposed at <code class="highlighter-rouge">USERNAME.github.io</code>.</p>
<h3 id="enabling-jekyll-remote-theme">Enabling <code class="highlighter-rouge">jekyll-remote-theme</code></h3>
<p>First of all, make sure that you’re using the <code class="highlighter-rouge">remote-theme</code> jekyll plugin,
which allows you to use any jekyll theme that is GitHub hosted, rather than only
the few that are officially supported. This process is outlined on the Minimal
Mistakes website, but I’ll go through it here.</p>
<p>First, <strong>in your <code class="highlighter-rouge">_config.yml</code> file</strong>, enable the plugin by including it in the
<code class="highlighter-rouge">plugins</code> list, via</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>plugins:
- jekyll-remote-theme
</code></pre></div></div>
<p>If you have other plugins you want to use (I use <code class="highlighter-rouge">jekyll-feed</code>), then add them
to this list as well. Designate the <code class="highlighter-rouge">remote_theme</code> variable, but do so <strong>after
setting the theme</strong>, so that you have in your config file</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>theme: "minimal-mistakes-jekyll"
remote_theme: "mmistakes/minimal-mistakes"
</code></pre></div></div>
<p>Finally, in your <code class="highlighter-rouge">Gemfile</code>, add <code class="highlighter-rouge">gem "jekyll-remote-theme"</code>.</p>
<h3 id="push-it-to-the-repository">Push it to the repository</h3>
<p>GitHub pages looks for a repository that follows the naming convention
<code class="highlighter-rouge">USERNAME.github.io</code>. So, for example, since my GitHub username is
<code class="highlighter-rouge">peterewills</code>, the repository for the source of this site is at
<code class="highlighter-rouge">https://www.github.com/peterewills/peterewills.github.io</code>.</p>
<p>Once you’ve created such a repository, initialize a git repo in your site’s
directory by going into <code class="highlighter-rouge">path/to/your/site</code> and running <code class="highlighter-rouge">git init</code>. Then, run</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git remote add origin https://www.github.com/USERNAME/USERNAME.github.io
</code></pre></div></div>
<p>and then commit and push. (If you’re unfamiliar with git, I recommend
<a href="https://git-scm.com/docs/gittutorial">either</a> of <a href="https://try.github.io/levels/1/challenges/1">these</a> tutorials.) You’ll get an email saying that your page build
was successful, but that you’re “using an unsupported theme.” Don’t worry about
this; it happens whenever you use <code class="highlighter-rouge">jekyll-remote-theme</code>.</p>
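<p>Putting the steps above together, the whole sequence looks roughly like this (a sketch; substitute your own username and commit message):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd path/to/your/site
git init
git remote add origin https://www.github.com/USERNAME/USERNAME.github.io
git add .
git commit -m "initial commit of site source"
git push -u origin master
</code></pre></div></div>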
<p>You now should be able to navigate to <code class="highlighter-rouge">USERNAME.github.io</code> and see your page!</p>
<h2 id="using-a-custom-domain">Using a Custom Domain</h2>
<p>Suppose you’d prefer to use a custom domain, such as <code class="highlighter-rouge">mydomain.pizza</code> (which, at
the time of writing, is a real and available domain name). There are lots of ways to do this;
I did it through <a href="https://domains.google.com">Google Domains</a>, so I’ll go through those steps.</p>
<p>First, go to <a href="https://domains.google.com">Google Domains</a>, pick out the domain you want, and register
it. For this example, we’ll assume you went with <code class="highlighter-rouge">mydomain.pizza</code>. It should
now appear under the <strong>My Domains</strong> tab on the right side of the
page, along with a <strong>DNS</strong> option. This is what we need to edit.</p>
<p>We need to configure the DNS behavior of our domain so that it points at the IP
address where GitHub Pages is hosting it. On the DNS page, scroll down to
<strong>Custom Resource Records</strong>. You’ll want to add three custom resource records:
two address records (type A) and one alias record (type CNAME). GitHub Pages
serves its sites from the IP addresses 192.30.252.153 and 192.30.252.154, so add
both of these as A records. Then add your GitHub Pages URL
<code class="highlighter-rouge">USERNAME.github.io</code> as a CNAME record. Once you’ve added all three,
your list of resource records should look like the example below.</p>
<p><img src="/assets/images/custom_resource.png" alt="" /></p>
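<p>In plain zone-file notation, the records in the example above amount to something like this (a sketch; Google Domains uses a form-based editor, so you’ll enter these same values through its interface, and the alias record typically goes on the <code class="highlighter-rouge">www</code> subdomain):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@    A      192.30.252.153
@    A      192.30.252.154
www  CNAME  USERNAME.github.io.
</code></pre></div></div>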
<p>So, now your url (<code class="highlighter-rouge">mydomain.pizza</code>) knows that it is an alias for
<code class="highlighter-rouge">USERNAME.github.io</code>, but we still have to specify this aliasing on the GitHub
end of things.</p>
<p>To do this, simply make a text file called <code class="highlighter-rouge">CNAME</code> in the root of your site’s repository, whose first line is</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mydomain.pizza
</code></pre></div></div>
<p>This is the entire contents of the file. Once it’s pushed to the
repository <code class="highlighter-rouge">USERNAME/USERNAME.github.io</code>, the appropriate settings should
update automatically. To check, go to the repository settings,
scroll down to the “GitHub Pages” section, and look under “Custom domain.” You
should see something like the following:</p>
<p><img src="/assets/images/github_repo.png" alt="" /></p>
<p>If the DNS record of your Google domain has not yet been updated, then you will
see <code class="highlighter-rouge">Your site is ready to be published mydomain.pizza</code> on a yellow
background. Note that it sometimes takes up to 48 hours for DNS records to
update, so be patient.</p>
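<p>While you wait, you can check whether the records have propagated from the command line, using the <code class="highlighter-rouge">dig</code> utility (part of the standard BIND tools shipped with most systems):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dig +short mydomain.pizza A
</code></pre></div></div>
<p>Once propagation is complete, this should print the two GitHub Pages IP addresses listed above.</p>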
<h2 id="conclusion">Conclusion</h2>
<p>Once the DNS records have updated, you should be able to see your site at
<code class="highlighter-rouge">mydomain.pizza</code>. You can check out <a href="https://www.github.com/peterewills/peterewills.github.io">the repository for my site</a> for
examples of everything I’ve gone through here, including my <code class="highlighter-rouge">CNAME</code> file, my
<code class="highlighter-rouge">_includes/scripts.html</code> file that enables MathJax, and my <code class="highlighter-rouge">_config.yml</code>
file. Please let me know, either by email or in the comments, if you have any
questions or corrections!</p>
<!-------------------------------- FOOTER ---------------------------->
<!-- Wish we could put this in _includes/scripts.html. But it doesn't run from -->
<!-- there. It needs to be run at the bottom of the file, rather than at the -->
<!-- top; perhaps that has something to do with it. Anyways, I'll just include -->
<!-- this chunk of HTML at the footer of all my posts, even though it's fugly. -->
<div id="disqus_thread"></div>
<script>
/**
* RECOMMENDED CONFIGURATION VARIABLES: EDIT AND UNCOMMENT THE SECTION BELOW TO INSERT DYNAMIC VALUES FROM YOUR PLATFORM OR CMS.
* LEARN WHY DEFINING THESE VARIABLES IS IMPORTANT: https://disqus.com/admin/universalcode/#configuration-variables*/
/*
var disqus_config = function () {
this.page.url = PAGE_URL; // Replace PAGE_URL with your page's canonical URL variable
this.page.identifier = PAGE_IDENTIFIER; // Replace PAGE_IDENTIFIER with your page's unique identifier variable
};
*/
(function() { // DON'T EDIT BELOW THIS LINE
var d = document, s = d.createElement('script');
s.src = 'https://pwills-com.disqus.com/embed.js';
s.setAttribute('data-timestamp', +new Date());
(d.head || d.body).appendChild(s);
})();
</script>
<noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
<div class="footnotes">
<ol>
<li id="fn:fnote1">
<p>Presumably emacs. <a href="#fnref:fnote1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:fnote2">
<p>Some <a href="http://dasonk.com/blog/2012/10/09/Using-Jekyll-and-Mathjax">older blog posts</a> discuss the process of adding kramdown as
the markdown rendering engine, but this is default behavior for Jekyll 3.x,
so there’s no need to do this step. <a href="#fnref:fnote2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:fnote3">
<p>Michael, the guy who built Minimal Mistakes, is wonderful
about responding to issues on GitHub, which effectively serve as a support
forum for people using the theme who have no experience in web development
(such as myself). <a href="#fnref:fnote3" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>