Paper WIP

2021-04-14 09:47:03 +02:00 · 2021-04-14 09:47:03 +02:00 · ec8df65fba
commit ec8df65fba
parent eb386fb0a0
1 changed files with 718 additions and 21 deletions
--- a/paper/safety-reset-paper.tex
+++ b/paper/safety-reset-paper.tex
@ -1,4 +1,4 @@
-\documentclass[nohyperref]{iacrtrans}
+\documentclass[runningheads]{llncs}
 \usepackage[T1]{fontenc}
 \usepackage[
    backend=biber,
@ -12,7 +12,6 @@
 \usepackage{amssymb,amsmath}
 \usepackage{eurosym}
 \usepackage{wasysym}
-\usepackage{amsthm}

 \usepackage[binary-units]{siunitx}
 \DeclareSIUnit{\baud}{Bd}
@ -30,14 +29,35 @@

 \begin{document}

-\title[Ripples in a Pond]{Transmitting Information through Grid Frequency Modulation}
+\title{Ripples in a Pond: Transmitting Information through Grid Frequency Modulation}
+\titlerunning{Ripples in a Pond: Transmitting Information through Grid Frequency Modulation}
 \author{Jan Sebastian Götte \and Björn Scheuermann}
+\authorrunning{Jan Sebastian Götte \and Björn Scheuermann}
 \institute{HIIG\\ \email{safetyreset@jaseg.de} \and HU Berlin \\ \email{scheuermann@informatik.hu-berlin.de}}
 % FIXME keywords
-\keywords{hardware security \and energy systems \and signal theory}
 \maketitle
+\keywords{Security, privacy and resilience in critical infrastructures \and Security and privacy in ``internet of
+things'' \and Cyber-physical systems \and Hardware security \and Network Security \and Energy systems \and Signal theory}

 \begin{abstract}
+    The smart grid is a large, complex and interconnected technological system. With remotely controllable load switches
+    having been rolled out at scale in some countries, a tiny flaw inside the firmware of one of these embedded devices
+    may allow attacks to remotely trigger large-scale excursions of grid parameters with potentially catastrophic
+    results. Attaining perfect security from such cyberphysical attacks is a monumental embedded engineering task---and
+    observations do not indicate that current efforts meet the requirements of this task.%FIXME cite recent RECESSIM work
+
+    In this paper, we approach the smart grid safety issue by implementing an emergency override that can be used to
+    e.g.\ reset all connected devices to a known-good state and preempting subsequent compromise by cutting
+    communication links. To yield a fully fail-safe design, our system does not rely on the internet or any other
+    communication network to work. Instead, our system transmits error-corrected and cryptographically secured commands
+    by modulating grid frequency using a single large consumer such as a large aluminium smelter. This approach differs
+    from traditional Powerline Communication (PLC) systems in that reaches every device within the same synchronous
+    area.
+
+    Using extensive simulations we have determined that control of a $\SI{25}{\mega\watt}$ load would allow for the
+    transmission of a crytographically secured \emph{reset} signal within $15$ minutes. We have produced a
+    proof-of-concept prototype receiver that demonstrates the feasibility of decoding such signals even on 
+    resource-constrained microcontroller hardware.
 \end{abstract}

 \section{Introduction}
@ -79,15 +99,11 @@ This complex standardization landscape and market situation has led to a prolife
 microcontroller firmware. The complexity and scale of this--often network-connected--firmware makes for a ripe substrate
 for bugs to surface.

-A remotely exploitable flaw inside a smart meter's firmware\footnote{
-    There are several smart metering architectures that ascribe different roles to the component called \emph{smart
-    meter}. Not all systems are susceptible to attacks to the same degree, with the German implementation being almost
-    immune as far as energy availability is concerned. For clarity, we use \emph{smart meter} to describe the entire
-    system at the customer premises including both the meter and if present a gateway.
-} could have consequences ranging from impaired billing functionality to an existential threat to grid
-stability\cite{anderson01,anderson02}. In a country where meters commonly include disconnect switches for purposes such
-as prepaid tariffs a coördinated attack could at worst cause widespread activation of grid safety systems by repeatedly
-connecting and disconnecting megawatts of load capacity in just the wrong moments\cite{wu01}.
+A remotely exploitable flaw inside the firmware of a component of a smart metering ystem could have consequences ranging
+from impaired billing functionality to an existential threat to grid stability\cite{anderson01,anderson02}. In a country
+where meters commonly include disconnect switches for purposes such as prepaid tariffs a coördinated attack could at
+worst cause widespread activation of grid safety systems by repeatedly connecting and disconnecting megawatts of load
+capacity in just the wrong moments\cite{wu01}.

 Mitigation of these attacks through firmware security measures is unlikely to yield satisfactory results. The enormous
 complexity of smart meter firmware makes firmware security extremely labor-intensive. The diverse standardization
@ -104,10 +120,10 @@ re-flashes the meter's main microcontroller over the standard JTAG interface.  N
 \emph{changing grid frequency itself}.  This is fundamentally different in both generation and detection from systems
 such as traditional PLC that superimpose a signal on grid voltage, but leave grid frequency itself unaffected.

-In this thesis, starting from a high level architecture we have carried out extensive simulations of our proposal's
-performance under real-world conditions. Based on these simulations we implemented an end-to-end prototype of our
-proposed safety reset controller as part of a realistic smart meter demonstrator. Finally we experimentally validated
-our results and we will conclude with an outline of further steps towards a practical implementation.
+Starting from a high level architecture, we have carried out extensive simulations of our proposal's performance under
+real-world conditions. Based on these simulations we implemented an end-to-end prototype of our proposed safety reset
+controller as part of a realistic smart meter demonstrator. Finally, we experimentally validated our results and we will
+conclude with an outline of further steps towards a practical implementation.

 This work contains the following contributions:
 \begin{enumerate}
@ -116,15 +132,696 @@ This work contains the following contributions:
        implementation.
    \item We design a communication system based on GFM.
    \item We carry out extensive simulations of our systems to determine its performance characteristics.
-    \item We show the simple grid voltage recorder design we used to capture data for our simulations.
-    \item We introduce a new, simplified method to determine grid frequency from a capture of the grid voltage waveform
-        that is simple to implement on constrained embedded devices.
+    \item We show the simple grid frequency recorder design we used to capture data for our simulations.
 \end{enumerate}

 \section{Related work} 
 \label{sec_related_work}
+% FIXME: Cut down this section from ~6 pages to 2...3 pages.

-\section{Conclusion}
+\subsection{Security and Privacy in the Smart Grid}
+
+The smart grid in practice is nothing more or less than an aggregation of embedded control and measurement devices that
+are part of a large control system. This implies that all the same security concerns that apply to embedded systems in
+general also apply to most components of a smart grid. Where programmers have been struggling for decades now with input
+validation\cite{leveson01}, the same potential issue raises security concerns in smart grid scenarios as well\cite{mo01,
+lee01}.  Only, in smart grid we have two complicating factors present: Many components are embedded systems, and as such
+inherently hard to update. Also, the smart grid and its control algorithms act as a large (partially-)distributed
+system making problems such as input validation or authentication harder\cite{blaze01} and adding a host of distributed
+systems problems on top\cite{lamport01}.
+
+Given that the electrical grid is essential infrastructure in our modern civilization, these problems amount to
+significant issues in practice. Attacks on the electrical grid may have grave consequences\cite{anderson01,lee01} while
+the long maintenance cycles of various components make the system slow to adapt. Thus, components for the smart grid
+need to be built to a much higher standard of security than most consumer devices to ensure they live up to well-funded
+attackers even decades down the road. This requirement intensifies the challenges of embedded security and distributed
+systems security among others that are inherent in any modern complex technological system. The safety-critical nature
+of the modern smart metering ecosystem in particular was quickly recognized\cite{anderson01}.
+
+A point we will not consider in much depth in this work is theft of electricity. While in publications aimed towards the
+general public the introduction of smart metering is always motivated with potential cost savings and ecological
+benefits, in industry-internal publications the reduction of electricity theft is often cited as an
+incentive\cite{czechowski01}.  Likewise, academic publications tend to either focus on other benefits such as generation
+efficiency gains through better forecasting or rationalize the consumer-unfriendly aspects of smart metering with social
+benefits\cite{mcdaniel01}. They do not usually point out \emph{revenue protection} mechanisms as
+incentives\cite{anderson01,anderson02}.
+
+A serious issue in smart metering setups is customer privacy. Even though the meter ``only'' collects aggregate energy
+consumption of a whole household, this data is highly sensitive\cite{markham01}. This counterintuitive fact was
+initially overlooked in smart meter deployments leading to outrage, delays and reduced features\cite{cuijpers01}. The
+root cause of this problem is that given sufficient timing resolution these aggregate measurements contain ample
+entropy. Through disaggregation algorithms individual loads can be identified and through pattern matching even complex
+usage patterns can be discerned with alarming accuracy\cite{greveler01}. Similar privacy issues arise in many other
+areas of modern life through pervasive tracking and surveillance\cite{zuboff01}.
+
+Another fundamental challenge in smart grid implementations is the central role of smart electricity meters in the smart
+grid ecosystem. Smart meters are used both for highly-granular load measurement and (in some countries) load
+switching\cite{zheng01}.  Smart electricity meters are effectively consumer devices. They are built down to a certain
+price point that is measured by the burden it puts on consumers and that is divided by the relatively small market
+served by a single smart meter implementation. Such cost requirements can preclude security features such as the use of
+a standard hardened software environment on a high powered embedded system.  Landis+Gyr, a large manufacturer that makes
+most of its revenue from utility meters in their 2019 annual report write that they \SI{36}{\percent} of their total
+R\&D budget on embedded software (firmware) while spending only \SI{24}{\percent} on hardware
+R\&D\cite{landisgyr01,landisgyr02}, indicating a significant tension between firmware security and a smart meter
+vendor's bottom line. 
+
+\subsection{The state of the art in embedded security}
+
+Embedded software security generally is much harder than security of higher-level systems. The primary two factors
+affecting this are that on one hand, embedded devices usually run highly customized firmware that (often by necessity)
+is rarely updated. On the other hand, embedded devices often lack the advanced security mechanisms such as memory
+management units that are found in most higher-power devices. Even well-funded companies continue to have trouble
+securing their embedded systems. A spectacular example of this difficulty is the recently-exposed flaw in Apple's iPhone
+SoC first-stage ROM bootloader that allows for the full compromise of any iPhone before the iPhone X given physical
+access to the device. iPhone 8, one of the affected models, was still being manufactured and sold by Apple until April
+2020.  In another instance in 2016 researchers found multiple flaws in the secure-world firmware used by Samsung in
+their mobile phone SoCs.  The flaws they found were both severe architectural flaws such as secret user input being
+passed through untrusted userspace processes without any protection and shocking cryptographic flaws such as
+CVE-2016-1919\footnote{\url{http://cve.circl.lu/cve/CVE-2016-1919}}\cite{kanonov01}.  And Samsung is not the only large
+multinational corporation having trouble securing their secure world firmware implementation. In 2014 researchers found
+an embarrassing integer overflow flaw in the low-level code handling untrusted input in Qualcomm's QSEE
+firmware\cite{rosenberg01}. For an overview of ARM TrustZone including a survey of academic work and past security
+vulnerabilities of TrustZone-based firmware see \cite{pinto01}.
+
+If even companies targeting R\&D budgets that rival some countries' national budgets at mass-market consumer devices
+have trouble securing their secure embedded software stacks, what is a much smaller smart meter manufacturer to do?
+Especially if national standards mandate complex protocols such as TLS that are tricky to implement
+correctly\cite{georgiev01}, the manufacturer is short on options to secure their product.
+
+\subsection{Attack surface in the smart grid}
+
+From the previous paragraphs we can conclude that in smart metering technology, market incentives do not currently
+provide the conditions for a level of device security that will reliably last the coming decades. Considering this
+tension, in this paragraph we will outline the cyberphysical risks that arise from attacks on the smart grid in the
+first place.
+
+The first such attack that might come to mind is one where the attacker compromises components of the grids centralized
+control systems. This type of attack is often cited in popular discourse and to our knowledge is the only type of attack
+against a grid that has ever been carried out in practice at scale. Despite their severity, these attacks do not pose a
+strictly \emph{scientific} challenge, though since these attacks are generic to any industrial control system. Their
+causes and countermeasures are generally well-understood and the hardest challenge in their prevention is likely to lie
+in budgetary constraints.
+
+Beyond the centralized control systems, the next target for an attacker may be the communication links between those
+control systems and other smart grid components. While in older systems as well as the last mile to households' smart
+meters special-purpose systems such as PLC are still common, in the overall system IP-based technologies have
+proliferated much like they did in other industries. Along with this adoption of IP-based communication links comes the
+ability to apply generic network security measures from the IP world to the smart grid domain. In this way, a
+standardized, IP-based protocol stack unlocks decades of network security improvements at little cost. 
+
+Finally, an attacker might target the endpoint device itself. Smart meters are deployed at a large scale
+%%% FIXME << HERE WIP >>
+
+\subsection{Cyberphysical threats in the smart grid}
+
+If we model the smart grid as a control system responding to changes in inputs by regulating outputs, on a
+very high level we can see two general categories of attacks: Attacks that directly change the state of the outputs, and
+attacks that try to influence the outputs indirectly by changing the system's view of its inputs. The former would be an
+attack such as shutting down a power plant to decrease generation capacity\cite{lee01}. The latter would be an attack
+such as forging grid frequency measurements where they enter a power plant's control systems to provoke the control
+systems to
+oscillate\cite{kosut01,wu01,kim01}.
+
+
+\paragraph{Control function exploits.}
+Control function exploits are attacks on the mathematical control loops used by the centralized control system. One
+example of this type of attack are resonance attacks as described in \cite{wu01}.  In this kind of attack, inputs from
+peripheral sensors indicating grid load to the centralized control system are carefully modified to cause a
+disproportionately large oscillation in control system action. This type of attack relies on complex resonance effects
+that arise when mechanical generators are electrically coupled. These resonances, colloquially called ``modes'', are
+well-studied in power system engineering\cite{rogers01,grebe01,entsoe01,crastan03}.  Even disregarding modern attack
+scenarios, for stability electrical grids are designed with measures in place to dampen any resonances inherent to grid
+structure. These resonances are hard to analyze since they require an accurate grid model and they are unlikely to be
+noticed under normal operating conditions.
+
+Mitigation of these attacks can be achieved by ensuring unmodified sensor inputs to the control systems in the first
+place. Carefully designing control systems not to exhibit exploitable behavior such as oscillations is also possible but
+harder.
+
+\paragraph{Endpoint exploits.}
+The one to us rather interesting attack on smart grid systems is someone exploiting the grid's endpoint devices such as
+smart electricity meters.  These meters are deployed on a massive scale, with at least one meter per household on
+average\footnote{Households rarely share a meter but some households may have a separate meter for detached properties
+such as a detached garage or basement.}.  Once compromised, restoration to an uncompromised state can be difficult if it
+requires physical access to thousands of devices in hard-to-access locations.
+
+By compromising smart electricity meters, an attacker can forge the distributed energy measurements these devices
+perform. In a best-case scenario, this might only affect billing and lead to customers being under- or over-charged if
+the attack is not noticed in time. In a less ideal scenario falsified energy measurements reported by these devices
+could impede the correct operation of centralized control systems.
+
+In some countries such as the UK smart meters have one additional function that is highly useful to an attacker: They
+contain high-current disconnect switches to disconnect the entire household or business in case electricity bills are
+left unpaid for a certain period. In countries that use these kinds of systems on a widespread level, the load
+disconnect switch is controlled by the smart meter's central microcontroller. This allows anyone compromising this
+microcontroller's firmware to actuate the disconnect switch at will.  Given control over a large number of
+network-connected smart meters, an attacker might thus be able to cause large-scale disruptions of power
+consumption\cite{anderson01,temple01}.  Combined with an attack method such as the resonance attack from \cite{wu01}
+that was mentioned above, this scenario poses a serious threat to grid stability.
+
+In places where Demand-Side Management (DSM) is common this functionality may be abused in a similar way. In DSM the
+smart metering system directly controls power to certain devices such as heaters. The utility can remotely control the
+turn-on and turn-off of these devices to smoothen out the load curve. In exchange the customer is billed a lower price
+for the energy consumed by these loads. DSM was traditionally done in a federated fashion usually through low-frequency
+PLC over the distribution grid\cite{dzung01}. Smart metering systems no longer require large, resource-intensive
+transmitters in substations and bear the potential for a rollout of such technology on a much wider scale than before.
+This leads to a potentially significant role of DSM systems in the impact calculation of an attack on a smart metering
+system. DSM does not control as much load capacity as remote disconnect switches do but the attacks cited in the above
+paragraph still fundamentally apply.
+
+\subsection{Communication Channels on the Grid}
+
+There is a number of well-established technologies for communication on or along power lines. We can distinguish three
+basic system categories: Systems using separate wires (such as DSL over landline telephone wiring), wireless radio
+systems (such as LTE) and \emph{power line communication} (PLC) systems that reüse the existing mains wiring and
+superimpose data transmissions onto the 50 Hz mains sine\cite{gungor01,kabalci01}.
+
+For our scenario, we will ignore short-range communication systems. There exists a large number of \emph{wideband}
+power line communication systems that are popular with consumers for bridging Ethernet segments between parts of an
+apartment or house.  These systems transmit up to several hundred megabits per second over distances up to several tens
+of meters\cite{kabalci01}.  Technologically, these wideband PLC systems are very different from \emph{narrowband}
+systems used by utilities for load management among other applications and they are not relevant to our analysis.
+
+\paragraph{Power line communication (PLC).}
+In long-distance communications for applications such as load management, PLC systems are attractive since they allow
+re-using the existing wiring infrastructure and have been used as early as in the 1930s\cite{hovi01}. Narrowband PLC
+systems are a potentially low-cost solution to the problem of transmitting data at small bandwidth over distances of
+several hundred meters up to tens of kilometers.
+
+Narrowband PLC systems transmit on the order of Kilobits per second or slower.  A common use of this sort of system are
+\emph{ripple control} systems. These systems superimpose a low-frequency signal at some few hundred Hertz carrier
+frequency on top of the 50Hz mains sine. This low-frequency signal is used to encode switching commands for
+non-essential residential or industrial loads. Ripple control systems provide utilities with the ability to actively
+control demand while promising savings in electricity cost to consumers\cite{dzung01}.
+
+In any PLC system there is a strict trade-off between bandwidth, power and distance. Higher bandwidth requires higher
+power and reduces maximum transmission distance. Where ripple control systems usually use few transmitters to cover
+the entire grid of a regional distribution utility, higher bandwidth bidirectional systems used for automatic meter
+reading (AMR) in places such as Italy or France require repeaters within a few hundred meters of a transmitter.
+
+\subsubsection{Landline and wireless IP-based systems.}
+Especially in automated meter reading (AMR) infrastructure the cost-benefit trade-off of power line systems does not
+always work out for utilities. A common alternative in these systems is to use the public internet for communication.
+Using the public internet has the advantage of low initial investment on the part of the utility company as well as
+quick commissioning. Disadvantages compared to a PLC system are potentially higher operational costs due to recurring
+fees to network providers as well as lower reliability. Being integrated into power grid infrastructure, a PLC system's
+failure modes are highly correlated with the overall grid. Put briefly, if the PLC interface is down, there is a good
+chance that power is out, too. In contrast general internet services exhibit a multitude of failures that are entirely
+uncorrelated to power grid stability.  For purposes such as meter reading for billing purposes, this stability is
+sufficient. However for systems that need to hold up in crisis situations such as the recovery system we are
+contemplating in this thesis, the public internet may not provide sufficient reliability.
+
+\subsubsection{Short-range wireless systems.}
+Smart meters contain copious amounts of firmware but still pale in comparison to the complexity of full-scale computers
+such as smartphones. For short-range communication between a meter and a cellular radio gateway mounted nearby or
+between a meter and a meter reading operator in a vehicle on the street a protocol such as Wifi (IEEE 802.11) is too
+complex. Absent widely-used standards in this space proprietary radio protocols grew attractive. These are often based
+on some standardized lower-level protocol such as ZigBee (IEEE 802.15) but entirely home-grown ones also exist. To the
+meter manufacturer a proprietary radio protocol has several advantages. It is easy to implement and requires no external
+certification. It can be customized to its specific application. In addition it provides vendor lock-in to customers
+sharing infrastructure such as a cellular radio gateway between multiple devices.  In other fields a lack of
+standardization has led to a proliferation of proprietary protocols and a fragmented protocol landscape. This is a large
+problem since the consumer cannot easily integrate products made by different manufacturers into one system. In advanced
+metering infrastructure this is unlikely to be a disadvantage since usually there is only one distribution grid
+operator for an area.  Shared resources such as a cellular radio gateway would most likely only be shared within a
+single building and usually they are all operated by the same provider.
+
+Systems in Europe commonly support Wireless M-Bus, an European standardized protocol\cite{silabs01} that operates on
+several ISM bands\footnote{
+    Frequency bands that can be used for \emph{Industrial, Scientific and Medical} applications by anyone and that do
+    not require obtaining a license for transmitter operation. Manufacturers can use whatever protocol they like on
+    these bands as long as they obtain certification that their transmitters obey certain spectral and power
+    limitations.
+}. ZigBee is another popular standard and some vendors additionally support their own proprietary protcols\footnote{
+    For an example see \cite{honeywell01}.
+}.
+
+\section{Grid Frequency as a Communication Channel}
+
+Despite the awesome complexity of large power grids the physics underlying their response to changes in load and
+generation is surprisingly simple. Individual machines (loads and generators) can be approximated by a small number of
+differential equations and the entire grid can be modelled by aggregating these approximations into a large system of
+nonlinear differential equations. Evaluating these systems it has been found that in large power grids small signal
+steady state changes in generation/consumption power balance cause an approximately linear change in
+frequency\cite{kundur01,crastan03,entsoe02,entsoe04}. \emph{Small signal} here describes changes in power balance that
+are small compared to overall grid power.  \emph{Steady state} describes changes over a time frame of multiple waveform
+cycles as opposed to transient events that only last a few milliseconds.
+
+This approximately linear relationship allows the specification of a coefficient with unit \si{\watt\per\hertz} linking
+power differential $\Delta P$ and frequency differential $\Delta f$.  In this thesis we are using the European power
+grid as our model system. We are using data provided by ENTSO-E (formerly UCTE), the governing association of European
+transmission system operators. In our calculations we use data for the continental European synchronous area, the
+largest synchronous area. $\frac{\Delta P}{\Delta f}$, called \emph{Overall Network Power Frequency Characteristic} by
+ENTSO-E is around \SI{25}{\giga\watt\per\hertz}.
+
+We can derive general design parameter for any system utilizing grid frequency as a communication channel from the
+policies of ENTSO-E\cite{entsoe02,entsoe03}.  Any such system should stay below a modulation amplitude of
+\SI{100}{\milli\hertz} which is the threshold defined in the ENTSO-E incidents classification scale for a Scale 0-1
+(from ``Anomaly'' to ``Noteworthy Incident'' scale) frequency degradation incident\cite{entsoe02} in the continental
+Europe synchronous area.
+% FIXME resolve cut --- 
+
+Grid frequency in Europe's synchronous areas is nominally 50 Hertz, but there are small load-dependent variations from
+this nominal value. Any device connected to the power grid (or even just within physical proximity of power wiring) can
+reliably and accurately measure grid frequency at low hardware overhead. By intentionally modifying grid frequency, we
+can create a very low-bandwidth broadcast communication channel. Grid frequency modulation has only ever been proposed
+as a communication channel at very small scales in microgrids before\cite{urtasun01} and to our knowledge has not yet
+been considered for large-scale application.
+
+Advantages of using grid frequency for communication are low receiver hardware complexity as well as the fact that a
+single transmitter can cover an entire synchronous area. Though the transmitter has to be very large and powerful the
+setup of a single large transmitter faces lower bureaucratic hurdles than integration of hundreds of smaller ones into
+hundreds of local systems that each have autonomous governance.
+
+% FIXME resolve cut ---
+\subsection{Interference from Frequency-Coupled Control Systems}
+
+The ENTSO-E Operations Handbook Policy 1 chapter\cite{entsoe02} defines the activation threshold of primary control to
+be \SI{20}{\milli\hertz}. Ideally, a modulation system would stay well below this threshold to avoid fighting the
+primary control reserve. Modulation line rate should likely be on the order of a few hundred Millibaud.  Modulation at
+these rates would outpace primary control action which is specified by ENTSO-E as acting within between ``a few
+seconds'' and \SI{15}{\second}.
+
+Keeping modulation amplitude below this threshold would help to avoid spuriously triggering these control functions.
+The effective \emph{Network Power Frequency Characteristic} of primary control in the European grid is reported by
+ENTSO-E at around \SI{20}{\giga\watt\per\hertz}.   This works out to an upper bound on modulation power of
+\SI{20}{\mega\watt\per\milli\hertz}.
+
+
+\subsection{Transmission Grid Fundamentals for Computer Scientists}
+\subsection{Determining Grid Frequency}
+
+% FIXME resolve cut ---
+In commercial power systems Phasor Measurement Units (PMUs, also called \emph{synchrophasors}) are used to precisely
+measure parameters of the mains voltage waveform, one of which is grid frequency. PMUs are used as part of SCADA systems
+controlling transmission networks to characterize the operational state of the network.
+
+From a superficial viewpoint measuring grid frequency might seem like a simple problem. Take the mains voltage waveform,
+measure time between two rising-edge (or falling-edge) zero-crossings and take the inverse $f = t^{-1}$. In practice,
+phasor measurement units are significantly more complex than this. This discrepancy is due to the combination of both
+high precision and quick response that is demanded from these units. High precision is necessary since variations of
+mains frequency under normal operating conditions are quite small--in the range of \SIrange{5}{10}{\milli\hertz} over
+short intervals of time. Relative to the nominal \SI{50}{\hertz} this is a derivation of less than \SI{100}{ppm}.
+Relative to the corresponding period of \SI{20}{\milli\second} this means a time derivation of about $2 \mu\text{s}$
+from cycle to cycle. From this it is already obvious why a simplistic measurement cannot yield the required precision
+for manageable averaging times: We would need either an ADC sampling rate in the order of megabits per second or for a
+reconstruction through interpolated readings an impractically high ADC resolution.
+
+Detail on the inner workings of commercial phasor measurement units is scarce but given their essential role to SCADA
+systems there is a large amount of academic research on such algorithms\cite{narduzzi01,derviskadic01,belega01}. A
+popular approach to these systems is to perform a Short-Time Fourier Transform (STFT) on ADC data sampled at high
+sampling rate (e.g. \SI{10}{\kilo\hertz}) and then perform analysis on the frequency-domain data to precisely locate the
+peak at \SI{50}{\hertz}. A key observation here is that FFT bin size is going to be much larger than required frequency
+resolution. This fundamental limitation follows from the Nyquist criterion\cite{shannon01}
+and if we had to process an \emph{arbitrary} signal this would severely limit our practical measurement accuracy
+\footnote{
+    Some software packages providing FFT or STFT primitives such as scipy\cite{virtanen01} allow the user to
+    super-sample FFT output by specifying an FFT width larger than input data length, padding the input data with zeros
+    on both sides. Note that in line with the Nyquist theorem this \emph{does not} actually provide finer output
+    resolution but instead just amounts to an interpolation between output bins. Depending on the downstream analysis
+    algorithm it may still be sensible to use this property of the DFT for interpolation, but in general it will be
+    computationally expensive compared to other interpolation methods and in any case it will not yield any better
+    frequency resolution aside from a potential numerical advantage\cite{gasior02}.
+}.
+For this reason all approaches to grid frequency estimation are based on a model of the voltage waveform.  Nominally
+this waveform is a perfect sine at $f=\SI{50}{\hertz}$. In practice it is a sine at $f\approx\SI{50}{\hertz}$
+superimposed with some aperiodic noise (e.g. irregular spikes from inductive loads being energized) as well as harmonic
+distortion that is caused by topologically nearby devices with power factor $\cos \theta \neq 1.0$. Under a continuous
+fourier transform over a long period the frequency spectrum of a signal distorted like this will be a low noise floor
+depending mainly on aperiodic noise on which a comb of harmonics as well as some sub-harmonics of $f \approx
+f_\text{nom} = \SI{50}{\hertz}$ is riding. The main peak at $f \approx f_\text{nom}$ will be very strong with the
+harmonics being approximately an order of magnitude weaker in energy and the noise floor being at least another order of
+magnitude weaker. See Figure \ref{mains_voltage_spectrum} for a measured spectrum.  This domain knowledge about the
+expected frequency spectrum of the signal can be employed in a number of interpolation techniques to reconstruct the
+precise frequency of the spectrum's main component despite distortions and the comparatively coarse STFT resolution.
+
+Published grid frequency estimation algorithms such as \cite{narduzzi01,derviskadic01} are rather sophisticated and use
+a combination of techniques to reduce numerical errors in FFT calculation and peak fitting. Given that we do not need
+reference standard-grade accuracy for our application we chose to start with a very basic algorithm instead. We chose to
+use a general approach to estimate the precise fundamental frequency of an arbitrary signal that was published by
+experimental physicists Gasior and Gonzalez at CERN\cite{gasior01}. This approach assumes a general sinusoidal signal
+superimposed with harmonics and broadband noise.  Applicable to a wide spectrum of practical signal analysis tasks it is
+a reasonable first-degree approximation of the much more sophisticated estimation algorithms developed specifically for
+power systems.  Some algorithms use components such as kalman filters\cite{narduzzi01} that require a physical model.
+As a general algorithm \cite{gasior01} does not require this kind of application-specific tuning, eliminating one source
+of error.
+
+The Gasior and Gonzalez algorithm\cite{gasior01} passes the windowed input signal through a DFT, then interpolates the
+signal's fundamental frequency by fitting a wavelet such as a Gaussian to the largest peak in the DFT results. The bias
+parameter of this curve fit is an accurate estimation of the signal's fundamental frequency. This algorithm is similar
+to the simpler interpolated DFT algorithm used as a reference in much of the synchrophasor estimation
+literature\cite{borkowski01}. The three-term variant of the maximum side lobe decay window often used there is a
+Blackman window with parameter $\alpha = \frac{1}{4}$. Analysis has shown\cite{belega01} that the interpolated DFT
+algorithm is worse than algorithms involving more complex models under some conditions but that there is \emph{no free
+lunch} meaning that more complex perform worse when the input signal deviates from their models.
+% FIXME resolve cut ---
+
+\subsubsection{Our Algorithm}
+\subsubsection{Our Hardware}
+
+\section{Characteristics of Grid Frequency}
+
+\section{Grid Frequency Modulation}
+\subsection{Fundamental Physics}
+\subsection{Transmitter Implementation}
+
+% FIXME resolve cut ---
+In its most basic form a transmitter for grid frequency modulation would be a very large controllable load connected to
+the power grid at a suitable vantage point. A spool of wire submerged in a body of cooling liquid such as a small lake
+along with a thyristor rectifier bank would likely suffice to perform this function during occasional cybersecurity
+incidents.  We can however decrease hardware and maintenance investment even further compared to this rather
+uncultivated solution by repurposing regular large industrial loads as transmitters in an emergency situation.  For some
+preliminary exploration we went through a list of energy-intensive industries in Europe\cite{ec01}.  The most
+electricity-intensive industries in this list are primary aluminum and steel production.  In primary production raw ore
+is converted into raw metal for further refinement such as casting, rolling or extrusion.  In steelmaking iron is
+smolten in an electric arc furnace. In aluminum smelting aluminum is electrolytically extracted from alumina. Both
+processes involve large amounts of electricity with electricity making up \SI{40}{\percent} of production costs. Given
+these circumstances a steel mill or aluminum smelter would be good candidates as transmitters in a grid frequency
+modulation system.
+
+In aluminum smelting high-voltage mains is transformed, rectified and fed into about 100 series-connected electrolytic
+cells forming a \emph{potline}. Inside these pots alumina is dissolved in molten cryolite electrolyte at about
+\SI{1000}{\degreeCelsius} and electrolysis is performed using a current of tens or hundreds of Kiloampère. The resulting
+pure aluminum settles at the bottom of the cell and is tapped off for further processing.
+
+Like steelworks, aluminum smelters are operated night and day without interruption. Aside from metallurgical issues the
+large thermal mass and enormous heating power requirements do not permit power cycling. Due to the high costs of
+production inefficiencies or interruptions the behavior of aluminum smelters under power outages is a
+well-characterized phenomenon in the industry. The recent move away from nuclear power and towards renewable energy has
+lead to an increase in fluctuations of electricity price throughout the day. These electricity price fluctuations have
+provided enough economic incentive to aluminum smelters to develop techniques to modulate smelter power consumption
+without affecting cell lifetime or product quality\cite{duessel01,eisma01}. Power outages of tens of minutes up to two
+hours reportedly do not cause problems in aluminum potlines and are in fact part of routine operation for purposes such
+as electrode changes\cite{eisma01,oye01}.
+
+The power supply system of an aluminum plant is managed through a highly-integrated control system as keeping all cells
+of a potline under optimal operating conditions is challenging. Modern power supply systems employ large banks of diodes
+or SCRs\footnote{SCRs, also called thyristors, are electronic devices that are often used in high-power switching
+applications. They are normally-off devices that act like diodes when a current is fed into their control terminal.} to
+rectify low-voltage AC to DC to be fed into the potline\cite{ayoub01}. The potline voltage can be controlled almost
+continuously through a combination of a tap changer and a transductor. The individual cell voltages can be controlled by
+changing the anode to cathode distance (ACD) by physically lowering or raising the anode.  The potline power supply is
+connected to the high voltage input and to the potline through isolators and breakers.
+
+In an aluminum smelter most of the power is sunk into resistive losses and the electrolysis process. As such an
+aluminum smelter does not have any significant electromechanical inertia compared to the large rotating machines used
+in other industries. Depending on the capabilities of the rectifier controls high slew rates are possible, permitting
+modulation at high\footnote{Aluminum smelter rectifiers are \emph{pulse rectifiers}. This means instead of simply
+rectifying the incoming three-phase voltage they use a special configuration of transformer secondaries and in some
+cases additional coils to produce a large number of equally spaced phases (e.g.\ six) from a standard three-phase input.
+Where a direct-connected three-phase rectifier would draw current in six pulses per mains voltage cycle a pulse
+rectifier draws current in more, smaller pulses to increase power factor. For example a 12-pulse rectifier will draw
+current in 12 pulses per cycle. In the best case an SCR pulse rectifier switched at zero crossing should allow
+\SIrange{0}{100}{\percent} load changes from one rectifier pulse to the next, i.e. within a fraction of a single cycle.}
+data rates.
+% FIXME resolve cut ---
+
+\subsection{Parametrizing DSSS Modulation for GFM}
+
+% FIXME resolve cut/write intro ---
+\begin{description}
+    \item[Modulation amplitude.] Amplitude is proportionally related to modulation power. In a practical setup we might
+        realize a modulation power up to a few hundred \si{\mega\watt} which would yield a few tens of \si{\milli\hertz}
+        of frequency amplitude.
+    \item[Modulation preemphasis and slew-rate control.] Preemphasis might be necessary to ensure an adequate
+        Signal-to-Noise ratio (SNR) at the receiver. Slew-rate control and other shaping measures might be necessary to
+        reduce the impact of these sudden load changes on the transmitter's primary function (say, aluminum smelting)
+        and to prevent disturbances to other grid components.
+    \item[Modulation frequency.] For a practical implementation a careful study would be necessary to determine the
+        optimal frequency band for operation. On one hand we need to prevent disturbances to the grid such as the
+        excitation of local or inter-area modes. On the other hand we need to optimize Signal-to-Noise ratio (SNR)
+        and data rate to achieve optimal latency between transmission start and reset completion and to reduce the
+        overall burden on both transmitter and grid.
+    \item[Further modulation parameters.] The modulation itself has numerous parameters that are discussed in Section
+        \ref{mod_params} below.
+\end{description}
+
+% FIXME resolve cut/write intro ---
+% FIXME too many enumerations?
+In this section we will explore how we can construct a reliable communication channel from the analog primitive we
+have outlined in the previous section. Our load control approach to grid frequency modulation leads to a channel with the
+following properties.
+
+\begin{description}
+    \item[Slow-changing.] Accurate grid frequency measurements take several periods of the mains sine wave. Faster
+        sampling rates can be achieved with more complex specialized synchrophasor estimation algorithms but this will
+        result in a trade-off between sampling rate and accuracy\cite{belega01}.
+    \item[Analog.] Grid frequency is an analog signal.
+    \item[Noisy.] While stable over long periods of time thanks to power stations' Load-Frequency Control
+        systems\cite{entsoe04} there are considerable random short-term variations. Our modulation amplitude is limited
+        by technical and economic constraints so we have to find a system that will work at poor SNRs.
+    \item[Polarized.] Grid frequency measurements have an inherent sense of polarity that we can use in our modulation
+        scheme.
+\end{description}
+
+% FIXME resolve cut ---
+Modern power systems are complex electromechanical systems. Each component is controlled by several carefully tuned
+feedback loops to ensure voltage, load and frequency regulation. Multiple components are coupled through transmission
+lines that themselves exhibit complex dynamic behavior. The overall system is generally stable, but may exhbit
+instabilities to particular small-signal stimuli\cite{kundur01,crastan03}. These instabilities, called \emph{modes},
+occur when due to mis-tuning of parameters or physical constraints the overall system exhibits oscillation at a
+particular frequency.  \cite{kundur01} separates these modes into four categories:
+
+\begin{description}
+    \item[Local modes] where a single power station oscillates in some parameter,
+    \item[Interarea modes] where subsections of the overall grid oscillate with respect to each other due to weak
+        coupling between them,
+    \item[Control modes] caused by imperfectly tuned control systems and
+    \item[Torsional modes] that originate from electromechanical oscillations in the generator itself.
+\end{description}
+
+The oscillation frequencies associated with each of these modes are usually between a few tens of Millihertz and a few
+Hertz\cite{grebe01,entsoe01,crastan03}. It is hard to predict the particular modes of a power system at the scale of the
+central European interconnected system. Theoretical analysis and simulation may give rough indications but cannot yield
+conclusive results. Due to the obvious danger as well as high economical impact due to inefficiencies experimental
+measurements are infeasible. Modes are highly dependent on the power grid's structure and will change with changes in
+the power grid over time. For all of these reasons, a grid frequency modulation system must be designed very
+conservatively without relying on the absence (or presence) of modes at particular frequencies. A concrete design
+guideline that we can derive from this situation is that the frequency spectrum of any grid frequency modulation system
+should not exhibit large peaks and should avoid a concentration of spectral energy in small frequency bands.
+% FIXME resolve cut ---
+
+\subsection{Parametrizing a "Safety Reset" System Based on GFM}
+% FIXME resolve cut & write intro ---
+% FIXME cut down next 2 sections
+\subsubsection{Error-correcting codes}
+
+To reduce reception error rate we have to layer channel coding on top of the DSSS modulation. The messages we expect to
+transmit are at least a few tens of bits long. We are highly constrained in SNR due to limited transmission power and
+with lower SNR comes higher BER (Bit Error Rate). At a fixed BER, packet error rate grows exponentially with
+transmission length so for our relatively long transmissions we would realistically get unacceptable error rates.
+
+Error correcting codes are a very broad field with many options for specialization. Since we are implementing only an
+advanced prototype in this thesis we chose to spend only limited resources on optimization and settled on a basic
+Reed-Solomon code. We have no doubt that applying a more state-of-the-art code we could gain further improvements in
+code overhead and decoding speed among others\cite{mackay01}. Since message length in our system limits system response
+time but we do not have a fixed target we can tolerate some degree of overhead.  Decoding speed is of very low concern
+to us because our data rate is extremely low.  We derived our implementation by adapting and optimizing an existing open
+source decoder that we validated on an open source encoder implementation. We generate test signals using a Python tool
+on the host.
+
+\subsubsection{Cryptographic security}
+\label{sec-crypto}
+Above the communication base layer elaborated in the previous section we have to layer a cryptographic protocol to
+ensure system security. We want to avoid a case where a third party could interfere with our system or even subvert this
+safety system itself for an attack.  From a protocol security perspective the system we are looking for can informally
+be modelled as consisting of three parties: the trusted \emph{transmitter}, one of a large number of untrusted
+\emph{receivers}, and an \emph{attacker}.  These three play according to the following rules:
+
+\begin{description}
+    \item[Access.] Both transmitter and attacker can transmit any bit sequence.
+    \item[Indistinguishability.] The receiver receives any transmission by either but cannot distinguish between them.
+    \item[Kerckhoff's principle.] Since the protocol design is public and anyone can get access to an electricity meter
+        the attacker knows anything any receiver might know\cite{kerckhoff01,kerckhoff02}.
+    \item[Priority.] The transmitter is stronger than an attacker and will ``win'' during simultaneous transmission.
+    \item[Seeding.] Both transmitter and receiver can be seeded out-of-band with some information on each other such as
+        public key fingerprints.
+\end{description}
+
+We are not considering situations where an attacker attempts to jam an ongoing transmission. In practice there are
+several avenues to prevent such attempts. Compromised large loads that are being abused by the attacker can be manually
+disconnected by the utility. Error-correcting codes can be used to provide resiliency against small-scale disturbances.
+Finally, the transmitter can be designed to have high enough power to be able to override any likely attacker.
+
+With the above properties in mind our goal is to find a cryptographic primitive that has the following properties:
+\begin{description}
+    \item[Authentication.] The transmitter can produce a message bit sequence that a certain subset of receivers can
+        identify as being generated by the transmitter. On reception of this sequence, all addressed receivers perform a
+        safety reset.
+    \item[Unforgeability.] The attacker cannot forge a message, i.e.\ find a bit sequence other than one of the
+        transmitter's previous messages that a receiver would accept. This implies that the attacker also cannot create
+        a new distinct message from a previously transmitted message.
+    \item[Brevity.] The message should be short. Our communication channel is outrageously slow compared to anything
+        else used in modern telecommunications and every bit counts.
+\end{description}
+
+On a protocol level we also have to ensure \emph{idempotence}. Our system should have an at-most-once semantic. This
+means for a given message each receiver either performs exactly one safety reset or none at all, even if the message is
+re-transmitted by either the transmitter or an attacker.  We cannot achieve the ideal exactly-once semantic wit pure
+protocol gymnastics since we are using an unidirectional lossy communication primitive. A receiver might be offline
+(e.g.\ due to a local power outage) and then would not hear the transmission even if our broadcast primitive was
+reliable. Since there is no back channel, the transmitter has no way of telling when that happens.  The practical impact
+of this can be mitigated by the transmitter repeating the message a number of times.
+
+It follows from the unforgeability requirement that we can trivially reach idempotence at the protocol level by keeping
+a database of all previous messages and only accepting new messages. By considering this in our cryptographic design we
+can reduce the storage overhead of this ``database''.
+
+Along with the indistinguishability property the access requirement implies that we need a cryptographic
+signature\cite{lamport01}.  However, we have relaxed constraints on this signature compared to standard cryptographic
+practice\cite{anderson04}.  While cryptographic signatures need to work over arbitrary inputs, all we want to ``sign''
+here is the instruction to perform a safety reset. This is the only message we might ever want to transmit so our
+message space has only one element. The information content of our message thus is 0 bit! All the information we want to
+transmit is already encoded \emph{in the fact that we are transmitting} and we do not require a further payload to be
+transmitted: We can omit the entirety of the message and just transmit whatever ``signature'' we
+produce\cite{haller01,rfc1760}. This is useful to conserve transmission bits so our transmission does not take an
+exceedingly long time over our extremely slow communication channel.
+
+We can modify this construction to allow for a small number of bits of information content in our message (say two or
+three instead of zero) at no transmission overhead by transmitting the cryptographic signature as usual but simply
+omitting the message. The message contains only a few bits of information and we are dealing with minutes of
+transmission time so the receiver can reconstruct the message through brute-force. Though this trade-off between
+computation and data transmission might seem inelegant it does work for our extremely slow link for up to a few bits of
+information.
+
+There is an important limitation in the rules of our setup above: The attacker can always record the reset bit sequence
+the transmitter transmits and replay that same sequence later. Even without cryptography we can trivially prevent an
+attacker from violating the at-most-once criterion. If every receiver memorizes all bit sequences that have been
+transmitted so far it can detect replays. With this mitigation by replaying an older authentic transmission an attacker
+can cause receivers that were offline during the original transmission to reset at a later point. Considering our goal
+is to reset them in the first place this should not pose a threat to the system's safety or security.
+
+A possible scenario would be that an attacker first causes enough havoc for authorities to trigger a safety reset. The
+attacker would record the trigger transmission. We can assume most meters were reset during the attack. Due to this the
+attacker cannot cause a significant number of additional resets immediately afterwards.  However, the attacker could
+wait several years for a number of new meters to be installed that might not yet have updated firmware that includes the
+last transmission. This means the attacker could cause them to reset by replaying the original sequence.
+
+A possible mitigation for this risk would be to introduce one bit of information into the trigger message that is
+ignored by the replay protection mechanism.  This \emph{enable} bit would be $1$ for the actual reset trigger message.
+After the attack the transmitter would then perform scheduled transmissions of a ``disarm'' message that has this bit
+set to $0$. This message informs all new meters and meters that were offline during the original transmission of the
+original transmission for replay protection without actually performing any further resets.
+
+We could use any of several traditional asymmetric cryptographic primitives to produce these signatures. The
+comparatively high computational effort required for signature verification would not be an issue. Transmissions take
+several minutes anyway and we can afford to spend some tens of seconds even in signature verification. Transmission
+length and by proxy system latency would be determined by the length of the signature. For RSA signature length is the
+modulus length (i.e. larger than \SI{1000}{bit} for very basic contemporary security). For elliptic curve-based systems
+curve length is approximately twice the security level and signature size is twice the curve length because two curve
+points need to be encoded\cite{anderson02}.  For contemporary security this results in more than 300 bit transmission
+length. We can exploit our unique setting's low message entropy to improve on this by basing our scheme on a
+cryptographic hash function used as a one-way pseudo-random function (PRF). Hash-based signature schemes date back to
+the very beginnings of cryptographic signatures\cite{anderson04,diffie01,lamport02}. Today, in general applications
+schemes based on asymmetric cryptography are preferred but hash-based signature systems have their applications in
+certain use cases. One example of such a scheme is the TESLA scheme\cite{perrig01} that is the basis for navigation
+message authentication in the European Galileo global navigation satellite system. Here, a system based purely on
+asymmetric primitives would result in too much computation and communication overhead\cite{ec05}.  In the following
+sections we will introduce the foundations of hash-based signatures before deriving our authentication scheme.
+
+\subsubsection{Lamport signatures}
+
+1979, Lamport in \cite{lamport02} introduced a signature scheme that is based only on a one-way function such as a
+cryptographic hash function. The basic observation is that by choosing a random secret input to a one-way function and
+publishing the output, one can later prove knowledge of the input simply by publishing it. In the following paragraphs
+we will describe a construction of a one-time signature scheme based on this observation. The scheme we describe is the
+one usually called a ``Lamport Signature'' in modern literature but is slightly different from the variant described in
+the 1979 paper. For our purposes we can consider both to be equivalent.
+
+\paragraph{Setup.} In a Lamport signature, for an n-bit hash function $H$ the signer generates a private key $s =
+\left(s_{b, i} | b\in\left\{0, 1\right\}, 0\le i<n\right)$ of $2n$ random strings of length $n$. The signer publishes a
+public key $p = \left(p_{b, i} = H\left(s_{b, i}\right), b\in\left\{0, 1\right\}, 0\le i<n\right)$ that is simply the
+list of hashes of each of the random strings that make up the private key.
+
+\paragraph{Signing.} To sign a message $m$, the signer publishes the signature $\sigma = \left(\sigma_i = k_{H(m)_i,
+i}\right)$ where $H(m)_i$ is the $i$-th bit of $H$ applied to $m$. That is, for the $i$-th bit of the message's hash
+$H(m)$ the signer publishes either of $p_{0, i}$ or $p_{1, i}$ depending on the hash bit's value, keeping the other
+entry of $P$ secret.
+
+\paragraph{Verification.} The verifier can compute $H(m)$ themselves and check the corresponding entries $\sigma_i =
+k_{H(m)_i}$ of $S$ correctly evaluate to $p_{b, i} = H\left(s_{b, i}\right)$ from $P$ under $H$.
+
+The above scheme is a one-time signature scheme only. After one signature has been published for a given key, the
+corresponding key must not be reüsed for other signatures. This is intuitively clear as we are effectively publishing
+part of the private key as the signature, and if we were to publish a signature for another message an attacker could
+derive additional signatures by ``mixing'' the two published signatures.
+
+\subsubsection{Winternitz signatures}
+
+An improvement to basic Lamport signatures as described above are Winternitz signatures as detailed in
+\cite{merkle01,dods01}. Winternitz signatures reduce public key length as well as signature length for hash length $n$
+from $2n$ to $\mathcal O \left(n/t\right)$ for some choice of parameter $t$ (usually a small number such as 4).
+
+\paragraph{Setup.} The signer generates a private key $s = \left(s_i\right)$ consisting of $\lceil\frac{n}{t}\rceil$ random
+bit strings. The signer publishes a public key $p = \left(H^{2^t}\left(s_i\right)\right)$ where each element
+$H^{2^t}\left(s_i\right)$ is the $2^t$-fold recursive application of $H$ to $s_i$.
+
+\paragraph{Signing.} The signer splits $m$ padded to a multiple of $t$ bits into $\lceil\frac{n}{t}\rceil$ chunks $m_i$ of
+$t$ bit each. The signer publishes the signature $\sigma = \left( \sigma_i = H^{m_i}\left(s_i\right) \right)$.
+
+\paragraph{Verification.} The verifier can calculate for each $\sigma_i = H^{m_i}\left(s_i\right)$ that $H^{2^t -
+m_i}\left(\sigma_i\right) = H^{2^t - m_i}\left(H^{m_i}\left(s_i\right)\right) = H^{2^t - m_i + m_i} \left(s_i\right) =
+p_i$.
+
+To prevent an attacker from forging additional signatures from one signature by calculating $\sigma_i' =
+H\left(\sigma_i\right)$ matching $m_i' = m_i + 1$, this scheme is usually paired with a simple checksum as described in
+\cite{merkle01}.
+
+\subsubsection{Using hash-based signatures for trigger authentication}
+
+Applying these concepts the most basic trigger authentication scheme possible would be to simply generate a random
+secret key bit string $s$ and publish $p = H(s)$ for some hash function $H$. To activate the trigger, $\sigma = s$ is
+published and receivers verify that $H(\sigma) = p = H(s)$. This simplistic scheme has one main disadvantage: It is a
+fundamentally one-time construction. To prevent an attacker from re-triggering a receiver a second time by replaying a
+valid trigger $\sigma$ all receivers have to blacklist any ``used'' $\sigma$. Alas, this means we can only ever trigger
+a receiver \emph{once}.  The good part is that any receiver that missed this trigger can still be triggered later, but
+the bad part is that once $s$ is burned we are out of options. The trivial solution to this would be to simply provision
+each receiver with a whole list of public keys in advance. This however takes $n$ times the amount of space for $n$-fold
+retriggerability and for each one we have to memorize separately whether it has been used up. Luckily we can easily
+derive a scheme that yields $n$-fold retriggerability and naturally memorizes replay state while using no more space
+than the original scheme by taking some inspiration from Winternitz signatures.
+
+In this improved scheme the secret key $s$ is still a random bit string. The public key is $p = H^n(s)$ for $n$-times
+retriggerability.  The $i$-th time the trigger is activated, $\sigma_i = H^{n-i}(s)$ is published, and every receiver
+can verify that $\sigma_{i-1} = H\left(\sigma_i\right)$ with $\sigma_0 = p$. In case a receiver missed one or more
+previous triggers it continues computing $H\left(H\left(\sigma_i\right)\right)$ and
+$H\left(H\left(H\left(\sigma_i\right)\right)\right)$ and so on until either reaching the $n$-th recursion
+level--indicating an invalid signature--or finding $H^n\left(\sigma_i\right) = \sigma_j$ with $\sigma_j$ being the last
+signature this receiver recorded or $p$ in case there is none.
+
+This scheme provides replay protection since the receiver memorizes the last signature they acted on. Public key length
+is equal to the length of the hash function $H$ used. Even for our embedded systems use case $n$ can realistically be up
+to $\mathcal O\left(10^3\right)$, which is enough for our purposes. This use of a hash chain for event authentication is
+identical to the one in the S/KEY one-time password system\cite{anderson04,haller01,rfc1760}.
+% 1990ies crypto yeah!
+
+The ``disarm'' message we discussed above for replay protection can be integrated into this scheme by encoding the
+``enable'' bit into the least significant bit of $n$ in our $H^n$ construction. In the chain of valid signatures every
+second one would be a disarm signature: Reset and disarm signatures would alternate in this scheme. By skipping a disarm
+signature two resets can still be triggered directly after one another.
+
+In practice it may be useful to have some control over which meters reset. An attack exploiting a particular network
+protocol implementation flaw might only affect one series of meters made by one manufacturer. Resetting \emph{all}
+meters may be too much in this case. A simple solution for this is to define addressable subsets of meters.  ``All
+meters'' along with ``meters made by manufacturer $x$'' and ``meters of model $y$'' are good choices for such scopes. On
+the cryptographic level the protocol state is simply duplicated for each scope. This incurs memory and computation
+overhead linear in the number of scopes but device memory requirements are small at a few bytes only and computation is
+of no concern due to the very slow channel so this simple solution is adequate. The transmitter has to either store
+copies of all scope's keys or derive these keys from a root key using the scope's identifier. Keys are small and the
+transmitter would be using a regular server or hardware security module for key management so either easily feasible.
+
+A diagram of the key structure in this key management scheme is shown in Figure \ref{fig:sig_key_chain}. The
+transmitter key management is shown in Figure \ref{fig:tx_scope_key_illu}. This scheme is simplistic but suffices for
+our prototype in Section \ref{sec-prototype} and may even be useful in a practical implementation. During
+standardization of a safety reset system the key management system would most likely have to be customized to the
+particular application's requirements. Developing an universal solution is outside the scope of this work.
+
+
+% FIXME resolve cut ---
+
+\subsection{Simulation Results}
+
+\section{Discussion}
 \label{sec_conclusion} 

 \printbibliography[heading=bibintoc]