safety-reset/paper/safety-reset-paper.tex

\documentclass[runningheads]{llncs}
\usepackage[T1]{fontenc}
\usepackage[
    backend=biber,
    style=numeric,
    natbib=true,
    url=false,
    doi=true,
    eprint=false
    ]{biblatex}
\addbibresource{safety-reset.bib}
\usepackage{amssymb,amsmath}
\usepackage{eurosym}
\usepackage{wasysym}

\usepackage[binary-units]{siunitx}
\DeclareSIUnit{\baud}{Bd}
\DeclareSIUnit{\year}{a}
\usepackage{commath}
\usepackage{graphicx,color}
\usepackage{subcaption}
\usepackage{array}
\usepackage{hyperref}

\renewcommand{\floatpagefraction}{.8}
\newcommand{\degree}{\ensuremath{^\circ}}
\newcolumntype{P}[1]{>{\centering\arraybackslash}p{#1}}
\newcommand{\partnum}[1]{\texttt{#1}}

\begin{document}

\title{Ripples in the Pond: Transmitting Information through Grid Frequency Modulation}
\titlerunning{Ripples in the Pond: Transmitting Information through Grid Frequency}
\author{Jan Sebastian Götte \and Liran Katzir \and Björn Scheuermann}
\institute{HIIG\\ \email{safetyreset@jaseg.de} \and Tel Aviv University\\Faculty of Engineering \and HU Berlin \\ \email{scheuermann@informatik.hu-berlin.de}}
% FIXME keywords
\maketitle
\keywords{Security, privacy and resilience in critical infrastructures \and Security and privacy in ``internet of
things'' \and Cyber-physical systems \and Hardware security \and Network Security \and Energy systems \and Signal theory}

\begin{abstract}
    The smart grid is a large, complex and interconnected technological system. With remotely controllable load switches
    having been rolled out at scale in some countries, a tiny flaw inside the firmware of one of these embedded devices
    may enable attacks to remotely trigger large-scale disruption with potentially catastrophic results. Attaining
    perfect security against such cyberphysical attacks is a monumental embedded engineering task---and observations do
    not indicate that current efforts meet the requirements of this task.%FIXME cite recent RECESSIM work

    In this paper, we approach the smart grid safety issue by implementing an emergency override that can be used to
    reset all connected devices to a known-good state and preempt subsequent compromise by cutting communication links.
    To yield a fully fail-safe design, our system does not rely on the internet or other conventional communication
    network to work. Instead, our system transmits error-corrected and cryptographically secured commands by modulating
    grid frequency using a single large consumer such as a large aluminium smelter. This approach differs from
    traditional Powerline Communication (PLC) systems in that reaches every device within the same synchronous area as
    the signal is embedded into the fundamental grid frequency instead of a superimposed voltage that is quickly
    attenuated across long distances.

    Using simulations we have determined that control of a $\SI{25}{\mega\watt}$ load would allow for the transmission
    of a crytographically secured \emph{reset} signal within $15$ minutes. We have produced a proof-of-concept prototype
    receiver that demonstrates the feasibility of decoding such signals even on resource-constrained microcontroller
    hardware.
\end{abstract}

\section{Introduction}

In the power grid, as in many other engineered systems, we can observe an ongoing diffusion of information systems into
the domain of industrial control. Automation of these control systems has already been practiced for the better part of a
century.  Throughout the 20th century this automation was mostly limited to core components of the grid. Generators in
power stations are computer-controlled according to electromechanical and economic models. Switching in substations is
automated to allow for fast failure recovery. Human operators are still vital to these systems, but their tasks have
shifted from pure operation to engineering, maintenance and surveillance\cite{crastan03,anderson02}.

With the turn of the century came a large-scale trend in power systems to move from a model of centralized generation,
built around massive large-scale fossil and nuclear power plants, towards a more heterogenous model of smaller-scale
generators working together. In this new model large-scale fossil power plants still serve a major role, but new
factors come into play. One such factor is the advance of renewable energies. The large-scale use of wind and solar power in
particular seems unavoidable for continued human life on this planet. For the electrical
grid these systems constitute a significant challenge. Fossil-fueled power plants can be controlled in a precise and
quick way to match energy consumption. This tracking of consumption with production is vital to the stability of the
grid. Renewable energies such as wind and solar power do not provide the same degree of controllability, and they
introduce a larger degree of uncertainty due to the unpredictability of the forces of nature\cite{crastan03}.

Along with this change in dynamic behavior, renewable energies have brought forth the advance of distributed generation.
In distributed generation end-customers that previously only consumed energy have started to feed energy into the grid
from small solar installations on their property. Distributed generation is a chance for customers to gain autonomy and
shift from a purely passive role to being active participants of the electricity market\cite{crastan03}.

% FIXME the following paragraph is weird.

To match this new landscape unpredictable renewable resources and of decentralized generation, the utility industry has
had to adapt itself in major ways. One aspect of this adaptation that is particularly visible to energy consumers is the
computerization of end-user energy metering. Despite the widespread use of industrial control systems inside the
electrical grid and the far-reaching diffusion of computers into people's everyday lives, the energy meter has long been
one of the last remnants of an offline, analog time. Until the 2010s many households were still served through
electromechanical Ferraris-style meters that have their origin in the late 19th
century\cite{borlase01,ukgov04,bnetza02}.  Today, under the umbrella term \emph{Smart Metering}, the shift towards fully
computerized, often networked meters is well underway. The roll out of these \emph{Smart Meters} has not been very
smooth overall with some countries severely lagging behind. As a safety-critical technology, smart metering technology
is usually standardized on a per-country basis. This leads to an inhomogenous landscape with--in some instances--wildly
incompatible systems.  Often vendors only serve a single country or have separate models of a meter for each country.
This complex standardization landscape and market situation has led to a proliferation of highly complex, custom-coded
microcontroller firmware. The complexity and scale of this--often network-connected--firmware makes for a ripe substrate
for bugs to surface.

A remotely exploitable flaw inside the firmware of a component of a smart metering system could have consequences
ranging from impaired billing functionality to an existential threat to grid stability\cite{anderson01,anderson02}. In a
country where meters commonly include disconnect switches for purposes such as prepaid tariffs, a coordinated attack
could at worst cause widespread activation of grid safety systems through oscillations caused by repeated cycling of
megawatts of load capacity at just the wrong frequency\cite{wu01}.

Mitigation of these attacks through firmware security measures is unlikely to yield satisfactory results. The enormous
complexity of smart meter firmware makes firmware security extremely labor-intensive. The diverse standardization
landscape makes a coordinated, comprehensive response unlikely.

In this paper, instead of focusing on the very hard task of improving firmware security we introduce a pragmatic
solution to the--in our opinion likely--scenario of a large-scale compromise of smart meter firmware. In our concept
the components of the smart meter that are threatened by remote compromise are equipped with a physically separate
\emph{safety reset controller} that listens for a ``reset'' command transmitted through the electrical grid's frequency
and on reception forcibly resets the smart meter's entire firmware to a known-good state.  Our safety reset controller
receives commands through Direct Sequence Spread Spectrum (DSSS) modulation carried out on grid frequency through a
large controllable load such as an aluminium smelter. After forward error correction and cryptographic verification it
re-flashes the meter's main microcontroller over the standard JTAG interface.  Note that our modulation technique is one
\emph{changing grid frequency itself}.  This is fundamentally different in both generation and detection from systems
such as traditional PLC that superimpose a signal on grid voltage, but leave the underlying grid frequency itself
unaffected.

Starting from a high level architecture, we have carried out simulations of our concept's performance under real-world
conditions. Based on these simulations we implemented an end-to-end prototype of our proposed safety reset controller as
part of a realistic smart meter demonstrator. Finally, we experimentally validated our results and we will conclude with
an outline of further steps towards a practical implementation.

This work contains the following contributions:
\begin{enumerate}
    \item We introduce Grid Frequency Modulation (GFM) as a communication primitive. % FIXME done before in that one paper
    \item We elaborate the fundamental physics underlying GFM and theorize on the constrains of a practical
        implementation.
    \item We design a communication system based on GFM.
    \item We carry out extensive simulations of our systems to determine its performance characteristics.
\end{enumerate}

\section{Related work}
\label{sec_related_work}

% FIXME: intro here

\subsection{Security and Privacy in the Smart Grid}

The smart grid in practice is nothing more or less than an aggregation of embedded control and measurement devices that
are part of a large control system. This implies that all the same security concerns that apply to embedded systems in
general also apply to the components of a smart grid. Where programmers have been struggling for decades now with issues
such as input validation\cite{leveson01}, the same potential issue raises security concerns in smart grid scenarios as
well\cite{mo01, lee01}.  Only, in smart grid we have two complicating factors present: Many components are embedded
systems, and as such inherently hard to update. Also, the smart grid and its control algorithms act as a large partially
distributed system making problems such as input validation or authentication harder\cite{blaze01} and adding a host of
distributed systems problems on top\cite{lamport01}.

Given that the electrical grid is essential infrastructure in our modern civilization, these problems amount to
significant issues. Attacks on the electrical grid may have grave consequences\cite{anderson01,lee01} while the long
replacement cycles of various components make the system slow to adapt. Thus, components for the smart grid need to be
built to a much higher standard of security than most consumer devices to ensure they live up to well-funded attackers
even decades down the road. This requirement intensifies the challenges of embedded security and distributed systems
security among others that are inherent in any modern complex technological system. The safety-critical nature of the
modern smart metering ecosystem in particular was quickly recognized\cite{anderson01}.

A point we will not consider in much depth in this work is theft of electricity. While in publications aimed towards the
general public the introduction of smart metering is always motivated with potential cost savings and ecological
benefits, in industry-internal publications the reduction of electricity theft is often cited as an
incentive\cite{czechowski01}.  Likewise, academic publications tend to either focus on other benefits such as generation
efficiency gains through better forecasting or rationalize the consumer-unfriendly aspects of smart metering with social
benefits\cite{mcdaniel01}. They do not usually point out revenue protection mechanisms as
incentives\cite{anderson01,anderson02}.

A serious issue in smart metering setups is customer privacy. Even though the meter ``only'' collects aggregate energy
consumption of a whole household, this data is highly sensitive\cite{markham01}. This counterintuitive fact was
initially overlooked in smart meter deployments leading to outrage, delays and reduced features\cite{cuijpers01}. The
root cause of this problem is that given sufficient timing resolution these aggregate measurements contain ample
entropy. Through disaggregation algorithms, individual loads can be identified and through pattern matching even complex
usage patterns can be discerned with alarming accuracy\cite{greveler01} in the same way that similar privacy issues
arise in many other areas of modern life through other kinds of pervasive tracking and surveillance\cite{zuboff01}.

Another fundamental challenge in smart grid implementations is the central role of smart electricity meters in the smart
grid ecosystem. Smart meters are used both for highly-granular load measurement and in some countries also for load
switching\cite{zheng01}.  Smart electricity meters are effectively consumer devices. They are built down to a certain
price point that is measured by the burden it puts on consumers and that is divided by the relatively small market
served by a single smart meter implementation. Such cost requirements can preclude security features such as the use of
a standard hardened software environment on a high powered embedded system.  Landis+Gyr, a large manufacturer that makes
most of its revenue from utility meters in their 2019 annual report write that they \SI{36}{\percent} of their total
R\&D budget on embedded software while spending only \SI{24}{\percent} on hardware R\&D\cite{landisgyr01,landisgyr02},
indicating a significant tension between firmware security and a smart meter vendor's bottom line.

\subsection{The state of the art in embedded security}

Embedded software security generally is much harder than security of higher-level systems. The primary two factors
affecting this are that on one hand, embedded devices usually run highly customized firmware that (often by necessity)
is rarely updated. On the other hand, embedded devices often lack advanced security mechanisms such as memory management
units that are found in most higher-power devices. Even well-funded companies continue to have trouble securing their
embedded systems. A spectacular example of this difficulty is the 2019 flaw in Apple's iPhone SoC first-stage ROM
bootloader that allows for the full compromise of any iPhone before the iPhone X given physical access to the
device\cite{heise01}. iPhone 8, one of the affected models, was still being manufactured and sold by Apple until April
2020.  In another instance in 2016, researchers found multiple flaws in the secure world firmware used by Samsung in
their mobile phone SoCs.  The flaws they found were both severe architectural flaws such as secret user input being
passed through untrusted userspace processes without any protection as well as shocking cryptographic flaws such as
CVE-2016-1919\footnote{\url{http://cve.circl.lu/cve/CVE-2016-1919}}\cite{kanonov01}.  And Samsung is not the only large
multinational corporation having trouble securing their secure world firmware implementation. In 2014 researchers found
an embarrassing integer overflow flaw in the low-level code handling untrusted input in Qualcomm's QSEE
firmware\cite{rosenberg01}. For an overview of ARM TrustZone including a survey of academic work and past security
vulnerabilities of TrustZone-based firmware see \cite{pinto01}.

If even companies with R\&D budgets that rival some countries' national budgets at mass-market consumer devices
have trouble securing their mass market secure embedded software stacks, what is a much smaller smart meter manufacturer
to do?  Especially if national standards mandate complex protocols such as TLS that are tricky to implement
correctly\cite{georgiev01}, this manufacturer will be short on options to secure their product.

\subsection{Attack surface in the smart grid}

From the incidents we outlined in the previous paragraphs we conclude that in smart metering technology, market
incentives do not currently provide the conditions for a level of device security that will reliably last the coming
decades. Considering this tension, in this paragraph we examine the cyberphysical risks that arise from attacks on the
smart grid in the first place. These risks arise at three different infrastructure levels.

The first level is that of attacks on centralized control systems. This type of attack is often cited in popular
discourse and to our knowledge is the only type of attack against an electric grid that has ever been carried out in
practice at scale\cite{lee01}. Despite their severity, these attacks do not pose a strictly \emph{scientific} challenge
since they are generic to any industrial control system. Their causes and countermeasures are generally well-understood
and the hardest challenge in their prevention is likely to be budgetary constraints.

Beyond the centralized control systems, the next target for an attacker may be the communication links between those
control systems and other smart grid components. While in some countries such as Italy special-purpose systems such as
PLC are common\cite{ec03}, overall, IP-based technologies have proliferated according to the larger trend in commputing
towards IP-based communications.  This proliferation of IP-based communication links brings along the possibility for
the application of generic network security measures from the IP world to the smart grid domain. In this way, a
standardized, IP-based protocol stack unlocks decades of network security improvements at little cost.

Beyond these layers towards the core of the smart grid's control infrastructure, an attacker might also corrupt the
network from the edges and target the endpoint devices itself. The large scale deployment of networked smart meters
creates an environment that is favorable to such attacks.
% FIXME cite RECESSIM landis+gyr protocol hacking wiki/youtube

\subsection{Cyberphysical threats in the smart grid}

Assuming that an attacker has compromised devices on any of these levels of smart grid infrastructure, what could they
do with their newly gained power? The obvious action would be to switch off everything. Of all scenarios,
this is both the most likely in practice---it is exactly what happened in the russian cyberattacks on the Ukranian
grid\cite{lee01}---but it is also the easiest to mitigate since the vulnerable components are few and centralized.
Mitigations include the installation of fail-safes as well as a defense in depth approach to hardening the grid's
cyber-infrastructure.

Another possible action for an attacker would be to forge energy measurements in an attempt to cause financial mayhem.
Both individual consumers as well as the utility could be targeted by such an attack. While such an attack might have
localized success, larger-scale discrepancies will likely quickly be caught by monitoring systems. For example, if a
large number of meters in an area systematically under- or over-reported their energy readings, meter readings across
the affected area would no longer add up with those of monitoring devices in other locations in the transmission and
distribution grid.

In some countries, smart meter functionality goes beyond mere monitoring devices and also includes remotely controlled
switches. There are two types of these switches: Switches to support \emph{Demand-Side Management} (DMS) and cut
off-switches that are used to punish defaulting customers. Demand Side Management is when a grid operator can remotely
control the timing of large, non-time-critical loads on the customer's premises\cite{dzung01}. A typical example of this
is a customer using an electric water heater: The heater is outfitted with a large hot water storage tank and is
connected hooked up to the utility's DSM system. The customer does not care when exactly their water is heated as long
as there is enough of it, and the utitliy offers them cheaper rates for the electricity used for heating in exchange for
control over its precise timing.  The utility uses this control to even out peaks in the consumption/production
imbalance, remotely enabling DSM systems during off-peak times and disabling them during peak hours.  In contrast to
DSM, cut-off switches are switches placed in-between the grid and the entire customer's household such that the utility
can disconnect non-paying customers without incurring the expense of sending a technician to the customer's premises.
Unlike DSM systems, cut-off switches are not opt-in\cite{anderson01,temple01}.  An attack that uses cut-off switches
would obviously immediately cause severe mayhem. Attacks on DSM may have more limited immediate impact as affected
consumers may not notice an interruption for several hours.

Instead of switching off loads outright, an attack employing DSM switches (and potentially also cut-off switches) could
choose to target the grid's stability.  By synchronizing many compromised smart meters to switch on and off a large
amount of load capacity, an attacker might cause the entire electrical grid to oscillate\cite{kosut01,wu01,kim01}. As a
large system of coupled mechanical systems, the electrical grid exhibits a complex frequency-domain behavior.  These
resonance effects, colloquially called ``modes'', are well-studied in power system
engineering\cite{rogers01,grebe01,entsoe01,crastan03}. As they can cause issues even under normal operating conditions,
a large effort is invested in dampening these resonances. Howewer, fully eliminating them under changing load conditions
may not be achievable.

\subsection{Communication Channels on the Grid}

A core part of intervening with any such cyberattack is the ability to communicate remediary actions to the devices
under attack.  There is a number of well-established technologies for communication on or along power lines. We can
distinguish three basic system categories: Systems using separate wires (such as DSL over landline telephone wiring),
wireless radio systems (such as LTE) and \emph{Power Line Communication} (PLC) systems that reuse the existing mains
wiring and superimpose data transmissions onto the 50 Hz mains sine\cite{gungor01,kabalci01}.

During a large-scale cyberattack, availability of internet and cellular connectivity cannot be relied upon. An attacker
may already have disabled such systems in a separate attack, or they may go down along with parts of the electrical
grid. Traditional powerline communication systems or an utitly's proprietary wireless systems would work, but at a range
of no more than several tens of kilometers reaching all meters in a country would require a large upfront infrastructure
investment.

\section{Grid Frequency as a Communication Channel}

We propose to approach the problem of broadcasting an emergency signal to all smart meters within a synchronous area by
using grid frequency as a communication channel.  Despite the awesome complexity of large power grids, the physics
underlying their response to changes in load and generation is surprisingly simple. Individual machines (loads and
generators) can be approximated by a small number of differential equations and the entire grid can be modelled by
aggregating these approximations into a large system of nonu differential equations. As a consequence, small signal
changes in generation/consumption power balance cause an approximately proportional change in
frequency\cite{kundur01,crastan03,entsoe02,entsoe04}.  This \emph{Power Frequency Charactersistic} is about
\SI{25}{\giga\watt\per\hertz} for the continental European synchronous area according to European electricity grid
authority ENTSO-E.

If we modulate the power consumption of a large load such as a multi-megawatt aluminium smelter, this modulation will
result in a small change in frequency according to this characteristic. So long as we stay within the operational limits
set by ENTSO-E\cite{entsoe02,entsoe03}, this change will not degrade the operation of other parts of the grid. The
advantages of grid frequency modulation are the fact that a single transmitter can cover an entire synchronous area as
well as low receiver hardware complexity.

To the best of the authors' knowledge, grid frequency modulation has only ever been proposed as a communication channel
at very small scales in microgrids before\cite{urtasun01} and has not yet been considered for large-scale application.

\subsection{Characterizing Grid Frequency}

In utility SCADA systems, Phasor Measurement Units (PMUs, also called \emph{synchrophasors}) are used to precisely
measure grid frequency among other parameters.  This task is much more complicated in practice than it might appear at
first glance since a PMU has to make extremely precise measurements, track fast changes in frequency and handle even
distorted input signals.  Detail on the inner workings of commercial phasor measurement units is scarce but there is a
large amount of academic research on sophisticated phasor measurement
algorithms\cite{narduzzi01,derviskadic01,belega01}.

Since we do not need reference standard-grade accuracy for our application we chose to start with a very basic algorithm
based on short-time fourier transform (STFT). Our system uses the universal frequency estimation approach of
experimental physicists Gasior and Gonzalez at CERN\cite{gasior01}.  The Gasior and Gonzalez algorithm\cite{gasior01}
passes the windowed input signal through a DFT, then interpolates the signal's fundamental frequency by fitting a
wavelet such as a Gaussian to the largest peak in the DFT results. The bias parameter of this curve fit is an accurate
estimation of the signal's fundamental frequency. This algorithm is similar to the simpler interpolated DFT algorithm
used as a reference in much of the phasor measurement literature\cite{borkowski01}.

To collect ground truth measurements for our analysis of grid frequency as a communication channel, we developed a device
to safely record real mains voltage waveforms.  Our system consists of an \texttt{STM32F030F4P6} ARM Cortex M0
microcontroller that records mains voltage using its internal 12-bit ADC and transmits measured values through a
galvanically isolated USB/serial bridge to a host computer. We derive our system's sampling clock from a crystal oven to
avoid frequency measurement noise due to thermal drift of a regular crystal: \SI{1}{ppm} of crystal drift would cause a
grid frequency error of $\SI{50}{\micro\hertz}$. We validated the performance of our crystal oven solution by
benchmarking it against a GPS 1pps reference.

% FIXME measurement results, spectra

\section{Grid Frequency Modulation}

Given the grid characteristics we measured using our custom waveform recorder and a model of our transmitter, we can
derive parameters for the modulation of our broadcast system.  In its most basic form a transmitter for grid frequency
modulation would be a very large controllable load connected to the power grid at a suitable vantage point. A spool of
wire submerged in a body of cooling liquid such as a small lake along with a thyristor rectifier bank would likely
suffice to perform this function during occasional cybersecurity incidents.  We can however decrease hardware and
maintenance investment even compared to this rather uncultivated solution by repurposing large industrial loads
as transmitters.  Going through a list of energy-intensive industries in Europe\cite{ec01}, we found that an aluminium
smelter would be a good candidate.  In aluminium smelting, aluminium is electrolytically extracted from alumina solution.
High-voltage mains power is transformed, rectified and fed into about 100 series-connected electrolytic cells forming a
\emph{potline}. Inside these pots alumina is dissolved in molten cryolite electrolyte at about \SI{1000}{\degreeCelsius}
and electrolysis is performed using a current of tens or hundreds of Kiloampère. The resulting pure aluminium settles at
the bottom of the cell and is tapped off for further processing.

Aluminium smelters are operated around the clock, and due to the high financial stakes their behavior under power
outages has been carefully characterized by the industry. Power outages of tens of minutes up to two hours reportedly do
not cause problems in aluminium potlines\cite{eisma01,oye01}. Recently, even techniques for intentional power modulation
without affecting cell lifetime or product quality have been devloped to take advantage of variable energy
prices.\cite{duessel01,eisma01}.  An aluminium plant's power supply is controlled to constantly keep all smelter cells
under optimal operating conditions. Modern power supply systems employ large banks of diodes or SCRs to rectify
low-voltage AC to DC to be fed into the potline\cite{ayoub01}. Potline voltage is controlled through a combination of a
tap changer and a transductor.  Individual cell voltages are controlled by changing the physical distance between anode
and cathode distance.  In this setup, power can be modulated fully electronically. Since this system does not have any
mechanical inertia, high modulation rates can reasonably be achieved.

\subsection{Parametrizing Modulation for GFM}

Modulating $\SI{25}{\mega\watt}$ of smelter power would yield a frequency shift of $\SI{1}{\milli\hertz}$. At an RMS
frequency noise of around $\SI{10}{\milli\hertz}$ in the band around $\SI{1}{\hertz}$, this results in challenging SNR.
% FIXME properly calculate frequency noise density, SNR
Under such conditions, the obvious choice for modulation are spread-spectrum techniques. Thus, we approached the setting
using Direct Sequence Spread Spectrum for its simple implementation and good overall performance. DSSS chip timing
should be as fast as the transmitter's physics allow to exploit the low-noise region between
$\SI{0.2}{\hertz}$ to $\SI{2.0}{\hertz}$ in the frequency noise spectrum while avoiding any of the grid's oscillation modes. Going
past $\approx\SI{2}{\hertz}$ would put strain on the receiver's frequency measurement subsystem\cite{belega01}. Using a
spread-spectrum technique allows us to reduce the effect of interference by spurious tones. In addition, spreading our
signal's energy over frequency also reduces the likelihood that we cause the grid to oscillate along any of its modes.

To test our proposed approach, we wrote a proof-of-concept modulator and demodulator in Python and tested this
proof-of-concept prototype with data captured from our grid frequency sensor. Our simulations covered a range of
parameters in modulation amplitude, DSSS sequence bit depth, chip duration and detection threshold.
Figure~\ref{fig_ser_nbits} shows symbol error rate (SER) as a function of modulation amplitude with Gold sequences of
several bit depths. As can be seen, realistic modulation amplitudes are in the range around $\SI{1}{\milli\hertz}$. In
the continental European synchronous area, this corresponds to a modulation power of approximately
$\SI{25}{\mega\watt}$.  Figure~\ref{fig_ser_thf} shows SER against detection threshold relative to background noise.
Figure~\ref{fig_ser_chip} shows SER against chip duration for a given fixed symbol length. As expected from looking at
our measured grid frequency noise spectrum, performance is best for short chip durations and worsens for longer chip
durations since shorter chip durations move our signals' bandwidth into the lower-noise region from $\SI{0.2}{\hertz}$
to $\SI{2}{\hertz}$.
%FIXME introduce term "chip" somewhere

\begin{figure}
    \centering
    \includegraphics[width=0.6\textwidth]{../notebooks/fig_out/dsss_gold_nbits_overview}
    \caption{Symbol Error Rate as a function of modulation amplitude for Gold sequences of several lengths.}
    \label{fig_ser_nbits}
\end{figure}

\begin{figure}
    \centering
    \hspace*{-1cm}\includegraphics[width=1.2\textwidth]{../notebooks/fig_out/dsss_thf_amplitude_5678}
    \caption{SER vs.\ Amplitude and detection threshold. Detection threshold is set as a factor of background noise
    level.}
    \label{fig_ser_thf}
\end{figure}

\begin{figure}
    \centering
    \hspace*{-1cm}\includegraphics[width=1.2\textwidth]{../notebooks/fig_out/chip_duration_sensitivity_6}
    \vspace*{-1cm}
    \caption{SER vs.\ DSSS chip duration.}
    \label{fig_ser_chip}
\end{figure}

\subsection{Parametrizing a proof-of-concept "Safety Reset" System Based on GFM}

Taking these modulation parameters as a starting point, we proceeded to create a proof-of-concept smart meter emergency
reset system. On top of the modulation described in the previous paragraphs we layered simple Reed-Solomon error
correction\cite{mackay01} and some cryptography. The goal of our PoC cryptographic implementation was to allow the
sender of an emergency reset broadcast to authorize a reset command to all listening smart meters. An additional
constraint of our setting is that due to the extremely slow communication channel all messages should be kept as short
as possible. The solution we chose for our PoC is a simplistic hash chain using the approach from the Lamport and
Winternitz One-time Signature (OTS) schemes. Informally, the private key is a random bitstring. The public key is
generated by recursively applying a hash function to this key a number of times. Each smart meter reset command is then
authorized by disclosing subsequent elements of this series. Unwinding the hash chain from the public key at the end of
the chain towards the private key at its beginning, at each step a receiver can validate the current command by checking
that it corresponds to the previously unknown input of the current step of the hash chain. Replay attacks are prevented
by recording the most recent valid command. This simple scheme does not afford much functionality but it results in very
short messages and removes the need for computationally public key cryptography inside the smart meter.
% FIXME add more precise/formal description of crypto
% FIXME add description of targeting/scope function?
% FIXME somewhere above descirbe entire reset system architecture????!!!
% FIXME add description of disarm message (replay protection)

\subsection{Experimental results}

\begin{figure}
    \centering
    \includegraphics[width=0.6\textwidth]{prototype.jpg}
    \caption{The completed prototype setup. The board on the left is the safety reset microcontroller. It is connected
    to the smart meter in the middle through an adapter board. The top left contains a USB hub with debug interfaces to
    the reset microcontroller. The cables on the bottom left are the debug USB cable and the \SI{3.5}{\milli\meter}
    audio cable for the simulated mains voltage input.}
    \label{fig_proto_pic}
\end{figure}

For a realistic proof of concept, we decided to implement our signal processing chain from DSSS demodulator through
error correction up to our simple cryptography layer in microcontroller firmware and demonstrate this firmware on actual
smart meter hardware, shown in Figure~\ref{fig_proto_pic}. In our proof of concept a safety reset controller is
connected to the main application microcontroller of a smart meter. The reset controller is tasked with listening for
authenticated reset commands on the voltage waveform, and on reception of such a command resetting the smart meter
application controller by flashing a known-good firmware image to its memory.

The signal processing chain of our PoC is shown in Figure~\ref{fig_demo_sig_schema}. To interoperate with existing
implementations of SHA-512 and reed-solomon decoding, this implementation was written in the C programming language. To
demonstrate an application close to a field implementation, we chose an Easymeter \texttt{Q3DA1002} smart meter as our
reset target. This model is popular in the German market and readily available second-hand. The meter consists of three
isolated metering ASICs connected to a data logging and display PCB through infrared optical links. To demonstrate the
safety reset's firmware reset functionality, we connected our safety reset microcontroller to the Texas Instruments
\texttt{MSP430} microcontroller on the meter's display and data logging board through the JTAG debug interface that the
board's vendor had conveniently left accessible. We ported part of
\texttt{mspdebug}\footnote{\url{https://dlbeer.co.nz/mspdebug/}} to drive the meter microcontroller's JTAG interface and
wrote a piece of demonstrator code that overwrites the meter's firmware with one that displays an identifying string on
the meter's display after boot-up.

\begin{figure}
    \centering
    \includegraphics[width=\textwidth]{prototype_schema}
    \caption{The signal processing chain of our demonstrator.}
    \label{fig_demo_sig_schema}
\end{figure}

Since we did not have an aluminium smelter ready, we decided to feed our proof-of-concept reset controller with an
emulated grid voltage sine wave from a computer's headphone jack. Where in a real application this microcontroller might
take ADC readings of input mains voltage divided down by a long resistive divider chain, we instead feed the ADC from a
$\SI{3.5}{\milli\meter}$ audio input. For operational safety, we disconnected the meter microcontroller from its
grid-referenced capacitive dropper power supply and connected it to our reset controlller's debug USB power supply.

We performed several successful experiments using a signature truncated at 120 bit and a 5 bit DSSS sequence. Taking the
sign bit into account, the length of the encoded signature is 20 DSSS symbols. On top of this we used Reed-Solomon error
correction at a 2:1 ratio inflating total message length to 30 DSSS symbols. At the \SI{1}{\second} chip rate we used in
other simulations as well this equates to an overall transmission duration of approximately \SI{15}{\minute}. To give
the demodulator some time to settle and to produce more realistic conditions of signal reception we padded the modulated
signal unmodulated noise on both ends.

\section{Discussion}

For our proof of concept, before settling on the commercial smart meter we first tried to use an \texttt{EVM430-F6779}
smart meter evaluation kit made by Texas Instruments. This evaluation kit did not turn out well for two main reasons.
One, it shipped with half the case missing and no cover for the terminal blocks. Because of this some work was required
to get it electrically safe.  Even after mounting it in an electrically safe manner the safety reset controller
prototype would also have to be galvanically isolated to not pose an electrical safety risk since the main MCU is not
isolated from the grid and the JTAG port is also galvanically coupled. The second issue we ran into was that the
development board is based around a specific microcontroller from TI's \texttt{MSP430} series that is incompatible with
common JTAG programmers.

Our initial assumption that a development kit would be easier to program than a commercial meter did not prove to be
true. Contrary to our expectations the commercial meter had JTAG enabled allowing us to easily read out its stock
firmware without either reverse-engineering vendor firmware update files nor circumventing code protection measures.
The fact that its firmware was only available in its compiled binary form was not much of a hindrance as it proved not
to be too complex and all we wanted to know we found out with just a few hours of digging in
Ghidra\footnote{\url{https://ghidra-sre.org/}}.

In the firmware development phase our approach of testing every module individually (e.g. DSSS demodulator, Reed-Solomon
decoder, grid frequency estimation) proved to be very useful. In particular debugging benefited greatly from being able
to run several thousand tests within seconds. In case of our DSSS demodulator, this modular testing and simulation
architecture allowed us to simulate thousands of runs of our implementation on test data and directly compare it to our
Jupyter/Python prototype. Since we spent more time polishing our embedded C implementation it turned out to perform
better than our Python prototype while still exhibiting the same fundamental response to changes to its parameters.  One
significant bug we fixed in the embedded C version was the Python version's tendency towards incorrect decodings at even
very large amplitudes.

In accordance with our initial estimations we did not run into any code space nor computation bottlenecks for chosing
floating point emulation instead of porting over our algorithms to fixed point calculations. The extremely slow sampling
rate of our systems makes even heavyweight processing such as FFT or our brute force dynamic programming approach to
DSSS demodulation possible well within our performance constraints.

Since we are only building a prototype we did not optimize firmware code size.  At around \SI{64}{\kilo\byte}, the
compiled code size of our firmware implementation is slightly larger than we would like. The overall most heavy-weight
operations are the SHA512 implementation from libsodium and the FFT from ARM's CMSIS signal processing library.
Especially the SHA512 implementation has large potential for size optimization because it is highly optimized for speed
using extensive manual loop unrolling. Despite being larger than what we initially targeted, this firmware is still
small compared to the firmware space available in commercially deployed smart meters. We estimate that even without
additional optimizations, our PoC firmware is already within the realm of firmware size that could be implemented in a
commercially viable safety reset controller.

\section{Conclusion}
\label{sec_conclusion}

In this paper we have developed an end-to-end design of a reset system to restore smart meters to a safe operating state
during an ongoing large-scale cyberattack.  To allow our system to be triggered even in the middle of a cyberattack we
have developed a broadcast data transmission system based on intentional modulation of global grid frequency. We have
shown the viability of our end-to-end design through simulations. To put these simulations on a solid foundation we have
developed a grid frequency measurement methodology comprising of a custom-designed hardware device for electrically safe
data capture and a set of software tools to archive and process captured data. Our simulations show good behavior of our
broadcast communication system and give an indication that cooperating with a large consumer such as an aluminium smelter
would be a feasible way to set up a transmitter with low hardware overhead.  We have outlined a simple cryptographic
protocol ready for embedded implementation in resource-constrained systems that allows triggering a safety reset with a
response time of less than 30 minutes.  We have experimentally validated our system using simulated grid frequency data
in a demonstrator setup based on a commercial microcontroller as our safety reset controller and an off-the-shelf smart
meter. Source code and electronics CAD designs are available at the public repository listed at the end of this
document.

\printbibliography[heading=bibintoc]

%%% FIXME remove appendix and work into text.

\center{
    \center{This is version \texttt{\input{version.tex}\unskip} of this paper, generated on \today. The git repository
    can be found at:}

    \center{\url{https://git.jaseg.de/safety-reset.git}}
}
\end{document}