\documentclass[runningheads]{llncs}
\usepackage[T1]{fontenc}
\usepackage[
    backend=biber,
    style=numeric,
    natbib=true,
    url=false,
    doi=true,
    eprint=false
]{biblatex}
\addbibresource{safety-reset.bib}
\usepackage{amssymb,amsmath}
\usepackage{eurosym}
\usepackage{wasysym}

\usepackage[binary-units]{siunitx}
\DeclareSIUnit{\baud}{Bd}
\DeclareSIUnit{\year}{a}
\usepackage{commath}
\usepackage{graphicx,color}
\usepackage{subcaption}
\usepackage{array}
\usepackage{hyperref}

\renewcommand{\floatpagefraction}{.8}
\newcommand{\degree}{\ensuremath{^\circ}}
\newcolumntype{P}[1]{>{\centering\arraybackslash}p{#1}}
\newcommand{\partnum}[1]{\texttt{#1}}

\begin{document}
|
|
|
|
\title{Ripples in the Pond: Transmitting Information through Grid Frequency Modulation}
|
|
\titlerunning{Ripples in the Pond: Transmitting Information through Grid Frequency}
|
|
\author{Jan Sebastian Götte \and Liran Katzir \and Björn Scheuermann}
|
|
\institute{HIIG\\ \email{safetyreset@jaseg.de} \and Tel Aviv University\\Faculty of Engineering \and HU Berlin \\ \email{scheuermann@informatik.hu-berlin.de}}
|
|
% FIXME keywords
|
|
\maketitle
|
|
\keywords{Security, privacy and resilience in critical infrastructures \and Security and privacy in ``internet of
|
|
things'' \and Cyber-physical systems \and Hardware security \and Network Security \and Energy systems \and Signal theory}
|
|
|
|
\begin{abstract}
|
|
The smart grid is a large, complex and interconnected technological system. With remotely controllable load switches
|
|
having been rolled out at scale in some countries, a tiny flaw inside the firmware of one of these embedded devices
|
|
may enable attacks to remotely trigger large-scale disruption with potentially catastrophic results. Attaining
|
|
perfect security against such cyberphysical attacks is a monumental embedded engineering task---and observations do
|
|
not indicate that current efforts meet the requirements of this task.%FIXME cite recent RECESSIM work
|
|
|
|
In this paper, we approach the smart grid safety issue by implementing an emergency override that can be used to
|
|
reset all connected devices to a known-good state and preempt subsequent compromise by cutting communication links.
|
|
To yield a fully fail-safe design, our system does not rely on the internet or other conventional communication
|
|
network to work. Instead, our system transmits error-corrected and cryptographically secured commands by modulating
|
|
grid frequency using a single large consumer such as an aluminium smelter. This approach differs from
traditional Powerline Communication (PLC) systems in that it reaches every device within the same synchronous area, as
the signal is embedded into the fundamental grid frequency instead of a superimposed voltage that is quickly
attenuated across long distances.
|
|
|
|
Using simulations we have determined that control of a $\SI{25}{\mega\watt}$ load would allow for the transmission
|
|
of a cryptographically secured \emph{reset} signal within $15$ minutes. We have produced a proof-of-concept prototype
|
|
receiver that demonstrates the feasibility of decoding such signals even on resource-constrained microcontroller
|
|
hardware.
|
|
\end{abstract}
|
|
|
|
\section{Introduction}
|
|
|
|
In the power grid, as in many other engineered systems, we can observe an ongoing diffusion of information systems into
|
|
the domain of industrial control. Automation of these control systems has already been practiced for the better part of a
|
|
century. Throughout the 20th century this automation was mostly limited to core components of the grid. Generators in
|
|
power stations are computer-controlled according to electromechanical and economic models. Switching in substations is
|
|
automated to allow for fast failure recovery. Human operators are still vital to these systems, but their tasks have
|
|
shifted from pure operation to engineering, maintenance and surveillance\cite{crastan03,anderson02}.
|
|
|
|
With the turn of the century came a large-scale trend in power systems to move from a model of centralized generation,
|
|
built around massive large-scale fossil and nuclear power plants, towards a more heterogeneous model of smaller-scale
|
|
generators working together. In this new model large-scale fossil power plants still serve a major role, but new
|
|
factors come into play. One such factor is the advance of renewable energies. The large-scale use of wind and solar power in
|
|
particular seems unavoidable for continued human life on this planet. For the electrical
|
|
grid these systems constitute a significant challenge. Fossil-fueled power plants can be controlled in a precise and
|
|
quick way to match energy consumption. This tracking of consumption with production is vital to the stability of the
|
|
grid. Renewable energies such as wind and solar power do not provide the same degree of controllability, and they
|
|
introduce a larger degree of uncertainty due to the unpredictability of the forces of nature\cite{crastan03}.
|
|
|
|
Along with this change in dynamic behavior, renewable energies have brought forth the advance of distributed generation.
|
|
In distributed generation end-customers that previously only consumed energy have started to feed energy into the grid
|
|
from small solar installations on their property. Distributed generation is a chance for customers to gain autonomy and
|
|
shift from a purely passive role to being active participants of the electricity market\cite{crastan03}.
|
|
|
|
% FIXME the following paragraph is weird.
|
|
|
|
To match this new landscape of unpredictable renewable resources and decentralized generation, the utility industry has
|
|
had to adapt itself in major ways. One aspect of this adaptation that is particularly visible to energy consumers is the
|
|
computerization of end-user energy metering. Despite the widespread use of industrial control systems inside the
|
|
electrical grid and the far-reaching diffusion of computers into people's everyday lives, the energy meter has long been
|
|
one of the last remnants of an offline, analog time. Until the 2010s many households were still served through
|
|
electromechanical Ferraris-style meters that have their origin in the late 19th
|
|
century\cite{borlase01,ukgov04,bnetza02}. Today, under the umbrella term \emph{Smart Metering}, the shift towards fully
|
|
computerized, often networked meters is well underway. The rollout of these \emph{Smart Meters} has not been very
smooth overall, with some countries severely lagging behind. As a safety-critical technology, smart metering technology
is usually standardized on a per-country basis. This leads to an inhomogeneous landscape with, in some instances, wildly
incompatible systems. Often vendors only serve a single country or have separate models of a meter for each country.
This complex standardization landscape and market situation has led to a proliferation of highly complex, custom-coded
microcontroller firmware. The complexity and scale of this often network-connected firmware makes for a ripe substrate
for bugs to surface.
|
|
|
|
A remotely exploitable flaw inside the firmware of a component of a smart metering system could have consequences
|
|
ranging from impaired billing functionality to an existential threat to grid stability\cite{anderson01,anderson02}. In a
|
|
country where meters commonly include disconnect switches for purposes such as prepaid tariffs, a coordinated attack
|
|
could at worst cause widespread activation of grid safety systems through oscillations caused by repeated cycling of
|
|
megawatts of load capacity at just the wrong frequency\cite{wu01}.
|
|
|
|
Mitigation of these attacks through firmware security measures is unlikely to yield satisfactory results. The enormous
|
|
complexity of smart meter firmware makes firmware security extremely labor-intensive. The diverse standardization
|
|
landscape makes a coordinated, comprehensive response unlikely.
|
|
|
|
In this paper, instead of focusing on the very hard task of improving firmware security we introduce a pragmatic
|
|
solution to the--in our opinion likely--scenario of a large-scale compromise of smart meter firmware. In our concept
|
|
the components of the smart meter that are threatened by remote compromise are equipped with a physically separate
|
|
\emph{safety reset controller} that listens for a ``reset'' command transmitted through the electrical grid's frequency
|
|
and on reception forcibly resets the smart meter's entire firmware to a known-good state. Our safety reset controller
|
|
receives commands through Direct Sequence Spread Spectrum (DSSS) modulation carried out on grid frequency through a
|
|
large controllable load such as an aluminium smelter. After forward error correction and cryptographic verification it
|
|
re-flashes the meter's main microcontroller over the standard JTAG interface. Note that our modulation technique
\emph{changes grid frequency itself}. This is fundamentally different in both generation and detection from systems
|
|
such as traditional PLC that superimpose a signal on grid voltage, but leave the underlying grid frequency itself
|
|
unaffected.
|
|
|
|
Starting from a high-level architecture, we have carried out simulations of our concept's performance under real-world
|
|
conditions. Based on these simulations we implemented an end-to-end prototype of our proposed safety reset controller as
|
|
part of a realistic smart meter demonstrator. Finally, we experimentally validated our results and we will conclude with
|
|
an outline of further steps towards a practical implementation.
|
|
|
|
This work contains the following contributions:
|
|
\begin{enumerate}
|
|
\item We introduce Grid Frequency Modulation (GFM) as a communication primitive. % FIXME done before in that one paper
|
|
\item We elaborate the fundamental physics underlying GFM and theorize on the constraints of a practical
|
|
implementation.
|
|
\item We design a communication system based on GFM.
|
|
\item We carry out extensive simulations of our system to determine its performance characteristics.
|
|
\end{enumerate}
|
|
|
|
\section{Related work}
|
|
\label{sec_related_work}
|
|
|
|
% FIXME: intro here
|
|
|
|
\subsection{Security and Privacy in the Smart Grid}
|
|
|
|
The smart grid in practice is nothing more or less than an aggregation of embedded control and measurement devices that
|
|
are part of a large control system. This implies that all the same security concerns that apply to embedded systems in
|
|
general also apply to the components of a smart grid. Where programmers have been struggling for decades now with issues
|
|
such as input validation\cite{leveson01}, the same potential issue raises security concerns in smart grid scenarios as
|
|
well\cite{mo01, lee01}. In the smart grid, however, two complicating factors are present: Many components are embedded
|
|
systems, and as such inherently hard to update. Also, the smart grid and its control algorithms act as a large partially
|
|
distributed system making problems such as input validation or authentication harder\cite{blaze01} and adding a host of
|
|
distributed systems problems on top\cite{lamport01}.
|
|
|
|
Given that the electrical grid is essential infrastructure in our modern civilization, these problems amount to
|
|
significant issues. Attacks on the electrical grid may have grave consequences\cite{anderson01,lee01} while the long
|
|
replacement cycles of various components make the system slow to adapt. Thus, components for the smart grid need to be
|
|
built to a much higher standard of security than most consumer devices to ensure they live up to well-funded attackers
|
|
even decades down the road. This requirement intensifies the challenges of embedded security and distributed systems
|
|
security among others that are inherent in any modern complex technological system. The safety-critical nature of the
|
|
modern smart metering ecosystem in particular was quickly recognized\cite{anderson01}.
|
|
|
|
A point we will not consider in much depth in this work is theft of electricity. While in publications aimed towards the
|
|
general public the introduction of smart metering is always motivated with potential cost savings and ecological
|
|
benefits, in industry-internal publications the reduction of electricity theft is often cited as an
|
|
incentive\cite{czechowski01}. Likewise, academic publications tend to either focus on other benefits such as generation
|
|
efficiency gains through better forecasting or rationalize the consumer-unfriendly aspects of smart metering with social
|
|
benefits\cite{mcdaniel01}. They do not usually point out revenue protection mechanisms as
|
|
incentives\cite{anderson01,anderson02}.
|
|
|
|
A serious issue in smart metering setups is customer privacy. Even though the meter ``only'' collects aggregate energy
|
|
consumption of a whole household, this data is highly sensitive\cite{markham01}. This counterintuitive fact was
|
|
initially overlooked in smart meter deployments leading to outrage, delays and reduced features\cite{cuijpers01}. The
|
|
root cause of this problem is that given sufficient timing resolution these aggregate measurements contain ample
|
|
entropy. Through disaggregation algorithms, individual loads can be identified and through pattern matching even complex
|
|
usage patterns can be discerned with alarming accuracy\cite{greveler01} in the same way that similar privacy issues
|
|
arise in many other areas of modern life through other kinds of pervasive tracking and surveillance\cite{zuboff01}.
|
|
|
|
Another fundamental challenge in smart grid implementations is the central role of smart electricity meters in the smart
|
|
grid ecosystem. Smart meters are used both for highly-granular load measurement and in some countries also for load
|
|
switching\cite{zheng01}. Smart electricity meters are effectively consumer devices. They are built down to a certain
|
|
price point that is measured by the burden it puts on consumers and that is divided by the relatively small market
|
|
served by a single smart meter implementation. Such cost requirements can preclude security features such as the use of
|
|
a standard hardened software environment on a high-powered embedded system. Landis+Gyr, a large manufacturer that makes
most of its revenue from utility meters, writes in its 2019 annual report that it spent \SI{36}{\percent} of its total
R\&D budget on embedded software while spending only \SI{24}{\percent} on hardware R\&D\cite{landisgyr01,landisgyr02},
indicating a significant tension between firmware security and a smart meter vendor's bottom line.
|
|
|
|
\subsection{The state of the art in embedded security}
|
|
|
|
Embedded software security generally is much harder than security of higher-level systems. The primary two factors
|
|
affecting this are that on one hand, embedded devices usually run highly customized firmware that (often by necessity)
|
|
is rarely updated. On the other hand, embedded devices often lack advanced security mechanisms such as memory management
|
|
units that are found in most higher-power devices. Even well-funded companies continue to have trouble securing their
|
|
embedded systems. A spectacular example of this difficulty is the 2019 flaw in Apple's iPhone SoC first-stage ROM
|
|
bootloader that allows for the full compromise of any iPhone before the iPhone X given physical access to the
|
|
device\cite{heise01}. iPhone 8, one of the affected models, was still being manufactured and sold by Apple until April
|
|
2020. In another instance in 2016, researchers found multiple flaws in the secure world firmware used by Samsung in
|
|
their mobile phone SoCs. The flaws they found were both severe architectural flaws such as secret user input being
|
|
passed through untrusted userspace processes without any protection as well as shocking cryptographic flaws such as
|
|
CVE-2016-1919\footnote{\url{http://cve.circl.lu/cve/CVE-2016-1919}}\cite{kanonov01}. And Samsung is not the only large
|
|
multinational corporation having trouble securing their secure world firmware implementation. In 2014 researchers found
|
|
an embarrassing integer overflow flaw in the low-level code handling untrusted input in Qualcomm's QSEE
|
|
firmware\cite{rosenberg01}. For an overview of ARM TrustZone including a survey of academic work and past security
|
|
vulnerabilities of TrustZone-based firmware see \cite{pinto01}.
|
|
|
|
If even companies with R\&D budgets that rival some countries' national budgets have trouble securing the secure
embedded software stacks of their mass-market consumer devices, what is a much smaller smart meter manufacturer
to do? Especially if national standards mandate complex protocols such as TLS that are tricky to implement
correctly\cite{georgiev01}, this manufacturer will be short on options to secure their product.
|
|
|
|
\subsection{Attack surface in the smart grid}
|
|
|
|
From the incidents we outlined in the previous paragraphs we conclude that in smart metering technology, market
|
|
incentives do not currently provide the conditions for a level of device security that will reliably last the coming
|
|
decades. Considering this tension, in this section we examine the cyberphysical risks that arise from attacks on the
|
|
smart grid in the first place. These risks arise at three different infrastructure levels.
|
|
|
|
The first level is that of attacks on centralized control systems. This type of attack is often cited in popular
|
|
discourse and to our knowledge is the only type of attack against an electric grid that has ever been carried out in
|
|
practice at scale\cite{lee01}. Despite their severity, these attacks do not pose a strictly \emph{scientific} challenge
|
|
since they are generic to any industrial control system. Their causes and countermeasures are generally well-understood
|
|
and the hardest challenge in their prevention is likely to be budgetary constraints.
|
|
|
|
Beyond the centralized control systems, the next target for an attacker may be the communication links between those
|
|
control systems and other smart grid components. While in some countries such as Italy special-purpose systems such as
|
|
PLC are common\cite{ec03}, overall, IP-based technologies have proliferated according to the larger trend in computing
|
|
towards IP-based communications. This proliferation of IP-based communication links brings along the possibility for
|
|
the application of generic network security measures from the IP world to the smart grid domain. In this way, a
|
|
standardized, IP-based protocol stack unlocks decades of network security improvements at little cost.
|
|
|
|
Beyond these layers towards the core of the smart grid's control infrastructure, an attacker might also corrupt the
|
|
network from the edges and target the endpoint devices themselves. The large-scale deployment of networked smart meters
|
|
creates an environment that is favorable to such attacks.
|
|
% FIXME cite RECESSIM landis+gyr protocol hacking wiki/youtube
|
|
|
|
\subsection{Cyberphysical threats in the smart grid}
|
|
|
|
Assuming that an attacker has compromised devices on any of these levels of smart grid infrastructure, what could they
|
|
do with their newly gained power? The obvious action would be to switch off everything. Of all scenarios,
this is the most likely in practice (it is exactly what happened in the Russian cyberattacks on the Ukrainian
grid\cite{lee01}), but it is also the easiest to mitigate since the vulnerable components are few and centralized.
Mitigations include the installation of fail-safes as well as a defense-in-depth approach to hardening the grid's
cyber-infrastructure.
|
|
|
|
Another possible action for an attacker would be to forge energy measurements in an attempt to cause financial mayhem.
|
|
Both individual consumers as well as the utility could be targeted by such an attack. While such an attack might have
|
|
localized success, larger-scale discrepancies will likely quickly be caught by monitoring systems. For example, if a
|
|
large number of meters in an area systematically under- or over-reported their energy readings, meter readings across
|
|
the affected area would no longer add up with those of monitoring devices in other locations in the transmission and
|
|
distribution grid.
|
|
|
|
In some countries, smart meter functionality goes beyond mere monitoring devices and also includes remotely controlled
|
|
switches. There are two types of these switches: switches to support \emph{Demand-Side Management} (DSM) and cut-off
switches that are used to punish defaulting customers. Demand-Side Management is when a grid operator can remotely
control the timing of large, non-time-critical loads on the customer's premises\cite{dzung01}. A typical example of this
is a customer using an electric water heater: The heater is outfitted with a large hot water storage tank and is
hooked up to the utility's DSM system. The customer does not care when exactly their water is heated as long
as there is enough of it, and the utility offers them cheaper rates for the electricity used for heating in exchange for
|
|
control over its precise timing. The utility uses this control to even out peaks in the consumption/production
|
|
imbalance, remotely enabling DSM systems during off-peak times and disabling them during peak hours. In contrast to
|
|
DSM, cut-off switches are switches placed between the grid and the customer's entire household such that the utility
|
|
can disconnect non-paying customers without incurring the expense of sending a technician to the customer's premises.
|
|
Unlike DSM systems, cut-off switches are not opt-in\cite{anderson01,temple01}. An attack that uses cut-off switches
|
|
would obviously immediately cause severe mayhem. Attacks on DSM may have more limited immediate impact as affected
|
|
consumers may not notice an interruption for several hours.
|
|
|
|
Instead of switching off loads outright, an attack employing DSM switches (and potentially also cut-off switches) could
|
|
choose to target the grid's stability. By synchronizing many compromised smart meters to switch on and off a large
|
|
amount of load capacity, an attacker might cause the entire electrical grid to oscillate\cite{kosut01,wu01,kim01}. As a
|
|
large system of coupled mechanical systems, the electrical grid exhibits complex frequency-domain behavior, including resonances. These
|
|
resonance effects, colloquially called ``modes'', are well-studied in power system
|
|
engineering\cite{rogers01,grebe01,entsoe01,crastan03}. As they can cause issues even under normal operating conditions,
|
|
a large effort is invested in damping these resonances. However, fully eliminating them under changing load conditions
|
|
may not be achievable.
|
|
|
|
\subsection{Communication Channels on the Grid}
|
|
|
|
A core part of intervening in any such cyberattack is the ability to communicate remedial actions to the devices
under attack. There are a number of well-established technologies for communication on or along power lines. We can
|
|
distinguish three basic system categories: Systems using separate wires (such as DSL over landline telephone wiring),
|
|
wireless radio systems (such as LTE) and \emph{Power Line Communication} (PLC) systems that reuse the existing mains
|
|
wiring and superimpose data transmissions onto the \SI{50}{\hertz} mains sine\cite{gungor01,kabalci01}.
|
|
|
|
During a large-scale cyberattack, availability of internet and cellular connectivity cannot be relied upon. An attacker
|
|
may already have disabled such systems in a separate attack, or they may go down along with parts of the electrical
|
|
grid. Traditional powerline communication systems or a utility's proprietary wireless systems would work, but at a range
of no more than several tens of kilometers, reaching all meters in a country would require a large upfront infrastructure
investment.
|
|
|
|
\section{Grid Frequency as a Communication Channel}
|
|
|
|
We propose to approach the problem of broadcasting an emergency signal to all smart meters within a synchronous area by
|
|
using grid frequency as a communication channel. Despite the awesome complexity of large power grids, the physics
|
|
underlying their response to changes in load and generation is surprisingly simple. Individual machines (loads and
|
|
generators) can be approximated by a small number of differential equations and the entire grid can be modelled by
|
|
aggregating these approximations into a large system of nonlinear differential equations. As a consequence, small-signal
|
|
changes in generation/consumption power balance cause an approximately proportional change in
|
|
frequency\cite{kundur01,crastan03,entsoe02,entsoe04}. This \emph{Power Frequency Characteristic} is about
|
|
\SI{25}{\giga\watt\per\hertz} for the continental European synchronous area according to European electricity grid
|
|
authority ENTSO-E.
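
As a sketch of this relation, write $K$ for the power frequency characteristic, $\Delta P$ for a small change in the
power balance and $\Delta f$ for the resulting frequency deviation. This notation is introduced here for illustration
only. In magnitude, the small-signal approximation then reads
\begin{equation*}
    |\Delta f| \approx \frac{|\Delta P|}{K},
    \qquad K \approx \SI{25}{\giga\watt\per\hertz}.
\end{equation*}
For example, under this approximation a hypothetical load step of $|\Delta P| = \SI{100}{\mega\watt}$ would shift grid
frequency by roughly $\SI{100}{\mega\watt} / \SI{25}{\giga\watt\per\hertz} = \SI{4}{\milli\hertz}$. This is a
quasi-static estimate that ignores the grid's dynamic response.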
|
|
|
|
If we modulate the power consumption of a large load such as a multi-megawatt aluminium smelter, this modulation will
|
|
result in a small change in frequency according to this characteristic. So long as we stay within the operational limits
|
|
set by ENTSO-E\cite{entsoe02,entsoe03}, this change will not degrade the operation of other parts of the grid. The
|
|
advantages of grid frequency modulation are the fact that a single transmitter can cover an entire synchronous area as
|
|
well as low receiver hardware complexity.
|
|
|
|
To the best of the authors' knowledge, grid frequency modulation has only ever been proposed as a communication channel
|
|
at very small scales in microgrids before\cite{urtasun01} and has not yet been considered for large-scale application.
|
|
|
|
\subsection{Characterizing Grid Frequency}
|
|
|
|
In utility SCADA systems, Phasor Measurement Units (PMUs, also called \emph{synchrophasors}) are used to precisely
|
|
measure grid frequency among other parameters. This task is much more complicated in practice than it might appear at
|
|
first glance since a PMU has to make extremely precise measurements, track fast changes in frequency and handle even
|
|
distorted input signals. Detail on the inner workings of commercial phasor measurement units is scarce but there is a
|
|
large amount of academic research on sophisticated phasor measurement
|
|
algorithms\cite{narduzzi01,derviskadic01,belega01}.
|
|
|
|
Since we do not need reference standard-grade accuracy for our application we chose to start with a very basic algorithm
|
|
based on the short-time Fourier transform (STFT). Our system uses the universal frequency estimation approach of
|
|
experimental physicists Gasior and Gonzalez at CERN\cite{gasior01}. The Gasior and Gonzalez algorithm\cite{gasior01}
|
|
passes the windowed input signal through a DFT, then interpolates the signal's fundamental frequency by fitting a
|
|
wavelet such as a Gaussian to the largest peak in the DFT results. The bias parameter of this curve fit is an accurate
|
|
estimation of the signal's fundamental frequency. This algorithm is similar to the simpler interpolated DFT algorithm
|
|
used as a reference in much of the phasor measurement literature\cite{borkowski01}.
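
One closed-form variant of such a peak fit is three-point Gaussian interpolation. We give it here as an illustrative
sketch; it is not necessarily the exact fitting procedure of\cite{gasior01} or of our implementation. Let $S_k$ denote
the DFT magnitude in bin $k$, $k_0$ the index of the largest peak, $N$ the DFT length and $f_s$ the sampling rate
(notation introduced here). Since the logarithm of a Gaussian is a parabola, fitting a parabola through the logarithms
of the three bins around the peak yields the fractional bin offset $\delta$ and the frequency estimate $\hat{f}$:
\begin{equation*}
    \delta = \frac{\ln\left(S_{k_0+1} / S_{k_0-1}\right)}
                  {2 \ln\left(S_{k_0}^2 / \left(S_{k_0-1} S_{k_0+1}\right)\right)},
    \qquad
    \hat{f} = \left(k_0 + \delta\right) \frac{f_s}{N}.
\end{equation*}
For a symmetric peak with $S_{k_0-1} = S_{k_0+1}$ the offset is $\delta = 0$ and the estimate falls on the center of
bin $k_0$.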
|
|
|
|
To collect ground truth measurements for our analysis of grid frequency as a communication channel, we developed a device
|
|
to safely record real mains voltage waveforms. Our system consists of an \texttt{STM32F030F4P6} ARM Cortex M0
|
|
microcontroller that records mains voltage using its internal 12-bit ADC and transmits measured values through a
|
|
galvanically isolated USB/serial bridge to a host computer. We derive our system's sampling clock from a crystal oven to
|
|
avoid frequency measurement noise due to thermal drift of a regular crystal: \SI{1}{ppm} of crystal drift would cause a
|
|
grid frequency error of $\SI{50}{\micro\hertz}$. We validated the performance of our crystal oven solution by
|
|
benchmarking it against a GPS 1pps reference.
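
The drift figure above follows from the fact that a relative error $\varepsilon$ in the sampling clock scales every
measured frequency by the same relative amount. As a first-order sketch, with $f_0$ denoting nominal grid frequency
(symbols introduced here for illustration):
\begin{equation*}
    \Delta f_\text{err} \approx \varepsilon f_0 = 10^{-6} \cdot \SI{50}{\hertz} = \SI{50}{\micro\hertz}.
\end{equation*}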
|
|
|
|
% FIXME measurement results, spectra
|
|
|
|
\section{Grid Frequency Modulation}
|
|
|
|
Given the grid characteristics we measured using our custom waveform recorder and a model of our transmitter, we can
|
|
derive parameters for the modulation of our broadcast system. In its most basic form a transmitter for grid frequency
|
|
modulation would be a very large controllable load connected to the power grid at a suitable vantage point. A spool of
|
|
wire submerged in a body of cooling liquid such as a small lake along with a thyristor rectifier bank would likely
|
|
suffice to perform this function during occasional cybersecurity incidents. We can however decrease hardware and
|
|
maintenance investment even compared to this rather uncultivated solution by repurposing large industrial loads
|
|
as transmitters. Going through a list of energy-intensive industries in Europe\cite{ec01}, we found that an aluminium
|
|
smelter would be a good candidate. In aluminium smelting, aluminium is electrolytically extracted from alumina solution.
|
|
High-voltage mains power is transformed, rectified and fed into about 100 series-connected electrolytic cells forming a
|
|
\emph{potline}. Inside these pots alumina is dissolved in molten cryolite electrolyte at about \SI{1000}{\degreeCelsius}
|
|
and electrolysis is performed using a current of tens or hundreds of kiloamperes. The resulting pure aluminium settles at
|
|
the bottom of the cell and is tapped off for further processing.
|
|
|
|
Aluminium smelters are operated around the clock, and due to the high financial stakes their behavior under power
|
|
outages has been carefully characterized by the industry. Power outages of tens of minutes up to two hours reportedly do
|
|
not cause problems in aluminium potlines\cite{eisma01,oye01}. Recently, even techniques for intentional power modulation
|
|
without affecting cell lifetime or product quality have been developed to take advantage of variable energy
prices\cite{duessel01,eisma01}. An aluminium plant's power supply is controlled to constantly keep all smelter cells
|
|
under optimal operating conditions. Modern power supply systems employ large banks of diodes or SCRs to rectify
|
|
low-voltage AC to DC to be fed into the potline\cite{ayoub01}. Potline voltage is controlled through a combination of a
|
|
tap changer and a transductor. Individual cell voltages are controlled by changing the physical distance between anode
and cathode. In this setup, power can be modulated fully electronically. Since this system does not have any
|
|
mechanical inertia, high modulation rates can reasonably be achieved.
|
|
|
|
\subsection{Parametrizing Modulation for GFM}
|
|
|
|
Modulating $\SI{25}{\mega\watt}$ of smelter power would yield a frequency shift of $\SI{1}{\milli\hertz}$. At an RMS
|
|
frequency noise of around $\SI{10}{\milli\hertz}$ in the band around $\SI{1}{\hertz}$, this results in a challenging SNR.
|
|
% FIXME properly calculate frequency noise density, SNR
|
|
Under such conditions, the obvious choice for modulation is a spread-spectrum technique. Thus, we approached the setting
using Direct Sequence Spread Spectrum for its simple implementation and good overall performance. DSSS chip timing
should be as fast as the transmitter's physics allow to exploit the low-noise region between
$\SI{0.2}{\hertz}$ and $\SI{2.0}{\hertz}$ in the frequency noise spectrum while avoiding any of the grid's oscillation modes. Going
|
|
past $\approx\SI{2}{\hertz}$ would put strain on the receiver's frequency measurement subsystem\cite{belega01}. Using a
|
|
spread-spectrum technique allows us to reduce the effect of interference by spurious tones. In addition, spreading our
|
|
signal's energy over frequency also reduces the likelihood that we cause the grid to oscillate along any of its modes.
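
As a rough plausibility check, and not a substitute for a proper noise density calculation, the figures above can be
combined as follows. We assume a linear power-frequency characteristic, noise that is approximately white within the
signal band, and an ideal correlation receiver. The symbols $K$, $N$, $n$ and $G$ are notation introduced here for
illustration only.
\begin{align*}
    K &\approx \frac{\Delta P}{\Delta f} = \frac{\SI{25}{\mega\watt}}{\SI{1}{\milli\hertz}}
        = \SI{25}{\giga\watt\per\hertz}, \\
    \mathrm{SNR}_\text{chip} &\approx 20 \log_{10} \frac{\SI{1}{\milli\hertz}}{\SI{10}{\milli\hertz}}
        = \SI{-20}{\decibel}, \\
    G &= 10 \log_{10} N, \qquad N = 2^n - 1.
\end{align*}
For example, a sequence of $n = 5$ bit has $N = 31$ chips and an ideal despreading gain of
$G \approx \SI{14.9}{\decibel}$, while $n = 7$ bit gives $N = 127$ chips and $G \approx \SI{21.0}{\decibel}$. Under
these simplifying assumptions, despreading recovers a large part of the per-chip deficit, and the remainder has to be
covered by sequence length and modulation amplitude.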
|
|
|
|
To test our proposed approach, we wrote a proof-of-concept modulator and demodulator in Python and tested them with
data captured from our grid frequency sensor. Our simulations covered a range of values for modulation amplitude, DSSS
sequence bit depth, chip duration and detection threshold.
Figure~\ref{fig_ser_nbits} shows symbol error rate (SER) as a function of modulation amplitude with Gold sequences of
several bit depths. As can be seen, realistic modulation amplitudes are in the range around $\SI{1}{\milli\hertz}$. In
the continental European synchronous area, this corresponds to a modulation power of approximately
$\SI{25}{\mega\watt}$. Figure~\ref{fig_ser_thf} shows SER against detection threshold relative to background noise.
Figure~\ref{fig_ser_chip} shows SER against chip duration for a given fixed symbol length. As expected from our
measured grid frequency noise spectrum, performance is best for short chip durations and worsens for longer chip
durations, since shorter chip durations move our signal's bandwidth into the lower-noise region from $\SI{0.2}{\hertz}$
to $\SI{2}{\hertz}$.
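
To make the relationship between chip duration and occupied bandwidth concrete, consider the idealized case of
rectangular chips of duration $T_c$ with random signs. This is a textbook approximation and not a model of an actual
transmitter. $T_c$ and $S(f)$ are notation introduced here.
\begin{equation*}
    S(f) \propto T_c \left( \frac{\sin(\pi f T_c)}{\pi f T_c} \right)^2
\end{equation*}
The first spectral null lies at $f = 1/T_c$. For $T_c = \SI{1}{\second}$ the main lobe extends up to
$\SI{1}{\hertz}$, whereas for $T_c = \SI{5}{\second}$ it ends at $\SI{0.2}{\hertz}$, so that almost all of the signal
energy falls below the lower-noise region.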
|
|
%FIXME introduce term "chip" somewhere
|
|
|
|
\begin{figure}
    \centering
    \includegraphics[width=0.6\textwidth]{../notebooks/fig_out/dsss_gold_nbits_overview}
    \caption{Symbol Error Rate as a function of modulation amplitude for Gold sequences of several lengths.}
    \label{fig_ser_nbits}
\end{figure}
|
|
|
|
\begin{figure}
    \centering
    \hspace*{-1cm}\includegraphics[width=1.2\textwidth]{../notebooks/fig_out/dsss_thf_amplitude_5678}
    \caption{SER vs.\ amplitude and detection threshold. Detection threshold is set as a factor of background noise
    level.}
    \label{fig_ser_thf}
\end{figure}
|
|
|
|
\begin{figure}
    \centering
    \hspace*{-1cm}\includegraphics[width=1.2\textwidth]{../notebooks/fig_out/chip_duration_sensitivity_6}
    \vspace*{-1cm}
    \caption{SER vs.\ DSSS chip duration.}
    \label{fig_ser_chip}
\end{figure}
|
|
|
|
\subsection{Parametrizing a Proof-of-Concept ``Safety Reset'' System Based on GFM}
|
|
|
|
Taking these modulation parameters as a starting point, we proceeded to create a proof-of-concept smart meter emergency
reset system. On top of the modulation described in the previous paragraphs we layered simple Reed-Solomon error
correction\cite{mackay01} and some cryptography. The goal of our PoC cryptographic implementation was to allow the
sender of an emergency reset broadcast to authorize a reset command to all listening smart meters. An additional
constraint of our setting is that, due to the extremely slow communication channel, all messages should be kept as short
as possible. The solution we chose for our PoC is a simplistic hash chain using the approach from the Lamport and
Winternitz One-time Signature (OTS) schemes. Informally, the private key is a random bitstring. The public key is
generated by repeatedly applying a hash function to this key a fixed number of times. Each smart meter reset command is
then authorized by disclosing the next element of this series. Unwinding the hash chain from the public key at the end
of the chain towards the private key at its beginning, a receiver can validate the current command at each step by
checking that it is the previously unknown input of the current step of the hash chain. Replay attacks are prevented
by recording the most recent valid command. This simple scheme does not afford much functionality, but it results in
very short messages and removes the need for computationally expensive public key cryptography inside the smart meter.
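
Stated slightly more formally, the scheme described above can be sketched as follows. The notation ($H$, $k_i$, $n$,
$\ell$, $m$, $c$) is introduced here for illustration and is not taken from our implementation. Truncation of hash
outputs and any message framing are omitted.
\begin{align*}
    k_0 &\in \{0,1\}^\ell \text{ chosen uniformly at random (private key)}, \\
    k_i &= H(k_{i-1}) \quad \text{for } i = 1, \dots, n, \\
    k_\text{pub} &= k_n .
\end{align*}
The $j$-th reset command discloses $m_j = k_{n-j}$. A receiver stores the most recently accepted value $c$, initialized
to $c = k_\text{pub}$, and accepts a received value $m$ if and only if $H(m) = c$, after which it sets $c \leftarrow m$.
A replayed value fails this check because $H(m)$ no longer equals the updated $c$, and forging the next command
requires finding a preimage of $c$ under $H$. In this sketch, a chain of length $n$ authorizes at most $n$ commands.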
|
|
% FIXME add more precise/formal description of crypto
|
|
% FIXME add description of targeting/scope function?
|
|
% FIXME somewhere above descirbe entire reset system architecture????!!!
|
|
% FIXME add description of disarm message (replay protection)
|
|
|
|
\subsection{Experimental results}
|
|
|
|
\begin{figure}
    \centering
    \includegraphics[width=0.6\textwidth]{prototype.jpg}
    \caption{The completed prototype setup. The board on the left is the safety reset microcontroller. It is connected
    to the smart meter in the middle through an adapter board. The top left contains a USB hub with debug interfaces to
    the reset microcontroller. The cables on the bottom left are the debug USB cable and the \SI{3.5}{\milli\meter}
    audio cable for the simulated mains voltage input.}
    \label{fig_proto_pic}
\end{figure}
|
|
|
|
For a realistic proof of concept, we decided to implement our signal processing chain from DSSS demodulator through
error correction up to our simple cryptography layer in microcontroller firmware and to demonstrate this firmware on
actual smart meter hardware, shown in Figure~\ref{fig_proto_pic}. In our proof of concept, a safety reset controller is
connected to the main application microcontroller of a smart meter. The reset controller is tasked with listening for
authenticated reset commands on the voltage waveform and, on reception of such a command, resetting the smart meter
application controller by flashing a known-good firmware image to its memory.
|
|
|
|
The signal processing chain of our PoC is shown in Figure~\ref{fig_demo_sig_schema}. To interoperate with existing
implementations of SHA-512 and Reed-Solomon decoding, this implementation was written in the C programming language. To
demonstrate an application close to a field implementation, we chose an Easymeter \texttt{Q3DA1002} smart meter as our
reset target. This model is popular in the German market and readily available second-hand. The meter consists of three
isolated metering ASICs connected to a data logging and display PCB through infrared optical links. To demonstrate the
safety reset's firmware reset functionality, we connected our safety reset microcontroller to the Texas Instruments
\texttt{MSP430} microcontroller on the meter's display and data logging board through the JTAG debug interface that the
board's vendor had conveniently left accessible. We ported part of
\texttt{mspdebug}\footnote{\url{https://dlbeer.co.nz/mspdebug/}} to drive the meter microcontroller's JTAG interface and
wrote a piece of demonstrator code that overwrites the meter's firmware with one that displays an identifying string on
the meter's display after boot-up.
|
|
|
|
\begin{figure}
    \centering
    \includegraphics[width=\textwidth]{prototype_schema}
    \caption{The signal processing chain of our demonstrator.}
    \label{fig_demo_sig_schema}
\end{figure}
|
|
|
|
Since we did not have an aluminium smelter at hand, we decided to feed our proof-of-concept reset controller with an
emulated grid voltage sine wave from a computer's headphone jack. Where in a real application this microcontroller might
take ADC readings of input mains voltage divided down by a long resistive divider chain, we instead feed the ADC from a
$\SI{3.5}{\milli\meter}$ audio input. For operational safety, we disconnected the meter microcontroller from its
grid-referenced capacitive dropper power supply and connected it to our reset controller's debug USB power supply.
|
|
|
|
We performed several successful experiments using a signature truncated to 120 bit and a 5 bit DSSS sequence. Taking the
sign bit into account, the length of the encoded signature is 20 DSSS symbols. On top of this we used Reed-Solomon error
correction at a 2:1 ratio of data to parity symbols, inflating total message length to 30 DSSS symbols. At the chip
duration of \SI{1}{\second} that we also used in our other simulations, this equates to an overall transmission duration
of approximately \SI{15}{\minute}. To give the demodulator some time to settle and to produce more realistic conditions
of signal reception, we padded the modulated signal with unmodulated noise on both ends.
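
The message length and transmission duration quoted above can be reproduced with a short calculation. As a worked
example, we assume here that a 5 bit Gold sequence consists of $2^5 - 1 = 31$ chips and that each DSSS symbol carries
5 bit through the choice of sequence plus one sign bit. Padding is not included.
\begin{align*}
    \text{bits per symbol} &= 5 + 1 = 6, \\
    \text{signature symbols} &= 120 / 6 = 20, \\
    \text{symbols after error correction} &= 20 + 10 = 30, \\
    \text{transmission duration} &= 30 \times 31 \times \SI{1}{\second} = \SI{930}{\second} = \SI{15.5}{\minute}.
\end{align*}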
|
|
|
|
\section{Discussion}
|
|
|
|
For our proof of concept, before settling on the commercial smart meter we first tried to use an \texttt{EVM430-F6779}
smart meter evaluation kit made by Texas Instruments. This evaluation kit proved unsuitable for two main reasons.
First, it shipped with half the case missing and no cover for the terminal blocks. Because of this, some work was
required to make it electrically safe. Even after mounting it in an electrically safe manner, the safety reset
controller prototype would also have had to be galvanically isolated to not pose an electrical safety risk, since the
main MCU is not isolated from the grid and the JTAG port is also galvanically coupled. Second, the
development board is based around a specific microcontroller from TI's \texttt{MSP430} series that is incompatible with
common JTAG programmers.
|
|
|
|
Our initial assumption that a development kit would be easier to program than a commercial meter did not prove to be
true. Contrary to our expectations, the commercial meter had JTAG enabled, allowing us to easily read out its stock
firmware without either reverse-engineering vendor firmware update files or circumventing code protection measures.
The fact that its firmware was only available in compiled binary form was not much of a hindrance: it proved not
to be too complex, and we found out all we wanted to know within a few hours of digging in
Ghidra\footnote{\url{https://ghidra-sre.org/}}.
|
|
|
|
In the firmware development phase, our approach of testing every module individually (e.g.\ DSSS demodulator,
Reed-Solomon decoder, grid frequency estimation) proved to be very useful. In particular, debugging benefited greatly
from being able to run several thousand tests within seconds. In the case of our DSSS demodulator, this modular testing
and simulation architecture allowed us to simulate thousands of runs of our implementation on test data and directly
compare it to our Jupyter/Python prototype. Since we spent more time polishing our embedded C implementation, it turned
out to perform better than our Python prototype while still exhibiting the same fundamental response to changes to its
parameters. One significant bug we fixed in the embedded C version was the Python version's tendency towards incorrect
decodings even at very large amplitudes.
|
|
|
|
In accordance with our initial estimates, we did not run into any code space or computation bottlenecks from choosing
floating point emulation instead of porting our algorithms to fixed point calculations. The extremely slow sampling
rate of our system makes even heavyweight processing such as FFT or our brute force dynamic programming approach to
DSSS demodulation possible well within our performance constraints.
|
|
|
|
Since we were only building a prototype, we did not optimize firmware code size. At around \SI{64}{\kilo\byte}, the
compiled code size of our firmware implementation is slightly larger than we would like. The most heavyweight
components overall are the SHA-512 implementation from libsodium and the FFT from ARM's CMSIS signal processing library.
The SHA-512 implementation in particular has large potential for size optimization because it is highly optimized for
speed using extensive manual loop unrolling. Despite being larger than what we initially targeted, this firmware is
still small compared to the firmware space available in commercially deployed smart meters. We estimate that even
without additional optimizations, our PoC firmware is already within the range of firmware sizes that could be
implemented in a commercially viable safety reset controller.
|
|
|
|
\section{Conclusion}
\label{sec_conclusion}
|
|
|
|
In this paper we have developed an end-to-end design of a reset system to restore smart meters to a safe operating state
during an ongoing large-scale cyberattack. To allow our system to be triggered even in the middle of a cyberattack, we
have developed a broadcast data transmission system based on intentional modulation of global grid frequency. We have
shown the viability of our end-to-end design through simulations. To put these simulations on a solid foundation, we
have developed a grid frequency measurement methodology consisting of a custom-designed hardware device for electrically
safe data capture and a set of software tools to archive and process captured data. Our simulations show good behavior
of our broadcast communication system and indicate that cooperating with a large consumer such as an aluminium smelter
would be a feasible way to set up a transmitter with low hardware overhead. We have outlined a simple cryptographic
protocol ready for embedded implementation in resource-constrained systems that allows triggering a safety reset with a
response time of less than 30 minutes. We have experimentally validated our system using simulated grid frequency data
in a demonstrator setup based on a commercial microcontroller as our safety reset controller and an off-the-shelf smart
meter. Source code and electronics CAD designs are available at the public repository listed at the end of this
document.
|
|
|
|
\printbibliography[heading=bibintoc]
|
|
|
|
%%% FIXME remove appendix and work into text.
|
|
|
|
\center{
    \center{This is version \texttt{\input{version.tex}\unskip} of this paper, generated on \today. The git repository
    can be found at:}

    \center{\url{https://git.jaseg.de/safety-reset.git}}
}
|
|
\end{document}
|