\documentclass[letterpaper,twocolumn,10pt]{article}
|
|
\usepackage{usenix}
|
|
|
|
\usepackage[T1]{fontenc}
|
|
\usepackage[
|
|
backend=biber,
|
|
style=numeric,
|
|
natbib=true,
|
|
url=false,
|
|
doi=true,
|
|
eprint=false
|
|
]{biblatex}
|
|
\addbibresource{safety-reset.bib}
|
|
\usepackage{amssymb,amsmath}
|
|
\usepackage{eurosym}
|
|
\usepackage{wasysym}
|
|
|
|
\usepackage[binary-units]{siunitx}
|
|
\DeclareSIUnit{\baud}{Bd}
|
|
\DeclareSIUnit{\year}{a}
|
|
\usepackage{commath}
|
|
\usepackage{graphicx,color}
|
|
\usepackage{subcaption}
|
|
\usepackage{array}
|
|
\usepackage{hyperref}
|
|
\usepackage{enumitem}
|
|
|
|
\renewcommand{\floatpagefraction}{.8}
|
|
\newcommand{\degree}{\ensuremath{^\circ}}
|
|
\newcolumntype{P}[1]{>{\centering\arraybackslash}p{#1}}
|
|
\newcommand{\partnum}[1]{\texttt{#1}}
|
|
|
|
\begin{document}
|
|
|
|
% https://eepublicdownloads.entsoe.eu/clean-documents/pre2015/publications/entsoe/Operation_Handbook/Policy_1_Appendix%20_final.pdf
|
|
|
|
\date{}
|
|
\title{Ripples in the Pond: Transmitting Information through Grid Frequency Modulation}
|
|
\author{Jan Sebastian Götte \and Liran Katzir \and Björn Scheuermann}
|
|
%\institute{TU Darmstadt\\ Communication Networks Lab\\ \email{safetyreset@jaseg.de}
|
|
%\and Tel Aviv University\\ Faculty of Engineering\\ \email{lirankat@tau.ac.il}
|
|
%\and TU Darmstadt\\ Communication Networks Lab\\ \email{scheuermann@informatik.hu-berlin.de}}
|
|
\maketitle
|
|
%\keywords{Security, privacy and resilience in critical infrastructures \and Security and privacy in ``internet of
|
|
%things'' \and Cyber-physical systems \and Hardware security \and Network Security \and Energy systems \and Signal theory}
|
|
|
|
\begin{abstract}
|
|
Previous work has explored the scenario of an attacker compromising a large number of Smart Meters that are equipped
|
|
with remote disconnect switches, and using these remote-controllable switches to cause a large-scale outage.
|
|
Previous work focuses on attack prevention. In this paper, we will instead look at recovery after a successful
|
|
attack. To transmission system operators (TSOs), the major challenge after such a Smart Meter-triggered outage is
|
|
that the attacker will likely persist through the outage, and compromised Smart Meters will resume malicious
|
|
activity after their power is restored. In the event of such an attack, TSOs would need a way to remotely put these
|
|
compromised devices into a \emph{safe} mode of operation.
|
|
|
|
Given that public telecommunications networks including the internet, cellular networks, and LoRa base stations may
|
|
also be disrupted during a large-scale blackout, the challenging aspect of this remote \emph{Safety Reset} is the
|
|
communication channel between TSO and the smart meter. For this purpose, in this paper we propose a simple yet
|
|
effective communication channel based on modulating grid frequency by modulating the power of a connected load or
|
|
generator. Our proposed communication channel (1) requires minimal infrastructure, (2) has a reach spanning the
|
|
entire power grid and (3) is fully independent of other telecommunication networks and functions even under severe
|
|
disruption of the grid.
|
|
\end{abstract}
|
|
|
|
\section{Introduction}
|
|
|
|
With the rollout of the smart grid, the IT security of electrical infrastructure has attracted increased attention in
|
|
recent years. Smart Grid security has two major components: the security of central SCADA systems, and the security
|
|
of equipment at the consumer premises such as smart meters and IoT devices. While there is previous work on both sides,
|
|
their interactions have not yet received much attention.
|
|
|
|
In this paper, we consider the previously proposed scenario where a large number of compromised consumer devices is used
|
|
alone or in conjunction with an attack on the grid's central SCADA systems to destabilize the grid by rapidly modulating
|
|
the total connected load. Previous work considered compromised smart meters with integrated remote disconnect switches
|
|
as likely candidates for such an attack, but the same attack can also be performed using compromised IoT devices. Such
|
|
attacks are hard to mitigate, and existing literature focuses on hardening device firmware to prevent compromise.
|
|
Despite the infeasibility of perfect firmware security, there is little research on \emph{post-compromise} mitigation
|
|
approaches. A core issue with post-attack mitigation is that the devices' normal network connection may not work due to
|
|
the attack and as such an out-of-band communication channel is necessary.
|
|
|
|
We propose a \emph{safety reset} controller that is controlled through a novel, resilient, grid-wide powerline
|
|
communication technique. Our safety reset controller can be fitted into any Smart Meter or IoT device. Its purpose is to
|
|
await an out-of-band command to put the device into a safe state (e.g. \emph{relay on} or \emph{light on}) that
|
|
interrupts attacker control over the device. The safety reset controller is separated from the system's main application
|
|
controller and does not have any conventional network connections to reduce attack surface and cost.
|
|
|
|
We propose a resilient grid-wide broadcast channel based on modulating grid frequency. This channel can be operated by
|
|
transmission system operators (TSOs) even during black-start recovery procedures and in this situation bridges the gap
|
|
between the TSO's private network and the consumer devices. To demonstrate our proposed channel, we have implemented a
|
|
system that transmits error-corrected and cryptographically secured commands.
|
|
|
|
Our approach differs from traditional Powerline Communication (PLC) systems in that it reaches every device within one
|
|
synchronous area as the signal is embedded into the fundamental grid frequency. Traditional PLC uses a superimposed
|
|
voltage, which is quickly attenuated across long distances.
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\includegraphics[width=0.4\textwidth]{flowchart}
|
|
\caption{Structural overview of our concept. 1 - Government authority or utility operations center. 2 - Emergency
|
|
radio link. 3 - Aluminium smelter. 4 - Electrical grid. 5 - Target smart meter.}
|
|
\label{fig_intro_flowchart}
|
|
\end{figure}
|
|
|
|
Figure~\ref{fig_intro_flowchart} shows an overview of our concept. Two scenarios for its application are before or
|
|
during a cyberattack, to stop an attack on the electrical grid in its tracks, and after an attack while power is being
|
|
restored to prevent a repeated attack. In both scenarios, our concept is fully independent of all public communication
|
|
networks (such as the Internet or mobile networks) as well as broadcast systems (such as cable television or terrestrial
|
|
broadcast radio). A grid frequency-based system can function as long as power is still available, or as soon as power is
|
|
restored after the attack. One powerful function this allows is ``flushing out'' an attacker from compromised smart
|
|
meters after an attack, before restoring smart meter internet connectivity.
|
|
|
|
Using simulations we have determined that control of a $\SI{25}{\mega\watt}$ load such as a large aluminium smelter,
|
|
load bank or photovoltaic farm would allow for the transmission of a cryptographically secured \emph{reset} signal within
|
|
$15$ minutes. We have designed and constructed a proof-of-concept prototype receiver that demonstrates the feasibility
|
|
of decoding such signals on a resource-constrained microcontroller.
|
|
|
|
\subsection{Motivation}
|
|
|
|
Consumer devices are increasingly becoming \emph{smart}. Large numbers of IoT devices are connected through the public
|
|
internet, and in several countries internet-connected Smart Meters can disconnect entire households from the grid in
|
|
case of unpaid bills. The increasing proliferation of smart devices on the consumer side presents an opportunity to grid
|
|
operators, who rely on forecasts for the cost-optimized control of generation and power flow. The core of the
|
|
\emph{Smart Grid} vision is that utilities can now gather detailed data for more accurate consumption forecasts, and in
|
|
some cases can even adjust parameters of large devices like water heaters to smooth out load spikes.
|
|
|
|
However, this increased degree of visibility and control comes with an increased IT security risk. In this paper we
|
|
focus on scenarios where an attacker compromises a large number of grid-connected remote-controllable devices. This may
|
|
be simple smart home devices such as IoT light bulbs, but it may also include Smart Meters that are outfitted with a
|
|
remote disconnect switch as is common in some countries. By rapidly switching large numbers of such devices in a
|
|
coordinated manner, the attacker has the opportunity to de-stabilize the electrical grid. % FIXME citation
|
|
|
|
Previous work on IoT and Smart Grid security has focused on the prevention of attacks through firmware security measures.
While research on prevention is undoubtedly important, we estimate that its practical impact will be limited by the vast
|
|
diversity of implementations found in the field combined with the slow update cycles inherent to non-functional firmware
|
|
enhancements for consumer devices. We predict that it would be a Sisyphean task to secure sufficiently many devices
|
|
to deny an attacker the critical mass needed to cause trouble. For this reason, in this paper we focus on recovery after
|
|
an attack.
|
|
|
|
\subsection{Black-start recovery}
|
|
|
|
The recovery from a large-scale power outage is a complex operational challenge. Large outages are caused by cascading
|
|
failures. Since all consumers and producers that are connected to the electrical grid are physically coupled through the
|
|
electromotive force, a fault in one part of the grid affects all devices connected across the grid. To function, the
|
|
grid relies on a delicate balance between electricity generation, transmission and consumption. When this balance is
|
|
disturbed, cascading failures can occur. A transmission line shutting off can lead other, nearby lines to overload and
|
|
shut off. Due to the electromechanical coupling of all machines connected to the grid, a generator or consumer suddenly
|
|
shutting off causes a transient in the grid's frequency. If the frequency goes too far out of bounds, protection devices
|
|
take power plants and large industrial loads offline.
|
|
|
|
The recovery from a large-scale outage requires the grid's operators to bring generators and loads back online one by
|
|
one while continuously maintaining balance between generation and consumption to avoid their protection devices shutting
|
|
them down again. To coordinate this process, transmission system operators cannot rely on the public internet or
|
|
cellular networks, as they may not work during a large-scale power outage. Instead, they maintain private communication
|
|
infrastructure using dedicated lines rented from telecommunications providers, fibers run along transmission lines, and
|
|
dedicated radio links.
|
|
|
|
To start from a complete outage, first a number of \emph{black start}-capable power stations that can start by
|
|
themselves without any external power are brought online. With their help, other power stations and consumers are
|
|
gradually brought online until a part of the grid has been restored to nominal operation. This process can be performed
|
|
simultaneously in different parts of the grid. After these \emph{islands} have been restored, they can then be joined to
|
|
restore the grid to its normal state.
|
|
|
|
\subsection{Contents}
|
|
|
|
Starting from a high level architecture, we have carried out simulations of our concept's performance under real-world
|
|
conditions. Based on these simulations we implemented an end-to-end prototype of our proposed safety reset controller as
|
|
part of a realistic smart meter demonstrator. Finally, we experimentally validated our results and we will conclude with
|
|
an outline of further steps towards a practical implementation.
|
|
|
|
This work contains the following contributions:
|
|
\begin{enumerate}[topsep=4pt]
|
|
\item We introduce Grid Frequency Modulation (GFM) as a communication primitive. % FIXME done before in that one paper
|
|
\item We elaborate the fundamental physics underlying GFM and theorize on the constraints of a practical
|
|
implementation.
|
|
\item We design a communication system based on GFM.
|
|
\item We carry out extensive simulations of our system to determine its performance characteristics.
|
|
\end{enumerate}
|
|
|
|
\subsection{Notation}
|
|
|
|
To a computer scientist there is one confusing aspect to the theory of grid frequency modulation. GFM can be seen as a
|
|
frequency modulation (FM) with a baseband signal in the band below approximately $f_m = \SI{5}{\hertz}$ that is
|
|
modulated on top of a carrier signal at $f_c = \SI{50}{\hertz}$ in case of the European electrical grid. The frequency
|
|
deviation $f_\Delta$ of the modulated carrier from its nominal value of $f_c$ is very small at only a few
millihertz.
|
|
|
|
When grid frequency is measured by first digitizing the mains voltage waveform, then de-modulating digitally, the FM's
|
|
SNR is very high and is dominated by the ADC's quantization noise and nearby mains voltage noise sources such as
|
|
resistive droop due to large inrush current of nearby machines.
|
|
|
|
Note that the carrier signal at $f_c$ and the modulation signal at $f_m$ both have the unit Hertz. To disambiguate
|
|
them, in this paper we will use \textbf{bold} letters to refer to the carrier waveform $\mathbf{U}$ or frequency
|
|
$\mathbf{f_c}$ as well as its deviation $\mathbf{f_\Delta}$, and we will use normal weight for the actual modulation
|
|
signal and its properties such as $f_m$.
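
As an illustration of this notation, the modulated mains voltage can be sketched with the textbook FM signal model. The amplitude $U_0$ and the normalized baseband signal $m(t)$ with $|m(t)| \le 1$ are assumptions introduced here for illustration only:
\begin{equation*}
    \mathbf{U}(t) = U_0 \cos\left( 2\pi \int_0^t \left[ \mathbf{f_c} + \mathbf{f_\Delta}\, m(\tau) \right] \mathrm{d}\tau \right)
\end{equation*}
Here, $m(t)$ is band-limited to approximately $f_m$, while $\mathbf{f_\Delta} \ll \mathbf{f_c}$ sets how far the instantaneous grid frequency moves away from its nominal value.
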
|
|
|
|
\section{Related work}
|
|
\label{sec_related_work}
|
|
|
|
Previous work has analyzed Smart Grid security from numerous angles and made several suggestions towards its
|
|
improvement. Apart from the critical location that Smart Grid devices occupy, they are computer systems like many
|
|
others. Thus, for IT security purposes the Smart Grid is simply an aggregation of embedded control and measurement
|
|
devices that are part of a large control system. These devices share the same security concerns that apply to embedded
|
|
systems in general.
|
|
|
|
\subsection{Smart Meter Security}
|
|
|
|
Where programmers have been struggling for decades now with issues such as input validation~\cite{leveson01}, the same
|
|
potential issue raises security concerns in smart grid scenarios as well~\cite{mo01, lee01}. However, in the smart grid we
|
|
have two complicating factors present: many components are embedded systems, and as such inherently hard to update.
|
|
Also, the smart grid and its control algorithms act as a large (partially) distributed system making problems such as
|
|
input validation or authentication harder~\cite{blaze01} and adding a host of distributed systems problems on
|
|
top~\cite{lamport01}.
|
|
|
|
Given that the electrical grid is essential infrastructure, these issues are significant. Attacks on the electrical grid
|
|
may have grave consequences~\cite{anderson01,lee01} while the long replacement cycles of various components make the
|
|
system slow to adapt. Thus, components for the smart grid need to be built to a higher standard of security than e.g.\
|
|
IoT devices to live up to well-funded attackers decades down the road. Another implication of their long service life
|
|
is that their agility w.r.t.\ post-hoc mitigations through firmware updates is limited.
|
|
|
|
%Another fundamental challenge in smart grid implementations is the central role of smart electricity meters in the
|
|
%smart grid ecosystem. Smart meters are used both for highly-granular load measurement and in some countries also for
|
|
%load switching~\cite{zheng01}.
|
|
Smart electricity meters are effectively consumer devices built down to a certain price point. The small market served
|
|
by a single smart meter implementation limits how much effort a vendor can spend on firmware security. Landis+Gyr, a
|
|
large manufacturer that makes most of its revenue from utility meters, states in its 2019 annual report that it
invested \SI{36}{\percent} of its total R\&D budget in embedded software while spending only \SI{24}{\percent} on
|
|
hardware R\&D~\cite{landisgyr01,landisgyr02}, indicating significant tension between firmware security and the vendor's
|
|
bottom line.
|
|
|
|
% FIXME more sources!
|
|
|
|
\subsection{The state of the art in embedded security}
|
|
|
|
Embedded software security generally is much harder than security of higher-level systems. The primary two factors
|
|
affecting this are that on one hand, embedded devices usually run highly customized firmware that (often by necessity)
|
|
is rarely updated. On the other hand, embedded devices often lack advanced security mechanisms such as memory management
|
|
units that are found in most higher-power devices. Even well-funded companies continue to have trouble securing their
|
|
embedded systems. A spectacular example of this difficulty is the 2019 flaw in Apple's iPhone SoC first-stage ROM
|
|
bootloader that allows for the full compromise of any iPhone before the iPhone X given physical access to the
|
|
device~\cite{heise01}. iPhone 8, one of the affected models, was still being manufactured and sold by Apple until April
|
|
2020. In another instance in 2016, researchers found multiple flaws in Samsung's implementation of ARM TrustZone
|
|
``secure world'' firmware that Samsung used for their own mobile phone SoCs. The flaws they found were both severe
|
|
architectural flaws such as secret user input being passed through untrusted userspace processes without any protection
|
|
as well as shocking cryptographic flaws such as
|
|
CVE-2016-1919\footnote{\url{http://cve.circl.lu/cve/CVE-2016-1919}}~\cite{kanonov01}. And Samsung is not the only large
|
|
multinational corporation having trouble securing their secure firmware implementation. In 2014 researchers found an
|
|
embarrassing integer overflow flaw in the low-level code handling untrusted input in Qualcomm's QSEE
|
|
firmware~\cite{rosenberg01}. For an overview of ARM TrustZone including a survey of academic work and past security
|
|
vulnerabilities of TrustZone-based firmware see~\cite{pinto01}.
|
|
|
|
If even companies whose R\&D budgets for mass-market consumer devices rival some countries' national budgets
have trouble securing their embedded software stacks, what is a much smaller smart meter manufacturer
|
|
to do? Especially if national standards mandate complex protocols such as TLS that are tricky to implement
|
|
correctly~\cite{georgiev01}, this manufacturer will be short on options to secure their product.
|
|
|
|
\subsection{Attack surface in the smart grid}
|
|
|
|
From the incidents we outlined in the previous paragraphs we conclude that in smart metering technology, market
|
|
incentives do not currently provide the conditions for a level of device security that will reliably last the coming
|
|
decades. Considering this tension, in this section we examine the cyberphysical risks that arise from attacks on the
|
|
smart grid in the first place. These risks arise at three different infrastructure levels.
|
|
|
|
The first level is that of attacks on centralized control systems. This type of attack is often cited in popular
|
|
discourse and to our knowledge is the only type of attack against an electric grid that has ever been carried out in
|
|
practice at scale~\cite{lee01}. Despite their severity, these attacks do not pose a strictly \emph{scientific} challenge
|
|
since they are generic to any industrial control system. Their causes and countermeasures are generally well-understood
|
|
and the hardest challenge in their prevention is likely to be budgetary constraints.
|
|
|
|
Beyond the centralized control systems, the next target for an attacker may be the communication links between those
|
|
control systems and other smart grid components. While in some countries such as Italy special-purpose systems such as
|
|
PLC are common~\cite{ec03}, overall, IP-based technologies have proliferated according to the larger trend towards
|
|
IP-based communications. This proliferation of IP links brings along the possibility for the application of generic
|
|
network security measures from the IP world to the smart grid domain. In this way, a standardized, IP-based protocol
|
|
stack unlocks decades of network security improvements at little cost.
|
|
|
|
Beyond these layers towards the core of the smart grid's control infrastructure, an attacker might also corrupt the
|
|
network from the edges and target the endpoint devices themselves. The large-scale deployment of networked smart meters
|
|
creates an environment that is favorable to such attacks.
|
|
% FIXME cite RECESSIM landis+gyr protocol hacking wiki/youtube
|
|
|
|
\subsection{Cyberphysical threats in the smart grid}
|
|
|
|
Assuming that an attacker has compromised devices on any of these levels of smart grid infrastructure, what could they
|
|
do with their newly gained power? The obvious action would be to switch off everything. Of all scenarios,
this is both the most likely in practice (it is exactly what happened in the Russian cyberattacks on the Ukrainian
grid~\cite{lee01}) and the easiest to mitigate, since the vulnerable components are few and centralized.
Mitigations include the installation of fail-safes as well as a defense-in-depth approach to hardening the grid's
|
|
cyber infrastructure.
|
|
|
|
Another possible action for an attacker would be to forge energy measurements in an attempt to cause financial mayhem.
|
|
Both individual consumers as well as the utility could be targeted by such an attack. While such an attack might have
|
|
localized success, larger-scale discrepancies will likely quickly be caught by monitoring systems. For example, if a
|
|
large number of meters in an area systematically under- or over-reported their energy readings, meter readings across
|
|
the affected area would no longer add up with those of monitoring devices in other locations in the transmission and
|
|
distribution grid.
|
|
|
|
In some countries, smart meter functionality goes beyond mere monitoring devices and also includes remotely controlled
|
|
switches. There are two types of these switches: switches to support \emph{Demand-Side Management} (DSM) and cut-off
switches that are used to punish defaulting customers. Demand-Side Management is when a grid operator can remotely
|
|
control the timing of large, non-time-critical loads on the customer's premises~\cite{dzung01}. A typical example of this
|
|
is a customer using an electric water heater: The heater is outfitted with a large hot water storage tank and is
|
|
hooked up to the utility's DSM system. The customer does not care when exactly their water is heated as long
|
|
as there is enough of it, and the utility offers them cheaper rates for the electricity used for heating in exchange for
|
|
control over its precise timing. The utility uses this control to even out peaks in the consumption/production
|
|
imbalance, remotely enabling DSM systems during off-peak times and disabling them during peak hours. In contrast to
|
|
DSM, cut-off switches are switches placed in between the grid and the entire customer's household such that the utility
|
|
can disconnect non-paying customers without incurring the expense of sending a technician to the customer's premises.
|
|
Unlike DSM systems, cut-off switches are not opt-in~\cite{anderson01,temple01}. An attack that uses cut-off switches
|
|
would obviously immediately cause severe mayhem. Attacks on DSM may have more limited immediate impact as affected
|
|
consumers may not notice an interruption for several hours.
|
|
|
|
Instead of switching off loads outright, an attack employing DSM switches (and potentially also cut-off switches) could
|
|
choose to target the grid's stability. By synchronizing many compromised smart meters to switch on and off a large
|
|
load capacity, an attacker might cause the entire electrical grid to oscillate~\cite{kosut01,wu01,kim01}. As a large
|
|
system of coupled mechanical systems, the electrical grid exhibits a complex frequency-domain behavior. Resonance
|
|
effects, colloquially called ``modes'', are well-studied in power system
|
|
engineering~\cite{rogers01,grebe01,entsoe01,crastan03}. As they can cause issues even under normal operating conditions,
|
|
a large effort is invested in damping these resonances. However, fully eliminating them under changing load conditions
|
|
may not be achievable.
|
|
|
|
\subsection{Communication Channels on the Grid}
|
|
|
|
A core part of intervening in any such cyberattack is the ability to communicate remedial actions to the devices
under attack. There are a number of well-established technologies for communication on or along power lines. We can
|
|
distinguish three basic system categories: systems using separate wires (such as DSL over landline telephone wiring),
|
|
wireless radio systems (such as LTE) and \emph{Power Line Communication} (PLC) systems that reuse the existing mains
|
|
wiring and superimpose data transmissions onto the \SI{50}{\hertz} mains sine wave~\cite{gungor01,kabalci01}.
|
|
|
|
During a large-scale cyberattack, availability of internet and cellular connectivity cannot be relied upon. An attacker
|
|
may already have disabled such systems in a separate attack, or they may go down along with parts of the electrical
|
|
grid. Traditional powerline communication systems or a utility's proprietary wireless systems would work, but at a range
of no more than several tens of kilometers, reaching all meters in a country would require a large upfront infrastructure
|
|
investment.
|
|
|
|
\section{Grid Frequency as a Communication Channel}
|
|
|
|
We propose to approach the problem of broadcasting an emergency signal to all smart meters within a synchronous area by
|
|
using grid frequency as a communication channel. Despite the technological complexity of the grid, the physics
|
|
underlying its response to changes in load and generation is surprisingly simple. Individual machines (loads and
|
|
generators) can be approximated by a small number of differential equations and the entire grid can be modelled by
|
|
aggregating these approximations into a large system of nonlinear differential equations. As a consequence, small signal
|
|
changes in generation/consumption power balance cause an approximately proportional change in
|
|
frequency~\cite{kundur01,crastan03,entsoe02,entsoe04}. This \emph{Power Frequency Characteristic} is about
|
|
\SI{25}{\giga\watt\per\hertz} for the continental European synchronous area according to European electricity grid
|
|
authority ENTSO-E.
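
As a rough worked example, assume a steady-state, small-signal response and write $K$ for the power frequency characteristic and $\Delta P$ for the modulated power (both symbols are introduced here for illustration). A load step of $\SI{25}{\mega\watt}$ would then shift grid frequency by approximately
\begin{equation*}
    \mathbf{f_\Delta} \approx \frac{\Delta P}{K}
        = \frac{\SI{25}{\mega\watt}}{\SI{25}{\giga\watt\per\hertz}}
        = \SI{1}{\milli\hertz},
\end{equation*}
which ignores the dynamic response of primary control and of the grid's oscillation modes.
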
|
|
|
|
If we modulate the power consumption of a large load such as a multi-megawatt aluminium smelter, this modulation will
|
|
result in a small change in frequency according to this characteristic. As long as we stay within the operational limits
|
|
set by ENTSO-E~\cite{entsoe02,entsoe03}, this change will not degrade the operation of other parts of the grid. The
|
|
advantages of grid frequency modulation are the fact that a single transmitter can cover an entire synchronous area as
|
|
well as low receiver hardware complexity.
|
|
|
|
To the best of the authors' knowledge, grid frequency modulation has only ever been proposed as a communication channel
|
|
at very small scales in microgrids before~\cite{urtasun01} and has not yet been considered for large-scale application.
|
|
|
|
Compared to traditional channels such as DSL, LTE or LoRaWAN, grid frequency as a communication channel has a large
resiliency advantage: If there is power, a grid frequency modulation system is operational. Both DSL and LTE systems not
only require power but also require large amounts of centralized infrastructure to operate. Mesh networks such as
LoRaWAN can cover short distances up to $\SI{20}{\kilo\meter}$ without requiring infrastructure to be available, but for
longer distances LoRaWAN relies on the public internet for its network backbone. Additionally, systems such as DSL, LTE
and LoRaWAN are built around a point-to-point communication model and usually do not support a generic broadcast
|
|
primitive. During times when a large number of devices must be reached simultaneously this can lead to congestion of
|
|
local cellular towers or gateways.
|
|
Therefore, during an ongoing cyberattack, grid frequency is promising as a communication channel as only a single
|
|
transmitter facility must be operational for it to function, and this single transmitter can reach all connected devices
|
|
simultaneously. After a power outage, it can function as soon as electrical power is restored, even while the public
|
|
internet and mobile networks are still offline and it is unaffected by cyberattacks that target telecommunication
|
|
networks.
|
|
|
|
\subsection{Characterizing Grid Frequency}
|
|
\label{grid-freq-characterization}
|
|
|
|
In utility SCADA systems, Phasor Measurement Units (PMUs, also called \emph{synchrophasors}) are used to precisely
|
|
measure grid frequency among other parameters. This is a complicated task since a PMU has to make fast and precise
|
|
measurements given a distorted input signal. Details on the inner workings of commercial phasor measurement units are
|
|
scarce but there is a large amount of academic research on measurement
|
|
algorithms~\cite{narduzzi01,derviskadic01,belega01}.
|
|
|
|
In our application, we do not need the same level of precision. For the sake of simplicity, we use the universal
|
|
frequency estimation approach of Gasior and Gonzalez~\cite{gasior01}. In this algorithm, the windowed input signal is
|
|
processed using a Discrete Fourier Transform (DFT), then the signal's fundamental frequency is interpolated by fitting a
|
|
wavelet to the largest peak in the DFT result. The bias parameter of this curve fit is an accurate estimation of the
|
|
signal's fundamental frequency. This algorithm is similar to the simpler interpolated DFT algorithm referenced by phasor
|
|
measurement literature~\cite{borkowski01}.
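
A minimal sketch of this kind of peak interpolation is given below. It assumes a Gaussian-shaped spectral peak and uses a closed-form three-point fit in place of a general curve fit, so it illustrates the principle and is not necessarily the exact procedure used in our implementation. The symbols $x[n]$ (input samples), $w[n]$ (window), $N$ (window length), $f_s$ (sampling rate) and $k_0$ (index of the largest DFT bin) are introduced here for illustration:
\begin{align*}
    X[k] &= \sum_{n=0}^{N-1} w[n]\, x[n]\, e^{-2\pi j k n / N}, \qquad
    k_0 = \operatorname*{arg\,max}_k \left| X[k] \right| \\
    \delta &= \frac{\ln\left( \left|X[k_0+1]\right| / \left|X[k_0-1]\right| \right)}
                  {2 \ln\left( \left|X[k_0]\right|^2 / \left( \left|X[k_0-1]\right| \left|X[k_0+1]\right| \right) \right)} \\
    \hat{f} &= \left( k_0 + \delta \right) \frac{f_s}{N}
\end{align*}
The fractional bin offset $\delta$ lies between $-\frac{1}{2}$ and $\frac{1}{2}$ and refines the coarse DFT bin spacing of $f_s / N$.
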
|
|
|
|
To collect ground truth measurements for our analysis of grid frequency as a communication channel, we developed a
|
|
device to safely record mains voltage waveforms. Our system consists of an \texttt{STM32F030F4P6} ARM Cortex M0
|
|
microcontroller that records mains voltage using its internal 12-bit ADC and transmits measured values through a
|
|
galvanically isolated USB/serial bridge to a host computer. We derive our system's sampling clock from a crystal oven to
|
|
avoid frequency measurement noise due to thermal drift of a regular crystal: \SI{1}{ppm} of crystal drift would cause a
|
|
grid frequency error of $\SI{50}{\micro\hertz}$. We compared our oven-stabilized clock against a GPS 1 pps reference and
|
|
found that over a time span of 20 minutes both stayed stable within 5 ppb of each other, which corresponds to the drift
|
|
specification of a typical crystal oven.
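
The sensitivity quoted above follows directly from the nominal frequency. Writing $\epsilon$ for the relative error of the sampling clock (a symbol introduced here for illustration), the apparent grid frequency error is
\begin{align*}
    \Delta\mathbf{f} &= \epsilon\, \mathbf{f_c} = 10^{-6} \cdot \SI{50}{\hertz} = \SI{50}{\micro\hertz}
        && \text{for \SI{1}{ppm} of drift,} \\
    \Delta\mathbf{f} &= 5 \cdot 10^{-9} \cdot \SI{50}{\hertz} = \SI{0.25}{\micro\hertz}
        && \text{for the observed 5 ppb,}
\end{align*}
so a clock that is stable to within 5 ppb contributes an error several orders of magnitude below a modulation deviation of a few millihertz.
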
|
|
|
|
\begin{figure}
    \centering
    \includegraphics[width=0.8\textwidth]{../notebooks/fig_out/freq_meas_spectrum}
    \caption{The spectrum of grid frequency variations measured over a two-day timespan. The raw spectrum is shown in
    gray, and a smoothed spectrum is shown in red. The blue line is inversely proportional to frequency and illustrates
    the $1/f$ nature of the spectrum. Distinctive peaks in the spectrum are marked with red crosses, and their locations
    are given on the bottom of the diagram.}
    \label{fig_freq_spec}
\end{figure}

A number of effects can be seen in our measurement results in Figure~\ref{fig_freq_spec}. Across the frequency range, we
observe broad $1/f$ noise. Below a period of $\SI{10}{\second}$, this $1/f$ noise dips to a flat noise floor. We
presume that this low-noise region is caused by the self-regulating effect of loads. %FIXME citation
Above a $\SI{10}{\second}$ period, primary control is activated and thus the $1/f$ noise we observe is the result of the
interaction between primary control and consumer demand. On top of this $1/f$ behavior, the spectrum shows several sharp
peaks at time intervals with a ``round'' number such as $\SI{10}{\second}$, $\SI{60}{\second}$ or multiples of
$\SI{300}{\second}$. These peaks are due to loads turning on or off depending on wall-clock time. Besides the narrow
peaks caused by this effect, we can also observe two wider bumps at $\SI{6.3}{\second}$ and $\SI{3.9}{\second}$. These
bumps closely correlate with the continental European synchronous area's oscillation modes at $\SI{0.15}{\hertz}$
(east-west) and $\SI{0.25}{\hertz}$ (north-south)~\cite{grebe01}.
% FIXME measurement results

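To make the comparison between the two bumps and the oscillation modes explicit, the observed periods can be converted
to frequencies via $f = 1/T$:
\begin{align*}
    \frac{1}{\SI{6.3}{\second}} &\approx \SI{0.16}{\hertz} & &\text{(compare \SI{0.15}{\hertz}, east-west)} \\
    \frac{1}{\SI{3.9}{\second}} &\approx \SI{0.26}{\hertz} & &\text{(compare \SI{0.25}{\hertz}, north-south)}
\end{align*}
Both bumps thus lie within about $\SI{0.01}{\hertz}$ of the mode frequencies given in the literature.
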
\section{Grid Frequency Modulation}

In its most basic form, a transmitter for grid frequency modulation would be a very large controllable load located
centrally within the grid. A coil of wire submerged in a body of cooling liquid such as a small lake, along with a
thyristor rectifier bank, would likely suffice. We can, however, decrease hardware and maintenance investment even
compared to this rather crude solution by repurposing large industrial loads as transmitters. Going through a list of
energy-intensive industries in Europe~\cite{ec01}, we found that an aluminium smelter would be a good candidate. In
aluminium smelting, aluminium is electrolytically extracted from dissolved alumina. High-voltage mains power is
transformed, rectified and fed into about 100 series-connected electrolytic cells forming a \emph{potline}. Inside these
pots, alumina is dissolved in molten cryolite electrolyte at about \SI{1000}{\degreeCelsius} and electrolysis is
performed using a current of tens to hundreds of kiloamperes. The resulting pure aluminium settles at the bottom of the
cell and is tapped off for further processing.

Aluminium smelters are operated around the clock, and due to the high financial stakes their behavior under power
outages has been carefully characterized by the industry. Power outages of tens of minutes up to two hours reportedly do
not cause problems in aluminium potlines~\cite{eisma01,oye01}. Recently, even techniques for intentional power modulation
without affecting cell lifetime or product quality have been developed to take advantage of variable energy
prices~\cite{duessel01,eisma01,depree01}. An aluminium plant's power supply is controlled to constantly keep all
smelter cells under optimal operating conditions. Modern power supply systems employ large banks of diodes or SCRs to
rectify low-voltage AC to DC to be fed into the potline~\cite{ayoub01}. Potline voltage is controlled through a
combination of a tap changer and a transductor. Individual cell voltages are controlled by changing the physical
distance between anode and cathode. In this setup, power can be modulated fully electronically. Since this
system does not have any mechanical inertia, high modulation rates can reasonably be achieved.

In~\cite{depree01}, the authors describe a setup where a large aluminium smelter in continental Europe is used as
primary control reserve for frequency \emph{regulation}. In this setup, a rise time of $\SI{15}{\second}$ was achieved,
which meets the $\SI{30}{\second}$ requirement posed by local standards for primary control. In their conclusion, the
authors note that for their system, an energy storage capacity of $\SI{7.7}{\giga\watt\hour}$ is possible if all plants
of a single operator are used. Given the maximum modulation depth of $\SI{100}{\percent}$ for up to one hour that is
mentioned by the authors, this results in an effective modulation power of $\SI{7.7}{\giga\watt}$. Over a longer
timespan of $\SI{48}{\hour}$, they have demonstrated a $\SI{33}{\percent}$ modulation depth, which would correspond to
a modulation power of $\SI{2.5}{\giga\watt}$.

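The two modulation power figures can be reproduced from the cited storage capacity. This sketch assumes that the stored
energy is drawn evenly over the stated time span and that modulation power scales linearly with modulation depth; the
symbols $E$, $t$ and $P_\text{mod}$ are introduced here for illustration.
\begin{align*}
    P_\text{mod}(\SI{100}{\percent}) &= \frac{E}{t} = \frac{\SI{7.7}{\giga\watt\hour}}{\SI{1}{\hour}}
        = \SI{7.7}{\giga\watt} \\
    P_\text{mod}(\SI{33}{\percent}) &\approx 0.33 \cdot \SI{7.7}{\giga\watt} \approx \SI{2.5}{\giga\watt}
\end{align*}
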
From this brief literature review, we conclude that modulating part of an aluminium smelter's power consumption
is most likely possible without significant production impact and at low infrastructure cost (such as for shell heat
exchangers as used in~\cite{depree01}). Aluminium smelters are connected to the grid in such a way that they do not pose
a danger to other nearby consumers when they turn off or on parts of the plant, as this is commonplace during routine
maintenance activities. They are very large consumers of electrical power, but they are still small when seen in
relation to the entire grid.

\subsection{Parametrizing Modulation for GFM}

Given the grid characteristics we measured using our custom waveform recorder and using a model of our transmitter, we
can derive parameters for the modulation of our broadcast system. Modulating $\SI{25}{\mega\watt}$ of smelter power
would yield a frequency shift of $\SI{1}{\milli\hertz}$. At an RMS frequency noise of around $\SI{10}{\milli\hertz}$ in
the band around $\SI{1}{\hertz}$, this results in a challenging signal-to-noise ratio (SNR). A second layer of
modulation yielding some modulation gain is necessary to achieve sufficient overall SNR.

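To put a number on this, the raw SNR can be estimated by treating the modulation amplitude and the RMS noise as signal
and noise amplitudes. This is a rough sketch that ignores the spectral shape of the noise; the symbols $A_\text{mod}$
and $\sigma_f$ are introduced here for illustration.
\[
    \text{SNR} \approx 20 \log_{10} \frac{A_\text{mod}}{\sigma_f}
               = 20 \log_{10} \frac{\SI{1}{\milli\hertz}}{\SI{10}{\milli\hertz}}
               = \SI{-20}{\decibel}
\]
Assuming a linear response, the same figures imply a proportionality of roughly $\SI{25}{\giga\watt\per\hertz}$ between
modulated power and the resulting frequency shift.
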
The grid's frequency noise has significant localized peaks that might interfere with this modulation. Further
complicating things are the oscillation modes. A GFM system must be designed to avoid exciting these modes. However,
since these modes are not static, a modulation method that is designed around a specific assumption of their location
would not be future-proof. Given these concerns, the optimal second-level modulation technique for GFM is a
spread-spectrum technique. By spreading signal energy throughout a wide band, the impact of local noise spikes is
minimized, and the risk of mode excitation is reduced since spread-spectrum techniques minimize energy in any particular
sub-band.

In this paper, we chose to perform simulations using Direct Sequence Spread Spectrum (DSSS) for its simple
implementation and good overall performance. In DSSS, each data symbol is transmitted as a pseudorandom sequence of
short pulses called \emph{chips}. DSSS chip timing should be as fast as the transmitter's physics allow to exploit the
low-noise region between $\SI{0.2}{\hertz}$ and $\SI{2.0}{\hertz}$ in Figure~\ref{fig_freq_spec}. Going past
$\approx\SI{2}{\hertz}$ would complicate frequency measurement at the receiver side.

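The modulation gain obtainable from DSSS can be estimated from the sequence length. The following sketch assumes
idealized conditions (white noise and perfect synchronization) that the measured $1/f$-shaped noise does not strictly
satisfy, and it assumes that a Gold sequence of $n$ bits has a length of $2^n - 1$ chips. The symbols $n$, $L$ and $G$
are introduced here for illustration.
\begin{align*}
    L &= 2^n - 1, & G &\approx 10 \log_{10} L \\
    n = 5: \quad L &= 31, & G &\approx \SI{14.9}{\decibel} \\
    n = 6: \quad L &= 63, & G &\approx \SI{18.0}{\decibel}
\end{align*}
Each additional bit roughly doubles the sequence length and hence adds about $\SI{3}{\decibel}$ of gain, at the cost of
doubling the time needed to transmit one symbol.
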
We simulated a proof-of-concept modulator and demodulator using data captured from our grid frequency sensor. Our
simulations covered a range of parameters in modulation amplitude, DSSS sequence bit depth, chip duration and detection
threshold. Figure~\ref{fig_ser_nbits} shows symbol error rate (SER) as a function of modulation amplitude with Gold
sequences of several bit depths. As can be seen, realistic modulation amplitudes are in the range around
$\SI{1}{\milli\hertz}$. In the continental European synchronous area, this corresponds to a modulation power of
approximately $\SI{25}{\mega\watt}$. Figure~\ref{fig_ser_thf} shows SER against detection threshold relative to
background noise. Figure~\ref{fig_ser_chip} shows SER against chip duration for a given fixed symbol length. As expected
from our measured grid frequency noise spectrum, performance is best for short chip durations and worsens for
longer chip durations, since shorter chip durations move our signal's bandwidth into the lower-noise region from
$\SI{0.2}{\hertz}$ to $\SI{2}{\hertz}$.
%FIXME introduce term "chip" somewhere

\begin{figure}
    \centering
    \includegraphics[width=0.6\textwidth]{../notebooks/fig_out/dsss_gold_nbits_overview}
    \caption{Symbol Error Rate as a function of modulation amplitude for Gold sequences of several lengths.}
    \label{fig_ser_nbits}
\end{figure}

\begin{figure}
    \centering
    \hspace*{-1cm}\includegraphics[width=1.2\textwidth]{../notebooks/fig_out/dsss_thf_amplitude_5678}
    \caption{SER vs.\ Amplitude and detection threshold. Detection threshold is set as a factor of background noise
    level.}
    \label{fig_ser_thf}
\end{figure}

\begin{figure}
    \centering
    \hspace*{-1cm}\includegraphics[width=1.2\textwidth]{../notebooks/fig_out/chip_duration_sensitivity_6}
    \vspace*{-1cm}
    \caption{SER vs.\ DSSS chip duration.}
    \label{fig_ser_chip}
\end{figure}

\subsection{Parametrizing a proof-of-concept ``Safety Reset'' System Based on GFM}

Taking these modulation parameters as a starting point, we proceeded to create a proof-of-concept (PoC) smart meter
emergency reset system. On top of the modulation described in the previous paragraphs, we layered simple Reed-Solomon
error correction~\cite{mackay01} and some cryptography. The goal of our PoC cryptographic implementation was to allow
the sender of an emergency reset broadcast to authorize a reset command to all listening smart meters. An additional
constraint of our setting is that, due to the extremely slow communication channel, all messages should be kept as short
as possible. The solution we chose for our PoC is a simplistic hash chain using the approach from the Lamport and
Winternitz One-time Signature (OTS) schemes. Informally, the private key is a random bitstring. The public key is
generated by repeatedly applying a hash function to this key a number of times. Each smart meter reset command is then
authorized by disclosing successive elements of this chain. Unwinding the hash chain from the public key at the end of
the chain towards the private key at its beginning, at each step a receiver can validate the current command by checking
that it corresponds to the previously unknown input of the current step of the hash chain. Replay attacks are prevented
by recording the most recent valid command. Key revocation is supported by designating the last key in the chain as a
\emph{revocation key} upon whose reception the client devices advance their local hash ratchet without taking further
action. This simple scheme does not afford much functionality, but it results in very short messages and removes the
need for computationally expensive public key cryptography inside the smart meter.
% FIXME add more precise/formal description of crypto
% FIXME add description of targeting/scope function?
% FIXME somewhere above describe entire reset system architecture????!!!
% FIXME add description of disarm message (replay protection)

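The hash chain described above can be written down more formally as follows. This is a sketch in notation introduced
here: $H$ denotes the hash function, $\ell$ the key length in bits, $n$ the chain length, $k_i$ the chain elements, $s$
the state stored by a receiver and $m$ a received message. Details such as output truncation and the handling of the
revocation key are omitted.
\begin{align*}
    k_0 &\in \{0,1\}^\ell \quad \text{chosen at random (private key)} \\
    k_i &= H(k_{i-1}), \quad i = 1, \dots, n \\
    k_n &= H^n(k_0) \quad \text{(public key, initial receiver state } s \text{)} \\
    \text{accept } m &\iff H(m) = s, \quad \text{then set } s \leftarrow m
\end{align*}
In this notation, the $j$-th command discloses $k_{n-j}$. Since $s$ only ever moves towards $k_0$, a message that has
already been accepted no longer satisfies the check, which is what prevents replays.
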
\subsection{Experimental results}

\begin{figure}
    \centering
    \includegraphics[width=0.6\textwidth]{prototype.jpg}
    \caption{The completed prototype setup. The board on the left is the safety reset microcontroller. It is connected
    to the smart meter in the middle through an adapter board. The top left contains a USB hub with debug interfaces to
    the reset microcontroller. The cables on the bottom left are the debug USB cable and the \SI{3.5}{\milli\meter}
    audio cable for the simulated mains voltage input.}
    \label{fig_proto_pic}
\end{figure}

For a realistic proof of concept, we decided to implement our signal processing chain from DSSS demodulator through
error correction up to our simple cryptography layer in microcontroller firmware and demonstrate this firmware on actual
smart meter hardware, shown in Figure~\ref{fig_proto_pic}. In our proof of concept, a safety reset controller is
connected to the main application microcontroller of a smart meter. The reset controller is tasked with listening for
authenticated reset commands on the voltage waveform and, on reception of such a command, resetting the smart meter
application controller by flashing a known-good firmware image to its memory.

The signal processing chain of our PoC is shown in Figure~\ref{fig_demo_sig_schema}. To interoperate with existing
implementations of SHA-512 and Reed-Solomon decoding, this implementation was written in the C programming language. To
demonstrate an application close to a field implementation, we chose an Easymeter \texttt{Q3DA1002} smart meter as our
reset target. This model is popular in the German market and readily available second-hand. The meter consists of three
isolated metering ASICs connected to a data logging and display PCB through infrared optical links. To demonstrate the
safety reset's firmware reset functionality, we connected our safety reset microcontroller to the Texas Instruments
\texttt{MSP430} microcontroller on the meter's display and data logging board through the JTAG debug interface that the
board's vendor had conveniently left accessible. We ported part of
\texttt{mspdebug}\footnote{\url{https://dlbeer.co.nz/mspdebug/}} to drive the meter microcontroller's JTAG interface and
wrote a piece of demonstrator code that overwrites the meter's firmware with one that displays an identifying string on
the meter's display after boot-up.

\begin{figure}
    \centering
    \includegraphics[width=\textwidth]{prototype_schema}
    \caption{The signal processing chain of our demonstrator.}
    \label{fig_demo_sig_schema}
\end{figure}

To measure grid frequency in our demonstrator, we ported the code we used in
Section~\ref{grid-freq-characterization}, again using the voltage measured using the
microcontroller's internal ADC but using a regular crystal instead of a crystal oven for the microcontroller's system
clock. Since we did not have an aluminium smelter ready, we decided to feed our proof-of-concept reset controller with
an emulated grid voltage sine wave from a computer's headphone jack. Where in a real application this microcontroller
would take ADC readings of input mains voltage divided down by a long resistive divider chain, we instead feed the ADC
from a $\SI{3.5}{\milli\meter}$ audio input. For operational safety, we disconnected the meter microcontroller from its
grid-referenced capacitive dropper power supply and connected it to our reset controller's debug USB power supply.

We performed several successful experiments using a signature truncated to \SI{120}{\bit} and a \SI{5}{\bit} DSSS
sequence. Taking the sign bit into account, the length of the encoded signature is 20 DSSS symbols. On top of this, we
used Reed-Solomon error correction at a 2:1 ratio of data to parity symbols, inflating total message length to 30 DSSS
symbols. At the \SI{1}{\second} chip duration we also used in our other simulations, this equates to an overall
transmission duration of approximately \SI{15}{\minute}. To give the demodulator some time to settle and to produce more
realistic conditions of signal reception, we padded the modulated signal with unmodulated noise on both ends.

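The following sketch makes the arithmetic behind these figures explicit under two assumptions: that each DSSS symbol
carries 5 sequence-selection bits plus one sign bit, and that a \SI{5}{\bit} sequence is $2^5 - 1 = 31$ chips long.
\begin{align*}
    \text{signature:} \quad & \frac{120}{5 + 1} = 20 \text{ symbols} \\
    \text{with error correction:} \quad & 20 \cdot \tfrac{3}{2} = 30 \text{ symbols} \\
    \text{duration:} \quad & 30 \cdot 31 \cdot \SI{1}{\second} = \SI{930}{\second} \approx \SI{15.5}{\minute}
\end{align*}
Under these assumptions, halving the chip duration would halve the transmission time, provided the transmitter can
follow the faster modulation.
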
\section{Discussion}

For our proof of concept, before settling on the commercial smart meter, we first tried to use an \texttt{EVM430-F6779}
smart meter evaluation kit made by Texas Instruments. This evaluation kit proved unsuitable for two main reasons.
First, it shipped with half the case missing and no cover for the terminal blocks. Because of this, some work was
required to make it electrically safe. Even after mounting it in an electrically safe manner, the safety reset
controller prototype would also have to be galvanically isolated to not pose an electrical safety risk, since the main
MCU is not isolated from the grid and the JTAG port is also galvanically coupled. Second, the development board is based
around a specific microcontroller from TI's \texttt{MSP430} series that is incompatible with common JTAG programmers.

Our initial assumption that a development kit would be easier to program than a commercial meter did not prove to be
true. Contrary to our expectations, the commercial meter had JTAG enabled, allowing us to easily read out its stock
firmware without either reverse-engineering vendor firmware update files or circumventing code protection measures.
The fact that its firmware was only available in compiled binary form was not much of a hindrance, as it proved not
to be too complex, and we found all we wanted to know with just a few hours of digging in
Ghidra\footnote{\url{https://ghidra-sre.org/}}.

In the firmware development phase, our approach of testing every module individually (e.g.\ DSSS demodulator,
Reed-Solomon decoder, grid frequency estimation) proved to be very useful. In particular, debugging benefited greatly
from being able to run several thousand tests within seconds. In the case of our DSSS demodulator, this modular testing
and simulation architecture allowed us to simulate thousands of runs of our implementation on test data and directly
compare it to our Jupyter/Python prototype. Since we spent more time polishing our embedded C implementation, it turned
out to perform better than our Python prototype while still exhibiting the same fundamental response to changes to its
parameters.

In accordance with our initial estimations, we did not run into any code space or computation bottlenecks as a result of
choosing floating point emulation instead of porting our algorithms to fixed point calculations. The extremely low
sampling rate of our system makes even heavyweight processing such as FFT or our brute force dynamic programming
approach to DSSS demodulation possible well within our performance constraints.

The safety reset controller does not require any peripherals except for an ADC. Thus, we expect code size to be the main
factor affecting per-unit cost in an in-field deployment of our concept. At around \SI{64}{\kilo\byte}, our unoptimized
demonstrator firmware implementation is already on the lower end of the spectrum. Especially with some optimization, we
expect safety reset controllers to be commercially viable given adequate political incentives.

\section{Conclusion}
\label{sec_conclusion}

In this paper, we have developed an end-to-end design of a reset system to restore smart meters to a safe operating state
during an ongoing large-scale cyberattack. To allow our system to be triggered even in the middle of a cyberattack, we
have developed a broadcast data transmission system based on intentional modulation of global grid frequency. We have
shown the viability of our end-to-end design through simulations. To put these simulations on a solid foundation, we have
developed a grid frequency measurement methodology comprising a custom-designed hardware device for electrically safe
data capture and a set of software tools to archive and process captured data. Our simulations show good behavior of our
broadcast communication system and give an indication that cooperating with a large consumer such as an aluminium smelter
would be a feasible way to set up a transmitter with low hardware overhead. We have outlined a simple cryptographic
protocol ready for embedded implementation in resource-constrained systems that allows triggering a safety reset with a
response time of less than 30 minutes. We have experimentally validated our system using simulated grid frequency data
in a demonstrator setup based on a commercial microcontroller as our safety reset controller and an off-the-shelf smart
meter. The next step in our evaluation will be to conduct an experimental evaluation of our modulation scheme in
collaboration with a utility and an operator of a multi-megawatt load. Source code and electronics CAD designs are
available at the public repository listed at the end of this document.

\printbibliography[heading=bibintoc]

%%% FIXME remove appendix and work into text.

\center{
    \center{This is version \texttt{\input{version.tex}\unskip} of this paper, generated on \today. The git repository
    can be found at:}

    \center{\url{https://git.jaseg.de/safety-reset.git}}
}

\end{document}