\documentclass[sigconf,anonymous]{acmart}

\usepackage[binary-units]{siunitx}
\DeclareSIUnit{\baud}{Bd}
\DeclareSIUnit{\year}{a}
\usepackage{graphicx,color}
\usepackage{subcaption}
\usepackage{array}
\usepackage{hyperref}
\usepackage{enumitem}

\renewcommand{\floatpagefraction}{.8}
\newcommand{\degree}{\ensuremath{^\circ}}
\newcolumntype{P}[1]{>{\centering\arraybackslash}p{#1}}
\newcommand{\partnum}[1]{\texttt{#1}}

\begin{document}

% https://eepublicdownloads.entsoe.eu/clean-documents/pre2015/publications/entsoe/Operation_Handbook/Policy_1_Appendix%20_final.pdf

\begin{abstract}
The dependence of the electrical grid on networked control systems is steadily rising. While utilities are defending
their side of the grid effectively through rigorous IT security measures such as physically separated control
networks, the growing number of networked devices on the consumer side, such as smart meters or large IoT-connected
appliances like air conditioners, is much harder to secure due to the devices' heterogeneity. We consider a crisis
scenario in which an attacker compromises a large number of consumer-side devices and modulates their electrical load
to destabilize the grid and cause an electrical outage~\cite{ctap+11,wu01,zlmz+21,kgma21,smp18,hcb19}.

In this paper, we propose a broadcast channel based on the modulation of grid frequency through which utility operators
can issue commands to devices at the consumer premises both during an attack for mitigation and in its wake to aid
recovery. Our proposed grid frequency modulation (GFM) channel is independent of other telecommunication networks.
It is resilient towards localized blackouts and it is operational as soon as power is restored.

Based on our GFM broadcast channel, we propose a ``safety reset'' system to mitigate an ongoing attack by disabling a
device's network interfaces and resetting its control functions. It can also be used in the wake of an attack to aid
recovery by shutting down non-essential loads to reduce strain on the grid.

To validate our proposed design, we conducted simulations based on measured grid frequency behavior. Based on these
simulations, we performed an experimental validation on simulated grid voltage waveforms using a smart meter
equipped with a prototype safety reset system based on an inexpensive commodity microcontroller.
\end{abstract}

\date{}
\title{\large\bf Ripples in the Pond:\\Transmitting Information through Grid Frequency Modulation}
\author{{\rm Jan Sebastian Götte}\\TU Darmstadt \and {\rm Liran Katzir}\\Tel Aviv University\and {\rm Björn Scheuermann}\\TU Darmstadt}
%\institute{TU Darmstadt\\ Communication Networks Lab\\ \email{safetyreset@jaseg.de}
%\and Tel Aviv University\\ Faculty of Engineering\\ \email{lirankat@tau.ac.il}
%\and TU Darmstadt\\ Communication Networks Lab\\ \email{scheuermann@informatik.hu-berlin.de}}
\maketitle
%\keywords{Security, privacy and resilience in critical infrastructures \and Security and privacy in ``internet of
%things'' \and Cyber-physical systems \and Hardware security \and Network Security \and Energy systems \and Signal theory}

\section{Introduction}

With the rollout of the smart grid, the IT security of electrical infrastructure has attracted increased attention in
recent years. Smart grid security has two major components: the security of central SCADA systems, and the security
of equipment at the consumer premises such as smart meters and IoT devices. While there is previous work on both sides,
their interactions have not yet received much attention.

We consider the previously proposed scenario where a large number of compromised consumer devices is used alone or in
conjunction with an attack on the grid's central SCADA systems to destabilize the grid by rapidly modulating the total
connected load~\cite{ctap+11,wu01,zlmz+21,kgma21,smp18,hcb19}. Several devices have been identified as likely targets
for such an attack including smart meters with integrated remote disconnect switches~\cite{ctap+11,anderson01}, large
IoT-connected appliances~\cite{smp18,hcb19,chl20,olkd20} and electric vehicle chargers~\cite{kgma21,zlmz+21,olkd20}.
Such attacks are hard to mitigate, and existing literature focuses on hardening grid control
systems~\cite{kgma21,lzlw+20,lam21,zlmz+21} and device firmware~\cite{mpdm+10,smp18,zb20,yomu+20} to prevent compromise.
Despite the infeasibility of perfect firmware security, there is little research on \emph{post-compromise} mitigation
approaches. A core issue with post-attack mitigation is that network connections such as internet and cellular networks
between the utility and devices on consumer premises may not work due to the attack. Thus, mitigation strategies that
involve devices on the consumer premises will need an out-of-band communication channel.

In this paper, we propose a novel, resilient, grid-wide communication technique based on \emph{grid frequency
modulation} (GFM) that can be used to broadcast short messages to all devices connected to the electrical grid. The grid
frequency modulation channel is robust and can be used even during an ongoing attack. Based on our channel, we propose
the \emph{safety reset} controller, an attack mitigation technique that is compatible with most smart meter and IoT
device designs. A safety reset controller is a separate controller integrated into the device that awaits an out-of-band
reset command transmitted through GFM. Upon reception of the reset command, it puts the device into a safe state (e.g.\
\emph{relay on} or \emph{light on}) that interrupts attacker control over the device. The safety reset controller is
separated from the system's main application controller and does not itself have any conventional network connections,
which reduces both attack surface and cost.

The grid frequency modulation channel can be operated by transmission system operators (TSOs) even during black-start
recovery procedures, and it bridges the gap between the TSO's private control network and consumer devices that cannot
economically be equipped with other resilient communication techniques such as satellite transceivers. To demonstrate
our proposed channel, we have implemented a system that transmits error-corrected and cryptographically secured commands
through an emulated grid frequency-modulated voltage waveform to an off-the-shelf smart meter equipped with a prototype
safety reset controller based on a small off-the-shelf microcontroller.

The frequency behavior of the electrical grid can be analyzed by examining the grid as a large collection of mechanical
oscillators coupled through the grid via the electromotive force~\cite{rogers01,wcje+12}. The generators and motors that
are electromagnetically coupled through the grid's transmission lines and transformers run synchronously with each
other, with only minor localized variations in their rotation angle. The dynamic behavior of grid frequency is a direct
product of this electromechanical coupling: With increasing load, frequency drops because shafts move slower under
higher torque, and conversely, with decreasing load, frequency rises. Industrial control systems keep frequency close
to its nominal value over time spans of minutes or hours, but at shorter time frames the combined inertia of all
grid-connected generators and motors is what regulates frequency.

Grid frequency modulation works by quickly modulating the power of a large, grid-connected load or generator. When this
modulation is at low amplitude and high frequency, it stays below the thresholds set for the grid's automated control
and monitoring systems and directly affects frequency according to the grid's inertia. GFM differs from
traditional Powerline Communication (PLC) systems in that it reaches every device within one synchronous area as the
signal is embedded into the fundamental grid frequency. Traditional PLC uses a superimposed voltage, which is quickly
attenuated across long distances. Practically speaking, using GFM a single large transmitter can cover an entire
synchronous area, while in traditional PLC hundreds or thousands of smaller transmitters would be necessary. Unlike
in traditional PLC, any large industrial load that allows for fast computer control can act as a GFM transmitter.

\begin{figure}
\centering
\includegraphics[width=0.4\textwidth]{flowchart}
\caption{Structural overview of our concept. 1 - Government authority or utility operations center. 2 - Emergency
radio link. 3 - Aluminium smelter. 4 - Electrical grid. 5 - Target smart meter.}
\label{fig_intro_flowchart}
\end{figure}

Figure~\ref{fig_intro_flowchart} shows an overview of our concept using a smart meter as the target device and a large
aluminium smelter temporarily re-purposed as a GFM transmitter. We see two scenarios for its application: before or
during a cyberattack, to stop an attack on the electrical grid in its tracks, and after an attack while power is being
restored, to prevent a repeated attack. In both scenarios, our concept is independent of telecommunication networks
(such as the internet or cellular networks) as well as broadcast systems (such as cable television or terrestrial
broadcast radio) while requiring only inexpensive signal processing hardware and no external antennas (such as are
needed for satellite communication). A grid frequency-based system can function as long as power is still available, or
as soon as power is restored after the attack. One powerful function this allows is ``flushing out'' an attacker from
compromised smart meters after an attack, before restoring smart meter internet connectivity.

Using simulations, we have determined that control of a $\SI{25}{\mega\watt}$ load such as a large aluminium smelter,
load bank or photovoltaic farm would allow for the transmission of a cryptographically secured safety reset signal within
$15$ minutes. We have designed and constructed a proof-of-concept prototype receiver that demonstrates the feasibility
of decoding such signals on a resource-constrained microcontroller.

\subsection{Motivation}

Consumer devices are increasingly becoming \emph{smart}. Large numbers of IoT devices are connected through the public
internet, and in several countries internet-connected Smart Meters can disconnect entire households from the grid in
case of unpaid bills~\cite{anderson01}. The increasing proliferation of smart devices on the consumer side presents an
opportunity to grid operators, who rely on forecasts for the cost-optimized control of generation and power flow. The
core of the \emph{Smart Grid} vision is that utilities can now gather detailed data for more accurate consumption
forecasts, and in some cases can even adjust parameters of large devices like water heaters to smooth out load spikes.

However, this increased degree of visibility and control comes with an increased IT security risk. In this paper we
focus on scenarios where an attacker compromises a large number of grid-connected remote-controllable devices. These may
be simple smart home devices such as IoT-connected air conditioners, but they may also include Smart Meters that are
outfitted with a remote disconnect switch as is common in some countries. By rapidly switching large numbers of such
devices in a coordinated manner, the attacker has the opportunity to destabilize the electrical
grid~\cite{zlmz+21,kgma21,smp18,hcb19}.

In this paper, we focus on assisting the recovery procedure after a successful attack because we estimate that this
approach will yield a better return on investment in overall grid stability versus resources spent on security
measures. Previous work on IoT and Smart Grid security has focused on the prevention of attacks through firmware security
measures. While research on prevention is important, we estimate that its practical impact will be limited by the
diversity of implementations found in the field~\cite{nbck+19,zlmz+21,smp18}. We predict that it would be a Sisyphean
task to secure the firmware of sufficiently many devices to deny an attacker the critical mass needed to cause trouble.
Even if all flaws in the firmware of a broad range of devices were fixed, users would still have to install the
updates. In smart grid and IoT devices, this presents a difficult problem since user awareness is low~\cite{nbck+19}.

\subsection{Contents}

Starting from a high-level architecture, we have carried out simulations of our concept's performance under real-world
conditions using measured grid frequency data. Based on these simulations, we implemented an end-to-end prototype of our
proposed safety reset controller as part of a realistic smart meter demonstrator. Finally, we experimentally validated
our results based on a simulated mains voltage signal, and we conclude with an outline of further steps towards a
practical implementation.

This work contains the following contributions:
\begin{enumerate}[topsep=4pt]
    \item We introduce Grid Frequency Modulation (GFM) as a communication primitive. % FIXME done before in that one paper
    \item We elaborate the fundamental physics underlying GFM and theorize on the constraints of a practical
        implementation.
    \item We design a communication system based on GFM.
    \item We carry out extensive simulations of our system to determine its performance characteristics.
\end{enumerate}

%\subsection{Notation}
% FIXME drop or rework this section ; actually update notation to be consistent throughout
%To a computer scientist there is one confusing aspect to the theory of grid frequency modulation. GFM can be seen as a
%frequency modulation (FM) with a baseband signal in the band below approximately $f_m = \SI{5}{\hertz}$ that is
%modulated on top of a carrier signal at $f_c = \SI{50}{\hertz}$ in case of the European electrical grid. The frequency
%deviation $f_\Delta$ that the modulated carrier deviates from its nominal value of $f_m$ is very small at only a few
%milli-Hertz.
%
%When grid frequency is measured by first digitizing the mains voltage waveform, then de-modulating digitally, the FM's
%signal-to-noise ratio (SNR) is very high and is dominated by the ADC's quantization noise and nearby mains voltage noise
%sources such as resistive droop due to large inrush current of nearby machines.
%
%Note that both the carrier signal at $f_c$ and the modulation signal at $f_m$ both have unit Hertz. To disambiguate
%them, in this paper we will use \textbf{bold} letters to refer to the carrier waveform $\mathbf{U}$ or frequency
%$\mathbf{f_c}$ as well as its deviation $\mathbf{f_\Delta}$, and we will use normal weight for the actual modulation
%signal and its properties such as $f_m$.

\section{Background on the electrical grid}
\subsection{Components and interactions}

The electrical grid transmits alternating current electrical power from generators to loads. Any device that is
connected to the grid must run ``synchronously'' with the grid, i.e.\ it must produce or consume power following the
grid's voltage waveform. In generators and motors, the electromotive force acts to synchronize the device with the grid.
Connecting a generator that has not been synchronized to the grid leads to large currents flowing through the
generator's windings, inducing extreme forces that can mechanically destroy the generator. Similarly, if the inverters
of a solar power station tried to fight the grid, the grid would win and the inverters' power semiconductors would
release their magic smoke.

Originally, all power sources on the grid were synchronous rotating generators. Today, the shift towards renewable
energies and the introduction of high-voltage DC links has led to some of the grid's generating capacity being replaced
with inverters that electronically emulate the grid's voltage waveform to efficiently convert a DC input to the grid's
alternating current.

The generators and loads on the grid are linked through a complex network of transmission lines. Transformers are used
to couple between transmission lines operating at different voltage levels, and several types of switches allow
utilities to steer power flow throughout this network. Through the electromotive force, all synchronous generators
connected to the grid are electromechanically coupled. Transmission lines introduce a (small) phase delay to the
electric fields traversing the grid, but apart from these local differences in phase, all parts of the grid are
synchronous.

\subsection{Grid frequency behavior}

On the electrical grid, generation and consumption of energy must be precisely matched at all times for the grid to stay
at a constant, synchronous frequency. If generation outpaced consumption, generators would present less mechanical
resistance to their source of mechanical power, or \emph{prime mover}, which would lead the generators to spin faster
and faster. Similarly, if consumption outpaced production, the increased mechanical load would slow down generators,
ultimately leading to a collapse.

On top of the grid's inherent mechanical inertia, several tiers of control systems are layered to stabilize mains
frequency during day-to-day operations. Fast-acting automatic primary control stabilizes temporary frequency excursions,
while slower automatic secondary control and manual tertiary control re-adjust devices' operating points back to their
nominal values after they have shifted due to primary control action.

In day-to-day operation, these layers of control systems keep the frequency of the electrical grid at a stable level
close to its nominal value.
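
As a rough sketch of the power balance argument in this subsection, the aggregate frequency dynamics of a synchronous
area are commonly approximated by the textbook swing equation. The symbols used here are introduced only for this
illustration and are assumptions that do not appear elsewhere in this paper: $f_0$ is the nominal grid frequency, $H$
the aggregate inertia constant of all synchronously connected machines, $S_B$ their combined rated power, and
$P_\mathrm{gen}$ and $P_\mathrm{load}$ the total generated and consumed power. Control action and load damping are
neglected.
\begin{equation}
    % Assumed textbook form for illustration, not derived from the measurements used in this paper.
    \frac{\mathrm{d}f}{\mathrm{d}t} \approx \frac{f_0}{2 H S_B} \left(P_\mathrm{gen} - P_\mathrm{load}\right)
\end{equation}
Under these assumptions, a generation surplus raises the frequency and a deficit lowers it, and a larger stored kinetic
energy $H S_B$ slows the rate of change.
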
\subsection{Black-start recovery}

The recovery from a large-scale power outage is a complex operational challenge. Large outages are caused by cascading
failures. Since all consumers and producers that are connected to the electrical grid are physically coupled through the
electromotive force, a fault in one part of the grid affects all devices connected across the grid. To function, the
grid relies on a delicate balance between electricity generation, transmission and consumption. When this balance is
disturbed, cascading failures can occur. A transmission line shutting off can lead other, nearby lines to overload and
shut off. Due to the electromechanical coupling of all machines connected to the grid, a generator or consumer suddenly
shutting off causes a transient in the grid's frequency. If the frequency goes too far out of bounds, protection devices
take power plants and large industrial loads offline.

The recovery from a large-scale outage requires the grid's operators to bring generators and loads back online one by
one while continuously maintaining balance between generation and consumption to avoid their protection devices shutting
them down again. To coordinate this process, transmission system operators cannot rely on the public internet or
cellular networks, as they may not work during a large-scale power outage. Instead, they maintain private communication
infrastructure using dedicated lines rented from telecommunications providers, fibers run along transmission lines, and
dedicated radio links.

To start from a complete outage, first a number of \emph{black start}-capable power stations that can start by
themselves without any external power are brought online. With their help, other power stations and consumers are
gradually brought online until a part of the grid has been restored to nominal operation. This process can be performed
simultaneously in different parts of the grid. After these \emph{islands} have been restored, they can then be joined to
restore the grid to its normal state.

\subsection{Demand-side response and Smart Metering}

Maintaining the balance between electricity generation and consumption under varying load conditions is critical.
Utilities can access different energy sources, each of which has its own trade-off in response speed versus energy
cost. For instance, the availability of wind and solar power cannot be controlled at all, while hydroelectric power
plants can quickly regulate the speed and power output of their turbines. Combined with the complex layout of the grid's
infrastructure such as transmission lines, these economic factors lead to a complex optimization problem, the quality
of whose solution directly manifests itself in the utility's bottom line.

For decades, one solution to this issue has been demand-side response (DSR)~\cite{rs48}. In DSR, large loads such as
water heaters are centrally controlled by the utility to switch on outside of peak demand. Since the precise timing of
these loads is of no consequence to their user, users are happy to get slightly better prices from their utility while
utilities gain a degree of control allowing them to optimize their network's performance. As part of the smart grid
vision, DSR will be utilized in a larger fraction of consumer devices.

A core component of the smart grid is the rollout of ``Advanced Metering Infrastructure'' (AMI), colloquially known as
smart meters. Smart meters are electricity meters that use a real-time communication interface to automatically transmit
high-resolution measurements to the utility. In contrast to the yearly reading schedule of traditional electricity
meters, smart meters can provide near-real-time data that the utility can use for more accurate load forecasting.

\subsection{Powerline Communication (PLC)}

A core issue in smart metering and demand-side response is the communication channel from the meter to the greater
world. Smart meters are cost-constrained devices, which limits the use of landline internet or cellular connections.
Additionally, electricity meters are often installed in basements, far away from the customer's router and with soil and
concrete blocking radio signals. For these reasons, in some AMI deployments, powerline communication (PLC) has been
chosen for the meters' uplink.

Since the early days of the electrical grid, powerline communication has been used to control devices spread throughout
the grid from a central transmitter~\cite{rs48}. PLC systems superimpose a modulated high-frequency signal on top of
the grid voltage. When the carrier frequency of this modulation is in the audible frequency range, low data rates can be
transmitted over distances of several tens of kilometers. By using a radio frequency carrier, higher data rates can be
achieved across shorter distances. Audio frequency PLC, called ``ripple control'', is still used today by utilities to
enable demand-side response, by remotely switching on and off water heaters to avoid times of peak electricity demand.

Usually, such powerline communication systems are uni-directional, but there are instances of bi-directional powerline
communication for smart meter reading, such as the Italian smart meter deployment~\cite{ec03,rs48,gungor01,agf16}.

\section{Related work}
\label{sec_related_work}

\subsection{IoT and Smart Grid security}

The security of IoT devices as well as the smart grid has received extensive attention in the
literature~\cite{nbck+19,acsc20,smp18,ykll17,anderson01,anderson02,zlmz+21,kgma21,hcb19,mpdm+10,lzlw+20,chl20,lam21,olkd20,yomu+20}.
The challenges of IoT device security and the security of smart meters and other smart grid devices are similar because
smart grid devices are essentially IoT devices in a particularly sensitive location~\cite{acsc20}. In both device types,
the challenge is that securing embedded firmware is difficult, and adding network interfaces and cost constraints only
makes the task harder.

In~\cite{smp18}, Soltan, Mittal and Poor investigated an attack scenario where an attacker first gains control over a
large number of high wattage devices through an IoT security vulnerability, then uses this control to cause rapid load
spikes. The researchers performed computer simulations for a range of parameters and concluded that given sufficiently
many compromised devices, an attacker can cause issues up to a large-scale blackout.

In~\cite{hcb19}, Huang, Cardenas and Baldick raised a counter-point to the conclusions of Soltan et al., finding that
limitations of their simulations in~\cite{smp18} have led them to over-estimate the severity of an attack. Using a more
accurate model, they confirmed that such attacks can cause problems such as localized blackouts and the decay of the
grid into islands, but they found that overall the electrical grid is less vulnerable than previously assumed and that
large-scale blackouts in particular are very unlikely, primarily due to the action of protection systems such as load
shedding and over-frequency protection.

From the literature, we get the overall impression that both IoT and Smart Grid security are challenging. Both lag behind
the security standard of state-of-the-art desktop, server and smartphone operating systems. Reasons for this are the
relatively recent nature of the IoT software ecosystem and the large number of independent implementations. A unique
challenge to Smart Grid security is that due to the fragmentation of markets along national borders, certain devices
such as smart meters or DSR implementations exist in large monocultures.

Compared to IoT and Smart Grid devices, the embedded firmware foundations of modern smartphones have received more
attention both from the industry and from academia. Pinto and Santos in~\cite{pinto01} conducted a survey of
implementations based on ARM's TrustZone embedded virtualization architecture and found a significant number of reported
vulnerabilities across different implementations. For instance, Rosenberg in~\cite{rosenberg01} found critical issues in
Qualcomm's QSEE hypervisor, and Kanonov and Wool in~\cite{kanonov01} identified a number of design weaknesses and
security vulnerabilities in Samsung's competing KNOX virtualization product. To us, the state of the field of embedded
security indicates that even if significant effort is spent on the security of IoT and Smart Grid devices to catch up
with desktop, server and smartphone security, significant vulnerabilities are likely to remain for some time to come.
In this instance, market forces do not align with the interest of the public at large. Vulnerabilities remain likely,
especially in code implementing complex network protocols such as TLS~\cite{georgiev01}, which may even be mandated by
national standards in some devices such as smart electricity meters.

\subsection{Oscillations in the electrical grid}

Common to the attacks on the electrical grid proposed in the papers discussed above is their approach of overloading
parts of the grid. However, scenarios have been proposed that go beyond a simple overload condition, and in which an
attacker exploits the physical characteristics of the grid to cause oscillations of increasing amplitude, ultimately
triggering a cascade of protection mechanisms. The purpose of this type of attack is to use a small controllable load to
cause outsized damage.

Electro-mechanical oscillation modes between different geographical areas of an electrical grid are a well-known
phenomenon. In their book~\cite{rogers01}, Rogers and Graham provide an in-depth analysis of these oscillations and
their mitigation. In~\cite{grebe01}, Grebe, Kabouris, López Barba et al.\ analyzed modes inherent to the
continental European grid. A report on an event where an oscillation on one such mode caused a problem can be found in
\cite{entsoe01}.

In~\cite{zlmz+21}, Zou, Liu, Ma et al.\ analyzed the possibility of a modal attack in which electric vehicle chargers
rapidly modulate their power to force an oscillation of a poorly damped wide-area electromechanical mode. Using
mathematical analysis, small-scale simulations and practical experiments, they validated the attack scenario and
developed a countermeasure that can be implemented as part of generator control systems and that, when activated, can
suppress forced oscillations of wide-area electromechanical modes.

On the device side of the smart grid, research has concentrated on smart meter security. Smart meters are
architecturally similar to IoT devices~\cite{zheng01,ifixit01}, but come with different challenges. As with a
high-power IoT device, an attacker could use a meter's built-in off-switch as part of an attack, a scenario that was
investigated by Anderson and Fuloria in~\cite{anderson01}. Unique to smart meters, however, an attacker could also use
their control to manipulate the meter's energy accounting, quickly leading to potentially severe financial impact on the
meter's operating utility company. This scenario has received research attention~\cite{anderson02,mcdaniel01} and this
is where industry incentives are the strongest.

Smart electricity meters are consumer devices built down to a price, and manufacturers' firmware security R\&D budgets
are limited by the high degree of market fragmentation that is caused by mutually incompatible national smart metering
standards. Landis+Gyr, a large utility meter manufacturer, state in their 2019 annual report that they invested
\SI{36}{\percent} of their total R\&D budget in embedded software while spending only \SI{24}{\percent} on hardware
R\&D~\cite{landisgyr01,landisgyr02}, which indicates tension between firmware security and the manufacturers' bottom
line.

\subsection{Proposed Countermeasures}

In~\cite{kgma21}, the authors propose an extension to grid control algorithms aimed at increasing the grid's robustness
towards forced oscillations. In~\cite{smp18}, the authors propose that utility operators use a detailed attacker model
to engineer additional safety margins into the grid while minimizing the economic inefficiency of these measures. On the
IoT side, they note that due to the wide implementation diversity, the problem cannot be solved by individual measures
and propose additional fundamental research on IoT device security.

In~\cite{hcb19}, the authors conclude that simple demand attacks where compromised loads suddenly increase demand are
adequately mitigated by existing safety measures, in particular \emph{Under-Frequency Load Shedding} (UFLS). As part of
UFLS, during a contingency the utility progressively disconnects loads according to set priorities until the
generation/consumption balance has been restored and a blackout has been averted. UFLS is already deployed in any large
electrical grid.

% FIXME more sources!

\section{Grid Frequency as a Communication Channel}

During a large-scale cyberattack, availability of internet and cellular connectivity cannot be relied upon. An attacker
may already have disabled such systems in a separate attack, or they may go down along with parts of the electrical
grid. Powerline communication systems will likely be unaffected by an attack, but at a range of no more than several
tens of kilometers, covering the entire grid would require a large upfront infrastructure investment for transmitters.

We propose to approach the problem of broadcasting an emergency signal to all grid-connected devices such as smart
meters or IoT appliances within a synchronous area by using grid frequency as a communication channel. Despite the
technological complexity of the grid, the physics underlying its response to changes in load and generation is
surprisingly simple. Individual machines (loads and generators) can be approximated by a small number of differential
equations describing their control systems' interaction with the machine's physics, and the entire grid can be modelled
by aggregating these approximations into a large system of differential equations. As a consequence, small-signal
changes in the generation/consumption power balance cause an approximately proportional change in
frequency~\cite{kundur01,crastan03,entsoe02,entsoe04}. The slope of this first-order approximation is known as the
\emph{Power Frequency Characteristic}, and in the case of the continental European synchronous area happens to be about
\SI{25}{\giga\watt\per\hertz} according to the European electricity grid authority, ENTSO-E.

If we modulate the power consumption of a large load, this modulation will result in a small change in frequency
according to this characteristic. As long as we stay within the operational limits set by
ENTSO-E~\cite{entsoe02,entsoe03}, this change will not degrade the operation of other parts of the grid. The advantages
of grid frequency modulation are the fact that a single transmitter can cover an entire synchronous area as well as low
receiver hardware complexity.

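As a rough illustration of this first-order model, the following sketch uses only the approximate figure of
\SI{25}{\giga\watt\per\hertz} quoted above. The symbols $\lambda$, $\Delta P$ and $\Delta f$ are notation introduced
for this example, and the load values are hypothetical. The quasi-steady-state frequency deviation caused by a load
step $\Delta P$ is approximately
\[
    \Delta f \approx -\frac{\Delta P}{\lambda}, \qquad \lambda \approx \SI{25}{\giga\watt\per\hertz}.
\]
Under this assumption, switching on an additional load of \SI{25}{\mega\watt} lowers grid frequency by about
$\SI{25}{\mega\watt} / \SI{25}{\giga\watt\per\hertz} = \SI{1}{\milli\hertz}$, and a load of \SI{250}{\mega\watt} lowers
it by about \SI{10}{\milli\hertz}.
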
To the best of the authors' knowledge, grid frequency modulation has only ever been proposed as a communication channel
at very small scales in microgrids before~\cite{urtasun01} and has not yet been considered for large-scale application.

Compared to traditional channels such as Fiber To The Home (FTTH), 5G or LoRaWAN, grid frequency as a communication
channel has a resiliency advantage: If there is power, a grid frequency modulation system is operational. Both FTTH and
5G systems not only require power at their base stations, but also require centralized infrastructure to operate. Mesh
networks such as LoRaWAN can cover short distances up to $\SI{20}{\kilo\meter}$ without requiring infrastructure to be
available, but for longer distances LoRaWAN relies on the public internet for its network backbone. Additionally,
systems such as FTTH, 5G and LoRaWAN are built around a point-to-point communication model and usually do not support a
generic broadcast primitive. During times when a large number of devices must be reached simultaneously, this can lead
to congestion of cellular towers and servers. Therefore, during an ongoing cyberattack, grid frequency is promising as a
communication channel because only a single transmitter facility must be operational for it to function, and this single
transmitter can reach all connected devices simultaneously. After a power outage, it can resume operation as soon as
electrical power is restored, even while the public internet and mobile networks are still offline. It is unaffected by
cyberattacks that target telecommunication networks.

\subsection{Characterizing Grid Frequency}
\label{grid-freq-characterization}

Before analyzing grid frequency as a communication channel, we developed a device that allows us to collect ground truth
for our analysis by safely recording the grid voltage waveform. Our system consists of an \texttt{STM32F030F4P6} ARM
Cortex-M0 microcontroller that records mains voltage using its internal 12-bit ADC and transmits measured values through
a galvanically isolated USB/serial bridge to a host computer. We derive our system's sampling clock from a crystal oven
to avoid frequency measurement noise due to thermal drift of a regular crystal: \SI{1}{ppm} of crystal drift would cause
a grid frequency error of $\SI{50}{\micro\hertz}$. We compared our oven-stabilized clock against a GPS 1 pps reference
and found that over a time span of 20 minutes both stayed stable within 5 ppb of each other, which corresponds to the
drift specification of a typical crystal oven.

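The sensitivity to clock drift can be checked with a short calculation. As a sketch, we assume a nominal grid frequency
of $f_0 = \SI{50}{\hertz}$ and write $\delta$ for the relative error of the sampling clock (notation introduced for this
example). A relative clock error translates directly into a relative error of the measured frequency:
\[
    \Delta f = f_0 \cdot \delta, \qquad
    \SI{50}{\hertz} \cdot 10^{-6} = \SI{50}{\micro\hertz}, \qquad
    \SI{50}{\hertz} \cdot 5 \cdot 10^{-9} = \SI{0.25}{\micro\hertz}.
\]
The first value corresponds to \SI{1}{ppm} of drift, the second to the 5 ppb agreement we observed against the GPS
reference.
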
In utility SCADA systems, Phasor Measurement Units (PMUs) are used to precisely measure grid frequency among other
parameters. Details on the inner workings of commercial phasor measurement units are scarce but there is a large amount
of academic research on their measurement algorithms. PMUs employ complex signal analysis algorithms to provide fast
and precise measurements even when given a heavily distorted input signal~\cite{narduzzi01,derviskadic01,belega01}.

In our application, we do not need the same level of precision. For the sake of simplicity, we use the universal
frequency estimation approach of Gasior and Gonzalez~\cite{gasior01}. In this algorithm, the windowed input signal is
processed using a Discrete Fourier Transform (DFT), then the signal's fundamental frequency is interpolated by fitting a
wavelet to the largest peak in the DFT result. The bias parameter of this curve fit is an accurate estimation of the
signal's fundamental frequency. This algorithm is similar to the interpolated DFT algorithm referenced by phasor
measurement literature~\cite{borkowski01}.

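A simple closed-form variant of this idea is three-point Gaussian interpolation. It is shown here for illustration
rather than as a description of our implementation. We assume that the windowed spectral peak is approximately
Gaussian. The symbols $|X_k|$ (DFT magnitude in bin $k$), $f_s$ (sampling rate), $N$ (window length) and $\delta$
(fractional bin offset) are notation introduced for this sketch. With $k$ the index of the largest DFT bin,
\[
    \delta = \frac{\ln\left(|X_{k+1}| / |X_{k-1}|\right)}
                  {2 \ln\left(|X_k|^2 / \left(|X_{k+1}|\,|X_{k-1}|\right)\right)},
    \qquad
    \hat{f} = (k + \delta)\,\frac{f_s}{N}.
\]
For example, with a hypothetical bin spacing of $f_s/N = \SI{0.1}{\hertz}$, a peak in bin $k = 500$ and $\delta = 0.03$,
the estimate is $\hat{f} = 500.03 \cdot \SI{0.1}{\hertz} = \SI{50.003}{\hertz}$, which is a much finer resolution than
the bin spacing alone would allow.
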
\begin{figure}
\centering
\includegraphics[width=0.45\textwidth]{../notebooks/fig_out/freq_meas_spectrum_new}
\caption{The spectrum of grid frequency variations measured over 24 hours. The raw spectrum is shown in gray, and a
smoothed spectrum is shown in red. The blue line is inversely proportional to frequency and illustrates the $1/f$
nature of the spectrum. Distinctive peaks in the spectrum are marked with red crosses, and their locations
are given on the bottom of the diagram.}
\label{fig_freq_spec}
\end{figure}

Using our grid frequency recorder, we performed a two-day measurement series of grid frequency.
Figure~\ref{fig_freq_spec} shows the frequency spectrum of grid frequency over this two-day span. In this spectrum, we
observe a number of features. Across the frequency range, we observe broad $1/f$ noise. Below a period of
$\SI{10}{\second}$, this $1/f$ noise dips to a flat noise floor. We estimate that this low-noise region is caused by the
self-regulating effect of loads.
%FIXME citation
Above a $\SI{10}{\second}$ period, primary control is activated and
thus the $1/f$ noise we observe is the result of the interaction between primary control and consumer demand. On top of
this $1/f$ behavior, the spectrum shows several sharp peaks at time intervals with a ``round'' number such as
$\SI{10}{\second}$, $\SI{60}{\second}$ or multiples of $\SI{300}{\second}$. These peaks are due to loads turning on or
off depending on wall-clock time, and demand forecasting not being able to precisely match the amplitude of these large
changes in load. Besides the narrow peaks caused by this effect we can also observe two wider bumps at
$\SI{7.0}{\second}$ and $\SI{4.7}{\second}$. These bumps closely correlate with the continental European synchronous
area's oscillation modes at $\SI{0.15}{\hertz}$ (east-west) and $\SI{0.25}{\hertz}$ (north-south)~\cite{grebe01}.

\section{Grid Frequency Modulation}

A transmitter for grid frequency modulation would be a controllable load of several megawatts that
is located centrally within the grid. A baseline implementation would be a spool of wire submerged in a body of cooling
liquid (such as a small lake) which is powered from a
thyristor rectifier bank. Compared to this baseline solution, hardware and maintenance investment can be decreased
by repurposing a large industrial load as a transmitter. Going through a
list of energy-intensive industries in Europe~\cite{ec01}, we found that an aluminium smelter would be a good candidate.
In aluminium smelting, aluminium is electrolytically extracted from alumina solution. High-voltage mains power is
transformed, rectified and fed into approximately 100 series-connected electrolytic cells forming a \emph{potline}.
Inside these pots, alumina is dissolved in molten cryolite electrolyte at approximately \SI{1000}{\degreeCelsius} and
electrolysis is performed using a current of tens or hundreds of kiloamperes. The resulting pure aluminium settles at
the bottom of the cell and is tapped off for further processing.

Aluminium smelters are operated around the clock, and due to the high financial stakes their behavior under power
outages has been carefully characterized. Power outages of tens of minutes up to two hours reportedly do not cause
problems in aluminium potlines~\cite{eisma01,oye01}. Recently, even techniques for intentional power modulation without
affecting cell lifetime or product quality have been developed to take advantage of variable energy
prices~\cite{duessel01,eisma01,depree01}. An aluminium plant's power supply is controlled to constantly keep all
smelter cells under optimal operating conditions. Modern power supply systems employ large banks of diodes or thyristors
to rectify low-voltage AC to DC to be fed into the potline~\cite{ayoub01}. Potline voltage is controlled through a
combination of a tap changer and a transductor. Individual cell voltages are controlled by changing the physical
distance between anode and cathode. In this setup, power can be electronically modulated using the thyristor
rectifier. Since the system does not have any mechanical inertia, high modulation rates are possible.

In~\cite{depree01}, the authors describe a setup where a large aluminium smelter in continental Europe is used as
primary control reserve for frequency regulation. In this setup, a rise time of $\SI{15}{\second}$ was achieved to meet
the $\SI{30}{\second}$ requirement posed by local standards for primary control. In their conclusion, the authors note
that for their system, an effective thermal energy storage capacity of $\SI{7.7}{\giga\watt\hour}$ is possible if all
plants of a single operator are used. Given the maximum modulation depth of $\SI{100}{\percent}$ for up to one hour that
is mentioned by the authors, this results in an effective modulation power of $\SI{7.7}{\giga\watt}$. Over a longer
timespan of $\SI{48}{\hour}$, they have demonstrated a $\SI{33}{\percent}$ modulation depth which would correspond to a
modulation power of $\SI{2.5}{\giga\watt}$. We conclude that a modulation of part of an aluminium smelter's power
consumption is possible at no significant production impact and at low infrastructure cost. Aluminium smelters are
already connected to the grid in a way that they do not pose a danger to other nearby consumers when they turn off or on
parts of the plant, as this is commonplace during routine maintenance activities.

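The modulation powers quoted above follow from a simple energy balance. As a sketch, we write $E$ for the quoted storage
capacity, $T$ for the duration of full-depth modulation and $d$ for the modulation depth (notation introduced only for
this example):
\[
    P_\text{1h} = \frac{E}{T} = \frac{\SI{7.7}{\giga\watt\hour}}{\SI{1}{\hour}} = \SI{7.7}{\giga\watt},
    \qquad
    P_\text{48h} = d \cdot P_\text{1h} \approx 0.33 \cdot \SI{7.7}{\giga\watt} \approx \SI{2.5}{\giga\watt}.
\]
Both values are derived from the figures reported in~\cite{depree01} and are meant as order-of-magnitude estimates, not
as measurements of our own.
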
\subsection{Parametrizing Modulation for GFM}

Given the grid characteristics we measured using our custom waveform recorder and using a model of our transmitter, we
can derive parameters for the modulation of our broadcast system. The overall network power-frequency characteristic of
the continental European synchronous area is approximately $\SI{25}{\giga\watt\per\hertz}$~\cite{entsoe02}. Thus, the
main challenge for a GFM system will be poor signal-to-noise ratio (SNR) due to low transmission power. A second layer
of modulation yielding some modulation gain beyond the basic amplitude modulation of the transmitter will be necessary
to achieve sufficient overall SNR.

The grid's frequency noise has significant localized peaks that might interfere with this modulation. Further
complicating things are the oscillation modes. A GFM system must be designed to avoid exciting these modes. However,
since these modes are not static, a modulation method that is designed around a specific assumption of their location
would not be future-proof. Given these concerns, the optimal second-level modulation technique for GFM is a
spread-spectrum technique. By spreading signal energy throughout a wide band, the impact of local noise spikes is
minimized and the risk of mode excitation is reduced, since spread-spectrum techniques minimize energy in any particular
sub-band.

The spread-spectrum technique that we chose is Direct Sequence Spread Spectrum (DSSS) for its simple implementation and
good overall performance. DSSS chip timing should be as fast as the transmitter's physics allow to exploit the low-noise
region between $\SI{0.2}{\hertz}$ and $\SI{2.0}{\hertz}$ in Figure~\ref{fig_freq_spec}. Going past
$\approx\SI{2}{\hertz}$ would complicate frequency measurement at the receiver side.

\subsubsection{Direct Sequence Spread Spectrum (DSSS) modulation}

Direct Sequence Spread Spectrum modulation is a common spread-spectrum technique that forms the basis of a number of
radio systems, most prominently all global navigation satellite systems (GNSS). As a spread-spectrum technique, DSSS
spreads out the signal's energy across a broad spectral range. This decreases the susceptibility of a DSSS signal to
narrowband interference. In GNSS, this allows the rejection of other nearby RF sources. In our use case, this makes the
signal immune to the many narrow peaks in the grid frequency's noise spectrum that are caused by UTC-synchronized
control systems (cf.~Fig.~\ref{fig_freq_spec}). In addition to better interference immunity, DSSS has two other
important characteristics: It provides \emph{modulation gain}, i.e.~it allows a trade-off between data rate and receiver
sensitivity, and it allows for Code Division Multiple Access (CDMA). In CDMA, multiple DSSS-modulated signals can be
sent simultaneously through a shared channel with less impact on the resulting signal-to-noise ratio (SNR) than would be
the case for other modulation techniques.

A DSSS signal is made up from pseudo-random \emph{symbols}, which in turn are made up from individual physical layer
bits called \emph{chips}. Chips are encoded in the signal using a lower-layer modulation such as phase-shift keying
(e.g.~in GPS) or frequency-shift keying (in this work). In DSSS, a \emph{code} is a library of symbols that are
constructed to have minimal cross-correlation, meaning they are near-orthogonal. A transmitter sends a symbol by
transmitting its particular pseudo-random chip sequence at a chosen polarity, conveying one bit of information. A
receiver demodulates the signal by directly correlating the incoming physical-layer signal with the symbol's chip
pattern, which results in a positive or negative peak depending on symbol polarity when a symbol is received.

By increasing the DSSS sequence length by a factor of $2$, SNR is improved by a factor of $\sqrt{2}$ assuming an
additive white Gaussian noise (AWGN) channel. At the same time, when doubling the sequence length, common DSSS code
construction methods provide twice the number of distinctive symbols, allowing for twice the number of CDMA
participants. Trading twice the sequence length (and transmission time) for approximately $\SI{1.5}{dB}$ in SNR is a
steep trade-off, but it is necessary in systems where transmitter power cannot be increased further and the resulting
signal has a marginally low SNR.

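The origin of this gain can be sketched as follows under the AWGN assumption stated above. Let $c_j \in \{-1,+1\}$ be
the $N$ chips of a symbol, $A$ the per-chip signal amplitude, $b \in \{-1,+1\}$ the symbol polarity and $w_j$
independent noise samples with standard deviation $\sigma$. All of this notation is introduced for this example only.
The received chips are $y_j = b A c_j + w_j$, and the correlator output is
\[
    r = \sum_{j=1}^{N} c_j\, y_j = b\, N A + \sum_{j=1}^{N} c_j\, w_j .
\]
The signal term grows linearly with $N$, while the noise term has a standard deviation of $\sigma\sqrt{N}$. The
amplitude ratio of signal to noise at the correlator output is therefore
\[
    \frac{N A}{\sigma \sqrt{N}} = \sqrt{N}\,\frac{A}{\sigma},
\]
so going from $N$ to $2N$ chips multiplies this ratio by $\sqrt{2}$. The sign of $r$ recovers the polarity $b$.
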
\subsubsection{DSSS parametrization}

To find the parameters for our DSSS modulation, we simulated a proof-of-concept modulator and demodulator using data
captured from our grid frequency sensor. Our simulations covered a range of combinations of modulation amplitude, DSSS
sequence bit depth, chip duration and detection threshold. Figure~\ref{fig_ser_nbits} shows our simulation results for
symbol error rate (SER) as a function of modulation amplitude with Gold sequences of several bit depths. From these
graphs we conclude that the range of practical modulation amplitudes starts at approximately $\SI{1}{\milli\hertz}$,
which corresponds to a modulation power of approximately $\SI{25}{\mega\watt}$~\cite{entsoe02}.
Figure~\ref{fig_ser_thf} shows SER against detection threshold relative to background noise. Figure~\ref{fig_ser_chip}
shows SER against chip duration for a given fixed symbol length. As expected from looking at our measured grid frequency
noise spectrum, performance is best for short chip durations and worsens for longer chip durations since shorter chip
durations move our signal's bandwidth into the lower-noise region from $\SI{0.2}{\hertz}$ to $\SI{2}{\hertz}$.
%FIXME introduce term "chip" somewhere

\begin{figure}
\centering
\includegraphics[width=0.45\textwidth]{../notebooks/fig_out/dsss_gold_nbits_overview}
\caption{Symbol Error Rate as a function of modulation amplitude for Gold sequences of several lengths.}
\label{fig_ser_nbits}
\end{figure}

\begin{figure}
\centering
\hspace*{-5mm}\includegraphics[width=0.5\textwidth]{../notebooks/fig_out/dsss_thf_amplitude_5678}
\vspace*{-5mm}
\caption{SER vs.\ Amplitude and detection threshold. Detection threshold is set as a factor of background noise
level.}
\label{fig_ser_thf}
\end{figure}

\begin{figure}
\centering
\hspace*{-5mm}\includegraphics[width=0.5\textwidth]{../notebooks/fig_out/chip_duration_sensitivity_6}
\vspace*{-5mm}
\caption{SER vs.\ DSSS chip duration.}
\label{fig_ser_chip}
\end{figure}

\subsection{Parametrizing a proof-of-concept ``Safety Reset'' System Based on GFM}

%FIXME introduce scenario
Taking these modulation parameters as a starting point, we proceeded to create a proof-of-concept smart meter emergency
reset system. On top of the modulation described in the previous paragraphs we layered simple Reed-Solomon error
correction~\cite{mackay01} and some cryptography. The goal of our PoC cryptographic implementation was to allow the
sender of an emergency reset broadcast to authorize a reset command to all listening smart meters. An additional
constraint of our setting is that due to the extremely slow communication channel all messages should be kept as short
as possible. The solution we chose for our PoC is a simplistic hash chain using the approach from the Lamport and
Winternitz One-time Signature (OTS) schemes~\cite{lamport02,merkle01}. Informally, the private key is a random
bitstring. The public key is generated by recursively applying a hash function to this key a number of times. Each smart
meter reset command is then authorized by disclosing subsequent elements of this series. Unwinding the hash chain from
the public key at the end of the chain towards the private key at its beginning, at each step a receiver can validate
the current command by checking that it corresponds to the previously unknown input of the current step of the hash
chain. Replay attacks are prevented by the device memorizing the most recent valid command. Key revocation is supported
by designating the last key in the chain as a \emph{revocation key} upon whose reception the client devices advance
their local hash ratchet without taking further action. This simple scheme does not afford much functionality but it
results in very short messages and removes the need for computationally expensive public key cryptography inside the
smart meter.

Formally, we can describe our simple cryptographic protocol as follows. Given an $\ell$-bit cryptographic hash function
$H : \{0,1\}^*\rightarrow\{0,1\}^\ell$ and a private key $k_0 \in \{0,1\}^\ell$, we construct the public key as
$k_{n_\text{total}} = H^{n_\text{total}}(k_0)$ where $H^n(x)$ denotes the $n$-times recursive application of $H$ to
itself, i.e.\ $\underbrace{H(H(\hdots H(}_{n\;\text{times}}x)))$. $n_\text{total}$ is the total number of signatures
that the system can issue. $n_\text{total}$ must be chosen with adequate safety margin to account for unpredictable
future use of the system. The choice of $n_\text{total}$ is of no consequence when a device checks reset authorization,
but key generation time grows linearly with $n_\text{total}$ since $H$ needs to be applied $n_\text{total}$ times. In
practice, given the speed of modern computers, values of $n_\text{total} > 10^9$ should pose no problem during key
generation. For public key $k_{n_\text{total}}$, the system can authorize up to $n_\text{total}$ commands by
successively disclosing the $k_i$ starting at $i=n_\text{total}-1$ and counting down until finally disclosing $k_0$.
Since we only want to transmit a single bit of information, we do not need any payload. Instead, we simply send a
message $m = (k_i)$ consisting solely of $k_i$. The receiver of a message $m$ can check that the message is a legitimate
command by checking $\exists i<q: H^i(m) = k_\text{last}$ where $k_\text{last}$ is the last valid command that was
received. $q$ is the maximum lookup depth that the device will accept as valid. To conserve processing power, $q$ should
be chosen to be much smaller than $n_\text{total}$. If $q$ is chosen too small, a device might become out of sync with
the transmitter when it is disconnected from the electrical grid for a long enough time for at least $q$ commands to be
issued in the meantime. In practice, this should not be a concern since only few commands should be issued over the
lifetime of the system.

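A toy instance may help to illustrate the bookkeeping. The values $n_\text{total} = 4$ and $q = 3$ are chosen purely for
illustration and are far smaller than any practical choice. Key generation yields the chain
\[
    k_0 \xrightarrow{H} k_1 \xrightarrow{H} k_2 \xrightarrow{H} k_3 \xrightarrow{H} k_4 = k_{n_\text{total}},
\]
and every device is provisioned with $k_\text{last} = k_4$. When the first command $m = k_3$ is broadcast, a device
finds $H^1(m) = k_4 = k_\text{last}$, accepts the command and stores $k_\text{last} = k_3$. A second device that missed
this broadcast and later receives $m = k_2$ finds $H^1(m) = k_3 \neq k_4$ but $H^2(m) = k_4 = k_\text{last}$. Since
$2 < q$, it also accepts the command and stores $k_\text{last} = k_2$. A forged message that is not a preimage on the
chain fails the check for every admissible $i$ and is discarded.
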
During an emergency situation, not all safety reset controllers might be online at the same time. In case the electrical
grid is restored piece by piece with safety reset controllers coming back online in batches, a utility might repeatedly
transmit the same reset command. In our protocol, we handle this situation by memorizing the last valid received command
on the device side, and only acting \emph{once} when a new command is received. The transmission of one command thus
becomes idempotent, and the utility can repeat the command until sufficiently many devices have received the command and
e.g.\ performed a safety reset.

In our protocol, we define two commands, \emph{reset} and \emph{disarm}. We assign \emph{reset} and \emph{disarm} to the
$k_i$ alternately. For odd $i$, $k_i$ is a \emph{reset} command and for even $i$, $k_i$ is a \emph{disarm} command. To
trigger a safety reset, the utility transmits the next unused $k_{2i+1}$. The utility may transmit this command
repeatedly to also reset devices that have come online only after earlier transmissions have started. After a sufficient
number of devices have performed a safety reset, the utility then transmits the next disarm command, $k_{2i}$. When
devices receive the disarm command, they still update the last received command, but they do not perform any other
action.

The reason for interleaving two commands in this way is to prevent a specific attack scenario in which an attacker first
observes a safety reset command being transmitted, and then at a later time gains access to a large load that could act
as a grid frequency modulation transmitter. Without a \emph{disarm} command, this attacker could then later trigger a
safety reset in any device that has not received the original reset command yet. The \emph{disarm} command gives the
utility the option to revoke a prior \emph{reset} command for any devices that were offline during the original reset,
without triggering them to reset.

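As an illustration of this interleaving, consider a toy chain with $n_\text{total} = 6$. This value is hypothetical and
is chosen only so that the first disclosed key has an odd index. Keys are disclosed from left to right:
\[
    \underbrace{k_5}_{\text{reset}} \rightarrow \underbrace{k_4}_{\text{disarm}} \rightarrow
    \underbrace{k_3}_{\text{reset}} \rightarrow \underbrace{k_2}_{\text{disarm}} \rightarrow
    \underbrace{k_1}_{\text{reset}} \rightarrow \underbrace{k_0}_{\text{disarm}}
\]
During a first incident, the utility broadcasts $k_5$, repeating it as needed, and then closes the incident with $k_4$.
A device that was offline during the reset broadcasts still holds $k_\text{last} = k_6$. On receiving $k_4$ it finds
$H^2(k_4) = k_6$ and advances to $k_\text{last} = k_4$ without resetting. Here we assume, as an implementation detail,
that the device tracks the index of $k_\text{last}$ so that it can tell odd from even positions. If an attacker later
replays the recorded $k_5$, the check fails: for $i \geq 1$, $H^i(k_5) = k_{5+i}$ and, barring a hash collision, this
never equals $k_4$.
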
% FIXME add more precise/formal description of crypto
% FIXME add description of targeting/scope function?
% FIXME somewhere above descirbe entire reset system architecture????!!!
% FIXME add description of disarm message (replay protection)

\subsection{Experimental results}

\begin{figure}
\centering
\includegraphics[width=0.45\textwidth]{prototype.jpg}
\caption{The completed prototype setup. The board on the left is the safety reset microcontroller. It is connected
to the smart meter in the middle through an adapter board. The top left contains a USB hub with debug interfaces to
the reset microcontroller. The cables on the bottom left are the debug USB cable and the \SI{3.5}{\milli\meter}
audio cable for the simulated mains voltage input.}
\label{fig_proto_pic}
\end{figure}

For a realistic proof of concept, we decided to implement our signal processing chain from DSSS demodulator through
error correction up to our simple cryptography layer in microcontroller firmware and demonstrate this firmware on actual
smart meter hardware, shown in Figure~\ref{fig_proto_pic}. In our proof of concept a safety reset controller is
connected to the main application microcontroller of a smart meter. The reset controller is tasked with listening for
authenticated reset commands on the voltage waveform, and on reception of such a command resetting the smart meter
application controller by flashing a known-good firmware image to its memory.

The signal processing chain of our PoC is shown in Figure~\ref{fig_demo_sig_schema}. To interoperate with existing
implementations of SHA-512 and Reed-Solomon decoding, this implementation was written in the C programming language. To
demonstrate an application close to a field implementation, we chose an Easymeter \texttt{Q3DA1002} smart meter as our
reset target. This model is popular in the German market and readily available second-hand. The meter consists of three
isolated metering ASICs connected to a data logging and display PCB through infrared optical links. To demonstrate the
safety reset's firmware reset functionality, we connected our safety reset microcontroller to the Texas Instruments
\texttt{MSP430} microcontroller on the meter's display and data logging board through the JTAG debug interface that the
board's vendor had conveniently left accessible. We ported part of
\texttt{mspdebug}\footnote{\url{https://dlbeer.co.nz/mspdebug/}} to drive the meter microcontroller's JTAG interface and
wrote a piece of demonstrator code that overwrites the meter's firmware with one that displays an identifying string on
the meter's display after boot-up.

\begin{figure}
\centering
\includegraphics[width=0.45\textwidth]{prototype_schema}
\caption{The signal processing chain of our demonstrator.}
\label{fig_demo_sig_schema}
\end{figure}

To measure grid frequency, we ported the code we used in
Section~\ref{grid-freq-characterization} to our demonstrator, again using the voltage measured using the
microcontroller's internal ADC but using a regular crystal instead of a crystal oven for the microcontroller's system
clock. Since we did not have an aluminium smelter ready, we decided to feed our proof-of-concept reset controller with
an emulated grid voltage sine wave from a computer's headphone jack. Where in a real application this microcontroller
would take ADC readings of input mains voltage divided down by a long resistive divider chain, we instead feed the ADC
from a $\SI{3.5}{\milli\meter}$ audio input. For operational safety, we disconnected the meter microcontroller from its
grid-referenced capacitive dropper power supply and connected it to our reset controller's debug USB power supply.

We performed several successful experiments using a signature truncated to 120 bit and a 5-bit DSSS sequence. Taking the
sign bit into account, the length of the encoded signature is 20 DSSS symbols. On top of this we used Reed-Solomon error
correction at a 2:1 ratio, inflating total message length to 30 DSSS symbols. At the chip duration of \SI{1}{\second}
that we also used in our simulations, this equates to an overall transmission duration of approximately
\SI{15}{\minute}. To give the demodulator some time to settle and to produce more realistic conditions of signal
reception, we padded the modulated signal with unmodulated noise on both ends.

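These numbers can be reproduced with a short calculation. As assumptions for this sketch, we take a 5-bit Gold sequence
to be $2^5 - 1 = 31$ chips long, and we take each symbol to carry 5 bits through the choice of sequence plus 1 bit
through its polarity:
\begin{align*}
    \text{signature symbols} &= \frac{\SI{120}{\bit}}{(5 + 1)\,\si{\bit}} = 20, \\
    \text{total symbols} &= 20 \cdot \frac{3}{2} = 30, \\
    \text{transmission duration} &= 30 \cdot 31 \cdot \SI{1}{\second} = \SI{930}{\second} \approx \SI{15.5}{\minute}.
\end{align*}
The factor of $3/2$ reflects one Reed-Solomon parity symbol for every two data symbols.
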
\section{Lessons learned}

For our proof of concept, before settling on the commercial smart meter we first tried to use an \texttt{EVM430-F6779}
smart meter evaluation kit made by Texas Instruments. This evaluation kit did not turn out well for two main reasons.
One, it shipped with half the case missing and no cover for the high-voltage terminal blocks. Because of this some work
was required to get it electrically safe. Even after mounting it in an electrically safe manner the safety reset
controller prototype would also have to be galvanically isolated to not pose an electrical safety risk since the main
MCU is not isolated from the grid and the JTAG port is also galvanically coupled. The second issue we ran into was that
the development board is based around a specific microcontroller from TI's \texttt{MSP430} series that is incompatible
with common JTAG programmers.

Our initial assumption that a development kit would be easier to program than a commercial meter did not prove to be
true. Contrary to our expectations the commercial meter had JTAG enabled, allowing us to easily read out its stock
firmware without either reverse-engineering vendor firmware update files or circumventing code protection measures.
The fact that its firmware was only available in its compiled binary form was not much of a hindrance as it proved not
to be too complex and all we wanted to know we found with just a few hours of digging in
Ghidra\footnote{\url{https://ghidra-sre.org/}}.

In the firmware development phase our approach of testing every module individually (e.g.\ DSSS demodulator,
Reed-Solomon decoder, grid frequency estimation) proved useful particularly for debugging. The modular architecture
allowed us to directly compare our demodulator implementation to our Jupyter/Python prototype, where we found that our C
implementation outperformed the Python prototype. Despite the algorithm's complexity, the microcontroller C
implementation has no issues processing data in real-time due to the low sampling rate necessary.

\section{Conclusion}
\label{sec_conclusion}

In this paper we have developed an end-to-end design for a safety reset system that provides these capabilities.
Our novel broadcast data transmission system is based on intentional modulation of global grid frequency. Our system is
independent of normal communication networks and can operate during a cyberattack. We have shown the practical viability
of our end-to-end design through simulations. Using our purpose-designed grid frequency recorder, we can capture and
process real-time grid frequency data in an electrically safe way. We used data captured this way as the basis for
simulations of our proposed grid frequency modulation communication channel. In these simulations, our system has proven
feasible. From our simulations we conclude that a large consumer such as an aluminium smelter can be modified at a small
cost to act as an on-demand grid frequency modulation transmitter.

We have demonstrated our modulation system in a small-scale practical demonstration. For this demonstration, we have
developed a simple cryptographic protocol ready for embedded implementation in resource-constrained systems that allows
triggering a safety reset with a response time of less than 30 minutes. In this demonstration we use simulated grid
frequency data to trigger a commercial microcontroller to perform a firmware reset of an off-the-shelf smart meter. The
next step in our evaluation will be to conduct an experimental evaluation of our modulation scheme in collaboration with
a utility and an operator of a multi-megawatt load.

\subsection{Discussion}
|
|
|
|
During an emergency in the electrical grid, the ability to communicate with large numbers of end-point devices is a
valuable tool for restoring normal operation. When a resilient communication channel is available, loads such as smart
meters and IoT devices can be equipped with a supervisor circuit that allows for a remote ``safety reset'' that puts the
device into a safe operating state. Using this safety reset, an attacker who uses compromised smart meters or IoT
devices to attack grid stability can be interrupted before they can conclude their attack. During recovery from an
outage, a safety reset can be used to reduce stress on the system during a black start by temporarily disabling
non-essential loads such as air conditioners.
|
|
|
|
The safety reset controller does not require any peripherals except for an ADC. Thus, we expect code size to be the main
factor affecting per-unit cost in an in-field deployment of our concept. At around \SI{64}{\kilo\byte}, our demonstrator
firmware implementation is viable on low-end microcontrollers. Given that modern smart meters and IoT devices usually
use complex Systems on Chip (SoCs), a safety reset controller could be integrated into the main application processor
itself at little added complexity. In summary, we expect safety reset controllers to be commercially viable.
|
|
|
|
Safety reset controllers can be adapted to most IoT device and smart meter designs. Because they are independent of
other public utilities such as the internet or cellular networks, we believe in their potential as a last line of
defense providing resilience under large-scale cyberattacks. The next steps towards a practical implementation will be
a practical demonstration of broadcast data transmission through grid frequency modulation using a megawatt-scale
controllable load, followed by further optimization of the modulation, the data encoding and the demodulator
implementation.
|
|
|
|
Source code and EDA designs are available at the public repository listed at the end of this document.
|
|
|
|
\begin{acks}
This work has been co-funded by the LOEWE initiative (Hesse, Germany) within the emergenCITY center.
\end{acks}
|
|
|
|
\bibliographystyle{plain}
\bibliography{\jobname}
|
|
|
|
\center{
\footnotesize
\center{This is version \texttt{\input{version.tex}\unskip} of this paper, generated on \today.}
}
|
|
\end{document}