final proof by myself

jaseg 2022-10-06 14:32:44 +02:00
parent 09f918187d
commit 713564b829


@ -399,9 +399,16 @@ communication for smart meter reading~\cite{ec03,rs48,gungor01,agf16}.
The security of IoT devices as well as the smart grid has received extensive attention in the
literature~\cite{nbck+19,acsc20,smp18,ykll17,anderson01,anderson02,zlmz+21,kgma21,hcb19,mpdm+10,lzlw+20,chl20,lam21,olkd20,yomu+20}.
The challenges of IoT device security and the security of smart meters and other smart grid devices are similar because
smart grid devices are essentially IoT devices in a particularly sensitive location~\cite{acsc20}. In both device types,
the challenge is that securing embedded firmware is difficult, and adding network interfaces and cost constraints only
makes the task harder.
smart grid devices are essentially IoT devices in a particularly sensitive location~\cite{zheng01,ifixit01,acsc20}. In
both device types, the challenge is that securing embedded firmware is difficult, and adding network interfaces and cost
constraints only makes the task harder.
In some countries, smart meters can have a built-in off-switch that is used to disconnect customers who do not pay their
electricity bill. An attack scenario in which the attacker compromises a large number of such meters has been discussed
by Anderson and Fuloria in~\cite{anderson01}. In meters that do not have such a switch, an attacker can still use their
access to manipulate the meter's energy accounting, leading to financial impact on the utility operating the meter. This
scenario has received research attention~\cite{anderson02,mcdaniel01} and comes with the most direct industry
incentives.
In~\cite{smp18}, Soltan, Mittal and Poor investigated an attack scenario where an attacker first gains control over a
large number of high wattage devices through an IoT security vulnerability, then uses this control to cause rapid load
@ -424,6 +431,13 @@ relatively recent nature of the IoT software ecosystem and the large number of i
challenge to Smart Grid security is that due to the fragmentation of markets along national borders, certain devices
such as smart meters or DSR implementations exist in large monocultures.
Smart meters are consumer devices built down to a price and manufacturers' firmware security R\&D budgets are limited by
the high degree of market fragmentation that is caused by mutually incompatible national smart metering standards.
Landis+Gyr, a large utility meter manufacturer, state in their 2019 annual report that they invested \SI{36}{\percent}
of their total R\&D budget on embedded software while spending only \SI{24}{\percent} on hardware
R\&D~\cite{landisgyr01,landisgyr02}, which indicates tension between firmware security and the manufacturers' bottom
line.
Compared to IoT and Smart Grid devices, the embedded firmware foundations of modern smartphones have received more
attention both from the industry and from academia. Pinto and Santos in~\cite{pinto01} conducted a survey of
implementations based on ARM's TrustZone embedded virtualization architecture and found a significant number of reported
@ -459,49 +473,37 @@ mathematical analysis, small-scale simulations and limited practical experiments
developed a countermeasure that can be implemented as part of generator control systems and that when activated can
suppress forced oscillations of wide-area electromechanical modes.
On the device side of the smart grid, research has concentrated on smart meter security. Smart meters are
architecturally similar to IoT devices~\cite{zheng01,ifixit01}, but come with different challenges. Similar to a
high-power IoT device, an attacker could use an off-switch built as part of an attack, a scenario that was investigated
by Anderson and Fuloria in~\cite{anderson01}. Unique to smart meters, an attacker could, however, also use their control
to manipulate the meter's energy accounting, quickly leading to potentially severe financial impact on the meter's
operating utility company. This scenario has received research attention~\cite{anderson02,mcdaniel01} and this is where
industry incentives are the strongest.
Smart electricity meters are consumer devices built down to a price and manufacturers' firmware security R\&D budgets
are limited by the high degree of market fragmentation that is caused by mutually incompatible national smart metering
standards. Landis+Gyr, a large utility meter manufacturer, state in their 2019 annual report that they invested
\SI{36}{\percent} of their total R\&D budget on embedded software while spending only \SI{24}{\percent} on hardware
R\&D~\cite{landisgyr01,landisgyr02}, which indicates tension between firmware security and the manufacturers's bottom
line.
\subsection{Proposed Countermeasures}
In~\cite{kgma21}, the authors propose an extension to grid control algorithms aimed at increasing the grid's robustness
towards forced oscillations. In~\cite{smp18}, the authors propose that utility operators use a detailed attacker model
to engineer additional safety margins into the grid while minimizing the economic inefficiency of these measures. On the
IoT side, they note that due to the wide implementation diversity, the problem cannot be solved by individual measures
and propose additional fundamental research on IoT device security.
In parallel with research on theoretical attacks, countermeasures to these have also been proposed in academic
literature. In~\cite{kgma21}, the authors propose an extension to grid control algorithms aimed at increasing the grid's
robustness towards forced oscillations. In~\cite{smp18}, the authors propose that utility operators use a detailed
attacker model to engineer additional safety margins into the grid while minimizing the economic inefficiency of these
measures. On the IoT side, they note that due to the wide implementation diversity, the problem cannot be solved by
individual measures and propose additional fundamental research on IoT device security.
In~\cite{hcb19}, the authors conclude that simple demand attacks where compromised loads suddenly increase demand are
adequately mitigated by existing safety measures, in particular \emph{Under-Frequency Load Shedding} (UFLS). As part of
UFLS, during a contingency the utility will progressively disconnected loads according to set priorities until the
production / generation balance has been restored and a blackout has been averted. UFLS is already deployed in any large
electrical grid.
adequately mitigated by existing safety measures, in particular \emph{Under-Frequency Load Shedding} (UFLS), which forms
the basis of any grid's automatic emergency response. As part of UFLS, during a contingency the utility will
progressively disconnect loads according to set priorities until the generation/consumption balance has been restored
and a blackout has been averted.
% FIXME more sources!
\section{Grid Frequency as a Communication Channel}
During a large-scale cyber attack, availability of internet and cellular connectivity cannot be relied upon. An attacker
may already have disabled such systems in a separate attack, or they may go down along with parts of the electrical
grid. Powerline communication systems will likely be unaffected by an attack, but at a range of no more than several
tens of kilometers, covering the entire grid would require a large upfront infrastructure investment for transmitters.
The countermeasures discussed above are fully automatic. Such systems can provide a good first line of defense, but they
must be complemented by means of manual intervention since not every eventuality can be anticipated. During a
large-scale cyber attack, availability of internet and cellular connectivity cannot be relied upon. An attacker may
already have disabled such systems in a separate attack, or they may go down along with parts of the electrical grid.
Powerline communication systems will likely be unaffected by an attack, but at a range of no more than several tens of
kilometers, covering the entire grid would require a large upfront infrastructure investment for transmitters.
We propose to approach the problem of broadcasting an emergency signal to all grid-connected devices such as smart
meters or IoT appliances within a synchronous area by using grid frequency as a communication channel. Despite the
technological complexity of the grid, the physics underlying its response to changes in load and generation is
We propose to approach the problem of broadcasting an emergency control signal to all grid-connected devices such as
smart meters or IoT appliances within a synchronous area by using grid frequency as a communication channel. Despite
the technological complexity of the grid, the physics underlying its response to changes in load and generation is
surprisingly simple. Individual machines (loads and generators) can be approximated by a small number of differential
equations describing their control systems' interaction with the machine's physics, and the entire grid can be modelled
equations describing their control systems' interaction with the machines' physics, and the entire grid can be modelled
by aggregating these approximations into a large system of differential equations. As a consequence, small signal
changes in generation/consumption power balance cause an approximately proportional change in
frequency~\cite{kundur01,crastan03,entsoe02,entsoe04}. The slope of this first-order approximation is known as
@ -509,7 +511,7 @@ frequency~\cite{kundur01,crastan03,entsoe02,entsoe04}. The slope of this first-o
\SI{25}{\giga\watt\per\hertz} according to the European electricity grid authority, ENTSO-E.
If we modulate the power consumption of a large load, this modulation will result in a small change in frequency
according to this characteristic. As long as we stay within the operational limits set by
according to that characteristic. As long as we stay within the operational limits set by
ENTSO-E~\cite{entsoe02,entsoe03}, this change will not degrade the operation of other parts of the grid. The advantages
of grid frequency modulation are the fact that a single transmitter can cover an entire synchronous area as well as low
receiver hardware complexity.
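As a rough worked example of this first-order relation, assume a hypothetical modulated load step of
$\Delta P = \SI{25}{\mega\watt}$. This value is chosen purely for illustration and is not taken from our measurements.
With the \SI{25}{\giga\watt\per\hertz} figure quoted above, the magnitude of the resulting frequency deviation is
approximately
\[
    |\Delta f| \approx \frac{\Delta P}{\SI{25}{\giga\watt\per\hertz}}
               = \frac{\SI{25}{\mega\watt}}{\SI{25}{\giga\watt\per\hertz}}
               = \SI{1}{\milli\hertz}.
\]
In this approximation, an increase in load lowers the grid frequency and a decrease in load raises it.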
@ -521,29 +523,26 @@ at very small scales in microgrids before~\cite{urtasun01} and has not yet been
Compared to traditional channels such as Fiber To The Home (FTTH), 5G or LoraWAN, grid frequency as a communication
channel has a resiliency advantage. It can start transmission as soon as a power island with a connected transmitter is
powered up, while communication networks such as FTTH or 5G are still rebooting, or might be waiting for parts of their
centralized infrastructure that are connected to different power islands to come back online. Mesh networks such as
LoraWAN can cover short distances up to $\SI{20}{\kilo\meter}$ without requiring infrastructure to be available, but for
longer distances LoraWAN relies on the public internet for its network backbone. Additionally, systems such as FTTH, 5G
and LoraWAN are built around a point-to-point communication model and usually do not support a global broadcast
primitive. During times when a large number of devices must be reached simultaneously this can lead to congestion of
cellular towers and servers. Therefore, during an ongoing cyber attack, grid frequency is promising as a communication
channel because only a single transmitter facility must be operational for it to function, and this single transmitter
can reach all connected devices simultaneously. After a power outage, it can resume operation as soon as electrical
power is restored, even while the public internet and mobile networks are still offline. It is unaffected by
cyber attacks that target telecommunication networks.
powered up, while communication networks such as FTTH or 5G are still rebooting or waiting for their centralized
infrastructure to come back online. Mesh networks such as LoraWAN can cover short distances up to $\SI{20}{\kilo\meter}$
without requiring infrastructure to be available, but for longer distances LoraWAN relies on the public internet for its
network backbone. Additionally, systems such as FTTH, 5G and LoraWAN are built around a point-to-point communication
model and usually do not support a global broadcast primitive. During times when a large number of devices must be
reached simultaneously this can lead to congestion of cellular towers and servers. Therefore, during an ongoing cyber
attack, grid frequency is promising as a communication channel because only a single transmitter facility must be
operational for it to function, and this single transmitter can reach all connected devices simultaneously.
\subsection{Characterizing Grid Frequency}
\label{grid-freq-characterization}
Before analyzing grid frequency as a communication channel, we developed a device that allows us to collect ground truth
for our analysis by safely recording the grid voltage waveform. Our system consists of an \texttt{STM32F030F4P6} ARM
Cortex M0 microcontroller that records mains voltage using its internal 12-bit ADC and transmits measured values through
a galvanically isolated USB/serial bridge to a host computer. We derive our system's sampling clock from a crystal oven
to avoid frequency measurement noise due to thermal drift of a regular crystal: \SI{1}{ppm} of crystal drift would cause
a grid frequency error of $\SI{50}{\micro\hertz}$. We compared our oven-stabilized clock against a GPS 1 pps reference
and found that over a time span of 20 minutes both stayed stable within 5 ppb of each other, which corresponds to the
drift specification of a typical crystal oven.
To prepare our analysis of grid frequency modulation, we developed a device that allows us to collect measurements of
actual grid frequency behavior by safely recording the grid voltage waveform. Our system consists of an
\texttt{STM32F030F4P6} ARM Cortex M0 microcontroller that records mains voltage using its internal 12-bit ADC and
transmits measured values through a galvanically isolated USB/serial bridge to a host computer. We derive our system's
sampling clock from a crystal oven to avoid frequency measurement noise due to thermal drift of a regular crystal:
\SI{1}{ppm} of crystal drift would cause a grid frequency error of $\SI{50}{\micro\hertz}$. We compared our
oven-stabilized clock against a GPS 1 pps reference and found that over a time span of 20 minutes both stayed stable
within 5 ppb of each other, which corresponds to the drift specification of a typical crystal oven.
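The sensitivity figure above follows directly from scaling the nominal grid frequency by the relative clock error.
Assuming a nominal frequency of $f_\mathrm{nom} = \SI{50}{\hertz}$ and writing $\delta$ for the relative clock error
(our notation for this sketch), we get
\[
    \Delta f = f_\mathrm{nom} \cdot \delta = \SI{50}{\hertz} \cdot 10^{-6} = \SI{50}{\micro\hertz}.
\]
By the same relation, the 5 ppb agreement we observed against the GPS reference corresponds to a frequency measurement
error of about $\SI{50}{\hertz} \cdot 5 \cdot 10^{-9} = \SI{0.25}{\micro\hertz}$.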
In utility SCADA systems, Phasor Measurement Units (PMUs) are used to precisely measure grid frequency among other
parameters. Details on the inner workings of commercial phasor measurement units are scarce but there is a large amount
@ -579,14 +578,14 @@ Using our grid frequency recorder, we performed a two-day measurement series of
Figure~\ref{fig_freq_spec} shows the frequency spectrum of grid frequency over this two-day span. In this spectrum, we
observe a number of features. Across the frequency range, we observe a broad $1/f$ noise. Above a period of
$\SI{10}{\second}$, this $1/f$ noise dips to a flat noise floor. We estimate that this low-noise region is caused by the
self-regulating effect of loads. %FIXME citation Above a $\SI{10}{\second}$ period, primary control is activated and
thus the $1/f$ noise we observe is the result of the interaction between primary control and consumer demand. On top of
this $1/f$ behavior, the spectrum shows several sharp peaks at time intervals with a ``round'' number such as
$\SI{10}{\second}$, $\SI{60}{\second}$ or multiples of $\SI{300}{\second}$. These peaks are due to loads turning on- or
off depending on wall-clock time, and demand forecasting not being able to precisely match the amplitude of these large
changes in load. Besides the narrow peaks caused by this effect we can also observe two wider bumps at
$\SI{7.0}{\second}$ and $\SI{4.7}{\second}$. These bumps closely correlate with continental European synchonous area's
oscillation modes at $\SI{0.15}{\hertz}$ (east-west) and $\SI{0.25}{\hertz}$ (north-south)~\cite{grebe01}.
self-regulating effect of loads. Above a $\SI{10}{\second}$ period, primary control is activated and thus the $1/f$
noise we observe is the result of the interaction between primary control and consumer demand. On top of this $1/f$
behavior, the spectrum shows several sharp peaks at time intervals with a ``round'' number such as $\SI{10}{\second}$,
$\SI{60}{\second}$ or multiples of $\SI{300}{\second}$. These peaks are due to loads turning on or off depending on
wall-clock time, and demand forecasting not being able to precisely match the amplitude of these large changes in load.
Besides the narrow peaks caused by this effect we can also observe two wider bumps at $\SI{7.0}{\second}$ and
$\SI{4.7}{\second}$. These bumps closely correlate with the continental European synchronous area's oscillation modes at
$\SI{0.15}{\hertz}$ (east-west) and $\SI{0.25}{\hertz}$ (north-south)~\cite{grebe01}.
\section{Grid Frequency Modulation}
@ -598,8 +597,8 @@ energy-intensive industries in Europe~\cite{ec01}, we found that an aluminium sm
aluminium smelting, aluminium is electrolytically extracted from alumina solution. High-voltage mains power is
transformed, rectified and fed into approximately 100 series-connected electrolytic cells forming a \emph{potline}.
Inside these pots, alumina is dissolved in molten cryolite electrolyte at approximately \SI{1000}{\degreeCelsius} and
electrolysis is performed using a current of tens or hundreds of Kiloampère. The resulting pure aluminium settles at the
bottom of the cell and is tapped off for further processing.
electrolysis is performed using a current of tens or hundreds of kiloamperes at a few volts per cell. The resulting pure
aluminium settles at the bottom of the cell and is tapped off for further processing.
Aluminium smelters are operated around the clock, and due to the high financial stakes their behavior under power
outages has been carefully characterized. Power outages of tens of minutes up to two hours reportedly do not cause
@ -609,8 +608,8 @@ prices~\cite{duessel01,eisma01,depree01}. An aluminium plant's power supply is
smelter cells under optimal operating conditions. Modern power supply systems employ large banks of diodes or thyristors
to rectify low-voltage AC to DC to be fed into the potline~\cite{ayoub01}. Potline voltage is controlled through a
combination of a tap changer and a transductor. Individual cell voltages are controlled by changing the physical
distance between anode and cathode distance. In this setup, power can be electronically modulated using the thyristor
rectifier. Since the system does not have any mechanical inertia, high modulation rates are possible.
distance between anode and cathode. In this setup, power can be electronically modulated using the thyristor rectifier.
Since the system does not have any mechanical inertia, high modulation rates are possible.
In~\cite{depree01}, the authors describe a setup where a large aluminium smelter in continental Europe is used as
primary control reserve for frequency regulation. In this setup, a rise time of $\SI{15}{\second}$ was achieved to meet
@ -631,12 +630,12 @@ continental European synchronous area, we have to consider operation during a bl
divides into a number of disconnected power islands. A single transmitter would only be able to reach receivers on the
same power island.
Instead, the system can use a number of transmitters that are distributed throughout the network. Piggy-backing
transmitters on existing industrial loads keeps the implementation cost of additional transmitters low. By running
transmitters from gps-synchronized ovenized crystal oscillators or rubidium frequency standards, transmissions can be
precisely synchronized across power islands even after a holdover period of several days. This allows a transmission to
continue un-interrupted while the utility re-joins power island into the larger grid, since the transmissions on both
islands are precisely synchronized.
To alleviate this constraint, the system can use a number of transmitters that are distributed throughout the network.
Piggy-backing transmitters on existing industrial loads keeps the implementation cost of additional transmitters low. By
running transmitters from stable, synchronized frequency standards such as GPS-disciplined rubidium standards,
transmissions can be precisely synchronized across power islands even after a holdover period of several days. This
allows a transmission to continue uninterrupted while the utility rejoins power islands into the larger grid, since the
transmissions on both islands are precisely synchronized.
As illustrated in Figure~\ref{fig_intro_flowchart}, the transmitters are connected to a command center. For this
connection, a redundant set of long-range radio or satellite links can be used, as well as wired connections through the
@ -672,20 +671,20 @@ Direct Sequence Spread Spectrum modulation is a common spread-spectrum technique
radio systems, most prominently all global navigation satellite systems (GNSS). As a spread-spectrum technique, DSSS
spreads out the signal's energy across a broad spectral range. This decreases the susceptibility of a DSSS signal to
narrowband interference. In GNSS, this allows the rejection of other nearby RF sources. In our use case, this makes the
signal immune to the many narrow peaks in the grid frequency's noise spectrum that are caused by UTC-synchronized
control systems (cf.~Fig.~\ref{fig_freq_spec}). In addition to better interference immunity, DSSS has two other
important characteristics: It provides \emph{modulation gain}, i.e.~it allows a trade-off between data rate and receiver
sensitivity, and it allows for Code Division Multiple Access (CDMA). In CDMA, multiple DSSS-modulated signals can be
sent simultaneously through a shared channel with less impact to the resulting signal-to-noise ratio (SNR) than would be
the case for other modulation techniques.
signal immune to the many narrow peaks in the grid frequency's noise spectrum that are caused by control systems
synchronized to wall-clock time (cf.~Fig.~\ref{fig_freq_spec}). In addition to better interference immunity, DSSS has two
other important characteristics: It provides \emph{modulation gain}, i.e.~it allows a trade-off between data rate and
receiver sensitivity, and it allows for Code Division Multiple Access (CDMA). In CDMA, multiple DSSS-modulated signals
can be sent simultaneously through a shared channel with less impact to the resulting signal-to-noise ratio (SNR) than
would be the case for other modulation techniques.
A DSSS signal is made up from pseudo-random \emph{symbols}, which in turn are made up from individual physical layer
bits called \emph{chips}. Chips are encoded in the signal using a lower-layer modulation such as phase-shift keying
(e.g.~in GPS) or frequency-shift keying (in this work). In DSSS, a \emph{code} is a library of symbols that are
constructed to have minimal cross-correlation, meaning they are near-orthogonal. A transmitter sends a symbol by
constructed to have minimal cross-correlation, i.e.\ they are near-orthogonal. A transmitter sends a symbol by
transmitting its particular pseudo-random chip sequence at a chosen polarity, conveying one bit of information. A
receiver demodulates the signal by directly correlating the incoming physical-layer signal with the symbol's chip
pattern, which results in a positive or negative peak depending on symbol polarity when a symbol is received.
pattern, which results in a positive or negative peak when a symbol is received depending on its polarity.
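To make the correlation step concrete, the following sketch uses our own illustrative notation and assumes one received
sample per chip and perfect symbol alignment. Let $c_0, \dots, c_{N-1}$ with $c_k \in \{-1, +1\}$ be the chip sequence
of a symbol of length $N$, and let $r[n]$ be the received chip-rate signal. The correlator output is
\[
    y[n] = \sum_{k=0}^{N-1} c_k \, r[n+k].
\]
If a symbol of polarity $b \in \{-1, +1\}$ starts at sample $n_0$ and noise is neglected, then $r[n_0+k] = b \, c_k$ and
\[
    y[n_0] = \sum_{k=0}^{N-1} b \, c_k^2 = b N,
\]
so the sign of the peak yields the transmitted bit. With independent noise of variance $\sigma^2$ on each sample, the
noise at the correlator output has a standard deviation of $\sigma \sqrt{N}$ while the peak has magnitude $N$, so the
ratio of peak amplitude to noise amplitude grows with $\sqrt{N}$.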
By increasing the DSSS sequence length by a factor of $2$, SNR is improved by $\sqrt{2}$ assuming an additive white
Gaussian noise (AWGN) channel. At the same time, when doubling the sequence length, common DSSS code construction
@ -807,7 +806,7 @@ grid is restored piece by piece with safety reset controllers coming back online
transmit the same reset command. In our protocol, we handle this situation by memorizing the last valid received command
on the device side, and only acting \emph{once} when a new command is received. The transmission of one command thus
becomes idempotent, and the utility can repeat the command until sufficiently many devices have received the command and
e.g.\ performed a safety reset.
performed a safety reset.
In our protocol, we define two commands, \emph{reset} and \emph{disarm}. We assign \emph{reset} and \emph{disarm} to the
$k_i$ in an alternating way. For odd $i$, $k_i$ is a reset command and for even $i$, $k_i$ is a \emph{disarm} command.
@ -900,7 +899,7 @@ sign bit into account, the length of the encoded signature is 20 DSSS symbols. O
correction at a 2:1 ratio inflating total message length to 30 DSSS symbols. At the \SI{1}{\second} chip duration that
we also used in our other simulations, this equates to an overall transmission duration of approximately
\SI{15}{\minute}. To give the demodulator some time to settle and to produce more realistic conditions of signal
reception we padded the modulated
signal unmodulated noise on both ends.
signal with unmodulated noise on both ends.
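As a plausibility check of the transmission duration, the total duration excluding the padding is the product of the
number of symbols $N_\mathrm{sym}$, the number of chips per symbol $N_\mathrm{chip}$ and the chip duration
$T_\mathrm{chip}$. The number of chips per symbol is not restated in this paragraph, so the value below is inferred
from the stated figures rather than quoted:
\[
    N_\mathrm{chip} \approx \frac{T}{N_\mathrm{sym} \cdot T_\mathrm{chip}}
                    = \frac{\SI{900}{\second}}{30 \cdot \SI{1}{\second}} = 30.
\]
The stated duration of approximately \SI{15}{\minute} is thus consistent with symbols on the order of 30 chips each.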
\section{Lessons learned}
@ -915,14 +914,14 @@ with common JTAG programmers.
Our initial assumption that a development kit would be easier to program than a commercial meter did not prove to be
true. Contrary to our expectations the commercial meter had JTAG enabled allowing us to easily read out its stock
firmware without either reverse-engineering vendor firmware update files nor circumventing code protection measures.
firmware, requiring neither reverse-engineering of vendor firmware update files nor circumvention of code protection
measures.
The fact that its firmware was only available in its compiled binary form was not much of a hindrance as it proved not
to be too complex and all we wanted to know we found with just a few hours of digging in
Ghidra\footnote{\url{https://ghidra-sre.org/}}.
In the firmware development phase our approach of testing every module individually (e.g. DSSS demodulator, Reed-Solomon
decoder, grid frequency estimation) proved useful particularly for debugging. The modular architecture allowed us to
directly compare our demodulator implementation to our Jupyter/Python prototype, where we found that our C
In the firmware development phase we tested every module, such as the DSSS demodulator, the Reed-Solomon decoder, and
the grid frequency estimation, individually. This approach proved particularly useful for debugging. The modular architecture
allowed us to directly compare our demodulator implementation to our Jupyter/Python prototype, where we found that our C
implementation outperformed the Python prototype. Despite the algorithms' complexity, the microcontroller C
implementation has no issues processing data in real time due to the low sampling rate necessary.
@ -965,7 +964,7 @@ Safety reset controllers can be adapted to most IoT device and smart meter desig
other public utilities such as the internet or cellular networks, we believe in their potential as a last line of
defense providing resilience under large-scale cyber attacks. The next steps towards a practical implementation will be
a practical demonstration of broadcast data transmission through grid frequency modulation using a megawatt-scale
controllable load as well as further optimization of the modulation and data encoding as well as the demodulator
controllable load as well as further optimization of the modulation and data encoding and the demodulator
implementation.
\subsection{Artifacts}