\documentclass[12pt,a4paper,notitlepage]{report}
|
|
\usepackage[ngerman, english]{babel}
|
|
\usepackage[utf8]{inputenc}
|
|
\usepackage[a4paper,textwidth=17cm, top=2cm, bottom=3.5cm]{geometry}
|
|
\usepackage[T1]{fontenc}
|
|
\usepackage[
|
|
backend=biber,
|
|
style=numeric,
|
|
natbib=true,
|
|
url=false,
|
|
doi=true,
|
|
eprint=false
|
|
]{biblatex}
|
|
\addbibresource{safety_reset.bib}
|
|
\usepackage{amssymb,amsmath}
|
|
\usepackage{listings}
|
|
\usepackage{eurosym}
|
|
\usepackage{wasysym}
|
|
\usepackage{amsthm}
|
|
\usepackage{tabularx}
|
|
\usepackage{multirow}
|
|
\usepackage{multicol}
|
|
\usepackage{tikz}
|
|
\usepackage{mathtools}
|
|
\DeclarePairedDelimiter{\ceil}{\lceil}{\rceil}
|
|
\DeclarePairedDelimiter{\paren}{(}{)}
|
|
|
|
\usetikzlibrary{arrows}
|
|
\usetikzlibrary{chains}
|
|
\usetikzlibrary{backgrounds}
|
|
\usetikzlibrary{calc}
|
|
\usetikzlibrary{decorations.markings}
|
|
\usetikzlibrary{decorations.pathreplacing}
|
|
\usetikzlibrary{fit}
|
|
\usetikzlibrary{patterns}
|
|
\usetikzlibrary{positioning}
|
|
\usetikzlibrary{shapes}
|
|
|
|
\usepackage[binary-units]{siunitx}
|
|
\DeclareSIUnit{\baud}{Bd}
|
|
\usepackage{hyperref}
|
|
\usepackage{tabularx}
|
|
\usepackage{commath}
|
|
\usepackage{graphicx,color}
|
|
\usepackage{ccicons}
|
|
\usepackage{subcaption}
|
|
\usepackage{float}
|
|
\usepackage{footmisc}
|
|
\usepackage{array}
|
|
\usepackage[underline=false]{pgf-umlsd}
|
|
\usetikzlibrary{calc}
|
|
%\usepackage[pdftex]{graphicx,color}
|
|
\usepackage{epstopdf}
|
|
\usepackage{pdfpages}
|
|
\usepackage{minted} % pygmentized source code
|
|
% Needed for murks.tex
|
|
\usepackage{setspace}
|
|
\usepackage[draft=false,babel,tracking=true,kerning=true,spacing=true]{microtype} % optischer Randausgleich etc.
|
|
% For german quotation marks
|
|
|
|
\newcommand{\degree}{\ensuremath{^\circ}}
|
|
\newcolumntype{P}[1]{>{\centering\arraybackslash}p{#1}}
|
|
|
|
\usepackage{fancyhdr}
|
|
\fancyhf{}
|
|
\fancyfoot[C]{\thepage}
|
|
\newcommand{\includenotebook}[2]{
|
|
\fancyhead[C]{Included Jupyter notebook: #1}
|
|
\includepdf[pages=1,
|
|
pagecommand={\thispagestyle{fancy}\section{#1}\label{#2_notebook}}
|
|
]{resources/#2.pdf}
|
|
\includepdf[pages=2-,
|
|
pagecommand={\thispagestyle{fancy}}
|
|
]{resources/#2.pdf}
|
|
}
|
|
|
|
\begin{document}
|
|
\selectlanguage{ngerman}
|
|
\input{murks}
|
|
\titelen{A Post-Attack Recovery Architecture for Smart Electricity Meters}
|
|
\titelde{Eine Architektur zur Kontrollwiederherstellung nach Angriffen auf Smart Metering in Stromnetzen}
|
|
\typ{Masterarbeit}
|
|
\grad{Master of Science (M. Sc.)}
|
|
\autor{Jan Sebastian Götte}
|
|
\gebdatum{Aus Datenschutzgründen nicht abgedruckt} % Geburtsdatum des Autors
|
|
\gebort{Aus Datenschutzgründen nicht abgedruckt} % Geburtsort des Autors
|
|
\gutachter{Prof. Dr. Björn Scheuermann}{Prof. Dr.-Ing. Eckhard Grass}
|
|
\mitverteidigung
|
|
\makeTitel
|
|
\selbstaendigkeitserklaerung{\today}
|
|
\vfill
|
|
\selectlanguage{english}
|
|
{\center{
|
|
\begin{minipage}[t][10cm][b]{\textwidth}
|
|
\center{\ccbysa}
|
|
|
|
\center{This work is licensed under a Creative-Commons ``Attribution-ShareAlike 4.0 International'' license. The
|
|
full text of the license can be found at:}
|
|
|
|
\center{\url{https://creativecommons.org/licenses/by-sa/4.0/}}
|
|
|
|
\center{For alternative licensing options, source files, questions or comments please contact the author at
|
|
\texttt{masterarbeit@jaseg.de}}.
|
|
|
|
\center{This is version \texttt{\input{version.tex}\unskip} generated on \today. The printed version of this
|
|
document will be marked \texttt{-dirty} due to the private personal information on the title page that is not
|
|
checked in to git. The git repository can be found at:}
|
|
|
|
\center{\url{https://git.jaseg.de/master-thesis.git}}
|
|
\end{minipage}
|
|
}}
|
|
\newpage
|
|
|
|
% Hier folgt die eigentliche Arbeit (bei doppelseitigem Druck auf einem neuen Blatt):
|
|
\tableofcontents
|
|
\newpage
|
|
|
|
\chapter{Introduction}
|
|
|
|
%FIXME: sprinkle this section with citations.
|
|
As in all fields of engineering, there is an ongoing diffusion of information systems into industrial control systems
|
|
in the power grid. Automation of these control systems has been practised for the better part of a century already.
|
|
Until recently this automation was mostly limited to core components of the grid. Generators in power stations are
|
|
computer-controlled according to electromechanical and economic models. Switching in substations is automated to allow
|
|
for fast failure recovery. Humans are still vital to these systems, but their tasks have shifted from pure operation to
|
|
engineering, maintenance and surveillance.
|
|
|
|
A large-scale trend in power systems is the move from a model of centralized generation built around massive large-scale
|
|
fossil and nuclear power plants towards a more heterogeneous model. In this new model large-scale fossil power plants
|
|
still serve a major role but two new factors come into play. One is the advance of renewable energies. The large-scale
|
|
use of wind and solar power in particular from a current standpoint seems unavoidable for our continued existence on
|
|
this planet. For the electrical grid however, these systems constitute a significant challenge. Fossil-fueled power
|
|
plants can be precisely controlled to match the expected energy consumption at any point in time. This tracking of
|
|
production and consumption is vital to the stability of the grid. Renewable energies such as wind and solar power do not
|
|
provide the same degree of controllability, and they introduce a large degree of uncertainty due to the
|
|
unpredictable way of the forces of nature.
|
|
|
|
Along with this change in dynamic behavior renewable energies have brought forth the advance of distributed generation.
|
|
In distributed generation end-customers that previously only consumed energy have started to feed energy into the grid
|
|
from small solar installations on their property. Distributed generation is a chance for customers to gain autonomy and
|
|
shift from a purely passive role to being active participants of the electricity market\cite{crastan03}.
|
|
|
|
To match this new landscape of decentralized generation and unpredictable renewable resources the utility industry has
|
|
had to adapt in major ways. One aspect of this adaptation that is particularly visible to ordinary people is the
|
|
computerization of end-user energy metering. Despite the widespread use of industrial control systems inside the
|
|
electrical grid and the far-reaching diffusion of computers into people's everyday lives the energy meter has long been
|
|
one of the last remnants of an offline, analog time. Until the 2010s many households were still served through
|
|
electromechanical Ferraris-style meters that have their origin in the late 19th
|
|
century\cite{borlase01,ukgov04,bnetza02}.
|
|
|
|
Today under the umbrella term \emph{Smart Grid} the shift towards fully computerized, often networked meters has been
|
|
partially accomplished. The rollout of these \emph{Smart Meters} has not been very smooth overall, with some countries
|
|
severely lagging behind other countries. As a safety-critical technology smart meter technology is usually standardized
|
|
on a per-country basis. This leads to an inhomogeneous landscape with, in some instances, wildly incompatible systems.
|
|
Often vendors only serve a single country or have a separate model of their meter for each country. This complex
|
|
standardization landscape and market situation has led to a proliferation of highly complex, custom-coded
|
|
microcontroller firmware. The complexity and scale of this often network-connected firmware makes for a ripe substrate
|
|
for bugs to surface.
|
|
|
|
A remotely exploitable flaw inside a smart meter's firmware\footnote{
|
|
There are several smart metering architectures that ascribe different roles to the component called \emph{smart
|
|
meter}. Coarsely divided into two camps these are systems where all metering and communication code resides within
|
|
one physical unit and systems where metering and communication are separated into two units, the \emph{smart meter}
|
|
and the \emph{smart meter gateway}\cite{stuber01}. An example for the former are setups in the USA, an example of
|
|
the latter is the one in Germany. For clarity in this introductory chapter we use \emph{smart meter} to describe the
|
|
entire system at the customer premises including both the meter and a potential gateway.
|
|
} could have consequences ranging from impaired billing
|
|
functionality to an existential threat to grid stability\cite{anderson01,anderson02}. A coördinated attack on meters in
|
|
a country where load switches are common could at worst cause widespread activation of grid safety systems by repeatedly
|
|
connecting and disconnecting megawatts of load capacity in just the wrong moments\cite{wu01}.
|
|
|
|
Mitigation of these attacks through firmware security measures is unlikely to yield satisfactory results. The enormous
|
|
complexity of smart meter firmware makes firmware security extremely labor-intensive. The diverse standardization
|
|
landscape makes a coördinated, comprehensive response unlikely.
|
|
|
|
In this thesis, instead of lamenting the state of firmware security, we introduce a pragmatic solution to what we consider a
likely scenario: a large-scale compromise of smart meter firmware. In our proposal the components of the smart meter
|
|
that are threatened by remote compromise are equipped with a physically separate \emph{safety reset controller} that
|
|
listens for a reset command transmitted through the electrical grid itself and on reception forcibly resets the smart
|
|
meter's entire firmware to a known-good state. Our safety reset controller receives commands through Direct Sequence
|
|
Spread Spectrum (DSSS) modulation carried out on grid frequency through a large controllable load such as an aluminium
|
|
smelter. After forward error correction and cryptographic verification it re-flashes the target application
|
|
microcontroller over the standard JTAG interface.
|
|
|
|
In this thesis starting from a high-level architecture we have carried out extensive simulations of our proposal's
|
|
performance under real-world conditions. Based on these simulations we implemented an end-to-end prototype of our
|
|
proposed safety reset controller as part of a realistic smart meter demonstrator. Finally we experimentally validate our
|
|
results and give an outline of further steps towards practical implementation.
|
|
|
|
\chapter{Fundamentals}
|
|
|
|
\section{Structure and operation of the electrical grid}
|
|
|
|
Since this thesis is filed under \emph{computer science} we will provide a very brief overview of some basic aspects of
|
|
modern power grids.
|
|
|
|
\subsection{Structure of the electrical grid}
|
|
|
|
The electrical grid is composed of a large number of systems such as distribution systems, power stations and substations
|
|
interconnected by long transmission lines. Mostly due to ohmic losses\footnote{
|
|
Power dissipation of a resistor of resistance $R [\Omega]$ given current $I [A]$ is $P_\text{loss} [W] =
|
|
U_\text{drop} \cdot I = I^2 \cdot R$. Fixing power $P_\text{transmitted} [W] = U_\text{line} \cdot I$ this yields a
|
|
dependency on line voltage $U_\text{line} [V]$ of $P_\text{loss} =
|
|
\left(\frac{P_\text{transmitted}}{U_\text{line}}\right)^2 \cdot R$. Thus, ignoring other losses a $2\times$ increase
|
|
in transmission voltage halves current and cuts ohmic losses to a quarter. In practice the economics of this are
|
|
much more complicated due to the cost of better insulation for higher-voltage parts and the added factor of power
|
|
factor compensation. }
|
|
the efficiency of transmission of electricity through long transmission lines increases with the square of
|
|
voltage\cite{crastan01,simon01}. % simon01: p. 425, 9.4.1.1, crastan p.55, 3.1
|
|
In practice economic considerations take into account a reduction of the considerable transmission losses (about
|
|
\SI{6}{\percent} in case of Germany\cite{destatis01}) as well as the cost of equipment such as additional transformers
|
|
and the cost increase for the increased voltage rating of components such as transmission lines. Overall these
|
|
considerations have led to a hierarchical structure where large amounts of energy are transmitted over very long
|
|
distances (up to thousands of kilometers) at very high voltages (upwards of \SI{200}{\kilo\volt}) and voltages get lower
|
|
the closer one gets to end-customer premises. In Germany at the local level a substation will distribute
|
|
\SIrange{10}{30}{\kilo\volt} to large industrial consumers and streets with small transformer substations converting
|
|
this to the \SI{400}{\volt} three-phase AC households are usually hooked up with\cite{crastan01}.
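
As a rough numerical illustration of the voltage scaling described in the footnote above, the following sketch evaluates
$P_\text{loss} = \left(\frac{P_\text{transmitted}}{U_\text{line}}\right)^2 \cdot R$ for an idealized single-conductor
line. The values $P_\text{transmitted} = \SI{100}{\mega\watt}$ and $R = \SI{10}{\ohm}$ are assumptions chosen for easy
arithmetic and are not taken from any real line. Three-phase effects and reactive power are ignored.
\begin{align*}
    U_\text{line} &= \SI{110}{\kilo\volt}: & I &= \frac{\SI{100}{\mega\watt}}{\SI{110}{\kilo\volt}} \approx \SI{909}{\ampere}, & P_\text{loss} &= I^2 \cdot R \approx \SI{8.3}{\mega\watt} \\
    U_\text{line} &= \SI{220}{\kilo\volt}: & I &= \frac{\SI{100}{\mega\watt}}{\SI{220}{\kilo\volt}} \approx \SI{455}{\ampere}, & P_\text{loss} &= I^2 \cdot R \approx \SI{2.1}{\mega\watt}
\end{align*}
Under these assumptions doubling the line voltage reduces the ohmic loss from about \SI{8.3}{\percent} to about
\SI{2.1}{\percent} of the transmitted power, i.e.\ to a quarter.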
|
|
|
|
\subsubsection{Transmission lines, bus bars and tie lines}
|
|
|
|
The most important components of the electrical grid are transmission lines. Short transmission lines that tightly couple
|
|
parts of a substation are called \emph{bus bars}. Transmission lines that couple otherwise independent grid segments are
|
|
called \emph{tie lines}. A tie line often connects grid segments operated by two different operators e.g.\ across a
|
|
country border.
|
|
|
|
\emph{Short} transmission lines can be approximated as a simple lumped-component
|
|
RLC\footnote{resistor-inductor-capacitor} circuit. In this case the effect of wave propagation along the line does not
|
|
have to be taken into consideration. In this lumped model the transmission line is represented by a circuit of one or
|
|
two inductors, one or two capacitors and some resistors. This representation simplifies analysis. For \emph{long}
|
|
transmission lines above \SI{50}{\kilo\meter} (cable) or \SI{250}{\kilo\meter} (overhead lines) this approximation
|
|
breaks down and wave propagation along the line's length has to be taken into account. The resulting model is what RF
|
|
engineering calls a \emph{transmission line} and models the line's parasitics\footnote{stray capacitance, ohmic
|
|
resistance and stray inductance} as being uniformly distributed along the length of the line. To approximate this model
|
|
in lumped-element evaluations the line is represented as a long chain of small lumped-component RLC sections. This
|
|
complex structure makes modelling more difficult in comparison to short lines\cite{crastan01}.
|
|
|
|
Almost all transmission lines used in the transmission and distribution grid use three-phase AC. Long-distance overland
|
|
lines are usually implemented as overhead lines due to their low cost and ease of maintenance. Underground cables are
|
|
much more expensive due to their insulation and are only used when overhead lines cannot be used, e.g.\ for safety or
|
|
aesthetic reasons. In some specialized applications such as long, high-power undersea cables high-voltage DC (HVDC) is
|
|
used. In HVDC converter stations at both ends of the line convert between three-phase AC and the line's DC voltage.
|
|
These converter stations are controlled electronically and do not exhibit any of the electromechanical effects
|
|
generators in a power plant do. Since HVDC re-synthesizes three-phase AC from DC at the receiving end of the line it can
|
|
be used to couple non-synchronous grids. This also allows for additional degrees of control over the transmission of
|
|
power compared to a regular transmission line. These technical benefits are offset by the high initial cost (mostly due
|
|
to the converter stations) leading to HVDC being used in specific situations only\cite{crastan03}.
|
|
|
|
\subsubsection{Generators}
|
|
|
|
Traditionally all generators in the power grid were synchronous machines. A synchronous machine is a generator that is
|
|
wound and connected in such a way that during normal operation its rotation is synchronous with the grid frequency. Grid
|
|
frequency and generator rotation speed are bidirectionally electromechanically coupled. If a generator were to lag behind
|
|
the grid it would receive electrical energy from the grid and convert it into mechanical energy, acting as a motor.
|
|
Small deviations between rotational speed and grid frequency will be absorbed by the electromechanical coupling between
|
|
both. All generators connected to the grid operate synchronously. Maintaining this synchronization over time is the task
|
|
of complex control systems within each power station\cite{simon01,crastan01}.
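
The coupling between rotation speed and grid frequency can be made concrete with the standard textbook relation for
synchronous machines, which is not spelled out above. The symbols $n_\text{sync}$ (mechanical speed in revolutions per
minute) and $p$ (number of pole pairs of the machine) are introduced here for illustration only.
\begin{equation*}
    n_\text{sync} = \frac{\SI{60}{\second\per\minute} \cdot f}{p}
\end{equation*}
At a grid frequency of $f = \SI{50}{\hertz}$ a machine with a single pole pair ($p = 1$) thus rotates at
\SI{3000}{\per\minute}, while a machine with two pole pairs ($p = 2$) rotates at \SI{1500}{\per\minute}. Any deviation
of the grid frequency translates proportionally into a deviation of the rotation speed.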
|
|
|
|
Nowadays besides traditional rotating generators the grid also contains a large amount of electronically controlled
|
|
inverters. These inverters are used in photovoltaic installations and other setups where either DC or non-synchronous AC
|
|
is to be fed into the grid. Setups like this behave differently to rotating generators. In particular \emph{inertia} in
|
|
these setups is either absent or a software parameter potentially reducing their overload capacity compared to rotating
|
|
generators. The fundamentally different nature of electronically controlled inverters has to be taken into account in
|
|
planning and regulation\cite{crastan03}.
|
|
|
|
\subsubsection{Switchgear}
|
|
|
|
In the electrical grid switches perform various roles. The ones a computer scientist would recognize are used for
|
|
routing electricity between transmission lines and transformers and can be classified into ones that can be switched
|
|
under load (called load switches) and ones that cannot (called disconnectors). The latter are used to ensure parts of
|
|
the network are free from voltage. The former are used to re-route flows of electrical currents. A major difference in
|
|
their construction is that in contrast to disconnectors load switches have built-in components that extinguish the
|
|
high-power arc discharge that forms when the circuit is interrupted under load\footnote{
|
|
While an arc discharge is considered a fault condition in most low-voltage systems including computers, in energy
|
|
systems it is often part of normal operation.
|
|
}. Beyond this there are circuit breakers. Circuit breakers are safety devices that can still switch even under failure
|
|
conditions at several times the circuit's nominal current. They are activated automatically on conditions such as
|
|
overcurrent or overvoltage. Fuses can be considered non-resettable switches. The fuse in a computer power supply is
|
|
barely more than a glass tube with some wire in it that is designed to melt at the designated current. In energy systems
|
|
fuses are often much more complex devices that in some cases even utilize explosives to quickly and decisively open the
|
|
circuit and extinguish the resulting arc discharge\cite{nelles01,crastan01,simon01}.
|
|
% disconnect switches, fuses, breakers -> crastan 1 (ch. 8)
|
|
|
|
\subsubsection{Transformers}
|
|
|
|
Along with transmission lines transformers are one of the main components most people will be thinking of when talking
|
|
about the electrical grid. Transformers connect grid segments at different voltage levels with one another. In the
|
|
distribution grid transformers are used to provide standard end-user voltage levels to the customer (e.g.\ \SI{230}{\volt}/\SI{400}{\volt} in
|
|
Europe) from a \SIrange{10}{25}{\kilo\volt} feeder. Transformers can also be used to convert between buses without a
|
|
fourth neutral conductor and buses with one.
|
|
|
|
Transformers are large and heavy devices consisting of thick copper wire or copper foil windings arranged around a core
|
|
made from thin stacked, insulated iron sheets. The entire core sits within a large metal enclosure that is filled with
|
|
liquid (usually a specialized oil) for both cooling and electrical insulation. This cooling liquid is cooled by means
|
|
such as radiator fins on the transformer enclosure itself or an external radiator. Depending on the design cooling may
|
|
rely on natural convection within the cooling liquid or on electrical pumps\cite{crastan01,simon01}.
|
|
|
|
Transformers come in a large variety of coil and wiring configurations. There exist autotransformers where the secondary
|
|
is part of the primary (or vice-versa) that are used to translate between voltage levels without galvanic isolation at
|
|
lower cost. Transformers used in parts of the electrical grid often have several taps and include \emph{tap changers}. A
|
|
tap changer is a system of mechanical switches that can be used to switch between several discrete transformer ratios to
|
|
adjust secondary voltage under load\cite{simon01}. Tap changers are used in the distribution grid to maintain the
|
|
specified voltage tolerances at the customer's connection.
|
|
|
|
\subsubsection{Instrument transformers}
|
|
|
|
While operating on the exact same physical principles instrument transformers are very different from regular
|
|
transformers in an energy system. Instrument transformers are specialized low-power transformers that are used as
|
|
transducers to measure voltage or current at very high voltages. They are part of the control and protection systems of
|
|
substations\cite{crastan01}.
|
|
|
|
\subsubsection{Chokes}
|
|
|
|
Chokes are large inductors. In power grid applications their construction is similar to the construction of a
|
|
transformer with the exception that they only have a single winding on the core. They are used for a variety of
|
|
purposes. A frequent use is as a series inductor on one of the phases or the neutral connection to limit transient fault
|
|
currents. In addition to use as simple series inductances for current limiting inductors are also used to tune LC
|
|
circuits. One such use is the Petersen coil, a large inductor in series with the earth connection at a transformer's star
point that is used to quickly extinguish arcs between phase and ground on a transmission line. The Petersen coil forms a
parallel LC resonant circuit with the transmission line's earth capacitance. Tuning this circuit through adjusting the
Petersen coil reduces earth fault current to levels low enough to quickly extinguish the arc\cite{simon01}.
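
The tuning of the Petersen coil can be sketched with the resonance condition of an ideal parallel LC circuit. The symbols
$L_\text{P}$ (coil inductance) and $C_\text{E}$ (total earth capacitance of the line as seen from the star point) as
well as the value $C_\text{E} = \SI{10}{\micro\farad}$ are assumptions made for this illustration. Losses are neglected
and $\omega = 2 \pi \cdot \SI{50}{\hertz}$ is the nominal angular frequency.
\begin{align*}
    \omega L_\text{P} &= \frac{1}{\omega C_\text{E}} \\
    L_\text{P} &= \frac{1}{\omega^2 C_\text{E}}
        = \frac{1}{\left(2 \pi \cdot \SI{50}{\hertz}\right)^2 \cdot \SI{10}{\micro\farad}}
        \approx \SI{1.01}{\henry}
\end{align*}
At this operating point the inductive current through the coil and the capacitive earth fault current ideally cancel,
which is why a change in the line's earth capacitance requires the coil to be re-tuned.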
|
|
|
|
\subsubsection{Power factor correction}
|
|
|
|
Power factor is a power engineering term that is used to describe how close the current waveform of a load is to that of
|
|
a purely resistive load. Given sinusoidal input voltage $V(t) = V_\text{pk} \sin \paren{\omega_\text{nom} t}$ with
|
|
$\omega_\text{nom} = 2 \pi f_\text{nom} = 2 \pi \cdot \SI{50}{\hertz}$ being the nominal angular frequency, the current
|
|
waveform of a resistor with resistance $R \left[\Omega\right]$ according to Ohm's law would be $I(t) = \frac{V(t)}{R} =
|
|
\frac{1}{R} V_\text{pk} \sin\paren{\omega_\text{nom} t}$. In this case voltage and current are perfectly in phase, i.e.
|
|
the current at time $t$ is proportional to the voltage at time $t$ with constant factor $\frac{1}{R}$.
|
|
|
|
In contrast to this idealized scenario reality provides us with two common issues: One, the load may be reactive. This
|
|
means its current waveform is an ideal sinusoid, but there is a phase difference between mains voltage and load current
|
|
like so: $I(t) = \frac{1}{\left|Z\right|} V_\text{pk} \sin\paren{\omega_\text{nom} t + \varphi}$. Here $Z$
is the load's complex impedance combining inductive, capacitive and resistive components and $\varphi$ the phase
difference between the resulting current waveform and the mains voltage waveform. Common cases of such loads are motors
|
|
and the inductive ballasts in old fluorescent lighting fixtures.
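
To illustrate the reactive case, consider a series combination of a resistor and an inductor as a simplified stand-in
for an inductive load. The component values $R = \SI{10}{\ohm}$ and $L = \SI{31.8}{\milli\henry}$ as well as the peak
voltage $V_\text{pk} = \SI{325}{\volt}$ are assumptions chosen so that the numbers come out round. With
$\omega_\text{nom} = 2 \pi \cdot \SI{50}{\hertz}$ and the sign convention of $\varphi$ used above we get:
\begin{align*}
    Z &= R + j \omega_\text{nom} L \approx \SI{10}{\ohm} + j \cdot \SI{10}{\ohm} \\
    \left|Z\right| &= \sqrt{R^2 + \paren{\omega_\text{nom} L}^2} \approx \SI{14.1}{\ohm} \\
    \varphi &= -\arctan \frac{\omega_\text{nom} L}{R} \approx \SI{-45}{\degree} \\
    I_\text{pk} &= \frac{V_\text{pk}}{\left|Z\right|} \approx \SI{23}{\ampere}
\end{align*}
In this sketch the current lags the voltage by an eighth of a period even though its waveform remains a perfect sinusoid.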
|
|
|
|
The second potential issue is loads with a non-sinusoidal current waveform. There are many classes of these but the most
common one is switch-mode power supplies (SMPS). Most SMPS for modern electronic devices have an input stage consisting of
a bridge rectifier followed by a capacitor that provides high-voltage DC power to the following switch-mode converter
|
|
circuit. This rectifier-capacitor input stage under normal load draws a high current only at the very peak of the input
|
|
voltage sinusoid and draws almost zero current for most of the period.
|
|
|
|
These two cases are measured by \emph{displacement power factor} and \emph{distortion power factor} that when combined
|
|
yield the overall true power factor. The power factor is a key quantity in the design and operation of the power grid
|
|
since a high power factor (close to $1.0$, i.e.\ an in-phase sinusoidal current waveform) yields the lowest transmission and
|
|
generation losses.
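
How the two components combine can be sketched with the commonly used textbook definitions, assuming a purely
sinusoidal supply voltage. The symbols $\varphi_1$ (phase angle of the fundamental component of the current) and
$\mathit{THD}_I$ (total harmonic distortion of the current) as well as the example values are assumptions introduced
for this illustration.
\begin{align*}
    \mathit{PF}_\text{true} &= \mathit{PF}_\text{displacement} \cdot \mathit{PF}_\text{distortion}
        = \cos \varphi_1 \cdot \frac{1}{\sqrt{1 + \mathit{THD}_I^2}} \\
    \varphi_1 = \SI{30}{\degree},\ \mathit{THD}_I = 0.75: \quad
    \mathit{PF}_\text{true} &\approx 0.866 \cdot 0.8 \approx 0.69
\end{align*}
In this example a load that looks acceptable by displacement alone ends up with a markedly lower true power factor once
waveform distortion is taken into account.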
|
|
|
|
Reactive power (also referred to as \emph{VAR} after its unit Volt-Ampère Reactive) is an important variable in the
|
|
operation of electrical grids (see sec.\ \ref{frequency_estimation}). If reactive power generation and consumption are
|
|
mismatched and power factor is low, high currents develop that lead to high transmission losses. For this reason grids
|
|
include circuits to compensate reactive power imbalances\cite{crastan01}. These circuits can be as simple as inductors
|
|
or capacitors connected to a power line but often can be switched to adapt to changing load conditions. Static Var
|
|
compensators are particularly fast-acting reactive power compensation devices whose purpose is to maintain bus
|
|
voltage\cite{rogers01}.
|
|
|
|
\subsubsection{Loads}
|
|
|
|
Lastly, there are the loads that the electrical grid serves. Loads range from mains-powered indicator lights in devices
|
|
such as light switches or power strips weighing in at mere milliwatts to large smelters in industrial metal production
|
|
that can consume a good fraction of a gigawatt all on their own.
|
|
|
|
\subsection{Operational concerns}
|
|
\subsubsection{Modelling the electrical grid}
|
|
|
|
Modelling performs an important role in the engineering of a reliable power infrastructure. The grid is a complex,
|
|
highly dynamic system. To maintain operational parameters such as voltage in various parts of the grid, grid frequency
|
|
and currents inside their specified ranges complex control systems are necessary. To design and parametrize such control
|
|
systems simulations are a valuable tool. Using model calculations the effects of control systems on operational
|
|
variables such as transmission efficiency or generation losses can be estimated. Model simulations can be used to
|
|
identify structural issues such as potential points of congestion. The same models can then be used to engineer
|
|
solutions to such issues, e.g.\ by simulating the effect of a new transmission line.
|
|
|
|
There are several aspects under which the grid or parts of the grid can be simulated. There are static analysis methods
|
|
such as modal analysis that yield information on electromechanical oscillations by computing the eigenvalues of a
|
|
large system of differential equations describing the collective behavior of all components of the grid. Modal analysis
|
|
is one example of simulations used in grid planning. Using modal analysis likely oscillatory modes can be identified and
|
|
ultimately these results can inform a decision to install additional stabilization systems in a particular location.
|
|
In contrast to static analysis, transient simulations calculate an approximation of the time-domain behavior of some
|
|
variable of interest under a given model. Transient simulations are used e.g.\ in the design of control systems.
|
|
Power flow equations describe the flow of electrical energy throughout the network from generator to load. Numerical
|
|
solutions of these equations are used to optimize control parameters to increase overall efficiency.
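
As a small sketch of how the eigenvalues obtained from modal analysis are usually interpreted, consider a single
complex-conjugate eigenvalue pair of the linearized system. The symbols $\lambda$, $\sigma$, $\omega$ and $\zeta$ and
the numerical values are assumptions made for illustration and do not stem from a model of a real grid.
\begin{align*}
    \lambda &= \sigma \pm j \omega = \SI{-0.2}{\per\second} \pm j \cdot \SI{3.14}{\per\second} \\
    f &= \frac{\omega}{2 \pi} \approx \SI{0.5}{\hertz} \\
    \zeta &= \frac{-\sigma}{\sqrt{\sigma^2 + \omega^2}} \approx 0.064
\end{align*}
The imaginary part yields the frequency $f$ of the oscillatory mode and the real part its damping. A negative $\sigma$
corresponds to a decaying oscillation, and a small damping ratio $\zeta$ such as the one in this example would mark a
mode that decays only slowly.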
|
|
|
|
% TODO decide what of this to keep.
|
|
% \subsubsection{Generator controls}
|
|
% \subsubsection{Load shedding}
|
|
% \subsubsection{System stability}
|
|
% \subsubsection{Power System Stabilizers}
|
|
|
|
\section{Smart meter technology}
|
|
|
|
Smart meters were a concept pushed by utility companies throughout the 2000s. Smart metering is one component of the
|
|
larger societal shift towards digitally interconnected technology. Old analog meters required that service personnel
|
|
physically come to read the meter. \emph{Smart} meters automatically transmit their readings through modern
|
|
technologies. Utility companies were very interested in this move not only because of the cost savings for meter reading
|
|
personnel. Beyond this, an always-connected meter allows several entirely new use cases that have not been possible
|
|
before. One often-cited one is utilizing the new high-resolution load data to improve load forecasting to allow for
|
|
greater generation efficiency. Computerizing the meter also allows for new fee models where electricity cost is no
|
|
longer fixed over time but adapts to market conditions. Models such as prepayment electricity plans where the customer
|
|
is automatically disconnected until they pay their bill are significantly aided by a fully electronic system that can be
|
|
controlled and monitored remotely\cite{anderson02}. A remotely controllable load switch can also be used to coerce
|
|
customers in situations where that was not previously economically possible\footnote{
|
|
The Swiss association of electrical utility companies in sec.\ 7.2 par.\ (2)a of their 2010 whitepaper on the
|
|
introduction of smart metering\cite{vseaes01} cynically writes that remotely controllable load switches ``lead a new
|
|
tenant to swiftly register'' with the utility company. This whitepaper completely vanished from their website some
|
|
time after publication, but the Internet Archive has a copy.
|
|
}. Figure \ref{fig_smgw_schema} shows a schema of the smart metering installation in a typical household\cite{stuber01}.
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\includegraphics{resources/smgw_usage_scenario}
|
|
\caption{A typical usage scenario of a smart metering system in a household.}
|
|
\label{fig_smgw_schema}
|
|
\end{figure}
|
|
|
|
To the customer the utility of a smart meter is largely limited to the convenience of being able to read it without
|
|
going to the basement. In the long term it is said that there will be second-order savings to the customer since
|
|
electricity prices adapting to the market situation along with this convenience will lead them to consume less
|
|
electricity and to consume it in a way that is more amenable to utilities, both leading to reduced
|
|
cost\cite{borlase01,bmwi03,anderson02}.
|
|
|
|
Traditional Ferraris counters with their distinctive rotating aluminium disc are simple electromechanical devices. Since
|
|
it does not include any failure-prone semiconductors or other high technology a cheap Ferraris-style meter can easily
|
|
last decades. In contrast to this, smart meters are complex high technology. They are vastly more expensive to develop
|
|
in the first place since they require the development and integration of large amounts of complex, custom firmware. Once
|
|
deployed, their lifetime is severely limited by this very complexity. Complex semiconductor devices tend to fail, and
|
|
firmware that needs to communicate with the outside world tends to not age well\cite{borkar01}.
|
|
This combination of higher unit cost and lower expected lifetime leads to grossly increased costs per household. This
|
|
cost is usually shared between utility and customer.
|
|
|
|
As part of its smart metering rollout the German government in 2013 had a study conducted on the economics of smart
|
|
meter installations. This study came to the conclusion that for the majority of households computerizing an existing
|
|
Ferraris meter is uneconomical. For larger consumers or new installations the higher cost of installation over time is
|
|
offset by the resulting savings in electricity cost\cite{bmwi03}.
|
|
|
|
\subsection{Human-Computer Interaction aspects of smart meter technology}
|
|
|
|
A fundamental aspect in realizing the cost and energy savings promised by the smart metering revolution is that it
|
|
requires a paradigm shift in consumer interaction. Previously most consumers would only confront their energy use when
|
|
their monthly or yearly electricity bill arrived. All of the cost savings smart meters promise over traditional metering
|
|
infrastructure\footnote{
|
|
We are excluding savings from Demand-Side Response (DSR) implemented through smart meters here: Traditional ripple
|
|
control systems already allowed for these, and due to the added cost of high-power relays many smart meters do not
|
|
include such features.
|
|
} critically depend on the consumer regularly interacting with the meter through an in-home display or app. We live in
|
|
an era where our attention is already highly contested. A myriad of apps and platforms compete for our attention through
|
|
our smartphones and other devices. Introducing an entirely new service into this already complex battleground is a large
|
|
endeavour. On the one hand it is not clear how this new service would compete with everything else. On the other hand if
|
|
it does manage to capture our attention and lead us to modify our behavior, what are the side effects? For instance,
|
|
does an in-home display increase financial anxiety in economically disadvantaged customers?
|
|
|
|
Human-Computer Interaction research has touched on the topic of smart metering several times and has many insights to offer
|
|
for technologists\cite{pierce01,rodden01,lupton01,costanza01,fell01}. An issue pointed out in \cite{rodden01} is that at
|
|
least in some countries consumers fundamentally distrust their utility companies. This trust issue is exacerbated by
|
|
smart meters being unilaterally forced onto consumers by utility companies. Much of the success of smart metering's
|
|
ubiquitous promises of energy savings fundamentally depends on consumer coöperation. Here, the aforementioned trust
|
|
issue calls into question smart metering's chances of long-term success.
|
|
|
|
As \cite{pierce01} pointed out, smart metering developments could benefit greatly from early involvement of HCI research.
|
|
HCI research certainly would not have overlooked entire central issues such as privacy, as happened in the Dutch
|
|
case\cite{cuijpers01}. The current corporate-driven approach to a technological advance forced through national
|
|
standardization bears a serious risk of failing to meet its ostensible objectives for consumers. The role of consumers
|
|
and the complex sociotechnological environment posed by this new technology is seriously considered nowhere in the
|
|
standardization process. While certainly no one will admit to outright ignoring consumers in smart meter standardization,
|
|
their role is largely limited to the occasional public consultation. At the same time the standards are written by
|
|
technologists--it seems largely without input on their practicality or socio-technological implications from fields such
|
|
as HCI. % TODO citation? too much burn?
|
|
|
|
\subsection{Common components}
|
|
\label{sm-cpu}
|
|
|
|
Smart meters usually are built around an off-the-shelf microcontroller. Some meters use specialized smart metering
|
|
SoCs\cite{ifixit01} while others use standard microcontrollers with core metering functions implemented in external
|
|
circuitry (cf.\ sec.\ \ref{sec-easymeter} where we detail the meter in our demonstration setup). Specialized SoCs
|
|
usually contain a segment LCD driver along with some high-resolution analog-to-digital converters for the actual
|
|
measurement functions. In many smart meter designs used outside of Germany the metering SoC will be connected to another
|
|
full-featured SoC acting as the modem. At a casual glance this might seem to be a security measure, but it may be more
|
|
likely that this is done to ease integration of one metering platform with several different communication stacks (e.g.\
|
|
proprietary sub-gigahertz wireless, powerline communication (PLC) or Ethernet). In these architectures there is a clear
|
|
line of functional demarcation between the metering SoC and the modem. As evidenced by over-the-air software update
|
|
functionality (see e.g.\ \cite{honeywell01}) this does not however extend to an actual security boundary.
|
|
|
|
Energy usage is calculated by measuring both voltage and current at high resolution and then integrating the
|
|
measurements. Current measurements are usually made with either a current transformer or a shunt in a four-wire
|
|
configuration. Voltage is measured by dividing input AC down with a resistor chain. Their product is integrated digitally using
|
|
the MCU's time base as a reference.
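
As a sketch of this computation, assume the MCU samples voltage $v_k$ and current $i_k$ simultaneously at a fixed interval $\Delta t$ derived from its time base. The symbols are our own notation and are not taken from any particular meter's firmware. The energy consumed over $N$ samples is then approximately
\begin{equation*}
    E = \int_{0}^{T} v(t)\, i(t)\, \mathrm{d}t \;\approx\; \Delta t \sum_{k=0}^{N-1} v_k\, i_k, \qquad T = N\, \Delta t.
\end{equation*}
In this simplified model a relative error in $\Delta t$ carries over to $E$ as the same relative error, which illustrates why the time base acts as a measurement reference.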
|
|
|
|
Whereas legacy electromechanical energy meters only provided a display of aggregate energy use through a decimal counter
|
|
as well as an indirect indication of power through a rotating wheel, one of the selling points of smart meters is their
|
|
ability to calculate advanced statistics on energy use. These statistics are supposed to help customers better target
|
|
energy conservation measures\cite{bmwi03}.
|
|
|
|
In addition to the pure measurement and data aggregation functions smart meters can perform additional functions. One is
|
|
to serve as a gateway between the utility company's control systems and large controllable loads in the consumer's
|
|
household for Demand-Side Management (DSM)\cite{borlase01}. In DSM the utility company can control when exactly a
|
|
high-power device such as a water storage heater is turned on. To the customer the precise timing does not matter since
|
|
the storage heater is set so that it has enough hot water in its reservoir at all times. The utility company however can
|
|
use this degree of control to reduce load variations during temporary imbalances such as peaks. The efficiency gains
|
|
realized with this system translate into lower electricity prices for DSM-enabled loads for the customer. Traditionally
|
|
DSM was realized on a local level using ripple control systems. In ripple control, control data is coded by modulating a
|
|
carrier at a low frequency such as \SI{400}{\hertz} on top of the regular mains voltage. These systems require
|
|
high-power transmitters at tens of kilowatts and still can only bridge regional distances\cite{dzung01}.
|
|
|
|
Another important additional function is that in some countries some smart meters can be used to remotely disconnect
|
|
consumer households with outstanding bills. Using euphemisms such as \emph{utility revenue protection}\cite{kamstrup01}
|
|
or \emph{reducing nontechnical losses}\cite{brown01} while cynically claiming \emph{Consumer
|
|
Empowerment}\cite{kamstrup01} these systems allow a utility company to remotely disconnect a customer at any time.
|
|
Whereas before smart metering this required either additional hardware or an expensive site visit by a qualified
|
|
technician, smart meters have ushered in an era of frictionless control\footnote{
|
|
Note that in some countries such as the UK non-networked mechanical prepayment meters did exist. In such systems the
|
|
user inserts coins into a coin slot that activates a load switch at the household's main electricity connection.
|
|
These systems were non-networked and did not allow for remote control. Disadvantages of such systems compared to
|
|
modern \emph{smart} systems are the high cost of the coin acceptor and the overhead of site visits required to empty
|
|
the coin box\cite{anderson02}.
|
|
}.
|
|
|
|
\subsection{Cryptographic coprocessors}
|
|
|
|
Just like in legacy electricity meters, in smart meters physical security is still a key component of the overall system
|
|
design. Since in both types of meter cost depends on physical quantities being measured at the customer premises,
|
|
customers can save cost in case they are able to falsify the meter's measurements without being
|
|
detected\cite{anderson02}. For this reason both types of meters employ countermeasures against physical intrusion.
|
|
Compared to high-risk devices such as card payment processing terminals or ATMs the tamper proofing used in smart meters
|
|
is only basic\cite{anderson02}. Common measures include sealing the case by irreversibly ultrasonically welding front
|
|
and back plastic shells together or the use of security seals on the lid covering the input/output screw terminals.
|
|
Low-tech attacks using magnets to saturate the current transformer's ferrite cores are detected using Hall
|
|
sensors\cite{anderson02,anderson03,itron01,hager01,easymeter01}. German smart metering standards specify the use of a
|
|
smartcard-like security module to provide transport encryption and other cryptographic
|
|
services\cite{bsi-tr-03109-2,bsi-tr-03109-2-a}. During our literature review we did not find many references to similar
|
|
requirements in other national standards, though this does not mean that individual manufacturers do not use smartcards
|
|
for engineering reasons or due to pressure from utilities. The limited documentation on meter internals that we did find
|
|
such as \cite{ifixit01} suggests that where no such regulation exists, manufacturers and utilities likely choose to forgo
|
|
such advanced measures and instead settle on simple software implementations.
|
|
|
|
\subsection{Physical structure and installation}
|
|
|
|
Smart meters are installed like traditional electricity meters. In Japan this means they are usually installed on an
|
|
exterior wall and need to be resistant against weather and extreme environmental conditions (direct sunlight, high
|
|
temperature, high humidity). In Germany the meter is always installed either indoors or in an outdoor utility closet
|
|
that is sealed to keep out the weather. In most countries the meter is connected through large integrated screw
|
|
terminals. In the US meters compliant with the domestic ANSI C12 standard are round and plug into a large socket that is
|
|
wired into the house or apartment's electrical connection.
|
|
|
|
Modern smart meters are usually made with plastic cases. Ferraris meters often used cases stamped from sheet metal with
|
|
glass windows on them. Smart meters now look much more like other modern electronic devices. A common construction style
|
|
is to separate the case into a front and back half with both halves clipped or ultrasonically welded together. Ultrasonic
|
|
welding gives a robust, airtight interface. This interface cannot easily be separated and re-connected without leaving
|
|
visible traces, which helps with tamper evidence properties. As an industry-standard process common in various consumer
|
|
goods ultrasonic welding is a cheap and accessible technology\cite{easymeter01,ifixit01}.
|
|
|
|
Communication interfaces sometimes are brought out through regular electromechanical connectors but often also are
|
|
optical interfaces. A popular style here is to use a regular UART connected to an LED/phototransistor optocoupler
|
|
mounted on the side of the case. The user interface is usually limited to an LCD display. For cost and ingress
|
|
protection reasons, smart meters rarely use mechanical buttons. Some smart meters use a phototransistor mounted behind the
|
|
faceplate that can be activated with a flashlight as a crude contact-less input device\cite{easymeter01}.
|
|
|
|
All meters provide several options for security seals to be installed to detect opening of the meter or access to its
|
|
terminal block. The shape and type of these security seals vary. Factory-installed seals are used to detect tampering
|
|
of the meter itself while seals made by the utility during meter installation are used to guard the meter's terminal
|
|
block and detect attempts at bypassing\cite{czechowski01}.
|
|
|
|
\section{Regulatory frameworks around the world}
|
|
|
|
Smart metering regulation varies from country to country as it is tightly coupled to the overall regulation of the
|
|
electrical grid. The standardization of the physical form factor and metrological parameters of a meter is usually
|
|
separate from the standardization of its \emph{smart} functionality. Most countries base the standard for their meters'
|
|
outwards-facing communication interface on a family of standards unified under the IEC as DLMS/COSEM. Employing this
|
|
base protocol, country-specific standardization only covers which precise variant of it is spoken and what features are
|
|
supported.
|
|
|
|
\subsection{International standards}
|
|
|
|
The family of standards one encounters most in smart metering applications is IEC 62056, specifying the Device Language
|
|
Message Specification (DLMS) and the Companion Specification for Electronic Metering (COSEM). DLMS/COSEM are
|
|
application-layer standards describing a request/response schema similar to e.g.\ HTTP. DLMS/COSEM are mapped onto a
|
|
multitude of wire protocols. They can be spoken over TCP/IP or mapped onto low-speed UART serial interfaces
|
|
\cite{sato01,stuber01}. Besides DLMS/COSEM there are a multitude of standards usually specifying how DLMS/COSEM are to
|
|
be applied.
|
|
|
|
DLMS/COSEM show some amount of feature creep. They do not adhere to the age-old systems design adage that a tool should
|
|
\emph{do one thing and do it well}. Instead they try to capture the convex hull of all possible applications. This led
|
|
to a complicated design that requires extensive additional specification and testing to maintain even basic
|
|
interoperability. In particular in the area of transport security it becomes evident that the IEC as an electrical
|
|
engineering standards body overstretched their area of expertise and that resorting to established standard protocols would have
|
|
improved the situation\cite{weith01}. Compared to industry-standard transport security the IEC standards provide
|
|
a simplistic key management framework based on a static shared key with unlimited lifetime and provide sub-optimal
|
|
transport security properties (e.g.\ lack of forward-secrecy)\cite{khurana01,sato01}.
|
|
% TODO maybe expand this?
|
|
|
|
\subsection{The regulatory situation in selected countries}
|
|
|
|
In this section we will give an overview of the situation in a number of countries. This list of countries is not
|
|
representative and notably does not include any developing countries and is geographically biased. We selected these
|
|
countries for illustration only and based our selection in a large part on the availability of information in a language
|
|
we read. We will conclude this section with a summary of common themes.
|
|
|
|
\subsubsection{Germany}
|
|
|
|
Germany standardized smart metering on a national level. Apart from the calibration standards applying to any type of
|
|
meter smart meters are covered by a set of communications and security standards developed by the German Federal Office
|
|
for Information Security (BSI). Germany mandates smart meter installations for newly constructed buildings and during
|
|
major renovations but does not require most legacy residential installations to be upgraded. This is a consequence of a
|
|
2013 cost-benefit analysis that found these upgrades to be uneconomical for the majority of residential
|
|
customers\cite{bmwi03,bmwi1,bmwe01,brown01}.
|
|
|
|
The German standards strictly separate between metering and communication functions. Both are split into separate
|
|
devices, the \emph{meter} and the \emph{gateway} (called \emph{smart meter gateway} in full and often abbreviated
|
|
\emph{SMGW}). One or several meters connect to a gateway through a COSEM-derived protocol. The communication interface
|
|
between meter and gateway can optionally be physically unidirectional. A unidirectional interface eliminates any
|
|
possibility of meter firmware compromise. The gateway contains a cryptographic security module similar to a
|
|
smartcard\cite{mahlknecht01} that is entrusted with signing of measurements and maintaining an authenticated and
|
|
encrypted communication channel with its authorities. Security of the system is certified according to a Common Criteria
|
|
process.
|
|
|
|
The German specification does not include any support for load switches outside of demand-side management as they are
|
|
common in some other countries. It does not prohibit the installation of one behind the smart meter installation. This
|
|
makes it theoretically possible for a utility company to still install a load switch to disconnect a customer, but this
|
|
would be a separate installation from the smart meter. In Germany there are significant barriers that have to be met
|
|
before a utility company may cut power to a household\cite{delaw01}. The elision of a load switch means attacks on
|
|
German meters will be limited in influence to billing irregularities and attacks using DSM equipment.
|
|
|
|
% TODO elaborate DSM attacks vs. whole-household attacks in attacks section
|
|
|
|
\subsubsection{The Netherlands}
|
|
The Netherlands were early to take initiative to roll out smart metering after its recognition by the European
|
|
Commission in 2006\cite{cuijpers01,ec04}. After overcoming political issues, the Netherlands were above the European
|
|
median in 2018 having replaced almost half of all meters\cite{cuijpers01,ec03}. Dutch smart meters are standardized by a
|
|
consortium of distribution system operators. They integrate gateway and metrology functions into one device. The
|
|
utility-facing interface is an IEC DLMS/COSEM-based interface over cellular radio such as GPRS or
|
|
LTE\cite{aubel01}. Like e.g.\ the German standard, the Dutch standard precisely specifies all communication
|
|
interfaces of the meter\cite{dsmrp3}. Another parallel is that the Dutch standard also does not cover any functionality
|
|
for remotely disconnecting a household. This absence of a load switch limits attacks on Dutch smart meters to causing
|
|
billing irregularities.
|
|
|
|
\subsubsection{The UK}
|
|
|
|
The UK is currently undergoing a smart metering rollout. Meters in the UK are nationally standardized to provide both
|
|
Zigbee ZSE-based and IEC DLMS/COSEM connectivity. UK smart metering specifications are shared between electrical and gas
|
|
meters. Unlike other countries' specifications, the UK national specifications require electrical meters to have an
|
|
integrated load switch and gas meters to have an integrated valve. In Northern Ireland most consumers use prepaid
|
|
electricity contracts\cite{anderson02}. Prepayment and credit functionality are also specified in the UK's national
|
|
smart metering standard, as is remote firmware update functionality\cite{ukgov02}. Outside communication in these
|
|
standards is performed through a gateway (there called \emph{communications hub}) that can be shared between several
|
|
meters \cite{ukgov01,ukgov02,ukgov03,brown01,sato01}. The combination of both gas and electricity metering into one
|
|
family of standards and the exceptionally large set of \emph{required} features make the UK regulations the maximalist
|
|
among the ones in this section. The mandatory inclusion of both load switches and remote connectivity up to remote
|
|
firmware update makes it an interesting attack target.
|
|
|
|
\subsubsection{Italy}
|
|
|
|
Italy was among the first countries to legally mandate the widespread installation of smart meters in households. Italy
|
|
in 2006 and 2007 by law set a starting date for the rollout in 2008\cite{brown01}. The Italian electricity market was
|
|
recently privatized. While the wholesale market and transmission network privatization has advanced, the vast majority of
|
|
retail customers continued to use the incumbent distribution system operator ENEL as their supplier\cite{ec03}. This
|
|
dominant position allowed ENEL to orchestrate the large-scale rollout of smart meters in Italy. Almost every meter in
|
|
Italy had been replaced by a smart meter by 2018\cite{ec03}. A unique feature of the Italian smart metering
|
|
infrastructure is that it relies on Powerline Communication (PLC) to bridge distances between meters and cellular radio
|
|
gateways\cite{gungor01}.
|
|
|
|
\subsubsection{Japan}
|
|
|
|
Japan is currently rolling out smart metering infrastructure. Compared to other countries, in Japan significant
|
|
standardization effort has been spent on smart home integration\cite{usitc01,sato01,brown01}. Japan has domestic
|
|
standards (JIS) for metrology and physical dimensions. The TEPCO deployment currently being rolled out is based on the
|
|
IEC DLMS/COSEM standards suite for remote meter reading in conjunction with the Japanese ECHONET protocol for the
|
|
home-area network. Smart meters are connected to TEPCO's backend systems through the customer's internet connection,
|
|
sub-gigahertz radio based on 802.15.4 framing, regular landline internet or PLC\cite{toshiba01,sato01}.
|
|
|
|
A unique point in the Japanese utility metering landscape is that the current practice is monthly manual readings. In
|
|
Japan residential utility meters are usually mounted outside the building on an exterior wall and every month someone
|
|
with a mirror on a long stick will come and read the meter. The meter reader then makes a thermal paper print-out of the
|
|
updated utility bill and puts it into the resident's post box. This practice gives consumers good control over their
|
|
consumption but does incur significant personnel overhead. % TODO decide on citation. Maybe the toshiba one?
|
|
|
|
\subsubsection{The USA}
|
|
|
|
In the USA the rollout of smart meters has been promoted by law as early as 2005. The US electricity market is highly
|
|
complex with states having significant authority to decide on their own policies\cite{brown01}. Different from the IEC
|
|
standards used in a large fraction of the rest of the world, the USA have their own domestic set of standards for smart
|
|
meters developed by ANSI\cite{sato01}. The main difference between IEC and ANSI-standard meters is that ANSI-standard
|
|
meters are round devices that plug into a wall-mounted socket while IEC devices are usually rectangular and connected
|
|
directly to the mains wiring through large screw terminals\cite{ifixit01}.
|
|
|
|
\subsection{Common themes}
|
|
|
|
Researching the current situation around the world for the above sections we were able to distill some common themes.
|
|
First, smart metering is slowly advancing on a global scale and despite significant reservations from privacy-conscious
|
|
people and consumer advocates it seems it is here to stay. There are some notable exceptions of countries that have
|
|
decided to scale back an ongoing rollout effort after subsequent analysis showed economic or other
|
|
issues\footnote{cf.\ the Netherlands and Germany}.
|
|
|
|
\subsubsection{The introduction of smart metering}
|
|
|
|
The smart meter rollout is largely driven by utility companies. Utility companies field a variety of arguments for the
|
|
rollout. The most prominent argument is a general increase in energy-efficiency along with a reduction of emissions.
|
|
This argument is based on the estimation that smart metering will increase private customers' awareness of their own
|
|
consumption and this will lead them to reduce their consumption. The second highly popular argument for smart metering
|
|
is that it is necessary for the widespread adoption of renewable energies. This argument again builds on the trend
|
|
towards \emph{green} energy to rationalize smart metering. Often it is formulated as an \emph{inevitability} instead of
|
|
a choice.
|
|
|
|
Academic reception of smart metering is marked by an almost unanimous enthusiasm. In particular smart meter
|
|
communication infrastructure has received a large amount of research
|
|
attention\cite{dzung01,gungor01,kabalci01,lloret01,mahmood01,yan01,anderson01,anderson02}. Outside of human-computer
|
|
interaction claims that smart meters will reduce customer energy consumption have often been uncritically accepted.
|
|
|
|
\subsubsection{Standardization and reality of smart devices}
|
|
|
|
Regulators, utilities and academics meet in their enthusiasm on the issue of smart home integration of smart metering. A
|
|
feature of many setups is that the meter acts as the centerpiece of a modern, fully integrated smart
|
|
home\cite{aubel01,geelen01,bsi-tr-03109-1,abdallah01}. The smart meter serves as a communication hub between a new class
|
|
of grid-aware loads and the utility company's control center. Large (usually thermal) loads such as dishwashers,
|
|
refrigerators and air conditioners are forecasted to intelligently adapt their heating/cooling cycles to better match
|
|
the grid's supply. A frequent scenario is that in which the meter bills the customer using near-real time pricing, and
|
|
supplies large loads in the customer's household with this pricing information. These loads then intelligently schedule
|
|
their operation to minimize cost\cite{sato01}. At the time in the mid-2000s when smart metering proposals were first
|
|
advanced this vision might have been an effect of the \emph{law of the instrument}\cite{kaplan01,anderson02}. Back then
|
|
outside of specialty applications household devices were not usually networked\cite{merz01}. Smart meters at the time
|
|
may have seemed the obvious choice for a smart home communications hub.
|
|
|
|
From today's perspective, this idea is obviously outdated. Smart \emph{things} now have found their way into many homes.
|
|
Only these things are directly interconnected through the internet--foregoing the home-area network (HAN) technologies
|
|
anticipated by the smart metering pioneers. The simple reason for this is that nowadays nearly everyone has Wifi, and Wifi
|
|
transceivers have become inexpensive enough to disappear in the bill of materials (BOM) cost of a large home device such
|
|
as a washing machine. Smart meters are usually situated in the basement--physically far away from most of one's devices.
|
|
This makes connecting them to said devices awkward and connecting them via the local Wifi raises the question why the
|
|
smart devices should not simply use the internet in the first place.
|
|
|
|
Connecting things to a smart meter through a local bus is academically appealing. It promises cost-savings from a
|
|
simpler physical layer (such as ZigBee instead of Wifi) and it neatly separates concerns into \emph{home infrastructure}
|
|
and the regular internet. Communication between smart meter and devices never leaves the house. This gives potential
|
|
additional tolerance to utility backend systems breaking. It also physically keeps communication inside the house,
|
|
bypassing the utility's eyes and improving both customer privacy and agency. The presently popular model of a device as
|
|
simple as a light switch proxying its every action through a manufacturer's servers somewhere on the public internet is
|
|
in stark contrast to this scenario. Alas, the reason that this model is so popular is that in most cases it simply
|
|
works. Device manufacturers simply integrate one of many off-the-shelf Wifi modules. The resulting device will work
|
|
anywhere on earth\footnote{For some places channel assignments may have to be updated. This is a configuration-level
|
|
change and in some devices is done by the end-user during provisioning.}. A HAN-connected device would have several
|
|
variants with different modems for different standards. Some might work across countries, but some might not. And in
|
|
some countries there might not even be a standard for smart grid HANs.
|
|
|
|
Looking at the situation like this raises the question why this realization has not yet found its way into mainstream
|
|
acceptance by smart metering implementors. The customer-facing functionality promised through smart meters would be
|
|
simple to implement as part of a now-standard \emph{internet of things} application. An in-home display that shows
|
|
real-time energy consumption and cost statistics would simply be an Android tablet fetching summarized data from the
|
|
utility's billing backend. Demand-side response by large loads would be as simple as an HTTP request with a token
|
|
identifying the customer's contract that returns the electricity price the meter is currently charging along with a
|
|
recommendation to switch on or off. It seems the smart home has already arrived while smart metering standardization is
|
|
still getting off the starting blocks\cite{anderson02}.
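
To illustrate how little would be needed on the device side, the following sketch shows the decision logic of such a load. It assumes a hypothetical utility endpoint that answers an authenticated HTTP request with a JSON document containing the current price and a switching recommendation. The field names \texttt{price\_eur\_per\_kwh} and \texttt{recommendation}, the function name and the price threshold are our own assumptions and are not part of any existing standard.

\begin{minted}{python}
import json


def should_run(response_body: str, max_price_eur_per_kwh: float) -> bool:
    """Decide whether a deferrable load should run now.

    response_body is the JSON text returned by the (hypothetical) utility
    endpoint, e.g. fetched with an HTTP GET carrying the contract token.
    """
    data = json.loads(response_body)
    price = float(data["price_eur_per_kwh"])
    recommendation = data.get("recommendation", "none")

    # An explicit recommendation from the utility takes precedence.
    if recommendation == "off":
        return False
    if recommendation == "on":
        return True

    # Without a recommendation, fall back to a local price threshold.
    return price <= max_price_eur_per_kwh
\end{minted}

The network request itself is deliberately left out of the sketch. Any HTTP client would do, and the decision logic stays testable without access to a utility backend.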
|
|
% TODO is this too critical? Is maybe the modern smart home compatible with smart meters? Is maybe the local-only path
|
|
% of data, avoiding utility clouds a design feature? (may be true in DE, NL, probably not anywhere else)
|
|
|
|
\section{Security in smart distribution grids}
|
|
|
|
The smart grid in practice is nothing more or less than an aggregation of embedded control and measurement devices that
|
|
are part of a large control system. This implies that all the same security concerns that apply to embedded systems in
|
|
general also apply to most components of a smart grid in some way. Where programmers have been struggling for decades
|
|
now with input validation\cite{leveson01}, the same potential issue raises security concerns in smart grid scenarios as
|
|
well\cite{mo01, lee01}. Only, in the smart grid we have two complicating factors present: Many components are embedded
|
|
systems, and as such inherently hard to update. Also, the smart grid and its control algorithms act as a large
|
|
(partially-)distributed system, making problems such as input validation or authentication difficult to
|
|
implement\cite{blaze01} and adding a host of distributed systems problems on top\cite{lamport01}.
|
|
|
|
Given that the electrical grid is a major piece of essential infrastructure in modern civilization, these problems
|
|
amount to significant issues in practice. Attacks on the electrical grid may have grave
|
|
consequences\cite{anderson01,lee01} all the while the long maintenance cycles of various components make the system slow
|
|
to adapt. Thus, components for the smart grid need to be built to a much higher standard of security than most consumer
|
|
devices to ensure they stand up to well-funded attackers even decades down the road. This requirement intensifies the
|
|
challenges of embedded security and distributed systems security among others that are inherent in any modern complex
|
|
technological system. The safety-critical nature of modern smart metering ecosystems in particular was quickly
|
|
recognized by security experts\cite{anderson01}.
|
|
|
|
A point we will not consider in much depth is theft of electricity. An incentive for the introduction of smart metering
|
|
that is frequently cited in utility industry publications outside of the general public's view is the reduction of
|
|
electricity theft\cite{czechowski01}. Academic papers tend to either focus on other benefits such as generation
|
|
efficiency gains through better forecasting or try to rationalize the fundamentally anti-consumer nature of smart
|
|
metering with strenuous claims of ``enormous social benefits''\cite{mcdaniel01}. Academics rarely point out the large
|
|
economic incentive such \emph{revenue protection} mechanisms provide\cite{anderson01,anderson02}.
|
|
|
|
This thesis will entirely focus on grid stability and discard electricity theft. For the attack scenarios we lay out,
|
|
billing inaccuracies of utility companies are of very low urgency compared to grid stability. In fact stability is a
|
|
precondition for billing to happen. Additionally utility companies can already limit the volume of theft by
|
|
cross-referencing meter readings against trusted readings from upstream sections of the grid. This capability works even
|
|
without smart meters and only gains speed from smart meters. A smart meter cannot prevent the customer from bypassing it
|
|
with a section of wire. Due to the limit on its volume, electricity theft using smart meter hacking would not scale.
|
|
Hackers would quickly be triangulated with no damage to consumers and limited damage to utility companies.
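
The cross-referencing argument above can be made concrete with a simple energy balance. The following is only a sketch:
the symbols $E_\text{feeder}$, $E_i$, $E_\text{loss}$ and $\tau$ are our own notation chosen for illustration and do not
describe any particular utility's practice.

\begin{equation*}
    \Delta E = E_\text{feeder} - \sum_{i=1}^{N} E_i - E_\text{loss}
\end{equation*}

Here $E_\text{feeder}$ is the energy measured by a trusted meter at an upstream point of the grid over some interval,
$E_i$ is the energy reported over the same interval by customer meter $i$ out of the $N$ meters supplied from that
point, and $E_\text{loss}$ is an estimate of technical line losses. Unaccounted-for energy $\Delta E$ above some
tolerance $\tau$ indicates theft or a faulty meter somewhere below that point, regardless of whether the $E_i$ are read
manually or reported by smart meters.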
|
|
|
|
\subsection{Privacy in the smart grid}
|
|
|
|
A serious issue in smart metering setups is customer privacy. Even though the meter ``only'' collects aggregate energy
|
|
consumption of a whole household, this data is highly sensitive\cite{markham01}. This counterintuitive fact was initially
|
|
overlooked in smart meter deployments leading to outrage, delays and reduced features\cite{cuijpers01}. The root cause
|
|
for this is that, given sufficient time resolution, these aggregate measurements contain ample entropy. Through
|
|
disaggregation individual loads can be identified and through pattern matching even complex usage patterns can be
|
|
discerned with alarming accuracy\cite{greveler01}. Similar privacy issues arise in many other areas of modern life
|
|
through pervasive tracking and surveillance\cite{zuboff01}. What makes the case of smart metering worse is that even the
|
|
fig leaf of consent such practices hide behind does not apply here. If I as a citizen do not consent to Google's privacy
|
|
policy, Google says I can choose not to use their service. In today's world this may not be a free choice, which
undermines the argument, but opting out is at least technically possible. Smart metering, on the other hand, is mandated by law.
|
|
In some countries such as Germany a customer unwilling to accept the accompanying privacy violation cannot legally
|
|
evade it\cite{bmwi04}.
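
To illustrate why aggregate readings leak information, one common way of modelling the problem is to write the household
load as a sum of switched appliance loads. We state this model as an assumption for illustration only; it is not
necessarily the method used in the cited works, and all symbols are our own.

\begin{equation*}
    P(t) = \sum_{k=1}^{K} s_k(t)\, P_k + \varepsilon(t), \qquad s_k(t) \in \{0, 1\}
\end{equation*}

Here $P_k$ is the power draw of appliance $k$, $s_k(t)$ is its on/off state and $\varepsilon(t)$ collects measurement
noise and unmodelled loads. Each switching event shows up as a step $P(t + \Delta t) - P(t) \approx \pm P_k$ in the
aggregate. With a short sampling interval $\Delta t$ few appliances switch within the same interval, so steps can be
attributed to individual appliances by their magnitude. With coarse readings the steps of many appliances are summed
up and this attribution becomes much harder.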
|
|
|
|
\subsection{Smart grid components as embedded devices}
|
|
|
|
A fundamental challenge in smart grid implementations is the central role smart electricity meters play. Smart meters
|
|
are used both for highly-granular load measurement and (in some countries) load switching\cite{zheng01}.
|
|
Smart electricity meters are effectively consumer devices. They are built down to a certain price point that is measured
|
|
by the burden it puts on consumers. The cost of a smart meter is ultimately limited by it being a major factor in the
|
|
economics of a smart meter rollout\cite{bmwi03}. Cost requirements preclude some hardware features such as the use of a
|
|
standard hardened software environment on a high-powered embedded system (such as a hypervirtualized embedded Linux
|
|
setup) that would both increase resilience against attacks and simplify updates. Combined with the small market sizes in
|
|
smart grid deployments\footnote{
|
|
Most vendors of smart electricity meters only serve a handful of markets. For the most part, smart meter development
|
|
    cost lies in the meter's software. % TODO cite?
|
|
There exist multiple competing standards applicable to various parts of a smart electricity meter. In addition,
|
|
most countries have their own certification regimen\cite{cenelec01}. This complexity creates a large development
|
|
burden for new market entrants\cite{perez01}.
|
|
}
|
|
this produces a high cost pressure on the software development process for smart electricity meters.
|
|
|
|
\subsection{The state of the art in embedded security}
|
|
|
|
Embedded software security generally is much harder than security of higher-level systems. This is due to a combination
|
|
of the unique constraints of embedded devices (hard to update, usually small quantity) and their lack of capabilities
|
|
(processing power, memory protection functions, user interface devices). Even very well-funded companies continue to
|
|
have serious problems securing their embedded systems. A spectacular example of this difficulty is the recently-exposed
|
|
flaw in Apple's iPhone SoC first-stage ROM bootloader\footnote{
|
|
    Modern systems-on-chip integrate one or several CPUs with a multitude of peripherals, from memory and DMA
|
|
controllers over 3D graphics accelerators down to general-purpose IO modules for controlling things like indicator
|
|
LEDs. Most SoCs boot from one of several boot devices such as flash memory, ethernet or USB according to a
|
|
    configuration set e.g.\ by connecting some SoC pins a certain way or set by device-internal write-once fuse bits.
|
|
|
|
Physically, one of the processing cores of the SoC (usually one of the main CPU cores) is connected such that it is
|
|
taken out of reset before all other devices, and is tasked with switching on and configuring all other devices of
|
|
    the SoC. In order to run later initialization code or more advanced bootloaders, this core on startup runs a very
|
|
small piece of code hard-burned into the SoC in the factory. This ROM loader initializes the most basic peripherals
|
|
such as internal SRAM memory and selects a boot device for the next bootloader stage.
|
|
|
|
Apple's ROM loader performs some authorization checks, to ensure no unauthorized software is loaded. The present
|
|
flaw allows an attacker to circumvent these checks, booting code not authorized by Apple on a USB-connected iPhone,
|
|
compromising Apple's chain of trust from ROM loader to userland right at its root.
|
|
} that allows a full compromise of any iPhone before the iPhone X. The iPhone 8, one of the affected models, was still being
|
|
manufactured and sold by Apple until April 2020. In another instance in 2016 researchers found multiple flaws in the
|
|
secure-world firmware used by Samsung in their mobile phone SoCs. The flaws they found were both severe architectural
|
|
flaws such as secret user input being passed through untrusted userspace processes without any protection and shocking
|
|
cryptographic flaws such as CVE-2016-1919\footnote{\url{http://cve.circl.lu/cve/CVE-2016-1919}}\cite{kanonov01}. And
|
|
Samsung is not the only large multinational corporation having trouble securing their secure world firmware
|
|
implementation. In 2014 researchers found an embarrassing integer overflow flaw in the low-level code handling untrusted
|
|
input in Qualcomm's QSEE firmware\cite{rosenberg01}. For an overview of ARM TrustZone including a survey of academic
|
|
work and past security vulnerabilities of TrustZone-based firmware see \cite{pinto01}.
|
|
|
|
If all of these very large companies have trouble securing parts of their secure embedded software stacks measuring a
|
|
mere few hundred bytes in Apple's case or a few kilobytes in Qualcomm's, what is a smart electricity meter manufacturer
|
|
to do? For their mass-market phones, these companies have R\&D budgets that dwarf some countries' national budgets.
|
|
|
|
Since thorough formal verification of code is not yet within reach for either large-scale software development or code
|
|
heavy in side-effects such as embedded firmware or industrial control software\cite{pariente01} the two most effective
|
|
measures for embedded security are reducing the amount of code on the one hand, and labour-intensively checking and
|
|
double-checking this code on the other hand. A smart electricity meter manufacturer does not have a say in the former since it
|
|
is bound by the official regulations it has to comply with, and will likely not have sufficient resources for the
|
|
latter. We are left with an impasse: Manufacturers in this field likely do not have the resources to keep up with
|
|
complex standards requirements. At the same time they have no option to reduce the scope of their implementation to
|
|
alleviate the burden on firmware security.
|
|
|
|
\subsection{Attack avenues in the smart grid}
|
|
|
|
If we model the smart grid as a control system responding to changes in inputs by regulating outputs, on a very high
|
|
level we can see two general categories of attacks: Attacks that directly change the state of the outputs, and attacks
|
|
that try to influence the outputs indirectly by changing the system's view of its inputs. The former would be an attack
|
|
such as one that shuts down a power plant to decrease generation capacity\cite{lee01}. The latter would be an attack
|
|
such as one that forges grid frequency measurements where they enter a power plant's control systems to provoke
|
|
increasing oscillation in the amount of power generated by the plant according to the control systems'
|
|
directions\cite{kosut01,wu01,kim01}.
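
These two categories can be stated slightly more formally. As a sketch in our own notation (the symbols are
illustrative assumptions and are not taken from the cited works), let the control system compute its outputs $u$ from
its measured inputs $y$ through some control law $u = K(y)$. The outputs $u'$ that actually take effect under attack
are then

\begin{align*}
    \text{output attack:} \quad & u' = K(y) + a_u \\
    \text{input attack:}  \quad & u' = K(y + a_y)
\end{align*}

In the first case the attacker's perturbation $a_u$ acts on the outputs directly. In the second case the attacker only
perturbs the measurements by $a_y$ and relies on the system's own control law $K$ to translate this perturbation into a
change of the outputs.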
|
|
|
|
\subsubsection{Communication channel attacks}
|
|
|
|
Communication channel attacks are attacks on the communication links between smart grid components. These could be
|
|
attacks on IP-connected parts of the core network or attacks on shared busses between smart meters and IP gateways in
|
|
substations. Generally, these attacks can be mitigated by securing the aforementioned communication links using modern
|
|
cryptography. IP links can be protected using TLS, and more low-level busses can be protected using more lightweight
|
|
Noise\cite{perrin01}-based protocols.
|
|
|
|
Cryptographic security transforms an attacker's ability to manipulate communication contents into a mere denial of
service attack. Thus, in addition to cryptographic security, safety under DoS conditions must be guaranteed to ensure
continued system performance under attack. This safety property is identical to the safety required to withstand
|
|
random outages of components, such as communications link outages due to physical damage from storms, flooding
|
|
etc.\cite{sato01}. In general, attacks at the meter level are hard to weaponize. Meters primarily serve billing purposes.
The use of smart meter data for load forecasting is not yet common practice. Additionally, smart meter data will only be
|
|
used to refine existing forecasting models based on aggregate data collected at higher vantage points in the
|
|
distribution grid. This combination of smart metering data with more trusted aggregate data from sensors within the grid
|
|
infrastructure limits the potential impact of a data falsification attack on smart meters. It also allows the utility to
|
|
identify potentially corrupt meter readings and thus detect manipulation above a certain threshold. In order for an
|
|
attack to have more far-reaching consequences the attacker would need to compromise additional grid
|
|
infrastructure\cite{kim01,kosut01}.
|
|
|
|
\subsubsection{Exploiting centralized control systems}
|
|
|
|
The type of smart grid attack most often cited in popular discourse, and to the author's knowledge the only type that
|
|
has so far been conducted in practice, is a direct attack on centralized control systems. In this attack, computer
|
|
components of control systems are compromised by the same techniques used to compromise any other kind of computer
|
|
system such as spear phishing, exploiting insecure services running on internet-exposed ports and using one compromised
system to compromise other systems on the same ostensibly secure internal network. These attacks are very powerful as
|
|
they yield the attacker direct control over whatever outputs the control systems are controlling. If an attacker manages
|
|
to compromise the right set of control computers, they may even be able to cause a blackout\cite{lee01}.
|
|
|
|
Despite their potentially large impact, these attacks are only moderately interesting from a scientific perspective. For
|
|
one, their mitigation mostly consists of a straightforward application of security practices well-known for decades.
|
|
Though there is room for the implementation of genuinely new, application-specific security systems in this field, the
|
|
general state of the art is lagging behind other fields of embedded security. Against this background, low-hanging fruit
should take priority\cite{heise02}.
|
|
|
|
Given political will these systems can readily be fortified. There is only a comparatively small number of them and
|
|
having a technician drive to every one of them in turn to install a firmware security update is feasible.
|
|
|
|
\subsubsection{Control function exploits}
|
|
|
|
Control function exploits are attacks on the mathematical control loops used by the centralized control system. One
|
|
example of this type of attack is the resonance attack described in \cite{wu01}. In this kind of attack, inputs from
|
|
peripheral sensors indicating grid load to the centralized control system are carefully modified to cause a
|
|
disproportionately large oscillation in control system action. This type of attack relies on complex resonance effects
that arise when mechanical generators are electrically coupled. These resonances, colloquially called ``modes'', are
|
|
well-studied in power system engineering\cite{rogers01,grebe01,entsoe01,crastan03}. Even disregarding modern attack
|
|
scenarios, for stability electrical grids are designed with measures in place to dampen any resonances inherent to grid
|
|
structure. Still, since an accurate grid model is required, these resonances are hard to analyze and unlikely to be noticed
|
|
under normal operating conditions.
|
|
|
|
Mitigation of these attacks can be achieved by ensuring unmodified sensor inputs to the control systems in the first
|
|
place. Carefully designing control systems not to exhibit exploitable behavior such as oscillations is also possible but
|
|
harder.
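
Why a small manipulation at the right frequency can have a disproportionately large effect can be illustrated with a
textbook second-order oscillator. This is a deliberately simplified stand-in for a single grid mode and not the model
used in \cite{wu01}. The symbols $x$, $u$, $\omega_0$ and $\zeta$ as well as the numeric value below are our own
illustrative assumptions.

\begin{align*}
    \ddot{x} + 2\zeta\omega_0\dot{x} + \omega_0^2 x &= \omega_0^2\, u(t) \\
    \left|G(j\omega)\right| &= \frac{\omega_0^2}{\sqrt{\left(\omega_0^2 - \omega^2\right)^2
        + \left(2\zeta\omega_0\omega\right)^2}} \\
    \left|G(j\omega_0)\right| &= \frac{1}{2\zeta}
\end{align*}

Here $x$ is the oscillating quantity, $u(t)$ the disturbance, $\omega_0$ the natural frequency of the mode and $\zeta$
its damping ratio. For slowly varying disturbances ($\omega \ll \omega_0$) the gain $\left|G(j\omega)\right|$ is close
to $1$. For a disturbance at the natural frequency it is $1/(2\zeta)$, so a mode with a purely illustrative damping
ratio of $\zeta = 0.05$ amplifies a periodic disturbance tenfold. This is also why both the attacker and the defender
need an accurate grid model: without it neither $\omega_0$ nor $\zeta$ is known.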
|
|
|
|
\subsubsection{Endpoint exploits}
|
|
|
|
One rather interesting attack on smart grid systems is one exploiting the grid's endpoint devices such as smart
|
|
electricity meters. These meters are deployed on a massive scale, with at least one meter per household on
|
|
average\footnote{Households rarely share a meter but some households may have a separate meter for detached properties
|
|
such as a detached garage or basement.}. Once compromised, restoration to an uncompromised state can potentially be
|
|
very difficult if it requires physical access to thousands of devices installed in hard-to-reach places in private homes.
|
|
|
|
By compromising smart electricity meters, an attacker can trivially forge the distributed energy measurements these
|
|
devices perform. In a best-case scenario, this might only affect billing and lead to customers being under- or
|
|
over-charged if the attack is not noticed in time. In a less ideal scenario falsified energy measurements reported by
|
|
these devices could impede the correct operation of centralized control systems.
|
|
|
|
In some countries such as the UK smart meters have one additional function that is highly useful to an attacker: They
|
|
contain high-current load switches to disconnect the entire household or business in case electricity bills are left
|
|
unpaid for a certain period. In countries that use these kinds of systems on a widespread level, the load disconnect
|
|
switch is controlled by the smart meter's central microcontroller. This allows anyone compromising this
|
|
microcontroller's firmware to actuate the load switch at will. Given control over a large number of network-connected
|
|
smart meters, an attacker might thus be able to cause large-scale disruptions of power
|
|
consumption\cite{anderson01,temple01}. Combined with an attack method such as the resonance attack from \cite{wu01}
|
|
that was mentioned above, this scenario poses a serious danger to grid stability.
|
|
|
|
In places where Demand-Side Management (DSM) is common this functionality may be abused in a similar way. In DSM the
|
|
smart metering system directly controls power to certain devices such as heaters. The utility can remotely control the
|
|
turn-on and turn-off of these devices to smooth out the load curve. In exchange the customer is billed a lower price
|
|
for the energy consumed by these loads. DSM was traditionally done with de-centralized systems mostly through
|
|
low-frequency PLC over the distribution grid. Smart metering systems no longer require large, resource-intensive
|
|
transmitters in substations and thus potentially allow the rollout of such technology on a much wider scale than before.
|
|
This leads to a potentially significant role of DSM systems in the impact calculation of an attack on a smart metering
|
|
system. Although DSM does not control as much load capacity as remote disconnect switches do, the attacks cited in the above
paragraph still fundamentally apply.
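
The scale of a coordinated switching attack, whether through disconnect switches or through DSM-controlled loads, can be
estimated with a back-of-the-envelope calculation. The following is a sketch under simplifying assumptions of our own:
we only consider the quasi-steady state after primary control has acted and ignore all dynamics. The symbols $N$,
$\bar{P}$ and $\lambda$ as well as the numbers below are illustrative and are not taken from the cited works.

\begin{align*}
    \Delta P &= N \cdot \bar{P} \\
    \Delta f &\approx \frac{\Delta P}{\lambda}
\end{align*}

Here $N$ is the number of compromised meters switching off simultaneously, $\bar{P}$ is the average load behind each of
them at that moment, $\Delta P$ is the resulting drop in load and $\lambda$ is the power-frequency characteristic of the
synchronous area (in MW/Hz). A sudden loss of load leaves a surplus of generation, so the frequency rises by
$\Delta f$. With purely illustrative values of $N = 10^6$ meters and $\bar{P} = 0.5\,\mathrm{kW}$ the attacker controls
$\Delta P = 500\,\mathrm{MW}$ of load, which is on the order of the output of a large power plant unit.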
|
|
|
|
\subsection{Practical threats}
|
|
|
|
As a highly integrated system the electrical grid is vulnerable to attacks from several angles. One way to classify
|
|
attacks is by their motivation. Along this axis we found the following motives:
|
|
|
|
\begin{description}
|
|
\item[Service disruption.] An attack aimed at disrupting service could e.g.\ aim at causing a blackout. It could
|
|
also take aim in a more subtle way targeting a degradation of parameters such as power quality (voltage,
|
|
frequency and waveform). It could target a particular customer, geographic area or all parts of the grid.
|
|
        Possible motivations range from a bored teenage hacker to actual cyberwar\cite{cleveland01,lee01}.
|
|
\item[Commercial disruption.] Simple commercial motives already motivate a wide variety of attacks on grid
|
|
        infrastructure\cite{czechowski01}. Though generally mostly harmless from a cybersecurity point of view, there are
|
|
instances where these attacks put the lives of both the attacker and bystanders at grave risk\cite{anderson01}.
|
|
Such attacks generally aim at the meter itself but a more sophisticated attacker might also target the
|
|
utility's backend computer-bureaucracy.
|
|
\item[Data extraction.] The smart grid collects large amounts of data on both individual consumers and on an
|
|
        aggregate level. The privacy risk in individual consumers' data is obvious. On the web,
        data collection practices ranging from questionable to flat-out illegal have widely proliferated for various purposes up
|
|
to manipulation of elections\cite{heise03}. Assuming criminals in this field would eschew fertile ground such as
|
|
        this due to legal or ethical concerns is optimistic. Even taking the risk to individual customers' data out of the
        equation, aggregate data is still highly attractive to some. Aggregate real-time electricity usage data is a
        potential source of timely information on things such as national social events (through TV set energy
|
|
consumption\cite{greveler01}) or just plainly the state of the economy.
|
|
\end{description}
|
|
|
|
A factor to consider in all these cases is that one actor's attacks have the potential to weaken system security
|
|
overall. An attacker might add new backdoors to gain persistence or they might disable existing mitigations to enable
|
|
further steps of their attack.
|
|
|
|
In this thesis we will largely concentrate on attacks of the first type because they have both the most serious
|
|
consequences and the most motivated attackers. Attackers that may want to disrupt service include cyberwar operations of
|
|
enemy nation states. This type of attacker is both highly skilled and highly funded.
|
|
|
|
\subsection{Conclusion, or why we are doomed}
|
|
|
|
We can conclude that a compromise of a large number of smart electricity meters cannot be ruled out. The complexity of
|
|
network-connected smart meter firmware makes it exceedingly unlikely that it is in fact flawless. Large-scale
|
|
deployments of these devices under some circumstances such as where they are used with load disconnect relays make them
|
|
an attractive target for attackers interested in causing grid instability. The attacker model for these devices includes
|
|
nation states, who have considerable resources at their disposal.
|
|
|
|
For a reasonable guarantee that no large-scale compromises of hard- and software built today will happen over a span of
|
|
some decades, we would have to radically simplify its design and limit attack surface. Unfortunately, the complexity of
|
|
smart electricity meter implementations mostly stems from the large list of requirements these devices have to conform
|
|
with. Alas, the standards have already been written, political will has been cast into law and changes that reduce scope
|
|
or functionality have become exceedingly unlikely at this point.
|
|
|
|
A general observation with smart grid systems of any kind is that they constitute a departure from the decentralized
|
|
control structure of yesterday's dumb grid and the advent of centralization at an enormous scale. This modern,
|
|
centralized infrastructure has been carefully designed to defend against malicious actors and all involved parties have
|
|
an interest in keeping it secure. Nevertheless, scaling attacks is inherently harder in decentralized systems than in centralized
|
|
systems\cite{anderson02}. Centralization makes for an attractive attack target. An attacker can employ this centralized
|
|
control to their advantage. From this perspective the centralization of smart metering control systems--sometimes at a
|
|
national level\cite{anderson01,anderson02}--poses a security risk.
|
|
|
|
\chapter{Restoring endpoint safety in an age of smart devices}
|
|
|
|
As laid out in the previous chapter we cannot fully rule out a large-scale compromise of smart energy meters at some
|
|
point in the long-term future. We have to rephrase our claim to security. We cannot rule out exploitation: We have to
|
|
limit its impact. Assuming that we cannot strip any functionality from smart meters (it may be required by standards or
for enormous social benefits\cite{mcdaniel01}), all we can do is to flush out an attacker once they are in,
i.e.\ mitigation instead of prevention.
|
|
|
|
In a worst-case scenario an attacker would gain unconstrained code execution (e.g.\ by exploiting a flaw in a network
|
|
protocol implementation). Smart meters use standard microcontrollers that do not have advanced memory protection functions
|
|
(cf.\ Section \ref{sm-cpu}). We can assume the attacker has full control over the main microcontroller given any such
|
|
flaw. With this control they can actuate the load switch if present. They can transmit data through the device's
|
|
communication interfaces or use the user interface components such as LEDs and the LCD. Using the self-programming
|
|
capabilities of flash microcontrollers an attacker may even gain persistence. Note that in systems separating
|
|
cryptographic functions into some form of cryptographic module\footnote{such as systems used in
|
|
Germany\cite{bsi-tr-03109}.} we can be optimistic and assume the attacker has not yet compromised this cryptographic
|
|
co-processor.
|
|
|
|
With the meter's core microcontroller under attacker control we cannot use this microcontroller to restore control over
|
|
the system. We have no way of ensuring the attacker does not simply delete a security mechanism we include in the core
|
|
microcontroller's firmware.
|
|
|
|
Our solution to this problem is to add another smaller microcontroller to the smart meter design. This microcontroller
|
|
will contain a small piece of software that receives cryptographically authenticated commands from utility companies. On
|
|
demand it can reset the meter's core microcontroller to a known-good state. To reliably flush out an attacker from a
|
|
compromised core microcontroller we re-program the core microcontroller in its entirety. We propose using JTAG to
|
|
re-program the core microcontroller with a known-good firmware image read from a sufficiently large SPI flash connected
|
|
to the reset controller. JTAG is supported by most microcontrollers complex enough to be used in a smart meter design.
|
|
JTAG programming functionality can be ported to a new microcontroller with relatively little work.
|
|
|
|
Our solution requires the core microcontroller's JTAG interface to be activated (i.e.\ not fused shut). For our solution
|
|
to work the core microcontroller firmware must not be able to permanently disable the JTAG interface by itself. In
|
|
microcontrollers that do not yet provide this functionality this is a minor change that could be added to a custom
|
|
microcontroller variant at low cost. On most microcontrollers keeping JTAG open should not interfere with code readout
|
|
protection\footnote{Readout protection usually forces a device erase before allowing JTAG access.}. Code secrecy should
|
|
be of no concern\cite{schneier01} here but some manufacturers have strong preferences due to a fear of copyright
|
|
infringement.
|
|
|
|
\section{The theory of endpoint safety}
|
|
\label{sec_criteria}
|
|
|
|
In order to gain anything by adding our reset controller to the smart meter's already complex design we must satisfy two
|
|
interrelated conditions.
|
|
\begin{enumerate}
|
|
    \item \emph{security} means our reset controller itself does not have any remotely exploitable flaws.
    \item \emph{safety} means our reset controller will perform its job as intended.
|
|
\end{enumerate}
|
|
|
|
Note that our \emph{security} property includes only remote exploitation, and excludes any form of hardware attack.
|
|
Even though most smart meters provide some level of physical security, we do not wish to make any assumptions on this.
|
|
In the following section we will elaborate our attacker model and it will become apparent that sufficient physical
|
|
security to defend against all attackers in our model would be infeasible, and thus we will design our overall system
|
|
to remain secure even assuming some number of physically compromised devices.
|
|
% FIXME expand
|
|
|
|
\subsection{Attack characteristics}
|
|
The attacker model these two conditions must hold under is as follows. We assume three angles of attack: Attacks by the
|
|
customer themselves, attacks by an insider within the utility company controlling the metering systems and lastly attacks
|
|
from third parties. Examples for these third parties are hobbyist hackers or outside cyber-criminals on the one hand,
|
|
but also other companies participating in the smart grid infrastructure besides the utility company such as intermediary
|
|
providers of meter-reading services.
|
|
|
|
Due to the critical nature of the electrical grid, we have to include hostile state actors in our attacker model. When
|
|
acting directly, these would be classified as third-party attackers by the above schema, but they can reasonably be
|
|
expected to be able to assume either of the other two roles as well, e.g.\ through infiltration or bribery. In the
|
|
generalized attacker model in \cite{fraunholz01} the authors give a classification of attackers and provide a nice
|
|
taxonomy of attacker properties. In their threat/capability rating, criminals are still considered to have higher threat
|
|
rating than state-sponsored attackers. The New York Times reported in 2016 that some states recruit their hacking
|
|
personnel in part from cyber-criminals. If this report is true, in a worst-case scenario we have to assume a
|
|
state-sponsored attacker to be the worst of both types. Comparing this against the other attacker types in
|
|
\cite{fraunholz01}, this state-sponsored attacker is strictly worse than any other type in both variables. We are left
|
|
with a highly-skilled, very well-funded, highly intentional and motivated attacker.
|
|
|
|
Based on the above classification of attack angles and our observations on state-sponsored attacks, we can adapt
|
|
\cite{fraunholz01} to our problem, yielding the following new attacker types:
|
|
|
|
\begin{enumerate}
|
|
\item \textbf{Utility company insiders controlled by a state actor}
|
|
We can ignore the other internal threats described in \cite{fraunholz01} since an insider cooperating with a
|
|
state actor is strictly worse in every respect.
|
|
\item \textbf{State-sponsored external attackers}
|
|
A state actor can directly attack the system through the internet.
|
|
\item \textbf{Customers controlled by a state actor}
|
|
A state actor can very well compromise some customers for their purposes. They might either physically
|
|
infiltrate the system posing as legitimate customers, or they might simply deceive or bribe existing customers
|
|
into cooperation.
|
|
\item \textbf{Regular customers}
|
|
Though a hostile state actor might gain control of some number of customers through means such as voluntary
|
|
        cooperation, bribery or infiltration, they are limited in attack scale since they do not want to arouse premature
|
|
attention. Though regular customers may not have the motivation, skill or resources of a state-sponsored
|
|
attacker, potentially large numbers of them may try to attack a system out of financial incentives. To allow for
|
|
this possibility, we consider regular customers separate from state actors posing as customers in some way.
|
|
\end{enumerate}
|
|
|
|
\subsection{Overall structural system security}
|
|
|
|
Considering overall security, we first introduce the \emph{reset authority}, a trusted party acting as the single
|
|
authority for issuing reset commands in our system. In practice this trusted party may be part of the utility company,
|
|
part of an external regulatory body or a hybrid setup requiring both to cooperate. We assume this party will be designed
|
|
to be secure against all of the above attacker types. The precise design of this trusted party is out of scope for this
|
|
work but we will list some practical suggestions on how to achieve security below. % FIXME do the list
|
|
% FIXME put up a large box on this limitation
|
|
|
|
Using an asymmetric cryptographic design centered around the \emph{reset authority}, we rule out all attacks except for
|
|
denial-of-service attacks on our system by any of the four attacker types. All reset commands in our system originate
|
|
from the \emph{reset authority} and are cryptographically secured to provide authentication and tamper detection.
|
|
Under this model, attacks on the electrical grid components between the \emph{reset authority} and the customer device
|
|
degrade into man-in-the-middle attacks. To ensure the \emph{safety} criterion from Section \ref{sec_criteria} holds we
|
|
must make sure our cryptography is secure against man-in-the-middle attacks and we must try to harden the system against
|
|
denial-of-service attacks by the attacker types listed above. Given our attacker model we cannot fully guard against
|
|
this sort of attack but we can at least choose a communication channel that is resilient against denial of service
|
|
attacks under the above model.
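
The acceptance rule a reset controller applies under such a design can be sketched as follows. This is a generic
formulation under assumptions of our own and not necessarily the concrete protocol: we assume a signature scheme with a
key pair $(sk, pk)$ and operations $\mathrm{Sign}$ and $\mathrm{Verify}$, and we add a monotonic counter $c$ as one
possible means of replay protection.

\begin{align*}
    \text{reset authority:}  \quad & m = (\text{reset}, c), \quad \sigma = \mathrm{Sign}(sk, m) \\
    \text{reset controller:} \quad & \text{accept } (m, \sigma) \iff
        \mathrm{Verify}(pk, m, \sigma) = 1 \;\wedge\; c > c_\text{last}
\end{align*}

Only the public key $pk$ and the last accepted counter value $c_\text{last}$ need to be stored in the device. A
man-in-the-middle who modifies $m$ fails the signature check and one who replays an old command fails the counter check.
All they can still do is drop or delay messages, i.e.\ deny service.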
|
|
|
|
Finally, we have to consider the issue of hardware security. We will solve the problem of physical attacks on some small
|
|
number of devices by simply not programming any secret information into these devices. This also simplifies hardware
|
|
production. In this work we explicitly treat any form of supply-chain attack as out of scope.
|
|
% FIXME include considerations on production testing somewhere (is the device working? is the right key programmed?)
|
|
|
|
\subsection{Complex microcontroller firmware}
|
|
|
|
The \emph{security} property from Section \ref{sec_criteria} relies in large part on the security of our reset
|
|
controller firmware. The best method to increase firmware security is to reduce attack surface by limiting external
|
|
interfaces as much as possible and by reducing code complexity as much as possible.
|
|
% FIXME formalize this as something like "Design Goal DG-023-42-1" ?
|
|
If we avoid the complexity of most modern microcontroller firmware we gain another benefit beyond implicitly reduced
|
|
attack surface: If the resulting design is small enough we may attempt formal verification of our security property.
|
|
Though formal verification tools are not yet suitable for highly complex tasks they are already adequate for small
|
|
amounts of code and simple interfaces.
|
|
|
|
\subsection{Modern microcontroller hardware}
|
|
|
|
Microcontrollers have gained enormously both in performance and efficiency and in peripheral support. Alas, these
|
|
gains have largely been driven by insatiable customer demand for faster, more powerful chips and for a long time
|
|
security has not been considered important outside of some specific niches such as smartcards. Traditionally a
|
|
microcontroller would spend its entire lifetime without ever being exposed to any networks. Though this trend has been
|
|
reversing with the increasing adoption of internet-of-things devices
|
|
and more advanced security features have started appearing in general-purpose microcontrollers, most still lack even
|
|
basic functionality found in processors for computers or smartphones.
|
|
|
|
One of the components lacking from most microcontrollers is strong memory protection or even a memory management unit
(MMU) as it is found in all modern computer processors and SoCs for applications such as smartphones. Without an MPU or
MMU some
|
|
mitigations for memory safety violations cannot be implemented. This and the absence of virtualization tools such as
|
|
ARM's TrustZone make hardening microcontroller firmware a big task. It is very important to ensure memory safety in
|
|
microcontroller firmware through tools such as defensive coding, extensive testing and formal verification.
|
|
|
|
In our design we achieve simplicity on two levels: One, we isolate the very complex metering firmware from our reset
|
|
controller by having both run on separate microcontrollers. Two, we keep the reset controller firmware itself extremely
|
|
simple to reduce attack surface there.
|
|
|
|
\subsection{Regulatory and economic constraints}
|
|
%FIXME
|
|
|
|
\subsection{Safety vs. security: Opting for restoration instead of prevention}
|
|
|
|
By implementing our reset system as a physically separate microcontroller we sidestep most security issues around the
|
|
main application microcontroller. There are some simple measures that can be taken to harden this firmware.
|
|
Implementing industry best practices such as memory protection or stack canaries will harden the system and increase the
|
|
cost of an attack but it will not yield a system that we can be confident enough in to say it is fully secure. The
|
|
complexity of the main application controller firmware makes fully securing the system a formidable effort--and one that
|
|
would have to be repeated by every meter vendor for every one of their code bases.
|
|
|
|
In contrast to this, our reset system does not provide any additional security. Any attack that could occur without it
|
|
can still occur with it in place. What it provides is a fail-safe mechanism that can quickly immobilize a malicious
|
|
actor even mid-attack. It does this in a way that can be adapted to any meter architecture and any microcontroller
|
|
platform with low effort since it relies on established standard interfaces such as JTAG and SWD. Concentrating
|
|
research and development resources on a single platform like this allows for a system that is more economical to
|
|
implement across device series and across vendors.
|
|
|
|
Attack resilience in the power grid can benefit from a safety-focused approach. The greater danger such an attack poses
|
|
is not the temporary denial of service of utility metering functions. Even in a highly integrated smart grid as
envisioned by utility companies, metering functions serve to increase efficiency and reduce cost but are not
necessary for the grid to function at all. % TODO citation
|
|
Thus if we can provide mere \emph{safety} with a fail-safe semantic instead of unattainable perfect \emph{security} we
|
|
have gained resilience against a large class of realistic attack scenarios.
|
|
|
|
\subsection{Technical outline of a safety reset system}
|
|
|
|
There are several ways our system could be practically implemented. The most basic way is to add a separate
|
|
microcontroller connected to the meter's main application MCU and optionally other embedded microcontrollers such as
|
|
modems. This discrete chip could either be placed on the metering board itself or it could be placed on a separate PCB
|
|
connected to the programming interface(s) of the metering board. In certain cases the latter might allow use in
|
|
otherwise unmodified legacy designs.
|
|
|
|
The safety reset controller would be a much simpler MCU than the meter's main application controller. Its software can
be kept simple, leading to low program flash and RAM requirements. Since it does not need to address rich peripherals such
|
|
as external parallel memory, LCDs etc.\ it can be a physically small, low-pin count device. If the main application
|
|
controller is supposed to be reset to a full factory image with little or no reduction in functionality its firmware image
|
|
size is certainly too large for the reset controller's embedded flash. Thus a realistic setup would likely use an
|
|
external SPI flash chip to store this image.
|
|
|
|
The most likely interface for resetting the main application controller and possibly other microcontrollers such as modem
chips would be each controller's integrated programming port such as JTAG. There exist a variety of programming
|
|
interfaces for microcontrollers but for moderately complex ones JTAG has grown to be by far the most broadly supported
|
|
one. Parallel high-voltage flash programming has come to be uncommon in modern microcontrollers and most chips nowadays
|
|
use some form of a serial interface. Some vendors have their own proprietary serial in-system programming interfaces
|
|
that they use on certain parts instead of or in addition to JTAG. The reasons for this usually are either lower
|
|
complexity in parts that do not require full debugging capabilities as provided by JTAG or the high pin count of JTAG.
|
|
|
|
The kind of microcontroller that would likely be used as the main application controller in a smart meter application
|
|
will almost certainly support JTAG. These microcontrollers are high pin-count devices since they need to connect to a
|
|
large set of peripherals such as the LCD, and their large program flash makes it likely for a proper debugging interface to
|
|
be present.
|
|
|
|
The one remaining issue in this coarse technical outline is what communication interface should be used to transmit the
|
|
trigger command to the reset controller. In the following section we will give an overview on communication interfaces
|
|
established in energy metering applications and evaluate each of them for our purpose.
|
|
|
|
\section{Communication channels on the grid}
|
|
|
|
There are a number of well-established technologies for communication on or along power lines. We can distinguish three
|
|
basic system categories: Systems using separate wires (such as DSL over landline telephone wiring), wireless radio
|
|
systems (such as LTE) and \emph{powerline communication} (PLC) systems that re-use the existing mains wiring and
|
|
superimpose data transmissions on the \SI{50}{\hertz} mains sine\cite{gungor01,kabalci01}.
|
|
|
|
For our scenario, we will ignore short-range communication systems. There exists a large number of \emph{wideband}
|
|
powerline communication systems that are popular with consumers for bridging ethernet between parts of an apartment or
|
|
house. These systems transmit at up to several hundred megabits per second over distances up to several tens of
|
|
meters\cite{kabalci01}. Technologically, these wideband PLC systems are very different from \emph{narrowband} systems
|
|
used by utilities for load management among other applications and they are not relevant to our analysis.
|
|
|
|
\subsection{Powerline communication (PLC) systems and their use}
|
|
|
|
In long-distance communications for applications such as load management, PLC systems are attractive since they allow
|
|
re-using the existing wiring infrastructure and have been used as early as in the 1930s\cite{hovi01}. Narrowband PLC
|
|
systems are a potentially low-cost solution to the problem of transmitting data at small bandwidth over distances of
|
|
several hundred meters up to tens of kilometers.
|
|
|
|
Narrowband PLC systems transmit on the order of kilobits per second or slower. A common use of this sort of system is
\emph{ripple control}. Ripple control systems superimpose a low-frequency signal at a carrier frequency of a few hundred
hertz on top of the \SI{50}{\hertz} mains sine. This low-frequency signal is used to encode switching commands for
|
|
non-essential residential or industrial loads. Ripple control systems provide utilities with the ability to actively
|
|
control demand while promising small savings in electricity cost to consumers\cite{dzung01}.
|
|
|
|
In any PLC system there is a strict tradeoff between bandwidth, power and distance. Higher bandwidth requires higher
|
|
power and reduces maximum transmission distance. Where ripple control systems usually use few transmitters to cover
|
|
the entire grid of a regional distribution utility, higher-bandwidth bidirectional systems used for automatic meter
|
|
reading (AMR) in places such as Italy or France require repeaters within a few hundred meters of a transmitter.
|
|
|
|
\subsection{Landline and wireless IP-based systems}
|
|
|
|
Especially in automated meter reading (AMR) infrastructure the cost-benefit tradeoff of powerline systems does not
|
|
always work out for utilities. A common alternative in these systems is to use the public internet for communication.
|
|
Using the public internet has the advantage of low initial investment on the part of the utility company as well as
|
|
quick commissioning. Disadvantages compared to a PLC system are potentially higher operational costs due to recurring
|
|
fees to network providers as well as lower reliability. Being integrated into power grid infrastructure, a PLC system's
|
|
failure modes are highly correlated with the overall grid. Put briefly, if the PLC interface is down, there is a good
|
|
chance that power is out, too. In contrast, general internet services exhibit a multitude of failures that are
|
|
entirely decorrelated from power grid stability.
|
|
|
|
For applications such as meter reading for billing, this reliability is sufficient. However, for systems that need to
|
|
hold up in crisis situations such as the recovery system we are contemplating in this thesis, the public internet may
|
|
not provide sufficient reliability.
|
|
|
|
\subsection{Short-range wireless systems}
|
|
|
|
Smart meters contain copious amounts of firmware but still pale in comparison to the complexity of full-scale computers
|
|
such as smartphones. For short-range communication between a meter and a cellular radio gateway mounted nearby or
|
|
between a meter and a meter reading operator in a vehicle on the street a protocol such as Wi-Fi (802.11) might be too
|
|
complex in most cases. Absent widely-used standards in this space proprietary radio protocols instead grow very
|
|
attractive. These might be based on some standardized lower-level protocol such as ZigBee (802.15.4) or might be entirely
|
|
home-grown. To a meter manufacturer a proprietary radio protocol has several advantages. It is easy to implement and
|
|
requires zero external certification. It can be customized to its specific application. In addition it provides some
|
|
level of vendor lock-in to customers sharing infrastructure such as a cellular radio gateway between multiple devices.
|
|
In other fields such as home automation, where a lack of standardization has led to a proliferation of proprietary
protocols, the result is a fragmented protocol landscape. There this is a large problem since consumers
cannot easily integrate products made by different manufacturers into one system. In advanced metering infrastructure
this is unlikely to be a disadvantage since usually there is only one distribution grid operator for an area.
|
|
Additionally shared resources such as a cellular radio gateway would most likely only be shared within a single building
|
|
and within a single building usually all meters are operated by the same provider.
|
|
|
|
Systems in Europe commonly support Wireless M-Bus, a European standardized protocol\cite{silabs01} that operates on
|
|
several ISM bands\footnote{
|
|
Frequency bands that can be used for \emph{Industrial, Scientific and Medical} applications by anyone and that do
|
|
not require obtaining a license for transmitter operation. Manufacturers can use whatever protocol they like on
|
|
these bands as long as they obtain certification that their transmitters obey certain spectral and power
|
|
limitations.
|
|
}. ZigBee is another popular standard and some vendors additionally support their own proprietary protocols\footnote{
|
|
For an example see \cite{honeywell01}.
|
|
}.
|
|
% TODO expand this?
|
|
|
|
\subsection{Frequency modulation as a communication channel}
|
|
|
|
For our system, we chose grid frequency modulation (henceforth GFM) as a low-bandwidth uni-directional broadcast
communications channel. Compared to traditional PLC, GFM requires only a small amount of additional hardware, works
reliably throughout the grid and is harder for a malicious actor to manipulate.
|
|
|
|
Grid frequency in Europe's synchronous areas is nominally \SI{50}{\hertz}, but there are small load-dependent variations from
|
|
this nominal value. Any device connected to the power grid (or even just within physical proximity of power wiring) can
|
|
reliably and accurately measure grid frequency at low hardware overhead. By intentionally modifying grid frequency, we
|
|
can create a very low-bandwidth broadcast communication channel. Grid frequency modulation has only ever been proposed
|
|
as a communications channel at very small scales in microgrids before\cite{urtasun01} but to our knowledge has not yet
|
|
been considered for large-scale application.
|
|
|
|
Advantages of using grid frequency for communication are low receiver hardware complexity as well as the fact that a
|
|
single transmitter can cover an entire synchronous area. Though the transmitter has to be very large and powerful, setup
|
|
of a single large transmitter faces lower bureaucratic hurdles than integration of hundreds of smaller ones into
|
|
hundreds of local systems each with autonomous governance.
|
|
|
|
\subsubsection{The load dependency of grid frequency}
|
|
|
|
Despite the awesome complexity of large power grids the physics underlying their response to changes in load and
|
|
generation is surprisingly simple. Individual machines (loads and generators) can be approximated by a small number of
|
|
differential equations and the entire grid can be modelled by aggregating these approximations into a large system of
|
|
nonlinear differential equations. Evaluating these systems it has been found that in large power grids small-signal
|
|
steady-state changes in generation/consumption power balance cause an approximately linear change in
|
|
frequency\cite{kundur01,crastan03,entsoe02,entsoe04}. \emph{Small signal} here describes changes in power balance that
|
|
are small compared to overall grid power. \emph{Steady state} describes changes over a timeframe of multiple cycles as
|
|
opposed to transient events that only last a few milliseconds.
|
|
|
|
This approximately linear relationship allows the specification of a coefficient linking $\Delta P$ and $\Delta f$ with
|
|
unit \si{\watt\per\hertz}. In this thesis we are using the European power grid as our model system. We are
|
|
using data provided by ENTSO-E (formerly UCTE), the governing association of European transmission system operators. In
our calculations we use data for the Continental Europe synchronous area, the largest synchronous area.
$\frac{\Delta P}{\Delta f}$, called \emph{Overall Network Power Frequency Characteristic} by ENTSO-E, is around
\SI{25}{\giga\watt\per\hertz}.
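
As a rough worked example of this linear relationship, consider a load step of $\Delta P = \SI{250}{\mega\watt}$. The
symbol $K$ for the characteristic and the chosen load step are our own illustrative assumptions. We only consider
magnitudes here: an increase in load lowers grid frequency, a decrease raises it.
\begin{equation*}
    \abs{\Delta f} \approx \frac{\abs{\Delta P}}{K}
    = \frac{\SI{250}{\mega\watt}}{\SI{25}{\giga\watt\per\hertz}}
    = \SI{10}{\milli\hertz}
\end{equation*}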
|
|
|
|
We can derive general design parameters for any system utilizing grid frequency as a communications channel from the
policies of ENTSO-E\cite{entsoe02,entsoe03}. Any such system should stay below a modulation amplitude of
\SI{100}{\milli\hertz}, which is the threshold defined in the ENTSO-E incidents classification scale for a Scale 0-1
(from ``Anomaly'' to ``Noteworthy Incident'' scale) frequency degradation incident\cite{entsoe03} in the Continental
Europe synchronous area.
|
|
|
|
\subsubsection{Control systems coupled to grid frequency}
|
|
|
|
The ENTSO-E Operations Handbook Policy 1 chapter defines the activation threshold of primary control to be
|
|
\SI{20}{\milli\hertz}. Ideally a modulation system would stay well below this threshold to avoid fighting the primary
|
|
control reserve. Modulation line rate should likely be on the order of at most a few hundred millibaud. Modulation at
|
|
such rates would outpace primary control action, which is specified by ENTSO-E as acting within ``a few
seconds'' to \SI{15}{\second}.
|
|
|
|
The effective \emph{Network Power Frequency Characteristic} of primary control in the European grid is reported by
ENTSO-E at around \SI{20}{\giga\watt\per\hertz}. Keeping modulation amplitude below the primary control activation
threshold would help to avoid spuriously triggering these control functions. At this characteristic, modulation
requires around \SI{20}{\mega\watt} of modulation power per \si{\milli\hertz} of frequency deviation, i.e.\
\SI{20}{\mega\watt\per\milli\hertz}.
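
Combining the two figures quoted in this section gives a rough estimate of the modulation power at which the primary
control activation threshold would be reached. The symbol $\Delta P_\text{max}$ is ours, and the estimate assumes that
the linear small-signal relationship holds up to this point.
\begin{equation*}
    \Delta P_\text{max} \approx \SI{20}{\giga\watt\per\hertz} \cdot \SI{20}{\milli\hertz} = \SI{400}{\mega\watt}
\end{equation*}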
|
|
|
|
\subsubsection{An outline of practical transmitter implementation}
|
|
|
|
In its most basic form a transmitter for grid frequency modulation would be a very large controllable load connected to
|
|
the power grid at a suitable vantage point. A spool of wire submerged in a body of cooling water (such as a small lake
|
|
with a fence around it) along with a thyristor rectifier bank would likely suffice to perform this function during
|
|
occasional cybersecurity incidents. We can however decrease hardware and maintenance investment even further compared
|
|
to this rather uncultivated solution by repurposing regular large industrial loads to our transmitter purposes in an
|
|
emergency situation. For some preliminary exploration we went through a list of energy-intensive industries in
|
|
Europe\cite{ec01}. The most electricity-intensive industries in this list are primary aluminium and steel production.
|
|
In primary production raw ore is converted into raw metal for further refinement such as casting, rolling or extrusion.
|
|
In steelmaking iron is melted in an electric arc furnace. In aluminium smelting aluminium is electrolytically extracted
from alumina. Both processes consume large amounts of electricity, with electricity making up \SI{40}{\percent} of
production costs. Given these circumstances a steel mill or aluminium smelter would be a good candidate as a transmitter
|
|
in a grid frequency modulation system.
|
|
|
|
In aluminium smelting high-voltage mains is transformed, rectified and fed into about 100 series-connected cells forming
|
|
a \emph{potline}. Inside the pots alumina is dissolved in molten cryolite electrolyte at about
|
|
\SI{1000}{\degreeCelsius} and electrolysis is performed using a current of tens or hundreds of kiloamperes. The resulting
|
|
pure aluminium settles at the bottom of the cell and is tapped off for further processing.
|
|
|
|
Like steelworks, aluminium smelters are operated night and day without interruption. Aside from metallurgical issues the
|
|
large thermal mass and enormous heating power requirements do not permit power-cycling. Due to the high costs of
|
|
production inefficiencies or interruptions the behavior of aluminium smelters under power outages is a fairly
|
|
well-characterized phenomenon in the industry. The recent move away from nuclear power and to renewable energy has led
|
|
to an increase in fluctuations of electricity price throughout the day. These electricity price fluctuations have
|
|
provided enough economic incentive to aluminium smelters to develop techniques to modulate smelter power consumption
|
|
without affecting cell lifetime or the output product\cite{duessel01,eisma01}. Power outages of tens of minutes up to
|
|
two hours reportedly do not cause problems in aluminium potlines and are in fact part of routine operation for purposes
|
|
such as electrode changes\cite{eisma01,oye01}.
|
|
|
|
The power supply system of an aluminium plant is managed through a highly-integrated control system as keeping all cells
|
|
of a potline under optimal operating conditions is challenging. Modern power supply systems employ large banks of diodes
|
|
or SCRs to rectify low-voltage AC to DC to be fed into the potline\cite{ayoub01}. The potline voltage can be controlled
|
|
almost continuously through a combination of a tap changer and a transductor. The individual cell voltages can be
|
|
controlled by changing the anode to cathode distance (ACD) by physically lowering or raising the anode. The potline
|
|
power supply is connected to the high voltage input and to the potline through isolators and breakers.
|
|
|
|
In an aluminium smelter most of the power is sunk into resistive losses and the electrolysis process. As such an
|
|
aluminium smelter does not have any significant electromechanical inertia compared to the large rotating machines used
|
|
in other industries. Depending on the capabilities of the rectifier controls high slew rates should be possible,
permitting modulation at high data rates\footnote{Aluminium smelter rectifiers are \emph{pulse rectifiers}. This means
instead of simply rectifying the incoming three-phase voltage they use a special configuration of transformer
secondaries and in some cases additional coils to produce a larger number (such as 6) of equally spaced phases. Where
a direct-connected three-phase rectifier would draw current in 6 pulses per cycle, a pulse rectifier draws current in
more, smaller pulses to increase power factor. For example, a 12-pulse rectifier will draw current in 12 pulses per
cycle. In the best case an SCR pulse rectifier switched at zero crossing should allow \SIrange{0}{100}{\percent} load
changes from one rectifier pulse to the next, i.e.\ within a fraction of a single cycle.}.
|
|
|
|
% FIXME validate this \subsubsection with an expert
|
|
|
|
\subsubsection{Avoiding dangerous modes}
|
|
|
|
Modern power systems are complex electromechanical systems. Each component is controlled by several carefully tuned
|
|
feedback loops to ensure voltage, load and frequency regulation. Multiple components are coupled through transmission
|
|
lines that themselves exhibit complex dynamic behavior. The overall system is generally stable, but may exhibit some
instabilities to particular small-signal stimuli\cite{kundur01,crastan03}. These instabilities, called \emph{modes},
|
|
occur when due to mis-tuning of parameters or physical constraints the overall system exhibits oscillation at particular
|
|
frequencies. These are separated into four categories in \cite{kundur01}:
|
|
|
|
\begin{description}
|
|
\item[Local modes] where a single power station oscillates in some parameter
|
|
\item[Interarea modes] where subsections of the overall grid oscillate w.r.t.\ each other due to weak coupling
|
|
between them
|
|
\item[Control modes] caused by imperfectly tuned control systems
|
|
\item[Torsional modes] that originate from electromechanical oscillations in the generator itself
|
|
\end{description}
|
|
|
|
The oscillation frequencies associated with each of these modes are usually between a few tens of millihertz and a few
hertz\cite{grebe01,entsoe01,crastan03}. It is hard to predict the particular modes of a power system at the scale of the
|
|
central-european interconnected system. Theoretical analysis and simulation may give rough indications but cannot yield
|
|
conclusive results. Due to the obvious danger as well as the high economic impact of inefficiencies, experimental
|
|
measurements are infeasible. Finally, modes are highly dependent on the power grid's structure and will change with
|
|
changes in the power grid over time. For all of these reasons, a grid frequency modulation system must be designed very
|
|
conservatively without relying on the absence (or presence) of modes at particular frequencies. A concrete design
|
|
guideline that we can derive from this situation is that the frequency spectrum of any grid frequency modulation system
|
|
should not exhibit any notable peaks and should avoid a concentration of spectral energy in certain frequency ranges.
|
|
|
|
\subsubsection{Overall system parameters}
|
|
|
|
In conclusion we end up with the following tunable parameters for a grid frequency modulation system based on a large
|
|
controllable load:
|
|
|
|
\begin{description}
|
|
\item[Modulation amplitude.] Amplitude is proportionally related to modulation power. In a practical setup we might
|
|
realize a modulation power up to a few hundred \si{\mega\watt} which would yield maybe a few tens of
|
|
\si{\milli\hertz} of frequency amplitude.
|
|
\item[Modulation pre-emphasis and slew-rate control.] Pre-emphasis might be necessary to ensure an adequate
|
|
Signal-to-Noise ratio (SNR) at the receiver. Slew-rate control and other shaping measures might be necessary to
|
|
reduce the impact of these sudden load changes on the transmitter's primary function (say, aluminium smelting)
|
|
and to prevent disturbances to grid components.
|
|
\item[Modulation frequency.] For a practical implementation a careful study would be necessary to determine an
|
|
optimal frequency band for operation. On one hand we need to prevent disturbances to the grid such as through
|
|
excitation of some local or inter-area modes. On the other hand we need to optimize Signal-to-Noise ratio (SNR)
|
|
and data rate to achieve optimal latency between transmission start and successful reception and to reduce the
|
|
overall burden on transmitter and grid.
|
|
\item[Further modulation parameters.] The modulation itself has numerous parameters that are discussed in sec.\
|
|
\ref{mod_params} below.
|
|
\end{description}
|
|
|
|
\section{From grid frequency to a reliable communication channel}
|
|
|
|
\subsection{Channel properties}
|
|
In this section we will explore how we can construct a reliable communication channel from the analog primitive we
|
|
outlined in the previous section. Our load control approach to grid frequency modulation leads to a channel with the
|
|
following properties.
|
|
|
|
\begin{description}
|
|
\item[Slow-changing.] Accurate grid frequency measurements need several periods of the mains sine wave. Faster
|
|
sampling rates can be achieved with more complex specialized synchrophasor estimation algorithms but this will
|
|
result in a tradeoff between sampling rate and accuracy\cite{belega01}.
|
|
\item[Analog.] Grid frequency is an analog signal.
|
|
\item[Noisy.] While grid frequency is stable over long periods of time thanks to Load-Frequency Control\cite{entsoe04}, it shows
|
|
considerable random short-term variations. In addition our modulation amplitude is limited by technical and
|
|
economic constraints so we have to find a system that will work at poor SNRs.
|
|
\item[Polarized.] Grid frequency measurements have an inherent sense of \emph{up} (higher frequencies). We can use
|
|
this in a polarized modulation scheme to encode information without first transmitting some reference signal to
|
|
establish this polarization.
|
|
\end{description}
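
To make the \emph{slow-changing} property more tangible, the following minimal sketch estimates mains frequency from
the rising zero crossings of a sampled waveform. This is a deliberately simple stand-in technique chosen for
illustration and not the measurement method used elsewhere in this thesis. The function name, the sample rate and the
synthetic noise-free test signal are all assumptions. Note how the estimate needs many mains periods of input to
resolve millihertz-scale deviations.

\begin{minted}{python}
import numpy as np

def estimate_frequency(samples, sample_rate):
    """Estimate the frequency of a mains-like sine from rising zero crossings."""
    samples = np.asarray(samples, dtype=float)
    # Indices i where the signal crosses zero upwards between sample i and i+1.
    idx = np.flatnonzero((samples[:-1] < 0) & (samples[1:] >= 0))
    if len(idx) < 2:
        raise ValueError("need at least two rising zero crossings")
    # Linearly interpolate the instant of each crossing (fraction of a sample).
    frac = -samples[idx] / (samples[idx + 1] - samples[idx])
    crossings = (idx + frac) / sample_rate
    # Number of full periods divided by the time spanned by those periods.
    return (len(crossings) - 1) / (crossings[-1] - crossings[0])

# Synthetic example (assumed values): one second of a 50.02 Hz sine at 1 kHz.
sample_rate = 1000.0
t = np.arange(0, 1.0, 1 / sample_rate)
mains = np.sin(2 * np.pi * 50.02 * t + 0.3)
print(estimate_frequency(mains, sample_rate))
\end{minted}

On real, noisy mains voltage such a naive estimator would need filtering ahead of the zero-crossing detection, which
further lengthens the effective measurement window.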
|
|
|
|
\subsection{Modulation and its parameters}
|
|
\label{mod_params}
|
|
|
|
In this section we will consider how to select a good set of parameters for a modulation scheme fitting grid frequency
|
|
modulation.
|
|
|
|
The sensitivity of the grid to oscillation at particular frequencies described above means we should avoid any
|
|
modulation technique that would concentrate a lot of energy in a small bandwidth. Taking this principle to its extreme
|
|
provides us with a useful pointer towards techniques that might work well: Spread-spectrum techniques. By employing
|
|
spread-spectrum modulation we can produce an almost ideal frequency-domain behavior that spreads the modulation energy
|
|
almost flat across the modulation bandwidth\cite{goiser01} while at the same time achieving some modulation gain,
|
|
increasing system sensitivity. The modulation gain that spread-spectrum techniques yield potentially allows us to use a
|
|
weaker stimulus, allowing further reduction of the probability of disturbance to the overall system. Spread-spectrum
|
|
techniques also inherently allow us to tune the tradeoff between receiver sensitivity and data rate. This tunability is
|
|
a highly useful parameter to have for the overall system design.
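
The difference in spectral concentration can be illustrated with a small numerical sketch. It compares a square wave,
as a stand-in for a naive narrowband modulation, with a sequence of random bipolar chips, as a stand-in for a
spread-spectrum signal. The helper name, the sequence length, the square wave period and the random seed are arbitrary
assumptions made for this illustration.

\begin{minted}{python}
import numpy as np

def peak_to_mean_spectrum(chips):
    """Ratio of the largest to the mean periodogram bin, ignoring DC."""
    chips = np.asarray(chips, dtype=float)
    power = np.abs(np.fft.rfft(chips - chips.mean())) ** 2
    power = power[1:]  # drop the DC bin
    return power.max() / power.mean()

n_chips = 1024
# Narrowband reference: a square wave with a period of 8 chips.
square = np.where((np.arange(n_chips) // 4) % 2 == 0, 1.0, -1.0)
# Spread-spectrum stand-in: random bipolar chips (seed chosen arbitrarily).
rng = np.random.default_rng(0)
spread = rng.choice([-1.0, 1.0], size=n_chips)

print(peak_to_mean_spectrum(square), peak_to_mean_spectrum(spread))
\end{minted}

The square wave puts almost all of its energy into a handful of spectral lines, which is exactly what we want to avoid
given the grid's sensitivity to oscillation at particular frequencies. The random chip sequence spreads its energy
across the whole band and its ratio stays far lower.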
|
|
|
|
Spread spectrum covers a whole family of techniques. In \cite{goiser01} these techniques are divided into the coarse
|
|
categories of \emph{Direct Sequence Spread Spectrum}, \emph{Frequency Hopping Spread Spectrum} and \emph{Time Hopping
|
|
Spread Spectrum}.
|
|
|
|
In \cite{goiser01} a BPSK or similar modulation is assumed underlying the spread-spectrum technique. Our grid frequency
|
|
modulation channel effectively behaves more like a DC-coupled wire than a traditional radio channel: Any change in
|
|
excitation will cause a proportional change in the receiver's measurement. Using our FFT-based measurement methodology
|
|
we get a real-valued signed quantity. In this way grid frequency modulation is similar to a channel using coherent
|
|
modulation. We can transmit not only signal strength, but polarity too.
|
|
|
|
For our purposes we can discount both Time and Frequency Hopping Spread Spectrum techniques. Time hopping helps to reduce
interference between multiple transmitters but does not help with SNR any more than Direct Sequence does since all it
does is allow other transmitters to transmit. Our system is strictly limited to a single transmitter so we do not
|
|
gain anything through Time Hopping.
|
|
|
|
Frequency Hopping Spread Spectrum techniques require a carrier. Grid frequency modulation itself is very limited in
|
|
peak frequency deviation $\Delta f$. Frequency hopping could only be implemented as a second modulation on top of GFM,
|
|
but this would not yield any benefits while increasing system complexity and decreasing data bandwidth.
|
|
|
|
Direct Sequence Spread Spectrum is the only remaining approach for our application. Direct Sequence Spread Spectrum
|
|
works by directly modulating a long pseudorandom bit sequence onto the channel. The receiver must know the same
|
|
pseudo-random bit sequence and continuously calculates the correlation between the received signal and the pseudo-random
|
|
template sequence mapped from binary $\{0, 1\}$ to bipolar $\{+1, -1\}$. Since the pseudorandom sequence has an
approximately equal number of $0$ and $1$ bits, the correlation between the sequence and uncorrelated noise is small:
the positive contribution of the $+1$ terms of the correlation template approximately cancels out with that of the $-1$
terms when multiplied with an uncorrelated signal such as white Gaussian noise or another pseudo-random sequence.
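
The correlation receiver described above can be sketched in a few lines of numpy. This is a simplified illustration
and not our prototype implementation: it assumes that the receiver is already synchronized to the symbol boundaries,
uses random chips in place of a proper code, and adds white Gaussian noise instead of real grid frequency noise. All
function names, the code length, the noise level and the seed are assumptions.

\begin{minted}{python}
import numpy as np

def dsss_modulate(bits, code):
    """Spread each data bit over one repetition of the bipolar code."""
    symbols = 1 - 2 * np.asarray(bits)      # binary 0 -> +1, 1 -> -1
    return np.concatenate([s * code for s in symbols])

def dsss_demodulate(received, code):
    """Correlate each code-length window with the template and slice."""
    windows = np.asarray(received).reshape(-1, len(code))
    correlation = windows @ code            # one correlation value per data bit
    return (correlation < 0).astype(int), correlation

rng = np.random.default_rng(1)              # arbitrary seed
code = rng.choice([-1, 1], size=1023)       # stand-in pseudo-random sequence
bits = np.array([0, 1, 1, 0, 1])

chips = dsss_modulate(bits, code)
# Noise with twice the chip amplitude, i.e. a per-chip SNR of about -6 dB.
noisy = chips + rng.normal(scale=2.0, size=chips.size)
decoded, correlation = dsss_demodulate(noisy, code)
print(decoded, correlation / len(code))
\end{minted}

Although each individual chip is buried in noise, summing over the whole sequence lets the signal contributions add up
coherently while the noise contributions largely cancel.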
|
|
|
|
By using a family of pseudo-random sequences with low cross-correlation channel capacity can be increased. Either the
|
|
transmitter can encode data in the choice of sequence or multiple transmitters can use the same channel at once. The
|
|
longer the pseudo-random sequence the lower its cross-correlation with noise or other pseudorandom sequences of the same
|
|
length. Choosing a long sequence we increase modulation gain while decreasing bandwidth. For any given application the
|
|
sweet spot will be the shortest sequence that is long enough to yield sufficient SNR for subsequent processing layers
|
|
such as channel coding.
|
|
|
|
A popular code used in many DSSS systems are Gold codes. A set of Gold codes has small cross-correlations. For some
|
|
value $n$ a set of Gold codes contains $2^n + 1$ sequences of length $2^n - 1$. Gold codes are generated from two
|
|
different maximum length sequences generated by linear feedback shift registers (LFSRs). For any bit count $n$ there are
|
|
certain empirically determined preferred pairs of LFSRs that produce Gold codes with especially good cross-correlation.
|
|
The $2^n + 1$ Gold codes are defined as the XOR sum of both LFSR sequences, with one of them cyclically shifted by $0$ to $2^n - 2$ bits, as well as
the two individual LFSR sequences. Given LFSR sequences \texttt{a} and \texttt{b} as numpy arrays this is
\mintinline{python}{[a, b] + [a ^ np.roll(b, shift) for shift in range(len(b))]}.
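
As a sketch of this construction, the following example generates the Gold code set for $n = 5$. The feedback taps $[5, 2]$ and $[5, 4, 3, 2]$ are an assumption made for this example: they are a commonly tabulated preferred pair for $n = 5$ and not necessarily the pair used in our prototype. The function names are likewise illustrative.

\begin{minted}{python}
import numpy as np

def lfsr_sequence(taps, n):
    """One period (2^n - 1 bits) of a[k+n] = XOR of a[k+n-t] for t in taps."""
    seq = [1] * n                      # any non-zero seed works
    for k in range(2**n - 1 - n):
        seq.append(sum(seq[k + n - t] for t in taps) % 2)
    return np.array(seq, dtype=np.uint8)

def gold_codes(a, b):
    """Both base sequences plus a XOR every cyclic shift of b."""
    return [a, b] + [a ^ np.roll(b, shift) for shift in range(len(b))]

def max_cross_correlation(codes):
    """Largest absolute periodic cross-correlation between distinct codes."""
    bipolar = 1 - 2 * np.array(codes, dtype=int)
    worst = 0
    for i, x in enumerate(bipolar):
        for y in bipolar[i + 1:]:
            for shift in range(len(y)):
                worst = max(worst, abs(int(np.dot(x, np.roll(y, shift)))))
    return worst

n = 5
a = lfsr_sequence([5, 2], n)           # assumed preferred pair, see lead-in
b = lfsr_sequence([5, 4, 3, 2], n)
codes = gold_codes(a, b)

print(len(codes), len(codes[0]))       # 2^n + 1 codes of length 2^n - 1
# For a preferred pair with odd n the cross-correlation magnitude is bounded
# by 2^((n+1)/2) + 1, which is 9 for n = 5, compared to a peak of 31.
print(max_cross_correlation(codes))
\end{minted}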
|
|
|
|
In DSSS modulation the individual bits of the DSSS sequence are called \emph{chips}. Chip duration determines modulation
|
|
bandwidth\cite{goiser01}. In our system we are directly modulating DSSS chips on mains frequency without an underlying
|
|
modulation such as BPSK as it is commonly used in DSSS systems.
|
|
|
|
\subsection{Error-correcting codes}
|
|
|
|
To make our overall system reliable we have to layer some channel coding on top of our DSSS modulation. The messages we
|
|
expect to transmit are at least a few tens of bits long. We are highly constrained in SNR due to limited transmission
|
|
power. With lower SNR comes higher BER (bit error rate). The probability of receiving a packet without any bit error decreases exponentially with transmission length.
|
|
For our relatively long transmissions we would realistically get unacceptable error rates.
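
As a rough illustration, assume independent bit errors with bit error rate $p_\text{bit}$, an idealization that our channel need not satisfy. The symbols $p_\text{bit}$, $N$ and $P_\text{packet}$ are introduced here for this example only. The probability that a packet of $N$ bits contains at least one bit error is then
\begin{equation*}
    P_\text{packet} = 1 - \left(1 - p_\text{bit}\right)^N .
\end{equation*}
For example, $p_\text{bit} = 10^{-2}$ and $N = 100$ give $P_\text{packet} = 1 - 0.99^{100} \approx 0.63$, so without channel coding roughly two out of three such packets would be corrupted.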
|
|
|
|
Error correcting codes are a very broad field with many options for specialization. Since we are implementing nothing
|
|
more than a prototype in this thesis we chose not to spend much effort on optimization and settled on a basic
Reed-Solomon code. The state of the art has advanced considerably since the discovery of Reed-Solomon
|
|
codes\cite{mackay01}. The main areas of improvement are overhead and decoding speed. Since message length in our system
|
|
limits system response time but we do not have a fixed target we can tolerate some degree of overhead. Decoding speed
|
|
is of very low concern to us because our data rate is extremely low.
|
|
|
|
An important concern for our prototype implementation was the availability of reference implementations of our error
|
|
correcting code. We need a Python implementation for test signal generation on a regular computer and we need a small C
or C++ implementation that we can adapt to embedded firmware. Reed-Solomon codes are a popular textbook example of
error-correcting codes and we had no particular difficulty finding either.
|
|
|
|
\subsection{Cryptographic security}
|
|
\label{sec-crypto}
|
|
|
|
Informally the system we are looking for can be modelled as consisting of three parties: the trusted
|
|
\emph{transmitter}, one of a large number of untrusted \emph{receivers}, and an \emph{attacker}. These three play
|
|
according to the following rules:
|
|
|
|
\begin{description}
|
|
\item[Access.] Both transmitter and attacker can transmit any bit sequence.
|
|
\item[Indistinguishability.] The receiver receives any transmission by either but cannot distinguish between them.
|
|
    \item[Kerckhoffs's principle.] The attacker knows anything any receiver might know\cite{kerckhoff01,kerckhoff02}.
|
|
\item[Priority.] The transmitter is stronger than an attacker and will ``win'' during simultaneous transmission.
|
|
\item[Seeding.] Both transmitter and receiver can be seeded out-of-band with some information on each other such as
|
|
public key fingerprints.
|
|
\end{description}
|
|
|
|
We are not considering situations where an attacker attempts to jam an ongoing transmission. In practice there are
|
|
several avenues to prevent such attempts. Compromised loads that are being abused by the attacker can be manually
|
|
disconnected by the utility. Error-correcting codes can be used to provide resiliency against small-scale disturbances.
|
|
Finally, the transmitter can be designed to have high enough power to be able to override any likely attacker.
|
|
|
|
Our goal is to find a cryptographic primitive that has the following properties:
|
|
\begin{description}
|
|
\item[Authenticity.] The transmitter can produce a message bit sequence that a subset of receivers can identify as
|
|
being generated by the transmitter. On reception of this sequence, all addressed receivers perform a safety
|
|
reset.
|
|
\item[Unforgeability.] The attacker cannot forge a message, i.e.\ find a bit sequence other than one of the
|
|
transmitter's previous messages that a receiver would accept. This implies that the attacker also cannot modify
|
|
an existing message.
|
|
\item[Brevity.] The message should be short. Our communications channel is outrageously slow compared to anything
|
|
else used in modern telecommunications and every bit counts.
|
|
\end{description}
|
|
|
|
On a protocol level we also have to ensure \emph{idempotence}. Our system should have an at-most-once semantic. This
|
|
means for a given message each receiver either performs exactly one safety reset or none at all, even if the message is
|
|
re-transmitted by either the transmitter or an attacker. We cannot achieve the ideal exactly-once semantic with pure
protocol gymnastics since we are using a unidirectional lossy communication primitive. A receiver might be offline
|
|
(e.g.\ due to a local power outage) and then would not hear the transmission even if our broadcast primitive was
|
|
reliable. Since there is no back-channel, the transmitter has no way of telling when that happens. The practical impact
|
|
of this can be mitigated by the transmitter by repeating the transmission a number of times.
|
|
|
|
It follows from the unforgeability requirement that we can trivially reach idempotence at the protocol level by keeping
|
|
a database of all previous messages and only accepting \emph{new} messages. By considering this in our cryptographic
|
|
design we can reduce the storage requirement for this ``database''.
|
|
|
|
Along with the indistinguishability property the access requirement implies that we need a cryptographic
|
|
signature\cite{lamport01}. However, we have relaxed constraints on this signature compared to cryptographic practice.
|
|
While cryptographic signatures need to work over arbitrary inputs, all we want to ``sign'' here is the instruction to
|
|
perform a safety reset. This is the only message we might ever want to transmit so our message space has only one
|
|
entry. The information content of our message thus is 0 bit! All the information we want to transmit is already
|
|
encoded \emph{in the fact that we are transmitting}. We do not require any further payload to be transmitted. We can
|
|
omit the entirety of the message and just transmit whatever ``signature'' we produce. This is useful to conserve
|
|
transmission bits so our transmission does not take an exceedingly long time over our extremely slow
|
|
communication channel.
|
|
|
|
We can modify this construction to allow for a small number of bits of information content in our message (say two or
|
|
three instead of zero) at no transmission overhead. We could transmit the cryptographic signature as usual but simply
|
|
omit the message. The message is only a few bits and we are dealing with minutes of transmission time so the receiver
|
|
can reconstruct the message through brute-force. Though this tradeoff between computation and data transmission might
|
|
seem inelegant it does work for our extremely slow link for very few bits.
|
|
|
|
There is an important limitation in the rules of our setup above: The attacker can always record the reset bit sequence
|
|
the transmitter transmits and replay that same sequence later. Even without cryptography we can trivially prevent an
|
|
attacker from violating the at-most-once criterion. If every receiver memorizes all bit sequences that have been
|
|
transmitted so far it can detect replays. With this mitigation in place, an attacker replaying an older authentic transmission
can still cause receivers that were offline during the original transmission to reset at a later point. Considering our goal
|
|
is to reset them in the first place this should not pose a threat to the system's safety or security.
|
|
|
|
A possible scenario would be that an attacker first causes enough havoc for authorities to trigger a safety reset. The
|
|
attacker would record the trigger transmission. We can assume most meters were reset during the attack. Due to this the
|
|
attacker cannot cause a significant number of additional resets immediately afterwards. However, the attacker could
|
|
wait several years for a number of new meters to be installed. These new meters might not yet have updated firmware
|
|
including the latest transmission. This means the attacker could cause them to reset by replaying the original
|
|
sequence.
|
|
|
|
A possible mitigation for this risk would be to introduce one bit of information into the trigger message that is
|
|
ignored by the replay protection mechanism. This \emph{enable} bit would be $1$ for the actual reset trigger message.
|
|
After the attack the transmitter would then perform scheduled transmissions of a ``disarm'' message that has this bit
|
|
set to $0$. This message informs all new meters and all meters that were offline during the original transmission that
the original trigger has already been used, which enables replay protection without causing any further resets.
|
|
|
|
We could use any of several traditional asymmetric cryptographic primitives to produce these signatures. The
|
|
comparatively high computational effort required for signature verification would not be an issue. Transmissions take
|
|
several minutes anyway and we can afford to spend some tens of seconds even in signature verification. Transmission
|
|
length and, by proxy, system latency would be determined by the length of the signature. For RSA, signature length is the
modulus length (i.e.\ larger than \SI{1000}{bit} for very basic contemporary security). For elliptic curve-based systems
|
|
curve length is approximately twice the security level and signature size is twice the curve length because two
curve-sized values need to be encoded\cite{anderson02}. For contemporary security this results in more than 300 bit transmission
|
|
length. Thanks to our unique setting we can do better than this. We can exploit that our effective message entropy is 0
|
|
bit to derive a more efficient scheme.
|
|
|
|
\subsubsection{Lamport signatures}
|
|
|
|
In 1979, Lamport introduced a signature scheme\cite{lamport02} that is based only on a one-way function such as a
|
|
cryptographic hash function. The basic observation is that by choosing a random secret input to a one-way function and
|
|
publishing the output, one can later prove knowledge of the input by simply publishing it. In the following paragraphs
|
|
we will describe a construction of a one-time signature scheme based on this observation. The scheme we describe is the
|
|
one usually called a ``Lamport Signature'' in modern literature but is slightly different from the variant described in
|
|
the 1979 paper. For our purposes we can consider both to be equivalent.
|
|
|
|
\paragraph{Setup.} In a Lamport signature, for an $n$-bit hash function $H$ the signer generates a private key $s =
\left(s_{b, i} \mid b\in\left\{0, 1\right\}, 0\le i<n\right)$ of $2n$ random strings of length $n$. The signer publishes a
public key $p = \left(p_{b, i} = H\left(s_{b, i}\right) \mid b\in\left\{0, 1\right\}, 0\le i<n\right)$ that is simply the
list of hashes of each of the random strings that make up the private key.
|
|
|
|
\paragraph{Signing.} To sign a message $m$, the signer publishes the signature $\sigma = \left(\sigma_i = s_{H(m)_i,
i}\right)$ where $H(m)_i$ is the $i$-th bit of $H$ applied to $m$. That is, for the $i$-th bit of the message's hash
$H(m)$ the signer publishes either $s_{0, i}$ or $s_{1, i}$ depending on the hash bit's value, keeping the other
entry of $s$ secret.
|
|
|
|
\paragraph{Verification.} The verifier can compute $H(m)$ themselves and check that each $\sigma_i$ hashes to the
corresponding public key entry, that is $H\left(\sigma_i\right) = p_{H(m)_i, i}$ for all $0\le i<n$.
|
|
|
|
The above scheme is a one-time signature scheme only. After one signature has been published for a given key, the
|
|
corresponding key must not be re-used for other signatures. This is intuitively clear as we are effectively publishing
|
|
part of the private key as the signature, and if we were to publish a signature for another message an attacker could
|
|
derive additional signatures by ``mixing'' the two published signatures.
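
The three steps above can be sketched as follows. This is a minimal illustration and not production code: it assumes SHA-256 as $H$, so $n = 256$, and the function names are chosen for this example only.

\begin{minted}{python}
import hashlib
import secrets

N = 256  # output length of SHA-256 in bits

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def hash_bits(message: bytes):
    """Bits H(m)_0 ... H(m)_{n-1} of the message hash, most significant first."""
    digest = int.from_bytes(H(message), "big")
    return [(digest >> (N - 1 - i)) & 1 for i in range(N)]

def keygen():
    """Private key s[b][i] of 2n random strings and public key p[b][i] = H(s[b][i])."""
    s = [[secrets.token_bytes(N // 8) for _ in range(N)] for _ in (0, 1)]
    p = [[H(x) for x in row] for row in s]
    return s, p

def sign(s, message: bytes):
    """Reveal s[b][i] for every bit b = H(m)_i of the message hash."""
    return [s[b][i] for i, b in enumerate(hash_bits(message))]

def verify(p, message: bytes, sigma) -> bool:
    """Check that every revealed string hashes to the matching public key entry."""
    return all(H(sig_i) == p[b][i]
               for i, (b, sig_i) in enumerate(zip(hash_bits(message), sigma)))

s, p = keygen()
sigma = sign(s, b"safety reset")
print(verify(p, b"safety reset", sigma))    # signature matches this message
print(verify(p, b"something else", sigma))  # fails for a different message
\end{minted}

Note that the signature reveals half of the private key, which is why a key pair must only ever be used for a single message.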
|
|
|
|
\subsubsection{Winternitz signatures}
|
|
|
|
An improvement over basic Lamport signatures as described above is the Winternitz signature scheme as detailed in
|
|
\cite{merkle01,dods01}. Winternitz signatures reduce public key length as well as signature length for hash length $n$
|
|
from $2n$ to $\mathcal O \left(n/t\right)$ for some choice of parameter $t$ (usually a small number such as 4).
|
|
|
|
\paragraph{Setup.} The signer generates a private key $s = \left(s_i\right)$ consisting of $\ceil{\frac{n}{t}}$ random
|
|
bit strings. The signer publishes a public key $p = \left(H^{2^t}\left(s_i\right)\right)$ where each element
|
|
$H^{2^t}\left(s_i\right)$ is the $2^t$-fold recursive application of $H$ to $s_i$.
|
|
|
|
\paragraph{Signing.} The signer splits $m$ padded to a multiple of $t$ bits into $\ceil{\frac{n}{t}}$ chunks $m_i$ of
|
|
$t$ bit each. The signer publishes the signature $\sigma = \left( \sigma_i = H^{m_i}\left(s_i\right) \right)$.
|
|
|
|
\paragraph{Verification.} The verifier can calculate for each $\sigma_i = H^{m_i}\left(s_i\right)$ that $H^{2^t -
|
|
m_i}\left(\sigma_i\right) = H^{2^t - m_i}\left(H^{m_i}\left(s_i\right)\right) = H^{2^t - m_i + m_i} \left(s_i\right) =
|
|
p_i$.
|
|
|
|
To prevent an attacker from forging additional signatures from one signature by calculating $\sigma_i' =
|
|
H\left(\sigma_i\right)$ matching $m_i' = m_i + 1$, this scheme is usually paired with a simple checksum as described in
|
|
\cite{merkle01}.
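
The following sketch illustrates the Winternitz construction and the forgery that the checksum is meant to prevent. It is a simplified example under stated assumptions: SHA-256 stands in for $H$, $t = 4$, the message is given directly as a list of chunks $m_i$, and the checksum itself is deliberately left out.

\begin{minted}{python}
import hashlib
import secrets

T = 4  # chunk width t in bits, so each chunk m_i is in 0 .. 2^t - 1

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def H_iter(data: bytes, count: int) -> bytes:
    """count-fold recursive application of H."""
    for _ in range(count):
        data = H(data)
    return data

def keygen(num_chunks: int):
    s = [secrets.token_bytes(32) for _ in range(num_chunks)]
    p = [H_iter(s_i, 2**T) for s_i in s]        # p_i = H^(2^t)(s_i)
    return s, p

def sign(s, chunks):
    return [H_iter(s_i, m_i) for s_i, m_i in zip(s, chunks)]

def verify(p, chunks, sigma) -> bool:
    return all(H_iter(sig_i, 2**T - m_i) == p_i
               for p_i, m_i, sig_i in zip(p, chunks, sigma))

chunks = [3, 0, 15, 7]                  # hypothetical message chunks m_i
s, p = keygen(len(chunks))
sigma = sign(s, chunks)
assert verify(p, chunks, sigma)

# Without a checksum anyone can advance a chunk: sigma_0' = H(sigma_0) is a
# valid signature element for m_0' = m_0 + 1.
forged_chunks = [4, 0, 15, 7]
forged_sigma = [H(sigma[0])] + sigma[1:]
assert verify(p, forged_chunks, forged_sigma)
\end{minted}

A checksum over the chunks that decreases whenever any chunk increases closes this gap, since the attacker can only move along each hash chain in one direction.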
|
|
|
|
\subsubsection{Using hash-based signatures for trigger authentication}
|
|
|
|
The most basic possible trigger authentication scheme would be to simply generate a random bit string secret key $s$ and
|
|
publish $p = H(s)$ for some hash function $H$. To activate the trigger, $\sigma = s$ is published and receivers verify
|
|
that $H(\sigma) = p = H(s)$. This simplistic scheme has one main disadvantage: It is a fundamentally one-time
|
|
construction. To prevent an attacker from re-triggering a receiver by replaying a valid trigger $\sigma$,
all receivers have to blacklist any ``used'' $\sigma$. Alas, this means we can only ever trigger a receiver \emph{once}.
|
|
The good part is that any receiver that missed this trigger can still be triggered later, but the bad part is that once
|
|
$s$ is burned we are out of options. The trivial solution to this would be to simply inform each receiver with a whole
|
|
list of public keys in advance. This however takes $n$ times the amount of space for $n$-fold retriggerability and we
|
|
have to memorize separately for each one whether it has been used up. Luckily we can easily derive a scheme that yields
|
|
$n$-fold retriggerability and naturally memorizes replay state while using no more space than the original scheme
|
|
by taking some inspiration from Winternitz signatures above.
|
|
|
|
In this scheme the secret key $s$ is still a random bit string. The public key is $p = H^n(s)$ for $n$-times
|
|
retriggerability. The $i$-th time the trigger is activated, $\sigma_i = H^{n-i}(s)$ is published, and every receiver can
|
|
verify that $\sigma_{i-1} = H\left(\sigma_i\right)$ with $\sigma_0 = p$. In case a receiver missed one or more previous
|
|
triggers it continues computing $H\left(H\left(\sigma_i\right)\right)$ and
|
|
$H\left(H\left(H\left(\sigma_i\right)\right)\right)$ until either reaching the $n$-th recursion level (indicating an
|
|
invalid signature) or finding $H^k\left(\sigma_i\right) = \sigma_j$ for some $k \le n$, with $\sigma_j$ being the last signature this
|
|
receiver recorded, or $p$ in case there is none.
|
|
|
|
This scheme provides replay protection by having each receiver memorize the last signature it acted upon. Public key
length is equal to the output length of the hash function $H$ used. Even for our embedded systems use case $n$ can
|
|
realistically be up to $\mathcal O\left(10^3\right)$, which is easily enough for our purposes.
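
A minimal sketch of this hash chain scheme is given below. It assumes SHA-256 as $H$, and the class and method names are illustrative only. They do not correspond to our firmware.

\begin{minted}{python}
import hashlib
import secrets

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def H_iter(data: bytes, count: int) -> bytes:
    for _ in range(count):
        data = H(data)
    return data

class Transmitter:
    def __init__(self, n: int):
        self.n = n
        self.secret = secrets.token_bytes(32)      # secret key s
        self.used = 0                              # triggers sent so far
        self.public_key = H_iter(self.secret, n)   # p = H^n(s)

    def next_trigger(self) -> bytes:
        if self.used >= self.n:
            raise RuntimeError("hash chain exhausted")
        self.used += 1
        return H_iter(self.secret, self.n - self.used)  # sigma_i = H^(n-i)(s)

class Receiver:
    def __init__(self, public_key: bytes, n: int):
        self.last = public_key    # last accepted value, initially p
        self.n = n

    def receive(self, sigma: bytes) -> bool:
        """Accept sigma if hashing it at most n times reaches the stored value."""
        candidate = sigma
        for _ in range(self.n):
            candidate = H(candidate)
            if candidate == self.last:
                self.last = sigma  # remember for replay protection
                return True
        return False

tx = Transmitter(n=8)
rx = Receiver(tx.public_key, n=8)
first, second, third = (tx.next_trigger() for _ in range(3))
print(rx.receive(first))   # accepted
print(rx.receive(first))   # replay, rejected
print(rx.receive(third))   # accepted although the second trigger was missed
print(rx.receive(second))  # older than the stored value, rejected
\end{minted}

The receiver state is a single hash value, which is what keeps the storage requirement independent of $n$.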
|
|
|
|
The ``disarm'' message we discussed above can be integrated into this scheme by encoding the ``enable'' bit into the
|
|
least significant bit of the exponent in our $H^{n-i}$ construction. In the chain of valid signatures every second one would be a
|
|
disarm signature. Reset and disarm signatures would alternate in this scheme. By skipping a disarm signature two resets
|
|
can still be triggered directly after one another.
|
|
|
|
In practice it may be useful to have some control over which particular meters reset. An attack exploiting a particular
|
|
network protocol implementation flaw might only affect one series of meters made by one manufacturer. Resetting
|
|
\emph{all} meters may be too much in this case. A simple solution for this is to define addressable subsets of meters.
|
|
``All meters'' along with ``meters made by manufacturer $x$'' and ``meters of model $y$'' are good choices for such
|
|
scopes. On the cryptographic level the protocol state is simply duplicated for each scope. This incurs memory and
|
|
computation overhead linear in the number of scopes. Device memory requirements are small at a few bytes only and
|
|
computation is of no concern due to the very slow channel so this simple solution is adequate. The transmitter has to
|
|
either store copies of all scopes' keys or derive these keys from a root key using the scope's identifier. Keys are
small and the transmitter would be using a regular server or hardware security module so either option is easily feasible.
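
Deriving per-scope keys from a root key could look like the following sketch. The use of HMAC-SHA256 as derivation function, the scope identifier strings and the chain length of 1000 are assumptions made for this example. The text above does not prescribe a particular derivation function.

\begin{minted}{python}
import hashlib
import hmac

def derive_scope_key(root_key: bytes, scope_id: str) -> bytes:
    """Secret chain key for one scope, derived from the transmitter root key."""
    return hmac.new(root_key, scope_id.encode("utf-8"), hashlib.sha256).digest()

def device_key(scope_secret: bytes, n: int) -> bytes:
    """Public device key p = H^n(s) for a scope secret s."""
    value = scope_secret
    for _ in range(n):
        value = hashlib.sha256(value).digest()
    return value

# Placeholder only: a real root key would be random and kept secret.
root_key = bytes(32)
scopes = ["all", "manufacturer:B", "series:A"]   # hypothetical identifiers

# Keys a series A meter made by manufacturer B would be provisioned with.
device_keys = {scope: device_key(derive_scope_key(root_key, scope), 1000)
               for scope in scopes}
for scope, key in device_keys.items():
    print(scope, key.hex())
\end{minted}

With this approach the transmitter only has to protect the root key, while each device stores one hash value per scope it belongs to.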
|
|
|
|
A diagram of the key structure in this key management scheme is shown in Figure \ref{fig:sig_key_chain}. The
|
|
transmitter key management is shown in Figure \ref{fig:tx_scope_key_illu}. This scheme is simplistic but suffices for
|
|
our prototype in Section \ref{sec-prototype} and may even be useful in a practical implementation. During
|
|
standardization of a safety reset system the key management system would most likely have to be customized to the
|
|
particular application's requirements. Developing a universal solution is outside the scope of this work.
|
|
% FIXME revisit this section - 2020-05-26
|
|
\begin{figure}
|
|
\centering
|
|
\begin{minipage}[c]{0.5\textwidth}
|
|
\includegraphics{resources/signature_key_chain}
|
|
\end{minipage}
|
|
\begin{minipage}[c]{0.45\textwidth}
|
|
\caption{
|
|
The hash chain between secret transmitter key and public device key. Each step represents one invocation of the
|
|
hash function. To generate a new chain a random transmitter key is generated, then hashed $n$ times to
|
|
            generate the corresponding device key. A new trigger message can be generated by computing the key at depth
            $m-1$ where $m$ is the depth of the last used trigger, or $n$ initially. Trigger messages alternate between
            reset and disarm messages. Depending on which is needed the other one may be skipped.
|
|
}
|
|
\label{fig:sig_key_chain}
|
|
\end{minipage}
|
|
\end{figure}
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\includegraphics{resources/transmitter_scope_key_illustration}
|
|
\caption{
|
|
An illustration of a key management system using a shared master key. The transmitter derives one secret key for
|
|
        each addressable group from the master key. Then public device keys are generated like in Figure
|
|
\ref{fig:sig_key_chain}. Finally for each device the manufacturer picks the group public keys matching the
|
|
        device. In this example one device is a series A meter made by manufacturer B so it gets provisioned with the
        ``all devices'', ``manufacturer B'' and ``series A'' public device keys. The other device is also made by
|
|
manufacturer B but is a series C device so it gets provisioned with the ``all devices'', ``manufacturer B'' and
|
|
``series C'' public device keys. In this example the transmitter stores (or is able to derive) all six shown
|
|
group keys, but each device only needs to store the three applying to it for the three scopes ``all devices'',
|
|
``manufacturer'' and ``series''.
|
|
}
|
|
\label{fig:tx_scope_key_illu}
|
|
\end{figure}
|
|
|
|
\chapter{Practical implementation}
|
|
|
|
To validate the practical feasibility of the theoretical concepts we laid out in the previous chapter we decided to
|
|
build a prototype of a safety reset controller. In this chapter we describe the reasoning behind the components of this
|
|
prototype and the engineering that went into its firmware. The prototype consists of a smart meter whose application
|
|
microcontroller is reset by a prototype reset controller on an external circuit board. We lay out how we extensively
|
|
tested all parts of our firmware implementation. We conclude with results of a practical end-to-end experiment
|
|
exercising every part of our prototype.
|
|
|
|
\section{Data collection for channel validation}
|
|
|
|
To design a solid system we need to characterize mains frequency variations under normal conditions. To set modulation
amplitude as well as parameters of our modulation scheme we need a frequency spectrum of mains frequency variations
(that is $\mathcal F\left(f(V(t))\right)$: taking mains frequency $f(V(t))$ as a time-varying variable, the frequency spectrum of that
variable, as opposed to the frequency spectrum of mains voltage $V(t)$ itself).
|
|
|
|
\subsection{Grid frequency estimation}
|
|
\label{frequency_estimation}
|
|
|
|
In commercial power systems Phasor Measurement Units (PMUs) are used to precisely measure parameters of a mains voltage
|
|
waveform. One of the parameters PMUs measure is mains frequency. PMUs are used as part of SCADA systems controlling
|
|
transmission networks to characterize the operational state of the network.
|
|
|
|
From a superficial viewpoint measuring mains frequency might seem like a simple problem. Take the mains voltage
|
|
waveform, measure time between two rising-edge (or falling-edge) zero-crossings and take the inverse $f = t^{-1}$. In
|
|
practice, phasor measurement units are significantly more complex than this. This discrepancy is due to the combination
|
|
of both high precision and quick response that is demanded from these units. High precision is necessary since
|
|
variations of mains frequency under normal operating conditions are quite small, in the range of
\SIrange{5}{10}{\milli\hertz} over short intervals of time. Relative to the nominal \SI{50}{\hertz} this is a deviation
of \SIrange{100}{200}{ppm}. Relative to the corresponding \SI{20}{\milli\second} period that means a time deviation of
about \SIrange{2}{4}{\micro\second} from cycle to cycle. From this it is already obvious why a simplistic measurement cannot yield the
required precision for manageable averaging times: we would need either an ADC sampling rate on the order of megasamples per second or,
for a reconstruction through interpolated readings, an impractically high ADC resolution.
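
The orders of magnitude involved can be checked with a short calculation. We take the lower end of the range, $\Delta f = \SI{5}{\milli\hertz}$. The symbols $\Delta f$, $\Delta T$ and $T_\text{nom}$ are introduced here for illustration only.
\begin{align*}
    \frac{\Delta f}{f_\text{nom}} &= \frac{\SI{5}{\milli\hertz}}{\SI{50}{\hertz}} = 10^{-4} = \SI{100}{ppm}, \\
    \Delta T &\approx T_\text{nom} \cdot \frac{\Delta f}{f_\text{nom}} = \SI{20}{\milli\second} \cdot 10^{-4} = \SI{2}{\micro\second}.
\end{align*}
Resolving changes of this size from a single pair of zero crossings would thus require timing each crossing to well below a microsecond.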
|
|
|
|
Detail on the inner workings of commercial phasor measurement units is scarce but given their essential role to SCADA
|
|
systems there is a large amount of academic research on such algorithms\cite{narduzzi01,derviskadic01,belega01}. A
|
|
popular approach to these systems is to perform a Short-Time Fourier Transform (STFT) on ADC data sampled at high
|
|
sampling rate (e.g.\ \SI{10}{\kilo\hertz}) and then perform some analysis on the frequency-domain data to precisely
|
|
locate the strong peak around \SI{50}{\hertz}. A key observation here is that FFT bin size is going to be much larger
|
|
than required frequency resolution. This fundamental limitation follows from the Nyquist criterion %FIXME maybe cite?
|
|
and if we had to process an \emph{arbitrary} signal this would highly limit our practical measurement accuracy
|
|
\footnote{
|
|
Some software packages providing FFT or STFT primitives such as scipy\cite{virtanen01} allow the user to
|
|
super-sample FFT output by specifying an FFT width larger than input data length, padding the input data with zeros
|
|
on both sides. Note that in line with Nyquist this \emph{does not} actually provide finer output resolution but
|
|
instead just amounts to an interpolation between output bins. Depending on the downstream analysis algorithm it may
|
|
still be sensible to use this property of the DFT for interpolation, but in general it will be computationally
|
|
expensive compared to other interpolation methods and in any case it will not yield any better frequency resolution
|
|
aside from a hypothetical numerical advantage\cite{gasior02}.
|
|
}.
|
|
For this reason all approaches to mains frequency estimation are based on a model of the mains voltage waveform.
|
|
Nominally, this waveform would be a perfect sine at $f=\SI{50}{\hertz}$. In practice it is a sine at
|
|
$f\approx\SI{50}{\hertz}$ superimposed with some aperiodic noise (e.g.\ irregular spikes from inductive loads being
|
|
energized) as well as harmonic distortion that is caused by grid-topologically nearby devices with power factor
|
|
$\cos \theta \neq 1.0$. Under a continuous Fourier transform over a long period the frequency spectrum of a signal
|
|
distorted like this will be a low noise floor depending mainly on aperiodic noise on which a comb of harmonics as well
|
|
as some sub-harmonics of $f \approx f_\text{nom} = \SI{50}{\hertz}$ rides. The main peak at $f \approx f_\text{nom}$
|
|
will be very strong with the harmonics being approximately an order of magnitude weaker in energy and the noise floor
|
|
being at least another order of magnitude weaker. See Figure \ref{mains_voltage_spectrum} for a measured spectrum. This
|
|
domain knowledge about the expected frequency spectrum of the signal can be employed in a number of interpolation
|
|
techniques to re-construct the precise frequency of the spectrum's main component despite comparatively coarse STFT
|
|
resolution and despite numerous distortions.
|
|
|
|
Published grid frequency estimation algorithms such as \cite{narduzzi01,derviskadic01} are rather sophisticated and use
|
|
a combination of techniques to reduce numerical errors in FFT calculation and peak fitting. Given that we do not need
|
|
reference standard-grade accuracy for our application we chose to start with a very basic algorithm instead. We chose to
|
|
use a general approach to estimate the precise fundamental frequency of an arbitrary signal that was published by
|
|
experimental physicists Gasior and Gonzalez at CERN\cite{gasior01}. This approach assumes a general sinusoidal signal
|
|
superimposed with harmonics and broadband noise. Applicable to a wide spectrum of practical signal analysis tasks it is
|
|
a reasonable first-degree approximation of the much more sophisticated estimation algorithms developed specifically for
|
|
power systems. Some algorithms have components such as Kalman filters\cite{narduzzi01} that require a physical model.
|
|
As a general algorithm \cite{gasior01} does not require this kind of application-specific tuning, eliminating one source
|
|
of error.
|
|
|
|
The Gasior and Gonzalez algorithm\cite{gasior01} passes the windowed input signal through a DFT, then interpolates the
|
|
signal's fundamental frequency by fitting a wavelet such as a Gaussian to the largest peak in the DFT results. The center
position of this curve fit is an accurate estimate of the signal's fundamental frequency. This algorithm is similar
|
|
to the simpler interpolated DFT algorithm used as a reference in much of the synchrophasor estimation
|
|
literature\cite{borkowski01}. The three-term variant of the maximum sidelobe decay window often used there is a Blackman
|
|
window with parameter $\alpha = \frac{1}{4}$. Analysis has shown\cite{belega01} that the interpolated DFT algorithm is
|
|
worse than algorithms involving more complex models under some conditions but that there is \emph{no free lunch}, meaning
that more complex algorithms perform worse when the input signal deviates from their models.
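
The basic idea of windowing, transforming and fitting a Gaussian to the largest peak can be sketched as follows. This is a simplified illustration in the spirit of \cite{gasior01} and not our measurement code: the Gaussian window width, the three-point fit in log scale and the test signal are assumptions made for this example.

\begin{minted}{python}
import numpy as np

def estimate_frequency(samples, fs):
    """Estimate the dominant frequency in Hz of samples taken at fs Hz."""
    n = len(samples)
    # Gaussian window; sigma = n/8 is an illustrative choice.
    t = np.arange(n) - (n - 1) / 2
    window = np.exp(-0.5 * (t / (n / 8)) ** 2)
    spectrum = np.abs(np.fft.rfft(samples * window))

    k = int(np.argmax(spectrum[1:-1])) + 1   # largest peak, skipping edge bins
    lo, mid, hi = np.log(spectrum[k - 1:k + 2])
    # A Gaussian is a parabola in log scale, so a three-point parabola fit
    # yields its center as a fractional offset from bin k.
    delta = 0.5 * (lo - hi) / (lo - 2 * mid + hi)
    return (k + delta) * fs / n

# Synthetic test signal: a slightly off-nominal fundamental plus a third harmonic.
fs = 1000.0                  # sampling rate in Hz
n = 2000                     # two seconds of data, so DFT bins are 0.5 Hz wide
f_true = 50.0123
time = np.arange(n) / fs
x = np.sin(2 * np.pi * f_true * time) \
    + 0.1 * np.sin(2 * np.pi * 3 * f_true * time + 0.3)

estimate = estimate_frequency(x, fs)
print(estimate, abs(estimate - f_true))
\end{minted}

On this clean synthetic signal the estimate is far more precise than the \SI{0.5}{\hertz} bin spacing would suggest. Real mains data will behave less ideally because of noise and the changing frequency within the analysis window.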
|
|
|
|
\subsection{Frequency sensor hardware design}
|
|
% FIXME: link to schematics in appendix
|
|
% FIXME: include pics of finished board and device
|
|
|
|
\label{sec-fsensor}
|
|
Our safety reset controller will have to measure mains frequency to later demodulate a reset signal transmitted through
|
|
it. Since we have decided to build our own frequency measurement system here we can use this frequency measurement setup as
|
|
a prototype for the frequency measurement subcomponent of the demodulation system we will later develop. Since we do not
|
|
plan to do a large-scale field deployment of our measurement setup we can keep the hardware implementation simple by
|
|
moving most of the signal processing to a regular computer and concentrating our hardware efforts on raw signal capture.
|
|
|
|
\begin{figure}
|
|
\begin{center}
|
|
\begin{tikzpicture}[start chain = going below, node distance = 12mm and 50mm, every join/.style = {norm}]
|
|
\tikzset{
|
|
base/.style = {draw, on chain, on grid, align=center, minimum height = 4ex, font=\footnotesize},
|
|
text/.style = {base},
|
|
component/.style = {base, rectangle, text width=40mm},
|
|
coord/.style = {coordinate, on chain, on grid, node distance=6mm and 25mm}
|
|
}
|
|
\node[text centered] (input) {Single-Phase Mains Input};
|
|
\node[component] (safety) [below = of input] {Input Protection};
|
|
\node[coord] (safety-anchor) [below = of safety] {};
|
|
\node[component] (analog) [below = of safety-anchor] {Analog Signal Processing};
|
|
\node[component] (powersupply) [left = of analog] {Power supply};
|
|
\node[component] (adc) [below = of analog] {ADC};
|
|
\node[component] (micro) [below = of adc] {Microcontroller};
|
|
\node[component] (isol) [below = of micro] {Galvanic Digital Isolation};
|
|
\node[coord] (isol-left) [left = 6cm of isol.west] {};
|
|
\node[coord] (isol-right) [right = 1cm of isol.east] {};
|
|
\node[component] (usb) [below = of isol] {USB interface};
|
|
|
|
\draw[->] (input.south) -- (safety.north);
|
|
\draw[-] (safety.south) -- (safety-anchor);
|
|
\draw[->] (safety-anchor) -| (powersupply.north);
|
|
\draw[->] (safety-anchor) -| (analog.north);
|
|
\draw[->] (powersupply.south) |- (adc.west);
|
|
\draw[->] (powersupply.south) |- (micro.west);
|
|
\draw[->] (analog.south) -- (adc.north);
|
|
\draw[->] (adc.south) -- (micro.north);
|
|
\draw[->] (micro.south) -- (isol.north);
|
|
\draw[->] (isol.south) -- (usb.north);
|
|
|
|
\draw[dashed] (isol.west) -- (isol-left.east);
|
|
\draw[dashed] (isol.east) -- (isol-right.west);
|
|
\end{tikzpicture}
|
|
\end{center}
|
|
\caption{Frequency sensor hardware diagram.}
|
|
\label{fmeas-sens-diag}
|
|
\end{figure}
|
|
|
|
An overall block diagram of our system is shown in Figure \ref{fmeas-sens-diag}. The microcontroller we chose is an
|
|
\texttt{STM32F030F4P6} ARM Cortex-M0 microcontroller made by ST Microelectronics. The ADC in Figure
|
|
\ref{fmeas-sens-diag} in our design is the integrated 12-bit ADC of this microcontroller, which is sufficient for our
|
|
purposes. The USB interface is a simple USB to serial converter IC (\texttt{CH340G}) and the galvanic digital isolation
|
|
is accomplished with a pair of high-speed optocouplers on its \texttt{RX} and \texttt{TX} lines. The analog signal
|
|
processing is a simple voltage divider using high-power resistors to get the required creepage along with some
|
|
high-frequency filter capacitors and an op-amp buffer. The power supply is an off-the-shelf mains-input power module.
|
|
The system is implemented on a single two-layer PCB that is housed in an off-the-shelf industrial plastic case fitted
|
|
with a printed label and a few status lights on its front.
|
|
|
|
\subsection{Clock accuracy considerations}
|
|
|
|
Our measurement hardware will sample line voltage at some sampling rate $f_S$, e.g.\ \SI{1}{\kilo\hertz}. All downstream
|
|
processing is limited in accuracy by the accuracy of $f_S$\footnote{
|
|
We are not considering the effects of clock jitter. We are highly oversampling the signal and the FFT done in our
|
|
downstream processing will eliminate small jitter effects leaving only frequency stability to worry about. }. We
|
|
generate our sampling clock in hardware by clocking the ADC from one of the microcontroller's timer blocks clocked from
|
|
the microcontroller's system clock. This means our ADC's sampling window will be synchronized cycle-accurate to the
|
|
microcontroller's system clock.
|
|
|
|
Our downstream measurement of mains frequency by nature is relative to our sampling frequency $f_S$. In the setup
|
|
described above this means we have to make sure our system clock is fairly stable. A frequency deviation of \SI{1}{ppm}
|
|
in our system clock causes a proportional grid frequency measurement error of $\Delta f = f_\text{nom} \cdot
|
|
10^{-6} = \SI{50}{\micro\hertz}$. In a worst-case where our system is clocked from a particularly bad crystal that exhibits
|
|
\SI{100}{ppm} of instabilities over our measurement period we end up with an error of \SI{5}{\milli\hertz}. This is well
|
|
within our target measurement range, so we need a more stable clock source. Ideally we want to avoid writing our own
|
|
clock conditioning code where we try to change an oscillator's operating frequency to match some reference. Clock
|
|
conditioning algorithms are highly complex and in our case post-processing of measurement data and simply adding an
|
|
offset is simpler and less error-prone.
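
As a worked example of this proportionality, let $\epsilon$ denote the relative error of the system clock (a symbol
introduced here for illustration only). Since the measured frequency scales with the sampling clock, to first order
\begin{align*}
    \Delta f &\approx \epsilon \cdot f_\text{nom}, \\
    \epsilon = \SI{1}{ppm}: \quad \Delta f &\approx 10^{-6} \cdot \SI{50}{\hertz} = \SI{50}{\micro\hertz}, \\
    \epsilon = \SI{100}{ppm}: \quad \Delta f &\approx 10^{-4} \cdot \SI{50}{\hertz} = \SI{5}{\milli\hertz}.
\end{align*}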
|
|
|
|
Our solution to these problems is to use a crystal oven\footnote{
|
|
A crystal oven is a crystal oscillator thermally coupled closely to a heater and temperature sensor and enclosed in
|
|
a thermally isolated case. The heater is controlled to hold the crystal oscillator at a near-constant temperature
|
|
    a few tens of degrees above ambient. Any ambient temperature variations will be absorbed by the temperature control.
|
|
This yields a crystal frequency that is almost completely unaffected by ambient temperature variations below the
|
|
oven temperature and whose main remaining instability is aging.
|
|
} as our main system clock source. Crystal ovens are expensive compared to ordinary crystal oscillators. Since any
|
|
crystal oven will be much more accurate than a standard room-temperature crystal we chose to reduce cost by using one
|
|
recycled from old telecommunications equipment.
|
|
|
|
To verify clock accuracy we routed an externally accessible SMA connector to a microcontroller pin that is routed to one
|
|
of the microcontroller's timer inputs. By connecting a GPS 1pps signal to this pin and measuring its period we can
|
|
calculate our system's Allan variance\footnote{
|
|
Allan variance is a measure of frequency stability between two clocks.
|
|
}, thereby measuring both clock stability and clock accuracy.
|
|
We ran a 4 hour test of our frequency sensor that generated the histogram shown in Figure \ref{ocxo_freq_stability}.
|
|
These results show that while we get a systematic error of about \SI{10}{ppm} due to manufacturing tolerances the
|
|
random error at less than \SI{10}{ppb} is smaller than that of a room-temperature crystal oscillator by 3-4 orders of
|
|
magnitude. Since we are interested in grid frequency variations over time but not in the absolute value of grid
|
|
frequency the systematic error is of no consequence to us. The random error at \SI{3.66}{ppb} corresponds to a
|
|
frequency measurement error of about \SI{0.2}{\micro\hertz}, well below what we can achieve at reasonable sampling rates
|
|
and ADC resolution.
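
The step from 1pps period measurements to a stability figure could look roughly like the following Python sketch. It
is not the code used to produce Figure \ref{ocxo_freq_stability}. It assumes that the timer counts system clock cycles
between consecutive 1pps edges, so that each count is a frequency reading with a \SI{1}{\second} gate time, and the
names \texttt{fractional\_frequency} and \texttt{allan\_variance} are hypothetical. Only the simplest non-overlapping
estimator at the basic averaging time is shown.

\begin{minted}{python}
from statistics import mean

F_NOMINAL_HZ = 19_440_000  # nominal OCXO frequency


def fractional_frequency(counts_per_pps, f_nominal=F_NOMINAL_HZ):
    """Convert timer counts captured between consecutive 1pps edges into
    fractional frequency offsets y_k = (f_k - f_nom) / f_nom."""
    return [(count - f_nominal) / f_nominal for count in counts_per_pps]


def allan_variance(y):
    """Non-overlapping Allan variance at the basic averaging time:
    half the mean squared difference of successive frequency samples."""
    if len(y) < 2:
        raise ValueError("need at least two frequency samples")
    squared_diffs = [(b - a) ** 2 for a, b in zip(y, y[1:])]
    return 0.5 * mean(squared_diffs)


def mean_offset(y):
    """Systematic (accuracy) part of the error: the average offset."""
    return mean(y)
\end{minted}

In this picture \texttt{mean\_offset} captures the systematic error while the square root of
\texttt{allan\_variance} captures the random error between successive seconds.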
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\includegraphics{../lab-windows/fig_out/ocxo_freq_stability}
|
|
    \caption{OCXO frequency deviation from nominal \SI{19.440}{\mega\hertz} measured against GPS 1pps.}
|
|
\label{ocxo_freq_stability}
|
|
\end{figure}
|
|
|
|
\subsection{Firmware implementation}
|
|
|
|
The firmware uses one of the microcontroller's timers clocked from an external crystal oscillator to produce an
|
|
\SI{1}{\milli\second} tick that the internal ADC is triggered from for a sample rate of \SI{1}{\kilo sps}. Higher sample
|
|
rates would be possible but reliable data transmission over the opto-isolated serial interface might prove challenging
|
|
and \SI{1}{\kilo sps} corresponds to $20$ samples per cycle at $f_\text{nominal}$. This figure exceeds the Nyquist
criterion by a factor of ten and is plenty for accurate measurements.
|
|
|
|
The ADC measurements are read using DMA and written into a circular buffer. Using some DMA controller features this
|
|
circular buffer is split in back and front halves with one being written to and the other being read at the same time.
|
|
Buffer contents are moved from the ADC DMA buffer into a packet-based reliable UART interface as they come in. The UART
|
|
packet interface keeps two ringbuffers: One byte-based ringbuffer for transmission data and one ringbuffer pointer
|
|
structure that keeps track of ADC data packet boundaries in the byte-based ringbuffer. Every time a chunk of data is
|
|
available from the ADC the data is framed into the byte-based ringbuffer and the packet boundaries are logged in the
|
|
packet pointer ringbuffer. If the UART transmitter is idle at this time a DMA-backed transmission of the oldest packet
|
|
in the packet ringbuffer is triggered at this point. Data is framed using Consistent Overhead Byte Stuffing
|
|
(COBS)\footnote{
|
|
COBS is a framing technique that allows encoding $n \le 254$ bytes of arbitrary data into exactly $n+1$ bytes with no embedded
|
|
$0$-bytes that can then be delimited using $0$-bytes. COBS is simple to implement and allows both one-pass decoding and
|
|
encoding. The encoder either needs to be able to read up to \SI{256}{\byte} ahead or needs a buffer of \SI{256}{\byte}.
|
|
COBS is very robust in that it allows self-synchronization. At any point a receiver can reliably synchronize itself
|
|
against a COBS data stream by waiting for the next $0$-byte. The constant overhead allows precise bandwidth and buffer
|
|
planning and provides constant, good efficiency close to the theoretical maximum.}\cite{cheshire01} along with a
|
|
CRC-32 checksum for error checking. When the host receives a new packet with a valid checksum it returns an
|
|
acknowledgement packet to the sensor. When the sensor receives the acknowledgement, the acknowledged packet is dropped
|
|
from the transmission packet ringbuffer. When the host detects an incorrect checksum it simply stays quiet and waits for
|
|
the sensor to resume with retransmission when the next ADC buffer has been received.
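
The framing described above can be illustrated with a short Python model. This is a sketch under assumptions and not
the sensor firmware: the packet layout (payload followed by a little-endian CRC-32, then COBS encoding and a single
$0$-byte delimiter) and the function names are chosen here for illustration, and the CRC-32 variant is the one
provided by Python's \texttt{zlib} module.

\begin{minted}{python}
import zlib


def cobs_encode(data):
    """COBS-encode data so that the result contains no zero bytes."""
    out = bytearray()
    block = bytearray()
    for byte in data:
        if byte == 0:
            # A zero ends the current block; the length code implies the zero.
            out.append(len(block) + 1)
            out += block
            block.clear()
        else:
            block.append(byte)
            if len(block) == 254:
                # Full block: code 0xFF carries no implied zero.
                out.append(0xFF)
                out += block
                block.clear()
    out.append(len(block) + 1)
    out += block
    return bytes(out)


def cobs_decode(encoded):
    """Invert cobs_encode. Raises ValueError on malformed input."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        code = encoded[i]
        if code == 0:
            raise ValueError("zero byte inside COBS frame")
        block = encoded[i + 1:i + code]
        if len(block) != code - 1:
            raise ValueError("truncated COBS frame")
        out += block
        i += code
        if code != 0xFF and i < len(encoded):
            out.append(0)  # restore the zero implied by the length code
    return bytes(out)


def frame_packet(payload):
    """Append a CRC-32, COBS-encode and terminate with a zero delimiter."""
    crc = zlib.crc32(payload).to_bytes(4, "little")
    return cobs_encode(payload + crc) + b"\x00"


def unframe_packet(frame):
    """Return the payload, or None if the frame is malformed or its CRC is
    wrong (the case in which the host would stay quiet)."""
    try:
        body = cobs_decode(frame.rstrip(b"\x00"))
    except ValueError:
        return None
    if len(body) < 4:
        return None
    payload, crc = body[:-4], body[-4:]
    if zlib.crc32(payload).to_bytes(4, "little") != crc:
        return None
    return payload
\end{minted}

A receiver that loses synchronization only has to discard bytes up to the next $0$-byte and can then pass the
following frame to \texttt{unframe\_packet}.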
|
|
|
|
The serial interface logic presents most of the complexity of the sensor firmware. This complexity is necessary since
|
|
we need reliable, error-checked transmission to the host. Though rare, bit errors on a serial interface do happen and
|
|
data corruption is unacceptable. The packet-layer queueing on the sensor is necessary since the host is not a realtime
|
|
system and unpredictable latency spikes of several hundred milliseconds are possible.
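
The role of the packet-layer queue can be modelled in a few lines of Python. This is a toy model for illustration only.
The sequence numbers, the fixed capacity and the policy of discarding the oldest packet on overflow are assumptions
made here and are not taken from the firmware.

\begin{minted}{python}
from collections import deque


class PacketQueue:
    """Toy model of a sensor-side retransmission queue."""

    def __init__(self, capacity=8):
        self._packets = deque()
        self._capacity = capacity
        self._next_seq = 0
        self.dropped = 0

    def push(self, payload):
        """Queue a new chunk of ADC data. If the host has not acknowledged
        packets fast enough, the oldest packet is discarded (assumption)."""
        if len(self._packets) == self._capacity:
            self._packets.popleft()
            self.dropped += 1
        self._packets.append((self._next_seq, payload))
        self._next_seq += 1

    def next_to_send(self):
        """Return the oldest unacknowledged packet, or None if idle."""
        return self._packets[0] if self._packets else None

    def acknowledge(self, seq):
        """Drop the oldest packet once the host has acknowledged it.
        Acknowledgements for any other packet are ignored."""
        if self._packets and self._packets[0][0] == seq:
            self._packets.popleft()
\end{minted}

As long as host latency spikes are shorter than the time it takes to fill the queue, no samples are lost. Longer
stalls show up as an increasing \texttt{dropped} count.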
|
|
|
|
The host in our recording setup is a Raspberry Pi 3 model B running a Python script. The Python script handles serial
|
|
communication and logs data and errors into an SQLite database file. SQLite has been chosen for its simple yet flexible
|
|
interface and its good tolerance of system resets due to unexpected power loss. Overall our setup performed adequately
|
|
with IO contention on the Raspberry Pi/Linux side causing only 16 skipped sample packets over a 68-hour recording span.
|
|
|
|
\subsection{Frequency sensor measurement results}
|
|
|
|
Captured raw waveform data has been processed in the Jupyter Lab environment\cite{kluyver01} and grid frequency
|
|
estimates are extracted as described in Section \ref{frequency_estimation} using the Gasior and Gonzalez\cite{gasior01}
|
|
technique. Appendix \ref{grid_freq_estimation_notebook} contains the Jupyter notebook we used for frequency
|
|
measurement. In Figure \ref{freq_meas_feedback} we fed back to the frequency estimator its own output giving us an
|
|
indication of its numerical performance. The result was \SI{1.3}{\milli\hertz} of RMS noise over a \SI{3600}{\second}
|
|
simulation time. This indicates performance is good enough for our purposes. In addition to this we validated our
|
|
algorithm's performance by applying it to the test waveforms from \cite{wright01}. In this test we got errors of
|
|
\SI{4.4}{\milli\hertz} for the \emph{noise} test waveform, \SI{0.027}{\milli\hertz} for the \emph{interharmonics} test
|
|
waveform and \SI{46}{\milli\hertz} for the \emph{amplitude and phase step} test waveform. Full results can be found in
|
|
Figure \ref{freq_meas_rocof_reference}.
|
|
|
|
Figures \ref{freq_meas_trace} and \ref{freq_meas_trace_mag} show our measurement results over a 24-hour and a 2-hour
|
|
window respectively.
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\includegraphics[width=\textwidth]{../lab-windows/fig_out/freq_meas_feedback}
|
|
\caption{
|
|
The frequency estimation algorithm applied to a synthetic noise-less mains waveform generated from its own
|
|
output. This feedback simulation gives an indication of numerical errors in our estimation algorithm. The top
|
|
four graphs show a comparison of the original trace (blue) and the re-calculated trace (orange). The bottom
|
|
trace shows the difference between the two. As we can tell both traces agree very well with an overall RMS
|
|
deviation of about \SI{1.3}{\milli\hertz}. The bottom trace shows deviation growing over time. This is very
|
|
likely an effect of numerical errors in our ad-hoc waveform generator.
|
|
}
|
|
\label{freq_meas_feedback}
|
|
\end{figure}
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\includegraphics[width=\textwidth]{../lab-windows/fig_out/freq_meas_rocof_reference}
|
|
\caption{
|
|
Performance of our frequency estimation algorithm against the test suite specified in \cite{wright01}. Shown are
|
|
standard deviation and variance measurements as well as time-domain traces of differences.
|
|
}
|
|
\label{freq_meas_rocof_reference}
|
|
\end{figure}
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\includegraphics{../lab-windows/fig_out/freq_meas_trace_24h}
|
|
    \caption{Trace of grid frequency over a 24 hour window. One clearly visible feature is the large positive and negative
        transients at full hours. Times shown are UTC. Note that the European continental synchronous area that this
|
|
sensor is placed in covers several time zones which may result in images of daily load peaks appearing in 1 hour
|
|
intervals. Figure \ref{freq_meas_trace_mag} contains two magnified intervals from this plot.}
|
|
\label{freq_meas_trace}
|
|
\end{figure}
|
|
|
|
\begin{figure}
|
|
\begin{subfigure}{\textwidth}
|
|
\centering
|
|
\includegraphics{../lab-windows/fig_out/freq_meas_trace_2h_1}
|
|
\caption{A 2 hour window around 00:00 UTC.}
|
|
\end{subfigure}
|
|
\begin{subfigure}{\textwidth}
|
|
\centering
|
|
\includegraphics{../lab-windows/fig_out/freq_meas_trace_2h_2}
|
|
\caption{A 2 hour window around 18:30 UTC.}
|
|
\end{subfigure}
|
|
\caption{Two magnified 2 hour windows of the trace from Figure \ref{freq_meas_trace}.}
|
|
\label{freq_meas_trace_mag}
|
|
\end{figure}
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\includegraphics{../lab-windows/fig_out/mains_voltage_spectrum}
|
|
\caption{Power spectral density of the mains voltage trace in Figure \ref{freq_meas_trace}. Data was captured using
|
|
        our frequency measurement sensor (Section \ref{sec-fsensor}) and transformed using an FFT after applying a Blackman window. Vertical lines
|
|
indicate \SI{50}{\hertz} and odd harmonics. We can see the expected peak at \SI{50}{\hertz} along with smaller
|
|
peaks at odd harmonics. We can also see a number of spurious tones both between harmonics and at low frequencies, as
|
|
well as some bands containing high noise energy around \SI{0.1}{\hertz}. This graph demonstrates a high
|
|
signal-to-noise ratio that is not very demanding on our frequency estimation algorithm.
|
|
}
|
|
\label{mains_voltage_spectrum}
|
|
\end{figure}
|
|
|
|
\section{Channel simulation and parameter validation}
|
|
\label{sec-ch-sim}
|
|
|
|
To validate all layers of our communication stack from modulation scheme to cryptography we built a prototype
|
|
implementation in Python. Implementing all components in a high-level language builds up familiarity with the concepts
while taking away much of the implementation complexity. For our demonstrator we will not be able to use Python since
our target platform is a cheap low-end microcontroller. Our demonstrator firmware will have to be written in a low-level
language such as C or Rust. For prototyping these languages lack flexibility compared to Python.
|
|
|
|
To validate our modulation scheme we first performed a series of simulations on our Python demodulator prototype
|
|
implementation. To simulate a modulated grid frequency signal we added noise to a synthetic modulation signal. For most
|
|
simulations we used measured frequency data gathered with our frequency sensor. We only have a limited amount of capture
|
|
data. Re-using segments of this data as background noise in multiple simulation runs could hypothetically lead to our
|
|
simulation results depending on individual features of this particular capture that would be common between all runs. To
|
|
estimate the impact of this problem we re-ran some of our simulations with artificial random noise synthesized with a
|
|
power spectral density matching that of our capture. To do this, we first measured our capture's PSD, then fitted a
|
|
low-resolution spline to the PSD curve in log-log coördinates. We then generated white noise, multiplied the resampled
|
|
spline with the DFT of the synthetic noise and performed an iDFT on the result. The resulting time-domain signal is our
|
|
synthetic grid frequency data. Figure \ref{freq_meas_spectrum} shows the PSD of our measured grid frequency signal. The
|
|
red line indicates the low-resolution log-log spline interpolation used for shaping our artificial noise. Figure
|
|
\ref{simulated_noise_spectrum} shows the PSD of our simulated signal overlaid with the same spline as a red line and
|
|
shows time-domain traces of both simulated (blue) and reference signals (orange) at various time scales. Visually both
|
|
signals look very similar, suggesting we have found a good synthetic approximation of our measurements.
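
The noise synthesis procedure can be sketched in Python with NumPy as follows. This is a minimal sketch and not our
notebook code. Piecewise-linear interpolation in log-log coordinates stands in for the low-resolution spline, the
square root of the PSD is used as the amplitude envelope (an assumption about how the spline is applied), absolute
scaling is left out, and the name \texttt{synthesize\_noise} is hypothetical.

\begin{minted}{python}
import numpy as np


def synthesize_noise(freqs_hz, psd, n_samples, fs_hz, rng=None):
    """Generate n_samples of noise whose spectrum follows the shape of the
    PSD curve given by the points (freqs_hz, psd).

    freqs_hz must be positive and increasing. Outside its range the PSD is
    held constant at the nearest end point."""
    rng = np.random.default_rng() if rng is None else rng

    # Step 1: white Gaussian noise and its DFT.
    white = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(white)
    bin_freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs_hz)

    # Step 2: resample the PSD model onto the DFT bins in log-log
    # coordinates. The DC bin is left at zero, giving a zero-mean signal.
    envelope = np.zeros_like(bin_freqs)
    nonzero = bin_freqs > 0
    log_psd = np.interp(np.log10(bin_freqs[nonzero]),
                        np.log10(freqs_hz), np.log10(psd))
    # Amplitude scales with the square root of power (assumption).
    envelope[nonzero] = np.sqrt(10.0 ** log_psd)

    # Step 3: shape the spectrum and transform back to the time domain.
    return np.fft.irfft(spectrum * envelope, n=n_samples)
\end{minted}

Passing a seeded generator such as \texttt{np.random.default\_rng(0)} makes individual simulation runs reproducible.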
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\includegraphics[width=\textwidth]{../lab-windows/fig_out/freq_meas_spectrum}
|
|
\caption{Power spectral density of the 24 hour grid frequency trace in Figure \ref{freq_meas_trace} with some notable
|
|
peaks annotated with the corresponding period in seconds. The $\frac{1}{f}$ line indicates a pink noise spectrum.
|
|
Around a period of \SI{20}{\second} the PSD starts to fall off at about $\frac{1}{f^3}$ until we can make out some
|
|
        bumps at periods around $2$ and \SI{3}{\second}. Starting at around \SI{1}{\hertz} we can see a white noise floor in
|
|
the order of \si{\micro\hertz^2\per\hertz}.
|
|
% TODO: where does this noise floor come from? Is it a fundamental property of the grid? Is it due to limitations of
|
|
% our measurement setup (such as ocxo stability/phase noise) ???
|
|
}
|
|
\label{freq_meas_spectrum}
|
|
\end{figure}
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\includegraphics[width=\textwidth]{../lab-windows/fig_out/simulated_noise_spectrum}
|
|
\caption{Synthetic grid frequency in comparison with measured data. The topmost graph shows the synthetic spectrum
|
|
compared to the spline approximation of the measured spectrum (red line). The other graphs show time-domain
|
|
synthetic data (blue) in comparison with measured data (orange).
|
|
}
|
|
\label{simulated_noise_spectrum}
|
|
\end{figure}
|
|
|
|
In our simulations, we manipulated four main variables of our modulation scheme and demodulation algorithm and observed
|
|
their impact on symbol error rate (SER):
|
|
|
|
\begin{description}
|
|
\item[Modulation amplitude.] Higher amplitude should correspond to a lower SER.
|
|
\item[Modulation bit count.] Higher bit count $n$ means longer transmissions but yields higher theoretical decoding
|
|
gain, and should increase demodulator sensitivity. Ultimately, we want to find a sweet spot of manageable
|
|
transmission length at good demodulator sensitivity.
|
|
    \item[Decimation or DSSS chip duration.] The chip time determines where in the grid frequency spectrum (Figure
        \ref{freq_meas_spectrum}) our modulated signal is located. Given our noise spectrum,
        lower chip durations (shifting our signal upwards in the spectrum) should yield lower
|
|
in-band background noise which should correspond to lower symbol error rates.
|
|
    \item[Demodulation correlator peak threshold factor.] The first step of our prototype demodulation algorithm is to
        calculate the correlation between the input data and each of the $2^n+1$ Gold sequences
        and to identify peaks corresponding to the input data containing a correctly aligned Gold sequence. The
        threshold factor determines how far a peak must stand out from baseline noise levels to be considered in the
        following maximum likelihood estimation (MLE) decoding (cf.\ Figure \ref{fig_demo_sig_schema}).
|
|
\end{description}
|
|
|
|
Our results indicate that symbol error rate is a good proxy of demodulation performance. With decreasing signal-to-noise
|
|
ratio, margins in various parts of the demodulator decrease which statistically leads to an increased symbol error rate.
|
|
Our simulations yield smooth, reproducible SER curves with adequately low error bounds. This shows SER is related
|
|
monotonically to the signal-to-noise margins inside our demodulator prototype.
|
|
|
|
\subsection{Sensitivity as a function of sequence length}
|
|
|
|
A basic parameter of our DSSS modulation is the length of the Gold codes used. The length of a Gold code is exponential
|
|
in the code's bit count. Figure \ref{dsss_gold_nbits_overview} shows a plot of the symbol error rate of our demodulator
|
|
prototype depending on amplitude for each of five, six, seven and eight-bit Gold sequences. In regions where symbol
error rate is between $0$ and $1$ we can see the expected dependency that an $(n+1)$-bit Gold sequence at roughly twice
|
|
the length yields roughly one half the SER. We can also observe a saturation effect: At low amplitudes, increasing the
|
|
correlation length does not seem to yield much of a benefit in SER anymore. In particular there seems to be a level of
|
|
about \SI{2.5}{\milli\hertz} signal amplitude where even with asymptotically infinite sequence length our demodulator
|
|
would still not be able to produce a good demodulation. This is likely due to numerical errors in our demodulator. Since
|
|
Gold codes of more than 7 bit would yield unacceptably long transmission times this does not pose a problem in practice.
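
To make the relation between bit count, code length and code count concrete, the following Python sketch generates
the set of $2^n+1$ Gold sequences of length $2^n-1$ for $n=5$. It is an illustration and not our prototype code. The
feedback taps correspond to the polynomials $x^5+x^2+1$ and $x^5+x^4+x^3+x^2+1$, a commonly tabulated preferred pair
for $n=5$, and we do not claim that our prototype uses this particular pair. The function names are hypothetical.

\begin{minted}{python}
def lfsr_sequence(taps, nbits):
    """One period (2**nbits - 1 chips) of the binary sequence defined by
    the recurrence s[k + nbits] = XOR of s[k + t] for all t in taps."""
    seq = [1] + [0] * (nbits - 1)  # any non-zero initial state works
    for k in range(2 ** nbits - 1 - nbits):
        bit = 0
        for t in taps:
            bit ^= seq[k + t]
        seq.append(bit)
    return seq


def gold_codes(nbits=5, taps_a=(0, 2), taps_b=(0, 2, 3, 4)):
    """Return all 2**nbits + 1 Gold sequences built from two m-sequences.

    The default taps are only valid for nbits=5. Other lengths need their
    own preferred pair of polynomials."""
    a = lfsr_sequence(taps_a, nbits)
    b = lfsr_sequence(taps_b, nbits)
    n = len(a)
    codes = [a, b]
    for shift in range(n):
        # XOR of the first sequence with a cyclic shift of the second.
        codes.append([a[i] ^ b[(i + shift) % n] for i in range(n)])
    return codes
\end{minted}

For $n=5$ this yields $33$ sequences of $31$ chips each. Every additional bit roughly doubles both numbers.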
|
|
|
|
Figure \ref{dsss_gold_nbits_sensitivity} for each bit count shows the minimum signal amplitude where our demodulator
|
|
crossed below $\text{SER}=0.5$. If we have sufficient transmitter power to allocate selecting either a 5 bit or a 6 bit
|
|
Gold code looks to yield good enough performance at manageable data rates.
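
One way of extracting such a crossing amplitude from a simulated SER curve is sketched below in Python. Linear
interpolation between neighbouring simulation points is an assumption made for this illustration, and the name
\texttt{amplitude\_at\_ser} is hypothetical.

\begin{minted}{python}
def amplitude_at_ser(amplitudes, sers, target=0.5):
    """Return the smallest amplitude at which the SER curve crosses below
    target, linearly interpolated between the two neighbouring points.
    Returns None if the curve never reaches the target."""
    points = sorted(zip(amplitudes, sers))
    if not points:
        return None
    if points[0][1] <= target:
        # Already below the target at the smallest simulated amplitude.
        return points[0][0]
    for (a0, s0), (a1, s1) in zip(points, points[1:]):
        if s0 > target >= s1:
            return a0 + (s0 - target) * (a1 - a0) / (s0 - s1)
    return None
\end{minted}

A return value of \texttt{None} corresponds to parameter sets for which $\text{SER}=0.5$ is not reached anywhere in
the simulated amplitude range.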
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\includegraphics[width=0.6\textwidth]{../lab-windows/fig_out/dsss_gold_nbits_overview}
|
|
\caption{
|
|
Symbol Error Rate (SER) as a function of transmission amplitude. The line represents the mean of several
|
|
measurements for each parameter set. The shaded areas indicate one standard deviation from the mean. Background
|
|
noise for each trial is a random segment of measured grid frequency. Background noise amplitude is the same for
|
|
        all trials. Shown are four traces for four different DSSS sequence lengths. Using a 5-bit Gold code, one DSSS
        symbol measures 31 chips. 6 bit per symbol are 63 chips, 7 bit are 127 chips and 8 bit 255 chips. This
        simulation uses a decimation of 10, which corresponds to a $1 \text{s}$ chip length at our $10 \text{Hz}$ grid
|
|
frequency sampling rate. At 5 bit per symbol, one symbol takes $31 \text{s}$ and one bit takes $6.2 \text{s}$
|
|
amortized. At 8 bit one symbol takes $255 \text{s} = 4 \text{min} 15 \text{s}$ and one bit takes $31.9 \text{s}$
|
|
amortized. Here, slower transmission speed buys coding gain. All else being the same this allows for a decrease
|
|
in transmission power.
|
|
}
|
|
\label{dsss_gold_nbits_overview}
|
|
\end{figure}
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\begin{minipage}[c]{0.5\textwidth}
|
|
\includegraphics{../lab-windows/fig_out/dsss_gold_nbits_sensitivity}
|
|
\end{minipage}
|
|
\begin{minipage}[c]{0.45\textwidth}
|
|
\caption{
|
|
Amplitude at a SER of 0.5\ in mHz depending on symbol length. Here we can observe an increase of sensitivity
|
|
with increasing symbol length, but we can clearly see diminishing returns above 6 bit (63 chips). Considering
|
|
that each bit roughly doubles overall transmission time for a given data length it seems lower bit counts are
|
|
            preferable if the necessary transmitter power can be realized.
|
|
}
|
|
\label{dsss_gold_nbits_sensitivity}
|
|
\end{minipage}
|
|
\end{figure}
|
|
|
|
\subsection{Sensitivity versus peak detection threshold factor}
|
|
|
|
One of the high-level parameters of our demodulation algorithm is the \emph{threshold factor}. This parameter is
|
|
an implementation detail specific to our algorithm and not general to all possible DSSS demodulation algorithms. After
|
|
correlating the input signal against the template Gold sequences our algorithm runs a single-channel discrete wavelet
|
|
transform (DWT) on the correlator output to better discriminate peaks from background noise. The output of this DWT is
|
|
then normalized against a running average and then fed into a simple threshold detector. The threshold of this detector
|
|
is our threshold factor. This threshold is the ratio that a correlation peak after DWT has to stand out from long-term
|
|
average background noise to be considered a peak.
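
A simplified version of this detection chain is sketched below in Python with NumPy. It is not our prototype code.
The DWT stage is left out entirely, the running average is a plain moving average, and the function name, the default
window length and the default threshold are assumptions chosen for illustration.

\begin{minted}{python}
import numpy as np


def detect_peaks(signal, template, threshold_factor=4.5, avg_window=256):
    """Correlate signal against a +/-1 chip template and return the sample
    offsets at which the correlation magnitude exceeds threshold_factor
    times its running average.

    Simplification: the wavelet stage described in the text is omitted."""
    corr = np.abs(np.correlate(signal, template, mode="valid"))

    # Running average of the correlator output as the noise baseline. The
    # ends are zero-padded, so the baseline is underestimated there.
    kernel = np.ones(avg_window) / avg_window
    baseline = np.convolve(corr, kernel, mode="same")

    normalized = corr / np.maximum(baseline, 1e-12)
    return np.flatnonzero(normalized > threshold_factor)
\end{minted}

Lowering \texttt{threshold\_factor} in this sketch quickly increases the number of spurious detections, while raising
it eventually suppresses the true correlation peak as well.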
|
|
|
|
The threshold factor is an empirically determined parameter. Low threshold factors yield many false positives that in the
extreme ultimately overload our MLE estimator's capacity to discard them. Moderate numbers of false positives do not pose
|
|
much of a challenge to our MLE since these spurious peaks have a random time distribution and are easily discarded by
|
|
our MLE's symbol chain detection. High threshold factors lead the algorithm to completely ignore some valid peaks. To
|
|
some degree this can be compensated by our later interpolation step for missing peaks but in the extreme will also break
|
|
demodulation. In our simulations good values lie in the range from $4.0$ to $5.5$.
|
|
|
|
Figure \ref{dsss_thf_amplitude_5678} contains plots of demodulator sensitivity like the one in Figure
|
|
\ref{dsss_gold_nbits_overview}. This time there is one color-coded trace for each threshold factor between $1.5$ and
|
|
$10.0$ in steps of $0.5$. We can see a clear dependency of demodulation performance on threshold factor with both very
|
|
low and very high values breaking the demodulator. The ``runaway'' traces that we can see at low threshold factors are
|
|
artifacts of an implementation issue with our prototype code. We later fixed this issue in the demonstrator firmware
|
|
implementation in Section \ref{sec-demo-fw-impl}. For comparison purposes this issue does not matter.
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\includegraphics{../lab-windows/fig_out/dsss_thf_amplitude_5678}
|
|
\caption{
|
|
        SER vs.\ amplitude graph similar to Figure \ref{dsss_gold_nbits_overview} with color-coded traces for
|
|
threshold factors between $1.5$ and $10.0$. Each graph shows traces for a single DSSS symbol length.
|
|
}
|
|
\label{dsss_thf_amplitude_5678}
|
|
\end{figure}
|
|
|
|
If we again look at the intercept points where the amplitude traces cross $\text{SER}=0.5$ in these graphs we get the
|
|
plots in Figure \ref{dsss_thf_sensitivity_all_bits}. From this we can conclude that the range between $4.0$ and $5.0$ will
|
|
yield adequate threshold factors for our use case.
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\includegraphics{../lab-windows/fig_out/dsss_thf_sensitivity_5678}
|
|
\caption{
|
|
        Graphs of amplitude at $\text{SER}=0.5$ for each symbol length as well as asymptotic SER for large amplitudes. Areas
        shaded red indicate that $\text{SER}=0.5$ was not reached for any amplitude in the simulated range. The bumps in the 7
|
|
bit and 8 bit graphs are due to the convergence problem we identified above and do not exist in our demonstrator
|
|
implementation. We see that smaller symbol lengths favor lower threshold factors, and that optimal threshold
|
|
factors for all symbol lengths are between $4.0$ and $5.0$.
|
|
}
|
|
\label{dsss_thf_sensitivity_all_bits}
|
|
\end{figure}
|
|
|
|
\subsection{Chip duration and bandwidth}
|
|
|
|
A parameter of any DSSS system is the frequency band used for transmission. Instead of specifying absolute frequencies
|
|
in our simulations we expressed DSSS bandwidth through chip duration and Gold sequence length. In our prototype, chip
|
|
duration is specified in grid frequency sampling periods to ease implementation without loss of generality.
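
The relation between these quantities can be written out as follows. The symbols $d$ for the decimation factor,
$f_\text{s,grid}$ for the grid frequency sampling rate and $T$ for the durations are introduced here for illustration.
The numbers are those of the 5 bit case in Figure \ref{dsss_gold_nbits_overview}:
\begin{align*}
    T_\text{chip} &= \frac{d}{f_\text{s,grid}} = \frac{10}{\SI{10}{\hertz}} = \SI{1}{\second}, \\
    T_\text{symbol} &= \left(2^n - 1\right) \cdot T_\text{chip} = 31 \cdot \SI{1}{\second} = \SI{31}{\second}, \\
    T_\text{bit} &= \frac{T_\text{symbol}}{n} = \frac{\SI{31}{\second}}{5} = \SI{6.2}{\second}.
\end{align*}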
|
|
|
|
Figure \ref{chip_duration_sensitivity} shows the dependence of symbol error rate at a fixed good threshold factor on
|
|
chip duration. The color bars indicate both chip duration translated to seconds real-time and the resulting symbol
|
|
duration at the given Gold code length. In the lower graphs we show the trace of amplitude at $\text{SER}=0.5$ over chip
duration like we did in Figure \ref{dsss_thf_sensitivity_all_bits} for threshold factor. In both graphs we can just about
|
|
see an optimum for very short chips with a decrease of sensitivity for long chips. This effect is due to longer chips
|
|
moving the signal band into noisier spectral regions (cf.\ Figure \ref{freq_meas_spectrum}).
|
|
|
|
\begin{figure}
|
|
\begin{subfigure}{\textwidth}
|
|
\centering
|
|
\includegraphics[width=\textwidth]{../lab-windows/fig_out/chip_duration_sensitivity_5}
|
|
        \caption{
            5 bit Gold code.
        }
        \label{chip_duration_sensitivity_5}
|
|
\end{subfigure}
|
|
\end{figure}
|
|
\begin{figure}
|
|
\ContinuedFloat
|
|
\begin{subfigure}{\textwidth}
|
|
\centering
|
|
\includegraphics[width=\textwidth]{../lab-windows/fig_out/chip_duration_sensitivity_6}
|
|
        \caption{
            6 bit Gold code.
        }
        \label{chip_duration_sensitivity_6}
|
|
\end{subfigure}
|
|
\caption{
|
|
Dependence of demodulator sensitivity on DSSS chip duration. Due to computational constraints this simulation is
|
|
limited to 5 bit and 6 bit DSSS sequences. There is a clearly visible sensitivity maximum at fairly short chip
|
|
lengths around $0.2 \text{s}$. Short chip durations shift the entire transmission band up in frequency. In
|
|
Figure \ref{freq_meas_spectrum} we can see that noise energy is mostly concentrated at lower frequencies, so
|
|
shifting our signal up in frequency will reduce the amount of noise the decoder sees behind the correlator by
|
|
shifting the band of interest into a lower-noise spectral region. For a practical implementation chip duration
|
|
is limited by physical factors such as the maximum modulation slew rate ($\frac{\text{d}P}{\text{d}t}$) and the
|
|
maximum Rate-Of-Change-Of-Frequency (ROCOF, $\frac{\text{d}f}{\text{d}t}$) the grid can tolerate.
|
|
}
|
|
\label{chip_duration_sensitivity}
|
|
\end{figure}
|
|
|
|
In the previous graphs we have used random clips of measured grid frequency noise as noise in our simulations. Comparing
|
|
between a simulation using measured noise and synthetic noise generated as we outlined in the beginning of Section
|
|
\ref{sec-ch-sim} we get the plots in Figure \ref{chip_duration_sensitivity_cmp}. We can see that while not perfect our
|
|
simulated noise is an adequate approximation of reality: Our prototype demodulator shows no significant difference in
|
|
behavior between measured and simulated noise. Simulated noise causes slightly worse performance for long chips. Overall
|
|
the results for both are very close in absolute value.
|
|
|
|
\begin{figure}
|
|
\begin{subfigure}{\textwidth}
|
|
\centering
|
|
\includegraphics[width=\textwidth]{../lab-windows/fig_out/chip_duration_sensitivity_cmp_meas_6}
|
|
        \caption{
            Simulation using baseline frequency data from actual measurements.
        }
        \label{chip_duration_sensitivity_cmp_meas_6}
|
|
\end{subfigure}
|
|
\end{figure}
|
|
\begin{figure}
|
|
\ContinuedFloat
|
|
\begin{subfigure}{\textwidth}
|
|
\centering
|
|
\includegraphics[width=\textwidth]{../lab-windows/fig_out/chip_duration_sensitivity_cmp_synth_6}
|
|
        \caption{
            Simulation using synthetic frequency data.
        }
        \label{chip_duration_sensitivity_cmp_synth_6}
|
|
\end{subfigure}
|
|
\caption{
|
|
Chip duration/sensitivity simulation results like in Figure \ref{chip_duration_sensitivity} compared between a
|
|
simulation using measured frequency data like previous graphs and one using artificially generated noise. There
|
|
is little visible difference indicating that we have found a good model of reality in our noise synthesizer, but
|
|
        also that real grid frequency behaves like a frequency-shaped Gaussian noise process.
|
|
}
|
|
\label{chip_duration_sensitivity_cmp}
|
|
\end{figure}
|
|
|
|
\section{Implementation of a demonstrator unit}
|
|
\label{sec-prototype}
|
|
|
|
To demonstrate the viability of our reset architecture we decided to implement a demonstrator system. In this
|
|
demonstrator we use JTAG to reset part of a commodity smart meter from an externally-connected reset controller. The
|
|
reset controller receives its commands over the grid frequency modulation system we outlined in this thesis. To keep
|
|
implementation cost low the reset controller is fed a simulation of a modulated grid frequency signal through a standard
|
|
\SI{3.5}{\milli\meter} audio jack\footnote{
|
|
By generously cutting two PCB traces the meter we chose to use can be easily modified to provide strong galvanic
|
|
separation between grid and main application microcontroller. With this modification we have to supply power to its
|
|
main application MCU externally along with the JTAG interface.
|
|
}. Measurement of actual grid frequency instead would simply require a voltage divider and depending on the setup an
|
|
analog optoisolator.
|
|
|
|
\subsection{Selecting a smart meter for demonstration purposes}
|
|
\label{sec-easymeter}
|
|
|
|
For our demonstrator to make sense we wanted to select a realistic reset target. In Germany where this thesis was
|
|
written a standards-compliant setup would consist of a fairly dumb smart meter and a smart meter gateway (SMGW)
|
|
containing all of the complex bidirectional protocol logic such as wireless or landline IP connectivity. The realistic
|
|
target for a setup in this architecture would be the components of an SMGW such as its communications modem or main
|
|
application processor. In the German architecture the smart meter does not even have to have a bi-directional data link
|
|
to the SMGW effectively mitigating any attack vector for remote compromise.
|
|
|
|
Despite these considerations we still chose to reset the application MCU inside the smart meter for two reasons. One is that
|
|
SMGWs are much harder to come by on the second-hand market. The other is that SMGWs are a particular feature of the
|
|
German standardization landscape and in many other countries functions of an SMGW such as wireless protocol handling are
|
|
integrated into the meter itself (see e.g.\ \cite{honeywell01}).
|
|
|
|
In the end we settled on a Q3DA1002 three-phase \SI{60}{\ampere} meter made by German manufacturer EasyMeter. This meter is typical
|
|
of what would be found in an average German household and can be acquired very inexpensively as new old stock on online
|
|
marketplaces.
|
|
|
|
The meter consists of a plastic enclosure with a transparent polycarbonate top part and a grey ABS bottom part that are
ultrasonically welded shut. In the bottom part of the case a PCB we call the \emph{measurement} board is potted in
epoxy resin (see Figure \ref{easymeter_composites}). This PCB contains three separate energy measurement ASICs for the
three phases (see Figure \ref{easymeter_detail_xrays}). It also contains a capacitive dropper power supply for the meter
circuitry and external modules such as an SMGW. Through three infrared links (one per phase) the measurement board
communicates with a smaller unpotted PCB we call the \emph{display} board in the top of the case. This PCB handles
measurement logging and aggregation, controls a small segment LCD displaying totals and handles the externally
accessible \si{\kilo\watt\hour} impulse LED and serial IR links.
|
|
|
|
The measurement board does not contain any logging or outside communication interfaces. All of that is handled on the
|
|
display board by a Texas Instruments MSP430F2350 application MCU. This is a 16-bit RISC MCU with \SI{16}{\kilo\byte}
|
|
flash and \SI{2}{\kilo\byte} SRAM\footnote{
|
|
The microcontroller might seem a bit overkill for such a simple application, but most of its \SI{16}{\kilo\byte}
|
|
program flash is in fact used. A casual glance with Ghidra shows that a large part of program flash is expended on
|
|
keeping multiple redundant copies of energy consumption aggregates including error recovery in case of data
|
|
corruption and some effort has even been made to guard against data corruption using simple non-cryptographic
|
|
checksums. Another large part of the MCU's firmware handles data transmission over the meter's externally accessible
|
|
IR link through Smart Message Language\cite{bsi-tr-03109-1-IVb}.
|
|
}. There is an I2C EEPROM that is used in conjunction with the microcontroller's internal \SI{256}{\byte} data flash to
|
|
keep redundant copies of energy consumption aggregates. On the side of the base board is a 14-pin header containing both
|
|
a standard TI MSP430 JTAG pinout and a UART serial link for debugging. Conveniently the JTAG port was left enabled by
|
|
fuse in our particular production unit.
|
|
|
|
We chose to use this MSP430 series application MCU as our reset target. Though in this particular unit remote
compromise is impossible due to a lack of bidirectional communication links, some of its sister models do contain such
links\cite{easymeter01}, making compromise through communication interfaces at least a theoretical possibility. In
other countries meters with a similar architecture to the Q3DA1002 commonly include complex protocol logic as part of
the meter itself\cite{honeywell01,ifixit01}. As an example, the Honeywell REX2 uses a Maxim Integrated 71M6541 main
application microcontroller along with a Texas Instruments CC1000 series radio transceiver and is advertised to support
both over-the-air firmware upgrades and a remotely accessible ``service control switch''.
|
|
|
|
% TODO add pics of the intact easymeter and of the one with the safety reset0r hooked up
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\begin{subfigure}{\textwidth}
|
|
\centering
|
|
\includegraphics[width=0.6\textwidth]{resources/easymeter_board_composite.jpg}
|
|
\caption{
\footnotesize
Optical composite image of the display and data logging board in the top of the case. The six pins at the
top connect to the SPI chip-on-glass segment LCD. Of the eight pads on the left, six are unused and two carry the
auxiliary power supply from the measurement board below. The bottom right section contains the
\si{\kilo\watt\hour} impulse LED and the angled IR communication LED. The flying wires
connect to the 14-pin JTAG and serial debug header.
}
\label{easymeter_display_board_composite}
|
|
\end{subfigure}
|
|
\begin{subfigure}{\textwidth}
|
|
\vspace{1cm}
|
|
\centering
|
|
\includegraphics[width=0.8\textwidth]{resources/easymeter_baseboard_composite.jpg}
|
|
\caption{
\footnotesize
Composite microfocus x-ray image of the potted measurement module in the bottom of the case. The ovals on
the top left and right are power supply and data jumper connections for external modules such as SMGW
interfaces. The bright parts at the bottom are the massive screw terminals with integrated current shunts.
The circuitry to the right of the three independent measurement channels is the power supply circuit for the
display board.
}
\label{easymeter_measurement_board_composite}
|
|
\end{subfigure}
|
|
|
|
\caption{
|
|
Composite images of the circuit boards inside the EasyMeter Q3DA1002 ``smart'' electricity meter used in our
|
|
demonstration.
|
|
}
|
|
\label{easymeter_composites}
|
|
\end{figure}
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\begin{subfigure}{0.45\textwidth}
|
|
\centering
|
|
\includegraphics[width=\textwidth]{resources/easymeter_baseboard_channel.jpg}
|
|
\caption{Microfocus x-ray of one channel's data acquisition circuit.}
\label{easymeter_channel_xray}
|
|
\end{subfigure}\hspace*{5mm}
|
|
\begin{subfigure}{0.45\textwidth}
|
|
\centering
|
|
\includegraphics[width=\textwidth]{resources/easymeter_baseboard_powersupply.jpg}
|
|
\caption{Microfocus x-ray of the auxiliary power supply.}
\label{easymeter_powersupply_xray}
|
|
\end{subfigure}
|
|
|
|
\caption{
|
|
Microfocus x-rays of major sections of the EasyMeter Q3DA1002 measurement board.
|
|
}
|
|
\label{easymeter_detail_xrays}
|
|
\end{figure}
|
|
|
|
\subsection{Firmware implementation}
|
|
\label{sec-demo-fw-impl}
|
|
|
|
We based our safety reset demonstrator firmware on the grid frequency sensor firmware we developed in sec.\
\ref{sec-fsensor}. We implemented DSSS demodulation by translating the Python prototype code we developed in sec.\
\ref{sec-ch-sim} to embedded C code. After validating the C translation in extensive simulations we integrated our code
with a Reed-Solomon implementation and a libsodium-based implementation of the cryptographic protocol we designed in
sec.\ \ref{sec-crypto}. To reprogram the target MSP430 microcontroller we ported over the low-level bitbang JTAG driver
of \texttt{mspdebug}\footnote{\url{https://github.com/dlbeer/mspdebug}}. See Figure \ref{fig_demo_sig_schema} for a
schematic overview of signal processing in our demonstrator.
|
|
|
|
For all computation-heavy high-level modules of our firmware such as the DSSS demodulator or the grid frequency
|
|
estimator we wrote test fixtures that allow the same code that runs on the microcontroller to be executed on the host
|
|
for testing. These test fixtures are very simple C programs that load input data from a file or the command line, run
|
|
the algorithm and print results on standard output.
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\includegraphics[width=\textwidth]{resources/prototype_schema}
|
|
\caption{The signal processing chain of our demonstrator.}
|
|
\label{fig_demo_sig_schema}
|
|
\end{figure}
|
|
|
|
\section{Grid frequency modulation emulation}
|
|
|
|
To emulate a modulated grid frequency signal we superimposed a DSSS-modulated signal at the proper amplitude with
synthetic grid frequency noise generated according to the measurements we took in sec.\ \ref{sec-fsensor}. In this
primitive simulation we do not simulate the precise impulse response of the grid to a DSSS-modulated stimulus signal.
Our results still serve to illustrate the possibility of data transmission in this manner since this impulse response
can be compensated for at the transmitter by selecting appropriate modulation parameters (e.g.\ chip rate and
amplitude) and at the receiver by equalization with a matched filter.
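
As a rough sketch of this superposition, and using notation that is ours for illustration rather than taken from the
simulation code, the emulated frequency signal can be written as
\begin{equation*}
    f_\text{emu}(t) = f_\text{nom} + A \cdot c(t) + n(t),
\end{equation*}
where $f_\text{nom}$ is the nominal grid frequency, $c(t) \in \{-1, +1\}$ is the DSSS chip sequence (assumed bipolar
here), $A$ is the modulation amplitude and $n(t)$ is the synthetic grid frequency noise. This form treats the grid's
impulse response $h(t)$ as ideal. A more faithful model would presumably replace $A \cdot c(t)$ with the convolution
$A \cdot (h * c)(t)$.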
|
|
|
|
\section{Experimental results}
|
|
|
|
After extensive simulations and testing of the individual modules of our solution we proceeded to conduct a real-world
|
|
experiment. We ran the demonstrator setup with an emulated noisy DSSS signal in real time. Our experiment went without
|
|
any issues and the firmware implementation correctly reset the demonstrator's meter. We were happy to see that our
|
|
extensive testing paid off: The demonstrator setup worked on its first try.
|
|
% FIXME add pictures of the finished demo setup in action
|
|
% FIXME maybe add an SER curve here?
|
|
|
|
\section{Lessons learned}
|
|
|
|
Before settling on the commercial smart meter we first tried to use an EVM430-F6779 smart meter evaluation kit made by
Texas Instruments. This evaluation kit did not turn out well for two main reasons. First, it shipped with half the case
missing and no cover for the terminal blocks. Because of this some work was required to maintain electrical safety.
Even after mounting it in an electrically safe manner, the main MCU is not isolated from the grid and the JTAG port is
thus galvanically coupled to the grid as well. The safety reset controller prototype would therefore also have to be
galvanically isolated to not pose an electrical safety risk. Second, the EVM430-F6779 is based around an MSP430F6779
microcontroller. This microcontroller is a rather large part within the MSP430 series and uses a particularly new
revision of the CPU core and associated JTAG peripheral that proved incompatible with all MSP430 programmers we tried
to use on it. \texttt{mspdebug} does not support it and porting TI's own JTAG programmer reference sources did not
yield any results either. Finally we tried a USB-based programmer made by TI themselves that turned out to have either
broken firmware or a hardware defect, leading to it frequently re-enumerating on the USB.
|
|
|
|
Overall our initial assumption that a development kit would certainly be easier to program than a commercial meter did
|
|
not prove to be true. Contrary to our expectations the commercial meter had JTAG enabled allowing us to easily read out
|
|
its stock firmware without needing to reverse-engineer vendor firmware update files or circumvent code protection
|
|
measures. The fact that its firmware was only available in its compiled binary form was not much of a hindrance as it
|
|
proved not to be too complex and all we wanted to know could be found out with just a few hours of digging in Ghidra.
|
|
|
|
In the firmware development phase our approach of testing every module individually (e.g.\ DSSS demodulator,
Reed-Solomon decoder, grid frequency estimation) proved to be very useful. In particular debugging benefited greatly
from being able to run a couple of thousand tests within seconds. In the case of our DSSS demodulator this modular
testing and simulation architecture allowed us to simulate many thousands of runs of our implementation on test data
and directly compare it to our Jupyter/Python prototype (see Figure \ref{fw_proto_comparison}). Since we spent more
time polishing our embedded C implementation it turned out to perform much better than our initial Python prototype. At
the same time it shows a fundamentally similar response to its parameters. One significant bug we fixed in the embedded
C version is the Python version's tendency towards incorrect decodings even at very large amplitudes.
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\begin{subfigure}{\textwidth}
|
|
\centering
|
|
\includegraphics[trim={0 4cm 0 0},clip]{../lab-windows/fig_out/dsss_thf_amplitude_56_jupyter_impl}
|
|
\caption{Python prototype.}
|
|
\end{subfigure}
|
|
\begin{subfigure}{\textwidth}
|
|
\centering
|
|
\includegraphics[trim={0 4cm 0 0},clip]{../lab-windows/fig_out/dsss_thf_amplitude_56_fw_impl}
|
|
\caption{Embedded C implementation.}
|
|
\end{subfigure}
|
|
|
|
\caption{
|
|
Symbol error rate plots versus threshold factor for both our Python prototype (above) and our firmware
implementation of our demodulation algorithm (below). Note the slightly different threshold factor color scales. Cf.\
Figure \ref{dsss_thf_amplitude_5678}.
|
|
}
|
|
\label{fw_proto_comparison}
|
|
\end{figure}
|
|
|
|
In accordance with our initial estimations we did not run into any code space or computation bottlenecks from choosing
floating-point emulation instead of porting our algorithms over to fixed-point calculations. The extremely slow
sampling rate of our system makes even heavyweight processing such as FFT or our rather brute-force dynamic programming
approach to DSSS demodulation possible well within performance constraints.
|
|
|
|
Compiled code size of our firmware implementation is slightly larger than we would like at around \SI{64}{\kilo\byte}
|
|
for our firmware image including everything except the target microcontroller firmware image. See appendix
|
|
\ref{symbol_size_chart} for a graph illustrating the contribution of various parts of the signal processing toolchain to
|
|
this total. Overall the most heavy-weight operations by far are the SHA512 implementation from libsodium and the FFT
|
|
from ARM's CMSIS signal processing library.
|
|
|
|
\chapter{Future work}
|
|
|
|
\section{Precise grid characterization}
|
|
|
|
We based our simulations on a linear relationship between generation/consumption power imbalance and grid frequency.
|
|
Our literature study suggests that this is an appropriate first-order approximation\cite{crastan03}. We kept modulation
|
|
bandwidth in our simulations inside a \SIrange{100}{1000}{\milli\hertz} frequency band that we reason is most likely to
exhibit this linear behavior in practice. At lower frequencies primary control kicks in. With the frequency delta
|
|
thresholds specified for primary control systems\cite{entsoe04} this will likely lead to significant non-linear effects.
|
|
At higher frequencies grid frequency estimation at the receiver becomes more complex. Higher frequencies also come
|
|
close to modes of mechanical oscillation in generators (usually at \SI{5}{\hertz} and above\cite{crastan03}).
|
|
|
|
An analysis of the above concerns can be performed using dynamic grid simulation models\cite{semerow01,entsoe05}.
|
|
Presumably out of safety concerns these models are only available under non-disclosure agreements. Integrating
|
|
NDA-encumbered results stemming from such a model in an open-source publication such as this one poses a logistical
|
|
challenge which is why we decided to leave this topic for a separate future work. After detailed model simulation we
|
|
ultimately aim to validate our results experimentally. Assuming linear grid behavior even under very small disturbances
|
|
a small-scale experiment is an option. Such a small-scale experiment would require very long integration times.
|
|
|
|
Given a frequency characteristic of \SI{30}{\giga\watt\per\hertz} a stimulus of \SI{10}{\kilo\watt} yields $\Delta f =
|
|
\SI{0.33}{\micro\hertz}$. At an estimated \SI{20}{\milli\hertz} of RMS noise over a bandwidth of interest this results
|
|
in an SNR slightly better than \SI{-50}{\decibel}. The correlation time necessary to offset this with DSSS processing
|
|
gain at a chip rate of \SI{1}{\baud} would be in the order of days. With such long correlation times clock stability
|
|
starts to become a problem as during correlation transmitter and receiver must maintain close phase alignment w.r.t.\
|
|
one chip period. A $\leq \SI{10}{\degree}$ phase difference requirement over this period of time would translate into
|
|
clock stability better than \SI{10}{ppm}. Though certainly not impossible to achieve this does pose an engineering
|
|
challenge.
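
The figures above can be retraced with a short back-of-the-envelope calculation. This is only a sketch: it assumes the
linear model $\Delta f = \Delta P / K$ with frequency characteristic $K$, and it assumes the common rule of thumb that
correlating over $N$ chips yields a processing gain of $10 \log_{10} N$ in decibels. The symbols $K$, $\sigma_f$ (RMS
frequency noise) and $N$ are introduced here for illustration only.
\begin{align*}
    \Delta f &= \frac{\Delta P}{K}
              = \frac{\SI{10}{\kilo\watt}}{\SI{30}{\giga\watt\per\hertz}}
              \approx \SI{0.33}{\micro\hertz} \\
    \frac{\Delta f}{\sigma_f} &= \frac{\SI{0.33}{\micro\hertz}}{\SI{20}{\milli\hertz}}
              \approx 1.7 \cdot 10^{-5} \\
    10 \log_{10} \frac{\Delta f}{\sigma_f} &\approx \SI{-48}{\decibel} \\
    N &\gtrsim 10^{4.8} \approx 6 \cdot 10^{4}
\end{align*}
At \SI{1}{\baud} this corresponds to roughly 17 hours of correlation merely to bring the signal level with the noise.
Any margin required for reliable detection extends this further. Note that this estimate takes $10 \log_{10}$ of the
ratio of frequency deviations. If that ratio is instead treated as an amplitude ratio ($20 \log_{10}$), the SNR figure
and the required correlation time become correspondingly less favorable.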
|
|
|
|
A possible way to maintain clock alignment is to use grid frequency itself as a reference. Instead of keying the DSSS
|
|
modulator/demodulator on a local crystal oscillator, chip timings would be described in fractions of a mains voltage
|
|
cycle. This would track grid frequency variations synchronously at both ends and would maintain phase alignment even
|
|
over long periods of time at the cost of a slight increase in system complexity.
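
As an illustration, and assuming a nominal grid frequency of \SI{50}{\hertz} (our assumption for this example), a chip
defined as $N_\text{cyc}$ mains cycles would have a duration of approximately
\begin{equation*}
    T_\text{chip}(t) \approx \frac{N_\text{cyc}}{f(t)}, \qquad
    \text{e.g.} \quad T_\text{chip} = \frac{50}{\SI{50}{\hertz}} = \SI{1}{\second}
    \quad \text{at nominal frequency for } N_\text{cyc} = 50.
\end{equation*}
Since transmitter and receiver both count the same mains cycles, any deviation of $f(t)$ from its nominal value
stretches or compresses the chip period identically at both ends.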
|
|
|
|
\section{Technical standardization}
|
|
|
|
The description of a safety reset system provided in this work could be translated into a formalized technical standard
|
|
with relatively low effort. Our system is very simple compared to e.g.\ a full smart meter communication standard and
|
|
thus can conceivably be described in a single, concise document. The much more complicated side of standardization would
|
|
be the standardization of the backend operation including key management, coördination and command authorization.
|
|
|
|
\section{Regulatory adoption}
|
|
|
|
Since the proposed system adds significant cost and development overhead at no immediate benefit to either consumer or
|
|
utility company it is unlikely that it would be adopted voluntarily. Market forces limit what long-term planning utility
|
|
companies can do. An advanced mitigation such as this one might be out of their reach on their own and might require
|
|
regulatory intervention to be implemented. To regulatory authorities a system such as this one provides a primitive to
|
|
guard against attacks. Due to the low-level approach our system might allow a regulatory authority to restore meters to
|
|
a safe state without the need of fine-grained control of implementation details such as application network protocols.
|
|
|
|
A regulatory authority might specify that all smart meters must use a standardized reset controller that on command
|
|
resets to a minimal firmware image that disables external communication, continues basic billing functions and enables
|
|
any disconnect switches. This system would enable the \emph{reset authority} to directly preempt a large-scale attack
|
|
irrespective of implementation details of the various smart meter implementations.
|
|
|
|
Cryptographic key management for the safety reset system is not much different from the management of highly privileged
|
|
signing keys as they are used in many other systems already. If the safety reset system is implemented with a
|
|
regulatory authority as the \emph{reset authority} they would likely be able to find a public entity that is already
|
|
managing root keys for other government systems to also manage safety reset keys. Availability and security requirements
|
|
of safety reset keys do not differ significantly from those for other types of root keys.
|
|
|
|
\section{Zones of trust}
|
|
|
|
In our design, we opted for a safety reset controller in the form of a dedicated microcontroller entirely separate from
whatever application microcontroller the smart meter design is already using. This design nicely separates the meter
|
|
into an untrusted application (the core microcontroller) and the trusted reset controller. Since the interface between
|
|
the two is simple and logically one-way, it can be validated to a high standard of security.
|
|
|
|
Despite these security benefits, the cost of such a separate hardware device might prove high in a mass-market rollout.
|
|
In this case, one might attempt to integrate the reset controller into the core microcontroller in some way. Primarily,
|
|
there would be two ways to accomplish this. One is a solution that physically integrates an additional microcontroller
|
|
core into the main application microcontroller package either as a submodule on the same die or as a separate die in a
|
|
multi-chip module (MCM) with the main application microcontroller. A full-custom solution integrating both on a single
|
|
die might be a viable path for very large-scale deployments, but will most likely be too expensive in tooling costs
|
|
alone to justify its use. More likely for a medium- to large-scale deployment (millions of meters) would be an MCM
|
|
integrating an off-the-shelf smart metering microcontroller die with the reset controller running on another, much
|
|
smaller off-the-shelf microcontroller die. This solution might potentially save some cost compared to a solution using a
|
|
discrete microcontroller for the reset controller.
|
|
|
|
The more likely approach to reducing cost overhead of the reset controller would be to employ virtualization
|
|
technologies such as ARM's TrustZone in order to incorporate the reset controller firmware into the application firmware
|
|
on the same processor core without compromising the reset controller's security or disturbing the application firmware's
|
|
operation.
|
|
|
|
TrustZone is a virtualization technology that provides a hardware-assisted privileged execution domain on at least one
|
|
of the microcontroller's cores. In traditional virtualization setups a privileged hypervisor is managing several
|
|
unprivileged applications sharing resources between them. Separation between applications in this setup is longitudinal
|
|
between adjacent virtual machines. Two applications would both be running in unprivileged mode sharing the same CPU and
|
|
the hypervisor would merely schedule them, configure hardware resource access and coördinate communication. This
|
|
longitudinal virtualization simplifies application development since from the application's perspective the virtual
|
|
machine looks very similar to a physical one. In addition, in general this setup reciprocally isolates two applications
|
|
with neither one being able to gain control over the other.
|
|
|
|
In contrast to this, a TrustZone-like system in general does not provide several application virtual machines and
|
|
longitudinal separation. Instead, it provides lateral separation between two domains: The unprivileged application
|
|
firmware and a privileged hypervisor. Application firmware may communicate with the hypervisor through defined
|
|
interfaces but due to TrustZone's design it need not even be aware of the hypervisor's existence. This makes a perfect
|
|
fit for our reset controller. The reset controller firmware would be running in privileged mode and without exposing any
|
|
communication interfaces to application firmware. The application firmware would be running in unprivileged mode
|
|
without any modification. The main hurdles to the implementation of a system like this are the requirement for a
|
|
microcontroller providing this type of virtualization on the one hand and the complexity of correctly employing this
|
|
virtualization on the other hand. Virtualization systems such as TrustZone are still orders of magnitude more complex to
|
|
correctly configure than it is to simply use separate hardware and secure the interfaces in between.
|
|
|
|
\chapter{Conclusion}
|
|
|
|
In this thesis we have developed an end-to-end design of a reset system to restore smart meters to a safe operating
|
|
state during an ongoing large-scale cyberattack. We have laid out the fundamentals of smart metering infrastructure and
|
|
elaborated the need for an out-of-band method to reset device firmware due to the large attack surface of this complex
|
|
firmware. To allow our system to be triggered even in the middle of a cyberattack we have developed a broadcast data
|
|
transmission system based on intentional modulation of global grid frequency. We have developed the theoretical
|
|
foundations of the process based on an established model of inertial grid frequency response to load variations and
|
|
shown the viability of our end-to-end design through extensive simulations. To properly ground these simulations we have
developed a grid frequency measurement methodology consisting of a custom-designed hardware device for electrically safe
data capture and a set of software tools to archive and process captured data. Our simulations show good behavior of our
|
|
broadcast communication system and give an indication that coöperating with a large consumer such as an aluminium
|
|
smelter would be a feasible way to set up a transmitter at very low hardware overhead. Based on our broadcast primitive
|
|
we have developed a cryptographic protocol ready for embedded implementation in resource-constrained systems that allows
|
|
quick (response time less than 30 minutes) triggering of all or a selected subset of devices. Finally, we have
|
|
experimentally validated our system using simulated grid frequency data in a demonstrator setup based on a commercial
|
|
microcontroller as our safety reset controller and an off-the-shelf smart meter. We have laid out a path for further
|
|
research and standardization related to our system.
|
|
|
|
\newpage
|
|
|
|
%\nocite{*} TODO: check unused references
|
|
\printbibliography[heading=bibintoc]
|
|
\newpage
|
|
|
|
\appendix
|
|
%\chapter{Transcripts of Jupyter notebooks used in this thesis}
|
|
|
|
%\includenotebook{Grid frequency estimation}{grid_freq_estimation}
|
|
%\includenotebook{Grid frequency estimation validation against ROCOF test suite}{freq_meas_validation_rocof_testsuite}
|
|
%\includenotebook{Frequency sensor clock stability analysis}{gps_clock_jitter_analysis}
|
|
%\includenotebook{DSSS modulation experiments}{dsss_experiments-ber}
|
|
|
|
\chapter{Frequency sensor schematics}
|
|
\fancyhead[C]{Frequency sensor schematics (1/3)}
|
|
\fancyfoot[C]{}
|
|
\fancyhead[R]{\thepage}
|
|
\includepdf[fitpaper,landscape,pagecommand={\thispagestyle{fancy}}]{resources/platform-export-pg1.pdf}
|
|
\fancyhead[C]{Frequency sensor schematics (2/3)}
|
|
\includepdf[fitpaper,pagecommand={\thispagestyle{fancy}}]{resources/platform-export-pg2.pdf}
|
|
\fancyhead[C]{Frequency sensor schematics (3/3)}
|
|
\includepdf[fitpaper,landscape,pagecommand={\thispagestyle{fancy}}]{resources/platform-export-pg3.pdf}
|
|
\fancyfoot[C]{\thepage}
|
|
|
|
%\chapter{Firmware source code excerpts}
|
|
%\section{DMA-backed ADC capture (adc.c)}
|
|
%\inputminted[fontsize=\footnotesize,linenos,firstline=18,lastline=115,breaklines]{C}{../gm_platform/fw/adc.c}
|
|
%
|
|
%\section{Frequency sensor packetized serial interface}
|
|
%\subsection{serial.c}
|
|
%\inputminted[fontsize=\footnotesize,linenos,breaklines]{C}{../gm_platform/fw/serial.c}
|
|
%\subsection{packet\_interface.c}
|
|
%\inputminted[fontsize=\footnotesize,linenos,breaklines]{C}{../gm_platform/fw/packet_interface.c}
|
|
%\subsection{cobs.c}
|
|
%\inputminted[fontsize=\footnotesize,linenos,breaklines]{C}{../gm_platform/fw/cobs.c}
|
|
%\subsection{Host data logging utility (tw\_test.py)}
|
|
%\inputminted[fontsize=\footnotesize,linenos,breaklines]{python}{../gm_platform/fw/tw_test.py}
|
|
%
|
|
%\section{Frequency estimation (freq\_meas.c)}
|
|
%\inputminted[fontsize=\footnotesize,linenos,breaklines]{C}{../controller/fw/src/freq_meas.c}
|
|
%\section{DSSS demodulation (dsss\_demod.c)}
|
|
%\inputminted[fontsize=\footnotesize,linenos,breaklines]{C}{../controller/fw/src/dsss_demod.c}
|
|
%\section{Cryptographic protocol handling}
|
|
%\subsection{protocol.c}
|
|
%\inputminted[fontsize=\footnotesize,linenos,breaklines]{C}{../controller/fw/src/protocol.c}
|
|
%\subsection{crypto.c}
|
|
%\inputminted[fontsize=\footnotesize,linenos,breaklines]{C}{../controller/fw/src/crypto.c}
|
|
|
|
|
|
\chapter{Demonstrator firmware symbol size map}
|
|
\label{symbol_size_chart}
|
|
\includepdf[fitpaper]{resources/safetyreset-symbol-sizes.pdf}
|
|
|
|
% FIXME
|
|
%\chapter{Economic viability of countermeasures}
|
|
%\section{Attack cost}
|
|
%\section{Countermeasure cost}
|
|
%\section{Conclusion}
|
|
|
|
% FIXME maybe include a standard for the technical side of a safety reset system here, e.g. in the style of an IETF draft?
|
|
|
|
\end{document}
|