
\documentclass[12pt,a4paper,notitlepage]{report}
\usepackage[utf8]{inputenc}
\usepackage[a4paper,textwidth=17cm, top=2cm, bottom=3.5cm]{geometry}
\usepackage[T1]{fontenc}
\usepackage[
    backend=biber,
    style=numeric,
    natbib=true,
    url=true,
    doi=true,
    eprint=false
    ]{biblatex}
\addbibresource{safety_reset.bib}
\usepackage{amssymb,amsmath}
\usepackage{listings}
\usepackage{eurosym}
\usepackage{wasysym}
\usepackage{amsthm}
\usepackage{tabularx}
\usepackage{multirow}
\usepackage{multicol}
\usepackage{tikz}

\usetikzlibrary{arrows}
\usetikzlibrary{backgrounds}
\usetikzlibrary{calc}
\usetikzlibrary{decorations.markings}
\usetikzlibrary{decorations.pathreplacing}
\usetikzlibrary{fit}
\usetikzlibrary{patterns}
\usetikzlibrary{positioning}
\usetikzlibrary{shapes}

\usepackage{hyperref}
\usepackage{commath}
\usepackage{graphicx,color}
\usepackage{subcaption}
\usepackage{float}
\usepackage{footmisc}
\usepackage{array}
\usepackage[underline=false]{pgf-umlsd}
%\usepackage[pdftex]{graphicx,color}
%\usepackage{epstopdf}
% Needed for murks.tex
\usepackage{setspace}
\usepackage[draft=false,babel,tracking=true,kerning=true,spacing=true]{microtype} % optical margin alignment etc.
% For german quotation marks

\newcommand{\foonote}[1]{\footnote{#1}}
\newcommand{\degree}{\ensuremath{^\circ}}
\newcolumntype{P}[1]{>{\centering\arraybackslash}p{#1}}

\begin{document}

% Example usage of the title page template (please adapt):
\input{murks}
\titel{FIXME} % Title of the thesis
\typ{Masterarbeit} % Type of thesis: Diplomarbeit, Masterarbeit, Bachelorarbeit
\grad{Master of Science (M. Sc.)} % Academic degree attained
% e.g.: Master of Science (M. Sc.), Master of Education (M. Ed.), Bachelor of Science (B. Sc.), Bachelor of Arts (B. A.), Diplominformatikerin
\autor{Jan Sebastian Götte}
\gebdatum{Aus datenschutzrechtlichen Gründen nicht abgedruckt} % Author's date of birth
\gebort{Aus datenschutzrechtlichen Gründen nicht abgedruckt} % Author's place of birth
\gutachter{Prof. Dr. Björn Scheuermann}{FIXME} % First and second reviewer of the thesis
\mitverteidigung % remove if there is no defense
\makeTitel
\selbstaendigkeitserklaerung{31.03.2020}
\newpage

% The actual thesis follows here (on a new sheet for double-sided printing):
\tableofcontents
\newpage

\chapter{Introduction}
\section{Structure and operation of the electrical grid}
\subsection{Structure of the electrical grid}
\subsubsection{Generators and loads}
\subsubsection{Transformers}
\subsubsection{Tie lines}

\subsection{Operational concerns}
\subsubsection{Modelling the electrical grid}
\subsubsection{Generator controls}
\subsubsection{Load shedding}
\subsubsection{System stability}
\subsubsection{Power System Stabilizers}

\subsubsection{Smart metering}

\section{Smart meter technology}
\subsubsection{Common components}

Smart meters are usually built around a standard microcontroller. \label{sm-cpu}
\subsubsection{Cryptographic coprocessors}
\subsubsection{Physical structure}
\subsubsection{Physical installation}

\section{Regulatory frameworks around the world}
\subsection{International standards}
\subsection{Regulations in Europe}
\subsection{The regulatory situation in Germany}
\subsection{The regulatory situation in France}
\subsection{The regulatory situation in the UK}
\subsection{The regulatory situation in Italy}
\subsection{The regulatory situation in northern America}
\subsection{The regulatory situation in Japan}
\subsection{Common themes}

\section{Security in smart grids}
In practice, the smart grid is nothing more or less than an aggregation of embedded control and measurement devices that
are part of a large control system. This implies that the security concerns that apply to embedded systems in general
also apply to most components of a smart grid in some way. Programmers have been struggling with input validation for
decades\cite{leveson01}, and the same issue raises security concerns in smart grid scenarios as
well\cite{mo01, lee01}. In the smart grid, however, two complicating factors are present. First, many components are
embedded systems and as such inherently hard to update. Second, the smart grid and its control algorithms act as a large
(partially) distributed system, which makes measures such as input validation or authentication difficult to
implement\cite{blaze01} and adds a host of distributed systems problems on top\cite{lamport01}.

Given that the electrical grid is a major piece of essential infrastructure in modern civilization, these problems
amount to significant issues in practice. Attacks on the electrical grid may have grave consequences\cite{lee01}, while
the long maintenance cycles of various components make the system slow to adapt. Thus, components for the smart
grid need to be built to a much higher standard of security than most consumer devices to ensure they withstand
well-funded attackers even decades down the road. This requirement intensifies the challenges of embedded security and
distributed systems security, among others, that are inherent in any modern complex technological system.

A point we will not consider in much depth is theft of electricity. A large part of the motivation for the introduction
of smart meters seems to be % TODO weak statement
to reduce the level of fraud by consumers. Academic papers tend to either focus on other benefits such as generation
efficiency gains through better forecasting or try to rationalize the fundamentally anti-consumer nature of smart
metering with strenuous claims of ``enormous social benefits''\cite{mcdaniel01}. We will focus entirely on grid
stability and disregard electricity theft in this work for two reasons. One, billing inaccuracies of
electricity companies are of very low urgency compared to grid stability, and a stable grid is a precondition for any
billing in the first place. Two, utility companies can already put strong bounds on the amount of theft by simply
cross-referencing meter readings against trusted readings from upstream sections of the grid. This capability works
even without smart meters and only gains speed from them. Likewise, smart meters do nothing to prevent the old exploit
of bypassing the meter with a section of wire.
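
As a minimal sketch of this cross-check, assume a feeder whose total energy delivery over some interval is recorded by a
trusted upstream meter as $E_{\text{up}}$, with $n$ customer meters downstream reporting $E_1, \dots, E_n$ and technical
line losses estimated as $E_{\text{loss}}$. These symbols are our own illustrative notation and do not describe any
particular utility's procedure. The energy unaccounted for is then
\begin{equation*}
    E_{\text{missing}} = E_{\text{up}} - E_{\text{loss}} - \sum_{i=1}^{n} E_i .
\end{equation*}
Up to measurement error, $E_{\text{missing}}$ equals the net amount of energy drawn on this feeder but not reported by
any meter, no matter how the individual meters were manipulated.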

Due to these bounds on its volume, electricity theft using smart meter hacking would not scale. Hackers would simply be
rooted out one by one with no damage to consumers and very limited damage to utility companies. Damage in these
scenarios would be a far cry from the efficiency of an exponentially growing botnet.

\subsection{Smart grid components as embedded devices}
A fundamental challenge in smart grid implementations is the central role smart electricity meters play. Smart meters
are used both for highly-granular load measurement and (in some countries) load switching\cite{zheng01}.
Smart electricity meters are effectively consumer devices. They are built down to a certain price point that is
measured by the burden it puts on consumers and that is generally fixed by regulatory authorities. % FIXME cite
This requirement precludes some hardware features such as the use of a standard hardened software environment on a
high-powered embedded system (such as a hypervirtualized embedded Linux setup) that would both increase resilience
against attacks and simplify updates. Combined with the small market sizes in smart grid deployments,\footnote{
    Most vendors of smart electricity meters only serve a handful of markets. For the most part, smart meter development
    cost lies in the meter's software. % TODO cite?
    There exist multiple competing standards applicable to various parts of a smart electricity meter. In addition,
    most countries have their own certification regimen\cite{cenelec01}. This complexity creates a large development
    burden for new market entrants\cite{perez01}.
}
this produces a high cost pressure on the software development process for smart electricity meters.

\subsection{The state of the art in embedded security}
Embedded security generally is much harder than security of higher-level systems. This is due to a combination of the
unique constraints of embedded devices (hard to update, usually small quantity) and their lack of capabilities
(processing power, memory protection functions, user interface devices). Even very well-funded companies continue to
have serious problems securing their embedded systems. A spectacular example of this difficulty is the recently-exposed
flaw in Apple's iPhone SoC first-stage ROM bootloader\footnote{
    Modern system-on-chips integrate one or several CPUs with a multitude of peripherals, from memory and DMA
    controllers and 3D graphics accelerators down to general-purpose IO modules for controlling things like indicator
    LEDs. Most SoCs boot from one of several boot devices such as flash memory, Ethernet or USB according to a
    configuration set e.g. by connecting some SoC pins a certain way or by device-internal write-once fuse bits.

    Physically, one of the processing cores of the SoC (usually one of the main CPU cores) is connected such that it is
    taken out of reset before all other devices, and is tasked with switching on and configuring all other devices of
    the SoC. In order to run later initialization code or more advanced bootloaders, this core on startup runs a very
    small piece of code hard-burned into the SoC in the factory. This ROM loader initializes the most basic peripherals
    such as internal SRAM memory and selects a boot device for the next bootloader stage.

    Apple's ROM loader performs some authorization checks, to ensure no unauthorized software is loaded. The present
    flaw allows an attacker to circumvent these checks, booting code not authorized by Apple on a USB-connected iPhone,
    compromising Apple's chain of trust from ROM loader to userland right at its root.
}, which allows a full compromise of any iPhone before the iPhone X. The iPhone 8, one of the affected models, is still
being manufactured and sold by Apple today\footnote{
    i.e. at the time this paragraph was written, on %FIXME
}. In another instance, a flaw was found in the secure-world firmware that Samsung uses to protect sensitive credentials
in their mobile phone SoCs. % FIXME year
If both of these very large companies have trouble securing parts of their secure embedded software stacks measuring a
mere few hundred bytes in Apple's case or a few kilobytes in Samsung's, what is a smart electricity meter manufacturer
to do? For their mass-market phones, these two companies have R\&D budgets that dwarf some countries' national budgets.
% FIXME hyperbole?
% FIXME cite

Since thorough formal verification of code is not yet within reach for either large-scale software development or
code heavy in side-effects such as embedded firmware or industrial control software\cite{pariente01},
the two most effective measures for embedded security are reducing the amount of code on the one hand, and
labour-intensively checking and double-checking this code on the other. A smart electricity meter manufacturer does not
have a say in the former since it is bound by the official regulations it has to comply with, and will almost certainly
not have sufficient resources for the latter.
% FIXME expand?
% FIXME cite some figures on code size in smart meter firmware?

\subsection{Attack avenues in the smart grid}
If we model the smart grid as a control system responding to changes in inputs by regulating outputs, on a very high
level we can see two general categories of attacks: Attacks that directly change the state of the outputs, and attacks
that try to influence the outputs indirectly by changing the system's view of its inputs. The former would be an attack
such as one that shuts down a power plant to decrease generation capacity. The latter would be an attack such as one
that forges grid frequency measurements where they enter a power plant's control systems to provoke increasing
oscillation in the amount of power generated by the plant according to the control systems' directions.
% FIXME cite
% FIXME expand

\subsubsection{Communication channel attacks}
Communication channel attacks are attacks on the communication links between smart grid components. These could be
attacks on IP-connected parts of the core network or attacks on shared buses between smart meters and IP gateways in
substations. Generally, these attacks can be mitigated by securing the aforementioned communication links using modern
cryptography. IP links can be protected using TLS, and more low-level buses can be protected using more lightweight
Noise-based protocols. % FIXME cite
Cryptographic security reduces an attacker's ability to manipulate communication contents to a mere denial-of-service
(DoS) attack. Thus, in addition to cryptographic security, safety under DoS conditions must be guaranteed to maintain
system performance under attack. This safety property is identical to the one required to withstand
random outages of components, such as communication link outages due to physical damage from storms, flooding etc.
% FIXME cite papers on attack impact, on countermeasures and on attack realization
In general, attacks at the meter level may be hard to weaponize % may be -> weak statement?
since meters are used mostly for billing and forecasting purposes. % FIXME cite
For more critical grid control purposes there exist several additional layers of sensors above smart meters that
limit how much an attacker can falsify smart meter readings without the manipulation being obvious. In order for an
attack to have more far-reaching consequences the attacker would need to compromise additional grid
infrastructure\cite{kim01,kosut01}.

\subsubsection{Exploiting centralized control systems}
The type of smart grid attack most often cited in popular discourse, and to the author's knowledge % FIXME verify, cite
the only type that has so far been conducted in practice, is a direct attack on centralized control systems. In this
attack, computer components of control systems are compromised by the same techniques used to compromise any other kind
of computer system, such as exploiting insecure services running on internet-exposed ports and using one compromised
system to compromise other systems connected with it through an ostensibly secure internal network. These attacks are
very powerful as they yield the attacker direct control over whatever outputs the control systems are controlling. If an
attacker manages to compromise a power station's control computers, they may be able to influence generation output or
even cause an emergency shutdown. % FIXME

Despite their potentially large impact, these attacks are only moderately interesting from a scientific perspective. For
one, their mitigation mostly consists of a straightforward application of security practices well-known for decades.
Though there is room for the implementation of genuinely new, application-specific security systems in this field, the
general state of the art is lagging behind the rest of the computer industry such that the low-hanging fruit should take
priority. % FIXME cite this bold claim very properly

In addition, given political will, these systems can readily be secured since there is only a comparatively small number
of them, and sending a technician to every one of them in turn to install a security update is perfectly feasible.

\subsubsection{Control function exploits}
Control function exploits are attacks on the mathematical control loops used by the centralized control system. One
example of such an attack is the resonance attack described in \textcite{wu01}.
In this kind of attack, inputs from peripheral sensors indicating grid load to the centralized control system are
carefully modified to cause a disproportionately large oscillation in control system action. This type of attack relies
on complex resonance effects that arise when mechanical generators are electrically coupled. These resonances,
colloquially called ``modes'', are well-studied in power system engineering\cite{rogers01,grebe01,entsoe01}.
% FIXME: refer to section on stability control above here
Even disregarding modern attack scenarios, electrical grids are designed for stability with measures in place to dampen
any resonances inherent to grid structure. Still, since their analysis requires an accurate grid model, these resonances
are hard to analyze and unlikely to be noticed under normal operating conditions.
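
To illustrate why a small but well-timed input can produce a disproportionate response, consider a single oscillatory
mode modelled as a second-order system with natural frequency $\omega_0$ and damping ratio $\zeta$. This is a textbook
abstraction that we assume here purely for illustration; it is not the grid model used in \textcite{wu01}.
\begin{equation*}
    \ddot{x} + 2\zeta\omega_0\dot{x} + \omega_0^2 x = \omega_0^2 u(t)
\end{equation*}
For a sinusoidal input $u(t) = U \sin(\omega t)$ the steady-state amplitude of $x$ is
\begin{equation*}
    |X| = \frac{U}{\sqrt{\left(1 - r^2\right)^2 + \left(2\zeta r\right)^2}}, \qquad r = \frac{\omega}{\omega_0} .
\end{equation*}
Driven exactly at the natural frequency ($r = 1$), the gain becomes $|X|/U = 1/(2\zeta)$. A weakly damped mode with
$\zeta = 0.05$ would thus amplify a forged input component at that frequency by a factor of ten.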

These attacks are most easily mitigated by, on the one hand, ensuring that sensor inputs reach the control
systems unmodified in the first place and, on the other hand, carefully designing control systems not to exhibit
exploitable behavior such as oscillations.
% FIXME cite mitigation approaches

\subsubsection{Endpoint exploits}
One rather interesting attack on smart grid systems is one exploiting the grid's endpoint devices such as smart
electricity meters\footnote{
    Though potentially this could also aim at other kinds of devices distributed on a large scale such as sensors in
    unmanned substations. % FIXME cite verify
}.
These meters are deployed on a massive scale, with several thousand meters deployed for every substation.
% FIXME cite (this should be straightforward)
Thus, once they are compromised, restoring them to an uncompromised state can be very difficult if it requires physical
access to thousands of devices hidden away inaccessibly in private homes.

By compromising smart electricity meters, an attacker can trivially forge the distributed energy measurements these
devices perform. In a best-case scenario, this might only affect billing and lead to customers being under- or
over-charged if the attack is not noticed in time. However, in a less ideal scenario the energy measurements taken by
these devices might be used to inform the grid's centralized control systems % FIXME cite
and a falsification of these measurements might lead to inefficiency.

In some countries and for some customers, these smart meters have one additional function that is highly useful to an
attacker: They contain high-current load switches to disconnect the entire household or business in case electricity
bills are left unpaid for a certain period. In countries that use these kinds of systems, the load disconnect is often
simply connected to a general-purpose IO pin of the smart meter's central microcontroller, allowing anyone
compromising this microcontroller's firmware to actuate the load switch at will. % FIXME validate cite add pictures

Given control over a large number of network-connected smart meters, an attacker might thus be able to cause large-scale
disruptions of power consumption by repeatedly disconnecting and re-connecting a large number of consumers.
% FIXME cite some analysis of this
Combined with an attack method such as the resonance attack from \textcite{wu01}
that was mentioned above, this scenario poses a serious danger to grid stability.

% FIXME add small-scale load shedding for heaters etc.

\subsection{Attacker models in the smart grid}
\subsection{Practical attacks}
\subsection{Practical threats}
\subsection{Conclusion, or why we are doomed}
We can conclude that a compromise of a large number of smart electricity meters cannot be ruled out. The complexity of
network-connected smart meter firmware makes it exceedingly unlikely that it is in fact flawless. Under some
circumstances, such as where these devices are used with load disconnect relays, their large-scale deployment makes them
an attractive target for attackers interested in causing grid instability. The attacker model for these devices
definitely includes hostile states, who have considerable resources at their disposal.

For a reasonable guarantee that no large-scale compromise of hardware and software built today will happen over a span
of some decades, we would have to radically simplify their design and limit their attack surface. Unfortunately, the
complexity of smart electricity meter implementations mostly stems from the large list of requirements these devices
have to conform to. Additionally, standards have already been written, and changes that reduce scope or functionality
have become exceedingly unlikely at this point.

A general observation with smart grid systems of any kind is that they constitute a zealous departure from the
decentralized control structure of yesterday's dumb grid and the advent of centralization at an enormous scale. This
modern, centralized infrastructure has been carefully designed to defend against malicious actors % FIXME cite
and all involved parties have an interest in keeping it secure. Still, as in any other system, this centralization also
makes a very attractive target for attackers since an attacker can likewise employ this centralized control toward their
own goals. Fundamentally, decentralized systems tend to make attacks of any kind a lot more costly and one might
question whether security has truly been gained during smart grid rollout. % FIXME hot take maybe

\chapter{Restoring endpoint safety in an age of smart devices}
If, as laid out in the previous chapter, we cannot rule out a large-scale compromise of smart energy meters, we have to
rephrase our claim to security. If we cannot rule out exploitation, we have to limit its impact. If we assume that we
cannot strip any functionality from smart meters since it may be required by standards or for enormous social
benefits\cite{mcdaniel01}, % FIXME is sarcasm ok here?
all we can do is to flush out an attacker once they are in.

In a worst-case scenario an attacker would gain unconstrained code execution, e.g. by exploiting a flaw in a network
protocol implementation. Since smart meters use standard microcontrollers that do not have advanced memory protection
functions (see p.~\pageref{sm-cpu}), at this point we can assume the attacker has full control over the main
microcontroller. With this control they can actuate the load switch if present, transmit data through the device's
communication interfaces or use the user interface components such as LEDs and the LCD. Using the self-programming
capabilities of modern flash microcontrollers, an attacker may even gain persistence without much trouble. Note that in
systems separating cryptographic functions into some form of cryptographic module such as systems used in Germany
    % TODO list other countries as well? FIXME cite BSI standard requiring this
we can be optimistic and assume the attacker has not in fact compromised this cryptographic co-processor yet and does
not have access to any cryptographic secrets yet.

Given that the attacker has complete control over the meter's core microcontroller and given that due to cost
constraints we are bound to use whatever microcontroller the meter OEM has chosen for their design, we cannot rely on
software running on the core microcontroller to restore system integrity.

Our solution to this problem is to add another, very small microcontroller to the smart meter design. This
microcontroller will contain a small piece of software to receive cryptographically authenticated commands from utility
companies and on demand reset the meter's core microcontroller to a known-good state. We have to assume the code in the
core controller's flash memory has been compromised, so our only option to flush out an attacker is to re-program the
core microcontroller in its entirety. We propose using JTAG to re-program the core microcontroller
    % TODO get terminology consistent. Is "core microcontroller" a good term here?
with a known-good firmware image read from a sufficiently large SPI flash connected to the reset controller. JTAG is
supported by most microcontrollers complex enough to end up in a smart meter design % TODO colloquialism
and given adequate documentation JTAG programming functionality can be ported to new microcontrollers with relatively
little work.
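
The reset flow described above can be sketched as a small simulation. This is a hedged illustration only and not the
firmware of this work: the classes \texttt{CoreMCU} and \texttt{ResetController}, the command encoding and the
pluggable \texttt{authenticate} callback are all hypothetical names introduced here, and the JTAG transfer is reduced to
a plain assignment.

\begin{lstlisting}[language=Python]
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CoreMCU:
    """Stand-in for the meter's core microcontroller (hypothetical model)."""
    flash: bytes
    in_reset: bool = False


@dataclass
class ResetController:
    """Stand-in for the small reset controller (hypothetical model)."""
    golden_image: bytes  # known-good firmware image, as read from SPI flash
    authenticate: Callable[[bytes, bytes], bool]  # (command, signature) -> valid?
    log: List[str] = field(default_factory=list)

    def handle(self, core: CoreMCU, command: bytes, signature: bytes) -> bool:
        # Ignore anything that is not an authenticated reset command.
        if command != b"RESET" or not self.authenticate(command, signature):
            self.log.append("rejected")
            return False
        core.in_reset = True            # hold the core in reset
        core.flash = self.golden_image  # overwrite the whole flash (JTAG in reality)
        core.in_reset = False           # release the core into known-good firmware
        self.log.append("reset")
        return True
\end{lstlisting}

Note that the sketch never inspects the existing flash contents. Since they must be assumed compromised, the controller
overwrites them unconditionally instead of attempting to detect or repair a modification.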

On the core microcontroller side, our solution requires the JTAG interface to be enabled (i.e. not fused shut), and core
microcontroller firmware must not be able to permanently disable the JTAG interface from within.
In microcontrollers that do not yet provide this functionality this is a minor change that could be added to a custom
microcontroller variant at low cost. On most microcontrollers keeping JTAG open should not interfere with code readout
protection. Code secrecy should be of no concern for security\cite{schneier01}, but manufacturers have strong
preferences for it regardless due to fear of copyright infringement.

\section{The theory of endpoint safety}
\label{sec_criteria}
In order to gain anything by adding our reset controller to the smart meter's already complex design we must satisfy two
interrelated conditions.
\begin{enumerate}
\item \textsc{security} means our reset controller itself does not have any remotely exploitable flaws.
\item \textsc{safety} means our reset controller will perform its job as intended.
\end{enumerate}

Note that our \textsc{security} property includes only remote exploitation, and excludes any form of hardware attack.
Even though most smart meters provide some level of physical security, we do not wish to make any assumptions on this.
In the following section we will elaborate our attacker model and it will become apparent that sufficient physical
security to defend against all attackers in our model would be infeasible, and thus we will design our overall system
to remain secure even assuming some number of physically compromised devices.
% FIXME expand

\subsection{Attack characteristics}
The attacker model under which these two conditions must hold is as follows. We assume three angles of attack: attacks
by the customer themselves, attacks by an insider within the utility company that controls the metering systems, and
attacks from third parties. Examples of these third parties are hobbyist hackers or outside cyber-criminals on the one
hand, but also other companies participating in the smart grid infrastructure besides the utility company, such as
intermediary providers of meter-reading services, on the other.

Due to the critical nature of the electrical grid, we have to include hostile state actors in our attacker model. When
acting directly, these would be classified as third-party attackers by the above schema, but they can reasonably be
expected to be able to assume either of the other two roles as well e.g. through infiltration or bribery.
\textcite{fraunholz01} in their elaboration of their generalized attacker model give some classification of attackers
and provide a nice taxonomy of attacker properties. In their threat/capability rating, criminals are still considered
to have a higher threat rating than state-sponsored attackers. The New York Times reported in 2016 that some states
recruit their hacking personnel in part from cyber-criminals. If this report is true, in a worst-case scenario we have
to assume that a state-sponsored attacker combines the worst of both types. Compared against the other attacker types in
\textcite{fraunholz01}, such a state-sponsored attacker is strictly worse than any other type in both variables. We are
left with a highly skilled, very well-funded, highly intentional and motivated attacker.

Based on the above classification of attack angles and our observations on state-sponsored attacks, we can adapt
\textcite{fraunholz01} to our problem, yielding the following new attacker types:

\begin{enumerate}
    \item \textbf{Utility company insiders controlled by a state actor}
        We can ignore the other internal threats described in \textcite{fraunholz01} since an insider cooperating with a
        state actor is strictly worse in every respect.
    \item \textbf{State-sponsored external attackers}
        A state actor can obviously directly attack the system through the internet.
    \item \textbf{Customers controlled by a state actor}
        A state actor can very well compromise some customers for their purposes. They might either physically
        infiltrate the system posing as legitimate customers, or they might simply deceive or bribe existing customers
        into cooperation.
    \item \textbf{Regular customers}
        Though a hostile state actor might gain control of some number of customers through means such as voluntary
        cooperation, bribery or infiltration, they are limited in attack scale since they do not want to attract
        premature attention. Though regular customers may not have the motivation, skill or resources of a
        state-sponsored attacker, potentially large numbers of them may try to attack a system out of financial
        incentives. To allow for this possibility, we consider regular customers separately from state actors posing as
        customers in some way.
\end{enumerate}

\subsection{Overall structural system security}
Considering overall security, we first introduce the \emph{reset authority}, a trusted party acting as the single
authority for issuing reset commands in our system. In practice this trusted party may be part of the utility company,
part of an external regulatory body or a hybrid setup requiring both to cooperate. We assume this party will be designed
to be secure against all of the above attacker types. The precise design of this trusted party is out of scope for this
work but we will list some practical suggestions on how to achieve security below. % FIXME do the list
% FIXME put up a large box on this limitation

Using an asymmetric cryptographic design centered around the \emph{reset authority}, we rule out all attacks except for
denial-of-service attacks on our system by any of the four attacker types. All reset commands in our system originate
from the \emph{reset authority} and are cryptographically secured to provide authentication and tamper detection.
Under this model, attacks on the electrical grid components between the \emph{reset authority} and the customer device
degrade into man-in-the-middle attacks. To ensure the \textsc{safety} criterion from Section~\ref{sec_criteria} holds we
% FIXME check whether this \ref displays as intended
must make sure our cryptography is secure against man-in-the-middle attacks and we must try to harden the system against
denial-of-service attacks by the attacker types listed above. Given our attacker model we cannot fully guard against
this sort of attack but we can at least choose a communication channel that is resilient against denial-of-service
attacks under the above model.
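
To illustrate what an asymmetric design of this kind could look like on the device side, the following minimal sketch
uses a Lamport one-time signature built from nothing but a hash function. The scheme is an assumption chosen here for
illustration and is not necessarily the one used in this work; the function names and the choice of SHA-256 are likewise
hypothetical. What the sketch shows is that the verifying device needs only public data.

\begin{lstlisting}[language=Python]
import hashlib
import secrets

HASH_BITS = 256  # output size of SHA-256 in bits


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keygen():
    """Reset authority: one pair of secrets per digest bit, plus their hashes."""
    secret = [(secrets.token_bytes(32), secrets.token_bytes(32))
              for _ in range(HASH_BITS)]
    public = [(H(zero), H(one)) for zero, one in secret]
    return secret, public


def digest_bits(message: bytes):
    """Bits of the message digest as a list of 0/1 integers."""
    d = int.from_bytes(H(message), "big")
    return [(d >> i) & 1 for i in range(HASH_BITS)]


def sign(secret, message: bytes):
    """Reset authority: reveal one secret of each pair, selected by the digest."""
    return [secret[i][bit] for i, bit in enumerate(digest_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    """Device: hash each revealed value and compare against the public key."""
    if len(signature) != HASH_BITS:
        return False
    ok = True
    for i, bit in enumerate(digest_bits(message)):
        ok &= H(signature[i]) == public[i][bit]  # check every position
    return ok
\end{lstlisting}

In this naive variant each key pair may sign only a single command, and a public key occupies 16\,KiB, which is large
for a small microcontroller. A practical design would therefore need a more compact construction.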

Finally, we have to consider the issue of hardware security. We will solve the problem of physical attacks on some small
number of devices by simply not programming any secret information into these devices. This also simplifies hardware
production. We explicitly exclude any form of supply-chain attack from consideration in this work as out of scope.
% FIXME include considerations on production testing somewhere (is the device working? is the right key programmed?)

\subsection{Complex microcontroller firmware}
The \textsc{security} property from Section~\ref{sec_criteria} relies in large part on the security of our reset
controller firmware. The best method to increase firmware security is to reduce attack surface by limiting external
interfaces as much as possible and by reducing code complexity as much as possible.
% FIXME formalize this as something like "Design Goal DG-023-42-1" ?
If we avoid the complexity of most modern microcontroller firmware we gain another benefit beyond implicitly reduced
attack surface: If the resulting design is small enough we may attempt formal verification of our security property.
Though formal verification tools are not yet suitable for highly complex tasks, they are by now just about adequate for
small amounts of code and simple interfaces.

\subsection{Modern microcontroller hardware}
Microcontrollers have gained enormously in performance and efficiency as well as in peripheral support. Alas, these
gains have largely been driven by insatiable customer demand for faster, more powerful chips and for a long time
security has not been considered important outside of some specific niches such as smartcards. Traditionally a
microcontroller would spend its entire lifetime without ever being exposed to any networks. Though this trend has been
reversing with the increasing adoption of internet-of-things things % FIXME is this pun ok?
and more advanced security features have started appearing in general-purpose microcontrollers, most still lack even
basic functionality found in processors for computers or smartphones.

One of the components lacking from most microcontrollers is strong memory protection or even a memory management unit
(MMU) as it is found in all modern computer processors and SoCs for applications such as smartphones. Without a memory
protection unit (MPU) or an MMU some mitigations for memory safety violations cannot be implemented. This and the
absence of virtualization tools such as ARM's TrustZone make hardening microcontroller firmware a considerable task. It
is therefore very important to ensure memory safety in microcontroller firmware through tools such as defensive coding,
extensive testing and formal verification.

In our design we achieve simplicity on two levels: One, we isolate the very complex metering firmware from our reset
controller by having both run on separate microcontrollers. Two, we keep the reset controller firmware itself extremely
simple to reduce attack surface there.

\subsection{Regulatory and economical constraints}
\subsection{Safety vs. Security: Opting for restoration instead of prevention}


\subsection{Technical outline of a safety reset}

\section{Communication channels on the grid}
\subsection{Powerline communication systems and their use}
\subsection{Proprietary wireless systems}
\subsection{Landline IP}
\subsection{IP-based wireless systems}
\subsection{Frequency modulation as a communication channel}

For our system, we chose grid frequency modulation (henceforth GFC) as a low-bandwidth uni-directional communications channel.
Compared to traditional PLC, GFC requires no additional hardware, works reliably throughout the grid and is harder for
a malicious actor to manipulate.
% FIXME \cite{urtasun01}
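
As a rough sketch of the channel idea, and not the parametrization used in this work, the grid frequency observed by a
receiver can be modelled as the nominal frequency plus a small intentional offset plus natural fluctuation. All symbols
below are assumptions introduced for illustration: $f_0$ is the nominal grid frequency, $\Delta f$ the modulation
amplitude, $m(t) \in [-1, 1]$ the transmitted baseband signal and $n(t)$ the frequency deviation caused by ordinary load
changes.
\begin{equation*}
    f(t) = f_0 + \Delta f \cdot m(t) + n(t)
\end{equation*}
Under this model, a receiver only has to measure $f(t)$ at its own mains connection and separate the term
$\Delta f \cdot m(t)$ from $n(t)$, which fits the low-bandwidth, uni-directional nature of the channel.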

\subsubsection{The frequency dependence of grid frequency}
\subsubsection{Control systems coupled to grid frequency}
\subsubsection{Avoiding dangerous modes}
\subsubsection{Overall system parameters}
\subsubsection{An outline of practical implementation}

\section{From grid frequency to a reliable communications channel}
\subsection{Channel properties}
\subsection{Modulation and its parameters}
\subsection{Error-correcting codes}
\subsection{Cryptographic security}

\chapter{Practical implementation}
\section{Cryptographic validation}

\section{Data collection for channel validation}
\subsection{Frequency sensor hardware design}
\subsection{Frequency sensor measurement results}

\section{Channel simulation and parameter validation}

\section{Implementation of a demonstrator unit}

\section{Experimental results}

\section{Lessons learned}

\chapter{Future work}
\section{Technical standardization}
The description of a safety reset system provided in this work could be translated into a formalized technical standard
with relatively low effort. Our system is very simple compared to e.g. a full smart meter communication standard and
thus can conceivably be described in a single, concise document. The much more complicated side of standardization would
be the standardization of the backend operation including key management, coordination and command authorization.

\section{Regulatory adoption}
Since the proposed system adds significant cost and development overhead with no immediate benefit to either consumer or
utility company, it is unlikely that it would be adopted voluntarily. Market forces limit what long-term planning utility
companies can do. An advanced mitigation such as this one might be out of their reach on their own and might require
regulatory intervention to be implemented. To regulatory authorities, a system such as this one provides a powerful
primitive to guard against attacks. Due to its low-level approach, our system might allow a regulatory authority to
restore meters to a safe state without the need for fine-grained control over implementation details such as application
network protocols.

A regulatory authority might specify that all smart meters must use a standardized reset controller that on command
resets to a minimal firmware image that disables external communication, continues basic billing functions and enables
any disconnect switches. This system would enable the \emph{reset authority} to directly preempt a large-scale attack
irrespective of implementation details of the various smart meter implementations.

Cryptographic key management for the safety reset system is not much different from the management of highly privileged
signing keys as they are used in many other systems already.  If the safety reset system is implemented with a
regulatory authority as the \emph{reset authority} they would likely be able to find a public entity that is already
managing root keys for other government systems to also manage safety reset keys. Availability and security requirements
of safety reset keys do not differ significantly from those for other types of root keys.

\section{Practical implementation}


\section{Zones of trust}
In our design, we opted for a safety reset controller
    % FIXME is "safety reset" the proper name here? We need some sort of branding, but is this here really about "safety"?
in the form of a microcontroller entirely separate from whatever application microcontroller the smart meter design
is already using.  This design nicely separates the meter into an untrusted application (the core microcontroller) and
the trusted reset controller. Since the interface between the two is simple and logically one-way, it can be validated
to a high standard of security.

Despite these security benefits, the cost of such a separate hardware device might prove high in a mass-market rollout.
In this case, one might attempt to integrate the reset controller into the core microcontroller in some way. Primarily,
there would be two ways to accomplish this. One is a solution that physically integrates an additional microcontroller
core into the main application microcontroller package either as a submodule on the same die or as a separate die in a
multi-chip module (MCM) with the main application microcontroller. A full-custom solution integrating both on a single
die might be a viable path for very large-scale deployments, but will most likely be too expensive in tooling costs
alone to justify its use. More likely for a medium- to large-scale deployment (millions of meters) would be an MCM
integrating an off-the-shelf smart metering microcontroller die with the reset controller running on another, much
smaller off-the-shelf microcontroller die. This solution might save some cost compared to a solution using a
discrete microcontroller for the reset controller.
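
The tooling-cost argument above can be made explicit with a simple amortization estimate. The symbols are assumptions
introduced here for illustration only: $N$ is the number of meters deployed, $C_\text{NRE}$ the one-time engineering and
tooling cost of an integrated solution, $c_\text{int}$ its per-unit cost and $c_\text{disc}$ the per-unit cost of a
discrete reset controller. Integration pays off only if
\begin{equation*}
    \frac{C_\text{NRE}}{N} + c_\text{int} < c_\text{disc}
    \quad\Longleftrightarrow\quad
    N > \frac{C_\text{NRE}}{c_\text{disc} - c_\text{int}},
\end{equation*}
assuming $c_\text{disc} > c_\text{int}$. A full-custom die typically has a much larger $C_\text{NRE}$ than an MCM built
from off-the-shelf dies, which pushes its break-even volume correspondingly higher.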

The more likely approach to reducing cost overhead of the reset controller would be to employ virtualization
technologies such as ARM's TrustZone in order to incorporate the reset controller firmware into the application firmware
on the same chip without compromising the reset controller's security or disturbing the application firmware's
operation.

TrustZone is a virtualization technology that provides a hardware-assisted privileged execution domain on at least one
of the microcontroller's cores. In traditional virtualization setups a privileged hypervisor manages several
unprivileged applications that share resources between them. Separation between applications in this setup is
longitudinal between adjacent virtual machines. Two applications would both be running in unprivileged mode sharing the
same CPU and
the hypervisor would merely schedule them, configure hardware resource access and coordinate communication. This
longitudinal virtualization simplifies application development since from the application's perspective the virtual
machine looks very similar to a physical one. In addition, in general this setup reciprocally isolates two applications
with neither one being able to gain control over the other.

In contrast to this, a TrustZone-like system in general does not provide several application virtual machines and
longitudinal separation. Instead, it provides lateral separation between two domains: The unprivileged application
firmware and a privileged hypervisor. Application firmware may communicate with the hypervisor through defined
interfaces but due to TrustZone's design it need not even be aware of the hypervisor's existence. This makes a perfect
fit for our reset controller. The reset controller firmware would be running in privileged mode and without exposing any
communication interfaces to application firmware. The application firmware would be running in unprivileged mode
without any modification. The main hurdles to the implementation of a system like this are the requirement for a
microcontroller providing this type of virtualization on the one hand and the complexity of correctly employing this
virtualization on the other hand. Virtualization systems such as TrustZone are still orders of magnitude more complex to
configure correctly than a design that simply uses separate hardware and secures the interfaces in between.

\chapter{Conclusion}

\newpage
\appendix
\chapter{Acknowledgements}
\newpage

\chapter{References}
\nocite{*}
\printbibliography
\newpage

\chapter{Demonstrator schematics and code}

\chapter{Economic viability of countermeasures}
\section{Attack cost}
\section{Countermeasure cost}

% FIXME maybe include a standard for the technical side of a safety reset system here, e.g. in the style of an IETF draft?

\end{document}