ma: Go through remaining TODOs

2020-06-29 12:56:37 +02:00
commit 5474a79b75
parent be7775ca0d
1 changed files with 38 additions and 57 deletions
--- a/ma/safety_reset.tex
+++ b/ma/safety_reset.tex
@ -119,13 +119,12 @@

 \chapter{Introduction}

-%FIXME: sprinkle this section with citations.
 In the power grid, as in many other engineered systems, we can observe an ongoing diffusion of information systems into
 industrial control systems. Automation of these control systems has already been practiced for the better part of a
 century.  Throughout the 20th century this automation was mostly limited to core components of the grid. Generators in
 power stations are computer-controlled according to electromechanical and economic models. Switching in substations is
 automated to allow for fast failure recovery. Human operators are still vital to these systems, but their tasks have
-shifted from pure operation to engineering, maintenance and surveillance.
+shifted from pure operation to engineering, maintenance and surveillance\cite{crastan03,anderson02}.

 With the turn of the century came a large-scale trend in power systems to move from a model of centralized generation,
 built around massive large-scale fossil and nuclear power plants, towards a more heterogeneous model of smaller-scale
@ -135,7 +134,7 @@ particular from a current standpoint seems unavoidable for our continued existen
 grid these systems constitute a significant challenge. Fossil-fueled power plants can be controlled in a precise and
 quick way to match energy consumption. This tracking of consumption with production is vital to the stability of the
 grid. Renewable energies such as wind and solar power do not provide the same degree of controllability, and they
-introduce a larger degree of uncertainty due to the unpredictability of the forces of nature.
+introduce a larger degree of uncertainty due to the unpredictability of the forces of nature\cite{crastan03}.

 Along with this change in dynamic behavior, renewable energies have brought forth the advance of distributed generation.
 In distributed generation end-customers that previously only consumed energy have started to feed energy into the grid
@ -589,7 +588,6 @@ stretched their area of expertise where resorting to established standard protoc
 outcome\cite{weith01}. Compared to industry-standard transport security the IEC standards provide a simplistic key
 management framework based on a static shared key with unlimited lifetime and provide sub-optimal transport security
 properties (e.g.\ lack of forward-secrecy)\cite{khurana01,sato01}.
-% TODO maybe expand this?

 \subsection{The regulatory situation in selected countries}

@ -622,9 +620,7 @@ installation. This makes it theoretically possible for a utility company to stil
 disconnect a customer, but this would be a separate installation from the smart meter. In Germany there are significant
 barriers that have to be met before a utility company may cut power to a household\cite{delaw01}. The elision of a
 disconnect switch means attacks on German meters will be limited in influence to billing irregularities and attacks
-using DSM equipment.
-
-% TODO elaborate DSM attacks vs. whole-household attacks in attacks section
+using DSM equipment such as water storage heaters that represent only a fraction of overall load.

 \subsubsection{The Netherlands}
 The Netherlands were early to take initiative to roll out smart metering after its recognition by the European
@ -676,7 +672,7 @@ A unique point in the Japanese utility metering landscape is that the current pr
 Japan residential utility meters are usually mounted outside the building on an exterior wall and every month someone
 with a mirror on a long stick will come and read the meter. The meter reader then makes a thermal paper print-out of the
 updated utility bill and puts it into the resident's post box. This practice gives consumers good control over their
-consumption but does incur significant personnel overhead. % TODO decide on citation. Maybe the toshiba one?
+consumption but does incur significant personnel overhead.

 \subsubsection{The USA}

@ -801,20 +797,20 @@ able to evade it\cite{bmwi04}.
 \subsection{Smart grid components as embedded devices}

 A fundamental challenge in smart grid implementations is the central role smart electricity meters play. Smart meters
-are used both for highly-granular load measurement and (in some countries) load switching\cite{zheng01}.
-Smart electricity meters are effectively consumer devices. They are built down to a certain price point that is measured
-by the burden it puts on consumers. The cost of a smart meter is ultimately limited by it being a major factor in the
+are used both for highly-granular load measurement and (in some countries) load switching\cite{zheng01}.  Smart
+electricity meters are effectively consumer devices. They are built down to a certain price point that is measured by
+the burden it puts on consumers. The cost of a smart meter is ultimately limited by it being a major factor in the
 economies of a smart meter rollout\cite{bmwi03}.  Cost requirements preclude some hardware features such as the use of a
 standard hardened software environment on a high-powered embedded system (such as a hypervirtualized embedded Linux
 setup) that would both increase resilience against attacks and simplify updates. Combined with the small market sizes in
-smart grid deployments\footnote{
-    Most vendors of smart electricity meters only serve a handful of markets. For the most part, smart meter development
-    cost lies in the meter's software. % TODO cite?
-    There exist multiple competing standards applicable to various parts of a smart electricity meter. In addition,
-    most countries have their own certification regimen\cite{cenelec01}. This complexity creates a large development
-    burden for new market entrants\cite{perez01}.
-}
-this results in a high cost pressure on the software development process for smart electricity meters.
+smart grid deployments, this results in high cost pressure on the software development process for smart electricity
+meters.  Most vendors of smart electricity meters only serve a handful of markets. A large fraction of smart meter
+development cost lies in the meter's software. Landis+Gyr, a large manufacturer that makes most of its revenue from
+utility meters, writes in its 2019 annual report that it spent \SI{36}{\percent} of its total R\&D budget on embedded
+software (firmware) while spending only \SI{24}{\percent} on hardware R\&D\cite{landisgyr01,landisgyr02}.  There exist
+multiple competing standards applicable to various parts of a smart electricity meter and most countries have their own
+certification regimen\cite{cenelec01}. This complexity creates a large development burden for new market
+entrants\cite{perez01}.

 \subsection{The state of the art in embedded security}

@ -1071,7 +1067,6 @@ Even though most smart meters provide some level of physical security, we do not
 In the following section we will elaborate our attacker model and it will become apparent that sufficient physical
 security to defend against all attackers in our model would be infeasible, and thus we will design our overall system
 to remain secure even if we assume some number of physically compromised devices.
-% FIXME expand

 \subsection{Attack characteristics}
 The attacker model under which the two conditions above must hold is as follows. We assume three angles of attack: Attacks by the
@ -1120,8 +1115,7 @@ Considering overall security, we first introduce the reset authority, a trusted
 issuing reset commands in our system. In practice this trusted party may be part of the utility company, part of an
 external regulatory body or a hybrid setup requiring both to coöperate. We assume this party will be designed to be
 secure against all of the above attacker types. The precise design of this trusted party is out of scope for this work
-but we will list some practical suggestions on how to achieve security below. % FIXME do the list
-% FIXME put up a large box on this limitation
+but we will provide some practical suggestions on how to achieve security below in Section \ref{sec-regulation}.

 Using an asymmetric cryptographic design centered around the reset authority, we rule out all attacks except for
 denial-of-service attacks on our system by any of the four attacker types. All reset commands in our system originate
@ -1135,18 +1129,16 @@ but we can at least choose a communication channel that is resilient under the a
 Finally, we have to consider the issue of hardware security. We will solve the problem of physical attacks by simply not
 programming any secret information into devices. This also simplifies hardware production. We consider supply-chain
 attacks out-of-scope for this work.
-% FIXME include considerations on production testing somewhere (is the device working? is the right key programmed?)

 \subsection{Complex microcontroller firmware}

 The \emph{security} property from \ref{sec_criteria} is in large part reliant on the security of our reset
 controller firmware. The best method to increase firmware security is to reduce attack surface by limiting external
-interfaces as much as possible and by reducing code complexity as much as possible.
-% FIXME formalize this as something like "Design Goal DG-023-42-1" ?
-If we avoid the complexity of most modern microcontroller firmware we gain another benefit beyond implicitly reduced
-attack surface: If the resulting design is small enough we may even succeed in formal verification of our security
-properties.  Though formal verification tools are not yet suitable for highly complex tasks they are already adequate
-for small amounts of code and simple interfaces.
+interfaces as much as possible and by reducing code complexity as much as possible.  If we avoid the complexity of most
+modern microcontroller firmware we gain another benefit beyond implicitly reduced attack surface: If the resulting
+design is small enough we may even succeed in formal verification of our security properties.  Though formal
+verification tools are not yet suitable for highly complex tasks they are already adequate for small amounts of code and
+simple interfaces.

 \subsection{Modern microcontroller hardware}

@ -1169,9 +1161,6 @@ In our design we achieve simplicity on two levels: One, we isolate the very comp
 controller by having both run on separate microcontrollers. Two, we keep the reset controller firmware itself extremely
 simple to reduce attack surface there. Our protocol only has one message type and no state machine.

-% \subsection{Regulatory and economical constraints}
-% TODO decide whether to keep this section
-
 \subsection{Safety vs. security: Opting for restoration instead of prevention}

 By implementing our reset system as a physically separate microcontroller we sidestep most security issues around the
@ -1191,9 +1180,9 @@ device series and across vendors.
 Attack resilience in the power grid can benefit from a safety-focused approach. The greater threat such an attack poses
 is not the temporary denial of service of utility metering functions. Even in a highly integrated smart grid as
 envisioned by utility companies these measurement functions are used by utility companies to increase efficiency and
-reduce cost but are not necessary for the grid to function at all. % TODO citation
-Thus if we can provide mere \emph{safety} with a fail-safe semantic instead of unattainable perfect \emph{security} we
-have gained resilience against a large class of realistic attack scenarios.
+reduce cost but are not necessary for the grid to function at all.  Thus if we can provide mere \emph{safety} with a
+fail-safe semantic instead of unattainable perfect \emph{security} we have gained resilience against a large class of
+realistic attack scenarios.

 \subsection{Technical outline of a safety reset system}

@ -1292,12 +1281,11 @@ several ISM bands\footnote{
 }. ZigBee is another popular standard and some vendors additionally support their own proprietary protocols\footnote{
    For an example see \cite{honeywell01}.
 }.
-% TODO expand this?

 \subsection{Frequency modulation as a communication channel}

 For our system, we chose grid frequency modulation (henceforth GFM) as a low-bandwidth unidirectional broadcast
-communication channel.  Compared to traditional PLC, GFM requires only a small amount of additional hardware, works
+communication channel.  Compared to traditional PLC, GFM requires only a small amount of additional equipment, works
 reliably throughout the grid and is harder to manipulate by a malicious actor. 

 Grid frequency in Europe's synchronous areas is nominally 50 Hertz, but there are small load-dependent variations from
@ -1400,8 +1388,6 @@ current in 12 pulses per cycle. In the best case an SCR pulse rectifier switched
 \SIrange{0}{100}{\percent} load changes from one rectifier pulse to the next, i.e. within a fraction of a single cycle.}
 data rates.

-% FIXME validate this \subsubsection with an expert
-
 \subsubsection{Avoiding dangerous modes}

 Modern power systems are complex electromechanical systems. Each component is controlled by several carefully tuned
@ -1452,7 +1438,8 @@ controllable load:
 \end{description}

 \section{From grid frequency to a reliable communication channel}
-% FIXME add intro text here
+Based on the physical properties outlined above we will provide the theoretical groundwork for a practical communication
+system based on grid frequency modulation.

 \subsection{Channel properties}
 In this section we will explore how we can construct a reliable communication channel from the analog primitive we
@ -1549,11 +1536,11 @@ on the host.

 \subsection{Cryptographic security}
 \label{sec-crypto}
-% FIXME intro blurb
-
-From a protocol security perspective the system we are looking for can informally be modelled as consisting of three
-parties: the trusted \emph{transmitter}, one of a large number of untrusted \emph{receivers}, and an \emph{attacker}.
-These three play according to the following rules:
+Above the communication base layer elaborated in the previous section we have to layer a cryptographic protocol to
+ensure system security. We want to avoid a case where a third party could interfere with our system or even subvert this
+safety system itself for an attack.  From a protocol security perspective the system we are looking for can informally
+be modelled as consisting of three parties: the trusted \emph{transmitter}, one of a large number of untrusted
+\emph{receivers}, and an \emph{attacker}.  These three play according to the following rules:

 \begin{description}
    \item[Access.] Both transmitter and attacker can transmit any bit sequence.
@ -1623,7 +1610,6 @@ attacker would record the trigger transmission. We can assume most meters were r
 attacker cannot cause a significant number of additional resets immediately afterwards.  However, the attacker could
 wait several years for a number of new meters to be installed that might not yet have updated firmware that includes the
 last transmission. This means the attacker could cause them to reset by replaying the original sequence.
-% TODO mention why firmware has to be update with last transmission

 A possible mitigation for this risk would be to introduce one bit of information into the trigger message that is
 ignored by the replay protection mechanism.  This \emph{enable} bit would be $1$ for the actual reset trigger message.
@ -1817,7 +1803,7 @@ systems there is a large amount of academic research on such algorithms\cite{nar
 popular approach to these systems is to perform a Short-Time Fourier Transform (STFT) on ADC data sampled at high
 sampling rate (e.g. \SI{10}{\kilo\hertz}) and then perform analysis on the frequency-domain data to precisely locate the
 peak at \SI{50}{\hertz}. A key observation here is that FFT bin size is going to be much larger than required frequency
-resolution. This fundamental limitation follows from the Nyquist criterion %FIXME cite DSP text
+resolution. This fundamental limitation follows from the Nyquist criterion\cite{shannon01}
 and if we had to process an \emph{arbitrary} signal this would severely limit our practical measurement accuracy
 \footnote{
    Some software packages providing FFT or STFT primitives such as scipy\cite{virtanen01} allow the user to
@ -1861,8 +1847,6 @@ algorithm is worse than algorithms involving more complex models under some cond
 lunch} meaning that more complex algorithms perform worse when the input signal deviates from their models.

 \subsection{Frequency sensor hardware design}
-% FIXME: link to schematics in appendix
-% FIXME: include pics of finished board and device

 \label{sec-fsensor}
 Our safety reset controller will have to measure mains frequency to later demodulate a reset signal transmitted through
@ -1919,7 +1903,8 @@ isolation is accomplished with a pair of high speed optocouplers on its \texttt{
 signal processing is a simple voltage divider using high power resistors to get the required creepage along with some
 high frequency filter capacitors and an op-amp buffer. The power supply is an off-the-shelf mains-input power module.
 The system is implemented on a single two-layer PCB that is housed in an off-the-shelf industrial plastic case fitted
-with a printed label and a few status lights on its front.
+with a printed label and a few status lights on its front. The schematics of our system can be found in Appendix
+\ref{sec-app-freq-sens-schematics}.

 \subsection{Clock accuracy considerations}

@ -2429,8 +2414,6 @@ an example, the Honeywell REX2 uses a Maxim Integrated \texttt{71M6541} main app
 Texas Instruments \texttt{CC1000} series radio transceiver and is advertised to support both over-the-air firmware
 upgrade and a remotely accessible disconnect switch.

-% TODO add pics of the intact easymeter and of the one with the safety reset0r hooked up
-
 \begin{figure}
    \centering
    \begin{subfigure}{\textwidth}
@ -2538,8 +2521,6 @@ After extensive simulations and testing of the individual modules of our solutio
 experiment. We tried the demonstrator setup in Figure \ref{fig_proto_pic} using an emulated noisy DSSS signal in
 real-time. Our experiment went without any issues and the firmware implementation correctly reset the demonstrator's
 meter. We were happy to see that our extensive testing paid off: The demonstrator setup worked on its first try.
-% FIXME add pictures of the finished demo setup in action
-% FIXME maybe add an SER curve here?

 \section{Lessons learned}

@ -2646,6 +2627,7 @@ described in a single, concise document. The complicated side of standardization
 backend operation including key management, coördination and command authorization.

 \section{Regulatory adoption}
+\label{sec-regulation}

 Since the proposed system adds significant cost and development overhead at no immediate benefit to either consumer or
 utility company it is unlikely that it would be adopted voluntarily. Market forces limit what long-term planning utility
@ -2743,6 +2725,7 @@ public repository listed on the second page of this document.
 %\includenotebook{DSSS modulation experiments}{dsss_experiments-ber}

 \chapter{Frequency sensor schematics}
+\label{sec-app-freq-sens-schematics}
 \fancyhead[C]{Frequency sensor schematics (1/3)}
 \fancyfoot[C]{}
 \fancyhead[R]{\thepage}
@ -2782,12 +2765,10 @@ public repository listed on the second page of this document.
 \label{symbol_size_chart}
 \includepdf[fitpaper]{resources/safetyreset-symbol-sizes.pdf}

-% FIXME
+% TODO
 %\chapter{Economic viability of countermeasures}
 %\section{Attack cost}
 %\section{Countermeasure cost}
 %\section{Conclusion}

-% FIXME maybe include a standard for the technical side of a safety reset system here, e.g. in the style of an IETF draft?
-
 \end{document}