phd-thesis/chapter-smpc/chapter.tex

364 lines
27 KiB
TeX

\chapterquote{Moxie Marlinspike~\cite{marlinspikeWeShouldAll2013}, see also
\textcite{rogawayMoralCharacterCryptographic2015}}{
We can only desire based on what we know. It is our present experience of what we are and are not able to do that
largely determines our sense for what is possible. This is why same sex relationships, in violation of sodomy laws,
were a necessary precondition for the legalization of same sex marriage. This is also why those maintaining
positions of power will always encourage the freedom to talk about ideas, but never to act.}
\chaptertitle{Case Study: Multiparty Computation in Scalable Hardware Security Modules}
\label{chapter-smpc}
Inertial Hardware Security Modules do not only support much larger payloads compared to conventional HSMs, they also
support much higher power dissipation since they allow for direct air cooling of their payload. The tamper-sensing
membrane of a conventional HSM must be continuous to provide security, so any heat dissipated by the payload must pass
through it. Since the polymers used in tamper sensing membranes are poor conductors of heat, and since security benefits
from a thicker tamper sensing assembly (cf.\ Chapter~\ref{chapter-survey}), power dissipation in conventional HSMs is
limited~\cite{
petriePartIITechnical,
curetHardwareSecurityModule2025,
zhangTamperrespondentAssembliesPorous2023,
dragoneVentedTamperrespondentAssemblies2020}.
Because IHSMs rotate at high speed, IHSM meshes do not need to be contiguous to provide adequate security. While a
non-contiguous rotating mesh might theoretically allow a stationary attack tool to quickly penetrate, then retract
through one of the mesh's gaps while the mesh is rotating, the time available for such an attack would be too short for
a practical attack. For a mesh with three vertical connecting segments (cf.\ Figure~\ref{fig_proto_mesh} in
Chapter~\ref{chapter-ihsm}) rotating at \qty{1000}{\rpm}, this time would be in the order of \qty{20}{\milli\second}.
Conventional HSM monitoring circuits would likely require a similar amount of time to react to an
attack~\cite{obermaier2018}.
Similar to how the increase in payload \emph{sizew} of IHSMs compared to conventional HSMs unlocks new applications such
as the Quantum Key Distribution relay use case we presented in Chapter~\ref{chapter-qkd}, the increase in sustainable
power dissipation enabled by air cooling also unlocks a number of new applications. Especially applications that require
large amounts of computing power benefit from IHSM technology, as their needs fundamentally cannot be met by
conventional HSMs.
One such application that does not translate to conventional HSMs due to its need for large amounts of computing power
is Multiparty Computation (MPC). MPC is a cryptographic construct that allows several networked parties to jointly
perform a computation in such a way that the inputs to the computation remain private to the parties providing them, and
no single party must be trusted for the computation to produce the correct result. Conceptually, MPC is similar to a
secret sharing scheme that shares not just data, but computation between untrusted parties. The computation primitive
MPC offers is a cryptographic answer to the question of how to bootstrap trust in a computing system.
\todo{In this chapter, cite academic publications and patents on HSM cooling!}
%The most challenging scenarios in computing arise when multiple
%parties such as manufacturers and operators, servers and clients, or sellers and buyers need to interact through
%computation. In many practical situations, it is impossible to create a single computer that can be trusted by every
%participant. MPC is a generic solution to a multitude of such scenarios reducing the problem of creating a single,
%shared computer everyone can trust simultaneously to everyone creating their own computer that they only can trust.
We can deconstruct the problem of trust in computing into two largely disjunct parts: Establishing trust in a computing
system during its creation is one, and maintaining this trust throughout its life is the other. For the second part of
this problem, maintaining trust in a system once trusted, we have an ample supply of good methods such as encryption,
authentication, and formally proven protocols. In contrast, establishing trust in a computing system is largely
intractable and despite a large corpus of academic research on approaches such as hardware trojan detection and
physicaly unclonable functions, only two approaches find practical adoption: In one, we build the system ourselves from
the ground up, making sure to leave no part vulnerable to third-party compromise. In the other, we go to a store and
physically buy a randomly-chosen computer using cash, assuming that while an attacker can target any particular system,
they cannot target all systems simultaneously and we give them too little time to target the system we buy.
A limitation of both approaches is that in either case, while the party creating or acquiring the system can trust it,
they cannot prove its trustworthiness to other parties. MPC solves this issue by allowing every party to contribute
their trusted system to the protocol, cryptographically bootstrapping common trust in the computation and its
output\footnote{
In fact, MPC does more than just bootstrapping from each participant trusting their own system to a trusted shared
computation. In an MPC protocol providing semi-honest or better security, MPC even \emph{relaxes} each party's trust
requirement from trusting their own system to trusting that any $n$-of-$k$ out of all systems contributing to the
protocol.
}.
\section{Fast MPC and Slow HSMs}
MPC is a uniquely powerful cryptographic primitive, yet it has still not found widespread practical adoption. This is
because MPC is extremely resource-intensive to run. MPC protocols exist on a continuum trading off between extreme
memory and bandwidth requirements on one end and intense computational requirements on the other end. At a first glance,
MPC and Hardware Security Modules look like they would complement each other well, but HSMs cannot keep up with the
intense computational requirements posed by MPC.
Using P-256 curve ECC key generation as a benchmark, commercially available HSMs are quoted to perform between 3500 and
22000 cryptographic operations per second~\cite{
kumarIBMZ16Performance2025,
ThalesLunaNetwork2024,
Utrust_GP_HSM_Se_Series_Datasheet_ENpdf,
}. Meanwhile, an MPC protocol doing something as simple as a single AES encryption, corresponding to 7000 logic
gates~\cite{wangGlobalScaleSecureMultiparty2017}, requires tens of thousands of cryptographic operations when performed
in MPC. As a result, applying conventional HSMs to MPC at any practical scale is infeasible by multiple orders of
magnitude. Literature on MPC commonly uses server hardware as a platform for benchmarks.
HSMs are slow compared to contemporary computers because they are limited in their power dissipation, and power
dissipation is largely proportional to processing speed. In the limited fields where HSMs have found commercial
application, this limitation was never considered important and market forces pushing towards faster HSMs remain
light\todo{Can we find a citation here?}. Fundamentally, conventional HSMs must envelope the entire payload in a tamper
sensing mesh to detect drilling attacks, but a tamper sensing mesh that is impermeable to a drill is also impermeable to
air. As a result, any heat conducted from the HSMs processor to the outside world must pass through the mesh. At the
same time, the mesh cannot be thinned either because thinning it would enable micro-drilling attacks. The result of
these constraints is a high thermal resistance between the HSM's processor and an external heat sink, which limits
maximum power dissipation to a fraction of what is achieved in modern CPUs or even GPUs.
A secondary limitation of conventional HSMs is that the highly specialized tamper sensing foils used in their
construction often cannot be scaled to arbitray sizes without incurring unsustainable process yields due to the
multiplication of error rates with increasing area. As a result, even if the heat dissipation problem could be solved,
manufacturing the tamper sensing foil for a conventional HSM large enough to contain a more powerful CPU might not be
possible. The HSM's tamper-sensing envelope would have to protect not only the CPU itself, but also its supporting
components such as memory, power supplies and any internal heat spreading components.
Inertial HSMs solve this issue since they allow their payload to be air cooled without compromising security, and they
expand the feasible security boundary size from the several hundred milliliters offered by conventional HSMs to several
liters and more, enabling the integration of standard, off-the-shelf server components such as mainboards, CPUs, CPU
coolers, and power supplies. In this chapter, we will first provide a short overview of the theory of MPC before
elaborating a design of an IHSM tailored to MPC tasks including performance calculations and unique design aspects. We
will conclude with an outlook of applications unlocked by our design as well as promising areas for future improvements
of our design.
\section{The Fundamentals of Multiparty Computation}
Secure Multiparty Computation can be separated into two broad classes of approaches: Garbled Circuits, and Secret
Sharing-based techniques. Garbled Circuit techniques model the computation as a circuit of binary logic components such
as logic gates. They are well-suited for implementing cryptographic primitives such as conventional symmetric ciphers
such as AES or hash functions such as the SHA-2 series. Secret Sharing-based techniques model computation as an
arithmetic circuit made from components such as arithmetic operations. While they can also work in binary, they often
support operations on larger finite fields. Secret sharing-based techniques are efficient processing integer numbers,
but can have higher overhead in processing using many bitwise operations such as ciphers or cryptographic hash
functions\cite{evansPragmaticIntroductionSecure}.
\subsection{Security Models in MPC}
MPC schemes are usually evaluated assuming one of three adversary levels: \emph{Semi-Honest}, \emph{Covert} or
\emph{Malicious} adversaries. A \emph{Semi-Honest} adversary is an adversary that follows the protocol as specified, but
that outside the protocol's execution may collude arbitrarily with other parties to reveal the secret inputs of other
parties. A \emph{Covert} adversary is an adversary that additionally may cheat during the protocol's execution, but only
in ways that cannot be detected by other parties. Finally, a \emph{Malicious} adversary is one that can deviate from
the protocol's execution arbitrarily~\cite{aumannSecurityCovertAdversaries2010}. The covert adversary model most closely
captures the requirements of a real-world scenario where a small number of cooperating parties runs the protocol, since
in such settings cheating parties can easily be excluded once identified. The malicious adversary model captures
real-world settings where parties do not have stable identities such as peer-to-peer settings. The semi-honest model is
mostly interesting as a research tool since protocols assuming a semi-honest adversary can often be upgraded to covert
or malicious security at some performance tradeoff. In a practical setting, a semi-honest secure MPC protocol would not
provide additional security over just having one party run the computation except in some situations where inadvertent
side-channel leakage is a concern.
\subsection{Oblivious Transfer}
Before we can go into details of multiparty computation, we need to define an important primitive. Oblivious Transfer is
a cryptographic protocol between two parties that enables one party, the \emph{Sender}, to share one of two values to
the other party, the \emph{Receiver}. The Receiver can choose which of the two values it wants to receive by inputting a
choice bit $b$ into the protocol. After the protocol has been executed, the Receiver learns \emph{only} its chosen
value, and learns nothing about the other input value that the Sender provided. Meanwhile, the Sender learns nothing
about the choice bit $b$ of the receiver.
\subsubsection{OT extensions}
Oblivious Transfer is a public-key primitive, requiring asymmetric cryptographic operations. For this reason, ``raw''
Oblivious Transfer is not particularly fast. In practice, this issue is solved by applying a technique called Oblivious
Transfer Extensions (OTe)\cite{ishaiExtendingObliviousTransfers2003}. Using OTe, a fixed, small number of public-key
base Oblivious Transfer instances can be extended into an arbitrarily large number of Oblivious Transfer instances using
only invocations of a pseudo-random function (PRF) such as a cryptographic hash function.
\subsection{Boolean MPC with Yao's Garbled Circuits}
% Yao's Garbled Circuits
Yao's Garbled Circuits (GC) protocol is one of the oldest Multiparty Computation protocols, dating back to the 1980ies.
In Yao's GC, two parties jointly compute a function that is represented as a circuit of binary logic gates by evaluating
the circuit gate by gate. In Yao's GC, one party, generator, creates a random \emph{garbled} representation of the
circuit and sends it to the other party, the evaluator, who computes its output. The core idea in Yao's GC is that every
wire $w_i$ in the circuit is assigned two random cryptographic secret keys $w_i^b$, called wire labels, one $w_i^0$ for
the logical value $0$ and one $w_i^1$ for the value $1$. The mapping from logic values to these keys is assigned
randomly by the generator, and unknown to the evaluator~\cite{
yaoHowGenerateExchange1986,
beaverComplexitySecureProtocols1990,
evansPragmaticIntroductionSecure,
}.
Gates are represented in Yao's GC as truth tables with one row for every combination of input wire values. Each row of
these truth tables contains the output wire label (i.e. secret key) corresponding to the gate's logical output value for
the row's combination of input values. The output wire label in each row is encrypted with \emph{both} input wire labels
corresponding to this row as keys.
The generator must indicate to the evaluator which row of a gate's truth table to decrypt, while also avoiding leaking
the logical value of the output wire to the evaluator. This is commonly done in a technique called
\emph{point-and-permute} where a random pointer bit $p_i^b$ is appended to each wire $w_i^b$ label such that $p_i^b =
\neg p_i^{\neg b}$. The rows in the gate's truth table are ordered according to the combination of the two input wire
labels' pointer bits. When the evaluator obtains the two input wire labels, they obtain their pointer bits, which
combined are the index of the row to decrypt in the following gate's truth table.
It is clear how the protocol described above can be used to compute any binary circuit, but there are two questions
remaining: How do the two parties provide input into the circuit, and how do they decode the circuit's output? Output is
handled in Yao's GC by creating an output decoding table for every output wire of the circuit. The output decoding table
contains two rows, one for a logical $0$ output value and one for a logical $1$ output value. Each row contains the hash
of the output wire's label corresponding to the row's logical output value. This way, the evaluator can identify the
logical value knowing the output wire label, but is unable to deduce the output wire label from the output decoding
table.
Inputs are a bit more difficult to handle. While the generator can easily provide secret inputs by simply providing the
evaluator with the input wire labels corresponding to its input, inputs from the evaluator require oblivious transfer to
avoid leaking the evaluator's input to the generator. To input the logic bit $b$ on wire $w_i$, the generator and the
evaluator perform an 1-out-of-2 oblivious transfer with the generator assuming the Sender role and providing the two
input wire labels $w_i^0$ and $w_i^1$ as the two choices, and the evaluator submitting its chosen input bit $b$ as the
OT's choice bit.
Yao's GC has good performance since the only asymmetric cryptographic primitive used is in the Oblivious Transfer needed
for the evaluator's hidden inputs. The generation and the evaluation of the garbled circuit itself both only require
evaluations of a pseudorandom function such as a cryptographic hash or a cipher. Still, performing a computation using a
Garbled Circuit is many times slower than performing it in the clear. Intuitively, each single-bit gate in the garbled
circuit results in several cryptographic operations with input and output sizes of dozens or hundreds of bits.
Practically useful functions such as AES encryption have circuit implementations measuring thousands or tens of
thousands of gates, meaning these costs quickly escalate for practical problem sizes~\cite{
boyarNewCombinationalLogic2010,
songhoriTinyGarbleHighlyCompressed2015,
}.
% FIXME This entire connecting section
%\subsection{Practical Application}
%\subsubsection{Preprocessing and Online Phases}
%\subsubsection{Constant-Round MPC}
% \subsection{Performance}
% zahurTwoHalvesMake2015,wangGlobalScaleSecureMultiparty2017,kellerMPSPDZVersatileFramework2020,dalskovFantasticFourHonestMajority
% \subsection{Practical Deployments}
% \subsection{Solutions}
% \subsection{Hardware Security Applied to MPC}
% Hardware security primitives can be applied in several roles in an MPC protocol.
\section{A High-Performance IHSM for MPC Applications}
Multiparty Computation is at the verge of being practical in some applications, but is still too computationally
expensive for others. While some attempts at GPU-accelerating MPC primitives exist, in practice it is commonly
implemented using CPU processing. The technology comes with an unavoidable increase in computational complexity since
each single plaintext computation or gate results in several cryptographic operations.
A naive implementation might attempt to implement MPC using an HSM by simply offloading all cryptographic operations to
the HSM. In practice, this is not a workable solution due to the slow processing speed of conventional HSMs.
Conventional HSMs use low-power embedded processors since their encapsulation using potting and security meshes impedes
heat transfer, limiting power dissipation.
In the near term, absent radical developments in either MPC theory or in the speed and power efficiency of processing
hardware, the only feasible solution for HSM-protected MPC at any practical scale is to find a way to protect an entire
server-class computer. As elaborated above, IHSMs are a natural fit for this requirement since they allow for large,
air-cooled payloads.
%\subsection{Hardware Requirements}
As a baseline performance target, we consider a commodity server mainboard in CEB or ATX form factor, populated with a
high-end server CPU and a large amount of RAM. As MPC systems do not usually require a great amount of storage, we can
largely ignore storage for our size and power calculations.\todo{Refer to performance numbers from research above here}
As a result, we end up with a total maximum power dissipation of approximately \qty{420}{\watt} as shown in
Table~\ref{tab_power_budget}. Dissipating this amount of power using air cooling is within the capabilities of commodity
server cooling components~\cite{coroamaPossibleFutureTrends2025}.
\begin{table}
\centering
\begin{tabular}{r|l|r|r}
Count & Component & Power Dissipation (approx.) & Total\\\hline
1 & CPU & \qty{350}{\watt}~\cite{tropgen16YearsSPEC2024}&\qty{350}{\watt}\\
16 & Memory~\cite{kennedyDDR4DIMMsSystem2017} &\qty{2}{\watt}&\qty{32}{\watt}\\
1 & Losses & \qty{40}{\watt}&\qty{40}{\watt}\\
\end{tabular}
\caption[Power budget of a modern mid-range server.]{Power budget of a modern mid-range server. Losses were
estimated at 10\%, consistent with mainboard losses plus losses from a 80plus platinum efficiency certified
power supply (~94\% at load).}
\label{tab_power_budget}
\end{table}
A common type of side-channel attack on cryptographic systems are power analysis attacks. In such attacks, the supply
current of the target processing system is measured at high speed while the target is performing cryptographic
computations. By aggregating the results of a large number of the resulting power traces, it is often possible to infer
the value of secret data such as cryptographic keys. To mitigate this type of attack, not only do we have to place the
CPU, mainboard, and memory inside of the HSM's tamper-sensing barrier, but also the power supply. A secondary benefit of
placing the power supply inside the tamper-sensing barrier is that it simplifies the power wiring between the outside of
the IHSM cage and the payload. Supplying the \qty{12}{\volt} power rails that commodity mainboard commonly use requires
tens of Ampere. To carry such high current, the wiring has to be sized accordingly. In an IHSM, even thick wires can
easily be passed through the mesh cage, but such wiring requires a large opening at the shaft on one end of the cage,
which creates a literal security gap. Placing the power supply inside of the cage reduces the size of the wires needed
since the power supply steps down a lower current \qty{240}{\volt} input to the system's high-current \qty{12}{\volt}
rails. According to DIN VDE 0298-4\todo{Citation?}, a pair of \qty{1.5}{\milli\meter^2} conductors is sufficient for
more than \qty{3}{\kilo\watt} of load under worst-case conditions.
\subsection{Software Considerations}
While the hardware of a HSM-assisted MPC system is a straightforward application of IHSM technology to a server
platform, the software implementation poses some unique challenges. A core concern in an IHSM based on commodity
hardware running a commodity operating system is the concrete implementation of the IHSM's alarm reseponse. When the
IHSM detects tampering, it is crucial that all secrets in the payload have been made unusable before an attacker can
either extract them, or stop the system from making them unusable.
Making secret data unusable to an attacker can be done by either deleting it (\emph{zeroization} in HSM terminology) or
by encrypting the data when it is stored and destroying the key. Zeroization is more technically challenging, while
encryption comes at a performance cost. The main challenges in zeroization are ensuring that the data is deleted fast
enough, and making sure it cannot be reconstructed through data remanence effects. Zeroization is practical for small
amounts of RAM such as a microcontroller's main SRAM or a small amount of DRAM in a server set aside for cryptographic
keys. For larger amounts of data, or for any data stored on flash memory, HDDs etc., encryption is the only viable
option due to speed. Encrypting data at rest on HDD or flash storage is straightforward to do in software and is a
standard feature in any modern operating system. RAM is more challenging. Encrypting data in RAM cannot be done in
software without effectively running a system emulation and incurring a massive performance penalty. Recent CPUs from
Intel and AMD contain hardware features that provide transparent DRAM encryption. These hardware features would be
necessary when securing an entire sever in an MPC setup with IHSMS technology.
% \subsection{Fast Zeroization of Non-Customizable Memories}
% Thermite experiements and paper
\subsection{A Joint Cooling and IHSM Envelope Powertrain}
In this section, we will present a sketch of a design for an IHSM envelope large enough to fit a small server mainboard,
and that provides air cooling to the payload. Our sketch solves the engineering issue of moving such an IHSM envelope
while simultaneously providing cooling to the payload.
\begin{figure}
\centering
\begin{subfigure}{0.45\textwidth}
\centering
\includegraphics[width=\textwidth]{setup_0001.jpg}
\caption{}
\label{fig_setup_left}
\end{subfigure}
\hspace*{5mm}
\begin{subfigure}{0.45\textwidth}
\centering
\includegraphics[width=\textwidth]{setup_0002.jpg}
\caption{}
\label{fig_setup_right}
\end{subfigure}
\caption{Conceptual demonstrator of the fan-driven IHSM primary mesh approach.}
\label{fig_setup}
\end{figure}
Our proposed design is based on the idea of using the cooling fans' airflow to power the rotation of the IHSM envelope.
Figure~\ref{fig_setup} shows a conceptual demonstration of this concept. Using the basic cylindrical design, the IHSM
envelope consists of two discs above and below the payload that are connected through vertical struts containing part of
the tamper-sensing mesh on the outside of the payload. We propose widening these vertical connecting struts, and angling
them such that the entire envelope becomes a centrifugal impeller. By letting air flow into the envelope from the side,
and back out through its top and bottom, the envelope assumes the same configuration used in centrifugal cooling fans.
Laying out an IHSM this way has several advantages. First, we save some vertical space by removing the motor from the
shaft of the mesh. Second, on top of driving the mesh, the airflow also serves to cool the payload. Second, this
approach eliminates the motor driving the mesh as a single point of failure. In a basic IHSM design as we introduced it
in Chapter~\ref{chapter-ihsm}, this motor is a critical component as it failing would lead to the mesh accelerometer
triggering the tamper alarm. Using a brushless motor type the numebr of wear components in this motor can be reduced to
the motor's shaft bearings. A complication in the practical manufacturing of IHSMs at a small scale is that small-scale
production does not allow for a custom-made motor. Limiting the selection to off-the-shelf brushless motors leads to an
unpredictability of bearing life since precise bearing specifications are not usually included in motor datasheets.
Compared to the market for off-the-shelf small brushless motors, cooling fans are easier to shop for. A large selection
of products with various form factors and specifications is available, and manufacturers usually give detailed
information on both performance and lifetime. For industrial and server cooling fans, uninterrupted 24/7 operation is
their nominal operating condition.
The cooling fans can be located on the outside of the envelope in an easily accessible location, and can be set up in a
redundant way such that a failed cooling fan can be replaced while the system continues operation.
The main drawback of a fan-driven IHSM is the necessary airflow. To maximize payload volume, the fan blades must be kept
as narrow as possible. Narrow fan blades work best at high air speed, but high air speed requires the fan to have high
airflow. Besides limiting fan selection and increasing power consumption, high airflow fans also are noisy. Despite
these limitations, we consider fan-driven IHSMs a valid tradeoff since such a system would most likely be deployed in a
datacenter where high noise levels are acceptable.
\section{Outlook}
In this chapter we briefly introduced the challenges raised by MPC at scale, and we outlined a practical solution based
on an IHSM-protected sever that can be used to perform MPC with an unique combination of high bandwidth and low latency
that was previously considered impractical due to physical security concerns.