\chapterquote{Moxie Marlinspike~\cite{marlinspikeWeShouldAll2013}, see also \textcite{rogawayMoralCharacterCryptographic2015}}{ We can only desire based on what we know. It is our present experience of what we are and are not able to do that largely determines our sense for what is possible. This is why same sex relationships, in violation of sodomy laws, were a necessary precondition for the legalization of same sex marriage. This is also why those maintaining positions of power will always encourage the freedom to talk about ideas, but never to act.} \chaptertitle{Case Study: Multiparty Computation in Scalable Hardware Security Modules} Inertial Hardware Security Modules do not only support much larger payloads compared to conventional HSMs, they also support much higher power dissipation since they allow for direct air cooling of their payload. Because they rotate at high speed, IHSM meshes do not need to be contiguous to provide adequate security. While a non-contiuous rotating mesh might theoretically allow a stationary attack tool to quickly penetrate, then retract through one of the mesh's gaps while the mesh is rotating, the time available for such an attack would be too short for a practical attack. For a mesh with three vertical connecting segments (cf.\ Figure~\ref{fig_proto_mesh} in Chapter~\ref{chapter-ihsm}) rotating at \qty{1000}{\rpm}, this time would be in the order of \qty{20}{\milli\second}. Conventional HSM monitoring circuits often require a similar amount of time to react to an attack~\cite{obermaier2018}. Similar to how the increase in payload \emph{size} unlocks new applications such as the Quantum Key Distribution relay use case we presented in Chapter~\ref{chapter-qkd}, this increase in sustainable power dissipation by a factor of several hundred also unlocks a number of new applications. Especially applications that require large amounts of computing power benefit from IHSM technology, as their needs fundamentally cannot be met by conventional HSMs. One such application that does not translate to conventional HSMs due to its need for large amounts of computing power is Multiparty Computation (MPC). MPC is a cryptographic construct that allows several networked parties to jointly perform a computation in such a way that the inputs to the computation remain private to the parties providing them, and no single party must be trusted for the computation to produce the correct result. Conceptually, MPC is similar to a secret sharing scheme that shares not just data, but computation between untrusted parties. The computation primitive MPC offers is a cryptographic answer to the question of how to bootstrap trust in a computing system. \todo{In this chapter, cite academic publications and patents on HSM cooling!} %The most challenging scenarios in computing arise when multiple %parties such as manufacturers and operators, servers and clients, or sellers and buyers need to interact through %computation. In many practical situations, it is impossible to create a single computer that can be trusted by every %participant. MPC is a generic solution to a multitude of such scenarios reducing the problem of creating a single, %shared computer everyone can trust simultaneously to everyone creating their own computer that they only can trust. We can deconstruct the problem of trust in computing into two largely disjunct parts: Establishing trust in a computing system during its creation is one, and maintaining this trust throughout its life is the other. For the second part of this problem, maintaining trust in a system once trusted, we have an ample supply of good methods such as encryption, authentication, and formally proven protocols. In contrast, establishing trust in a computing system is largely intractable and despite a large corpus of academic research on approaches such as hardware trojan detection and physicaly unclonable functions, only two approaches find practical adoption: In one, we build the system ourselves from the ground up, making sure to leave no part vulnerable to third-party compromise. In the other, we go to a store and physically buy a randomly-chosen computer using cash, assuming that while an attacker can target any particular system, they cannot target all systems simultaneously and we give them too little time to target the system we buy. A limitation of both approaches is that in either case, while the party creating or acquiring the system can trust it, they cannot prove its trustworthiness to other parties. MPC solves this issue by allowing every party to contribute their trusted system to the protocol, cryptographically bootstrapping common trust in the computation and its output\footnote{ In fact, MPC does more than just bootstrapping from each participant trusting their own system to a trusted shared computation. In an MPC protocol providing semi-honest or better security, MPC even \emph{relaxes} each party's trust requirement from trusting their own system to trusting that any $n$-of-$k$ out of all systems contributing to the protocol. }. \section{Fast MPC and Slow HSMs} MPC is a uniquely powerful cryptographic primitive, yet it has still not found widespread practical adoption. This is because MPC is extremely resource-intensive to run. MPC protocols exist on a continuum trading off between extreme memory and bandwidth requirements on one end and intense computational requirements on the other end. At a first glance, MPC and Hardware Security Modules look like they would complement each other well, but HSMs cannot keep up with the intense computational requirements posed by MPC. Using P-256 curve ECC key generation as a benchmark, commercially available HSMs are quoted to perform between 3500 and 22000 cryptographic operations per second~\cite{ kumarIBMZ16Performance2025, ThalesLunaNetwork2024, Utrust_GP_HSM_Se_Series_Datasheet_ENpdf, }. Meanwhile, an MPC protocol doing something as simple as a single AES encryption, corresponding to 7000 logic gates~\cite{wangGlobalScaleSecureMultiparty2017}, requires tens of thousands of cryptographic operations when performed in MPC. As a result, applying conventional HSMs to MPC at any practical scale is infeasible by multiple orders of magnitude. Literature on MPC commonly uses server hardware as a platform for benchmarks. HSMs are slow compared to contemporary computers because they are limited in their power dissipation, and power dissipation is largely proportional to processing speed. In the limited fields where HSMs have found commercial application, this limitation was never considered important and market forces pushing towards faster HSMs remain light\todo{Can we find a citation here?}. Fundamentally, conventional HSMs must envelope the entire payload in a tamper sensing mesh to detect drilling attacks, but a tamper sensing mesh that is impermeable to a drill is also impermeable to air. As a result, any heat conducted from the HSMs processor to the outside world must pass through the mesh. At the same time, the mesh cannot be thinned either because thinning it would enable micro-drilling attacks. The result of these constraints is a high thermal resistance between the HSM's processor and an external heat sink, which limits maximum power dissipation to a fraction of what is achieved in modern CPUs or even GPUs. A secondary limitation of conventional HSMs is that the highly specialized tamper sensing foils used in their construction often cannot be scaled to arbitray sizes without incurring unsustainable process yields due to the multiplication of error rates with increasing area. As a result, even if the heat dissipation problem could be solved, manufacturing the tamper sensing foil for a conventional HSM large enough to contain a more powerful CPU might not be possible. The HSM's tamper-sensing envelope would have to protect not only the CPU itself, but also its supporting components such as memory, power supplies and any internal heat spreading components. Inertial HSMs solve this issue since they allow their payload to be air cooled without compromising security, and they expand the feasible security boundary size from the several hundred milliliters offered by conventional HSMs to several liters and more, enabling the integration of standard, off-the-shelf server components such as mainboards, CPUs, CPU coolers, and power supplies. In this chapter, we will first provide a short overview of the theory of MPC before elaborating a design of an IHSM tailored to MPC tasks including performance calculations and unique design aspects. We will conclude with an outlook of applications unlocked by our design as well as promising areas for future improvements of our design. \section{The Fundamentals of Multiparty Computation} Secure Multiparty Computation can be separated into two broad classes of approaches: Garbled Circuits, and Secret Sharing-based techniques. Garbled Circuit techniques model the computation as a circuit of binary logic components such as logic gates. They are well-suited for implementing cryptographic primitives such as conventional symmetric ciphers such as AES or hash functions such as the SHA-2 series. Secret Sharing-based techniques model computation as an arithmetic circuit made from components such as arithmetic operations. While they can also work in binary, they often support operations on larger finite fields. Secret sharing-based techniques are efficient processing integer numbers, but can have higher overhead in processing using many bitwise operations such as ciphers or cryptographic hash functions\cite{evansPragmaticIntroductionSecure}. \subsection{Security Models in MPC} MPC schemes are usually evaluated assuming one of three adversary levels: \emph{Semi-Honest}, \emph{Covert} or \emph{Malicious} adversaries. A \emph{Semi-Honest} adversary is an adversary that follows the protocol as specified, but that outside the protocol's execution may collude arbitrarily with other parties to reveal the secret inputs of other parties. A \emph{Covert} adversary is an adversary that additionally may cheat during the protocol's execution, but only in ways that cannot be detected by other parties. Finally, a \emph{Malicious} adversary is one that can deviate from the protocol's execution arbitrarily~\cite{aumannSecurityCovertAdversaries2010}. The covert adversary model most closely captures the requirements of a real-world scenario where a small number of cooperating parties runs the protocol, since in such settings cheating parties can easily be excluded once identified. The malicious adversary model captures real-world settings where parties do not have stable identities such as peer-to-peer settings. The semi-honest model is mostly interesting as a research tool since protocols assuming a semi-honest adversary can often be upgraded to covert or malicious security at some performance tradeoff. In a practical setting, a semi-honest secure MPC protocol would not provide additional security over just having one party run the computation except in some situations where inadvertent side-channel leakage is a concern. \subsection{Oblivious Transfer} Before we can go into details of multiparty computation, we need to define an important primitive. Oblivious Transfer is a cryptographic protocol between two parties that enables one party, the \emph{Sender}, to share one of two values to the other party, the \emph{Receiver}. The Receiver can choose which of the two values it wants to receive by inputting a choice bit $b$ into the protocol. After the protocol has been executed, the Receiver learns \emph{only} its chosen value, and learns nothing about the other input value that the Sender provided. Meanwhile, the Sender learns nothing about the choice bit $b$ of the receiver. \subsubsection{OT extensions} Oblivious Transfer is a public-key primitive, requiring asymmetric cryptographic operations. For this reason, ``raw'' Oblivious Transfer is not particularly fast. In practice, this issue is solved by applying a technique called Oblivious Transfer Extensions (OTe)\cite{ishaiExtendingObliviousTransfers2003}. Using OTe, a fixed, small number of public-key base Oblivious Transfer instances can be extended into an arbitrarily large number of Oblivious Transfer instances using only invocations of a pseudo-random function (PRF) such as a cryptographic hash function. \subsection{Boolean MPC with Yao's Garbled Circuits} % Yao's Garbled Circuits Yao's Garbled Circuits (GC) protocol is one of the oldest Multiparty Computation protocols, dating back to the 1980ies. In Yao's GC, two parties jointly compute a function that is represented as a circuit of binary logic gates by evaluating the circuit gate by gate. In Yao's GC, one party, generator, creates a random \emph{garbled} representation of the circuit and sends it to the other party, the evaluator, who computes its output. The core idea in Yao's GC is that every wire $w_i$ in the circuit is assigned two random cryptographic secret keys $w_i^b$, called wire labels, one $w_i^0$ for the logical value $0$ and one $w_i^1$ for the value $1$. The mapping from logic values to these keys is assigned randomly by the generator, and unknown to the evaluator~\cite{ yaoHowGenerateExchange1986, beaverComplexitySecureProtocols1990, evansPragmaticIntroductionSecure, }. Gates are represented in Yao's GC as truth tables with one row for every combination of input wire values. Each row of these truth tables contains the output wire label (i.e. secret key) corresponding to the gate's logical output value for the row's combination of input values. The output wire label in each row is encrypted with \emph{both} input wire labels corresponding to this row as keys. The generator must indicate to the evaluator which row of a gate's truth table to decrypt, while also avoiding leaking the logical value of the output wire to the evaluator. This is commonly done in a technique called \emph{point-and-permute} where a random pointer bit $p_i^b$ is appended to each wire $w_i^b$ label such that $p_i^b = \neg p_i^{\neg b}$. The rows in the gate's truth table are ordered according to the combination of the two input wire labels' pointer bits. When the evaluator obtains the two input wire labels, they obtain their pointer bits, which combined are the index of the row to decrypt in the following gate's truth table. It is clear how the protocol described above can be used to compute any binary circuit, but there are two questions remaining: How do the two parties provide input into the circuit, and how do they decode the circuit's output? Output is handled in Yao's GC by creating an output decoding table for every output wire of the circuit. The output decoding table contains two rows, one for a logical $0$ output value and one for a logical $1$ output value. Each row contains the hash of the output wire's label corresponding to the row's logical output value. This way, the evaluator can identify the logical value knowing the output wire label, but is unable to deduce the output wire label from the output decoding table. Inputs are a bit more difficult to handle. While the generator can easily provide secret inputs by simply providing the evaluator with the input wire labels corresponding to its input, inputs from the evaluator require oblivious transfer to avoid leaking the evaluator's input to the generator. To input the logic bit $b$ on wire $w_i$, the generator and the evaluator perform an 1-out-of-2 oblivious transfer with the generator assuming the Sender role and providing the two input wire labels $w_i^0$ and $w_i^1$ as the two choices, and the evaluator submitting its chosen input bit $b$ as the OT's choice bit. Yao's GC has good performance since the only asymmetric cryptographic primitive used is in the Oblivious Transfer needed for the evaluator's hidden inputs. The generation and the evaluation of the garbled circuit itself both only require evaluations of a pseudorandom function such as a cryptographic hash or a cipher. Still, performing a computation using a Garbled Circuit is many times slower than performing it in the clear. Intuitively, each single-bit gate in the garbled circuit results in several cryptographic operations with input and output sizes of dozens or hundreds of bits. Practically useful functions such as AES encryption have circuit implementations measuring thousands or tens of thousands of gates, meaning these costs quickly escalate for practical problem sizes~\cite{ boyarNewCombinationalLogic2010, songhoriTinyGarbleHighlyCompressed2015, }. % FIXME This entire connecting section %\subsection{Practical Application} %\subsubsection{Preprocessing and Online Phases} %\subsubsection{Constant-Round MPC} % \subsection{Performance} % zahurTwoHalvesMake2015,wangGlobalScaleSecureMultiparty2017,kellerMPSPDZVersatileFramework2020,dalskovFantasticFourHonestMajority % \subsection{Practical Deployments} % \subsection{Solutions} % \subsection{Hardware Security Applied to MPC} % Hardware security primitives can be applied in several roles in an MPC protocol. \section{A High-Performance IHSM for MPC Applications} Multiparty Computation is at the verge of being practical in some applications, but is still too computationally expensive for others. While some attempts at GPU-accelerating MPC primitives exist, in practice it is commonly implemented using CPU processing. The technology comes with an unavoidable increase in computational complexity since each single plaintext computation or gate results in several cryptographic operations. A naive implementation might attempt to implement MPC using an HSM by simply offloading all cryptographic operations to the HSM. In practice, this is not a workable solution due to the slow processing speed of conventional HSMs. Conventional HSMs use low-power embedded processors since their encapsulation using potting and security meshes impedes heat transfer, limiting power dissipation. In the near term, absent radical developments in either MPC theory or in the speed and power efficiency of processing hardware, the only feasible solution for HSM-protected MPC at any practical scale is to find a way to protect an entire server-class computer. As elaborated above, IHSMs are a natural fit for this requirement since they allow for large, air-cooled payloads. %\subsection{Hardware Requirements} As a baseline performance target, we consider a commodity server mainboard in CEB or ATX form factor, populated with a high-end server CPU and a large amount of RAM. As MPC systems do not usually require a great amount of storage, we can largely ignore storage for our size and power calculations.\todo{Refer to performance numbers from research above here} As a result, we end up with a total maximum power dissipation of approximately \qty{420}{\watt} as shown in Table~\ref{tab_power_budget}. Dissipating this amount of power using air cooling is within the capabilities of commodity server cooling components~\cite{coroamaPossibleFutureTrends2025}. \begin{table} \centering \begin{tabular}{r|l|r|r} Count & Component & Power Dissipation (approx.) & Total\\\hline 1 & CPU & \qty{350}{\watt}~\cite{tropgen16YearsSPEC2024}&\qty{350}{\watt}\\ 16 & Memory~\cite{kennedyDDR4DIMMsSystem2017} &\qty{2}{\watt}&\qty{32}{\watt}\\ 1 & Losses & \qty{40}{\watt}&\qty{40}{\watt}\\ \end{tabular} \caption[Power budget of a modern mid-range server.]{Power budget of a modern mid-range server. Losses were estimated at 10\%, consistent with mainboard losses plus losses from a 80plus platinum efficiency certified power supply (~94\% at load).} \label{tab_power_budget} \end{table} A common type of side-channel attack on cryptographic systems are power analysis attacks. In such attacks, the supply current of the target processing system is measured at high speed while the target is performing cryptographic computations. By aggregating the results of a large number of the resulting power traces, it is often possible to infer the value of secret data such as cryptographic keys. To mitigate this type of attack, not only do we have to place the CPU, mainboard, and memory inside of the HSM's tamper-sensing barrier, but also the power supply. A secondary benefit of placing the power supply inside the tamper-sensing barrier is that it simplifies the power wiring between the outside of the IHSM cage and the payload. Supplying the \qty{12}{\volt} power rails that commodity mainboard commonly use requires tens of Ampere. To carrie such high current, the wiring has to be sized accordingly. In an IHSM, even thick wires can easily be passed through the mesh cage, but such wiring requires a large opening at the shaft on one end of the cage, which creates a literal security gap. Placing the power supply inside of the cage reduces the size of the wires needed since the power supply steps down a lower current \qty{240}{\volt} input to the system's high-current \qty{12}{\volt} rails. According to DIN VDE 0298-4\todo{Citation?}, a pair of \qty{1.5}{\milli\meter^2} conductors is sufficient for more than \qty{3}{\kilo\watt} of load under worst-case conditions. \subsection{Software Considerations} While the hardware of a HSM-assisted MPC system is a straightforward application of IHSM technology to a server platform, the software implementation poses some unique challenges. A core concern in an IHSM based on commodity hardware running a commodity operating system is the concrete implementation of the IHSM's alarm reseponse. When the IHSM detects tampering, it is crucial that all secrets in the payload have been made unusable before an attacker can either extract them, or stop the system from making them unusable. Making secret data unusable to an attacker can be done by either deleting it (\emph{zeroization} in HSM terminology) or by encrypting the data when it is stored and destroying the key. Zeroization is more technically challenging, while encryption comes at a performance cost. The main challenges in zeroization are ensuring that the data is deleted fast enough, and making sure it cannot be reconstructed through data remanence effects. Zeroization is practical for small amounts of RAM such as a microcontroller's main SRAM or a small amount of DRAM in a server set aside for cryptographic keys. For larger amounts of data, or for any data stored on flash memory, HDDs etc., encryption is the only viable option due to speed. Encrypting data at rest on HDD or flash storage is straightforward to do in software and is a standard feature in any modern operating system. RAM is more challenging. Encrypting data in RAM cannot be done in software without effectively running a system emulation and incurring a massive performance penalty. Recent CPUs from Intel and AMD contain hardware features that provide transparent DRAM encryption. These hardware features would be necessary when securing an entire sever in an MPC setup with IHSMS technology. % \subsection{Fast Zeroization of Non-Customizable Memories} % Thermite experiements and paper \subsection{A Joint Cooling and IHSM Envelope Powertrain} In this section, we will present a sketch of a design for an IHSM envelope large enough to fit a small server mainboard, and that provides air cooling to the payload. Our sketch solves the engineering issue of moving such an IHSM envelope while simultaneously providing cooling to the payload. % FIXME picture! Our proposed design is based on the idea of using the cooling fans' airflow to power the rotation of the IHSM envelope. Using the basic cylindrical design, the IHSM envelope consists of two discs above and below the payload that are connected through vertical struts containing part of the tamper-sensing mesh on the outside of the payload. We propose widening these vertical connecting struts, and angling them such that the entire envelope becomes a centrifugal impeller. By letting air flow into the envelope from the side, and back out through its top and bottom, the envelope assumes the same configuration used in centrifugal cooling fans. A secondary advantage of this concept is that we do not need a motor on the envelope's shaft, saving vertical space and one difficult to source part. Furthermore, the cooling fans can be located on the outside of the envelope in an easily accessible location, and can be set up in a redundant way such that a failed cooling fan can be replaced while the system continues operation. The only disadvantage of this solution over a direct motor drive is noise. To achieve the speed necessary for sufficient security at the large envelope diameter of an MPC accelerator application, high-airflow fans must be used, which are very noisy when at full speed. We consider this a valid tradeoff since such a system would be deployed in a datacenter where high noise levels are acceptable. \todo{Finish sketch!} \section{Outlook} In this chapter we briefly introduced the challenges raised by MPC at scale, and we outlined a practical solution based on an IHSM-protected sever that can be used to perform MPC with an unique combination of high bandwidth and low latency that was previously considered impractical due to physical security concerns.