

\chapterquote{attributed to Grace Hopper\cite{
    WikiQuoteGraceHopper,
    QuoteOriginMost2014,
}}{
    The most dangerous phrase in the language is ``We've always done it this way!''
}

\chaptertitle{The German ePA: A Motivating Counter-Example}
\label{chapter-epa}

\todo{FIXME: Proper citation here}
\sourceattrib{This part is based on a short paper written and presented by Jan Sebastian Götte at the HS3 workshop at
ESORICS 2025.}
Looking at the landscape of computer security solutions, we are presented with a wide variety of vendors and products
that may give the impression that hardware security is a solved problem. Vendors make claims ranging from
\emph{``You don't need hardware security, just do it in the cloud!''}~\cite{
    utimacoWhatCloudHSM2025,
    microsoftOverviewAzureCloud,
    ibmCloudHSM2016,
    amazonAWSCloudHSM,
    googleCloudHSMCloud2025,
    WhatCloudHSM}
to \emph{``Buy our HSM and you will be secure!''}~\cite{utimacoUseCases,thalesLunaNetworkHardware}. In
practice, things are not that simple, and even well-intentioned projects often go awry when it comes to hardware
security. To motivate the research into physical security presented in this thesis, in this chapter we examine one such
project that was carried out by capable people with the best intentions, yet resulted in a hardware security design that
is dangerously inadequate for its purpose.

In May 2025, after several delays, Germany started the nation-scale rollout of its new electronic medical
record system, named ePA (short for \emph{elektronische Patientenakte}, ``electronic patient
record'')~\cite{kochNochVieleUnklarheiten2025}. The system aims to create a national database, accessible to all
healthcare providers, that holds the complete electronic medical records of all publicly insured people living in
Germany. It is intended to replace error-prone paper-based workflows under which healthcare providers often have access
to only a subset of a patient's medical records. Data in scope for the system includes medical letters, laboratory
results, and medical imaging files.

Due to Germany's mandatory health insurance laws, the system's user base encompasses the majority of German residents,
approximately 90\%. People who have opted for private instead of public health insurance are, as of now, not subject to
the system. In Germany, private health insurance is by law only available to people in the top decile of household
income. This means that the system disproportionately affects people with low incomes, creating an equity issue. While
it is possible to opt out of the new digital record, the opt-out process is difficult. Additionally, the government and
health insurance providers have publicly presented the system in a one-sidedly positive light, making it unlikely that
the majority of people subject to the system have the comprehensive understanding of its benefits and risks that an
informed decision would require.

While civil society organizations such as the digital rights nonprofit Chaos Computer Club (CCC)
\cite{kochMoreMoreExperts2025} have loudly criticized the system's security, and several severe security flaws have been
demonstrated in practice, this criticism has largely been ignored by the responsible political bodies. We observe that
despite this outcry from civil society and the system's large scale, it has received little attention from the academic
cryptography and information security community.

In this chapter, we highlight some unconventional cryptographic engineering decisions in the system. In particular, we
point out that the system's core per-user secrets are kept in a rudimentary key escrow system whose security rests on
engineering assumptions rather than on cryptographic principles. Furthermore, we observe that according to the
specification, the individual user keys of the system are derived from a per-user cleartext salt and a system-wide
long-term secret with only 256 bits of entropy\footnote{
    In previous versions of the standard \cite{
        gematikSpezifikationSchluesselgenerierungsdienstEPA2023,
        gematikUebergreifendeSpezifikationVerwendung2025,
    }, there were two escrow services whose keys were applied in layers to mitigate the risk posed by a compromise of
    either one. The current standard only requires one escrow service, and lowers the entropy requirement for the root
    keys from 512 bits to 256 bits. The apparent reason for the long-term nature of these keys is that they are updated
    manually.
}. Finally, we note that according to the specification, the only physical security requirement for the protection of
this highly sensitive secret is a ``hard, opaque potting material'', with no tamper detection or response required.

We base our analysis of the ePA on the latest versions of the system's publicly available standards as of April 2025,
when the paper underlying this chapter was written. These standards describe version 3.0 of the healthcare record
system \cite{
    gematikSpezifikationAktensystemEPA2025,
    gematikUbergreifendeSpezifikationVerwendung2024,
}. We note that the implementation could, in principle, deviate from these standards and be more secure than they
require. However, the reference implementation provided by the specification authority \cite{GithubRepositoryERPFD}
closely follows the specified minimum requirements. As of now, there is no meaningful way for either the public or
researchers such as us to ascertain the concrete implementation security of the system.

\section{The Design of ePA}

ePA is embedded into Germany's national public healthcare backend system ``Telematikinfrastruktur'' (abbreviated TI;
German for ``telematics infrastructure''). TI is a highly complex system, and a detailed description is beyond the
scope of this analysis. Briefly put, TI consists of a shared demilitarized zone (DMZ) that parties such as insurance
providers and healthcare providers connect to through a VPN. At the client location, usually an individual doctor's
office or a hospital, this VPN connection is terminated by a specialized VPN appliance named ``Konnektor'' that also
acts as a trusted component inside the client network, hosting software for purposes such as authentication. The
Konnektor contains several smart cards that store keys used for authentication. Konnektor devices are offered by
several vendors, and healthcare providers such as doctors' offices are individually responsible for purchasing and
maintaining one.

% FIXME: Is there a threat/trust model of the system that you could summarise in a few sentences?

Every person enrolled in the system, as well as every healthcare professional providing services under it, is issued
an ID card containing a smart card chip with keys to authenticate towards the central infrastructure. Previously, the
primary use of these smart cards was to automatically provide personal information such as name, date of birth,
address, and insurance enrollment status when an enrolled person visits a healthcare provider.

ePA is implemented inside the TI system. Its centralized services are accessed by healthcare providers through the TI
VPN, and by patients through proxy servers connected to the TI VPN. Patient records are encrypted and decrypted inside
TI's backend systems. Smart cards authenticate parties and hardware devices to each other. Each insurance provider
selects one of the approved implementations of ePA's server-side infrastructure to run for its insured members.
Currently, there are two such approved implementations.

In the current version of the specification, ePA's overall architecture relies heavily on Trusted Execution
Environments (TEEs). Data processing on the server side is done in plaintext inside TEEs, with some cryptographic key
management delegated to a Hardware Security Module (HSM). While attacks on the TEEs are considered in the system's
design, the HSMs are assumed to be perfectly secure, and the system does not include mitigations for a compromised HSM.
The primary motivation for plaintext processing appears to be enabling large-scale data analysis for research purposes
without requiring the consent or cooperation of the people whose records are being
processed~\cite{gematikWhitepaperDatenschutzUnd2025}.

The primary services offered by the server side are authentication services, key escrow, and a database storing the
encrypted records themselves. Records are symmetrically encrypted with keys that are derived from system-wide secrets
inside an HSM. The primary motivation behind the use of a key escrow service appears to be enabling the creation of a
duplicate ID smart card in case an enrolled person loses theirs. While the current version of the standard is unclear
on the exact key derivation mechanism, in previous versions of the standard, the escrow service's root key, a random
salt, and the healthcare ID number of the enrolled person were used as inputs to HKDF with SHA-256. The specification
requires that a new root key be generated once a year, but as far as we can tell, record keys are not rolled over
automatically; rollover is only meant to happen when the \emph{user} requests it, and old root keys must be retained
forever to ensure that old records remain accessible. This lack of automatic key rollover, combined with the need to
retain root keys indefinitely, maximizes the attack surface and makes incremental compromises of the system over long
time spans possible.
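
To make the derivation described above concrete, the following Python sketch implements HKDF with SHA-256 as specified
in RFC~5869, using only the standard library, and derives a per-user record key from an escrow root key, a cleartext
per-user salt, and a healthcare ID number. How the previous standard mapped these three inputs onto HKDF's input keying
material, salt, and \emph{info} parameters is our assumption, as are the hypothetical helper
\texttt{derive\_record\_key}, the yearly key store \texttt{root\_keys}, the 16-byte salt length, and the example ID
\texttt{X110000000}; none of these are taken from the specification.

```python
import hashlib
import hmac
import secrets

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """RFC 5869 extract step: PRK = HMAC-SHA256(key=salt, msg=IKM)."""
    if not salt:
        salt = bytes(HASH_LEN)  # RFC 5869 default: HashLen zero bytes
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 expand step: T(i) = HMAC-SHA256(PRK, T(i-1) || info || i)."""
    if length > 255 * HASH_LEN:
        raise ValueError("HKDF-SHA256 output is limited to 255 * 32 bytes")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_record_key(root_key: bytes, user_salt: bytes, health_id: str) -> bytes:
    """Hypothetical input mapping (our assumption, not from the specification):
    root key -> IKM, per-user cleartext salt -> salt, healthcare ID -> info."""
    prk = hkdf_extract(user_salt, root_key)
    return hkdf_expand(prk, health_id.encode("ascii"), 32)


# Hypothetical key store: one 256-bit root key per year, none ever deleted,
# because records derived under old root keys must stay readable.
root_keys = {2024: secrets.token_bytes(32), 2025: secrets.token_bytes(32)}
user_salt = secrets.token_bytes(16)  # stored in cleartext next to the record
health_id = "X110000000"             # hypothetical insurance ID number

# Anyone holding root_keys[2024] can recompute this key from non-secret inputs.
key_2024 = derive_record_key(root_keys[2024], user_salt, health_id)
```

Under these assumptions, neither the salt nor the ID number is secret, so whoever obtains a single root key can
recompute the record key of every person whose key was derived from it. Since root keys are retained indefinitely, each
yearly key remains such a target for the lifetime of the system.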

\subsection{Previous Analyses}

\emph{gematik}, the state-owned company specifying the system, commissioned several security assessments of the system
relating to the key escrow service.
\citeauthor{fischlinKryptographischeAnalyseSpezifikation2021}~\cite{fischlinKryptographischeAnalyseSpezifikation2021}
focuses on the cryptographic dimension of the key escrow service used in an older version of the standard, and is now
obsolete. \textcite{slanySicherheitsanalyseZurSicherheit2020} approaches the system at a higher level, and focuses on
the cryptography of the inner protocol layers spoken between the system's components. The industry research
organization Fraunhofer SIT was commissioned to perform a structured, theoretical assessment of attack paths against the
system \cite{fraunhofersitAbschlussberichtSicherheitsanalyseGesamtsystems2024}. We are not currently aware of any
independent academic security research on the system.

The design and operation of the system have been independently described in detail by civil society activists, who have
demonstrated several successful attacks on the system. \textcite{tschirsichHackerHinOder2019} demonstrated how they
could trivially acquire each of the smart cards as well as the Konnektor necessary for accessing the system.
\textcite{tschirsichKonnteBisherNoch2024} summarize the history of attacks demonstrated on the system and show multiple
practical attacks on various parts of the system's implementation.

\section{Concerning Cryptographic Engineering Choices}

We wish to highlight some of the design choices in the system that we believe stray from current best practice. This is
by no means an exhaustive list, and is only meant to underscore why we believe the system deserves more scrutiny.

\subsection{Use of Key Escrow}

Key escrow is a concept that was originally devised in the 1990s out of a fear that the widespread availability of
strong encryption would stifle the ability of law enforcement agencies to wiretap communications in the prosecution of
crime. At the core of the concept lies the idea that a trusted \emph{key escrow} service holds a copy of every private
key in use. When the government wants to access one of these keys, the key escrow service can provide this
access~\cite{andersonSecurityEngineeringGuide2020,jarvisCryptoWarsFight2020}.

While key escrow services have been a topic of political debate in past decades, the general consensus in the
cryptographic community is that they are a bad idea, since they present a centralized target for attack and increase
the attack surface \cite{
    abelsonRisksKeyRecovery1997,
    abelsonKeysDoormats2015,
    andersonSecurityEngineeringGuide2020,
    rogawayMoralCharacterCryptographic2015,
}.

Our first concern is the system's general approach of using a key escrow service instead of securely storing the keys
inside the system's already existing smart card infrastructure. Like any other key escrow system, this key escrow
service poses a centralized security risk. The system's designers made this decision because they considered it
important that, after the loss of an insurance ID card, access to the encrypted record can be restored without the
cooperation of the healthcare providers holding the primary copies of the person's medical records.

\subsection{Cryptographic Design}

\todo{Feedback from HS3 reviewer: I feel that this section is a mix-up of critique on the cryptographic design and the
    approach to privacy protection and data minimisation. How are they linked? I'm missing some discussion here.}
The system's overall cryptographic design is intentionally kept simple. The standard explicitly mentions that symmetric
primitives have been preferred over asymmetric primitives in the core key escrow functions because of the long-term risk
of attacks on asymmetric primitives. Notably, more advanced cryptographic techniques such as secret sharing schemes,
oblivious pseudorandom functions, or multiparty computation, which could improve the security and privacy of the key
escrow service by reducing the trust placed in any single component, are absent. Instead, the system relies extensively
on the engineering-based security guarantees of TEEs and HSMs. Given that the ePA system treats its HSMs as
unconditionally secure, it is unclear what purpose the manual yearly root key renewal serves. Renewing a key limits the
damage of a compromise only if the old key can eventually be retired, which is impossible when old root keys must be
retained forever and there is no automatic way to roll over the wrapped record keys.
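
Secret sharing, one of the techniques named above as absent from the design, can be illustrated with a minimal Python
sketch of Shamir's $k$-of-$n$ scheme over the prime field $\mathrm{GF}(p)$ with $p = 2^{521}-1$. The parameters $n=5$
and $k=3$, as well as the idea of giving one share to each of several independently operated HSMs, are illustrative
assumptions and not part of the ePA specification. Any $k$ shares recover the 256-bit root key, while fewer than $k$
shares reveal nothing about it, so the compromise of a single HSM would no longer suffice.

```python
from __future__ import annotations

import secrets

# Mersenne prime 2^521 - 1, comfortably larger than any 256-bit root key.
PRIME = 2**521 - 1


def split_secret(secret: int, n: int, k: int) -> list[tuple[int, int]]:
    """Shamir k-of-n split: the secret is the constant term of a random
    polynomial of degree k - 1; share x is the polynomial's value at x."""
    if not 0 <= secret < PRIME or not 1 <= k <= n:
        raise ValueError("invalid secret or threshold parameters")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine_shares(shares: list[tuple[int, int]]) -> int:
    """Recover the constant term by Lagrange interpolation at x = 0.
    Needs at least k shares with distinct x coordinates."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical deployment: one share per independently operated HSM.
root_key = secrets.randbits(256)
shares = split_secret(root_key, n=5, k=3)
recovered = combine_shares([shares[0], shares[2], shares[4]])
```

This sketch reconstructs the key in one place, which briefly reintroduces a single point of compromise; threshold
schemes that evaluate the key derivation jointly without ever reconstructing the key avoid this, but are beyond the
scope of a short example.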

A consequence of the system's simple cryptographic design is that the system places a large degree of trust in its
components. For instance, the system leaks a person's insurance ID number to the key escrow HSM every time record keys
are requested. Together with the timing and frequency of these requests, this reveals information on the person's health
condition to the key escrow service in an identifiable way.

\subsection{A Realistic Attacker Model}

We observe that the system as a whole does not appear to be designed to defend against well-resourced adversaries. A
series of practical attacks demonstrated on the system, none of which required advanced capabilities, confirms this
impression. \textcite{tschirsichKonnteBisherNoch2024} summarize a series of successful attacks. These include social
engineering that yielded copies of smart cards enabling access to patient records, the exploitation of misconfigured
Konnektor VPN appliances whose local network DMZ and authentication interface were exposed on the public internet, the
circumvention of video-based authentication processes resulting in duplicate file keys being provided, classic SQL
injection on a backend service maintaining an authentication database, access to all national patient records through
brute-force enumeration of weak identifiers, and several more.

We believe that a system like this must be designed to withstand well-resourced adversaries such as foreign intelligence
services, since the medical data stored in it, such as information on chronic illnesses, sexually transmitted diseases,
or severe food allergies, has intelligence value. Repeated breaches of national digital infrastructure, such as the 2015
breach of the US Office of Personnel Management \cite{barrettUSSuspectsHackers2015} or the 2024 compromise of US
telecommunications wiretapping systems \cite{mennChineseGovernmentHackers2024}, demonstrate that such state-sponsored
attacks on national digital infrastructure are a realistic concern. A possible scenario in the ePA system would be a
foreign intelligence service gaining access to one of the HSMs storing the system's root secrets, extracting the root
secret through an advanced physical attack, and then being able to decrypt captured encrypted health records at will.
Similarly, a nation-state adversary might have access to an exploit allowing the compromise of the system's TEEs, which
would enable the extraction of any patient records being processed in plaintext inside these TEEs.

\subsection{Physical Security}

Physical security has received some consideration in the system's specification. First, smart cards are used extensively
for authentication. Second, Hardware Security Modules are used in key locations of the system to process some
cryptographic secrets. The core of the system's key escrow service is implemented inside an HSM that is part of a
redundant HSM cluster. However, it is notable that the security level required for this HSM is only FIPS 140-2
level 3 \cite{usnationalinstituteofstandardsandtechnologySecurityRequirementsCryptographic2002}. FIPS 140-2 is a US
government standard that used to be popular for the specification of HSMs. Not only was it superseded by FIPS 140-3 in
2019 \cite{usnationalinstituteofstandardsandtechnologySecurityRequirementsCryptographic2019}, but its security level 3
also mostly provides logical separation of cryptographic functions from other logic and is not very meaningful in the
context of physical attacks. The only physical requirement of FIPS 140-2 level 3 is that the HSM has a hard, opaque
coating. This coating is specified to be tamper-evident, but notably, this standard requires no active tamper detection
or response features~\cite{andersonSecurityEngineeringGuide2020}. In contrast to the newer FIPS 140-3 standard and the
related ISO/IEC 19790 \cite{ISOIEC19790} and ISO/IEC 24759 \cite{ISOIEC24759} standards, FIPS 140-2 does not impose any
particular requirements regarding resistance to side-channel attacks. The lack of tamper response, the unspecified
resistance to side-channel attacks, and the fact that the ePA specification only requires the long-lived key escrow root
key inside the HSM to have 256 bits of entropy together result in an unsatisfactory overall security posture.

\section{Conclusion}

In conclusion, we observe that despite the decade-long standardization and implementation process of Germany's ePA
national medical record database, several cryptographic compromises ended up in the system's final deployment. Even
assuming that nation-scale key escrow is a good idea, the implementation of this key escrow system seems to stray from
current best practice. The system uses a secret key with only 256 bits of entropy to derive highly sensitive secret keys
for potentially tens of millions of people sharing an insurance provider. The cryptographic design of this escrow system
is unsophisticated, ignoring the past three decades of cryptographic developments, particularly in multiparty
computation (MPC) and secret sharing techniques, in favor of an engineering approach. In the engineering dimension, the
system's physical security is only held to the basic level 3 of the obsolete FIPS 140-2 standard, which is considerably
less secure than an average credit card payment terminal. The system's root keys are only protected by a ``hard, opaque
potting material'', and no tamper detection or response is required. We estimate that the system presents an attractive
and soft target to nation-state adversaries. The system's shortcomings are made more severe by the fact that the system
disproportionately affects the lives of people with low incomes.

From an academic perspective, it is instructive to examine how the ePA ended up in its current state, and which gaps in
the cryptographic solutions offered by academic research contributed to this outcome. A fundamental truth in
cryptographic engineering is that in the absence of technical checks, political promises are no guarantee of restraint.
As such, the degree of trust the ePA system places in organizational measures leads to a concerning overall picture. In
particular, the system's extensive reliance not just on conventional HSMs built to long-obsolete security standards but
also on trusted execution environments that have been broken multiple times highlights the need for new approaches to
hardware security that better accommodate real-world use cases.

We believe that Inertial HSMs (IHSMs) can address this use case by cleanly separating the physical security primitive
into a retargetable design that can be applied to entire servers if needed, and that can augment or replace technologies
such as conventional HSMs or trusted execution environments to provide high-level hardware security. Before introducing
IHSMs in Chapter~\ref{chapter-ihsm}, the following chapter first complements this chapter's outlook on the state of the
art in hardware security with a survey of tamper sensing meshes in a wide range of real-world devices.