\chaptertitle{High Fidelity Security Mesh Monitoring using Low-Cost, Embedded Time Domain Reflectometry}
\begin{abstract}
Security Meshes are patterns of sensing traces covering an area that are used in Hardware Security Modules (HSMs)
and other systems to detect attempts to physically intrude into the device's protective shell. State-of-the-art
solutions manufacture meshes in bespoke processes from carefully chosen materials, which is expensive and makes
replication challenging. Additionally, state-of-the-art monitoring circuits sacrifice either monitoring precision or
cost efficiency. In this paper, we present an embeddable security mesh monitoring circuit constructed from low-cost,
standard components that utilizes Time Domain Reflectometry (TDR) to create a unique fingerprint of a mesh. Our
approach is both low-cost and precise, and enables the use of inexpensive standard Printed Circuit Boards (PCBs) as
security mesh material. We demonstrate a working prototype of our TDR circuit costing less than \price{10}{\euro} in
components that achieves both time resolution and rise time better than \qty{200}{\pico\second}---a $25\times$
improvement over previous work. We demonstrate our prototype's capability to detect and localize faults in several
practical attack scenarios including probing using a high impedance oscilloscope probe and a patching attempt using
micro soldering.
\end{abstract}
\section{Introduction}
% In the bib, note for dissertation citations that the work is a dissertation.
% 2.2 / 2.3: How are they related? Why are they interesting? Mention in the intro?
% Emphasize in the intro that the TDR setup is new.
% Storyline for the intro: we are the first to achieve this resolution, which is why TDR works for us.
% Time for 256 times oversampling: 710 ms. 384 times: 1056 ms.
Security meshes continue to be the state of the art for tamper sensing in applications where sophisticated physical
attacks such as attempts at drilling or sawing through the device's enclosure to place probes must be prevented. Common
applications for such meshes include Hardware Security Modules (HSMs) used to store and process cryptographic keys
applying security standards such as
FIPS-140-2\cite{usnationalinstituteofstandardsandtechnologySecurityRequirementsCryptographic2002} or ISO/IEC
24759\cite{ISOIEC24759}. Other applications include card payment terminals where PCI PTS HSM
standards\cite{pcisecuritystandardscouncilPaymentCardIndustry2021} are applicable. Security meshes usually consist of
two or more conductive traces that are laid out in a meandering pattern to cover a surface. A sensing circuit
electrically monitors these traces to detect attempts at penetrating this surface.
As is often the case with security technologies, in practice a tension exists between the level of security offered by a
particular security mesh implementation and its implementation cost. Commercial designs often only coarsely monitor the
conductivity of the mesh traces and are incapable of detecting attacks that manipulate small parts of the mesh. The most
secure meshes are made in custom manufacturing processes. Materials such as polymer substrates are specifically chosen
such that the mesh is difficult to manipulate without breaking it. A drawback of this approach is that the specialized
manufacturing processes are difficult to replicate and that the resulting cost of the mesh is high. In some
lower-security applications such as card payment terminals, simpler approaches are still commonly used for their ease of
implementation. Often, standard copper/polyimide Flexible Printed Circuits (FPCs) or even standard Printed Circuit
Boards (PCBs) are used because of the wide availability of manufacturing services.
Several academic approaches exist that target low-cost\cite{
vasileActiveTamperDetection2017,
vasileTemperatureSensitiveActive2017,
dupontMiniaturizedUltraLowPowerTamper2022,
vasileProtectingSecretsAdvanced2019,
} or high-performance mesh monitoring\cite{
immlerBTREPIDBatterylessTamperresistant2018,
immlerSecurePhysicalEnclosures2018,
garbTamperSensitiveDesignPUFBased,
}. Some academic works even try to replace the security mesh with entirely different tamper sensing primitives\cite{
staatAntiTamperRadioSystemLevel2022,
vaiSecureArchitectureEmbedded2015,}.
High-performance mesh monitoring approaches try to characterize the mesh's physical properties with high accuracy, but
often come at the cost of specialized, expensive circuitry. Low-cost approaches utilize advanced analog techniques in
their circuitry to extract precise measurements using few components. They trade off measurement precision for lower
component cost. Besides simple monitoring, detecting tamper attempts by replacing the mesh with a macro-scale Physically
Unclonable Function (PUF) has also been researched\cite{
immlerBTREPIDBatterylessTamperresistant2018,
staatAntiTamperRadioSystemLevel2022,
vaiSecureArchitectureEmbedded2015,}, although this comes with complex monitoring circuits that utilize expensive,
specialty components.
\begin{figure}
\centering
\includegraphics[width=0.6\textwidth]{pic_board_setup_2_small_censored.jpg}
\caption{Measurement setup. Shown are the test specimen board on the left, and the frontend board with one of the
four pulse amplifiers in the center. The frontend board is powered through a USB-C connection, and data is sent to a
computer through a Serial Wire Debug (SWD) interface. The grid in the background has \qty{10}{\milli\meter} pitch.
Note: Author names and institutional affiliation were removed from this picture for peer review.}
\label{fig_pic_board}
\end{figure}
To enable the use of less expensive, commodity materials such as Printed Circuit Boards (PCBs) without compromising
security, mesh integrity must be monitored with high fidelity. In this paper, we present a low-cost monitoring circuit
for security meshes that combines Time Domain Reflectometry (TDR) with equivalent time sampling. Our approach provides
high measurement fidelity and enables the use of meshes made from less expensive materials in high-security
applications.
Our circuit generates a very fast pulse with a rise time below \qty{200}{\pico\second} that is launched into the
mesh. While the pulse traverses the mesh, part of its energy is reflected at imperfections inside the mesh, including
those caused by tampering attempts. Our circuit uses a fast, low-cost equivalent time sampling frontend to receive,
amplify and record these reflections to create a \emph{fingerprint} of the mesh that is highly sensitive to changes
caused by tampering.
We demonstrate a working prototype of our design and present practical measurements of its electrical parameters as well
as its performance under several practical attack scenarios. A photo of our prototype setup including a security mesh
specimen is shown in Figure\ \ref{fig_pic_board}.
Compared to previous academic designs, our approach can be implemented at a lower cost using exclusively inexpensive,
commercially available mass-market components. Our TDR frontend improves upon previous, delay-based approaches in
monitoring fidelity\cite{vasileActiveTamperDetection2017,vasileTemperatureSensitiveActive2017}. Our design achieves
sufficient sensitivity to detect high-impedance oscilloscope probes despite such probes being specifically designed to
conduct measurements without disturbing the circuit under test. Unlike previous, capacitance-based approaches, our
design is compatible with inexpensive signal switch ICs, enabling the protection of arbitrarily large meshes at minimal
cost without compromising sensitivity.
The contributions of our work are as follows:
\begin{itemize}
\item To our knowledge, our design is the first to apply a low-cost embedded differential Time Domain Reflectometry
(TDR) frontend to security mesh monitoring. Our design achieves pulse rise times below \qty{200}{\pico\second},
a $25\times$ improvement over the closest previous
work\cite{vasileActiveTamperDetection2017,vasileTemperatureSensitiveActive2017}.
\item Our approach provides higher fidelity compared to state-of-the-art security mesh conductivity monitoring or
previous low-cost approaches. It enables the use of meshes manufactured using less advanced technologies such as
standard FPC or PCB processes. Our TDR frontend produces 70 data points for each meter of mesh length, resulting
in a measurement density per mesh area of \qty{200}{\bit\per\centi\meter^2} when using a
$\qty{200}{\micro\meter}$ pitch mesh manufactured in a standard low-cost PCB process.
\item We present a working prototype along with extensive experimental results, including laboratory performance
measurements. We practically demonstrate that our design is able to not only detect but distinguish and even
localize attacks in several realistic attack scenarios.
\item Our design is based entirely on commercially available, inexpensive mass-market components. It can be
replicated and improved without access to bespoke production equipment or semiconductor manufacturing
capabilities. To facilitate further research and practical applications, we publish our prototype under an Open
Source license.
\end{itemize}
\section{Related Work}
Tamper sensing meshes are used in numerous applications from Hardware Security Modules (HSMs) to card payment
terminals\cite{andersonCryptographicProcessorsASurvey2006,tehranipoorHardwareSecurityPrimitives2023}. Despite their
widespread use, security mesh design and monitoring is covered by a sparse research corpus. Commercially,
security-by-obscurity is often considered a good idea and little detail is published on physical security
implementations\cite{andersonSecurityEngineeringGuide2020}.
Patent literature gives a partial view of commercial developments in this area. Even in recent patents such as\cite{
brodskyTamperRespondentAssemblyFlexible2019, % IBM. ok, mentions conductivity monitoring but mostly on mesh
nortonTamperDetectingCases2019, % HP. ok, mentions continuity monitoring only but mostly on mesh
razaghiTamperDetectionSystem2020, % Square. ok. mentions what is effectively conductivity monitoring
wesselhoffTamperResponsiveSensor2020, % Cryptera. ok. Very basic, only uses the mesh in the power supply.
leekTamperDetection2021, % Texas Instruments. ok. Monitors capacitance.
chockPointSaleTerminal2009, % Zilog. ok. Monitors conductivity and tries to detect emulation.
}
from HSM manufacturers IBM and HP, ATM component manufacturer Cryptera, payment terminal manufacturer Square, and chip
manufacturers Texas Instruments and Zilog, cited monitoring methods are basic and do not go beyond a simple measurement
of resistance or capacitance.
Academic research in the area is more advanced and spans both improvements to security meshes and their monitoring
circuits\cite{
immlerBTREPIDBatterylessTamperresistant2018,
dupontMiniaturizedUltraLowPowerTamper2022,
vasileProtectingSecretsAdvanced2019},
as well as approaches that entirely replace the security mesh with other primitives based on e.g.\ radio frequency or
optical measurements that aim to sense tampering
with a device\cite{staatAntiTamperRadioSystemLevel2022,vaiSecureArchitectureEmbedded2015}. A drawback of techniques
aiming to replace security meshes with other sensor types is that it is difficult to prove such sensors do not have
blind spots.
\subsection{Security Mesh Monitoring and Design}
\paragraph{Meshes as capacitive PUFs.}
\textcite{
immlerBTREPIDBatterylessTamperresistant2018,
obermaierMeasurementSystemCapacitive2018,
garbTamperSensitiveDesignPUFBased}
propose one of the most advanced security mesh designs in the current academic state of the art. They use a specialized
security mesh as a Physically Unclonable Function (PUF), combining tamper sensing with cryptographic key storage. In
their design, the mesh consists of a cross-hatch pattern made from several dozen individually addressable capacitive
electrodes. They manufacture their meshes in a specialized process that results in unpredictable, random variations in
capacitance between electrodes. They propose an analog frontend that measures the precise mutual capacitance of each
pair of electrodes\cite{obermaierMeasurementSystemCapacitive2018} using an approach similar to
\textcite{satoToucheEnhancingTouch2012}, and they use the resulting capacitance matrix as the basis of their PUF. In
further work, they demonstrate a custom IC integrating the monitoring
circuit\cite{garbFORTRESSFORtifiedTamperResistant2021}.
Advantages of their system include high sensitivity to modifications and, because it operates as a PUF, no need for a
continuous power supply. Disadvantages include the limited mesh size a single circuit can support due to
dynamic range constraints, the specialized manufacturing process needed for the mesh as well as the high cost of the
monitoring circuit. Common physical security standards require systems to actively destroy all key material when
tampering is detected\cite{
usnationalinstituteofstandardsandtechnologySecurityRequirementsCryptographic2002,
ISOIEC24759,
pcisecuritystandardscouncilPaymentCardIndustry2021}.
Like other PUF-based systems, their system naturally lacks this capability.
% FIXME go more into multiplexing larger meshes in our system below
Key differences of our system include:
\begin{itemize}
\item Our system can cover larger meshes without loss of precision using a single TDR frontend through multiplexing.
\item Our system supports meshes manufactured using standard, low-cost PCB processes.
\item Our design requires only widely available, low-cost commodity components, for each of which alternatives
from other manufacturers are available.
\item Our approach has improved resiliency to electromagnetic interference and works with unshielded meshes.
\end{itemize}
\paragraph{Bridge measurement of capacitive interdigital meshes.} \textcite{dupontMiniaturizedUltraLowPowerTamper2022}
introduce a simple analog circuit approach for monitoring meshes laid out as a set of capacitive interdigital structures
not unlike the combs found in Micro-Electromechanical System (MEMS) accelerometers and gyroscopes. They subdivide the
mesh into four equal-size quadrants, each containing two equal-size interdigital electrodes. They connect the resulting
eight electrodes in a capacitive bridge configuration and measure the bridge's balance using a simple analog monitoring
circuit based on homodyne detection. Advantages of their system include the simple, low-power monitoring circuit made
from basic, cheap components and the capability to work with single-layer meshes such as those produced using Laser
Direct Structuring (LDS). From a security point of view, a drawback of their approach is that it sacrifices measurement
resolution to achieve low power consumption: all information on the mesh's state is collapsed into a single scalar
measurement.
\paragraph{Frequency-domain mesh characterization.}
\textcite{vasileProtectingSecretsAdvanced2019} introduce a monitoring method where they feed a variable-frequency signal
into one end of a continuous mesh trace, and measure the power of the signal coming out of the other end. In essence,
their setup measures $S_{12}$ magnitude in a similar way to a network analyzer.
Advantages of their design include the simple implementation and the potentially robust nature of frequency-domain
measurements. Disadvantages include a nonstandard three-layer mesh stackup, as well as the susceptibility of the system
to attack by emulation given that the log power sensor they are using at the mesh output is designed to be insensitive
to any signal characteristics apart from total signal power.
\paragraph{Time domain mesh monitoring.}
Time Domain Reflectometry has been proposed for tamper sensing in nuclear arms control
applications\cite{parsonsTamperRadiationResistant1977}. However, compared to our design, the systems proposed in this
field are usually much larger, using standard benchtop measurement equipment to perform TDR. Additionally, they target
lower time resolution since they are designed to monitor spans of cable up to several hundred meters in length.
Closest to our proposal in the academic corpus is the work of
\textcite{vasileActiveTamperDetection2017,vasileTemperatureSensitiveActive2017}, where they propose monitoring the time
domain response of a mesh using a circuit made from a pulse generator and a fast Analog-to-Digital Converter (ADC). To
avoid an expensive, high-speed digital processing pipeline, their design is centered around a specialized high-speed ADC
that has a built-in sample memory. Using this part, they capture a pulse at high speed after it traverses the mesh.
Subsequently, they slowly process the captured data from memory.
Advantages of their design include better sensitivity to changes in total mesh trace length compared to simple
continuity monitoring and the low complexity of their analog frontend. Disadvantages include the reliance on a specialty
ADC that cannot easily be replaced with any other commercially available component and the coarse time resolution.
Key differences between their design and our proposal include:
\begin{itemize}
\item Their design is sensitive to total length, but not to the location of faults. Their design measures the mesh's
\emph{transmission} characteristic, which collapses detail about faults along the mesh into a small number of
ADC samples at the pulse edge. Using such a measurement, it is not possible to localize faults. In contrast, our
approach measures the signal's \emph{reflected} component, which spreads information over time and enables us to
localize faults.
\item Our design uses only inexpensive, widely available parts. All parts in our design can easily be substituted
for other, similar parts from different manufacturers.
\item Our approach provides $25\times$ higher time resolution through Equivalent Time Sampling. The coarse time
resolution of their design is a fundamental limitation, as the cost of ADCs and their associated circuitry increases steeply
with speed\footnote{ For reference, the least expensive ADC available at distributor DigiKey that would match
the \qty{200}{\pico\second} time resolution of our approach would cost \price{320}{\euro} at quantity 100 and
require national security clearance for export from its manufacturer in the USA.}.
\end{itemize}
\subsection{Equivalent Time Sampling}
Today, systems that digitize high-speed signals usually use a fast ADC, sometimes preceded by one or several
downconverting mixers. This development was enabled by both the increasing availability of ADCs capable of digitizing
hundreds of megasamples per second at a reasonable resolution, and by the increase in speed of CPUs,
FPGAs, and other components of the digital processing chain. However, this is largely a development of this
millennium, whereas signals far into the gigahertz range have been studied since the advent of radar technology in
the Second World War\cite{kahrs50YearsRF2003}. Enabled by the progress from vacuum tubes to semiconductor devices,
equivalent time sampling was the technology of choice throughout the latter half of the twentieth century. Around the
turn of the millennium, the introduction of high-speed digital processing and fast ADCs enabled real-time conversion up
into higher microwave frequencies, today reaching beyond the \qty{100}{\giga\hertz} boundary.
\textcite{kahrs50YearsRF2003} trace back the style of four-diode balanced bridge sampling gate that we use to a vacuum
tube implementation presented in \textcite{chanceWaveforms1949}. This style of sampling gate found application in
several sampling oscilloscope frontends throughout the twentieth century, such as
HP's 187B\cite{HP187BDualTrace1962}.
While initially equivalent time sampling was used to circumvent technological limitations, more recently it has also
been used to achieve cost-optimized designs\cite{houtman1GHzSamplingOscilloscope2000}. Going along similar principles,
\textcite{polasekReflektometrCasoveOblasti2020} presents a design for a minimal sampling TDR circuit that uses a CMOS
clock generator IC along with a CML fanout buffer for pulse generation. The circuit improves upon the double sampling
design first presented by \textcite{houtman1GHzSamplingOscilloscope2000} to reconstruct a downsampled copy of the input
signal in the analog domain before digitization.
\subsection{Low-Cost Time Domain Reflectometry}
\textcite{bencivenniTimeDomainReflectometer2013} present an FPGA-based embedded reflectometer design. Since their design
is based on an early FPGA family dating back to 2003 that lacked the speed and the adjustable I/O delay features of more
modern FPGA families, their design uses the FPGA's logic resources to achieve adjustable delays.
\textcite{negreaSequentialSamplingTime2009} show an equivalent time sampling TDR that uses specialized adjustable delay
line ICs for pulse generation. \textcite{lee16psresolutionRandomEquivalent2003} achieve very high time resolution in an
equivalent time sampling TDR system by using a vernier approach to pulse generation, such that their system is limited
by analog bandwidth, not time resolution. \textcite{trebbelsMiniaturizedFPGABasedHighResolution2013} show another
FPGA-based TDR. Their system also uses a part from the same early FPGA family as
\textcite{bencivenniTimeDomainReflectometer2013}, and they work around its lack of precise timing primitives by
generating a low-frequency sine wave through Direct Digital Synthesis (DDS), which they filter and then sample using a
comparator, a similar approach to the timing generation in \textcite{houtman1GHzSamplingOscilloscope2000}. Additionally,
they avoid the need for a discrete ADC by implementing a $\Delta\Sigma$ loop around a fast comparator, trading
acquisition speed for lower hardware complexity. They use a \qty{5.5}{\volt\per\nano\second} slew rate wideband amplifier IC to generate
their stimulus pulse, achieving a rise time of \qty{2}{\nano\second}. As a result, similar to
\textcite{lee16psresolutionRandomEquivalent2003}, their design is limited by analog bandwidth--here resulting from the
nanosecond-scale stimulus rise time--not by frontend time resolution. Compared with this and other previous approaches,
our proposed system is not only faster, but presents a more balanced trade-off between time resolution and analog
bandwidth.
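The link between stimulus rise time and analog bandwidth can be made concrete with the common rule of thumb for a
single-pole (first-order) response. This is an approximation added here for illustration, and the symbols $t_r$ and
$f_{3\mathrm{dB}}$ are not taken from the cited works:
\[
    f_{3\mathrm{dB}} \approx \frac{0.35}{t_r}
\]
Under this assumption, a stimulus rise time of \qty{2}{\nano\second} corresponds to roughly \qty{175}{\mega\hertz} of
usable bandwidth, while a rise time of \qty{200}{\pico\second} corresponds to roughly \qty{1.75}{\giga\hertz}.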
\section{Monitoring a Security Mesh using Time Domain Reflectometry}
Time Domain Reflectometry (TDR) is a well-known technique that is used to locate faults along a signal channel such as a
copper cable, or an optical fiber. In TDR, a pulse is sent into the beginning of the channel. While the pulse traverses
the channel, any fault such as a discontinuity in electrical impedance or optical density causes part of the pulse to
travel back in a partial reflection. TDR monitors these reflections returning to the beginning of the channel by
recording the signal at the channel's input after the pulse has been sent. When the pulse reaches the end of the channel,
depending on termination it can be reflected to travel back to the beginning, which allows measurement of the channel's
length.
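The two relations underlying this description can be written out as a short sketch. The symbols $Z_0$ (characteristic
impedance of the channel), $Z_L$ (impedance seen at a discontinuity), $v$ (propagation velocity), and $t_d$ (round-trip
delay of a reflection) are introduced here for illustration only:
\[
    \Gamma = \frac{Z_L - Z_0}{Z_L + Z_0}, \qquad d = \frac{v \, t_d}{2}
\]
Here, $\Gamma$ is the fraction of the incident voltage that is reflected, and $d$ is the distance of the discontinuity
from the beginning of the channel. An open circuit ($Z_L \to \infty$) yields $\Gamma = +1$, a short circuit ($Z_L = 0$)
yields $\Gamma = -1$, and a matched termination ($Z_L = Z_0$) yields $\Gamma = 0$. The factor of two in $d$ accounts for
the pulse travelling to the fault and back. As a rough consistency check, if we assume a time step of exactly
\qty{200}{\pico\second} and take the 70 data points per meter of mesh length quoted in the introduction at face value,
one time step corresponds to $d \approx \qty{14.3}{\milli\meter}$, which implies
$v \approx \qty{1.43e8}{\meter\per\second}$, or an effective relative permittivity of about $4.4$. This inferred value
is plausible for a standard PCB laminate, but it is an estimate and not a measured quantity.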
\subsection{Attacks on a Security Mesh Viewed Using TDR}
In this paper, we apply TDR to monitor a security mesh for changes caused by an attack. Our prototype setup consists of
a custom circuit board containing a low-cost embedded TDR frontend that can be connected to a security mesh specimen to
measure its response, creating a fingerprint of the mesh. In a standard PCB manufacturing process, we construct a
security mesh with a ground plane underneath that works similarly to previous work\cite{
immlerBTREPIDBatterylessTamperresistant2018,
obermaierMeasurementSystemCapacitive2018,
garbTamperSensitiveDesignPUFBased}.
When viewed in the microwave domain, such meshes constitute what is essentially a delay line. Security meshes commonly
use a pair of traces to capture short circuit conditions between adjacent traces, which we treat as a differential
pair for improved resiliency against electromagnetic interference. We constructed our frontend such that it excites the
two traces differentially, but allows for both single-ended and differential measurements.
In an intact mesh, we expect our frontend to record no significant reflections until the stimulus pulse has traversed
the mesh's traces both ways, at which point we expect a large response whose polarity and amplitude depend on the
termination on the far end of the mesh. In our prototype circuit, we made this termination configurable to expand the
range of possible measurement configurations and to enable self-calibration of the circuit.
When an attacker attempts to tamper with the mesh, they will cause an impedance discontinuity. Cuts of one or both
traces or a short circuit between both traces will result in a total reflection of the incident pulse at the location
of the fault, which our circuit will easily detect as the delay of the response changes. However, beyond these simple
cases, our approach can also detect more subtle changes. For instance, a short circuit between two points along the same
mesh trace will also result in a change in delay along this trace. Furthermore, even just probing a mesh trace with an
oscilloscope probe will add the probe's input capacitance, which is usually on the order of several picofarads, to one
point along the trace, resulting in an impedance step that can be detected by TDR. The TDR approach is thus able to not
only detect but distinguish and even localize several types of faults or attacks in a mesh.
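How a small probe capacitance shows up in the TDR response can be estimated with a first-order model. The sketch below
assumes a single-ended line of characteristic impedance $Z_0$, an ideal lumped shunt capacitance $C$, and a stimulus
edge that is a linear ramp of duration $t_r$. The values $Z_0 = \qty{50}{\ohm}$ and $C = \qty{2}{\pico\farad}$ are
illustrative assumptions, not measurements:
\[
    \Gamma(s) = -\frac{s\tau}{1 + s\tau}, \qquad \tau = \frac{Z_0 C}{2}, \qquad
    \Gamma_\mathrm{peak} \approx -\frac{\tau}{t_r} \left( 1 - e^{-t_r / \tau} \right)
\]
With these values, $\tau = \qty{50}{\pico\second}$, and for $t_r = \qty{200}{\pico\second}$ the reflection peaks at
$\Gamma_\mathrm{peak} \approx -0.25$, i.e.\ a brief negative dip of about a quarter of the incident step amplitude at
the location of the probe. A slower edge reduces this dip roughly in proportion to $1/t_r$, which illustrates why a
fast stimulus is needed to resolve such small capacitive loads.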
% FIXME subsection on routing and daisychaining
\section{Circuit Design and Driving Approach}
\begin{figure}
\centering
\hspace*{-7mm}
\includegraphics[height=80mm]{block_diagram.pdf}
\caption{Block diagram of our prototype sampling TDR security mesh monitoring circuit.}
\label{fig_block_diagram}
\end{figure}
A TDR can be broken down into three basic components. First, we need a source of fast pulses (or fast edges!) to
stimulate the mesh. Second, we need a coupler that allows us to couple the stimulus pulses into the mesh, and their
reflections out of it. Finally, we need a fast ADC to capture the reflections.
Figure\ \ref{fig_block_diagram} shows a block diagram of our design\footnote{Full schematics are available in this
paper's supplementary material.}. At the core of our design lies an equivalent time sampling setup, where two
diode bridge sampling gates alternately sample the two traces of the mesh.
Since physical attacks happen on a time scale of minutes or hours, we do not need a fast acquisition rate. Equivalent
time sampling uses fast sampling gates to sample a high-frequency signal at a low frequency that is suitable for direct
conversion through an ADC. This reduces the requirements of our data acquisition and signal processing frontend from
gigasamples per second to mere megasamples, well within the range that a commodity microcontroller can handle.
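The principle can be summarized in three relations. In this sketch, $\Delta t$ (the phase increment of the sampling
pulse per acquired point) and $T_\mathrm{win}$ (the length of the recorded time window) are symbols introduced for
illustration, and the \qty{100}{\nano\second} window is a hypothetical value:
\[
    t_n = n \, \Delta t, \qquad f_\mathrm{eq} = \frac{1}{\Delta t}, \qquad N = \frac{T_\mathrm{win}}{\Delta t}
\]
The $n$-th point is taken on a later repetition of the stimulus, delayed by $t_n$ relative to the stimulus edge, so the
ADC only has to convert at the stimulus repetition rate rather than at $f_\mathrm{eq}$. For
$\Delta t = \qty{200}{\pico\second}$, the equivalent sampling rate $f_\mathrm{eq}$ is 5 gigasamples per second, and a
\qty{100}{\nano\second} window consists of $N = 500$ points.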
A challenge in equivalent time sampling is precisely phase-synchronizing the sampling pulse to the fundamental frequency
of the input signal, which is usually implemented by using a high-speed comparator. In a TDR-style frontend like ours,
this expensive component can be avoided because the stimulus signal is generated in the frontend, simplifying the
challenge of generating a synchronized sampling pulse at an adjustable phase to the stimulus pulse.
Since an intact mesh has low insertion loss, the amplitude of the response of an intact mesh is large. Thus, we do not
need a high dynamic range in either the frontend amplifiers or in the ADC, enabling the use of commodity operational
amplifiers (opamps) and the built-in ADC of a commodity microcontroller. Further, the strong signal allows us to use a
comparatively lossy \qty{-6}{\deci\bel} resistive tee instead of a directional coupler. A resistive tee does not provide
directionality, but in our case, the incident pulse can never interfere with reflections at the sampling output of the
divider because of causality.
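The \qty{-6}{\deci\bel} figure follows from simple voltage division if we assume a matched three-resistor star divider
in a system of impedance $Z_0$. Both this topology and the value $Z_0 = \qty{50}{\ohm}$ are assumptions made for this
sketch and may differ from the component values used in the prototype:
\[
    R = \frac{Z_0}{3}, \qquad \frac{V_\mathrm{out}}{V_\mathrm{in}} = \frac{1}{2}, \qquad
    20 \log_{10} \frac{1}{2} \approx -6.02
\]
For $Z_0 = \qty{50}{\ohm}$, each of the three resistors is approximately \qty{16.7}{\ohm}. Every port then sees a
matched impedance of $Z_0$, and each output port receives half of the input voltage, i.e.\ a loss of about
\qty{6}{\deci\bel}.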
To implement our sub-nanosecond sampler, we chose a simple four-diode bridge sampling gate made from commodity
\partno{BAT17-04W} RF Schottky diodes, which offer turn-on times better than \qty{100}{\pico\second} at
\price{0.13}{\euro} per device at quantity 1000. The four-diode configuration requires only two dual diode packages. In
contrast to \textcite{polasekReflektometrCasoveOblasti2020,houtman1GHzSamplingOscilloscope2000}, in our system, double
sampling is not necessary. Instead, we follow the sampling gate directly with an amplifier feeding into the internal
ADC of our microcontroller. We use an internal timer peripheral of the same microcontroller to generate both stimulus
and sample pulses such that we can easily phase-lock the internal ADC to the same timer.
We base our circuit around an \partno{STM32G474RB} microcontroller, a \price{5}{\euro}-class commodity ARM
microcontroller. Besides adequate processing speed for its price class, this microcontroller offers two features that
are critical to our design. First, its internal ADCs are both higher resolution and faster than those of older parts.
Second, it is one of a few parts in its series that include a \emph{high-resolution timer} (\partno{HRTIM}) peripheral
that provides several outputs that can be controlled with better than \qty{200}{\pico\second} resolution through
per-output, self-calibrating delay line circuitry. We use this peripheral to produce both the stimulus pulse and the
phase-adjustable sampling pulse.
While the HRTIM peripheral allows us to finely adjust the phase of its output waveform, the digital output structures of
the \partno{STM32G4} series are still limited to nanosecond-scale rise and fall times with the datasheet quoting
$t_r=t_f=\qty{1.7}{\nano\second}$ into a \qty{10}{\pico\farad} load when using the fastest GPIO output drive strength
setting and a \qty{3.3}{\volt} supply\cite{stmicroelectronicsSTM32G474xBDatasheet2021}. We work around this issue by
applying two circuit tricks. First, we send the timer output through a fast amplifier to square up the edges to a rise
time better than \qty{500}{\pico\second}. The remaining challenge is that while we now have pulses with crisp edges, due
to constraints of the HRTIM peripheral, at more than \qty{10}{\nano\second}, these pulses are still too wide to be
useful. Second, we solve this issue by applying a clip line\cite{tektronixinc.TektronixS6Sampling1982} pulse forming
network at the output of the amplifier, i.e.\ we connect the amplifier's output to the load in parallel with a short,
terminated transmission line stub. The length of this stub determines the pulse width.
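The relation between stub length and pulse width can be sketched as follows, assuming that the far end of the stub
returns the edge with inverted polarity (as in a classic short-circuited clip line), so that the reflection cancels the
pulse after one round trip through the stub. The symbols $\ell$ (stub length), $v$ (propagation velocity), and
$\varepsilon_\mathrm{eff}$ (effective relative permittivity) as well as the numeric values are illustrative
assumptions:
\[
    t_w \approx \frac{2\ell}{v}, \qquad v = \frac{c}{\sqrt{\varepsilon_\mathrm{eff}}}
\]
For example, with $\varepsilon_\mathrm{eff} = 3$, we get $v \approx \qty{1.73e8}{\meter\per\second}$, and a stub of
length $\ell = \qty{20}{\milli\meter}$ yields a pulse width of $t_w \approx \qty{231}{\pico\second}$.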
\subsection{Driver Selection}
Several types of amplifiers can be used in our pulse shaping application. Common to all options, we require differential
outputs. In practice, for most parts, this means we are looking for a part with Current Mode Logic (CML) outputs. CML is
a differential signaling standard that is widely used in high-speed logic. In CML, a current source feeds a pair of
transistors that steer current between the two outputs of the differential pair. By steering current between the two
outputs, common-mode currents are minimized which both reduces the effect of power supply impedance at the transmitter
and reduces electromagnetic emissions from the differential pair's PCB traces. In our experiments, we considered several
parts and settled on four parts for evaluation in this paper: A \partno{74LVC2G157} standard logic IC, two display
protocol redrivers, \partno{PI3HDX12211} and \partno{TDP0604}, as well as \partno{MAX3748}, a limiting amplifier for
optical networking applications. We implemented four variants of our prototype using a steady hand under a microscope as
shown in Figure\ \ref{fig_pic_amps}.
We omitted the series of CML-output comparators made by Analog Devices from our tests because of the cost of
these devices.
\begin{figure}
\centering
\begin{subfigure}{0.23\textwidth}
\centering
\includegraphics[width=0.9\textwidth]{pic_74lvc_small.jpg}
\caption{74LVC2G157}
\end{subfigure}
\begin{subfigure}{0.23\textwidth}
\centering
\includegraphics[width=0.9\textwidth]{pic_max3748_small.jpg}
\caption{MAX3748}
\end{subfigure}
\begin{subfigure}{0.23\textwidth}
\centering
\includegraphics[width=0.9\textwidth]{pic_tdp0604_small.jpg}
\caption{TDP0604}
\end{subfigure}
\begin{subfigure}{0.23\textwidth}
\centering
\includegraphics[width=0.9\textwidth]{pic_pi3hdx_small.jpg}
\caption{PI3HDX12211}
\end{subfigure}
\caption{Circuit-board implementation of the four pulse amplifier variants of the design. Amplifiers were mounted
dead bug style on a piece of copper tape connected to one of the supply rails and hooked up with
\qty{120}{\micro\meter} diameter wire according to their respective datasheets. Supply rails were hooked up using
copper tape where possible to reduce series impedance. Additional \qty{10}{\micro\farad} MLCC power supply
decoupling capacitors were placed close to the ICs on the copper tape to reduce loop area.}
\label{fig_pic_amps}
\end{figure}
\paragraph{Standard logic ICs.}
As a baseline, we evaluated the \partno{74LVC2G157} standard logic IC. This IC contains a single multiplexer, but
we are not interested in the multiplexer functionality. What makes this chip interesting is that it is one of
the few \partno{74} series standard logic parts that have complementary outputs. According to manufacturer
specifications, at a comparable \qty{20}{\pico\farad} load, \partno{74LVC} series parts have slightly faster rise and
fall times compared to our \partno{STM32} microcontroller's digital IO
pins\cite{renesaselectronicscorporationApplicationNoteAN2242019}.
\paragraph{Optical Networking Chipsets.}
A category of CML-output drivers suitable for our application is a class of optical networking chipset ICs. While
today, the construction of optical transmitters has moved to direct bonding of optical components and driver ICs to
minimize parasitics, discrete driver ICs for some chipsets from the mid-2000s era are still available at reasonable
cost. Both the laser driver used to drive the transmitter laser diode, and the limiting amplifier used to amplify the
receiver photodiode's output can be used in our application, with the limiting amplifier part requiring less additional
circuitry in our application due to its lack of output bias control. In our evaluation below, we include the
\partno{MAX3748} limiting amplifier as a representative part from this category that is still commercially available. A
drawback of relying on a part like this is that its future availability is uncertain given the evolution of the
industry.
\paragraph{Bus Redrivers.}
The final category of amplifiers suitable for our pulse shaping needs is redrivers intended for high-speed data
interfaces such as USB 3, PCI Express, HDMI, or DisplayPort. All of these interfaces use CML drivers, with differential
voltage levels usually on the order of \qtyrange{600}{1000}{\milli\volt}. \emph{Redriver} ICs are intended to be used to
amplify the sensitive high-speed bus signal at the edge of a PCBA, either before it leaves the board through a connector
to ensure adequate signal levels at the connector, or after it enters through a connector to compensate for loss in the
PCB traces between the connector and the signal's destination. For our application, redrivers intended for HDMI and
DisplayPort applications are most suitable, as they can usually be configured to act as simple amplifiers without
processing any protocol logic on the signals that are amplified. In contrast, both USB 3 and PCIe redrivers often
implement power saving features that try to parse parts of the actual signal transmitted through them, which are hard to
bypass in our application.
Redrivers can be classified according to their way of operation. \emph{Retimers} include a full
serialization/deserialization (SerDes) setup and parse the low-level protocol of the bus to reconstruct bit-level
timing. We focus only on simpler redrivers that only contain amplifiers and (analog) equalizers here.
Amplifying redrivers can be separated into two classes: limiting and linear redrivers. A limiting redriver is configured
to have a high gain such that a small input signal will be amplified to the full output voltage swing. Limiting
redrivers are well-suited for our application, but they have fallen out of fashion since they interfere with link
training and with power saving features of protocols like USB 3.
Linear redrivers are constructed with a low gain instead. Sufficient to compensate for wiring losses, their gain is low
enough to leave them transparent to bus protocol features such as link training or power saving features. To compensate
for their reduced gain, linear redrivers usually contain configurable equalizers that can be used to apply targeted
enhancements for particular signal defects, such as boosting high-frequency gain or providing a set amount of overshoot.
Where available, in our prototype variants we set these equalization features to provide maximum gain.
In our evaluation below, we include \partno{PI3HDX12211} as a linear redriver intended for DisplayPort and HDMI
applications, as well as \partno{TDP0604} as a ``hybrid'' linear or limiting redriver for HDMI applications, configured
for limiting mode in our experiments. An attractive feature of both of these chips as well as comparable devices is that
they usually include at least four independent channels, so only one chip is needed for both pulse paths. Additionally,
they are consumer mass market parts, resulting in a low price. For instance, \partno{PI3HDX12211} is available at
\price{2.11}{\euro} in single quantity and less than \price{1.30}{\euro} at a quantity of several hundred at distributor
LCSC, and \partno{TDP0604} is available at \price{4.72}{\euro} and \price{3.44}{\euro}, respectively, at distributor
Mouser.
\subsection{Cost Breakdown}
Table\ \ref{tab_bom} shows a breakdown of the cost of the main components of our prototype, resulting in a total
component cost of less than \price{10}{\euro}. We did not include power supply components in this breakdown as our
circuit is meant to be embedded into a payload circuit that will already have sufficient power supplies.
Due to its \partno{HRTIM} peripheral, the \partno{STM32G4} microcontroller is the component of our design that is
hardest to replace. However, this part can still be replaced with a wide range of FPGAs, which commonly include
digitally configurable delay lines on their IO pins for signal de-skewing. For instance, the \partno{ODELAY} primitive
of Xilinx 7 Series FPGAs provides the same $\frac{1}{32}$ clock cycle resolution that the \partno{STM32G4}
\partno{HRTIM} peripheral provides while supporting higher input clock frequencies.
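
As a worked example of this resolution, assume a timer input clock of $f_\mathrm{clk}=\qty{170}{\mega\hertz}$, the maximum core clock of the \partno{STM32G4} family (the clock configuration is an assumption in this sketch, and $f_\mathrm{clk}$ and $\Delta t$ are symbols introduced here). One delay step $\Delta t$ is then
\[
    \Delta t = \frac{1}{32\,f_\mathrm{clk}} = \frac{1}{32\cdot\qty{170}{\mega\hertz}} \approx \qty{184}{\pico\second},
\]
which agrees with the \qty{184}{\pico\second} point spacing of the scans in this paper.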
\begin{table}
\centering
\begin{tabular}{c|c|c|l}
        \textbf{Part number}&\textbf{Quantity}&\textbf{Unit cost in \euro}&\textbf{Description}\\\hline
PI3HDX12211&1&1.37&Pulse amplifier\\
STM32G474RB&1&3.51&Main microcontroller\\
OPA1656&1&1.25&Sampling post-amplifier\\
TMUXHS4212&2&0.64&Signal routing switch\\
SKYA21003&2&0.49&Termination switch\\
74LVC2G157&2&0.15&Pulse pre-conditioning\\
BAT17-04W&4&0.12&Sampling gates\\
&25&0.01&Various MLCC capacitors\\
&25&0.01&Various resistors\\\hline
\multicolumn{2}{r}{}&\textbf{9.67}&\textbf{Total}
\end{tabular}
\caption{A cost breakdown of the major components of our design. Listed prices are for 1000 pieces order quantity to
make prices more comparable between distributors. The number of switches necessary for signal routing and
termination depends on the specific mesh signal routing of the application. Numbers shown here are for our
prototype, which can measure a mesh from both ends and supports short, open and matched termination.}
\label{tab_bom}
\end{table}
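
The total in Table\ \ref{tab_bom} can be reproduced by multiplying the listed cost of each row by its quantity and summing over all rows, with all amounts in \euro:
\[
    1.37 + 3.51 + 1.25 + 2\cdot 0.64 + 2\cdot 0.49 + 2\cdot 0.15 + 4\cdot 0.12 + 25\cdot 0.01 + 25\cdot 0.01 = 9.67.
\]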
\subsection{Measurement Principle and Scan Scheduling}
\label{sec_scan_schedule}
The goal of a time domain reflectometer is to send a pulse into the Device Under Test (DUT)--i.e.\ in our application,
the mesh--and to record all reflections returning from the DUT afterwards. In a security mesh, whose
traces might only be a few meters long in total, the time span between the pulse being sent and the last reflections
from the very end of the mesh arriving is on the order of several tens of nanoseconds. Directly recording a response at
this timescale would be infeasible using a commodity microcontroller, so we utilize an equivalent time sampling
approach.
As shown in Figure\ \ref{fig_block_diagram}, our analog frontend contains amplifiers that produce the stimulus pulse, a
sampling gate with amplifiers, and a coupler that couples the pulse into the mesh and couples the reflections back into
the sampling gate. A microcontroller controls this frontend with two primary signals: A stimulus pulse, and a sampling
pulse. By adjusting the timing between these two pulses every time a stimulus pulse is sent, the microcontroller can
select a particular point in time after the stimulus pulse to record using the sampling gate. By slowly sweeping across
the whole time span, the microcontroller can reconstruct the waveform of the reflected signal at the sampling gate
across one period of the stimulus pulse. The recording rate of this waveform is limited by the repetition rate of the
stimulus pulse as well as the time step size.
The attainable repetition rate of our stimulus and sampling circuits is limited by two main components. First, the
sampling post-amplifier's bandwidth limits the maximum sample rate. In our design, we chose an \partno{OPA1656}
\qty{50}{\mega\hertz} Gain-Bandwidth Product (GBP) FET input low noise operational amplifier. We need a FET input part
to avoid loading the sampling gate. The comparatively high GBP and the low noise input stage of this device allow us to
amplify small signals that could result from weak reflections in small impedance discontinuities inside the mesh.
The second major factor limiting repetition rate is the microcontroller's ADC speed, as well as the speed of the
software processing the ADC's output. At full \qty{12}{b} resolution, each ADC reaches a sampling rate of
approximately \qty{4}{MSps}. The microcontroller contains five ADCs, which can be interleaved to achieve higher rates.
Combining these factors, we conservatively decided on a sampling rate of \qty{1}{MSps} across both channels of the
differential pair. At this sampling rate, it is feasible to control the sample timing on a sample-by-sample basis. For
all measurements in this paper, we use a sequential sampling approach where the microcontroller takes a series of
measurements for oversampling at a particular delay, and then increases the delay by one \partno{HRTIM} output clock
interval.
In our prototype, one sweep of a \qty{188}{\nano\second} time span consisting of $1024$ data points took
\qty{710}{\milli\second} at $256\times$ oversampling and \qty{1.1}{\second} at $384\times$ oversampling. The time span
corresponds to \qty{28}{\meter} of mesh length, which at a \qty{200}{\micro\meter} pitch corresponds to a mesh area of
\qty{113}{\centi\meter\squared} and at a \qty{1}{\milli\meter} pitch corresponds to
\qty{565}{\centi\meter\squared}. Using the same microcontroller, by optimizing timing, moving oversampling processing
out of the interrupt handler, and by interleaving four of the microcontroller's five ADC peripherals, the lower limit of
acquisition time of a $1024$-point scan is \qty{33}{\milli\second} for $256\times$ oversampling and
\qty{49}{\milli\second} for $384\times$ oversampling.
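
The figures in this paragraph can be checked with a short calculation. The symbols $N$ (number of delay points), $\Delta t$ (delay step), $k$ (oversampling factor), $p$ (trace pitch) and $f_s$ (aggregate sample rate) are introduced here for illustration only. The scanned time span is
\[
    T = N\,\Delta t = 1024\cdot\qty{184}{\pico\second} \approx \qty{188}{\nano\second}.
\]
Taking $T$ as the propagation delay along the trace at $v\approx\qty{1.5d8}{\meter\per\second}$, which is the relation that the stated mesh length implies, gives
\[
    \ell = v\,T \approx \qty{28}{\meter}.
\]
The stated areas are reproduced if both traces of the mesh pair are assumed to cover area at pitch $p$:
\[
    A \approx 2\,\ell\,p \approx \qty{113}{\centi\meter\squared} \quad\mathrm{for}\quad p=\qty{200}{\micro\meter}.
\]
For the lower limit of the acquisition time, an aggregate rate of $f_s\approx\qty{8}{MSps}$ per channel reproduces the stated values. This rate is an assumption, corresponding for example to two interleaved ADCs per channel at \qty{4}{MSps} each:
\[
    t_\mathrm{min} = \frac{N\,k}{f_s} = \frac{1024\cdot 256}{\qty{8}{MSps}} \approx \qty{33}{\milli\second}.
\]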
While sequential scanning is adequate for our development, two simple optimizations would decrease the time to detection
for an attack in a future practical application. First, the range of scanned delays
should be adjusted to the length of the particular security mesh in use. For this paper, we always
scanned a time range of $1024$ points at \qty{184}{\pico\second} spacing starting before one stimulus pulse and ending
shortly before the next stimulus pulse so that any waveform artifacts will be visible. In a practical application, there
would be little information gained by sampling much beyond the edges of the expected mesh response, so the scan window
should be kept small to increase scan rate.
Secondly, in a practical application, the feature that is most relevant to detect tamper attempts is the trailing edge
of the mesh's response. This trailing edge corresponds to the return of the stimulus pulse's reflection at the far end
of the mesh. Any attack that affects the impedance even only of part of the mesh has a high chance of affecting its
delay, and thus this trailing edge is likely to move. In a practical application, it would thus be efficient to use a
heuristic scan schedule instead of the sequential scan we are using in our research prototype. Such a heuristic schedule
would sample delays near the expected trailing edge of the particular mesh in use more frequently compared to delays
that lie somewhere else, such as in the middle of the mesh's return window.
\section{Experimental Evaluation}
To validate our design, we performed a two-fold evaluation. First, we measured the performance of our sampling circuit
as a time-domain reflectometer. The most relevant figure to our mesh monitoring application is the pulse generators'
rise time, which determines the frontend's bandwidth and consequently the level of detail that we are able to extract
from a connected mesh during one scan. Since we aim at fingerprinting a connected mesh, not at performing absolute
measurements, we do not need to characterize or de-embed the transfer function of our TDR frontend.
Second, we characterized the end-to-end performance of our design on a mesh test specimen, and we evaluated its
performance on several realistic tamper attempts. As a baseline characterization, in Section\ \ref{sec_attack_short} we
will show measurements of both short and open mesh traces, allowing us to evaluate our design's capacity to spatially
localize faults. Building upon this baseline, in Section\ \ref{sec_attack_probe} we will then demonstrate a probing
attack, in which we measured our design's response to a standard \qty{100}{\mega\hertz} bandwidth
$\qty{10}{\mega\ohm}\parallel\qty{10}{\pico\farad}$ oscilloscope probe. Compared to the baseline open/short test, this provides
a greater challenge due to the probe's intentionally high impedance and minimal capacitive loading. Concluding our
attack tests, in Section\ \ref{sec_attack_bridge} we demonstrate a bridging attack that attempts to repair a break
created in the mesh through drilling.
\subsection{Rise Time Measurement}
We measured two figures of merit to characterize frontend speed. First, as shown in Section\ \ref{sec_spec_risetime}
below, we measured pulse rise time at the mesh interface using a Keysight N9020A MXA \qty{26.5}{\giga\hertz} signal
analyzer to evaluate the rise time of our pulse generator. This figure indicates the raw performance of our pulse
generator. Second, we used our circuit to perform a TDR measurement of a mesh test specimen and measured the rise time
of the sampling pulse as seen by the circuit itself. This figure indicates the actual measurement performance of our
circuit. In general, this rise time is different from the raw pulse rise time because of the non-linear characteristic
of the sampling Schottky pairs. Depending on the IC, our pulse generator produces output waveforms with
\qtyrange{470}{3200}{\milli\volt} differential voltage swing. Since the sampling diode pairs start to conduct at a
combined forward voltage of approximately \qty{300}{\milli\volt}, they will transition from high impedance to low
impedance during a corresponding \qty{300}{\milli\volt} window at the middle of the strobe pulse's edge. Thus, even if
the strobe pulse shows a low-pass response with rounding at both ends, as long as its slew rate
$\frac{\mathrm{d}V}{\mathrm{d}t}$ during the zero crossing is fast enough, the pulse will still result in a sharp
turn-on knee of the sampling diodes.
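
To make the role of the slew rate explicit, the duration of the gate's turn-on transition can be estimated to first order as the width of the conduction window divided by the strobe slew rate at the zero crossing. With $V_w\approx\qty{300}{\milli\volt}$ the window described above and $t_\mathrm{on}$ a symbol introduced for this sketch,
\[
    t_\mathrm{on} \approx \frac{V_w}{\left.\frac{\mathrm{d}V}{\mathrm{d}t}\right|_{V=0}}.
\]
As an illustration with a hypothetical differential slew rate of \qty{4}{\volt\per\nano\second}, this gives $t_\mathrm{on}\approx\qty{75}{\pico\second}$. This linearized estimate ignores the diodes' exponential characteristic and their junction capacitance, so it should be read as an order-of-magnitude figure only.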
\subsubsection{Stimulus Pulse Rise Time at the Mesh}
\label{sec_spec_risetime}
\begin{figure}
\begin{center}
\begin{subfigure}{0.48\textwidth}
\centering
\includegraphics[width=\textwidth]{fig_spec_risetime_74lvc.pdf}
\caption{74LVC2G157}
\label{fig_spec_risetime_74lvc}
\end{subfigure}
\unskip\begin{subfigure}{0.48\textwidth}
\centering
\includegraphics[width=\textwidth]{fig_spec_risetime_max3748.pdf}
\caption{MAX3748}
\label{fig_spec_risetime_max3748}
\end{subfigure}
\begin{subfigure}{0.48\textwidth}
\centering
\includegraphics[width=\textwidth]{fig_spec_risetime_tdp0604.pdf}
\caption{TDP0604}
\label{fig_spec_risetime_tdp0604}
\end{subfigure}
\unskip\begin{subfigure}{0.48\textwidth}
\centering
\includegraphics[width=\textwidth]{fig_spec_risetime_pi3hdx.pdf}
\caption{PI3HDX12211}
\label{fig_spec_risetime_pi3hdx}
\end{subfigure}
\end{center}
\caption{Spectrum measurements and re-constructed time domain pulse edge shape of the stimulus pulse measured at the
mesh interface for each of the four driver ICs. Amplitudes were normalized for rise time plots. The $\frac{1}{f}$
curve in the spectrum plots shows the peak amplitude of the frequency components of an ideal infinite-bandwidth
square wave. The horizontal gray lines in the time domain plots show thresholds used for rise time calculation.}
\label{fig_spec_risetime}
\end{figure}
To measure the rise time of our frontend's pulse generator, we measured the stimulus output at the mesh interface using
a Keysight N9020A MXA \qty{26.5}{\giga\hertz} signal analyzer\footnote{The spectrum analyzer used significantly exceeded
the capabilities of the fastest oscilloscopes we had access to, so it was the more appropriate choice of measurement
instrument.}. All measurements were taken with the prototype's mesh interface connected to the spectrum analyzer through
a bias tee configured for DC blocking followed by a \qty{20}{\deci\bel} attenuator for protection. Since both stimulus
and sampling pulses are generated using identical circuits, these results also apply to the sampling pulse, apart from
differences in amplifier output loading.
Figure\ \ref{fig_spec_risetime} and Table\ \ref{tab_edge_risetime} show the resulting measurements. For ease of
interpretation, we projected the measurements from the frequency domain (upper traces) back into the time domain (lower
traces), and extracted rise time measurements from those traces. Our measurements show that, as expected, the bare
\partno{74LVC}-series logic gate has the slowest rise time at more than \qty{500}{\pico\second}. All three amplifier
variants we implemented showed significantly improved rise time, with the \partno{PI3HDX12211} achieving below
\qty{200}{\pico\second}, and the other two showing around \qty{120}{\pico\second}. A noteworthy detail is that
\partno{MAX3748} and \partno{TDP0604} only achieved a low output signal amplitude, which stems from a combination of
them having low output amplitude by design and of our circuit loading their outputs heavily. Since their amplitude is
only marginally within the knee region of the RF Schottky diodes used in the sampling bridges, in these variants,
the sampling gates end up slower than the raw pulse rise time value alone would suggest.
\subsubsection{Self-Characterization}
\begin{figure}
\begin{center}
\includegraphics[width=\textwidth]{fig_edge_risetime.pdf}
\end{center}
\caption{One edge of the stimulus pulse with no mesh connected measured by the board itself, using different
amplifier ICs. For each IC, ten traces are shown. The vertical scale is in Volts at the sampling amplifier output.}
\label{fig_edge_risetime}
\end{figure}
\begin{table}
\begin{center}
\begin{tabular}{r|cccc}
\textbf{IC}
&\partno{74LVC2G157}
&\partno{MAX3748}
&\partno{TDP0604}
&\partno{PI3HDX12211}\\\hline
\textbf{$t_r$ (Self-Characterization)}&
\qty{916}{\pico\second}&
\qty{743}{\pico\second}&
\qty{333}{\pico\second}&
\qty{264}{\pico\second}\\
\textbf{$t_r$ (Stimulus at Mesh)}&
\qty{573}{\pico\second}&
\qty{125}{\pico\second}&
\qty{119}{\pico\second}&
\qty{191}{\pico\second}\\
\textbf{Stimulus Pulse $V_{pp}$}&
\qty{1600}{\milli\volt}&
\qty{236}{\milli\volt}&
\qty{254}{\milli\volt}&
\qty{430}{\milli\volt}\\
\textbf{Effective Slew Rate}&
\qty{2.79}{\volt\per\nano\second}&
\qty{1.89}{\volt\per\nano\second}&
\qty{2.13}{\volt\per\nano\second}&
\qty{2.25}{\volt\per\nano\second}
\end{tabular}
\end{center}
\caption{Single-ended stimulus edge rise times for different amplifier ICs. The single-ended rise times of both
positive and negative half of the differential pair have been averaged. External measurements are from Figure\
\ref{fig_spec_risetime}, measuring the stimulus pulse at the mesh interface. $V_{pp}$ measurements are taken at the
    mesh interface. Effective slew rates are calculated from the external measurements and pulse $V_{pp}$.}
\label{tab_edge_risetime}
\end{table}
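
The effective slew rates in Table\ \ref{tab_edge_risetime} are the ratio of the stimulus pulse amplitude to the rise time measured at the mesh interface. As a worked example for the \partno{PI3HDX12211} variant, with $S_\mathrm{eff}$ a symbol introduced here for illustration:
\[
    S_\mathrm{eff} = \frac{V_{pp}}{t_r} = \frac{\qty{430}{\milli\volt}}{\qty{191}{\pico\second}} \approx \qty{2.25}{\volt\per\nano\second}.
\]
The same ratio gives \qty{2.79}{\volt\per\nano\second} for the \partno{74LVC2G157} ($\qty{1600}{\milli\volt}/\qty{573}{\pico\second}$), which illustrates how a large swing can offset a slow edge. Since this ratio uses the full swing and the measured rise time, it is only a coarse proxy for the slew rate at the zero crossing.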
Figure\ \ref{fig_edge_risetime} shows the result of our self-characterization experiments, where we used the frontend to
measure its own pulse shape. These results correspond to the actual rise time we can expect in practical measurements.
In these experiments, we ran a measurement using $256\times$ oversampling at \qty{12}{b} ADC resolution. The plots show
the amplifier output voltage against time in \unit{\nano\second}. The absolute value of the amplifier output
voltage is not relevant here; only the rise time is. Since we use some of these amplifiers--particularly the redriver
ICs--well outside of their intended application, the actual voltage they develop across the nonlinear load that our
sampling gate's diode bridge presents depends on implementation details of the amplifier's CML output stage. To maximize
ADC resolution and minimize ringing, we tuned gain and bandwidth of each post-sampling amplifier for each IC. Ringing in
the amplifier output causes jitter in the ADC's sampling instant to feed through directly to the ADC output value.
Since in \partno{STM32} MCUs, the ADC is clocked independently of the rest of the system, its sampling timing is poorly
controlled and this jitter causes a significant error unless the amplifier is well-compensated. The key figure for us is
how fast our sampling gate turns on, not how hard, so we can largely ignore the units on the graph's vertical scale.
Table\ \ref{tab_edge_risetime} shows rise times calculated from each trace, averaged across both traces of the
differential pair. From these results and from the graphs in Figure\ \ref{fig_edge_risetime} we can see that the
optical networking limiting amplifier produces slower edges than the measurements from Figure\ \ref{fig_spec_risetime}
would suggest. We suspect that this is caused by its low output amplitude resulting in part from its specifications and
in part from a poor match between its CML output structure and the nonlinear impedance presented by the sampling diode
bridges. Surprisingly, even the \partno{74LVC2G157} baseline unit has a rise time of less than \qty{1}{\nano\second}. We
attribute this to the large output voltage swing of this part, going from ground to its $V_{CC}$ at
\qty{3.3}{\volt}. Due to the construction of our sampling gate, its switching happens in the short period between its
input differential voltage crossing zero and it rising above the combined forward voltage of the Schottky diodes. Thus,
while the \partno{74LVC} might produce slow edges overall, its large output swing results in a high slew rate in the
critical region around the zero crossing that mostly determines the speed of the sampling gates.
We observed the best result overall with the \partno{PI3HDX12211} redriver, resulting in a rise time of
\qty{264}{\pico\second}. In this test specimen, we fed the pulse through the amplifier twice since we had two unused
channels, and we used \qty{200}{\pico\second} clip lines on the amplifier's output for pulse shaping. We could only use
the clip lines in this specimen. In all other specimens, the amplifiers' output did not contain sufficient harmonic
content to still turn on the sampling gate's diode bridge when used with the clip line.
\subsection{Mesh Specimen Characterization}
\begin{table}
\begin{center}
\begin{tabular}{r|cccc}
\textbf{Specimen}
&1
&2
&3
&4\\\hline
\textbf{Size}&
$35\times\qty{70}{\milli\meter}$&
$35\times\qty{70}{\milli\meter}$&
$35\times\qty{70}{\milli\meter}$&
$35\times\qty{70}{\milli\meter}$\\
\textbf{Area}&
            $\qty{24.5}{\centi\meter\squared}$&
            $\qty{24.5}{\centi\meter\squared}$&
            $\qty{24.5}{\centi\meter\squared}$&
            $\qty{24.5}{\centi\meter\squared}$\\\hline
\textbf{Trace width}&
\qty{150}{\micro\meter}&
\qty{200}{\micro\meter}&
\qty{300}{\micro\meter}&
\qty{500}{\micro\meter}\\
\textbf{Trace spacing}&
\qty{150}{\micro\meter}&
\qty{200}{\micro\meter}&
\qty{300}{\micro\meter}&
\qty{500}{\micro\meter}\\
\textbf{Trace pitch}&
\qty{300}{\micro\meter}&
\qty{400}{\micro\meter}&
\qty{600}{\micro\meter}&
\qty{1.00}{\milli\meter}\\\hline
\textbf{Trace length}&
\qty{1.07}{\meter}&
\qty{1.93}{\meter}&
\qty{2.86}{\meter}&
\qty{3.86}{\meter}\\
\textbf{Approximate Delay}&
\qty{7.1}{\nano\second}&
\qty{13}{\nano\second}&
\qty{19}{\nano\second}&
\qty{26}{\nano\second}\\
\end{tabular}
\end{center}
\caption{Specifications of mesh test specimens used in the experiments in this paper. All four specimens were placed
on a single, four-layer, \qty{1.0}{\milli\meter} thickness PCB. The meshes were placed two per side on the outer
layers, and the inner layers were used as ground. Approximate signal delays were calculated using wave velocity
$v=\frac{c}{\sqrt{\epsilon_r}}\approx\frac{c}{2}$\cite{wheelerTransmissionLinePropertiesParallel1965} assuming
$\epsilon_r\approx 4$\cite{mumbyDielectricPropertiesFR41989} for the test specimens' \partno{FR-4} substrate.}
\label{tab_mesh_spec}
\end{table}
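
The approximate delays in Table\ \ref{tab_mesh_spec} follow from dividing the trace length by the wave velocity. For specimen 1, with $\ell$ the trace length and $t_d$ the one-way delay (symbols introduced here for illustration):
\[
    t_d = \frac{\ell\,\sqrt{\epsilon_r}}{c} \approx \frac{\qty{1.07}{\meter}\cdot 2}{\qty{3.0d8}{\meter\per\second}} \approx \qty{7.1}{\nano\second}.
\]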
To measure the practical performance of our prototype, we created a set of security mesh test specimens. The four
specimens each cover the same area at a different mesh pitch, each using two looped mesh traces according to the design
specifications listed in Table\ \ref{tab_mesh_spec}. The four specimens have a trace length ratio of approximately
$1:2:3:4$. As a baseline validation of our prototype as well as the mesh design, we performed TDR measurements of each
mesh specimen using each amplifier variant of our prototype. Figure\ \ref{fig_mesh_length} shows the results of these
measurements. The graphs show the step response resulting from an edge entering the mesh, and its reflection arriving
back at the start after traversing the mesh back and forth.
We validated the results from Figure\ \ref{fig_mesh_length} by calculating the speed of light in our mesh specimen's
substrate based on them. The resulting measurements are shown in Table\ \ref{tab_speed_of_light}. All amplifier
configurations yield comparable measurements of approximately \qty{1.6d8}{\meter\per\second}, which corresponds well with
the expected signal propagation velocity in \partno{FR-4} PCB material of
\qty{1.5d8}{\meter\per\second}\cite{wheelerTransmissionLinePropertiesParallel1965,mumbyDielectricPropertiesFR41989}.
An interesting aspect of the graphs in Figure\ \ref{fig_mesh_length} is that all except the \partno{74LVC} graph show a
dispersion effect increasingly rounding out the trailing edge of the response with longer mesh lengths. We suspect this
effect stems from higher-frequency components coupling into adjacent trace segments further up or down the mesh more
easily, spreading high-frequency components of the response signal out over time and effectively creating a
low-pass response. We suspect the poor visibility of this effect in the \partno{74LVC} measurements is a result of this
variant's pulse amplifier output amplitude being very large, allowing reflected response components to forward-bias the
sampling gate's diode bridges, resulting in amplitude clipping.
From this dispersion effect follows a key point for the design of practical security meshes: To increase the temporal
resolution of TDR mesh monitoring, meshes should be broken up into relatively short segments that are multiplexed
through signal switching. Where this is not desirable, the mesh can be treated as a microwave circuit design that can be
optimized through the electronic CAD/electromagnetic simulation co-design approach used for such circuits.
\begin{figure}
\begin{center}
\includegraphics[width=\textwidth]{fig_mesh_length.pdf}
\end{center}
\caption{TDR responses captured using our design with each of four candidate pulse amplifier ICs and four mesh test
specimens. The shown time range covers the primary reflection of the stimulus pulse's falling edge. The vertical
scale of all four graphs is in Volts at the ADC. For clarity, only one channel of the response is shown.}
\label{fig_mesh_length}
\end{figure}
\begin{table}
\begin{center}
\begin{tabular}{r|cccc|c}
&\multicolumn{4}{c|}{Specimen}&\\
Pulse amplifier IC&
1&
2&
3&
4&
Calculated speed of light $c$
\\\hline
\partno{PI3HDX12211}&
\qty{16.9}{\nano\second}&
\qty{26.0}{\nano\second}&
\qty{36.4}{\nano\second}&
\qty{46.1}{\nano\second}&
$\qty{1.59d8}{\meter\per\second}$\\
\partno{74LVC2G157}&
\qty{17.1}{\nano\second}&
\qty{26.4}{\nano\second}&
\qty{36.6}{\nano\second}&
\qty{48.2}{\nano\second}&
$\qty{1.55d8}{\meter\per\second}$\\
\partno{MAX3748}&
\qty{17.2}{\nano\second}&
\qty{26.4}{\nano\second}&
\qty{36.6}{\nano\second}&
\qty{45.6}{\nano\second}&
$\qty{1.59d8}{\meter\per\second}$\\
\partno{TDP0604}&
\qty{17.0}{\nano\second}&
\qty{26.2}{\nano\second}&
\qty{36.5}{\nano\second}&
\qty{45.8}{\nano\second}&
$\qty{1.59d8}{\meter\per\second}$\\
\end{tabular}
\end{center}
\caption{Speed of light and time offset calculated from delays read from the graphs in Figure\
\ref{fig_mesh_length}. $c$ is the speed of light determined by linear fit.}
\label{tab_speed_of_light}
\end{table}
\subsection{Tamper Tests}
After validating our prototype's electrical performance as well as our mesh specimen designs in the previous sections,
we carried out a series of tampering attempts on a mesh specimen while monitoring it using
our TDR prototype, capturing responses both before and after tampering. We performed two sets of experiments.
\subsubsection{Short and Open Circuits}
\label{sec_attack_short}
\begin{figure}
\begin{center}
\includegraphics[width=\textwidth]{fig_manip_shape.pdf}
\end{center}
\caption{TDR responses captured using our design under three short- and one open-circuit scenario. The distance from
mesh start to Location 1, 2, and 3 is \qty{558}{\milli\meter}, \qty{125}{\milli\meter} and \qty{850}{\milli\meter},
respectively. The cut is approximately halfway through the mesh. Left and right plots show the positive and negative
trace of the differential pair, respectively. Black traces show baseline measurements in between attacks. The
baselines show vertical offsets due to temperature drift, which causes a small DC offset in our design. The vertical
scale is in Volts at the ADC.}
\label{fig_manip_shape}
\end{figure}
In our first experiment, we tested both short and open-circuit conditions. We tested a short circuit between the two
mesh traces in three locations as well as a cut trace halfway through the mesh. Figure\ \ref{fig_pic_specimens} in
Appendix\ \ref{appendix_photos} shows photos of our test specimen. Figure\ \ref{fig_manip_shape} shows the result of our
experiment. The graphs show a clear response of our monitoring circuit to all four tampering scenarios. Short and open
circuit conditions can clearly be distinguished from each other, and in all cases, the fault location can be determined
with sub-nanosecond precision, corresponding to several centimeters in distance along the mesh.
\subsubsection{Probing by Oscilloscope Probe}
\label{sec_attack_probe}
\begin{figure}
\begin{center}
\includegraphics[width=\textwidth]{fig_probe_shape.pdf}
\end{center}
\caption{The circuit's TDR response under a probing attack using an oscilloscope probe. Black traces are a series of
un-probed baseline measurements taken between attacks. All traces are plotted relative to a separate baseline trace
taken at the beginning of the experiment. The top and bottom plots show the two halves of the differential pair.}
\label{fig_probe_shape}
\end{figure}
In our second experiment, we probed each of the three locations from the test specimen shown in Figure\
\ref{fig_pic_specimens} in the Appendix once at each trace of the trace pair using a Rigol \partno{PVP3150} $\times
1/\times 10$ oscilloscope probe set to $\times 10$ mode. We grounded the probe's ground clip to the mesh ground and used
the probe without tip attachment.
Using the \partno{PI3HDX12211} variant of our prototype, we measured the mesh's TDR response while probing. Figure\
\ref{fig_probe_shape} shows the resulting TDR traces. Oscilloscope probes are specifically designed to disturb the
circuit under test as little as possible. The probe we used is specified as a \qty{10}{\mega\ohm} resistive load in
parallel with a \qty{10}{\pico\farad} capacitance in the $\times 10$ mode that we used here. Since the
resulting disturbance to the TDR traces is smaller than those in Figure\ \ref{fig_manip_shape}, we post-processed the
traces by subtracting a baseline trace taken before the measurements. To highlight drift in the baseline trace, we
include additional baseline traces taken in between and after measurements using the same post-processing.
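Written out, this post-processing amounts to a pointwise difference. The notation is introduced here only for
illustration: with $V_0[n]$ denoting the baseline trace taken before the measurements and $V_k[n]$ the $k$-th subsequent
trace, both sampled at the same delay steps $n$, the plotted quantity is
\[
    \Delta V_k[n] = V_k[n] - V_0[n].
\]
For an undisturbed mesh and in the absence of drift, $\Delta V_k[n] \approx 0$ for all $n$ up to measurement noise, so
the spread of the additional baseline traces around zero indicates how much of a given deviation can be attributed to
drift rather than to probing.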
In each trace, the mesh was probed in one of three locations as in Figure\ \ref{fig_manip_shape}, and on one of the two
mesh traces. The time range shown in the graph covers the primary reflection of the stimulus pulse's rising edge. We can
clearly see a distinct response to each of the three probing attempts with the only caveat being that the response of
the two mesh traces is asymmetrical due to asymmetry in our sampling frontend when measuring such low signal levels.
Interestingly, this asymmetry is compensated by the fact that we excite the mesh differentially, and as a result
probing either trace distorts their shared electromagnetic field, and impacts measurements on \emph{both} traces.
Particularly on the first trace, we can distinguish which trace was probed, as well as where it was probed, in a single
measurement.
\subsubsection{Circumvention Through Micro-Soldering}
\label{sec_attack_bridge}
\begin{figure}
\centering
\begin{subfigure}{0.78\textwidth}
\centering
\includegraphics[width=\textwidth]{fig_drill_mod_shape.pdf}
\label{fig_drill_mod_shape_plot}
\end{subfigure}
\begin{subfigure}{0.2\textwidth}
\centering
\includegraphics[width=\textwidth]{pic_manip_microsoldering_small.jpg}
\vspace*{2mm}
\label{fig_drill_mod_shape_pic}
\end{subfigure}
\caption{The circuit's TDR response under a manipulation attack bridging part of a trace to allow a
\qty{300}{\micro\meter} drill to penetrate. The mesh pitch is \qty{240}{\micro\meter}. Red traces show
measurements with a looped wire patch comparable to \textcite{immlerSecurePhysicalEnclosures2018}, black traces
show the same gap bridged with a minimally short straight piece of wire. The left and right plots show the two
halves of the differential pair. The photo shows the looped wire patch with a \qty{1}{\milli\meter} pitch ruler
for reference. Traces are normalized as in Figure\ \ref{fig_probe_shape}.}
\label{fig_drill_mod_shape}
\end{figure}
While our proposed measurement setup significantly increases the level of effort required from an attacker, as long as
standard PCBs are used, PCB rework techniques that are widely used in the industry for PCB repair can be applied. If we
assume a standard PCB process with \qty{100}{\micro\meter} trace/space design rules, a drilling attack targeting a
\qty{300}{\micro\meter} hole size as proposed by \textcite{immlerSecurePhysicalEnclosures2018} will break at least one
trace. Patching the resulting break using a wire is possible, but with increasing wire length, the TDR response of the
mesh is increasingly distorted. We experimentally performed an attack comparable to the one shown by
\textcite{immlerSecurePhysicalEnclosures2018} on a \qty{240}{\micro\meter} pitch mesh specimen. Figure\
\ref{fig_drill_mod_shape} shows our modification and the resulting change in TDR response. As we can see, adding even
just a few millimeters of wire will measurably and consistently distort the TDR response.
\subsection{Countermeasures}
As shown above, PCB security meshes can be manipulated using industry-standard micro-soldering techniques. If the
patch wires are kept as short as possible, it is conceivable that the impact on the TDR response could be kept below
detection thresholds. Our setup provides increased resistance against such attacks since the entire attack would have to
be carried out without electrically contacting either mesh trace. In particular, soldering would have to be done using a
minimal amount of solder as well as a bespoke, insulated soldering iron tip. While manufacturing such a tool out of a
material like sintered ceramic is conceivable, to our knowledge, no such tool exists on the market.
Furthermore, the actual drilling would have to happen with a dielectric drill bit, placing special attention on
evacuating conductive copper chips before they can create shorts to nearby traces. Again, it is conceivable that such a
tool could be manufactured, but to our knowledge, such a tool is not currently available as a standard component on the
market.
Finally, any probes penetrating the mesh would have to be placed such that their presence in the vicinity of the mesh
traces does not disturb the TDR response. In particular, we have observed that even touching the mesh will distort the
response, so modifications would have to be carried out with great care, likely using micromanipulators or similar
specialized equipment.
The PCI PTS HSM DTR standard~\cite{pcisecuritystandardscouncilPaymentCardIndustry2021a} contains a useful framework for
thinking about attacker capabilities. Applying their taxonomy, our monitoring system raises the skill level required for
a patching attack from a \emph{skilled} attacker to an \emph{expert} attacker, and the equipment requirement from
\emph{standard} equipment to \emph{bespoke} equipment such as dielectric drill bits and ceramic soldering tips.
\section{Future Work}
\paragraph{Design variants.} While the \partno{STM32G4}'s \partno{HRTIM} peripheral offers edge position control at a
precision of $\frac{1}{32}$ system clock cycle using an automatically adjusted delay-locked loop at each output driver,
due to the comparatively slow maximum system clock speed of \qty{170}{\mega\hertz}, this still only results in a timing
resolution of \qty{184}{\pico\second}. While we have demonstrated this is sufficient to detect and localize several
attack variants, it would be interesting to increase time resolution since in our measurements, we observed that the
end-to-end jitter of our sampler is low enough that our circuit would benefit from finer delay control. In our
prototype, we implemented an adjustable power supply, so far unused, for the \partno{74LVC} series buffer in between the
\partno{HRTIM} outputs and the pulse amplifier. By adjusting this buffer's power supply through one of the
microcontroller's digital-to-analog converter (DAC) channels, we expect that it should be possible to exploit the supply
voltage dependency of the propagation delay of \partno{74LVC} series CMOS logic to create a digitally controllable delay
with picosecond resolution. The internal DLL of the \partno{HRTIM} peripheral is likely implemented similarly.
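As a quick check of the \partno{HRTIM} resolution figure quoted above, one delay step is $\frac{1}{32}$ of a system
clock period, so with $f_\mathrm{clk} = \qty{170}{\mega\hertz}$ we obtain
\[
    \Delta t = \frac{1}{32\, f_\mathrm{clk}} = \frac{1}{32 \times \qty{170}{\mega\hertz}} \approx \qty{184}{\pico\second}.
\]
To give a rough sense of scale, and assuming for illustration only a propagation velocity of
$v \approx \qty{1.5e8}{\meter\per\second}$ (an order-of-magnitude figure for standard PCB substrates, not a value fitted
to our specimens), one delay step corresponds to $v\,\Delta t / 2 \approx \qty{14}{\milli\meter}$ of distance along the
mesh, where the factor of two accounts for the round trip of the reflection.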
% FIXME reword for publication
\paragraph{System design.} The work we presented in this paper is complementary to the work previously presented by
\textcite{gotteCantTouchThis2022}, where the authors improved security of a simple security mesh made from standard PCBs
through mechanical motion. We are currently working on a prototype combining both approaches and incorporating heuristic
scan scheduling as mentioned in Section\ \ref{sec_scan_schedule} for a cost-efficient yet powerful physical security
primitive.
\paragraph{Auxiliary applications.} In this work, we have presented a design for a low-cost, embedded TDR frontend.
Besides security mesh monitoring, through multiplexing this TDR frontend could be used for other system monitoring
tasks from tamper sensing to system health monitoring. For instance, \textcite{vaiSecureArchitectureEmbedded2015}
propose an approach for checking the integrity of a PCBA using an external Vector Network Analyzer (VNA) attached to
test points on the PCBA's Power Distribution Network (PDN). TDR can produce fingerprints similar to a VNA, and it would
be interesting to measure parts of the secure subsystem other than its security mesh using our TDR frontend.
\section{Conclusion}
In this paper, we presented a design for a low-cost frontend for integrity monitoring of security meshes in applications
such as HSMs based on the principles of sub-nanosecond Time Domain Reflectometry. Our design repurposes an inexpensive
HDMI redriver IC to produce sharp edges for the TDR stimulus, and applies a microwave clip line to form fast pulses for
TDR sampling. Our design creates a detailed fingerprint of the intact mesh's condition that not only captures the length
of the mesh's traces but also reflects the impedance at every point along the mesh.
Beyond simply detecting faults or manipulations that disturb the mesh without causing breaks, we have demonstrated our
prototype circuit's capability to distinguish and physically localize faults inside the mesh in several practical attack
scenarios with even careful attacks causing strong disturbances in the generated fingerprint.
Compared to the state of the art, our approach enables the monitoring of larger meshes, at higher sensitivity and lower
cost. Our design is easy to replicate, does not require any specialized or custom components, and unlocks high-security
applications for security meshes made using low-cost, standard PCB manufacturing processes.
% FIXME put into actual appendix
%\appendix
%\section{Additional photos}
%\label{appendix_photos}
%
%\begin{figure}[h!]
% \centering
% \begin{subfigure}{0.45\textwidth}
% \centering
% \includegraphics[width=0.8\textwidth]{pic_short_2_small.jpg}
% \label{fig_pic_specimens_short}
% \caption{Short circuit test specimen}
% \end{subfigure}
% \begin{subfigure}{0.45\textwidth}
% \centering
% \includegraphics[width=0.8\textwidth]{pic_cut_1_small.jpg}
% \label{fig_pic_specimens_open}
% \caption{Cut trace test specimen}
% \end{subfigure}
% \caption{Photos of the short circuit and cut trace test specimens. In the specimen shown on the left, in each of the
% three marked locations, both traces of the mesh were exposed. To measure short circuit response, the traces were
% shorted in one of the locations using a soldering iron. In the specimen shown on the right, one trace was
% exposed and cut in the marked location. To measure baseline values, the test specimen shown on the right was
% used with the trace temporarily repaired.}
% \label{fig_pic_specimens}
%\end{figure}
%
%