Cut down a bunch of stuff, make space for majR measurements

jaseg 2025-07-14 14:37:42 +02:00
parent 7642a0e3ee
commit 3a287db5e4


@ -29,7 +29,8 @@
\tcbuselibrary{breakable}
\usepackage{float}
\definecolor{highlightgreen}{rgb}{0.18 0.4 0.13}
\definecolor{highlightred}{rgb}{0.6 0.1 0.1}
\definecolor{highlightgreen}{rgb}{0.12 0.5 0.07}
\DeclareSIUnit{\baud}{Bd}
\DeclareSIUnit{\year}{a}
\DeclareSIUnit{\rpm}{rpm}
@ -408,6 +409,8 @@ multiplexers.
\section{Circuit Design and Driving Approach}
% FIXME peer review only, for major revision @ TCHES
\color{highlightred}
\begin{figure}
\centering
\hspace*{-7mm}
@ -416,72 +419,60 @@ multiplexers.
\label{fig_block_diagram}
\end{figure}
A TDR can be broken down into three basic components. First, we need a source of fast pulses (or fast edges!) to
stimulate the mesh. Second, we need a coupler that allows us to couple the stimulus pulses into the mesh, and their
reflections out of it. Finally, we need a fast ADC to capture the reflections.
A TDR can be broken down into three basic components: A source of fast stimulus pulses (or edges!), a coupler that
separates stimulus pulses and their reflection at the output, and a fast ADC to capture the reflections.
Figure\ \ref{fig_block_diagram} shows a block diagram of our design\footnote{Full schematics are available in this
paper's supplementary material.}. At the core of our design lies an equivalent time sampling setup, where two
diode bridge sampling gates alternately sample the two traces of the mesh.
Since physical attacks happen on a time scale of minutes or hours, we do not need a fast acquisition rate. Equivalent
time sampling uses fast sampling gates to sample a high-frequency signal at a low frequency that is suitable for direct
conversion through an ADC. This reduces the requirements of our data acquisition and signal processing frontend from
gigasamples per second to mere megasamples, well within the range that a commodity microcontroller can handle.
conversion through an ADC. Using equivalent-time sampling, we can sample \unit{\giga\hertz}-scale signals at the
\unit{\mega\hertz}-scale sampling rate of the internal ADCs of the commodity microcontroller we use. We use two of the
microcontroller's ADCs interleaved, each of which provides approximately \qty{1.7}{\mega Sp\per\second} at
\qty{12}{\bit} resolution. Due to the high conversion speed of the modern ADC cores in this microcontroller, we are able
to use up to $384\times$ oversampling for increased precision without unduly affecting measurement times.
A challenge in equivalent time sampling is precisely phase-synchronizing the sampling pulse to the fundamental frequency
of the input signal, which is usually implemented by using a high-speed comparator. In a TDR-style frontend like ours,
this expensive component can be avoided because the stimulus signal is generated in the frontend, simplifying the
challenge of generating a synchronized sampling pulse at an adjustable phase to the stimulus pulse.
%A challenge in equivalent time sampling is precisely phase-synchronizing the sampling pulse to the fundamental
%frequency of the input signal, which is usually implemented by using a high-speed comparator. In a TDR-style frontend
%like ours, this expensive component can be avoided because the stimulus signal is generated in the frontend,
%simplifying the challenge of generating a synchronized sampling pulse at an adjustable phase to the stimulus pulse.
Since an intact mesh has low insertion loss, the amplitude of the response of an intact mesh is large. Thus, we do not
need a high dynamic range in either the frontend amplifiers or in the ADC, enabling the use of commodity operational
amplifiers (opamps) and the built-in ADC of a commodity microcontroller. Further, the strong signal allows us to use a
comparatively lossy \qty{-6}{\deci\bel} resistive tee instead of a directional coupler. A resistive tee does not provide
directionality, but in our case, the incident pulse can never interfere with reflections at the sampling output of the
divider because of causality.
The mesh has low insertion loss. Thanks to the resulting large amplitude of the reflection signal, the noise floor of
our frontend based on commodity operational amplifiers (opamps) is below the resolution limit of the built-in ADCs of
our chosen microcontroller. The main source of frontend noise is timing jitter between the sampling gate and the
ADC due to the clock generation of the ADC, which could be reduced through firmware changes. The strong signal allows us
to use a comparatively lossy but simple \qty{-6}{\deci\bel} resistive tee instead of a directional coupler.
To implement our sub-nanosecond sampler, we chose a simple four-diode bridge sampling gate made from commodity
We implemented the sub-nanosecond sampler using a simple four-diode bridge sampling gate made from commodity
\partno{BAT17-04W} RF Schottky diodes, which offer turn-on times better than \qty{100}{\pico\second} at
\price{0.13}{\euro} per device at quantity 1000. The four-diode configuration requires only two dual diode packages. In
contrast to \textcite{polasekReflektometrCasoveOblasti2020,houtman1GHzSamplingOscilloscope2000}, in our system, double
sampling is not necessary - instead, we follow the sampling gate directly with an amplifier feeding into the internal
ADC of our microcontroller. We use an internal timer peripheral of the same microcontroller to generate both stimulus
and sample pulses such that we can easily phase-lock the internal ADC to the same timer.
\price{0.13}{\euro} per device at quantity 1000. In contrast to prior
work\cite{polasekReflektometrCasoveOblasti2020,houtman1GHzSamplingOscilloscope2000}, we precisely control the timing of
our ADC and avoid the need for a second sampling stage.
We base our circuit around an \partno{STM32G474RB} microcontroller, a \price{5}{\euro}-class commodity ARM
microcontroller. Besides adequate processing speed for its price class, this microcontroller offers two features that
are critical to our design. First, its internal ADCs are both higher resolution and faster than those of older parts.
Second, it is one of a few parts in its series that include a \emph{high-resolution timer} (\partno{HRTIM}) peripheral
that provides several outputs that can be controlled with better than \qty{200}{\pico\second} resolution through
per-output, self-calibrating delay line circuitry. We use this peripheral to produce both the stimulus pulse and the
phase-adjustable sampling pulse.
We base our circuit around an \partno{STM32G474RB} microcontroller, a \price{5}{\euro}-class commodity ARM
microcontroller. This is a recent part, which has internal ADCs that are both higher resolution and faster than those of
older parts. Furthermore, it includes a \emph{high-resolution timer} (\partno{HRTIM}) peripheral that provides better
than \qty{200}{\pico\second} timing resolution through self-calibrating delay lines. We use this peripheral to produce
adjustable, phase-locked stimulus and sampling pulses.
While the HRTIM peripheral allows us to finely adjust the phase of its output waveform, the digital output structures of
the \partno{STM32G4} series are still limited to nanosecond-scale rise and fall times with the datasheet quoting
$t_r=t_f=\qty{1.7}{\nano\second}$ into a \qty{10}{\pico\farad} load when using the fastest GPIO output drive strength
setting and a \qty{3.3}{\volt} supply\cite{stmicroelectronicsSTM32G474xBDatasheet2021}. We work around this issue by
applying two circuit tricks. First, we send its output through a fast amplifier to square up the edges to a rise time
better than \qty{500}{\pico\second}. The remaining challenge is that while we now have pulses with crisp edges, due to
constraints of the HRTIM peripheral, at more than \qty{10}{\nano\second}, these pulses are still too wide to be useful.
We solve this issue by applying a clip line\cite{tektronixinc.TektronixS6Sampling1982} pulse forming network at the
output of the amplifier--i.e.\ we connect the amplifier's output to the load in parallel with a short, terminated
transmission line stub. The length of this stub determines the pulse width.
While the HRTIM peripheral provides sub-nanosecond phase adjustment, the digital outputs of the \partno{STM32G4} series
are limited to a minimum transition time of $t_r=t_f=\qty{1.7}{\nano\second}$\footnote{Datasheet specification, when
driving a \qty{10}{\pico\farad} load\cite{stmicroelectronicsSTM32G474xBDatasheet2021}.}. We work around this issue with
two circuit tricks. First, we send the output through a fast amplifier to square up the edges to a rise time better than
\qty{500}{\pico\second}. We then reduce the \qty{10}{\nano\second} minimum pulse width supported by the \partno{HRTIM}
peripheral by applying a clip line\cite{tektronixinc.TektronixS6Sampling1982} pulse forming network--i.e.\ we connect
the amplifier's output to the load in parallel with a short, terminated transmission line stub. The length of this stub
determines the pulse width.
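As a rough estimate of the resulting pulse width, assume an idealized clip line in which the reflection from the far
end of the stub cancels the driving edge after one round trip (a simplification that ignores losses and the finite
rise time). For a stub of length $\ell$ and propagation velocity $v_p$, symbols introduced here for illustration only,
\[
    t_w \approx \frac{2\ell}{v_p} .
\]
With illustrative values that are not measurements of the prototype, $\ell = \qty{5}{\centi\meter}$ and
$v_p = \qty{15}{\centi\meter\per\nano\second}$ give $t_w \approx \qty{0.67}{\nano\second}$.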
\subsection{Driver Selection}
Several types of amplifiers can be used in our pulse shaping application. Common to all options, we require differential
outputs. In practice, for most parts, this means we are looking for a part with Current Mode Logic (CML) outputs. CML is
a differential signaling standard that is widely used in high-speed logic. In CML, a current source feeds a pair of
transistors that steer current between the two outputs of the differential pair. By steering current between the two
outputs, common-mode currents are minimized which both reduces the effect of power supply impedance at the transmitter
and reduces electromagnetic emissions from the differential pair's PCB traces. In our experiments, we considered several
parts and settled on four parts for evaluation in this paper: A \partno{74LVC2G157} standard logic IC, two display
protocol redrivers, \partno{PI3HDX12211} and \partno{TDP0604}, as well as \partno{MAX3748}, a limiting amplifier for
optical networking applications. We implemented four variants of our prototype using a steady hand under a microscope as
shown in Figure\ \ref{fig_pic_amps}.
One notable omission from our tests was the series of CML-output comparators made by Analog Devices due to the cost of
these devices.
We evaluated multiple options for the pulse shaping amplifier in our design. For both sampling and stimulus, we work
with fully differential signals, so Current Mode Logic (CML) devices, which are widely used in high-speed logic, are a
natural fit. We settled on four parts for evaluation in this paper: A \partno{74LVC2G157} standard logic IC, two
HDMI/DisplayPort redrivers, \partno{PI3HDX12211} and \partno{TDP0604}, as well as \partno{MAX3748}, a limiting amplifier
for optical networking. Figure\ \ref{fig_pic_amps} shows the four hand-soldered prototypes. We avoided specialty parts
such as the CML-output comparators made by Analog Devices due to cost.
\begin{figure}
\centering
@ -505,74 +496,41 @@ these devices.
\includegraphics[width=0.9\textwidth]{pic_pi3hdx_small.jpg}
\caption{PI3HDX12211}
\end{subfigure}
\caption{Circuit-board implementation of the four pulse amplifier variants of the design. Amplifiers were mounted
dead bug style on a piece of copper tape connected to one of the supply rails and hooked up with
\qty{120}{\micro\meter} diameter wire according to their respective datasheets. Supply rails were hooked up using
copper tape where possible to reduce series impedance. Additional \qty{10}{\micro\farad} MLCC power supply
decoupling capacitors were placed close to the ICs on the copper tape to reduce loop area.}
\caption{Implementation of the pulse amplifier variants of the design. Amplifiers were mounted dead bug style on
copper tape and connected with \qty{120}{\micro\meter} wire. Supply rails were connected with copper tape where
possible to reduce impedance. MLCC power supply decoupling capacitors were placed on the copper tape to reduce loop
area.}
\label{fig_pic_amps}
\end{figure}
\paragraph{Standard logic ICs.}
As a baseline, we evaluated the \partno{74LVC2G157} standard logic IC. This IC contains a single multiplexer, however,
we are not interested in the multiplexer functionality. The interesting trivia about this chip is that it also is one of
the only \partno{74} series standard logic parts that have complementary outputs. According to manufacturer
specifications, at a comparable \qty{20}{\pico\farad} load, \partno{74LVC} series parts have slightly faster rise and
fall times compared to our \partno{STM32} microcontroller's digital IO
pins\cite{renesaselectronicscorporationApplicationNoteAN2242019}.
As a baseline, we evaluated the \partno{74LVC2G157} CMOS multiplexer configured to provide complementary outputs.
According to manufacturer specifications, this part provides slightly faster rise and fall times than
our microcontroller\cite{renesaselectronicscorporationApplicationNoteAN2242019}.
\paragraph{Optical Networking Chipsets.}
A category of CML-output drivers suitable for our application is a class of optical networking chipset ICs. While
today, the construction of optical transmitters has moved to direct bonding of optical components and driver ICs to
minimize parasitics, discrete driver ICs for some chipsets from the mid-2000s era are still available at reasonable
cost. Both the laser driver used to drive the transmitter laser diode, and the limiting amplifier used to amplify the
receiver photodiode's output can be used in our application, with the limiting amplifier part requiring less additional
circuitry in our application due to its lack of output bias control. In our evaluation below, we include the
\partno{MAX3748} limiting amplifier as a representative part from this category that is still commercially available. A
drawback of relying on a part like this is that its future availability is uncertain given the evolution of the
industry.
Optical transceivers use CML-output limiting amplifiers and laser drivers, some of which are still available as discrete
components despite the industry moving from PCB implementations to direct bonding. We evaluated the \partno{MAX3748}
limiting amplifier as a representative part from this category.
\paragraph{Bus Redrivers.}
The final category of amplifiers suitable for our pulse shaping needs is redrivers intended for high-speed data
interfaces such as USB 3, PCI Express, HDMI, or DisplayPort. All of these interfaces use CML drivers, with differential
voltage levels usually in the order of \qtyrange{600}{1000}{\milli\volt}. \emph{Redriver} ICs are intended to be used to
amplify the sensitive high-speed bus signal at the edge of a PCBA, either before it leaves the board through a connector
to ensure adequate signal levels at the connector, or after it enters through a connector to compensate for loss in the
PCB traces between the connector and the signal's destination. For our application, redrivers intended for HDMI and
DisplayPort applications are most suitable, as they can usually be configured to act as simple amplifiers without
processing any protocol logic on the signals that are amplified. In contrast, both USB 3 and PCIe redrivers often
implement power saving features that try to parse parts of the actual signal transmitted through them, which are hard to
bypass in our application.
Most modern, high-speed buses like USB 3, PCI Express, HDMI, and DisplayPort use CML drivers. \emph{Redriver} ICs
intended to amplify such signals to compensate for loss in connectors or cables contain amplifiers that are suitable for
our application. HDMI/DisplayPort redrivers are most suitable since they can be configured as simple amplifiers,
turning off any signal-dependent power saving features.
Redrivers can be classified according to their way of operation. \emph{Retimers} include a full
serialization/deserialization (SerDes) setup and parse the low-level protocol of the bus to reconstruct bit-level
timing. We focus only on simpler redrivers that only contain amplifiers and (analog) equalizers here.
Amplifying redrivers can be separated into two classes: Limiting and linear redrivers. A limiting redriver is configured
to have a high gain such that a small input signal will be amplified to the full output voltage swing. Limiting
redrivers are well-suited for our application, but they have come out of fashion since they interfere with link training
and with power saving features of protocols like USB 3.
Linear redrivers are constructed with a low gain instead. Sufficient to compensate for wiring losses, their gain is low
enough to leave them transparent to bus protocol features such as link training or power saving features. To compensate
for their reduced gain, linear redrivers usually contain configurable equalizers that can be used to apply targeted
enhancements for particular signal defects, such as boosting high-frequency gain or providing a set amount of overshoot.
Where available, in our prototype variants we set these equalization features to provide maximum gain.
In our evaluation below, we include \partno{PI3HDX12211} as a linear redriver intended for DisplayPort and HDMI
applications, as well as \partno{TDP0604} as a ``hybrid'' linear or limiting redriver for HDMI applications, configured
for limiting mode in our experiments. An attractive feature of both of these chips as well as comparable devices is that
they usually include at least four independent channels, so only one chip is needed for both pulse paths. Additionally,
they are consumer mass market parts, resulting in a low price. For instance, \partno{PI3HDX12211} is available at
\price{2.11}{\euro} in single quantity and less than \price{1.30}{\euro} at a quantity of several hundred at distributor
LCSC, and \partno{TDP0604} is available at \price{4.72}{\euro} and \price{3.44}{\euro}, respectively, at distributor
Mouser.
In our evaluation below, we include \partno{PI3HDX12211} and \partno{TDP0604}, two inexpensive, consumer mass market
redrivers\footnote{
\partno{PI3HDX12211} is available at \price{2.11}{\euro} in single quantity and less than \price{1.30}{\euro} at a
quantity of several hundred at distributor LCSC, and \partno{TDP0604} is available at \price{4.72}{\euro} and
\price{3.44}{\euro}, respectively, at distributor Mouser}.
Both parts have four independent channels, so only one chip is needed for the two pulse paths.
\subsection{Cost Breakdown}
Table\ \ref{tab_bom} shows a breakdown of the cost of the main components of our prototype, resulting in a total
component cost of less than \price{10}{\euro}. We did not include power supply components in this breakdown as our
circuit is meant to be embedded into a payload circuit that will already have sufficient power supplies.
Table\ \ref{tab_bom} shows a breakdown of the cost of the main components of our prototype, totalling less than
\price{10}{\euro}. We did not include power supply components in this breakdown since our circuit is meant to be
embedded into a payload circuit that will already have sufficient power supplies.
Due to its \partno{HRTIM} peripheral, the \partno{STM32G4} microcontroller is the component of our design that is
hardest to replace. However, this part can still be replaced with a wide range of FPGAs, which commonly include
@ -595,10 +553,8 @@ of Xilinx 7 Series FPGAs provides the same $\frac{1}{32}$ clock cycle resolution
&25&0.01&Various resistors\\\hline
\multicolumn{2}{r}{}&\textbf{9.67}&\textbf{Total}
\end{tabular}
\caption{A cost breakdown of the major components of our design. Listed prices are for 1000 pieces order quantity to
make prices more comparable between distributors. The number of switches necessary for signal routing and
termination depends on the specific mesh signal routing of the application. Numbers shown here are for our
prototype, which can measure a mesh from both ends and supports short, open and matched termination.}
\caption{Cost breakdown of our prototype design. Prices are listed at order quantity 1000 to make prices more
comparable between distributors.}
\label{tab_bom}
\end{table}
@ -606,61 +562,34 @@ of Xilinx 7 Series FPGAs provides the same $\frac{1}{32}$ clock cycle resolution
\label{sec_scan_schedule}
The goal of a time domain reflectometer is to send a pulse into the Device Under Test (DUT)--i.e.\ in our application,
the mesh--and to record all reflections returning from the DUT afterwards. In something like a security mesh whose
traces might only be a few meters long in total, the time span between the pulse being sent and the last reflections
from the very end of the mesh arriving is in the order of several tens of nanoseconds. Directly recording a response at
this timescale would be infeasible using a commodity microcontroller, so we utilize an equivalent time sampling
approach.
the mesh--and to record all reflections returning from the DUT afterwards. In a security mesh with a few meters of total
trace length, the time span between the pulse being sent and the last reflections arriving from the end of the mesh is
in the order of tens of nanoseconds. Directly recording a response at this timescale would be infeasible in a commodity
microcontroller, so we use equivalent time sampling.
As shown in Figure\ \ref{fig_block_diagram}, our analog frontend contains amplifiers that produce the stimulus pulse, a
sampling gate with amplifiers, and a coupler that couples the pulse into the mesh and couples the reflections back into
the sampling gate. A microcontroller controls this frontend with two primary signals: A stimulus pulse, and a sampling
the sampling gate. A microcontroller controls this frontend with two main signals: A stimulus pulse, and a sampling
pulse. By adjusting the timing between these two pulses every time a stimulus pulse is sent, the microcontroller can
select a particular point in time after the stimulus pulse to record using the sampling gate. By slowly sweeping across
the whole time span, the microcontroller can reconstruct the waveform of the reflected signal at the sampling gate
across one period of the stimulus pulse. The recording rate of this waveform is limited by the repetition rate of the
stimulus pulse as well as the time step size.
sample the response at any chosen point in time. By sweeping across the whole time span, the microcontroller can
reconstruct the waveform of the reflected signal at the sampling gate.
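As a sketch of this sweep, with symbols introduced here for illustration only, let $\Delta\tau$ be the delay step, $N$
the number of delay points, and $M$ the oversampling factor. The $k$-th point of the reconstructed waveform is taken
at delay
\[
    \tau_k = \tau_0 + k\,\Delta\tau, \qquad k = 0, \dots, N-1,
\]
and, assuming that oversampling is implemented as a plain average, its value is
\[
    \hat{v}(\tau_k) = \frac{1}{M} \sum_{m=1}^{M} v_m(\tau_k),
\]
where $v_m(\tau_k)$ is the $m$-th sample taken at that delay. One full sweep therefore requires $N \cdot M$ samples
per channel.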
The attainable repetition rate of our stimulus and sampling circuits is limited by two main components. First, the
sampling post-amplifier's bandwidth limits the maximum sample rate. In our design, we chose an \partno{OPA1656}
\qty{50}{\mega\hertz} Gain-Bandwidth Product (GBP) FET input low noise operational amplifier. We need a FET input part
to avoid loading the sampling gate. The comparatively high GBP and the low noise input stage of this device allow us to
amplify small signals that could result from weak reflections in small impedance discontinuities inside the mesh.
In our prototype, we sample the response once after each stimulus pulse. We conservatively decided on a sampling rate of
\qty{1}{MSps} across both channels of the mesh's differential pair. This sampling rate leaves some headroom to the
\qty{50}{\mega\hertz} Gain-Bandwidth Product (GBP) of the \partno{OPA1656} frontend opamp, as well as the \qty{4}{MSps}
that the ADCs can reach. The processing speed of the microcontroller allows individual control of the timing of each
sampling pulse.
The second major factor limiting repetition rate is the microcontroller's ADC speed, as well as the speed of the
software processing the ADC's output. At full \qty{12}{b} resolution, this corresponds to a sampling rate of
approximately \qty{4}{MSps}. The microcontroller contains five ADCs, which can be interleaved to achieve higher rates.
Combining these factors, we conservatively decided on a sampling rate of \qty{1}{MSps} across both channels of the
differential pair. At this sampling rate, it is feasible to control the sample timing on a sample-by-sample basis. For
all measurements in this paper, we use a sequential sampling approach where the microcontroller takes a series of
measurements for oversampling at a particular delay, and then increases the delay by one \partno{HRTIM} output clock
interval.
In our prototype, one sweep of a \qty{188}{\nano\second} time span consisting of $1024$ data points took
\qty{710}{\milli\second} at $256\times$ oversampling and \qty{1.1}{\second} at $384\times$ oversampling. The time span
corresponds to \qty{28}{\meter} of mesh length, which at a \qty{200}{\micro\meter} pitch corresponds to a mesh area of
\qty{113}{\centi\meter\squared} and at a \qty{1}{\milli\meter} pitch corresponds to
\qty{565}{\centi\meter\squared}. Using the same microcontroller, by optimizing timing, moving oversampling processing
out of the interrupt handler, and by interleaving four of the microcontroller's five ADC peripherals, the lower limit of
acquisition time of a $1024$-point scan is \qty{33}{\milli\second} for $256\times$ oversampling and
\qty{49}{\milli\second} for $384\times$ oversampling.
While for our development, sequential scanning is adequate, in a future practical application, two simple optimizations
would decrease the time to detection for an attack. First, in a practical application, the range of scanned delays
should be adjusted to the length of the particular security mesh in use. For this paper, we always
scanned a time range of $1024$ points at \qty{184}{\pico\second} spacing starting before one stimulus pulse and ending
shortly before the next stimulus pulse so that any waveform artifacts will be visible. In a practical application, there
would be little information gained by sampling much beyond the edges of the expected mesh response, so the scan window
should be kept small to increase scan rate.
Secondly, in a practical application, the feature that is most relevant to detect tamper attempts is the trailing edge
of the mesh's response. This trailing edge corresponds to the return of the stimulus pulse's reflection at the far end
of the mesh. Any attack that affects the impedance even only of part of the mesh has a high chance of affecting its
delay, and thus this trailing edge is likely to move. In a practical application, it would thus be efficient to use a
heuristic scan schedule instead of the sequential scan we are using in our research prototype. Such a heuristic schedule
would sample delays near the expected trailing edge of the particular mesh in use more frequently compared to delays
that lie somewhere else, such as in the middle of the mesh's return window.
% major revision: Since we did all measurements for the majR with only 768 samples, we re-scaled the numbers in this
% paragraph accordingly.
% FIXME mention in majR letter.
In our prototype, one sweep of a \qty{141}{\nano\second} time span consisting of $768$ data points took
\qty{825}{\milli\second} at $384\times$ oversampling. The time span corresponds to \qty{21}{\meter} of mesh length,
which at a \qty{200}{\micro\meter} pitch corresponds to a mesh area of \qty{85}{\centi\meter\squared} and at a
\qty{1}{\milli\meter} pitch corresponds to \qty{426}{\centi\meter\squared}. By optimizing timing, moving oversampling
processing out of the interrupt handler, and by interleaving four instead of two of the microcontroller's five ADC
peripherals, the lower limit of acquisition time of a $768$-point scan is \qty{37}{\milli\second} for $384\times$
oversampling.
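These figures can be cross-checked with a short estimate. The time span follows from the number of points and the
\qty{184}{\pico\second} delay step of the \partno{HRTIM} peripheral:
\[
    768 \times \qty{184}{\pico\second} \approx \qty{141.3}{\nano\second} .
\]
The lower limit on acquisition time is consistent with a simple model, given here as an assumption for illustration
rather than a stated derivation: both channels of the differential pair are sampled, giving
$2 \times 768 \times 384 = 589\,824$ conversions, and four interleaved ADCs each run at \qty{4}{MSps}:
\[
    \frac{2 \times 768 \times 384}{4 \times \qty{4}{MSps}} \approx \qty{36.9}{\milli\second} .
\]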
\section{Experimental Evaluation}
@ -1109,13 +1038,8 @@ thinking about attacker capabilities. Applying their taxonomy, our monitoring sy
a patching attack from a \emph{skilled} attacker to an \emph{expert} attacker, and the equipment requirement from
\emph{standard} equipment to \emph{bespoke} equipment such as dielectric drill bits and ceramic soldering tips.
% https://tex.stackexchange.com/questions/336201/vertical-highlight-of-a-paragraph
\begin{tcolorbox}[breakable,
enhanced,
colback=yellow!10!white,
boxrule=0pt,frame hidden,
borderline west={1mm}{-2mm}{highlightgreen}]
% FIXME peer review only, for major revision @ TCHES
\color{highlightgreen}
\begin{figure}[H]
\begin{subfigure}{0.5\textwidth}
\includegraphics[width=\textwidth]{fig_covar_patch_repeat_tridelta_all_the_data_p0.3.pdf}
@ -1262,33 +1186,32 @@ a patching attack from a \emph{skilled} attacker to an \emph{expert} attacker, a
\caption{}
\label{}
\end{figure}
\end{tcolorbox}
% FIXME peer review only, for major revision @ TCHES
\color{black}
\section{Future Work}
\paragraph{Design variants.} The \partno{STM32G4}'s \partno{HRTIM} peripheral is limited by the comparatively slow
maximum system clock speed of \qty{168}{\mega\hertz} to a timing resolution of \qty{184}{\pico\second}. While we have
demonstrated that this is sufficient to detect and localize several attack variants, it would be interesting to increase
time resolution since in our measurements, we observed that the end-to-end jitter of our frontend is low enough that our
circuit would benefit from finer delay control. In our prototype, we implemented a--so far unused--adjustable power
supply for the \partno{74LVC} series buffer in between the \partno{HRTIM} outputs and the pulse amplifier. By adjusting
this buffer's power supply through one of the microcontroller's digital-to-analog converter (DAC) channels, we expect
that it should be possible to exploit the supply voltage dependency of the propagation delay of \partno{74LVC} series
CMOS logic to create a digitally controllable delay with picosecond resolution. The internal DLL of the \partno{HRTIM}
peripheral is likely implemented similarly.
\paragraph{Design variants.} We found that the timing jitter of our sampling frontend is low enough to reach the
\qty{184}{\pico\second} resolution limit of the \partno{STM32G4} \partno{HRTIM} peripheral. In our prototype, we
implemented a -- so far unused -- adjustable power supply for the \partno{74LVC} series buffer in between the
\partno{HRTIM} outputs and the pulse amplifier. By adjusting this buffer's power supply through one of the
microcontroller's digital-to-analog converter (DAC) channels, we expect that it should be possible to exploit the supply
voltage dependency of the propagation delay of \partno{74LVC} series CMOS logic to create a digitally controllable delay
with picosecond resolution.
% FIXME reword for publication
\paragraph{System design.} The work we presented in this paper is complementary to the work previously presented by
\textcite{gotteCantTouchThis2022}, where the authors improved security of a simple security mesh made from standard PCBs
through mechanical motion. We are currently working on a prototype combining both approaches and incorporating heuristic
scan scheduling as mentioned in Section\ \ref{sec_scan_schedule}.
\paragraph{Non-sequential sampling.} Not all parts of the reflected signal are equally sensitive to tampering attempts.
For instance, the reflection's trailing edge contains information on both the length of the mesh and on its
attenuation. Instead of recording the response waveform in a linear scan, in a practical application, more relevant
parts of the response such as this trailing edge could be scanned at a higher rate than other, less relevant parts.
Similarly, fast scans at a coarse time resolution could be interleaved with slow scans at a finer time resolution to
detect large changes more quickly.
\paragraph{Auxiliary applications.} In this work, we have presented a design for a low-cost, embedded TDR frontend.
Besides security mesh monitoring, through multiplexing this TDR frontend could be used for other system monitoring
tasks from tamper sensing to system health monitoring. For instance, \textcite{vaiSecureArchitectureEmbedded2015}
propose an approach for checking the integrity of a PCBA using an external Vector Network Analyzer (VNA) attached to
test points on the PCBA's Power Distribution Network (PDN). TDR can produce fingerprints similar to a VNA and it would
be interesting to measure parts of the secure subsystem other than its security mesh using our TDR frontend.
\paragraph{Auxiliary applications.} The low-cost, embedded TDR frontend presented in this paper could be used for other
monitoring tasks from tamper sensing to system health monitoring. For instance,
\textcite{vaiSecureArchitectureEmbedded2015} propose checking the integrity of a PCBA using an external Vector Network
Analyzer (VNA) attached to test points on the PCBA's Power Distribution Network (PDN). TDR can produce fingerprints
similar to a VNA and it would be interesting to measure parts of the secure subsystem other than its security mesh using
our TDR frontend.
\section{Conclusion}