Cut down a bunch of stuff, make space for majR measurements

2025-07-14 14:37:42 +02:00
commit 3a287db5e4
parent 7642a0e3ee
1 changed files with 113 additions and 190 deletions
--- a/paper/paper.tex
+++ b/paper/paper.tex
@ -29,7 +29,8 @@
 \tcbuselibrary{breakable}
 \usepackage{float}

-\definecolor{highlightgreen}{rgb}{0.18 0.4 0.13}
+\definecolor{highlightred}{rgb}{0.6 0.1 0.1}
+\definecolor{highlightgreen}{rgb}{0.12 0.5 0.07}
 \DeclareSIUnit{\baud}{Bd}
 \DeclareSIUnit{\year}{a}
 \DeclareSIUnit{\rpm}{rpm}
@ -408,6 +409,8 @@ multiplexers.

 \section{Circuit Design and Driving Approach}

+% FIXME peer review only, for major revision @ TCHES
+\color{highlightred}
 \begin{figure}
    \centering
    \hspace*{-7mm}
@ -416,72 +419,60 @@ multiplexers.
    \label{fig_block_diagram}
 \end{figure}

-A TDR can be broken down into three basic components. First, we need a source of fast pulses (or fast edges!) to
-stimulate the mesh. Second, we need a coupler that allows us to couple the stimulus pulses into the mesh, and their
-reflections out of it. Finally, we need a fast ADC to capture the reflections.
+A TDR can be broken down into three basic components: a source of fast stimulus pulses (or edges!), a coupler that
+separates the stimulus pulses from their reflections at the output, and a fast ADC to capture the reflections.

 Figure\ \ref{fig_block_diagram} shows a block diagram of our design\footnote{Full schematics are available in this
 paper's supplementary material.}. At the core of our design lies an equivalent time sampling setup, where two
 diode bridge sampling gates alternately sample the two traces of the mesh. 
 Since physical attacks happen on a time scale of minutes or hours, we do not need a fast acquisition rate. Equivalent
 time sampling uses fast sampling gates to sample a high-frequency signal at a low frequency that is suitable for direct
-conversion through an ADC. This reduces the requirements of our data acquisition and signal processing fronted from
-gigasamples per second to mere megasamples, well within the range that a commodity microcontroller can handle.
+conversion through an ADC. Using equivalent-time sampling, we can sample \unit{\giga\hertz}-scale signals at the
+\unit{\mega\hertz}-scale sampling rate of the internal ADCs of the commodity microcontroller we use. We use two of the
+microcontroller's ADCs interleaved, each of which provides approximately \qty{1.7}{\mega Sp\per\second} at
+\qty{12}{\bit} resolution. Due to the high conversion speed of the modern ADC cores in this microcontroller, we are able
+to use up to $384\times$ oversampling for increased precision without unduly affecting measurement times.

-A challenge in equivalent time sampling is precisely phase-synchronizing the sampling pulse to the fundamental frequency
-of the input signal, which is usually implemented by using a high-speed comparator. In a TDR-style frontend like ours,
-this expensive component can be avoided because the stimulus signal is generated in the frontend, simplifying the
-challenge of generating a synchronized sampling pulse at an adjustable phase to the stimulus pulse.
+%A challenge in equivalent time sampling is precisely phase-synchronizing the sampling pulse to the fundamental
+%frequency of the input signal, which is usually implemented by using a high-speed comparator. In a TDR-style frontend
+%like ours, this expensive component can be avoided because the stimulus signal is generated in the frontend,
+%simplifying the challenge of generating a synchronized sampling pulse at an adjustable phase to the stimulus pulse.

-Since an intact mesh has low insertion loss, the amplitude of the response of an intact mesh is large. Thus, we do not
-need a high dynamic range in either the frontend amplifiers or in the ADC, enabling the use of commodity operational
-amplifiers (opamps) and the built-in ADC of a commodity microcontroller. Further, the strong signal allows us to use a
-comparatively lossy \qty{-6}{\deci\bel} resistive tee instead of a directional coupler. A resistive tee does not provide
-directionality, but in our case, the incident pulse can never interfere with reflections at the sampling output of the
-divider because of causality.
+The mesh has low insertion loss. Thanks to the resulting large amplitude of the reflection signal, the noise floor of
+our frontend based on commodity operational amplifiers (opamps) is below the resolution limit of the built-in ADCs of
+our chosen microcontroller. The main source of frontend noise stems from timing jitter between the sampling gate and the
+ADC due to the clock generation of the ADC, which could be reduced through firmware changes. The strong signal allows us
+to use a comparatively lossy but simple \qty{-6}{\deci\bel} resistive tee instead of a directional coupler.

-To implement our sub-nanosecond sampler, we chose a simple four-diode bridge sampling gate made from commodity
+We implemented the sub-nanosecond sampler using a simple four-diode bridge sampling gate made from commodity
 \partno{BAT17-04W} RF Schottky diodes, which offer turn-on times better than \qty{100}{\pico\second} at
-\price{0.13}{\euro} per device at quantity 1000. The four-diode configuration requires only two dual diode packages. In
-contrast to \textcite{polasekReflektometrCasoveOblasti2020,houtman1GHzSamplingOscilloscope2000}, in our system, double
-sampling is not necessary - instead, we follow the sampling gate directly with an amplifier feeding into the internal
-ADC of our microcontroller. We use an internal timer peripheral of the same microcontroller to generate both stimulus
-and sample pulses such that we can easily phase-lock the internal ADC to the same timer.
+\price{0.13}{\euro} per device at quantity 1000. In contrast to prior
+work\cite{polasekReflektometrCasoveOblasti2020,houtman1GHzSamplingOscilloscope2000}, we precisely control the timing of
+our ADC and avoid the need for a second sampling stage.

-We base our circuit around an \partno{STM32G474RB} microcontroller, a \price{5}{\euro}-class commodity ARM
-microcontroller. Besides adequate processing speed for its price class, this microcontroller offers two features that
-are critical to our design. First, its internal ADCs are both higher resolution and faster than those of older parts.
-Second, it is one of a few parts in its series that include a \emph{high-resolution timer} (\partno{HRTIM}) peripheral
-that provides several outputs that can be controlled with better than \qty{200}{\pico\second} resolution through
-per-output, self-calibrating delay line circuitry. We use this peripheral to produce both the stimulus pulse and the
-phase-adjustable sampling pulse.
+We base our circuit around an \partno{STM32G474RB} microcontroller, a \price{5}{\euro}-class commodity ARM
+microcontroller. This is a recent part, which has internal ADCs that are both higher resolution and faster than those of
+older parts. Furthermore, it includes a \emph{high-resolution timer} (\partno{HRTIM}) peripheral that provides better
+than \qty{200}{\pico\second} timing resolution through self-calibrating delay lines. We use this peripheral to produce
+adjustable, phase-locked stimulus and sampling pulses.

-While the HRTIM peripheral allows us to finely adjust the phase of its output waveform, the digital output structures of
-the \partno{STM32G4} series are still limited to nanosecond-scale rise and fall times with the datasheet quoting
-$t_r=t_f=\qty{1.7}{\nano\second}$ into a \qty{10}{\pico\farad} load when using the fastest GPIO output drive strength
-setting and a \qty{3.3}{\volt} supply\cite{stmicroelectronicsSTM32G474xBDatasheet2021}. We work around this issue by
-applying two circuit tricks. First, we send its output through a fast amplifier to square up the edges to a rise time
-better than \qty{500}{\pico\second}. The remaining challenge is that while we now have pulses with crisp edges, due to
-constraints of the HRTIM peripheral, at more than \qty{10}{\nano\second}, these pulses are still too wide to be useful.
-We solve this issue by applying a clip line\cite{tektronixinc.TektronixS6Sampling1982} pulse forming network at the
-output of the amplifier--i.e.\ we connect the amplifier's output to the load in parallel with a short, terminated
-transmission line stub. The length of this stub determines the pulse width.
+While the HRTIM peripheral provides sub-nanosecond phase adjustment, the digital outputs of the \partno{STM32G4} series
+are limited to a minimum transition time of $t_r=t_f=\qty{1.7}{\nano\second}$\footnote{Datasheet specification, when
+driving a \qty{10}{\pico\farad} load\cite{stmicroelectronicsSTM32G474xBDatasheet2021}.}. We work around this issue with
+two circuit tricks. First, we send the output through a fast amplifier to square up the edges to a rise time better than
+\qty{500}{\pico\second}. We then reduce the \qty{10}{\nano\second} minimum pulse width supported by the \partno{HRTIM}
+peripheral by applying a clip line\cite{tektronixinc.TektronixS6Sampling1982} pulse forming network--i.e.\ we connect
+the amplifier's output to the load in parallel with a short, terminated transmission line stub. The length of this stub
+determines the pulse width.

 \subsection{Driver Selection}

-Several types of amplifiers can be used in our pulse shaping application. Common to all options, we require differential
-outputs. In practice, for most parts, this means we are looking for a part with Current Mode Logic (CML) outputs. CML is
-a differential signaling standard that is widely used in high-speed logic. In CML, a current source feeds a pair of
-transistors that steer current between the two outputs of the differential pair. By steering current between the two
-outputs, common-mode currents are minimized which both reduces the effect of power supply impedance at the transmitter
-and reduces electromagnetic emissions from the differential pair's PCB traces. In our experiments, we considered several
-parts and settled on four parts for evaluation in this paper: A \partno{74LVC2G157} standard logic IC, two display
-protocol redrivers, \partno{PI3HDX12211} and \partno{TDP0604}, as well as \partno{MAX3748}, a limiting amplifier for
-optical networking applications. We implemented four variants of our prototype using a steady hand under a microscope as
-shown in Figure\ \ref{fig_pic_amps}.
-
-One notable omission from our tests was the series of CML-output comparators made by Analog Devices due to the cost of
-these devices.
+We evaluated multiple options for the pulse shaping amplifier in our design. For both sampling and stimulus, we work
+with fully differential signals, so Current Mode Logic (CML) devices, which are widely used in high-speed logic, are a
+natural fit. We settled on four parts for evaluation in this paper: A \partno{74LVC2G157} standard logic IC, two
+HDMI/DisplayPort redrivers, \partno{PI3HDX12211} and \partno{TDP0604}, as well as \partno{MAX3748}, a limiting amplifier
+for optical networking. Figure\ \ref{fig_pic_amps} shows the four hand-soldered prototypes. We avoided specialty parts
+such as the CML-output comparators made by Analog Devices due to cost.

 \begin{figure}
    \centering
@ -505,74 +496,41 @@ these devices.
        \includegraphics[width=0.9\textwidth]{pic_pi3hdx_small.jpg}
        \caption{PI3HDX12211}
    \end{subfigure}
-    \caption{Circuit-board implementation of the four pulse amplifier variants of the design. Amplifiers were mounted
-    dead bug style on a piece of copper tape connected to one of the supply rails and hooked up with
-    \qty{120}{\micro\meter} diameter wire according to their respective datasheets. Supply rails were hooked up using
-    copper tape where possible to reduce series impedance. Additional \qty{10}{\micro\farad} MLCC power supply
-    decoupling capacitors were placed close to the ICs on the copper tape to reduce loop area.}
+    \caption{Implementation of the pulse amplifier variants of the design. Amplifiers were mounted dead bug style on
+    copper tape and connected with \qty{120}{\micro\meter} wire. Supply rails were connected with copper tape where
+    possible to reduce impedance. MLCC power supply decoupling capacitors were placed on the copper tape to reduce loop
+    area.}
    \label{fig_pic_amps}
 \end{figure}

 \paragraph{Standard logic ICs.}
-As a baseline, we evaluated the \partno{74LVC2G157} standard logic IC. This IC contains a single multiplexer, however,
-we are not interested in the multiplexer functionality. The interesting trivia about this chip is that it also is one of
-the only \partno{74} series standard logic parts that have complimentary outputs. According to manufacturer
-specifications, at a comparable \qty{20}{\pico\farad} load, \partno{74LVC} series parts have slightly faster rise and
-fall times compared to our \partno{STM32} microcontroller's digital IO
-pins\cite{renesaselectronicscorporationApplicationNoteAN2242019}.
+As a baseline, we evaluated the \partno{74LVC2G157} CMOS multiplexer configured to provide complementary outputs.
+According to manufacturer specifications, this part provides slightly faster rise and fall times than
+our microcontroller\cite{renesaselectronicscorporationApplicationNoteAN2242019}.

 \paragraph{Optical Networking Chipsets.}
-A category of CML-output drivers suitable for our application is a class of optical networking chipset ICs. While
-today, the construction of optical transmitters has moved to direct bonding of optical components and driver ICs to
-minimize parasitics, discrete driver ICs for some chipsets from the mid-2000s era are still available at reasonable
-cost. Both the laser driver used to drive the transmitter laser diode, and the limiting amplifier used to amplify the
-receiver photodiode's output can be used in our application, with the limiting amplifier part requiring less additional
-circuitry in our application due to its lack of output bias control. In our evaluation below, we include the
-\partno{MAX3748} limiting amplifier as a representative part from this category that is still commercially available. A
-drawback of relying on a part like this is that its future availability is uncertain given the evolution of the
-industry.
+Optical transceivers use CML-output limiting amplifiers and laser drivers, some of which are still available as discrete
+components despite the industry moving from PCB implementations to direct bonding. We evaluated the \partno{MAX3748}
+limiting amplifier as a representative part from this category.

 \paragraph{Bus Redrivers.}
-The final category of amplifiers suitable for our pulse shaping needs is redrivers intended for high-speed data
-interfaces such as USB 3, PCI Express, HDMI, or DisplayPort. All of these interfaces use CML drivers, with differential
-voltage levels usually in the order of \qtyrange{600}{1000}{\milli\volt}. \emph{Redriver} ICs are intended to be used to
-amplify the sensitive high-speed bus signal at the edge of a PCBA, either before it leaves the board through a connector
-to ensure adequate signal levels at the connector, or after it enters through a connector to compensate for loss in the
-PCB traces between the connector and the signal's destination. For our application, redrivers intended for HDMI and
-DisplayPort applications are most suitable, as they can usually be configured to act as simple amplifiers without
-processing any protocol logic on the signals that are amplified. In contrast, both USB 3 and PCIe redrivers often
-implement power saving features that try to parse parts of the actual signal transmitted through them, which are hard to
-bypass in our application.
+Most modern, high-speed buses like USB 3, PCI Express, HDMI, and DisplayPort use CML drivers. \emph{Redriver} ICs
+intended to amplify such signals to compensate for loss in connectors or cables contain amplifiers that are suitable for
+our application. HDMI/DisplayPort redrivers are most suitable since they can be configured as simple amplifiers,
+turning off any signal-dependent power saving features.

-Redrivers can be classified according to their way of operation. \emph{Retimers} include a full
-serialization/deserialization (SerDes) setup and parse the low-level protocol of the bus to reconstruct bit-level
-timing. We focus only on simpler redrivers that only contain amplifiers and (analog) equalizers here.
-
-Amplifying redrivers can be separated into two classes: Limiting and linear redrivers. A limiting redriver is configured
-to have a high gain such that a small input signal will be amplified to the full output voltage swing. Limiting
-redrivers are well-suited for our application, but they have come out of fashion since they interfere with link training
-and with power saving features of protocols like USB 3.
-
-Linear redrivers are constructed with a low gain instead. Sufficient to compensate for wiring losses, their gain is low
-enough to leave them transparent to bus protocol features such as link training or power saving features. To compensate
-for their reduced gain, linear redrivers usually contain configurable equalizers that can be used to apply targeted
-enhancements for particular signal defects, such as boosting high-frequency gain or providing a set amount of overshoot.
-Where available, in  our prototype variants we set these equalization features to provide maximum gain.
-
-In our evaluation below, we include \partno{PI3HDX12211} as a linear redriver intended for DisplayPort and HDMI
-applications, as well as \partno{TPD0604} as a ``hybrid'' linear or limiting redriver for HDMI applications, configured
-for limiting mode in our experiments. An attractive feature of both of these chips as well as comparable devices is that
-they usually include at least four independent channels, so only one chip is needed for both pulse paths. Additionally,
-they are consumer mass market parts, resulting in a low price. For instance, \partno{PI3HDX12211} is available at
-\price{2.11}{\euro} in single quantity and less than \price{1.30}{\euro} at a quantity of several hundred at distributor
-LCSC, and \partno{TPD0604} is available at \price{4.72}{\euro} and \price{3.44}{\euro}, respectively, at distributor
-Mouser.
+In our evaluation below, we include \partno{PI3HDX12211} and \partno{TDP0604}, two inexpensive, consumer mass market
+redrivers\footnote{
+    \partno{PI3HDX12211} is available at \price{2.11}{\euro} in single quantity and less than \price{1.30}{\euro} at a
+    quantity of several hundred at distributor LCSC, and \partno{TDP0604} is available at \price{4.72}{\euro} and
+    \price{3.44}{\euro}, respectively, at distributor Mouser}.
+Both parts have four independent channels, so only one chip is needed for the two pulse paths.

 \subsection{Cost Breakdown}

-Table\ \ref{tab_bom} shows a breakdown of the cost of the main components of our prototype, resulting in a total
-component cost of less than \price{10}{\euro}. We did not include power supply components in this breakdown as our
-circuit is meant to be embedded into a payload circuit that will already have sufficient power supplies.
+Table\ \ref{tab_bom} shows a breakdown of the cost of the main components of our prototype, totalling less than
+\price{10}{\euro}. We did not include power supply components in this breakdown since our circuit is meant to be
+embedded into a payload circuit that will already have sufficient power supplies.

 Due to its \partno{HRTIM} peripheral, the \partno{STM32G4} microcontroller is the component of our design that is
 hardest to replace. However, this part can still be replaced with a wide range of FPGAs, which commonly include
@ -595,10 +553,8 @@ of Xilinx 7 Series FPGAs provides the same $\frac{1}{32}$ clock cycle resolution
        &25&0.01&Various resistors\\\hline
        \multicolumn{2}{r}{}&\textbf{9.67}&\textbf{Total}
    \end{tabular}
-    \caption{A cost breakdown of the major components of our design. Listed prices are for 1000 pieces order quantity to
-    make prices more comparable between distributors. The number of switches necessary for signal routing and
-    termination depends on the specific mesh signal routing of the application. Numbers shown here are for our
-    prototype, which can measure a mesh from both ends and supports short, open and matched termination.}
+    \caption{Cost breakdown of our prototype design. Prices are listed at order quantity 1000 to make prices more
+    comparable between distributors.}
    \label{tab_bom}
 \end{table}

@ -606,61 +562,34 @@ of Xilinx 7 Series FPGAs provides the same $\frac{1}{32}$ clock cycle resolution
 \label{sec_scan_schedule}

 The goal of a time domain reflectometer is to send a pulse into the Device Under Test (DUT)--i.e.\ in our application,
-the mesh--and to record all reflections returning from the DUT afterwards. In something like a security mesh whose
-traces might only be a few meters long in total, the time span between the pulse being sent and the last reflections
-from the very end of the mesh arriving is in the order of several tens of nanoseconds. Directly recording a response at
-this timescale would be infeasible using a commodity microcontroller, so we utilize an equivalent time sampling
-approach.
+the mesh--and to record all reflections returning from the DUT afterwards. In a security mesh with a few meters of total
+trace length, the time span between the pulse being sent and the last reflections arriving from the end of the mesh is
+in the order of tens of nanoseconds. Directly recording a response at this timescale would be infeasible in a commodity
+microcontroller, so we use equivalent time sampling.

 As shown in Figure\ \ref{fig_block_diagram}, our analog frontend contains amplifiers that produce the stimulus pulse, a
 sampling gate with amplifiers, and a coupler that couples the pulse into the mesh and couples the reflections back into
-the sampling gate. A microcontroller controls this frontend with two primary signals: A stimulus pulse, and a sampling
+the sampling gate. A microcontroller controls this frontend with two main signals: A stimulus pulse, and a sampling
 pulse. By adjusting the timing between these two pulses every time a stimulus pulse is sent, the microcontroller can
-select a particular point in time after the stimulus pulse to record using the sampling gate. By slowly sweeping across
-the whole time span, the microcontroller can reconstruct the waveform of the reflected signal at the sampling gate
-across one period of the stimulus pulse. The recording rate of this waveform is limited by the repetition rate of the
-stimulus pulse as well as the time step size.
+sample the response at any chosen point in time. By sweeping across the whole time span, the microcontroller can
+reconstruct the waveform of the reflected signal at the sampling gate.
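
The sweep described in this paragraph can be sketched in a few lines of Python. This is only an illustrative model, not the prototype's firmware: the `reflection` function is a hypothetical stand-in for the mesh response, and the oversampling factor is kept small for readability. The 184 ps step and the 768 points per sweep are taken from the text.

```python
HRTIM_STEP_S = 184e-12   # delay step between sampling points (from the text)
N_POINTS = 768           # points per sweep (from the text)


def reflection(t):
    """Hypothetical mesh response: a flat-topped pulse returning between 20 ns and 100 ns."""
    return 1.0 if 20e-9 <= t < 100e-9 else 0.0


def sweep(response, n_points=N_POINTS, step=HRTIM_STEP_S, oversample=4):
    """Reconstruct one period of `response`, one delay setting at a time."""
    waveform = []
    for i in range(n_points):
        delay = i * step
        # One sample is taken per stimulus pulse; repeating the pulse at the
        # same delay and averaging models the oversampling step.
        samples = [response(delay) for _ in range(oversample)]
        waveform.append(sum(samples) / oversample)
    return waveform


wave = sweep(reflection)
span_ns = N_POINTS * HRTIM_STEP_S * 1e9
```

In this model, each entry of `wave` corresponds to one delay setting, so the reconstructed time axis has the resolution of the delay step rather than of the ADC.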

-The attainable repetition rate of our stimulus and sampling circuits is limited by two main components. First, the
-sampling post-amplifier's bandwidth limits the maximum sample rate. In our design, we chose an \partno{OPA1656}
-\qty{50}{\mega\hertz} Gain-Bandwidth Product (GBP) FET input low noise operational amplifier. We need a FET input part
-to avoid loading the sampling gate. The comparatively high GBP and the low noise input stage of this device allow us to
-amplify small signals that could result from weak reflections in small impedance discontinuities inside the mesh.
+In our prototype, we sample the response once after each stimulus pulse. We conservatively decided on a sampling rate of
+\qty{1}{MSps} across both channels of the mesh's differential pair. This sampling rate leaves some headroom to the
+\qty{50}{\mega\hertz} Gain-Bandwidth Product (GBP) of the \partno{OPA1656} frontend opamp, as well as the \qty{4}{MSps}
+that the ADCs can reach. The processing speed of the microcontroller allows individual control of the timing of each
+sampling pulse.

-The second major factor limiting repetition rate is the microcontroller's ADC speed, as well as the speed of the
-software processing the ADC's output. At full \qty{12}{b} resolution, this corresponds to a sampling rate of
-approximately \qty{4}{MSps}. The microcontroller contains five ADCs, which can be interleaved to achieve higher rates.
-
-Combining these factors, we conservatively decided on a sampling rate of \qty{1}{MSps} across both channels of the
-differential pair. At this sampling rate, it is feasible to control the sample timing on a sample-by-sample basis. For
-all measurements in this paper, we use a sequential sampling approach where the microcontroller takes a series of
-measurements for oversampling at a particular delay, and then increases the delay by one \partno{HRTIM} output clock
-interval.
-
-In our prototype, one sweep of a \qty{188}{\nano\second} time span consisting of $1024$ data points took
-\qty{710}{\milli\second} at $256\times$ oversampling and \qty{1.1}{\second} at $384\times$ oversampling. The time span
-corresponds to \qty{28}{\meter} of mesh length, which at a \qty{200}{\micro\meter} pitch corresponds to a mesh area of
-\qty{113}{\centi\meter\squared} and at a \qty{1}{\milli\meter} pitch corresponds to
-\qty{565}{\centi\meter\squared}. Using the same microcontroller, by optimizing timing, moving oversampling processing
-out of the interrupt handler, and by interleaving four of the microcontroller's five ADC peripherals, the lower limit of
-acquisition time of a $1024$-point scan is \qty{33}{\milli\second} for $256\times$ oversampling and
-\qty{49}{\milli\second} for $384\times$ oversampling.
-
-While for our development, sequential scanning is adequate, in a future practical application, two simple optimizations
-would decrease the time to detection for an attack. First, in a practical application, the range of scanned delays
-should be adjusted to the length of the particular security mesh in use. For this paper, we always
-scanned a time range of $1024$ points at \qty{184}{\pico\second} spacing starting before one stimulus pulse and ending
-shortly before the next stimulus pulse so that any waveform artifacts will be visible. In a practical application, there
-would be little information gained by sampling much beyond the edges of the expected mesh response, so the scan window
-should be kept small to increase scan rate.
-
-Secondly, in a practical application, the feature that is most relevant to detect tamper attempts is the trailing edge
-of the mesh's response. This trailing edge corresponds to the return of the stimulus pulse's reflection at the far end
-of the mesh. Any attack that affects the impedance even only of part of the mesh has a high chance of affecting its
-delay, and thus this trailing edge is likely to move. In a practical application, it would thus be efficient to use a
-heuristic scan schedule instead of the sequential scan we are using in our research prototype. Such a heuristic schedule
-would sample delays near the expected trailing edge of the particular mesh in use more frequently compared to delays
-that lie somewhere else, such as in the middle of the mesh's return window.
+% major revision: Since we did all measurements for the majR with only 768 samples, we re-scaled the numbers in this
+% paragraph accordingly.
+% FIXME mention in majR letter.
+In our prototype, one sweep of a \qty{141}{\nano\second} time span consisting of $768$ data points took
+\qty{825}{\milli\second} at $384\times$ oversampling. The time span corresponds to \qty{21}{\meter} of mesh length,
+which at a \qty{200}{\micro\meter} pitch corresponds to a mesh area of \qty{85}{\centi\meter\squared} and at a
+\qty{1}{\milli\meter} pitch corresponds to \qty{426}{\centi\meter\squared}. By optimizing timing, moving oversampling
+processing out of the interrupt handler, and by interleaving four instead of two of the microcontroller's five ADC
+peripherals, the lower limit of acquisition time of a $768$-point scan is \qty{37}{\milli\second} for $384\times$
+oversampling.
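
The span and lower-limit figures in this paragraph can be cross-checked with a short calculation. The sketch below assumes that both traces of the differential pair are converted for every point (the factor of two is our reading, not stated explicitly here) and that each interleaved ADC runs at the 4 MSps quoted earlier in this section.

```python
STEP_S = 184e-12      # delay step (from the text)
POINTS = 768          # points per sweep (from the text)
OVERSAMPLE = 384      # oversampling factor (from the text)
CHANNELS = 2          # assumption: both traces of the differential pair are sampled
ADC_RATE_SPS = 4e6    # per-ADC conversion rate quoted earlier in this section
N_ADCS = 4            # interleaved ADCs in the optimized case (from the text)

# Time span covered by one sweep.
span_s = POINTS * STEP_S

# Total number of ADC conversions needed for one sweep.
conversions = POINTS * OVERSAMPLE * CHANNELS

# Best-case acquisition time if the ADCs are the only bottleneck.
lower_limit_s = conversions / (ADC_RATE_SPS * N_ADCS)

print(f"span: {span_s * 1e9:.1f} ns, lower limit: {lower_limit_s * 1e3:.1f} ms")
```

Under these assumptions the calculation gives about 141.3 ns and 36.9 ms, which agrees with the rounded values in the paragraph.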

 \section{Experimental Evaluation}

@ -1109,13 +1038,8 @@ thinking about attacker capabilities. Applying their taxonomy, our monitoring sy
 a patching attack from a \emph{skilled} attacker to an \emph{expert} attacker, and the equipment requirement from
 \emph{standard} equipment to \emph{bespoke} equipment such as dielectric drill bits and ceramic soldering tips.

-% https://tex.stackexchange.com/questions/336201/vertical-highlight-of-a-paragraph
-\begin{tcolorbox}[breakable,
-    enhanced,
-    colback=yellow!10!white,
-    boxrule=0pt,frame hidden,
-    borderline west={1mm}{-2mm}{highlightgreen}]
-
+% FIXME peer review only, for major revision @ TCHES
+\color{highlightgreen}
 \begin{figure}[H]
    \begin{subfigure}{0.5\textwidth}
        \includegraphics[width=\textwidth]{fig_covar_patch_repeat_tridelta_all_the_data_p0.3.pdf}
@ -1262,33 +1186,32 @@ a patching attack from a \emph{skilled} attacker to an \emph{expert} attacker, a
    \caption{}
    \label{}
 \end{figure}
-\end{tcolorbox}

+% FIXME peer review only, for major revision @ TCHES
+\color{black}
 \section{Future Work}

-\paragraph{Design variants.} The \partno{STM32G4}'s \partno{HRTIM} peripheral is limited by to the comparatively slow
-maximum system clock speed of \qty{168}{\mega\hertz} to a timing resolution of \qty{184}{\pico\second}. While we have
-demonstrated that this is sufficient to detect and localize several attack variants, it would be interesting to increase
-time resolution since in our measurements, we observed that the end-to-end jitter of our frontend is low enough that our
-circuit would benefit from finer delay control. In our prototype, we implemented a--so far unused--adjustable power
-supply for the \partno{74LVC} series buffer in between the \partno{HRTIM} outputs and the pulse amplifier. By adjusting
-this buffer's power supply through one of the microcontroller's digital-to-analog converter (DAC) channels, we expect
-that it should be possible to exploit the supply voltage dependency of the propagation delay of \partno{74LVC} series
-CMOS logic to create a digitally controllable delay with picosecond resolution. The internal DLL of the \partno{HRTIM}
-peripheral is likely implemented similarly.
+\paragraph{Design variants.} We found that the timing jitter of our sampling frontend is low enough to reach the
+\qty{184}{\pico\second} resolution limit of the \partno{STM32G4} \partno{HRTIM} peripheral.  In our prototype, we
+implemented a -- so far unused -- adjustable power supply for the \partno{74LVC} series buffer in between the
+\partno{HRTIM} outputs and the pulse amplifier. By adjusting this buffer's power supply through one of the
+microcontroller's digital-to-analog converter (DAC) channels, we expect that it should be possible to exploit the supply
+voltage dependency of the propagation delay of \partno{74LVC} series CMOS logic to create a digitally controllable delay
+with picosecond resolution.

-% FIXME reword for publication
-\paragraph{System design.} The work we presented in this paper is complementary to the work previously presented by
-\textcite{gotteCantTouchThis2022}, where the authors improved security of a simple security mesh made from standard PCBs
-through mechanical motion. We are currently working on a prototype combining both approaches and incorporating heuristic
-scan scheduling as mentioned in Section\ \ref{sec_scan_schedule}.
+\paragraph{Non-sequential sampling.} Not all parts of the reflected signal are equally sensitive to tampering attempts.
+For instance, the reflection's trailing edge contains information on both the length of the mesh and on its
+attenuation. Instead of recording the response waveform in a linear scan, in a practical application, more relevant
+parts of the response such as this trailing edge could be scanned at a higher rate than other, less relevant parts.
+Similarly, fast scans at a coarse time resolution could be interleaved with slow scans at a finer time resolution to
+detect large changes more quickly.
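
One possible shape of such a non-sequential schedule is sketched below. It is a hypothetical illustration, not something implemented in the prototype: the function name, the window indices, and the stride are made-up parameters, and the position of the trailing edge would have to be determined for the particular mesh in use.

```python
def build_schedule(n_points, edge_start, edge_stop, stride):
    """Return a list of delay indices for one scan.

    All delays are visited once in order, and the window
    [edge_start, edge_stop) around the trailing edge is re-scanned
    after every `stride` sequential points.
    """
    edge_window = list(range(edge_start, edge_stop))
    schedule = []
    for i in range(n_points):
        schedule.append(i)
        if (i + 1) % stride == 0:
            schedule.extend(edge_window)
    return schedule


# Hypothetical example: 768 points, trailing edge assumed near index 545.
schedule = build_schedule(n_points=768, edge_start=530, edge_stop=560, stride=64)
```

With these example values, every delay is still covered once per scan, while points inside the edge window are visited thirteen times, so a change in the trailing edge would be observed sooner than in a purely sequential scan of the same total length.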

-\paragraph{Auxiliary applications.} In this work, we have presented a design for a low-cost, embedded TDR frontend.
-Besides security mesh monitoring, through multiplexing this TDR frontend could be used for other system monitoring
-tasks from tamper sensing to system health monitoring. For instance, \textcite{vaiSecureArchitectureEmbedded2015}
-propose an approach for checking the integrity of a PCBA using an external Vector Network Analyzer (VNA) attached to
-test points on the PCBA's Power Distribution Network (PDN). TDR can produce fingerprints similar to a VNA and it would
-be interesting to measure parts of the secure subsystem other than its security mesh using our TDR frontend.
+\paragraph{Auxiliary applications.} The low-cost, embedded TDR frontend presented in this paper could be used for other
+monitoring tasks from tamper sensing to system health monitoring. For instance,
+\textcite{vaiSecureArchitectureEmbedded2015} propose checking the integrity of a PCBA using an external Vector Network
+Analyzer (VNA) attached to test points on the PCBA's Power Distribution Network (PDN). TDR can produce fingerprints
+similar to a VNA and it would be interesting to measure parts of the secure subsystem other than its security mesh using
+our TDR frontend.

 \section{Conclusion}