Comments for 42nd IEEE VLSI Test Symposium 2024 https://tttc-vts.org/public_html/new/2024

Comment on RP10-1 – Ultra-Wideband Modulation Signal Measurement Using Local Sweep Digitizing Method by Takeshi Iwasaki https://tttc-vts.org/public_html/new/2024/virtual-conference/rp10-1/#comment-2428 Tue, 09 Jun 2020 07:42:28 +0000 http://tttc-vts.org/public_html/new/2020/?page_id=2870#comment-2428 In reply to Koji Asami.

Thank you for your reply, and I apologize for the delay in expressing my gratitude. Your explanation answered my question. Thank you very much.

Comment on RP10-1 – Ultra-Wideband Modulation Signal Measurement Using Local Sweep Digitizing Method by Koji Asami https://tttc-vts.org/public_html/new/2024/virtual-conference/rp10-1/#comment-2427 Thu, 21 May 2020 00:58:51 +0000 http://tttc-vts.org/public_html/new/2020/?page_id=2870#comment-2427 In reply to Takeshi Iwasaki.

Dear Dr. Iwasaki, thank you very much for your question. This method assumes a repetitive waveform, and the entire waveform of one period is captured in one measurement. This means that the captured waveform is stable, not swept. Even if the waveform has a packet structure like the one in Fig. 9 of the paper, the method can be applied as long as the entire waveform repeats, which makes it practical for ATE. I hope the above answers your question.
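To make this more concrete, here is a minimal numpy sketch of the general subband-stitching idea: because the waveform is repetitive, each acquisition (with a different local-oscillator frequency) sees the same signal, so each band-limited slice can be shifted back to its original frequency position and the slices joined at their edge frequencies. The signal parameters, subband widths, and LO frequencies below are illustrative assumptions, not the values or the exact algorithm from the paper.

```python
import numpy as np

fs, n = 1000.0, 4000                 # sample rate and record length (one full period)
k = np.arange(n)
t = k / fs

# Repetitive wideband test signal: tones between 10 and 300 (arbitrary units),
# chosen so the record holds an integer number of cycles (i.e., the waveform repeats).
tones = np.array([13.0, 47.0, 88.0, 133.0, 171.0, 219.0, 262.0, 291.0])
rng = np.random.default_rng(0)
x = sum(np.cos(2 * np.pi * f * t + p)
        for f, p in zip(tones, rng.uniform(0, 2 * np.pi, len(tones))))

def capture_subband(x, f_lo, bw, fs):
    """Emulate one acquisition: mix down with the LO, keep |f| < bw/2
    (an ideal filter standing in for the BPF), then shift the slice back up."""
    n = len(x)
    k = np.arange(n)
    base = x * np.exp(-2j * np.pi * f_lo * k / fs)               # downconvert
    X = np.fft.fft(base)
    X[np.abs(np.fft.fftfreq(n, 1 / fs)) >= bw / 2] = 0.0         # ideal band selection
    return np.fft.ifft(X) * np.exp(2j * np.pi * f_lo * k / fs)   # shift back to original band

bw = 100.0
slices = [capture_subband(x, f_lo, bw, fs) for f_lo in (50.0, 150.0, 250.0, 350.0)]
x_rec = 2.0 * np.real(sum(slices))   # subbands abut at the edge frequencies and cover 0-400

print("max reconstruction error:", np.max(np.abs(x - x_rec)))   # ~1e-12 in this idealized setup
```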

Comment on RP10-1 – Ultra-Wideband Modulation Signal Measurement Using Local Sweep Digitizing Method by Takeshi Iwasaki https://tttc-vts.org/public_html/new/2024/virtual-conference/rp10-1/#comment-2426 Wed, 20 May 2020 05:27:35 +0000 http://tttc-vts.org/public_html/new/2020/?page_id=2870#comment-2426

Thank you for your presentation; I have a few questions about it. My name is Iwasaki, from AKM in Japan.
You said that the frequency band extracted by the BPF can be controlled by changing the local oscillation frequency, and that the signal can be reproduced by overlapping the subbands at their edge frequencies, as in Fig. 4 of your regular paper. However, I think the waves cannot be overlapped, because each wave is swept, so the frequencies of the waves differ. Could you explain how to overlap waves whose frequencies differ from each other?

Comment on ST-3 – Scalable Functional Validation of Next Generation SoCs by Pal Debjit https://tttc-vts.org/public_html/new/2024/virtual-conference/st03-2/#comment-2425 Tue, 12 May 2020 17:21:06 +0000 http://tttc-vts.org/public_html/new/2020/?page_id=3105#comment-2425 In reply to Sohrab Aftabjahani.

Thank you for attending my talk, Dr. Aftabjahani. Please find my responses below.

1. I took the third animated figure on Slide 2 from the internet as a graphical reference to illustrate the complexity of SoC validation and how the cost of such validation increases exponentially with the complexity of the SoC. The reference for the plot is given on the same slide: https://www.slideshare.net/santoshverma336/soc-system-on-chip-66887857. I believe I overlooked the linear SiP curve because I was focusing more on the complexity/cost trend. My apologies.

2. In Slide 8, NoM refers to the number of messages traced before a failure symptom (such as a crash) is observed during post-silicon execution, and NoC refers to the number of cycles the design executes before such a symptom is observed. Together, these two metrics show that the bugs we injected in our OpenSPARC T2 SoC case studies are realistic, subtle, and hard to detect.

Comment on ST-7 – Energy Efficient and Reliable Deep Learning Accelerator Design by Jeff Zhang https://tttc-vts.org/public_html/new/2024/virtual-conference/st07/#comment-2424 Tue, 12 May 2020 04:00:24 +0000 http://tttc-vts.org/public_html/new/2020/?page_id=3115#comment-2424 In reply to Sohrab Aftabjahani.

Hi Sohrab, thank you for your questions!
1. Yes, from our empirical study we found that the higher-order bits of the outputs usually lie on the critical paths after synthesis. Thus, when we underscale the voltage, timing errors occur first on those higher-order bits, which results in bit flips.
2. The DNN model itself is still correct, i.e., as long as it runs on hardware without any defects, it behaves exactly as expected. To run a DNN model on hardware, there are multiple ways to do the mapping. FAP+T still maps the DNN model to TPUs to ensure functional correctness. However, since the underlying hardware is faulty, it figures out which weights in the DNN model are mapped to faulty MACs, prunes those weights, and retrains the remaining weights. A detailed definition of each scheme can also be found in our Design & Test paper, “Fault-Tolerant Systolic Array Based Accelerators for Deep Neural Network Execution.”
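As a rough illustration of the fault-aware pruning idea described above, here is a minimal numpy sketch. The array size, fault list, and one-to-one weight-to-MAC mapping are made-up placeholders, not the actual mapping from the paper, and the retraining step is only indicated in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 128))              # one layer's weights, mapped 1:1 onto a 128x128 systolic array
faulty_macs = {(3, 17), (60, 60), (90, 5)}   # MAC units flagged as faulty (e.g., by LBIST/MBIST)

# Fault-aware pruning (FAP): zero out every weight that would be computed on a faulty MAC.
mask = np.ones_like(W)
for row, col in faulty_macs:
    mask[row, col] = 0.0
W_pruned = W * mask

# FAP+T ("+ retraining") would now fine-tune the surviving weights with this mask held fixed,
# letting the rest of the network compensate for the pruned connections.
print("pruned weights:", int(W.size - np.count_nonzero(W_pruned)))
```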

Comment on ST-7 – Energy Efficient and Reliable Deep Learning Accelerator Design by Jeff Zhang https://tttc-vts.org/public_html/new/2024/virtual-conference/st07/#comment-2423 Tue, 12 May 2020 03:55:18 +0000 http://tttc-vts.org/public_html/new/2020/?page_id=3115#comment-2423 In reply to Adtih Singh.

Hi Adtih, thank you for your questions.

1.(i)
Yes, we use buffers to ensure that the minimum path delay is greater than the propagation delay plus the hold-time constraint. This is the same technique that Razor flip-flops [MICRO’03] use to handle short paths. The clock delay of Razor was set to ½ of the clock period, which simplified the generation of the delayed clock while still letting the short-path constraints be met easily, and resulted in a power overhead (due to buffers) of less than 3%. The area overhead of ThunderVolt is about 10% compared to the baseline TPU (similar to the numbers reported for Razor).

1.(ii)
The flip-flop metastability issue is also addressed by the original Razor flip-flops [MICRO’03]. We use the same technique, i.e., a metastability-tolerant comparator.

1.(iii)
Say MAC1 incurs a timing error and MAC2 is the MAC unit after MAC1. In a nutshell, we “drop” the computation being performed by MAC2 and borrow time from MAC2’s computation to complete MAC1’s computation correctly. That is, when MAC1 incurs a timing error, it steals the next clock cycle from MAC2 to correctly finish its own update to the partial sum, and MAC2’s update is bypassed (dropped).
For the implementation, TE-Drop adds a multiplexer (MUX) controlled by the error signal from the prior MAC unit. If the previous MAC unit incurs an error, the MUX forwards that MAC’s correctly computed partial sum (obtained from Razor’s shadow flip-flop) to the next MAC unit; if not, the current MAC unit updates the partial sum as it normally would and forwards it to the next MAC.
(Note that the reason we drop MAC2’s contribution is that timing errors are only detected halfway through MAC2’s compute cycle, by which point MAC2 has already started adding its own contribution to MAC1’s incorrectly accumulated value. To “drop” the error induced by MAC1, MAC2 would have to redo its entire computation in the remaining half clock cycle, which may not be possible.) A toy sketch of this dataflow follows.
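Here is a toy Python model of the TE-Drop dataflow sketched above, just to make the "steal the next MAC's cycle and drop its update" behaviour concrete. It is a behavioural sketch, not RTL, and the error pattern is an arbitrary example.

```python
def te_drop_partial_sum(weights, activations, timing_error):
    """Behavioural toy model of TE-Drop along one systolic-array column.
    timing_error[i] is True if MAC i misses timing in this pass."""
    psum = 0.0
    drop_this_mac = False
    for w, a, err in zip(weights, activations, timing_error):
        if drop_this_mac:
            # The previous MAC erred and borrowed this cycle: its shadow (correct)
            # partial sum is forwarded, and this MAC's own update is skipped entirely.
            drop_this_mac = False
            continue
        psum += w * a        # normal update; on an error the shadow flip-flop still
                             # commits this correct value using the stolen cycle
        if err:
            drop_this_mac = True
    return psum

# Example: MAC 1 incurs a timing error, so MAC 2's contribution is dropped.
print(te_drop_partial_sum([1.0, 2.0, 3.0, 4.0],
                          [1.0, 1.0, 1.0, 1.0],
                          [False, True, False, False]))   # 1 + 2 + 4 = 7
```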

2.
The MACs are synthesized with Cadence Genus using the FreePDK 45nm technology library at the default high optimization-effort setting, letting Cadence pick the multiplier architecture and perform logic optimization. In our subsequent ICCAD’19 paper (FATE), we compared two different fixed-point representations (signed magnitude and two’s complement) and showed that, although both benefit significantly from ThunderVolt, the energy savings with SMR are greater.
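One intuition for why signed magnitude (SMR) can save energy relative to two's complement (this is an illustrative sketch, not the FATE paper's analysis): values that hover around zero flip almost every bit in two's complement (e.g., +1 to -1), but only a couple of bits in signed magnitude, so datapath switching activity is lower. A minimal comparison, with an arbitrary 8-bit width and a made-up value stream:

```python
def twos_complement(v, bits=8):
    return (v + (1 << bits)) % (1 << bits)

def signed_magnitude(v, bits=8):
    return (abs(v) | ((1 << (bits - 1)) if v < 0 else 0)) & ((1 << bits) - 1)

def bit_toggles(stream, encode, bits=8):
    codes = [encode(v, bits) for v in stream]
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, codes[1:]))

stream = [1, -1, 2, -2, 0, 3, -3, 1]   # small, zero-centered values, as in many DNN weights/activations
print("two's complement toggles:", bit_toggles(stream, twos_complement))
print("signed magnitude toggles:", bit_toggles(stream, signed_magnitude))
```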

Comment on ST-7 – Energy Efficient and Reliable Deep Learning Accelerator Design by Jeff Zhang https://tttc-vts.org/public_html/new/2024/virtual-conference/st07/#comment-2422 Tue, 12 May 2020 03:47:36 +0000 http://tttc-vts.org/public_html/new/2020/?page_id=3115#comment-2422 In reply to Vivek Chickermane.

Hi Vivek, Thank you for the questions.
1) We are referring to setup-time violations due to supply-voltage underscaling, which are the typical target for timing-error detection/recovery methods that use Razor flip-flops [MICRO’03]. We note that Razor flip-flops introduce extra hold-time constraints, which are handled using buffering.
2) Yes, they can be, but that would result in a performance degradation because the frequency would have to be set conservatively. Instead, with ThunderVolt, we are able to run at an aggressive frequency (i.e., without compromising performance). ThunderVolt can also deal with any other source of timing errors, such as those due to aging or process/temperature variation, without scaling the clock frequency.
3) Our primary power reductions in ThunderVolt come from underscaling the supply voltage Vdd (note that dynamic power depends quadratically on the supply voltage; a back-of-the-envelope sketch follows after this list). We additionally gate idle registers during “zero-skipping,” but this is not our primary contribution.
4) ThunderVolt reduces power by reducing Vdd: this lowers average power (and heating), and thus overall energy consumption, and also reduces instantaneous power.
5) Timing-speculation methods like Razor (on which ThunderVolt builds) detect timing errors caused not only by data-dependent variations in timing behavior but also by a variety of noise sources, including process variations, cross-talk, and supply noise. While our results model data-dependent timing errors and process variations, the technique will also work in the presence of other noise sources.
6) ECC approaches are typically used to protect against errors in memory structures like DRAM and SRAM, but our target is the core logic, i.e., the systolic array.
7) Indeed, these traditional redundancy methods come with significant area and energy overheads, since they require extra MAC units. In contrast, our proposed fault-tolerance scheme does not enlarge the systolic array (there is a small area penalty of <9% due to extra MUXes), while still providing high accuracy even at high fault rates.
8) Yes, our method already assumes that LBIST/MBIST techniques are used to identify faulty MAC units during manufacturing test and, potentially, periodically during the lifetime of a chip. The proposed scheme kicks in *after* a faulty MAC unit is identified, by skipping it and, if necessary, retraining the DNN to account for the skipped connections.
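As promised in answer 3, a back-of-the-envelope sketch of why supply-voltage underscaling is the dominant knob, assuming the textbook dynamic-power relation P_dyn ≈ α·C·Vdd²·f (the voltage points below are arbitrary examples):

```python
def dynamic_power_ratio(vdd_scaled, vdd_nominal=1.0):
    # Dynamic power scales roughly quadratically with Vdd at a fixed clock frequency.
    return (vdd_scaled / vdd_nominal) ** 2

for vdd in (0.9, 0.8, 0.7):
    print(f"Vdd = {vdd:.1f} x nominal -> ~{dynamic_power_ratio(vdd):.0%} of nominal dynamic power")
```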

Comment on ST-6 – On-Die Learning: A Pathway to Post-Deployment Robustness and Trustworthiness of Analog/RF ICs by Georgios Volanis https://tttc-vts.org/public_html/new/2024/virtual-conference/st06/#comment-2421 Mon, 11 May 2020 22:51:30 +0000 http://tttc-vts.org/public_html/new/2020/?page_id=3131#comment-2421 In reply to Sohrab Aftabjahani.

Thank you for your questions.
1. The underlying premise of this method is that any distortion imposed by an active hardware Trojan on the parametric profile of an IC has to be systematic and above the noise level in order to be of utility to an adversary. However, by individually training the decision boundary for each chip, the only margins left for a hardware Trojan to exploit are measurement noise and the non-idealities of the invariance extraction and checking circuitry and the training algorithm. Therefore, evading detection becomes particularly challenging. Even in the marginal case that you describe, there is no guarantee that the attacker can exploit the resulting transmission-power distortion to steal the key. Moreover, assuming that the adversary stages the hardware Trojan attack through an untrusted foundry, we advocate that the invariant property used for CHTD and the criterion employed to evaluate its compliance should not be publicly known. In fact, to prevent the adversary from reverse engineering them from the layout file, their exact details should be withheld from the design and should be introduced and individualized per chip after fabrication, through non-volatile memory.
2. There exist 6 instances where the footprint of the invariance observation is inside the boundary, even though the hardware Trojan is active. In all 6 of these cases, the following property was true: the key bits leaked when taking the two observations had the same number of ‘0s’. Since the additional power required by a hardware Trojan is a linear function of the number of stolen ‘0s’, we have aliasing. Fortunately, this is only temporary. While the probability that the key bits leaked during the two observations have the same number of ‘0s’ is non-negligible, the probability that this will be the case every time the invariance is checked is infinitesimal. In fact, in my experiment, the next invariance check following each of the six misclassified instances failed the invariance compliance test. In other words, the hardware Trojan is still detected for these cases, but with a latency of one invariance check, or approximately 23 bits of ciphertext transmission.
3. Experimentally, I demonstrated that the proposed method for extracting and checking the invariant property does not introduce false positives in Trojan-free circuits. Initially, invariance observations were collected from Trojan-dormant circuits and used as the training set of the CHTD checker. Then, a second set of observations (the validation set) was collected under minor perturbations in operating conditions (i.e., power supply and temperature), in order to account for the impact of measurement noise. The boundary learned by the trained one-class classifier successfully encloses both the training and the validation set, and this result held in all cross-validation iterations. This corroborates that, even in the presence of measurement noise, the invariance-extraction circuit and the trained classifier operate correctly and do not inadvertently assert the CHTD output for a Trojan-free circuit (i.e., no false positives). A minimal sketch of this training/validation flow is given after this list.
4. The ability of the proposed method to withstand attacks rests on three facts. First, the size of the analog input space and the time-consuming process of applying and evaluating candidate analog keys prohibit brute-force attacks. Second, due to the high entropy of the function learned by the neural network, applying incorrect analog keys yields analog IC performance that provides no guidance towards the correct key. Lastly, the programmability of the analog neural network through analog floating-gate transistors, which serve as permanent storage for the synapse weights, enables individualization of the key per chip.
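As referenced in answer 3, here is a minimal sketch of the per-chip one-class training/checking flow using scikit-learn's OneClassSVM. The two-dimensional features, noise levels, and the Trojan-induced shift are fabricated for illustration; the real system learns on-die from actual invariance observations.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(200, 2))                 # Trojan-dormant invariance observations
valid = train[:50] + rng.normal(0.0, 0.05, size=(50, 2))    # same chip under minor supply/temperature perturbation

clf = OneClassSVM(kernel="rbf", nu=0.01, gamma="scale").fit(train)

# Validation points should fall almost entirely inside the learned boundary (no false positives).
print("validation points inside boundary:", float(np.mean(clf.predict(valid) == 1)))

# A systematic, above-noise distortion (here an arbitrary shift) lands outside the boundary.
trojan_active = train[:50] + np.array([3.0, 3.0])
print("Trojan-active points flagged:", float(np.mean(clf.predict(trojan_active) == -1)))
```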

Comment on ST-1 – Assuring Security and Reliability of Emerging Non-Volatile Memories by Khan Mohammad Nasim Imtiaz https://tttc-vts.org/public_html/new/2024/virtual-conference/st01/#comment-2420 Mon, 11 May 2020 18:19:53 +0000 http://tttc-vts.org/public_html/new/2020/?page_id=3095#comment-2420 In reply to Sohrab Aftabjahani.

Dear Dr. Aftabjahani,
Thank you for your questions. Please find my inline answers below:

1. In slide 3, what do MBI and MBI-BI stand for?
Answer: MBI stands for Magnetic Burn-In, and MBI-BI stands for Magnetic Burn-In combined with traditional thermal Burn-In.

MBI: The retention time of spintronic memories decreases as the magnitude of an external magnetic field applied to the chip increases. Therefore, I have proposed applying an external magnetic field to spintronic memories during retention testing to reduce the test time by 1E12X.

MBI-BI: The retention time of spintronic memories also decreases as the magnitude of an external thermal field applied to the chip increases. Therefore, I have proposed applying an external thermal field along with the external magnetic field to further reduce the retention test time. A rough sketch of the underlying retention model is given below.
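For intuition only, here is a rough sketch of the Néel–Arrhenius retention model commonly used for magnetic tunnel junctions. The attempt time, zero-field thermal stability, and field scaling are assumed illustrative values, not the numbers behind the 1E12X figure.

```python
import numpy as np

t0 = 1e-9        # attempt time in seconds (typical assumption)
delta0 = 60.0    # zero-field thermal stability factor (assumed)

def retention_time(h_over_hk):
    # A common approximation: the energy barrier shrinks as (1 - H/Hk)^2 under an external field H.
    delta = delta0 * (1.0 - h_over_hk) ** 2
    return t0 * np.exp(delta)

for h in (0.0, 0.2, 0.3):
    speedup = retention_time(0.0) / retention_time(h)
    print(f"H/Hk = {h:.1f}: retention ~ {retention_time(h):.1e} s  (test ~{speedup:.1e}x faster)")
```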

2. In slide 14, why are NVM memories prone to row hammer attacks?
Answer: Emerging NVMs incur high write currents, which can create high supply noise such as ground bounce. This can be leveraged by an adversary to launch a row hammer attack: the adversary can keep writing to his own memory space and generate ground bounce, which can propagate to the word-line drivers of unselected cells and partially turn them ON. Those cells then incur a disturb current, which can lead to retention failure. Further details are given below:

Let’s assume the write current of each bit is 100 µA; for a 512-bit write, the total write current is then 51.2 mA. Additionally, the true ground and Vdd of a chip are implemented in a higher metal layer (e.g., metal-8, M8), while the local ground/Vdd are implemented in a lower metal layer (e.g., metal-1, M1). Therefore, there is a parasitic resistance and capacitance between the true and local ground or Vdd. During a write operation, the total write current is dumped into the local ground of the corresponding sub-array, and the local ground experiences a voltage bounce. This bounce propagates to the word-line/source-line/bit-line drivers of the neighboring bits. If the bounce propagates to the word-line drivers, the access transistors of unselected bits that share the same bit-line/source-line drivers are partially turned ON, and a disturb current passes through them. These bitcells experience retention failure and read disturb. Furthermore, if the bounce propagates to the source-line/bit-line drivers, the bitcells experience lower voltage headroom, so read/write operations may fail. In summary, the unselected (selected) bits suffer from retention issues (read/write failures) if they experience a continuous disturb current (lower voltage headroom) during retention (read/write) modes. A back-of-the-envelope calculation of these numbers follows.
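A back-of-the-envelope version of the numbers above; the per-bit current and word width follow the paragraph, while the parasitic resistance is purely an assumed illustration.

```python
bits_per_write = 512              # word width consistent with 100 uA/bit -> 51.2 mA total
i_write_per_bit = 100e-6          # A
r_local_to_true_ground = 2.0      # ohms, assumed parasitic between local (M1) and true (M8) ground

i_total = bits_per_write * i_write_per_bit
bounce = i_total * r_local_to_true_ground
print(f"total write current = {i_total * 1e3:.1f} mA, local-ground bounce ~ {bounce * 1e3:.0f} mV")
```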

Therefore, in a shared memory, an adversary can launch a row hammer attack by repeatedly writing into his own memory space, injecting failures into another user’s read/write/retention operations.

Comment on ST-4 – Machine Learning-Based Hotspot Detection: Fallacies, Pitfalls and Marching Orders by Gaurav Rajavendra Reddy https://tttc-vts.org/public_html/new/2024/virtual-conference/st04/#comment-2419 Mon, 11 May 2020 17:46:11 +0000 http://tttc-vts.org/public_html/new/2020/?page_id=3127#comment-2419 In reply to Sohrab Aftabjahani.

Dear Dr. Aftabjahani,
Thank you for the questions.

1) (a) For our experiments in [1], we generated about 200 synthetic patterns for every known hotspot. The data enhancement reduced prediction error by about 57% compared to the state of the art (SOTA). Furthermore, we performed a detailed analysis in which we varied the number of synthetic patterns generated per known hotspot and observed the change in error. We found that adding a mere 40 synthetic patterns (per known hotspot) reduces classification error by about 36%. For detailed results, we refer you to Figure 7 of [1].
(b) Generally, the user can pick the number of synthetic patterns by trading off error reduction against lithography-simulation overhead. However, from our experiments we have found that about 200 synthetic patterns per known hotspot is optimal. Beyond that, we continue to observe a reduction in error, but at a lower rate, and we eventually reach a point of diminishing returns.

2) Yes, machine learning is an integral part of our hotspot-detection strategy. The enhanced dataset generated using synthetic pattern generation is used as the training dataset for an ML model. When new customer layouts arrive at a foundry, they are decomposed into snippets and tested using the trained model, which flags potential hotspots. Foundries can then make minor changes to these patterns or request design fixes to make them safe.
In [1], we used a Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel as our classifier. In the extended journal version of this manuscript, which is under preparation, we have experimented with other classifiers, such as Convolutional Neural Networks (CNNs), and have observed similar improvements, essentially demonstrating that the database-enhancement strategy proposed herein works with a wide range of ML algorithms.
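For concreteness, a minimal scikit-learn sketch of the classification step described above (an RBF-kernel SVM trained on the enhanced dataset); the feature vectors and labels are random placeholders standing in for features extracted from layout snippets.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 64))      # features of known + synthetic hotspot/non-hotspot patterns
y_train = rng.integers(0, 2, size=1000)    # 1 = hotspot, 0 = non-hotspot (placeholder labels)

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

X_new = rng.normal(size=(5, 64))           # snippets decomposed from an incoming customer layout
print("flagged as potential hotspots:", clf.predict(X_new))
```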

References:
[1] G. R. Reddy, C. Xanthopoulos, and Y. Makris, “Enhanced hotspot detection through synthetic pattern generation and design of experiments,” in Proc. IEEE 36th VLSI Test Symposium (VTS), 2018, pp. 1–6.
