## DESIGN OF A TIME-AMPLIFIED, STOCHASTIC PHASE INTERPOLATION TIME-TO-DIGITAL-CONVERTER FOR BIOMEDICAL IMAGING APPLICATIONS

By CHENGXIN LIU, B. Eng.

A Thesis Submitted to the School of Graduate Studies in Partial Fulfilment of the Requirements for the Degree of Master of Applied Science

McMaster University © Copyright by Chengxin Liu, November 2024

McMaster University, Hamilton, Ontario

Master of Applied Science (2024) (Electrical and Computer Engineering)

| TITLE:           | Design of a Time-Amplified, Stochastic Phase Interpolation Time-to-Digital-Converter for Biomedical Imaging Applications |
|------------------|---------------------------------------------------------------|
| AUTHOR:          | Chengxin Liu, B.Eng. (McMaster University, Hamilton, Canada)  |
| SUPERVISOR:      | Dr. M. Jamal Deen, Distinguished University Professor         |
| NUMBER OF PAGES: | xviii, 74                                                     |

## Lay Abstract

Biomedical imaging is an important tool for diagnosing disease and studying human organs, and improving the performance of biomedical imaging systems is an active research area. In recent years, imaging systems that couple single-photon avalanche diodes (SPAD) with time-to-digital converters (TDC) to produce time-of-flight (ToF) images have become highly desirable because they offer better signal-to-noise ratio (SNR) and depth profiling than conventional imaging methods. Improving TDC performance is therefore crucial, as it directly determines the temporal resolution of SPAD-imager-based imaging systems.

This thesis focusses on the design and measurement of a TDC in a standard CMOS fabrication process, targeting high-performance SPAD-imagers for biomedical imaging systems. The following chapters first review recent progress in biomedical imaging systems and TDCs. Next, we present the design of a custom time-amplified, stochastic phase interpolation (TASPI) TDC, which aims to achieve performance comparable to current state-of-the-art TDC architectures with a minimal silicon footprint. Finally, the measured performance of the proposed TASPI-TDC is presented along with areas for improvement in future design iterations.

### Abstract

Time-to-digital converters (TDC) and single-photon avalanche diodes (SPAD) can be integrated together into SPAD-imagers. A TDC is a mixed-signal circuit that converts the time difference between two input signals into a digital code. In SPAD-imagers, the electrical pulses triggered by incident photons are measured against a reference clock to extract time-of-flight (ToF) data. The performance of the TDC directly determines the temporal performance of SPAD-imagers in biomedical imaging systems such as positron emission tomography (PET) and diffuse optical tomography (DOT). In recent years, the evolution of complementary metal-oxide-semiconductor (CMOS) technology has made it possible to implant SPAD-imagers for imaging neural activity in moving subjects. This work proposes a new TDC design to further improve future SPAD-imager-based time-domain imaging systems.

First, this thesis provides a detailed review of current research on brain imaging and neural activity recording methods. Next, the operating principles of different TDC architectures are presented. The following chapter presents the proposed time-amplified, stochastic phase interpolation (TASPI) TDC architecture, which was designed and tested in TSMC 65 nm standard CMOS technology and achieves a ~16 ps resolution with 6 effective bits in a 0.06 mm<sup>2</sup> silicon area. Based on the results, areas for future improvement are identified and discussed in detail.

### Acknowledgments

First and foremost, I would like to express my sincere appreciation to Dr. M. Jamal Deen for supervising my research over the past three years. The microelectronics course taught by Dr. Deen in my senior year fueled my passion for studying advanced circuit components and systems. I was fortunate to have Dr. Deen's unwavering support in starting the research topics presented in this thesis. Under his guidance, I practiced technical and communication skills through research projects and presentations. The invaluable guidance, insightful feedback, and encouragement from Dr. Deen have inspired and motivated me for my future career as a researcher. I would also like to thank my committee members, Prof. Chih-hung (James) Chen and Prof. Mohamed Elamien, for reviewing my thesis and providing insightful comments on my research.

My growth would not have been possible without the collaborative and stimulating environment fostered by my colleagues and peers in Dr. Deen's team during these three years of study: Wei Jiang, Xuanyu Qian, Junzhi Liu, Sophini Subramaniam, Abu Ilius Faisal, Mahtab Teheri, and Mahdi Naghshvarianjahromi. Wei Jiang and Sophini Subramaniam especially deserve recognition for offering expertise that greatly helped my research. Wei always welcomed my questions and gave much of his time to provide opinions and guidance on my writing and presentations. The biological part of my research on brain and neural imaging would not have been possible without Sophini's support. I would not have been able to tackle the key challenges of the past three years without the help of my colleagues.

The completion of my thesis research was also aided by the McMaster ECE technical staff, who helped set up our EDA tools and resolve technology licensing issues.

Lastly, I would like to thank Yuxi Chen, my parents, and my grandparents for their support during my graduate studies. This journey would not have been possible without them. I dedicate this thesis to my family.

# **Table of Contents**

- Lay Abstract
- Abstract
- Acknowledgments
- Table of Contents
- List of Figures
- List of Tables
- List of Abbreviations
- List of Symbols
- Declaration of Academic Achievement
- Chapter 1 Introduction
  - 1.1. Motivations for Single Photon Biomedical Imaging
  - 1.2. Components of Single Photon Detectors
    - 1.2.1. Single Photon Avalanche Diodes
    - 1.2.2. Time-to-Digital Converters
  - 1.3. Research Contributions
  - 1.4. Thesis Organization
- Chapter 2 Evolution of Brain and Neural Imaging
  - 2.1. Introduction
  - 2.2. Microscopic Brain/Neural Imaging & Optogenetics
    - 2.2.1. Microscopic Neural Imaging
    - 2.2.2. Light Sheet Fluorescence Microscopy
    - 2.2.3. Comparison and Discussion
  - 2.3. Implantable Neural Imager
  - 2.4. Challenges and Future Perspectives
    - 2.4.1. Signal Characteristics
    - 2.4.2. Brain Tissue Characteristics
    - 2.4.3. SPAD Characteristics
    - 2.4.4. System Integration and Requirement
    - 2.4.5. Wireless Communication
    - 2.4.6. Cooling
    - 2.4.7. Long-Term Recording
    - 2.4.8. Fabrication Requirements
    - 2.4.9. Microscopic and Deep Brain Imaging
    - 2.4.10. Data Processing
    - 2.4.11. Implantable Device and Applications
  - 2.5. Conclusion
- Chapter 3 Fundamentals of CMOS Time-to-Digital Converters
  - 3.1. Introduction
  - 3.2. Conventional TDC Structures
  - 3.3. Two-Step TDC
  - 3.4. Vernier TDC
  - 3.5. Pulse Shrinking TDC
  - 3.6. Time-Amplified TDC
  - 3.7. Stochastic Phase Interpolation TDC
  - 3.8. Comparison and Discussion
- Chapter 4 Time Amplified, Stochastic Phase Interpolation Time-to-Digital Converter
  - 4.1. Introduction
  - 4.2. Operating Principle
  - 4.3. Circuit Design
    - 4.3.1. Delay Locked Loop
    - 4.3.2. Sampler Array
    - 4.3.3. Remainder Generation Logic
    - 4.3.4. Time Amplifier
    - 4.3.5. Thermal-to-binary Encoder
  - 4.4. Measurement Setup
  - 4.5. Measurement Results
  - 4.6. Conclusion and Future Works
- Chapter 5 Conclusions and Future Work
  - 5.1. Conclusions
  - 5.2. Future Work
- References

# **List of Figures**

- Figure 1-1: A simplified illustration of a PET imaging system.
- Figure 1-2: (Left) A simplified illustration of a DOT imaging system consisting of an array of LEDs (red) and detectors (blue). (Right) Time-correlated single photon counting methods (TCSPC).
- Figure 1-3: A simple illustration of implantable neural imaging. A neural probe integrated with both mini-LED (blue) and SPAD imagers (green).
- Figure 1-4: Operating principle of a single-photon avalanche diode (SPAD). The SPAD is first charged to a voltage $V_{SPAD} = V_{BR} + V_{EX}$. (1-2) The arrival of an incident photon induces an avalanche current. (2-3) The avalanche current leads to a voltage drop across the quenching resistor that quenches the SPAD. (3-1) The SPAD is reset and charged to $V_{SPAD}$ again for the next photon arrival.
- Figure 2-1: Fluorophore is excited from the ground state to the excited state by a single photon (left), two photons (middle), or three photons (right). Fluorescence is then released.
- Figure 2-2: Single photon confocal microscopy. A light field is generated by passing the excitation light through an objective lens. An entire section of the sample is excited and emits fluorescence.
- Figure 2-4: Illustration of common image reconstruction methods used in microscopic brain and neural imaging. (a) A 3D image of the imaging volume is reconstructed by stacking multiple 2D images acquired from multiple depths. (b) Large FoV imaging achieved by overlapping multiple images from different regions. This method is used in. (c) The imaging volume is reconstructed by stacking images of multiple oblique sections.
- Figure 2-5: Illustration of different light-sheet microscopy techniques. a. Light sheet fluorescence microscopy. A laser beam is projected as a light sheet using a cylindrical lens to selectively illuminate a single plane inside the imaging volume. b. Laser-scanning light sheet fluorescence microscopy. The light sheet is formed by using digitally controlled micrometer-scale laser beams to illuminate the imaging plane. c. Swept confocally aligned planar excitation (SCAPE) microscopy. The light sheet is projected obliquely into the imaging ...
- Figure 2-6: Structure of an angle-sensitive SPAD (A-SPAD). Two grating layers, the Talbot grating and the analyser grating, are placed on top of the SPAD. A diffractive pattern is formed depending on the grating pitch and the vertical gap between the two gratings. Incident light can only be detected when it approaches the A-SPAD with an angle matching the diffractive ...
- Figure 2-7: Time gating single photon counting (TGSPC). Several consecutive time-gate ...
- Figure 2-8: Common implantable neural imaging probe and excitation source setups. a. Implantable neural imager with a SPAD array integrated on-probe that requires an external laser for excitation. b. Two laser diodes (LDs) are placed at the end and tip of the probe. c. SPAD array and micro light emitting diodes (µLEDs) are co-integrated on the probe.
- Figure 3-2: Block diagram of a two-step TDC. Two coarse phases ($P_C$) and one fine phase ...
- Figure 3-4: Block diagram of a Vernier ring oscillator TDC.
- Figure 3-5: Block diagram of a pulse-shrinking TDC.
- Figure 3-6: Block diagram of a pulse-shrinking ring TDC.
- Figure 3-7: Example of a TA-TDC implemented with a $2\times$ time-amplifier.
- Figure 3-8: Operation concept of the pulse-train amplifier.
- Figure 3-9: Illustration of a SPI-TDC. Arrows represent the positive edges of each phase.
- Figure 4-1: Block diagram of the proposed time amplified, stochastic phase interpolation TDC.
- Figure 4-2: $INL_{max,fine}$ versus $\beta$ (fractional part of $nT_d/T_{CLK}$) for different $\gamma$ (integer part of $nT_d/T_{CLK}$).
- Figure 4-3: Critical standard deviation of unit delay due to jitter ($\sigma_{j,0}$), mismatch ($\sigma_{m,0}$) and their sum ($\sigma_{total,0}$) for each number of bits ($b$).
- Figure 4-4: Schematic of the 205 MHz SPI delay locked loop in the coarse and fine TDC.
- Figure 4-5: Schematic of the current-starved gated delay cell replicated to form the voltage-controlled delay lines.
- Figure 4-6: Schematic of the (a) phase-frequency detector and (b) charge pump used in the delay-locked loop.
- Figure 4-7: Deviation of the accumulated delay of the DLL from the ideal value measured in both pre-layout (left) and post-layout (right) simulation. $V_{CN}$ and $V_{CP}$ are adjusted to meet the target delay in process and temperature corner simulation. Simulation temperature is 27°C and supply voltage is 100% $V_{DD}$ if not otherwise specified.
- Figure 4-8: Schematic of the sampler used for each delay cell, consisting of two DFFs.
- Figure 4-9: Example of sampler output S[i] prematurely switched when STOP signal is larger than 2 ns.
- Figure 4-10: Schematic of the remainder generation logic used for each GDE.
- Figure 4-11: Schematic of the pulse-train time amplifier. The inset shows the schematic of the pseudo-OR gate.
- Figure 4-12: Gain error of the TA used in the proposed TASPI-TDC in various PVT conditions, extracted from both pre- (left) and post-layout (right) simulations. Simulation temperature is 27°C and supply voltage is 100% $V_{DD}$ if not otherwise specified.
- Figure 4-13: Schematic of the 5-bit MUX TTBE. The output bit order is shown as B[i].
- Figure 4-14: Response of the MUX TTBE to the number of input phases in decimal. (a) Pre-layout and (b) post-layout simulation results at different PVT conditions. Simulation temperature is 27°C and supply voltage is 100% $V_{DD}$ if not otherwise specified.
- Figure 4-15: Annotated layout of the complete TDC in the TSMC 65 nm CMOS process.
- Figure 4-16: Block diagram of the measurement setup used for the TDC characterization.
- Figure 4-17: Quantization characteristics of the 8-bit TDC.
- Figure 4-18: Nonlinearity performance of the 8-bit TDC. Differential nonlinearity (DNL) is shown at the top, and the integral nonlinearity (INL) is shown at the bottom.
- Figure 4-19: Quantization characteristics of the 5-bit coarse TDC.
- Figure 4-20: Nonlinearity performance of the 5-bit coarse TDC within the 1 ns testing range. Differential nonlinearity (DNL) is shown at the top, and the integral nonlinearity (INL) is shown at the bottom.

## **List of Tables**

- Table 2-1: Selected Publications for Different Microscopic Neural Imaging Techniques
- Table 2-2: Selected Publications for Different Implantable Neural Imagers
- Table 3-1: Summary of Performance of TDC from Selected Publications
- Table 4-1: Target performance metrics for the proposed TDC
- Table 4-2: Minimum detectable time interval of sampler cell measured from both pre- and post-layout simulations in various PVT conditions
- Table 4-3: Maximum remainder error measured in both pre- and post-layout simulations under various PVT conditions
- Table 4-4: Initialization time of the 5-bit MUX TTBE at different PVT conditions
- Table 4-5: Tested chip conditions and number of measurements
- Table 4-6: Comparison table of TDC performance with selected publications

# List of Abbreviations

| Abbreviation | Definition                                          |
|--------------|-----------------------------------------------------|
| ADI          | Array Density Index                                 |
| AOD          | Acousto-Optic Deflector                             |
| AP           | Action Potentials                                   |
| AQR          | Active Quench and Reset                             |
| ASIC         | Application Specific Integrated Circuit             |
| CMOS         | Complementary Metal-Oxide-Semiconductor             |
| CP           | Charge Pump                                         |
| DAC          | Digital-to-Analog Converter                         |
| DCR          | Dark Count Rate                                     |
| DEMUX        | Demultiplexer                                       |
| DFF          | D Flip-Flop                                         |
| DL           | Delay Line                                          |
| DLL          | Delay-Locked Loop                                   |
| DNL          | Differential Nonlinearity                           |
| DOT          | Diffuse Optical Tomography                          |
| DR           | Dynamic Range                                       |
| ECoG         | Electrocorticography                                |
| FF           | Fill Factor                                         |
| FLIM         | Fluorescence Lifetime Imaging                       |
| FoM          | Figure-of-Merit                                     |
| FoV          | Field-of-View                                       |
| FPGA         | Field-Programmable Gate Array                       |
| FPS          | Frames Per Second                                   |
| GDE          | Gated Delay Elements                                |
| GINA         | Genetically Encoded Indicators of Neural Activities |
| IC           | Integrated Circuit                                  |
| INL          | Integral Nonlinearity                               |
| LED          | Light Emitting Diodes                               |
| LFP          | Local Field Potential                               |
| LoR          | Line-of-Response                                    |
| LSB          | Least Significant Bit                               |
| LSFM         | Light Sheet Fluorescence Microscopy                 |
| MEA          | Microelectrode Array                                |
| MPM          | Multiphoton Microscopy                              |
| MSB          | Most Significant Bit                                |
| MUX          | Multiplexer                                         |
| NIR          | Near-Infrared                                       |
| PCB          | Printed Circuit Board                               |
| PDE          | Photon Detection Efficiency                         |
| PDP          | Photon Detection Probability                        |
| PET          | Positron Emission Tomography                        |
| PFD          | Phase Frequency Detector                            |
| PI           | Photonic Index                                      |
| PQR          | Passive Quench and Reset                            |
| PTAT         | Proportional to Absolute Temperature                |
| PSR          | Pulse-Shrinking Ring                                |
| PVT          | Process, Voltage, Temperature                       |
| QR           | Quench and Reset                                    |
| RAM          | Random Access Memory                                |
| ROI          | Region-of-Interest                                  |
| SCAPE        | Swept Confocally Aligned Planar Excitation          |
| SLAP         | Scanned Line Angular Projection Microscopy          |
| SNR          | Signal-to-Noise Ratio                               |
| SPAD         | Single Photon Avalanche Diode                       |
| SPI          | Stochastic Phase Interpolation                      |
| TA           | Time Amplifier                                      |
| TCSPC        | Time-Correlated Single Photon Counting              |
| TD           | Time-Domain                                         |
| TDC          | Time-to-Digital Converter                           |
| TG           | Time-Gated                                          |
| TGSPC        | Time-Gating Single Photon Counting                  |
| ToF          | Time-of-Flight                                      |
| TSPC         | True Single-Phase Clock                             |
| TT           | Typical-Typical                                     |
| VCDL         | Voltage-Controlled Delay Line                       |
| VDL          | Vernier Delay Line                                  |
| VRO          | Vernier Ring-Oscillator                             |
|              | Threshold Voltage Based Reference                   |

# **List of Symbols**

| $V_{BR}$            | Breakdown Voltage (V)                                |
|---------------------|------------------------------------------------------|
| $V_{EX}$            | Excess Voltage (V)                                   |
| $T_{CLK}$           | Reference Clock Period (s)                           |
| N <sub>Linear</sub> | Effective Number of Bits                             |
| $K^+$               | Potassium                                            |
| Na <sup>+</sup>     | Sodium                                               |
| Cl <sup>-</sup>     | Chloride                                             |
| Ca <sup>2+</sup>    | Calcium                                              |
| $V_m$               | Membrane Potential (V)                               |
| Т                   | Total Propagation Delay                              |
| $T_{C}$             | Unit Delay of The Coarse Delay Line (s)              |
| $T_f$               | Unit Delay of The Fine Delay Line (s)                |
| $T_d$               | Unit Delay (s)                                       |
| $P_{C}$             | Number Of Coarse Phases                              |
| $P_f$               | Number Of Fine Phases                                |
| N'                  | Effective Number of Bits                             |
| n                   | Number Of Delay Units                                |
| $\sigma_{T_d}^2$    | Unit Delay Variance (s <sup>2</sup> )                |
| $\sigma_j^2$        | Unit Delay Variance from Jitter (s <sup>2</sup> )    |
| $\sigma_m^2$        | Unit Delay Variance from Mismatch (s <sup>2</sup> )  |
| $\mu_{T_d}$         | Mean Unit Delay (s)                                  |
| $\tau_i$            | Step-Width of TDC Step Response                      |
| N <sub>total</sub>  | Total Number of Counts in TDC Measurement            |
| H(n)                | Cumulative Distribution Function of The TDC Response |
| $F_S$               | Sampling Rate (MHz)                                  |

## **Declaration of Academic Achievement**

This thesis was written by Chengxin Liu under the supervision and guidance of Dr. M. Jamal Deen from McMaster University.

- **Chapter 1:** I present an overview of SPAD-imager-based biomedical imaging systems and background information on the SPAD and TDC.
- **Chapter 2:** I present a detailed review of a wide range of brain and neural imaging technologies, covering both microscopic and the latest implantable methods.
- **Chapter 3:** I present a comprehensive review of existing TDC architectures and the ideas that inspired the proposed work.
- **Chapter 4:** I first present the design and simulation results of the proposed time-amplified, stochastic phase interpolation TDC in the TSMC 65 nm process, followed by the measurement results and a discussion of areas for future improvement.
- **Chapter 5:** Based on the literature review and design experience, I discuss several key research challenges for future design iterations.

# Chapter 1 Introduction

### **1.1. Motivations for Single Photon Biomedical Imaging**

A time-to-digital converter (TDC) is a special type of converter that precisely measures the time difference between two incident signals. When integrated with single-photon avalanche diodes (SPADs) into SPAD imagers, TDCs enable accurate measurement of photon arrival time for time-of-flight (ToF) imaging technologies such as ToF positron emission tomography (PET) and diffuse optical tomography (DOT) [1], [2]. In recent years, these ToF imaging techniques have inspired microscopic and implantable imaging systems for understanding brain and neural activity [3], [4], [5], [6], [7]. A high-performance TDC with picosecond resolution can significantly improve the temporal resolution and frame rate of SPAD imagers, making full use of the high sensitivity and fast response of SPADs compared to conventional imaging systems [8], [9]. Combined with recent advances in high-density SPAD arrays that achieve high spatial resolution, SPAD imagers are capable of high-speed imaging of crucial organs at the cellular level with high spatiotemporal resolution. In this section, we provide a brief overview of the aforementioned applications and the operating principles of these SPAD imagers as motivation for the research presented in this thesis.

Positron emission tomography (PET) is a biomedical imaging technology that uses radioactive markers to diagnose diseases such as tumors at the molecular level [1], [10], [11]. A simplified illustration of a digital PET system using SPAD-based sensors, which consists of SPAD pixels and TDCs, is shown in Figure 1-1. The injected positron-emitting radioactive markers annihilate with electrons at the tumor's location. The released gamma rays travel in opposite directions and are captured by a ring of scintillators surrounding the imaging target. The scintillators convert the incident gamma rays into photons, which are captured by SPAD imagers and converted into electrical signals.



Figure 1-1: A simplified illustration of PET imaging system.

TDCs enable SPAD imagers to timestamp each incident photon and build a ToF distribution along the line of response (LoR), which offers a higher signal-to-noise ratio (SNR) than conventional PET imaging [1], [12]. The SNR of a ToF PET system is inversely proportional to the smallest measurable time difference between the reference signal and the detected photon signal [1], [12]. A high-performance TDC offering higher resolution with high linearity can therefore directly improve the localization precision of the ToF PET system.
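
To give a sense of scale for the link between timing and localization, the sketch below converts a timing resolution into a position uncertainty along the LoR using the standard ToF relation Δx = c·Δt/2. The function name and the example timing values are hypothetical and chosen only for illustration; in a real scanner the timing resolution is set by the whole detection chain (scintillator, SPAD, and TDC), not by the TDC alone.

```python
# Illustrative sketch: position uncertainty along the LoR for a given timing resolution.
# The timing values below are hypothetical, not measurements from this thesis.
C_M_PER_S = 299_792_458.0  # speed of light in vacuum (m/s)


def lor_position_uncertainty_mm(timing_resolution_ps: float) -> float:
    """Return delta_x = c * delta_t / 2, expressed in millimetres."""
    delta_t_s = timing_resolution_ps * 1e-12
    return C_M_PER_S * delta_t_s / 2.0 * 1e3


for dt_ps in (500.0, 100.0, 10.0):
    dx_mm = lor_position_uncertainty_mm(dt_ps)
    print(f"{dt_ps:6.1f} ps -> {dx_mm:7.2f} mm along the LoR")
```

The factor of two appears because a shift of the annihilation point by Δx shortens one photon path and lengthens the other by the same amount.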

Diffuse optical tomography (DOT), shown in Figure 1-2, is a non-invasive brain imaging method used to study neural activities and brain diseases [13]. DOT commonly uses near-infrared (NIR) light sources such as pulsed lasers and light-emitting diodes (LEDs) [13], [14]. Fluorescence scattered through the skull and brain tissue is detected by photodetector arrays to profile the region of interest (ROI). Time-domain DOT (TD-DOT) is a type of DOT that uses SPADs and TDCs for time-correlated single photon counting (TCSPC), a method used to extract histograms of the ToF distribution, as shown in Figure 1-2 [2]. By collecting ToF histograms at multiple locations, tomographic brain images can be reconstructed with depth information and improved imaging quality compared to conventional DOT implemented with continuous-wave light sources [2]. The pitch of the optical source and detector arrays should be minimized to improve the spatial resolution of TD-DOT; the current standard pitch is below 15 mm [15]. Temporal resolution is another important criterion of TD-DOT. It is directly related to the performance of the TDC and should be in the range of milliseconds [15]. Recently, wearable TD-DOT systems with a kilohertz frame rate, built with high-density mini light-emitting diodes (LEDs) and $\mu$m-pitch SPADs, were proposed to improve the imaging quality with minimal impact from tissue variations and motion artifacts from the imaging subjects [14].
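
As a rough illustration of how TCSPC turns individual timestamps into a ToF histogram, the sketch below quantizes simulated photon arrival times with an ideal TDC and counts the resulting codes. The resolution, number of bins, and exponential arrival model are hypothetical placeholders, not parameters of any system described here.

```python
import random


def tcspc_histogram(arrival_times_ps, lsb_ps, n_bins):
    """Quantize arrival times (relative to the reference pulse) and count each TDC code."""
    hist = [0] * n_bins
    for t in arrival_times_ps:
        code = int(t // lsb_ps)      # ideal TDC: floor(t / LSB)
        if 0 <= code < n_bins:       # arrivals outside the dynamic range are dropped
            hist[code] += 1
    return hist


random.seed(0)
LSB_PS = 50.0            # hypothetical TDC resolution
N_BINS = 256             # hypothetical 8-bit code range
MEAN_DELAY_PS = 2000.0   # hypothetical mean photon delay

# Hypothetical arrival model: exponentially distributed delays after each excitation pulse.
times = [random.expovariate(1.0 / MEAN_DELAY_PS) for _ in range(10_000)]
hist = tcspc_histogram(times, LSB_PS, N_BINS)
print(sum(hist), "of", len(times), "photons fall inside the dynamic range")
```

Each detected photon contributes exactly one count, so the histogram shape only emerges after many excitation cycles.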



Figure 1-2: (Left) A simplified illustration of a DOT imaging system consisting of an array of LEDs (red) and detectors (blue). (Right) The time-correlated single photon counting (TCSPC) method.

Due to advances in CMOS technology, miniaturized SPAD pixels and components such as TDCs have promoted the rise of implantable neural imagers in recent years [3], [16], [17], [18]. As an example of the many implantable neural imagers proposed, a simple illustration of an implantable neural imaging probe is shown in Figure 1-3 below.



Figure 1-3: A simple illustration of implantable neural imaging. A neural probe integrated with both mini-LED (blue) and SPAD imagers (green).

Implantable neural imaging probes are equipped with miniaturized light sources such as laser diodes and LEDs to activate fluorescence markers with single or multiple photons. Emitted fluorescence photons are detected by the SPAD pixels and converted into imaging data off-probe by peripheral circuits, including TDCs. Compared to conventional microscopic brain/neuron imaging technologies, implantable neural imagers can image the deep brain in moving subjects in real time [3]. However, implantable neural imagers pose new challenges for SPADs and peripheral circuits. The biggest challenge is the power consumption of the SPAD array and peripheral circuits, which must remain low to prevent heat damage to brain tissue. Further miniaturization of integrated light sources and detectors is also crucial for the spatial resolution of implantable neural imagers.

### **1.2.** Components of Single Photon Detectors

#### **1.2.1. Single Photon Avalanche Diodes**

#### A. Operation Principle

A SPAD is a special type of photodiode capable of working with a biasing voltage higher than its breakdown voltage. The basic operating principle is shown in Figure 1-4. The SPAD is first charged to a voltage  $V_{SPAD}$ , which is above the breakdown voltage  $V_{BR}$  by an excess voltage  $V_{EX}$  (state 1). An avalanche current is triggered by an incident photon or a dark carrier (state 2). After the rapid current pulse, a quench circuit brings  $V_{SPAD}$  down to below the breakdown voltage (state 3). After quenching, the SPAD is recharged back to its initial biasing voltage (state 1) by the reset circuit. This cycle then repeats to detect the next incident photon.

The time it takes for a SPAD to switch from state 2 to state 3 and then back to state 1 is known as the dead time, or quench and reset (QR) time, of the SPAD. Ultimately, the dead time limits the maximum imaging rate that the SPAD pixel can achieve. There are two main types of QR circuits for SPADs: passive quench and reset (PQR) and active quench and reset (AQR). In a PQR SPAD, the SPAD is biased beyond its breakdown voltage through a large quenching resistor (R<sub>Q</sub>). Once an incident photon triggers a large reverse current, this current causes a voltage drop across R<sub>Q</sub>, which brings the SPAD bias below the breakdown voltage and stops the avalanche. The SPAD is then recharged through the quenching resistor to prepare for the subsequent photon detection. Although the PQR circuit has a simple structure, it compromises detection speed, since it requires a long recharge time due to the large RC time constant, where C is the depletion capacitance of the SPAD.
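
The effect of the RC time constant on passive recharge can be sketched numerically. The example below assumes a first-order exponential recharge of the excess voltage through R<sub>Q</sub>; the resistor and capacitance values are hypothetical and are not taken from any SPAD discussed in this thesis.

```python
import math


def recharge_time_s(r_q_ohm: float, c_spad_f: float, fraction: float = 0.99) -> float:
    """Time for a first-order RC recharge to recover `fraction` of the excess voltage."""
    tau = r_q_ohm * c_spad_f
    return -tau * math.log(1.0 - fraction)


R_Q = 200e3       # hypothetical quenching resistor (ohm)
C_SPAD = 50e-15   # hypothetical SPAD depletion capacitance (F)

tau_ns = R_Q * C_SPAD * 1e9
t99_ns = recharge_time_s(R_Q, C_SPAD) * 1e9
print(f"tau = {tau_ns:.1f} ns, time to recover 99% of V_EX = {t99_ns:.1f} ns")
```

Under this simple model the recharge takes several time constants, which is why a large R<sub>Q</sub> translates directly into a long dead time.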



Figure 1-4: Operating principle of a single-photon avalanche diode (SPAD). The SPAD is first charged to a voltage  $V_{SPAD} = V_{BR} + V_{EX}$ . (1-2) The arrival of an incident photon induces an avalanche current. (2-3) The avalanche current leads to a voltage drop across the quenching resistor, which quenches the SPAD. (3-1) The SPAD is reset and charged to  $V_{SPAD}$  again for the next photon arrival.

In the AQR SPAD, after the photon-induced avalanche current is generated, the SPAD is discharged through a parallel quenching switch, implemented mainly by a MOSFET, to quickly bring the voltage across the SPAD below the breakdown voltage. The SPAD is then recharged by directly connecting it to the supply voltage through the reset switch. The AQR configuration allows the SPAD to operate at a higher speed than a SPAD with the PQR circuit, since it avoids the RC charging time constant.

However, the fast QR time of the AQR SPAD comes at the cost of a reduced photosensitive area compared to a PQR SPAD, due to the larger chip area required in each pixel to accommodate the extra transistors that control the QR process. The extra chip area impacts the fill factor (FF) of the AQR SPAD, which is the ratio between the active area and the entire pixel area. Since the PQR SPAD has a much simpler structure and contains fewer components than the AQR SPAD, it often yields a better FF.

The dark count rate (DCR) is an important performance metric of SPADs. DCR measures the rate of false counts from the SPAD when no incident photons are present. The DCR of a SPAD depends on multiple factors, such as  $V_{EX}$ , the operating temperature, and the doping profiles of the CMOS technology used to fabricate the SPAD [18]. For implantable neural imagers, the DCR of the SPAD should be as low as possible to prevent dark counts from masking fluorescence signals.

#### **1.2.2.** Time-to-Digital Converters

#### A. Operation Principle

In single-photon detectors, a TDC is used to convert the time interval between two incident pulses into the corresponding digital code. The two signals are often called start and stop, and in practice one of the two is a reference clock. In SPAD imagers, the TDC is used to quantize the arrival time of incident photons from the output pulses of SPAD pixels or SPAD arrays in real time.

Here we introduce the performance metrics used to evaluate TDC.

**Dynamic Range:** The dynamic range (DR) is the maximum input time interval a TDC can convert. In different imaging applications, the DR should be large enough to cover the range of photon arrival times relative to the reference signal. For example, the DR requirement for PET is in the range of nanoseconds, depending on the specific fluorescent dye [8]. The DR also determines the TDC architecture and component requirements, such as the reference clock frequency. In most cases, the designed DR of a TDC is an integer multiple of the reference clock period ( $T_{CLK}$ ).

**Resolution:** The resolution, or least significant bit (LSB), of a TDC is the shortest time interval it can convert. Resolution is determined by the reference clock period and the number of bits of the TDC. When measuring the quantization characteristics of a TDC, the output code of the TDC versus the input time interval resembles a rising staircase. The average unit delay, or step width, over a large number of measurements gives the actual resolution of the TDC. State-of-the-art TDCs are capable of achieving resolutions below 10 picoseconds [19], [20], [21].
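
The relation between dynamic range, number of bits, and resolution can be made concrete with a short calculation. The sketch below assumes an idealized TDC whose 2<sup>N</sup> codes divide the dynamic range uniformly, so that LSB = DR / 2<sup>N</sup>; this uniform-code model and the function names are assumptions for illustration. The 8-bit and 16.93 ps figures are the values reported for the proposed TDC in the contributions summary of this chapter, while the 5 ns dynamic range is hypothetical.

```python
def lsb_from_dynamic_range(dynamic_range_ps: float, n_bits: int) -> float:
    """Ideal resolution when 2**n_bits codes span the dynamic range uniformly."""
    return dynamic_range_ps / (2 ** n_bits)


def dynamic_range_from_lsb(lsb_ps: float, n_bits: int) -> float:
    """Ideal dynamic range covered by 2**n_bits codes of width lsb_ps."""
    return lsb_ps * (2 ** n_bits)


# 8 bits and 16.93 ps are the figures reported for the proposed TDC.
dr_ps = dynamic_range_from_lsb(16.93, 8)
print(f"Ideal dynamic range: {dr_ps / 1e3:.2f} ns")

# Hypothetical case: a 5 ns dynamic range quantized with 8 bits.
print(f"Ideal LSB: {lsb_from_dynamic_range(5000.0, 8):.2f} ps")
```

The same relation shows the design trade-off: for a fixed dynamic range, each additional bit halves the ideal LSB.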

**Precision:** In reality, the actual unit delay deviates from the designed unit delay. The precision of the TDC is the standard deviation of the measured unit delay. The main sources of low precision in a TDC are the intrinsic jitter of the reference clock and circuit mismatches in the TDC. In general, increasing the drive strength of the delay cells by upscaling transistor sizes can lead to higher precision at the cost of resolution, power consumption, and silicon area. As a rule of thumb, the precision of a TDC should be at least one order of magnitude lower than the resolution for optimal TDC performance [8].

**Nonlinearity:** The resolution and precision of the TDC differ from the designed conditions due to nonidealities. Common sources of nonidealities include reference clock jitter, process, voltage, and temperature (PVT) variations, and mismatches. Nonlinearity describes the nonideality of the TDC response in two categories, both expressed in LSB: differential nonlinearity (DNL) and integral nonlinearity (INL). DNL is the difference between the actual step width and the ideal step width, that is, the resolution of the TDC. INL is found by integrating the DNL over the TDC response and represents the total deviation in LSB over the entire DR.
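
The definitions of resolution, precision, DNL, and INL can be tied together in a short sketch. It assumes the step widths of the TDC staircase have already been measured, takes the mean step width as the LSB, and normalizes DNL and INL to that LSB; these conventions and the step-width values are hypothetical and are not the measurement procedure used later in this thesis.

```python
from itertools import accumulate
from statistics import mean, pstdev


def characterize_steps(step_widths_ps):
    """Return (LSB, precision, DNL list, INL list) from measured step widths."""
    lsb = mean(step_widths_ps)                        # actual resolution
    precision = pstdev(step_widths_ps)                # spread of the unit delay
    dnl = [(w - lsb) / lsb for w in step_widths_ps]   # per-step deviation, in LSB
    inl = list(accumulate(dnl))                       # running sum of DNL, in LSB
    return lsb, precision, dnl, inl


steps = [16.0, 18.0, 17.0, 15.0, 19.0, 17.0]  # hypothetical step widths (ps)
lsb, precision, dnl, inl = characterize_steps(steps)
print(f"LSB = {lsb:.2f} ps, precision = {precision:.2f} ps")
print(f"max |DNL| = {max(abs(d) for d in dnl):.3f} LSB")
print(f"max |INL| = {max(abs(i) for i in inl):.3f} LSB")
```

Because the LSB is taken as the mean step width, the DNL values sum to zero and the INL returns to zero at the last code.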

**Sampling Rate:** Conversion time, or dead time, is the time needed for the TDC to fully convert the input time difference. Any new input during the dead time is ignored by the TDC. The sampling rate is determined by both the conversion time and the operating frequency of the TDC. For biomedical imaging applications, the TDC should have a high sampling rate for several reasons. A high-sampling-rate TDC enables SPAD imagers to capture more fluorescence events, which matters because biomarker excitation is random in nature. It is also beneficial for large SPAD arrays, since each TDC can be shared by multiple SPADs.

**Power and Area:** Power consumption is another design consideration that varies from case to case. In implantable neural imagers, where high power consumption can damage brain tissue, the power for the entire data conversion and readout should be less than 10 mW. The silicon footprint of the TDC should also be kept as low as possible to maximize the FF. Power and area limitations also promote the need for TDC sharing: with a 1:1 TDC-to-SPAD ratio, TDCs can quickly dominate the chip area as the SPAD array scales up. In recent publications on high-performance TDCs, power consumption is below 10 mW and area is kept under 10 mm<sup>2</sup>.

### **1.3. Research Contributions**

This research focuses on designing a high-performance TDC based on standard CMOS technology. The proposed time-amplified, stochastic phase interpolation (TASPI) TDC targets high resolution with low nonlinearity. The proposed architecture can potentially be rescaled to different specifications for a wide range of biomedical imaging applications.

- A comprehensive literature review on brain and neural imaging techniques. This review provides a detailed study of the fundamental concepts of SPAD-based imaging techniques. We discuss the operating principles of PET imaging, conventional electrophysiological neural recording, microscopic neural imaging, and the most recent advances in implantable neural imagers.
- Design and measurement results of a prototype TDC using time amplification and stochastic phase interpolation in the TSMC 65 nm standard CMOS process. Conventional TDCs used in SPAD imagers for biomedical imaging applications have an LSB in the range of tens of picoseconds. Next-generation TDCs face new challenges: achieving high resolution and low nonlinearity within a minimal silicon footprint for biomedical imaging applications such as implantable neural imaging and ToF PET scanning. The proposed TASPI-TDC introduced in this thesis achieves an LSB and INL competitive with current state-of-the-art TDCs used in SPAD imagers while occupying a silicon area of only 0.06 mm<sup>2</sup> and consuming less than 6 mW of power. The proposed TASPI-TDC is the first to combine a time amplifier with stochastic phase interpolation. The performance of both the coarse and fine stages of the delay line is enhanced using a DLL and the hardware redundancy method from the SPI-TDC. The measurements show that the proposed 8-bit TDC achieves high linearity, with 16.93 ps resolution and an INL of 0.94 LSB. The TASPI-TDC architecture has the potential to be used in many fields of biomedical imaging, such as implantable neural imagers and ultra-high-resolution ToF PET scanners.

### **1.4.** Thesis Organization

In Chapter 1, we identify applications of SPAD-based photon counting imagers in biomedical imaging, including PET, DOT, and implantable neural imaging. We also provide a detailed description of the operating principles of the SPAD and TDC in single-photon counting imagers. Finally, we outline the research contributions and organization of the thesis.

In Chapter 2, a comprehensive review of the evolution of brain and neural imaging technologies is presented, from electrophysiological methods to the latest SPAD-based implantable neural imagers. The architecture and building blocks of the SPAD imagers used in these techniques are described in detail. Following this, we also describe some common image reconstruction techniques and identify future research challenges for implantable neural imagers.

In Chapter 3, we first review the operating principles of common TDC architectures in detail, including key architectures that inspired the proposed design, such as the time-amplified TDC and the stochastic phase interpolation TDC. Our study found that these architectures can work in coherence with one another, providing high resolution and linearity.

The design and measurement results of the proposed TDC are discussed in Chapter 4. We first present the components of the proposed TDC design, followed by the simulation results of the major circuit blocks, such as the delay-locked loop and the top-level SPI-TDC, which demonstrate the effectiveness of nonlinearity reduction in all process corners. Finally, we describe the measurement methods and results of the proposed TDC fabricated in TSMC 65 nm standard CMOS technology. We also present the figures of merit used to compare the proposed TDC with results published in recent years.

In Chapter 5, a summary of the work performed and the results achieved is given. Based on our work, we outline several challenges identified for future research.

# **Chapter 2 Evolution of Brain and Neural Imaging**

### 2.1. Introduction

The brain is the most complex organ of all mammals. The human brain has over 86 billion interconnected neurons to maintain brain activity and various body functions [22]. The coordinated activities of neurons in different brain regions are crucial for neurological function. Insights into neural activities at specific brain regions are essential for understanding brain and neuron functionality, which has many applications, such as the diagnosis and management of neurodegenerative diseases [23] and brain-robot interfacing [24]. Various regions of the brain can be accessed using different extracranial and intracranial technologies, with the external electroencephalogram, or EEG, being the gold standard. Despite recent advances in electrophysiology (Ephys), simultaneous recording and stimulation of brain circuits at the cellular level in the deep brain has yet to be achieved.

To gain further insight into neural activities deep inside the brain, Michigan-style neural probes and Utah-style microelectrode arrays (MEA) that are capable of interfacing dense groups of neurons simultaneously by measuring the local field potential were proposed in the last decade [25], [26], [27], [28], [29], [30]. As a maturing technology, surgical protocols involving these probes/MEAs [31], [32], [33] have also been established and practiced to help patients regain control of their bodies to conduct simple tasks such as texting and handwriting [34], [35].

In addition to electrophysiology, microscopic neural imaging provides another perspective for neuroscience research. This includes recent advances in functional imaging, genetically encoded indicators of neural activities (GINA) [36], [37], [38] and optogenetic actuators [39], [40], [41]. Microscopic neural imaging techniques have also evolved over the years to image neural activities in a large field-of-view (FoV) using specially engineered illumination profiles and scanning patterns [42], [43], [44], [45].

Implantable neural imagers are the newest addition to the imaging arsenal. These compact devices are packed with hundreds of high-performance semiconductor photodetectors on a probe/array [46], [47], [48]. Because of their invasive implantation, implantable neural imagers can image brain regions deeper than microscopic neural imaging techniques. They also allow the possibility of imaging neural activities in mobile subjects at more than 1000 frames-per-second (fps).

This chapter describes the current state-of-the-art electrophysiology for fast, accurate recording and stimulation, and advances in microscopic neural imaging techniques that map brain activities in high resolution. Next, we describe implantable photonic devices, which provide options to probe deeper into the brain to track and stimulate neural activities with microscopic probes and arrays. We also provide detailed performance comparisons of several representative methods and techniques using the same figures of merit. Finally, we discuss novel research that can benefit the future design of implantable photonic devices.

### 2.2. Microscopic Brain/Neural Imaging & Optogenetics

The pioneering exploration of fluorescent proteins by Nobel Laureates O. Shimomura, M. Chalfie, and R. Tsien in the late 1990s [49], [50], [51] opened up the possibility of recording neural activities optically (shown in Figure 2-1). For example, multiphoton microscopy achieved long-term, high-resolution 3D imaging by recording thousands of individual neurons *in vivo* [52], [53], [54]. Later discoveries and progress in optogenetics filled the gap to achieve bi-directional optical interfacing with the brain [55]. In the following sections, we discuss several significant advances in neural activity indicators and actuators, as well as microscopic imaging and optogenetic systems that could inspire future implantable neural imagers.



Figure 2-1: Fluorophore is excited from the ground state to the excited state by a single photon (left), two photons (middle), or three photons (right). Fluorescence is then released.

#### 2.2.1. Microscopic Neural Imaging

Depending on the type of GINA expressed by the imaging sample, microscopic neural imaging can be categorized into single-photon or multiphoton imaging techniques. Single-photon confocal microscopy techniques (Figure 2-2) are used in tandem with fluorescence reporters that can be excited by a single blue-light photon [56], [57]. In these techniques, the sample is bathed in a light field generated by passing the excitation light through an objective lens. Fluorescence emissions are redirected through a dichroic mirror to imagers placed perpendicular to the excitation sources. Single-photon confocal microscopy can achieve a high volumetric imaging rate, since multiple sources in an entire section of the sample are illuminated simultaneously. However, single-photon microscopy has several limitations. The first is imaging depth, since short-wavelength single-photon excitation cannot image thick tissue samples. Furthermore, brain tissue can scatter the majority of the excitation light outside of the intended imaging plane, which is also an issue when collecting the emitted fluorescence signal. Photobleaching and phototoxicity are further concerns for long-term neural imaging with single-photon confocal microscopy, due to the high energy load from the excitation light on the imaged tissue sample [58].



Figure 2-2: Single-photon confocal microscopy. A light field is generated by passing the excitation light through an objective lens. An entire section of the sample is excited and emits fluorescence.

Laser-scanning multiphoton microscopy (MPM), pioneered by Denk *et al.*, significantly decreased the excitation intensity and enabled imaging with lower photobleaching than conventional single-photon microscopic approaches [59]. Since then, novel MPM methods using different illumination patterns have been proposed to improve the imaging rate, depth, and field-of-view (FoV) in complex animal structures.

Single-beam MPMs using a single laser source were the earliest proposed MPM methods for long-term *in vivo* neural imaging [43], [59], [60], [61]. However, early point-scanning methods have yet to achieve large-volume 3D imaging in the range of hundreds of frames per second (fps). As new fast GINA can approach sub-ms reset times, advancing neural imaging systems need to reach kHz imaging rates [38]. Random-access scanning (Figure 2-3a) was proposed, in which the focal beam spot is moved arbitrarily within the imaging volume using an acousto-optic deflector (AOD) [62], [63]. An AOD can rapidly change the direction of the laser beam using acoustic waves of different frequencies. By aligning two orthogonal AOD pairs to propagate sound waves in opposite directions, 3D random-access single-beam MPM imaging at >1,000 fps is achieved by stacking 2D images acquired from multiple depths (Figure 2-4a) [44], [64]. However, the imaging rate of AOD-based single-beam MPMs degrades significantly as the number of recorded neurons increases. Imaging at >1,000 fps in both 2D and 3D space is only achievable when monitoring fewer than 100 neurons simultaneously *in vivo*.

Figure 2-3: Illustration of different multiphoton microscopy techniques. **a.** Single-beam random-access multiphoton microscopy (MPM). Using acousto-optic deflectors, the focal spot of the laser beam traverses the imaging volume randomly. **b.** Multi-region MPM. Multiple regions of the sample are imaged simultaneously with two sets of microscope devices. **c.** Multi-beam MPM using spatiotemporal multiplexing. The source laser is divided into multiple beams, each with a slight delay from the others. The focal spots of the lasers are spatially separated in the imaging volume, providing a large field of view (FoV). **d.** Scanned line angular projection (SLAP) microscopy, a tomography-like microscopy imaging technique. Lasers are projected as multiple scan lines and scan the sample sequentially. In this approach, the imaging rate is independent of the number of recorded neurons.

Imaging over a large FoV in the millimeter range is another target for MPM imaging methods developed in the last 15 years. State-of-the-art single-beam MPM using galvo scanners to traverse the sample with remote focusing supports more than  $4 \times 4\ \mathrm{mm}^2$  FoV at 1.9 fps [65]. To image large volumes at >100 fps, the number of illumination patterns must be increased. Multi-beam MPM (Figure 2-3b, c) was proposed to achieve this goal by providing multiple excitation beams [58], [66], [67]. Spatiotemporal multiplexing was proposed to clearly distinguish each fluorescence event with respect to its source and to reduce the optical crosstalk between the excitation beams [68]. In spatiotemporal multiplexing, multiple laser beams are generated with a delay from each other longer than the decay time of the fluorophore. Typical beam generation approaches divide a single source laser beam into several beams [66]. Then, the image planes acquired from each beam are "stitched" together to form the FoV (Figure 2-4b). Multi-beam MPM has also demonstrated simultaneous dual-region recording to study the dynamics of neuron activities between multiple brain regions [58], [67], [69]. Current state-of-the-art multi-beam MPM can image up to 5000 neurons over a  $> 9\ \mathrm{mm}^2$  FoV simultaneously [67]. Similar to single-beam MPM, the imaging rate of multi-beam MPM is also strongly dependent on the number of recorded neurons.
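
The delay requirement in spatiotemporal multiplexing bounds how many beams can share one laser source. The sketch below assumes, beyond what the text states, that all beam delays must fit within a single laser repetition period so that pulses from successive periods do not overlap. The repetition rate and minimum delay are hypothetical example values.

```python
import math


def max_multiplexed_beams(rep_rate_hz: float, min_delay_s: float) -> int:
    """Largest number of beams whose mutual delays fit in one laser repetition period."""
    period_s = 1.0 / rep_rate_hz
    return math.floor(period_s / min_delay_s)


REP_RATE_HZ = 80e6    # hypothetical laser repetition rate
MIN_DELAY_S = 3e-9    # hypothetical delay, chosen longer than the fluorophore decay time

n_beams = max_multiplexed_beams(REP_RATE_HZ, MIN_DELAY_S)
print(f"Up to {n_beams} beams fit in a {1e9 / REP_RATE_HZ:.1f} ns period")
```

Under this assumption, fluorophores with longer decay times or lasers with higher repetition rates leave room for fewer multiplexed beams.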



Figure 2-4: Illustration of common image reconstruction methods used in microscopic brain and neural imaging. (a) A 3D image of the imaging volume is reconstructed by stacking multiple 2D images acquired from multiple depths. (b) Large FoV imaging achieved by overlapping multiple images from different regions. This method is used in. (c) The imaging volume is reconstructed by stacking images of multiple oblique sections.

Scanned line angular projection (SLAP) microscopy is an MPM technique adapted from tomographic scanning techniques (Figure 2-3d). SLAP uses multiple scan lines to cover a 250  $\mu m \times 250 \ \mu m$  FoV at multiple angles, with a customizable random-access scanning pattern to avoid scanning regions of no interest [42]. The use of line scans enables SLAP to cover a wider imaging area simultaneously than conventional point-by-point scanning methods. Since the scanning speed of SLAP is only proportional to the diameter of the FoV, it can achieve an imaging rate of more than 1 kfps when measuring 100 sources with only four scan lines [42]. However, SLAP systems require more line angles to maintain the imaging performance when scanning high-density samples, which limits the imaging flexibility of SLAP across the entire brain region [42].

#### 2.2.2. Light Sheet Fluorescence Microscopy

Large volumetric imaging with single-beam or multi-beam MPM is very limited, since 3D images are acquired by stacking multiple 2D images from multiple depths. Light sheet fluorescence microscopy (LSFM) is one approach that minimizes this drawback of conventional MPM. LSFM generates a thin sheet of light by passing an excitation laser beam through a cylindrical lens (Figure 2-5a) [70], [71], [72] or by using a rapidly scanned, digitally controlled micrometer-scale laser beam (Figure 2-5b) [73]. The planar illumination pattern allows LSFM to optically section the sample precisely, thus minimizing background noise [71], [74].



Figure 2-5: Illustration of different light-sheet microscopy techniques. **a.** Light sheet fluorescence microscopy. A laser beam is projected as a light sheet using a cylindrical lens to selectively illuminate a single plane inside the imaging volume. **b.** Laser-scanning light sheet fluorescence microscopy. Light sheet is formed by using digitally controlled micrometer scale laser beams to illuminate the imaging plane. **c.** Swept confocally aligned planar excitation (SCAPE) microscopy. Light sheet is projected obliquely into the imaging volume at different angles using a rotating scanning mirror.

Unlike conventional MPM system setups, where the imagers and excitation sources can be placed on the same axis, imagers are often placed perpendicular to the excitation sources in LSFM. This limits the application of conventional LSFM to imaging small animal models, such as larval zebrafish [75], [76], and *ex vivo* specimens, such as optically cleared mammalian brains [77], [78]. A special type of LSFM, swept confocally aligned planar excitation (SCAPE) microscopy, shown in Figure 2-5c, was proposed with both the excitation sources and imagers placed on the same axis. In SCAPE microscopy, the light sheet is projected obliquely onto the sample and moves laterally across the sample using a rotating scanning mirror [79]. Fluorescence emissions are reflected onto a stationary imaging plane through the same objective lens [79]. SCAPE microscopy can capture the entire imaging volume by rapidly scanning multiple oblique sections of the sample, where each frame corresponds to a specific oblique section of the sample (Figure 2-4c) [79], [80]. Using a high-speed CMOS camera, the latest SCAPE microscope can reach more than 300 volume elements per second (vps) in voxel-by-voxel imaging with <1 mm spatial resolution in real time, effectively achieving 4D neural imaging [45].

#### 2.2.3. Comparison and Discussion

Table 2-1 lists the key performance characteristics of selected functional imaging techniques published in the last fifteen years. To evaluate the listed techniques, we propose a figure-of-merit (FoM) for imaging quality ($FoM_{IQ}$), given in Equation 2-1 below.

$$FoM_{IQ} = \frac{FoV \ (mm) \times Depth \ (mm)}{Spatial \ Resolution \ (\mu m)} \times Max. Img. Rate \ (kfps \ or \ vps)$$
(2-1)

In the proposed $FoM_{IQ}$, FoV, depth, spatial resolution, and maximum imaging rate are considered. In this work, performance measured in specially prepared structures, such as agarose and cleared organ slices, is excluded. Data measured in rodents are primarily considered for the purpose of consistency. Another caveat is that most microscopic neural imaging techniques reconstruct 3D images by stacking multiple 2D images at different depths (Figure 2-10c), instead of performing true voxel-by-voxel 3D imaging.

| Reference, year | Technique | Max. - Min.<br>Imaging rate<br>(fps or vps) | # of recorded<br>neurons <sup>b</sup> | Imaging<br>dimension | Max FoV (mm) | Spatial<br>resolution (µm) | Depth (mm) | Detection unit | FoM<sub>IQ</sub> |
|----------------------------------------|--------------------------------------|------------------------------------------|---------------------------------------|----------------------|----------------------------------------------------------------------------------------------------------|-------------------------------------------------|------------------|----------------|-------------------|
| Prevedel *et al.*, 2014 [56] | Single-photon Light Field | 50 - 5 | 74 | 3D <sup>e</sup> | 0.7 × 0.7 × 0.2 <sup>f</sup> | 1.4 × 2.6 <sup>§ f</sup> | 0.2 <sup>f</sup> | CMOS camera | 0.0013 |
| Grewe *et al.*, 2010 [62] | Single-beam MPM Random Access | 490 - 180 | 34 - 91 | 2D | 0.305 × 0.305 | 1.0 × 4.1 <sup>§</sup> | 0.3 | PMT | 0.0033 |
| Cotton *et al.*, 2013 [64] | Single-beam MPM Random Access | 1800 - 121 | 27 - 411 | 3D <sup>e</sup> | 0.2 × 0.1 | 0.5 × 3 <sup>§</sup> | 0.3 | PMT | 0.01 |
| Sofroniew *et al.*, 2016 [65] | Single-beam MPM Random Access | 21 - 0.7 | 3179 @ 1.9 fps | 3D <sup>e</sup> | 4.4 × 4.2 @ 1.9 fps | 0.66 × 4.09 \* | 0.6 | PMT | 0.09 |
| Cheng *et al.*, 2011 [66] | Multi-beam MPM | 250 | 100 - 200 | 3D <sup>e</sup> | 0.4 × 0.4 | 0.5 × 1.5 \* <sup>§</sup> | 0.18 | APD | 0.0096 |
| Stirman *et al.*, 2016 [67] | Multi-beam MPM | 30 - 0.1 | 5361 - \ | 2D | 3.5 × 2.74 \* | 1.2 × 12.1 <sup>§</sup> | 0.71 | PMT | 0.0141 |
| Kazemipour *et al.*, 2019 [42] | SLAP Microscopy<br>(Tomographic MPM) | 1016 | 100 - 500 <sup>c</sup> | 3D <sup>e</sup> | 0.25 × 0.25 × 0.25 | 0.43 × 1.62 <sup>§</sup> | 0.3 | SiPM | 0.0068 |
| Holekamp *et al.*, 2008 [72] | LSFM | 200 | 88 | 3D <sup>e</sup> | 0.43 × 0.5 | 2 × 5 <sup>§</sup> | 0.15 | CCD camera | 0.0006 |
| Ahrens *et al.*, 2013 [5] | LSFM | 0.8 | 869 | 3D | 0.8 × 0.6 × 0.2 | 0.65 × 5 <sup>§</sup> | 0.2 | CMOS camera | 0.0047 |
| Bouchard *et al.*, 2015 [79] | SCAPE Microscopy | 20 - 10 <sup>a</sup> | \ | 3D | 0.6 × 1.0 × 0.55 <sup>g</sup> | 2.5 × 3.25 × 3.6 | 0.3 | CMOS camera | 0.0677 |
| Voleti *et al.*, 2019 [45] | SCAPE Microscopy | 321 - 5.96 <sup>a</sup> | 113 <sup>d</sup> | 4D | 0.39 × 0.30 × 0.041 <sup>d</sup><br>8.5 × 9.5 × 0.46 <sup>h</sup> | 0.75 × 0.24 × 0.19 <sup>i</sup> | 1.8 <sup>i</sup> | CMOS camera | 0.081 |

Table 2-1: Selected Publications for Different Microscopic Neural Imaging Techniques

\* Estimated from figures and data. § Spatial resolution measured in lateral × axial.

a. Imaging rate measured in volumes per second (*VPS*). Imaging rate (*VPS*) = Imaging rate (*fps*)/# of scan angles [45], [79]. b. The number of recorded neurons depends on the imaging rate, except for SLAP microscopy. c. Number of line angles is increased from 4 (100 neurons) to 8 (500 neurons) [42]. d. Measurement taken in the head of a living, immobilized *C. elegans* worm [45]. e. 3D image is reconstructed by stacking 2D images acquired from multiple depths. f. Measured in *C. elegans* and larval zebrafish brain [56]. g. Measured in non-scattering agarose [79]. h. Measured in cleared mouse brain slice [45]. i. Measured in mouse spinal cord slice [45].
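
As an illustration of how Equation 2-1 can be evaluated, the short Python sketch below reproduces the $FoM_{IQ}$ values listed for Prevedel *et al.* [56] and Grewe *et al.* [62] in Table 2-1. The function name `fom_iq` and its argument names are hypothetical. The sketch assumes that the lateral FoV dimensions and the lateral × axial resolution values are multiplied together and that the frame rate is expressed in kfps; this interpretation is inferred from the tabulated values for these two fps-based entries and has not been checked against the vps-based SCAPE entries.

```python
from math import prod


def fom_iq(fov_mm, depth_mm, resolution_um, max_rate_fps):
    """Sketch of Equation 2-1 under the assumptions stated above.

    fov_mm        -- lateral FoV dimensions in mm, e.g. (0.7, 0.7)
    depth_mm      -- imaging depth in mm
    resolution_um -- spatial resolution components in micrometers
    max_rate_fps  -- maximum imaging rate in fps (converted to kfps below)
    """
    rate_kfps = max_rate_fps / 1000.0
    return prod(fov_mm) * depth_mm / prod(resolution_um) * rate_kfps


# Values taken from the corresponding rows of Table 2-1.
prevedel_2014 = fom_iq((0.7, 0.7), 0.2, (1.4, 2.6), 50)
grewe_2010 = fom_iq((0.305, 0.305), 0.3, (1.0, 4.1), 490)

print(f"Prevedel et al. [56]: {prevedel_2014:.4f}")  # 0.0013
print(f"Grewe et al. [62]:    {grewe_2010:.4f}")     # 0.0033
```

Both results agree with the tabulated $FoM_{IQ}$ to the number of digits reported.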

The $FoM_{IQ}$ values listed in Table 2-1 show that each technology is evolving in four major directions: kilohertz imaging rate; millimeter FoV in all axes; imaging depth close to 1 millimeter; and sub-micrometer resolution. Several designs [42], [64], [65] met at least one of the above targets and achieved a $FoM_{IQ}$ greater than 0.01. The use of high-speed CMOS cameras rather than PMTs as the detection unit allows novel imaging systems to reach higher frame-rate ceilings. It is also worth noting that voxel-by-voxel imaging techniques such as SCAPE microscopy [45], [79] should be compared with each other, since their imaging rate is measured in vps and their spatial resolution is measured in the *x*, *y*, and *z* directions instead of along the lateral and axial axes.

An analysis of Table 2-1 and the $FoM_{IQ}$ values shows that large-FoV imaging at a depth of more than 2 mm is yet to be achieved. Some of the best reported imaging qualities are also measured in transparent animal structures. The actual performance of several of the listed imaging techniques may degrade when imaging neural circuits in the brains of living animals. Due to the highly scattering and absorptive nature of brain tissue, both the excitation light delivered to the animal brain and the fluorescence emission detected at the external imaging plane are severely attenuated. This ultimately limits the ability of microscopic neural imaging techniques to record neural activity in deep brain regions [68], [81], [82], [83]. Since brain tissue is extremely thermally sensitive, it is also impractical to further scale up the laser power used in microscopic imaging systems. These systems are also bulky and often require head fixation or anesthesia for imaging neural activity in live animals.

## 2.3. Implantable Neural Imager

With the rapid advancement of fabrication technologies for flexible materials and miniaturized CMOS photodetectors in the last decade, imaging neural activity with implantable photonic devices has become possible, helping to unlock complex behaviors in freely moving mammals. Using the aforementioned Michigan-style probe and Utah-style MEA as the base platform, a dense array of emitters and photodetectors can be integrated in a compact form factor for ultra-fast neural imaging and optogenetics. Excitation and collection can be brought deep into the targeted brain region with minimal tissue displacement. As the excitation sources are denser and more precise than those used in microscopy imaging, implantable photonic devices can offer high-frequency illumination patterns within the transient time of the fluorophore to produce high-SNR images [3]. Another approach is multimodal implantable neural devices that exploit both Ephys and photonics for precise control and recording of neural circuits.

Fully integrated implantable neural imagers with onboard light emitters and photodetectors that record neural activity in close proximity to the targeted brain region are highly desirable. Since the change in fluorescence emission intensity relative to the average intensity can easily be masked by background noise [36], implantable neural imagers require highly sensitive photodetectors. Single-photon avalanche diodes (SPADs), a particular type of photodiode capable of detecting even a single incident photon, have been widely used in brain function imaging techniques, such as positron emission tomography (PET), fluorescence lifetime imaging (FLIM), and diffuse optical tomography (DOT) [9], [85]. The high photosensitivity and built-in amplification of SPADs make them the most commonly used photodetectors for implantable neural imagers proposed in the last five years. SPADs can also be fabricated using widely available standard CMOS technologies and conveniently integrated with other required integrated circuit (IC) components to create a SPAD-based imaging system.

The angle-sensitive SPAD (A-SPAD) is a special SPAD that combines multiple layers of µm-scale metal diffractive gratings with the SPAD in a standard CMOS technology [17]. The A-SPAD is a key innovation that fuels the application of SPADs in implantable neural imagers. The structure of the A-SPAD is shown in Figure 2-12a. Two layers of metal gratings, the Talbot grating and the analyzer grating, are placed above the active area of the SPAD with a slight offset between the alignment of the two gratings. This is called the diffractive pattern [17]. The diffractive pattern allows only photons arriving at a matching angle to reach the active area of the SPAD and blocks the rest. In addition to the geometry of the two gratings, the angular information regarding the incident photon is also related to the rotation of the gratings and the wavelength of the incident photons [46]. By using an array of A-SPADs, each with a different grating geometry, lens-less 3D imaging can be achieved on-chip (Figure 2-12b) [17], [46], [48]. However, a major drawback of A-SPAD imagers is the low photon detection probability (PDP) due to photon blocking by the diffractive gratings. The best PDP achieved by state-of-the-art A-SPAD implantable neural imagers is only 12.4% [48].



Figure 2-6: Structure of the angle-sensitive SPAD (A-SPAD). Two grating layers, the Talbot grating and the analyzer grating, are placed on top of the SPAD. A diffractive pattern is formed depending on the grating pitch and the vertical gap between the two gratings. Incident light can only be detected when it approaches the A-SPAD at an angle matching the diffractive pattern.

False counts generated by unwanted excitation light outside of the fluorescence emission window are a major challenge for SPAD imagers used in implantable neural imagers. Instead of using dichroic mirrors and optical filters to separate the excitation and emission light spectrally, SPAD-based implantable neural imagers use time-gating single photon counting (TGSPC) [46], [47], [48]. The TGSPC method exploits the decaying fluorescence intensity of the fluorophore, as shown in Figure 2-7. Photons generated by fluorescence emission are counted in sequentially enabled detection gates after the initial excitation. TGSPC can improve the signal-to-background ratio (SBR) and reduce the DCR of the SPAD [47], [86]. When TGSPC is used with A-SPADs of different grating geometries, spectral filtering can also be achieved in implantable neural imagers without masking the probe with an optical filter [46]. However, SPAD imagers using TGSPC often require higher power consumption for the gating mechanism [86].



Figure 2-7: Time gating single photon counting (TGSPC). Several consecutive time gates are activated after the fluorescence excitation.
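
To illustrate the gating principle, the Python sketch below computes the expected fraction of emitted fluorescence photons that falls into each of several consecutive detection gates. It is a simplified model and not the implementation used in the cited imagers: it assumes a mono-exponential fluorescence decay that starts at t = 0, ideal rectangular gates, and no dark counts or residual excitation light. The function name `gate_fractions` and all numerical values (lifetime, gate delay, gate width) are hypothetical and chosen only for illustration.

```python
import math


def gate_fractions(tau_ns, first_gate_ns, gate_width_ns, n_gates):
    """Expected photon fraction per gate for a mono-exponential decay.

    For a decay exp(-t / tau) normalized to unit area, the fraction of
    photons arriving between t0 and t1 is exp(-t0/tau) - exp(-t1/tau).
    """
    fractions = []
    for k in range(n_gates):
        start = first_gate_ns + k * gate_width_ns  # gate opening time
        end = start + gate_width_ns                # gate closing time
        fractions.append(math.exp(-start / tau_ns) - math.exp(-end / tau_ns))
    return fractions


# Hypothetical example: 3 ns lifetime, first gate opens 1 ns after the
# excitation pulse, four consecutive 2 ns gates.
fractions = gate_fractions(tau_ns=3.0, first_gate_ns=1.0,
                           gate_width_ns=2.0, n_gates=4)
for index, fraction in enumerate(fractions):
    print(f"gate {index}: {fraction:.3f} of emitted photons")
```

In this model, each later gate collects fewer fluorescence photons than the previous one, which reflects the decaying intensity that TGSPC exploits. Delaying the first gate past the excitation pulse rejects excitation photons at the cost of discarding the earliest fluorescence photons.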

Implantable neural imagers have several advantages over conventional microscopic neural imaging techniques, since the shank of the neural imaging probe can easily penetrate the brain region, similar to Michigan-style Ephys neural probes. Minimally invasive neural imaging at a depth of more than one millimeter can be easily achieved using an implantable neural imaging probe. Multiple shanks filled with imager arrays can be fabricated on the implantable probe, thus allowing a large combined FoV to be recorded in the deep brain region [46], [48].

Co-integration of emitters and SPADs is a major advance in implantable neural imagers. Early designs required external lasers to provide excitation (Figure 2-8a), which limits the imaging depth of the probe to only 83% of the shank length [48]. A later approach featured two laser diodes facing each other to generate a double-cone-shaped illumination profile on top of the A-SPAD array for 3D volumetric imaging (Figure 2-8b) [87]. Another design [47], with a structure similar to an electrode array for electrocorticography (ECoG), integrated µLED arrays, imagers, and supporting circuits vertically on a flexible substrate. This design proposed an innovative packaging method in which the µLED arrays are placed on top of the optical filters for the imagers. Bi-directional neural interfacing is another highlight of this design since the two µLED arrays are responsible for both fluorescence excitation and optogenetics (Figure 2-8c). This design was further developed in [88] to employ global shuttering for high dynamic range imaging and to achieve higher frame rates at lower power consumption.



Figure 2-8: Common implantable neural imaging probe and excitation source setups. **a**. Implantable neural imager with an on-probe SPAD array that requires an external laser for excitation. **b**. Two laser diodes (LDs) are placed at the end and tip of the probe. **c**. SPAD array and micro light emitting diodes (µLEDs) are co-integrated on the probe.

**SPAD Array Performance**

| Reference, year | CMOS<br>Technology | Type | FF (%) | Max<br>PDP (%) | DCR<br>(Hz) | Imager Dimension /<br>Shank (mm) | Array Size /<br>Shank | Pixel Pitch (µm) | Power<br>(mW) | Integrated<br>Light Source | ADI | PI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Choi *et al.*, 2019 [48] | 130 nm | A-SPAD Probe | 6.3 | 12.4 <sup>a</sup> | 40 | 2 shanks<br>4.1 × 0.12 × 0.04 | 2 shanks<br>4 × 128 | 25.3 | 6.24 | No | 26.02 | 1.69 |
| Choi *et al.*, 2020 [87] | 130 nm | A-SPAD Probe | 6.3 | 0.6 \* <sup>a</sup> | 37.9 | 1.6 × 4.2 × 0.02 <sup>b</sup> | 8 × 64 | 25.3 (x)<br>51.2 (y) | 14.4 | Yes | 3.81 | 0.07 |
| Moazeni *et al.*, 2021 [47] | 130 nm | SPAD Surface | 5 | 12 | 26 | 8.0 × 8.0 × 0.25 | 160 × 160 | 30 | 40 | Yes | 1.6 | 0.91 |
| Taal *et al.*, 2022 [46] | 130 nm | A-SPAD Probe | 8 | - | 232 | 4 shanks<br>0.12 × 4 × 0.05 | 4 shanks<br>2 × 128 | 24.5 (x)<br>92 (y) | - | No | 10.67 | - |
| Pollmann *et al.*, 2022 [88] | 130 nm | SPAD Surface | 11.5 | 9 | 17 | 7.8 × 6.4 × 0.2 | 256 × 192 | 35 | 18 | Yes | 4.92 | 2.63 |

**Imaging Quality**

| Reference, year | FoV *x* × *y* × *z* (mm) | Best Spatial Resolution (µm) | Max. Imaging Distance (mm) | Max Frame rate (fps) | FoM<sub>IQ</sub><sup>\*</sup> |
|---|---|---|---|---|---|
| Choi *et al.*, 2019 [48] | 3.4 × 0.6 × 0.4 | 64 × 40 × 65 | 0.4 | 51 k <sup>f</sup> | 0.25 |
| Choi *et al.*, 2020 [87] | 1.6 × 0.4 × 0.4 \* | 40 × 35 × 73 | 0.4 <sup>e</sup> | 44.4 | 0.00011 |
| Moazeni *et al.*, 2021 [47] | 5.4 × 5.4 | 60 × 60 | 0.2 | 125 | 0.205 |
| Taal *et al.*, 2022 [46] | 0.8 × 4 × 0.15 | 52 × 52 × - <sup>c d</sup> | 0.4 | 10 | 0.0047 <sup>g</sup> |
| Pollmann *et al.*, 2022 [88] | 5.1 × 6.8 | 50 × 50 | - | 400 | - |

Table 2-2: Selected Publications for Different Implantable Neural Imagers

CMOS: Complementary metal-oxide semiconductor. SPAD: Single-photon avalanche diode. A-SPAD: Angle-sensitive single-photon avalanche diode. FF: Fill factor. PDP: Photon-detection probability. DCR: Dark count rate. FoV: Field of view. ADI: Array density index. PI: Photonic index. \* Estimated from figures and data.

**a**. Includes losses due to diffractive gratings. **b**. Substrate (20 µm) and laser diode (100 µm) thickness [87]. **c**. Lateral (*x* and *y*) resolution taken from the full width at half maximum (FWHM) of the point spread function (PSF) at 150 µm depth [46]. **d**. Axial (*z*) resolution is taken as the size of the rectangular grid since it is not reported [46]. **e**. Spatial resolution reported < 150 µm away from the imager. Spatial resolution varies at different depths [87]. **f**. Reported theoretical maximum frame rate. Experiments were conducted at 0.8 fps and 1 fps [48]. **g**. *FoM*<sub>*IQ*</sub><sup>\*</sup> of this imager is calculated as for a 2D imager (excluding FoV<sub>*Z*</sub>) since the axial resolution is not reported [46].

To analyze the progress of implantable neural imagers in the last five years, the SPAD performance and imaging quality of a selection of representative fluorescence imagers and implantable neural imagers are listed in Table 2-2 above. To compare the imaging quality of implantable neural imagers with that of conventional microscopic neural imaging techniques, a modified version of $FoM_{IQ}$, denoted $FoM_{IQ}^*$ (Equation 2-2), is used to account for the differences between implantable and microscopic neural imaging.

$$FoM_{IQ}^{*} = \frac{Imaging \ Volume \ (mm^{3})}{Spatial \ Resolution \ (\mu m)} \times \ Max. Frame \ Rate \ (fps)$$
(2-2)

$$Imaging \ Volume = FoV_x \times FoV_y \times (Max. \ Img. \ Dist. \ or \ Z \ Range)$$
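
As a worked check of Equation 2-2, the Python sketch below recomputes $FoM_{IQ}^*$ for the two A-SPAD probes of Choi *et al.* [48], [87] from the imaging-quality values tabulated above. The function name `fom_iq_star` and its argument names are hypothetical, and the sketch assumes that the spatial resolution term is the product of the reported *x*, *y*, and *z* resolutions, which is inferred from the tabulated results.

```python
from math import prod


def fom_iq_star(fov_xy_mm, z_range_mm, resolution_um, max_frame_rate_fps):
    """Sketch of Equation 2-2 under the assumptions stated above.

    fov_xy_mm          -- (FoV_x, FoV_y) in mm
    z_range_mm         -- maximum imaging distance or z range in mm
    resolution_um      -- spatial resolution components in micrometers
    max_frame_rate_fps -- maximum frame rate in fps
    """
    imaging_volume_mm3 = fov_xy_mm[0] * fov_xy_mm[1] * z_range_mm
    return imaging_volume_mm3 / prod(resolution_um) * max_frame_rate_fps


# Values taken from the imaging-quality rows for [48] and [87].
choi_48 = fom_iq_star((3.4, 0.6), 0.4, (64, 40, 65), 51_000)
choi_87 = fom_iq_star((1.6, 0.4), 0.4, (40, 35, 73), 44.4)

print(f"Choi et al. [48]: {choi_48:.2f}")  # 0.25
print(f"Choi et al. [87]: {choi_87:.5f}")  # 0.00011
```

The large gap between the two results comes almost entirely from the frame-rate term: the value for [48] uses the reported theoretical maximum of 51 kfps, whereas [87] reports 44.4 fps.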

Two implantable neural imagers [47], [48] achieved a $FoM_{IQ}^*$ more than double the best $FoM_{IQ}$ of the microscopic neural imaging systems. Both imagers achieved a frame rate higher than 100 fps and a large FoV. Comparing the parameters in Table 2-2 and Table 2-1, the most significant advantage of implantable neural imaging probes is imaging depth. These probes can reach a depth of more than 2 mm, which is yet to be achieved by microscopic neural imaging systems. Another advantage of implantable neural imagers is imaging speed, since most signal processing and readout circuits can be integrated on the probe. Large-FoV imaging (>25 mm<sup>2</sup>) can be achieved with implantable neural imagers, especially ECoG-style imagers [47], [88]. However, implantable neural imagers lag in spatial resolution: microscopic neural imaging systems can achieve a spatial resolution of less than 1 µm since they can use high-pixel-density imagers such as CMOS cameras. State-of-the-art SCAPE microscopy [45] has a spatial resolution more than ten times finer than that of all the implantable neural imagers.

Two FoMs, the array density index (ADI) in Equation 2-4 and the photonic index (PI) in Equation 2-5, are proposed for assessing the key parameters of the SPAD arrays used in different implantable neural imagers.

$$ADI = \left(\frac{Array Size / shank}{Imager Dimension / Area (mm^3)}\right) / 1000$$
(2-4)

$$PI = \frac{(100 \times Max.PDP(\%)) \times (100 \times FF(\%))}{DCR(Hz) + Power(mW)}$$
(2-5)
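
The two indices can be checked numerically. The Python sketch below recomputes the ADI and PI of the A-SPAD probe by Choi *et al.* [48] from its tabulated SPAD array parameters. The function and argument names are hypothetical. The sketch assumes that the array size and imager dimension entries are multiplied out per shank, and that the factors $100 \times Max.PDP(\%)$ and $100 \times FF(\%)$ in Equation 2-5 are simply the percentage values as listed; both assumptions are inferred from the tabulated results.

```python
from math import prod


def array_density_index(array_size, imager_dimension_mm):
    """Sketch of Equation 2-4: pixels per shank divided by imager
    volume per shank (mm^3), scaled by 1/1000."""
    return prod(array_size) / prod(imager_dimension_mm) / 1000.0


def photonic_index(max_pdp_percent, ff_percent, dcr_hz, power_mw):
    """Sketch of Equation 2-5 with PDP and FF given as percentage values."""
    return (max_pdp_percent * ff_percent) / (dcr_hz + power_mw)


# Values for Choi et al. [48]: 4 x 128 pixels per shank,
# 4.1 x 0.12 x 0.04 mm shank, PDP 12.4 %, FF 6.3 %, DCR 40 Hz, 6.24 mW.
adi_48 = array_density_index((4, 128), (4.1, 0.12, 0.04))
pi_48 = photonic_index(12.4, 6.3, 40, 6.24)

print(f"ADI: {adi_48:.2f}")  # 26.02
print(f"PI:  {pi_48:.2f}")   # 1.69
```

Note that the denominator of the PI adds a rate in Hz to a power in mW, so the index is a dimension-mixing score useful for ranking designs rather than a physical quantity.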

The highest ADI of 26 is achieved by Choi *et al.* [48]. This design also scored the second-highest PI. The ECoG-style imager from Pollmann *et al.* [88] gave the highest PI since it had the highest FF while keeping the DCR low. Higher array density, higher PDP, and higher FF, combined with low DCR and low power consumption, have been the major development directions for implantable neural imagers and will remain so in the future.

However, keeping a low DCR has a higher design priority than achieving a high PDP and FF. An ultra-low DCR not only improves the SNR of implantable neural imagers, but also helps to increase the maximum imaging distance from the SPAD pixel. The brain is highly thermally sensitive, and even a 3°C temperature increase can damage brain tissue [89], [90]. Power consumption has therefore also been reduced steadily over the last five years to limit the heat dissipated from implantable neural imagers to the surrounding brain tissue.

## 2.4. Challenges and Future Perspectives

In the previous sections, state-of-the-art Ephys neural probes, microscopic neural imaging techniques, and implantable neural imagers were studied and analyzed. With the goal of building a complete hybrid neural interfacing platform in the near future, we summarize below several challenges that these neural interfacing systems must overcome.

#### 2.4.1. Signal Characteristics

With respect to genetically encoded indicators of neural activity, the intensity of the fluorescence signal is important: it must be high enough to be distinguished from noise. The SNR and the penetrative ability of dyes in tissue for visualization are key areas for improvement. Over the time course of light exposure, excitation of photons may not be linear, and photostability is an important consideration, especially for long-term applications [36]. Limitations regarding the dynamic range should also be considered. A single indicator may not be capable of covering the full biological process [36], but improving the sensitivity of the indicator or using multiple indicators in a systematic manner can overcome this.

#### 2.4.2. Brain Tissue Characteristics

A challenge with optogenetic actuators is the miniaturization of probes while ensuring the depth is sufficient to reach deep brain areas for stimulation. Although most optrode shafts are about 1 mm long, at least 4 mm has been recommended [91].

Another challenge in the context of recording electrical activity is that the biological tissue modified for optogenetics is larger than what electrical recording technologies are typically capable of capturing [41], making it difficult to determine more precise, neuron-level activity. Such limitations are being overcome in more recent works [41], but the solutions are not yet applicable to diverse contexts or brain regions. Other important considerations include the precision and stability of light delivery, especially with regard to wave-guiding structures, as well as temporal and spatial resolution. In order to encompass a variety of brain regions or to stimulate multiple layers of the brain, spatial resolution at the micrometer scale is required, with a sufficient pulse frequency for optimal temporal resolution and accuracy [92].

#### 2.4.3. SPAD Characteristics

Improvement in SPAD performance can significantly benefit implantable neural imager design. As discussed in the previous section, low power consumption, ultra-low DCR, high PDP, high FF, and fast frame rate are the key goals for SPAD imagers used in implantable neural imagers. This requires better SPAD models to provide more accurate parameters of the SPAD pixel. A realistic SPAD model can help designers balance the trade-off between the operating voltage and the DCR and PDP of the SPAD pixel [18]. Novel SPAD structures with improved QR mechanisms can also improve the frame rate of implantable neural imagers [93], [94]. Other structural innovations can also be used, such as a multi-time-gated SPAD array structure [9] capable of applying different TG windows across the entire SPAD array. By employing such a multi-TG SPAD array on-probe, implantable neural imagers will be able to simultaneously track animals expressing fluorophores in multiple colors and better mitigate the photobleaching effect during long-term recording.

#### 2.4.4. System Integration and Requirement

On the system integration level, future implantable neural imagers should integrate a much higher density of emitters and detectors for large-scale neural interfacing. Despite the high detector array density achieved by several implantable neural imagers proposed in the last five years [46], [48], emitter density still remains low. The most advanced fully integrated implantable neural imager only had an emitter density of 0.96 emitters/mm<sup>2</sup> for bi-directional optical neural interfacing [88]. A high emitter density can offer more illumination patterns with a wide range of spatial and temporal separability to image more neurons simultaneously [95].

#### 2.4.5. Wireless Communication

Wireless communication components would better exploit the compact form factor of implantable neural imagers and further accelerate the study of neural activity in mobile subjects. In order to achieve real-time imaging with implantable neural probes, a wireless neural imager would require novel system designs for transmitting high-resolution images from the probe while maintaining low power consumption. These improvements could significantly increase device density on the implantable device.

#### 2.4.6. Cooling

Novel cooling mechanisms should be developed to reduce thermal dissipation from the device. Some probe designs [96], [97] feature flexible microfluidic channels for *in vivo* fluid delivery. Such channels could also be implemented beneath the substrate of the neural probe as cooling systems to reduce surface heating from high-power components such as SPAD arrays. These channels can also be used for drug delivery to reduce inflammation around the insertion area.

#### 2.4.7. Long-Term Recording

Although implantable neural imagers are well suited for neural imaging in freely moving animals, long-term imaging experiments in mammals are still lacking for the implantable neural imagers proposed in the last five years. Because they use Michigan-style platforms, implantable neural imaging probes also inherit several mechanical drawbacks, such as shank breakage and complex surgical procedures. Utah-style and Utah-Michigan hybrid-style platforms [98] are yet to be implemented in implantable neural imagers.

### 2.4.8. Fabrication Requirements

Similar to the development trend for Ephys probes/arrays, biocompatible flexible materials have started to be adopted in the packaging of implantable photonic devices in recent years [47], [88]. A recently proposed fabrication protocol [99] for neural probes using 3D printing technology demonstrated promising results for future mass production of optogenetic probes. Recent designs also proposed flexible, water-resistant organic LEDs [100] that could potentially be used in future ultra-flexible optogenetic probes and implantable neural imagers. However, SPAD arrays fabricated in standard CMOS processes would require innovative post-processing solutions to be packaged into ultra-thin and flexible probe designs.

#### 2.4.9. Microscopic and Deep Brain Imaging

Microscopic neural imaging systems, Ephys probes/arrays, and implantable neural imagers can work in tandem to interface with neural systems in the future. Since microscopic neural imaging systems can implement higher resolution and high pixel count imagers such as CMOS imagers, they are most suited to provide large FoV, high-resolution images across the brain at a limited depth to identify the function and the neural activity dynamics between brain regions. Deep brain imaging in freely moving subjects with implantable neural imagers can further complement microscopic systems to image the brain with fine details.

#### 2.4.10. Data Processing

Customized data processing algorithms incorporating deep learning (DL) and machine learning (ML) can reduce feature extraction time for both Ephys recording and neural imaging applications. Several automatic DL-based spike sorting algorithms were developed and tested in real time on multichannel MEAs and probes with more than 4000 electrodes [101], [102], [103]. Implementation of these spike sorting algorithms significantly reduces the amount of manual spike labeling in laboratory environments. DL algorithms were also implemented in biomedical imaging applications, such as magnetic resonance imaging (MRI), computed tomography and microscopic imaging, to accelerate human brain characterization [104], [105]. As the new generation of SPAD-based implantable neural imagers creates the possibility of in-situ deep brain 3D imaging, novel DL-based image reconstruction algorithms need to be developed to help implantable neural imagers achieve imaging rates over 1000 fps.

#### 2.4.11. Implantable Device and Applications

The surgical procedures for implantable Ephys probes/arrays can also be shared with implantable neural imagers, which reduces the complexity of future clinical deployment. Implantable photonic devices can provide fast in-situ imaging of the targeted brain region in both pre-surgical analysis and patient monitoring during surgical operations. Neural probes can also be used for chronic conditions such as neurological diseases, for investigating brain and cognitive diseases, and for brain mapping [106]. Since Ephys recording uses less power-hungry components than implantable neural imagers, Ephys probes/arrays will be a better candidate for long-term, safe recording in patients.

## 2.5. Conclusion

Implantable photonic devices have the potential to change the way neuroscience is studied and how neurological conditions are managed. This work provided a detailed review of existing technologies, the evolution of implantable photonic devices, a critical and comparative assessment of implantable photonic devices and a discussion on the outlook of technological advancements in this field. This work focused on the existing electrophysiology, emerging microscopic neural imaging and optogenetic technologies (including GINA, optogenetics, microscopic neural imaging and light sheet fluorescence microscopy). In addition, the interfacing of neural activities with implantable photonic devices (including miniscope and microendoscopy, implantable waveguide and micro-LED, and implantable neural imagers) was explored.

This review also outlined challenges pertaining to implantable neural imagers. One major challenge is to integrate high-density emitters and dense photodetector arrays while keeping the patient safe from heat dissipation and mechanical damage to the brain tissue. It would be even more difficult to maintain the mechanical and data integrity of the device for chronic implantation. Future applications such as remote patient monitoring will impose new requirements on implantable photonic devices, including ultra-low power consumption and wireless capabilities.

Although there are many challenges to overcome, novel ideas have been proposed in all areas of implantable photonic devices that can allow for meaningful advances to be made. From the continuously improving device modeling to components fabricated with polymer materials, implantable photonic devices could feature more than 10,000 photodetectors and integrated light emitting diodes in a flexible package to produce real-time, high-quality images while maintaining biocompatibility and low power consumption.

By combining new fast neural activity reporters and optogenetic actuators with implantable photonic devices, microscopic imaging techniques and electrophysiology recording, multi-modal neural interfacing systems can be developed. With new safe implantation procedures and diagnostic paradigms, implantable photonic devices can become a powerful toolbox for deciphering the hidden secrets of our brain. Additionally, fully integrated implantable photonic devices can also exploit the growing global semiconductor infrastructures, allowing more patients and researchers to afford such innovative healthcare solutions.

# Chapter 3 Fundamentals of CMOS Time-to-Digital Converters

# **3.1. Introduction**

The pulse-like operation of SPADs enables the detection of weak incident light with picosecond temporal resolution. SPADs can be coupled with time-to-digital converters (TDCs) to form SPAD imagers capable of generating time-of-flight (ToF) signals. SPAD imagers can exploit the potential of SPADs in various biomedical imaging applications such as PET, DoT and neural imaging.

A TDC is a high-performance mixed-signal circuit used to digitize the time difference between two input signals. With the advance of CMOS technology in the twenty-first century, TDC-integrated SPAD imagers have become increasingly popular [8]. Important TDC performance targets include high resolution, high linearity and wide dynamic range. To maximize the performance of SPAD imagers, an integrated TDC should also consume minimal power and silicon area. Many novel TDC architectures have been proposed in recent years to meet these design requirements. In the following sections, we introduce the fundamental operation of some common TDC architectures.

# **3.2.** Conventional TDC Structures

The most common TDC design uses tapped delay lines (DLs) formed by a series connection of delay cells (DEs) [9], [107], [108], [109], [110]. The basic operation is shown in Figure 3-1 below. The START signal is fed into the delay line, where replicas of the START signal, or phases, are generated. Each phase is delayed by a set amount, $T_d$, from the previous phase. Each phase of the DL is connected to a sampler array, which is commonly formed by DFFs or arbiters. The arrival of the STOP signal triggers the sampler array to store the phase information of the entire DL. The resulting output code represents the number of phases present within the time difference between the START and STOP signals. Finally, the output code from the sampler array is converted into the equivalent binary code by a thermometer-to-binary encoder.



Figure 3-1: Block diagram of a conventional single stage TDC.
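The sampling and encoding steps described above can be illustrated with a minimal behavioural sketch. It assumes ideal, identical delay cells and integer picosecond timestamps; the function name `delay_line_tdc` and the numeric values are hypothetical and chosen only for illustration.

```python
def delay_line_tdc(t_start_ps, t_stop_ps, t_d_ps, n_cells):
    """Behavioural model of a single tapped-delay-line TDC (all times in ps)."""
    # Phase i is the START edge delayed by (i + 1) unit delays.
    phase_edges = [t_start_ps + (i + 1) * t_d_ps for i in range(n_cells)]
    # When STOP arrives, each sampler latches 1 if its phase has already risen.
    thermometer = [1 if edge <= t_stop_ps else 0 for edge in phase_edges]
    # Thermometer-to-binary encoding reduces to counting the ones.
    return thermometer, sum(thermometer)


thermometer, code = delay_line_tdc(t_start_ps=0, t_stop_ps=95, t_d_ps=20, n_cells=8)
print(thermometer, code)  # [1, 1, 1, 1, 0, 0, 0, 0] 4
```

In this sketch a 95 ps interval with a 20 ps unit delay yields the code 4, and any interval longer than `n_cells * t_d_ps` saturates at `n_cells`, which illustrates how both the resolution and the dynamic range of a single delay line are tied to the unit delay and the cell count.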

In the design of time-to-digital converters (TDCs), particularly those based on DLs, the delay of each stage is affected by process, voltage, and temperature (PVT) variations. PVT variations significantly increase the nonlinearity of the TDC. To mitigate these effects, the delay line is often locked by a delay-locked loop (DLL), which synchronizes the total delay along the line with a reference clock period.

## 3.3. Two-Step TDC

The conventional single delay line-based TDC has several major drawbacks: $2^N$ DEs are needed for an N-bit TDC, and the resolution of the TDC is limited to the unit delay of each DE. Several architectures have been used to improve the PVT-variation resilience of DL-based TDCs while maintaining high resolution. Increasing the transistor size of the delay cell to increase the parasitic gate capacitance is the most straightforward approach in DL-based TDCs. However, this approach leads to a long unit delay, which is not favorable for a high-resolution TDC. The two-step TDC was introduced to resolve this problem by applying two different delay lines, each with a different unit delay [177], [178], [179].



Figure 3-2: Block diagram of a two-step TDC. Two coarse phases  $(P_C)$  and one fine phase  $(P_f)$  are present within the time interval between the START and STOP signals.

The operation of a two-step TDC is shown in Figure 3-2 above. The two-step TDC consists of coarse and fine delay lines. The unit delay of the coarse delay line ($T_c$) is much larger than the unit delay of the fine delay line ($T_f$). A coarse result is first generated, representing the number of coarse phases ($P_c$) present within the input time interval. A remainder generator then extracts the time difference between the last present coarse phase and the *STOP* signal as a remainder and routes it into the fine delay line, which generates the fine result by counting the number of fine phases ($P_f$).
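The coarse/remainder/fine sequence can be sketched as follows. This is an idealized model that assumes a lossless remainder generator and integer picosecond values; the function name and the unit delays (100 ps coarse, 25 ps fine) are hypothetical.

```python
def two_step_tdc(interval_ps, t_c_ps, t_f_ps):
    """Idealized two-step conversion: returns (P_c, P_f, estimated interval in ps)."""
    # Coarse stage: number of coarse phases inside the input interval.
    p_c = interval_ps // t_c_ps
    # Remainder between the last coarse phase and the STOP signal.
    remainder_ps = interval_ps - p_c * t_c_ps
    # Fine stage: number of fine phases inside the remainder.
    p_f = remainder_ps // t_f_ps
    return p_c, p_f, p_c * t_c_ps + p_f * t_f_ps


print(two_step_tdc(interval_ps=230, t_c_ps=100, t_f_ps=25))  # (2, 1, 225)
```

The example reproduces the case of two coarse phases and one fine phase: the quantization error (5 ps here) is bounded by the fine unit delay, while the dynamic range is set by the coarse delay line.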

Although it may seem beneficial to keep increasing the number of stages and turn the two-step TDC into a multi-step TDC, such scalability is limited. The remainder generation circuit cannot carry over the entire remainder duration, which can result in a low-precision output after several stages. In summary, the two-step TDC uses the coarse stage to achieve a large dynamic range and the fine stage to achieve high resolution [114], [115], [116], [117].

# **3.4. Vernier TDC**

The Vernier TDC was proposed to overcome the limitations caused by PVT-variations and achieve higher performance [118], [119], [120]. As shown in Figure 3-3 below, instead of interpolating the input time difference by counting the number of replicated START phases present when the STOP signal arrives, the Vernier TDC propagates both the START and STOP signals through independent DLs, where the unit delay of the STOP DL ($T_S$) is slightly shorter than the unit delay of the START DL ($T_L$). The difference between $T_L$ and $T_S$ is the resolution of the Vernier TDC. Due to the difference in unit delay, the STOP signal will eventually lead the START signal, indicating the end of the conversion. The number of lagging phases in the STOP DL is counted by a DFF sampler array to generate the result.



Figure 3-3: Block diagram of a Vernier delay line TDC
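A minimal behavioural sketch of the Vernier principle is given below. It assumes ideal delay cells, integer picosecond values, and that STOP arrives `interval_ps` after START; the function name and the delays (25 ps and 20 ps, giving a 5 ps resolution) are hypothetical.

```python
def vernier_tdc(interval_ps, t_l_ps, t_s_ps, n_stages):
    """Count the stages in which the STOP edge still lags the START edge."""
    for k in range(1, n_stages + 1):
        start_edge = k * t_l_ps               # START after k slow cells
        stop_edge = interval_ps + k * t_s_ps  # STOP after k fast cells
        if stop_edge < start_edge:
            # STOP has overtaken START: conversion ends at stage k.
            return k - 1
    # STOP never caught up: the interval exceeds the dynamic range.
    return n_stages


print(vernier_tdc(interval_ps=12, t_l_ps=25, t_s_ps=20, n_stages=16))  # 2
```

With a 5 ps delay difference, a 12 ps interval resolves to code 2. The sketch also shows the cost of the approach: the dynamic range is only `n_stages * (t_l_ps - t_s_ps)`, so a fine resolution requires many stages to cover a long interval.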

The Vernier TDC can achieve picosecond resolution by designing a minimal delay difference between the START and STOP unit delays [119]. Although the Vernier TDC can achieve higher resolution than a single-DL TDC, it suffers from a low DR and sampling rate within a limited silicon area, since the STOP signal requires sufficient time to lead the START signal before the conversion completes.

To overcome the shortcomings of conventional VDL TDCs, recently proposed Vernier TDCs have started to adopt the Vernier ring oscillator (VRO) or Vernier gated ring oscillator (V-GRO), as shown in Figure 3-4 below [121], [122], [123]. Both the START and STOP VDLs are converted into ring oscillators that are activated by the corresponding input signal, where the STOP RO has a higher frequency than the START RO due to its shorter unit delay. A loop counter counts the number of lagging STOP phases to produce the result. The loop counter allows the VRO TDC to achieve high dynamic range and resolution without endlessly scaling the number of DEs.



Figure 3-4: Block diagram of a Vernier ring oscillator TDC.

## **3.5. Pulse Shrinking TDC**

The pulse-shrinking (PS) TDC is another TDC topology proposed to achieve high resolution [124], [125], [126]. Instead of propagating the START and STOP signals through DLs, a pulse with a width equal to the input time interval between the START and STOP signals is generated by a pulse generator, as shown in Figure 3-5 below. Another difference between the conventional DL TDC and the PS TDC is that the DEs are designed with a mismatch between the rise and fall times. Therefore, the input pulse width gradually shrinks as it propagates through the PS DL. The pulse width eventually shrinks below the detection limit of the DFF sampler array, indicating the end of the conversion. Since the resolution is the rise and fall time difference of each DE, PS TDCs can achieve picosecond-range resolution using asymmetrical transistor sizing or current-starving methods.



Figure 3-5: Block diagram of a pulse-shrinking TDC.
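The shrinking process can be sketched with a simple behavioural model. It assumes that every delay cell removes the same amount from the pulse width and that the sampler has a fixed minimum detectable width; the function name and all numeric values are hypothetical.

```python
def pulse_shrinking_tdc(width_ps, shrink_ps, min_width_ps, n_cells):
    """Count the cells whose output pulse is still wide enough to be sampled."""
    count = 0
    for _ in range(n_cells):
        # Each cell shortens the pulse by its rise/fall time mismatch.
        width_ps -= shrink_ps
        if width_ps < min_width_ps:
            break  # pulse no longer detectable: conversion ends
        count += 1
    return count


print(pulse_shrinking_tdc(width_ps=50, shrink_ps=4, min_width_ps=10, n_cells=32))  # 10
```

In this idealized case the output code depends only on the shrink rate per cell, so any cell-to-cell variation in `shrink_ps` translates directly into a conversion error.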

The conventional PS TDC has several limitations. Since the shrink rate of the PS TDC is determined by the rise and fall time mismatch of each DE, it becomes increasingly difficult to control in a high-resolution PS TDC. As the pulse width narrows towards the minimum width that each DE can maintain consistently under PVT-variations and mismatches, the impact of nonuniform DE shrinkage leads to a premature end of the conversion.



Figure 3-6: Block diagram of pulse-shrinking Ring TDC.

Similar to the VRO TDC, the PS TDC can also use a ring structure to form a pulse-shrinking ring (PSR) TDC that overcomes the power and size limitations of a high DR, as shown in Figure 3-6. This is achieved by implementing a signal selector such as a multiplexer at the beginning of the PS DL [123], [127]. However, the PSR TDC still faces the nonuniform shrink rate issue due to circuit mismatches and PVT-variations. A dual PSR TDC structure was proposed in [124], [128] to achieve high linearity. The START and STOP PSRs are routed to interconnect with one another, and the input pulse width is carried by an additional 50% duty cycle pulse signal. The conversion ends when the duty cycle of the pulse falls below 50%, instead of when the pulse falls below the minimum detectable width. Since the duty cycle of the looping pulse is always kept above 50%, the impact of a nonuniform shrink rate is much lower than in a conventional single PSR TDC.

## **3.6. Time-Amplified TDC**

Since the unit delay in the fine delay line of a conventional two-step TDC must be small to generate a high-resolution conversion, the transistors of a fine DE are small and more sensitive to PVT-variations. The time-amplified (TA) TDC is one approach to overcome this limitation by amplifying the remainder from the first stage using a time amplifier (TA) [129], [130], [131], [132]. The amplified remainder is then further interpolated by the fine-stage TDC for a high-resolution output.

The main function of the TA is to amplify a small time interval into a larger one. Figure 3-7 below shows the operation of a TA in a two-step TDC, which amplifies the remainder from the coarse stage and feeds the result into the fine TDC as a *STOP* signal. The TA allows the unit delay of the fine delay line to be long, which generally provides better PVT-variation resistance than a DE with a short unit delay. The TA also allows both the coarse and fine stages to use the same delay line setup, with the number of fine bits determined by the amplification factor, which avoids extra calibration for a separate delay line.



Figure 3-7: Example of TA-TDC implemented with a 2 × time-amplifier.

However, conventional time amplifiers that use SR-latches suffer from uncontrolled amplification and require complex calibration [132]. To resolve this issue, a two-step TA-TDC that uses a pulse-train TA, fabricated in TSMC 65 nm technology, was introduced in [129], [130]. The pulse-train TA is able to achieve linear and programmable time amplification without a calibration circuit. Instead of an SR-latch, the pulse-train TA uses a buffer-based delay line to create multiple replicas of the remainder (Figure 3-8). The number of delay cells (*N-1*) is determined by the amplification factor of the TA. Each pulse replica is routed into a pseudo-OR gate to generate the amplified remainder. The advantage of this delay line setup is that the amplification factor is free from PVT-variations, since the unit delay is designed to be larger than the maximum remainder at all process corners. By enabling or disabling inputs of the pseudo-OR gate, the amplification factor of the pulse-train TA can be programmed. The pulse-train TA was first used in a 3.75 ps LSB, 0.476 ns dynamic range, 200 MHz TDC proposed in [129]. Compactness is another benefit of the pulse-train TA, which only occupies 0.0024 mm<sup>2</sup> in a standard 65 nm CMOS process [129].



Figure 3-8: Operation concept of the pulse-train amplifier
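The pulse-train concept can be sketched with a simple behavioural model. It assumes ideal buffers and interprets the amplified remainder as the total high time of the OR-ed pulse train; the function name and the numeric values are hypothetical and not taken from [129].

```python
def pulse_train_amplify(remainder_ps, gain, unit_delay_ps):
    """Return the replica pulses (start, end) and the total high time in ps."""
    if remainder_ps >= unit_delay_ps:
        # Replicas would overlap, so the OR output would no longer scale linearly.
        raise ValueError("unit delay must exceed the maximum remainder")
    # gain - 1 delay cells produce gain replicas, spaced by one unit delay each.
    pulses = [(k * unit_delay_ps, k * unit_delay_ps + remainder_ps)
              for k in range(gain)]
    # Non-overlapping replicas combined by an OR gate: high times simply add up.
    total_high_ps = sum(end - start for start, end in pulses)
    return pulses, total_high_ps


pulses, amplified_ps = pulse_train_amplify(remainder_ps=10, gain=4, unit_delay_ps=50)
print(len(pulses), amplified_ps)  # 4 40
```

Because the replicas never overlap as long as the unit delay exceeds the remainder, the gain in this model depends only on the number of enabled replicas and not on the exact value of the unit delay.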

While the pulse-train TA-TDC shows promising results in recent publications, it has several limitations. The first limitation is the unstable amplification of narrow input pulses without dedicated calibration circuits and post-processing methods. The second issue is scalability. As the amplification factor of the pulse-train TA scales up to achieve high resolution, the pulse-train delay line and pseudo-OR gate also scale up and consume a large share of the silicon area and power of the entire chip. Due to these limitations, the current state-of-the-art TA-TDC only has a maximum amplification factor of 8 [129].

## **3.7. Stochastic Phase Interpolation TDC**

Despite the numerous TDC architectures proposed in recent years in an attempt to limit the impact of PVT-variations on TDC performance, the true resolution, precision and nonlinearity ultimately depend on a consistent unit delay of the DL under all operating conditions. To counter this issue, approaches such as the Vernier ring TDC and current-mode logic (CML) TDC were proposed [133], [134].

A recently proposed approach to mitigate delay variations caused by PVT-variations and circuit mismatches without additional calibration circuits is the stochastic phase interpolation (SPI) TDC, first proposed in [21] and further analyzed in [135]. One major drawback of conventional TDCs is the nonlinearity caused by circuit nonidealities: mismatches can cause a nonuniform distribution of edges within $T_{CLK}$; in other words, the number of edges within one clock period is not equal to $2^{N}$. To address this issue, the SPI-TDC consists of $2^{N+K}$ delay cells. $2^{K}$ delay cells are used as hardware redundancy, and the K-bit interpolation results are eventually truncated to output the N-bit result from the TDC.



Figure 3-9: Illustration of a SPI-TDC. Arrows represent the positive edges of each phase.

As shown in Figure 3-9 above, one caveat of the SPI-TDC is that the unit delay ($T_d$) is still $\frac{T_{CLK}}{2^N}$ instead of $\frac{T_{CLK}}{2^{N+K}}$, where $T_{CLK}$ is the period of the input clock. The redundant $2^K$ delay cells are used to generate additional phases that align with the edge of the input clock, or "wrap around" each clock period [135]. Therefore, after all phases in the delay line are initialized, $2^{N+K}$ edges are simultaneously present within a single clock period. As a result, within an input time interval between the *START* and *STOP* signals, the SPI-TDC can have more edges distributed than a conventional TDC. The output of the SPI-TDC is the digital code representing the number of edges after truncating K bits.
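The wrap-around and truncation steps can be illustrated with a minimal behavioural sketch. It assumes that the interval starts on a clock edge, that each edge position is the accumulated cell delay taken modulo $T_{CLK}$, and that mismatch can be modelled as Gaussian variation of each unit delay; the function name, the seed handling and all numeric values are hypothetical and are not taken from [21] or [135].

```python
import random


def spi_tdc_code(interval_ps, n_bits, k_bits, t_clk_ps, sigma_ps=0.0, seed=0):
    """Count wrapped edges inside the interval, then truncate K bits."""
    rng = random.Random(seed)
    n_cells = 2 ** (n_bits + k_bits)
    t_d = t_clk_ps / 2 ** n_bits  # nominal unit delay is still T_CLK / 2^N
    edge_time = 0.0
    edges_in_interval = 0
    for _ in range(n_cells):
        # Accumulate one (possibly mismatched) unit delay per cell.
        edge_time += rng.gauss(t_d, sigma_ps) if sigma_ps > 0 else t_d
        # Edges beyond one clock period wrap around into the same period.
        wrapped = edge_time % t_clk_ps
        if 0.0 < wrapped <= interval_ps:
            edges_in_interval += 1
    # Dropping the K redundant bits returns an N-bit output code.
    return edges_in_interval >> k_bits


print(spi_tdc_code(interval_ps=350, n_bits=3, k_bits=2, t_clk_ps=800))  # 3
print(spi_tdc_code(interval_ps=350, n_bits=3, k_bits=2, t_clk_ps=800, sigma_ps=5.0))
```

With ideal delays the wrapped edges coincide with the original $2^N$ positions, so the truncated code equals that of a conventional delay line. With mismatch enabled, the wrapped edges spread across the clock period, which is the effect the redundancy relies on to average out delay errors.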

A detailed study in [135] separates the sources of error in the SPI-TDC into deterministic and stochastic errors. Deterministic errors originate from the mismatch between the actual and ideal unit delay, which often means a global shift of the DL output from the reference clock edge. To reach the best linearity performance, it was found that deterministic errors can be lowered by increasing the phase density, as they are then distributed evenly over more DEs [135]. This is achieved by extending the number of delay cells to an integer multiple of the original.

To model deterministic errors, the total propagation delay (T) of the delay line in an SPI-TDC is described by $T = nT_d$, where *n* is the number of delay cells needed to reach a delay of $T_{CLK}$. Under ideal conditions, the phase density, $nT_d/T_{CLK}$, should be 1. $\beta$ represents the fractional (modulo) part of $\frac{nT_d}{T_{CLK}}$, which should ideally be 0. However, deterministic errors in the delay line can cause the number of edges distributed within a single $T_{CLK}$ to deviate from the designed value, which causes the phase density to deviate from 1 and $\beta$ to become non-zero. The resulting maximum integral nonlinearity (INL) is described in Equation 3-1 below [135].

$$INL_{max} = \frac{2^{N}\beta(1-\beta)T_{CLK}}{nT_{d}} \tag{3-1}$$
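Equation 3-1 can be evaluated numerically with the short sketch below. It takes $\beta$ as the fractional part of the phase density $nT_d/T_{CLK}$, as defined in the text; the function name and the example values (a 3-bit TDC with $T_{CLK}$ = 800 ps and $T_d$ = 100 ps) are hypothetical.

```python
def inl_max_lsb(n_bits, n_cells, t_d_ps, t_clk_ps):
    """Evaluate Equation 3-1 for a given delay-line configuration."""
    density = n_cells * t_d_ps / t_clk_ps  # phase density n*T_d / T_CLK
    beta = density % 1.0                   # fractional part of the phase density
    return (2 ** n_bits) * beta * (1.0 - beta) / density


# 12 cells give a phase density of 1.5, so beta = 0.5 (the largest beta*(1-beta)).
print(inl_max_lsb(n_bits=3, n_cells=12, t_d_ps=100, t_clk_ps=800))
# 16 cells give a phase density of exactly 2, so beta = 0 and the term vanishes.
print(inl_max_lsb(n_bits=3, n_cells=16, t_d_ps=100, t_clk_ps=800))
```

The second call illustrates why the delay line is extended by an integer multiple: whenever the phase density is an integer, $\beta$ is zero and the deterministic contribution in Equation 3-1 disappears.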

The worst-case $INL_{max}$ is reached when $\beta$ is 0.5. The effective number of bits (*N'*) can therefore be found by solving for *N* in Equation 3-1. Assuming the SPI-TDC targets a 1-bit $INL_{max}$, *N'* is given by Equation 3-2 below.

$$N' = 2 + \log_2(n) + \log_2\left(\frac{nT_d}{T_{CLK}}\right) \tag{3-2}$$

Equation 3-2 can also be rewritten as Equation 3-3 below to find the number of DEs required for an SPI-TDC to reach N' with a maximum 1-bit $INL_{max}$ when deterministic errors are considered.

$$n = 2^{\left[N'-2-\log_2\left(\frac{nT_d}{T_{CLK}}\right)\right]} \tag{3-3}$$

The above shows that, to reduce the nonlinearity caused by deterministic errors, it is important to scale up the phase density by extending the number of DEs in the DL by an integer multiple.

The other type of error present in a TDC is stochastic error, which results from circuit nonidealities and mismatches. When stochastic errors are present, the unit delay varies from one DE to another, and the total delay can only be described as a summation of every unit delay, as shown in Equation 3-4 below.

$$\sum_{i=1}^{n} T_d[i] = T_{CLK} \tag{3-4}$$

The unit delay can also be considered as a mean ($\mu_{T_d}$) with variance due to jitter ($\sigma_j^2$) and mismatch ($\sigma_m^2$). The relationships between the targeted N' and the number of delay cells, considering jitter and mismatch respectively, are given by Equations 3-5 and 3-6 below [135]. Both show that, to mitigate stochastic errors, *n* must be quadrupled to improve N' of the SPI-TDC by 1 bit.

$$N' = \frac{1}{2}\log_2(n) + \log_2(\mu_{T_d}/\sigma_j^2) \tag{3-5}$$

$$N' = \frac{1}{2}\log_2(n) + \log_2(\mu_{T_d}/\sigma_m^2) + \frac{1}{2} \tag{3-6}$$
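The quadrupling rule follows from the $\frac{1}{2}\log_2(n)$ term and can be checked numerically. The sketch below evaluates Equations 3-5 and 3-6 as written; the function names and the values chosen for $\mu_{T_d}$ and the variances are hypothetical placeholders.

```python
import math


def n_eff_jitter(n_cells, mu_td, var_j):
    """Equation 3-5: effective bits when jitter limits the linearity."""
    return 0.5 * math.log2(n_cells) + math.log2(mu_td / var_j)


def n_eff_mismatch(n_cells, mu_td, var_m):
    """Equation 3-6: effective bits when mismatch limits the linearity."""
    return 0.5 * math.log2(n_cells) + math.log2(mu_td / var_m) + 0.5


# Quadrupling the number of delay cells adds 0.5 * log2(4) = 1 effective bit.
gain_jitter = n_eff_jitter(4 * 64, 100.0, 4.0) - n_eff_jitter(64, 100.0, 4.0)
gain_mismatch = n_eff_mismatch(4 * 64, 100.0, 4.0) - n_eff_mismatch(64, 100.0, 4.0)
print(gain_jitter, gain_mismatch)
```

The improvement is independent of the assumed mean and variance, since those terms cancel in the difference.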

It may seem beneficial for the SPI-TDC to continuously improve its performance by adding more K-bit redundant hardware. However, such scaling is impractical due to both power and chip area constraints. Moreover, revisiting Equations 3-2, 3-5, and 3-6 shows that increasing n reduces deterministic errors more effectively than stochastic errors. Eventually, stochastic errors will dominate deterministic errors as n scales up to a critical number n', defined in Equation 3-7 below [135]. This means that only 1 bit of redundancy (i.e., doubling the number of DEs) is needed to reduce the $INL_{max}$ of the TDC by 1 bit if n < n'. When n > n', the number of DEs needs to be quadrupled.

$$n' = \frac{1}{16} \frac{T_{CLK}}{\sigma_j^2 + \sigma_m^2/2} \tag{3-7}$$

The above result shows that linearity improvement becomes costly in chip area and power consumption when the number of delay cells exceeds n', the boundary between the deterministic-error-dominated and stochastic-error-dominated regions. It is critical to keep n at an integer multiple of the original number of delay cells to nullify the nonlinearity caused by deterministic errors, so that the SPI-TDC is only affected by stochastic errors [135]. To achieve this, recently proposed SPI-TDCs have started to implement a DLL to minimize deterministic errors.

## **3.8.** Comparison and Discussion

The recent advances in TDC topologies have made them powerful components in various applications that require precise timestamping of incoming signals. Most of the conventional TDC structures mentioned above use either single or dual DLs to create replica phases of a reference clock. High resolution is obtained either by adding a fine delay stage or by implementing multiple clocks with different frequencies for each DL of the TDC. The results are converted by counting the number of these replica phases within the input time interval.

The main source of nonlinearity and resolution degradation is the shift of the DE unit delay due to mismatch and PVT-variation. Therefore, the use of a DLL to lock the phases to the reference clock has become common practice. Low-power stochastic phase interpolation topologies were also proposed in recent years to limit the effect of PVT-variation on DL TDCs by increasing the phase density and evening out both deterministic and stochastic errors in the DL at a tolerable cost in silicon area.

RO-based TDCs such as the VRO-TDC and PSR-TDC are topological innovations for improving the performance of TDCs compared to their DL counterparts. In this method, a ring oscillator (RO) is created using differential buffer elements. A stable oscillation is generated by routing the last phase of the DL back to the input DE. Chip area and power efficiency are the main advantages of RO-TDCs compared to their DL counterparts. RO-TDCs have also shown the ability to reject common-mode noise.

The performance of representative TDCs using the aforementioned topologies is summarized in Table 3-1 below. To evaluate the listed techniques, we propose a figure-of-merit for benchmarking TDC performance ($FoM_{TDC}$) in Equation 3-8 below.

$$FoM_{TDC} = \frac{N_{Linear} \times F_s(GHz)}{LSB \ (ps) \times Power \ (mW) \times Area \ (mm^2)} \tag{3-8}$$

The proposed FoM rewards TDC designs with a fine LSB, a high $N_{Linear}$, and a fast sampling frequency ($F_S$) while consuming low power in a small footprint. The highest $FoM_{TDC}$ is achieved by the original SPI-TDC design, followed by the VRO-TDC and TA-TDC designs. The $FoM_{TDC}$ shows the importance of using active methods such as SPI and VRO to suppress nonlinearity from circuit nonideality and PVT-variations, while the TA-TDC shows the viability of approaches that focus on achieving high resolution. In the next chapter, we will discuss how the proposed TDC builds upon the proven methodologies developed for the above-mentioned TDC architectures.

| Year, Reference | Type        | Tech (nm) | LSB (ps) | DR (ns) | <sup>a</sup>INL (LSB) | *N<sub>Linear</sub> | Fs (MHz) | Power (mW) | Area (mm<sup>2</sup>) | FoM<sub>TDC</sub> |
|-----------------|-------------|-----------|----------|---------|-----------------------|---------------------|----------|------------|-----------------------|-------------------|
| 2019 [136]      | Two-step    | 180       | 5.3      | 0.7     | 2.8                   | 6.07                | 30       | 1.1        | 0.05                  | 0.62              |
| 2018 [137]      | VDL         | 45        | 1.25     | 0.319   | 0.34                  | 7.58                | 80       | 27.3       | 0.08                  | 0.22              |
| 2016 [138]      | VRO         | 65        | 15       | 3.44    | 0.83                  | 11.12               | 5        | 0.16       | 0.0013                | 17.82             |
| 2016 [124]      | PSR         | 180       | 1.8      | 0.92    | 8.7                   | 2.72                | 4.4      | 3.4        | 0.07                  | 0.03              |
| 2015 [21]       | SPI         | 14        | 1.17     | 2.39    | 2.3                   | 8.28                | 100      | 0.78       | 0.036                 | 25.20             |
| 2020 [129]      | SPI (DLL)   | 180       | 63       | 15.30   | 1.47                  | 6.7                 | 60       | 25         | 1.13                  | 0.0003            |
| 2013 [129]      | TA (GDL)    | 65        | 3.75     | 0.476   | 2.3                   | 5.28                | 200      | 3.6        | 0.02                  | 3.91              |
| 2019 [136]      | Two-step TA | 180       | 5.3      | 1.36    | 2.8                   | 6.07                | 30       | 1.1        | 0.05                  | 0.62              |

 Table 3-1: Summary of Performance of TDC from Selected Publications

<sup>a</sup>: maximum reported

**\*Effective linear bits:**  $N_{Linear} = Bits - log_2(INL + 1)$ 

VDL: Vernier delay line. VRO: Vernier ring oscillator. PSR: Pulse shrinking ring. SPI: Stochastic phase interpolation. DLL: Delay locked loop. TA: Time amplifier. INL: Integral nonlinearity. DR: Dynamic range. Fs: Sampling rate.
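The figure-of-merit in Equation 3-8 and the effective linear bits defined in the table footnote can be reproduced with a short script. The function names are hypothetical; the numeric inputs are taken from the VRO, SPI and TA (GDL) rows of Table 3-1, with $F_S$ converted from MHz to GHz. The 10-bit word length used in the last line is an assumption for illustration only, since the table does not list the number of bits.

```python
import math


def n_linear(bits, inl_lsb):
    """Effective linear bits: Bits - log2(INL + 1)."""
    return bits - math.log2(inl_lsb + 1)


def fom_tdc(n_lin, fs_mhz, lsb_ps, power_mw, area_mm2):
    """Equation 3-8, with the sampling rate converted from MHz to GHz."""
    return n_lin * (fs_mhz / 1000.0) / (lsb_ps * power_mw * area_mm2)


# (N_Linear, Fs in MHz, LSB in ps, power in mW, area in mm^2) from Table 3-1.
rows = {
    "VRO [138]": (11.12, 5, 15, 0.16, 0.0013),
    "SPI [21]": (8.28, 100, 1.17, 0.78, 0.036),
    "TA (GDL) [129]": (5.28, 200, 3.75, 3.6, 0.02),
}
for name, params in rows.items():
    print(f"{name}: FoM = {fom_tdc(*params):.2f}")

# Assumed 10-bit converter with 2.3 LSB of INL (illustrative values only).
print(f"N_Linear = {n_linear(10, 2.3):.2f}")
```

Evaluated this way, the three rows give approximately 17.82, 25.20 and 3.91, matching the $FoM_{TDC}$ column of the table.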

# Chapter 4 Time Amplified, Stochastic Phase Interpolation Time-to-Digital Converter

# 4.1. Introduction

The increasing demand for high-performance SPAD imagers in biomedical imaging applications is driving the development of novel time-to-digital converter (TDC) designs that produce more precise timing information. In fields such as brain neural imaging, these advanced TDCs could enable the simultaneous detection of multiple neural markers at different wavelengths, potentially revealing new insights into neural interactions within living organisms. This poses new challenges for novel TDCs, which must deliver high performance within a compact silicon footprint and at ultra-low power consumption.

The performance of the TDC is crucial for SPAD imagers. First, the LSB of the TDC directly translates to the translated spatial resolution (TSR) of the SPAD imager, as shown in Equation 4-1 below, where *C* is the speed of light (approximately 0.3 mm/ps). As state-of-the-art SPAD imagers approach millimeter-scale spatial resolution, achieving a picosecond-level LSB becomes a significant challenge for TDC design in these imagers [85]. High-performance TDCs used in SPAD imagers for biomedical imaging applications achieve an LSB of 33 ps, which corresponds to a translated spatial resolution of 10 mm [139].

$$\text{Translated Spatial Resolution (mm)} = LSB_{TDC}\,(ps) \times C \tag{4-1}$$

INL is the most common metric used to evaluate the linearity of a TDC. It measures the deviation of the TDC's quantization characteristic from the ideal one, expressed in LSB. High nonlinearity in a TDC can result in false timestamping of photon events, which degrades imaging quality. For example, a TDC with a 10 ps resolution and 1 LSB of INL could cause a degradation of about 3 mm in the final image. To minimize such degradation, the INL of a TDC used in a SPAD imager should be less than 1 LSB, ensuring it matches the desired spatial resolution.
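The conversion in Equation 4-1 can be checked numerically. The sketch below uses the vacuum speed of light expressed in mm/ps; the function and constant names are hypothetical.

```python
C_MM_PER_PS = 0.299792458  # speed of light in vacuum, in mm per picosecond


def translated_spatial_resolution_mm(lsb_ps):
    """Equation 4-1: distance travelled by light during one LSB."""
    return lsb_ps * C_MM_PER_PS


print(f"{translated_spatial_resolution_mm(33):.2f} mm")  # 33 ps LSB
print(f"{translated_spatial_resolution_mm(10):.2f} mm")  # 10 ps LSB
```

A 33 ps LSB corresponds to roughly 9.9 mm and a 10 ps LSB to roughly 3.0 mm, consistent with the figures quoted in the text.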

Silicon footprint is a critical design consideration when integrating TDCs into SPAD imagers. Photon detection probability (PDP) is a SPAD-specific parameter that describes the ratio of detected photons to incident photons in a SPAD imager. A TDC with a large silicon footprint reduces the space available for the active region of the SPAD pixels and, in turn, reduces the fill factor (FF), as shown in Equation 4-2. A low FF leads to a reduction in the photon detection efficiency (PDE) of the imager, as shown in Equation 4-3.

$$FF = \frac{Active \ Region \ (mm^2)}{Total \ Chip \ Area \ (mm^2)} \times 100\% \tag{4-2}$$

$$PDE = PDP \times FF \tag{4-3}$$
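The effect of TDC area on the PDE can be illustrated with Equations 4-2 and 4-3. All areas and the PDP value below are hypothetical and chosen only to show the trend; the function names are likewise illustrative.

```python
def fill_factor_percent(active_mm2, total_mm2):
    """Equation 4-2: fill factor in percent."""
    return active_mm2 / total_mm2 * 100.0


def pde(pdp, ff_percent):
    """Equation 4-3, with the fill factor converted from percent to a fraction."""
    return pdp * ff_percent / 100.0


active_mm2, pdp = 0.25, 0.40
for total_mm2 in (1.00, 1.25):  # the second case adds 0.25 mm^2 of TDC area
    ff = fill_factor_percent(active_mm2, total_mm2)
    print(f"total = {total_mm2:.2f} mm^2, FF = {ff:.1f} %, PDE = {pde(pdp, ff):.3f}")
```

In this example, adding 0.25 mm<sup>2</sup> of circuitry without enlarging the active region lowers the FF from 25% to 20% and the PDE from 0.10 to 0.08.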

The sampling rate ($F_S$) of a TDC is the inverse of the minimum time needed for the TDC to finish a conversion. A TDC with a high $F_S$ can be shared by multiple SPAD pixels to reduce the total number of TDCs required in a SPAD imager. TDC sharing reduces the total power consumption and silicon footprint, and improves the system PDE of the SPAD imager. The maximum number of SPADs that can share a single TDC is determined by both the $F_S$ of the TDC and the dead time of the SPAD pixel, which is the time required for the SPAD pixel to reset before it can detect a new photon event. If the $F_S$ of the TDC is fast enough to cover 1,000 SPADs during the reset time, it is possible to use only about 1,000 TDCs for a 1-megapixel SPAD imager. The current $F_S$ standard for TDCs in biomedical SPAD imagers is in the range of tens to hundreds of MHz [8].
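A first-order estimate of TDC sharing is sketched below. It assumes that one TDC can serve as many pixels as it can complete conversions within one SPAD dead time, and it ignores multiplexing overhead; this sharing model, the function names and the example values (100 MHz, 50 ns dead time, a 64 × 64 array) are all assumptions made for illustration.

```python
def spads_per_tdc(fs_mhz, dead_time_ns):
    """Conversions one TDC can finish within one SPAD dead time."""
    # fs [MHz] * dead time [ns] / 1000 = number of conversions (dimensionless).
    return (fs_mhz * dead_time_ns) // 1000


def tdcs_needed(n_pixels, pixels_per_tdc):
    """Number of shared TDCs required to cover the array (rounded up)."""
    return -(-n_pixels // pixels_per_tdc)


share = spads_per_tdc(fs_mhz=100, dead_time_ns=50)
print(share, tdcs_needed(64 * 64, share))  # 5 820
```

Under these assumptions a 100 MHz TDC paired with 50 ns dead-time pixels could serve five pixels, so a 4096-pixel array would need 820 TDCs instead of 4096.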

As the performance of SPAD pixels has evolved rapidly in recent years, the performance of TDCs must improve as well. To compete with state-of-the-art SPAD imagers, the target performance metrics for the proposed TDC are listed in Table 4-1. The proposed TDC aims to achieve a sub-10 mm TSR, which translates to an LSB of < 33 ps. The INL should be less than 1 LSB to minimize imaging quality degradation. The F<sub>s</sub> of the proposed TDC should be in the range of 100 MHz to leave enough tolerance for pixel multiplexing circuits in future integration with SPAD pixels.

The power consumption and total area of the proposed TDC must be minimized for use in neural photonic implant applications. The critical heat flux for neural implants should remain below 40 mW/cm<sup>2</sup> to prevent damage to brain tissue [140]. The current state-of-the-art SPAD-based implantable neural imagers consume 6.24 mW within a 0.48 mm<sup>2</sup> silicon footprint per shank [46], [48]. Assuming the proposed TDC is integrated with such implantable imagers, it should occupy less than 0.05 mm<sup>2</sup> and consume less than 6.24 mW of power to keep the total heat flux under 25 mW/mm<sup>2</sup>. This would leave sufficient power and area headroom for additional readout circuitry.
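One way to see where the 25 mW/mm² bound comes from is to lump the imager and the TDC budgets together. The sketch below is only a back-of-the-envelope check under that lumping assumption; it uses the per-shank imager figures and the TDC budgets quoted above and does not model heat spreading.

```python
# Figures quoted in the text (per shank) and the proposed TDC budgets.
IMAGER_POWER_MW = 6.24
IMAGER_AREA_MM2 = 0.48
TDC_POWER_BUDGET_MW = 6.24
TDC_AREA_BUDGET_MM2 = 0.05


def power_density(power_mw: float, area_mm2: float) -> float:
    """Lumped power density in mW/mm^2 (assumed uniform over the area)."""
    return power_mw / area_mm2


combined = power_density(
    IMAGER_POWER_MW + TDC_POWER_BUDGET_MW,
    IMAGER_AREA_MM2 + TDC_AREA_BUDGET_MM2,
)

if __name__ == "__main__":
    print(f"combined power density: {combined:.2f} mW/mm^2")
```

With both budgets fully used, the lumped density is about 23.5 mW/mm², which stays under the stated 25 mW/mm² bound.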

Biomedical ICs are generally required to function over a temperature range from -40°C to 80°C and under tens of mV of supply voltage variation. However, the performance of SPAD pixels, such as PDP and DCR, is highly sensitive to temperature and supply voltage variation, so SPAD imaging systems typically incorporate a high-precision voltage supply to minimize voltage-induced performance fluctuation. For implantable neural interfacing applications, the brain temperature varies by only several degrees Celsius. With this in mind, the performance of the proposed TDC was simulated under worst-case scenarios to assess its potential and limitations across a broader range of biomedical applications in the future.

| LSB (ps) | INL (LSB) | Fs (MHz) | Power (mW) | Area (mm <sup>2</sup> ) |
|----------|-----------|----------|------------|-------------------------|
| < 33     | < 1       | 100      | < 6.24     | < 0.05                  |

Table 4-1: Target performance metrics for the proposed TDC.

# 4.2. Operating Principle

The performance of a TDC can degrade due to variations in process, voltage, and temperature (PVT). In recent years, TDC designs have implemented hardware redundancy and delay-locked loops (DLL) to mitigate these circuit non-idealities. However, both methods require significant chip area, which can negatively impact the fill factor when integrated with a SPAD. To achieve fine resolution and PVT tolerance while minimizing chip size, a compact Time-Amplified Stochastic Phase Interpolation (TASPI) TDC structure, using a time amplifier, is proposed. A block diagram of the proposed TDC is shown in Figure 4-1.



The previous subsections introduced the detailed operating principles of TA-TDCs and SPI-TDCs. A key limitation of TA-TDCs is that interpolation errors from the coarse stage are amplified and passed to the fine stage as remainders, leading to high INL. On the other hand, SPI-TDCs require a large silicon footprint to accommodate redundant delay cells. To reduce INL by just 1 bit, the number of delay cells often needs to be quadrupled, especially when aiming for a resolution around 10 ps.

It is possible to combine the advantages of the SPI-TDC and the TA-TDC. In the proposed TDC design, we implement both an SPI-TDC and a DLL in the coarse stage to reduce nonlinearity. To achieve a finer LSB, the remainders from the coarse TDC, produced by the remainder generation logic (R.Gen), are amplified by an 8× pulse-train TA and then further interpolated in the fine TDC stage. The number of GDEs in the coarse TDC is chosen to give an LSB of 125 ps and is then doubled to reduce the INL by 1 bit, based on the analysis in Section 3.7. The results are interpolated and converted to the corresponding digital code by the sampler array and the thermal-to-binary encoder (TTBE).

# 4.3. Circuit Design

We begin by analyzing the system-level design of the proposed TASPI-TDC architecture, based on the existing analysis in Chapter 3, to achieve the proposed design targets [135]. Since the output frequency is halved by the DFF-style sampler used in the proposed TDC, a reference clock of 250 MHz is used to achieve a theoretical maximum  $F_s$  of 125 MHz.

To meet the target LSB of < 33 ps and avoid unit delay variation due to PVT, a DE unit delay of 125 ps and a TA amplification factor of 8 were chosen, giving a theoretical LSB of 15.63 ps from the fine SPI-TDC. The unit delay of 125 ps translates to a 5-bit, 32-DE coarse delay line. A bit redundancy of 1 was chosen based on an iterative post-layout analysis of the unit delay of the VCDL using the methods described in Section 3.7. As a result, a total of 64 coarse DEs form the coarse VCDL.
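The system-level numbers in this paragraph follow from the 250 MHz reference clock. The sketch below reproduces the arithmetic; the constant names are illustrative, and the redundancy is modeled simply as doubling the delay element count once per redundancy bit.

```python
F_CLK_MHZ = 250        # reference clock frequency
COARSE_BITS = 5        # coarse-stage resolution
TA_GAIN = 8            # time amplifier gain (3 fine bits)
REDUNDANCY_BITS = 1    # each redundancy bit doubles the number of DEs

t_clk_ps = 1e6 / F_CLK_MHZ                       # clock period in ps
coarse_lsb_ps = t_clk_ps / 2 ** COARSE_BITS      # DE unit delay
fine_lsb_ps = coarse_lsb_ps / TA_GAIN            # LSB after amplification
n_coarse_de = 2 ** (COARSE_BITS + REDUNDANCY_BITS)

if __name__ == "__main__":
    print(f"clock period : {t_clk_ps:.0f} ps")
    print(f"coarse LSB   : {coarse_lsb_ps:.0f} ps")
    print(f"fine LSB     : {fine_lsb_ps:.3f} ps")
    print(f"coarse DEs   : {n_coarse_de}")
```

The fine LSB evaluates to 15.625 ps, which the text rounds to 15.63 ps.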



Figure 4-2:  $INL_{max,fine}$  versus  $\beta$  (fractional part of  $\frac{nT_d}{T_{CLK}}$ ) for different  $\gamma$  (integer part of  $\frac{nT_d}{T_{CLK}}$ ).

A 3-bit fine-stage SPI-TDC is used, which corresponds to the 8× amplification factor; the maximum amplified remainder fed into the fine TDC is therefore 1 ns. We do not recommend implementing a fine SPI-TDC with a lower LSB than the coarse stage, as this approach would occupy more silicon area for additional biasing circuits and an additional input reference clock signal. The number of GDEs used in the fine SPI-TDC ( $n_{fine}$ ) is determined by examining Equation 3-1. As shown in Figure 4-2,  $n_{fine}$  must be an integer multiple of 32 to eliminate the INL caused by deterministic errors in the fine SPI-TDC ( $INL_{max,fine}$ ). A 32-GDE fine VCDL is used to minimize the silicon footprint of the proposed TDC.

The number of DEs also reduces nonlinearity caused by stochastic errors, including jitter and mismatch. To meet the minimum silicon footprint design target, the parameters of the proposed TDC should maximize the effective number of bits while only doubling the number of DEs. By examining Equations 3-5 and 3-6 and assuming  $\mu_{td} = \frac{T_{CLK}}{2^b}$ , we can find the critical standard deviations due to jitter ( $\sigma_{j,0}$ ) and mismatch ( $\sigma_{m,0}$ ) that the TDC can tolerate while still reaching 0 LSB INL, given in Equations 4-5 and 4-6.

$$\sigma_{j,0} = \frac{T_{CLK}}{2^{\frac{3b-1}{2}}} \tag{4-5}$$

$$\sigma_{m,0} = \frac{T_{CLK}}{2^{\frac{3b}{2}}} \tag{4-6}$$



Figure 4-3: Critical standard deviation of unit delay due to jitter ( $\sigma_{j,0}$ ), mismatch ( $\sigma_{m,0}$ ) and their sum ( $\sigma_{total,0}$ ) for each number of bits (b)

The above equations are crucial for finding the finest LSB that the proposed TASPI-TDC can theoretically achieve. In Figure 4-3,  $\sigma_{j,0}$ ,  $\sigma_{m,0}$  and their sum  $\sigma_{total,0}$  are plotted using the operating conditions of the proposed TDC. Under the same clock period,  $\sigma_{total,0}$  decreases exponentially with b. Since the proposed DE design is unable to achieve  $\sigma_{total,0} < 1$ ps at the minimum device dimensions in simulation, the proposed TDC cannot operate beyond 8-bit mode without increasing the frequency of the 250 MHz reference clock. The best LSB that the proposed TDC can theoretically achieve is therefore 15.63 ps.
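The two expressions for $\sigma_{j,0}$ and $\sigma_{m,0}$ above can be evaluated directly to see why the 1 ps threshold separates 8-bit from 9-bit operation. The sketch below assumes a 4 ns clock period (250 MHz) and takes $\sigma_{total,0}$ as the plain sum of the two terms, as the caption of Figure 4-3 describes; the function names are illustrative.

```python
T_CLK_PS = 4000.0  # 250 MHz reference clock period in ps


def sigma_jitter_ps(t_clk_ps: float, bits: int) -> float:
    """Critical jitter standard deviation: T_CLK / 2^((3b - 1) / 2)."""
    return t_clk_ps / 2 ** ((3 * bits - 1) / 2)


def sigma_mismatch_ps(t_clk_ps: float, bits: int) -> float:
    """Critical mismatch standard deviation: T_CLK / 2^(3b / 2)."""
    return t_clk_ps / 2 ** (3 * bits / 2)


def sigma_total_ps(t_clk_ps: float, bits: int) -> float:
    """Sum of the two critical deviations (assumed to add linearly)."""
    return sigma_jitter_ps(t_clk_ps, bits) + sigma_mismatch_ps(t_clk_ps, bits)


if __name__ == "__main__":
    for b in range(5, 11):
        print(
            f"b = {b:2d}: sigma_j0 = {sigma_jitter_ps(T_CLK_PS, b):8.3f} ps, "
            f"sigma_m0 = {sigma_mismatch_ps(T_CLK_PS, b):8.3f} ps, "
            f"total = {sigma_total_ps(T_CLK_PS, b):8.3f} ps"
        )
```

At b = 8 the total is about 2.36 ps, while at b = 9 it drops to about 0.83 ps, below the 1 ps level that the delay element could not reach in simulation.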

Although the proposed TDC can achieve the design targets, the above analysis indicates several system-level limitations and trade-offs of the proposed TASPI-TDC architecture. First, the number of fine bits and DEs is limited by the amplification factor of the TA, which cannot be scaled up without degrading INL due to gain errors. Second, the number of fine DEs must be an integer multiple of  $\frac{T_{CLK}}{T_d}$  to eliminate deterministic errors from the fine-stage TDC, which increases the silicon footprint of the proposed TDC. Third, to minimize the impact of stochastic errors on INL, the best LSB and number of bits the TASPI-TDC can achieve are limited by the reference clock frequency.

## 4.3.1. Delay Locked Loop

The delay-locked loop (DLL) is the core component of the coarse SPI-TDC. The schematic of the DLL is shown in Figure 4-4. The 250 MHz reference clock (*CLK*) is fed into the 32-DE DLL and its redundancy replica for SPI. The DLL consists of a voltage-controlled delay line (VCDL), a phase-frequency detector (PFD), and a charge pump (CP). The purpose of the DLL is to precisely control  $V_{cn}$ , which sets the rising-edge delay of the VCDL. The control voltages  $V_{cn}$  and  $V_{cp}$  are determined at the beginning of the measurement to minimize the time difference between the last phase and the rising edge of the reference clock. If the VCDL falls out of phase with the reference clock, the control voltage  $V_{cn}$  adjusts and locks the VCDL to the rising edge of the reference clock. The second VCDL is used for hardware redundancy.



Figure 4-4: Schematic of the 250 MHz SPI delay-locked loop in the coarse and fine TDC.

The VCDL consists of 32 GDEs, as shown in Figure 4-5. The delay of each GDE can be finely tuned within a certain range to counter PVT variations using the analog control voltages  $V_{cp}$  and  $V_{cn}$ . The VCDL is designed to cover the entire ~4 ns period of *CLK* across 32 GDEs, which requires a mean unit delay  $\mu_{td}$  of 125 ps per GDE. One additional GDE is placed at both the start and the end of the VCDL to match the load capacitance of each GDE.



Figure 4-5: Schematic of the current-starved gated delay cell replicated to form the voltage-controlled delay lines.

In the delay cell shown in Figure 4-5, M<sub>3</sub>, M<sub>4</sub>, M<sub>7</sub>, and M<sub>8</sub> form two series-connected inverters that construct a buffer. A standard-cell inverter is used for the M<sub>7</sub> and M<sub>8</sub> inverter pair to reduce the design complexity and improve layout matching with other standard cells. The transistors M<sub>3</sub> and M<sub>4</sub> control the current flowing through the first inverter, such that the propagation delay through the buffer can be tuned by adjusting the control voltages  $V_{cn}$  and  $V_{cp}$ . Gating is achieved by employing transistors M<sub>1</sub> and M<sub>6</sub> to receive the enable and disable signals. The unit delay of any DE is determined by both the on-current of the inverter (I<sub>ON</sub>) and the total load capacitance (C<sub>L</sub>) [141]. When designing a single GDE, the load capacitance is taken as the sum of the gate oxide capacitances of M<sub>3</sub> and M<sub>4</sub>, denoted C<sub>OX,G</sub>. However, the capacitance of the wire interconnects (C<sub>wire</sub>) adds to the total load capacitance and increases the unit delay when the delay cells are placed inside the 5-bit VCDL. M<sub>3</sub> and M<sub>4</sub> in both inverter pairs are sized to produce a ~50 ps delay in the typical-typical (TT) process corner at a  $V_{cn}$  of 750 mV and a  $V_{cp}$  of 250 mV, allowing a ~150 ps tuning range of the VCDL unit delay across PVT conditions.

The bang-bang phase-frequency detector (PFD) shown in Figure 4-6a is used in the DLL to determine the phase difference between the input phase P[i] and CLK [116]. Based on this phase difference, UP and DOWN signals are generated to control the charge pump. When P[i] lags the CLK signal, the DOWN signal is activated until the phase difference reaches a minimum. Conversely, when P[i] leads the CLK signal, the UP signal is turned on. The UP and DOWN signals are then fed into the charge pump (CP) shown in Figure 4-6b [116]. M<sub>5</sub> is used to release  $V_{cn}$ , and a 200 fF capacitor  $C_{LF}$  is used to mitigate control voltage fluctuations.



Figure 4-6: Schematic of the (a) phase-frequency detector and (b) charge pump used in the delay-locked loop.

To meet the system requirement of < 1 LSB INL, the accumulated delay of the delay line should closely follow the ideal value to reduce the nonlinearity inherited from the VCDL. The largest deviation from the ideal accumulated delay typically occurs at the last phase of the DLL [135]. Due to a floor-planning issue, only the first 8 GDEs are connected to the remainder generation logic array, which effectively reduces the DR of the proposed TDC to 1.13 ns.

Figure 4-7 plots the pre- and post-layout simulated deviation of the accumulated delay of the DLL from its ideal value in various standard PVT conditions and in the worst-case scenarios of SS, 80°C, 90% V<sub>DD</sub> and FF, -30°C, 110% V<sub>DD</sub>. The DLL has a minimum operating temperature of -30°C, which is higher than the -40°C standard for biomedical ICs. Because the VCDL can be re-biased for different process and temperature (PT) variations, it shows a deviation of only 5.71 ps from the ideal value in these conditions. The highest deviation of 20 ps occurs in the two worst-case scenarios, when  $V_{CN}$  and  $V_{CP}$  also vary by 10%. This highlights the importance of incorporating on-chip biasing circuitry in the future, such as a combination of a DAC and PTAT voltage references, to generate  $V_{CN}$  and  $V_{CP}$  that are independent of supply (V<sub>DD</sub>) variations [142].



Figure 4-7: Deviation of accumulated delay of the DLL from the ideal value measured in both pre-layout (left) and post-layout (right) simulation.  $V_{CN}$  and  $V_{CP}$  are adjusted to meet the target delay in process and temperature corner simulation. Simulation temperature is 27°C and supply voltage is 100% V<sub>DD</sub> if not otherwise specified.

## 4.3.2. Sampler Array

In the TDC, an array of samplers is used to store the state of the GDLs for each conversion stage. The DFF-based samplers, shown in Figure 4-8, are connected to the output of each stage of the GDLs.



Figure 4-8: Schematic of the sampler used for each delay cell, consisting of two DFF.

The first DFF detects the presence of phase P[i] within the input time interval. However, the sampler output switches prematurely when detecting time intervals greater than 2 ns, as illustrated in Figure 4-9. To address this issue, the sampler outputs must be updated at a lower frequency than the reference clock. The second DFF holds the sampler result S[i] and outputs it at a quarter of the 250 MHz clock frequency, which is generated by another DFF-based frequency divider. This frequency reduction should be captured in future system-level models of the proposed TDC architecture.



Figure 4-9: Example of sampler output S[i] prematurely switched when STOP signal is larger than 2 ns.

To meet the 15 ps LSB requirement, the minimum detectable time interval of the sampler array should be less than the LSB of 15.625 ps. Since all DFF variants in the standard library have the same timing constraint of 4 ps under the worst-case scenario (SS corner, 90%  $V_{DD}$ , 125°C), the smallest DFF cells are used to meet the silicon footprint requirement.

As the pre- and post-layout simulation results in Table 4-2 show, the DFF-style sampler cell used in the proposed TDC satisfies the above design requirements. The minimum detectable time interval is at most 2.74 ps in normal PVT conditions and 6.21 ps in the worst-case scenario. For future-generation TASPI-TDCs aiming for an LSB of < 5 ps, alternatives such as SR-latch-style samplers should be considered to fulfill the LSB requirement [143].

Table 4-2: Minimum detectable time interval of the sampler cell measured from both pre- and post-layout simulations in various PVT conditions.

| Conditions                      | Pre-Layout (ps) | Post-Layout (ps) | Max Fs (MHz) |
|---------------------------------|-----------------|------------------|--------------|
| TT, 27°C, 100% V<sub>DD</sub>   | 1.76            | 1.90             | 62.5         |
| SF, 27°C, 100% V<sub>DD</sub>   | 1.02            | 1.63             |              |
| FS, 27°C, 100% V<sub>DD</sub>   | 1.37            | 2.11             |              |
| FF, 27°C, 100% V<sub>DD</sub>   | 1.09            | 1.50             |              |
| SS, 27°C, 100% V<sub>DD</sub>   | 1.59            | 2.74             |              |
| TT, -40°C, 100% V<sub>DD</sub>  | 2.81            | 3.17             |              |
| SS, 80°C, 90% V<sub>DD</sub>    | 5.18            | 6.21             |              |

## 4.3.3. Remainder Generation Logic

After the input time interval is interpolated by the coarse TDC stage, a remainder needs to be generated for fine resolution interpolation in the fine stage of the TDC. The schematic and output pulse of the remainder generation logic (R.Gen) are illustrated in the below Figure 4-10 [129]. R.Gen is activated only when the sampler result of the current phase S[i] is positive, and the sampling result of the next phase S[i+1] is negative. To prevent pulse shrinking during replication by the TA, the inverted P[i+3] is injected to ensure a minimum remainder output pulse width of 250 ps. Outputs from the remainder generation array are then routed to an 8-input pseudo-*OR* gate to generate a single remainder for TA to amplify [129].
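The activation condition of R.Gen can be illustrated with a purely logical model: cell i is enabled when S[i] is 1 and S[i+1] is 0. The sketch below ignores all timing behavior, including the injection of the inverted P[i+3], and the function name is hypothetical.

```python
from typing import List, Sequence


def rgen_enable(samples: Sequence[int]) -> List[bool]:
    """Enable flag per R.Gen cell: true where S[i] = 1 and S[i+1] = 0.

    Assumption: `samples` is an ideal thermometer code from the sampler
    array (a run of ones followed by zeros), with no bubble errors.
    """
    return [
        bool(samples[i]) and not bool(samples[i + 1])
        for i in range(len(samples) - 1)
    ]


if __name__ == "__main__":
    sampler_outputs = [1, 1, 1, 0, 0, 0, 0, 0]  # three phases detected
    enables = rgen_enable(sampler_outputs)
    print("active R.Gen cell index:", enables.index(True))
```

For an ideal thermometer code, exactly one cell is enabled, which is why the cell outputs can be merged by a single pseudo-OR gate.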



Figure 4-10: Schematic of the remainder generation logic used for each GDE.

The generated remainder must match the actual remainder to within the LSB of 15.63 ps. We select the minimum-size *XOR* and inverter cells to minimize the load capacitance added to both the connected GDEs and the samplers. The *AND* cell with the minimum propagation delay is chosen. The maximum remainder errors extracted from both pre- and post-layout simulations are presented in Table 4-3. These results demonstrate that the R.Gen cell used in the proposed TDC meets the INL requirement. However, further improvement can be made to reduce the maximum remainder error in the worst-case scenario to below 1 ps for future generations of the proposed TDC aiming for < 5 ps resolution.

| Conditions                             | Pre-Layout (ps) | Post-Layout (ps) |
|----------------------------------------|-----------------|------------------|
| TT, 27°C, 100% V <sub>DD</sub>         | 0.20            | 0.28             |
| SF, 27°C, 100% V <sub>DD</sub>         | 0.79            | 0.88             |
| FS, 27°C, 100% V <sub>DD</sub>         | 0.34            | 0.38             |
| FF, 27°C, 100% V <sub>DD</sub>         | 0.57            | 0.66             |
| SS, 27°C, 100% V <sub>DD</sub>         | 0.24            | 0.46             |
| TT, -30°C, 100% V <sub>DD</sub>        | 0.21            | 0.25             |
| TT, 80°C, 100% V <sub>DD</sub>         | 0.58            | 0.75             |
| FF, -40°C, 110% V <sub>DD</sub>        | 2.86            | 3.08             |
| SS, 80°C, 90% V <sub>DD</sub>          | 3.03            | 3.27             |

Table 4-3: Maximum remainder error measured in both pre- and post-layout simulations under various PVT conditions.

## 4.3.4. Time Amplifier

In this work, a calibration-free pulse-train TA is implemented [129]. The pulse-train TA is designed to generate 8 replicas of the input pulse, which are then passed to the fine-stage TDC as the STOP signal for further interpolation.

The structure of the pulse-train TA is shown in Figure 4-11. To generate 8 replicas of the input pulse, the input time interval propagates through an array of 7 TA delay elements (TA-DE). The output of each TA-DE is connected to an 8-input pseudo-*OR* gate, which consists of NMOS and PMOS transistors with the standard sizing of the standard cells to reduce layout mismatch with other components in the TDC. To reduce calibration complexity and timing mismatch between the TA delay line and the fine-stage TDC, the coarse 5-bit VCDL design is reused for the TA. Since the pulse width of the remainder ranges from 250 to 375 ps, replicas are generated from every 4 GDEs of the 5-bit VCDL, with consecutive pulse replicas spaced 500 ps apart.
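The replica spacing can be checked with an idealized timing model: taps every 4 GDEs at 125 ps per GDE give a 500 ps spacing, which exceeds the widest remainder pulse of 375 ps, so the replicas do not overlap. The sketch below assumes ideal delays with no gain error; the names are illustrative.

```python
from typing import List, Tuple

UNIT_DELAY_PS = 125   # GDE unit delay of the 5-bit VCDL
TAP_STEP = 4          # a replica is taken from every 4th GDE
N_REPLICAS = 8        # TA amplification factor


def pulse_train(width_ps: int) -> List[Tuple[int, int]]:
    """Return (start, end) times in ps of each replica of one input pulse."""
    spacing_ps = UNIT_DELAY_PS * TAP_STEP
    if width_ps >= spacing_ps:
        raise ValueError("replicas would overlap for this pulse width")
    return [
        (k * spacing_ps, k * spacing_ps + width_ps) for k in range(N_REPLICAS)
    ]


if __name__ == "__main__":
    train = pulse_train(375)
    total_high_ps = sum(end - start for start, end in train)
    print("replica start times (ps):", [start for start, _ in train])
    print("total high time (ps):", total_high_ps)
```

The total high time of the pulse train equals 8 times the input pulse width, which is the amplification the fine stage relies on.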



Figure 4-11: Schematic of the pulse-train time amplifier. Inset showing the schematic of the pseudo-OR gate.

Gain error is a key performance requirement for the proposed pulse-train TA. It refers to the difference between the actual amplification factor of the TA and its designed value. To meet the < 1 LSB INL requirement, the gain error of the TA should not exceed 4.17% when amplifying the maximum input remainder.

Since a 5-bit VCDL is used and can be locked with the biasing conditions of the coarse DLL, the TA gain error can be minimized across various PVT conditions. Gain errors extracted from both pre- and post-layout simulations are presented in Figure 4-12. A maximum gain error of 2.9% is observed in normal PVT conditions, while 6.4% is found in the worst-case scenario. These simulation results indicate that the pulse-train TA used in the proposed TASPI-TDC meets the INL design requirement in the expected operational environment.
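One plausible reading of the 4.17% limit is that the gain error applied to the largest remainder (375 ps) must stay below one fine LSB (15.625 ps) when referred to the input. This derivation is an assumption made here for illustration; the text does not state how the limit was obtained.

```python
FINE_LSB_PS = 15.625       # 125 ps coarse LSB divided by the TA gain of 8
MAX_REMAINDER_PS = 375.0   # widest remainder pulse entering the TA


def gain_error_limit(lsb_ps: float, max_remainder_ps: float) -> float:
    """Largest fractional gain error keeping the input-referred error < 1 LSB.

    Assumed model: error = gain_error x remainder, referred to the TA input.
    """
    return lsb_ps / max_remainder_ps


limit = gain_error_limit(FINE_LSB_PS, MAX_REMAINDER_PS)

if __name__ == "__main__":
    print(f"gain error limit: {limit:.2%}")
    for label, simulated in (("normal PVT", 0.029), ("worst case", 0.064)):
        verdict = "within" if simulated <= limit else "exceeds"
        print(f"{label}: {simulated:.1%} {verdict} the limit")
```

Under this reading, the 2.9% gain error in normal PVT conditions is within the limit, while the 6.4% worst-case value is not.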



Figure 4-12: Gain error of TA used in the proposed TASPI-TDC in various PVT conditions. Extracted from both pre- (left) and post-layout (right) simulations. Simulation temperature is  $27^{\circ}$ C and supply voltage is 100% V<sub>DD</sub> if not otherwise specified.

## 4.3.5. Thermal-to-binary Encoder

A MUX-style 5-bit thermal-to-binary encoder (TTBE), shown in Figure 4-13, is used to convert the number of detected phases into the corresponding binary code [144]. The proposed TDC utilizes the MUX-TTBE due to its simplicity in design and minimal transistor usage compared to other TTBE types, such as the Wallace tree encoder and ROM encoder [135].



Figure 4-13: Schematic of the 5-bit MUX TTBE. The output bit order is shown as B[i].

The smallest MUX standard cells are chosen to minimize the silicon footprint of the TTBE. Since the MUX TTBE serves as the final output stage of the TDC, it is crucial that it does not introduce any nonlinearity into the system. As both the pre- and post-layout simulations in Figure 4-14 show, the MUX TTBE used in the proposed TDC accurately converts input signals under all simulated PVT conditions.
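The general idea of a MUX-based encoder can be shown with a behavioral sketch: the middle thermometer bit gives the current output bit and selects which half of the code is passed on, exactly as a row of 2:1 multiplexers would. The model below is not the schematic of Figure 4-13. It assumes an ideal, bubble-free thermometer code with 2^n - 1 inputs (31 for 5 bits) and does not model how the redundant phases are mapped onto the encoder inputs.

```python
from typing import List, Sequence


def mux_ttbe(thermo: Sequence[int]) -> List[int]:
    """Encode a thermometer code of length 2**n - 1 into n bits, MSB first."""
    n_inputs = len(thermo)
    if n_inputs == 0:
        return []
    if (n_inputs + 1) & n_inputs:
        raise ValueError("input length must be 2**n - 1")
    mid = n_inputs // 2
    msb = int(thermo[mid])  # middle bit decides the current output bit
    # 2:1 MUX behavior: pass on the upper half if the bit is set,
    # otherwise the lower half.
    selected = thermo[mid + 1:] if msb else thermo[:mid]
    return [msb] + mux_ttbe(selected)


def bits_to_int(bits: Sequence[int]) -> int:
    """Convert an MSB-first bit list to an integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


if __name__ == "__main__":
    for phases in (0, 15, 16, 31):
        code = [1] * phases + [0] * (31 - phases)
        print(phases, "->", mux_ttbe(code))
```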



Figure 4-14: Response of MUX TTBE to number of input phases in decimal. (a) pre-layout and (b) post-layout simulation results at different PVT conditions. Simulation temperature is  $27^{\circ}$ C and supply voltage is  $100\% V_{DD}$  if not otherwise specified.

Since each phase in the VCDL has a unit delay, the output of the sampler array switches rapidly at the start of the conversion. As a result, the MUX TTBE requires an initialization time to generate the correct binary code, which ultimately determines the dead time of the proposed TDC. The maximum initialization time occurs when all 63 phases are present in the input time interval. As shown in Table 4-4, the maximum initialization time is 6.28 ns in pre-layout and 6.41 ns in post-layout simulations. TTBEs used in future-generation TDCs should focus on eliminating the metastable initialization period from the output. To achieve this, a register circuit can be implemented to store the input signals and ensure that the stabilized conversion result is produced after the maximum initialization time.

| Conditions                     | *Pre-Layout Delay (ns) | *Post-Layout Delay (ns) |
|--------------------------------|------------------------|-------------------------|
| TT, 27°C, 100% V <sub>DD</sub> | 6.22                   | 6.31                    |
| SF, 27°C, 100% V <sub>DD</sub> | 6.21                   | 6.31                    |
| FS, 27°C, 100% V <sub>DD</sub> | 6.22                   | 6.31                    |
| FF, 27°C, 100% V <sub>DD</sub> | 6.18                   | 6.25                    |
| SS, 27°C, 100% V <sub>DD</sub> | 6.28                   | 6.41                    |
| FF, -40°C, 110% V <sub>DD</sub> | 6.18                  | 6.25                    |
| SS, 80°C, 90% V <sub>DD</sub>  | 6.28                   | 6.41                    |

Table 4-4: Initialization time of the 5-bit MUX TTBE at different PVT conditions.

\*Max. delay measured when all 63 phases are present.

# 4.4. Measurement Setup

In the following subsections, the measurement results of the proposed TASPI-TDC are presented. The TASPI-TDC chip, shown in Figure 4-15, was fabricated in the general-purpose TSMC 65 nm CMOS process and occupies a total silicon footprint of 1 mm<sup>2</sup>. However, by extensively utilizing the standard cell libraries provided by TSMC, the area dedicated to the core circuits is minimized to only 0.06 mm<sup>2</sup>. Overall, the proposed TASPI-TDC is competitive in size with the most compact topologies of recently proposed TA-TDCs and SPI-TDCs.

Additionally, a testbench circuit consisting of a 5-bit VCDL, 3 samplers, and 1 R.Gen cell is incorporated into the final chip. This testbench is used to determine the biasing voltage of the DLL and evaluate the timing characteristics of the generated remainder pulses. The testbench circuit consumes approximately 0.5 mW of power and occupies 0.01 mm<sup>2</sup> of chip area. As the testbench plays a critical role in the operation of the proposed TDC, its area and power consumption are factored into the final FoM calculations.



Figure 4-15: Annotated layout of the complete TDC in the TSMC 65 nm CMOS process.

A custom PCB was designed for measuring the TDC. The clock (CLK) and STOP signals are generated using an Anritsu MP1652A digital pulse pattern generator. To avoid reflections from impedance mismatch, the CLK and STOP signals are connected to the test board using SMA connectors and 50  $\Omega$  termination resistors. The complete measurement setup for TDC characterization is illustrated in Figure 4-16. Two Agilent E3645A DC power supplies provide the 1 V V<sub>DD</sub> and the control voltages  $V_P$  and  $V_N$ . A LeCroy WaveRunner 625Zi mixed-signal oscilloscope was used to collect and save the output of the TDC in digital format for further analysis. For all measurements, the oscilloscope was programmed using MATLAB to automate the measurement process.



Figure 4-16: Block diagram of the measurement setup used for the TDC characterization.

# 4.5. Measurement Results

The first measurement was conducted to assess the delay response of the coarse DLL. The approximate range of the control voltages  $V_{CP}$  and  $V_{CN}$  is first determined by post-layout simulation. The actual  $V_P$  and  $V_N$  are then found by sweeping this range until the phase difference between the 62nd phase and the CLK rising edge is below 15 ps.

After the coarse DLL biasing point ( $V_{CP}$  and  $V_{CN}$ ) of the TDC chip is found, a statistical code density test is performed to measure the TDC's performance characteristics, including the resolution, dynamic range, and nonlinearity. For the proposed 8-bit TDC, the statistical code density test consists of 41 sweeps of the 1 ns DR, which includes ~7,500 individual measurements. A total of 4 chips were tested; the chip conditions and number of measurements are listed in Table 4-5. A total of 7,500 measurements were carried out on the two functional chips.

| Chip ID | Chip Condition | Number of Measurements | $V_{P}$ (mV) / $V_{N}$ (mV) | Estimated corner |
|---------|----------------|------------------------|-----------------------------|------------------|
| Chip 1  | Functional     | 3,750                  | 220 / 714                   | TT               |
| Chip 4  | Functional     | 3,750                  | 260 / 658                   | FS               |

Table 4-5. Tested chip conditions and number of measurements

The step width of the TDC response is calculated using Equation (5-1) [145], [146], where  $\tau_i$  is the *i*<sup>th</sup> step width,  $N_i$  is the number of counts observed for that code, and  $N_{total}$  is the total number of counts. The full quantization characteristic of the TDC was plotted; the average step width gives the resolution, and the difference between the first and last codes gives the dynamic range. The difference between each step width of the TDC response and the designed LSB gives the DNL, and the difference between the ideal output code and the measured code gives the INL for each input delay.

$$\tau_i = \frac{DR \times N_i}{N_{total}} \tag{5-1}$$
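The post-processing of a code density test can be sketched as follows. The histogram counts are hypothetical and chosen only to keep the arithmetic easy to follow. DNL is taken as the normalized deviation of each step width from the designed LSB, and INL is accumulated as the running sum of DNL, which is a common convention assumed here rather than the exact script used for the measurements.

```python
from typing import List, Sequence, Tuple


def code_density(
    counts: Sequence[int], dr_ps: float, design_lsb_ps: float
) -> Tuple[List[float], List[float], List[float]]:
    """Return (step widths in ps, DNL in LSB, INL in LSB) from a histogram."""
    n_total = sum(counts)
    # Step width of code i: tau_i = DR * N_i / N_total
    widths = [dr_ps * n / n_total for n in counts]
    dnl = [(w - design_lsb_ps) / design_lsb_ps for w in widths]
    inl: List[float] = []
    running = 0.0
    for d in dnl:
        running += d  # assumed convention: INL is the cumulative sum of DNL
        inl.append(running)
    return widths, dnl, inl


if __name__ == "__main__":
    # Hypothetical 4-code histogram over a 400 ps range, 100 ps design LSB.
    widths, dnl, inl = code_density([100, 120, 80, 100], 400.0, 100.0)
    print("step widths (ps):", widths)
    print("DNL (LSB):", [round(d, 3) for d in dnl])
    print("INL (LSB):", [round(i, 3) for i in inl])
    print("resolution (ps):", sum(widths) / len(widths))
```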

The result of the statistical code density test for the entire TDC is shown in Figure 4-17. The TDC achieves a resolution of 16.33 ps over a tested dynamic range of ~1 ns. Due to nonlinearity, the effective number of bits,  $N_{Linear}$  (Equation 4-3), is smaller than the designed number of bits. The nonlinearity of the TDC is plotted in Figure 4-18. The DNL was determined to be 0.55 LSB, and the INL was determined to be 0.94 LSB. The proposed TDC therefore achieves an  $N_{Linear}$  of only 6.04 bits within the 1 ns tested range (7 bits).

$$N_{Linear} = b - \log_2(INL + 1) \tag{4-3}$$
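The reported effective resolution can be reproduced from the expression for $N_{Linear}$ above, using the 7 bits covered by the 1 ns tested range and the measured INL of 0.94 LSB. The function name is illustrative.

```python
import math


def n_linear(bits: float, inl_lsb: float) -> float:
    """Effective linear bits: b - log2(INL + 1), with INL in LSB."""
    return bits - math.log2(inl_lsb + 1.0)


if __name__ == "__main__":
    # 7 bits within the 1 ns tested range, measured INL of 0.94 LSB.
    print(f"N_Linear = {n_linear(7, 0.94):.2f} bits")
```

This evaluates to approximately 6.04 bits, matching the value reported for the measured TDC.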



Figure 4-17: Quantization characteristics of the 8-bit TDC.



Figure 4-18: Nonlinearity performance of the 8-bit TDC. Differential nonlinearity (DNL) is shown at the top, and the integral nonlinearity (INL) is shown at the bottom.

Since the TDC computes the 8-bit output by combining the results of the 5-bit coarse stage and the 3-bit fine stage, the TDC response of the 5-bit coarse stage alone was also assessed. The step response of the 5-bit coarse stage is plotted in Figure 4-19, which shows that the coarse TDC response closely follows the expected response. The resolution (LSB) is coarsened to 125 ps in this case, but the nonlinearity is reduced: Figure 4-20 shows that the DNL and INL improve to 0.04 LSB and 0.09 LSB, respectively. The difference in nonlinearity between the 5-bit coarse TDC and the 8-bit full TDC indicates that the source of the nonlinearity lies in the remainder generation circuits and the time amplifier.



Figure 4-20: Nonlinearity performance of the 5-bit coarse TDC within the 1 ns testing range. Differential nonlinearity (DNL) is shown at the top, and the integral nonlinearity (INL) is shown at the bottom.

# 4.6. Conclusion and Future Works

A summary of the TDC performance compared to recently proposed TA-TDCs and SPI-TDCs is provided in Table 4-6. The proposed TDC achieves an LSB of 16.93 ps over a DR of 1.13 ns, with a total power consumption of 5.16 mW and a silicon area of 0.03 mm<sup>2</sup>. The Fs of the proposed TDC was measured at 60.5 MHz, which is lower than the targeted 100 MHz. The proposed 8-bit TDC demonstrates a maximum INL of 0.94 LSB, resulting in an  $N_{Linear}$  of 6.04 bits. Further investigation into the nonlinearity shows that the primary source of INL is the nonideal accumulated delay of the VCDL in both the coarse TDC and the TA, caused by an approximately ± 3 mV supply voltage variation.

| Ref. | Carimatto et al., 2015 [147] | Carimatto et al., 2018 [148] | Henderson et al., 2019 [139] | Manuzzato et al., 2019 [149] | Conca et al., 2020 [150] | This Work (Simulated) | This Work (Measured) |
|------|------------------------------|------------------------------|------------------------------|------------------------------|--------------------------|-----------------------|----------------------|
| Type | GRO | PO | GRO | GRO | VDL | TASPI | TASPI |
| Tech. (nm) | 350<sup>HV</sup> | 40 | 40 | 150 | 350<sup>HV</sup> | 65 | 65 |
| Application | Biomedical | Biomedical | Biomedical | Biomedical | Biomedical | Biomedical | Biomedical |
| LSB (ps) | 48.5 | 40 | 33 | 80 | 78 | 15.63 | 16.93 |
| DR (ns) | 6,360 | 1,000 | 135 | 81.8 | 10 | 1.125 | 1.135 |
| INL (LSB) | 4 | < 1 | 5.64 | 2.4 | 0.58 | 0.43<sup>d</sup> | 0.94 |
| N<sub>Linear</sub> | 14.68 | 15 | 5.79 | 9.51 | 6.34 | 7.32 | 6.04 |
| Fs (MHz) | 40 | 80 | 0.186 | 0.2\* | 100 | 62.5 | 60.5 |
| Power (mW) | 0.5\* | 0.171 | 1\* | 1.9\* | 35\* | 4.99<sup>c</sup> | 5.16<sup>c</sup> |
| Area/10<sup>3</sup> pixel (mm<sup>2</sup>) | 3.6 | 1.5\* | 1.69 | 0.025\* | 0.004 | 0.034<sup>ab</sup> | 0.034<sup>ab</sup> |
| FoM<sub>2</sub> | 0.115 | 7.80 | 0.0006 | 0.022 | 15.79 | 54.83 | 21.67 |

Table 4-6: Comparison Table of TDC performance with selected publications.

The performance of the proposed TDC is assumed not to change when it is integrated with SPAD pixels.

Effective linear bits:  $N_{Linear} = Bits - \log_2(INL + 1)$ . \* Estimated from figures and tables. **a**. Assumes integration with a 35 µm × 35 µm SPAD [88]. **b**. Includes 0.01 mm<sup>2</sup> of testbench circuits. **c**. Includes 0.5 mW (simulated) of testbench circuit power consumption. **d**. Estimated with the accumulated delay of the DLL at 1 ns DR. **e**. Expected maximum DR, based on the LSB extracted from the measured 1 ns DR.
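The effective-linear-bits definition in the table notes can be evaluated with a few lines of Python. This is only a minimal sketch: the function name `n_linear` and the input values are illustrative assumptions, chosen to show how INL reduces the usable bit count, and are not taken from any specific design in the table.

```python
import math


def n_linear(bits: float, inl_lsb: float) -> float:
    """Effective linear bits: Bits - log2(INL + 1), with INL given in LSB."""
    if inl_lsb < 0:
        raise ValueError("INL magnitude must be non-negative")
    return bits - math.log2(inl_lsb + 1.0)


# Illustrative (assumed) inputs, not measurements:
ideal = n_linear(8, 0.0)     # zero INL: all 8 bits remain linear
one_lsb = n_linear(8, 1.0)   # 1 LSB of INL costs exactly one bit
example = n_linear(7, 0.58)  # a 7-bit converter with 0.58 LSB INL

print(f"{ideal:.2f} {one_lsb:.2f} {example:.2f}")
```

Because the penalty is logarithmic, halving an INL that is already below 1 LSB recovers less than half a bit, whereas an INL of several LSB removes two or more bits.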

To evaluate the integrability of the TDC with SPADs, FoM<sub>2</sub> in Equation 5-3 was adopted and modified from the FoM proposed in [8], [138]. The area metric is normalized to per 1,000 pixels, which rewards high-Fs TDCs that can be shared across multiple SPAD pixels. The proposed TASPI-TDC has the highest FoM<sub>2</sub> compared to the five other TDCs used in state-of-the-art SPAD imagers, assuming no performance degradation when integrated with actual SPAD pixels. However, its power consumption is among the highest, primarily due to the TA and DLL. The major factors contributing to the high FoM<sub>2</sub> of the proposed TDC are its fine LSB and low INL relative to the referenced work. Although the proposed TDC ranked  $3^{rd}$  in F<sub>s</sub> compared to state-of-the-art TDCs, it has the potential to operate at 125 MHz with improvements to the sampling logic design. Additionally, the silicon footprint of the proposed TDC can be further reduced by reworking the DLL and TA design to reduce the number of testbench circuits required.

$$FoM_{2} = \frac{F_{S} (MHz)}{LSB \times INL \times Power (mW) \times Area/10^{3} pixels (mm^{2})}$$
(5-3)
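As a quick check of Equation 5-3, the sketch below recomputes FoM<sub>2</sub> from the comparison table. It assumes that LSB is entered in ps and INL in LSB, as in the table rows; the function and argument names are illustrative and do not come from the original work.

```python
def fom2(fs_mhz: float, lsb_ps: float, inl_lsb: float,
         power_mw: float, area_mm2_per_kpixel: float) -> float:
    """FoM2 = Fs / (LSB x INL x Power x Area per 1,000 pixels)."""
    return fs_mhz / (lsb_ps * inl_lsb * power_mw * area_mm2_per_kpixel)


# Values copied from the "This Work (Measured)" and Conca et al. [150] columns.
measured = fom2(fs_mhz=60.5, lsb_ps=16.93, inl_lsb=0.94,
                power_mw=5.16, area_mm2_per_kpixel=0.034)
conca = fom2(fs_mhz=100, lsb_ps=78, inl_lsb=0.58,
             power_mw=35, area_mm2_per_kpixel=0.004)

print(f"This work (measured): {measured:.2f}")  # about 21.67
print(f"Conca et al. [150]:   {conca:.2f}")     # about 15.79
```

Both results agree with the FoM<sub>2</sub> row of the table to two decimal places, which suggests the tabulated values follow directly from the other rows under these unit assumptions.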

In summary, the following improvements should be made to optimize the architecture of the proposed TASPI-TDC to expand its capabilities in a wider range of biomedical imaging applications.

- 1. A system-level mathematical model of the proposed TASPI-TDC structure should be developed to account for all performance parameters. As discussed in the previous sections, the unit delay of a single GDE varies depending on factors such as the length of the delay line and other connected components such as samplers and R.Gen cells. Such a system model would help identify component compatibility and reduce the design complexity of the GDE and VCDL.
- 2. Since the proposed TASPI-TDC aims for an ultra-fine resolution, it sacrifices DR. The proposed TDC has the lowest DR compared to the other state-of-the-art TDCs. A high-DR operating mode should be developed in future generations to meet the DR requirements of a broader range of biomedical imaging applications. The architecture of the proposed TDC could incorporate components such as clock cycle counters and GRO control logic for additional coarse interpolation to achieve a higher DR.
- 3. The proposed TDC architecture should be further optimized with respect to its control logic. Currently, the operation of the TDC chip requires additional testbench circuits to determine the correct biasing conditions and timing parameters of core components. These testbench circuits increase the total silicon footprint and power consumption of the proposed TDC, which should be minimized in future generations.

- 4. For the current TASPI-TDC architecture, the number of fine bits is limited by the amplification factor of the pulse-train time-amplifier. It would be beneficial to implement control logic that dynamically changes the amplification factor of the proposed TDC depending on the operating conditions.
- 5. Out of the four tested chips, two suffered from biasing issues, since the biasing voltage was fed directly into the DL from a high-precision external voltage source. This arrangement is impractical in real-life applications. To further improve the PVT-variation resilience of the TDC, a more robust on-chip biasing circuit should be implemented. A combination of an R-2R digital-to-analog converter (DAC) and a threshold-voltage-based reference (VTR), which can generate a biasing voltage for the VCDL independent of temperature and supply voltage variation, is well suited for future generations of the TASPI-TDC [142], [151].
- 6. Due to a knowledge gap in chip floor planning prior to the tape-out, the required space was underestimated, and not all sampling results were connected for conversion. As a result, the full 4 ns DR of the proposed TDC could not be tested. In future generations, proper floor planning should be conducted to accurately estimate the required chip space.
- 7. The integrability of the TDC with SPAD pixels should be further evaluated in two key areas. First, the maximum number of channels that the proposed TDC can support should be determined, since it directly relates to the size of the SPAD pixel array. Second, the noise performance of the TDC should be studied and optimized in the future to compete with state-of-the-art SPAD imagers for brain imaging applications.

# Chapter 5 Conclusions and Future Work

## 5.1. Conclusions

This thesis research focussed on the design and measurement of TA-SPI TDCs in standard CMOS technology for biomedical imaging applications. Owing to their avalanche nature, SPADs can be integrated seamlessly with TDCs into SPAD-based imagers. In these imagers, the TDC is responsible for converting the arrival time of incident photons detected by the SPAD into high-resolution digital codes. These digital codes can be translated into ToF information, providing an additional axis for imaging applications such as PET and FLIM. Due to their compact form factor and integrability with modern-day CMOS technology, SPAD imagers for high-performance implantable neural activity imaging have become an active field of research.

The thesis starts with a review of state-of-the-art neural activity detection methods, from conventional electrophysiological methods using MEAs and Michigan-style probes to implantable neural imagers, which have emerged as a groundbreaking addition to neural activity imaging. Implantable probes and arrays equipped with SPAD-based imagers can see deeper brain regions and track neural activities in mobile subjects at hundreds of frames per second.

Chapter 3 discusses the fundamental concepts and performance metrics of TDCs. The two TDC architectures that inspired the proposed TASPI-TDC are introduced in detail. The TA-TDC was proposed to overcome the common DR/resolution trade-off in traditional TDCs by amplifying remainders prior to the fine-stage TDC. The SPI-TDC was proposed to achieve high linearity, another priority metric for TDCs, through hardware redundancy.

The TDC design presented in Chapter 4 combines the benefits of both TA-TDC and SPI-TDC. The proposed TDC utilizes both a DLL and a 1-bit redundancy SPI-TDC to significantly reduce the nonlinearity of the coarse-stage TDC. The remainder from the coarse-stage TDC is then amplified and passed into the fine-stage TDC for high-resolution interpolation. The proposed design was fabricated in TSMC 65 nm standard CMOS technology with a designed resolution of 15 ps over a 2 ns dynamic range within a 0.02 mm<sup>2</sup> silicon area. The fabricated TDC chip achieves a measured INL of 0.94 LSBrms and 6.04 effective linear bits within the 1 ns measurable dynamic range. Future iterations of this TDC design should focus on improving compatibility with the measuring instruments for full dynamic range testing and on improving linearity. Methods include incorporating an SPI-TDC for the fine-stage TDC and further layout optimization to reduce mismatch and jitter. A comparison between the proposed TDC and a 2-bit redundancy TASPI-TDC can also reveal key performance trade-offs of the proposed TDC architecture.

## 5.2. Future Work

The ability to digitize input time intervals with sub-nanosecond resolution at high frequency makes the TDC a well-suited on-chip data converter for SPAD-based imagers. SPAD-based imagers are capable of providing ToF information, which has a wide range of applications in biomedical imaging. Our study of innovative SPAD-based biomedical imaging technologies and TDC architectures proposed in recent years identified several future challenges for our proposed TDC in biomedical imaging.

#### A. Integration with SPAD pixels

The proposed TDC is designed to work as the data converter in SPAD imagers. Further optimization should be performed to integrate the proposed TDC with SPAD pixels of different specifications in the future. Recent multiplexed SPAD array-TDC setups can also be tested to better understand the compatibility of the proposed TDC with PET imaging applications [152], [153]. Multiple SPAD pixels can share the same TDC through an additional multiplexing circuit to improve the overall fill factor of the imagers. In implantable neural imaging, a multiplexed TDC can also reduce the power consumption of the imagers and create more power headroom for SPAD pixels.

The proposed TDC still needs many design improvements for real-life deployment in SPAD-imager-based biomedical imaging applications.

#### B. Readout

High-performance SPAD imagers pose new design challenges for data readout. As the density of SPADs increases with the spatial resolution of imagers, off-chip data channels can easily overflow in the worst-case scenario, when all SPADs generate pulses for the TDC to convert. A combination of high-speed embedded on-chip memory, such as random access memories (RAMs), and off-chip data storage should be added to future SPAD imagers using the proposed TDC [154].
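A back-of-the-envelope estimate illustrates the scale of this readout problem. The sketch below assumes that every TDC emits one 8-bit code per conversion at the measured 60.5 MHz sampling rate; the number of TDCs and the off-chip link capacity are purely hypothetical values chosen for illustration.

```python
def worst_case_rate_bps(n_tdc: int, bits_per_code: int, fs_hz: float) -> float:
    """Raw output data rate when every TDC converts on every sample."""
    return n_tdc * bits_per_code * fs_hz


BITS_PER_CODE = 8   # the proposed TDC is described as an 8-bit converter
FS_HZ = 60.5e6      # measured sampling rate of the proposed TDC

single = worst_case_rate_bps(1, BITS_PER_CODE, FS_HZ)    # one TDC
array16 = worst_case_rate_bps(16, BITS_PER_CODE, FS_HZ)  # hypothetical 16 TDCs

LINK_BPS = 1e9      # hypothetical 1 Gb/s off-chip link
overflows = array16 > LINK_BPS

print(f"1 TDC:   {single / 1e6:.0f} Mb/s")
print(f"16 TDCs: {array16 / 1e9:.3f} Gb/s, exceeds link: {overflows}")
```

Under these assumptions a single TDC already produces 484 Mb/s of raw data, so even a small number of TDCs would exceed the hypothetical link, which is why on-chip buffering is suggested above.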

#### C. FPGA-Based TDCs

An alternative to the ASIC TDC, the field-programmable gate array (FPGA) based TDC, has emerged recently. Compared to ASIC counterparts such as the one proposed in this thesis, FPGA TDCs offer quick design iterations and a lower cost per design. Most FPGA TDCs use tapped delay lines and are able to achieve picosecond resolution [155], [156], [157].

Since delay cells in FPGAs are not designed to maintain a consistent unit delay in long delay lines, recently proposed high-bit FPGA TDCs use techniques such as pseudo-segmented delay lines and time coding lines to reduce nonlinearity from clock skew and mismatches [155], [156], [157]. Another major area of research for FPGA-based TDCs is logical resource efficiency. Similar to ASIC TDCs, the amount of logical resources occupied by the TDC and peripheral memory units on the FPGA scales with the number of delay cells, which reduces the amount of resources available for other data processing units. Future research could explore ways to accommodate more of the logic circuits needed alongside the TDC for biomedical imaging.

# References

- [1] W. Jiang, Y. Chalich, and M. J. Deen, "Sensors for Positron Emission Tomography Applications," *Sensors*, vol. 19, no. 22, Art. no. 22, Jan. 2019, doi: 10.3390/s19225019.
- [2] Y. Hoshi and Y. Yamada, "Overview of diffuse optical tomography and its clinical applications," *J. Biomed. Opt.*, vol. 21, no. 9, p. 091312, Jul. 2016, doi: 10.1117/1.JBO.21.9.091312.
- [3] L. C. Moreaux *et al.*, "Integrated Neurophotonics: Toward Dense Volumetric Interrogation of Brain Circuit Activity—at Depth and in Real Time," *Neuron*, vol. 108, no. 1, pp. 66–92, Oct. 2020, doi: 10.1016/j.neuron.2020.09.043.
- [4] J. Lecoq, N. Orlova, and B. F. Grewe, "Wide. Fast. Deep: Recent Advances in Multiphoton Microscopy of In Vivo Neuronal Activity," *J. Neurosci.*, vol. 39, no. 46, pp. 9042–9052, Nov. 2019, doi: 10.1523/JNEUROSCI.1527-18.2019.
- [5] P. J. Keller and M. B. Ahrens, "Visualizing Whole-Brain Activity and Development at the Single-Cell Level Using Light-Sheet Microscopy," *Neuron*, vol. 85, no. 3, pp. 462–483, Feb. 2015, doi: 10.1016/j.neuron.2014.12.039.
- [6] A. Zhou, B. C. Johnson, and R. Muller, "Toward true closed-loop neuromodulation: artifact-free recording during stimulation," *Curr. Opin. Neurobiol.*, vol. 50, pp. 119–127, Jun. 2018, doi: 10.1016/j.conb.2018.01.012.
- [7] J. Zhang *et al.*, "Integrated device for optical stimulation and spatiotemporal electrical recording of neural activity in light-sensitized brain tissue," *J. Neural Eng.*, vol. 6, no. 5, p. 055007, Sep. 2009, doi: 10.1088/1741-2560/6/5/055007.
- [8] R. Scott, W. Jiang, and M. J. Deen, "CMOS Time-to-Digital Converters for Biomedical Imaging Applications," *IEEE Rev. Biomed. Eng.*, 2021, doi: 10.1109/RBME.2021.3092197.
- [9] R. Scott, W. Jiang, X. Qian, and M. J. Deen, "A Multi-Time-Gated SPAD Array with Integrated Coarse TDCs," *Electronics*, vol. 11, no. 13, Art. no. 13, Jan. 2022, doi: 10.3390/electronics11132015.
- [10] L. H. C. Braga *et al.*, "A Fully Digital 8 × 16 SiPM Array for PET Applications With Per-Pixel TDCs and Real-Time Energy Output," *IEEE J. Solid-State Circuits*, vol. 49, no. 1, pp. 301–314, Jan. 2014, doi: 10.1109/JSSC.2013.2284351.
- [11] S. Mandai and E. Charbon, "A 4 × 4 × 416 digital SiPM array with 192 TDCs for multiple high-resolution timestamp acquisition," *J. Instrum.*, vol. 8, no. 05, p. P05024, May 2013, doi: 10.1088/1748-0221/8/05/P05024.
- [12] V. C. Spanoudaki and C. S. Levin, "Photo-Detectors for Time of Flight Positron Emission Tomography (ToF-PET)," *Sensors*, vol. 10, no. 11, Art. no. 11, Nov. 2010, doi: 10.3390/s101110484.
- [13] D. A. Boas, A. M. Dale, and M. A. Franceschini, "Diffuse optical imaging of brain activation: approaches to optimizing image sensitivity, resolution, and accuracy," *NeuroImage*, vol. 23, pp. S275–S288, Jan. 2004, doi: 10.1016/j.neuroimage.2004.07.011.

- [14] S. Moazeni, K. Renehan, E. H. Pollmann, and K. L. Shepard, "An Integrated-Circuit Node for High-Spatiotemporal Resolution Time-Domain Near-Infrared Diffuse Optical Tomography Imaging Arrays," *IEEE J. Solid-State Circuits*, vol. 58, no. 5, pp. 1376–1385, May 2023, doi: 10.1109/JSSC.2022.3223854.
- [15] M. D. Wheelock, J. P. Culver, and A. T. Eggebrecht, "High-density diffuse optical tomography for imaging human brain function," *Rev. Sci. Instrum.*, vol. 90, no. 5, p. 051101, May 2019, doi: 10.1063/1.5086809.
- [16] T. Kim *et al.*, "Injectable, Cellular-Scale Optoelectronics with Applications for Wireless Optogenetics," *Science*, vol. 340, no. 6129, pp. 211–216, Apr. 2013, doi: 10.1126/science.1232437.
- [17] C. Lee, B. Johnson, T. Jung, and A. Molnar, "A 72 × 60 Angle-Sensitive SPAD Imaging Array for Lens-less FLIM," *Sensors*, vol. 16, no. 9, Art. no. 9, Sep. 2016, doi: 10.3390/s16091422.
- [18] X. Qian, W. Jiang, A. Elsharabasy, and M. J. Deen, "Modeling for Single-Photon Avalanche Diodes: State-of-the-Art and Research Challenges," *Sensors*, vol. 23, no. 7, Art. no. 7, Jan. 2023, doi: 10.3390/s23073412.
- [19] R. Granja, M. Santos, J. Guilherme, and N. Horta, "11.7b Time-To-Digital Converter with 0.82ps resolution in 130nm CMOS Technology," in 2018 14th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME), IEEE, Jul. 2018, pp. 29–32. doi: 10.1109/PRIME.2018.8430374.
- [20] J.-C. C. Lai and T.-Y. Y. Hsu, "Cost-Effective Time-to-Digital Converter Using Time-Residue Feedback," *IEEE Trans. Ind. Electron.*, vol. 64, no. 6, pp. 4690–4700, Jun. 2017, doi: 10.1109/TIE.2017.2669883.
- [21] S.-J. Kim, W. Kim, M. Song, J. Kim, T. Kim, and H. Park, "15.5 A 0.6V 1.17ps PVT-tolerant and synthesizable time-to-digital converter using stochastic phase interpolation with 16× spatial redundancy in 14nm FinFET technology," in 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers, 2015, pp. 1–3. doi: 10.1109/ISSCC.2015.7063035.
- [22] F. A. C. Azevedo *et al.*, "Equal numbers of neuronal and nonneuronal cells make the human brain an isometrically scaled-up primate brain," *J. Comp. Neurol.*, vol. 513, no. 5, pp. 532–541, 2009, doi: 10.1002/cne.21974.
- [23] D. Petit, J.-F. Gagnon, M. L. Fantini, L. Ferini-Strambi, and J. Montplaisir, "Sleep and quantitative EEG in neurodegenerative disorders," *J. Psychosom. Res.*, vol. 56, no. 5, pp. 487–496, May 2004, doi: 10.1016/j.jpsychores.2004.02.001.
- [24] R. Bauer, M. Fels, M. Vukelić, U. Ziemann, and A. Gharabaghi, "Bridging the gap between motor imagery and motor execution with a brain-robot interface," *NeuroImage*, vol. 108, pp. 319–327, Mar. 2015, doi: 10.1016/j.neuroimage.2014.12.026.
- [25] G. N. Angotzi *et al.*, "SiNAPS: An implantable active pixel sensor CMOS-probe for simultaneous large-scale neural recordings," *Biosens. Bioelectron.*, vol. 126, pp. 355–364, Feb. 2019, doi: 10.1016/j.bios.2018.10.032.

- [26] S.-Y. Park, J. Cho, K. Na, and E. Yoon, "Modular 128-Channel Δ-ΔΣ Analog Front-End Architecture Using Spectrum Equalization Scheme for 1024-Channel 3-D Neural Recording Microsystems," *IEEE J. Solid-State Circuits*, vol. 53, no. 2, pp. 501–514, Feb. 2018, doi: 10.1109/JSSC.2017.2764053.
- [27] B. C. Raducanu *et al.*, "Time Multiplexed Active Neural Probe with 1356 Parallel Recording Sites," *Sensors*, vol. 17, no. 10, Art. no. 10, Oct. 2017, doi: 10.3390/s17102388.
- [28] J. J. Jun *et al.*, "Fully integrated silicon probes for high-density recording of neural activity," *Nature*, vol. 551, no. 7679, Art. no. 7679, Nov. 2017, doi: 10.1038/nature24636.
- [29] C. Mora Lopez *et al.*, "A Neural Probe With Up to 966 Electrodes and Up to 384 Configurable Channels in 0.13 μm SOI CMOS," *IEEE Trans. Biomed. Circuits Syst.*, vol. 11, no. 3, pp. 510–522, Jun. 2017, doi: 10.1109/TBCAS.2016.2646901.
- [30] M. Leber *et al.*, "Advances in Penetrating Multichannel Microelectrodes Based on the Utah Array Platform," in *Neural Interface: Frontiers and Applications*, X. Zheng, Ed., in Advances in Experimental Medicine and Biology., Singapore: Springer, 2019, pp. 1–40. doi: 10.1007/978-981-13-2050-7\_1.
- [31] A. C. Paulk *et al.*, "Large-scale neural recordings with single neuron resolution using Neuropixels probes in human cortex," *Nat. Neurosci.*, vol. 25, no. 2, Art. no. 2, Feb. 2022, doi: 10.1038/s41593-021-00997-0.
- [32] J. E. Chung *et al.*, "High-Density, Long-Lasting, and Multi-region Electrophysiological Recordings Using Polymer Electrode Arrays," *Neuron*, vol. 101, no. 1, pp. 21-31.e5, Jan. 2019, doi: 10.1016/j.neuron.2018.11.002.
- [33] R. J. J. van Daal *et al.*, "Implantation of Neuropixels probes for chronic recording of neuronal activity in freely behaving mice and rats," *Nat. Protoc.*, vol. 16, no. 7, Art. no. 7, Jul. 2021, doi: 10.1038/s41596-021-00539-9.
- [34] T. Milekovic *et al.*, "Stable long-term BCI-enabled communication in ALS and locked-in syndrome using LFP signals," *J. Neurophysiol.*, vol. 120, no. 1, pp. 343–360, Jul. 2018, doi: 10.1152/jn.00493.2017.
- [35] F. R. Willett, D. T. Avansino, L. R. Hochberg, J. M. Henderson, and K. V. Shenoy, "High-performance brain-to-text communication via handwriting," *Nature*, vol. 593, no. 7858, Art. no. 7858, May 2021, doi: 10.1038/s41586-021-03506-2.
- [36] F. Luo, Y. Wei, Z. Wang, M. Luo, and J. Hu, "Genetically Encoded Neural Activity Indicators," *Brain Sci. Adv.*, vol. 4, no. 1, pp. 1–15, Sep. 2018, doi: 10.26599/BSA.2018.9050007.
- [37] J. Nakanishi, T. Takarada, S. Yunoki, Y. Kikuchi, and M. Maeda, "FRET-based monitoring of conformational change of the β2 adrenergic receptor in living cells," *Biochem. Biophys. Res. Commun.*, vol. 343, no. 4, pp. 1191–1196, May 2006, doi: 10.1016/j.bbrc.2006.03.064.
- [38] M. Z. Lin and M. J. Schnitzer, "Genetically encoded indicators of neuronal activity," *Nat. Neurosci.*, vol. 19, no. 9, pp. 1142–1153, 2016, doi: 10.1038/nn.4359.
- [39] J. Joshi, M. Rubart, and W. Zhu, "Optogenetics: Background, Methodological Advances and Potential Applications for Cardiovascular Research and Medicine," *Front. Bioeng. Biotechnol.*, vol. 7, 2020, Accessed: May 08, 2023. [Online]. Available: https://www.frontiersin.org/articles/10.3389/fbioe.2019.00466

- [40] D. Krueger, E. Izquierdo, R. Viswanathan, J. Hartmann, C. Pallares Cartes, and S. De Renzis, "Principles and applications of optogenetics in developmental biology," *Development*, vol. 146, no. 20, p. dev175067, Oct. 2019, doi: 10.1242/dev.175067.
- [41] L. Zou *et al.*, "Self-assembled multifunctional neural probes for precise integration of optogenetics and electrophysiology," *Nat. Commun.*, vol. 12, no. 1, Art. no. 1, Oct. 2021, doi: 10.1038/s41467-021-26168-0.
- [42] A. Kazemipour *et al.*, "Kilohertz frame-rate two-photon tomography," *Nat. Methods*, vol. 16, no. 8, Art. no. 8, Aug. 2019, doi: 10.1038/s41592-019-0493-9.
- [43] I. Veilleux, J. A. Spencer, D. P. Biss, D. Cote, and C. P. Lin, "In Vivo Cell Tracking With Video Rate Multimodality Laser Scanning Microscopy," *IEEE J. Sel. Top. Quantum Electron.*, vol. 14, no. 1, pp. 10–18, Jan. 2008, doi: 10.1109/JSTQE.2007.912751.
- [44] K. M. N. S. Nadella *et al.*, "Random-access scanning microscopy for 3D imaging in awake behaving animals," *Nat. Methods*, vol. 13, no. 12, Art. no. 12, Dec. 2016, doi: 10.1038/nmeth.4033.
- [45] V. Voleti *et al.*, "Real-time volumetric microscopy of in vivo dynamics and large-scale samples with SCAPE 2.0," *Nat. Methods*, vol. 16, no. 10, Art. no. 10, Oct. 2019, doi: 10.1038/s41592-019-0579-4.
- [46] A. J. Taal, C. Lee, J. Choi, B. Hellenkamp, and K. L. Shepard, "Toward implantable devices for angle-sensitive, lens-less, multifluorescent, single-photon lifetime imaging in the brain using Fabry–Perot and absorptive color filters," *Light Sci. Appl.*, vol. 11, no. 1, Art. no. 1, Jan. 2022, doi: 10.1038/s41377-022-00708-9.
- [47] S. Moazeni *et al.*, "A Mechanically Flexible, Implantable Neural Interface for Computational Imaging and Optogenetic Stimulation over 5.4×5.4mm2 FoV," *IEEE Trans. Biomed. Circuits Syst.*, vol. 15, no. 6, pp. 1295–1305, Dec. 2021, doi: 10.1109/TBCAS.2021.3138334.
- [48] J. Choi *et al.*, "A 512-Pixel, 51-kHz-Frame-Rate, Dual-Shank, Lens-Less, Filter-Less Single-Photon Avalanche Diode CMOS Neural Imaging Probe," *IEEE J. Solid-State Circuits*, vol. 54, no. 11, pp. 2957–2968, Nov. 2019, doi: 10.1109/JSSC.2019.2941529.
- [49] O. Shimomura, "Structure of the chromophore of Aequorea green fluorescent protein," *FEBS Lett.*, vol. 104, no. 2, pp. 220–222, 1979, doi: 10.1016/0014-5793(79)80818-2.
- [50] M. Chalfie, Y. Tu, G. Euskirchen, W. W. Ward, and D. C. Prasher, "Green Fluorescent Protein as a Marker for Gene Expression," *Science*, vol. 263, no. 5148, pp. 802–805, Feb. 1994, doi: 10.1126/science.8303295.
- [51] R. Heim, A. B. Cubitt, and R. Y. Tsien, "Improved green fluorescence," *Nature*, vol. 373, no. 6516, Art. no. 6516, Feb. 1995, doi: 10.1038/373663b0.
- [52] A. Attardo, J. E. Fitzgerald, and M. J. Schnitzer, "Impermanence of dendritic spines in live adult CA1 hippocampus," *Nature*, vol. 523, no. 7562, Art. no. 7562, Jul. 2015, doi: 10.1038/nature14467.

- [53] A. Holtmaat *et al.*, "Long-term, high-resolution imaging in the mouse neocortex through a chronic cranial window," *Nat. Protoc.*, vol. 4, no. 8, Art. no. 8, Aug. 2009, doi: 10.1038/nprot.2009.89.
- [54] J. T. Trachtenberg *et al.*, "Long-term in vivo imaging of experience-dependent synaptic plasticity in adult cortex," *Nature*, vol. 420, no. 6917, Art. no. 6917, Dec. 2002, doi: 10.1038/nature01273.
- [55] E. S. Boyden, F. Zhang, E. Bamberg, G. Nagel, and K. Deisseroth, "Millisecond-timescale, genetically targeted optical control of neural activity," *Nat. Neurosci.*, vol. 8, no. 9, Art. no. 9, Sep. 2005, doi: 10.1038/nn1525.
- [56] R. Prevedel *et al.*, "Simultaneous whole-animal 3D imaging of neuronal activity using light-field microscopy," *Nat. Methods*, vol. 11, no. 7, Art. no. 7, Jul. 2014, doi: 10.1038/nmeth.2964.
- [57] M. Broxton *et al.*, "Wave optics theory and 3-D deconvolution for the light field microscope," *Opt. Express*, vol. 21, no. 21, pp. 25418–25439, Oct. 2013, doi: 10.1364/OE.21.025418.
- [58] J. Lecoq *et al.*, "Visualizing mammalian brain area interactions by dual-axis two-photon calcium imaging," *Nat. Neurosci.*, vol. 17, no. 12, Art. no. 12, Dec. 2014, doi: 10.1038/nn.3867.
- [59] W. Denk, J. H. Strickler, and W. W. Webb, "Two-Photon Laser Scanning Fluorescence Microscopy," *Science*, vol. 248, no. 4951, pp. 73–76, Apr. 1990, doi: 10.1126/science.2321027.
- [60] J. W. Lichtman and W. Denk, "The Big and the Small: Challenges of Imaging the Brain's Circuits," *Science*, vol. 334, no. 6056, pp. 618–623, Nov. 2011, doi: 10.1126/science.1209168.
- [61] J. T. Trachtenberg *et al.*, "Long-term in vivo imaging of experience-dependent synaptic plasticity in adult cortex," *Nature*, vol. 420, no. 6917, Art. no. 6917, Dec. 2002, doi: 10.1038/nature01273.
- [62] B. F. Grewe, D. Langer, H. Kasper, B. M. Kampa, and F. Helmchen, "High-speed in vivo calcium imaging reveals neuronal network activity with near-millisecond precision," *Nat. Methods*, vol. 7, no. 5, Art. no. 5, May 2010, doi: 10.1038/nmeth.1453.
- [63] V. Iyer, T. M. Hoogland, and P. Saggau, "Fast Functional Imaging of Single Neurons Using Random-Access Multiphoton (RAMP) Microscopy," *J. Neurophysiol.*, vol. 95, no. 1, pp. 535–545, Jan. 2006, doi: 10.1152/jn.00865.2005.
- [64] R. J. Cotton, E. Froudarakis, P. Storer, P. Saggau, and A. Tolias, "Three-dimensional mapping of microcircuit correlation structure," *Front. Neural Circuits*, vol. 7, 2013, Accessed: Mar. 20, 2023. [Online]. Available: https://www.frontiersin.org/articles/10.3389/fncir.2013.00151
- [65] N. J. Sofroniew, D. Flickinger, J. King, and K. Svoboda, "A large field of view two-photon mesoscope with subcellular resolution for in vivo imaging," *eLife*, vol. 5, p. e14472, Jun. 2016, doi: 10.7554/eLife.14472.
- [66] A. Cheng, J. T. Gonçalves, P. Golshani, K. Arisaka, and C. Portera-Cailliau, "Simultaneous two-photon calcium imaging at different depths with spatiotemporal multiplexing," *Nat. Methods*, vol. 8, no. 2, Art. no. 2, Feb. 2011, doi: 10.1038/nmeth.1552.

- [67] J. N. Stirman, I. T. Smith, M. W. Kudenov, and S. L. Smith, "Wide field-of-view, multi-region, two-photon imaging of neuronal activity in the mammalian brain," *Nat. Biotechnol.*, vol. 34, no. 8, pp. 857–862, 2016, doi: 10.1038/nbt.3594.
- [68] M. Oheim, E. Beaurepaire, E. Chaigneau, J. Mertz, and S. Charpak, "Two-photon microscopy in brain tissue: parameters influencing the imaging depth," *J. Neurosci. Methods*, vol. 111, no. 1, pp. 29–37, Oct. 2001, doi: 10.1016/S0165-0270(01)00438-1.
- [69] W. Yang *et al.*, "Simultaneous Multi-plane Imaging of Neural Circuits," *Neuron*, vol. 89, no. 2, pp. 269–284, Jan. 2016, doi: 10.1016/j.neuron.2015.12.012.
- [70] J. Huisken, J. Swoger, F. Del Bene, J. Wittbrodt, and E. H. K. Stelzer, "Optical Sectioning Deep Inside Live Embryos by Selective Plane Illumination Microscopy," *Science*, vol. 305, no. 5686, pp. 1007–1009, Aug. 2004, doi: 10.1126/science.1100035.
- [71] R. M. Power and J. Huisken, "A guide to light-sheet fluorescence microscopy for multiscale imaging," *Nat. Methods*, vol. 14, no. 4, Art. no. 4, Apr. 2017, doi: 10.1038/nmeth.4224.
- [72] T. F. Holekamp, D. Turaga, and T. E. Holy, "Fast Three-Dimensional Fluorescence Imaging of Activity in Neural Populations by Objective-Coupled Planar Illumination Microscopy," *Neuron*, vol. 57, no. 5, pp. 661–672, Mar. 2008, doi: 10.1016/j.neuron.2008.01.011.
- [73] P. J. Keller, A. D. Schmidt, J. Wittbrodt, and E. H. K. Stelzer, "Reconstruction of Zebrafish Early Embryonic Development by Scanned Light Sheet Microscopy," *Science*, vol. 322, no. 5904, pp. 1065–1069, Nov. 2008, doi: 10.1126/science.1162493.
- [74] R. Tomer, K. Khairy, F. Amat, and P. J. Keller, "Quantitative high-speed imaging of entire developing embryos with simultaneous multiview light-sheet microscopy," *Nat. Methods*, vol. 9, no. 7, Art. no. 7, Jul. 2012, doi: 10.1038/nmeth.2062.
- [75] M. B. Ahrens, M. B. Orger, D. N. Robson, J. M. Li, and P. J. Keller, "Whole-brain functional imaging at cellular resolution using light-sheet microscopy," *Nat. Methods*, vol. 10, no. 5, pp. 413–420, 2013, doi: 10.1038/nmeth.2434.
- [76] X. Chen *et al.*, "Brain-wide Organization of Neuronal Activity and Convergent Sensorimotor Transformations in Larval Zebrafish," *Neuron*, vol. 100, no. 4, pp. 876-890.e5, Nov. 2018, doi: 10.1016/j.neuron.2018.09.042.
- [77] E. A. Susaki *et al.*, "Whole-Brain Imaging with Single-Cell Resolution Using Chemical Cocktails and Computational Analysis," *Cell*, vol. 157, no. 3, pp. 726–739, Apr. 2014, doi: 10.1016/j.cell.2014.03.042.
- [78] R. Tomer, L. Ye, B. Hsueh, and K. Deisseroth, "Advanced CLARITY for rapid and high-resolution imaging of intact tissues," *Nat. Protoc.*, vol. 9, no. 7, Art. no. 7, Jul. 2014, doi: 10.1038/nprot.2014.123.
- [79] M. B. Bouchard *et al.*, "Swept confocally-aligned planar excitation (SCAPE) microscopy for high-speed volumetric imaging of behaving organisms," *Nat. Photonics*, vol. 9, no. 2, Art. no. 2, Feb. 2015, doi: 10.1038/nphoton.2014.323.

- [80] E. M. C. Hillman, V. Voleti, W. Li, and H. Yu, "Light-Sheet Microscopy in Neuroscience," *Annu. Rev. Neurosci.*, vol. 42, no. 1, pp. 295–313, 2019, doi: 10.1146/annurev-neuro-070918-050357.
- [81] M. Wang, C. Wu, D. Sinefeld, B. Li, F. Xia, and C. Xu, "Comparing the effective attenuation lengths for long wavelength in vivo imaging of the mouse brain," *Biomed. Opt. Express*, vol. 9, no. 8, pp. 3534–3543, 2018, doi: 10.1364/BOE.9.003534.
- [82] D. G. Ouzounov *et al.*, "In vivo three-photon imaging of activity of GCaMP6-labeled neurons deep in intact mouse brain," *Nat. Methods*, vol. 14, no. 4, Art. no. 4, Apr. 2017, doi: 10.1038/nmeth.4183.
- [83] M. Yildirim, H. Sugihara, P. T. C. So, and M. Sur, "Functional imaging of visual cortical layers and subplate in awake mice with optimized three-photon microscopy," *Nat. Commun.*, vol. 10, no. 1, Art. no. 1, Jan. 2019, doi: 10.1038/s41467-018-08179-6.
- [84] K. Podgorski and G. Ranganathan, "Brain heating induced by near-infrared lasers during multiphoton microscopy," *J. Neurophysiol.*, vol. 116, no. 3, pp. 1012–1023, Sep. 2016, doi: 10.1152/jn.00275.2016.
- [85] W. Jiang, Y. Chalich, and M. J. Deen, "Sensors for Positron Emission Tomography Applications," *Sensors*, vol. 19, no. 22, Art. no. 22, Jan. 2019, doi: 10.3390/s19225019.
- [86] W. Jiang, Y. Chalich, R. Scott, and M. J. Deen, "Time-Gated and Multi-Junction SPADs in Standard 65 nm CMOS Technology," *IEEE Sens. J.*, vol. 21, no. 10, pp. 12092–12103, May 2021, doi: 10.1109/JSEN.2021.3063319.
- [87] J. Choi *et al.*, "Fully Integrated Time-Gated 3D Fluorescence Imager for Deep Neural Imaging," *IEEE Trans. Biomed. Circuits Syst.*, vol. 14, no. 4, pp. 636–645, Aug. 2020, doi: 10.1109/TBCAS.2020.3008513.
- [88] E. H. Pollmann, Y. Gilhotra, H. Yin, and K. L. Shepard, "Fully Implantable 192×256 SPAD Sensor with Global-Shutter and Micro-LEDs for Bidirectional Subdural Optical Brain-Computer Interfaces," pp. 205–208, Nov. 2022, doi: 10.1109/ESSCIRC55480.2022.9911350.
- [89] S. Kim, P. Tathireddy, R. A. Normann, and F. Solzbacher, "Thermal impact of an active 3-D microelectrode array implanted in the brain," *IEEE Trans. Neural Syst. Rehabil. Eng.*, vol. 15, no. 4, pp. 493–501, Dec. 2007, doi: 10.1109/TNSRE.2007.908429.
- [90] H. Wang *et al.*, "Brain temperature and its fundamental properties: a review for clinical neuroscientists," *Front. Neurosci.*, vol. 8, 2014, Accessed: May 08, 2023. [Online]. Available: https://www.frontiersin.org/articles/10.3389/fnins.2014.00307
- [91] B. Fischl and A. M. Dale, "Measuring the thickness of the human cerebral cortex from magnetic resonance images," *Proc. Natl. Acad. Sci.*, vol. 97, no. 20, pp. 11050–11055, Sep. 2000, doi: 10.1073/pnas.200033797.
- [92] H. Zhao, "Recent Progress of Development of Optogenetic Implantable Neural Probes," *Int. J. Mol. Sci.*, vol. 18, no. 8, Art. no. 8, Aug. 2017, doi: 10.3390/ijms18081751.

- [93] W. Jiang, R. Scott, and M. J. Deen, "Differential Quench and Reset Circuit for Single-Photon Avalanche Diodes," *J. Light. Technol.*, vol. 39, no. 22, pp. 7334–7342, Nov. 2021, doi: 10.1109/JLT.2021.3111119.
- [94] W. Jiang, R. Scott, and M. J. Deen, "High-Speed Active Quench and Reset Circuit for SPAD in a Standard 65 nm CMOS Technology," *IEEE Photonics Technol. Lett.*, vol. 33, no. 24, pp. 1431–1434, Dec. 2021, doi: 10.1109/LPT.2021.3124989.
- [95] D. Yatsenko, L. C. Moreaux, J. Choi, A. S. Tolias, K. L. Shepard, and M. L. Roukes, "Signal separability in integrated neurophotonics," Oct. 05, 2020, *bioRxiv*. doi: 10.1101/2020.09.27.315556.
- [96] J.-W. Jeong *et al.*, "Wireless Optofluidic Systems for Programmable In Vivo Pharmacology and Optogenetics," *Cell*, vol. 162, no. 3, pp. 662–674, Jul. 2015, doi: 10.1016/j.cell.2015.06.058.
- [97] Y. N. Kang, N. Chou, J.-W. Jang, H. K. Choe, and S. Kim, "A 3D flexible neural interface based on a microfluidic interconnection cable capable of chemical delivery," *Microsyst. Nanoeng.*, vol. 7, no. 1, Art. no. 1, Aug. 2021, doi: 10.1038/s41378-021-00295-6.
- [98] A. N. Zorzos, C. G. Fonstad, J. Scholvin, and E. S. Boyden, "Three-dimensional multiwaveguide probe array for light delivery to distributed brain circuits," *Opt. Lett.*, vol. 37, no. 23, pp. 4841–4843, Dec. 2012, doi: 10.1364/OL.37.004841.
- [99] K. E. Parker *et al.*, "Customizable, wireless and implantable neural probe design and fabrication via 3D printing," *Nat. Protoc.*, vol. 18, no. 1, Art. no. 1, Jan. 2023, doi: 10.1038/s41596-022-00758-8.
- [100] C. Keum, C. Murawski, E. Archer, S. Kwon, A. Mischok, and M. C. Gather, "A substrateless, flexible, and water-resistant organic light-emitting diode," *Nat. Commun.*, vol. 11, no. 1, Art. no. 1, Dec. 2020, doi: 10.1038/s41467-020-20016-3.
- [101] J. E. Chung *et al.*, "A Fully Automated Approach to Spike Sorting," *Neuron*, vol. 95, no. 6, pp. 1381-1394.e6, Sep. 2017, doi: 10.1016/j.neuron.2017.08.030.
- [102] P. Yger *et al.*, "A spike sorting toolbox for up to thousands of electrodes validated with ground truth recordings in vitro and in vivo," *eLife*, vol. 7, p. e34518, Mar. 2018, doi: 10.7554/eLife.34518.
- [103] M. Saif-ur-Rehman *et al.*, "SpikeDeep-classifier: a deep-learning based fully automatic offline spike sorting algorithm," *J. Neural Eng.*, vol. 18, no. 1, p. 016009, Feb. 2021, doi: 10.1088/1741-2552/abc8d4.
- [104] F. Xing, Y. Xie, H. Su, F. Liu, and L. Yang, "Deep Learning in Microscopy Image Analysis: A Survey," *IEEE Trans. Neural Netw. Learn. Syst.*, vol. 29, no. 10, pp. 4550–4568, Oct. 2018, doi: 10.1109/TNNLS.2017.2766168.
- [105] A. Abrol *et al.*, "Deep learning encodes robust discriminative neuroimaging representations to outperform standard machine learning," *Nat. Commun.*, vol. 12, no. 1, Art. no. 1, Jan. 2021, doi: 10.1038/s41467-020-20655-6.
- [106] G. Kook, S. W. Lee, H. C. Lee, I.-J. Cho, and H. J. Lee, "Neural Probes for Chronic Applications," *Micromachines*, vol. 7, no. 10, Art. no. 10, Oct. 2016, doi: 10.3390/mi7100179.

- [107] R. B. Staszewski, S. Vemulapalli, P. Vallur, J. Wallberg, and P. T. Balsara, "1.3 V 20 ps time-to-digital converter for frequency synthesis in 90-nm CMOS," *IEEE Trans. Circuits Syst. II Express Briefs*, vol. 53, no. 3, pp. 220–224, Mar. 2006, doi: 10.1109/TCSII.2005.858754.
- [108] O. Bourrion and L. Gallin-Martel, "An integrated CMOS time-to-digital converter for coincidence detection in a liquid xenon PET prototype," *Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrometers Detect. Assoc. Equip.*, vol. 563, no. 1, pp. 100–103, Jul. 2006, doi: 10.1016/j.nima.2006.01.071.
- [109] M. Zanuso, P. Madoglio, S. Levantino, C. Samori, and A. L. Lacaita, "Time-to-Digital Converter for Frequency Synthesis Based on a Digital Bang-Bang DLL," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 57, no. 3, pp. 548–555, Mar. 2010, doi: 10.1109/TCSI.2009.2023945.
- [110] M. Kanoun, M. W. Ben Attouch, Y. Bérubé-Lauzière, and R. Fontaine, "A 10-Bit, 12 ps Resolution CMOS Time-to-Digital Converter Dedicated to Ultra-Fast Optical Timing Applications," *Circuits Syst. Signal Process.*, vol. 34, no. 4, pp. 1129–1148, 2015, doi: 10.1007/s00034-014-9901-7.
- [111] M. Zhang, C.-H. Chan, Y. Zhu, and R. P. Martins, "3.5 A 0.6V 13b 20MS/s Two-Step TDC-Assisted SAR ADC with PVT Tracking and Speed-Enhanced Techniques," in *2019 IEEE International Solid-State Circuits Conference - (ISSCC)*, Feb. 2019, pp. 66–68. doi: 10.1109/ISSCC.2019.8662350.
- [112] R. Enomoto, T. Iizuka, T. Koga, T. Nakura, and K. Asada, "A 16-bit 2.0-ps Resolution Two-Step TDC in 0.18-μm CMOS Utilizing Pulse-Shrinking Fine Stage With Built-In Coarse Gain Calibration," *IEEE Trans. Very Large Scale Integr. VLSI Syst.*, vol. 27, no. 1, pp. 11–19, Jan. 2019, doi: 10.1109/TVLSI.2018.2867505.
- [113] T. E. Rahkonen and J. T. Kostamovaara, "The use of stabilized CMOS delay lines for the digitization of short time intervals," *IEEE J. Solid-State Circuits*, vol. 28, no. 8, pp. 887–894, Aug. 1993, doi: 10.1109/4.231325.
- [114] D. M. Santos, S. F. Dow, J. M. Flasck, and M. E. Levi, "A CMOS delay locked loop and sub-nanosecond time-to-digital converter chip," *IEEE Trans. Nucl. Sci.*, vol. 43, no. 3, pp. 1717–1719, Jun. 1996, doi: 10.1109/23.507177.
- [115] J. G. Maneatis, "Low-Jitter Process-Independent DLL and PLL Based on Self-Biased Techniques," *IEEE J. Solid-State Circuits*, vol. 31, no. 11, pp. 1723–1732, Nov. 1996.
- [116] B. Razavi, "The Delay-Locked Loop [A Circuit for All Seasons]," *IEEE Solid-State Circuits Mag.*, vol. 10, no. 3, pp. 9–15, 2018, doi: 10.1109/MSSC.2018.2844615.
- [117] T. E. Rahkonen and J. T. Kostamovaara, "The use of stabilized CMOS delay lines for the digitization of short time intervals," *IEEE J. Solid-State Circuits*, vol. 28, no. 8, pp. 887–894, Aug. 1993, doi: 10.1109/4.231325.
- [118] N. U. Andersson and M. Vesterbacka, "A Vernier Time-to-Digital Converter With Delay Latch Chain Architecture," *IEEE Trans. Circuits Syst. II Express Briefs*, vol. 61, no. 10, pp. 773–777, Oct. 2014, doi: 10.1109/TCSII.2014.2345289.
- [119] P. Dudek, S. Szczepanski, and J. V. Hatfield, "A high-resolution CMOS time-to-digital converter utilizing a Vernier delay line," *IEEE J. Solid-State Circuits*, vol. 35, no. 2, pp. 240–247, Feb. 2000, doi: 10.1109/4.823449.

- [120] B. Markovic, S. Tisa, F. A. Villa, A. Tosi, and F. Zappa, "A High-Linearity, 17 ps Precision Time-to-Digital Converter Based on a Single-Stage Vernier Delay Loop Fine Interpolation," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 60, no. 3, pp. 557–569, Mar. 2013, doi: 10.1109/TCSI.2012.2215737.
- [121] Z. Cheng, M. J. Deen, and H. Peng, "A Low-Power Gateable Vernier Ring Oscillator Time-to-Digital Converter for Biomedical Imaging Applications," *IEEE Trans. Biomed. Circuits Syst.*, vol. 10, no. 2, pp. 445–454, Apr. 2016, doi: 10.1109/TBCAS.2015.2434957.
- [122] J. Yu, F. F. Dai, and R. C. Jaeger, "A 12-Bit Vernier Ring Time-to-Digital Converter in 0.13 μm CMOS Technology," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 830–842, Apr. 2010, doi: 10.1109/JSSC.2010.2040306.
- [123] P. Lu, A. Liscidini, and P. Andreani, "A 3.6 mW, 90 nm CMOS Gated-Vernier Time-to-Digital Converter With an Equivalent Resolution of 3.2 ps," *IEEE J. Solid-State Circuits*, vol. 47, no. 7, pp. 1626–1635, Jul. 2012, doi: 10.1109/JSSC.2012.2191676.
- [124] T. Iizuka, T. Koga, T. Nakura, and K. Asada, "A fine-resolution pulse-shrinking time-to-digital converter with completion detection utilizing built-in offset pulse," in *2016 IEEE Asian Solid-State Circuits Conference (A-SSCC)*, Nov. 2016, pp. 313–316. doi: 10.1109/ASSCC.2016.7844198.
- [125] R. Enomoto, T. Iizuka, T. Koga, T. Nakura, and K. Asada, "A 16-bit 2.0-ps Resolution Two-Step TDC in 0.18-μm CMOS Utilizing Pulse-Shrinking Fine Stage With Built-In Coarse Gain Calibration," *IEEE Trans. Very Large Scale Integr. VLSI Syst.*, vol. 27, no. 1, pp. 11–19, Jan. 2019, doi: 10.1109/TVLSI.2018.2867505.
- [126] E. Raisanen-Ruotsalainen, T. Rahkonen, and J. Kostamovaara, "A low-power CMOS time-to-digital converter," *IEEE J. Solid-State Circuits*, vol. 30, no. 9, pp. 984–990, Sep. 1995, doi: 10.1109/4.406397.
- [127] S.-J. Kim, W. Kim, M. Song, J. Kim, T. Kim, and H. Park, "15.5 A 0.6V 1.17ps PVT-tolerant and synthesizable time-to-digital converter using stochastic phase interpolation with 16× spatial redundancy in 14nm FinFET technology," in *2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers*, Feb. 2015, pp. 1–3. doi: 10.1109/ISSCC.2015.7063035.
- [128] R. Enomoto, T. Iizuka, T. Koga, T. Nakura, and K. Asada, "A 16-bit 2.0-ps Resolution Two-Step TDC in 0.18-μm CMOS Utilizing Pulse-Shrinking Fine Stage With Built-In Coarse Gain Calibration," *IEEE Trans. Very Large Scale Integr. VLSI Syst.*, vol. 27, no. 1, pp. 11–19, Jan. 2019, doi: 10.1109/TVLSI.2018.2867505.
- [129] K. Kim, Y.-H. H. Kim, W. Yu, and S. Cho, "A 7 bit, 3.75 ps Resolution Two-Step Time-to-Digital Converter in 65 nm CMOS Using Pulse-Train Time Amplifier," *IEEE J. Solid-State Circuits*, vol. 48, no. 4, pp. 1009–1017, Apr. 2013, doi: 10.1109/JSSC.2013.2237996.
- [130] K. Kim, W. Yu, and S. Cho, "A 9 bit, 1.12 ps resolution 2.5 b/stage pipelined time-to-digital converter in 65 nm CMOS using time-register," *IEEE J. Solid-State Circuits*, vol. 49, no. 4, pp. 1007–1016, Apr. 2014, doi: 10.1109/JSSC.2013.2297412.
- [131] T. Nakura, S. Mandai, M. Ikeda, and K. Asada, "Time difference amplifier using closed-loop gain control," in *2009 Symposium on VLSI Circuits*, Jun. 2009, pp. 208–209. Accessed: Nov. 04, 2023. [Online]. Available: https://ieeexplore.ieee.org/document/5205372

- [132] M. Lee and A. A. Abidi, "A 9 b, 1.25 ps resolution coarse-fine time-to-digital converter in 90 nm CMOS that amplifies a time residue," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 769–777, Apr. 2008, doi: 10.1109/JSSC.2008.917405.
- [133] J. Yu, F. F. Dai, and R. C. Jaeger, "A 12-Bit Vernier Ring Time-to-Digital Converter in 0.13 μm CMOS Technology," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 830–842, Apr. 2010, doi: 10.1109/JSSC.2010.2040306.
- [134] S. Ur Rehman, M. M. Khafaji, C. Carta, and F. Ellinger, "A 25-Gb/s 270-mW Time-to-Digital Converter-Based 8× Oversampling Input-Delayed Data-Receiver in 45-nm SOI CMOS," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 65, no. 11, pp. 3720–3733, Nov. 2018, doi: 10.1109/TCSI.2018.2851294.
- [135] K. Gammoh, C. K. Peterson, D. A. Penry, and S.-H. W. Chiang, "Linearity Theory of Stochastic Phase-Interpolation Time-to-Digital Converter," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 67, no. 12, pp. 4348–4359, 2020, doi: 10.1109/TCSI.2020.3013709.
- [136] H. Molaei and K. Hajsadeghi, "A 5.3-ps, 8-b Time to Digital Converter Using a New Gain-Reconfigurable Time Amplifier," *IEEE Trans. Circuits Syst. II Express Briefs*, vol. 66, no. 3, pp. 352–356, Mar. 2019, doi: 10.1109/TCSII.2018.2853187.
- [137] H. Wang, F. F. Dai, and H. Wang, "A Reconfigurable Vernier Time-to-Digital Converter With 2-D Spiral Comparator Array and Second-Order ΔΣ Linearization," *IEEE J. Solid-State Circuits*, vol. 53, no. 3, pp. 738–749, Mar. 2018, doi: 10.1109/JSSC.2017.2788872.
- [138] N. Roy, F. Nolet, F. Dubois, M.-O. Mercier, R. Fontaine, and J.-F. Pratte, "Low Power and Small Area, 6.9 ps RMS Time-to-Digital Converter for 3-D Digital SiPM," *IEEE Trans. Radiat. Plasma Med. Sci.*, vol. 1, no. 6, pp. 486–494, Nov. 2017, doi: 10.1109/TRPMS.2017.2757444.
- [139] R. K. Henderson *et al.*, "A 192×128 Time Correlated SPAD Image Sensor in 40-nm CMOS Technology," *IEEE J. Solid-State Circuits*, vol. 54, no. 7, pp. 1907–1916, Jul. 2019, doi: 10.1109/JSSC.2019.2905163.
- [140] W. M. Reichert, *Indwelling Neural Implants: Strategies for Contending with the In Vivo Environment*, in Frontiers in Neuroengineering. Boca Raton: CRC Press, 2008.
- [141] C. Hu, *Modern Semiconductor Devices for Integrated Circuits*. Prentice Hall, 2010. [Online]. Available: https://books.google.com/books?id=PosRbWdafnsC
- [142] S. Wang, Z. Lu, K. Xu, H. Dai, Z. Wu, and X. Yu, "A Sub-1-V Nanopower MOS-Only Voltage Reference," *J. Low Power Electron. Appl.*, vol. 14, no. 1, Art. no. 1, Mar. 2024, doi: 10.3390/jlpea14010013.
- [143] R. J. Baker, *CMOS Circuit Design, Layout, and Simulation*, 3rd ed., in IEEE Press Series on Microelectronic Systems. 2010.
- [144] Bui Van Hieu, Seunghyun Beak, Seunghwan Choi, Jongkook Seon, and T. T. Jeong, "Thermometer-to-binary encoder with bubble error correction (BEC) circuit for Flash Analog-to-Digital Converter (FADC)," *Int. Conf. Commun. Electron. 2010*, pp. 102–106, Aug. 2010, doi: 10.1109/ICCE.2010.5670690.

- [145] S. Tancock, E. Arabul, and N. Dahnoun, "A Review of New Time-to-Digital Conversion Techniques," *IEEE Trans. Instrum. Meas.*, vol. 68, no. 10, pp. 3406–3417, Oct. 2019, doi: 10.1109/TIM.2019.2936717.
- [146] M. Mota and J. Christiansen, "A high-resolution time interpolator based on a delay locked loop and an RC delay line," *IEEE J. Solid-State Circuits*, vol. 34, no. 10, pp. 1360–1366, Oct. 1999, doi: 10.1109/4.792603.
- [147] A. Carimatto *et al.*, "11.4 A 67,392-SPAD PVTB-compensated multi-channel digital SiPM with 432 column-parallel 48ps 17b TDCs for endoscopic time-of-flight PET," in *2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers*, Feb. 2015, pp. 1–3. doi: 10.1109/ISSCC.2015.7062996.
- [148] A. Carimatto *et al.*, "Multipurpose, Fully Integrated 128 × 128 Event-Driven MD-SiPM With 512 16-Bit TDCs With 45-ps LSB and 20-ns Gating in 40-nm CMOS Technology," *IEEE Solid-State Circuits Lett.*, vol. 1, no. 12, pp. 241–244, Dec. 2018, doi: 10.1109/LSSC.2019.2911043.
- [149] E. Manuzzato *et al.*, "A 16×8 Digital-SiPM Array With Distributed Trigger Generator for Low SNR Particle Tracking," *IEEE Solid-State Circuits Lett.*, vol. 2, no. 9, pp. 75–78, Sep. 2019, doi: 10.1109/LSSC.2019.2934598.
- [150] E. Conca *et al.*, "Large-Area, Fast-Gated Digital SiPM With Integrated TDC for Portable and Wearable Time-Domain NIRS," *IEEE J. Solid-State Circuits*, vol. 55, no. 11, pp. 3097–3111, Nov. 2020, doi: 10.1109/JSSC.2020.3006442.
- [151] V. F. de Lima and H. Klimach, "A 37 nW MOSFET-Only Voltage Reference in 0.13 µm CMOS," in *2020 33rd Symposium on Integrated Circuits and Systems Design (SBCCI)*, Aug. 2020, pp. 1–6. doi: 10.1109/SBCCI50935.2020.9189914.
- [152] M.-A. Tetrault, A. C. Therrien, E. D. Lamy, A. Boisvert, R. Fontaine, and J.-F. Pratte, "Dark Count Impact for First Photon Discriminators for SPAD Digital Arrays in PET," *IEEE Trans. Nucl. Sci.*, vol. 62, no. 3, pp. 719–726, Jun. 2015, doi: 10.1109/TNS.2015.2420795.
- [153] F. Arvani, T. C. Carusone, and E. S. Rogers, "TDC sharing in SPAD-based direct time-of-flight 3D imaging applications," in *Proceedings - IEEE International Symposium on Circuits and Systems*, IEEE, May 2019, pp. 1–5. doi: 10.1109/ISCAS.2019.8702586.
- [154] C. Veerappan *et al.*, "A 160×128 single-photon image sensor with on-pixel 55ps 10b time-to-digital converter," in *2011 IEEE International Solid-State Circuits Conference*, IEEE, Feb. 2011, pp. 312–314. doi: 10.1109/ISSCC.2011.5746333.
- [155] R. Szplet and A. Czuba, "Two-Stage Clock-Free Time-to-Digital Converter Based on Vernier and Tapped Delay Lines in FPGA Device," *Electronics*, vol. 10, no. 18, Art. no. 18, Jan. 2021, doi: 10.3390/electronics10182190.
- [156] M. Parsakordasiabi, I. Vornicu, A. Rodríguez-Vázquez, and R. Carmona-Galán, "A Low-Resources TDC for Multi-Channel Direct ToF Readout Based on a 28-nm FPGA," *Sensors*, vol. 21, no. 1, Art. no. 1, Jan. 2021, doi: 10.3390/s21010308.
- [157] P. Kwiatkowski and R. Szplet, "Efficient Implementation of Multiple Time Coding Lines-Based TDC in an FPGA Device," *IEEE Trans. Instrum. Meas.*, vol. 69, no. 10, pp. 7353–7364, Oct. 2020, doi: 10.1109/TIM.2020.2984929.