diff --git a/VLSI24/submitted_notebooks/LearnAFE/LearnAFE.ipynb b/VLSI24/submitted_notebooks/LearnAFE/LearnAFE.ipynb
index c1bba19..6f92b89 100644
--- a/VLSI24/submitted_notebooks/LearnAFE/LearnAFE.ipynb
+++ b/VLSI24/submitted_notebooks/LearnAFE/LearnAFE.ipynb
@@ -21,7 +21,7 @@
 "source": [
 "## Abstract\n",
 "\n",
- "This project presents a circuit-algorithm co-design of a learnable audio analog front-end (AFE) for Keyword Spotting (KWS). Instead of the traditional approach where the AFE for feature extraction is designed separately from the classifier design, this project showcase the inclusion of the AFE design within the neural network classifier training and verification loop. More specifically, the transistors' transconductance ($g_m$) and capcitances in a differential super source-follower bandpass filter (DSSF-BPF) are considered in the SNR-aware training of the Depthwise Separable Convolutional Neural Network (DSCNN) for KWS. This involves a new system-level loss function, ($L_{BPF}$), to include both the classifier and filter performance to achieve a system-level optimal. The availbility of open-source process development kit (PDK) and circit design tools such as ngspice enable this synergistic approach of training a classifier together with its feature extractor. Using sky130, the proposed framework achieved a DSSF-BPF capable of achieving $>90\\%$ KWS accuracy through a DSCNN with verification of the filter performance through SPICE.\n",
+ "This project presents a circuit-algorithm co-design of a learnable audio analog front-end (AFE) for Keyword Spotting (KWS). Instead of the traditional approach where the AFE for feature extraction is designed separately from the classifier, this project showcases the inclusion of the AFE design within the neural network classifier training and verification loop. 
More specifically, the transistors' transconductance ($g_m$) and capacitances in a differential super source-follower bandpass filter (DSSF-BPF or DSF-BPF) are considered in the SNR-aware training of the Depthwise Separable Convolutional Neural Network (DSCNN) for KWS. This involves a new system-level loss function, $L_{BPF}$, that includes both the classifier and filter performance to achieve a system-level optimum. The availability of an open-source process development kit (PDK) and circuit design tools such as ngspice enables this synergistic approach of training a classifier together with its feature extractor. Using sky130, the proposed framework produced a DSSF-BPF that reaches $>90\\%$ KWS accuracy with a DSCNN, with the filter performance verified in SPICE.\n",
 "\n",
 "
"
 ]
@@ -41,7 +41,7 @@
 "source": [
 "Always-on keyword spotting (KWS) is an emerging human-machine interface for mobile and Internet of Things (IoT) devices. Key performance metrics for KWS module include classification accuracy and power consumption. Feature extraction (FEx) is a pivotal function in KWS system. The use of analog FEx based on bandpass filterbank offers the advantage of higher energy efficiency compared to its digital counterpart, which requires analog-to-digital conversion (ADC) [[1]](#ref1). However, analog FEx also faces the challenges such as channel overlapping, gain variation, and large on-chip capacitors [[2]](#ref2). Conventional filterbank design follows Mel frequency scale with uniform filter gain across all channels. However, achieving this result requires larger bias current to boost the transistor transconductance ($g_m$) for those high frequency channels.\n",
 "\n",
- "Inspired by the recent work on learnable digital filter [[3]](#ref3), this project introduces a circuit-algorithm co-design framework for learnable audio analog front-end. The analog circuit parameters of a differential super source-follower bandpass filter (DSSF-BPF) filterbank are optimized together with the neural network classifier in a signal-to-noise ratio (SNR)-aware training process for optimum overall system performance.\n",
+ "Inspired by recent work on learnable digital filters [[3]](#ref3), this project introduces a circuit-algorithm co-design framework for a learnable audio analog front-end. The analog circuit parameters of a differential super source-follower bandpass filter (DSSF-BPF or DSF-BPF) filterbank are optimized together with the neural network classifier in a signal-to-noise ratio (SNR)-aware training process for optimal system-level performance.\n",
 "\n",
 "\n",
 "
"
@@ -70,7 +70,7 @@
 "source": [
 "![architecture.png](attachment:architecture.png)\n",
 "\n",
- "The following diagram shows the architecture of the proposed learnable audio AFE and the schematic of the super source-follower bandpass filter (SSF-BPF). The meticulously selected DSSF-BPF is strategically adopted for its diverse adjustability, exceptional adaptability, and efficiency. Voice signal from microphone goes through FEx to extract the channel energy and convert to digital features for classification. A depthwise separable CNN (DSCNN) network is used for 12-classes classification. Each FEx channel comprises a tunable SSF-BPF and a spectrogram generator. The spectrogram generated is then processed using a half-wave rectifier (HWR) and integrate-and-fire (IAF) layer before the classifer, removing the need for a ADC [[2]](#ref2). The schematic of the 2nd-order DSSF-BPF is shown in the bottom of the following figure. Transistors $M_{1,2}$ and $M_{3,4}$ serve as the super source-followers, and floating capacitors $C_{1,2}$ determine the pole locations. The transistor transconductance for $M_{1,2}$ can be expressed in [(1)](#eq1) and [(2)](#eq2), where $n$ is subthreshold slope factor, and $U_T$ is thermal voltage at room temperature. \n",
+ "The following diagram shows the architecture of the proposed learnable audio AFE and the schematic of the super source-follower bandpass filter (SSF-BPF). The DSSF-BPF is adopted for its adjustability, adaptability, and efficiency. The voice signal from the microphone passes through the FEx, which extracts the channel energy and converts it to digital features for classification. A depthwise separable CNN (DSCNN) is used for 12-class classification. Each FEx channel comprises a tunable SSF-BPF and a spectrogram generator. The generated spectrogram is then processed by a half-wave rectifier (HWR) and an integrate-and-fire (IAF) layer before the classifier, removing the need for an ADC [[2]](#ref2). 
The schematic of the 2nd-order DSSF-BPF is shown at the bottom of the following figure. Transistors $M_{1,2}$ and $M_{3,4}$ serve as the super source-followers, and floating capacitors $C_{1,2}$ determine the pole locations. The transistor transconductance for $M_{1,2}$ can be expressed as in [(1)](#eq1) and [(2)](#eq2), where $n$ is the subthreshold slope factor and $U_T$ is the thermal voltage at room temperature. \n",
 "\n",
 "\n",
 "\\begin{equation} \n",
@@ -84,7 +84,7 @@
 "\\tag{2}\n",
 "\\end{equation} \n",
 "\n",
- "Through small signal model analysis, neglecting body effects, the filter transfer function $H(s)$, central frequency $f_c$, quality factor $Q$ and passband gain $A$ can be derived as shown in [(3)](#eq3) - [(6)](#eq6). Transfer function [(3)](#eq3) reveals that all parameters ($g_{m1}$, $g_{m2}$, $C_1$, $C_2$) within this structure can be individually adjusted. Operating in the subthreshold region, the DSF-BPF utilizes the capability to modulate $g_m$ through external current adjustments, along with the ability to finely tune in-chip capacitors, thus conveniently regulating gain, center frequency, and Q-factor. \n",
+ "Through small-signal analysis, neglecting body effects, the filter transfer function $H(s)$, center frequency $f_c$, quality factor $Q$, and passband gain $A$ can be derived as shown in [(3)](#eq3) - [(6)](#eq6). Transfer function [(3)](#eq3) reveals that all parameters ($g_{m1}$, $g_{m2}$, $C_1$, $C_2$) within this structure can be individually adjusted. Operating in the subthreshold region, the DSSF-BPF allows $g_m$ to be modulated through external current adjustments and the on-chip capacitors to be finely tuned, which makes it convenient to regulate gain, center frequency, and Q-factor. \n",
 "\n",
 "\n",
 "\\begin{equation} \n",
@@ -110,7 +110,7 @@
 "\\tag{6}\n",
 "\\end{equation} \n",
 "\n",
- "It can be observed that the key performance of each DSF-BPF can be largely defined by 4 circuit parameters ($g_{m1}, g_{m2}, C_1, C_2$). These parameters are included in the system training process using back propagation. The initial circuit values are based on the reference design in SPICE circuit simulation. The trained values are feedback to update the circuit parameters. Instead of just achieving local optimums when optimizing the FEx front-end and classification back-end separately, the proposed approach seeks to achieve global optimum including both front-end and back-end.\n",
+ "The key performance of each DSSF-BPF is largely defined by four circuit parameters ($g_{m1}, g_{m2}, C_1, C_2$). These parameters are included in the system training process through backpropagation. The initial circuit values are taken from the reference design in SPICE circuit simulation, and the trained values are fed back to update the circuit parameters. Instead of reaching only local optima by optimizing the FEx front-end and the classification back-end separately, the proposed approach seeks a global optimum across both front-end and back-end.\n",
 "\n",
 "![workflow.png](attachment:workflow.png)\n",
 "\n",
@@ -7376,7 +7376,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "The following table shows the comparison between fixed and learnable AFE on hardware resource utilization and KWS accuracy. For a fixed AFE, $\\phi_I$, $\\phi_C$, and $Q$ are same across 16 channels, while learnable AFE shows nonuniform gains and Q-factor across the 16 channels. Through circuit-algorithm co-design, The presented architecture achieves notable reductions of 8.7% and 12.9% in DSF-BPF power and area consumption, respectively, while maintaining outstanding classification accuracy ranging from 89.4% to 92.4% under 5 dB to 20 dB SNR. \n",
+ "The following table compares the fixed and learnable AFE in terms of hardware resource utilization and KWS accuracy. For the fixed AFE, $\\phi_I$, $\\phi_C$, and $Q$ are the same across all 16 channels, while the learnable AFE shows non-uniform gains and Q-factors across the 16 channels. Through circuit-algorithm co-design, the presented architecture reduces DSSF-BPF power and area consumption by 8.7% and 12.9%, respectively, while maintaining a classification accuracy of 89.4% to 92.4% under 5 dB to 20 dB SNR. \n",
 "\n",
 "\\begin{array}{|c|c|c|}\n",
 "\\hline\n",
@@ -7409,7 +7409,7 @@
 "source": [
 "## Conclusion\n",
 "\n",
- "In conclusion, a learnable audio AFE with a DSCNN optimizes DSF-BPF for always-on KWS is presented."
+ "In conclusion, a learnable audio AFE whose DSSF-BPF is optimized jointly with a DSCNN for always-on KWS is presented."
 ]
 },
 {
diff --git a/VLSI24/submitted_notebooks/LearnAFE/README.md b/VLSI24/submitted_notebooks/LearnAFE/README.md
index b20501b..1760349 100644
--- a/VLSI24/submitted_notebooks/LearnAFE/README.md
+++ b/VLSI24/submitted_notebooks/LearnAFE/README.md
@@ -119,8 +119,10 @@ _background_noise_/white_noise.wav
 ## License
-This project is licensed under the MIT License. See LICENSE for more details.
+This project is licensed under the MIT License. See LICENSE for more details.
+Additionally, the first version of this work has been accepted for publication at DAC. The following citation can be updated once the proceedings are available.
+J. Hu, Z. Zhang, C. S. Leow, W. L. Goh, and Y. Gao, “Late Breaking Results: Circuit-Algorithm Co-design for Learnable Audio Analog Front-End,” in 61st ACM/IEEE Design Automation Conf. (DAC), accepted, 2024.
 ## Acknowledgement
-This work was supported by the Agency for Science, Technology and Research (A*STAR), Singapore under the Nanosystems at the Edge programme, grant No. A18A1b0055. We thank Professor Zhengya Zhang for his insightful comments to strengthen this work.
+This work was supported by the Agency for Science, Technology and Research (A*STAR), Singapore under the Nanosystems at the Edge programme, grant No. A18A1b0055. We thank Professor Zhengya Zhang for his insightful comments.