On-scalp printing of personalized electroencephalography e-tattoos
Luize Scalco de Vasconcelos, Yichen Yan, Pukar Maharjan, Satyam Kumar, Minsu Zhang, Bowen Yao, Hongbian Li, Sidi Duan, Eric Li, Eric Williams, Sandhya Tiku, Pablo Vidal, and
13 more authors
On-scalp digital printing of custom-designed, temporary-tattoo-like sensors represents a groundbreaking advancement in noninvasive brain-monitoring technologies, advancing the fields of neuroscience, clinical diagnostics, and brain-computer interfaces (BCIs). Traditional electroencephalography (EEG) systems involve time-consuming manual electrode placement, conductive liquid gels, and cumbersome cables, which are prone to signal degradation and discomfort during prolonged use. Our approach overcomes these limitations by combining material innovations with non-contact, on-body digital printing techniques to fabricate e-tattoos that are self-drying, ultrathin, and compatible with hairy scalps. These skin-conformal EEG e-tattoo sensors enable comfortable, long-term, high-quality brain activity monitoring without the discomfort associated with traditional EEG systems. Using individual 3D head scans, custom sensor layout design, and a 5-axis microjet printing robot, we have created EEG e-tattoos with precise, tailored placement over the entire scalp. The inks for electrodes and interconnects have slightly different compositions to achieve low skin contact impedance and high bulk conductivity, respectively. After printing and self-drying, the inks form conductive, stretchable, and breathable thin films that ensure high signal fidelity, even over extended periods. This technology paves the way for non-invasive, high-performance, and user-friendly brain monitoring that will enhance both patient care and the understanding of the human brain. The broader significance of this technology lies in its potential applications beyond traditional EEG use. On-scalp printed ultrathin e-tattoos could play a pivotal role in developing BCIs for various industries, including prosthetics, virtual reality (VR), and human-robot teaming. This work also opens the possibility of on-body digital manufacture of other types of e-tattoo devices in areas beyond the head, leading to large-area, skin-covered yet deformable and breathable functional e-tattoos.
We consider a sparse matrix-matrix multiplication (SpGEMM) setting where one matrix is square and the other is tall and skinny. This special variant, TS-SpGEMM, has important applications in multi-source breadth-first search, influence maximization, sparse graph embedding, and algebraic multigrid solvers. Unfortunately, popular distributed algorithms like sparse SUMMA deliver suboptimal performance for TS-SpGEMM. To address this limitation, we develop a novel distributed-memory algorithm tailored for TS-SpGEMM. Our approach employs customized 1D partitioning for all matrices involved and leverages sparsity-aware tiling for efficient data transfers. In addition, it minimizes communication overhead by incorporating both local and remote computations. On average, our TS-SpGEMM algorithm attains 5\texttimes performance gains over 2D and 3D SUMMA. Furthermore, we use our algorithm to implement multi-source breadth-first search and sparse graph embedding algorithms and demonstrate their scalability up to 512 Nodes (or 65,536 cores) on NERSC Perlmutter.
Resource-constrained system-on-chips (SoCs) are increasingly heterogeneous with specialized accelerators for various tasks. Acceleration taxes due to control and data movement, however, diminish end-to-end speedups from hardware acceleration. Meanwhile, emerging workloads are increasingly task-diverse with several, potentially shared, fine-grained acceleration candidates. This motivates a paradigm of parallel and disaggregated acceleration. Compared to a monolithic accelerator, disaggregation provides higher flexibility, reuse, and utilization, but at the cost of higher control and data acceleration taxes. We propose a novel SoC architecture, Mozart, that enables efficient accelerator disaggregation by leveraging shared-memory to tame control and data acceleration taxes. To address the control tax, Mozart includes a lightweight, modular, and general accelerator synchronization interface (ASI). ASI eliminates the typical CPU-centric accelerator control in favor of a decentralized, uniform synchronization interface through shared-memory. This enables accelerators to directly and transparently synchronize with each other (or CPUs) using the same shared-memory interface as CPUs. To address the data tax, Mozart leverages the Spandex-FCS heterogeneous coherence protocol, which supports decentralized data movement and per-word coherence specialization. We demonstrate the first RTL implementation of Spandex-FCS and the first evaluation of its benefits for a heterogeneous SoC with fixed-function accelerators, running real-world applications with Linux. Mozart simultaneously enables, for the first time, (1) finer-grained acceleration than previously possible, (2) programmable and transparent composition of fine-grained, disaggregated accelerators, (3) efficient accelerator pipelining through shared-memory and decentralization, and (4) a performance-competitive disaggregated alternative to specialized monolithic accelerators. We demonstrate these capabilities of Mozart with a comprehensive one-of-a-kind evaluation of more than 70 hardware configurations prototyped on an FPGA employing various accelerators, running real-world applications on Linux, and a scalability analysis with up to 15 accelerators. We also present an analytical performance model to understand and explore system design choices and to validate the results.
Touch-sensitive stretchable electronic skins (e-skins) hold promise for soft robots, prosthetics, bio-mimetics, and bio-sensors. However, a long-standing challenge has been the interference of stretching in pressure readings. Addressing this, we introduce an intrinsically stretchable hybrid response pressure sensor (SHRPS) composed of a laminate featuring a barely conductive porous nanocomposite and an ultrathin dielectric layer situated between two stretchable electrodes. The combined piezoresistive and piezocapacitive responses of the SHRPS enable ultrahigh pressure sensitivity while effectively negating stretch-induced interference. Our findings are underpinned by an experimentally validated electromechanical model. In practical applications, SHRPS mounted on inflatable probes demonstrated safe and precise palpation on the human wrist and conformable and firm gripping of contoured objects. The debut of SHRPS promises to significantly expand the versatile applications of e-skins.
In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, La Jolla, CA, USA, Apr 2024
Sparse matrix dense matrix multiplication (SpMM) is commonly used in applications ranging from scientific computing to graph neural networks. Typically, when SpMM is executed in a distributed platform, communication costs dominate. Such costs depend on how communication is scheduled. If it is scheduled in a sparsity-unaware manner, such as with collectives, execution is often inefficient due to unnecessary data transfers. On the other hand, if communication is scheduled in a fine-grained sparsity-aware manner, communicating only the necessary data, execution can also be inefficient due to high software overhead.We observe that individual sparse matrices often contain regions that are denser and regions that are sparser. Based on this observation, we develop a model that partitions communication into sparsity-unaware and sparsity-aware components. Leveraging the partition, we develop a new algorithm that performs collective communication for the denser regions, and fine-grained, one-sided communication for the sparser regions. We call the algorithm Two-Face. We show that Two-Face attains an average speedup of 2.11x over prior work when evaluated on a 4096-core supercomputer. Additionally, Two-Face scales well with the machine size.