
OPEN FAU

Online publication system of Friedrich-Alexander-Universität Erlangen-Nürnberg

The online publication system OPEN FAU is the central publication platform for Open Access publishing for all members of Friedrich-Alexander-Universität. Qualified works from research and teaching may be published here free of charge, either as a primary or secondary publication. The full texts are permanently available worldwide and are findable and citable via catalogues and search engines.


To search for documents in OPEN FAU, please select "Search" (via the magnifying glass at the top right); this will provide you with various search options. If you want to publish a document, go to "Login" and then "My Publications". Drag your document into the field provided and enter the metadata; in just a few steps, you can submit your document. Please note our guidelines, the publication contract and the FAQs.


Recent Submissions

Doctoral thesis
Open Access
Approximate and Reconfigurable Precision Instruction Set Processors for Tightly Coupled Processor Arrays
(2024) Brand, Marcel; Teich, Jürgen
With the decline of Moore’s law, processor architecture design has shifted from powerful single-core processors toward compensating for stagnating single-core frequencies with the parallelism provided by multi- and many-core systems. With this increasing degree of parallelism, it also becomes more and more important to have access to small processing elements that are not only energy-efficient but also have a compact instruction set and use the costly instruction memory efficiently. This thesis presents a set of novel processor architectures that can be particularly beneficial when used in loop accelerator architectures like Tightly Coupled Processor Arrays (TCPAs) to save not only hardware but also energy and time under constraints on computational accuracy. Even the current generation of processor-array-on-chip architectures, e.g., coarse-grained reconfigurable or programmable arrays, includes hundreds or even thousands of processing elements. Thus, it becomes essential to keep the on-chip configuration/instruction memories of each processing element as small as possible. Compilers, too, must take the scarcity of available instruction memory into account and generate code that is as compact as possible. However, compilers for Very Long Instruction Word (VLIW) processors have the well-known problem that they typically produce lengthy code, especially for pipelined instruction sequences. Barely utilized Functional Units (FUs) as well as repeated operations of single FUs inside pipelined instruction sequences may lead to unnecessary or redundant code. Techniques like software pipelining can be used to improve the utilization of the FUs, yet with the risk of code explosion due to the overlapped scheduling of multiple loop iterations or other control flow statements. The Orthogonal Instruction Processing (OIP) processor architecture proposed by Brand et al. 
shows that the size of pipelined code for compute-intensive loop programs can be reduced significantly compared to the size of an equivalently pipelined VLIW program. The general concept of OIP is that each FU processes its own microprogram orthogonally to the others while sharing control flow and access to the peripheral infrastructure. Contrary to VLIW processors, each FU is equipped with its own instruction memory, branch unit, and program counter. Each FU has access to shared register files and to the flags of all functional units inside the processor. The synchronization of the microprograms may necessitate the repeated execution of single instructions for a fixed number of cycles. This can be encoded in the branch instruction and encapsulated in a single instruction to prevent redundant code. To utilize OIP processors to their full potential, they have to be programmed in a way that minimizes the idle time of FUs (e.g., due to data dependencies) and maximizes throughput. To solve this resource-constrained modulo scheduling problem, techniques based on mixed integer linear programming have been proposed. The necessary hardware extensions of OIP produce no resource overhead, especially in terms of required instruction memory, compared to VLIW processors. Therefore, the architecture, in conjunction with a set of benchmark applications, has been analyzed and evaluated regarding program size, memory size, and overall architecture cost (based on a cost model by Müller et al.) in relation to VLIW processors. It could be shown that OIP incurs no computational or memory overhead as soon as the program size of the OIP application is at least 50% smaller than that of the VLIW application. This necessary code size reduction is not only achieved for all investigated benchmarks; depending on the application, the code size can even go down to as little as 4.6% of its VLIW counterpart. 
Thus, expensive instruction memory can be saved, which reduces the area and power requirements of an OIP processor in comparison to a VLIW processor with an equal number of functional units. Besides memory requirements, many relevant applications from domains like image processing or machine learning are compute-intensive. However, they do not always rely on perfectly accurate results; e.g., even though reasonably inaccurate computations of Convolutional Neural Networks (CNNs) may influence the exact probabilities of each class, they rarely change the final decision. With this motivation in mind, Anytime Instruction Processing (AIP) has been defined and investigated, a concept for programmable-accuracy floating-point computations. AIP gives a programmer or compiler control over the accuracy of computed floating-point (FP) operations. The accuracy of a computation is encoded at bit granularity into the instruction, so the executed operation computes only that many most significant bits (MSBs) and may even terminate earlier than if it had been computed at full accuracy. This is achieved by encoding an intended accuracy into the instruction’s opcode, which defines exactly how many most significant mantissa bits of the FP operation shall be computed. Thus, errors are to be expected only in the resulting mantissa, while the resulting exponent and sign are computed accurately. An anytime division capitalizes on the fact that divisions are classically computed MSB first anyway and is implemented as a non-restoring division that can terminate early based on the instructed accuracy. Implementations for addition and multiplication, which are typically not computed MSB first, have also been presented. One implementation uses on-line arithmetic to compute the addition or multiplication of the mantissa MSB first, and the alternative uses a bitmasking scheme to mask the least significant bits that should not be computed. 
In on-line arithmetic, a recurrence equation is derived for an operation in which the dependencies between two consecutive result bits are removed (e.g., the carry chain of an addition). Without those dependencies, each result bit can be computed independently of all others, enabling MSB-first computation. By computing MSB first, on-line arithmetic also provides a high potential for pipelining: the execution of consecutive on-line instructions can start as soon as the first digit of the preceding instruction has been computed. Furthermore, operating in a redundant number format can help to reduce the complexity of the recurrence equation and the number of iterations required. Thus, binary operations in on-line arithmetic are usually performed in the redundant Signed Digit Radix-2 (SDR2) number format. The redundant FP number format SDFP has been defined based on SDR2 and enables the use of on-line arithmetic in anytime instructions. An alternative implementation of anytime additions and multiplications, the bitmasking approach, is based on masking the operand bits that do not contribute to the specified a MSBs of the result mantissa. In the case of a multiplication, this masking is applied to the partial products rather than directly to the operands, to reduce the possible error. Compared to on-line arithmetic, the computations may have a higher error, but the hardware overhead is negligible. After the anytime instruction paradigm had been integrated into the C++ arbitrary-precision framework Aarith, this framework was used to evaluate the error behavior of anytime addition, multiplication, and division operations. Power and area were evaluated using synthesis and simulation tools. The experiments clearly favor computing iterative applications with anytime instructions. 
It is shown that the computation of an iterative Jacobi solver can, on average, save up to 39% of energy with a computational error below 0.1% when compared to single-precision FP computations. Further, the applicability of anytime instructions to CNNs has been specifically investigated by setting an individual accuracy per layer of the inference of a ResNet-18 CNN implementation. A design space exploration showed that by using AIP, the energy of the inference can be reduced by up to 62%, again compared to single-precision FP computations. Besides the programmable accuracy provided by anytime instructions, reconfigurable-precision floating-point functional units have been investigated that not only provide different, dynamically selectable floating-point formats but also vectorization and sub-word parallelism to potentially increase the performance of applications even further. In summary, using the concepts presented in this thesis, implemented in the novel OIP processor architecture, it is possible to save not only hardware area and instruction memory but also a tremendous amount of energy and time when executing loop applications under constraints on computational accuracy, in specific circumstances even without reducing result accuracy.
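The bitmasking flavor of anytime arithmetic described in this abstract can be illustrated with a short software model. The sketch below masks low-order operand bits so that only roughly the a most significant mantissa bits influence the product; note that it is a simplified illustration with made-up names and a guard-bit margin, not the thesis's Aarith or hardware implementation (which, for multiplication, masks the partial products rather than the operands):

```python
import math

def anytime_mul(x, y, a, bits=24):
    """Multiply x and y while keeping only roughly the `a` most
    significant mantissa bits accurate (simplified bitmasking model).

    The exponent and sign are always handled exactly; only the
    mantissa is degraded, mirroring the anytime-instruction idea.
    """
    mx, ex = math.frexp(x)   # x = mx * 2**ex with 0.5 <= |mx| < 1
    my, ey = math.frexp(y)
    # Fixed-point mantissas with `bits` bits of precision.
    ix = int(mx * (1 << bits))
    iy = int(my * (1 << bits))
    # Mask away low-order operand bits that cannot influence the top
    # `a` result bits (two guard bits kept as a safety margin).
    keep = min(bits, a + 2)
    mask = ~((1 << (bits - keep)) - 1)
    ix &= mask
    iy &= mask
    prod = (ix * iy) / float(1 << (2 * bits))
    return math.ldexp(prod, ex + ey)
```

With `a` equal to the full mantissa width the result matches an exact multiplication, while small `a` trades mantissa accuracy for (in hardware) earlier termination and lower energy.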
Doctoral thesis
Open Access
Transport phenomena on the nanoscale: from isotropic systems to extreme confinement
(2024-04-29) Baer, Andreas; Smith, Ana-Suncana
Nanoscale systems, including particles or macromolecules as well as fluids confined to nanochannels and liquid films, are fascinating as they feature a transition between macroscopic continuum hydrodynamics and the intrinsically discrete molecular scale. A wealth of new phenomena arises from this crossover, as the separation of time or length scales between the different components (liquid – confinement – particle) is often not satisfied. The present thesis tackles this transition region, focusing on diffusive transport, in a series of six peer-reviewed articles [P1] to [P6]. Molecular dynamics (MD) simulations are applied as the primary method, a unique tool that resolves molecular details while sampling statistical averages on the mesoscopic scale. The thesis starts by addressing the validity of the Stokes-Einstein-Sutherland (SES) equation [P1]. It captures a fundamental relation between the diffusion coefficient of a particle or molecule, its hydrodynamic radius, and the viscosity of the surrounding fluid. The derivation of the SES equation at equilibrium assumes a continuous description of the fluid and a separation of time and length scales: the particle is required to be large and heavy compared to the solvent molecules such that its momentum changes slowly compared to the molecular time scales of the fluid. As these conditions are violated for a nanoparticle, a breakdown at the nanoscale was often proposed, and even confirmed by several MD studies. In contrast, most experiments, including those of our collaborators, confirmed its validity for particles down to 1 nm in diameter. This discrepancy is tackled in the thesis by extensive MD simulations of the C60 buckminsterfullerene diffusing in toluene. This system clearly violates crucial conditions underlying the SES equation, yet the law is restored in simulations in the framework of the linear response to a constant drag force. 
This explains the success of the experiments, which typically rely on the analysis of particle sedimentation when applying the SES equation. Notably, consistent with the Knudsen number, small deviations from perfect stick boundary conditions at the particle interface are required to obtain uniform results in experiments and simulations. The study of bulk systems is then extended to understand diffusion in confined liquids. Here, the ordering of the solvent at the interface with the solid or the vapour phase may occur on length scales similar to the characteristic length scale of the confinement and the size of the diffusing object. Similarly, the characteristic time scales of diffusion and the lifetimes of structural fluctuations become comparable to those of the interactions of molecules with interfaces and other molecules. The anisotropy due to confinement also requires separate handling of the directions parallel and orthogonal to the confining walls when analysing transport properties. These issues are first tackled in MD simulations of the solvent phase within solid pores or thin films in [P2] to [P4]. Using the analysis tools developed in the PULS group, it is possible to show that significant anisotropic oscillations of transport coefficients may take place due to effective interactions with the interfaces. Understanding the behaviour of confined solvents is a prerequisite for the investigation of the diffusive transport of nanoparticles as solutes in such confined systems [P5]. Using the fullerenes C60 and C70 in toluene-filled alumina pores as model systems, it is shown that an effective diffusion coefficient can be well estimated by measuring the diffusivity in the centre of a pore and at the interface, as well as the transition rates between these two regions. These rates are estimated from the potential of mean force (PMF) of the particle with the solid. 
This approach is of particular relevance for understanding separation techniques, including chromatography, as it establishes a direct relation between the effective transport coefficients in the pore and the particle retention time. With equilibrium transport properties at the nanoscale extensively analysed, the attention finally shifts towards non-equilibrium systems, specifically addressing the viscosity of water in electric fields [P6]. Due to their dipole moment, water molecules couple to such fields, altering the intrinsic structure of the liquid as well as its relaxation processes in an anisotropic manner. Through an extensive analysis of the relaxation of thermal excitations and of the changes in the first and second hydration shell, it is possible to study the competition between the order imposed by the intrinsic tetrahedral structure and the order imposed by the field. It is furthermore possible to assign different modes to the time-dependent viscosity, each relating to a different molecular relaxation process, and ultimately to explain the anisotropic response of the system. The transport phenomena of nanoscale systems tackled in the present thesis demand further research in this area. The theoretical foundations of the SES equation are revisited and brought into the context of experiments and simulations, providing a solid framework for studying the diffusion of small dispersed nanoparticles. Relaxing the stick boundary condition demands an in-depth analysis covering various systems and particle sizes to allow for an a priori estimate of the boundary condition, e.g. from the Knudsen number. Furthermore, the study of the transport of dispersions through narrow pores paves the way for establishing a unified view of the relationship between the interactions and the transport properties of such systems in general. 
For systems with polar components, the investigations of water in an electric field furthermore provide valuable insights into how transport phenomena are altered in the presence of such fields, which is most relevant for water-filled nanopores. Therefore, the tools and concepts developed in this work shall find applications well beyond the systems studied herein.
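The SES relation discussed in this abstract, D = k_B T / (6 π η R) for stick boundary conditions (6 π becomes 4 π for perfect slip), is easy to evaluate directly. The sketch below gives the order of magnitude of the diffusion coefficient for a roughly C60-sized particle in toluene; the viscosity and radius values are illustrative room-temperature assumptions, not values taken from the thesis:

```python
import math

def ses_diffusion(T, eta, radius, c=6):
    """Stokes-Einstein-Sutherland diffusion coefficient in m^2/s.

    D = k_B * T / (c * pi * eta * radius), with c = 6 for stick
    and c = 4 for slip boundary conditions.
    """
    k_B = 1.380649e-23  # Boltzmann constant, J/K
    return k_B * T / (c * math.pi * eta * radius)

# Illustrative values (assumptions): toluene at room temperature with
# eta ~ 0.55 mPa*s and a C60-like particle of hydrodynamic radius ~0.5 nm.
D_stick = ses_diffusion(T=298.0, eta=0.55e-3, radius=0.5e-9)
D_slip = ses_diffusion(T=298.0, eta=0.55e-3, radius=0.5e-9, c=4)
```

The resulting D is on the order of 10^-9 m^2/s, the regime where, as the abstract notes, continuum assumptions begin to compete with molecular time scales.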
Doctoral thesis
Open Access
Design Space Exploration for Analog Spiking Neural Networks
(2024) Elmasry, Moamen; Weigel, Robert
The human brain will always remain one of the greatest creations of all time. The development of brain-inspired neuromorphic computing has become mainstream due to its high parallelism and energy efficiency. Neuromorphic computing is an emerging field that aims to mimic the functionality of the human brain using electronic circuits. One of the most important aspects of this field is the implementation of neural models and the evaluation of their performance. Developing and deploying neuromorphic computing systems is an exceptionally challenging undertaking, requiring the collaboration of researchers from a variety of disciplines. This collaboration is necessary to bring together the development of data processing techniques, computational structures, and the fundamental technologies on which these systems are based. The development of neuromorphic computing systems requires the simultaneous processing of large amounts of data, adaptation to dynamic environmental changes, and replication of the complex functionality of the human brain. Artificial intelligence plays a role both in large data centers and in end devices such as smartphones, drones, and networked household appliances. Data processing directly on these devices is called edge computing, and neuromorphic computing systems offer promising prospects for it. Using these systems, data can be processed and analyzed in real time at the source, rather than in centralized data centers, thereby increasing responsiveness and reducing latency. Furthermore, local computation satisfies the security requirements of potentially sensitive data by eliminating the need to communicate it over wireless networks to central computing systems. Additionally, neuromorphic systems are well-suited for a range of cognitive applications due to their intrinsic learning capabilities and ability to process complex data. 
Their versatility and efficiency can significantly advance areas such as pattern recognition, natural language processing, and robotics. Moreover, neuromorphic systems are an attractive choice for energy-efficient computing applications, which are increasingly important in today’s world of sustainable technologies, due to their ability to achieve significant energy savings compared to traditional processors (Christensen et al., 2022). The efficient and effective execution of AI on conventional hardware is hindered by the so-called von Neumann architecture, in which a processor is responsible for computing on the data and a memory for storing the results. Both are connected by a data bus, which leads to the so-called "memory wall" during the execution of artificial neural networks: data transfer to and from external memory devices becomes a major obstacle to energy efficiency. In addition, current hardware accelerators struggle with performance and power consumption issues. To address these challenges, non-volatile memory devices such as ReRAM, CBRAM, PCM, MRAM, and FeFET offer promising possibilities, as they can partially handle the computation directly in memory, thus helping to reduce extensive data movement. In addition, the optimization of device parameters such as memory capacity, lifetime, cycle stability, variation, and speed is crucial for effective use in neuromorphic applications. Material innovations to enhance analog switching capabilities and three-dimensional component architectures are also being investigated to advance the design of neuromorphic hardware. In this work, a comprehensive design for a spike-based neural network platform is presented that integrates a full ReRAM (Resistive Random Access Memory) IP implementation into a dynamic Spiking Neural Network architecture. 
A fully integrated System-on-Module (SoM) has been implemented, enabling a comprehensive exploration of the design space for neurosynaptic behavior in Spiking Neural Networks. This platform includes several state-of-the-art components, including a ReRAM array, buffers, word line/bit line (WL/BL) drivers, and a digital interface, all working in synergy with replaceable non-volatile memory (NVM) modules. These components are designed to accommodate different types of spike-based modules. The interconnectivity of the platform has been a key driver for the introduction of latest-generation NVM to keep pace with the ReRAM technology available on the market. The platform thus provides a comprehensive evaluation of ReRAM technology compared to other non-volatile memory technologies. This integration not only increases versatility but also expands the platform’s capabilities and applications. One of the key innovations of our platform is a hybrid memristor-CMOS multi-IP architecture, within which a ReRAM IP module coexists harmoniously with a neural network composed of spiking components. A novel weighted synaptic structure that combines weight detection and synaptic current provisioning is introduced as an interface between the NVM macro and the evolved SNN components. This innovative architecture provides practical solutions to real-world problems while serving as a research platform for computational neuroscience. Our fully integrated System-on-Module, realized in 28 nm CMOS technology, enables an enhanced evaluation of the spatio-temporal properties of Spiking Neural Networks (SNNs). The platform provides a reconfigurable and interchangeable environment for the modular testing of spike-based components. Integrated non-volatile memory in the form of a 128x128 ReRAM cell array enables the storage of synaptic weights, while fully traceable signals ensure reliable data acquisition and analysis. 
The integration of a considerable number of I/O pins improves the testability of the system. Furthermore, the integrated neurons show competitive performance in terms of parameter sets, reconfigurability and energy efficiency, with activation requiring only about 0.2 pJ per spike.
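Spike-based components of the kind this platform hosts are commonly described by variants of the leaky integrate-and-fire (LIF) neuron model. The sketch below is a generic discrete-time software model of that behavior; all parameter values and names are illustrative and do not describe the chip's actual analog circuits:

```python
def lif_simulate(inputs, v_th=1.0, leak=0.9, v_reset=0.0):
    """Discrete-time leaky integrate-and-fire neuron (generic model).

    Each step, the membrane potential decays by the `leak` factor,
    integrates the input current, and emits a spike (then resets)
    when it reaches the threshold v_th.
    """
    v = 0.0
    spikes = []
    for current in inputs:
        v = leak * v + current
        if v >= v_th:
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return spikes
```

In a mixed-signal realization, the membrane potential is an analog quantity and the synaptic currents are weighted by conductances stored, for example, in a ReRAM array; the software model only captures the spiking dynamics.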
Doctoral thesis
Open Access
Impacts of supraglacial lakes and snowmelt on glacier velocity on the example of the Baltoro Glacier in Pakistan
(2024) Wendleder, Anna; Braun, Matthias
Millions of people live along the Indus River in Pakistan and depend on the meltwater from the glaciers in the Karakoram. Meltwater also largely controls glacier dynamics, which are key information on glacier evolution in a changing climate. The timing and amount of meltwater influence the seasonal evolution and the variation of glacier dynamics. However, the glaciers in the Karakoram have not yet been sufficiently researched to fully understand their dynamics and drivers. In addition, most of these glaciers are covered with extensive debris and thus react to climate change in a more complex way than debris-free glaciers. The focus of this work is to better understand the dynamics of Baltoro Glacier in Pakistan and the hydrological drainage within the glacier. Long-term time series of glacier velocity fields derived from multi-mission Synthetic Aperture Radar (SAR) data with high temporal and spatial resolution enable the monitoring of (intra-)seasonal and annual glacier dynamics. In combination with time series of supraglacial lakes derived from Earth observation data, and precipitation and temperature from reanalysis and satellite-based data, complex relationships were found between winter precipitation, summer melt, supraglacial lakes, crevasses, and glacier acceleration. High winter precipitation is associated with an acceleration of the glacier in spring, while heavy precipitation in spring leads to an increase in supraglacial lakes. Higher temperatures during the early melting season also influenced the formation of lakes and thus the increase of meltwater in the glacier. Previous mapping of supraglacial lakes was based on an annual resolution, and little is known about their seasonal behaviour. The multi-temporal and multi-sensor summer time series made it possible to determine the characteristic filling and discharge periods, the change of lake area over the years, and how the seasonal development varies from year to year. 
The supraglacial lakes filled between mid-April and mid-June and drained between mid-June and mid-September, expanding faster than they contracted. A tendency towards the formation of larger lakes (>0.04 km²) over time is visible. The combination of the dense, high-temporal-resolution time series of supraglacial lakes with glacier surface velocity, snowmelt, runoff, precipitation, and temperature derived from Earth observation and reanalysis data makes it possible to analyze the influence of supraglacial lakes and snowmelt and to identify the drivers of glacier velocity. The prolonged period of positive air temperatures in spring affected both snowmelt and supraglacial lake formation. Snow and ice melt had the greatest influence on the spring acceleration and the high glacier velocities in summer, while the drainage of the supraglacial lakes caused the glacier acceleration in fall. The influence of melting and of the drainage of supraglacial lakes on glacier dynamics is therefore surprisingly large, with the former also leading to efficient drainage. Despite the insulation provided by the debris cover on the main branch, Baltoro Glacier is sensitive to temperature rise, leading to additional ice loss.
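Glacier surface velocities of the kind used in this study are commonly derived from repeat-pass SAR acquisitions by offset tracking, i.e., locating the cross-correlation peak between co-registered image patches and converting the pixel offset into a displacement per unit time. The sketch below demonstrates only the core correlation idea on synthetic data; it is a minimal illustration, not the actual SAR processing chain used in the thesis:

```python
import numpy as np

def patch_offset(ref, sec):
    """Integer-pixel offset of `sec` relative to `ref`, found as the
    peak of the FFT-based cross-correlation (cyclic, so offsets wrap)."""
    corr = np.fft.ifft2(np.conj(np.fft.fft2(ref)) * np.fft.fft2(sec)).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map wrapped peak positions to signed offsets.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, corr.shape))

# Synthetic demo: shift a random "image" by a known displacement.
rng = np.random.default_rng(0)
ref = rng.standard_normal((64, 64))
sec = np.roll(ref, shift=(3, -2), axis=(0, 1))  # 3 px down, 2 px left
dy, dx = patch_offset(ref, sec)
# velocity = displacement * pixel_spacing / time_between_acquisitions
```

Repeating this over a grid of patches yields the dense velocity fields from which the (intra-)seasonal time series are built.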
Doctoral thesis
Open Access
Myeloid ZEB1 in the gastrointestinal tumor and metastatic microenvironment
(2024) Fuchs, Kathrin; Brabletz, Thomas
Cancer is one of the leading causes of morbidity worldwide. Colorectal cancer (CRC) and pancreatic cancer (PDAC) were expected to account for almost 20% of all cancer deaths in 2023. While cancer research focused mainly on the malignant tumor cells for many years, the critical impact of the tumor microenvironment (TME) on tumor progression has recently come into focus. In particular, tumor-associated macrophages (TAMs) in the TME play a dual role in influencing the clinical outcome of cancer patients, depending on their polarization towards tumor-suppressive or tumor-supportive subtypes. The transcription factor ZEB1 conveys plasticity to numerous cell types, for example through the epithelial-to-mesenchymal transition in tumor cells, which fosters malignant progression. Interestingly, ZEB1 is also present in stromal cells of the TME. In this study, we therefore sought to investigate how the plasticity factor ZEB1 can alter TAM polarization and functions and thereby affect gastrointestinal tumor development and metastatic colonization. Based on clinical data, we demonstrate the expression of ZEB1 in TAMs in primary tumors and metastases. Employing a mouse model with a conditional homozygous knockout of myeloid Zeb1 (ZEB1LysMDel) revealed no evident impact of myeloid ZEB1 on organ development or tissue and immune homeostasis in mice. In contrast, myeloid ZEB1 depletion resulted in increased tumor growth and tumorigenicity of subcutaneous syngeneic allografts of the CRC cell line CMT 93. These results were partially reproduced in the CRC cell line MC 38 and the PDAC cell line KPCz661. Moreover, metastatic lung colonization of KPCz661 was also considerably enhanced in ZEB1LysMDel compared to control mice. Surprisingly, the in vitro characterization of macrophages derived from ZEB1LysMDel mice did not reveal ZEB1 as a master regulator of the macrophage polarization transcriptional program, but rather as a fine tuner of specific macrophage effectors. 
Remarkably, we provide evidence that the secretion of the cytokines CCL2 and CCL22 by ZEB1-proficient macrophages may be responsible for the macrophage-mediated chemotaxis of cytotoxic T cells into the TME. ZEB1-deficient macrophages were incapable of recruiting sufficient cytotoxic T cells into the TME, resulting in diminished tumor cell apoptosis and subsequently an increased tumor or metastatic colony burden. Collectively, our data provide evidence for a novel and unanticipated tumor-suppressive function of ZEB1 in TAMs through chemokine-mediated cytotoxic T cell recruitment. This study reinforces the importance of the complex interactions in the TME and their influence on cancer biology.