Tim's Arxiv FrontPage


Generated on 2024-06-16.


This frontpage is generated by scraping new papers on Arxiv and using an embedding model to find papers matching topics I'm interested in. Currently, the false positive rate is fairly high. The repo is here. Forked and customized from this project.
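The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: it assumes the numbers embedded in the abstracts (for example 0.826) are per-sentence cosine similarities against a topic embedding, and it uses toy 3-dimensional vectors in place of a real embedding model. The function names and the 0.82 threshold are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_sentences(topic_vec, sentence_vecs, threshold=0.82):
    """Return (index, score) for sentences whose similarity meets the threshold.

    The threshold value is an assumption chosen for illustration.
    """
    scored = [(i, cosine(topic_vec, v)) for i, v in enumerate(sentence_vecs)]
    return [(i, round(s, 3)) for i, s in scored if s >= threshold]

# Toy vectors standing in for real sentence and topic embeddings.
topic = [1.0, 0.0, 0.0]
sentences = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
hits = match_sentences(topic, sentences)
print(hits)
```

With a fixed threshold like this, a sentence that merely shares vocabulary with a topic can still clear the bar, which is one plausible source of the false positives mentioned above.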


Artificial General Intelligence

2024-06-13

Towards Unified AI Models for MU-MIMO Communications: A Tensor Equivariance Framework

In this paper, we propose a unified framework based on equivariance for the design of artificial intelligence (AI)-assisted technologies in multi-user multiple-input-multiple-output (MU-MIMO) systems. [0.826] We first provide definitions of multidimensional equivariance, high-order equivariance, and multidimensional invariance (referred to collectively as tensor equivariance). On this basis, by investigating the design of precoding and user scheduling, which are key techniques in MU-MIMO systems, we delve deeper into revealing tensor equivariance of the mappings from channel information to optimal precoding tensors, precoding auxiliary tensors, and scheduling indicators, respectively. To model mappings with tensor equivariance, we propose a series of plug-and-play tensor equivariant neural network (TENN) modules, where the computation involving intricate parameter sharing patterns is transformed into concise tensor operations. Building upon TENN modules, we propose the unified tensor equivariance framework that can be applicable to various communication tasks, based on which we easily accomplish the design of corresponding AI-assisted precoding and user scheduling schemes. Simulation results demonstrate that the constructed precoding and user scheduling methods achieve near-optimal performance while exhibiting significantly lower computational complexity and generalization to inputs with varying sizes across multiple dimensions. This validates the superiority of TENN modules and the unified framework.

link
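As a small illustration of the equivariance property the MU-MIMO abstract above relies on, the sketch below checks that a generic parameter-shared layer commutes with a permutation of its input rows (for instance, reordering users). This is not the paper's TENN module: the layer, its weights, and the function names are assumptions chosen only to show what "equivariant" means.

```python
def shared_layer(rows, w_self=2.0, w_mean=-1.0):
    """Apply the same weights to every row, plus a term from the column-wise mean.

    Because every row is treated identically and the mean ignores row order,
    permuting the input rows permutes the output rows in the same way.
    """
    n = len(rows)
    mean = [sum(col) / n for col in zip(*rows)]
    return [[w_self * x + w_mean * m for x, m in zip(row, mean)] for row in rows]

def permute(rows, order):
    """Reorder rows according to a list of indices."""
    return [rows[i] for i in order]

users = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # one row per (hypothetical) user
order = [2, 0, 1]

out_then_perm = permute(shared_layer(users), order)
perm_then_out = shared_layer(permute(users, order))
equivariant = out_then_perm == perm_then_out
print(equivariant)
```

The same layer also accepts any number of rows without new parameters, which matches the abstract's point about generalizing to inputs of varying sizes.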

2024-06-13

State-Space Modeling in Long Sequence Processing: A Survey on Recurrence in the Transformer Era

Effectively learning from sequential data is a longstanding goal of Artificial Intelligence, especially in the case of long sequences. [0.82] From the dawn of Machine Learning, several researchers engaged in the search for algorithms and architectures capable of processing sequences of patterns, retaining information about the past inputs while still leveraging the upcoming data, without losing precious long-term dependencies and correlations. While such an ultimate goal is inspired by the human hallmark of continuous real-time processing of sensory information, several solutions simplified the learning paradigm by artificially limiting the processed context or dealing with sequences of limited length, given in advance. [0.822] These solutions were further emphasized by the ubiquity of Transformers, which initially overshadowed the role of Recurrent Neural Nets. However, recurrent networks are facing a strong recent revival due to the growing popularity of (deep) State-Space models and novel instances of large-context Transformers, which are both based on recurrent computations to go beyond several limits of currently ubiquitous technologies. In fact, the fast development of Large Language Models enhanced the interest in efficient solutions to process data over time. This survey provides an in-depth summary of the latest approaches that are based on recurrent models for sequential data processing. A complete taxonomy over the latest trends in architectural and algorithmic solutions is reported and discussed, guiding researchers in this appealing research field. The emerging picture suggests that there is room for thinking of novel routes, constituted by learning algorithms which depart from the standard Backpropagation Through Time, towards a more realistic scenario where patterns are effectively processed online, leveraging local-forward computations, opening the way to further research on this topic.

link

2024-06-13

A Symbolic Computing Perspective on Software Systems

Symbolic mathematical computing systems have served as a canary in the coal mine of software systems for more than sixty years. They have introduced or have been early adopters of programming language ideas such as dynamic memory management, arbitrary precision arithmetic and dependent types. These systems have the feature of being highly complex while at the same time operating in a domain where results are well-defined and clearly verifiable. These software systems span multiple layers of abstraction with concerns ranging from instruction scheduling and cache pressure up to algorithmic complexity of constructions in algebraic geometry. [0.823] All of the major symbolic mathematical computing systems include low-level code for arithmetic, memory management and other primitives, a compiler or interpreter for a bespoke programming language, a library of high level mathematical algorithms, and some form of user interface. Each of these parts invokes multiple deep issues. We present some lessons learned from this environment and free flowing opinions on topics including: * Portability of software across architectures and decades; * Infrastructure to embrace and infrastructure to avoid; * Choosing base abstractions upon which to build; * How to get the most out of a small code base; * How developments in compilers both to optimise and to validate code have always been and remain of critical importance, with plenty of remaining challenges; * The way in which individuals, including in particular Alan Mycroft, who have been able to span from hand-crafting Z80 machine code up to the most abstruse high level code analysis techniques, are needed; and * Why it is important to teach full-stack thinking to the next generation.

link

2024-06-13

Generative Inverse Design of Crystal Structures via Diffusion Models with Transformers

Recent advances in deep learning have enabled the generation of realistic data by training generative models on large datasets of text, images, and audio. [0.827] While these models have demonstrated exceptional performance in generating novel and plausible data, it remains an open question whether they can effectively accelerate scientific discovery through data generation and drive significant advancements across various scientific fields. In particular, the discovery of new inorganic materials with promising properties poses a critical challenge, both scientifically and for industrial applications. However, unlike textual or image data, materials, or more specifically crystal structures, consist of multiple types of variables - including lattice vectors, atom positions, and atomic species. This complexity in data gives rise to a variety of approaches for representing and generating such data. Consequently, the design choices of generative models for crystal structures remain an open question. In this study, we explore a new type of diffusion model for the generative inverse design of crystal structures, with a backbone based on a Transformer architecture. We demonstrate our models are superior to previous methods in their versatility for generating crystal structures with desired properties. Furthermore, our empirical results suggest that the optimal conditioning methods vary depending on the dataset.

link

2024-06-13

Transformers meet Neural Algorithmic Reasoners

Transformers have revolutionized machine learning with their simple yet effective architecture. Pre-training Transformers on massive text datasets from the Internet has led to unmatched generalization for natural language understanding (NLU) tasks. However, such language models remain fragile when tasked with algorithmic forms of reasoning, where computations must be precise and robust. [0.822] To address this limitation, we propose a novel approach that combines the Transformer's language understanding with the robustness of graph neural network (GNN)-based neural algorithmic reasoners (NARs). Such NARs proved effective as generic solvers for algorithmic tasks, when specified in graph form. To make their embeddings accessible to a Transformer, we propose a hybrid architecture with a two-phase training procedure, allowing the tokens in the language model to cross-attend to the node embeddings from the NAR. We evaluate our resulting TransNAR model on CLRS-Text, the text-based version of the CLRS-30 benchmark, and demonstrate significant gains over Transformer-only models for algorithmic reasoning, both in and out of distribution.

link

2024-06-13

Characterising Interventions in Causal Games

Causal games are probabilistic graphical models that enable causal queries to be answered in multi-agent settings. They extend causal Bayesian networks by specifying decision and utility variables to represent the agents' degrees of freedom and objectives. In multi-agent settings, whether each agent decides on their policy before or after knowing the causal intervention is important as this affects whether they can respond to the intervention by adapting their policy. Consequently, previous work in causal games imposed chronological constraints on permissible interventions. We relax this by outlining a sound and complete set of primitive causal interventions so the effect of any arbitrarily complex interventional query can be studied in multi-agent settings. We also demonstrate applications to the design of safe AI systems by considering causal mechanism design and commitment. [0.848]

link

2024-06-13

Instance-level quantitative saliency in multiple sclerosis lesion segmentation

In recent years, explainable methods for artificial intelligence (XAI) have tried to reveal and describe models' decision mechanisms in the case of classification tasks. [0.856] However, XAI for semantic segmentation and in particular for single instances has been little studied to date. Understanding the process underlying automatic segmentation of single instances is crucial to reveal what information was used to detect and segment a given object of interest. In this study, we proposed two instance-level explanation maps for semantic segmentation based on SmoothGrad and Grad-CAM++ methods. Then, we investigated their relevance for the detection and segmentation of white matter lesions (WML), a magnetic resonance imaging (MRI) biomarker in multiple sclerosis (MS). 687 patients diagnosed with MS, for a total of 4043 FLAIR and MPRAGE MRI scans, were collected at the University Hospital of Basel, Switzerland. Data were randomly split into training, validation and test sets to train a 3D U-Net for MS lesion segmentation. We observed 3050 true positive (TP), 1818 false positive (FP), and 789 false negative (FN) cases. We generated instance-level explanation maps for semantic segmentation, by developing two XAI methods based on SmoothGrad and Grad-CAM++. We investigated: 1) the distribution of gradients in saliency maps with respect to both input MRI sequences; 2) the model's response in the case of synthetic lesions; 3) the amount of perilesional tissue needed by the model to segment a lesion. Saliency maps (based on SmoothGrad) in FLAIR showed positive values inside a lesion and negative in its neighborhood. Peak values of saliency maps generated for these four groups of volumes presented distributions that differ significantly from one another, suggesting a quantitative nature of the proposed saliency. Contextual information of 7 mm around the lesion border was required for their segmentation.

link
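The lesion counts reported in the abstract above (3050 TP, 1818 FP, 789 FN) can be turned into the usual detection metrics with a few lines of arithmetic. The abstract itself does not report these derived figures, so the sketch below is only a reader's calculation from the stated counts, and the function name is hypothetical.

```python
def detection_metrics(tp, fp, fn):
    """Compute precision, recall, and F1 from instance-level counts."""
    precision = tp / (tp + fp)        # share of detections that are real lesions
    recall = tp / (tp + fn)           # share of real lesions that were detected
    f1 = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of precision and recall
    return precision, recall, f1

# Counts taken from the abstract; the metrics are derived, not reported there.
precision, recall, f1 = detection_metrics(tp=3050, fp=1818, fn=789)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

On these counts precision comes out near 0.63, recall near 0.79, and F1 near 0.70.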

Complex Systems

2024-06-13

CGP++ : A Modern C++ Implementation of Cartesian Genetic Programming

The reference implementation of Cartesian Genetic Programming (CGP) was written in the C programming language. C inherently follows a procedural programming paradigm, which entails challenges in providing a reusable and scalable implementation model for complex structures and methods. [0.83] Moreover, due to the limiting factors of C, the reference implementation of CGP does not provide a generic framework and is therefore restricted to a set of predefined evaluation types. Besides the reference implementation, we also observe that other existing implementations are limited with respect to the features provided. In this work, we therefore propose the first version of a modern C++ implementation of CGP that pursues object-oriented design and generic programming paradigm to provide an efficient implementation model that can facilitate the discovery of new problem domains and the implementation of complex advanced methods that have been proposed for CGP over time. With the proposal of our new implementation, we aim to generally promote interpretability, accessibility and reproducibility in the field of CGP.

link

2024-06-13

A Symbolic Computing Perspective on Software Systems

Symbolic mathematical computing systems have served as a canary in the coal mine of software systems for more than sixty years. They have introduced or have been early adopters of programming language ideas such as dynamic memory management, arbitrary precision arithmetic and dependent types. These systems have the feature of being highly complex while at the same time operating in a domain where results are well-defined and clearly verifiable. [0.842] These software systems span multiple layers of abstraction with concerns ranging from instruction scheduling and cache pressure up to algorithmic complexity of constructions in algebraic geometry. All of the major symbolic mathematical computing systems include low-level code for arithmetic, memory management and other primitives, a compiler or interpreter for a bespoke programming language, a library of high level mathematical algorithms, and some form of user interface. Each of these parts invokes multiple deep issues. We present some lessons learned from this environment and free flowing opinions on topics including: * Portability of software across architectures and decades; * Infrastructure to embrace and infrastructure to avoid; * Choosing base abstractions upon which to build; * How to get the most out of a small code base; * How developments in compilers both to optimise and to validate code have always been and remain of critical importance, with plenty of remaining challenges; * The way in which individuals, including in particular Alan Mycroft, who have been able to span from hand-crafting Z80 machine code up to the most abstruse high level code analysis techniques, are needed; and * Why it is important to teach full-stack thinking to the next generation.

link

Decision Making Under Uncertainty

2024-06-13

General Bayesian Predictive Synthesis

This study investigates Bayesian ensemble learning for improving the quality of decision-making. We consider a decision-maker who selects an action from a set of candidates based on a policy trained using observations. [0.827] In our setting, we assume the existence of experts who provide predictive distributions based on their own policies. Our goal is to integrate these predictive distributions within the Bayesian framework. Our proposed method, which we refer to as General Bayesian Predictive Synthesis (GBPS), is characterized by a loss minimization framework and does not rely on parameter estimation, unlike existing studies. Inspired by Bayesian predictive synthesis and general Bayes frameworks, we evaluate the performance of our proposed method through simulation studies.

link

Reinforcement Learning

2024-06-13

State-Space Modeling in Long Sequence Processing: A Survey on Recurrence in the Transformer Era

Effectively learning from sequential data is a longstanding goal of Artificial Intelligence, especially in the case of long sequences. [0.828] From the dawn of Machine Learning, several researchers engaged in the search for algorithms and architectures capable of processing sequences of patterns, retaining information about the past inputs while still leveraging the upcoming data, without losing precious long-term dependencies and correlations. While such an ultimate goal is inspired by the human hallmark of continuous real-time processing of sensory information, several solutions simplified the learning paradigm by artificially limiting the processed context or dealing with sequences of limited length, given in advance. These solutions were further emphasized by the ubiquity of Transformers, which initially overshadowed the role of Recurrent Neural Nets. However, recurrent networks are facing a strong recent revival due to the growing popularity of (deep) State-Space models and novel instances of large-context Transformers, which are both based on recurrent computations to go beyond several limits of currently ubiquitous technologies. In fact, the fast development of Large Language Models enhanced the interest in efficient solutions to process data over time. This survey provides an in-depth summary of the latest approaches that are based on recurrent models for sequential data processing. A complete taxonomy over the latest trends in architectural and algorithmic solutions is reported and discussed, guiding researchers in this appealing research field. The emerging picture suggests that there is room for thinking of novel routes, constituted by learning algorithms which depart from the standard Backpropagation Through Time, towards a more realistic scenario where patterns are effectively processed online, leveraging local-forward computations, opening the way to further research on this topic.

link

2024-06-13

Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation

Offline multi-agent reinforcement learning (MARL) is an emerging field with great promise for real-world applications. [0.849] Unfortunately, the current state of research in offline MARL is plagued by inconsistencies in baselines and evaluation protocols, which ultimately makes it difficult to accurately assess progress, trust newly proposed innovations, and allow researchers to easily build upon prior work. In this paper, we firstly identify significant shortcomings in existing methodologies for measuring the performance of novel algorithms through a representative study of published offline MARL work. Secondly, by directly comparing to this prior work, we demonstrate that simple, well-implemented baselines can achieve state-of-the-art (SOTA) results across a wide range of tasks. Specifically, we show that on 35 out of 47 datasets used in prior work (almost 75% of cases), we match or surpass the performance of the current purported SOTA. Strikingly, our baselines often substantially outperform these more sophisticated algorithms. Finally, we correct for the shortcomings highlighted from this prior work by introducing a straightforward standardised methodology for evaluation and by providing our baseline implementations with statistically robust results across several scenarios, useful for comparisons in future work. Our proposal includes simple and sensible steps that are easy to adopt, which in combination with solid baselines and comparative results, could substantially improve the overall rigour of empirical science in offline MARL moving forward.

link

2024-06-13

DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning

Offline reinforcement learning (RL) can learn optimal policies from pre-collected offline datasets without interacting with the environment, but the sampled actions of the agent often cannot cover the action distribution under a given state, resulting in the extrapolation error issue. [0.842] Recent works address this issue by employing generative adversarial networks (GANs). However, these methods often suffer from insufficient constraints on policy exploration and inaccurate representation of behavior policies. Moreover, the generator in GANs fails to fool the discriminator while maximizing the expected returns of a policy. Inspired by diffusion, a generative model with powerful feature expressiveness, we propose a new offline RL method named Diffusion Policies with Generative Adversarial Networks (DiffPoGAN). In this approach, the diffusion model serves as the policy generator to generate diverse distributions of actions, and a regularization method based on maximum likelihood estimation (MLE) is developed to generate data that approximate the distribution of behavior policies. Besides, we introduce an additional regularization term based on the discriminator output to effectively constrain policy exploration for policy improvement. Comprehensive experiments are conducted on the datasets for deep data-driven reinforcement learning (D4RL), and experimental results show that DiffPoGAN outperforms state-of-the-art methods in offline RL.

link

Trajectory Optimization

2024-06-13

Applying Multi-Agent Negotiation to Solve the Production Routing Problem With Privacy Preserving

This paper presents a novel approach to address the Production Routing Problem with Privacy Preserving (PRPPP) in supply chain optimization. The integrated optimization of production, inventory, distribution, and routing decisions in real-world industry applications poses several challenges, including increased complexity, discrepancies between planning and execution, and constraints on information sharing. To mitigate these challenges, this paper proposes the use of intelligent agent negotiation within a hybrid Multi-Agent System (MAS) integrated with optimization algorithms. The MAS facilitates communication and coordination among entities, encapsulates private information, and enables negotiation. This, along with optimization algorithms, makes it a compelling framework for establishing optimal solutions. [0.823] The approach is supported by real-world applications and synergies between MAS and optimization methods, demonstrating its effectiveness in addressing complex supply chain optimization problems.

link

Active Inference

2024-06-13

General Bayesian Predictive Synthesis

This study investigates Bayesian ensemble learning for improving the quality of decision-making. We consider a decision-maker who selects an action from a set of candidates based on a policy trained using observations. [0.825] In our setting, we assume the existence of experts who provide predictive distributions based on their own policies. Our goal is to integrate these predictive distributions within the Bayesian framework. Our proposed method, which we refer to as General Bayesian Predictive Synthesis (GBPS), is characterized by a loss minimization framework and does not rely on parameter estimation, unlike existing studies. Inspired by Bayesian predictive synthesis and general Bayes frameworks, we evaluate the performance of our proposed method through simulation studies.

link

2024-06-13

Characterising Interventions in Causal Games

Causal games are probabilistic graphical models that enable causal queries to be answered in multi-agent settings. [0.831] They extend causal Bayesian networks by specifying decision and utility variables to represent the agents' degrees of freedom and objectives. In multi-agent settings, whether each agent decides on their policy before or after knowing the causal intervention is important as this affects whether they can respond to the intervention by adapting their policy. Consequently, previous work in causal games imposed chronological constraints on permissible interventions. We relax this by outlining a sound and complete set of primitive causal interventions so the effect of any arbitrarily complex interventional query can be studied in multi-agent settings. We also demonstrate applications to the design of safe AI systems by considering causal mechanism design and commitment.

link

2024-06-13

Active Inference Meeting Energy-Efficient Control of Parallel and Identical Machines

We investigate the application of active inference in developing energy-efficient control agents for manufacturing systems. [0.831] Active inference, rooted in neuroscience, provides a unified probabilistic framework integrating perception, learning, and action, with inherent uncertainty quantification elements. [0.919] Our study explores deep active inference, an emerging field that combines deep learning with the active inference decision-making framework. [0.866] Leveraging a deep active inference agent, we focus on controlling parallel and identical machine workstations to enhance energy efficiency. We address challenges posed by the problem's stochastic nature and delayed policy response by introducing tailored enhancements to existing agent architectures. Specifically, we introduce multi-step transition and hybrid horizon methods to mitigate the need for complex planning. Our experimental results demonstrate the effectiveness of these enhancements and highlight the potential of the active inference-based approach. [0.823]

link