MA2E: Addressing Partial Observability in Multi-Agent Reinforcement Learning with Masked Auto-Encoder, MADiff: Offline Multi-agent Learning with Diffusion Models - Seminar presentation by 강인욱

Author: 최고관리자 (site administrator) | Posted: 2025-08-11 14:36

1. MA2E: Addressing Partial Observability in Multi-Agent Reinforcement Learning with Masked Auto-Encoder

Centralized Training and Decentralized Execution (CTDE) is a widely adopted paradigm for solving cooperative multi-agent reinforcement learning (MARL) problems. Despite the successes achieved with CTDE, partial observability still limits cooperation among agents. Previous studies have attempted to overcome this challenge through communication, but direct information exchange may be restricted and introduces additional constraints. Alternatively, if an agent can infer the global information solely from its local observations, it can obtain a global view without the need for communication. To this end, we propose the Multi-Agent Masked Auto-Encoder (MA2E), which uses the masked auto-encoder architecture to infer the information of other agents from partial observations. By employing masking to learn to reconstruct global information, MA2E serves as an inference module for individual agents within the CTDE framework. MA2E can be easily integrated into existing MARL algorithms and has been shown experimentally to be effective across a wide range of environments and algorithms.
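
The masking idea can be pictured with the minimal sketch below. It assumes observations are plain lists of floats. The names `mask_other_agents` and `reconstruction_loss`, the zero mask value, and the choice to score only the masked entries are hypothetical illustrations, not the paper's method; the abstract does not describe the actual architecture, masking scheme, or loss.

```python
def mask_other_agents(joint_obs, agent_id, mask_value=0.0):
    """Return a copy of joint_obs in which every agent except agent_id is masked.

    joint_obs is a list of per-agent observation vectors (lists of floats).
    Hypothetical helper: the mask value and the layout are assumptions.
    """
    return [
        list(obs) if i == agent_id else [mask_value] * len(obs)
        for i, obs in enumerate(joint_obs)
    ]


def reconstruction_loss(predicted, target, agent_id):
    """Mean squared error over the masked (other-agent) entries only.

    Scoring only masked entries is an assumption borrowed from the usual
    masked auto-encoder setup, not something stated in the abstract.
    """
    total, count = 0.0, 0
    for i, (p_vec, t_vec) in enumerate(zip(predicted, target)):
        if i == agent_id:
            continue  # the agent's own observation is visible, so it is not scored
        for p, t in zip(p_vec, t_vec):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


# Toy joint observation for three agents, seen from agent 1's point of view.
joint_obs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
masked = mask_other_agents(joint_obs, agent_id=1)
print(masked)  # [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]

# A model that simply echoes its masked input reconstructs nothing.
print(reconstruction_loss(masked, joint_obs, agent_id=1))  # 16.5
```

During centralized training the full `joint_obs` is available as the reconstruction target; at execution time an agent would only have the masked view, which is why a learned reconstruction model would sit between the two.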

2. MADiff: Offline Multi-agent Learning with Diffusion Models

Offline reinforcement learning (RL) aims to learn policies from pre-existing datasets without further interactions, making it a challenging task. Q-learning algorithms struggle with extrapolation errors in offline settings, while supervised learning methods are constrained by model expressiveness. Recently, diffusion models (DMs) have shown promise in overcoming these limitations in single-agent learning, but their application in multi-agent scenarios remains unclear. Generating trajectories for each agent with independent DMs may impede coordination, while concatenating all agents' information can lead to low sample efficiency. Accordingly, we propose MADiff, which is realized with an attention-based diffusion model to model the complex coordination among behaviors of multiple agents. To our knowledge, MADiff is the first diffusion-based multi-agent learning framework, functioning as both a decentralized policy and a centralized controller. During decentralized execution, MADiff simultaneously performs teammate modeling, and the centralized controller can also be applied to multi-agent trajectory prediction. Our experiments demonstrate that MADiff outperforms baseline algorithms across various multi-agent learning tasks, highlighting its effectiveness in modeling complex multi-agent interactions.
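
The abstract contrasts fully independent per-agent models with plain concatenation of all agents' information. The sketch below illustrates the general idea of attention across agents as a middle ground: each agent keeps its own feature vector but mixes in the others' through scaled dot-product attention. This is not MADiff's network; the function name `cross_agent_attention` is hypothetical, the query, key, and value projections are replaced by the identity (no learned weights), and the diffusion denoising loop is omitted entirely.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_agent_attention(features):
    """Scaled dot-product attention over the agent axis.

    features is a list of per-agent feature vectors of equal length.
    Assumption: identity query/key/value projections stand in for the
    learned projections a real model would use.
    """
    dim = len(features[0])
    mixed = []
    for query in features:
        # Similarity of this agent's feature to every agent's feature.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        weights = softmax(scores)
        # Weighted average of all agents' features (a convex combination).
        mixed.append([
            sum(w * value[j] for w, value in zip(weights, features))
            for j in range(dim)
        ])
    return mixed


# Three agents with 2-dimensional features, e.g. at one denoising step.
agent_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in cross_agent_attention(agent_features):
    print([round(x, 3) for x in row])
```

Because each output row is a weighted average over all agents, the per-agent representation size stays fixed as the number of agents grows, unlike concatenation.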

Attachment: 세미나7.2.pptx (7.7M) | DATE : 2025-08-11 14:36:08
