// PROJECTS
What We're Building
A 1.5B-parameter language model built on the Complexity-Deep architecture, combining mu-guided attention with token-routed experts.
A 64-bit GPU architecture with a native O(1) KV-cache built on CAM, delivering 4× faster LLM inference at 75 W for edge deployment.
A dataset of 10K contrastive pairs for learning harm directions, enabling representation engineering for jailbreak-resistant LLMs.
// PUBLICATIONS
Research
GPU-64: A 64-bit Inference GPU with Native O(1) KV-Cache for Edge LLM Deployment
Boris Peyriguere
Zenodo • 2025
GPU-64 is a power-efficient 64-bit GPU architecture optimized for LLM inference. Using on-chip CAM (Content-Addressable Memory) for the KV-cache, it achieves O(1) lookup latency instead of O(N), resulting in 4× faster inference at a 75 W TDP for edge deployment.
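The core idea can be illustrated in software: a linear KV-cache scan compares the query against every cached key (O(N)), while a hardware CAM matches against all entries in parallel, for which a hash table is the closest software analogy (O(1) average lookup). This is a toy sketch of that contrast; the names and data layout are illustrative, not taken from the GPU-64 design.

```python
import numpy as np

def kv_scan(keys, values, query):
    """O(N): compare the query against every cached key in turn."""
    for k, v in zip(keys, values):
        if np.array_equal(k, query):
            return v
    return None

class CamCache:
    """CAM-style cache modeled as a hash table: O(1) average lookup."""
    def __init__(self):
        self._table = {}

    def insert(self, key, value):
        # Key by the raw bytes of the key vector.
        self._table[key.tobytes()] = value

    def lookup(self, query):
        return self._table.get(query.tobytes())

# Populate both caches with the same 1000 key/value pairs.
keys = [np.array([i, i + 1]) for i in range(1000)]
values = [np.array([i * 2]) for i in range(1000)]
cache = CamCache()
for k, v in zip(keys, values):
    cache.insert(k, v)

# Both paths return the same value; only the lookup cost differs.
q = np.array([500, 501])
assert np.array_equal(kv_scan(keys, values, q), cache.lookup(q))
```

In hardware, the parallel match happens in a single cycle regardless of cache size, which is where the claimed constant-time behavior comes from.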
Layer-Native Safety Clamping: Representation Engineering for Jailbreak-Resistant LLMs
Boris Peyriguere
Zenodo • 2025
We propose Layer-Native Safety Clamping, a representation engineering approach that operates directly within the model's activation space. By learning harm directions and clamping activations, our method provides safety guarantees that cannot be bypassed through prompt manipulation.
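A minimal sketch of the general recipe, assuming the common difference-of-means approach from the representation engineering literature: estimate a "harm direction" from contrastive (harmful, harmless) activation pairs, then cap each activation's component along that direction. The paper's exact method may differ; all data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# Synthetic contrastive activations: harmful ones are shifted along a
# hidden axis that plays the role of the true harm direction.
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)
harmless = rng.normal(size=(100, d))
harmful = harmless + 3.0 * true_dir

# Learn the harm direction as the normalized difference of means.
harm_dir = harmful.mean(axis=0) - harmless.mean(axis=0)
harm_dir /= np.linalg.norm(harm_dir)

def clamp(h, direction, max_proj=0.0):
    """Cap each activation's component along `direction` at `max_proj`."""
    proj = h @ direction                      # per-row projection
    excess = np.maximum(proj - max_proj, 0.0) # amount over the cap
    return h - np.outer(excess, direction)    # subtract only the excess

clamped = clamp(harmful, harm_dir)
# After clamping, no activation exceeds the cap along the harm direction.
assert np.all(clamped @ harm_dir <= 1e-6)
```

Because the clamp acts on activations rather than on the prompt, a jailbreak that rephrases the input still has to produce activations past this layer, which is the intuition behind the "cannot be bypassed through prompt manipulation" claim.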
Complexity-Deep: Token-Routed MLP with Mu-Guided Dynamics for Efficient Transformer Architectures
Boris Peyriguere
Zenodo • 2025
We present Complexity-Deep, a novel transformer architecture that combines a deterministic token-routed MLP with mu-guided dynamics for efficient and stable training.
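To make "deterministic token-routed" concrete, here is a toy sketch assuming routing by token id modulo the expert count, which is one plausible reading of the idea; the paper's actual routing function may differ. Because the route is a fixed function of token identity, there is no learned router to collapse.

```python
import numpy as np

num_experts = 4
vocab_size = 32000
d_model, d_ff = 16, 32
rng = np.random.default_rng(0)

# One tiny two-layer MLP per expert.
W1 = rng.normal(size=(num_experts, d_model, d_ff)) * 0.02
W2 = rng.normal(size=(num_experts, d_ff, d_model)) * 0.02

def route(token_ids):
    """Deterministic: the same token id always maps to the same expert."""
    return token_ids % num_experts

def token_routed_mlp(token_ids, h):
    out = np.empty_like(h)
    experts = route(token_ids)
    for e in range(num_experts):
        mask = experts == e           # tokens assigned to expert e
        x = h[mask] @ W1[e]
        out[mask] = np.maximum(x, 0.0) @ W2[e]  # ReLU MLP
    return out

token_ids = rng.integers(0, vocab_size, size=128)
h = rng.normal(size=(128, d_model))
y = token_routed_mlp(token_ids, h)
assert y.shape == h.shape
```

With roughly uniform token ids, each expert receives about the same number of tokens, which is where the claimed perfect load balance comes from.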
Cite Our Work
@software{peyriguere2026complexity,
  author    = {Peyriguere, Boris},
  title     = {Complexity-Deep: Token-Routed MLP with
               Mu-Guided Dynamics for Efficient
               Transformer Architectures},
  year      = 2026,
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18293026},
  url       = {https://doi.org/10.5281/zenodo.18293026}
}
// ABOUT
Our Mission
Complexity-ML is dedicated to developing efficient and innovative transformer architectures. Our research focuses on making large language models more accessible through novel routing mechanisms and dynamics-inspired control systems.
Mu-Guided Dynamics
A PID-inspired control mechanism that maintains context across layers through velocity and mu accumulation.
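A highly schematic toy of what a PID-style update across layers could look like, assuming "mu" plays an integral (accumulator) role and "velocity" a derivative role over per-layer updates. The actual Complexity-Deep formulation is not specified here and will differ.

```python
import numpy as np

def pid_layer_stream(h0, deltas, kp=1.0, ki=0.1, kd=0.5):
    """Fold per-layer updates into the hidden state with P/I/D terms.

    `deltas` is the sequence of updates each layer proposes.
    All coefficients are illustrative.
    """
    h = h0.copy()
    mu = np.zeros_like(h0)     # "mu": integral-style accumulator over layers
    prev = np.zeros_like(h0)   # previous layer's update, for the velocity term
    for delta in deltas:
        mu = mu + delta               # I: accumulated context
        velocity = delta - prev       # D: layer-to-layer change ("velocity")
        h = h + kp * delta + ki * mu + kd * velocity
        prev = delta
    return h

h0 = np.zeros(4)
deltas = [np.ones(4), np.ones(4)]
h = pid_layer_stream(h0, deltas)  # each element: 1.6 after layer 1, 2.8 after layer 2
```

The integral term carries a running summary of earlier layers forward, which is one way a PID-style controller can "maintain context across layers".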
Token-Routed MLP
Deterministic expert routing based on token identity. Perfect load balance without routing collapse.
CGGR Kernels
Custom Triton kernels for contiguous group GEMM routing. 5–6× speedup over naive implementations.
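The memory-layout trick behind contiguous group GEMM can be sketched in NumPy: sort tokens by expert id so each expert's tokens form a contiguous slice, run one dense matmul per slice, then scatter results back to the original order. This illustrates the grouping idea only, not the Triton kernel itself.

```python
import numpy as np

def grouped_gemm(h, expert_ids, weights):
    """One contiguous GEMM per expert group instead of per-token gathers."""
    order = np.argsort(expert_ids, kind="stable")  # group tokens by expert
    h_sorted = h[order]
    counts = np.bincount(expert_ids, minlength=len(weights))
    out_sorted = np.empty((h.shape[0], weights.shape[2]))
    start = 0
    for e, n in enumerate(counts):
        if n:
            # Each expert's tokens are now a contiguous block: one dense matmul.
            out_sorted[start:start + n] = h_sorted[start:start + n] @ weights[e]
        start += n
    out = np.empty_like(out_sorted)
    out[order] = out_sorted  # undo the sort
    return out

rng = np.random.default_rng(0)
h = rng.normal(size=(64, 8))
expert_ids = rng.integers(0, 4, size=64)
weights = rng.normal(size=(4, 8, 8))
y = grouped_gemm(h, expert_ids, weights)

# Matches the naive per-token computation.
naive = np.stack([h[i] @ weights[expert_ids[i]] for i in range(64)])
assert np.allclose(y, naive)
```

On a GPU, the contiguous slices turn many small, scattered matmuls into a few large ones with coalesced memory access, which is the usual source of this kind of speedup.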