48.8566° N, 2.3522° E

OPEN-SOURCE AI LAB

// COMPLEXITY
MACHINE LEARNING

Building efficient transformer architectures
with Mu-Guided Dynamics and Token-Routed MLP


// PROJECTS

What We're Building

Complexity-Deep
Active

Token-Routed MLP with Mu-Guided Dynamics. Deterministic expert routing + PID-inspired control for efficient transformers.

PyTorch • Triton • MoE • LLM
vllm-i64
Active

Integer-first token-routed inference engine. Paged KV cache with LRU eviction, continuous batching, and OpenAI-compatible API.

Inference • i64 • KV-Cache • Token-Routing
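The paged KV cache with LRU eviction described above can be sketched in plain Python. The class name, capacity parameters, and page layout here are illustrative stand-ins, not the engine's actual API:

```python
from collections import OrderedDict

class PagedKVCache:
    """Toy paged KV cache: fixed-size pages, least-recently-used eviction."""

    def __init__(self, num_pages, page_size):
        self.num_pages = num_pages
        self.page_size = page_size
        self.pages = OrderedDict()  # page_id -> list of (key, value) entries

    def get(self, page_id):
        # A hit moves the page to the most-recently-used position.
        if page_id in self.pages:
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        return None

    def put(self, page_id, entries):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)
        elif len(self.pages) >= self.num_pages:
            # Evict the least-recently-used page.
            self.pages.popitem(last=False)
        self.pages[page_id] = entries
```

The `OrderedDict` doubles as the recency list: touching a page moves it to the tail, and eviction pops from the head.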
Pacific-Prime
Available

1.5B parameter language model trained with Complexity-Deep architecture. Mu-guided attention and token-routed experts.

LLM • 1.5B • F32 • HuggingFace
Complexity Framework
Stable

Base transformer architecture library. Foundation for all Complexity models with modern attention patterns.

PyTorch • GQA • RoPE • FlashAttention
gpu-i64
Hardware

64-bit GPU architecture with native O(1) KV-Cache using CAM. 4× faster LLM inference at 75W for edge deployment.

RTL • SystemVerilog • KV-Cache • O(1) • Edge AI
Safety Dataset
Available

10K contrastive pairs for learning harm directions. Enables Representation Engineering for jailbreak-resistant LLMs.

Dataset • Safety • Contrastive • 10K

// PUBLICATIONS

Research

GPU-64: A 64-bit Inference GPU with Native O(1) KV-Cache for Edge LLM Deployment

Boris Peyriguere

Zenodo • 2025

GPU-64 is a power-efficient 64-bit GPU architecture optimized for LLM inference. Using on-chip CAM (Content-Addressable Memory) for KV-Cache, it achieves O(1) lookup latency instead of O(N), resulting in 4× faster inference at 75W TDP for edge deployment.

Read Paper
DOI: 10.5281/zenodo.18364282
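The O(1)-vs-O(N) lookup contrast at the heart of the GPU-64 design can be illustrated in software. A hash table is only a rough analogue of a hardware CAM, which compares the tag against all rows in parallel; both functions below are illustrative, not the chip's interface:

```python
# O(N) lookup: scan every cached entry until the tag matches,
# which is what a sequential KV lookup over N positions costs.
def lookup_scan(entries, tag):
    for key, value in entries:
        if key == tag:
            return value
    return None

# Software stand-in for a CAM: a hash table answers in O(1) expected
# time; the hardware CAM resolves the match in a single parallel compare.
def lookup_cam(table, tag):
    return table.get(tag)
```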

Layer-Native Safety Clamping: Representation Engineering for Jailbreak-Resistant LLMs

Boris Peyriguere

Zenodo • 2025

We propose Layer-Native Safety Clamping, a representation engineering approach that operates directly within the model's activation space. By learning harm directions and clamping activations, our method provides safety guarantees that cannot be bypassed through prompt manipulation.

Read Paper
DOI: 10.5281/zenodo.18359832
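A minimal sketch of the clamping idea, assuming a single learned harm direction per layer; the paper's actual per-layer mechanism, thresholds, and direction-learning procedure are not reproduced here:

```python
import numpy as np

def clamp_activation(h, harm_dir, max_proj):
    """Cap the component of activation h along a learned harm direction."""
    d = harm_dir / np.linalg.norm(harm_dir)  # unit harm direction
    proj = float(h @ d)                      # scalar component along d
    if proj > max_proj:
        h = h - (proj - max_proj) * d        # subtract only the excess
    return h
```

Because the intervention acts on activations rather than on the prompt, a jailbreak that rewrites the input still has to pass through the clamped representation space.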

Complexity-Deep: Token-Routed MLP with Mu-Guided Dynamics for Efficient Transformer Architectures

Boris Peyriguere

Zenodo • 2025

We present Complexity-Deep, a novel transformer architecture that combines deterministic token-routed MLP with mu-guided dynamics for efficient and stable training.

Read Paper
DOI: 10.5281/zenodo.18293026

Cite Our Work

@software{peyriguere2025complexity,
  author       = {Peyriguere, Boris},
  title        = {Complexity-Deep: Token-Routed MLP with
                  Mu-Guided Dynamics for Efficient
                  Transformer Architectures},
  year         = 2025,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.18293026},
  url          = {https://doi.org/10.5281/zenodo.18293026}
}

// ABOUT

Our Mission

Complexity-ML is dedicated to developing efficient and innovative transformer architectures. Our research focuses on making large language models more accessible through novel routing mechanisms and dynamics-inspired control systems.

μ

Mu-Guided Dynamics

PID-inspired control mechanism that maintains context across layers through velocity and mu accumulation.
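A rough sketch of the PID analogy, assuming "mu" plays the accumulated (integral) role and "velocity" the rate (derivative) role; the gains and exact update rule below are placeholders, not the paper's:

```python
def mu_step(error, mu, prev_error, kp=1.0, ki=0.1, kd=0.01):
    """One PID-style control step: proportional on the current error,
    integral via the mu accumulator, derivative via the error velocity."""
    mu = mu + error                # mu accumulates error across layers
    velocity = error - prev_error  # rate of change of the error
    control = kp * error + ki * mu + kd * velocity
    return control, mu, error
```

Carrying `mu` and the previous error forward is what lets the control signal retain context from earlier layers rather than reacting to each layer in isolation.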

Token-Routed MLP

Deterministic expert routing based on token identity. Perfect load balance without routing collapse.
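Identity-based routing can be illustrated with a simple modulo map (the real mapping is a placeholder here): every token id always lands on the same expert, and a uniform id stream loads the experts evenly, with no learned gate that could collapse.

```python
from collections import Counter

def route_tokens(token_ids, num_experts):
    """Each token's expert is a pure function of its id, so routing is
    deterministic and needs no auxiliary load-balancing loss."""
    return [t % num_experts for t in token_ids]
```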

CGGR Kernels

Custom Triton kernels for contiguous group GEMM routing. 5-6× speedup over naive implementations.
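The contiguous-group idea behind those kernels can be sketched in NumPy: sort tokens by expert so each expert's matmul reads one contiguous block, the memory layout a grouped-GEMM Triton kernel exploits. Function names and shapes here are illustrative, not the kernel's API:

```python
import numpy as np

def grouped_mlp(x, expert_ids, weights):
    """Apply per-expert weight matrices after grouping tokens by expert."""
    order = np.argsort(expert_ids, kind="stable")  # group tokens by expert
    x_sorted = x[order]
    out_sorted = np.empty_like(x_sorted)
    start = 0
    for e, w in enumerate(weights):
        count = int(np.sum(expert_ids == e))
        # Each expert's GEMM runs on one contiguous slice of tokens.
        out_sorted[start:start + count] = x_sorted[start:start + count] @ w
        start += count
    # Scatter results back to the original token order.
    out = np.empty_like(out_sorted)
    out[order] = out_sorted
    return out
```

Grouping turns many scattered per-token matmuls into one dense GEMM per expert, which is where the speedup over a naive gather-per-token loop comes from.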