// PROJECTS
What We're Building
A 1.5B-parameter language model built on the Complexity-Deep architecture, combining mu-guided attention with token-routed experts.
A 64-bit GPU architecture with a native O(1) KV-cache built on CAM, delivering 4× faster LLM inference at 75 W for edge deployment.
A dataset of 10K contrastive pairs for learning harm directions, enabling representation engineering for jailbreak-resistant LLMs.
// PUBLICATIONS
Research
GPU-64: A 64-bit Inference GPU with Native O(1) KV-Cache for Edge LLM Deployment
Boris Peyriguere
Zenodo • 2025
GPU-64 is a power-efficient 64-bit GPU architecture optimized for LLM inference. Using on-chip CAM (Content-Addressable Memory) for the KV-cache, it achieves O(1) lookup latency instead of O(N), resulting in 4× faster inference at a 75 W TDP for edge deployment.
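The core idea can be illustrated in software: a linear KV-cache scan compares the query against every cached key (O(N)), while a hardware CAM matches against all entries in parallel, for which a hash table is the closest software analogy (O(1) average lookup). This is a toy sketch of that contrast; the names and data layout are illustrative, not taken from the GPU-64 design.

```python
import numpy as np

def kv_scan(keys, values, query):
    """O(N): compare the query against every cached key in turn."""
    for k, v in zip(keys, values):
        if np.array_equal(k, query):
            return v
    return None

class CamCache:
    """CAM-style cache modeled as a hash table: O(1) average lookup."""
    def __init__(self):
        self._table = {}

    def insert(self, key, value):
        # Key by the raw bytes of the key vector.
        self._table[key.tobytes()] = value

    def lookup(self, query):
        return self._table.get(query.tobytes())

# Populate both caches with the same 1000 key/value pairs.
keys = [np.array([i, i + 1]) for i in range(1000)]
values = [np.array([i * 2]) for i in range(1000)]
cache = CamCache()
for k, v in zip(keys, values):
    cache.insert(k, v)

# Both paths return the same value; only the lookup cost differs.
q = np.array([500, 501])
assert np.array_equal(kv_scan(keys, values, q), cache.lookup(q))
```

In hardware, the parallel match happens in a single cycle regardless of cache size, which is where the claimed constant-time behavior comes from.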
Layer-Native Safety Clamping: Representation Engineering for Jailbreak-Resistant LLMs
Boris Peyriguere
Zenodo • 2025
We propose Layer-Native Safety Clamping, a representation engineering approach that operates directly within the model's activation space. By learning harm directions and clamping activations, our method provides safety guarantees that cannot be bypassed through prompt manipulation.
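A minimal sketch of the general recipe, assuming the common difference-of-means approach from the representation engineering literature: estimate a "harm direction" from contrastive (harmful, harmless) activation pairs, then cap each activation's component along that direction. The paper's exact method may differ; all data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# Synthetic contrastive activations: harmful ones are shifted along a
# hidden axis that plays the role of the true harm direction.
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)
harmless = rng.normal(size=(100, d))
harmful = harmless + 3.0 * true_dir

# Learn the harm direction as the normalized difference of means.
harm_dir = harmful.mean(axis=0) - harmless.mean(axis=0)
harm_dir /= np.linalg.norm(harm_dir)

def clamp(h, direction, max_proj=0.0):
    """Cap each activation's component along `direction` at `max_proj`."""
    proj = h @ direction                      # per-row projection
    excess = np.maximum(proj - max_proj, 0.0) # amount over the cap
    return h - np.outer(excess, direction)    # subtract only the excess

clamped = clamp(harmful, harm_dir)
# After clamping, no activation exceeds the cap along the harm direction.
assert np.all(clamped @ harm_dir <= 1e-6)
```

Because the clamp acts on activations rather than on the prompt, a jailbreak that rephrases the input still has to produce activations past this layer, which is the intuition behind the "cannot be bypassed through prompt manipulation" claim.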
Complexity-Deep: Token-Routed MLP with Mu-Guided Dynamics for Efficient Transformer Architectures
Boris Peyriguere
Zenodo • 2025
We present Complexity-Deep, a novel transformer architecture that combines a deterministic token-routed MLP with mu-guided dynamics for efficient and stable training.
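To make "deterministic token-routed" concrete, here is a toy sketch assuming routing by token id modulo the expert count, which is one plausible reading of the idea; the paper's actual routing function may differ. Because the route is a fixed function of token identity, there is no learned router to collapse.

```python
import numpy as np

num_experts = 4
vocab_size = 32000
d_model, d_ff = 16, 32
rng = np.random.default_rng(0)

# One tiny two-layer MLP per expert.
W1 = rng.normal(size=(num_experts, d_model, d_ff)) * 0.02
W2 = rng.normal(size=(num_experts, d_ff, d_model)) * 0.02

def route(token_ids):
    """Deterministic: the same token id always maps to the same expert."""
    return token_ids % num_experts

def token_routed_mlp(token_ids, h):
    out = np.empty_like(h)
    experts = route(token_ids)
    for e in range(num_experts):
        mask = experts == e           # tokens assigned to expert e
        x = h[mask] @ W1[e]
        out[mask] = np.maximum(x, 0.0) @ W2[e]  # ReLU MLP
    return out

token_ids = rng.integers(0, vocab_size, size=128)
h = rng.normal(size=(128, d_model))
y = token_routed_mlp(token_ids, h)
assert y.shape == h.shape
```

With roughly uniform token ids, each expert receives about the same number of tokens, which is where the claimed perfect load balance comes from.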
Cite Our Work
@software{peyriguere2026complexity,
  author    = {Peyriguere, Boris},
  title     = {Complexity-Deep: Token-Routed MLP with
               Mu-Guided Dynamics for Efficient
               Transformer Architectures},
  year      = 2026,
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18293026},
  url       = {https://doi.org/10.5281/zenodo.18293026}
}
// ABOUT
Our Mission
Complexity-ML is dedicated to developing efficient and innovative transformer architectures. Our research focuses on making large language models more accessible through novel routing mechanisms and dynamics-inspired control systems.
Mu-Guided Dynamics
A PID-inspired control mechanism that maintains context across layers through velocity and mu accumulation.
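A highly schematic toy of what a PID-style update across layers could look like, assuming "mu" plays an integral (accumulator) role and "velocity" a derivative role over per-layer updates. The actual Complexity-Deep formulation is not specified here and will differ.

```python
import numpy as np

def pid_layer_stream(h0, deltas, kp=1.0, ki=0.1, kd=0.5):
    """Fold per-layer updates into the hidden state with P/I/D terms.

    `deltas` is the sequence of updates each layer proposes.
    All coefficients are illustrative.
    """
    h = h0.copy()
    mu = np.zeros_like(h0)     # "mu": integral-style accumulator over layers
    prev = np.zeros_like(h0)   # previous layer's update, for the velocity term
    for delta in deltas:
        mu = mu + delta               # I: accumulated context
        velocity = delta - prev       # D: layer-to-layer change ("velocity")
        h = h + kp * delta + ki * mu + kd * velocity
        prev = delta
    return h

h0 = np.zeros(4)
deltas = [np.ones(4), np.ones(4)]
h = pid_layer_stream(h0, deltas)  # each element: 1.6 after layer 1, 2.8 after layer 2
```

The integral term carries a running summary of earlier layers forward, which is one way a PID-style controller can "maintain context across layers".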
Token-Routed MLP
Deterministic expert routing based on token identity. Perfect load balance without routing collapse.
CGGR Kernels
Custom Triton kernels for contiguous group GEMM routing. 5–6× speedup over naive implementations.
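The memory-layout trick behind contiguous group GEMM can be sketched in NumPy: sort tokens by expert id so each expert's tokens form a contiguous slice, run one dense matmul per slice, then scatter results back to the original order. This illustrates the grouping idea only, not the Triton kernel itself.

```python
import numpy as np

def grouped_gemm(h, expert_ids, weights):
    """One contiguous GEMM per expert group instead of per-token gathers."""
    order = np.argsort(expert_ids, kind="stable")  # group tokens by expert
    h_sorted = h[order]
    counts = np.bincount(expert_ids, minlength=len(weights))
    out_sorted = np.empty((h.shape[0], weights.shape[2]))
    start = 0
    for e, n in enumerate(counts):
        if n:
            # Each expert's tokens are now a contiguous block: one dense matmul.
            out_sorted[start:start + n] = h_sorted[start:start + n] @ weights[e]
        start += n
    out = np.empty_like(out_sorted)
    out[order] = out_sorted  # undo the sort
    return out

rng = np.random.default_rng(0)
h = rng.normal(size=(64, 8))
expert_ids = rng.integers(0, 4, size=64)
weights = rng.normal(size=(4, 8, 8))
y = grouped_gemm(h, expert_ids, weights)

# Matches the naive per-token computation.
naive = np.stack([h[i] @ weights[expert_ids[i]] for i in range(64)])
assert np.allclose(y, naive)
```

On a GPU, the contiguous slices turn many small, scattered matmuls into a few large ones with coalesced memory access, which is the usual source of this kind of speedup.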