DiLoCo vs DeMo
As models scale to billions of parameters, communication overhead becomes the key bottleneck in distributed training. This post explores two groundbreaking approaches, DiLoCo and DeMo, that challenge the need for full synchronization at every step and cut bandwidth demands by up to 1000x. We also share early results from experiments that combine the two techniques.
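To make the bandwidth saving concrete, here is a minimal single-process sketch of the DiLoCo idea: each simulated worker runs many local optimizer steps, and only the averaged parameter deltas (pseudo-gradients) are applied by an outer optimizer, so communication shrinks roughly by the number of inner steps. The toy model, synthetic data, and hyperparameters below are illustrative assumptions, not the published recipe, and DeMo's gradient-compression scheme is not shown.

```python
# Minimal, single-process sketch of DiLoCo-style training.
# Assumptions: toy linear model, synthetic data, hand-picked hyperparameters.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

NUM_WORKERS = 4      # simulated data-parallel replicas
INNER_STEPS = 50     # H: local steps between syncs (communication drops ~H-fold)
OUTER_ROUNDS = 10

global_model = nn.Linear(16, 1)
# Outer optimizer updates the global parameters from averaged pseudo-gradients.
outer_opt = torch.optim.SGD(global_model.parameters(),
                            lr=0.7, momentum=0.9, nesterov=True)

for round_ in range(OUTER_ROUNDS):
    start = [p.detach().clone() for p in global_model.parameters()]
    # Accumulate each worker's parameter delta (the "pseudo-gradient").
    deltas = [torch.zeros_like(p) for p in global_model.parameters()]

    for _ in range(NUM_WORKERS):
        local = copy.deepcopy(global_model)
        inner_opt = torch.optim.AdamW(local.parameters(), lr=1e-2)
        for _ in range(INNER_STEPS):            # no communication during these steps
            x = torch.randn(32, 16)
            y = x.sum(dim=1, keepdim=True)      # synthetic regression target
            loss = nn.functional.mse_loss(local(x), y)
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()
        for d, p0, p in zip(deltas, start, local.parameters()):
            d += (p0 - p.detach()) / NUM_WORKERS   # average the deltas

    # Outer step: treat the averaged delta as a gradient of the global model.
    outer_opt.zero_grad()
    for p, d in zip(global_model.parameters(), deltas):
        p.grad = d
    outer_opt.step()
    print(f"round {round_}: pseudo-gradient norm {sum(d.norm() for d in deltas):.4f}")
```

In a real multi-node run, the only communication per round is one all-reduce of the pseudo-gradients, instead of an all-reduce of full gradients every step.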
Decentralised Distributed Training On Bittensor
As centralized AI becomes increasingly powerful and opaque, we propose a decentralized alternative: a Bittensor subnet whose incentives reward compute, bandwidth, and low latency so that participants collaboratively train large models. This post dives into the technical architecture, the stability challenges, and the pragmatic trade-offs we make as we work toward trustless, scalable training.