39 results for “topic:model-parallelism”
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Making large AI models cheaper, faster and more accessible
A GPipe implementation in PyTorch
PaddlePaddle (飞桨) large-model development suite, providing a full-pipeline development toolchain for large language models, cross-modal large models, biocomputing large models, and other domains.
LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training
Slicing a PyTorch Tensor Into Parallel Shards
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
A curated list of awesome projects and papers for distributed training or inference
Distributed training (multi-node) of a Transformer model
Large-scale 4D-parallel pre-training for 🤗 transformers with Mixture of Experts *(still work in progress)*
NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
SC23 Deep Learning at Scale Tutorial Material
Distributed training of DNNs • C++/MPI Proxies (GPT-2, GPT-3, CosmoFlow, DLRM)
Deep Learning at Scale Training Event at NERSC
Deep learning for science school material 2025
WIP. Veloce is a low-code, Ray-based parallelization library for efficient, heterogeneous machine learning computation.
Deep Learning at Scale @ SC25
Fast and easy distributed model training examples.
Performance Estimates for Transformer AI Models in Science
Adaptive Tensor Parallelism for Foundation Models
PyTorch implementation of a 3D U-Net with model parallelism across two GPUs for large models (the basic device-splitting pattern is sketched after this list)
Official implementation of DynPartition: Automatic Optimal Pipeline Parallelism of Dynamic Neural Networks over Heterogeneous GPU Systems for Inference Tasks
Model parallelism for NN architectures with skip connections (e.g., ResNets, U-Nets)
Serving distributed deep learning models with model-parallel swapping.
A decentralized and distributed framework for training DNNs
pipeDejavu: Hardware-aware Latency Predictable, Differentiable Search for Faster Config and Convergence of Distributed ML Pipeline Parallelism
Development of Project HPGO | Hybrid Parallelism Global Orchestration
Torch Automatic Distributed Neural Network (TorchAD-NN) training library. Built on top of TorchMPI, this module automatically parallelizes neural network training.
A project focused on parallelizing pre-processing, measurement, and machine learning in the cloud, as well as evaluating and analyzing cloud performance.
Distributed TensorFlow (model parallelism) example repository
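
Most of the repositories above build on the same core idea: splitting a single model across multiple devices and moving activations between them. Below is a minimal PyTorch sketch of that device-splitting pattern, assuming at least two visible CUDA devices; the class name `TwoGPUNet` and the layer sizes are illustrative assumptions, not taken from any repository listed here.

```python
# Minimal model-parallel sketch in PyTorch: the two stages of a network
# live on different GPUs, and the intermediate activation is transferred
# between devices inside forward(). Assumes >= 2 visible CUDA devices.
import torch
import torch.nn as nn

class TwoGPUNet(nn.Module):  # hypothetical example class, not from a listed repo
    def __init__(self):
        super().__init__()
        # First stage on GPU 0, second stage on GPU 1.
        self.stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.stage1 = nn.Linear(4096, 1024).to("cuda:1")

    def forward(self, x):
        x = self.stage0(x.to("cuda:0"))
        # Move the activation to the second device before the next stage.
        return self.stage1(x.to("cuda:1"))

model = TwoGPUNet()
out = model(torch.randn(8, 1024))
print(out.device)  # cuda:1
```

Pipeline-parallel libraries such as GPipe refine this pattern by splitting each batch into micro-batches so the two devices work concurrently instead of idling while the other stage runs.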