Course schedule

| Date | Topic | Detail | Presenter |
|------|-------|--------|-----------|
| 1/13 | Introduction | slides | Dr. Yang Wang |
| 1/15 | Framework | TensorFlow: A System for Large-Scale Machine Learning (OSDI 16) | Shuzhan Yang |
| 1/15 | Framework | PyTorch: An Imperative Style, High-Performance Deep Learning Library (NIPS 19) | Oliver Proudfoot |
| 1/20 | Framework | Ray: A Distributed Framework for Emerging AI Applications (OSDI 18) | Jintong Liu |
| 1/20 | Parallelism | Scaling Distributed Machine Learning with the Parameter Server (OSDI 14) | ? |
| 1/22 | Parallelism | Horovod: Fast and Easy Distributed Deep Learning in TensorFlow | Kailun Lin |
| 1/22 | Parallelism | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Andy Wu |
| 1/27 | Parallelism | PipeDream: Generalized Pipeline Parallelism for DNN Training (SOSP 19) | ? |
| 1/27 | Parallelism | Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning (OSDI 22) | ? |
| 1/29 | Transformer | Overview of the Transformer model | Dr. Andrew Perrault |
| 2/3 | Memory | Training Deep Nets with Sublinear Memory Cost | ? |
| 2/3 | Memory | ZeRO: Memory Optimizations Toward Training Trillion Parameter Models (SC 20) | ? |
| 2/5 | Memory | ZeRO-Offload: Democratizing Billion-Scale Model Training (USENIX ATC 21) | ? |
| 2/5 | Memory | FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (NIPS 22) | ? |
| 2/10 | Compiler | TVM: An Automated End-to-End Optimizing Compiler for Deep Learning (OSDI 18) | ? |
| 2/10 | Compiler | Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations (MAPL 19) | ? |
| 2/12 | Compiler | TASO: Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions (SOSP 19) | ? |
| 2/12 | Compiler | TensorIR: An Abstraction for Automatic Tensorized Program Optimization (ASPLOS 23) | ? |
| 2/17 | Checkpoint | Check-N-Run: A Checkpointing System for Training Deep Learning Recommendation Models (NSDI 22) | ? |
| 2/17 | Checkpoint | GEMINI: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints (SOSP 23) | ? |
| 2/19 | Fault Tolerance | Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates (SOSP 23) | ? |
| 2/19 | Fault Tolerance | ReCycle: Resilient Training of Large DNNs using Pipeline Adaptation (SOSP 24) | ? |
| 2/24 | Model Search | EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (ICML 19) | ? |
| 2/24 | Model Search | Once for All: Train One Network and Specialize it for Efficient Deployment (ICLR 20) | ? |
| 2/26 | Quantization | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale (NIPS 22) | ? |
| 2/26 | Quantization | GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (ICLR 23) | ? |
| 3/3 | Cluster | Gandiva: Introspective Cluster Scheduling for Deep Learning (OSDI 18) | ? |
| 3/3 | Cluster | Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads (USENIX ATC 19) | ? |
| 3/5 | Cluster | Tiresias: A GPU Cluster Manager for Distributed Deep Learning (NSDI 19) | ? |
| 3/5 | Cluster | Themis: Fair and Efficient GPU Cluster Scheduling (NSDI 20) | ? |
| 3/10 | Cluster | AntMan: Dynamic Scaling on GPU Clusters for Deep Learning (OSDI 20) | ? |
| 3/10 | Cluster | Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning (OSDI 21) | ? |
| 3/12 | Cluster | MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters (NSDI 22) | ? |
| 3/12 | Cluster | MAST: Global Scheduling of ML Training across Geo-Distributed Datacenters at Hyperscale (OSDI 24) | ? |
| 3/17 | Spring break | | |
| 3/19 | Spring break | | |
| 3/24 | Inference | TensorFlow-Serving: Flexible, High-Performance ML Serving | ? |
| 3/24 | Inference | Serving DNNs like Clockwork: Performance Predictability from the Bottom Up (OSDI 20) | ? |
| 3/26 | Inference | DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale (SC 22) | ? |
| 3/26 | Inference | DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale (ICML 22) | ? |
| 3/31 | Inference | Orca: A Distributed Serving System for Transformer-Based Generative Models (OSDI 22) | ? |
| 3/31 | Inference | Efficient Memory Management for LLM Serving with PagedAttention (SOSP 23) | ? |
| 4/2 | Inference | AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23) | ? |
| 4/2 | Inference | FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU (ICML 23) | ? |
| 4/7 | Inference | DistServe: Disaggregating Prefill and Decoding for Goodput-Optimized Large Language Model Serving (OSDI 24) | ? |
| 4/7 | TBD | TBD | ? |
| 4/9 | TBD | TBD | ? |
| 4/14 | TBD | TBD | ? |
| 4/16 | Project Presentation | ? | ? |
| 4/21 | Project Presentation | ? | ? |
| 4/23 | Project Presentation | ? | ? |