Pipeline Parallelism
This section shows how to scale the training of a BERT model with EPL pipeline parallelism.
Training setup
The model code is based on the google-research BERT repository (https://github.com/google-research/bert).
Get the pretrained BERT base model
wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
unzip uncased_L-12_H-768_A-12.zip
Prepare the dataset
mkdir data
cd data
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
wget https://raw.githubusercontent.com/allenai/bi-att-flow/master/squad/evaluate-v1.1.py
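Before starting a long training run, it can help to confirm that the downloaded files parse correctly. The following is a minimal sketch, not part of EPL or the BERT repository: `count_squad_examples` is a hypothetical helper name, and the script assumes it is run from inside the `data` directory and that the file follows the SQuAD v1.1 layout (`data` → `paragraphs` → `qas`).

```python
import json
import os


def count_squad_examples(dataset):
    """Count articles, paragraphs, and questions in a parsed SQuAD v1.1 dict."""
    articles = dataset["data"]
    paragraphs = [p for a in articles for p in a["paragraphs"]]
    questions = [q for p in paragraphs for q in p["qas"]]
    return len(articles), len(paragraphs), len(questions)


# Assumption: the script is run from the data directory created above.
# The check is skipped if the file has not been downloaded yet.
path = "train-v1.1.json"
if os.path.exists(path):
    with open(path, encoding="utf-8") as f:
        print(count_squad_examples(json.load(f)))
```

A truncated or corrupted download would typically fail in `json.load` or raise a `KeyError`, which is cheaper to discover here than after launching training.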
Distributed BERT training
Pipeline parallelism
To enable pipeline parallelism for BERT, you only need to add the EPL configuration and strategy annotations to the model code. In the snippet below, lines marked with + are the additions:
+ import epl
+ epl.init(epl.Config({"pipeline.num_micro_batch": 4}))
# model annotation
+ epl.set_default_strategy(epl.replicate(1))
model_stage0()
+ epl.set_default_strategy(epl.replicate(1))
model_stage1()
Refer to the EPL Bert Example for the detailed implementation.
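To build intuition for the `pipeline.num_micro_batch` setting, the sketch below shows how a mini-batch can be split into 4 micro-batches and how 2 stages can overlap work on them. It is a generic, GPipe-style illustration of the forward pass only. It does not use EPL and is not a description of EPL's actual scheduler, which this section does not specify. The function names `split_micro_batches` and `forward_schedule` are hypothetical.

```python
def split_micro_batches(batch, num_micro_batch):
    """Split a mini-batch (a list of samples) into equally sized micro-batches."""
    if len(batch) % num_micro_batch != 0:
        raise ValueError("batch size must be divisible by num_micro_batch")
    size = len(batch) // num_micro_batch
    return [batch[i * size:(i + 1) * size] for i in range(num_micro_batch)]


def forward_schedule(num_stages, num_micro_batch):
    """Return, for each time step, the (stage, micro_batch) pairs that run together.

    Stage s can only process micro-batch m after stage s - 1 has finished it,
    so micro-batch m reaches stage s at time step m + s.
    """
    steps = []
    for t in range(num_stages + num_micro_batch - 1):
        active = [(s, t - s) for s in range(num_stages)
                  if 0 <= t - s < num_micro_batch]
        steps.append(active)
    return steps


# Illustrative values: 8 samples, 4 micro-batches, 2 stages (assumptions).
micro_batches = split_micro_batches(list(range(8)), num_micro_batch=4)
schedule = forward_schedule(num_stages=2, num_micro_batch=4)
for t, active in enumerate(schedule):
    print(t, active)
```

In this toy schedule both stages are busy in every step except the first and the last, which is why a larger number of micro-batches generally reduces the idle time of each stage.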
The following command launches a pipeline-parallel training job with two stages on a single worker with two GPUs.
epl-launch --num_workers 1 --gpu_per_worker 2 scripts/train_bert_base_dp.sh
Evaluation
After training, run the following commands to get the evaluation results. Here ${output_dir} is the output directory of the training run, which contains predictions.json.
SQUAD_DIR=data
python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ${output_dir}/predictions.json
After 2 epochs, you should get an f1 of about 88.0 and an exact_match of about 79.8.
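For readers unfamiliar with the two reported metrics, the sketch below shows how exact match and token-level F1 can be computed for a single prediction. It is a simplified illustration modeled on the usual SQuAD v1.1 scoring (lowercasing, stripping punctuation and English articles, then comparing tokens). The downloaded evaluate-v1.1.py remains the authoritative implementation, and the example strings are made up. The functions return fractions in [0, 1], whereas the figures above are percentages averaged over the dev set.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, references):
    """1.0 if the normalized prediction equals any normalized reference."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(ref) for ref in references))


def f1(prediction, references):
    """Best token-overlap F1 of the prediction against any reference."""
    def single(pred, ref):
        pred_tokens = normalize_answer(pred).split()
        ref_tokens = normalize_answer(ref).split()
        overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
        if overlap == 0:
            return 0.0
        precision = overlap / len(pred_tokens)
        recall = overlap / len(ref_tokens)
        return 2 * precision * recall / (precision + recall)

    return max(single(prediction, ref) for ref in references)


# Made-up example: the prediction contains the reference plus extra tokens.
print(exact_match("Eiffel Tower in Paris", ["Eiffel Tower"]))
print(f1("Eiffel Tower in Paris", ["Eiffel Tower"]))
```

Because F1 gives partial credit for overlapping tokens while exact match does not, the F1 score is always at least as high as the exact-match score, which is consistent with the numbers reported above.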