0: W1122 23:27:22.859000 734626 torch/distributed/run.py:792]
0: W1122 23:27:22.859000 734626 torch/distributed/run.py:792] *****************************************
0: W1122 23:27:22.859000 734626 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
0: W1122 23:27:22.859000 734626 torch/distributed/run.py:792] *****************************************
2: W1122 23:27:22.879000 230910 torch/distributed/run.py:792]
2: W1122 23:27:22.879000 230910 torch/distributed/run.py:792] *****************************************
2: W1122 23:27:22.879000 230910 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
2: W1122 23:27:22.879000 230910 torch/distributed/run.py:792] *****************************************
1: W1122 23:27:22.896000 826392 torch/distributed/run.py:792]
1: W1122 23:27:22.896000 826392 torch/distributed/run.py:792] *****************************************
1: W1122 23:27:22.896000 826392 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
1: W1122 23:27:22.896000 826392 torch/distributed/run.py:792] ***************************************** 3: W1122 23:27:22.930000 127766 torch/distributed/run.py:792] 3: W1122 23:27:22.930000 127766 torch/distributed/run.py:792] ***************************************** 3: W1122 23:27:22.930000 127766 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 3: W1122 23:27:22.930000 127766 torch/distributed/run.py:792] ***************************************** 2: [2025-11-22 23:27:42,843] [INFO] [axolotl.utils.schemas.validation.check_eval_packing:119] [PID:230985] [RANK:0] explicitly setting `eval_sample_packing` to match `sample_packing` 2: [2025-11-22 23:27:42,843] [INFO] [axolotl.utils.schemas.validation.hint_sample_packing_padding:218] [PID:230985] [RANK:0] Setting `pad_to_sequence_len: true` to prevent memory leaks when sample_packing 1: [2025-11-22 23:27:42,890] [INFO] [axolotl.utils.schemas.validation.check_eval_packing:119] [PID:826467] [RANK:0] explicitly setting `eval_sample_packing` to match `sample_packing` 1: [2025-11-22 23:27:42,890] [INFO] [axolotl.utils.schemas.validation.hint_sample_packing_padding:218] [PID:826467] [RANK:0] Setting `pad_to_sequence_len: true` to prevent memory leaks when sample_packing 0: [2025-11-22 23:27:42,900] [INFO] [axolotl.utils.schemas.validation.check_eval_packing:119] [PID:734703] [RANK:0] explicitly setting `eval_sample_packing` to match `sample_packing` 0: [2025-11-22 23:27:42,900] [INFO] [axolotl.utils.schemas.validation.hint_sample_packing_padding:218] [PID:734703] [RANK:0] Setting `pad_to_sequence_len: true` to prevent memory leaks when sample_packing 3: [2025-11-22 23:27:43,046] [INFO] [axolotl.utils.schemas.validation.check_eval_packing:119] [PID:127841] [RANK:0] explicitly setting `eval_sample_packing` to match `sample_packing` 3: 
[2025-11-22 23:27:43,046] [INFO] [axolotl.utils.schemas.validation.hint_sample_packing_padding:218] [PID:127841] [RANK:0] Setting `pad_to_sequence_len: true` to prevent memory leaks when sample_packing 0: [2025-11-22 23:27:46,176] [WARNING] [axolotl.utils.config.normalize_config:139] [PID:734703] [RANK:0] Invalid value for save_steps (1.6666666666666667) from saves_per_epoch and/or num_epochs. Saving at training end only. 0: [2025-11-22 23:27:46,188] [INFO] [axolotl.cli.config.load_cfg:245] [PID:734703] [RANK:0] config: 0: { 0: "activation_offloading": false, 0: "auto_resume_from_checkpoints": true, 0: "axolotl_config_path": "/lustre/fswork/projects/rech/dgo/udv55np/train/tmp/1763850438391954992.yaml", 0: "base_model": "/lustre/fswork/projects/rech/qwv/udv55np/Gemma/base/gemma-3-12b", 0: "base_model_config": "/lustre/fswork/projects/rech/qwv/udv55np/Gemma/base/gemma-3-12b", 0: "batch_size": 16, 0: "bf16": true, 0: "capabilities": { 0: "bf16": true, 0: "compute_capability": "sm_90", 0: "fp8": false, 0: "n_gpu": 16, 0: "n_node": 1 0: }, 0: "chat_template": "gemma3", 0: "context_parallel_size": 1, 0: "curriculum_sampling": true, 0: "dataloader_num_workers": 2, 0: "dataset_prepared_path": "/lustre/fswork/projects/rech/dgo/udv55np/dataset_gemma/Nemotron-Super-49B-v1_5/split_1", 0: "dataset_processes": 32, 0: "datasets": [ 0: { 0: "chat_template": "tokenizer_default", 0: "data_files": [ 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0007.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0009.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0005.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0006.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0014.jsonl", 0: 
"/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0010.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0012.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0008.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0001.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0002.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0013.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0015.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0004.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0011.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0000.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0003.jsonl" 0: ], 0: "ds_type": "json", 0: "field_messages": "conversations", 0: "message_property_mappings": { 0: "content": "content", 0: "role": "role" 0: }, 0: "path": "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking", 0: "trust_remote_code": false, 0: "type": "chat_template" 0: } 0: ], 0: "ddp": true, 0: "deepspeed": { 0: "bf16": { 0: "enabled": true 0: }, 0: "gradient_accumulation_steps": "auto", 0: "gradient_clipping": "auto", 0: "train_batch_size": "auto", 0: "train_micro_batch_size_per_gpu": "auto", 0: "wall_clock_breakdown": false, 0: "zero_optimization": { 0: "contiguous_gradients": true, 0: "overlap_comm": true, 0: "reduce_bucket_size": "auto", 0: "stage": 3, 0: "stage3_gather_16bit_weights_on_model_save": true, 0: "stage3_param_persistence_threshold": "auto", 0: "stage3_prefetch_bucket_size": "auto", 0: 
"sub_group_size": 0 0: } 0: }, 0: "device": "cuda:0", 0: "device_map": { 0: "": 0 0: }, 0: "dion_rank_fraction": 1.0, 0: "dion_rank_multiple_of": 1, 0: "env_capabilities": { 0: "torch_version": "2.6.0" 0: }, 0: "eot_tokens": [ 0: "" 0: ], 0: "eval_batch_size": 1, 0: "eval_causal_lm_metrics": [ 0: "sacrebleu", 0: "comet", 0: "ter", 0: "chrf" 0: ], 0: "eval_max_new_tokens": 128, 0: "eval_sample_packing": true, 0: "eval_table_size": 0, 0: "evals_per_epoch": 0, 0: "flash_attention": true, 0: "fp16": false, 0: "gradient_accumulation_steps": 1, 0: "gradient_checkpointing": true, 0: "gradient_checkpointing_kwargs": { 0: "use_reentrant": true 0: }, 0: "is_multimodal": true, 0: "learning_rate": 2e-06, 0: "lisa_layers_attribute": "model.layers", 0: "load_best_model_at_end": false, 0: "load_in_4bit": false, 0: "load_in_8bit": false, 0: "local_rank": 0, 0: "logging_steps": 10, 0: "lora_dropout": 0.0, 0: "loraplus_lr_embedding": 1e-06, 0: "lr_scheduler": "warmup_stable_decay", 0: "lr_scheduler_kwargs": { 0: "min_lr_ratio": 0.1, 0: "num_decay_steps": 200 0: }, 0: "max_prompt_len": 512, 0: "mean_resizing_embeddings": false, 0: "micro_batch_size": 1, 0: "model_config_type": "gemma3", 0: "num_epochs": 0.6, 0: "optimizer": "adamw_torch_fused", 0: "output_dir": "/lustre/fswork/projects/rech/dgo/udv55np/ift/Nemotron-Super-49B-v1_5/gemma-3-12b/1", 0: "pad_to_sequence_len": true, 0: "pretrain_multipack_attn": true, 0: "pretrain_multipack_buffer_size": 10000, 0: "processor_config": "/lustre/fswork/projects/rech/qwv/udv55np/Gemma/base/gemma-3-12b", 0: "profiler_steps_start": 0, 0: "qlora_sharded_model_loading": false, 0: "ray_num_workers": 1, 0: "resources_per_worker": { 0: "GPU": 1 0: }, 0: "sample_packing": true, 0: "sample_packing_bin_size": 200, 0: "sample_packing_group_size": 100000, 0: "sample_packing_sequentially": true, 0: "save_only_model": true, 0: "save_safetensors": true, 0: "save_total_limit": 20, 0: "saves_per_epoch": 1, 0: "sequence_len": 16384, 0: 
"shuffle_before_merging_datasets": true, 0: "shuffle_merged_datasets": false, 0: "skip_prepare_dataset": false, 0: "strict": false, 0: "tensor_parallel_size": 1, 0: "tf32": false, 0: "tiled_mlp_use_original_mlp": true, 0: "tokenizer_config": "/lustre/fswork/projects/rech/qwv/udv55np/Gemma/base/gemma-3-27b", 0: "torch_dtype": "torch.bfloat16", 0: "train_on_inputs": false, 0: "trl": { 0: "log_completions": false, 0: "mask_truncated_completions": false, 0: "ref_model_mixup_alpha": 0.9, 0: "ref_model_sync_steps": 64, 0: "scale_rewards": true, 0: "sync_ref_model": false, 0: "use_vllm": false, 0: "vllm_server_host": "0.0.0.0", 0: "vllm_server_port": 8000 0: }, 0: "use_ray": false, 0: "use_tensorboard": true, 0: "val_set_size": 0.0, 0: "vllm": { 0: "device": "auto", 0: "dtype": "auto", 0: "gpu_memory_utilization": 0.9, 0: "host": "0.0.0.0", 0: "port": 8000 0: }, 0: "warmup_steps": 100, 0: "weight_decay": 0.0, 0: "world_size": 16 0: } 0: [2025-11-22 23:27:46,189] [INFO] [axolotl.cli.checks.check_user_token:35] [PID:734703] [RANK:0] Skipping HuggingFace token verification because HF_HUB_OFFLINE is set to True. Only local files will be used. 0: [2025-11-22 23:27:47,405] [INFO] [axolotl.utils.data.shared.load_preprocessed_dataset:472] [PID:734703] [RANK:0] Loading prepared dataset from disk at /lustre/fswork/projects/rech/dgo/udv55np/dataset_gemma/Nemotron-Super-49B-v1_5/split_1/9c9c4c535c90616e08bd1b7f6d00d00a... 0: jzxh118:734703:734703 [0] NCCL INFO Bootstrap : Using ibp24s0:10.100.5.213<0> 0: jzxh118:734703:734703 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 0: jzxh118:734703:734703 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 0: jzxh118:734703:734703 [0] NCCL INFO NET/Plugin: Using internal network plugin. 
0: jzxh118:734703:734703 [0] NCCL INFO cudaDriverVersion 12080 0: NCCL version 2.21.5+cuda12.4 0: jzxh118:734703:734703 [0] NCCL INFO Comm config Blocking set to 1 0: jzxh118:734705:734705 [2] NCCL INFO cudaDriverVersion 12080 0: jzxh118:734705:734705 [2] NCCL INFO Bootstrap : Using ibp24s0:10.100.5.213<0> 0: jzxh118:734705:734705 [2] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 0: jzxh118:734705:734705 [2] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 0: jzxh118:734705:734705 [2] NCCL INFO NET/Plugin: Using internal network plugin. 0: jzxh118:734705:734705 [2] NCCL INFO Comm config Blocking set to 1 0: jzxh118:734706:734706 [3] NCCL INFO cudaDriverVersion 12080 0: jzxh118:734704:734704 [1] NCCL INFO cudaDriverVersion 12080 1: jzxh119:826467:826467 [0] NCCL INFO cudaDriverVersion 12080 0: jzxh118:734706:734706 [3] NCCL INFO Bootstrap : Using ibp24s0:10.100.5.213<0> 0: jzxh118:734704:734704 [1] NCCL INFO Bootstrap : Using ibp24s0:10.100.5.213<0> 0: jzxh118:734704:734704 [1] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 0: jzxh118:734706:734706 [3] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 0: jzxh118:734704:734704 [1] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 0: jzxh118:734704:734704 [1] NCCL INFO NET/Plugin: Using internal network plugin. 0: jzxh118:734706:734706 [3] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 0: jzxh118:734706:734706 [3] NCCL INFO NET/Plugin: Using internal network plugin. 
1: jzxh119:826467:826467 [0] NCCL INFO Bootstrap : Using ibp24s0:10.100.5.217<0> 1: jzxh119:826467:826467 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 1: jzxh119:826467:826467 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 1: jzxh119:826467:826467 [0] NCCL INFO NET/Plugin: Using internal network plugin. 0: jzxh118:734704:734704 [1] NCCL INFO Comm config Blocking set to 1 0: jzxh118:734706:734706 [3] NCCL INFO Comm config Blocking set to 1 2: jzxh120:230988:230988 [3] NCCL INFO cudaDriverVersion 12080 3: jzxh121:127843:127843 [2] NCCL INFO cudaDriverVersion 12080 2: jzxh120:230988:230988 [3] NCCL INFO Bootstrap : Using ibp24s0:10.100.5.221<0> 1: jzxh119:826467:826467 [0] NCCL INFO Comm config Blocking set to 1 3: jzxh121:127843:127843 [2] NCCL INFO Bootstrap : Using ibp24s0:10.100.5.225<0> 2: jzxh120:230988:230988 [3] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 2: jzxh120:230988:230988 [3] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 2: jzxh120:230988:230988 [3] NCCL INFO NET/Plugin: Using internal network plugin. 3: jzxh121:127843:127843 [2] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 3: jzxh121:127843:127843 [2] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 3: jzxh121:127843:127843 [2] NCCL INFO NET/Plugin: Using internal network plugin. 
1: jzxh119:826469:826469 [2] NCCL INFO cudaDriverVersion 12080 2: jzxh120:230988:230988 [3] NCCL INFO Comm config Blocking set to 1 3: jzxh121:127843:127843 [2] NCCL INFO Comm config Blocking set to 1 2: jzxh120:230986:230986 [1] NCCL INFO cudaDriverVersion 12080 2: jzxh120:230986:230986 [1] NCCL INFO Bootstrap : Using ibp24s0:10.100.5.221<0> 2: jzxh120:230986:230986 [1] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 1: jzxh119:826469:826469 [2] NCCL INFO Bootstrap : Using ibp24s0:10.100.5.217<0> 2: jzxh120:230986:230986 [1] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 2: jzxh120:230986:230986 [1] NCCL INFO NET/Plugin: Using internal network plugin. 1: jzxh119:826469:826469 [2] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 1: jzxh119:826469:826469 [2] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 1: jzxh119:826469:826469 [2] NCCL INFO NET/Plugin: Using internal network plugin. 1: jzxh119:826468:826468 [1] NCCL INFO cudaDriverVersion 12080 3: jzxh121:127841:127841 [0] NCCL INFO cudaDriverVersion 12080 2: jzxh120:230985:230985 [0] NCCL INFO cudaDriverVersion 12080 1: jzxh119:826470:826470 [3] NCCL INFO cudaDriverVersion 12080 2: jzxh120:230986:230986 [1] NCCL INFO Comm config Blocking set to 1 3: jzxh121:127841:127841 [0] NCCL INFO Bootstrap : Using ibp24s0:10.100.5.225<0> 1: jzxh119:826469:826469 [2] NCCL INFO Comm config Blocking set to 1 3: jzxh121:127841:127841 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 3: jzxh121:127841:127841 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 3: jzxh121:127841:127841 [0] NCCL INFO NET/Plugin: Using internal network plugin. 
1: jzxh119:826468:826468 [1] NCCL INFO Bootstrap : Using ibp24s0:10.100.5.217<0> 1: jzxh119:826470:826470 [3] NCCL INFO Bootstrap : Using ibp24s0:10.100.5.217<0> 2: jzxh120:230987:230987 [2] NCCL INFO cudaDriverVersion 12080 2: jzxh120:230985:230985 [0] NCCL INFO Bootstrap : Using ibp24s0:10.100.5.221<0> 1: jzxh119:826470:826470 [3] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 1: jzxh119:826468:826468 [1] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 1: jzxh119:826470:826470 [3] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 1: jzxh119:826468:826468 [1] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 1: jzxh119:826470:826470 [3] NCCL INFO NET/Plugin: Using internal network plugin. 1: jzxh119:826468:826468 [1] NCCL INFO NET/Plugin: Using internal network plugin. 2: jzxh120:230985:230985 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 2: jzxh120:230985:230985 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 2: jzxh120:230985:230985 [0] NCCL INFO NET/Plugin: Using internal network plugin. 3: jzxh121:127844:127844 [3] NCCL INFO cudaDriverVersion 12080 2: jzxh120:230987:230987 [2] NCCL INFO Bootstrap : Using ibp24s0:10.100.5.221<0> 2: jzxh120:230987:230987 [2] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 3: jzxh121:127842:127842 [1] NCCL INFO cudaDriverVersion 12080 3: jzxh121:127841:127841 [0] NCCL INFO Comm config Blocking set to 1 2: jzxh120:230987:230987 [2] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 2: jzxh120:230987:230987 [2] NCCL INFO NET/Plugin: Using internal network plugin. 
2: jzxh120:230985:230985 [0] NCCL INFO Comm config Blocking set to 1
1: jzxh119:826470:826470 [3] NCCL INFO Comm config Blocking set to 1
1: jzxh119:826468:826468 [1] NCCL INFO Comm config Blocking set to 1
2: jzxh120:230987:230987 [2] NCCL INFO Comm config Blocking set to 1
3: jzxh121:127844:127844 [3] NCCL INFO Bootstrap : Using ibp24s0:10.100.5.225<0>
3: jzxh121:127842:127842 [1] NCCL INFO Bootstrap : Using ibp24s0:10.100.5.225<0>
3: jzxh121:127842:127842 [1] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
3: jzxh121:127844:127844 [3] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
3: jzxh121:127842:127842 [1] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
3: jzxh121:127844:127844 [3] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
3: jzxh121:127842:127842 [1] NCCL INFO NET/Plugin: Using internal network plugin.
3: jzxh121:127844:127844 [3] NCCL INFO NET/Plugin: Using internal network plugin.
3: jzxh121:127842:127842 [1] NCCL INFO Comm config Blocking set to 1 3: jzxh121:127844:127844 [3] NCCL INFO Comm config Blocking set to 1 0: jzxh118:734703:735288 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.5.213<0> 0: jzxh118:734703:735288 [0] NCCL INFO Using non-device net plugin version 0 0: jzxh118:734703:735288 [0] NCCL INFO Using network IB 3: jzxh121:127843:128424 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.5.225<0> 3: jzxh121:127843:128424 [2] NCCL INFO Using non-device net plugin version 0 3: jzxh121:127843:128424 [2] NCCL INFO Using network IB 3: jzxh121:127841:128425 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.5.225<0> 3: jzxh121:127841:128425 [0] NCCL INFO Using non-device net plugin version 0 3: jzxh121:127841:128425 [0] NCCL INFO Using network IB 2: jzxh120:230985:231570 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.5.221<0> 2: jzxh120:230988:231568 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.5.221<0> 2: jzxh120:230985:231570 [0] NCCL INFO Using non-device net plugin version 0 2: jzxh120:230985:231570 [0] NCCL INFO Using network IB 2: jzxh120:230988:231568 [3] NCCL INFO Using non-device net plugin version 0 2: jzxh120:230988:231568 [3] NCCL INFO Using network IB 0: jzxh118:734705:735289 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.5.213<0> 0: jzxh118:734706:735291 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.5.213<0> 0: jzxh118:734705:735289 [2] NCCL INFO Using non-device net plugin version 0 0: jzxh118:734705:735289 [2] NCCL INFO Using network IB 0: jzxh118:734706:735291 
[3] NCCL INFO Using non-device net plugin version 0 0: jzxh118:734706:735291 [3] NCCL INFO Using network IB 1: jzxh119:826467:827055 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.5.217<0> 1: jzxh119:826467:827055 [0] NCCL INFO Using non-device net plugin version 0 1: jzxh119:826467:827055 [0] NCCL INFO Using network IB 2: jzxh120:230986:231569 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.5.221<0> 2: jzxh120:230986:231569 [1] NCCL INFO Using non-device net plugin version 0 2: jzxh120:230986:231569 [1] NCCL INFO Using network IB 1: jzxh119:826469:827056 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.5.217<0> 1: jzxh119:826469:827056 [2] NCCL INFO Using non-device net plugin version 0 1: jzxh119:826469:827056 [2] NCCL INFO Using network IB 2: jzxh120:230987:231571 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.5.221<0> 2: jzxh120:230987:231571 [2] NCCL INFO Using non-device net plugin version 0 2: jzxh120:230987:231571 [2] NCCL INFO Using network IB 3: jzxh121:127844:128427 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.5.225<0> 3: jzxh121:127844:128427 [3] NCCL INFO Using non-device net plugin version 0 3: jzxh121:127844:128427 [3] NCCL INFO Using network IB 3: jzxh121:127842:128426 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.5.225<0> 3: jzxh121:127842:128426 [1] NCCL INFO Using non-device net plugin version 0 3: jzxh121:127842:128426 [1] NCCL INFO Using network IB 1: jzxh119:826468:827058 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.5.217<0> 1: jzxh119:826470:827057 [3] NCCL INFO NET/IB : Using 
[0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.5.217<0> 1: jzxh119:826468:827058 [1] NCCL INFO Using non-device net plugin version 0 1: jzxh119:826468:827058 [1] NCCL INFO Using network IB 1: jzxh119:826470:827057 [3] NCCL INFO Using non-device net plugin version 0 1: jzxh119:826470:827057 [3] NCCL INFO Using network IB 0: jzxh118:734704:735290 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.5.213<0> 0: jzxh118:734704:735290 [1] NCCL INFO Using non-device net plugin version 0 0: jzxh118:734704:735290 [1] NCCL INFO Using network IB 1: jzxh119:826470:827057 [3] NCCL INFO ncclCommInitRank comm 0x555937567370 rank 7 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x8004dd5fc9927b85 - Init START 1: jzxh119:826469:827056 [2] NCCL INFO ncclCommInitRank comm 0x5592e2086e30 rank 6 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x8004dd5fc9927b85 - Init START 2: jzxh120:230985:231570 [0] NCCL INFO ncclCommInitRank comm 0x562a42092400 rank 8 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x8004dd5fc9927b85 - Init START 2: jzxh120:230986:231569 [1] NCCL INFO ncclCommInitRank comm 0x55da41206ba0 rank 9 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x8004dd5fc9927b85 - Init START 0: jzxh118:734704:735290 [1] NCCL INFO ncclCommInitRank comm 0x563195d2fc40 rank 1 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x8004dd5fc9927b85 - Init START 0: jzxh118:734703:735288 [0] NCCL INFO ncclCommInitRank comm 0x5577f9213700 rank 0 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x8004dd5fc9927b85 - Init START 0: jzxh118:734705:735289 [2] NCCL INFO ncclCommInitRank comm 0x5632985679d0 rank 2 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x8004dd5fc9927b85 - Init START 0: jzxh118:734706:735291 [3] NCCL INFO ncclCommInitRank comm 0x56306775a5b0 rank 3 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x8004dd5fc9927b85 - Init START 2: jzxh120:230988:231568 [3] NCCL INFO 
ncclCommInitRank comm 0x55aca8736ed0 rank 11 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x8004dd5fc9927b85 - Init START 1: jzxh119:826468:827058 [1] NCCL INFO ncclCommInitRank comm 0x5618ab708160 rank 5 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x8004dd5fc9927b85 - Init START 2: jzxh120:230987:231571 [2] NCCL INFO ncclCommInitRank comm 0x55f22cfb7d60 rank 10 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x8004dd5fc9927b85 - Init START 1: jzxh119:826467:827055 [0] NCCL INFO ncclCommInitRank comm 0x55ecfcdf75c0 rank 4 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x8004dd5fc9927b85 - Init START 3: jzxh121:127841:128425 [0] NCCL INFO ncclCommInitRank comm 0x55fc4ea08020 rank 12 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x8004dd5fc9927b85 - Init START 3: jzxh121:127844:128427 [3] NCCL INFO ncclCommInitRank comm 0x5605694b8640 rank 15 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x8004dd5fc9927b85 - Init START 3: jzxh121:127842:128426 [1] NCCL INFO ncclCommInitRank comm 0x55cc89ae9730 rank 13 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x8004dd5fc9927b85 - Init START 3: jzxh121:127843:128424 [2] NCCL INFO ncclCommInitRank comm 0x55c4ba21cf70 rank 14 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x8004dd5fc9927b85 - Init START 0: jzxh118:734704:735290 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ff000000,00000000,0000ffff,ff000000 0: jzxh118:734704:735290 [1] NCCL INFO NVLS multicast support is not available on dev 1 0: jzxh118:734705:735289 [2] NCCL INFO Setting affinity for GPU 2 to ff,ffff0000,00000000,000000ff,ffff0000,00000000 0: jzxh118:734705:735289 [2] NCCL INFO NVLS multicast support is not available on dev 2 0: jzxh118:734706:735291 [3] NCCL INFO Setting affinity for GPU 3 to ffffff00,00000000,00000000,ffffff00,00000000,00000000 0: jzxh118:734706:735291 [3] NCCL INFO NVLS multicast support is not available on dev 3 0: jzxh118:734703:735288 [0] NCCL INFO Setting affinity for GPU 0 to ffffff,00000000,00000000,00ffffff 0: 
jzxh118:734703:735288 [0] NCCL INFO NVLS multicast support is not available on dev 0 1: jzxh119:826467:827055 [0] NCCL INFO Setting affinity for GPU 0 to ffffff,00000000,00000000,00ffffff 1: jzxh119:826467:827055 [0] NCCL INFO NVLS multicast support is not available on dev 0 3: jzxh121:127844:128427 [3] NCCL INFO Setting affinity for GPU 3 to ffffff00,00000000,00000000,ffffff00,00000000,00000000 3: jzxh121:127844:128427 [3] NCCL INFO NVLS multicast support is not available on dev 3 1: jzxh119:826469:827056 [2] NCCL INFO Setting affinity for GPU 2 to ff,ffff0000,00000000,000000ff,ffff0000,00000000 1: jzxh119:826469:827056 [2] NCCL INFO NVLS multicast support is not available on dev 2 1: jzxh119:826468:827058 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ff000000,00000000,0000ffff,ff000000 1: jzxh119:826468:827058 [1] NCCL INFO NVLS multicast support is not available on dev 1 1: jzxh119:826470:827057 [3] NCCL INFO Setting affinity for GPU 3 to ffffff00,00000000,00000000,ffffff00,00000000,00000000 1: jzxh119:826470:827057 [3] NCCL INFO NVLS multicast support is not available on dev 3 3: jzxh121:127842:128426 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ff000000,00000000,0000ffff,ff000000 3: jzxh121:127842:128426 [1] NCCL INFO NVLS multicast support is not available on dev 1 2: jzxh120:230985:231570 [0] NCCL INFO Setting affinity for GPU 0 to ffffff,00000000,00000000,00ffffff 2: jzxh120:230985:231570 [0] NCCL INFO NVLS multicast support is not available on dev 0 3: jzxh121:127841:128425 [0] NCCL INFO Setting affinity for GPU 0 to ffffff,00000000,00000000,00ffffff 3: jzxh121:127841:128425 [0] NCCL INFO NVLS multicast support is not available on dev 0 3: jzxh121:127843:128424 [2] NCCL INFO Setting affinity for GPU 2 to ff,ffff0000,00000000,000000ff,ffff0000,00000000 3: jzxh121:127843:128424 [2] NCCL INFO NVLS multicast support is not available on dev 2 2: jzxh120:230987:231571 [2] NCCL INFO Setting affinity for GPU 2 to 
ff,ffff0000,00000000,000000ff,ffff0000,00000000 2: jzxh120:230987:231571 [2] NCCL INFO NVLS multicast support is not available on dev 2 2: jzxh120:230988:231568 [3] NCCL INFO Setting affinity for GPU 3 to ffffff00,00000000,00000000,ffffff00,00000000,00000000 2: jzxh120:230988:231568 [3] NCCL INFO NVLS multicast support is not available on dev 3 2: jzxh120:230986:231569 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ff000000,00000000,0000ffff,ff000000 2: jzxh120:230986:231569 [1] NCCL INFO NVLS multicast support is not available on dev 1 0: jzxh118:734703:735288 [0] NCCL INFO comm 0x5577f9213700 rank 0 nRanks 16 nNodes 4 localRanks 4 localRank 0 MNNVL 0 0: jzxh118:734703:735288 [0] NCCL INFO Channel 00/16 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0: jzxh118:734703:735288 [0] NCCL INFO Channel 01/16 : 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14 1 0: jzxh118:734704:735290 [1] NCCL INFO comm 0x563195d2fc40 rank 1 nRanks 16 nNodes 4 localRanks 4 localRank 1 MNNVL 0 0: jzxh118:734705:735289 [2] NCCL INFO comm 0x5632985679d0 rank 2 nRanks 16 nNodes 4 localRanks 4 localRank 2 MNNVL 0 0: jzxh118:734703:735288 [0] NCCL INFO Channel 02/16 : 0 3 6 5 4 7 10 9 8 11 14 13 12 15 2 1 0: jzxh118:734706:735291 [3] NCCL INFO comm 0x56306775a5b0 rank 3 nRanks 16 nNodes 4 localRanks 4 localRank 3 MNNVL 0 0: jzxh118:734703:735288 [0] NCCL INFO Channel 03/16 : 0 1 2 7 4 5 6 11 8 9 10 15 12 13 14 3 0: jzxh118:734703:735288 [0] NCCL INFO Channel 04/16 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0: jzxh118:734703:735288 [0] NCCL INFO Channel 05/16 : 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14 1 0: jzxh118:734703:735288 [0] NCCL INFO Channel 06/16 : 0 3 6 5 4 7 10 9 8 11 14 13 12 15 2 1 0: jzxh118:734704:735290 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/9/-1->1->-1 [2] 3/-1/-1->1->0 [3] 0/-1/-1->1->3 [4] -1/-1/-1->1->2 [5] 3/9/-1->1->-1 [6] -1/-1/-1->1->3 [7] 0/-1/-1->1->2 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->5 [10] 3/-1/-1->1->0 [11] 0/-1/-1->1->3 [12] -1/-1/-1->1->2 [13] 3/-1/-1->1->5 [14] -1/-1/-1->1->3 [15] 
0/-1/-1->1->2 0: jzxh118:734703:735288 [0] NCCL INFO Channel 07/16 : 0 1 2 7 4 5 6 11 8 9 10 15 12 13 14 3 0: jzxh118:734704:735290 [1] NCCL INFO P2P Chunksize set to 131072 0: jzxh118:734703:735288 [0] NCCL INFO Channel 08/16 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0: jzxh118:734705:735289 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 0/10/-1->2->-1 [3] -1/-1/-1->2->0 [4] 1/-1/-1->2->3 [5] -1/-1/-1->2->0 [6] 0/10/-1->2->-1 [7] 1/-1/-1->2->3 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 0/-1/-1->2->6 [11] -1/-1/-1->2->0 [12] 1/-1/-1->2->3 [13] -1/-1/-1->2->0 [14] 0/-1/-1->2->6 [15] 1/-1/-1->2->3 0: jzxh118:734703:735288 [0] NCCL INFO Channel 09/16 : 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14 1 0: jzxh118:734705:735289 [2] NCCL INFO P2P Chunksize set to 131072 0: jzxh118:734703:735288 [0] NCCL INFO Channel 10/16 : 0 3 6 5 4 7 10 9 8 11 14 13 12 15 2 1 0: jzxh118:734703:735288 [0] NCCL INFO Channel 11/16 : 0 1 2 7 4 5 6 11 8 9 10 15 12 13 14 3 0: jzxh118:734703:735288 [0] NCCL INFO Channel 12/16 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0: jzxh118:734703:735288 [0] NCCL INFO Channel 13/16 : 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14 1 0: jzxh118:734703:735288 [0] NCCL INFO Channel 14/16 : 0 3 6 5 4 7 10 9 8 11 14 13 12 15 2 1 0: jzxh118:734706:735291 [3] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] 0/-1/-1->3->2 [2] -1/-1/-1->3->1 [3] 1/11/-1->3->-1 [4] 2/-1/-1->3->0 [5] 0/-1/-1->3->1 [6] 1/-1/-1->3->0 [7] 2/11/-1->3->-1 [8] -1/-1/-1->3->2 [9] 0/-1/-1->3->2 [10] -1/-1/-1->3->1 [11] 1/-1/-1->3->7 [12] 2/-1/-1->3->0 [13] 0/-1/-1->3->1 [14] 1/-1/-1->3->0 [15] 2/-1/-1->3->7 2: jzxh120:230985:231570 [0] NCCL INFO comm 0x562a42092400 rank 8 nRanks 16 nNodes 4 localRanks 4 localRank 0 MNNVL 0 2: jzxh120:230986:231569 [1] NCCL INFO comm 0x55da41206ba0 rank 9 nRanks 16 nNodes 4 localRanks 4 localRank 1 MNNVL 0 2: jzxh120:230988:231568 [3] NCCL INFO comm 0x55aca8736ed0 rank 11 nRanks 16 nNodes 4 localRanks 4 localRank 3 MNNVL 0 2: jzxh120:230987:231571 [2] NCCL INFO comm 0x55f22cfb7d60 
rank 10 nRanks 16 nNodes 4 localRanks 4 localRank 2 MNNVL 0 3: jzxh121:127843:128424 [2] NCCL INFO comm 0x55c4ba21cf70 rank 14 nRanks 16 nNodes 4 localRanks 4 localRank 2 MNNVL 0 3: jzxh121:127842:128426 [1] NCCL INFO comm 0x55cc89ae9730 rank 13 nRanks 16 nNodes 4 localRanks 4 localRank 1 MNNVL 0 3: jzxh121:127844:128427 [3] NCCL INFO comm 0x5605694b8640 rank 15 nRanks 16 nNodes 4 localRanks 4 localRank 3 MNNVL 0 3: jzxh121:127843:128424 [2] NCCL INFO Trees [0] 15/-1/-1->14->13 [1] 15/-1/-1->14->13 [2] 12/-1/-1->14->10 [3] -1/-1/-1->14->12 [4] 13/-1/-1->14->15 [5] -1/-1/-1->14->12 [6] 12/-1/-1->14->10 [7] 13/-1/-1->14->15 [8] 15/-1/-1->14->13 [9] 15/-1/-1->14->13 [10] 12/6/-1->14->-1 [11] -1/-1/-1->14->12 [12] 13/-1/-1->14->15 [13] -1/-1/-1->14->12 [14] 12/6/-1->14->-1 [15] 13/-1/-1->14->15 0: jzxh118:734703:735288 [0] NCCL INFO Channel 15/16 : 0 1 2 7 4 5 6 11 8 9 10 15 12 13 14 3 0: jzxh118:734706:735291 [3] NCCL INFO P2P Chunksize set to 131072 0: jzxh118:734703:735288 [0] NCCL INFO Trees [0] 1/8/-1->0->-1 [1] -1/-1/-1->0->3 [2] 1/-1/-1->0->2 [3] 2/-1/-1->0->1 [4] 3/8/-1->0->-1 [5] 2/-1/-1->0->3 [6] 3/-1/-1->0->2 [7] -1/-1/-1->0->1 [8] 1/-1/-1->0->4 [9] -1/-1/-1->0->3 [10] 1/-1/-1->0->2 [11] 2/-1/-1->0->1 [12] 3/-1/-1->0->4 [13] 2/-1/-1->0->3 [14] 3/-1/-1->0->2 [15] -1/-1/-1->0->1 0: jzxh118:734703:735288 [0] NCCL INFO P2P Chunksize set to 131072 2: jzxh120:230988:231568 [3] NCCL INFO Trees [0] -1/-1/-1->11->10 [1] 8/-1/-1->11->10 [2] -1/-1/-1->11->9 [3] 9/7/15->11->3 [4] 10/-1/-1->11->8 [5] 8/-1/-1->11->9 [6] 9/-1/-1->11->8 [7] 10/7/15->11->3 [8] -1/-1/-1->11->10 [9] 8/-1/-1->11->10 [10] -1/-1/-1->11->9 [11] 9/-1/-1->11->7 [12] 10/-1/-1->11->8 [13] 8/-1/-1->11->9 [14] 9/-1/-1->11->8 [15] 10/-1/-1->11->7 2: jzxh120:230988:231568 [3] NCCL INFO P2P Chunksize set to 131072 2: jzxh120:230987:231571 [2] NCCL INFO Trees [0] 11/-1/-1->10->9 [1] 11/-1/-1->10->9 [2] 8/6/14->10->2 [3] -1/-1/-1->10->8 [4] 9/-1/-1->10->11 [5] -1/-1/-1->10->8 [6] 8/6/14->10->2 [7] 
9/-1/-1->10->11 [8] 11/-1/-1->10->9 [9] 11/-1/-1->10->9 [10] 8/-1/-1->10->6 [11] -1/-1/-1->10->8 [12] 9/-1/-1->10->11 [13] -1/-1/-1->10->8 [14] 8/-1/-1->10->6 [15] 9/-1/-1->10->11 2: jzxh120:230987:231571 [2] NCCL INFO P2P Chunksize set to 131072 3: jzxh121:127844:128427 [3] NCCL INFO Trees [0] -1/-1/-1->15->14 [1] 12/-1/-1->15->14 [2] -1/-1/-1->15->13 [3] 13/-1/-1->15->11 [4] 14/-1/-1->15->12 [5] 12/-1/-1->15->13 [6] 13/-1/-1->15->12 [7] 14/-1/-1->15->11 [8] -1/-1/-1->15->14 [9] 12/-1/-1->15->14 [10] -1/-1/-1->15->13 [11] 13/7/-1->15->-1 [12] 14/-1/-1->15->12 [13] 12/-1/-1->15->13 [14] 13/-1/-1->15->12 [15] 14/7/-1->15->-1 3: jzxh121:127844:128427 [3] NCCL INFO P2P Chunksize set to 131072 3: jzxh121:127843:128424 [2] NCCL INFO P2P Chunksize set to 131072 3: jzxh121:127842:128426 [1] NCCL INFO Trees [0] 14/-1/-1->13->12 [1] 14/-1/-1->13->9 [2] 15/-1/-1->13->12 [3] 12/-1/-1->13->15 [4] -1/-1/-1->13->14 [5] 15/-1/-1->13->9 [6] -1/-1/-1->13->15 [7] 12/-1/-1->13->14 [8] 14/-1/-1->13->12 [9] 14/5/-1->13->-1 [10] 15/-1/-1->13->12 [11] 12/-1/-1->13->15 [12] -1/-1/-1->13->14 [13] 15/5/-1->13->-1 [14] -1/-1/-1->13->15 [15] 12/-1/-1->13->14 3: jzxh121:127842:128426 [1] NCCL INFO P2P Chunksize set to 131072 1: jzxh119:826469:827056 [2] NCCL INFO comm 0x5592e2086e30 rank 6 nRanks 16 nNodes 4 localRanks 4 localRank 2 MNNVL 0 1: jzxh119:826468:827058 [1] NCCL INFO comm 0x5618ab708160 rank 5 nRanks 16 nNodes 4 localRanks 4 localRank 1 MNNVL 0 1: jzxh119:826467:827055 [0] NCCL INFO comm 0x55ecfcdf75c0 rank 4 nRanks 16 nNodes 4 localRanks 4 localRank 0 MNNVL 0 1: jzxh119:826469:827056 [2] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 4/-1/-1->6->10 [3] -1/-1/-1->6->4 [4] 5/-1/-1->6->7 [5] -1/-1/-1->6->4 [6] 4/-1/-1->6->10 [7] 5/-1/-1->6->7 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 4/10/2->6->14 [11] -1/-1/-1->6->4 [12] 5/-1/-1->6->7 [13] -1/-1/-1->6->4 [14] 4/10/2->6->14 [15] 5/-1/-1->6->7 1: jzxh119:826470:827057 [3] NCCL INFO comm 0x555937567370 rank 7 nRanks 16 nNodes 4 
localRanks 4 localRank 3 MNNVL 0 1: jzxh119:826469:827056 [2] NCCL INFO P2P Chunksize set to 131072 2: jzxh120:230986:231569 [1] NCCL INFO Trees [0] 10/-1/-1->9->8 [1] 10/5/13->9->1 [2] 11/-1/-1->9->8 [3] 8/-1/-1->9->11 [4] -1/-1/-1->9->10 [5] 11/5/13->9->1 [6] -1/-1/-1->9->11 [7] 8/-1/-1->9->10 [8] 10/-1/-1->9->8 [9] 10/-1/-1->9->5 [10] 11/-1/-1->9->8 [11] 8/-1/-1->9->11 [12] -1/-1/-1->9->10 [13] 11/-1/-1->9->5 [14] -1/-1/-1->9->11 [15] 8/-1/-1->9->10 2: jzxh120:230985:231570 [0] NCCL INFO Trees [0] 9/4/12->8->0 [1] -1/-1/-1->8->11 [2] 9/-1/-1->8->10 [3] 10/-1/-1->8->9 [4] 11/4/12->8->0 [5] 10/-1/-1->8->11 [6] 11/-1/-1->8->10 [7] -1/-1/-1->8->9 [8] 9/-1/-1->8->4 [9] -1/-1/-1->8->11 [10] 9/-1/-1->8->10 [11] 10/-1/-1->8->9 [12] 11/-1/-1->8->4 [13] 10/-1/-1->8->11 [14] 11/-1/-1->8->10 [15] -1/-1/-1->8->9 2: jzxh120:230986:231569 [1] NCCL INFO P2P Chunksize set to 131072 3: jzxh121:127841:128425 [0] NCCL INFO comm 0x55fc4ea08020 rank 12 nRanks 16 nNodes 4 localRanks 4 localRank 0 MNNVL 0 3: jzxh121:127841:128425 [0] NCCL INFO Trees [0] 13/-1/-1->12->8 [1] -1/-1/-1->12->15 [2] 13/-1/-1->12->14 [3] 14/-1/-1->12->13 [4] 15/-1/-1->12->8 [5] 14/-1/-1->12->15 [6] 15/-1/-1->12->14 [7] -1/-1/-1->12->13 [8] 13/4/-1->12->-1 [9] -1/-1/-1->12->15 [10] 13/-1/-1->12->14 [11] 14/-1/-1->12->13 [12] 15/4/-1->12->-1 [13] 14/-1/-1->12->15 [14] 15/-1/-1->12->14 [15] -1/-1/-1->12->13 3: jzxh121:127841:128425 [0] NCCL INFO P2P Chunksize set to 131072 2: jzxh120:230985:231570 [0] NCCL INFO P2P Chunksize set to 131072 1: jzxh119:826468:827058 [1] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->9 [2] 7/-1/-1->5->4 [3] 4/-1/-1->5->7 [4] -1/-1/-1->5->6 [5] 7/-1/-1->5->9 [6] -1/-1/-1->5->7 [7] 4/-1/-1->5->6 [8] 6/-1/-1->5->4 [9] 6/9/1->5->13 [10] 7/-1/-1->5->4 [11] 4/-1/-1->5->7 [12] -1/-1/-1->5->6 [13] 7/9/1->5->13 [14] -1/-1/-1->5->7 [15] 4/-1/-1->5->6 1: jzxh119:826468:827058 [1] NCCL INFO P2P Chunksize set to 131072 1: jzxh119:826467:827055 [0] NCCL INFO Trees [0] 5/-1/-1->4->8 [1] 
-1/-1/-1->4->7 [2] 5/-1/-1->4->6 [3] 6/-1/-1->4->5 [4] 7/-1/-1->4->8 [5] 6/-1/-1->4->7 [6] 7/-1/-1->4->6 [7] -1/-1/-1->4->5 [8] 5/8/0->4->12 [9] -1/-1/-1->4->7 [10] 5/-1/-1->4->6 [11] 6/-1/-1->4->5 [12] 7/8/0->4->12 [13] 6/-1/-1->4->7 [14] 7/-1/-1->4->6 [15] -1/-1/-1->4->5 1: jzxh119:826467:827055 [0] NCCL INFO P2P Chunksize set to 131072 1: jzxh119:826470:827057 [3] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] 4/-1/-1->7->6 [2] -1/-1/-1->7->5 [3] 5/-1/-1->7->11 [4] 6/-1/-1->7->4 [5] 4/-1/-1->7->5 [6] 5/-1/-1->7->4 [7] 6/-1/-1->7->11 [8] -1/-1/-1->7->6 [9] 4/-1/-1->7->6 [10] -1/-1/-1->7->5 [11] 5/11/3->7->15 [12] 6/-1/-1->7->4 [13] 4/-1/-1->7->5 [14] 5/-1/-1->7->4 [15] 6/11/3->7->15 1: jzxh119:826470:827057 [3] NCCL INFO P2P Chunksize set to 131072 3: jzxh121:127843:128424 [2] NCCL INFO Channel 00/0 : 14[2] -> 15[3] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 00/0 : 9[1] -> 10[2] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 04/0 : 14[2] -> 15[3] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 03/0 : 9[1] -> 10[2] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 00/0 : 13[1] -> 14[2] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 08/0 : 14[2] -> 15[3] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 12/0 : 14[2] -> 15[3] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 00/0 : 10[2] -> 11[3] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 03/0 : 13[1] -> 14[2] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 04/0 : 9[1] -> 10[2] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 07/0 : 9[1] -> 10[2] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 04/0 : 13[1] -> 14[2] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 08/0 : 9[1] -> 10[2] via P2P/CUMEM 3: 
jzxh121:127842:128426 [1] NCCL INFO Channel 07/0 : 13[1] -> 14[2] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 11/0 : 9[1] -> 10[2] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 08/0 : 13[1] -> 14[2] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 11/0 : 13[1] -> 14[2] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 04/0 : 10[2] -> 11[3] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 12/0 : 9[1] -> 10[2] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 08/0 : 10[2] -> 11[3] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 12/0 : 13[1] -> 14[2] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 15/0 : 9[1] -> 10[2] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 15/0 : 13[1] -> 14[2] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 12/0 : 10[2] -> 11[3] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 00/0 : 11[3] -> 12[0] [receive] via NET/IB/0/GDRDMA 0: jzxh118:734703:735288 [0] NCCL INFO Channel 00/0 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA 0: jzxh118:734706:735291 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[0] [send] via 
NET/IB/0(0)/GDRDMA 0: jzxh118:734703:735288 [0] NCCL INFO Channel 04/0 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 00/0 : 7[3] -> 8[0] [receive] via NET/IB/0/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 00/0 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA 3: jzxh121:127841:128425 [0] NCCL INFO Channel 04/0 : 11[3] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 04/0 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA 3: jzxh121:127841:128425 [0] NCCL INFO Channel 08/0 : 11[3] -> 12[0] [receive] via NET/IB/0/GDRDMA 0: jzxh118:734706:735291 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[0] [send] via NET/IB/0(0)/GDRDMA 0: jzxh118:734703:735288 [0] NCCL INFO Channel 08/0 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 08/0 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA 0: jzxh118:734706:735291 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[0] [send] via NET/IB/0(0)/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 00/0 : 11[3] -> 12[0] [send] via NET/IB/0(8)/GDRDMA 3: jzxh121:127841:128425 [0] NCCL INFO Channel 12/0 : 11[3] -> 12[0] [receive] via NET/IB/0/GDRDMA 0: jzxh118:734703:735288 [0] NCCL INFO Channel 12/0 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA 0: jzxh118:734703:735288 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Channel 04/0 : 7[3] -> 8[0] [receive] via NET/IB/0/GDRDMA 3: jzxh121:127841:128425 [0] NCCL INFO Channel 00/0 : 12[0] -> 13[1] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[0] [send] via NET/IB/0(0)/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 04/0 : 11[3] -> 12[0] [send] via NET/IB/0(8)/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 12/0 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 08/0 : 7[3] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 
08/0 : 11[3] -> 12[0] [send] via NET/IB/0(8)/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 12/0 : 7[3] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 00/0 : 8[0] -> 9[1] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 12/0 : 11[3] -> 12[0] [send] via NET/IB/0(8)/GDRDMA 3: jzxh121:127841:128425 [0] NCCL INFO Channel 03/0 : 12[0] -> 13[1] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Channel 03/0 : 8[0] -> 9[1] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 04/0 : 12[0] -> 13[1] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Channel 04/0 : 8[0] -> 9[1] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 07/0 : 12[0] -> 13[1] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Channel 07/0 : 8[0] -> 9[1] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 08/0 : 12[0] -> 13[1] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 11/0 : 12[0] -> 13[1] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Channel 08/0 : 8[0] -> 9[1] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 12/0 : 12[0] -> 13[1] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Channel 11/0 : 8[0] -> 9[1] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 15/0 : 12[0] -> 13[1] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Channel 12/0 : 8[0] -> 9[1] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL 
INFO Channel 15/0 : 8[0] -> 9[1] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 01/0 : 0[0] -> 3[3] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 02/0 : 0[0] -> 3[3] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 01/0 : 12[0] -> 15[3] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 00/0 : 5[1] -> 6[2] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 05/0 : 0[0] -> 3[3] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 02/0 : 12[0] -> 15[3] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 00/0 : 6[2] -> 7[3] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 06/0 : 0[0] -> 3[3] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 04/0 : 6[2] -> 7[3] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 05/0 : 12[0] -> 15[3] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 09/0 : 0[0] -> 3[3] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 06/0 : 12[0] -> 15[3] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 10/0 : 0[0] -> 3[3] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 08/0 : 6[2] -> 7[3] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 12/0 : 6[2] -> 7[3] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 09/0 : 12[0] -> 15[3] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 13/0 : 0[0] -> 3[3] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 03/0 : 5[1] -> 6[2] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 14/0 : 0[0] -> 3[3] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 10/0 : 12[0] -> 15[3] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 04/0 : 5[1] -> 6[2] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 13/0 : 12[0] -> 15[3] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 07/0 : 5[1] -> 6[2] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 14/0 : 12[0] -> 15[3] via 
P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 08/0 : 5[1] -> 6[2] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 11/0 : 5[1] -> 6[2] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 12/0 : 5[1] -> 6[2] via P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 00/0 : 3[3] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 00/0 : 7[3] -> 8[0] [send] via NET/IB/0(4)/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 15/0 : 5[1] -> 6[2] via P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 04/0 : 3[3] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 04/0 : 7[3] -> 8[0] [send] via NET/IB/0(4)/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 08/0 : 3[3] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 08/0 : 7[3] -> 8[0] [send] via NET/IB/0(4)/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 12/0 : 3[3] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 00/0 : 4[0] -> 5[1] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 12/0 : 7[3] -> 8[0] [send] via NET/IB/0(4)/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 03/0 : 4[0] -> 5[1] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Channel 01/0 : 8[0] -> 11[3] via P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 04/0 : 4[0] -> 5[1] via P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 07/0 : 4[0] -> 5[1] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Channel 02/0 : 8[0] -> 11[3] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Channel 05/0 : 8[0] -> 11[3] via P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 08/0 : 4[0] -> 5[1] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Channel 06/0 : 8[0] -> 11[3] via P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 11/0 : 4[0] -> 5[1] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Channel 09/0 : 8[0] -> 11[3] via 
P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 12/0 : 4[0] -> 5[1] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 02/0 : 11[3] -> 14[2] [receive] via NET/IB/2/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 02/0 : 15[3] -> 2[2] [send] via NET/IB/2(14)/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 10/0 : 8[0] -> 11[3] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 06/0 : 11[3] -> 14[2] [receive] via NET/IB/2/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 06/0 : 15[3] -> 2[2] [send] via NET/IB/2(14)/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 02/0 : 7[3] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 02/0 : 11[3] -> 14[2] [send] via NET/IB/2(10)/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 10/0 : 11[3] -> 14[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 06/0 : 7[3] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 06/0 : 11[3] -> 14[2] [send] via NET/IB/2(10)/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 13/0 : 8[0] -> 11[3] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 10/0 : 7[3] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 10/0 : 11[3] -> 14[2] [send] via NET/IB/2(10)/GDRDMA 0: jzxh118:734705:735289 [2] NCCL INFO Channel 02/0 : 15[3] -> 2[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 15/0 : 4[0] -> 5[1] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 06/0 : 15[3] -> 2[2] [receive] via NET/IB/2/GDRDMA 0: jzxh118:734705:735289 [2] NCCL INFO Channel 10/0 : 15[3] -> 2[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 14/0 : 7[3] -> 10[2] [receive] via NET/IB/2/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 10/0 : 15[3] -> 2[2] [send] via NET/IB/2(14)/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 14/0 : 11[3] -> 14[2] [send] via 
NET/IB/2(10)/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 14/0 : 11[3] -> 14[2] [receive] via NET/IB/2/GDRDMA 0: jzxh118:734705:735289 [2] NCCL INFO Channel 14/0 : 15[3] -> 2[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 14/0 : 8[0] -> 11[3] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 14/0 : 15[3] -> 2[2] [send] via NET/IB/2(14)/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 01/0 : 4[0] -> 7[3] via P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 02/0 : 4[0] -> 7[3] via P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 05/0 : 4[0] -> 7[3] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 01/0 : 14[2] -> 1[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734705:735289 [2] NCCL INFO Channel 01/0 : 2[2] -> 5[1] [send] via NET/IB/1(1)/GDRDMA 0: jzxh118:734704:735290 [1] NCCL INFO Channel 05/0 : 14[2] -> 1[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 06/0 : 4[0] -> 7[3] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 05/0 : 2[2] -> 5[1] [send] via NET/IB/1(1)/GDRDMA 0: jzxh118:734704:735290 [1] NCCL INFO Channel 09/0 : 14[2] -> 1[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734705:735289 [2] NCCL INFO Channel 09/0 : 2[2] -> 5[1] [send] via NET/IB/1(1)/GDRDMA 0: jzxh118:734704:735290 [1] NCCL INFO Channel 13/0 : 14[2] -> 1[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 09/0 : 4[0] -> 7[3] via P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 10/0 : 4[0] -> 7[3] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 13/0 : 2[2] -> 5[1] [send] via NET/IB/1(1)/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 01/0 : 6[2] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 01/0 : 10[2] -> 13[1] [send] via NET/IB/1(9)/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 13/0 : 4[0] -> 7[3] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 01/0 : 
10[2] -> 13[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 05/0 : 6[2] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 05/0 : 10[2] -> 13[1] [send] via NET/IB/1(9)/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 09/0 : 6[2] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 09/0 : 10[2] -> 13[1] [send] via NET/IB/1(9)/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 14/0 : 4[0] -> 7[3] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 02/0 : 3[3] -> 6[2] [send] via NET/IB/2(2)/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 13/0 : 6[2] -> 9[1] [receive] via NET/IB/1/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 01/0 : 14[2] -> 1[1] [send] via NET/IB/1(13)/GDRDMA 0: jzxh118:734706:735291 [3] NCCL INFO Channel 06/0 : 3[3] -> 6[2] [send] via NET/IB/2(2)/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 13/0 : 10[2] -> 13[1] [send] via NET/IB/1(9)/GDRDMA 3: jzxh121:127842:128426 [1] NCCL INFO Channel 05/0 : 10[2] -> 13[1] [receive] via NET/IB/1/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 05/0 : 14[2] -> 1[1] [send] via NET/IB/1(13)/GDRDMA 3: jzxh121:127842:128426 [1] NCCL INFO Channel 09/0 : 10[2] -> 13[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734706:735291 [3] NCCL INFO Channel 10/0 : 3[3] -> 6[2] [send] via NET/IB/2(2)/GDRDMA 0: jzxh118:734706:735291 [3] NCCL INFO Channel 14/0 : 3[3] -> 6[2] [send] via NET/IB/2(2)/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 09/0 : 14[2] -> 1[1] [send] via NET/IB/1(13)/GDRDMA 3: jzxh121:127842:128426 [1] NCCL INFO Channel 13/0 : 10[2] -> 13[1] [receive] via NET/IB/1/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 13/0 : 14[2] -> 1[1] [send] via NET/IB/1(13)/GDRDMA 3: jzxh121:127842:128426 [1] NCCL INFO Channel 01/0 : 13[1] -> 12[0] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO 
Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 02/0 : 13[1] -> 12[0] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 02/0 : 3[3] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 02/0 : 7[3] -> 10[2] [send] via NET/IB/2(6)/GDRDMA 3: jzxh121:127842:128426 [1] NCCL INFO Channel 05/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 06/0 : 13[1] -> 12[0] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 09/0 : 13[1] -> 12[0] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 06/0 : 3[3] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 06/0 : 7[3] -> 10[2] [send] via NET/IB/2(6)/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 10/0 : 3[3] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 10/0 : 7[3] -> 10[2] [send] via NET/IB/2(6)/GDRDMA 0: jzxh118:734704:735290 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 10/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 13/0 : 13[1] -> 12[0] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 14/0 : 3[3] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 14/0 : 7[3] -> 10[2] [send] via NET/IB/2(6)/GDRDMA 0: jzxh118:734704:735290 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 14/0 : 13[1] -> 12[0] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 01/0 : 2[2] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: 
jzxh119:826469:827056 [2] NCCL INFO Channel 01/0 : 6[2] -> 9[1] [send] via NET/IB/1(5)/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 05/0 : 2[2] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 05/0 : 6[2] -> 9[1] [send] via NET/IB/1(5)/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 09/0 : 2[2] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 09/0 : 6[2] -> 9[1] [send] via NET/IB/1(5)/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 13/0 : 2[2] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 13/0 : 6[2] -> 9[1] [send] via NET/IB/1(5)/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 03/0 : 10[2] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 03/0 : 14[2] -> 3[3] [send] via NET/IB/3(15)/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 07/0 : 10[2] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 07/0 : 14[2] -> 3[3] [send] via NET/IB/3(15)/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 11/0 : 10[2] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 11/0 : 14[2] -> 3[3] [send] via NET/IB/3(15)/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 15/0 : 10[2] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 15/0 : 14[2] -> 3[3] [send] via NET/IB/3(15)/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 01/0 : 5[1] -> 4[0] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 03/0 : 6[2] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 03/0 : 10[2] -> 15[3] [send] via NET/IB/3(11)/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 07/0 : 6[2] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 07/0 : 10[2] -> 15[3] [send] via NET/IB/3(11)/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 11/0 : 
6[2] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 11/0 : 10[2] -> 15[3] [send] via NET/IB/3(11)/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 01/0 : 9[1] -> 8[0] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 02/0 : 5[1] -> 4[0] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 15/0 : 6[2] -> 11[3] [receive] via NET/IB/3/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 03/0 : 15[3] -> 12[0] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 15/0 : 10[2] -> 15[3] [send] via NET/IB/3(11)/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 02/0 : 9[1] -> 8[0] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 05/0 : 5[1] -> 4[0] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 05/0 : 9[1] -> 8[0] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 07/0 : 15[3] -> 12[0] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 06/0 : 9[1] -> 8[0] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 11/0 : 15[3] -> 12[0] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 06/0 : 5[1] -> 4[0] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 09/0 : 5[1] -> 4[0] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 09/0 : 9[1] -> 8[0] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 10/0 : 5[1] -> 4[0] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 15/0 : 15[3] -> 12[0] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 10/0 : 9[1] -> 8[0] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 13/0 : 5[1] -> 4[0] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 01/0 : 15[3] -> 14[2] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 14/0 : 5[1] -> 4[0] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 02/0 : 10[2] -> 9[1] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 05/0 : 15[3] -> 14[2] via P2P/CUMEM 2: jzxh120:230986:231569 
[1] NCCL INFO Channel 13/0 : 9[1] -> 8[0] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 06/0 : 10[2] -> 9[1] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 14/0 : 9[1] -> 8[0] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 09/0 : 15[3] -> 14[2] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 03/0 : 2[2] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 03/0 : 6[2] -> 11[3] [send] via NET/IB/3(7)/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 07/0 : 2[2] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 07/0 : 6[2] -> 11[3] [send] via NET/IB/3(7)/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 11/0 : 2[2] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 11/0 : 6[2] -> 11[3] [send] via NET/IB/3(7)/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 15/0 : 2[2] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 15/0 : 6[2] -> 11[3] [send] via NET/IB/3(7)/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 10/0 : 10[2] -> 9[1] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 13/0 : 15[3] -> 14[2] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 14/0 : 10[2] -> 9[1] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 03/0 : 11[3] -> 8[0] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 07/0 : 11[3] -> 8[0] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 02/0 : 6[2] -> 5[1] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 11/0 : 11[3] -> 8[0] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 15/0 : 11[3] -> 8[0] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 01/0 : 11[3] -> 10[2] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 06/0 : 6[2] -> 5[1] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 10/0 : 6[2] -> 5[1] via P2P/CUMEM 2: 
jzxh120:230988:231568 [3] NCCL INFO Channel 05/0 : 11[3] -> 10[2] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 14/0 : 6[2] -> 5[1] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 09/0 : 11[3] -> 10[2] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 13/0 : 11[3] -> 10[2] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 03/0 : 14[2] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh118:734705:735289 [2] NCCL INFO Channel 03/0 : 2[2] -> 7[3] [send] via NET/IB/3(3)/GDRDMA 0: jzxh118:734706:735291 [3] NCCL INFO Channel 07/0 : 14[2] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh118:734705:735289 [2] NCCL INFO Channel 07/0 : 2[2] -> 7[3] [send] via NET/IB/3(3)/GDRDMA 0: jzxh118:734706:735291 [3] NCCL INFO Channel 11/0 : 14[2] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh118:734705:735289 [2] NCCL INFO Channel 11/0 : 2[2] -> 7[3] [send] via NET/IB/3(3)/GDRDMA 0: jzxh118:734706:735291 [3] NCCL INFO Channel 15/0 : 14[2] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh118:734705:735289 [2] NCCL INFO Channel 15/0 : 2[2] -> 7[3] [send] via NET/IB/3(3)/GDRDMA 0: jzxh118:734706:735291 [3] NCCL INFO Channel 03/0 : 3[3] -> 0[0] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 03/0 : 7[3] -> 4[0] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 07/0 : 3[3] -> 0[0] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 07/0 : 7[3] -> 4[0] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 11/0 : 3[3] -> 0[0] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 15/0 : 3[3] -> 0[0] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 11/0 : 7[3] -> 4[0] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 15/0 : 7[3] -> 4[0] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 01/0 : 7[3] -> 6[2] via P2P/CUMEM 
0: jzxh118:734705:735289 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 05/0 : 7[3] -> 6[2] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 09/0 : 7[3] -> 6[2] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 13/0 : 7[3] -> 6[2] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 02/0 : 14[2] -> 13[1] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 06/0 : 14[2] -> 13[1] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 10/0 : 14[2] -> 13[1] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 14/0 : 14[2] -> 13[1] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Connected all rings 0: jzxh118:734706:735291 [3] NCCL INFO Connected all rings 0: jzxh118:734703:735288 [0] NCCL INFO Connected all rings 0: jzxh118:734704:735290 [1] NCCL INFO Connected all rings 0: jzxh118:734703:735288 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Connected all rings 1: jzxh119:826470:827057 [3] NCCL INFO Connected all rings 1: jzxh119:826467:827055 [0] NCCL INFO Connected all rings 1: jzxh119:826468:827058 [1] NCCL INFO Connected all rings 1: jzxh119:826467:827055 [0] NCCL INFO Channel 02/0 : 4[0] -> 5[1] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Connected all rings 3: jzxh121:127844:128427 [3] NCCL INFO Connected all rings 3: jzxh121:127841:128425 [0] NCCL INFO Connected all rings 0: jzxh118:734704:735290 [1] NCCL INFO Channel 
01/0 : 1[1] -> 2[2] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 01/0 : 6[2] -> 7[3] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 02/0 : 12[0] -> 13[1] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 10/0 : 4[0] -> 5[1] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 07/0 : 6[2] -> 7[3] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Connected all rings 3: jzxh121:127841:128425 [0] NCCL INFO Channel 10/0 : 12[0] -> 13[1] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 02/0 : 0[0] -> 2[2] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 01/0 : 5[1] -> 6[2] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Channel 02/0 : 8[0] -> 9[1] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 09/0 : 6[2] -> 7[3] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 03/0 : 0[0] -> 2[2] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Connected all rings 2: jzxh120:230985:231570 [0] NCCL INFO Channel 10/0 : 8[0] -> 9[1] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Connected all rings 2: jzxh120:230987:231571 [2] NCCL INFO Connected all rings 0: jzxh118:734705:735289 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 15/0 : 6[2] -> 7[3] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 05/0 : 0[0] -> 2[2] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 09/0 : 5[1] -> 6[2] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Connected all rings 3: jzxh121:127843:128424 [2] NCCL INFO Channel 01/0 : 14[2] -> 15[3] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO 
Channel 06/0 : 0[0] -> 2[2] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 02/0 : 1[1] -> 3[3] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 07/0 : 14[2] -> 15[3] via P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 02/0 : 4[0] -> 6[2] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 09/0 : 14[2] -> 15[3] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 02/0 : 5[1] -> 7[3] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 01/0 : 13[1] -> 14[2] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 10/0 : 0[0] -> 2[2] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 03/0 : 1[1] -> 3[3] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 15/0 : 14[2] -> 15[3] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 09/0 : 13[1] -> 14[2] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 03/0 : 5[1] -> 7[3] via P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 03/0 : 4[0] -> 6[2] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 01/0 : 10[2] -> 11[3] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 01/0 : 9[1] -> 10[2] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 05/0 : 1[1] -> 3[3] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 11/0 : 0[0] -> 2[2] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 02/0 : 12[0] -> 14[2] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 05/0 : 5[1] -> 7[3] via P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 05/0 : 4[0] -> 6[2] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 09/0 : 9[1] -> 10[2] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 02/0 : 13[1] -> 15[3] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 07/0 : 10[2] -> 11[3] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 06/0 : 1[1] -> 3[3] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 13/0 : 0[0] -> 2[2] via 
P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 06/0 : 5[1] -> 7[3] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 03/0 : 12[0] -> 14[2] via P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 06/0 : 4[0] -> 6[2] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 09/0 : 10[2] -> 11[3] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 03/0 : 13[1] -> 15[3] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Channel 02/0 : 8[0] -> 10[2] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 15/0 : 10[2] -> 11[3] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Channel 03/0 : 8[0] -> 10[2] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 10/0 : 5[1] -> 7[3] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 10/0 : 1[1] -> 3[3] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 05/0 : 12[0] -> 14[2] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Channel 05/0 : 8[0] -> 10[2] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 02/0 : 9[1] -> 11[3] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 14/0 : 0[0] -> 2[2] via P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 10/0 : 4[0] -> 6[2] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 05/0 : 13[1] -> 15[3] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 11/0 : 1[1] -> 3[3] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 06/0 : 13[1] -> 15[3] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 06/0 : 12[0] -> 14[2] via P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 11/0 : 4[0] -> 6[2] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Channel 06/0 : 8[0] -> 10[2] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 11/0 : 5[1] -> 7[3] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 03/0 : 9[1] -> 11[3] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 10/0 : 12[0] -> 14[2] via P2P/CUMEM 3: 
jzxh121:127842:128426 [1] NCCL INFO Channel 10/0 : 13[1] -> 15[3] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 13/0 : 1[1] -> 3[3] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 13/0 : 5[1] -> 7[3] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 11/0 : 12[0] -> 14[2] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 04/0 : 0[0] -> 3[3] via P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 13/0 : 4[0] -> 6[2] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 11/0 : 13[1] -> 15[3] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 14/0 : 1[1] -> 3[3] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 10/0 : 2[2] -> 6[2] [send] via NET/IB/2/GDRDMA 0: jzxh118:734705:735289 [2] NCCL INFO Channel 14/0 : 2[2] -> 6[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 14/0 : 5[1] -> 7[3] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 12/0 : 0[0] -> 3[3] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 13/0 : 12[0] -> 14[2] via P2P/CUMEM 1: jzxh119:826467:827055 [0] NCCL INFO Channel 14/0 : 4[0] -> 6[2] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 13/0 : 13[1] -> 15[3] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 14/0 : 12[0] -> 14[2] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 09/0 : 1[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 04/0 : 4[0] -> 7[3] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 14/0 : 13[1] -> 15[3] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 13/0 : 1[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734704:735290 [1] NCCL INFO Channel 09/0 : 1[1] -> 5[1] [send] via NET/IB/1/GDRDMA 0: jzxh118:734704:735290 [1] NCCL INFO Channel 13/0 : 1[1] -> 5[1] [send] via NET/IB/1/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 10/0 : 2[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826468:827058 
[1] NCCL INFO Channel 01/0 : 5[1] -> 9[1] [send] via NET/IB/1/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 14/0 : 2[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 05/0 : 5[1] -> 9[1] [send] via NET/IB/1/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 12/0 : 4[0] -> 7[3] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 02/0 : 6[2] -> 10[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 09/0 : 5[1] -> 9[1] [send] via NET/IB/1/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 06/0 : 6[2] -> 10[2] [send] via NET/IB/2/GDRDMA 0: jzxh118:734705:735289 [2] NCCL INFO Channel 02/0 : 10[2] -> 2[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 10/0 : 6[2] -> 10[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 13/0 : 5[1] -> 9[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 10/0 : 8[0] -> 10[2] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 08/0 : 0[0] -> 4[0] [send] via NET/IB/0/GDRDMA 0: jzxh118:734706:735291 [3] NCCL INFO Channel 11/0 : 3[3] -> 7[3] [send] via NET/IB/3/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 14/0 : 6[2] -> 10[2] [send] via NET/IB/2/GDRDMA 0: jzxh118:734704:735290 [1] NCCL INFO Channel 01/0 : 9[1] -> 1[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734703:735288 [0] NCCL INFO Channel 12/0 : 0[0] -> 4[0] [send] via NET/IB/0/GDRDMA 0: jzxh118:734706:735291 [3] NCCL INFO Channel 15/0 : 3[3] -> 7[3] [send] via NET/IB/3/GDRDMA 3: jzxh121:127842:128426 [1] NCCL INFO Channel 01/0 : 9[1] -> 13[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734704:735290 [1] NCCL INFO Channel 05/0 : 9[1] -> 1[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734704:735290 [1] NCCL INFO Channel 01/0 : 1[1] -> 9[1] [send] via NET/IB/1/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 11/0 : 3[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 08/0 
: 0[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 15/0 : 3[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 12/0 : 0[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 11/0 : 8[0] -> 10[2] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 03/0 : 7[3] -> 11[3] [send] via NET/IB/3/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 00/0 : 4[0] -> 8[0] [send] via NET/IB/0/GDRDMA 0: jzxh118:734705:735289 [2] NCCL INFO Channel 06/0 : 10[2] -> 2[2] [receive] via NET/IB/2/GDRDMA 0: jzxh118:734704:735290 [1] NCCL INFO Channel 05/0 : 1[1] -> 9[1] [send] via NET/IB/1/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 07/0 : 7[3] -> 11[3] [send] via NET/IB/3/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 04/0 : 4[0] -> 8[0] [send] via NET/IB/0/GDRDMA 0: jzxh118:734705:735289 [2] NCCL INFO Channel 02/0 : 2[2] -> 10[2] [send] via NET/IB/2/GDRDMA 3: jzxh121:127841:128425 [0] NCCL INFO Channel 04/0 : 12[0] -> 15[3] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 05/0 : 9[1] -> 13[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734705:735289 [2] NCCL INFO Channel 06/0 : 2[2] -> 10[2] [send] via NET/IB/2/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 02/0 : 10[2] -> 14[2] [receive] via NET/IB/2/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 06/0 : 10[2] -> 14[2] [receive] via NET/IB/2/GDRDMA 3: jzxh121:127841:128425 [0] NCCL INFO Channel 12/0 : 12[0] -> 15[3] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 03/0 : 11[3] -> 3[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 13/0 : 8[0] -> 10[2] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 07/0 : 11[3] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh118:734706:735291 [3] NCCL INFO Channel 03/0 : 3[3] -> 11[3] [send] via NET/IB/3/GDRDMA 3: jzxh121:127841:128425 [0] NCCL INFO Channel 00/0 : 8[0] -> 12[0] [receive] via 
NET/IB/0/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 03/0 : 11[3] -> 15[3] [receive] via NET/IB/3/GDRDMA 0: jzxh118:734706:735291 [3] NCCL INFO Channel 07/0 : 3[3] -> 11[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 14/0 : 8[0] -> 10[2] via P2P/CUMEM 3: jzxh121:127841:128425 [0] NCCL INFO Channel 04/0 : 8[0] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 07/0 : 11[3] -> 15[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 05/0 : 9[1] -> 11[3] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 11/0 : 7[3] -> 11[3] [send] via NET/IB/3/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 08/0 : 4[0] -> 8[0] [send] via NET/IB/0/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 15/0 : 7[3] -> 11[3] [send] via NET/IB/3/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 12/0 : 4[0] -> 8[0] [send] via NET/IB/0/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 06/0 : 9[1] -> 11[3] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 00/0 : 8[0] -> 0[0] [receive] via NET/IB/0/GDRDMA 0: jzxh118:734703:735288 [0] NCCL INFO Channel 04/0 : 8[0] -> 0[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 10/0 : 9[1] -> 11[3] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Channel 00/0 : 0[0] -> 8[0] [send] via NET/IB/0/GDRDMA 0: jzxh118:734703:735288 [0] NCCL INFO Channel 04/0 : 0[0] -> 8[0] [send] via NET/IB/0/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 11/0 : 9[1] -> 11[3] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 13/0 : 9[1] -> 11[3] via P2P/CUMEM 2: jzxh120:230985:231570 [0] NCCL INFO Channel 04/0 : 8[0] -> 11[3] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 14/0 : 9[1] -> 11[3] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 02/0 : 6[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 06/0 : 6[2] -> 10[2] [receive] 
via NET/IB/2/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 12/0 : 8[0] -> 11[3] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 10/0 : 6[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 14/0 : 6[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 02/0 : 10[2] -> 14[2] [send] via NET/IB/2/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 06/0 : 10[2] -> 14[2] [send] via NET/IB/2/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 01/0 : 5[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 10/0 : 14[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 10/0 : 6[2] -> 14[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 14/0 : 14[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 05/0 : 5[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 10/0 : 6[2] -> 14[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 14/0 : 6[2] -> 14[2] [send] via NET/IB/2/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 14/0 : 6[2] -> 14[2] [receive] via NET/IB/2/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 10/0 : 14[2] -> 6[2] [send] via NET/IB/2/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 14/0 : 14[2] -> 6[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 02/0 : 10[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 02/0 : 14[2] -> 10[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 06/0 : 10[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 06/0 : 14[2] -> 10[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 10/0 : 10[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 14/0 : 
10[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 10/0 : 6[2] -> 2[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 14/0 : 6[2] -> 2[2] [send] via NET/IB/2/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 00/0 : 4[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 03/0 : 7[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 04/0 : 4[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 07/0 : 7[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 08/0 : 4[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 11/0 : 7[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 09/0 : 5[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 12/0 : 4[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 15/0 : 7[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 13/0 : 5[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 03/0 : 11[3] -> 15[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 00/0 : 8[0] -> 12[0] [send] via NET/IB/0/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 01/0 : 9[1] -> 13[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 07/0 : 11[3] -> 15[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 04/0 : 8[0] -> 12[0] [send] via NET/IB/0/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 05/0 : 9[1] -> 13[1] [send] via NET/IB/1/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 11/0 : 7[3] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh121:127841:128425 [0] NCCL INFO Channel 08/0 : 4[0] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh121:127842:128426 
[1] NCCL INFO Channel 09/0 : 5[1] -> 13[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734704:735290 [1] NCCL INFO Channel 09/0 : 5[1] -> 1[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 03/0 : 3[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 01/0 : 1[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 00/0 : 0[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 15/0 : 7[3] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh121:127841:128425 [0] NCCL INFO Channel 12/0 : 4[0] -> 12[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 07/0 : 3[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 3: jzxh121:127842:128426 [1] NCCL INFO Channel 13/0 : 5[1] -> 13[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 04/0 : 0[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 05/0 : 1[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 11/0 : 15[3] -> 7[3] [send] via NET/IB/3/GDRDMA 3: jzxh121:127841:128425 [0] NCCL INFO Channel 08/0 : 12[0] -> 4[0] [send] via NET/IB/0/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 03/0 : 11[3] -> 3[3] [send] via NET/IB/3/GDRDMA 3: jzxh121:127842:128426 [1] NCCL INFO Channel 09/0 : 13[1] -> 5[1] [send] via NET/IB/1/GDRDMA 0: jzxh118:734704:735290 [1] NCCL INFO Channel 13/0 : 5[1] -> 1[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 00/0 : 8[0] -> 0[0] [send] via NET/IB/0/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 01/0 : 9[1] -> 1[1] [send] via NET/IB/1/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 15/0 : 15[3] -> 7[3] [send] via NET/IB/3/GDRDMA 3: jzxh121:127841:128425 [0] NCCL INFO Channel 12/0 : 12[0] -> 4[0] [send] via NET/IB/0/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 02/0 : 2[2] -> 10[2] [receive] via 
NET/IB/2/GDRDMA 3: jzxh121:127842:128426 [1] NCCL INFO Channel 13/0 : 13[1] -> 5[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 07/0 : 11[3] -> 3[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 04/0 : 8[0] -> 0[0] [send] via NET/IB/0/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 05/0 : 9[1] -> 1[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 06/0 : 2[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 08/0 : 12[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 09/0 : 13[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734705:735289 [2] NCCL INFO Channel 10/0 : 6[2] -> 2[2] [receive] via NET/IB/2/GDRDMA 0: jzxh118:734705:735289 [2] NCCL INFO Channel 14/0 : 6[2] -> 2[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 02/0 : 10[2] -> 2[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 12/0 : 12[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 13/0 : 13[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 00/0 : 12[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 01/0 : 13[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734703:735288 [0] NCCL INFO Channel 08/0 : 4[0] -> 0[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 11/0 : 15[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 03/0 : 15[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 3: jzxh121:127841:128425 [0] NCCL INFO Channel 00/0 : 12[0] -> 8[0] [send] via NET/IB/0/GDRDMA 3: jzxh121:127842:128426 [1] NCCL INFO Channel 01/0 : 13[1] -> 9[1] [send] via NET/IB/1/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 09/0 : 5[1] -> 13[1] [send] via NET/IB/1/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 08/0 
: 4[0] -> 12[0] [send] via NET/IB/0/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 06/0 : 10[2] -> 2[2] [send] via NET/IB/2/GDRDMA 3: jzxh121:127841:128425 [0] NCCL INFO Channel 04/0 : 12[0] -> 8[0] [send] via NET/IB/0/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 15/0 : 15[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 05/0 : 13[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 04/0 : 12[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 07/0 : 15[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 3: jzxh121:127842:128426 [1] NCCL INFO Channel 05/0 : 13[1] -> 9[1] [send] via NET/IB/1/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 12/0 : 4[0] -> 12[0] [send] via NET/IB/0/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 13/0 : 5[1] -> 13[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 01/0 : 9[1] -> 5[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 00/0 : 8[0] -> 4[0] [send] via NET/IB/0/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 03/0 : 11[3] -> 7[3] [send] via NET/IB/3/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 03/0 : 15[3] -> 11[3] [send] via NET/IB/3/GDRDMA 0: jzxh118:734703:735288 [0] NCCL INFO Channel 12/0 : 4[0] -> 0[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 11/0 : 7[3] -> 15[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 05/0 : 9[1] -> 5[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 02/0 : 14[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 07/0 : 11[3] -> 7[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 04/0 : 8[0] -> 4[0] [send] via NET/IB/0/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 07/0 : 15[3] -> 11[3] [send] via NET/IB/3/GDRDMA 0: jzxh118:734705:735289 [2] NCCL INFO 
Channel 02/0 : 2[2] -> 0[0] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 15/0 : 7[3] -> 15[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230986:231569 [1] NCCL INFO Channel 09/0 : 9[1] -> 5[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 08/0 : 8[0] -> 4[0] [send] via NET/IB/0/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 11/0 : 11[3] -> 7[3] [send] via NET/IB/3/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 00/0 : 8[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 06/0 : 14[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 01/0 : 9[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 04/0 : 8[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 05/0 : 9[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 08/0 : 8[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 09/0 : 9[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 12/0 : 8[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 13/0 : 9[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 03/0 : 11[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 08/0 : 4[0] -> 0[0] [send] via NET/IB/0/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 09/0 : 5[1] -> 1[1] [send] via NET/IB/1/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 07/0 : 11[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826467:827055 [0] NCCL INFO Channel 12/0 : 4[0] -> 0[0] [send] via NET/IB/0/GDRDMA 1: jzxh119:826468:827058 [1] NCCL INFO Channel 13/0 : 5[1] -> 1[1] [send] via NET/IB/1/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 11/0 : 11[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230986:231569 [1] NCCL 
INFO Channel 13/0 : 9[1] -> 5[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230985:231570 [0] NCCL INFO Channel 12/0 : 8[0] -> 4[0] [send] via NET/IB/0/GDRDMA 2: jzxh120:230988:231568 [3] NCCL INFO Channel 15/0 : 11[3] -> 7[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 02/0 : 10[2] -> 6[2] [send] via NET/IB/2/GDRDMA 3: jzxh121:127844:128427 [3] NCCL INFO Channel 01/0 : 15[3] -> 12[0] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 06/0 : 10[2] -> 6[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 15/0 : 11[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 11/0 : 7[3] -> 3[3] [send] via NET/IB/3/GDRDMA 1: jzxh119:826470:827057 [3] NCCL INFO Channel 15/0 : 7[3] -> 3[3] [send] via NET/IB/3/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 02/0 : 6[2] -> 4[0] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 10/0 : 10[2] -> 6[2] [send] via NET/IB/2/GDRDMA 3: jzxh121:127843:128424 [2] NCCL INFO Channel 02/0 : 14[2] -> 12[0] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 04/0 : 15[3] -> 12[0] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 03/0 : 2[2] -> 0[0] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 14/0 : 10[2] -> 6[2] [send] via NET/IB/2/GDRDMA 0: jzxh118:734706:735291 [3] NCCL INFO Channel 11/0 : 7[3] -> 3[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230987:231571 [2] NCCL INFO Channel 02/0 : 10[2] -> 8[0] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 15/0 : 7[3] -> 3[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826469:827056 [2] NCCL INFO Channel 03/0 : 6[2] -> 4[0] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 01/0 : 3[3] -> 0[0] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 01/0 : 11[3] -> 8[0] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 05/0 : 15[3] -> 12[0] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 03/0 : 10[2] -> 8[0] 
via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 05/0 : 2[2] -> 0[0] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 05/0 : 6[2] -> 4[0] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 01/0 : 7[3] -> 4[0] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 04/0 : 11[3] -> 8[0] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 03/0 : 14[2] -> 12[0] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 06/0 : 2[2] -> 0[0] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 05/0 : 10[2] -> 8[0] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 05/0 : 14[2] -> 12[0] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 04/0 : 3[3] -> 0[0] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 06/0 : 6[2] -> 4[0] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 10/0 : 2[2] -> 0[0] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 05/0 : 11[3] -> 8[0] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 06/0 : 15[3] -> 12[0] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 10/0 : 6[2] -> 4[0] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 06/0 : 14[2] -> 12[0] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 06/0 : 10[2] -> 8[0] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 09/0 : 15[3] -> 12[0] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 06/0 : 11[3] -> 8[0] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 05/0 : 3[3] -> 0[0] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 11/0 : 6[2] -> 4[0] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 11/0 : 2[2] -> 0[0] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 10/0 : 14[2] -> 12[0] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 04/0 : 7[3] -> 4[0] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 10/0 : 10[2] -> 8[0] via P2P/CUMEM 3: 
jzxh121:127843:128424 [2] NCCL INFO Channel 11/0 : 14[2] -> 12[0] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 09/0 : 11[3] -> 8[0] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 12/0 : 15[3] -> 12[0] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 13/0 : 2[2] -> 0[0] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 11/0 : 10[2] -> 8[0] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 13/0 : 14[2] -> 12[0] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 14/0 : 2[2] -> 0[0] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 12/0 : 11[3] -> 8[0] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 13/0 : 15[3] -> 12[0] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 13/0 : 11[3] -> 8[0] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 14/0 : 14[2] -> 12[0] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 13/0 : 10[2] -> 8[0] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 14/0 : 15[3] -> 12[0] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 14/0 : 11[3] -> 8[0] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 13/0 : 6[2] -> 4[0] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 14/0 : 10[2] -> 8[0] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 05/0 : 7[3] -> 4[0] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 02/0 : 15[3] -> 13[1] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 02/0 : 11[3] -> 9[1] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 03/0 : 15[3] -> 13[1] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 06/0 : 3[3] -> 0[0] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 03/0 : 11[3] -> 9[1] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 05/0 : 15[3] -> 13[1] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 09/0 : 3[3] -> 0[0] via P2P/CUMEM 2: jzxh120:230988:231568 [3] 
NCCL INFO Channel 05/0 : 11[3] -> 9[1] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 06/0 : 15[3] -> 13[1] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 12/0 : 3[3] -> 0[0] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 06/0 : 7[3] -> 4[0] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 06/0 : 11[3] -> 9[1] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 14/0 : 6[2] -> 4[0] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 09/0 : 7[3] -> 4[0] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 12/0 : 7[3] -> 4[0] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 13/0 : 7[3] -> 4[0] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 14/0 : 7[3] -> 4[0] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 13/0 : 3[3] -> 0[0] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 10/0 : 15[3] -> 13[1] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 02/0 : 7[3] -> 5[1] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 03/0 : 7[3] -> 5[1] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 05/0 : 7[3] -> 5[1] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 06/0 : 7[3] -> 5[1] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 10/0 : 11[3] -> 9[1] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 11/0 : 15[3] -> 13[1] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 10/0 : 7[3] -> 5[1] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 14/0 : 3[3] -> 0[0] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 11/0 : 11[3] -> 9[1] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 13/0 : 11[3] -> 9[1] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 11/0 : 7[3] -> 5[1] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 02/0 : 3[3] -> 1[1] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 14/0 : 11[3] -> 9[1] via 
P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 13/0 : 7[3] -> 5[1] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 13/0 : 15[3] -> 13[1] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 03/0 : 3[3] -> 1[1] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 14/0 : 7[3] -> 5[1] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 00/0 : 11[3] -> 10[2] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 14/0 : 15[3] -> 13[1] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 00/0 : 7[3] -> 6[2] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 05/0 : 3[3] -> 1[1] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 00/0 : 15[3] -> 14[2] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 06/0 : 3[3] -> 1[1] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 04/0 : 11[3] -> 10[2] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 04/0 : 7[3] -> 6[2] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 04/0 : 15[3] -> 14[2] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 10/0 : 3[3] -> 1[1] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 07/0 : 11[3] -> 10[2] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 11/0 : 3[3] -> 1[1] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 07/0 : 7[3] -> 6[2] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 07/0 : 15[3] -> 14[2] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 13/0 : 3[3] -> 1[1] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 08/0 : 7[3] -> 6[2] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 08/0 : 11[3] -> 10[2] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 14/0 : 3[3] -> 1[1] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 08/0 : 15[3] -> 14[2] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM 1: jzxh119:826470:827057 
[3] NCCL INFO Channel 12/0 : 7[3] -> 6[2] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 12/0 : 11[3] -> 10[2] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 12/0 : 15[3] -> 14[2] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 00/0 : 10[2] -> 9[1] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM 1: jzxh119:826470:827057 [3] NCCL INFO Channel 15/0 : 7[3] -> 6[2] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 00/0 : 14[2] -> 13[1] via P2P/CUMEM 3: jzxh121:127844:128427 [3] NCCL INFO Channel 15/0 : 15[3] -> 14[2] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM 2: jzxh120:230988:231568 [3] NCCL INFO Channel 15/0 : 11[3] -> 10[2] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 00/0 : 6[2] -> 5[1] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 01/0 : 10[2] -> 9[1] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 01/0 : 14[2] -> 13[1] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 01/0 : 6[2] -> 5[1] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 04/0 : 10[2] -> 9[1] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 04/0 : 14[2] -> 13[1] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 00/0 : 5[1] -> 4[0] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 00/0 : 9[1] -> 8[0] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 07/0 : 14[2] -> 13[1] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 04/0 : 6[2] -> 5[1] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 03/0 : 5[1] -> 4[0] via P2P/CUMEM 0: jzxh118:734706:735291 [3] NCCL INFO Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 07/0 : 6[2] -> 
5[1] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 07/0 : 10[2] -> 9[1] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 00/0 : 13[1] -> 12[0] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 07/0 : 5[1] -> 4[0] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 08/0 : 14[2] -> 13[1] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 08/0 : 6[2] -> 5[1] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 03/0 : 9[1] -> 8[0] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 08/0 : 5[1] -> 4[0] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 08/0 : 10[2] -> 9[1] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 03/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 09/0 : 14[2] -> 13[1] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 07/0 : 13[1] -> 12[0] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 09/0 : 6[2] -> 5[1] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 11/0 : 5[1] -> 4[0] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 07/0 : 9[1] -> 8[0] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 12/0 : 14[2] -> 13[1] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 09/0 : 10[2] -> 9[1] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 08/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh121:127843:128424 [2] NCCL INFO Channel 15/0 : 14[2] -> 13[1] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 12/0 : 6[2] -> 5[1] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO Channel 15/0 : 5[1] -> 4[0] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 08/0 : 9[1] -> 8[0] via P2P/CUMEM 3: jzxh121:127842:128426 [1] NCCL INFO Channel 11/0 : 13[1] -> 12[0] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 12/0 : 10[2] -> 9[1] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 11/0 : 9[1] -> 8[0] via P2P/CUMEM 3: 
jzxh121:127842:128426 [1] NCCL INFO Channel 15/0 : 13[1] -> 12[0] via P2P/CUMEM 2: jzxh120:230987:231571 [2] NCCL INFO Channel 15/0 : 10[2] -> 9[1] via P2P/CUMEM 1: jzxh119:826469:827056 [2] NCCL INFO Channel 15/0 : 6[2] -> 5[1] via P2P/CUMEM 2: jzxh120:230986:231569 [1] NCCL INFO Channel 15/0 : 9[1] -> 8[0] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh118:734705:735289 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh118:734704:735290 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh118:734703:735288 [0] NCCL INFO Connected all trees 0: jzxh118:734706:735291 [3] NCCL INFO Connected all trees 0: jzxh118:734704:735290 [1] NCCL INFO Connected all trees 0: jzxh118:734706:735291 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 0: jzxh118:734706:735291 [3] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 0: jzxh118:734705:735289 [2] NCCL INFO Connected all trees 0: jzxh118:734704:735290 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 
512 0: jzxh118:734704:735290 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 2: jzxh120:230985:231570 [0] NCCL INFO Connected all trees 3: jzxh121:127841:128425 [0] NCCL INFO Connected all trees 0: jzxh118:734703:735288 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 0: jzxh118:734703:735288 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 1: jzxh119:826467:827055 [0] NCCL INFO Connected all trees 0: jzxh118:734705:735289 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 0: jzxh118:734705:735289 [2] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 1: jzxh119:826470:827057 [3] NCCL INFO Connected all trees 1: jzxh119:826468:827058 [1] NCCL INFO Connected all trees 1: jzxh119:826467:827055 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 2: jzxh120:230988:231568 [3] NCCL INFO Connected all trees 2: jzxh120:230985:231570 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 2: jzxh120:230986:231569 [1] NCCL INFO Connected all trees 2: jzxh120:230985:231570 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 2: jzxh120:230987:231571 [2] NCCL INFO Connected all trees 2: jzxh120:230988:231568 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 1: jzxh119:826467:827055 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 1: jzxh119:826469:827056 [2] NCCL INFO Connected all trees 2: jzxh120:230988:231568 [3] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 1: jzxh119:826470:827057 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 1: jzxh119:826468:827058 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 1: jzxh119:826470:827057 [3] NCCL INFO 16 coll 
channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 2: jzxh120:230987:231571 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 2: jzxh120:230987:231571 [2] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh121:127841:128425 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 3: jzxh121:127841:128425 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh121:127844:128427 [3] NCCL INFO Connected all trees 1: jzxh119:826468:827058 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 2: jzxh120:230986:231569 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 3: jzxh121:127842:128426 [1] NCCL INFO Connected all trees 1: jzxh119:826469:827056 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 1: jzxh119:826469:827056 [2] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 2: jzxh120:230986:231569 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh121:127843:128424 [2] NCCL INFO Connected all trees 3: jzxh121:127844:128427 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 3: jzxh121:127844:128427 [3] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh121:127842:128426 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 3: jzxh121:127842:128426 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh121:127843:128424 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 3: jzxh121:127843:128424 [2] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 2: jzxh120:230988:231568 [3] NCCL INFO 
TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 2: jzxh120:230988:231568 [3] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 2: jzxh120:230988:231568 [3] NCCL INFO ncclCommInitRank comm 0x55aca8736ed0 rank 11 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x8004dd5fc9927b85 - Init COMPLETE 2: jzxh120:230987:231571 [2] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 2: jzxh120:230987:231571 [2] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 2: jzxh120:230987:231571 [2] NCCL INFO ncclCommInitRank comm 0x55f22cfb7d60 rank 10 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x8004dd5fc9927b85 - Init COMPLETE 2: jzxh120:230985:231570 [0] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 2: jzxh120:230986:231569 [1] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 2: jzxh120:230986:231569 [1] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 2: jzxh120:230985:231570 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 
2: jzxh120:230986:231569 [1] NCCL INFO ncclCommInitRank comm 0x55da41206ba0 rank 9 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x8004dd5fc9927b85 - Init COMPLETE 2: jzxh120:230985:231570 [0] NCCL INFO ncclCommInitRank comm 0x562a42092400 rank 8 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x8004dd5fc9927b85 - Init COMPLETE 3: jzxh121:127844:128427 [3] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 3: jzxh121:127843:128424 [2] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 3: jzxh121:127842:128426 [1] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 3: jzxh121:127841:128425 [0] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 3: jzxh121:127844:128427 [3] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 3: jzxh121:127843:128424 [2] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 3: jzxh121:127842:128426 [1] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 3: jzxh121:127841:128425 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 
3: jzxh121:127844:128427 [3] NCCL INFO ncclCommInitRank comm 0x5605694b8640 rank 15 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x8004dd5fc9927b85 - Init COMPLETE 3: jzxh121:127843:128424 [2] NCCL INFO ncclCommInitRank comm 0x55c4ba21cf70 rank 14 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x8004dd5fc9927b85 - Init COMPLETE 3: jzxh121:127841:128425 [0] NCCL INFO ncclCommInitRank comm 0x55fc4ea08020 rank 12 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x8004dd5fc9927b85 - Init COMPLETE 3: jzxh121:127842:128426 [1] NCCL INFO ncclCommInitRank comm 0x55cc89ae9730 rank 13 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x8004dd5fc9927b85 - Init COMPLETE 0: jzxh118:734704:735290 [1] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 0: jzxh118:734704:735290 [1] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 0: jzxh118:734704:735290 [1] NCCL INFO ncclCommInitRank comm 0x563195d2fc40 rank 1 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x8004dd5fc9927b85 - Init COMPLETE 0: jzxh118:734703:735288 [0] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 0: jzxh118:734703:735288 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 0: jzxh118:734705:735289 [2] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 0: jzxh118:734706:735291 [3] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 0: jzxh118:734703:735288 [0] NCCL INFO ncclCommInitRank comm 0x5577f9213700 rank 0 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x8004dd5fc9927b85 - Init COMPLETE 0: jzxh118:734705:735289 [2] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 
0: jzxh118:734706:735291 [3] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 0: jzxh118:734705:735289 [2] NCCL INFO ncclCommInitRank comm 0x5632985679d0 rank 2 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x8004dd5fc9927b85 - Init COMPLETE 0: jzxh118:734706:735291 [3] NCCL INFO ncclCommInitRank comm 0x56306775a5b0 rank 3 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x8004dd5fc9927b85 - Init COMPLETE 2: jzxh120:230985:231600 [0] NCCL INFO Channel 04/1 : 8[0] -> 0[0] [send] via NET/IB/0/GDRDMA/Shared 2: jzxh120:230985:231600 [0] NCCL INFO Channel 05/1 : 8[0] -> 0[0] [send] via NET/IB/0/GDRDMA/Shared 2: jzxh120:230988:231601 [3] NCCL INFO Channel 04/1 : 11[3] -> 0[0] [send] via NET/IB/0(8)/GDRDMA/Shared 2: jzxh120:230988:231601 [3] NCCL INFO Channel 05/1 : 11[3] -> 0[0] [send] via NET/IB/0(8)/GDRDMA/Shared 2: jzxh120:230987:231602 [2] NCCL INFO Channel 04/1 : 10[2] -> 0[0] [send] via NET/IB/0(8)/GDRDMA/Shared 2: jzxh120:230987:231602 [2] NCCL INFO Channel 05/1 : 10[2] -> 0[0] [send] via NET/IB/0(8)/GDRDMA/Shared 2: jzxh120:230986:231603 [1] NCCL INFO Channel 04/1 : 9[1] -> 0[0] [send] via NET/IB/0(8)/GDRDMA/Shared 2: jzxh120:230986:231603 [1] NCCL INFO Channel 05/1 : 9[1] -> 0[0] [send] via NET/IB/0(8)/GDRDMA/Shared 0: jzxh118:734706:735320 [3] NCCL INFO Channel 00/1 : 3[3] -> 0[0] via P2P/CUMEM 3: jzxh121:127841:128456 [0] NCCL INFO Channel 08/1 : 12[0] -> 0[0] [send] via NET/IB/0/GDRDMA/Shared 3: jzxh121:127841:128456 [0] NCCL INFO Channel 09/1 : 12[0] -> 0[0] [send] via NET/IB/0/GDRDMA/Shared 1: jzxh119:826469:827056 [2] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 1: jzxh119:826469:827056 [2] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 
1: jzxh119:826469:827056 [2] NCCL INFO ncclCommInitRank comm 0x5592e2086e30 rank 6 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x8004dd5fc9927b85 - Init COMPLETE 1: jzxh119:826468:827058 [1] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 1: jzxh119:826468:827058 [1] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 1: jzxh119:826467:827055 [0] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 3: jzxh121:127844:128457 [3] NCCL INFO Channel 08/1 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA/Shared 0: jzxh118:734706:735320 [3] NCCL INFO Channel 01/1 : 3[3] -> 0[0] via P2P/CUMEM 1: jzxh119:826468:827058 [1] NCCL INFO ncclCommInitRank comm 0x5618ab708160 rank 5 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x8004dd5fc9927b85 - Init COMPLETE 1: jzxh119:826470:827057 [3] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 1: jzxh119:826467:827055 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 1: jzxh119:826470:827057 [3] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 
1: jzxh119:826467:827055 [0] NCCL INFO ncclCommInitRank comm 0x55ecfcdf75c0 rank 4 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x8004dd5fc9927b85 - Init COMPLETE 1: jzxh119:826470:827057 [3] NCCL INFO ncclCommInitRank comm 0x555937567370 rank 7 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x8004dd5fc9927b85 - Init COMPLETE 3: jzxh121:127843:128458 [2] NCCL INFO Channel 08/1 : 14[2] -> 0[0] [send] via NET/IB/0(12)/GDRDMA/Shared 3: jzxh121:127844:128457 [3] NCCL INFO Channel 09/1 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA/Shared 3: jzxh121:127843:128458 [2] NCCL INFO Channel 09/1 : 14[2] -> 0[0] [send] via NET/IB/0(12)/GDRDMA/Shared 0: jzxh118:734704:735321 [1] NCCL INFO Channel 00/1 : 1[1] -> 0[0] via P2P/CUMEM 3: jzxh121:127842:128459 [1] NCCL INFO Channel 08/1 : 13[1] -> 0[0] [send] via NET/IB/0(12)/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 08/1 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 3: jzxh121:127842:128459 [1] NCCL INFO Channel 09/1 : 13[1] -> 0[0] [send] via NET/IB/0(12)/GDRDMA/Shared 0: jzxh118:734705:735323 [2] NCCL INFO Channel 00/1 : 2[2] -> 0[0] via P2P/CUMEM 0: jzxh118:734703:735322 [0] NCCL INFO Channel 09/1 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734704:735321 [1] NCCL INFO Channel 01/1 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh118:734703:735322 [0] NCCL INFO Channel 08/1 : 14[2] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 09/1 : 14[2] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734705:735323 [2] NCCL INFO Channel 01/1 : 2[2] -> 0[0] via P2P/CUMEM 0: jzxh118:734703:735322 [0] NCCL INFO Channel 08/1 : 13[1] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 09/1 : 13[1] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 08/1 : 12[0] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 09/1 : 12[0] -> 0[0] 
[receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 04/1 : 11[3] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 05/1 : 11[3] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 04/1 : 10[2] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 05/1 : 10[2] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 04/1 : 9[1] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 05/1 : 9[1] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 04/1 : 8[0] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 05/1 : 8[0] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 12/1 : 7[3] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 13/1 : 7[3] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 1: jzxh119:826468:827091 [1] NCCL INFO Channel 12/1 : 5[1] -> 0[0] [send] via NET/IB/0(4)/GDRDMA/Shared 1: jzxh119:826467:827089 [0] NCCL INFO Channel 12/1 : 4[0] -> 0[0] [send] via NET/IB/0/GDRDMA/Shared 1: jzxh119:826470:827088 [3] NCCL INFO Channel 12/1 : 7[3] -> 0[0] [send] via NET/IB/0(4)/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 12/1 : 6[2] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 1: jzxh119:826467:827089 [0] NCCL INFO Channel 13/1 : 4[0] -> 0[0] [send] via NET/IB/0/GDRDMA/Shared 1: jzxh119:826469:827090 [2] NCCL INFO Channel 12/1 : 6[2] -> 0[0] [send] via NET/IB/0(4)/GDRDMA/Shared 1: jzxh119:826470:827088 [3] NCCL INFO Channel 13/1 : 7[3] -> 0[0] [send] via NET/IB/0(4)/GDRDMA/Shared 1: jzxh119:826468:827091 [1] NCCL INFO Channel 13/1 : 5[1] -> 0[0] [send] via NET/IB/0(4)/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 13/1 : 6[2] -> 0[0] [receive] via 
NET/IB/0/GDRDMA/Shared 1: jzxh119:826469:827090 [2] NCCL INFO Channel 13/1 : 6[2] -> 0[0] [send] via NET/IB/0(4)/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 12/1 : 5[1] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 13/1 : 5[1] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 12/1 : 4[0] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh118:734703:735322 [0] NCCL INFO Channel 13/1 : 4[0] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: [2025-11-22 23:28:17,601] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:436] [PID:734703] [RANK:0] gather_len_batches: [99297, 99297, 99297, 99297, 99297, 99297, 99297, 99297, 99297, 99297, 99297, 99297, 99297, 99297, 99297, 99297] 0: [2025-11-22 23:28:17,626] [INFO] [axolotl.utils.trainer.calc_sample_packing_eff_est:495] [PID:734703] [RANK:0] sample_packing_eff_est across ranks: [0.8523049354553223, 0.8523049354553223, 0.8523049354553223, 0.8523049354553223, 0.8523049354553223, 0.8523049354553223, 0.8523049354553223, 0.8523049354553223, 0.8523049354553223, 0.8523049354553223, 0.8523049354553223, 0.8523049354553223, 0.8523049354553223, 0.8523049354553223, 0.8523049354553223, 0.8523049354553223] 0: [2025-11-22 23:28:17,644] [INFO] [axolotl.utils.data.sft._prepare_standard_dataset:127] [PID:734703] [RANK:0] Maximum number of steps set at 3723 2: Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. 1: Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. 
You'll still be able to use a slow processor with `use_fast=False`. 2: Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. 0: Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. 0: [2025-11-22 23:28:24,589] [INFO] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_evaluation_loop:110] [PID:734703] [RANK:0] Patched Trainer.evaluation_loop with nanmean loss calculation 0: [2025-11-22 23:28:24,590] [INFO] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_maybe_log_save_evaluate:164] [PID:734703] [RANK:0] Patched Trainer._maybe_log_save_evaluate with nanmean loss calculation 1: Loading checkpoint shards: 0%| | 0/5 [00:003->2 [1] 0/-1/-1->3->2 [2] -1/-1/-1->3->1 [3] 1/11/-1->3->-1 [4] 2/-1/-1->3->0 [5] 0/-1/-1->3->1 [6] 1/-1/-1->3->0 [7] 2/11/-1->3->-1 [8] -1/-1/-1->3->2 [9] 0/-1/-1->3->2 [10] -1/-1/-1->3->1 [11] 1/-1/-1->3->7 [12] 2/-1/-1->3->0 [13] 0/-1/-1->3->1 [14] 1/-1/-1->3->0 [15] 2/-1/-1->3->7 0: jzxh118:734703:736201 [0] NCCL INFO Channel 01/16 : 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14 1 0: jzxh118:734705:736204 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 0/10/-1->2->-1 [3] -1/-1/-1->2->0 [4] 1/-1/-1->2->3 [5] -1/-1/-1->2->0 [6] 0/10/-1->2->-1 [7] 1/-1/-1->2->3 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 0/-1/-1->2->6 [11] -1/-1/-1->2->0 [12] 1/-1/-1->2->3 [13] -1/-1/-1->2->0 [14] 0/-1/-1->2->6 [15] 1/-1/-1->2->3 3: jzxh121:127843:129318 [2] NCCL INFO comm 0x1530ac14b0d0 rank 14 nRanks 16 nNodes 4 
localRanks 4 localRank 2 MNNVL 0 3: jzxh121:127844:129319 [3] NCCL INFO comm 0x1512d813fa80 rank 15 nRanks 16 nNodes 4 localRanks 4 localRank 3 MNNVL 0 3: jzxh121:127842:129320 [1] NCCL INFO comm 0x14cb04147d00 rank 13 nRanks 16 nNodes 4 localRanks 4 localRank 1 MNNVL 0 0: jzxh118:734704:736202 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/9/-1->1->-1 [2] 3/-1/-1->1->0 [3] 0/-1/-1->1->3 [4] -1/-1/-1->1->2 [5] 3/9/-1->1->-1 [6] -1/-1/-1->1->3 [7] 0/-1/-1->1->2 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->5 [10] 3/-1/-1->1->0 [11] 0/-1/-1->1->3 [12] -1/-1/-1->1->2 [13] 3/-1/-1->1->5 [14] -1/-1/-1->1->3 [15] 0/-1/-1->1->2 0: jzxh118:734706:736203 [3] NCCL INFO P2P Chunksize set to 131072 0: jzxh118:734703:736201 [0] NCCL INFO Channel 02/16 : 0 3 6 5 4 7 10 9 8 11 14 13 12 15 2 1 0: jzxh118:734705:736204 [2] NCCL INFO P2P Chunksize set to 131072 0: jzxh118:734704:736202 [1] NCCL INFO P2P Chunksize set to 131072 0: jzxh118:734703:736201 [0] NCCL INFO Channel 03/16 : 0 1 2 7 4 5 6 11 8 9 10 15 12 13 14 3 0: jzxh118:734703:736201 [0] NCCL INFO Channel 04/16 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0: jzxh118:734703:736201 [0] NCCL INFO Channel 05/16 : 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14 1 1: jzxh119:826467:827965 [0] NCCL INFO comm 0x15454013e2c0 rank 4 nRanks 16 nNodes 4 localRanks 4 localRank 0 MNNVL 0 1: jzxh119:826468:827964 [1] NCCL INFO comm 0x14c404138dc0 rank 5 nRanks 16 nNodes 4 localRanks 4 localRank 1 MNNVL 0 1: jzxh119:826469:827962 [2] NCCL INFO comm 0x14b39c13f610 rank 6 nRanks 16 nNodes 4 localRanks 4 localRank 2 MNNVL 0 1: jzxh119:826470:827963 [3] NCCL INFO comm 0x14965012e200 rank 7 nRanks 16 nNodes 4 localRanks 4 localRank 3 MNNVL 0 3: jzxh121:127843:129318 [2] NCCL INFO Trees [0] 15/-1/-1->14->13 [1] 15/-1/-1->14->13 [2] 12/-1/-1->14->10 [3] -1/-1/-1->14->12 [4] 13/-1/-1->14->15 [5] -1/-1/-1->14->12 [6] 12/-1/-1->14->10 [7] 13/-1/-1->14->15 [8] 15/-1/-1->14->13 [9] 15/-1/-1->14->13 [10] 12/6/-1->14->-1 [11] -1/-1/-1->14->12 [12] 13/-1/-1->14->15 [13] 
-1/-1/-1->14->12 [14] 12/6/-1->14->-1 [15] 13/-1/-1->14->15 3: jzxh121:127843:129318 [2] NCCL INFO P2P Chunksize set to 131072 3: jzxh121:127844:129319 [3] NCCL INFO Trees [0] -1/-1/-1->15->14 [1] 12/-1/-1->15->14 [2] -1/-1/-1->15->13 [3] 13/-1/-1->15->11 [4] 14/-1/-1->15->12 [5] 12/-1/-1->15->13 [6] 13/-1/-1->15->12 [7] 14/-1/-1->15->11 [8] -1/-1/-1->15->14 [9] 12/-1/-1->15->14 [10] -1/-1/-1->15->13 [11] 13/7/-1->15->-1 [12] 14/-1/-1->15->12 [13] 12/-1/-1->15->13 [14] 13/-1/-1->15->12 [15] 14/7/-1->15->-1 3: jzxh121:127844:129319 [3] NCCL INFO P2P Chunksize set to 131072 0: jzxh118:734703:736201 [0] NCCL INFO Channel 06/16 : 0 3 6 5 4 7 10 9 8 11 14 13 12 15 2 1 0: jzxh118:734703:736201 [0] NCCL INFO Channel 07/16 : 0 1 2 7 4 5 6 11 8 9 10 15 12 13 14 3 1: jzxh119:826467:827965 [0] NCCL INFO Trees [0] 5/-1/-1->4->8 [1] -1/-1/-1->4->7 [2] 5/-1/-1->4->6 [3] 6/-1/-1->4->5 [4] 7/-1/-1->4->8 [5] 6/-1/-1->4->7 [6] 7/-1/-1->4->6 [7] -1/-1/-1->4->5 [8] 5/8/0->4->12 [9] -1/-1/-1->4->7 [10] 5/-1/-1->4->6 [11] 6/-1/-1->4->5 [12] 7/8/0->4->12 [13] 6/-1/-1->4->7 [14] 7/-1/-1->4->6 [15] -1/-1/-1->4->5 1: jzxh119:826467:827965 [0] NCCL INFO P2P Chunksize set to 131072 1: jzxh119:826469:827962 [2] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 4/-1/-1->6->10 [3] -1/-1/-1->6->4 [4] 5/-1/-1->6->7 [5] -1/-1/-1->6->4 [6] 4/-1/-1->6->10 [7] 5/-1/-1->6->7 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 4/10/2->6->14 [11] -1/-1/-1->6->4 [12] 5/-1/-1->6->7 [13] -1/-1/-1->6->4 [14] 4/10/2->6->14 [15] 5/-1/-1->6->7 0: jzxh118:734703:736201 [0] NCCL INFO Channel 08/16 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0: jzxh118:734703:736201 [0] NCCL INFO Channel 09/16 : 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14 1 0: jzxh118:734703:736201 [0] NCCL INFO Channel 10/16 : 0 3 6 5 4 7 10 9 8 11 14 13 12 15 2 1 0: jzxh118:734703:736201 [0] NCCL INFO Channel 11/16 : 0 1 2 7 4 5 6 11 8 9 10 15 12 13 14 3 0: jzxh118:734703:736201 [0] NCCL INFO Channel 12/16 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0: 
jzxh118:734703:736201 [0] NCCL INFO Channel 13/16 : 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14 1 0: jzxh118:734703:736201 [0] NCCL INFO Channel 14/16 : 0 3 6 5 4 7 10 9 8 11 14 13 12 15 2 1 0: jzxh118:734703:736201 [0] NCCL INFO Channel 15/16 : 0 1 2 7 4 5 6 11 8 9 10 15 12 13 14 3 1: jzxh119:826468:827964 [1] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->9 [2] 7/-1/-1->5->4 [3] 4/-1/-1->5->7 [4] -1/-1/-1->5->6 [5] 7/-1/-1->5->9 [6] -1/-1/-1->5->7 [7] 4/-1/-1->5->6 [8] 6/-1/-1->5->4 [9] 6/9/1->5->13 [10] 7/-1/-1->5->4 [11] 4/-1/-1->5->7 [12] -1/-1/-1->5->6 [13] 7/9/1->5->13 [14] -1/-1/-1->5->7 [15] 4/-1/-1->5->6 1: jzxh119:826470:827963 [3] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] 4/-1/-1->7->6 [2] -1/-1/-1->7->5 [3] 5/-1/-1->7->11 [4] 6/-1/-1->7->4 [5] 4/-1/-1->7->5 [6] 5/-1/-1->7->4 [7] 6/-1/-1->7->11 [8] -1/-1/-1->7->6 [9] 4/-1/-1->7->6 [10] -1/-1/-1->7->5 [11] 5/11/3->7->15 [12] 6/-1/-1->7->4 [13] 4/-1/-1->7->5 [14] 5/-1/-1->7->4 [15] 6/11/3->7->15 1: jzxh119:826469:827962 [2] NCCL INFO P2P Chunksize set to 131072 1: jzxh119:826468:827964 [1] NCCL INFO P2P Chunksize set to 131072 1: jzxh119:826470:827963 [3] NCCL INFO P2P Chunksize set to 131072 0: jzxh118:734703:736201 [0] NCCL INFO Trees [0] 1/8/-1->0->-1 [1] -1/-1/-1->0->3 [2] 1/-1/-1->0->2 [3] 2/-1/-1->0->1 [4] 3/8/-1->0->-1 [5] 2/-1/-1->0->3 [6] 3/-1/-1->0->2 [7] -1/-1/-1->0->1 [8] 1/-1/-1->0->4 [9] -1/-1/-1->0->3 [10] 1/-1/-1->0->2 [11] 2/-1/-1->0->1 [12] 3/-1/-1->0->4 [13] 2/-1/-1->0->3 [14] 3/-1/-1->0->2 [15] -1/-1/-1->0->1 0: jzxh118:734703:736201 [0] NCCL INFO P2P Chunksize set to 131072 2: jzxh120:230988:232477 [3] NCCL INFO comm 0x15368412ef80 rank 11 nRanks 16 nNodes 4 localRanks 4 localRank 3 MNNVL 0 2: jzxh120:230987:232476 [2] NCCL INFO comm 0x14c2b81361c0 rank 10 nRanks 16 nNodes 4 localRanks 4 localRank 2 MNNVL 0 2: jzxh120:230986:232479 [1] NCCL INFO comm 0x14b07412e240 rank 9 nRanks 16 nNodes 4 localRanks 4 localRank 1 MNNVL 0 2: jzxh120:230985:232478 [0] NCCL INFO comm 0x149348151b00 rank 8 
nRanks 16 nNodes 4 localRanks 4 localRank 0 MNNVL 0 2: jzxh120:230988:232477 [3] NCCL INFO Trees [0] -1/-1/-1->11->10 [1] 8/-1/-1->11->10 [2] -1/-1/-1->11->9 [3] 9/7/15->11->3 [4] 10/-1/-1->11->8 [5] 8/-1/-1->11->9 [6] 9/-1/-1->11->8 [7] 10/7/15->11->3 [8] -1/-1/-1->11->10 [9] 8/-1/-1->11->10 [10] -1/-1/-1->11->9 [11] 9/-1/-1->11->7 [12] 10/-1/-1->11->8 [13] 8/-1/-1->11->9 [14] 9/-1/-1->11->8 [15] 10/-1/-1->11->7 2: jzxh120:230988:232477 [3] NCCL INFO P2P Chunksize set to 131072 3: jzxh121:127841:129321 [0] NCCL INFO comm 0x14895813f700 rank 12 nRanks 16 nNodes 4 localRanks 4 localRank 0 MNNVL 0 3: jzxh121:127842:129320 [1] NCCL INFO Trees [0] 14/-1/-1->13->12 [1] 14/-1/-1->13->9 [2] 15/-1/-1->13->12 [3] 12/-1/-1->13->15 [4] -1/-1/-1->13->14 [5] 15/-1/-1->13->9 [6] -1/-1/-1->13->15 [7] 12/-1/-1->13->14 [8] 14/-1/-1->13->12 [9] 14/5/-1->13->-1 [10] 15/-1/-1->13->12 [11] 12/-1/-1->13->15 [12] -1/-1/-1->13->14 [13] 15/5/-1->13->-1 [14] -1/-1/-1->13->15 [15] 12/-1/-1->13->14 3: jzxh121:127842:129320 [1] NCCL INFO P2P Chunksize set to 131072 3: jzxh121:127841:129321 [0] NCCL INFO Trees [0] 13/-1/-1->12->8 [1] -1/-1/-1->12->15 [2] 13/-1/-1->12->14 [3] 14/-1/-1->12->13 [4] 15/-1/-1->12->8 [5] 14/-1/-1->12->15 [6] 15/-1/-1->12->14 [7] -1/-1/-1->12->13 [8] 13/4/-1->12->-1 [9] -1/-1/-1->12->15 [10] 13/-1/-1->12->14 [11] 14/-1/-1->12->13 [12] 15/4/-1->12->-1 [13] 14/-1/-1->12->15 [14] 15/-1/-1->12->14 [15] -1/-1/-1->12->13 3: jzxh121:127841:129321 [0] NCCL INFO P2P Chunksize set to 131072 2: jzxh120:230987:232476 [2] NCCL INFO Trees [0] 11/-1/-1->10->9 [1] 11/-1/-1->10->9 [2] 8/6/14->10->2 [3] -1/-1/-1->10->8 [4] 9/-1/-1->10->11 [5] -1/-1/-1->10->8 [6] 8/6/14->10->2 [7] 9/-1/-1->10->11 [8] 11/-1/-1->10->9 [9] 11/-1/-1->10->9 [10] 8/-1/-1->10->6 [11] -1/-1/-1->10->8 [12] 9/-1/-1->10->11 [13] -1/-1/-1->10->8 [14] 8/-1/-1->10->6 [15] 9/-1/-1->10->11 2: jzxh120:230987:232476 [2] NCCL INFO P2P Chunksize set to 131072 2: jzxh120:230986:232479 [1] NCCL INFO Trees [0] 10/-1/-1->9->8 
[1] 10/5/13->9->1 [2] 11/-1/-1->9->8 [3] 8/-1/-1->9->11 [4] -1/-1/-1->9->10 [5] 11/5/13->9->1 [6] -1/-1/-1->9->11 [7] 8/-1/-1->9->10 [8] 10/-1/-1->9->8 [9] 10/-1/-1->9->5 [10] 11/-1/-1->9->8 [11] 8/-1/-1->9->11 [12] -1/-1/-1->9->10 [13] 11/-1/-1->9->5 [14] -1/-1/-1->9->11 [15] 8/-1/-1->9->10 2: jzxh120:230985:232478 [0] NCCL INFO Trees [0] 9/4/12->8->0 [1] -1/-1/-1->8->11 [2] 9/-1/-1->8->10 [3] 10/-1/-1->8->9 [4] 11/4/12->8->0 [5] 10/-1/-1->8->11 [6] 11/-1/-1->8->10 [7] -1/-1/-1->8->9 [8] 9/-1/-1->8->4 [9] -1/-1/-1->8->11 [10] 9/-1/-1->8->10 [11] 10/-1/-1->8->9 [12] 11/-1/-1->8->4 [13] 10/-1/-1->8->11 [14] 11/-1/-1->8->10 [15] -1/-1/-1->8->9 2: jzxh120:230985:232478 [0] NCCL INFO P2P Chunksize set to 131072 2: jzxh120:230986:232479 [1] NCCL INFO P2P Chunksize set to 131072 1: jzxh119:826469:827962 [2] NCCL INFO Channel 00/0 : 6[2] -> 7[3] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 04/0 : 6[2] -> 7[3] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 00/0 : 13[1] -> 14[2] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 08/0 : 6[2] -> 7[3] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 03/0 : 13[1] -> 14[2] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 00/0 : 10[2] -> 11[3] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 12/0 : 6[2] -> 7[3] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 04/0 : 13[1] -> 14[2] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 07/0 : 13[1] -> 14[2] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 04/0 : 10[2] -> 11[3] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 08/0 : 13[1] -> 14[2] 
via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 08/0 : 10[2] -> 11[3] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 11/0 : 13[1] -> 14[2] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 00/0 : 7[3] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 00/0 : 11[3] -> 12[0] [send] via NET/IB/0(8)/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 00/0 : 9[1] -> 10[2] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 12/0 : 13[1] -> 14[2] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 12/0 : 10[2] -> 11[3] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 04/0 : 7[3] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 04/0 : 11[3] -> 12[0] [send] via NET/IB/0(8)/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 08/0 : 7[3] -> 8[0] [receive] via NET/IB/0/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 00/0 : 5[1] -> 6[2] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 08/0 : 11[3] -> 12[0] [send] via NET/IB/0(8)/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 03/0 : 9[1] -> 10[2] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 12/0 : 7[3] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 00/0 : 8[0] -> 9[1] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 12/0 : 11[3] -> 12[0] [send] via NET/IB/0(8)/GDRDMA 3: jzxh121:127842:129320 [1] NCCL INFO Channel 15/0 : 13[1] -> 14[2] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 03/0 : 
5[1] -> 6[2] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 04/0 : 9[1] -> 10[2] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 03/0 : 8[0] -> 9[1] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 00/0 : 14[2] -> 15[3] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 04/0 : 5[1] -> 6[2] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 04/0 : 14[2] -> 15[3] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 07/0 : 5[1] -> 6[2] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 08/0 : 14[2] -> 15[3] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 08/0 : 5[1] -> 6[2] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 04/0 : 8[0] -> 9[1] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 12/0 : 14[2] -> 15[3] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 11/0 : 5[1] -> 6[2] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 07/0 : 9[1] -> 10[2] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 00/0 : 11[3] -> 12[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 12/0 : 5[1] -> 6[2] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 07/0 : 8[0] -> 9[1] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 08/0 : 9[1] -> 10[2] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 00/0 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA 3: jzxh121:127841:129321 [0] NCCL INFO Channel 04/0 : 11[3] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh121:127844:129319 [3] NCCL INFO Channel 04/0 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 15/0 : 5[1] -> 6[2] via P2P/CUMEM 2: jzxh120:230985:232478 
[0] NCCL INFO Channel 08/0 : 8[0] -> 9[1] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 11/0 : 9[1] -> 10[2] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 11/0 : 8[0] -> 9[1] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 08/0 : 11[3] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh121:127844:129319 [3] NCCL INFO Channel 08/0 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA 3: jzxh121:127841:129321 [0] NCCL INFO Channel 12/0 : 11[3] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh121:127841:129321 [0] NCCL INFO Channel 00/0 : 12[0] -> 13[1] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 12/0 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 12/0 : 9[1] -> 10[2] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 12/0 : 8[0] -> 9[1] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 03/0 : 12[0] -> 13[1] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 15/0 : 9[1] -> 10[2] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 15/0 : 8[0] -> 9[1] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 04/0 : 12[0] -> 13[1] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 07/0 : 12[0] -> 13[1] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 00/0 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[0] [send] via NET/IB/0(0)/GDRDMA 3: jzxh121:127841:129321 [0] NCCL INFO Channel 08/0 : 12[0] -> 13[1] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 04/0 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[0] [send] via NET/IB/0(0)/GDRDMA 1: jzxh119:826467:827965 [0] NCCL INFO Channel 00/0 : 3[3] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 00/0 : 7[3] -> 8[0] [send] via NET/IB/0(4)/GDRDMA 0: jzxh118:734703:736201 [0] NCCL INFO Channel 08/0 : 15[3] 
-> 0[0] [receive] via NET/IB/0/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[0] [send] via NET/IB/0(0)/GDRDMA 0: jzxh118:734703:736201 [0] NCCL INFO Channel 12/0 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA 0: jzxh118:734703:736201 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[0] [send] via NET/IB/0(0)/GDRDMA 3: jzxh121:127841:129321 [0] NCCL INFO Channel 11/0 : 12[0] -> 13[1] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 12/0 : 12[0] -> 13[1] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 04/0 : 3[3] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 04/0 : 7[3] -> 8[0] [send] via NET/IB/0(4)/GDRDMA 1: jzxh119:826467:827965 [0] NCCL INFO Channel 08/0 : 3[3] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 08/0 : 7[3] -> 8[0] [send] via NET/IB/0(4)/GDRDMA 3: jzxh121:127841:129321 [0] NCCL INFO Channel 15/0 : 12[0] -> 13[1] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 12/0 : 3[3] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826467:827965 [0] NCCL INFO Channel 00/0 : 4[0] -> 5[1] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 12/0 : 7[3] -> 8[0] [send] via NET/IB/0(4)/GDRDMA 0: jzxh118:734703:736201 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 01/0 : 12[0] -> 15[3] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 01/0 : 8[0] -> 11[3] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 03/0 : 4[0] -> 5[1] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 04/0 : 4[0] -> 5[1] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 02/0 : 12[0] -> 15[3] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM 2: 
jzxh120:230985:232478 [0] NCCL INFO Channel 02/0 : 8[0] -> 11[3] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 05/0 : 12[0] -> 15[3] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 07/0 : 4[0] -> 5[1] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 05/0 : 8[0] -> 11[3] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 06/0 : 12[0] -> 15[3] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 08/0 : 4[0] -> 5[1] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 06/0 : 8[0] -> 11[3] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 09/0 : 12[0] -> 15[3] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 10/0 : 12[0] -> 15[3] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 11/0 : 4[0] -> 5[1] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 09/0 : 8[0] -> 11[3] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 13/0 : 12[0] -> 15[3] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 12/0 : 4[0] -> 5[1] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 10/0 : 8[0] -> 11[3] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 14/0 : 12[0] -> 15[3] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 01/0 : 0[0] -> 3[3] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 15/0 : 4[0] -> 5[1] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 02/0 : 0[0] -> 3[3] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 13/0 : 8[0] -> 11[3] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 05/0 : 0[0] -> 3[3] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO 
Channel 14/0 : 8[0] -> 11[3] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 06/0 : 0[0] -> 3[3] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 01/0 : 4[0] -> 7[3] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 09/0 : 0[0] -> 3[3] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 02/0 : 4[0] -> 7[3] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 10/0 : 0[0] -> 3[3] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 05/0 : 4[0] -> 7[3] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 13/0 : 0[0] -> 3[3] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 06/0 : 4[0] -> 7[3] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 09/0 : 4[0] -> 7[3] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 14/0 : 0[0] -> 3[3] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 10/0 : 4[0] -> 7[3] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 02/0 : 11[3] -> 14[2] [receive] via NET/IB/2/GDRDMA 3: jzxh121:127844:129319 [3] NCCL INFO Channel 02/0 : 15[3] -> 2[2] [send] via NET/IB/2(14)/GDRDMA 1: jzxh119:826467:827965 [0] NCCL INFO Channel 13/0 : 4[0] -> 7[3] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 06/0 : 11[3] -> 14[2] [receive] via NET/IB/2/GDRDMA 3: jzxh121:127844:129319 [3] NCCL INFO Channel 06/0 : 15[3] -> 2[2] [send] via NET/IB/2(14)/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 10/0 : 11[3] -> 14[2] [receive] via NET/IB/2/GDRDMA 3: jzxh121:127844:129319 [3] NCCL INFO Channel 10/0 : 15[3] -> 2[2] [send] via NET/IB/2(14)/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 14/0 : 11[3] -> 14[2] [receive] via NET/IB/2/GDRDMA 3: jzxh121:127844:129319 [3] NCCL INFO Channel 14/0 : 15[3] -> 2[2] [send] via NET/IB/2(14)/GDRDMA 1: jzxh119:826467:827965 [0] NCCL INFO Channel 14/0 : 4[0] -> 7[3] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 02/0 : 7[3] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: 
jzxh120:230988:232477 [3] NCCL INFO Channel 02/0 : 11[3] -> 14[2] [send] via NET/IB/2(10)/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 06/0 : 7[3] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 06/0 : 11[3] -> 14[2] [send] via NET/IB/2(10)/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 10/0 : 7[3] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 10/0 : 11[3] -> 14[2] [send] via NET/IB/2(10)/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 14/0 : 7[3] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 14/0 : 11[3] -> 14[2] [send] via NET/IB/2(10)/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 02/0 : 15[3] -> 2[2] [receive] via NET/IB/2/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 02/0 : 3[3] -> 6[2] [send] via NET/IB/2(2)/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 02/0 : 3[3] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 02/0 : 7[3] -> 10[2] [send] via NET/IB/2(6)/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 06/0 : 15[3] -> 2[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 01/0 : 6[2] -> 9[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 06/0 : 3[3] -> 6[2] [send] via NET/IB/2(2)/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 10/0 : 15[3] -> 2[2] [receive] via NET/IB/2/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 10/0 : 3[3] -> 6[2] [send] via NET/IB/2(2)/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 14/0 : 15[3] -> 2[2] [receive] via NET/IB/2/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 14/0 : 3[3] -> 6[2] [send] via NET/IB/2(2)/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 06/0 : 3[3] -> 6[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 01/0 : 10[2] -> 13[1] [send] via NET/IB/1(9)/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO 
Channel 06/0 : 7[3] -> 10[2] [send] via NET/IB/2(6)/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 05/0 : 6[2] -> 9[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 10/0 : 3[3] -> 6[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 05/0 : 10[2] -> 13[1] [send] via NET/IB/1(9)/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 10/0 : 7[3] -> 10[2] [send] via NET/IB/2(6)/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 09/0 : 6[2] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 09/0 : 10[2] -> 13[1] [send] via NET/IB/1(9)/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 14/0 : 3[3] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 14/0 : 7[3] -> 10[2] [send] via NET/IB/2(6)/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 13/0 : 6[2] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 13/0 : 10[2] -> 13[1] [send] via NET/IB/1(9)/GDRDMA 3: jzxh121:127842:129320 [1] NCCL INFO Channel 01/0 : 10[2] -> 13[1] [receive] via NET/IB/1/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 01/0 : 14[2] -> 1[1] [send] via NET/IB/1(13)/GDRDMA 3: jzxh121:127842:129320 [1] NCCL INFO Channel 05/0 : 10[2] -> 13[1] [receive] via NET/IB/1/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 05/0 : 14[2] -> 1[1] [send] via NET/IB/1(13)/GDRDMA 3: jzxh121:127842:129320 [1] NCCL INFO Channel 09/0 : 10[2] -> 13[1] [receive] via NET/IB/1/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 09/0 : 14[2] -> 1[1] [send] via NET/IB/1(13)/GDRDMA 3: jzxh121:127842:129320 [1] NCCL INFO Channel 13/0 : 10[2] -> 13[1] [receive] via NET/IB/1/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 13/0 : 14[2] -> 1[1] [send] via NET/IB/1(13)/GDRDMA 0: jzxh118:734704:736202 [1] NCCL INFO Channel 01/0 : 14[2] -> 1[1] [receive] via NET/IB/1/GDRDMA 3: jzxh121:127842:129320 [1] NCCL INFO Channel 01/0 : 13[1] -> 12[0] via 
P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 01/0 : 2[2] -> 5[1] [send] via NET/IB/1(1)/GDRDMA 0: jzxh118:734704:736202 [1] NCCL INFO Channel 05/0 : 14[2] -> 1[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 05/0 : 2[2] -> 5[1] [send] via NET/IB/1(1)/GDRDMA 0: jzxh118:734704:736202 [1] NCCL INFO Channel 09/0 : 14[2] -> 1[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 09/0 : 2[2] -> 5[1] [send] via NET/IB/1(1)/GDRDMA 0: jzxh118:734704:736202 [1] NCCL INFO Channel 13/0 : 14[2] -> 1[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 13/0 : 2[2] -> 5[1] [send] via NET/IB/1(1)/GDRDMA 3: jzxh121:127842:129320 [1] NCCL INFO Channel 02/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 05/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 06/0 : 13[1] -> 12[0] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 01/0 : 2[2] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 01/0 : 6[2] -> 9[1] [send] via NET/IB/1(5)/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 05/0 : 2[2] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 05/0 : 6[2] -> 9[1] [send] via NET/IB/1(5)/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 09/0 : 2[2] -> 5[1] [receive] via NET/IB/1/GDRDMA 3: jzxh121:127842:129320 [1] NCCL INFO Channel 09/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 10/0 : 13[1] -> 12[0] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 09/0 : 6[2] -> 9[1] [send] via NET/IB/1(5)/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 13/0 : 2[2] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 13/0 : 6[2] -> 9[1] [send] via NET/IB/1(5)/GDRDMA 3: jzxh121:127842:129320 [1] NCCL INFO Channel 13/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL 
INFO Channel 14/0 : 13[1] -> 12[0] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 01/0 : 5[1] -> 4[0] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 02/0 : 5[1] -> 4[0] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 01/0 : 9[1] -> 8[0] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 05/0 : 5[1] -> 4[0] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 06/0 : 5[1] -> 4[0] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 02/0 : 9[1] -> 8[0] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 09/0 : 5[1] -> 4[0] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 05/0 : 9[1] -> 8[0] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 10/0 : 5[1] -> 4[0] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 06/0 : 9[1] -> 8[0] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 13/0 : 5[1] -> 4[0] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 09/0 : 9[1] -> 8[0] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 14/0 : 5[1] -> 4[0] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 10/0 : 9[1] -> 8[0] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 13/0 : 9[1] -> 8[0] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 14/0 : 9[1] -> 8[0] via P2P/CUMEM 1: 
jzxh119:826470:827963 [3] NCCL INFO Channel 03/0 : 2[2] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 03/0 : 6[2] -> 11[3] [send] via NET/IB/3(7)/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 07/0 : 2[2] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 07/0 : 6[2] -> 11[3] [send] via NET/IB/3(7)/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 11/0 : 2[2] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 11/0 : 6[2] -> 11[3] [send] via NET/IB/3(7)/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 15/0 : 2[2] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 15/0 : 6[2] -> 11[3] [send] via NET/IB/3(7)/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 03/0 : 10[2] -> 15[3] [send] via NET/IB/3(11)/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 03/0 : 6[2] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 07/0 : 10[2] -> 15[3] [send] via NET/IB/3(11)/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 07/0 : 6[2] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 11/0 : 10[2] -> 15[3] [send] via NET/IB/3(11)/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 11/0 : 6[2] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 15/0 : 10[2] -> 15[3] [send] via NET/IB/3(11)/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 15/0 : 6[2] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 03/0 : 11[3] -> 8[0] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 07/0 : 11[3] -> 8[0] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 11/0 : 11[3] -> 8[0] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 15/0 : 11[3] -> 8[0] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 02/0 : 6[2] -> 5[1] via P2P/CUMEM 2: 
jzxh120:230988:232477 [3] NCCL INFO Channel 01/0 : 11[3] -> 10[2] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 06/0 : 6[2] -> 5[1] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 05/0 : 11[3] -> 10[2] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 03/0 : 2[2] -> 7[3] [send] via NET/IB/3(3)/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 03/0 : 14[2] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 07/0 : 2[2] -> 7[3] [send] via NET/IB/3(3)/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 10/0 : 6[2] -> 5[1] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 09/0 : 11[3] -> 10[2] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 07/0 : 14[2] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 11/0 : 2[2] -> 7[3] [send] via NET/IB/3(3)/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 11/0 : 14[2] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 15/0 : 2[2] -> 7[3] [send] via NET/IB/3(3)/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 15/0 : 14[2] -> 3[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 14/0 : 6[2] -> 5[1] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 13/0 : 11[3] -> 10[2] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 03/0 : 10[2] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 03/0 : 14[2] -> 3[3] [send] via NET/IB/3(15)/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 03/0 : 7[3] -> 4[0] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 07/0 : 10[2] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 07/0 : 14[2] -> 3[3] [send] via NET/IB/3(15)/GDRDMA 3: jzxh121:127844:129319 [3] NCCL INFO Channel 11/0 : 10[2] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 11/0 : 14[2] -> 3[3] 
[send] via NET/IB/3(15)/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 03/0 : 3[3] -> 0[0] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 07/0 : 7[3] -> 4[0] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 15/0 : 10[2] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 15/0 : 14[2] -> 3[3] [send] via NET/IB/3(15)/GDRDMA 3: jzxh121:127844:129319 [3] NCCL INFO Channel 03/0 : 15[3] -> 12[0] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 07/0 : 3[3] -> 0[0] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 11/0 : 7[3] -> 4[0] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 07/0 : 15[3] -> 12[0] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 15/0 : 7[3] -> 4[0] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 11/0 : 3[3] -> 0[0] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 01/0 : 7[3] -> 6[2] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 11/0 : 15[3] -> 12[0] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 15/0 : 3[3] -> 0[0] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 15/0 : 15[3] -> 12[0] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 05/0 : 7[3] -> 6[2] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 09/0 : 7[3] -> 6[2] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 02/0 : 14[2] -> 13[1] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 02/0 : 10[2] -> 9[1] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 01/0 : 
15[3] -> 14[2] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 06/0 : 14[2] -> 13[1] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 13/0 : 7[3] -> 6[2] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 10/0 : 14[2] -> 13[1] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 05/0 : 15[3] -> 14[2] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 09/0 : 15[3] -> 14[2] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 14/0 : 14[2] -> 13[1] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 13/0 : 15[3] -> 14[2] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 06/0 : 10[2] -> 9[1] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 10/0 : 10[2] -> 9[1] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 14/0 : 10[2] -> 9[1] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Connected all rings 0: jzxh118:734706:736203 [3] NCCL INFO Connected all rings 0: jzxh118:734704:736202 [1] NCCL INFO Connected all rings 0: jzxh118:734703:736201 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Connected all rings 0: jzxh118:734705:736204 [2] NCCL INFO Connected all rings 1: jzxh119:826469:827962 [2] NCCL INFO Connected all rings 1: jzxh119:826468:827964 [1] NCCL INFO Connected all rings 1: jzxh119:826467:827965 [0] NCCL INFO Connected all rings 2: jzxh120:230987:232476 [2] NCCL INFO Connected all rings 2: jzxh120:230988:232477 [3] NCCL INFO Connected all rings 3: jzxh121:127841:129321 [0] NCCL INFO Channel 02/0 : 12[0] -> 13[1] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Connected all rings 3: jzxh121:127844:129319 [3] NCCL INFO Connected all rings 3: jzxh121:127843:129318 [2] 
NCCL INFO Connected all rings 3: jzxh121:127841:129321 [0] NCCL INFO Channel 10/0 : 12[0] -> 13[1] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Connected all rings 1: jzxh119:826467:827965 [0] NCCL INFO Channel 02/0 : 4[0] -> 5[1] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 10/0 : 4[0] -> 5[1] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Connected all rings 3: jzxh121:127842:129320 [1] NCCL INFO Channel 01/0 : 13[1] -> 14[2] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 02/0 : 8[0] -> 9[1] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 01/0 : 14[2] -> 15[3] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Connected all rings 1: jzxh119:826469:827962 [2] NCCL INFO Channel 01/0 : 6[2] -> 7[3] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 01/0 : 5[1] -> 6[2] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 10/0 : 8[0] -> 9[1] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 07/0 : 14[2] -> 15[3] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 01/0 : 10[2] -> 11[3] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 09/0 : 13[1] -> 14[2] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 07/0 : 6[2] -> 7[3] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 09/0 : 14[2] -> 15[3] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 09/0 : 5[1] -> 6[2] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via 
P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 01/0 : 9[1] -> 10[2] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 02/0 : 12[0] -> 14[2] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 09/0 : 6[2] -> 7[3] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 07/0 : 10[2] -> 11[3] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 02/0 : 4[0] -> 6[2] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 15/0 : 6[2] -> 7[3] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 15/0 : 14[2] -> 15[3] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 03/0 : 12[0] -> 14[2] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 02/0 : 0[0] -> 2[2] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 09/0 : 9[1] -> 10[2] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 02/0 : 1[1] -> 3[3] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 03/0 : 4[0] -> 6[2] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 05/0 : 12[0] -> 14[2] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 02/0 : 13[1] -> 15[3] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 09/0 : 10[2] -> 11[3] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 02/0 : 5[1] -> 7[3] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 15/0 : 10[2] -> 11[3] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 02/0 : 8[0] -> 10[2] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 06/0 : 12[0] -> 14[2] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 05/0 : 4[0] -> 6[2] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 03/0 : 1[1] -> 3[3] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 03/0 : 0[0] -> 2[2] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 03/0 : 8[0] -> 10[2] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 03/0 : 13[1] -> 15[3] via P2P/CUMEM 1: 
jzxh119:826467:827965 [0] NCCL INFO Channel 06/0 : 4[0] -> 6[2] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 02/0 : 9[1] -> 11[3] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 05/0 : 8[0] -> 10[2] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 10/0 : 12[0] -> 14[2] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 05/0 : 1[1] -> 3[3] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 05/0 : 0[0] -> 2[2] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 03/0 : 5[1] -> 7[3] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 05/0 : 13[1] -> 15[3] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 06/0 : 8[0] -> 10[2] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 05/0 : 5[1] -> 7[3] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 11/0 : 12[0] -> 14[2] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 06/0 : 1[1] -> 3[3] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 10/0 : 4[0] -> 6[2] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 03/0 : 9[1] -> 11[3] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 06/0 : 0[0] -> 2[2] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 10/0 : 1[1] -> 3[3] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 10/0 : 8[0] -> 10[2] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 06/0 : 13[1] -> 15[3] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 06/0 : 5[1] -> 7[3] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 13/0 : 12[0] -> 14[2] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 11/0 : 1[1] -> 3[3] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 11/0 : 4[0] -> 6[2] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 05/0 : 9[1] -> 11[3] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 11/0 : 8[0] -> 10[2] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO 
Channel 14/0 : 12[0] -> 14[2] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 13/0 : 1[1] -> 3[3] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 10/0 : 5[1] -> 7[3] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 10/0 : 13[1] -> 15[3] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 13/0 : 4[0] -> 6[2] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 06/0 : 9[1] -> 11[3] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 14/0 : 1[1] -> 3[3] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 10/0 : 0[0] -> 2[2] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 13/0 : 8[0] -> 10[2] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 11/0 : 13[1] -> 15[3] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 11/0 : 5[1] -> 7[3] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 10/0 : 9[1] -> 11[3] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 11/0 : 0[0] -> 2[2] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 14/0 : 4[0] -> 6[2] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 13/0 : 13[1] -> 15[3] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 14/0 : 8[0] -> 10[2] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 13/0 : 0[0] -> 2[2] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 13/0 : 5[1] -> 7[3] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 11/0 : 9[1] -> 11[3] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 14/0 : 0[0] -> 2[2] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 14/0 : 13[1] -> 15[3] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 14/0 : 5[1] -> 7[3] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 13/0 : 9[1] -> 11[3] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 04/0 : 0[0] -> 3[3] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 10/0 : 2[2] -> 6[2] [send] via 
NET/IB/2/GDRDMA 3: jzxh121:127841:129321 [0] NCCL INFO Channel 04/0 : 12[0] -> 15[3] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 14/0 : 2[2] -> 6[2] [send] via NET/IB/2/GDRDMA 0: jzxh118:734703:736201 [0] NCCL INFO Channel 12/0 : 0[0] -> 3[3] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 14/0 : 9[1] -> 11[3] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 04/0 : 4[0] -> 7[3] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 01/0 : 9[1] -> 13[1] [receive] via NET/IB/1/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 02/0 : 10[2] -> 14[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 04/0 : 8[0] -> 11[3] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 09/0 : 1[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 3: jzxh121:127842:129320 [1] NCCL INFO Channel 05/0 : 9[1] -> 13[1] [receive] via NET/IB/1/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 06/0 : 10[2] -> 14[2] [receive] via NET/IB/2/GDRDMA 3: jzxh121:127841:129321 [0] NCCL INFO Channel 12/0 : 12[0] -> 15[3] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 10/0 : 2[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 13/0 : 1[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 14/0 : 2[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 12/0 : 8[0] -> 11[3] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 00/0 : 8[0] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh121:127844:129319 [3] NCCL INFO Channel 03/0 : 11[3] -> 15[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 02/0 : 6[2] -> 10[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 01/0 : 5[1] -> 9[1] [send] via NET/IB/1/GDRDMA 1: jzxh119:826467:827965 [0] NCCL INFO Channel 12/0 : 4[0] -> 7[3] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 06/0 : 6[2] -> 
10[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 05/0 : 5[1] -> 9[1] [send] via NET/IB/1/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 10/0 : 6[2] -> 10[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 09/0 : 5[1] -> 9[1] [send] via NET/IB/1/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 14/0 : 6[2] -> 10[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 13/0 : 5[1] -> 9[1] [send] via NET/IB/1/GDRDMA 3: jzxh121:127841:129321 [0] NCCL INFO Channel 04/0 : 8[0] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh121:127844:129319 [3] NCCL INFO Channel 07/0 : 11[3] -> 15[3] [receive] via NET/IB/3/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 02/0 : 10[2] -> 2[2] [receive] via NET/IB/2/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 06/0 : 10[2] -> 2[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 01/0 : 5[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 02/0 : 2[2] -> 10[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826467:827965 [0] NCCL INFO Channel 08/0 : 0[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 11/0 : 3[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 06/0 : 2[2] -> 10[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826467:827965 [0] NCCL INFO Channel 12/0 : 0[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 02/0 : 6[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 05/0 : 5[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 15/0 : 3[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 06/0 : 6[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 09/0 : 5[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826467:827965 [0] NCCL 
INFO Channel 00/0 : 4[0] -> 8[0] [send] via NET/IB/0/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 13/0 : 5[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 10/0 : 6[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 03/0 : 7[3] -> 11[3] [send] via NET/IB/3/GDRDMA 1: jzxh119:826467:827965 [0] NCCL INFO Channel 04/0 : 4[0] -> 8[0] [send] via NET/IB/0/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 07/0 : 7[3] -> 11[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 01/0 : 9[1] -> 13[1] [send] via NET/IB/1/GDRDMA 1: jzxh119:826467:827965 [0] NCCL INFO Channel 08/0 : 4[0] -> 8[0] [send] via NET/IB/0/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 14/0 : 6[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 11/0 : 7[3] -> 11[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 05/0 : 9[1] -> 13[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 02/0 : 10[2] -> 14[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826467:827965 [0] NCCL INFO Channel 12/0 : 4[0] -> 8[0] [send] via NET/IB/0/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 06/0 : 10[2] -> 14[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 15/0 : 7[3] -> 11[3] [send] via NET/IB/3/GDRDMA 3: jzxh121:127842:129320 [1] NCCL INFO Channel 09/0 : 5[1] -> 13[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 03/0 : 7[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 00/0 : 4[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 07/0 : 7[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 01/0 : 1[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 04/0 : 4[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: 
jzxh120:230988:232477 [3] NCCL INFO Channel 11/0 : 7[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 05/0 : 1[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 08/0 : 4[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 15/0 : 7[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 3: jzxh121:127842:129320 [1] NCCL INFO Channel 13/0 : 5[1] -> 13[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 01/0 : 9[1] -> 1[1] [send] via NET/IB/1/GDRDMA 3: jzxh121:127842:129320 [1] NCCL INFO Channel 09/0 : 13[1] -> 5[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 12/0 : 4[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 3: jzxh121:127842:129320 [1] NCCL INFO Channel 13/0 : 13[1] -> 5[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 03/0 : 11[3] -> 15[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 05/0 : 9[1] -> 1[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 00/0 : 8[0] -> 12[0] [send] via NET/IB/0/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 10/0 : 14[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 07/0 : 11[3] -> 15[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 04/0 : 8[0] -> 12[0] [send] via NET/IB/0/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 02/0 : 2[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 14/0 : 14[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 10/0 : 6[2] -> 14[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 14/0 : 6[2] -> 14[2] [send] via NET/IB/2/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 10/0 : 6[2] -> 14[2] [receive] via NET/IB/2/GDRDMA 3: jzxh121:127841:129321 [0] NCCL INFO Channel 08/0 : 4[0] -> 12[0] 
[receive] via NET/IB/0/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 14/0 : 6[2] -> 14[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 06/0 : 2[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 00/0 : 0[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 0: jzxh118:734704:736202 [1] NCCL INFO Channel 09/0 : 1[1] -> 5[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 03/0 : 3[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 02/0 : 10[2] -> 2[2] [send] via NET/IB/2/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 10/0 : 6[2] -> 2[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 04/0 : 0[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 06/0 : 10[2] -> 2[2] [send] via NET/IB/2/GDRDMA 0: jzxh118:734704:736202 [1] NCCL INFO Channel 13/0 : 1[1] -> 5[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 07/0 : 3[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 14/0 : 6[2] -> 2[2] [receive] via NET/IB/2/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 10/0 : 14[2] -> 6[2] [send] via NET/IB/2/GDRDMA 3: jzxh121:127841:129321 [0] NCCL INFO Channel 12/0 : 4[0] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 14/0 : 14[2] -> 6[2] [send] via NET/IB/2/GDRDMA 3: jzxh121:127841:129321 [0] NCCL INFO Channel 08/0 : 12[0] -> 4[0] [send] via NET/IB/0/GDRDMA 3: jzxh121:127844:129319 [3] NCCL INFO Channel 11/0 : 7[3] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh121:127841:129321 [0] NCCL INFO Channel 12/0 : 12[0] -> 4[0] [send] via NET/IB/0/GDRDMA 3: jzxh121:127844:129319 [3] NCCL INFO Channel 15/0 : 7[3] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 02/0 : 14[2] -> 10[2] [send] via NET/IB/2/GDRDMA 3: jzxh121:127844:129319 [3] NCCL INFO 
Channel 11/0 : 15[3] -> 7[3] [send] via NET/IB/3/GDRDMA 0: jzxh118:734704:736202 [1] NCCL INFO Channel 01/0 : 9[1] -> 1[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 11/0 : 3[3] -> 7[3] [send] via NET/IB/3/GDRDMA 0: jzxh118:734703:736201 [0] NCCL INFO Channel 08/0 : 0[0] -> 4[0] [send] via NET/IB/0/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 02/0 : 10[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 0: jzxh118:734704:736202 [1] NCCL INFO Channel 05/0 : 9[1] -> 1[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734703:736201 [0] NCCL INFO Channel 12/0 : 0[0] -> 4[0] [send] via NET/IB/0/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 15/0 : 3[3] -> 7[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 00/0 : 8[0] -> 0[0] [send] via NET/IB/0/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 03/0 : 11[3] -> 3[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 04/0 : 8[0] -> 0[0] [send] via NET/IB/0/GDRDMA 0: jzxh118:734704:736202 [1] NCCL INFO Channel 01/0 : 1[1] -> 9[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 07/0 : 11[3] -> 3[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 02/0 : 14[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 06/0 : 14[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 02/0 : 10[2] -> 6[2] [send] via NET/IB/2/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 06/0 : 10[2] -> 6[2] [send] via NET/IB/2/GDRDMA 0: jzxh118:734704:736202 [1] NCCL INFO Channel 05/0 : 1[1] -> 9[1] [send] via NET/IB/1/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 06/0 : 14[2] -> 10[2] [send] via NET/IB/2/GDRDMA 3: jzxh121:127844:129319 [3] NCCL INFO Channel 15/0 : 15[3] -> 7[3] [send] via NET/IB/3/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 03/0 : 11[3] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh118:734703:736201 [0] 
NCCL INFO Channel 00/0 : 8[0] -> 0[0] [receive] via NET/IB/0/GDRDMA 0: jzxh118:734704:736202 [1] NCCL INFO Channel 09/0 : 5[1] -> 1[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734703:736201 [0] NCCL INFO Channel 04/0 : 8[0] -> 0[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 06/0 : 10[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 10/0 : 10[2] -> 6[2] [send] via NET/IB/2/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 02/0 : 14[2] -> 12[0] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 07/0 : 11[3] -> 3[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 09/0 : 13[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 10/0 : 10[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 14/0 : 10[2] -> 6[2] [send] via NET/IB/2/GDRDMA 0: jzxh118:734703:736201 [0] NCCL INFO Channel 00/0 : 0[0] -> 8[0] [send] via NET/IB/0/GDRDMA 0: jzxh118:734704:736202 [1] NCCL INFO Channel 13/0 : 5[1] -> 1[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 14/0 : 10[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 13/0 : 13[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 02/0 : 2[2] -> 0[0] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 10/0 : 6[2] -> 2[2] [send] via NET/IB/2/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 09/0 : 5[1] -> 13[1] [send] via NET/IB/1/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 14/0 : 6[2] -> 2[2] [send] via NET/IB/2/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 03/0 : 3[3] -> 11[3] [send] via NET/IB/3/GDRDMA 0: jzxh118:734703:736201 [0] NCCL INFO Channel 04/0 : 0[0] -> 8[0] [send] via NET/IB/0/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 07/0 : 3[3] -> 11[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230987:232476 [2] NCCL 
INFO Channel 02/0 : 10[2] -> 8[0] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 03/0 : 14[2] -> 12[0] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 01/0 : 13[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 3: jzxh121:127842:129320 [1] NCCL INFO Channel 01/0 : 13[1] -> 9[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 05/0 : 13[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 0: jzxh118:734705:736204 [2] NCCL INFO Channel 03/0 : 2[2] -> 0[0] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Channel 08/0 : 4[0] -> 0[0] [receive] via NET/IB/0/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 11/0 : 7[3] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh118:734703:736201 [0] NCCL INFO Channel 12/0 : 4[0] -> 0[0] [receive] via NET/IB/0/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 13/0 : 5[1] -> 13[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 01/0 : 9[1] -> 5[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 03/0 : 10[2] -> 8[0] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 05/0 : 13[1] -> 9[1] [send] via NET/IB/1/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 15/0 : 7[3] -> 3[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 11/0 : 15[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826467:827965 [0] NCCL INFO Channel 08/0 : 12[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 05/0 : 9[1] -> 5[1] [send] via NET/IB/1/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 05/0 : 14[2] -> 12[0] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 05/0 : 2[2] -> 0[0] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 15/0 : 15[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826467:827965 [0] NCCL INFO Channel 12/0 : 12[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 09/0 : 9[1] -> 5[1] [send] via 
NET/IB/1/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 02/0 : 6[2] -> 4[0] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 01/0 : 9[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 11/0 : 7[3] -> 15[3] [send] via NET/IB/3/GDRDMA 1: jzxh119:826467:827965 [0] NCCL INFO Channel 08/0 : 4[0] -> 12[0] [send] via NET/IB/0/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 15/0 : 7[3] -> 15[3] [send] via NET/IB/3/GDRDMA 1: jzxh119:826467:827965 [0] NCCL INFO Channel 12/0 : 4[0] -> 12[0] [send] via NET/IB/0/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 05/0 : 9[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 09/0 : 9[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 03/0 : 6[2] -> 4[0] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 06/0 : 2[2] -> 0[0] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 00/0 : 8[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 00/0 : 12[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 06/0 : 14[2] -> 12[0] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 10/0 : 2[2] -> 0[0] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 03/0 : 11[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 05/0 : 10[2] -> 8[0] via P2P/CUMEM 3: jzxh121:127841:129321 [0] NCCL INFO Channel 00/0 : 12[0] -> 8[0] [send] via NET/IB/0/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 13/0 : 9[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 2: jzxh120:230986:232479 [1] NCCL INFO Channel 13/0 : 9[1] -> 5[1] [send] via NET/IB/1/GDRDMA 3: jzxh121:127844:129319 [3] NCCL INFO Channel 03/0 : 15[3] -> 11[3] [send] via NET/IB/3/GDRDMA 1: jzxh119:826467:827965 [0] NCCL INFO Channel 04/0 : 8[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 04/0 
: 12[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 3: jzxh121:127841:129321 [0] NCCL INFO Channel 04/0 : 12[0] -> 8[0] [send] via NET/IB/0/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 10/0 : 14[2] -> 12[0] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 07/0 : 11[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 03/0 : 15[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 3: jzxh121:127844:129319 [3] NCCL INFO Channel 07/0 : 15[3] -> 11[3] [send] via NET/IB/3/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 09/0 : 5[1] -> 1[1] [send] via NET/IB/1/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 05/0 : 6[2] -> 4[0] via P2P/CUMEM 2: jzxh120:230985:232478 [0] NCCL INFO Channel 00/0 : 8[0] -> 4[0] [send] via NET/IB/0/GDRDMA 1: jzxh119:826467:827965 [0] NCCL INFO Channel 08/0 : 8[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 07/0 : 15[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230987:232476 [2] NCCL INFO Channel 06/0 : 10[2] -> 8[0] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 11/0 : 11[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh119:826468:827964 [1] NCCL INFO Channel 13/0 : 5[1] -> 1[1] [send] via NET/IB/1/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 04/0 : 8[0] -> 4[0] [send] via NET/IB/0/GDRDMA 1: jzxh119:826467:827965 [0] NCCL INFO Channel 12/0 : 8[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 03/0 : 11[3] -> 7[3] [send] via NET/IB/3/GDRDMA 1: jzxh119:826470:827963 [3] NCCL INFO Channel 15/0 : 11[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 08/0 : 8[0] -> 4[0] [send] via NET/IB/0/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 06/0 : 6[2] -> 4[0] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 08/0 : 4[0] -> 0[0] [send] via NET/IB/0/GDRDMA 0: jzxh118:734706:736203 [3] NCCL INFO Channel 01/0 : 3[3] -> 0[0] via P2P/CUMEM 0: 
jzxh118:734705:736204 [2] NCCL INFO Channel 11/0 : 2[2] -> 0[0] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 11/0 : 7[3] -> 3[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 07/0 : 11[3] -> 7[3] [send] via NET/IB/3/GDRDMA 3: jzxh121:127843:129318 [2] NCCL INFO Channel 11/0 : 14[2] -> 12[0] via P2P/CUMEM 1: jzxh119:826467:827965 [0] NCCL INFO Channel 12/0 : 4[0] -> 0[0] [send] via NET/IB/0/GDRDMA 2: jzxh120:230985:232478 [0] NCCL INFO Channel 12/0 : 8[0] -> 4[0] [send] via NET/IB/0/GDRDMA 3: jzxh121:127844:129319 [3] NCCL INFO Channel 01/0 : 15[3] -> 12[0] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 15/0 : 7[3] -> 3[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 11/0 : 11[3] -> 7[3] [send] via NET/IB/3/GDRDMA 1: jzxh119:826469:827962 [2] NCCL INFO Channel 10/0 : 6[2] -> 4[0] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 10/0 : 10[2] -> 8[0] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 15/0 : 11[3] -> 7[3] [send] via NET/IB/3/GDRDMA 2: jzxh120:230988:232477 [3] NCCL INFO Channel 01/0 : 11[3] -> 8[0] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 11/0 : 10[2] -> 8[0] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 01/0 : 7[3] -> 4[0] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 13/0 : 14[2] -> 12[0] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 04/0 : 15[3] -> 12[0] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 11/0 : 6[2] -> 4[0] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 04/0 : 7[3] -> 4[0] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 04/0 : 3[3] -> 0[0] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 04/0 : 11[3] -> 8[0] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 05/0 : 15[3] -> 12[0] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 13/0 : 10[2] -> 8[0] via P2P/CUMEM 0: jzxh118:734706:736203 [3] 
NCCL INFO Channel 05/0 : 3[3] -> 0[0] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 14/0 : 14[2] -> 12[0] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 13/0 : 6[2] -> 4[0] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 06/0 : 3[3] -> 0[0] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 05/0 : 7[3] -> 4[0] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 14/0 : 10[2] -> 8[0] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 05/0 : 11[3] -> 8[0] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 06/0 : 15[3] -> 12[0] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 09/0 : 3[3] -> 0[0] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 14/0 : 6[2] -> 4[0] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 06/0 : 11[3] -> 8[0] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 09/0 : 15[3] -> 12[0] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 13/0 : 2[2] -> 0[0] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 12/0 : 3[3] -> 0[0] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 06/0 : 7[3] -> 4[0] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 09/0 : 11[3] -> 8[0] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 12/0 : 15[3] -> 12[0] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 14/0 : 2[2] -> 0[0] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 13/0 : 3[3] -> 0[0] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 12/0 : 11[3] -> 8[0] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 13/0 : 15[3] -> 12[0] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 13/0 : 11[3] -> 8[0] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 14/0 : 3[3] -> 0[0] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 14/0 : 15[3] -> 12[0] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 14/0 : 11[3] -> 
8[0] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 02/0 : 15[3] -> 13[1] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 03/0 : 15[3] -> 13[1] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 02/0 : 11[3] -> 9[1] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 05/0 : 15[3] -> 13[1] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 03/0 : 11[3] -> 9[1] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 06/0 : 15[3] -> 13[1] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 02/0 : 3[3] -> 1[1] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 05/0 : 11[3] -> 9[1] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 10/0 : 15[3] -> 13[1] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 03/0 : 3[3] -> 1[1] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 06/0 : 11[3] -> 9[1] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 11/0 : 15[3] -> 13[1] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 10/0 : 11[3] -> 9[1] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 05/0 : 3[3] -> 1[1] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 09/0 : 7[3] -> 4[0] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 13/0 : 15[3] -> 13[1] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 12/0 : 7[3] -> 4[0] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 11/0 : 11[3] -> 9[1] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 06/0 : 3[3] -> 1[1] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 13/0 : 7[3] -> 4[0] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 14/0 : 15[3] -> 13[1] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 10/0 : 3[3] -> 1[1] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 14/0 : 7[3] -> 4[0] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 13/0 : 11[3] -> 9[1] via P2P/CUMEM 0: 
jzxh118:734706:736203 [3] NCCL INFO Channel 11/0 : 3[3] -> 1[1] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 14/0 : 11[3] -> 9[1] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 00/0 : 15[3] -> 14[2] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 02/0 : 7[3] -> 5[1] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 13/0 : 3[3] -> 1[1] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 03/0 : 7[3] -> 5[1] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 00/0 : 11[3] -> 10[2] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 04/0 : 15[3] -> 14[2] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 14/0 : 3[3] -> 1[1] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 05/0 : 7[3] -> 5[1] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 06/0 : 7[3] -> 5[1] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 07/0 : 15[3] -> 14[2] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 10/0 : 7[3] -> 5[1] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 04/0 : 11[3] -> 10[2] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 08/0 : 15[3] -> 14[2] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 11/0 : 7[3] -> 5[1] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 13/0 : 7[3] -> 5[1] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 12/0 : 15[3] -> 14[2] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 14/0 : 7[3] -> 5[1] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 07/0 : 11[3] -> 10[2] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM 3: jzxh121:127844:129319 [3] NCCL INFO Channel 15/0 : 15[3] -> 14[2] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 00/0 : 7[3] -> 6[2] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO 
Channel 08/0 : 11[3] -> 10[2] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 00/0 : 14[2] -> 13[1] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 04/0 : 7[3] -> 6[2] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 01/0 : 14[2] -> 13[1] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 12/0 : 11[3] -> 10[2] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 07/0 : 7[3] -> 6[2] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 04/0 : 14[2] -> 13[1] via P2P/CUMEM 2: jzxh120:230988:232477 [3] NCCL INFO Channel 15/0 : 11[3] -> 10[2] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 07/0 : 14[2] -> 13[1] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 08/0 : 7[3] -> 6[2] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 08/0 : 14[2] -> 13[1] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 09/0 : 14[2] -> 13[1] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 12/0 : 7[3] -> 6[2] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 12/0 : 14[2] -> 13[1] via P2P/CUMEM 0: jzxh118:734706:736203 [3] NCCL INFO Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 00/0 : 10[2] -> 9[1] via P2P/CUMEM 3: jzxh121:127843:129318 [2] NCCL INFO Channel 15/0 : 14[2] -> 13[1] via P2P/CUMEM 1: jzxh119:826470:827963 [3] NCCL INFO Channel 15/0 : 7[3] -> 6[2] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 01/0 : 10[2] -> 9[1] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 00/0 : 9[1] -> 8[0] via 
P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 04/0 : 10[2] -> 9[1] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 03/0 : 9[1] -> 8[0] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 07/0 : 10[2] -> 9[1] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 07/0 : 9[1] -> 8[0] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 08/0 : 10[2] -> 9[1] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 08/0 : 9[1] -> 8[0] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 09/0 : 10[2] -> 9[1] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 12/0 : 10[2] -> 9[1] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 11/0 : 9[1] -> 8[0] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 00/0 : 5[1] -> 4[0] via P2P/CUMEM 2: jzxh120:230987:232476 [2] NCCL INFO Channel 15/0 : 10[2] -> 9[1] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM 2: jzxh120:230986:232479 [1] NCCL INFO Channel 15/0 : 9[1] -> 8[0] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 03/0 : 5[1] -> 4[0] via P2P/CUMEM 0: jzxh118:734705:736204 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh118:734704:736202 [1] NCCL INFO 
Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 07/0 : 5[1] -> 4[0] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 08/0 : 5[1] -> 4[0] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 11/0 : 5[1] -> 4[0] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 00/0 : 6[2] -> 5[1] via P2P/CUMEM 1: jzxh119:826468:827964 [1] NCCL INFO Channel 15/0 : 5[1] -> 4[0] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 01/0 : 6[2] -> 5[1] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 04/0 : 6[2] -> 5[1] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 07/0 : 6[2] -> 5[1] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 00/0 : 13[1] -> 12[0] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 08/0 : 6[2] -> 5[1] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 03/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 07/0 : 13[1] -> 12[0] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 09/0 : 6[2] -> 5[1] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 12/0 : 6[2] -> 5[1] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 08/0 : 13[1] -> 12[0] via P2P/CUMEM 1: jzxh119:826469:827962 [2] NCCL INFO Channel 15/0 : 6[2] -> 5[1] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 11/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh121:127842:129320 [1] NCCL INFO Channel 15/0 : 13[1] -> 12[0] via P2P/CUMEM 0: jzxh118:734703:736201 [0] NCCL INFO Connected all trees 0: jzxh118:734706:736203 [3] NCCL INFO Connected all trees 0: jzxh118:734706:736203 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 0: jzxh118:734706:736203 [3] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 0: jzxh118:734705:736204 [2] NCCL INFO Connected all trees 0: jzxh118:734705:736204 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 0: 
jzxh118:734705:736204 [2] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 0: jzxh118:734703:736201 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 0: jzxh118:734703:736201 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 0: jzxh118:734704:736202 [1] NCCL INFO Connected all trees 0: jzxh118:734704:736202 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 0: jzxh118:734704:736202 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh121:127841:129321 [0] NCCL INFO Connected all trees 2: jzxh120:230985:232478 [0] NCCL INFO Connected all trees 1: jzxh119:826467:827965 [0] NCCL INFO Connected all trees 1: jzxh119:826470:827963 [3] NCCL INFO Connected all trees 1: jzxh119:826467:827965 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 1: jzxh119:826467:827965 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 1: jzxh119:826470:827963 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 1: jzxh119:826470:827963 [3] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 2: jzxh120:230985:232478 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 2: jzxh120:230985:232478 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh121:127841:129321 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 3: jzxh121:127841:129321 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 1: jzxh119:826468:827964 [1] NCCL INFO Connected all trees 1: jzxh119:826469:827962 [2] NCCL INFO Connected all trees 2: jzxh120:230988:232477 [3] NCCL INFO Connected all trees 2: jzxh120:230986:232479 [1] NCCL INFO Connected all trees 2: 
jzxh120:230988:232477 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 2: jzxh120:230987:232476 [2] NCCL INFO Connected all trees 2: jzxh120:230988:232477 [3] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh121:127844:129319 [3] NCCL INFO Connected all trees 1: jzxh119:826468:827964 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 1: jzxh119:826469:827962 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 1: jzxh119:826468:827964 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 1: jzxh119:826469:827962 [2] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 2: jzxh120:230986:232479 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 2: jzxh120:230986:232479 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 2: jzxh120:230987:232476 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 2: jzxh120:230987:232476 [2] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh121:127844:129319 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 3: jzxh121:127844:129319 [3] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh121:127842:129320 [1] NCCL INFO Connected all trees 3: jzxh121:127843:129318 [2] NCCL INFO Connected all trees 3: jzxh121:127842:129320 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 3: jzxh121:127842:129320 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh121:127843:129318 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 3: jzxh121:127843:129318 [2] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels 
per peer
3: jzxh121:127841:129321 [0] NCCL INFO ncclCommInitRank comm 0x14895813f700 rank 12 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0xca6cf876c34ea956 - Init COMPLETE
3: jzxh121:127843:129318 [2] NCCL INFO ncclCommInitRank comm 0x1530ac14b0d0 rank 14 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0xca6cf876c34ea956 - Init COMPLETE
3: jzxh121:127844:129319 [3] NCCL INFO ncclCommInitRank comm 0x1512d813fa80 rank 15 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0xca6cf876c34ea956 - Init COMPLETE
3: jzxh121:127842:129320 [1] NCCL INFO ncclCommInitRank comm 0x14cb04147d00 rank 13 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0xca6cf876c34ea956 - Init COMPLETE
0: jzxh118:734705:736204 [2] NCCL INFO ncclCommInitRank comm 0x14b36813a080 rank 2 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0xca6cf876c34ea956 - Init COMPLETE
0: jzxh118:734703:736201 [0] NCCL INFO ncclCommInitRank comm 0x14ca2414c310 rank 0 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0xca6cf876c34ea956 - Init COMPLETE
0: jzxh118:734706:736203 [3] NCCL INFO ncclCommInitRank comm 0x14a250139f50 rank 3 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0xca6cf876c34ea956 - Init COMPLETE
0: jzxh118:734704:736202 [1] NCCL INFO ncclCommInitRank comm 0x14b4c0124180 rank 1 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0xca6cf876c34ea956 - Init COMPLETE
2: jzxh120:230985:232478 [0] NCCL INFO ncclCommInitRank comm 0x149348151b00 rank 8 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0xca6cf876c34ea956 - Init COMPLETE
2: jzxh120:230988:232477 [3] NCCL INFO ncclCommInitRank comm 0x15368412ef80 rank 11 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0xca6cf876c34ea956 - Init COMPLETE
2: jzxh120:230987:232476 [2] NCCL INFO ncclCommInitRank comm 0x14c2b81361c0 rank 10 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0xca6cf876c34ea956 - Init COMPLETE
2: jzxh120:230986:232479 [1] NCCL INFO ncclCommInitRank comm 0x14b07412e240 rank 9 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0xca6cf876c34ea956 - Init COMPLETE
1: jzxh119:826467:827965 [0] NCCL INFO ncclCommInitRank comm 0x15454013e2c0 rank 4 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0xca6cf876c34ea956 - Init COMPLETE
1: jzxh119:826469:827962 [2] NCCL INFO ncclCommInitRank comm 0x14b39c13f610 rank 6 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0xca6cf876c34ea956 - Init COMPLETE
1: jzxh119:826470:827963 [3] NCCL INFO ncclCommInitRank comm 0x14965012e200 rank 7 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0xca6cf876c34ea956 - Init COMPLETE
1: jzxh119:826468:827964 [1] NCCL INFO ncclCommInitRank comm 0x14c404138dc0 rank 5 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0xca6cf876c34ea956 - Init COMPLETE
0: {'loss': 0.8119, 'grad_norm': 3.7927233901901705, 'learning_rate': 3.62e-07, 'memory/max_mem_active(gib)': 67.99, 'memory/max_mem_allocated(gib)': 67.66, 'memory/device_mem_reserved(gib)': 76.06, 'epoch': 0.0}
0: 0%| | 0/3723 [00:00