diff --git "a/stage2/open-stage2.log" "b/stage2/open-stage2.log"
new file mode 100644
--- /dev/null
+++ "b/stage2/open-stage2.log"
@@ -0,0 +1,20617 @@
+2025-10-25 10:23:48,860 - root - INFO - Starting training.
+2025-10-25 10:23:48,861 - root - INFO - Loading config from jobs/munin-7b-open-stage2/config.json
+2025-10-25 10:23:48,967 - root - INFO - Starting training.
+2025-10-25 10:23:48,967 - root - INFO - Loading config from jobs/munin-7b-open-stage2/config.json
+2025-10-25 10:23:48,983 - root - INFO - Starting training.
+2025-10-25 10:23:48,983 - root - INFO - Loading config from jobs/munin-7b-open-stage2/config.json
+2025-10-25 10:23:49,004 - root - INFO - Starting training.
+2025-10-25 10:23:49,004 - root - INFO - Loading config from jobs/munin-7b-open-stage2/config.json
+2025-10-25 10:23:49,175 - root - INFO - Starting training.
+2025-10-25 10:23:49,175 - root - INFO - Loading config from jobs/munin-7b-open-stage2/config.json
+2025-10-25 10:23:49,176 - root - INFO - Starting training.
+2025-10-25 10:23:49,176 - root - INFO - Loading config from jobs/munin-7b-open-stage2/config.json
+2025-10-25 10:23:49,184 - root - INFO - Starting training.
+2025-10-25 10:23:49,184 - root - INFO - Loading config from jobs/munin-7b-open-stage2/config.json
+2025-10-25 10:23:49,245 - root - INFO - Starting training.
+2025-10-25 10:23:49,245 - root - INFO - Loading config from jobs/munin-7b-open-stage2/config.json
+2025-10-25 10:23:50,204 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config
+2025-10-25 10:23:50,205 - root - INFO - Building 1-D device mesh with ['dp'], [8]
+2025-10-25 10:23:50,206 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',))
+2025-10-25 10:23:50,533 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config
+2025-10-25 10:23:50,535 - root - INFO - Building 1-D device mesh with ['dp'], [8]
+2025-10-25 10:23:50,535 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',))
+2025-10-25 10:23:50,542 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0)
+2025-10-25 10:23:50,592 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings)
+2025-10-25 10:23:50,592 - root - INFO - GPU capacity: NVIDIA B200 (0) with 178.36GiB memory
+2025-10-25 10:23:50,596 - root - INFO - Compiling each TransformerBlock with torch.compile
+2025-10-25 10:23:50,630 - root - INFO - Applied FSDP to the model
+2025-10-25 10:23:50,630 - root - INFO - Model after parallelization model=FSDPTransformer(
+  (tok_embeddings): Embedding(64256, 4096)
+  (layers): ModuleDict(
+    (0): FSDPOptimizedModule(
+      (_orig_mod): TransformerBlock(
+        (attention): Attention(
+          (wq): Linear(in_features=4096, out_features=4096, bias=False)
+          (wk): Linear(in_features=4096, out_features=4096, bias=False)
+          (wv): Linear(in_features=4096, out_features=4096, 
bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + 
(feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, 
bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + 
(w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( 
+ (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward(
+          (w1): Linear(in_features=4096, out_features=11008, bias=False)
+          (w2): Linear(in_features=11008, out_features=4096, bias=False)
+          (w3): Linear(in_features=4096, out_features=11008, bias=False)
+        )
+        (attention_norm): RMSNorm()
+        (ffn_norm): RMSNorm()
+      )
+    )
+  )
+  (norm): RMSNorm()
+  (output): Linear(in_features=4096, out_features=64256, bias=False)
+)
+
+2025-10-25 10:23:50,862 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0)
+2025-10-25 10:23:50,899 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config
+2025-10-25 10:23:50,899 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config
+2025-10-25 10:23:50,901 - root - INFO - Building 1-D device mesh with ['dp'], [8]
+2025-10-25 10:23:50,901 - root - INFO - Building 1-D device mesh with ['dp'], [8]
+2025-10-25 10:23:50,901 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',))
+2025-10-25 10:23:50,902 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',))
+2025-10-25 10:23:50,911 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings)
+2025-10-25 10:23:50,912 - root - INFO - GPU capacity: NVIDIA B200 (1) with 178.36GiB memory
+2025-10-25 10:23:50,915 - root - INFO - Compiling each TransformerBlock with torch.compile
+2025-10-25 10:23:50,948 - root - INFO - Applied FSDP to the model
+2025-10-25 10:23:50,948 - root - INFO - Model after parallelization model=FSDPTransformer(
+  (tok_embeddings): Embedding(64256, 4096)
+  (layers): ModuleDict(
+ (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): 
Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( 
+ (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-25 10:23:51,102 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-25 10:23:51,104 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-25 10:23:51,104 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-25 10:23:51,222 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-25 10:23:51,239 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-25 10:23:51,240 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-25 10:23:51,241 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-25 10:23:51,249 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, 
max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-25 10:23:51,273 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-25 10:23:51,273 - root - INFO - GPU capacity: NVIDIA B200 (4) with 178.36GiB memory +2025-10-25 10:23:51,276 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-25 10:23:51,278 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-25 10:23:51,280 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-25 10:23:51,281 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-25 10:23:51,299 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-25 10:23:51,299 - root - INFO - GPU capacity: NVIDIA B200 (7) with 178.36GiB memory +2025-10-25 10:23:51,302 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-25 10:23:51,313 - root - INFO - Applied FSDP to the model +2025-10-25 10:23:51,313 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + 
(wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, 
bias=False) +) + +2025-10-25 10:23:51,335 - root - INFO - Applied FSDP to the model +2025-10-25 10:23:51,336 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, 
out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, 
out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): 
RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( 
+ (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, 
bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + 
(wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-25 10:23:51,423 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-25 10:23:51,450 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-25 10:23:51,452 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-25 10:23:51,452 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-25 10:23:51,474 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-25 10:23:51,474 - root - INFO - GPU capacity: NVIDIA B200 (5) with 178.36GiB memory +2025-10-25 10:23:51,477 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-25 10:23:51,511 - root - INFO - Applied FSDP to the model 
+2025-10-25 10:23:51,511 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, 
bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + 
(6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): 
Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + 
(wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-25 10:23:51,573 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-25 10:23:51,605 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-25 10:23:51,623 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-25 10:23:51,623 - root - INFO - GPU capacity: NVIDIA B200 (2) with 178.36GiB memory +2025-10-25 10:23:51,627 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-25 10:23:51,655 - root - INFO - Model llama3 
Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-25 10:23:51,655 - root - INFO - GPU capacity: NVIDIA B200 (3) with 178.36GiB memory +2025-10-25 10:23:51,659 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-25 10:23:51,660 - root - INFO - Applied FSDP to the model +2025-10-25 10:23:51,661 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( 
+ (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-25 10:23:51,692 - root - INFO - Applied FSDP to the model +2025-10-25 10:23:51,693 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + 
(ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): 
TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, 
out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, 
out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, 
out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, 
out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): 
Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): 
Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): 
RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-25 10:23:51,787 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-25 10:23:51,837 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-25 10:23:51,837 - root - INFO - GPU capacity: NVIDIA B200 (6) with 178.36GiB memory +2025-10-25 10:23:51,841 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-25 10:23:51,877 - root - INFO - Applied FSDP to the model +2025-10-25 10:23:51,878 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, 
out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): 
Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): 
Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( 
+ (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-25 10:24:16,574 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-25 10:24:16,574 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-25 10:24:16,574 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-25 
10:24:16,574 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-25 10:24:16,575 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-25 10:24:16,575 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-25 10:24:16,575 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-25 10:24:16,575 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. 
+ warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +2025-10-25 10:24:17,077 - root - INFO - Loaded cached document counts in 7.987022399902344e-05 seconds +2025-10-25 10:24:17,077 - root - INFO - Loaded cached document counts in 0.00010371208190917969 seconds +2025-10-25 10:24:17,077 - root - INFO - Loaded cached document counts in 8.249282836914062e-05 seconds +2025-10-25 10:24:17,077 - root - INFO - Loaded cached document counts in 7.772445678710938e-05 seconds +2025-10-25 10:24:17,077 - root - INFO - Loaded cached document counts in 6.723403930664062e-05 seconds +2025-10-25 10:24:17,077 - root - INFO - Loaded cached document counts in 6.079673767089844e-05 seconds +2025-10-25 10:24:17,077 - root - INFO - Loaded cached document counts in 6.604194641113281e-05 seconds +2025-10-25 10:24:17,077 - root - INFO - Loaded cached document counts in 8.463859558105469e-05 seconds +2025-10-25 10:24:17,078 - root - INFO - Worker 0 responsible for docs: [('/work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet', 0, 945398)] +2025-10-25 10:24:17,078 - root - INFO - Total docs: 945399 +2025-10-25 10:24:17,078 - root - INFO - Worker 0 assembled subdataset iterator for /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/, 1 of 1 +No valid checkpoint detected at jobs/munin-7b-open-pt/checkpoints/dataloader, dataset starting from scratch. 
+2025-10-25 10:24:17,079 - root - INFO - Nodecay weight: tok_embeddings.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.0._orig_mod.attention.wq.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.0._orig_mod.attention.wk.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.0._orig_mod.attention.wv.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.0._orig_mod.attention.wo.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.0._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.0._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.0._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,079 - root - INFO - Nodecay weight: layers.0._orig_mod.attention_norm.weight +2025-10-25 10:24:17,079 - root - INFO - Nodecay weight: layers.0._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.1._orig_mod.attention.wq.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.1._orig_mod.attention.wk.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.1._orig_mod.attention.wv.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.1._orig_mod.attention.wo.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.1._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.1._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.1._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,079 - root - INFO - Nodecay weight: layers.1._orig_mod.attention_norm.weight +2025-10-25 10:24:17,079 - root - INFO - Nodecay weight: layers.1._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.2._orig_mod.attention.wq.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.2._orig_mod.attention.wk.weight +2025-10-25 
10:24:17,079 - root - INFO - Decay weight: layers.2._orig_mod.attention.wv.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.2._orig_mod.attention.wo.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.2._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.2._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.2._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,079 - root - INFO - Nodecay weight: layers.2._orig_mod.attention_norm.weight +2025-10-25 10:24:17,079 - root - INFO - Nodecay weight: layers.2._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.3._orig_mod.attention.wq.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.3._orig_mod.attention.wk.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.3._orig_mod.attention.wv.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.3._orig_mod.attention.wo.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.3._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.3._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.3._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,079 - root - INFO - Nodecay weight: layers.3._orig_mod.attention_norm.weight +2025-10-25 10:24:17,079 - root - INFO - Nodecay weight: layers.3._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.4._orig_mod.attention.wq.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.4._orig_mod.attention.wk.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.4._orig_mod.attention.wv.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.4._orig_mod.attention.wo.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.4._orig_mod.feed_forward.w1.weight +2025-10-25 
10:24:17,079 - root - INFO - Decay weight: layers.4._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.4._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,079 - root - INFO - Nodecay weight: layers.4._orig_mod.attention_norm.weight +2025-10-25 10:24:17,079 - root - INFO - Nodecay weight: layers.4._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.5._orig_mod.attention.wq.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.5._orig_mod.attention.wk.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.5._orig_mod.attention.wv.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.5._orig_mod.attention.wo.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.5._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.5._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.5._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,079 - root - INFO - Nodecay weight: layers.5._orig_mod.attention_norm.weight +2025-10-25 10:24:17,079 - root - INFO - Nodecay weight: layers.5._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.6._orig_mod.attention.wq.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.6._orig_mod.attention.wk.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.6._orig_mod.attention.wv.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.6._orig_mod.attention.wo.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.6._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.6._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.6._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,079 - root - INFO - Nodecay weight: layers.6._orig_mod.attention_norm.weight 
+2025-10-25 10:24:17,079 - root - INFO - Nodecay weight: layers.6._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.7._orig_mod.attention.wq.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.7._orig_mod.attention.wk.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.7._orig_mod.attention.wv.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.7._orig_mod.attention.wo.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.7._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,079 - root - INFO - Decay weight: layers.7._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.7._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.7._orig_mod.attention_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.7._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.8._orig_mod.attention.wq.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.8._orig_mod.attention.wk.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.8._orig_mod.attention.wv.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.8._orig_mod.attention.wo.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.8._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.8._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.8._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.8._orig_mod.attention_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.8._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.9._orig_mod.attention.wq.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.9._orig_mod.attention.wk.weight 
+2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.9._orig_mod.attention.wv.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.9._orig_mod.attention.wo.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.9._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.9._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.9._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.9._orig_mod.attention_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.9._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.10._orig_mod.attention.wq.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.10._orig_mod.attention.wk.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.10._orig_mod.attention.wv.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.10._orig_mod.attention.wo.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.10._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.10._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.10._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.10._orig_mod.attention_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.10._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.11._orig_mod.attention.wq.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.11._orig_mod.attention.wk.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.11._orig_mod.attention.wv.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.11._orig_mod.attention.wo.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: 
layers.11._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.11._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.11._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.11._orig_mod.attention_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.11._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.12._orig_mod.attention.wq.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.12._orig_mod.attention.wk.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.12._orig_mod.attention.wv.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.12._orig_mod.attention.wo.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.12._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.12._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.12._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.12._orig_mod.attention_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.12._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.13._orig_mod.attention.wq.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.13._orig_mod.attention.wk.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.13._orig_mod.attention.wv.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.13._orig_mod.attention.wo.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.13._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.13._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.13._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,080 - 
root - INFO - Nodecay weight: layers.13._orig_mod.attention_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.13._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.14._orig_mod.attention.wq.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.14._orig_mod.attention.wk.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.14._orig_mod.attention.wv.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.14._orig_mod.attention.wo.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.14._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.14._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.14._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.14._orig_mod.attention_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.14._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.15._orig_mod.attention.wq.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.15._orig_mod.attention.wk.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.15._orig_mod.attention.wv.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.15._orig_mod.attention.wo.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.15._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.15._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.15._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.15._orig_mod.attention_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.15._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.16._orig_mod.attention.wq.weight 
+2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.16._orig_mod.attention.wk.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.16._orig_mod.attention.wv.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.16._orig_mod.attention.wo.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.16._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.16._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.16._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.16._orig_mod.attention_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.16._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.17._orig_mod.attention.wq.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.17._orig_mod.attention.wk.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.17._orig_mod.attention.wv.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.17._orig_mod.attention.wo.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.17._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.17._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.17._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.17._orig_mod.attention_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Nodecay weight: layers.17._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.18._orig_mod.attention.wq.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.18._orig_mod.attention.wk.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.18._orig_mod.attention.wv.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: 
layers.18._orig_mod.attention.wo.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.18._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,080 - root - INFO - Decay weight: layers.18._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.18._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.18._orig_mod.attention_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.18._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.19._orig_mod.attention.wq.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.19._orig_mod.attention.wk.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.19._orig_mod.attention.wv.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.19._orig_mod.attention.wo.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.19._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.19._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.19._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.19._orig_mod.attention_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.19._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.20._orig_mod.attention.wq.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.20._orig_mod.attention.wk.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.20._orig_mod.attention.wv.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.20._orig_mod.attention.wo.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.20._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.20._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,081 - root 
- INFO - Decay weight: layers.20._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.20._orig_mod.attention_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.20._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.21._orig_mod.attention.wq.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.21._orig_mod.attention.wk.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.21._orig_mod.attention.wv.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.21._orig_mod.attention.wo.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.21._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.21._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.21._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.21._orig_mod.attention_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.21._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.22._orig_mod.attention.wq.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.22._orig_mod.attention.wk.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.22._orig_mod.attention.wv.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.22._orig_mod.attention.wo.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.22._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.22._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.22._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.22._orig_mod.attention_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.22._orig_mod.ffn_norm.weight 
+2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.23._orig_mod.attention.wq.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.23._orig_mod.attention.wk.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.23._orig_mod.attention.wv.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.23._orig_mod.attention.wo.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.23._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.23._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.23._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.23._orig_mod.attention_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.23._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.24._orig_mod.attention.wq.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.24._orig_mod.attention.wk.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.24._orig_mod.attention.wv.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.24._orig_mod.attention.wo.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.24._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.24._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.24._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.24._orig_mod.attention_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.24._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.25._orig_mod.attention.wq.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.25._orig_mod.attention.wk.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: 
layers.25._orig_mod.attention.wv.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.25._orig_mod.attention.wo.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.25._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.25._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.25._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.25._orig_mod.attention_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.25._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.26._orig_mod.attention.wq.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.26._orig_mod.attention.wk.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.26._orig_mod.attention.wv.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.26._orig_mod.attention.wo.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.26._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.26._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.26._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.26._orig_mod.attention_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.26._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.27._orig_mod.attention.wq.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.27._orig_mod.attention.wk.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.27._orig_mod.attention.wv.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.27._orig_mod.attention.wo.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.27._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,081 - root - 
INFO - Decay weight: layers.27._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.27._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.27._orig_mod.attention_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.27._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.28._orig_mod.attention.wq.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.28._orig_mod.attention.wk.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.28._orig_mod.attention.wv.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.28._orig_mod.attention.wo.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.28._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.28._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.28._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.28._orig_mod.attention_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.28._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.29._orig_mod.attention.wq.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.29._orig_mod.attention.wk.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.29._orig_mod.attention.wv.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.29._orig_mod.attention.wo.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.29._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.29._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,081 - root - INFO - Decay weight: layers.29._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,081 - root - INFO - Nodecay weight: layers.29._orig_mod.attention_norm.weight 
+2025-10-25 10:24:17,082 - root - INFO - Nodecay weight: layers.29._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,082 - root - INFO - Decay weight: layers.30._orig_mod.attention.wq.weight +2025-10-25 10:24:17,082 - root - INFO - Decay weight: layers.30._orig_mod.attention.wk.weight +2025-10-25 10:24:17,082 - root - INFO - Decay weight: layers.30._orig_mod.attention.wv.weight +2025-10-25 10:24:17,082 - root - INFO - Decay weight: layers.30._orig_mod.attention.wo.weight +2025-10-25 10:24:17,082 - root - INFO - Decay weight: layers.30._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,082 - root - INFO - Decay weight: layers.30._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,082 - root - INFO - Decay weight: layers.30._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,082 - root - INFO - Nodecay weight: layers.30._orig_mod.attention_norm.weight +2025-10-25 10:24:17,082 - root - INFO - Nodecay weight: layers.30._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,082 - root - INFO - Decay weight: layers.31._orig_mod.attention.wq.weight +2025-10-25 10:24:17,082 - root - INFO - Decay weight: layers.31._orig_mod.attention.wk.weight +2025-10-25 10:24:17,082 - root - INFO - Decay weight: layers.31._orig_mod.attention.wv.weight +2025-10-25 10:24:17,082 - root - INFO - Decay weight: layers.31._orig_mod.attention.wo.weight +2025-10-25 10:24:17,082 - root - INFO - Decay weight: layers.31._orig_mod.feed_forward.w1.weight +2025-10-25 10:24:17,082 - root - INFO - Decay weight: layers.31._orig_mod.feed_forward.w2.weight +2025-10-25 10:24:17,082 - root - INFO - Decay weight: layers.31._orig_mod.feed_forward.w3.weight +2025-10-25 10:24:17,082 - root - INFO - Nodecay weight: layers.31._orig_mod.attention_norm.weight +2025-10-25 10:24:17,082 - root - INFO - Nodecay weight: layers.31._orig_mod.ffn_norm.weight +2025-10-25 10:24:17,082 - root - INFO - Nodecay weight: norm.weight +2025-10-25 10:24:17,082 - root - INFO - Decay weight: output.weight +2025-10-25 10:24:17,643 - root - 
INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-pt/checkpoints +2025-10-25 10:24:17,650 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-pt/checkpoints +2025-10-25 10:24:17,650 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-pt/checkpoints +2025-10-25 10:24:17,677 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-pt/checkpoints +2025-10-25 10:24:17,678 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-pt/checkpoints +2025-10-25 10:24:17,680 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-pt/checkpoints +2025-10-25 10:24:17,682 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-pt/checkpoints +2025-10-25 10:24:17,687 - root - INFO - Checkpointing active. 
Checkpoints will be loaded from and saved to jobs/munin-7b-open-pt/checkpoints +2025-10-25 10:24:17,710 - root - INFO - Forcing load from /work/training/maester/jobs/munin-7b-open-stage1/checkpoints/step-37852/ +2025-10-25 10:24:17,710 - root - INFO - Forcing load from /work/training/maester/jobs/munin-7b-open-stage1/checkpoints/step-37852/ +2025-10-25 10:24:17,710 - root - INFO - Forcing load from /work/training/maester/jobs/munin-7b-open-stage1/checkpoints/step-37852/ +2025-10-25 10:24:17,710 - root - INFO - Forcing load from /work/training/maester/jobs/munin-7b-open-stage1/checkpoints/step-37852/ +2025-10-25 10:24:17,711 - root - INFO - Forcing load from /work/training/maester/jobs/munin-7b-open-stage1/checkpoints/step-37852/ +2025-10-25 10:24:17,711 - root - INFO - Forcing load from /work/training/maester/jobs/munin-7b-open-stage1/checkpoints/step-37852/ +2025-10-25 10:24:17,711 - root - INFO - Forcing load from /work/training/maester/jobs/munin-7b-open-stage1/checkpoints/step-37852/ +2025-10-25 10:24:17,711 - root - INFO - Forcing load from /work/training/maester/jobs/munin-7b-open-stage1/checkpoints/step-37852/ +2025-10-25 10:24:23,456 - root - INFO - Loaded model-only checkpoint from forced path in 5.75 seconds +2025-10-25 10:24:23,456 - root - INFO - Loaded model-only checkpoint from forced path in 5.75 seconds +2025-10-25 10:24:23,457 - root - INFO - Loaded model-only checkpoint from forced path in 5.75 seconds +2025-10-25 10:24:23,457 - root - INFO - Loaded model-only checkpoint from forced path in 5.75 seconds +2025-10-25 10:24:23,457 - root - INFO - Loaded model-only checkpoint from forced path in 5.75 seconds +2025-10-25 10:24:23,457 - root - INFO - Loaded model-only checkpoint from forced path in 5.75 seconds +2025-10-25 10:24:23,457 - root - INFO - Loaded model-only checkpoint from forced path in 5.75 seconds +2025-10-25 10:24:23,457 - root - INFO - Loaded model-only checkpoint from forced path in 5.75 seconds +2025-10-25 10:24:23,479 - root - INFO - 
Training starts at step 0 +2025-10-25 10:24:23,479 - root - INFO - Training starts at step 0 +2025-10-25 10:24:23,480 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-pt/traces +2025-10-25 10:24:23,480 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-pt/traces +2025-10-25 10:24:23,480 - root - INFO - Training starts at step 0 +2025-10-25 10:24:23,480 - root - INFO - Training starts at step 0 +2025-10-25 10:24:23,480 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-pt/traces +2025-10-25 10:24:23,480 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-pt/traces +2025-10-25 10:24:23,480 - root - INFO - Training starts at step 0 +2025-10-25 10:24:23,481 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-pt/traces +2025-10-25 10:24:23,481 - root - INFO - Training starts at step 0 +2025-10-25 10:24:23,481 - root - INFO - Training starts at step 0 +2025-10-25 10:24:23,481 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-pt/traces +2025-10-25 10:24:23,481 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-pt/traces +2025-10-25 10:24:23,482 - root - INFO - Training starts at step 0 +2025-10-25 10:24:23,482 - root - INFO - Profiling active. 
Traces will be saved at jobs/munin-7b-open-pt/traces +2025-10-25 10:24:23,482 - root - INFO - ParquetDataset: entering epoch 0 +2025-10-25 10:24:23,482 - root - INFO - Worker 0 opening new file /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet +2025-10-25 10:24:42,205 - root - INFO - Step 1: lr=4.00E-08, loss= 1.1054 (max= 1.7935), tps=3501, mfu=7.29%, memory: 150.54GiB(84.40%) time/data_loading=2.00s (max=2.64s, 28.25%) +2025-10-25 10:24:42,205 - root - INFO - Step 1: lr=4.00E-08, loss= 1.1054 (max= 1.7935), tps=3501, mfu=7.29%, memory: 150.54GiB(84.40%) time/data_loading=2.00s (max=2.64s, 28.25%) +2025-10-25 10:24:42,205 - root - INFO - Step 1: lr=4.00E-08, loss= 1.1054 (max= 1.7935), tps=3501, mfu=7.29%, memory: 150.54GiB(84.40%) time/data_loading=2.00s (max=2.64s, 28.25%) +2025-10-25 10:24:42,205 - root - INFO - Step 1: lr=4.00E-08, loss= 1.1054 (max= 1.7935), tps=3501, mfu=7.29%, memory: 150.54GiB(84.40%) time/data_loading=2.00s (max=2.64s, 28.25%) +2025-10-25 10:24:42,205 - root - INFO - Step 1: lr=4.00E-08, loss= 1.1054 (max= 1.7935), tps=3501, mfu=7.29%, memory: 150.54GiB(84.40%) time/data_loading=2.00s (max=2.64s, 28.25%) +2025-10-25 10:24:42,205 - root - INFO - Step 1: lr=4.00E-08, loss= 1.1054 (max= 1.7935), tps=3501, mfu=7.29%, memory: 150.54GiB(84.40%) time/data_loading=2.00s (max=2.64s, 28.25%) +2025-10-25 10:24:42,205 - root - INFO - Step 1: lr=4.00E-08, loss= 1.1054 (max= 1.7935), tps=3501, mfu=7.30%, memory: 150.54GiB(84.40%) time/data_loading=2.00s (max=2.64s, 28.25%) +2025-10-25 10:24:42,205 - root - INFO - Step 1: lr=4.00E-08, loss= 1.1054 (max= 1.7935), tps=3501, mfu=7.30%, memory: 150.54GiB(84.40%) time/data_loading=2.00s (max=2.64s, 28.25%) +2025-10-25 10:24:42,205 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40 +2025-10-25 10:24:42,205 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40 +2025-10-25 10:24:42,205 - root 
- INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40 +2025-10-25 10:24:42,205 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40 +2025-10-25 10:24:42,205 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40 +2025-10-25 10:24:42,206 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40 +2025-10-25 10:24:42,206 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40 +2025-10-25 10:24:42,206 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40 +2025-10-25 10:25:11,016 - root - INFO - Step 10: lr=2.20E-07, loss= 1.2078 (max= 1.9146), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:25:11,016 - root - INFO - Step 10: lr=2.20E-07, loss= 1.2078 (max= 1.9146), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:25:11,016 - root - INFO - Step 10: lr=2.20E-07, loss= 1.2078 (max= 1.9146), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:25:11,016 - root - INFO - Step 10: lr=2.20E-07, loss= 1.2078 (max= 1.9146), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:25:11,016 - root - INFO - Step 10: lr=2.20E-07, loss= 1.2078 (max= 1.9146), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:25:11,016 - root - INFO - Step 10: lr=2.20E-07, loss= 1.2078 (max= 1.9146), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:25:11,017 - root - INFO - Step 10: lr=2.20E-07, loss= 1.2078 (max= 1.9146), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:25:11,017 - root - INFO - Step 10: lr=2.20E-07, loss= 1.2078 (max= 1.9146), 
tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:25:11,193 - root - INFO - Dumping traces at step 10 +2025-10-25 10:25:11,201 - root - INFO - Dumping traces at step 10 +2025-10-25 10:25:11,202 - root - INFO - Dumping traces at step 10 +2025-10-25 10:25:11,209 - root - INFO - Dumping traces at step 10 +2025-10-25 10:25:11,213 - root - INFO - Dumping traces at step 10 +2025-10-25 10:25:11,216 - root - INFO - Dumping traces at step 10 +2025-10-25 10:25:11,217 - root - INFO - Dumping traces at step 10 +2025-10-25 10:25:11,218 - root - INFO - Dumping traces at step 10 +2025-10-25 10:25:11,284 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-25 10:25:11,285 - root - INFO - Finished dumping traces in 0.08 seconds +2025-10-25 10:25:11,285 - root - INFO - Finished dumping traces in 0.08 seconds +2025-10-25 10:25:11,293 - root - INFO - Finished dumping traces in 0.08 seconds +2025-10-25 10:25:11,299 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-25 10:25:11,300 - root - INFO - Finished dumping traces in 0.08 seconds +2025-10-25 10:25:11,301 - root - INFO - Finished dumping traces in 0.08 seconds +2025-10-25 10:25:11,303 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-25 10:25:43,224 - root - INFO - Step 20: lr=4.20E-07, loss= 1.1831 (max= 1.8162), tps=20350, mfu=42.40%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:25:43,225 - root - INFO - Step 20: lr=4.20E-07, loss= 1.1831 (max= 1.8162), tps=20349, mfu=42.40%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:25:43,225 - root - INFO - Step 20: lr=4.20E-07, loss= 1.1831 (max= 1.8162), tps=20349, mfu=42.40%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:25:43,225 - root - INFO - Step 20: lr=4.20E-07, loss= 1.1831 (max= 1.8162), tps=20349, mfu=42.40%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 10:25:43,225 - root - INFO - Step 20: lr=4.20E-07, loss= 1.1831 (max= 1.8162), tps=20350, mfu=42.40%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:25:43,225 - root - INFO - Step 20: lr=4.20E-07, loss= 1.1831 (max= 1.8162), tps=20349, mfu=42.40%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:25:43,225 - root - INFO - Step 20: lr=4.20E-07, loss= 1.1831 (max= 1.8162), tps=20350, mfu=42.40%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:25:43,225 - root - INFO - Step 20: lr=4.20E-07, loss= 1.1831 (max= 1.8162), tps=20350, mfu=42.40%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:25:43,320 - root - INFO - Dumping traces at step 20 +2025-10-25 10:25:43,320 - root - INFO - Dumping traces at step 20 +2025-10-25 10:25:43,324 - root - INFO - Dumping traces at step 20 +2025-10-25 10:25:43,328 - root - INFO - Dumping traces at step 20 +2025-10-25 10:25:43,329 - root - INFO - Dumping traces at step 20 +2025-10-25 10:25:43,329 - root - INFO - Dumping traces at step 20 +2025-10-25 10:25:43,333 - root - INFO - Dumping traces at step 20 +2025-10-25 10:25:43,335 - root - INFO - Dumping traces at step 20 +2025-10-25 10:25:43,406 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-25 10:25:43,410 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-25 10:25:43,410 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-25 10:25:43,415 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-25 10:25:43,418 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-25 10:25:43,422 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-25 10:25:43,422 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-25 10:25:43,422 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-25 10:26:15,346 - root - INFO - Step 30: lr=6.20E-07, 
loss= 1.1546 (max= 1.8563), tps=20405, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:26:15,346 - root - INFO - Step 30: lr=6.20E-07, loss= 1.1546 (max= 1.8563), tps=20405, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:26:15,346 - root - INFO - Step 30: lr=6.20E-07, loss= 1.1546 (max= 1.8563), tps=20405, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:26:15,346 - root - INFO - Step 30: lr=6.20E-07, loss= 1.1546 (max= 1.8563), tps=20405, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:26:15,346 - root - INFO - Step 30: lr=6.20E-07, loss= 1.1546 (max= 1.8563), tps=20405, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:26:15,346 - root - INFO - Step 30: lr=6.20E-07, loss= 1.1546 (max= 1.8563), tps=20405, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:26:15,346 - root - INFO - Step 30: lr=6.20E-07, loss= 1.1546 (max= 1.8563), tps=20405, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:26:15,346 - root - INFO - Step 30: lr=6.20E-07, loss= 1.1546 (max= 1.8563), tps=20405, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:26:15,447 - root - INFO - Dumping traces at step 30 +2025-10-25 10:26:15,447 - root - INFO - Dumping traces at step 30 +2025-10-25 10:26:15,448 - root - INFO - Dumping traces at step 30 +2025-10-25 10:26:15,451 - root - INFO - Dumping traces at step 30 +2025-10-25 10:26:15,451 - root - INFO - Dumping traces at step 30 +2025-10-25 10:26:15,451 - root - INFO - Dumping traces at step 30 +2025-10-25 10:26:15,457 - root - INFO - Dumping traces at step 30 +2025-10-25 10:26:15,464 - root - INFO - Dumping traces at step 30 +2025-10-25 10:26:15,534 - root - INFO - Finished dumping 
traces in 0.09 seconds +2025-10-25 10:26:15,536 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-25 10:26:15,540 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-25 10:26:15,545 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-25 10:26:15,546 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-25 10:26:15,546 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-25 10:26:15,548 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-25 10:26:15,553 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-25 10:26:47,314 - root - INFO - Step 40: lr=8.20E-07, loss= 1.1453 (max= 1.8250), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:26:47,314 - root - INFO - Step 40: lr=8.20E-07, loss= 1.1453 (max= 1.8250), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:26:47,315 - root - INFO - Step 40: lr=8.20E-07, loss= 1.1453 (max= 1.8250), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:26:47,315 - root - INFO - Step 40: lr=8.20E-07, loss= 1.1453 (max= 1.8250), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:26:47,315 - root - INFO - Step 40: lr=8.20E-07, loss= 1.1453 (max= 1.8250), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:26:47,315 - root - INFO - Step 40: lr=8.20E-07, loss= 1.1453 (max= 1.8250), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:26:47,315 - root - INFO - Step 40: lr=8.20E-07, loss= 1.1453 (max= 1.8250), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:26:47,315 - root - INFO - Step 40: lr=8.20E-07, loss= 1.1453 (max= 1.8250), tps=20503, mfu=42.72%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:27:19,185 - root - INFO - Step 50: lr=1.02E-06, loss= 1.1616 (max= 1.6243), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:27:19,185 - root - INFO - Step 50: lr=1.02E-06, loss= 1.1616 (max= 1.6243), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:27:19,185 - root - INFO - Step 50: lr=1.02E-06, loss= 1.1616 (max= 1.6243), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:27:19,185 - root - INFO - Step 50: lr=1.02E-06, loss= 1.1616 (max= 1.6243), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:27:19,185 - root - INFO - Step 50: lr=1.02E-06, loss= 1.1616 (max= 1.6243), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:27:19,186 - root - INFO - Step 50: lr=1.02E-06, loss= 1.1616 (max= 1.6243), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:27:19,186 - root - INFO - Step 50: lr=1.02E-06, loss= 1.1616 (max= 1.6243), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:27:19,186 - root - INFO - Step 50: lr=1.02E-06, loss= 1.1616 (max= 1.6243), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:27:51,058 - root - INFO - Step 60: lr=1.22E-06, loss= 1.1443 (max= 1.5698), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:27:51,058 - root - INFO - Step 60: lr=1.22E-06, loss= 1.1443 (max= 1.5698), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:27:51,058 - root - INFO - Step 60: lr=1.22E-06, loss= 1.1443 (max= 1.5698), tps=20565, 
mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:27:51,059 - root - INFO - Step 60: lr=1.22E-06, loss= 1.1443 (max= 1.5698), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:27:51,059 - root - INFO - Step 60: lr=1.22E-06, loss= 1.1443 (max= 1.5698), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:27:51,059 - root - INFO - Step 60: lr=1.22E-06, loss= 1.1443 (max= 1.5698), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:27:51,059 - root - INFO - Step 60: lr=1.22E-06, loss= 1.1443 (max= 1.5698), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:27:51,059 - root - INFO - Step 60: lr=1.22E-06, loss= 1.1443 (max= 1.5698), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:28:22,930 - root - INFO - Step 70: lr=1.42E-06, loss= 1.1656 (max= 1.5584), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:28:22,930 - root - INFO - Step 70: lr=1.42E-06, loss= 1.1656 (max= 1.5584), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:28:22,930 - root - INFO - Step 70: lr=1.42E-06, loss= 1.1656 (max= 1.5584), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:28:22,930 - root - INFO - Step 70: lr=1.42E-06, loss= 1.1656 (max= 1.5584), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:28:22,930 - root - INFO - Step 70: lr=1.42E-06, loss= 1.1656 (max= 1.5584), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:28:22,930 - root - INFO - Step 70: lr=1.42E-06, loss= 1.1656 (max= 
1.5584), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:28:22,930 - root - INFO - Step 70: lr=1.42E-06, loss= 1.1656 (max= 1.5584), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:28:22,930 - root - INFO - Step 70: lr=1.42E-06, loss= 1.1656 (max= 1.5584), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:28:54,726 - root - INFO - Step 80: lr=1.62E-06, loss= 1.1321 (max= 1.5080), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:28:54,726 - root - INFO - Step 80: lr=1.62E-06, loss= 1.1321 (max= 1.5080), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:28:54,726 - root - INFO - Step 80: lr=1.62E-06, loss= 1.1321 (max= 1.5080), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:28:54,726 - root - INFO - Step 80: lr=1.62E-06, loss= 1.1321 (max= 1.5080), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:28:54,726 - root - INFO - Step 80: lr=1.62E-06, loss= 1.1321 (max= 1.5080), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:28:54,726 - root - INFO - Step 80: lr=1.62E-06, loss= 1.1321 (max= 1.5080), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:28:54,726 - root - INFO - Step 80: lr=1.62E-06, loss= 1.1321 (max= 1.5080), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:28:54,726 - root - INFO - Step 80: lr=1.62E-06, loss= 1.1321 (max= 1.5080), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:29:26,627 - root - INFO - Step 90: lr=1.82E-06, loss= 
1.1582 (max= 1.6911), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:29:26,627 - root - INFO - Step 90: lr=1.82E-06, loss= 1.1582 (max= 1.6911), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:29:26,627 - root - INFO - Step 90: lr=1.82E-06, loss= 1.1582 (max= 1.6911), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:29:26,627 - root - INFO - Step 90: lr=1.82E-06, loss= 1.1582 (max= 1.6911), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:29:26,627 - root - INFO - Step 90: lr=1.82E-06, loss= 1.1582 (max= 1.6911), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:29:26,627 - root - INFO - Step 90: lr=1.82E-06, loss= 1.1582 (max= 1.6911), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:29:26,628 - root - INFO - Step 90: lr=1.82E-06, loss= 1.1582 (max= 1.6911), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:29:26,628 - root - INFO - Step 90: lr=1.82E-06, loss= 1.1582 (max= 1.6911), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:29:58,406 - root - INFO - Step 100: lr=2.02E-06, loss= 1.1164 (max= 1.5130), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:29:58,406 - root - INFO - Step 100: lr=2.02E-06, loss= 1.1164 (max= 1.5130), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:29:58,406 - root - INFO - Step 100: lr=2.02E-06, loss= 1.1164 (max= 1.5130), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:29:58,406 - root - INFO - Step 100: 
lr=2.02E-06, loss= 1.1164 (max= 1.5130), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:29:58,406 - root - INFO - Step 100: lr=2.02E-06, loss= 1.1164 (max= 1.5130), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:29:58,406 - root - INFO - Step 100: lr=2.02E-06, loss= 1.1164 (max= 1.5130), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:29:58,407 - root - INFO - Step 100: lr=2.02E-06, loss= 1.1164 (max= 1.5130), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:29:58,407 - root - INFO - Step 100: lr=2.02E-06, loss= 1.1164 (max= 1.5130), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:30:30,235 - root - INFO - Step 110: lr=2.22E-06, loss= 1.1537 (max= 1.5741), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:30:30,235 - root - INFO - Step 110: lr=2.22E-06, loss= 1.1537 (max= 1.5741), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:30:30,235 - root - INFO - Step 110: lr=2.22E-06, loss= 1.1537 (max= 1.5741), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:30:30,235 - root - INFO - Step 110: lr=2.22E-06, loss= 1.1537 (max= 1.5741), tps=20593, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:30:30,235 - root - INFO - Step 110: lr=2.22E-06, loss= 1.1537 (max= 1.5741), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:30:30,235 - root - INFO - Step 110: lr=2.22E-06, loss= 1.1537 (max= 1.5741), tps=20593, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:30:30,235 - 
root - INFO - Step 110: lr=2.22E-06, loss= 1.1537 (max= 1.5741), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:30:30,236 - root - INFO - Step 110: lr=2.22E-06, loss= 1.1537 (max= 1.5741), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:31:02,039 - root - INFO - Step 120: lr=2.42E-06, loss= 1.1382 (max= 1.5258), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:31:02,039 - root - INFO - Step 120: lr=2.42E-06, loss= 1.1382 (max= 1.5258), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:31:02,039 - root - INFO - Step 120: lr=2.42E-06, loss= 1.1382 (max= 1.5258), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:31:02,039 - root - INFO - Step 120: lr=2.42E-06, loss= 1.1382 (max= 1.5258), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:31:02,039 - root - INFO - Step 120: lr=2.42E-06, loss= 1.1382 (max= 1.5258), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:31:02,039 - root - INFO - Step 120: lr=2.42E-06, loss= 1.1382 (max= 1.5258), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:31:02,040 - root - INFO - Step 120: lr=2.42E-06, loss= 1.1382 (max= 1.5258), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:31:02,040 - root - INFO - Step 120: lr=2.42E-06, loss= 1.1382 (max= 1.5258), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:31:17,134 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:2732732 
+2025-10-25 10:31:33,923 - root - INFO - Step 130: lr=2.62E-06, loss= 1.1566 (max= 1.6971), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:31:33,923 - root - INFO - Step 130: lr=2.62E-06, loss= 1.1566 (max= 1.6971), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:31:33,923 - root - INFO - Step 130: lr=2.62E-06, loss= 1.1566 (max= 1.6971), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:31:33,923 - root - INFO - Step 130: lr=2.62E-06, loss= 1.1566 (max= 1.6971), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:31:33,923 - root - INFO - Step 130: lr=2.62E-06, loss= 1.1566 (max= 1.6971), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:31:33,923 - root - INFO - Step 130: lr=2.62E-06, loss= 1.1566 (max= 1.6971), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:31:33,923 - root - INFO - Step 130: lr=2.62E-06, loss= 1.1566 (max= 1.6971), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:31:33,923 - root - INFO - Step 130: lr=2.62E-06, loss= 1.1566 (max= 1.6971), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:32:05,728 - root - INFO - Step 140: lr=2.82E-06, loss= 1.1331 (max= 1.6704), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:32:05,728 - root - INFO - Step 140: lr=2.82E-06, loss= 1.1331 (max= 1.6704), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:32:05,728 - root - INFO - Step 140: lr=2.82E-06, loss= 1.1331 (max= 1.6704), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:32:05,728 - root - INFO - Step 140: lr=2.82E-06, loss= 1.1331 (max= 1.6704), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:32:05,728 - root - INFO - Step 140: lr=2.82E-06, loss= 1.1331 (max= 1.6704), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:32:05,729 - root - INFO - Step 140: lr=2.82E-06, loss= 1.1331 (max= 1.6704), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:32:05,729 - root - INFO - Step 140: lr=2.82E-06, loss= 1.1331 (max= 1.6704), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:32:05,729 - root - INFO - Step 140: lr=2.82E-06, loss= 1.1331 (max= 1.6704), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:32:37,573 - root - INFO - Step 150: lr=3.02E-06, loss= 1.1521 (max= 1.5948), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:32:37,574 - root - INFO - Step 150: lr=3.02E-06, loss= 1.1521 (max= 1.5948), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:32:37,574 - root - INFO - Step 150: lr=3.02E-06, loss= 1.1521 (max= 1.5948), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:32:37,574 - root - INFO - Step 150: lr=3.02E-06, loss= 1.1521 (max= 1.5948), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:32:37,574 - root - INFO - Step 150: lr=3.02E-06, loss= 1.1521 (max= 1.5948), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:32:37,574 - root - INFO - Step 150: lr=3.02E-06, loss= 1.1521 (max= 1.5948), tps=20582, mfu=42.88%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:32:37,574 - root - INFO - Step 150: lr=3.02E-06, loss= 1.1521 (max= 1.5948), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:32:37,574 - root - INFO - Step 150: lr=3.02E-06, loss= 1.1521 (max= 1.5948), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:33:09,519 - root - INFO - Step 160: lr=3.22E-06, loss= 1.1248 (max= 1.4345), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:33:09,519 - root - INFO - Step 160: lr=3.22E-06, loss= 1.1248 (max= 1.4345), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:33:09,519 - root - INFO - Step 160: lr=3.22E-06, loss= 1.1248 (max= 1.4345), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:33:09,519 - root - INFO - Step 160: lr=3.22E-06, loss= 1.1248 (max= 1.4345), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:33:09,519 - root - INFO - Step 160: lr=3.22E-06, loss= 1.1248 (max= 1.4345), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:33:09,519 - root - INFO - Step 160: lr=3.22E-06, loss= 1.1248 (max= 1.4345), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:33:09,519 - root - INFO - Step 160: lr=3.22E-06, loss= 1.1248 (max= 1.4345), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:33:09,519 - root - INFO - Step 160: lr=3.22E-06, loss= 1.1248 (max= 1.4345), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:33:41,474 - root - INFO - Step 170: lr=3.42E-06, loss= 1.1135 (max= 
1.6742), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:33:41,474 - root - INFO - Step 170: lr=3.42E-06, loss= 1.1135 (max= 1.6742), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:33:41,474 - root - INFO - Step 170: lr=3.42E-06, loss= 1.1135 (max= 1.6742), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:33:41,474 - root - INFO - Step 170: lr=3.42E-06, loss= 1.1135 (max= 1.6742), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:33:41,474 - root - INFO - Step 170: lr=3.42E-06, loss= 1.1135 (max= 1.6742), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:33:41,474 - root - INFO - Step 170: lr=3.42E-06, loss= 1.1135 (max= 1.6742), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:33:41,475 - root - INFO - Step 170: lr=3.42E-06, loss= 1.1135 (max= 1.6742), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:33:41,475 - root - INFO - Step 170: lr=3.42E-06, loss= 1.1135 (max= 1.6742), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:34:13,362 - root - INFO - Step 180: lr=3.62E-06, loss= 1.1148 (max= 1.5106), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:34:13,362 - root - INFO - Step 180: lr=3.62E-06, loss= 1.1148 (max= 1.5106), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:34:13,362 - root - INFO - Step 180: lr=3.62E-06, loss= 1.1148 (max= 1.5106), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:34:13,362 - root - INFO - Step 180: 
lr=3.62E-06, loss= 1.1148 (max= 1.5106), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:34:13,362 - root - INFO - Step 180: lr=3.62E-06, loss= 1.1148 (max= 1.5106), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:34:13,363 - root - INFO - Step 180: lr=3.62E-06, loss= 1.1148 (max= 1.5106), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:34:13,363 - root - INFO - Step 180: lr=3.62E-06, loss= 1.1148 (max= 1.5106), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:34:13,363 - root - INFO - Step 180: lr=3.62E-06, loss= 1.1148 (max= 1.5106), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:34:45,263 - root - INFO - Step 190: lr=3.82E-06, loss= 1.1366 (max= 1.6542), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:34:45,263 - root - INFO - Step 190: lr=3.82E-06, loss= 1.1366 (max= 1.6542), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:34:45,263 - root - INFO - Step 190: lr=3.82E-06, loss= 1.1366 (max= 1.6542), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:34:45,263 - root - INFO - Step 190: lr=3.82E-06, loss= 1.1366 (max= 1.6542), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:34:45,263 - root - INFO - Step 190: lr=3.82E-06, loss= 1.1366 (max= 1.6542), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:34:45,263 - root - INFO - Step 190: lr=3.82E-06, loss= 1.1366 (max= 1.6542), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:34:45,264 - 
root - INFO - Step 190: lr=3.82E-06, loss= 1.1366 (max= 1.6542), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:34:45,264 - root - INFO - Step 190: lr=3.82E-06, loss= 1.1366 (max= 1.6542), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:35:17,176 - root - INFO - Step 200: lr=4.02E-06, loss= 1.1565 (max= 1.5702), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:35:17,176 - root - INFO - Step 200: lr=4.02E-06, loss= 1.1565 (max= 1.5702), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:35:17,176 - root - INFO - Step 200: lr=4.02E-06, loss= 1.1565 (max= 1.5702), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:35:17,176 - root - INFO - Step 200: lr=4.02E-06, loss= 1.1565 (max= 1.5702), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:35:17,176 - root - INFO - Step 200: lr=4.02E-06, loss= 1.1565 (max= 1.5702), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:35:17,176 - root - INFO - Step 200: lr=4.02E-06, loss= 1.1565 (max= 1.5702), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:35:17,177 - root - INFO - Step 200: lr=4.02E-06, loss= 1.1565 (max= 1.5702), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:35:17,177 - root - INFO - Step 200: lr=4.02E-06, loss= 1.1565 (max= 1.5702), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:35:49,032 - root - INFO - Step 210: lr=4.22E-06, loss= 1.1641 (max= 1.5379), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 10:35:49,032 - root - INFO - Step 210: lr=4.22E-06, loss= 1.1641 (max= 1.5379), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:35:49,033 - root - INFO - Step 210: lr=4.22E-06, loss= 1.1641 (max= 1.5379), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:35:49,033 - root - INFO - Step 210: lr=4.22E-06, loss= 1.1641 (max= 1.5379), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:35:49,033 - root - INFO - Step 210: lr=4.22E-06, loss= 1.1641 (max= 1.5379), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:35:49,033 - root - INFO - Step 210: lr=4.22E-06, loss= 1.1641 (max= 1.5379), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:35:49,033 - root - INFO - Step 210: lr=4.22E-06, loss= 1.1641 (max= 1.5379), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:35:49,033 - root - INFO - Step 210: lr=4.22E-06, loss= 1.1641 (max= 1.5379), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:36:20,878 - root - INFO - Step 220: lr=4.42E-06, loss= 1.1494 (max= 1.6935), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:36:20,878 - root - INFO - Step 220: lr=4.42E-06, loss= 1.1494 (max= 1.6935), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:36:20,878 - root - INFO - Step 220: lr=4.42E-06, loss= 1.1494 (max= 1.6935), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:36:20,878 - root - INFO - Step 220: lr=4.42E-06, loss= 1.1494 (max= 1.6935), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:36:20,878 - root - INFO - Step 220: lr=4.42E-06, loss= 1.1494 (max= 1.6935), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:36:20,878 - root - INFO - Step 220: lr=4.42E-06, loss= 1.1494 (max= 1.6935), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:36:20,878 - root - INFO - Step 220: lr=4.42E-06, loss= 1.1494 (max= 1.6935), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:36:20,878 - root - INFO - Step 220: lr=4.42E-06, loss= 1.1494 (max= 1.6935), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:36:35,947 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:1280134 +2025-10-25 10:36:52,738 - root - INFO - Step 230: lr=4.62E-06, loss= 1.1689 (max= 1.6342), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:36:52,738 - root - INFO - Step 230: lr=4.62E-06, loss= 1.1689 (max= 1.6342), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:36:52,738 - root - INFO - Step 230: lr=4.62E-06, loss= 1.1689 (max= 1.6342), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:36:52,738 - root - INFO - Step 230: lr=4.62E-06, loss= 1.1689 (max= 1.6342), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:36:52,738 - root - INFO - Step 230: lr=4.62E-06, loss= 1.1689 (max= 1.6342), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:36:52,739 - root - INFO - Step 230: lr=4.62E-06, loss= 1.1689 (max= 1.6342), tps=20572, mfu=42.86%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:36:52,739 - root - INFO - Step 230: lr=4.62E-06, loss= 1.1689 (max= 1.6342), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:36:52,739 - root - INFO - Step 230: lr=4.62E-06, loss= 1.1689 (max= 1.6342), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:37:24,577 - root - INFO - Step 240: lr=4.82E-06, loss= 1.1670 (max= 1.5446), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:37:24,577 - root - INFO - Step 240: lr=4.82E-06, loss= 1.1670 (max= 1.5446), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:37:24,577 - root - INFO - Step 240: lr=4.82E-06, loss= 1.1670 (max= 1.5446), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:37:24,577 - root - INFO - Step 240: lr=4.82E-06, loss= 1.1670 (max= 1.5446), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:37:24,577 - root - INFO - Step 240: lr=4.82E-06, loss= 1.1670 (max= 1.5446), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:37:24,577 - root - INFO - Step 240: lr=4.82E-06, loss= 1.1670 (max= 1.5446), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:37:24,577 - root - INFO - Step 240: lr=4.82E-06, loss= 1.1670 (max= 1.5446), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:37:24,578 - root - INFO - Step 240: lr=4.82E-06, loss= 1.1670 (max= 1.5446), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:37:56,429 - root - INFO - Step 250: lr=5.02E-06, loss= 1.1747 (max= 
1.6533), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:37:56,429 - root - INFO - Step 250: lr=5.02E-06, loss= 1.1747 (max= 1.6533), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:37:56,429 - root - INFO - Step 250: lr=5.02E-06, loss= 1.1747 (max= 1.6533), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:37:56,429 - root - INFO - Step 250: lr=5.02E-06, loss= 1.1747 (max= 1.6533), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:37:56,429 - root - INFO - Step 250: lr=5.02E-06, loss= 1.1747 (max= 1.6533), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:37:56,430 - root - INFO - Step 250: lr=5.02E-06, loss= 1.1747 (max= 1.6533), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:37:56,430 - root - INFO - Step 250: lr=5.02E-06, loss= 1.1747 (max= 1.6533), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:37:56,430 - root - INFO - Step 250: lr=5.02E-06, loss= 1.1747 (max= 1.6533), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:38:28,338 - root - INFO - Step 260: lr=5.22E-06, loss= 1.1659 (max= 1.6730), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:38:28,338 - root - INFO - Step 260: lr=5.22E-06, loss= 1.1659 (max= 1.6730), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:38:28,338 - root - INFO - Step 260: lr=5.22E-06, loss= 1.1659 (max= 1.6730), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:38:28,338 - root - INFO - Step 260: 
lr=5.22E-06, loss= 1.1659 (max= 1.6730), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:38:28,338 - root - INFO - Step 260: lr=5.22E-06, loss= 1.1659 (max= 1.6730), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:38:28,338 - root - INFO - Step 260: lr=5.22E-06, loss= 1.1659 (max= 1.6730), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:38:28,338 - root - INFO - Step 260: lr=5.22E-06, loss= 1.1659 (max= 1.6730), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:38:28,338 - root - INFO - Step 260: lr=5.22E-06, loss= 1.1659 (max= 1.6730), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:39:00,177 - root - INFO - Step 270: lr=5.42E-06, loss= 1.1667 (max= 1.7087), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:39:00,177 - root - INFO - Step 270: lr=5.42E-06, loss= 1.1667 (max= 1.7087), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:39:00,177 - root - INFO - Step 270: lr=5.42E-06, loss= 1.1667 (max= 1.7087), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:39:00,177 - root - INFO - Step 270: lr=5.42E-06, loss= 1.1667 (max= 1.7087), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:39:00,177 - root - INFO - Step 270: lr=5.42E-06, loss= 1.1667 (max= 1.7087), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:39:00,177 - root - INFO - Step 270: lr=5.42E-06, loss= 1.1667 (max= 1.7087), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:39:00,178 - 
root - INFO - Step 270: lr=5.42E-06, loss= 1.1667 (max= 1.7087), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:39:00,178 - root - INFO - Step 270: lr=5.42E-06, loss= 1.1667 (max= 1.7087), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:39:32,004 - root - INFO - Step 280: lr=5.62E-06, loss= 1.1454 (max= 1.7364), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:39:32,004 - root - INFO - Step 280: lr=5.62E-06, loss= 1.1454 (max= 1.7364), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:39:32,004 - root - INFO - Step 280: lr=5.62E-06, loss= 1.1454 (max= 1.7364), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:39:32,004 - root - INFO - Step 280: lr=5.62E-06, loss= 1.1454 (max= 1.7364), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:39:32,004 - root - INFO - Step 280: lr=5.62E-06, loss= 1.1454 (max= 1.7364), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:39:32,004 - root - INFO - Step 280: lr=5.62E-06, loss= 1.1454 (max= 1.7364), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:39:32,004 - root - INFO - Step 280: lr=5.62E-06, loss= 1.1454 (max= 1.7364), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:39:32,004 - root - INFO - Step 280: lr=5.62E-06, loss= 1.1454 (max= 1.7364), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:40:03,854 - root - INFO - Step 290: lr=5.82E-06, loss= 1.1421 (max= 1.5813), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 10:40:03,854 - root - INFO - Step 290: lr=5.82E-06, loss= 1.1421 (max= 1.5813), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:40:03,854 - root - INFO - Step 290: lr=5.82E-06, loss= 1.1421 (max= 1.5813), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:40:03,854 - root - INFO - Step 290: lr=5.82E-06, loss= 1.1421 (max= 1.5813), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:40:03,854 - root - INFO - Step 290: lr=5.82E-06, loss= 1.1421 (max= 1.5813), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:40:03,854 - root - INFO - Step 290: lr=5.82E-06, loss= 1.1421 (max= 1.5813), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:40:03,854 - root - INFO - Step 290: lr=5.82E-06, loss= 1.1421 (max= 1.5813), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:40:03,854 - root - INFO - Step 290: lr=5.82E-06, loss= 1.1421 (max= 1.5813), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:40:35,735 - root - INFO - Step 300: lr=6.02E-06, loss= 1.1542 (max= 1.6011), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:40:35,735 - root - INFO - Step 300: lr=6.02E-06, loss= 1.1542 (max= 1.6011), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:40:35,735 - root - INFO - Step 300: lr=6.02E-06, loss= 1.1542 (max= 1.6011), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:40:35,735 - root - INFO - Step 300: lr=6.02E-06, loss= 1.1542 (max= 1.6011), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:40:35,735 - root - INFO - Step 300: lr=6.02E-06, loss= 1.1542 (max= 1.6011), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:40:35,735 - root - INFO - Step 300: lr=6.02E-06, loss= 1.1542 (max= 1.6011), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:40:35,735 - root - INFO - Step 300: lr=6.02E-06, loss= 1.1542 (max= 1.6011), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:40:35,736 - root - INFO - Step 300: lr=6.02E-06, loss= 1.1542 (max= 1.6011), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:41:07,705 - root - INFO - Step 310: lr=6.22E-06, loss= 1.1439 (max= 1.7052), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:41:07,705 - root - INFO - Step 310: lr=6.22E-06, loss= 1.1439 (max= 1.7052), tps=20501, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:41:07,705 - root - INFO - Step 310: lr=6.22E-06, loss= 1.1439 (max= 1.7052), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:41:07,705 - root - INFO - Step 310: lr=6.22E-06, loss= 1.1439 (max= 1.7052), tps=20501, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:41:07,706 - root - INFO - Step 310: lr=6.22E-06, loss= 1.1439 (max= 1.7052), tps=20501, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:41:07,706 - root - INFO - Step 310: lr=6.22E-06, loss= 1.1439 (max= 1.7052), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:41:07,706 - root - INFO - Step 310: lr=6.22E-06, loss= 1.1439 (max= 1.7052), tps=20501, mfu=42.71%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:41:07,706 - root - INFO - Step 310: lr=6.22E-06, loss= 1.1439 (max= 1.7052), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:41:39,586 - root - INFO - Step 320: lr=6.42E-06, loss= 1.1614 (max= 1.4451), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:41:39,586 - root - INFO - Step 320: lr=6.42E-06, loss= 1.1614 (max= 1.4451), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:41:39,586 - root - INFO - Step 320: lr=6.42E-06, loss= 1.1614 (max= 1.4451), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:41:39,586 - root - INFO - Step 320: lr=6.42E-06, loss= 1.1614 (max= 1.4451), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:41:39,586 - root - INFO - Step 320: lr=6.42E-06, loss= 1.1614 (max= 1.4451), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:41:39,586 - root - INFO - Step 320: lr=6.42E-06, loss= 1.1614 (max= 1.4451), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:41:39,586 - root - INFO - Step 320: lr=6.42E-06, loss= 1.1614 (max= 1.4451), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:41:39,586 - root - INFO - Step 320: lr=6.42E-06, loss= 1.1614 (max= 1.4451), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:42:11,436 - root - INFO - Step 330: lr=6.62E-06, loss= 1.1520 (max= 1.6173), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:42:11,437 - root - INFO - Step 330: lr=6.62E-06, loss= 1.1520 (max= 
1.6173), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:42:11,437 - root - INFO - Step 330: lr=6.62E-06, loss= 1.1520 (max= 1.6173), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:42:11,437 - root - INFO - Step 330: lr=6.62E-06, loss= 1.1520 (max= 1.6173), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:42:11,437 - root - INFO - Step 330: lr=6.62E-06, loss= 1.1520 (max= 1.6173), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:42:11,437 - root - INFO - Step 330: lr=6.62E-06, loss= 1.1520 (max= 1.6173), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:42:11,437 - root - INFO - Step 330: lr=6.62E-06, loss= 1.1520 (max= 1.6173), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:42:11,437 - root - INFO - Step 330: lr=6.62E-06, loss= 1.1520 (max= 1.6173), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:42:43,277 - root - INFO - Step 340: lr=6.82E-06, loss= 1.1585 (max= 1.5648), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:42:43,277 - root - INFO - Step 340: lr=6.82E-06, loss= 1.1585 (max= 1.5648), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:42:43,277 - root - INFO - Step 340: lr=6.82E-06, loss= 1.1585 (max= 1.5648), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:42:43,277 - root - INFO - Step 340: lr=6.82E-06, loss= 1.1585 (max= 1.5648), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:42:43,277 - root - INFO - Step 340: 
lr=6.82E-06, loss= 1.1585 (max= 1.5648), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:42:43,277 - root - INFO - Step 340: lr=6.82E-06, loss= 1.1585 (max= 1.5648), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:42:43,277 - root - INFO - Step 340: lr=6.82E-06, loss= 1.1585 (max= 1.5648), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:42:43,277 - root - INFO - Step 340: lr=6.82E-06, loss= 1.1585 (max= 1.5648), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:43:15,138 - root - INFO - Step 350: lr=7.02E-06, loss= 1.1363 (max= 1.5514), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:43:15,138 - root - INFO - Step 350: lr=7.02E-06, loss= 1.1363 (max= 1.5514), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:43:15,138 - root - INFO - Step 350: lr=7.02E-06, loss= 1.1363 (max= 1.5514), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:43:15,138 - root - INFO - Step 350: lr=7.02E-06, loss= 1.1363 (max= 1.5514), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:43:15,138 - root - INFO - Step 350: lr=7.02E-06, loss= 1.1363 (max= 1.5514), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:43:15,138 - root - INFO - Step 350: lr=7.02E-06, loss= 1.1363 (max= 1.5514), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:43:15,138 - root - INFO - Step 350: lr=7.02E-06, loss= 1.1363 (max= 1.5514), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:43:15,138 - 
root - INFO - Step 350: lr=7.02E-06, loss= 1.1363 (max= 1.5514), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:43:47,037 - root - INFO - Step 360: lr=7.22E-06, loss= 1.1282 (max= 1.5232), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:43:47,038 - root - INFO - Step 360: lr=7.22E-06, loss= 1.1282 (max= 1.5232), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:43:47,038 - root - INFO - Step 360: lr=7.22E-06, loss= 1.1282 (max= 1.5232), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:43:47,038 - root - INFO - Step 360: lr=7.22E-06, loss= 1.1282 (max= 1.5232), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:43:47,038 - root - INFO - Step 360: lr=7.22E-06, loss= 1.1282 (max= 1.5232), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:43:47,038 - root - INFO - Step 360: lr=7.22E-06, loss= 1.1282 (max= 1.5232), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:43:47,038 - root - INFO - Step 360: lr=7.22E-06, loss= 1.1282 (max= 1.5232), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:43:47,038 - root - INFO - Step 360: lr=7.22E-06, loss= 1.1282 (max= 1.5232), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:44:18,859 - root - INFO - Step 370: lr=7.42E-06, loss= 1.1298 (max= 1.5892), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:44:18,859 - root - INFO - Step 370: lr=7.42E-06, loss= 1.1298 (max= 1.5892), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 10:44:18,859 - root - INFO - Step 370: lr=7.42E-06, loss= 1.1298 (max= 1.5892), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:44:18,859 - root - INFO - Step 370: lr=7.42E-06, loss= 1.1298 (max= 1.5892), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:44:18,859 - root - INFO - Step 370: lr=7.42E-06, loss= 1.1298 (max= 1.5892), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:44:18,859 - root - INFO - Step 370: lr=7.42E-06, loss= 1.1298 (max= 1.5892), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:44:18,859 - root - INFO - Step 370: lr=7.42E-06, loss= 1.1298 (max= 1.5892), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:44:18,859 - root - INFO - Step 370: lr=7.42E-06, loss= 1.1298 (max= 1.5892), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:44:50,685 - root - INFO - Step 380: lr=7.62E-06, loss= 1.1543 (max= 1.5739), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:44:50,686 - root - INFO - Step 380: lr=7.62E-06, loss= 1.1543 (max= 1.5739), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:44:50,686 - root - INFO - Step 380: lr=7.62E-06, loss= 1.1543 (max= 1.5739), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:44:50,686 - root - INFO - Step 380: lr=7.62E-06, loss= 1.1543 (max= 1.5739), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:44:50,686 - root - INFO - Step 380: lr=7.62E-06, loss= 1.1543 (max= 1.5739), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:44:50,686 - root - INFO - Step 380: lr=7.62E-06, loss= 1.1543 (max= 1.5739), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:44:50,686 - root - INFO - Step 380: lr=7.62E-06, loss= 1.1543 (max= 1.5739), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:44:50,686 - root - INFO - Step 380: lr=7.62E-06, loss= 1.1543 (max= 1.5739), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:45:22,552 - root - INFO - Step 390: lr=7.82E-06, loss= 1.1495 (max= 1.7659), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:45:22,552 - root - INFO - Step 390: lr=7.82E-06, loss= 1.1495 (max= 1.7659), tps=20569, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:45:22,552 - root - INFO - Step 390: lr=7.82E-06, loss= 1.1495 (max= 1.7659), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:45:22,552 - root - INFO - Step 390: lr=7.82E-06, loss= 1.1495 (max= 1.7659), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:45:22,552 - root - INFO - Step 390: lr=7.82E-06, loss= 1.1495 (max= 1.7659), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:45:22,552 - root - INFO - Step 390: lr=7.82E-06, loss= 1.1495 (max= 1.7659), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:45:22,552 - root - INFO - Step 390: lr=7.82E-06, loss= 1.1495 (max= 1.7659), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:45:22,552 - root - INFO - Step 390: lr=7.82E-06, loss= 1.1495 (max= 1.7659), tps=20568, mfu=42.85%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:45:54,476 - root - INFO - Step 400: lr=8.02E-06, loss= 1.1468 (max= 1.7439), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:45:54,477 - root - INFO - Step 400: lr=8.02E-06, loss= 1.1468 (max= 1.7439), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:45:54,477 - root - INFO - Step 400: lr=8.02E-06, loss= 1.1468 (max= 1.7439), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:45:54,477 - root - INFO - Step 400: lr=8.02E-06, loss= 1.1468 (max= 1.7439), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:45:54,477 - root - INFO - Step 400: lr=8.02E-06, loss= 1.1468 (max= 1.7439), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:45:54,477 - root - INFO - Step 400: lr=8.02E-06, loss= 1.1468 (max= 1.7439), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:45:54,477 - root - INFO - Step 400: lr=8.02E-06, loss= 1.1468 (max= 1.7439), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:45:54,477 - root - INFO - Step 400: lr=8.02E-06, loss= 1.1468 (max= 1.7439), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:46:26,310 - root - INFO - Step 410: lr=8.22E-06, loss= 1.1426 (max= 1.5825), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:46:26,310 - root - INFO - Step 410: lr=8.22E-06, loss= 1.1426 (max= 1.5825), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:46:26,310 - root - INFO - Step 410: lr=8.22E-06, loss= 1.1426 (max= 
1.5825), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:46:26,310 - root - INFO - Step 410: lr=8.22E-06, loss= 1.1426 (max= 1.5825), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:46:26,310 - root - INFO - Step 410: lr=8.22E-06, loss= 1.1426 (max= 1.5825), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:46:26,310 - root - INFO - Step 410: lr=8.22E-06, loss= 1.1426 (max= 1.5825), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:46:26,310 - root - INFO - Step 410: lr=8.22E-06, loss= 1.1426 (max= 1.5825), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:46:26,310 - root - INFO - Step 410: lr=8.22E-06, loss= 1.1426 (max= 1.5825), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:46:58,166 - root - INFO - Step 420: lr=8.42E-06, loss= 1.1283 (max= 1.5372), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:46:58,166 - root - INFO - Step 420: lr=8.42E-06, loss= 1.1283 (max= 1.5372), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:46:58,166 - root - INFO - Step 420: lr=8.42E-06, loss= 1.1283 (max= 1.5372), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:46:58,166 - root - INFO - Step 420: lr=8.42E-06, loss= 1.1283 (max= 1.5372), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:46:58,166 - root - INFO - Step 420: lr=8.42E-06, loss= 1.1283 (max= 1.5372), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:46:58,166 - root - INFO - Step 420: 
lr=8.42E-06, loss= 1.1283 (max= 1.5372), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:46:58,166 - root - INFO - Step 420: lr=8.42E-06, loss= 1.1283 (max= 1.5372), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:46:58,166 - root - INFO - Step 420: lr=8.42E-06, loss= 1.1283 (max= 1.5372), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:47:30,070 - root - INFO - Step 430: lr=8.62E-06, loss= 1.1625 (max= 1.5683), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:47:30,070 - root - INFO - Step 430: lr=8.62E-06, loss= 1.1625 (max= 1.5683), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:47:30,070 - root - INFO - Step 430: lr=8.62E-06, loss= 1.1625 (max= 1.5683), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:47:30,070 - root - INFO - Step 430: lr=8.62E-06, loss= 1.1625 (max= 1.5683), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:47:30,070 - root - INFO - Step 430: lr=8.62E-06, loss= 1.1625 (max= 1.5683), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:47:30,070 - root - INFO - Step 430: lr=8.62E-06, loss= 1.1625 (max= 1.5683), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:47:30,070 - root - INFO - Step 430: lr=8.62E-06, loss= 1.1625 (max= 1.5683), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:47:30,070 - root - INFO - Step 430: lr=8.62E-06, loss= 1.1625 (max= 1.5683), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:48:01,945 - 
root - INFO - Step 440: lr=8.82E-06, loss= 1.1477 (max= 1.7996), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:48:01,945 - root - INFO - Step 440: lr=8.82E-06, loss= 1.1477 (max= 1.7996), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:48:01,945 - root - INFO - Step 440: lr=8.82E-06, loss= 1.1477 (max= 1.7996), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:48:01,945 - root - INFO - Step 440: lr=8.82E-06, loss= 1.1477 (max= 1.7996), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:48:01,945 - root - INFO - Step 440: lr=8.82E-06, loss= 1.1477 (max= 1.7996), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:48:01,945 - root - INFO - Step 440: lr=8.82E-06, loss= 1.1477 (max= 1.7996), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:48:01,945 - root - INFO - Step 440: lr=8.82E-06, loss= 1.1477 (max= 1.7996), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:48:01,946 - root - INFO - Step 440: lr=8.82E-06, loss= 1.1477 (max= 1.7996), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:48:33,827 - root - INFO - Step 450: lr=9.02E-06, loss= 1.1324 (max= 1.5936), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:48:33,827 - root - INFO - Step 450: lr=9.02E-06, loss= 1.1324 (max= 1.5936), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:48:33,827 - root - INFO - Step 450: lr=9.02E-06, loss= 1.1324 (max= 1.5936), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 10:48:33,827 - root - INFO - Step 450: lr=9.02E-06, loss= 1.1324 (max= 1.5936), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:48:33,827 - root - INFO - Step 450: lr=9.02E-06, loss= 1.1324 (max= 1.5936), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:48:33,828 - root - INFO - Step 450: lr=9.02E-06, loss= 1.1324 (max= 1.5936), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:48:33,828 - root - INFO - Step 450: lr=9.02E-06, loss= 1.1324 (max= 1.5936), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:48:33,828 - root - INFO - Step 450: lr=9.02E-06, loss= 1.1324 (max= 1.5936), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:49:05,731 - root - INFO - Step 460: lr=9.22E-06, loss= 1.1358 (max= 1.8258), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:49:05,731 - root - INFO - Step 460: lr=9.22E-06, loss= 1.1358 (max= 1.8258), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:49:05,732 - root - INFO - Step 460: lr=9.22E-06, loss= 1.1358 (max= 1.8258), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:49:05,732 - root - INFO - Step 460: lr=9.22E-06, loss= 1.1358 (max= 1.8258), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:49:05,732 - root - INFO - Step 460: lr=9.22E-06, loss= 1.1358 (max= 1.8258), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:49:05,732 - root - INFO - Step 460: lr=9.22E-06, loss= 1.1358 (max= 1.8258), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:49:05,732 - root - INFO - Step 460: lr=9.22E-06, loss= 1.1358 (max= 1.8258), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:49:05,732 - root - INFO - Step 460: lr=9.22E-06, loss= 1.1358 (max= 1.8258), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:49:37,626 - root - INFO - Step 470: lr=9.42E-06, loss= 1.1549 (max= 1.6510), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:49:37,626 - root - INFO - Step 470: lr=9.42E-06, loss= 1.1549 (max= 1.6510), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:49:37,626 - root - INFO - Step 470: lr=9.42E-06, loss= 1.1549 (max= 1.6510), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:49:37,626 - root - INFO - Step 470: lr=9.42E-06, loss= 1.1549 (max= 1.6510), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:49:37,626 - root - INFO - Step 470: lr=9.42E-06, loss= 1.1549 (max= 1.6510), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:49:37,626 - root - INFO - Step 470: lr=9.42E-06, loss= 1.1549 (max= 1.6510), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:49:37,626 - root - INFO - Step 470: lr=9.42E-06, loss= 1.1549 (max= 1.6510), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:49:37,627 - root - INFO - Step 470: lr=9.42E-06, loss= 1.1549 (max= 1.6510), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:49:38,220 - root - WARNING - Empty document detected at 
/work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:551418 +2025-10-25 10:50:09,447 - root - INFO - Step 480: lr=9.62E-06, loss= 1.1253 (max= 1.5244), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:50:09,447 - root - INFO - Step 480: lr=9.62E-06, loss= 1.1253 (max= 1.5244), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:50:09,447 - root - INFO - Step 480: lr=9.62E-06, loss= 1.1253 (max= 1.5244), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:50:09,447 - root - INFO - Step 480: lr=9.62E-06, loss= 1.1253 (max= 1.5244), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:50:09,447 - root - INFO - Step 480: lr=9.62E-06, loss= 1.1253 (max= 1.5244), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:50:09,447 - root - INFO - Step 480: lr=9.62E-06, loss= 1.1253 (max= 1.5244), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:50:09,448 - root - INFO - Step 480: lr=9.62E-06, loss= 1.1253 (max= 1.5244), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:50:09,448 - root - INFO - Step 480: lr=9.62E-06, loss= 1.1253 (max= 1.5244), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:50:14,988 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:7264002 +2025-10-25 10:50:41,315 - root - INFO - Step 490: lr=9.82E-06, loss= 1.1524 (max= 1.7864), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:50:41,315 - root - INFO - Step 
490: lr=9.82E-06, loss= 1.1524 (max= 1.7864), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:50:41,315 - root - INFO - Step 490: lr=9.82E-06, loss= 1.1524 (max= 1.7864), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:50:41,315 - root - INFO - Step 490: lr=9.82E-06, loss= 1.1524 (max= 1.7864), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:50:41,315 - root - INFO - Step 490: lr=9.82E-06, loss= 1.1524 (max= 1.7864), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:50:41,315 - root - INFO - Step 490: lr=9.82E-06, loss= 1.1524 (max= 1.7864), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:50:41,315 - root - INFO - Step 490: lr=9.82E-06, loss= 1.1524 (max= 1.7864), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:50:41,315 - root - INFO - Step 490: lr=9.82E-06, loss= 1.1524 (max= 1.7864), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:51:13,226 - root - INFO - Step 500: lr=1.00E-05, loss= 1.1470 (max= 1.7062), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:51:13,226 - root - INFO - Step 500: lr=1.00E-05, loss= 1.1470 (max= 1.7062), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:51:13,226 - root - INFO - Step 500: lr=1.00E-05, loss= 1.1470 (max= 1.7062), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:51:13,226 - root - INFO - Step 500: lr=1.00E-05, loss= 1.1470 (max= 1.7062), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
10:51:13,226 - root - INFO - Step 500: lr=1.00E-05, loss= 1.1470 (max= 1.7062), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:51:13,226 - root - INFO - Step 500: lr=1.00E-05, loss= 1.1470 (max= 1.7062), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:51:13,226 - root - INFO - Step 500: lr=1.00E-05, loss= 1.1470 (max= 1.7062), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:51:13,227 - root - INFO - Step 500: lr=1.00E-05, loss= 1.1470 (max= 1.7062), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:51:45,114 - root - INFO - Step 510: lr=1.00E-05, loss= 1.1730 (max= 1.8511), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:51:45,115 - root - INFO - Step 510: lr=1.00E-05, loss= 1.1730 (max= 1.8511), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:51:45,115 - root - INFO - Step 510: lr=1.00E-05, loss= 1.1730 (max= 1.8511), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:51:45,115 - root - INFO - Step 510: lr=1.00E-05, loss= 1.1730 (max= 1.8511), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:51:45,115 - root - INFO - Step 510: lr=1.00E-05, loss= 1.1730 (max= 1.8511), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:51:45,115 - root - INFO - Step 510: lr=1.00E-05, loss= 1.1730 (max= 1.8511), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:51:45,115 - root - INFO - Step 510: lr=1.00E-05, loss= 1.1730 (max= 1.8511), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 10:51:45,115 - root - INFO - Step 510: lr=1.00E-05, loss= 1.1730 (max= 1.8511), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:52:17,005 - root - INFO - Step 520: lr=1.00E-05, loss= 1.1473 (max= 1.6168), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:52:17,005 - root - INFO - Step 520: lr=1.00E-05, loss= 1.1473 (max= 1.6168), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:52:17,005 - root - INFO - Step 520: lr=1.00E-05, loss= 1.1473 (max= 1.6168), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:52:17,005 - root - INFO - Step 520: lr=1.00E-05, loss= 1.1473 (max= 1.6168), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:52:17,005 - root - INFO - Step 520: lr=1.00E-05, loss= 1.1473 (max= 1.6168), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:52:17,005 - root - INFO - Step 520: lr=1.00E-05, loss= 1.1473 (max= 1.6168), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:52:17,005 - root - INFO - Step 520: lr=1.00E-05, loss= 1.1473 (max= 1.6168), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:52:17,005 - root - INFO - Step 520: lr=1.00E-05, loss= 1.1473 (max= 1.6168), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:52:27,165 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:6682241 +2025-10-25 10:52:43,058 - root - WARNING - Empty document detected at 
/work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:6663878 +2025-10-25 10:52:48,847 - root - INFO - Step 530: lr=1.00E-05, loss= 1.1680 (max= 1.5327), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:52:48,847 - root - INFO - Step 530: lr=1.00E-05, loss= 1.1680 (max= 1.5327), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:52:48,847 - root - INFO - Step 530: lr=1.00E-05, loss= 1.1680 (max= 1.5327), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:52:48,847 - root - INFO - Step 530: lr=1.00E-05, loss= 1.1680 (max= 1.5327), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:52:48,847 - root - INFO - Step 530: lr=1.00E-05, loss= 1.1680 (max= 1.5327), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:52:48,847 - root - INFO - Step 530: lr=1.00E-05, loss= 1.1680 (max= 1.5327), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:52:48,847 - root - INFO - Step 530: lr=1.00E-05, loss= 1.1680 (max= 1.5327), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:52:48,848 - root - INFO - Step 530: lr=1.00E-05, loss= 1.1680 (max= 1.5327), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:53:20,757 - root - INFO - Step 540: lr=1.00E-05, loss= 1.1565 (max= 1.5298), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:53:20,757 - root - INFO - Step 540: lr=1.00E-05, loss= 1.1565 (max= 1.5298), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:53:20,758 - root - INFO - Step 
540: lr=1.00E-05, loss= 1.1565 (max= 1.5298), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:53:20,758 - root - INFO - Step 540: lr=1.00E-05, loss= 1.1565 (max= 1.5298), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:53:20,758 - root - INFO - Step 540: lr=1.00E-05, loss= 1.1565 (max= 1.5298), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:53:20,758 - root - INFO - Step 540: lr=1.00E-05, loss= 1.1565 (max= 1.5298), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:53:20,758 - root - INFO - Step 540: lr=1.00E-05, loss= 1.1565 (max= 1.5298), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:53:20,758 - root - INFO - Step 540: lr=1.00E-05, loss= 1.1565 (max= 1.5298), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:53:52,620 - root - INFO - Step 550: lr=1.00E-05, loss= 1.1345 (max= 1.5973), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:53:52,620 - root - INFO - Step 550: lr=1.00E-05, loss= 1.1345 (max= 1.5973), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:53:52,620 - root - INFO - Step 550: lr=1.00E-05, loss= 1.1345 (max= 1.5973), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:53:52,620 - root - INFO - Step 550: lr=1.00E-05, loss= 1.1345 (max= 1.5973), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:53:52,620 - root - INFO - Step 550: lr=1.00E-05, loss= 1.1345 (max= 1.5973), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
10:53:52,620 - root - INFO - Step 550: lr=1.00E-05, loss= 1.1345 (max= 1.5973), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:53:52,620 - root - INFO - Step 550: lr=1.00E-05, loss= 1.1345 (max= 1.5973), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:53:52,621 - root - INFO - Step 550: lr=1.00E-05, loss= 1.1345 (max= 1.5973), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:54:24,406 - root - INFO - Step 560: lr=1.00E-05, loss= 1.1657 (max= 1.5906), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:54:24,407 - root - INFO - Step 560: lr=1.00E-05, loss= 1.1657 (max= 1.5906), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:54:24,407 - root - INFO - Step 560: lr=1.00E-05, loss= 1.1657 (max= 1.5906), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:54:24,407 - root - INFO - Step 560: lr=1.00E-05, loss= 1.1657 (max= 1.5906), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:54:24,407 - root - INFO - Step 560: lr=1.00E-05, loss= 1.1657 (max= 1.5906), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:54:24,407 - root - INFO - Step 560: lr=1.00E-05, loss= 1.1657 (max= 1.5906), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:54:24,407 - root - INFO - Step 560: lr=1.00E-05, loss= 1.1657 (max= 1.5906), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:54:24,407 - root - INFO - Step 560: lr=1.00E-05, loss= 1.1657 (max= 1.5906), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 10:54:56,221 - root - INFO - Step 570: lr=1.00E-05, loss= 1.1395 (max= 1.5811), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:54:56,221 - root - INFO - Step 570: lr=1.00E-05, loss= 1.1395 (max= 1.5811), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:54:56,221 - root - INFO - Step 570: lr=1.00E-05, loss= 1.1395 (max= 1.5811), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:54:56,221 - root - INFO - Step 570: lr=1.00E-05, loss= 1.1395 (max= 1.5811), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:54:56,221 - root - INFO - Step 570: lr=1.00E-05, loss= 1.1395 (max= 1.5811), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:54:56,221 - root - INFO - Step 570: lr=1.00E-05, loss= 1.1395 (max= 1.5811), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:54:56,221 - root - INFO - Step 570: lr=1.00E-05, loss= 1.1395 (max= 1.5811), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:54:56,222 - root - INFO - Step 570: lr=1.00E-05, loss= 1.1395 (max= 1.5811), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:55:28,068 - root - INFO - Step 580: lr=1.00E-05, loss= 1.1504 (max= 1.4901), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:55:28,068 - root - INFO - Step 580: lr=1.00E-05, loss= 1.1504 (max= 1.4901), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:55:28,068 - root - INFO - Step 580: lr=1.00E-05, loss= 1.1504 (max= 1.4901), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:55:28,068 - root - INFO - Step 580: lr=1.00E-05, loss= 1.1504 (max= 1.4901), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:55:28,068 - root - INFO - Step 580: lr=1.00E-05, loss= 1.1504 (max= 1.4901), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:55:28,068 - root - INFO - Step 580: lr=1.00E-05, loss= 1.1504 (max= 1.4901), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:55:28,068 - root - INFO - Step 580: lr=1.00E-05, loss= 1.1504 (max= 1.4901), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:55:28,069 - root - INFO - Step 580: lr=1.00E-05, loss= 1.1504 (max= 1.4901), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:55:59,987 - root - INFO - Step 590: lr=1.00E-05, loss= 1.1515 (max= 1.5409), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:55:59,987 - root - INFO - Step 590: lr=1.00E-05, loss= 1.1515 (max= 1.5409), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:55:59,987 - root - INFO - Step 590: lr=1.00E-05, loss= 1.1515 (max= 1.5409), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:55:59,987 - root - INFO - Step 590: lr=1.00E-05, loss= 1.1515 (max= 1.5409), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:55:59,987 - root - INFO - Step 590: lr=1.00E-05, loss= 1.1515 (max= 1.5409), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:55:59,987 - root - INFO - Step 590: lr=1.00E-05, loss= 1.1515 (max= 1.5409), tps=20534, mfu=42.78%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:55:59,987 - root - INFO - Step 590: lr=1.00E-05, loss= 1.1515 (max= 1.5409), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:55:59,987 - root - INFO - Step 590: lr=1.00E-05, loss= 1.1515 (max= 1.5409), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:56:31,866 - root - INFO - Step 600: lr=1.00E-05, loss= 1.1321 (max= 1.4679), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:56:31,866 - root - INFO - Step 600: lr=1.00E-05, loss= 1.1321 (max= 1.4679), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:56:31,866 - root - INFO - Step 600: lr=1.00E-05, loss= 1.1321 (max= 1.4679), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:56:31,866 - root - INFO - Step 600: lr=1.00E-05, loss= 1.1321 (max= 1.4679), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:56:31,866 - root - INFO - Step 600: lr=1.00E-05, loss= 1.1321 (max= 1.4679), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:56:31,866 - root - INFO - Step 600: lr=1.00E-05, loss= 1.1321 (max= 1.4679), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:56:31,866 - root - INFO - Step 600: lr=1.00E-05, loss= 1.1321 (max= 1.4679), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:56:31,867 - root - INFO - Step 600: lr=1.00E-05, loss= 1.1321 (max= 1.4679), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:57:03,707 - root - INFO - Step 610: lr=1.00E-05, loss= 1.1480 (max= 
1.4939), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:57:03,707 - root - INFO - Step 610: lr=1.00E-05, loss= 1.1480 (max= 1.4939), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:57:03,707 - root - INFO - Step 610: lr=1.00E-05, loss= 1.1480 (max= 1.4939), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:57:03,707 - root - INFO - Step 610: lr=1.00E-05, loss= 1.1480 (max= 1.4939), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:57:03,707 - root - INFO - Step 610: lr=1.00E-05, loss= 1.1480 (max= 1.4939), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:57:03,707 - root - INFO - Step 610: lr=1.00E-05, loss= 1.1480 (max= 1.4939), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:57:03,707 - root - INFO - Step 610: lr=1.00E-05, loss= 1.1480 (max= 1.4939), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:57:03,708 - root - INFO - Step 610: lr=1.00E-05, loss= 1.1480 (max= 1.4939), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:57:35,642 - root - INFO - Step 620: lr=1.00E-05, loss= 1.1328 (max= 1.6570), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:57:35,642 - root - INFO - Step 620: lr=1.00E-05, loss= 1.1328 (max= 1.6570), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:57:35,642 - root - INFO - Step 620: lr=1.00E-05, loss= 1.1328 (max= 1.6570), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:57:35,642 - root - INFO - Step 620: 
lr=1.00E-05, loss= 1.1328 (max= 1.6570), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:57:35,642 - root - INFO - Step 620: lr=1.00E-05, loss= 1.1328 (max= 1.6570), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:57:35,642 - root - INFO - Step 620: lr=1.00E-05, loss= 1.1328 (max= 1.6570), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:57:35,642 - root - INFO - Step 620: lr=1.00E-05, loss= 1.1328 (max= 1.6570), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:57:35,642 - root - INFO - Step 620: lr=1.00E-05, loss= 1.1328 (max= 1.6570), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:58:07,506 - root - INFO - Step 630: lr=1.00E-05, loss= 1.1496 (max= 1.8959), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:58:07,506 - root - INFO - Step 630: lr=1.00E-05, loss= 1.1496 (max= 1.8959), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:58:07,506 - root - INFO - Step 630: lr=1.00E-05, loss= 1.1496 (max= 1.8959), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:58:07,506 - root - INFO - Step 630: lr=1.00E-05, loss= 1.1496 (max= 1.8959), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:58:07,506 - root - INFO - Step 630: lr=1.00E-05, loss= 1.1496 (max= 1.8959), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:58:07,506 - root - INFO - Step 630: lr=1.00E-05, loss= 1.1496 (max= 1.8959), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:58:07,506 - 
root - INFO - Step 630: lr=1.00E-05, loss= 1.1496 (max= 1.8959), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:58:07,506 - root - INFO - Step 630: lr=1.00E-05, loss= 1.1496 (max= 1.8959), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:58:39,388 - root - INFO - Step 640: lr=1.00E-05, loss= 1.1443 (max= 1.8163), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:58:39,388 - root - INFO - Step 640: lr=1.00E-05, loss= 1.1443 (max= 1.8163), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:58:39,388 - root - INFO - Step 640: lr=1.00E-05, loss= 1.1443 (max= 1.8163), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:58:39,388 - root - INFO - Step 640: lr=1.00E-05, loss= 1.1443 (max= 1.8163), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:58:39,388 - root - INFO - Step 640: lr=1.00E-05, loss= 1.1443 (max= 1.8163), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:58:39,388 - root - INFO - Step 640: lr=1.00E-05, loss= 1.1443 (max= 1.8163), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:58:39,388 - root - INFO - Step 640: lr=1.00E-05, loss= 1.1443 (max= 1.8163), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:58:39,388 - root - INFO - Step 640: lr=1.00E-05, loss= 1.1443 (max= 1.8163), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:59:11,226 - root - INFO - Step 650: lr=1.00E-05, loss= 1.1385 (max= 1.5374), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 10:59:11,226 - root - INFO - Step 650: lr=1.00E-05, loss= 1.1385 (max= 1.5374), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:59:11,226 - root - INFO - Step 650: lr=1.00E-05, loss= 1.1385 (max= 1.5374), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:59:11,226 - root - INFO - Step 650: lr=1.00E-05, loss= 1.1385 (max= 1.5374), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:59:11,226 - root - INFO - Step 650: lr=1.00E-05, loss= 1.1385 (max= 1.5374), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:59:11,226 - root - INFO - Step 650: lr=1.00E-05, loss= 1.1385 (max= 1.5374), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:59:11,226 - root - INFO - Step 650: lr=1.00E-05, loss= 1.1385 (max= 1.5374), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:59:11,226 - root - INFO - Step 650: lr=1.00E-05, loss= 1.1385 (max= 1.5374), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:59:43,070 - root - INFO - Step 660: lr=1.00E-05, loss= 1.1495 (max= 1.6796), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:59:43,070 - root - INFO - Step 660: lr=1.00E-05, loss= 1.1495 (max= 1.6796), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:59:43,070 - root - INFO - Step 660: lr=1.00E-05, loss= 1.1495 (max= 1.6796), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:59:43,070 - root - INFO - Step 660: lr=1.00E-05, loss= 1.1495 (max= 1.6796), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:59:43,070 - root - INFO - Step 660: lr=1.00E-05, loss= 1.1495 (max= 1.6796), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:59:43,071 - root - INFO - Step 660: lr=1.00E-05, loss= 1.1495 (max= 1.6796), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:59:43,071 - root - INFO - Step 660: lr=1.00E-05, loss= 1.1495 (max= 1.6796), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 10:59:43,071 - root - INFO - Step 660: lr=1.00E-05, loss= 1.1495 (max= 1.6796), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:00:14,963 - root - INFO - Step 670: lr=1.00E-05, loss= 1.1437 (max= 1.6269), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:00:14,963 - root - INFO - Step 670: lr=1.00E-05, loss= 1.1437 (max= 1.6269), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:00:14,963 - root - INFO - Step 670: lr=1.00E-05, loss= 1.1437 (max= 1.6269), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:00:14,963 - root - INFO - Step 670: lr=1.00E-05, loss= 1.1437 (max= 1.6269), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:00:14,963 - root - INFO - Step 670: lr=1.00E-05, loss= 1.1437 (max= 1.6269), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:00:14,964 - root - INFO - Step 670: lr=1.00E-05, loss= 1.1437 (max= 1.6269), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:00:14,964 - root - INFO - Step 670: lr=1.00E-05, loss= 1.1437 (max= 1.6269), tps=20551, mfu=42.82%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:00:14,964 - root - INFO - Step 670: lr=1.00E-05, loss= 1.1437 (max= 1.6269), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:00:46,760 - root - INFO - Step 680: lr=1.00E-05, loss= 1.1188 (max= 1.5515), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:00:46,760 - root - INFO - Step 680: lr=1.00E-05, loss= 1.1188 (max= 1.5515), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:00:46,760 - root - INFO - Step 680: lr=1.00E-05, loss= 1.1188 (max= 1.5515), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:00:46,760 - root - INFO - Step 680: lr=1.00E-05, loss= 1.1188 (max= 1.5515), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:00:46,760 - root - INFO - Step 680: lr=1.00E-05, loss= 1.1188 (max= 1.5515), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:00:46,760 - root - INFO - Step 680: lr=1.00E-05, loss= 1.1188 (max= 1.5515), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:00:46,760 - root - INFO - Step 680: lr=1.00E-05, loss= 1.1188 (max= 1.5515), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:00:46,760 - root - INFO - Step 680: lr=1.00E-05, loss= 1.1188 (max= 1.5515), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:01:18,645 - root - INFO - Step 690: lr=1.00E-05, loss= 1.1401 (max= 1.5535), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:01:18,646 - root - INFO - Step 690: lr=1.00E-05, loss= 1.1401 (max= 
1.5535), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:01:18,646 - root - INFO - Step 690: lr=1.00E-05, loss= 1.1401 (max= 1.5535), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:01:18,646 - root - INFO - Step 690: lr=1.00E-05, loss= 1.1401 (max= 1.5535), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:01:18,646 - root - INFO - Step 690: lr=1.00E-05, loss= 1.1401 (max= 1.5535), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:01:18,646 - root - INFO - Step 690: lr=1.00E-05, loss= 1.1401 (max= 1.5535), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:01:18,646 - root - INFO - Step 690: lr=1.00E-05, loss= 1.1401 (max= 1.5535), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:01:18,646 - root - INFO - Step 690: lr=1.00E-05, loss= 1.1401 (max= 1.5535), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:01:50,531 - root - INFO - Step 700: lr=1.00E-05, loss= 1.1185 (max= 1.5006), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:01:50,531 - root - INFO - Step 700: lr=1.00E-05, loss= 1.1185 (max= 1.5006), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:01:50,531 - root - INFO - Step 700: lr=1.00E-05, loss= 1.1185 (max= 1.5006), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:01:50,531 - root - INFO - Step 700: lr=1.00E-05, loss= 1.1185 (max= 1.5006), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:01:50,531 - root - INFO - Step 700: 
lr=1.00E-05, loss= 1.1185 (max= 1.5006), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:01:50,531 - root - INFO - Step 700: lr=1.00E-05, loss= 1.1185 (max= 1.5006), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:01:50,531 - root - INFO - Step 700: lr=1.00E-05, loss= 1.1185 (max= 1.5006), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:01:50,531 - root - INFO - Step 700: lr=1.00E-05, loss= 1.1185 (max= 1.5006), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:02:22,425 - root - INFO - Step 710: lr=1.00E-05, loss= 1.1450 (max= 1.5546), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:02:22,425 - root - INFO - Step 710: lr=1.00E-05, loss= 1.1450 (max= 1.5546), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:02:22,425 - root - INFO - Step 710: lr=1.00E-05, loss= 1.1450 (max= 1.5546), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:02:22,425 - root - INFO - Step 710: lr=1.00E-05, loss= 1.1450 (max= 1.5546), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:02:22,425 - root - INFO - Step 710: lr=1.00E-05, loss= 1.1450 (max= 1.5546), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:02:22,425 - root - INFO - Step 710: lr=1.00E-05, loss= 1.1450 (max= 1.5546), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:02:22,425 - root - INFO - Step 710: lr=1.00E-05, loss= 1.1450 (max= 1.5546), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:02:22,425 - 
root - INFO - Step 710: lr=1.00E-05, loss= 1.1450 (max= 1.5546), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:02:54,281 - root - INFO - Step 720: lr=1.00E-05, loss= 1.1136 (max= 1.5384), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:02:54,281 - root - INFO - Step 720: lr=1.00E-05, loss= 1.1136 (max= 1.5384), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:02:54,281 - root - INFO - Step 720: lr=1.00E-05, loss= 1.1136 (max= 1.5384), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:02:54,282 - root - INFO - Step 720: lr=1.00E-05, loss= 1.1136 (max= 1.5384), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:02:54,282 - root - INFO - Step 720: lr=1.00E-05, loss= 1.1136 (max= 1.5384), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:02:54,282 - root - INFO - Step 720: lr=1.00E-05, loss= 1.1136 (max= 1.5384), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:02:54,282 - root - INFO - Step 720: lr=1.00E-05, loss= 1.1136 (max= 1.5384), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:02:54,282 - root - INFO - Step 720: lr=1.00E-05, loss= 1.1136 (max= 1.5384), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:03:26,183 - root - INFO - Step 730: lr=1.00E-05, loss= 1.1572 (max= 1.7571), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:03:26,183 - root - INFO - Step 730: lr=1.00E-05, loss= 1.1572 (max= 1.7571), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 11:03:26,183 - root - INFO - Step 730: lr=1.00E-05, loss= 1.1572 (max= 1.7571), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:03:26,183 - root - INFO - Step 730: lr=1.00E-05, loss= 1.1572 (max= 1.7571), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:03:26,183 - root - INFO - Step 730: lr=1.00E-05, loss= 1.1572 (max= 1.7571), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:03:26,183 - root - INFO - Step 730: lr=1.00E-05, loss= 1.1572 (max= 1.7571), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:03:26,183 - root - INFO - Step 730: lr=1.00E-05, loss= 1.1572 (max= 1.7571), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:03:26,183 - root - INFO - Step 730: lr=1.00E-05, loss= 1.1572 (max= 1.7571), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:03:58,049 - root - INFO - Step 740: lr=1.00E-05, loss= 1.1573 (max= 1.6704), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:03:58,049 - root - INFO - Step 740: lr=1.00E-05, loss= 1.1573 (max= 1.6704), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:03:58,049 - root - INFO - Step 740: lr=1.00E-05, loss= 1.1573 (max= 1.6704), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:03:58,049 - root - INFO - Step 740: lr=1.00E-05, loss= 1.1573 (max= 1.6704), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:03:58,049 - root - INFO - Step 740: lr=1.00E-05, loss= 1.1573 (max= 1.6704), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:03:58,049 - root - INFO - Step 740: lr=1.00E-05, loss= 1.1573 (max= 1.6704), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:03:58,049 - root - INFO - Step 740: lr=1.00E-05, loss= 1.1573 (max= 1.6704), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:03:58,049 - root - INFO - Step 740: lr=1.00E-05, loss= 1.1573 (max= 1.6704), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:04:29,963 - root - INFO - Step 750: lr=1.00E-05, loss= 1.1554 (max= 1.6341), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:04:29,963 - root - INFO - Step 750: lr=1.00E-05, loss= 1.1554 (max= 1.6341), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:04:29,963 - root - INFO - Step 750: lr=1.00E-05, loss= 1.1554 (max= 1.6341), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:04:29,963 - root - INFO - Step 750: lr=1.00E-05, loss= 1.1554 (max= 1.6341), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:04:29,963 - root - INFO - Step 750: lr=1.00E-05, loss= 1.1554 (max= 1.6341), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:04:29,963 - root - INFO - Step 750: lr=1.00E-05, loss= 1.1554 (max= 1.6341), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:04:29,964 - root - INFO - Step 750: lr=1.00E-05, loss= 1.1554 (max= 1.6341), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:04:29,964 - root - INFO - Step 750: lr=1.00E-05, loss= 1.1554 (max= 1.6341), tps=20537, mfu=42.79%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:05:01,797 - root - INFO - Step 760: lr=1.00E-05, loss= 1.1575 (max= 1.5889), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:05:01,797 - root - INFO - Step 760: lr=1.00E-05, loss= 1.1575 (max= 1.5889), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:05:01,797 - root - INFO - Step 760: lr=1.00E-05, loss= 1.1575 (max= 1.5889), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:05:01,797 - root - INFO - Step 760: lr=1.00E-05, loss= 1.1575 (max= 1.5889), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:05:01,797 - root - INFO - Step 760: lr=1.00E-05, loss= 1.1575 (max= 1.5889), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:05:01,797 - root - INFO - Step 760: lr=1.00E-05, loss= 1.1575 (max= 1.5889), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:05:01,797 - root - INFO - Step 760: lr=1.00E-05, loss= 1.1575 (max= 1.5889), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:05:01,797 - root - INFO - Step 760: lr=1.00E-05, loss= 1.1575 (max= 1.5889), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:05:33,581 - root - INFO - Step 770: lr=1.00E-05, loss= 1.1569 (max= 1.5650), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:05:33,581 - root - INFO - Step 770: lr=1.00E-05, loss= 1.1569 (max= 1.5650), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:05:33,581 - root - INFO - Step 770: lr=1.00E-05, loss= 1.1569 (max= 
1.5650), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:05:33,581 - root - INFO - Step 770: lr=1.00E-05, loss= 1.1569 (max= 1.5650), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:05:33,581 - root - INFO - Step 770: lr=1.00E-05, loss= 1.1569 (max= 1.5650), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:05:33,581 - root - INFO - Step 770: lr=1.00E-05, loss= 1.1569 (max= 1.5650), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:05:33,582 - root - INFO - Step 770: lr=1.00E-05, loss= 1.1569 (max= 1.5650), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:05:33,582 - root - INFO - Step 770: lr=1.00E-05, loss= 1.1569 (max= 1.5650), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:06:05,398 - root - INFO - Step 780: lr=1.00E-05, loss= 1.1636 (max= 1.5804), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:06:05,398 - root - INFO - Step 780: lr=1.00E-05, loss= 1.1636 (max= 1.5804), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:06:05,398 - root - INFO - Step 780: lr=1.00E-05, loss= 1.1636 (max= 1.5804), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:06:05,398 - root - INFO - Step 780: lr=1.00E-05, loss= 1.1636 (max= 1.5804), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:06:05,398 - root - INFO - Step 780: lr=1.00E-05, loss= 1.1636 (max= 1.5804), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:06:05,398 - root - INFO - Step 780: 
lr=1.00E-05, loss= 1.1636 (max= 1.5804), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:06:05,398 - root - INFO - Step 780: lr=1.00E-05, loss= 1.1636 (max= 1.5804), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:06:05,398 - root - INFO - Step 780: lr=1.00E-05, loss= 1.1636 (max= 1.5804), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:06:37,273 - root - INFO - Step 790: lr=1.00E-05, loss= 1.1454 (max= 1.6800), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:06:37,273 - root - INFO - Step 790: lr=1.00E-05, loss= 1.1454 (max= 1.6800), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:06:37,273 - root - INFO - Step 790: lr=1.00E-05, loss= 1.1454 (max= 1.6800), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:06:37,273 - root - INFO - Step 790: lr=1.00E-05, loss= 1.1454 (max= 1.6800), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:06:37,273 - root - INFO - Step 790: lr=1.00E-05, loss= 1.1454 (max= 1.6800), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:06:37,273 - root - INFO - Step 790: lr=1.00E-05, loss= 1.1454 (max= 1.6800), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:06:37,273 - root - INFO - Step 790: lr=1.00E-05, loss= 1.1454 (max= 1.6800), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:06:37,274 - root - INFO - Step 790: lr=1.00E-05, loss= 1.1454 (max= 1.6800), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:07:09,240 - 
root - INFO - Step 800: lr=1.00E-05, loss= 1.1338 (max= 1.6211), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:07:09,241 - root - INFO - Step 800: lr=1.00E-05, loss= 1.1338 (max= 1.6211), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:07:09,241 - root - INFO - Step 800: lr=1.00E-05, loss= 1.1338 (max= 1.6211), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:07:09,241 - root - INFO - Step 800: lr=1.00E-05, loss= 1.1338 (max= 1.6211), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:07:09,241 - root - INFO - Step 800: lr=1.00E-05, loss= 1.1338 (max= 1.6211), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:07:09,241 - root - INFO - Step 800: lr=1.00E-05, loss= 1.1338 (max= 1.6211), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:07:09,241 - root - INFO - Step 800: lr=1.00E-05, loss= 1.1338 (max= 1.6211), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:07:09,241 - root - INFO - Step 800: lr=1.00E-05, loss= 1.1338 (max= 1.6211), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:07:41,127 - root - INFO - Step 810: lr=1.00E-05, loss= 1.1425 (max= 1.5614), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:07:41,127 - root - INFO - Step 810: lr=1.00E-05, loss= 1.1425 (max= 1.5614), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:07:41,127 - root - INFO - Step 810: lr=1.00E-05, loss= 1.1425 (max= 1.5614), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 11:07:41,127 - root - INFO - Step 810: lr=1.00E-05, loss= 1.1425 (max= 1.5614), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:07:41,127 - root - INFO - Step 810: lr=1.00E-05, loss= 1.1425 (max= 1.5614), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:07:41,128 - root - INFO - Step 810: lr=1.00E-05, loss= 1.1425 (max= 1.5614), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:07:41,128 - root - INFO - Step 810: lr=1.00E-05, loss= 1.1425 (max= 1.5614), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:07:41,128 - root - INFO - Step 810: lr=1.00E-05, loss= 1.1425 (max= 1.5614), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:08:12,984 - root - INFO - Step 820: lr=1.00E-05, loss= 1.1307 (max= 1.6597), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:08:12,984 - root - INFO - Step 820: lr=1.00E-05, loss= 1.1307 (max= 1.6597), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:08:12,984 - root - INFO - Step 820: lr=1.00E-05, loss= 1.1307 (max= 1.6597), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:08:12,984 - root - INFO - Step 820: lr=1.00E-05, loss= 1.1307 (max= 1.6597), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:08:12,984 - root - INFO - Step 820: lr=1.00E-05, loss= 1.1307 (max= 1.6597), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:08:12,984 - root - INFO - Step 820: lr=1.00E-05, loss= 1.1307 (max= 1.6597), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:08:12,984 - root - INFO - Step 820: lr=1.00E-05, loss= 1.1307 (max= 1.6597), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:08:12,984 - root - INFO - Step 820: lr=1.00E-05, loss= 1.1307 (max= 1.6597), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:08:44,817 - root - INFO - Step 830: lr=1.00E-05, loss= 1.1217 (max= 1.5419), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:08:44,817 - root - INFO - Step 830: lr=1.00E-05, loss= 1.1217 (max= 1.5419), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:08:44,817 - root - INFO - Step 830: lr=1.00E-05, loss= 1.1217 (max= 1.5419), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:08:44,817 - root - INFO - Step 830: lr=1.00E-05, loss= 1.1217 (max= 1.5419), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:08:44,817 - root - INFO - Step 830: lr=1.00E-05, loss= 1.1217 (max= 1.5419), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:08:44,817 - root - INFO - Step 830: lr=1.00E-05, loss= 1.1217 (max= 1.5419), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:08:44,817 - root - INFO - Step 830: lr=1.00E-05, loss= 1.1217 (max= 1.5419), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:08:44,817 - root - INFO - Step 830: lr=1.00E-05, loss= 1.1217 (max= 1.5419), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:09:16,719 - root - INFO - Step 840: lr=1.00E-05, loss= 1.1367 (max= 1.6228), tps=20545, mfu=42.81%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:09:16,719 - root - INFO - Step 840: lr=1.00E-05, loss= 1.1367 (max= 1.6228), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:09:16,719 - root - INFO - Step 840: lr=1.00E-05, loss= 1.1367 (max= 1.6228), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:09:16,719 - root - INFO - Step 840: lr=1.00E-05, loss= 1.1367 (max= 1.6228), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:09:16,719 - root - INFO - Step 840: lr=1.00E-05, loss= 1.1367 (max= 1.6228), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:09:16,719 - root - INFO - Step 840: lr=1.00E-05, loss= 1.1367 (max= 1.6228), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:09:16,719 - root - INFO - Step 840: lr=1.00E-05, loss= 1.1367 (max= 1.6228), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:09:16,719 - root - INFO - Step 840: lr=1.00E-05, loss= 1.1367 (max= 1.6228), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:09:48,546 - root - INFO - Step 850: lr=1.00E-05, loss= 1.1393 (max= 1.6020), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:09:48,546 - root - INFO - Step 850: lr=1.00E-05, loss= 1.1393 (max= 1.6020), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:09:48,546 - root - INFO - Step 850: lr=1.00E-05, loss= 1.1393 (max= 1.6020), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:09:48,546 - root - INFO - Step 850: lr=1.00E-05, loss= 1.1393 (max= 
1.6020), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:09:48,546 - root - INFO - Step 850: lr=1.00E-05, loss= 1.1393 (max= 1.6020), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:09:48,546 - root - INFO - Step 850: lr=1.00E-05, loss= 1.1393 (max= 1.6020), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:09:48,546 - root - INFO - Step 850: lr=1.00E-05, loss= 1.1393 (max= 1.6020), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:09:48,547 - root - INFO - Step 850: lr=1.00E-05, loss= 1.1393 (max= 1.6020), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:10:20,431 - root - INFO - Step 860: lr=1.00E-05, loss= 1.1481 (max= 1.6319), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:10:20,431 - root - INFO - Step 860: lr=1.00E-05, loss= 1.1481 (max= 1.6319), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:10:20,431 - root - INFO - Step 860: lr=1.00E-05, loss= 1.1481 (max= 1.6319), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:10:20,431 - root - INFO - Step 860: lr=1.00E-05, loss= 1.1481 (max= 1.6319), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:10:20,431 - root - INFO - Step 860: lr=1.00E-05, loss= 1.1481 (max= 1.6319), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:10:20,431 - root - INFO - Step 860: lr=1.00E-05, loss= 1.1481 (max= 1.6319), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:10:20,431 - root - INFO - Step 860: 
lr=1.00E-05, loss= 1.1481 (max= 1.6319), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:10:20,431 - root - INFO - Step 860: lr=1.00E-05, loss= 1.1481 (max= 1.6319), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:10:52,280 - root - INFO - Step 870: lr=1.00E-05, loss= 1.1264 (max= 1.6355), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:10:52,280 - root - INFO - Step 870: lr=1.00E-05, loss= 1.1264 (max= 1.6355), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:10:52,280 - root - INFO - Step 870: lr=1.00E-05, loss= 1.1264 (max= 1.6355), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:10:52,280 - root - INFO - Step 870: lr=1.00E-05, loss= 1.1264 (max= 1.6355), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:10:52,280 - root - INFO - Step 870: lr=1.00E-05, loss= 1.1264 (max= 1.6355), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:10:52,280 - root - INFO - Step 870: lr=1.00E-05, loss= 1.1264 (max= 1.6355), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:10:52,280 - root - INFO - Step 870: lr=1.00E-05, loss= 1.1264 (max= 1.6355), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:10:52,281 - root - INFO - Step 870: lr=1.00E-05, loss= 1.1264 (max= 1.6355), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:11:24,116 - root - INFO - Step 880: lr=1.00E-05, loss= 1.1408 (max= 1.6391), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:11:24,116 - 
root - INFO - Step 880: lr=1.00E-05, loss= 1.1408 (max= 1.6391), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:11:24,116 - root - INFO - Step 880: lr=1.00E-05, loss= 1.1408 (max= 1.6391), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:11:24,116 - root - INFO - Step 880: lr=1.00E-05, loss= 1.1408 (max= 1.6391), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:11:24,116 - root - INFO - Step 880: lr=1.00E-05, loss= 1.1408 (max= 1.6391), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:11:24,116 - root - INFO - Step 880: lr=1.00E-05, loss= 1.1408 (max= 1.6391), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:11:24,116 - root - INFO - Step 880: lr=1.00E-05, loss= 1.1408 (max= 1.6391), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:11:24,116 - root - INFO - Step 880: lr=1.00E-05, loss= 1.1408 (max= 1.6391), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:11:55,984 - root - INFO - Step 890: lr=1.00E-05, loss= 1.1347 (max= 1.5091), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:11:55,984 - root - INFO - Step 890: lr=1.00E-05, loss= 1.1347 (max= 1.5091), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:11:55,984 - root - INFO - Step 890: lr=1.00E-05, loss= 1.1347 (max= 1.5091), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:11:55,984 - root - INFO - Step 890: lr=1.00E-05, loss= 1.1347 (max= 1.5091), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 11:11:55,984 - root - INFO - Step 890: lr=1.00E-05, loss= 1.1347 (max= 1.5091), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:11:55,984 - root - INFO - Step 890: lr=1.00E-05, loss= 1.1347 (max= 1.5091), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:11:55,984 - root - INFO - Step 890: lr=1.00E-05, loss= 1.1347 (max= 1.5091), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:11:55,984 - root - INFO - Step 890: lr=1.00E-05, loss= 1.1347 (max= 1.5091), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:12:27,886 - root - INFO - Step 900: lr=1.00E-05, loss= 1.1508 (max= 1.6627), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:12:27,886 - root - INFO - Step 900: lr=1.00E-05, loss= 1.1508 (max= 1.6627), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:12:27,886 - root - INFO - Step 900: lr=1.00E-05, loss= 1.1508 (max= 1.6627), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:12:27,886 - root - INFO - Step 900: lr=1.00E-05, loss= 1.1508 (max= 1.6627), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:12:27,886 - root - INFO - Step 900: lr=1.00E-05, loss= 1.1508 (max= 1.6627), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:12:27,886 - root - INFO - Step 900: lr=1.00E-05, loss= 1.1508 (max= 1.6627), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:12:27,887 - root - INFO - Step 900: lr=1.00E-05, loss= 1.1508 (max= 1.6627), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:12:27,886 - root - INFO - Step 900: lr=1.00E-05, loss= 1.1508 (max= 1.6627), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:12:59,735 - root - INFO - Step 910: lr=1.00E-05, loss= 1.1380 (max= 1.6403), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:12:59,735 - root - INFO - Step 910: lr=1.00E-05, loss= 1.1380 (max= 1.6403), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:12:59,735 - root - INFO - Step 910: lr=1.00E-05, loss= 1.1380 (max= 1.6403), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:12:59,735 - root - INFO - Step 910: lr=1.00E-05, loss= 1.1380 (max= 1.6403), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:12:59,735 - root - INFO - Step 910: lr=1.00E-05, loss= 1.1380 (max= 1.6403), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:12:59,735 - root - INFO - Step 910: lr=1.00E-05, loss= 1.1380 (max= 1.6403), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:12:59,735 - root - INFO - Step 910: lr=1.00E-05, loss= 1.1380 (max= 1.6403), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:12:59,735 - root - INFO - Step 910: lr=1.00E-05, loss= 1.1380 (max= 1.6403), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:13:31,547 - root - INFO - Step 920: lr=1.00E-05, loss= 1.1186 (max= 1.8039), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:13:31,547 - root - INFO - Step 920: lr=1.00E-05, loss= 1.1186 (max= 1.8039), tps=20603, mfu=42.93%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:13:31,547 - root - INFO - Step 920: lr=1.00E-05, loss= 1.1186 (max= 1.8039), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:13:31,547 - root - INFO - Step 920: lr=1.00E-05, loss= 1.1186 (max= 1.8039), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:13:31,547 - root - INFO - Step 920: lr=1.00E-05, loss= 1.1186 (max= 1.8039), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:13:31,547 - root - INFO - Step 920: lr=1.00E-05, loss= 1.1186 (max= 1.8039), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:13:31,547 - root - INFO - Step 920: lr=1.00E-05, loss= 1.1186 (max= 1.8039), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:13:31,547 - root - INFO - Step 920: lr=1.00E-05, loss= 1.1186 (max= 1.8039), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:14:03,336 - root - INFO - Step 930: lr=1.00E-05, loss= 1.1284 (max= 1.6523), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:14:03,336 - root - INFO - Step 930: lr=1.00E-05, loss= 1.1284 (max= 1.6523), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:14:03,336 - root - INFO - Step 930: lr=1.00E-05, loss= 1.1284 (max= 1.6523), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:14:03,336 - root - INFO - Step 930: lr=1.00E-05, loss= 1.1284 (max= 1.6523), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:14:03,336 - root - INFO - Step 930: lr=1.00E-05, loss= 1.1284 (max= 
1.6523), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:14:03,336 - root - INFO - Step 930: lr=1.00E-05, loss= 1.1284 (max= 1.6523), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:14:03,336 - root - INFO - Step 930: lr=1.00E-05, loss= 1.1284 (max= 1.6523), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:14:03,336 - root - INFO - Step 930: lr=1.00E-05, loss= 1.1284 (max= 1.6523), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:14:35,223 - root - INFO - Step 940: lr=1.00E-05, loss= 1.1506 (max= 1.5624), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:14:35,224 - root - INFO - Step 940: lr=1.00E-05, loss= 1.1506 (max= 1.5624), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:14:35,224 - root - INFO - Step 940: lr=1.00E-05, loss= 1.1506 (max= 1.5624), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:14:35,224 - root - INFO - Step 940: lr=1.00E-05, loss= 1.1506 (max= 1.5624), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:14:35,224 - root - INFO - Step 940: lr=1.00E-05, loss= 1.1506 (max= 1.5624), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:14:35,224 - root - INFO - Step 940: lr=1.00E-05, loss= 1.1506 (max= 1.5624), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:14:35,224 - root - INFO - Step 940: lr=1.00E-05, loss= 1.1506 (max= 1.5624), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:14:35,224 - root - INFO - Step 940: 
lr=1.00E-05, loss= 1.1506 (max= 1.5624), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:15:07,145 - root - INFO - Step 950: lr=1.00E-05, loss= 1.1367 (max= 1.7283), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:15:07,145 - root - INFO - Step 950: lr=1.00E-05, loss= 1.1367 (max= 1.7283), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:15:07,145 - root - INFO - Step 950: lr=1.00E-05, loss= 1.1367 (max= 1.7283), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:15:07,145 - root - INFO - Step 950: lr=1.00E-05, loss= 1.1367 (max= 1.7283), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:15:07,145 - root - INFO - Step 950: lr=1.00E-05, loss= 1.1367 (max= 1.7283), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:15:07,145 - root - INFO - Step 950: lr=1.00E-05, loss= 1.1367 (max= 1.7283), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:15:07,145 - root - INFO - Step 950: lr=1.00E-05, loss= 1.1367 (max= 1.7283), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:15:07,145 - root - INFO - Step 950: lr=1.00E-05, loss= 1.1367 (max= 1.7283), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:15:38,945 - root - INFO - Step 960: lr=1.00E-05, loss= 1.1399 (max= 1.4701), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:15:38,945 - root - INFO - Step 960: lr=1.00E-05, loss= 1.1399 (max= 1.4701), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:15:38,945 - 
root - INFO - Step 960: lr=1.00E-05, loss= 1.1399 (max= 1.4701), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:15:38,945 - root - INFO - Step 960: lr=1.00E-05, loss= 1.1399 (max= 1.4701), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:15:38,945 - root - INFO - Step 960: lr=1.00E-05, loss= 1.1399 (max= 1.4701), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:15:38,945 - root - INFO - Step 960: lr=1.00E-05, loss= 1.1399 (max= 1.4701), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:15:38,945 - root - INFO - Step 960: lr=1.00E-05, loss= 1.1399 (max= 1.4701), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:15:38,945 - root - INFO - Step 960: lr=1.00E-05, loss= 1.1399 (max= 1.4701), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:16:10,821 - root - INFO - Step 970: lr=1.00E-05, loss= 1.1229 (max= 1.5021), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:16:10,821 - root - INFO - Step 970: lr=1.00E-05, loss= 1.1229 (max= 1.5021), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:16:10,821 - root - INFO - Step 970: lr=1.00E-05, loss= 1.1229 (max= 1.5021), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:16:10,821 - root - INFO - Step 970: lr=1.00E-05, loss= 1.1229 (max= 1.5021), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:16:10,821 - root - INFO - Step 970: lr=1.00E-05, loss= 1.1229 (max= 1.5021), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 11:16:10,821 - root - INFO - Step 970: lr=1.00E-05, loss= 1.1229 (max= 1.5021), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:16:10,821 - root - INFO - Step 970: lr=1.00E-05, loss= 1.1229 (max= 1.5021), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:16:10,821 - root - INFO - Step 970: lr=1.00E-05, loss= 1.1229 (max= 1.5021), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:16:42,700 - root - INFO - Step 980: lr=1.00E-05, loss= 1.1228 (max= 1.6732), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:16:42,700 - root - INFO - Step 980: lr=1.00E-05, loss= 1.1228 (max= 1.6732), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:16:42,700 - root - INFO - Step 980: lr=1.00E-05, loss= 1.1228 (max= 1.6732), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:16:42,700 - root - INFO - Step 980: lr=1.00E-05, loss= 1.1228 (max= 1.6732), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:16:42,700 - root - INFO - Step 980: lr=1.00E-05, loss= 1.1228 (max= 1.6732), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:16:42,700 - root - INFO - Step 980: lr=1.00E-05, loss= 1.1228 (max= 1.6732), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:16:42,700 - root - INFO - Step 980: lr=1.00E-05, loss= 1.1228 (max= 1.6732), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:16:42,700 - root - INFO - Step 980: lr=1.00E-05, loss= 1.1228 (max= 1.6732), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:17:14,465 - root - INFO - Step 990: lr=1.00E-05, loss= 1.1227 (max= 1.5690), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:17:14,465 - root - INFO - Step 990: lr=1.00E-05, loss= 1.1227 (max= 1.5690), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:17:14,465 - root - INFO - Step 990: lr=1.00E-05, loss= 1.1227 (max= 1.5690), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:17:14,465 - root - INFO - Step 990: lr=1.00E-05, loss= 1.1227 (max= 1.5690), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:17:14,465 - root - INFO - Step 990: lr=1.00E-05, loss= 1.1227 (max= 1.5690), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:17:14,465 - root - INFO - Step 990: lr=1.00E-05, loss= 1.1227 (max= 1.5690), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:17:14,465 - root - INFO - Step 990: lr=1.00E-05, loss= 1.1227 (max= 1.5690), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:17:14,465 - root - INFO - Step 990: lr=1.00E-05, loss= 1.1227 (max= 1.5690), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-pt/checkpoints/dataloader/step-1000 +Dataset successfully saved to jobs/munin-7b-open-pt/checkpoints/dataloader/step-1000! 
Save time: 4.524025201797485 +2025-10-25 11:17:46,343 - root - INFO - Step 1000: lr=1.00E-05, loss= 1.1110 (max= 1.4514), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:17:46,343 - root - INFO - Step 1000: lr=1.00E-05, loss= 1.1110 (max= 1.4514), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:17:46,343 - root - INFO - Step 1000: lr=1.00E-05, loss= 1.1110 (max= 1.4514), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:17:46,343 - root - INFO - Saving a full checkpoint at step 1000 +2025-10-25 11:17:46,343 - root - INFO - Step 1000: lr=1.00E-05, loss= 1.1110 (max= 1.4514), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:17:46,343 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 11:17:46,343 - root - INFO - Step 1000: lr=1.00E-05, loss= 1.1110 (max= 1.4514), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:17:46,343 - root - INFO - Saving a full checkpoint at step 1000 +2025-10-25 11:17:46,343 - root - INFO - Saving a full checkpoint at step 1000 +2025-10-25 11:17:46,343 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 11:17:46,343 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 11:17:46,343 - root - INFO - Saving a full checkpoint at step 1000 +2025-10-25 11:17:46,343 - root - INFO - Saving a full checkpoint at step 1000 +2025-10-25 11:17:46,343 - root - INFO - Step 1000: lr=1.00E-05, loss= 1.1110 (max= 1.4514), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:17:46,343 - root - INFO 
- CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 11:17:46,343 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 11:17:46,343 - root - INFO - Step 1000: lr=1.00E-05, loss= 1.1110 (max= 1.4514), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:17:46,343 - root - INFO - Saving a full checkpoint at step 1000 +2025-10-25 11:17:46,343 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 11:17:46,343 - root - INFO - Saving a full checkpoint at step 1000 +2025-10-25 11:17:46,343 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 11:17:46,344 - root - INFO - Step 1000: lr=1.00E-05, loss= 1.1110 (max= 1.4514), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:17:46,344 - root - INFO - Saving a full checkpoint at step 1000 +2025-10-25 11:17:46,344 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 11:18:12,631 - root - INFO - Finished saving the checkpoint in 26.29 seconds +2025-10-25 11:18:12,640 - root - INFO - Finished saving the checkpoint in 26.30 seconds +2025-10-25 11:18:12,641 - root - INFO - Finished saving the checkpoint in 26.30 seconds +2025-10-25 11:18:12,641 - root - INFO - Finished saving the checkpoint in 26.30 seconds +2025-10-25 11:18:12,642 - root - INFO - Finished saving the checkpoint in 26.30 seconds +2025-10-25 11:18:12,644 - root - INFO - Finished saving the checkpoint in 26.30 seconds +2025-10-25 11:18:12,644 - root - INFO - Finished saving the checkpoint in 26.30 seconds +2025-10-25 11:18:12,647 - root - INFO - Finished saving the checkpoint in 26.30 seconds 
+2025-10-25 11:18:19,550 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:5901498 +2025-10-25 11:18:44,410 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.1184 (max= 1.6060), tps=11287, mfu=23.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:18:44,410 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.1184 (max= 1.6060), tps=11287, mfu=23.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:18:44,410 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.1184 (max= 1.6060), tps=11287, mfu=23.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:18:44,410 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.1184 (max= 1.6060), tps=11287, mfu=23.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:18:44,410 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.1184 (max= 1.6060), tps=11287, mfu=23.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:18:44,410 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.1184 (max= 1.6060), tps=11287, mfu=23.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:18:44,410 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.1184 (max= 1.6060), tps=11287, mfu=23.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:18:44,411 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.1184 (max= 1.6060), tps=11287, mfu=23.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:19:16,325 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.1067 (max= 1.5195), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:19:16,326 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.1067 (max= 1.5195), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:19:16,326 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.1067 (max= 1.5195), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:19:16,326 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.1067 (max= 1.5195), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:19:16,326 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.1067 (max= 1.5195), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:19:16,326 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.1067 (max= 1.5195), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:19:16,326 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.1067 (max= 1.5195), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:19:16,326 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.1067 (max= 1.5195), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:19:48,123 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.1098 (max= 1.5581), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:19:48,123 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.1098 (max= 1.5581), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:19:48,123 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.1098 (max= 1.5581), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:19:48,123 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.1098 (max= 1.5581), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:19:48,123 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.1098 (max= 1.5581), tps=20613, 
mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:19:48,123 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.1098 (max= 1.5581), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:19:48,123 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.1098 (max= 1.5581), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:19:48,124 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.1098 (max= 1.5581), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:20:19,945 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.1296 (max= 1.6791), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:20:19,945 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.1296 (max= 1.6791), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:20:19,945 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.1296 (max= 1.6791), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:20:19,945 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.1296 (max= 1.6791), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:20:19,945 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.1296 (max= 1.6791), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:20:19,945 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.1296 (max= 1.6791), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:20:19,945 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.1296 (max= 1.6791), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:20:19,945 - root - INFO - Step 1040: lr=1.00E-05, 
loss= 1.1296 (max= 1.6791), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:20:51,863 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.1291 (max= 1.5356), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:20:51,864 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.1291 (max= 1.5356), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:20:51,864 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.1291 (max= 1.5356), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:20:51,864 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.1291 (max= 1.5356), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:20:51,864 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.1291 (max= 1.5356), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:20:51,864 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.1291 (max= 1.5356), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:20:51,864 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.1291 (max= 1.5356), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:20:51,864 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.1291 (max= 1.5356), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:21:23,703 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.1298 (max= 1.5359), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:21:23,703 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.1298 (max= 1.5359), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:21:23,704 - 
root - INFO - Step 1060: lr=1.00E-05, loss= 1.1298 (max= 1.5359), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:21:23,704 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.1298 (max= 1.5359), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:21:23,704 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.1298 (max= 1.5359), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:21:23,704 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.1298 (max= 1.5359), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:21:23,704 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.1298 (max= 1.5359), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:21:23,704 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.1298 (max= 1.5359), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:21:55,510 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.1250 (max= 1.9029), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:21:55,510 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.1250 (max= 1.9029), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:21:55,510 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.1250 (max= 1.9029), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:21:55,510 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.1250 (max= 1.9029), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:21:55,510 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.1250 (max= 1.9029), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 11:21:55,510 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.1250 (max= 1.9029), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:21:55,510 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.1250 (max= 1.9029), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:21:55,510 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.1250 (max= 1.9029), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:22:27,438 - root - INFO - Step 1080: lr=1.00E-05, loss= 1.1299 (max= 1.5668), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:22:27,438 - root - INFO - Step 1080: lr=1.00E-05, loss= 1.1299 (max= 1.5668), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:22:27,438 - root - INFO - Step 1080: lr=1.00E-05, loss= 1.1299 (max= 1.5668), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:22:27,438 - root - INFO - Step 1080: lr=1.00E-05, loss= 1.1299 (max= 1.5668), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:22:27,438 - root - INFO - Step 1080: lr=1.00E-05, loss= 1.1299 (max= 1.5668), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:22:27,438 - root - INFO - Step 1080: lr=1.00E-05, loss= 1.1299 (max= 1.5668), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:22:27,438 - root - INFO - Step 1080: lr=1.00E-05, loss= 1.1299 (max= 1.5668), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:22:27,438 - root - INFO - Step 1080: lr=1.00E-05, loss= 1.1299 (max= 1.5668), tps=20528, mfu=42.77%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:22:59,264 - root - INFO - Step 1090: lr=1.00E-05, loss= 1.1357 (max= 1.6178), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:22:59,264 - root - INFO - Step 1090: lr=1.00E-05, loss= 1.1357 (max= 1.6178), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:22:59,264 - root - INFO - Step 1090: lr=1.00E-05, loss= 1.1357 (max= 1.6178), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:22:59,264 - root - INFO - Step 1090: lr=1.00E-05, loss= 1.1357 (max= 1.6178), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:22:59,264 - root - INFO - Step 1090: lr=1.00E-05, loss= 1.1357 (max= 1.6178), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:22:59,264 - root - INFO - Step 1090: lr=1.00E-05, loss= 1.1357 (max= 1.6178), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:22:59,264 - root - INFO - Step 1090: lr=1.00E-05, loss= 1.1357 (max= 1.6178), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:22:59,264 - root - INFO - Step 1090: lr=1.00E-05, loss= 1.1357 (max= 1.6178), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:23:31,126 - root - INFO - Step 1100: lr=1.00E-05, loss= 1.1208 (max= 1.4881), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:23:31,126 - root - INFO - Step 1100: lr=1.00E-05, loss= 1.1208 (max= 1.4881), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:23:31,126 - root - INFO - Step 1100: lr=1.00E-05, loss= 1.1208 (max= 
1.4881), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:23:31,126 - root - INFO - Step 1100: lr=1.00E-05, loss= 1.1208 (max= 1.4881), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:23:31,126 - root - INFO - Step 1100: lr=1.00E-05, loss= 1.1208 (max= 1.4881), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:23:31,126 - root - INFO - Step 1100: lr=1.00E-05, loss= 1.1208 (max= 1.4881), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:23:31,126 - root - INFO - Step 1100: lr=1.00E-05, loss= 1.1208 (max= 1.4881), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:23:31,126 - root - INFO - Step 1100: lr=1.00E-05, loss= 1.1208 (max= 1.4881), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:24:02,980 - root - INFO - Step 1110: lr=1.00E-05, loss= 1.1112 (max= 1.5288), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:24:02,980 - root - INFO - Step 1110: lr=1.00E-05, loss= 1.1112 (max= 1.5288), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:24:02,980 - root - INFO - Step 1110: lr=1.00E-05, loss= 1.1112 (max= 1.5288), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:24:02,980 - root - INFO - Step 1110: lr=1.00E-05, loss= 1.1112 (max= 1.5288), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:24:02,980 - root - INFO - Step 1110: lr=1.00E-05, loss= 1.1112 (max= 1.5288), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:24:02,980 - root - INFO - Step 
1110: lr=1.00E-05, loss= 1.1112 (max= 1.5288), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:24:02,980 - root - INFO - Step 1110: lr=1.00E-05, loss= 1.1112 (max= 1.5288), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:24:02,980 - root - INFO - Step 1110: lr=1.00E-05, loss= 1.1112 (max= 1.5288), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:24:34,866 - root - INFO - Step 1120: lr=1.00E-05, loss= 1.1123 (max= 1.5964), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:24:34,866 - root - INFO - Step 1120: lr=1.00E-05, loss= 1.1123 (max= 1.5964), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:24:34,866 - root - INFO - Step 1120: lr=1.00E-05, loss= 1.1123 (max= 1.5964), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:24:34,866 - root - INFO - Step 1120: lr=1.00E-05, loss= 1.1123 (max= 1.5964), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:24:34,866 - root - INFO - Step 1120: lr=1.00E-05, loss= 1.1123 (max= 1.5964), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:24:34,866 - root - INFO - Step 1120: lr=1.00E-05, loss= 1.1123 (max= 1.5964), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:24:34,866 - root - INFO - Step 1120: lr=1.00E-05, loss= 1.1123 (max= 1.5964), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:24:34,866 - root - INFO - Step 1120: lr=1.00E-05, loss= 1.1123 (max= 1.5964), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 11:25:06,695 - root - INFO - Step 1130: lr=1.00E-05, loss= 1.1067 (max= 1.5449), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:25:06,695 - root - INFO - Step 1130: lr=1.00E-05, loss= 1.1067 (max= 1.5449), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:25:06,695 - root - INFO - Step 1130: lr=1.00E-05, loss= 1.1067 (max= 1.5449), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:25:06,695 - root - INFO - Step 1130: lr=1.00E-05, loss= 1.1067 (max= 1.5449), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:25:06,695 - root - INFO - Step 1130: lr=1.00E-05, loss= 1.1067 (max= 1.5449), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:25:06,695 - root - INFO - Step 1130: lr=1.00E-05, loss= 1.1067 (max= 1.5449), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:25:06,695 - root - INFO - Step 1130: lr=1.00E-05, loss= 1.1067 (max= 1.5449), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:25:06,695 - root - INFO - Step 1130: lr=1.00E-05, loss= 1.1067 (max= 1.5449), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:25:38,560 - root - INFO - Step 1140: lr=1.00E-05, loss= 1.1165 (max= 1.7062), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:25:38,560 - root - INFO - Step 1140: lr=1.00E-05, loss= 1.1165 (max= 1.7062), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:25:38,560 - root - INFO - Step 1140: lr=1.00E-05, loss= 1.1165 (max= 1.7062), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:25:38,560 - root - INFO - Step 1140: lr=1.00E-05, loss= 1.1165 (max= 1.7062), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:25:38,560 - root - INFO - Step 1140: lr=1.00E-05, loss= 1.1165 (max= 1.7062), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:25:38,560 - root - INFO - Step 1140: lr=1.00E-05, loss= 1.1165 (max= 1.7062), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:25:38,560 - root - INFO - Step 1140: lr=1.00E-05, loss= 1.1165 (max= 1.7062), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:25:38,560 - root - INFO - Step 1140: lr=1.00E-05, loss= 1.1165 (max= 1.7062), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:26:10,414 - root - INFO - Step 1150: lr=1.00E-05, loss= 1.1119 (max= 1.4958), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:26:10,415 - root - INFO - Step 1150: lr=1.00E-05, loss= 1.1119 (max= 1.4958), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:26:10,415 - root - INFO - Step 1150: lr=1.00E-05, loss= 1.1119 (max= 1.4958), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:26:10,415 - root - INFO - Step 1150: lr=1.00E-05, loss= 1.1119 (max= 1.4958), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:26:10,415 - root - INFO - Step 1150: lr=1.00E-05, loss= 1.1119 (max= 1.4958), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:26:10,415 - root - INFO - Step 1150: lr=1.00E-05, loss= 1.1119 (max= 1.4958), tps=20576, 
mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:26:10,415 - root - INFO - Step 1150: lr=1.00E-05, loss= 1.1119 (max= 1.4958), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:26:10,415 - root - INFO - Step 1150: lr=1.00E-05, loss= 1.1119 (max= 1.4958), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:26:42,255 - root - INFO - Step 1160: lr=1.00E-05, loss= 1.1285 (max= 1.4923), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:26:42,255 - root - INFO - Step 1160: lr=1.00E-05, loss= 1.1285 (max= 1.4923), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:26:42,255 - root - INFO - Step 1160: lr=1.00E-05, loss= 1.1285 (max= 1.4923), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:26:42,255 - root - INFO - Step 1160: lr=1.00E-05, loss= 1.1285 (max= 1.4923), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:26:42,255 - root - INFO - Step 1160: lr=1.00E-05, loss= 1.1285 (max= 1.4923), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:26:42,255 - root - INFO - Step 1160: lr=1.00E-05, loss= 1.1285 (max= 1.4923), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:26:42,255 - root - INFO - Step 1160: lr=1.00E-05, loss= 1.1285 (max= 1.4923), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:26:42,255 - root - INFO - Step 1160: lr=1.00E-05, loss= 1.1285 (max= 1.4923), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:27:14,111 - root - INFO - Step 1170: lr=1.00E-05, 
loss= 1.1184 (max= 1.5537), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:27:14,111 - root - INFO - Step 1170: lr=1.00E-05, loss= 1.1184 (max= 1.5537), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:27:14,111 - root - INFO - Step 1170: lr=1.00E-05, loss= 1.1184 (max= 1.5537), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:27:14,111 - root - INFO - Step 1170: lr=1.00E-05, loss= 1.1184 (max= 1.5537), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:27:14,111 - root - INFO - Step 1170: lr=1.00E-05, loss= 1.1184 (max= 1.5537), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:27:14,111 - root - INFO - Step 1170: lr=1.00E-05, loss= 1.1184 (max= 1.5537), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:27:14,111 - root - INFO - Step 1170: lr=1.00E-05, loss= 1.1184 (max= 1.5537), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:27:14,111 - root - INFO - Step 1170: lr=1.00E-05, loss= 1.1184 (max= 1.5537), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:27:45,944 - root - INFO - Step 1180: lr=1.00E-05, loss= 1.1323 (max= 1.6708), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:27:45,944 - root - INFO - Step 1180: lr=1.00E-05, loss= 1.1323 (max= 1.6708), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:27:45,944 - root - INFO - Step 1180: lr=1.00E-05, loss= 1.1323 (max= 1.6708), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:27:45,944 - 
root - INFO - Step 1180: lr=1.00E-05, loss= 1.1323 (max= 1.6708), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:27:45,944 - root - INFO - Step 1180: lr=1.00E-05, loss= 1.1323 (max= 1.6708), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:27:45,944 - root - INFO - Step 1180: lr=1.00E-05, loss= 1.1323 (max= 1.6708), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:27:45,944 - root - INFO - Step 1180: lr=1.00E-05, loss= 1.1323 (max= 1.6708), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:27:45,945 - root - INFO - Step 1180: lr=1.00E-05, loss= 1.1323 (max= 1.6708), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:28:17,841 - root - INFO - Step 1190: lr=1.00E-05, loss= 1.0782 (max= 1.7329), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:28:17,841 - root - INFO - Step 1190: lr=1.00E-05, loss= 1.0782 (max= 1.7329), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:28:17,841 - root - INFO - Step 1190: lr=1.00E-05, loss= 1.0782 (max= 1.7329), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:28:17,841 - root - INFO - Step 1190: lr=1.00E-05, loss= 1.0782 (max= 1.7329), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:28:17,841 - root - INFO - Step 1190: lr=1.00E-05, loss= 1.0782 (max= 1.7329), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:28:17,841 - root - INFO - Step 1190: lr=1.00E-05, loss= 1.0782 (max= 1.7329), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 11:28:17,841 - root - INFO - Step 1190: lr=1.00E-05, loss= 1.0782 (max= 1.7329), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:28:17,841 - root - INFO - Step 1190: lr=1.00E-05, loss= 1.0782 (max= 1.7329), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:28:49,763 - root - INFO - Step 1200: lr=1.00E-05, loss= 1.1195 (max= 1.4936), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:28:49,763 - root - INFO - Step 1200: lr=1.00E-05, loss= 1.1195 (max= 1.4936), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:28:49,763 - root - INFO - Step 1200: lr=1.00E-05, loss= 1.1195 (max= 1.4936), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:28:49,763 - root - INFO - Step 1200: lr=1.00E-05, loss= 1.1195 (max= 1.4936), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:28:49,763 - root - INFO - Step 1200: lr=1.00E-05, loss= 1.1195 (max= 1.4936), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:28:49,763 - root - INFO - Step 1200: lr=1.00E-05, loss= 1.1195 (max= 1.4936), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:28:49,763 - root - INFO - Step 1200: lr=1.00E-05, loss= 1.1195 (max= 1.4936), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:28:49,763 - root - INFO - Step 1200: lr=1.00E-05, loss= 1.1195 (max= 1.4936), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:28:58,470 - root - WARNING - Empty document detected at 
/work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:330705 +2025-10-25 11:29:21,618 - root - INFO - Step 1210: lr=1.00E-05, loss= 1.1464 (max= 1.5264), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:29:21,618 - root - INFO - Step 1210: lr=1.00E-05, loss= 1.1464 (max= 1.5264), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:29:21,618 - root - INFO - Step 1210: lr=1.00E-05, loss= 1.1464 (max= 1.5264), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:29:21,618 - root - INFO - Step 1210: lr=1.00E-05, loss= 1.1464 (max= 1.5264), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:29:21,618 - root - INFO - Step 1210: lr=1.00E-05, loss= 1.1464 (max= 1.5264), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:29:21,618 - root - INFO - Step 1210: lr=1.00E-05, loss= 1.1464 (max= 1.5264), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:29:21,618 - root - INFO - Step 1210: lr=1.00E-05, loss= 1.1464 (max= 1.5264), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:29:21,618 - root - INFO - Step 1210: lr=1.00E-05, loss= 1.1464 (max= 1.5264), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:29:53,491 - root - INFO - Step 1220: lr=1.00E-05, loss= 1.1104 (max= 1.5991), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:29:53,491 - root - INFO - Step 1220: lr=1.00E-05, loss= 1.1104 (max= 1.5991), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:29:53,491 - root - 
INFO - Step 1220: lr=1.00E-05, loss= 1.1104 (max= 1.5991), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:29:53,491 - root - INFO - Step 1220: lr=1.00E-05, loss= 1.1104 (max= 1.5991), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:29:53,491 - root - INFO - Step 1220: lr=1.00E-05, loss= 1.1104 (max= 1.5991), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:29:53,491 - root - INFO - Step 1220: lr=1.00E-05, loss= 1.1104 (max= 1.5991), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:29:53,491 - root - INFO - Step 1220: lr=1.00E-05, loss= 1.1104 (max= 1.5991), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:29:53,492 - root - INFO - Step 1220: lr=1.00E-05, loss= 1.1104 (max= 1.5991), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:30:25,410 - root - INFO - Step 1230: lr=1.00E-05, loss= 1.1305 (max= 1.6453), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:30:25,411 - root - INFO - Step 1230: lr=1.00E-05, loss= 1.1305 (max= 1.6453), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:30:25,411 - root - INFO - Step 1230: lr=1.00E-05, loss= 1.1305 (max= 1.6453), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:30:25,411 - root - INFO - Step 1230: lr=1.00E-05, loss= 1.1305 (max= 1.6453), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:30:25,411 - root - INFO - Step 1230: lr=1.00E-05, loss= 1.1305 (max= 1.6453), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 
0.01%) +2025-10-25 11:30:25,411 - root - INFO - Step 1230: lr=1.00E-05, loss= 1.1305 (max= 1.6453), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:30:25,411 - root - INFO - Step 1230: lr=1.00E-05, loss= 1.1305 (max= 1.6453), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:30:25,411 - root - INFO - Step 1230: lr=1.00E-05, loss= 1.1305 (max= 1.6453), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:30:57,271 - root - INFO - Step 1240: lr=1.00E-05, loss= 1.1019 (max= 1.5244), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:30:57,271 - root - INFO - Step 1240: lr=1.00E-05, loss= 1.1019 (max= 1.5244), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:30:57,271 - root - INFO - Step 1240: lr=1.00E-05, loss= 1.1019 (max= 1.5244), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:30:57,271 - root - INFO - Step 1240: lr=1.00E-05, loss= 1.1019 (max= 1.5244), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:30:57,271 - root - INFO - Step 1240: lr=1.00E-05, loss= 1.1019 (max= 1.5244), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:30:57,271 - root - INFO - Step 1240: lr=1.00E-05, loss= 1.1019 (max= 1.5244), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:30:57,271 - root - INFO - Step 1240: lr=1.00E-05, loss= 1.1019 (max= 1.5244), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:30:57,271 - root - INFO - Step 1240: lr=1.00E-05, loss= 1.1019 (max= 1.5244), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:30:59,621 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:4518290 +2025-10-25 11:31:29,110 - root - INFO - Step 1250: lr=1.00E-05, loss= 1.1268 (max= 1.6368), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:31:29,110 - root - INFO - Step 1250: lr=1.00E-05, loss= 1.1268 (max= 1.6368), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:31:29,110 - root - INFO - Step 1250: lr=1.00E-05, loss= 1.1268 (max= 1.6368), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:31:29,110 - root - INFO - Step 1250: lr=1.00E-05, loss= 1.1268 (max= 1.6368), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:31:29,110 - root - INFO - Step 1250: lr=1.00E-05, loss= 1.1268 (max= 1.6368), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:31:29,110 - root - INFO - Step 1250: lr=1.00E-05, loss= 1.1268 (max= 1.6368), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:31:29,110 - root - INFO - Step 1250: lr=1.00E-05, loss= 1.1268 (max= 1.6368), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:31:29,110 - root - INFO - Step 1250: lr=1.00E-05, loss= 1.1268 (max= 1.6368), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:32:00,951 - root - INFO - Step 1260: lr=1.00E-05, loss= 1.1001 (max= 1.5935), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:32:00,951 - root - INFO - Step 1260: lr=1.00E-05, loss= 1.1001 (max= 1.5935), tps=20585, 
mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:32:00,951 - root - INFO - Step 1260: lr=1.00E-05, loss= 1.1001 (max= 1.5935), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:32:00,951 - root - INFO - Step 1260: lr=1.00E-05, loss= 1.1001 (max= 1.5935), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:32:00,951 - root - INFO - Step 1260: lr=1.00E-05, loss= 1.1001 (max= 1.5935), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:32:00,951 - root - INFO - Step 1260: lr=1.00E-05, loss= 1.1001 (max= 1.5935), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:32:00,951 - root - INFO - Step 1260: lr=1.00E-05, loss= 1.1001 (max= 1.5935), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:32:00,951 - root - INFO - Step 1260: lr=1.00E-05, loss= 1.1001 (max= 1.5935), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:32:32,783 - root - INFO - Step 1270: lr=1.00E-05, loss= 1.1268 (max= 1.5197), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:32:32,784 - root - INFO - Step 1270: lr=1.00E-05, loss= 1.1268 (max= 1.5197), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:32:32,784 - root - INFO - Step 1270: lr=1.00E-05, loss= 1.1268 (max= 1.5197), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:32:32,784 - root - INFO - Step 1270: lr=1.00E-05, loss= 1.1268 (max= 1.5197), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:32:32,784 - root - INFO - Step 1270: lr=1.00E-05, 
loss= 1.1268 (max= 1.5197), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:32:32,784 - root - INFO - Step 1270: lr=1.00E-05, loss= 1.1268 (max= 1.5197), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:32:32,784 - root - INFO - Step 1270: lr=1.00E-05, loss= 1.1268 (max= 1.5197), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:32:32,784 - root - INFO - Step 1270: lr=1.00E-05, loss= 1.1268 (max= 1.5197), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:33:04,696 - root - INFO - Step 1280: lr=1.00E-05, loss= 1.1214 (max= 1.5476), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:33:04,696 - root - INFO - Step 1280: lr=1.00E-05, loss= 1.1214 (max= 1.5476), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:33:04,696 - root - INFO - Step 1280: lr=1.00E-05, loss= 1.1214 (max= 1.5476), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:33:04,696 - root - INFO - Step 1280: lr=1.00E-05, loss= 1.1214 (max= 1.5476), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:33:04,696 - root - INFO - Step 1280: lr=1.00E-05, loss= 1.1214 (max= 1.5476), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:33:04,696 - root - INFO - Step 1280: lr=1.00E-05, loss= 1.1214 (max= 1.5476), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:33:04,696 - root - INFO - Step 1280: lr=1.00E-05, loss= 1.1214 (max= 1.5476), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:33:04,696 - 
root - INFO - Step 1280: lr=1.00E-05, loss= 1.1214 (max= 1.5476), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:33:36,613 - root - INFO - Step 1290: lr=1.00E-05, loss= 1.1180 (max= 1.5248), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:33:36,613 - root - INFO - Step 1290: lr=1.00E-05, loss= 1.1180 (max= 1.5248), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:33:36,613 - root - INFO - Step 1290: lr=1.00E-05, loss= 1.1180 (max= 1.5248), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:33:36,613 - root - INFO - Step 1290: lr=1.00E-05, loss= 1.1180 (max= 1.5248), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:33:36,613 - root - INFO - Step 1290: lr=1.00E-05, loss= 1.1180 (max= 1.5248), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:33:36,613 - root - INFO - Step 1290: lr=1.00E-05, loss= 1.1180 (max= 1.5248), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:33:36,613 - root - INFO - Step 1290: lr=1.00E-05, loss= 1.1180 (max= 1.5248), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:33:36,613 - root - INFO - Step 1290: lr=1.00E-05, loss= 1.1180 (max= 1.5248), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:34:08,475 - root - INFO - Step 1300: lr=1.00E-05, loss= 1.1150 (max= 1.5294), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:34:08,475 - root - INFO - Step 1300: lr=1.00E-05, loss= 1.1150 (max= 1.5294), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 11:34:08,475 - root - INFO - Step 1300: lr=1.00E-05, loss= 1.1150 (max= 1.5294), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:34:08,475 - root - INFO - Step 1300: lr=1.00E-05, loss= 1.1150 (max= 1.5294), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:34:08,475 - root - INFO - Step 1300: lr=1.00E-05, loss= 1.1150 (max= 1.5294), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:34:08,475 - root - INFO - Step 1300: lr=1.00E-05, loss= 1.1150 (max= 1.5294), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:34:08,475 - root - INFO - Step 1300: lr=1.00E-05, loss= 1.1150 (max= 1.5294), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:34:08,475 - root - INFO - Step 1300: lr=1.00E-05, loss= 1.1150 (max= 1.5294), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:34:40,357 - root - INFO - Step 1310: lr=1.00E-05, loss= 1.1043 (max= 1.6829), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:34:40,358 - root - INFO - Step 1310: lr=1.00E-05, loss= 1.1043 (max= 1.6829), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:34:40,358 - root - INFO - Step 1310: lr=1.00E-05, loss= 1.1043 (max= 1.6829), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:34:40,358 - root - INFO - Step 1310: lr=1.00E-05, loss= 1.1043 (max= 1.6829), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:34:40,358 - root - INFO - Step 1310: lr=1.00E-05, loss= 1.1043 (max= 1.6829), tps=20558, mfu=42.83%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:34:40,358 - root - INFO - Step 1310: lr=1.00E-05, loss= 1.1043 (max= 1.6829), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:34:40,358 - root - INFO - Step 1310: lr=1.00E-05, loss= 1.1043 (max= 1.6829), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:34:40,358 - root - INFO - Step 1310: lr=1.00E-05, loss= 1.1043 (max= 1.6829), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:35:12,220 - root - INFO - Step 1320: lr=1.00E-05, loss= 1.0984 (max= 1.5816), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:35:12,220 - root - INFO - Step 1320: lr=1.00E-05, loss= 1.0984 (max= 1.5816), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:35:12,220 - root - INFO - Step 1320: lr=1.00E-05, loss= 1.0984 (max= 1.5816), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:35:12,220 - root - INFO - Step 1320: lr=1.00E-05, loss= 1.0984 (max= 1.5816), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:35:12,220 - root - INFO - Step 1320: lr=1.00E-05, loss= 1.0984 (max= 1.5816), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:35:12,220 - root - INFO - Step 1320: lr=1.00E-05, loss= 1.0984 (max= 1.5816), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:35:12,220 - root - INFO - Step 1320: lr=1.00E-05, loss= 1.0984 (max= 1.5816), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:35:12,220 - root - INFO - Step 1320: lr=1.00E-05, loss= 1.0984 (max= 
1.5816), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:35:44,167 - root - INFO - Step 1330: lr=1.00E-05, loss= 1.1116 (max= 1.7423), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:35:44,167 - root - INFO - Step 1330: lr=1.00E-05, loss= 1.1116 (max= 1.7423), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:35:44,167 - root - INFO - Step 1330: lr=1.00E-05, loss= 1.1116 (max= 1.7423), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:35:44,167 - root - INFO - Step 1330: lr=1.00E-05, loss= 1.1116 (max= 1.7423), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:35:44,168 - root - INFO - Step 1330: lr=1.00E-05, loss= 1.1116 (max= 1.7423), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:35:44,168 - root - INFO - Step 1330: lr=1.00E-05, loss= 1.1116 (max= 1.7423), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:35:44,168 - root - INFO - Step 1330: lr=1.00E-05, loss= 1.1116 (max= 1.7423), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:35:44,168 - root - INFO - Step 1330: lr=1.00E-05, loss= 1.1116 (max= 1.7423), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:36:16,074 - root - INFO - Step 1340: lr=1.00E-05, loss= 1.1240 (max= 1.4913), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:36:16,074 - root - INFO - Step 1340: lr=1.00E-05, loss= 1.1240 (max= 1.4913), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:36:16,074 - root - INFO - Step 
1340: lr=1.00E-05, loss= 1.1240 (max= 1.4913), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:36:16,075 - root - INFO - Step 1340: lr=1.00E-05, loss= 1.1240 (max= 1.4913), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:36:16,075 - root - INFO - Step 1340: lr=1.00E-05, loss= 1.1240 (max= 1.4913), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:36:16,075 - root - INFO - Step 1340: lr=1.00E-05, loss= 1.1240 (max= 1.4913), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:36:16,075 - root - INFO - Step 1340: lr=1.00E-05, loss= 1.1240 (max= 1.4913), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:36:16,075 - root - INFO - Step 1340: lr=1.00E-05, loss= 1.1240 (max= 1.4913), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:36:47,930 - root - INFO - Step 1350: lr=1.00E-05, loss= 1.0977 (max= 1.5082), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:36:47,930 - root - INFO - Step 1350: lr=1.00E-05, loss= 1.0977 (max= 1.5082), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:36:47,930 - root - INFO - Step 1350: lr=1.00E-05, loss= 1.0977 (max= 1.5082), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:36:47,930 - root - INFO - Step 1350: lr=1.00E-05, loss= 1.0977 (max= 1.5082), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:36:47,930 - root - INFO - Step 1350: lr=1.00E-05, loss= 1.0977 (max= 1.5082), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 11:36:47,930 - root - INFO - Step 1350: lr=1.00E-05, loss= 1.0977 (max= 1.5082), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:36:47,930 - root - INFO - Step 1350: lr=1.00E-05, loss= 1.0977 (max= 1.5082), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:36:47,930 - root - INFO - Step 1350: lr=1.00E-05, loss= 1.0977 (max= 1.5082), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:37:19,755 - root - INFO - Step 1360: lr=1.00E-05, loss= 1.1210 (max= 1.5434), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:37:19,755 - root - INFO - Step 1360: lr=1.00E-05, loss= 1.1210 (max= 1.5434), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:37:19,755 - root - INFO - Step 1360: lr=1.00E-05, loss= 1.1210 (max= 1.5434), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:37:19,755 - root - INFO - Step 1360: lr=1.00E-05, loss= 1.1210 (max= 1.5434), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:37:19,755 - root - INFO - Step 1360: lr=1.00E-05, loss= 1.1210 (max= 1.5434), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:37:19,755 - root - INFO - Step 1360: lr=1.00E-05, loss= 1.1210 (max= 1.5434), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:37:19,755 - root - INFO - Step 1360: lr=1.00E-05, loss= 1.1210 (max= 1.5434), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:37:19,755 - root - INFO - Step 1360: lr=1.00E-05, loss= 1.1210 (max= 1.5434), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:37:51,632 - root - INFO - Step 1370: lr=1.00E-05, loss= 1.1104 (max= 1.5349), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:37:51,632 - root - INFO - Step 1370: lr=1.00E-05, loss= 1.1104 (max= 1.5349), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:37:51,632 - root - INFO - Step 1370: lr=1.00E-05, loss= 1.1104 (max= 1.5349), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:37:51,632 - root - INFO - Step 1370: lr=1.00E-05, loss= 1.1104 (max= 1.5349), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:37:51,632 - root - INFO - Step 1370: lr=1.00E-05, loss= 1.1104 (max= 1.5349), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:37:51,632 - root - INFO - Step 1370: lr=1.00E-05, loss= 1.1104 (max= 1.5349), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:37:51,632 - root - INFO - Step 1370: lr=1.00E-05, loss= 1.1104 (max= 1.5349), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:37:51,632 - root - INFO - Step 1370: lr=1.00E-05, loss= 1.1104 (max= 1.5349), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:38:23,487 - root - INFO - Step 1380: lr=1.00E-05, loss= 1.0897 (max= 1.4804), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:38:23,487 - root - INFO - Step 1380: lr=1.00E-05, loss= 1.0897 (max= 1.4804), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:38:23,487 - root - INFO - Step 1380: lr=1.00E-05, loss= 1.0897 (max= 1.4804), tps=20576, 
mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:38:23,487 - root - INFO - Step 1380: lr=1.00E-05, loss= 1.0897 (max= 1.4804), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:38:23,487 - root - INFO - Step 1380: lr=1.00E-05, loss= 1.0897 (max= 1.4804), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:38:23,487 - root - INFO - Step 1380: lr=1.00E-05, loss= 1.0897 (max= 1.4804), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:38:23,487 - root - INFO - Step 1380: lr=1.00E-05, loss= 1.0897 (max= 1.4804), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:38:23,488 - root - INFO - Step 1380: lr=1.00E-05, loss= 1.0897 (max= 1.4804), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:38:55,381 - root - INFO - Step 1390: lr=1.00E-05, loss= 1.0945 (max= 1.6310), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:38:55,381 - root - INFO - Step 1390: lr=1.00E-05, loss= 1.0945 (max= 1.6310), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:38:55,381 - root - INFO - Step 1390: lr=1.00E-05, loss= 1.0945 (max= 1.6310), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:38:55,381 - root - INFO - Step 1390: lr=1.00E-05, loss= 1.0945 (max= 1.6310), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:38:55,381 - root - INFO - Step 1390: lr=1.00E-05, loss= 1.0945 (max= 1.6310), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:38:55,381 - root - INFO - Step 1390: lr=1.00E-05, 
loss= 1.0945 (max= 1.6310), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:38:55,381 - root - INFO - Step 1390: lr=1.00E-05, loss= 1.0945 (max= 1.6310), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:38:55,381 - root - INFO - Step 1390: lr=1.00E-05, loss= 1.0945 (max= 1.6310), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:39:27,201 - root - INFO - Step 1400: lr=1.00E-05, loss= 1.1220 (max= 1.5682), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:39:27,201 - root - INFO - Step 1400: lr=1.00E-05, loss= 1.1220 (max= 1.5682), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:39:27,201 - root - INFO - Step 1400: lr=1.00E-05, loss= 1.1220 (max= 1.5682), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:39:27,201 - root - INFO - Step 1400: lr=1.00E-05, loss= 1.1220 (max= 1.5682), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:39:27,201 - root - INFO - Step 1400: lr=1.00E-05, loss= 1.1220 (max= 1.5682), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:39:27,201 - root - INFO - Step 1400: lr=1.00E-05, loss= 1.1220 (max= 1.5682), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:39:27,201 - root - INFO - Step 1400: lr=1.00E-05, loss= 1.1220 (max= 1.5682), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:39:27,201 - root - INFO - Step 1400: lr=1.00E-05, loss= 1.1220 (max= 1.5682), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:39:59,122 - 
root - INFO - Step 1410: lr=1.00E-05, loss= 1.1068 (max= 1.5705), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:39:59,122 - root - INFO - Step 1410: lr=1.00E-05, loss= 1.1068 (max= 1.5705), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:39:59,122 - root - INFO - Step 1410: lr=1.00E-05, loss= 1.1068 (max= 1.5705), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:39:59,122 - root - INFO - Step 1410: lr=1.00E-05, loss= 1.1068 (max= 1.5705), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:39:59,122 - root - INFO - Step 1410: lr=1.00E-05, loss= 1.1068 (max= 1.5705), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:39:59,122 - root - INFO - Step 1410: lr=1.00E-05, loss= 1.1068 (max= 1.5705), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:39:59,122 - root - INFO - Step 1410: lr=1.00E-05, loss= 1.1068 (max= 1.5705), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:39:59,122 - root - INFO - Step 1410: lr=1.00E-05, loss= 1.1068 (max= 1.5705), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:40:30,979 - root - INFO - Step 1420: lr=1.00E-05, loss= 1.1119 (max= 1.6405), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:40:30,979 - root - INFO - Step 1420: lr=1.00E-05, loss= 1.1119 (max= 1.6405), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:40:30,979 - root - INFO - Step 1420: lr=1.00E-05, loss= 1.1119 (max= 1.6405), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 11:40:30,979 - root - INFO - Step 1420: lr=1.00E-05, loss= 1.1119 (max= 1.6405), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:40:30,979 - root - INFO - Step 1420: lr=1.00E-05, loss= 1.1119 (max= 1.6405), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:40:30,979 - root - INFO - Step 1420: lr=1.00E-05, loss= 1.1119 (max= 1.6405), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:40:30,979 - root - INFO - Step 1420: lr=1.00E-05, loss= 1.1119 (max= 1.6405), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:40:30,979 - root - INFO - Step 1420: lr=1.00E-05, loss= 1.1119 (max= 1.6405), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:41:02,823 - root - INFO - Step 1430: lr=1.00E-05, loss= 1.1334 (max= 1.5643), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:41:02,823 - root - INFO - Step 1430: lr=1.00E-05, loss= 1.1334 (max= 1.5643), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:41:02,823 - root - INFO - Step 1430: lr=1.00E-05, loss= 1.1334 (max= 1.5643), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:41:02,823 - root - INFO - Step 1430: lr=1.00E-05, loss= 1.1334 (max= 1.5643), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:41:02,823 - root - INFO - Step 1430: lr=1.00E-05, loss= 1.1334 (max= 1.5643), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:41:02,823 - root - INFO - Step 1430: lr=1.00E-05, loss= 1.1334 (max= 1.5643), tps=20583, mfu=42.88%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:41:02,823 - root - INFO - Step 1430: lr=1.00E-05, loss= 1.1334 (max= 1.5643), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:41:02,823 - root - INFO - Step 1430: lr=1.00E-05, loss= 1.1334 (max= 1.5643), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:41:34,686 - root - INFO - Step 1440: lr=1.00E-05, loss= 1.1169 (max= 1.4664), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:41:34,686 - root - INFO - Step 1440: lr=1.00E-05, loss= 1.1169 (max= 1.4664), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:41:34,686 - root - INFO - Step 1440: lr=1.00E-05, loss= 1.1169 (max= 1.4664), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:41:34,686 - root - INFO - Step 1440: lr=1.00E-05, loss= 1.1169 (max= 1.4664), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:41:34,686 - root - INFO - Step 1440: lr=1.00E-05, loss= 1.1169 (max= 1.4664), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:41:34,686 - root - INFO - Step 1440: lr=1.00E-05, loss= 1.1169 (max= 1.4664), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:41:34,686 - root - INFO - Step 1440: lr=1.00E-05, loss= 1.1169 (max= 1.4664), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:41:34,686 - root - INFO - Step 1440: lr=1.00E-05, loss= 1.1169 (max= 1.4664), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:42:06,510 - root - INFO - Step 1450: lr=1.00E-05, loss= 1.1300 (max= 
1.7364), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:42:06,510 - root - INFO - Step 1450: lr=1.00E-05, loss= 1.1300 (max= 1.7364), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:42:06,510 - root - INFO - Step 1450: lr=1.00E-05, loss= 1.1300 (max= 1.7364), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:42:06,510 - root - INFO - Step 1450: lr=1.00E-05, loss= 1.1300 (max= 1.7364), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:42:06,510 - root - INFO - Step 1450: lr=1.00E-05, loss= 1.1300 (max= 1.7364), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:42:06,510 - root - INFO - Step 1450: lr=1.00E-05, loss= 1.1300 (max= 1.7364), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:42:06,510 - root - INFO - Step 1450: lr=1.00E-05, loss= 1.1300 (max= 1.7364), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:42:06,510 - root - INFO - Step 1450: lr=1.00E-05, loss= 1.1300 (max= 1.7364), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:42:38,370 - root - INFO - Step 1460: lr=1.00E-05, loss= 1.1262 (max= 1.5498), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:42:38,370 - root - INFO - Step 1460: lr=1.00E-05, loss= 1.1262 (max= 1.5498), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:42:38,370 - root - INFO - Step 1460: lr=1.00E-05, loss= 1.1262 (max= 1.5498), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:42:38,370 - root - INFO - Step 
1460: lr=1.00E-05, loss= 1.1262 (max= 1.5498), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:42:38,370 - root - INFO - Step 1460: lr=1.00E-05, loss= 1.1262 (max= 1.5498), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:42:38,370 - root - INFO - Step 1460: lr=1.00E-05, loss= 1.1262 (max= 1.5498), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:42:38,370 - root - INFO - Step 1460: lr=1.00E-05, loss= 1.1262 (max= 1.5498), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:42:38,370 - root - INFO - Step 1460: lr=1.00E-05, loss= 1.1262 (max= 1.5498), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:43:10,306 - root - INFO - Step 1470: lr=1.00E-05, loss= 1.1233 (max= 1.4938), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:43:10,306 - root - INFO - Step 1470: lr=1.00E-05, loss= 1.1233 (max= 1.4938), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:43:10,306 - root - INFO - Step 1470: lr=1.00E-05, loss= 1.1233 (max= 1.4938), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:43:10,306 - root - INFO - Step 1470: lr=1.00E-05, loss= 1.1233 (max= 1.4938), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:43:10,306 - root - INFO - Step 1470: lr=1.00E-05, loss= 1.1233 (max= 1.4938), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:43:10,306 - root - INFO - Step 1470: lr=1.00E-05, loss= 1.1233 (max= 1.4938), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 11:43:10,306 - root - INFO - Step 1470: lr=1.00E-05, loss= 1.1233 (max= 1.4938), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:43:10,307 - root - INFO - Step 1470: lr=1.00E-05, loss= 1.1233 (max= 1.4938), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:43:42,250 - root - INFO - Step 1480: lr=1.00E-05, loss= 1.1145 (max= 1.6999), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:43:42,250 - root - INFO - Step 1480: lr=1.00E-05, loss= 1.1145 (max= 1.6999), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:43:42,250 - root - INFO - Step 1480: lr=1.00E-05, loss= 1.1145 (max= 1.6999), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:43:42,250 - root - INFO - Step 1480: lr=1.00E-05, loss= 1.1145 (max= 1.6999), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:43:42,250 - root - INFO - Step 1480: lr=1.00E-05, loss= 1.1145 (max= 1.6999), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:43:42,250 - root - INFO - Step 1480: lr=1.00E-05, loss= 1.1145 (max= 1.6999), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:43:42,250 - root - INFO - Step 1480: lr=1.00E-05, loss= 1.1145 (max= 1.6999), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:43:42,250 - root - INFO - Step 1480: lr=1.00E-05, loss= 1.1145 (max= 1.6999), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:44:14,150 - root - INFO - Step 1490: lr=1.00E-05, loss= 1.1121 (max= 1.6165), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:44:14,150 - root - INFO - Step 1490: lr=1.00E-05, loss= 1.1121 (max= 1.6165), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:44:14,150 - root - INFO - Step 1490: lr=1.00E-05, loss= 1.1121 (max= 1.6165), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:44:14,150 - root - INFO - Step 1490: lr=1.00E-05, loss= 1.1121 (max= 1.6165), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:44:14,150 - root - INFO - Step 1490: lr=1.00E-05, loss= 1.1121 (max= 1.6165), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:44:14,150 - root - INFO - Step 1490: lr=1.00E-05, loss= 1.1121 (max= 1.6165), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:44:14,150 - root - INFO - Step 1490: lr=1.00E-05, loss= 1.1121 (max= 1.6165), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:44:14,150 - root - INFO - Step 1490: lr=1.00E-05, loss= 1.1121 (max= 1.6165), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:44:14,737 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:2929508 +2025-10-25 11:44:46,014 - root - INFO - Step 1500: lr=1.00E-05, loss= 1.1089 (max= 1.5813), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:44:46,014 - root - INFO - Step 1500: lr=1.00E-05, loss= 1.1089 (max= 1.5813), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:44:46,014 - root - INFO - Step 1500: lr=1.00E-05, loss= 1.1089 (max= 1.5813), tps=20569, 
mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:44:46,014 - root - INFO - Step 1500: lr=1.00E-05, loss= 1.1089 (max= 1.5813), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:44:46,014 - root - INFO - Step 1500: lr=1.00E-05, loss= 1.1089 (max= 1.5813), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:44:46,014 - root - INFO - Step 1500: lr=1.00E-05, loss= 1.1089 (max= 1.5813), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:44:46,014 - root - INFO - Step 1500: lr=1.00E-05, loss= 1.1089 (max= 1.5813), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:44:46,014 - root - INFO - Step 1500: lr=1.00E-05, loss= 1.1089 (max= 1.5813), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:45:17,890 - root - INFO - Step 1510: lr=1.00E-05, loss= 1.1387 (max= 1.7583), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:45:17,890 - root - INFO - Step 1510: lr=1.00E-05, loss= 1.1387 (max= 1.7583), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:45:17,890 - root - INFO - Step 1510: lr=1.00E-05, loss= 1.1387 (max= 1.7583), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:45:17,890 - root - INFO - Step 1510: lr=1.00E-05, loss= 1.1387 (max= 1.7583), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:45:17,890 - root - INFO - Step 1510: lr=1.00E-05, loss= 1.1387 (max= 1.7583), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:45:17,890 - root - INFO - Step 1510: lr=1.00E-05, 
loss= 1.1387 (max= 1.7583), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:45:17,890 - root - INFO - Step 1510: lr=1.00E-05, loss= 1.1387 (max= 1.7583), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:45:17,890 - root - INFO - Step 1510: lr=1.00E-05, loss= 1.1387 (max= 1.7583), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:45:49,709 - root - INFO - Step 1520: lr=1.00E-05, loss= 1.1203 (max= 1.4766), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:45:49,709 - root - INFO - Step 1520: lr=1.00E-05, loss= 1.1203 (max= 1.4766), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:45:49,709 - root - INFO - Step 1520: lr=1.00E-05, loss= 1.1203 (max= 1.4766), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:45:49,709 - root - INFO - Step 1520: lr=1.00E-05, loss= 1.1203 (max= 1.4766), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:45:49,709 - root - INFO - Step 1520: lr=1.00E-05, loss= 1.1203 (max= 1.4766), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:45:49,709 - root - INFO - Step 1520: lr=1.00E-05, loss= 1.1203 (max= 1.4766), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:45:49,709 - root - INFO - Step 1520: lr=1.00E-05, loss= 1.1203 (max= 1.4766), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:45:49,709 - root - INFO - Step 1520: lr=1.00E-05, loss= 1.1203 (max= 1.4766), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:46:21,566 - 
root - INFO - Step 1530: lr=1.00E-05, loss= 1.1348 (max= 1.5542), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:46:21,566 - root - INFO - Step 1530: lr=1.00E-05, loss= 1.1348 (max= 1.5542), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:46:21,566 - root - INFO - Step 1530: lr=1.00E-05, loss= 1.1348 (max= 1.5542), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:46:21,566 - root - INFO - Step 1530: lr=1.00E-05, loss= 1.1348 (max= 1.5542), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:46:21,566 - root - INFO - Step 1530: lr=1.00E-05, loss= 1.1348 (max= 1.5542), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:46:21,566 - root - INFO - Step 1530: lr=1.00E-05, loss= 1.1348 (max= 1.5542), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:46:21,566 - root - INFO - Step 1530: lr=1.00E-05, loss= 1.1348 (max= 1.5542), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:46:21,566 - root - INFO - Step 1530: lr=1.00E-05, loss= 1.1348 (max= 1.5542), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:46:53,370 - root - INFO - Step 1540: lr=1.00E-05, loss= 1.1290 (max= 1.5311), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:46:53,371 - root - INFO - Step 1540: lr=1.00E-05, loss= 1.1290 (max= 1.5311), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:46:53,371 - root - INFO - Step 1540: lr=1.00E-05, loss= 1.1290 (max= 1.5311), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 11:46:53,371 - root - INFO - Step 1540: lr=1.00E-05, loss= 1.1290 (max= 1.5311), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:46:53,371 - root - INFO - Step 1540: lr=1.00E-05, loss= 1.1290 (max= 1.5311), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:46:53,371 - root - INFO - Step 1540: lr=1.00E-05, loss= 1.1290 (max= 1.5311), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:46:53,371 - root - INFO - Step 1540: lr=1.00E-05, loss= 1.1290 (max= 1.5311), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:46:53,371 - root - INFO - Step 1540: lr=1.00E-05, loss= 1.1290 (max= 1.5311), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:47:25,215 - root - INFO - Step 1550: lr=1.00E-05, loss= 1.1108 (max= 1.6303), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:47:25,215 - root - INFO - Step 1550: lr=1.00E-05, loss= 1.1108 (max= 1.6303), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:47:25,215 - root - INFO - Step 1550: lr=1.00E-05, loss= 1.1108 (max= 1.6303), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:47:25,215 - root - INFO - Step 1550: lr=1.00E-05, loss= 1.1108 (max= 1.6303), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:47:25,215 - root - INFO - Step 1550: lr=1.00E-05, loss= 1.1108 (max= 1.6303), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:47:25,215 - root - INFO - Step 1550: lr=1.00E-05, loss= 1.1108 (max= 1.6303), tps=20583, mfu=42.88%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:47:25,215 - root - INFO - Step 1550: lr=1.00E-05, loss= 1.1108 (max= 1.6303), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:47:25,215 - root - INFO - Step 1550: lr=1.00E-05, loss= 1.1108 (max= 1.6303), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:47:57,017 - root - INFO - Step 1560: lr=1.00E-05, loss= 1.1070 (max= 1.5167), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:47:57,018 - root - INFO - Step 1560: lr=1.00E-05, loss= 1.1070 (max= 1.5167), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:47:57,018 - root - INFO - Step 1560: lr=1.00E-05, loss= 1.1070 (max= 1.5167), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:47:57,018 - root - INFO - Step 1560: lr=1.00E-05, loss= 1.1070 (max= 1.5167), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:47:57,018 - root - INFO - Step 1560: lr=1.00E-05, loss= 1.1070 (max= 1.5167), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:47:57,018 - root - INFO - Step 1560: lr=1.00E-05, loss= 1.1070 (max= 1.5167), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:47:57,018 - root - INFO - Step 1560: lr=1.00E-05, loss= 1.1070 (max= 1.5167), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:47:57,018 - root - INFO - Step 1560: lr=1.00E-05, loss= 1.1070 (max= 1.5167), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:48:28,877 - root - INFO - Step 1570: lr=1.00E-05, loss= 1.1021 (max= 
1.4640), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:48:28,877 - root - INFO - Step 1570: lr=1.00E-05, loss= 1.1021 (max= 1.4640), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:48:28,877 - root - INFO - Step 1570: lr=1.00E-05, loss= 1.1021 (max= 1.4640), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:48:28,877 - root - INFO - Step 1570: lr=1.00E-05, loss= 1.1021 (max= 1.4640), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:48:28,877 - root - INFO - Step 1570: lr=1.00E-05, loss= 1.1021 (max= 1.4640), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:48:28,877 - root - INFO - Step 1570: lr=1.00E-05, loss= 1.1021 (max= 1.4640), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:48:28,877 - root - INFO - Step 1570: lr=1.00E-05, loss= 1.1021 (max= 1.4640), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:48:28,877 - root - INFO - Step 1570: lr=1.00E-05, loss= 1.1021 (max= 1.4640), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:49:00,813 - root - INFO - Step 1580: lr=1.00E-05, loss= 1.1279 (max= 1.4811), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:49:00,813 - root - INFO - Step 1580: lr=1.00E-05, loss= 1.1279 (max= 1.4811), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:49:00,813 - root - INFO - Step 1580: lr=1.00E-05, loss= 1.1279 (max= 1.4811), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:49:00,813 - root - INFO - Step 
1580: lr=1.00E-05, loss= 1.1279 (max= 1.4811), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:49:00,814 - root - INFO - Step 1580: lr=1.00E-05, loss= 1.1279 (max= 1.4811), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:49:00,814 - root - INFO - Step 1580: lr=1.00E-05, loss= 1.1279 (max= 1.4811), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:49:00,814 - root - INFO - Step 1580: lr=1.00E-05, loss= 1.1279 (max= 1.4811), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:49:00,814 - root - INFO - Step 1580: lr=1.00E-05, loss= 1.1279 (max= 1.4811), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:49:32,666 - root - INFO - Step 1590: lr=1.00E-05, loss= 1.1231 (max= 1.7101), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:49:32,666 - root - INFO - Step 1590: lr=1.00E-05, loss= 1.1231 (max= 1.7101), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:49:32,666 - root - INFO - Step 1590: lr=1.00E-05, loss= 1.1231 (max= 1.7101), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:49:32,666 - root - INFO - Step 1590: lr=1.00E-05, loss= 1.1231 (max= 1.7101), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:49:32,666 - root - INFO - Step 1590: lr=1.00E-05, loss= 1.1231 (max= 1.7101), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:49:32,666 - root - INFO - Step 1590: lr=1.00E-05, loss= 1.1231 (max= 1.7101), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 11:49:32,666 - root - INFO - Step 1590: lr=1.00E-05, loss= 1.1231 (max= 1.7101), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:49:32,666 - root - INFO - Step 1590: lr=1.00E-05, loss= 1.1231 (max= 1.7101), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:50:04,497 - root - INFO - Step 1600: lr=1.00E-05, loss= 1.1177 (max= 1.6059), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:50:04,497 - root - INFO - Step 1600: lr=1.00E-05, loss= 1.1177 (max= 1.6059), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:50:04,497 - root - INFO - Step 1600: lr=1.00E-05, loss= 1.1177 (max= 1.6059), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:50:04,497 - root - INFO - Step 1600: lr=1.00E-05, loss= 1.1177 (max= 1.6059), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:50:04,497 - root - INFO - Step 1600: lr=1.00E-05, loss= 1.1177 (max= 1.6059), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:50:04,497 - root - INFO - Step 1600: lr=1.00E-05, loss= 1.1177 (max= 1.6059), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:50:04,497 - root - INFO - Step 1600: lr=1.00E-05, loss= 1.1177 (max= 1.6059), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:50:04,497 - root - INFO - Step 1600: lr=1.00E-05, loss= 1.1177 (max= 1.6059), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:50:36,371 - root - INFO - Step 1610: lr=1.00E-05, loss= 1.1509 (max= 1.5928), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:50:36,371 - root - INFO - Step 1610: lr=1.00E-05, loss= 1.1509 (max= 1.5928), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:50:36,371 - root - INFO - Step 1610: lr=1.00E-05, loss= 1.1509 (max= 1.5928), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:50:36,371 - root - INFO - Step 1610: lr=1.00E-05, loss= 1.1509 (max= 1.5928), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:50:36,371 - root - INFO - Step 1610: lr=1.00E-05, loss= 1.1509 (max= 1.5928), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:50:36,371 - root - INFO - Step 1610: lr=1.00E-05, loss= 1.1509 (max= 1.5928), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:50:36,372 - root - INFO - Step 1610: lr=1.00E-05, loss= 1.1509 (max= 1.5928), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:50:36,372 - root - INFO - Step 1610: lr=1.00E-05, loss= 1.1509 (max= 1.5928), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:51:08,184 - root - INFO - Step 1620: lr=1.00E-05, loss= 1.1099 (max= 1.5681), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:51:08,184 - root - INFO - Step 1620: lr=1.00E-05, loss= 1.1099 (max= 1.5681), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:51:08,184 - root - INFO - Step 1620: lr=1.00E-05, loss= 1.1099 (max= 1.5681), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:51:08,184 - root - INFO - Step 1620: lr=1.00E-05, loss= 1.1099 (max= 1.5681), tps=20603, 
mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:51:08,184 - root - INFO - Step 1620: lr=1.00E-05, loss= 1.1099 (max= 1.5681), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:51:08,185 - root - INFO - Step 1620: lr=1.00E-05, loss= 1.1099 (max= 1.5681), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:51:08,185 - root - INFO - Step 1620: lr=1.00E-05, loss= 1.1099 (max= 1.5681), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:51:08,185 - root - INFO - Step 1620: lr=1.00E-05, loss= 1.1099 (max= 1.5681), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:51:40,009 - root - INFO - Step 1630: lr=1.00E-05, loss= 1.1095 (max= 1.5071), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:51:40,009 - root - INFO - Step 1630: lr=1.00E-05, loss= 1.1095 (max= 1.5071), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:51:40,009 - root - INFO - Step 1630: lr=1.00E-05, loss= 1.1095 (max= 1.5071), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:51:40,009 - root - INFO - Step 1630: lr=1.00E-05, loss= 1.1095 (max= 1.5071), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:51:40,009 - root - INFO - Step 1630: lr=1.00E-05, loss= 1.1095 (max= 1.5071), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:51:40,009 - root - INFO - Step 1630: lr=1.00E-05, loss= 1.1095 (max= 1.5071), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:51:40,009 - root - INFO - Step 1630: lr=1.00E-05, 
loss= 1.1095 (max= 1.5071), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:51:40,009 - root - INFO - Step 1630: lr=1.00E-05, loss= 1.1095 (max= 1.5071), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:52:11,821 - root - INFO - Step 1640: lr=1.00E-05, loss= 1.1022 (max= 1.8040), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:52:11,821 - root - INFO - Step 1640: lr=1.00E-05, loss= 1.1022 (max= 1.8040), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:52:11,821 - root - INFO - Step 1640: lr=1.00E-05, loss= 1.1022 (max= 1.8040), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:52:11,821 - root - INFO - Step 1640: lr=1.00E-05, loss= 1.1022 (max= 1.8040), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:52:11,821 - root - INFO - Step 1640: lr=1.00E-05, loss= 1.1022 (max= 1.8040), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:52:11,821 - root - INFO - Step 1640: lr=1.00E-05, loss= 1.1022 (max= 1.8040), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:52:11,821 - root - INFO - Step 1640: lr=1.00E-05, loss= 1.1022 (max= 1.8040), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:52:11,821 - root - INFO - Step 1640: lr=1.00E-05, loss= 1.1022 (max= 1.8040), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:52:43,706 - root - INFO - Step 1650: lr=1.00E-05, loss= 1.1003 (max= 1.4933), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:52:43,706 - 
root - INFO - Step 1650: lr=1.00E-05, loss= 1.1003 (max= 1.4933), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:52:43,706 - root - INFO - Step 1650: lr=1.00E-05, loss= 1.1003 (max= 1.4933), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:52:43,706 - root - INFO - Step 1650: lr=1.00E-05, loss= 1.1003 (max= 1.4933), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:52:43,706 - root - INFO - Step 1650: lr=1.00E-05, loss= 1.1003 (max= 1.4933), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:52:43,706 - root - INFO - Step 1650: lr=1.00E-05, loss= 1.1003 (max= 1.4933), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:52:43,706 - root - INFO - Step 1650: lr=1.00E-05, loss= 1.1003 (max= 1.4933), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:52:43,706 - root - INFO - Step 1650: lr=1.00E-05, loss= 1.1003 (max= 1.4933), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:53:15,525 - root - INFO - Step 1660: lr=1.00E-05, loss= 1.1002 (max= 1.5648), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:53:15,525 - root - INFO - Step 1660: lr=1.00E-05, loss= 1.1002 (max= 1.5648), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:53:15,525 - root - INFO - Step 1660: lr=1.00E-05, loss= 1.1002 (max= 1.5648), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:53:15,525 - root - INFO - Step 1660: lr=1.00E-05, loss= 1.1002 (max= 1.5648), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 11:53:15,525 - root - INFO - Step 1660: lr=1.00E-05, loss= 1.1002 (max= 1.5648), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:53:15,525 - root - INFO - Step 1660: lr=1.00E-05, loss= 1.1002 (max= 1.5648), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:53:15,525 - root - INFO - Step 1660: lr=1.00E-05, loss= 1.1002 (max= 1.5648), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:53:15,526 - root - INFO - Step 1660: lr=1.00E-05, loss= 1.1002 (max= 1.5648), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:53:47,361 - root - INFO - Step 1670: lr=1.00E-05, loss= 1.0919 (max= 1.6279), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:53:47,361 - root - INFO - Step 1670: lr=1.00E-05, loss= 1.0919 (max= 1.6279), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:53:47,361 - root - INFO - Step 1670: lr=1.00E-05, loss= 1.0919 (max= 1.6279), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:53:47,361 - root - INFO - Step 1670: lr=1.00E-05, loss= 1.0919 (max= 1.6279), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:53:47,361 - root - INFO - Step 1670: lr=1.00E-05, loss= 1.0919 (max= 1.6279), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:53:47,361 - root - INFO - Step 1670: lr=1.00E-05, loss= 1.0919 (max= 1.6279), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:53:47,361 - root - INFO - Step 1670: lr=1.00E-05, loss= 1.0919 (max= 1.6279), tps=20588, mfu=42.89%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:53:47,362 - root - INFO - Step 1670: lr=1.00E-05, loss= 1.0919 (max= 1.6279), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:54:19,251 - root - INFO - Step 1680: lr=1.00E-05, loss= 1.1127 (max= 1.5485), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:54:19,251 - root - INFO - Step 1680: lr=1.00E-05, loss= 1.1127 (max= 1.5485), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:54:19,251 - root - INFO - Step 1680: lr=1.00E-05, loss= 1.1127 (max= 1.5485), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:54:19,251 - root - INFO - Step 1680: lr=1.00E-05, loss= 1.1127 (max= 1.5485), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:54:19,251 - root - INFO - Step 1680: lr=1.00E-05, loss= 1.1127 (max= 1.5485), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:54:19,251 - root - INFO - Step 1680: lr=1.00E-05, loss= 1.1127 (max= 1.5485), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:54:19,251 - root - INFO - Step 1680: lr=1.00E-05, loss= 1.1127 (max= 1.5485), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:54:19,251 - root - INFO - Step 1680: lr=1.00E-05, loss= 1.1127 (max= 1.5485), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:54:51,047 - root - INFO - Step 1690: lr=1.00E-05, loss= 1.0959 (max= 1.4488), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:54:51,047 - root - INFO - Step 1690: lr=1.00E-05, loss= 1.0959 (max= 
1.4488), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:54:51,047 - root - INFO - Step 1690: lr=1.00E-05, loss= 1.0959 (max= 1.4488), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:54:51,047 - root - INFO - Step 1690: lr=1.00E-05, loss= 1.0959 (max= 1.4488), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:54:51,047 - root - INFO - Step 1690: lr=1.00E-05, loss= 1.0959 (max= 1.4488), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:54:51,047 - root - INFO - Step 1690: lr=1.00E-05, loss= 1.0959 (max= 1.4488), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:54:51,047 - root - INFO - Step 1690: lr=1.00E-05, loss= 1.0959 (max= 1.4488), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:54:51,047 - root - INFO - Step 1690: lr=1.00E-05, loss= 1.0959 (max= 1.4488), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:55:22,950 - root - INFO - Step 1700: lr=1.00E-05, loss= 1.1239 (max= 1.6136), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:55:22,950 - root - INFO - Step 1700: lr=1.00E-05, loss= 1.1239 (max= 1.6136), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:55:22,950 - root - INFO - Step 1700: lr=1.00E-05, loss= 1.1239 (max= 1.6136), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:55:22,950 - root - INFO - Step 1700: lr=1.00E-05, loss= 1.1239 (max= 1.6136), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:55:22,950 - root - INFO - Step 
1700: lr=1.00E-05, loss= 1.1239 (max= 1.6136), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:55:22,950 - root - INFO - Step 1700: lr=1.00E-05, loss= 1.1239 (max= 1.6136), tps=20545, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:55:22,950 - root - INFO - Step 1700: lr=1.00E-05, loss= 1.1239 (max= 1.6136), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:55:22,950 - root - INFO - Step 1700: lr=1.00E-05, loss= 1.1239 (max= 1.6136), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:55:54,798 - root - INFO - Step 1710: lr=1.00E-05, loss= 1.1222 (max= 1.5127), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:55:54,798 - root - INFO - Step 1710: lr=1.00E-05, loss= 1.1222 (max= 1.5127), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:55:54,798 - root - INFO - Step 1710: lr=1.00E-05, loss= 1.1222 (max= 1.5127), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:55:54,798 - root - INFO - Step 1710: lr=1.00E-05, loss= 1.1222 (max= 1.5127), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:55:54,798 - root - INFO - Step 1710: lr=1.00E-05, loss= 1.1222 (max= 1.5127), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:55:54,798 - root - INFO - Step 1710: lr=1.00E-05, loss= 1.1222 (max= 1.5127), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:55:54,798 - root - INFO - Step 1710: lr=1.00E-05, loss= 1.1222 (max= 1.5127), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 11:55:54,798 - root - INFO - Step 1710: lr=1.00E-05, loss= 1.1222 (max= 1.5127), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:56:26,679 - root - INFO - Step 1720: lr=1.00E-05, loss= 1.1007 (max= 1.4715), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:56:26,679 - root - INFO - Step 1720: lr=1.00E-05, loss= 1.1007 (max= 1.4715), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:56:26,679 - root - INFO - Step 1720: lr=1.00E-05, loss= 1.1007 (max= 1.4715), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:56:26,679 - root - INFO - Step 1720: lr=1.00E-05, loss= 1.1007 (max= 1.4715), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:56:26,679 - root - INFO - Step 1720: lr=1.00E-05, loss= 1.1007 (max= 1.4715), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:56:26,679 - root - INFO - Step 1720: lr=1.00E-05, loss= 1.1007 (max= 1.4715), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:56:26,679 - root - INFO - Step 1720: lr=1.00E-05, loss= 1.1007 (max= 1.4715), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:56:26,679 - root - INFO - Step 1720: lr=1.00E-05, loss= 1.1007 (max= 1.4715), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:56:58,550 - root - INFO - Step 1730: lr=1.00E-05, loss= 1.1132 (max= 1.5276), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:56:58,550 - root - INFO - Step 1730: lr=1.00E-05, loss= 1.1132 (max= 1.5276), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:56:58,550 - root - INFO - Step 1730: lr=1.00E-05, loss= 1.1132 (max= 1.5276), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:56:58,550 - root - INFO - Step 1730: lr=1.00E-05, loss= 1.1132 (max= 1.5276), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:56:58,550 - root - INFO - Step 1730: lr=1.00E-05, loss= 1.1132 (max= 1.5276), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:56:58,550 - root - INFO - Step 1730: lr=1.00E-05, loss= 1.1132 (max= 1.5276), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:56:58,550 - root - INFO - Step 1730: lr=1.00E-05, loss= 1.1132 (max= 1.5276), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:56:58,550 - root - INFO - Step 1730: lr=1.00E-05, loss= 1.1132 (max= 1.5276), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:57:30,387 - root - INFO - Step 1740: lr=1.00E-05, loss= 1.1108 (max= 1.6534), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:57:30,387 - root - INFO - Step 1740: lr=1.00E-05, loss= 1.1108 (max= 1.6534), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:57:30,387 - root - INFO - Step 1740: lr=1.00E-05, loss= 1.1108 (max= 1.6534), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:57:30,387 - root - INFO - Step 1740: lr=1.00E-05, loss= 1.1108 (max= 1.6534), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:57:30,387 - root - INFO - Step 1740: lr=1.00E-05, loss= 1.1108 (max= 1.6534), tps=20587, 
mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:57:30,387 - root - INFO - Step 1740: lr=1.00E-05, loss= 1.1108 (max= 1.6534), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:57:30,387 - root - INFO - Step 1740: lr=1.00E-05, loss= 1.1108 (max= 1.6534), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:57:30,387 - root - INFO - Step 1740: lr=1.00E-05, loss= 1.1108 (max= 1.6534), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:58:02,364 - root - INFO - Step 1750: lr=1.00E-05, loss= 1.1052 (max= 1.5392), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:58:02,364 - root - INFO - Step 1750: lr=1.00E-05, loss= 1.1052 (max= 1.5392), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:58:02,364 - root - INFO - Step 1750: lr=1.00E-05, loss= 1.1052 (max= 1.5392), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:58:02,364 - root - INFO - Step 1750: lr=1.00E-05, loss= 1.1052 (max= 1.5392), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:58:02,364 - root - INFO - Step 1750: lr=1.00E-05, loss= 1.1052 (max= 1.5392), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:58:02,364 - root - INFO - Step 1750: lr=1.00E-05, loss= 1.1052 (max= 1.5392), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:58:02,364 - root - INFO - Step 1750: lr=1.00E-05, loss= 1.1052 (max= 1.5392), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:58:02,364 - root - INFO - Step 1750: lr=1.00E-05, 
loss= 1.1052 (max= 1.5392), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:58:34,295 - root - INFO - Step 1760: lr=1.00E-05, loss= 1.1278 (max= 1.5244), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:58:34,296 - root - INFO - Step 1760: lr=1.00E-05, loss= 1.1278 (max= 1.5244), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:58:34,296 - root - INFO - Step 1760: lr=1.00E-05, loss= 1.1278 (max= 1.5244), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:58:34,296 - root - INFO - Step 1760: lr=1.00E-05, loss= 1.1278 (max= 1.5244), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:58:34,296 - root - INFO - Step 1760: lr=1.00E-05, loss= 1.1278 (max= 1.5244), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:58:34,296 - root - INFO - Step 1760: lr=1.00E-05, loss= 1.1278 (max= 1.5244), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:58:34,296 - root - INFO - Step 1760: lr=1.00E-05, loss= 1.1278 (max= 1.5244), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:58:34,296 - root - INFO - Step 1760: lr=1.00E-05, loss= 1.1278 (max= 1.5244), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:59:06,165 - root - INFO - Step 1770: lr=1.00E-05, loss= 1.1200 (max= 1.6378), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:59:06,165 - root - INFO - Step 1770: lr=1.00E-05, loss= 1.1200 (max= 1.6378), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:59:06,165 - 
root - INFO - Step 1770: lr=1.00E-05, loss= 1.1200 (max= 1.6378), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:59:06,165 - root - INFO - Step 1770: lr=1.00E-05, loss= 1.1200 (max= 1.6378), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:59:06,165 - root - INFO - Step 1770: lr=1.00E-05, loss= 1.1200 (max= 1.6378), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:59:06,165 - root - INFO - Step 1770: lr=1.00E-05, loss= 1.1200 (max= 1.6378), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:59:06,166 - root - INFO - Step 1770: lr=1.00E-05, loss= 1.1200 (max= 1.6378), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:59:06,166 - root - INFO - Step 1770: lr=1.00E-05, loss= 1.1200 (max= 1.6378), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:59:38,105 - root - INFO - Step 1780: lr=1.00E-05, loss= 1.1132 (max= 1.6465), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:59:38,106 - root - INFO - Step 1780: lr=1.00E-05, loss= 1.1132 (max= 1.6465), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:59:38,106 - root - INFO - Step 1780: lr=1.00E-05, loss= 1.1132 (max= 1.6465), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:59:38,106 - root - INFO - Step 1780: lr=1.00E-05, loss= 1.1132 (max= 1.6465), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:59:38,106 - root - INFO - Step 1780: lr=1.00E-05, loss= 1.1132 (max= 1.6465), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 11:59:38,106 - root - INFO - Step 1780: lr=1.00E-05, loss= 1.1132 (max= 1.6465), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:59:38,106 - root - INFO - Step 1780: lr=1.00E-05, loss= 1.1132 (max= 1.6465), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 11:59:38,106 - root - INFO - Step 1780: lr=1.00E-05, loss= 1.1132 (max= 1.6465), tps=20521, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:00:10,048 - root - INFO - Step 1790: lr=1.00E-05, loss= 1.1080 (max= 1.5303), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:00:10,048 - root - INFO - Step 1790: lr=1.00E-05, loss= 1.1080 (max= 1.5303), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:00:10,048 - root - INFO - Step 1790: lr=1.00E-05, loss= 1.1080 (max= 1.5303), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:00:10,048 - root - INFO - Step 1790: lr=1.00E-05, loss= 1.1080 (max= 1.5303), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:00:10,048 - root - INFO - Step 1790: lr=1.00E-05, loss= 1.1080 (max= 1.5303), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:00:10,048 - root - INFO - Step 1790: lr=1.00E-05, loss= 1.1080 (max= 1.5303), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:00:10,048 - root - INFO - Step 1790: lr=1.00E-05, loss= 1.1080 (max= 1.5303), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:00:10,049 - root - INFO - Step 1790: lr=1.00E-05, loss= 1.1080 (max= 1.5303), tps=20519, mfu=42.75%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:00:41,907 - root - INFO - Step 1800: lr=1.00E-05, loss= 1.1150 (max= 1.5409), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:00:41,907 - root - INFO - Step 1800: lr=1.00E-05, loss= 1.1150 (max= 1.5409), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:00:41,907 - root - INFO - Step 1800: lr=1.00E-05, loss= 1.1150 (max= 1.5409), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:00:41,908 - root - INFO - Step 1800: lr=1.00E-05, loss= 1.1150 (max= 1.5409), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:00:41,908 - root - INFO - Step 1800: lr=1.00E-05, loss= 1.1150 (max= 1.5409), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:00:41,908 - root - INFO - Step 1800: lr=1.00E-05, loss= 1.1150 (max= 1.5409), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:00:41,908 - root - INFO - Step 1800: lr=1.00E-05, loss= 1.1150 (max= 1.5409), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:00:41,908 - root - INFO - Step 1800: lr=1.00E-05, loss= 1.1150 (max= 1.5409), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:01:13,696 - root - INFO - Step 1810: lr=1.00E-05, loss= 1.1175 (max= 1.5571), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:01:13,696 - root - INFO - Step 1810: lr=1.00E-05, loss= 1.1175 (max= 1.5571), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:01:13,696 - root - INFO - Step 1810: lr=1.00E-05, loss= 1.1175 (max= 
1.5571), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:01:13,696 - root - INFO - Step 1810: lr=1.00E-05, loss= 1.1175 (max= 1.5571), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:01:13,696 - root - INFO - Step 1810: lr=1.00E-05, loss= 1.1175 (max= 1.5571), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:01:13,697 - root - INFO - Step 1810: lr=1.00E-05, loss= 1.1175 (max= 1.5571), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:01:13,697 - root - INFO - Step 1810: lr=1.00E-05, loss= 1.1175 (max= 1.5571), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:01:13,697 - root - INFO - Step 1810: lr=1.00E-05, loss= 1.1175 (max= 1.5571), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:01:45,528 - root - INFO - Step 1820: lr=1.00E-05, loss= 1.1396 (max= 1.6170), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:01:45,528 - root - INFO - Step 1820: lr=1.00E-05, loss= 1.1396 (max= 1.6170), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:01:45,529 - root - INFO - Step 1820: lr=1.00E-05, loss= 1.1396 (max= 1.6170), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:01:45,529 - root - INFO - Step 1820: lr=1.00E-05, loss= 1.1396 (max= 1.6170), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:01:45,529 - root - INFO - Step 1820: lr=1.00E-05, loss= 1.1396 (max= 1.6170), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:01:45,529 - root - INFO - Step 
1820: lr=1.00E-05, loss= 1.1396 (max= 1.6170), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:01:45,529 - root - INFO - Step 1820: lr=1.00E-05, loss= 1.1396 (max= 1.6170), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:01:45,529 - root - INFO - Step 1820: lr=1.00E-05, loss= 1.1396 (max= 1.6170), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:02:17,407 - root - INFO - Step 1830: lr=1.00E-05, loss= 1.1351 (max= 1.6061), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:02:17,407 - root - INFO - Step 1830: lr=1.00E-05, loss= 1.1351 (max= 1.6061), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:02:17,407 - root - INFO - Step 1830: lr=1.00E-05, loss= 1.1351 (max= 1.6061), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:02:17,407 - root - INFO - Step 1830: lr=1.00E-05, loss= 1.1351 (max= 1.6061), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:02:17,407 - root - INFO - Step 1830: lr=1.00E-05, loss= 1.1351 (max= 1.6061), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:02:17,407 - root - INFO - Step 1830: lr=1.00E-05, loss= 1.1351 (max= 1.6061), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:02:17,407 - root - INFO - Step 1830: lr=1.00E-05, loss= 1.1351 (max= 1.6061), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:02:17,407 - root - INFO - Step 1830: lr=1.00E-05, loss= 1.1351 (max= 1.6061), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 12:02:49,285 - root - INFO - Step 1840: lr=1.00E-05, loss= 1.1100 (max= 1.5290), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:02:49,286 - root - INFO - Step 1840: lr=1.00E-05, loss= 1.1100 (max= 1.5290), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:02:49,286 - root - INFO - Step 1840: lr=1.00E-05, loss= 1.1100 (max= 1.5290), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:02:49,286 - root - INFO - Step 1840: lr=1.00E-05, loss= 1.1100 (max= 1.5290), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:02:49,286 - root - INFO - Step 1840: lr=1.00E-05, loss= 1.1100 (max= 1.5290), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:02:49,286 - root - INFO - Step 1840: lr=1.00E-05, loss= 1.1100 (max= 1.5290), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:02:49,286 - root - INFO - Step 1840: lr=1.00E-05, loss= 1.1100 (max= 1.5290), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:02:49,286 - root - INFO - Step 1840: lr=1.00E-05, loss= 1.1100 (max= 1.5290), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:03:21,128 - root - INFO - Step 1850: lr=1.00E-05, loss= 1.0997 (max= 1.5118), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:03:21,128 - root - INFO - Step 1850: lr=1.00E-05, loss= 1.0997 (max= 1.5118), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:03:21,128 - root - INFO - Step 1850: lr=1.00E-05, loss= 1.0997 (max= 1.5118), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:03:21,128 - root - INFO - Step 1850: lr=1.00E-05, loss= 1.0997 (max= 1.5118), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:03:21,128 - root - INFO - Step 1850: lr=1.00E-05, loss= 1.0997 (max= 1.5118), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:03:21,128 - root - INFO - Step 1850: lr=1.00E-05, loss= 1.0997 (max= 1.5118), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:03:21,128 - root - INFO - Step 1850: lr=1.00E-05, loss= 1.0997 (max= 1.5118), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:03:21,128 - root - INFO - Step 1850: lr=1.00E-05, loss= 1.0997 (max= 1.5118), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:03:52,977 - root - INFO - Step 1860: lr=1.00E-05, loss= 1.0998 (max= 1.5307), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:03:52,977 - root - INFO - Step 1860: lr=1.00E-05, loss= 1.0998 (max= 1.5307), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:03:52,977 - root - INFO - Step 1860: lr=1.00E-05, loss= 1.0998 (max= 1.5307), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:03:52,977 - root - INFO - Step 1860: lr=1.00E-05, loss= 1.0998 (max= 1.5307), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:03:52,977 - root - INFO - Step 1860: lr=1.00E-05, loss= 1.0998 (max= 1.5307), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:03:52,977 - root - INFO - Step 1860: lr=1.00E-05, loss= 1.0998 (max= 1.5307), tps=20579, 
mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:03:52,977 - root - INFO - Step 1860: lr=1.00E-05, loss= 1.0998 (max= 1.5307), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:03:52,977 - root - INFO - Step 1860: lr=1.00E-05, loss= 1.0998 (max= 1.5307), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:04:24,791 - root - INFO - Step 1870: lr=1.00E-05, loss= 1.1034 (max= 1.5161), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:04:24,791 - root - INFO - Step 1870: lr=1.00E-05, loss= 1.1034 (max= 1.5161), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:04:24,791 - root - INFO - Step 1870: lr=1.00E-05, loss= 1.1034 (max= 1.5161), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:04:24,791 - root - INFO - Step 1870: lr=1.00E-05, loss= 1.1034 (max= 1.5161), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:04:24,791 - root - INFO - Step 1870: lr=1.00E-05, loss= 1.1034 (max= 1.5161), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:04:24,791 - root - INFO - Step 1870: lr=1.00E-05, loss= 1.1034 (max= 1.5161), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:04:24,791 - root - INFO - Step 1870: lr=1.00E-05, loss= 1.1034 (max= 1.5161), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:04:24,791 - root - INFO - Step 1870: lr=1.00E-05, loss= 1.1034 (max= 1.5161), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:04:56,661 - root - INFO - Step 1880: lr=1.00E-05, 
loss= 1.1194 (max= 1.6789), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:04:56,661 - root - INFO - Step 1880: lr=1.00E-05, loss= 1.1194 (max= 1.6789), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:04:56,661 - root - INFO - Step 1880: lr=1.00E-05, loss= 1.1194 (max= 1.6789), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:04:56,661 - root - INFO - Step 1880: lr=1.00E-05, loss= 1.1194 (max= 1.6789), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:04:56,661 - root - INFO - Step 1880: lr=1.00E-05, loss= 1.1194 (max= 1.6789), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:04:56,661 - root - INFO - Step 1880: lr=1.00E-05, loss= 1.1194 (max= 1.6789), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:04:56,661 - root - INFO - Step 1880: lr=1.00E-05, loss= 1.1194 (max= 1.6789), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:04:56,661 - root - INFO - Step 1880: lr=1.00E-05, loss= 1.1194 (max= 1.6789), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:05:28,491 - root - INFO - Step 1890: lr=1.00E-05, loss= 1.1229 (max= 1.5362), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:05:28,491 - root - INFO - Step 1890: lr=1.00E-05, loss= 1.1229 (max= 1.5362), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:05:28,491 - root - INFO - Step 1890: lr=1.00E-05, loss= 1.1229 (max= 1.5362), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:05:28,491 - 
root - INFO - Step 1890: lr=1.00E-05, loss= 1.1229 (max= 1.5362), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:05:28,491 - root - INFO - Step 1890: lr=1.00E-05, loss= 1.1229 (max= 1.5362), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:05:28,491 - root - INFO - Step 1890: lr=1.00E-05, loss= 1.1229 (max= 1.5362), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:05:28,491 - root - INFO - Step 1890: lr=1.00E-05, loss= 1.1229 (max= 1.5362), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:05:28,492 - root - INFO - Step 1890: lr=1.00E-05, loss= 1.1229 (max= 1.5362), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:06:00,379 - root - INFO - Step 1900: lr=1.00E-05, loss= 1.1080 (max= 1.5453), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:06:00,379 - root - INFO - Step 1900: lr=1.00E-05, loss= 1.1080 (max= 1.5453), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:06:00,379 - root - INFO - Step 1900: lr=1.00E-05, loss= 1.1080 (max= 1.5453), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:06:00,379 - root - INFO - Step 1900: lr=1.00E-05, loss= 1.1080 (max= 1.5453), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:06:00,380 - root - INFO - Step 1900: lr=1.00E-05, loss= 1.1080 (max= 1.5453), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:06:00,380 - root - INFO - Step 1900: lr=1.00E-05, loss= 1.1080 (max= 1.5453), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 12:06:00,380 - root - INFO - Step 1900: lr=1.00E-05, loss= 1.1080 (max= 1.5453), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:06:00,380 - root - INFO - Step 1900: lr=1.00E-05, loss= 1.1080 (max= 1.5453), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:06:32,293 - root - INFO - Step 1910: lr=1.00E-05, loss= 1.1052 (max= 1.6273), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:06:32,293 - root - INFO - Step 1910: lr=1.00E-05, loss= 1.1052 (max= 1.6273), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:06:32,293 - root - INFO - Step 1910: lr=1.00E-05, loss= 1.1052 (max= 1.6273), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:06:32,293 - root - INFO - Step 1910: lr=1.00E-05, loss= 1.1052 (max= 1.6273), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:06:32,293 - root - INFO - Step 1910: lr=1.00E-05, loss= 1.1052 (max= 1.6273), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:06:32,293 - root - INFO - Step 1910: lr=1.00E-05, loss= 1.1052 (max= 1.6273), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:06:32,293 - root - INFO - Step 1910: lr=1.00E-05, loss= 1.1052 (max= 1.6273), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:06:32,293 - root - INFO - Step 1910: lr=1.00E-05, loss= 1.1052 (max= 1.6273), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:07:04,157 - root - INFO - Step 1920: lr=1.00E-05, loss= 1.0867 (max= 1.4869), tps=20569, mfu=42.86%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:07:04,157 - root - INFO - Step 1920: lr=1.00E-05, loss= 1.0867 (max= 1.4869), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:07:04,157 - root - INFO - Step 1920: lr=1.00E-05, loss= 1.0867 (max= 1.4869), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:07:04,157 - root - INFO - Step 1920: lr=1.00E-05, loss= 1.0867 (max= 1.4869), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:07:04,158 - root - INFO - Step 1920: lr=1.00E-05, loss= 1.0867 (max= 1.4869), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:07:04,158 - root - INFO - Step 1920: lr=1.00E-05, loss= 1.0867 (max= 1.4869), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:07:04,158 - root - INFO - Step 1920: lr=1.00E-05, loss= 1.0867 (max= 1.4869), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:07:04,158 - root - INFO - Step 1920: lr=1.00E-05, loss= 1.0867 (max= 1.4869), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:07:36,025 - root - INFO - Step 1930: lr=1.00E-05, loss= 1.0910 (max= 1.6265), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:07:36,025 - root - INFO - Step 1930: lr=1.00E-05, loss= 1.0910 (max= 1.6265), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:07:36,025 - root - INFO - Step 1930: lr=1.00E-05, loss= 1.0910 (max= 1.6265), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:07:36,025 - root - INFO - Step 1930: lr=1.00E-05, loss= 1.0910 (max= 
1.6265), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:07:36,025 - root - INFO - Step 1930: lr=1.00E-05, loss= 1.0910 (max= 1.6265), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:07:36,025 - root - INFO - Step 1930: lr=1.00E-05, loss= 1.0910 (max= 1.6265), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:07:36,025 - root - INFO - Step 1930: lr=1.00E-05, loss= 1.0910 (max= 1.6265), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:07:36,025 - root - INFO - Step 1930: lr=1.00E-05, loss= 1.0910 (max= 1.6265), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:08:07,936 - root - INFO - Step 1940: lr=1.00E-05, loss= 1.1137 (max= 1.5634), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:08:07,936 - root - INFO - Step 1940: lr=1.00E-05, loss= 1.1137 (max= 1.5634), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:08:07,936 - root - INFO - Step 1940: lr=1.00E-05, loss= 1.1137 (max= 1.5634), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:08:07,936 - root - INFO - Step 1940: lr=1.00E-05, loss= 1.1137 (max= 1.5634), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:08:07,936 - root - INFO - Step 1940: lr=1.00E-05, loss= 1.1137 (max= 1.5634), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:08:07,936 - root - INFO - Step 1940: lr=1.00E-05, loss= 1.1137 (max= 1.5634), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:08:07,936 - root - INFO - Step 
1940: lr=1.00E-05, loss= 1.1137 (max= 1.5634), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:08:07,936 - root - INFO - Step 1940: lr=1.00E-05, loss= 1.1137 (max= 1.5634), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:08:39,839 - root - INFO - Step 1950: lr=1.00E-05, loss= 1.1160 (max= 1.5210), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:08:39,839 - root - INFO - Step 1950: lr=1.00E-05, loss= 1.1160 (max= 1.5210), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:08:39,839 - root - INFO - Step 1950: lr=1.00E-05, loss= 1.1160 (max= 1.5210), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:08:39,839 - root - INFO - Step 1950: lr=1.00E-05, loss= 1.1160 (max= 1.5210), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:08:39,839 - root - INFO - Step 1950: lr=1.00E-05, loss= 1.1160 (max= 1.5210), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:08:39,839 - root - INFO - Step 1950: lr=1.00E-05, loss= 1.1160 (max= 1.5210), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:08:39,839 - root - INFO - Step 1950: lr=1.00E-05, loss= 1.1160 (max= 1.5210), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:08:39,839 - root - INFO - Step 1950: lr=1.00E-05, loss= 1.1160 (max= 1.5210), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:09:11,736 - root - INFO - Step 1960: lr=1.00E-05, loss= 1.0930 (max= 1.5631), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 12:09:11,736 - root - INFO - Step 1960: lr=1.00E-05, loss= 1.0930 (max= 1.5631), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:09:11,736 - root - INFO - Step 1960: lr=1.00E-05, loss= 1.0930 (max= 1.5631), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:09:11,736 - root - INFO - Step 1960: lr=1.00E-05, loss= 1.0930 (max= 1.5631), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:09:11,736 - root - INFO - Step 1960: lr=1.00E-05, loss= 1.0930 (max= 1.5631), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:09:11,736 - root - INFO - Step 1960: lr=1.00E-05, loss= 1.0930 (max= 1.5631), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:09:11,736 - root - INFO - Step 1960: lr=1.00E-05, loss= 1.0930 (max= 1.5631), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:09:11,736 - root - INFO - Step 1960: lr=1.00E-05, loss= 1.0930 (max= 1.5631), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:09:43,738 - root - INFO - Step 1970: lr=1.00E-05, loss= 1.0923 (max= 1.5122), tps=20481, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:09:43,738 - root - INFO - Step 1970: lr=1.00E-05, loss= 1.0923 (max= 1.5122), tps=20481, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:09:43,738 - root - INFO - Step 1970: lr=1.00E-05, loss= 1.0923 (max= 1.5122), tps=20481, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:09:43,738 - root - INFO - Step 1970: lr=1.00E-05, loss= 1.0923 (max= 1.5122), tps=20481, mfu=42.67%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:09:43,738 - root - INFO - Step 1970: lr=1.00E-05, loss= 1.0923 (max= 1.5122), tps=20481, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:09:43,738 - root - INFO - Step 1970: lr=1.00E-05, loss= 1.0923 (max= 1.5122), tps=20481, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:09:43,738 - root - INFO - Step 1970: lr=1.00E-05, loss= 1.0923 (max= 1.5122), tps=20481, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:09:43,738 - root - INFO - Step 1970: lr=1.00E-05, loss= 1.0923 (max= 1.5122), tps=20481, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:10:15,572 - root - INFO - Step 1980: lr=1.00E-05, loss= 1.1059 (max= 1.5714), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:10:15,572 - root - INFO - Step 1980: lr=1.00E-05, loss= 1.1059 (max= 1.5714), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:10:15,572 - root - INFO - Step 1980: lr=1.00E-05, loss= 1.1059 (max= 1.5714), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:10:15,573 - root - INFO - Step 1980: lr=1.00E-05, loss= 1.1059 (max= 1.5714), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:10:15,573 - root - INFO - Step 1980: lr=1.00E-05, loss= 1.1059 (max= 1.5714), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:10:15,573 - root - INFO - Step 1980: lr=1.00E-05, loss= 1.1059 (max= 1.5714), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:10:15,573 - root - INFO - Step 1980: lr=1.00E-05, loss= 1.1059 (max= 1.5714), tps=20589, 
mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:10:15,573 - root - INFO - Step 1980: lr=1.00E-05, loss= 1.1059 (max= 1.5714), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:10:47,407 - root - INFO - Step 1990: lr=1.00E-05, loss= 1.0855 (max= 1.4866), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:10:47,408 - root - INFO - Step 1990: lr=1.00E-05, loss= 1.0855 (max= 1.4866), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:10:47,408 - root - INFO - Step 1990: lr=1.00E-05, loss= 1.0855 (max= 1.4866), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:10:47,408 - root - INFO - Step 1990: lr=1.00E-05, loss= 1.0855 (max= 1.4866), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:10:47,408 - root - INFO - Step 1990: lr=1.00E-05, loss= 1.0855 (max= 1.4866), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:10:47,408 - root - INFO - Step 1990: lr=1.00E-05, loss= 1.0855 (max= 1.4866), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:10:47,408 - root - INFO - Step 1990: lr=1.00E-05, loss= 1.0855 (max= 1.4866), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:10:47,408 - root - INFO - Step 1990: lr=1.00E-05, loss= 1.0855 (max= 1.4866), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-pt/checkpoints/dataloader/step-2000 +Dataset successfully saved to jobs/munin-7b-open-pt/checkpoints/dataloader/step-2000! 
Save time: 4.497503280639648 +2025-10-25 12:11:19,291 - root - INFO - Step 2000: lr=1.00E-05, loss= 1.1011 (max= 1.6464), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:11:19,291 - root - INFO - Saving a full checkpoint at step 2000 +2025-10-25 12:11:19,291 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 12:11:19,291 - root - INFO - Step 2000: lr=1.00E-05, loss= 1.1011 (max= 1.6464), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:11:19,291 - root - INFO - Step 2000: lr=1.00E-05, loss= 1.1011 (max= 1.6464), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:11:19,291 - root - INFO - Saving a full checkpoint at step 2000 +2025-10-25 12:11:19,291 - root - INFO - Step 2000: lr=1.00E-05, loss= 1.1011 (max= 1.6464), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:11:19,291 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 12:11:19,291 - root - INFO - Saving a full checkpoint at step 2000 +2025-10-25 12:11:19,291 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 12:11:19,291 - root - INFO - Saving a full checkpoint at step 2000 +2025-10-25 12:11:19,291 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 12:11:19,291 - root - INFO - Step 2000: lr=1.00E-05, loss= 1.1011 (max= 1.6464), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:11:19,292 - root - INFO - Step 2000: lr=1.00E-05, loss= 1.1011 (max= 1.6464), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:11:19,292 - root - INFO - Step 2000: lr=1.00E-05, loss= 1.1011 (max= 1.6464), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:11:19,292 - root - INFO - Saving a full checkpoint at step 2000 +2025-10-25 12:11:19,292 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 12:11:19,292 - root - INFO - Saving a full checkpoint at step 2000 +2025-10-25 12:11:19,292 - root - INFO - Saving a full checkpoint at step 2000 +2025-10-25 12:11:19,292 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 12:11:19,292 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 12:11:19,292 - root - INFO - Step 2000: lr=1.00E-05, loss= 1.1011 (max= 1.6464), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:11:19,292 - root - INFO - Saving a full checkpoint at step 2000 +2025-10-25 12:11:19,292 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 12:11:33,050 - root - INFO - Finished saving the checkpoint in 13.76 seconds +2025-10-25 12:11:33,057 - root - INFO - Finished saving the checkpoint in 13.77 seconds +2025-10-25 12:11:33,057 - root - INFO - Finished saving the checkpoint in 13.77 seconds +2025-10-25 12:11:33,057 - root - INFO - Finished saving the checkpoint in 13.77 seconds +2025-10-25 12:11:33,057 - root - INFO - Finished saving the checkpoint in 13.77 seconds +2025-10-25 12:11:33,057 - root - INFO - Finished saving the checkpoint in 13.77 seconds +2025-10-25 12:11:33,058 - root - INFO - Finished saving the checkpoint in 13.77 seconds +2025-10-25 12:11:33,058 - root - INFO - Finished saving the checkpoint in 
13.77 seconds +2025-10-25 12:12:04,883 - root - INFO - Step 2010: lr=1.00E-05, loss= 1.0810 (max= 1.6378), tps=14376, mfu=29.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:12:04,883 - root - INFO - Step 2010: lr=1.00E-05, loss= 1.0810 (max= 1.6378), tps=14376, mfu=29.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:12:04,883 - root - INFO - Step 2010: lr=1.00E-05, loss= 1.0810 (max= 1.6378), tps=14376, mfu=29.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:12:04,883 - root - INFO - Step 2010: lr=1.00E-05, loss= 1.0810 (max= 1.6378), tps=14376, mfu=29.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:12:04,883 - root - INFO - Step 2010: lr=1.00E-05, loss= 1.0810 (max= 1.6378), tps=14376, mfu=29.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:12:04,883 - root - INFO - Step 2010: lr=1.00E-05, loss= 1.0810 (max= 1.6378), tps=14376, mfu=29.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:12:04,883 - root - INFO - Step 2010: lr=1.00E-05, loss= 1.0810 (max= 1.6378), tps=14376, mfu=29.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:12:04,883 - root - INFO - Step 2010: lr=1.00E-05, loss= 1.0810 (max= 1.6378), tps=14376, mfu=29.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:12:36,721 - root - INFO - Step 2020: lr=1.00E-05, loss= 1.0748 (max= 1.5305), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:12:36,721 - root - INFO - Step 2020: lr=1.00E-05, loss= 1.0748 (max= 1.5305), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:12:36,721 - root - INFO - Step 2020: lr=1.00E-05, loss= 1.0748 (max= 1.5305), tps=20586, mfu=42.89%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:12:36,721 - root - INFO - Step 2020: lr=1.00E-05, loss= 1.0748 (max= 1.5305), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:12:36,721 - root - INFO - Step 2020: lr=1.00E-05, loss= 1.0748 (max= 1.5305), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:12:36,721 - root - INFO - Step 2020: lr=1.00E-05, loss= 1.0748 (max= 1.5305), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:12:36,721 - root - INFO - Step 2020: lr=1.00E-05, loss= 1.0748 (max= 1.5305), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:12:36,722 - root - INFO - Step 2020: lr=1.00E-05, loss= 1.0748 (max= 1.5305), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:13:08,575 - root - INFO - Step 2030: lr=1.00E-05, loss= 1.0960 (max= 1.6201), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:13:08,575 - root - INFO - Step 2030: lr=1.00E-05, loss= 1.0960 (max= 1.6201), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:13:08,576 - root - INFO - Step 2030: lr=1.00E-05, loss= 1.0960 (max= 1.6201), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:13:08,576 - root - INFO - Step 2030: lr=1.00E-05, loss= 1.0960 (max= 1.6201), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:13:08,576 - root - INFO - Step 2030: lr=1.00E-05, loss= 1.0960 (max= 1.6201), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:13:08,576 - root - INFO - Step 2030: lr=1.00E-05, loss= 1.0960 (max= 
1.6201), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:13:08,576 - root - INFO - Step 2030: lr=1.00E-05, loss= 1.0960 (max= 1.6201), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:13:08,576 - root - INFO - Step 2030: lr=1.00E-05, loss= 1.0960 (max= 1.6201), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:13:40,587 - root - INFO - Step 2040: lr=1.00E-05, loss= 1.1013 (max= 1.5362), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:13:40,588 - root - INFO - Step 2040: lr=1.00E-05, loss= 1.1013 (max= 1.5362), tps=20474, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:13:40,588 - root - INFO - Step 2040: lr=1.00E-05, loss= 1.1013 (max= 1.5362), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:13:40,588 - root - INFO - Step 2040: lr=1.00E-05, loss= 1.1013 (max= 1.5362), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:13:40,588 - root - INFO - Step 2040: lr=1.00E-05, loss= 1.1013 (max= 1.5362), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:13:40,588 - root - INFO - Step 2040: lr=1.00E-05, loss= 1.1013 (max= 1.5362), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:13:40,588 - root - INFO - Step 2040: lr=1.00E-05, loss= 1.1013 (max= 1.5362), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:13:40,588 - root - INFO - Step 2040: lr=1.00E-05, loss= 1.1013 (max= 1.5362), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:14:12,491 - root - INFO - Step 
2050: lr=1.00E-05, loss= 1.1067 (max= 1.6573), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:14:12,491 - root - INFO - Step 2050: lr=1.00E-05, loss= 1.1067 (max= 1.6573), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:14:12,491 - root - INFO - Step 2050: lr=1.00E-05, loss= 1.1067 (max= 1.6573), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:14:12,491 - root - INFO - Step 2050: lr=1.00E-05, loss= 1.1067 (max= 1.6573), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:14:12,491 - root - INFO - Step 2050: lr=1.00E-05, loss= 1.1067 (max= 1.6573), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:14:12,491 - root - INFO - Step 2050: lr=1.00E-05, loss= 1.1067 (max= 1.6573), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:14:12,491 - root - INFO - Step 2050: lr=1.00E-05, loss= 1.1067 (max= 1.6573), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:14:12,491 - root - INFO - Step 2050: lr=1.00E-05, loss= 1.1067 (max= 1.6573), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:14:44,417 - root - INFO - Step 2060: lr=1.00E-05, loss= 1.1056 (max= 1.4724), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:14:44,417 - root - INFO - Step 2060: lr=1.00E-05, loss= 1.1056 (max= 1.4724), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:14:44,418 - root - INFO - Step 2060: lr=1.00E-05, loss= 1.1056 (max= 1.4724), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 12:14:44,418 - root - INFO - Step 2060: lr=1.00E-05, loss= 1.1056 (max= 1.4724), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:14:44,418 - root - INFO - Step 2060: lr=1.00E-05, loss= 1.1056 (max= 1.4724), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:14:44,418 - root - INFO - Step 2060: lr=1.00E-05, loss= 1.1056 (max= 1.4724), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:14:44,418 - root - INFO - Step 2060: lr=1.00E-05, loss= 1.1056 (max= 1.4724), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:14:44,418 - root - INFO - Step 2060: lr=1.00E-05, loss= 1.1056 (max= 1.4724), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:15:16,260 - root - INFO - Step 2070: lr=1.00E-05, loss= 1.0960 (max= 1.6067), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:15:16,260 - root - INFO - Step 2070: lr=1.00E-05, loss= 1.0960 (max= 1.6067), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:15:16,260 - root - INFO - Step 2070: lr=1.00E-05, loss= 1.0960 (max= 1.6067), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:15:16,260 - root - INFO - Step 2070: lr=1.00E-05, loss= 1.0960 (max= 1.6067), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:15:16,260 - root - INFO - Step 2070: lr=1.00E-05, loss= 1.0960 (max= 1.6067), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:15:16,260 - root - INFO - Step 2070: lr=1.00E-05, loss= 1.0960 (max= 1.6067), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:15:16,260 - root - INFO - Step 2070: lr=1.00E-05, loss= 1.0960 (max= 1.6067), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:15:16,260 - root - INFO - Step 2070: lr=1.00E-05, loss= 1.0960 (max= 1.6067), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:15:48,124 - root - INFO - Step 2080: lr=1.00E-05, loss= 1.1071 (max= 1.5745), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:15:48,124 - root - INFO - Step 2080: lr=1.00E-05, loss= 1.1071 (max= 1.5745), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:15:48,124 - root - INFO - Step 2080: lr=1.00E-05, loss= 1.1071 (max= 1.5745), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:15:48,124 - root - INFO - Step 2080: lr=1.00E-05, loss= 1.1071 (max= 1.5745), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:15:48,124 - root - INFO - Step 2080: lr=1.00E-05, loss= 1.1071 (max= 1.5745), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:15:48,124 - root - INFO - Step 2080: lr=1.00E-05, loss= 1.1071 (max= 1.5745), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:15:48,124 - root - INFO - Step 2080: lr=1.00E-05, loss= 1.1071 (max= 1.5745), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:15:48,124 - root - INFO - Step 2080: lr=1.00E-05, loss= 1.1071 (max= 1.5745), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:16:19,979 - root - INFO - Step 2090: lr=1.00E-05, loss= 1.0915 (max= 1.5617), tps=20576, 
mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:16:19,979 - root - INFO - Step 2090: lr=1.00E-05, loss= 1.0915 (max= 1.5617), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:16:19,979 - root - INFO - Step 2090: lr=1.00E-05, loss= 1.0915 (max= 1.5617), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:16:19,979 - root - INFO - Step 2090: lr=1.00E-05, loss= 1.0915 (max= 1.5617), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:16:19,979 - root - INFO - Step 2090: lr=1.00E-05, loss= 1.0915 (max= 1.5617), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:16:19,979 - root - INFO - Step 2090: lr=1.00E-05, loss= 1.0915 (max= 1.5617), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:16:19,979 - root - INFO - Step 2090: lr=1.00E-05, loss= 1.0915 (max= 1.5617), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:16:19,979 - root - INFO - Step 2090: lr=1.00E-05, loss= 1.0915 (max= 1.5617), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:16:51,805 - root - INFO - Step 2100: lr=1.00E-05, loss= 1.0911 (max= 1.4638), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:16:51,805 - root - INFO - Step 2100: lr=1.00E-05, loss= 1.0911 (max= 1.4638), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:16:51,805 - root - INFO - Step 2100: lr=1.00E-05, loss= 1.0911 (max= 1.4638), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:16:51,805 - root - INFO - Step 2100: lr=1.00E-05, 
loss= 1.0911 (max= 1.4638), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:16:51,805 - root - INFO - Step 2100: lr=1.00E-05, loss= 1.0911 (max= 1.4638), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:16:51,805 - root - INFO - Step 2100: lr=1.00E-05, loss= 1.0911 (max= 1.4638), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:16:51,805 - root - INFO - Step 2100: lr=1.00E-05, loss= 1.0911 (max= 1.4638), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:16:51,806 - root - INFO - Step 2100: lr=1.00E-05, loss= 1.0911 (max= 1.4638), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:17:23,685 - root - INFO - Step 2110: lr=1.00E-05, loss= 1.1079 (max= 1.4211), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:17:23,685 - root - INFO - Step 2110: lr=1.00E-05, loss= 1.1079 (max= 1.4211), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:17:23,685 - root - INFO - Step 2110: lr=1.00E-05, loss= 1.1079 (max= 1.4211), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:17:23,685 - root - INFO - Step 2110: lr=1.00E-05, loss= 1.1079 (max= 1.4211), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:17:23,685 - root - INFO - Step 2110: lr=1.00E-05, loss= 1.1079 (max= 1.4211), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:17:23,685 - root - INFO - Step 2110: lr=1.00E-05, loss= 1.1079 (max= 1.4211), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:17:23,685 - 
root - INFO - Step 2110: lr=1.00E-05, loss= 1.1079 (max= 1.4211), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:17:23,685 - root - INFO - Step 2110: lr=1.00E-05, loss= 1.1079 (max= 1.4211), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:17:55,500 - root - INFO - Step 2120: lr=1.00E-05, loss= 1.1168 (max= 1.5861), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:17:55,500 - root - INFO - Step 2120: lr=1.00E-05, loss= 1.1168 (max= 1.5861), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:17:55,500 - root - INFO - Step 2120: lr=1.00E-05, loss= 1.1168 (max= 1.5861), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:17:55,500 - root - INFO - Step 2120: lr=1.00E-05, loss= 1.1168 (max= 1.5861), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:17:55,500 - root - INFO - Step 2120: lr=1.00E-05, loss= 1.1168 (max= 1.5861), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:17:55,501 - root - INFO - Step 2120: lr=1.00E-05, loss= 1.1168 (max= 1.5861), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:17:55,500 - root - INFO - Step 2120: lr=1.00E-05, loss= 1.1168 (max= 1.5861), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:17:55,501 - root - INFO - Step 2120: lr=1.00E-05, loss= 1.1168 (max= 1.5861), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:18:27,388 - root - INFO - Step 2130: lr=1.00E-05, loss= 1.0859 (max= 1.4745), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 12:18:27,388 - root - INFO - Step 2130: lr=1.00E-05, loss= 1.0859 (max= 1.4745), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:18:27,388 - root - INFO - Step 2130: lr=1.00E-05, loss= 1.0859 (max= 1.4745), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:18:27,388 - root - INFO - Step 2130: lr=1.00E-05, loss= 1.0859 (max= 1.4745), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:18:27,388 - root - INFO - Step 2130: lr=1.00E-05, loss= 1.0859 (max= 1.4745), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:18:27,388 - root - INFO - Step 2130: lr=1.00E-05, loss= 1.0859 (max= 1.4745), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:18:27,388 - root - INFO - Step 2130: lr=1.00E-05, loss= 1.0859 (max= 1.4745), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:18:27,388 - root - INFO - Step 2130: lr=1.00E-05, loss= 1.0859 (max= 1.4745), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:18:50,221 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:7465904 +2025-10-25 12:18:59,263 - root - INFO - Step 2140: lr=1.00E-05, loss= 1.1014 (max= 1.6241), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:18:59,263 - root - INFO - Step 2140: lr=1.00E-05, loss= 1.1014 (max= 1.6241), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:18:59,263 - root - INFO - Step 2140: lr=1.00E-05, loss= 1.1014 (max= 1.6241), tps=20563, mfu=42.84%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:18:59,263 - root - INFO - Step 2140: lr=1.00E-05, loss= 1.1014 (max= 1.6241), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:18:59,263 - root - INFO - Step 2140: lr=1.00E-05, loss= 1.1014 (max= 1.6241), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:18:59,263 - root - INFO - Step 2140: lr=1.00E-05, loss= 1.1014 (max= 1.6241), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:18:59,263 - root - INFO - Step 2140: lr=1.00E-05, loss= 1.1014 (max= 1.6241), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:18:59,263 - root - INFO - Step 2140: lr=1.00E-05, loss= 1.1014 (max= 1.6241), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:19:20,668 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:4976544 +2025-10-25 12:19:31,068 - root - INFO - Step 2150: lr=1.00E-05, loss= 1.1095 (max= 1.5435), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:19:31,069 - root - INFO - Step 2150: lr=1.00E-05, loss= 1.1095 (max= 1.5435), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:19:31,069 - root - INFO - Step 2150: lr=1.00E-05, loss= 1.1095 (max= 1.5435), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:19:31,069 - root - INFO - Step 2150: lr=1.00E-05, loss= 1.1095 (max= 1.5435), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:19:31,069 - root - INFO - Step 2150: lr=1.00E-05, loss= 1.1095 (max= 
1.5435), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:19:31,069 - root - INFO - Step 2150: lr=1.00E-05, loss= 1.1095 (max= 1.5435), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:19:31,069 - root - INFO - Step 2150: lr=1.00E-05, loss= 1.1095 (max= 1.5435), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:19:31,069 - root - INFO - Step 2150: lr=1.00E-05, loss= 1.1095 (max= 1.5435), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:20:02,930 - root - INFO - Step 2160: lr=1.00E-05, loss= 1.0882 (max= 1.5851), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:20:02,930 - root - INFO - Step 2160: lr=1.00E-05, loss= 1.0882 (max= 1.5851), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:20:02,930 - root - INFO - Step 2160: lr=1.00E-05, loss= 1.0882 (max= 1.5851), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:20:02,930 - root - INFO - Step 2160: lr=1.00E-05, loss= 1.0882 (max= 1.5851), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:20:02,930 - root - INFO - Step 2160: lr=1.00E-05, loss= 1.0882 (max= 1.5851), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:20:02,930 - root - INFO - Step 2160: lr=1.00E-05, loss= 1.0882 (max= 1.5851), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:20:02,930 - root - INFO - Step 2160: lr=1.00E-05, loss= 1.0882 (max= 1.5851), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:20:02,930 - root - INFO - Step 
2160: lr=1.00E-05, loss= 1.0882 (max= 1.5851), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:20:34,794 - root - INFO - Step 2170: lr=1.00E-05, loss= 1.0831 (max= 1.6588), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:20:34,794 - root - INFO - Step 2170: lr=1.00E-05, loss= 1.0831 (max= 1.6588), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:20:34,794 - root - INFO - Step 2170: lr=1.00E-05, loss= 1.0831 (max= 1.6588), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:20:34,794 - root - INFO - Step 2170: lr=1.00E-05, loss= 1.0831 (max= 1.6588), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:20:34,794 - root - INFO - Step 2170: lr=1.00E-05, loss= 1.0831 (max= 1.6588), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:20:34,794 - root - INFO - Step 2170: lr=1.00E-05, loss= 1.0831 (max= 1.6588), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:20:34,794 - root - INFO - Step 2170: lr=1.00E-05, loss= 1.0831 (max= 1.6588), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:20:34,794 - root - INFO - Step 2170: lr=1.00E-05, loss= 1.0831 (max= 1.6588), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:21:06,588 - root - INFO - Step 2180: lr=1.00E-05, loss= 1.1072 (max= 1.5372), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:21:06,588 - root - INFO - Step 2180: lr=1.00E-05, loss= 1.1072 (max= 1.5372), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 12:21:06,588 - root - INFO - Step 2180: lr=1.00E-05, loss= 1.1072 (max= 1.5372), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:21:06,588 - root - INFO - Step 2180: lr=1.00E-05, loss= 1.1072 (max= 1.5372), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:21:06,588 - root - INFO - Step 2180: lr=1.00E-05, loss= 1.1072 (max= 1.5372), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:21:06,588 - root - INFO - Step 2180: lr=1.00E-05, loss= 1.1072 (max= 1.5372), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:21:06,588 - root - INFO - Step 2180: lr=1.00E-05, loss= 1.1072 (max= 1.5372), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:21:06,588 - root - INFO - Step 2180: lr=1.00E-05, loss= 1.1072 (max= 1.5372), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:21:38,459 - root - INFO - Step 2190: lr=1.00E-05, loss= 1.1060 (max= 1.5008), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:21:38,459 - root - INFO - Step 2190: lr=1.00E-05, loss= 1.1060 (max= 1.5008), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:21:38,459 - root - INFO - Step 2190: lr=1.00E-05, loss= 1.1060 (max= 1.5008), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:21:38,459 - root - INFO - Step 2190: lr=1.00E-05, loss= 1.1060 (max= 1.5008), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:21:38,459 - root - INFO - Step 2190: lr=1.00E-05, loss= 1.1060 (max= 1.5008), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:21:38,459 - root - INFO - Step 2190: lr=1.00E-05, loss= 1.1060 (max= 1.5008), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:21:38,459 - root - INFO - Step 2190: lr=1.00E-05, loss= 1.1060 (max= 1.5008), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:21:38,459 - root - INFO - Step 2190: lr=1.00E-05, loss= 1.1060 (max= 1.5008), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:22:10,307 - root - INFO - Step 2200: lr=1.00E-05, loss= 1.1121 (max= 1.6342), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:22:10,307 - root - INFO - Step 2200: lr=1.00E-05, loss= 1.1121 (max= 1.6342), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:22:10,307 - root - INFO - Step 2200: lr=1.00E-05, loss= 1.1121 (max= 1.6342), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:22:10,307 - root - INFO - Step 2200: lr=1.00E-05, loss= 1.1121 (max= 1.6342), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:22:10,307 - root - INFO - Step 2200: lr=1.00E-05, loss= 1.1121 (max= 1.6342), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:22:10,307 - root - INFO - Step 2200: lr=1.00E-05, loss= 1.1121 (max= 1.6342), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:22:10,307 - root - INFO - Step 2200: lr=1.00E-05, loss= 1.1121 (max= 1.6342), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:22:10,307 - root - INFO - Step 2200: lr=1.00E-05, loss= 1.1121 (max= 1.6342), tps=20580, 
mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:22:42,149 - root - INFO - Step 2210: lr=1.00E-05, loss= 1.0761 (max= 1.7345), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:22:42,150 - root - INFO - Step 2210: lr=1.00E-05, loss= 1.0761 (max= 1.7345), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:22:42,150 - root - INFO - Step 2210: lr=1.00E-05, loss= 1.0761 (max= 1.7345), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:22:42,150 - root - INFO - Step 2210: lr=1.00E-05, loss= 1.0761 (max= 1.7345), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:22:42,150 - root - INFO - Step 2210: lr=1.00E-05, loss= 1.0761 (max= 1.7345), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:22:42,150 - root - INFO - Step 2210: lr=1.00E-05, loss= 1.0761 (max= 1.7345), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:22:42,150 - root - INFO - Step 2210: lr=1.00E-05, loss= 1.0761 (max= 1.7345), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:22:42,150 - root - INFO - Step 2210: lr=1.00E-05, loss= 1.0761 (max= 1.7345), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:23:14,048 - root - INFO - Step 2220: lr=1.00E-05, loss= 1.1229 (max= 1.5434), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:23:14,049 - root - INFO - Step 2220: lr=1.00E-05, loss= 1.1229 (max= 1.5434), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:23:14,049 - root - INFO - Step 2220: lr=1.00E-05, 
loss= 1.1229 (max= 1.5434), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:23:14,049 - root - INFO - Step 2220: lr=1.00E-05, loss= 1.1229 (max= 1.5434), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:23:14,049 - root - INFO - Step 2220: lr=1.00E-05, loss= 1.1229 (max= 1.5434), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:23:14,049 - root - INFO - Step 2220: lr=1.00E-05, loss= 1.1229 (max= 1.5434), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:23:14,049 - root - INFO - Step 2220: lr=1.00E-05, loss= 1.1229 (max= 1.5434), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:23:14,049 - root - INFO - Step 2220: lr=1.00E-05, loss= 1.1229 (max= 1.5434), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:23:22,765 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:3728356 +2025-10-25 12:23:45,969 - root - INFO - Step 2230: lr=1.00E-05, loss= 1.1094 (max= 1.6302), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:23:45,969 - root - INFO - Step 2230: lr=1.00E-05, loss= 1.1094 (max= 1.6302), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:23:45,969 - root - INFO - Step 2230: lr=1.00E-05, loss= 1.1094 (max= 1.6302), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:23:45,969 - root - INFO - Step 2230: lr=1.00E-05, loss= 1.1094 (max= 1.6302), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:23:45,969 - 
root - INFO - Step 2230: lr=1.00E-05, loss= 1.1094 (max= 1.6302), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:23:45,969 - root - INFO - Step 2230: lr=1.00E-05, loss= 1.1094 (max= 1.6302), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:23:45,969 - root - INFO - Step 2230: lr=1.00E-05, loss= 1.1094 (max= 1.6302), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:23:45,969 - root - INFO - Step 2230: lr=1.00E-05, loss= 1.1094 (max= 1.6302), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:24:17,849 - root - INFO - Step 2240: lr=1.00E-05, loss= 1.0975 (max= 1.5891), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:24:17,849 - root - INFO - Step 2240: lr=1.00E-05, loss= 1.0975 (max= 1.5891), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:24:17,849 - root - INFO - Step 2240: lr=1.00E-05, loss= 1.0975 (max= 1.5891), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:24:17,849 - root - INFO - Step 2240: lr=1.00E-05, loss= 1.0975 (max= 1.5891), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:24:17,849 - root - INFO - Step 2240: lr=1.00E-05, loss= 1.0975 (max= 1.5891), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:24:17,849 - root - INFO - Step 2240: lr=1.00E-05, loss= 1.0975 (max= 1.5891), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:24:17,849 - root - INFO - Step 2240: lr=1.00E-05, loss= 1.0975 (max= 1.5891), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 12:24:17,850 - root - INFO - Step 2240: lr=1.00E-05, loss= 1.0975 (max= 1.5891), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:24:49,740 - root - INFO - Step 2250: lr=1.00E-05, loss= 1.0820 (max= 1.4344), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:24:49,740 - root - INFO - Step 2250: lr=1.00E-05, loss= 1.0820 (max= 1.4344), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:24:49,740 - root - INFO - Step 2250: lr=1.00E-05, loss= 1.0820 (max= 1.4344), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:24:49,740 - root - INFO - Step 2250: lr=1.00E-05, loss= 1.0820 (max= 1.4344), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:24:49,740 - root - INFO - Step 2250: lr=1.00E-05, loss= 1.0820 (max= 1.4344), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:24:49,740 - root - INFO - Step 2250: lr=1.00E-05, loss= 1.0820 (max= 1.4344), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:24:49,740 - root - INFO - Step 2250: lr=1.00E-05, loss= 1.0820 (max= 1.4344), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:24:49,740 - root - INFO - Step 2250: lr=1.00E-05, loss= 1.0820 (max= 1.4344), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:25:21,544 - root - INFO - Step 2260: lr=1.00E-05, loss= 1.0866 (max= 1.6170), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:25:21,544 - root - INFO - Step 2260: lr=1.00E-05, loss= 1.0866 (max= 1.6170), tps=20608, mfu=42.94%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:25:21,544 - root - INFO - Step 2260: lr=1.00E-05, loss= 1.0866 (max= 1.6170), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:25:21,544 - root - INFO - Step 2260: lr=1.00E-05, loss= 1.0866 (max= 1.6170), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:25:21,544 - root - INFO - Step 2260: lr=1.00E-05, loss= 1.0866 (max= 1.6170), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:25:21,544 - root - INFO - Step 2260: lr=1.00E-05, loss= 1.0866 (max= 1.6170), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:25:21,544 - root - INFO - Step 2260: lr=1.00E-05, loss= 1.0866 (max= 1.6170), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:25:21,544 - root - INFO - Step 2260: lr=1.00E-05, loss= 1.0866 (max= 1.6170), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:25:53,453 - root - INFO - Step 2270: lr=1.00E-05, loss= 1.0675 (max= 1.6187), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:25:53,453 - root - INFO - Step 2270: lr=1.00E-05, loss= 1.0675 (max= 1.6187), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:25:53,453 - root - INFO - Step 2270: lr=1.00E-05, loss= 1.0675 (max= 1.6187), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:25:53,453 - root - INFO - Step 2270: lr=1.00E-05, loss= 1.0675 (max= 1.6187), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:25:53,453 - root - INFO - Step 2270: lr=1.00E-05, loss= 1.0675 (max= 
1.6187), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:25:53,453 - root - INFO - Step 2270: lr=1.00E-05, loss= 1.0675 (max= 1.6187), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:25:53,453 - root - INFO - Step 2270: lr=1.00E-05, loss= 1.0675 (max= 1.6187), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:25:53,453 - root - INFO - Step 2270: lr=1.00E-05, loss= 1.0675 (max= 1.6187), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:26:25,328 - root - INFO - Step 2280: lr=1.00E-05, loss= 1.1079 (max= 1.5118), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:26:25,328 - root - INFO - Step 2280: lr=1.00E-05, loss= 1.1079 (max= 1.5118), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:26:25,328 - root - INFO - Step 2280: lr=1.00E-05, loss= 1.1079 (max= 1.5118), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:26:25,328 - root - INFO - Step 2280: lr=1.00E-05, loss= 1.1079 (max= 1.5118), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:26:25,328 - root - INFO - Step 2280: lr=1.00E-05, loss= 1.1079 (max= 1.5118), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:26:25,328 - root - INFO - Step 2280: lr=1.00E-05, loss= 1.1079 (max= 1.5118), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:26:25,328 - root - INFO - Step 2280: lr=1.00E-05, loss= 1.1079 (max= 1.5118), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:26:25,328 - root - INFO - Step 
2280: lr=1.00E-05, loss= 1.1079 (max= 1.5118), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:26:57,248 - root - INFO - Step 2290: lr=1.00E-05, loss= 1.0892 (max= 1.4870), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:26:57,248 - root - INFO - Step 2290: lr=1.00E-05, loss= 1.0892 (max= 1.4870), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:26:57,248 - root - INFO - Step 2290: lr=1.00E-05, loss= 1.0892 (max= 1.4870), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:26:57,248 - root - INFO - Step 2290: lr=1.00E-05, loss= 1.0892 (max= 1.4870), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:26:57,248 - root - INFO - Step 2290: lr=1.00E-05, loss= 1.0892 (max= 1.4870), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:26:57,248 - root - INFO - Step 2290: lr=1.00E-05, loss= 1.0892 (max= 1.4870), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:26:57,248 - root - INFO - Step 2290: lr=1.00E-05, loss= 1.0892 (max= 1.4870), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:26:57,248 - root - INFO - Step 2290: lr=1.00E-05, loss= 1.0892 (max= 1.4870), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:27:21,884 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:1809275 +2025-10-25 12:27:29,132 - root - INFO - Step 2300: lr=1.00E-05, loss= 1.0878 (max= 1.7563), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 12:27:29,132 - root - INFO - Step 2300: lr=1.00E-05, loss= 1.0878 (max= 1.7563), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:27:29,132 - root - INFO - Step 2300: lr=1.00E-05, loss= 1.0878 (max= 1.7563), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:27:29,132 - root - INFO - Step 2300: lr=1.00E-05, loss= 1.0878 (max= 1.7563), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:27:29,132 - root - INFO - Step 2300: lr=1.00E-05, loss= 1.0878 (max= 1.7563), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:27:29,132 - root - INFO - Step 2300: lr=1.00E-05, loss= 1.0878 (max= 1.7563), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:27:29,132 - root - INFO - Step 2300: lr=1.00E-05, loss= 1.0878 (max= 1.7563), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:27:29,132 - root - INFO - Step 2300: lr=1.00E-05, loss= 1.0878 (max= 1.7563), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:28:01,011 - root - INFO - Step 2310: lr=1.00E-05, loss= 1.0925 (max= 1.5556), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:28:01,011 - root - INFO - Step 2310: lr=1.00E-05, loss= 1.0925 (max= 1.5556), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:28:01,011 - root - INFO - Step 2310: lr=1.00E-05, loss= 1.0925 (max= 1.5556), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:28:01,011 - root - INFO - Step 2310: lr=1.00E-05, loss= 1.0925 (max= 1.5556), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:28:01,011 - root - INFO - Step 2310: lr=1.00E-05, loss= 1.0925 (max= 1.5556), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:28:01,011 - root - INFO - Step 2310: lr=1.00E-05, loss= 1.0925 (max= 1.5556), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:28:01,011 - root - INFO - Step 2310: lr=1.00E-05, loss= 1.0925 (max= 1.5556), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:28:01,011 - root - INFO - Step 2310: lr=1.00E-05, loss= 1.0925 (max= 1.5556), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:28:32,921 - root - INFO - Step 2320: lr=1.00E-05, loss= 1.1038 (max= 1.4919), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:28:32,921 - root - INFO - Step 2320: lr=1.00E-05, loss= 1.1038 (max= 1.4919), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:28:32,921 - root - INFO - Step 2320: lr=1.00E-05, loss= 1.1038 (max= 1.4919), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:28:32,921 - root - INFO - Step 2320: lr=1.00E-05, loss= 1.1038 (max= 1.4919), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:28:32,921 - root - INFO - Step 2320: lr=1.00E-05, loss= 1.1038 (max= 1.4919), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:28:32,921 - root - INFO - Step 2320: lr=1.00E-05, loss= 1.1038 (max= 1.4919), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:28:32,921 - root - INFO - Step 2320: lr=1.00E-05, loss= 1.1038 (max= 1.4919), tps=20540, 
mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:28:32,921 - root - INFO - Step 2320: lr=1.00E-05, loss= 1.1038 (max= 1.4919), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:29:04,827 - root - INFO - Step 2330: lr=1.00E-05, loss= 1.1133 (max= 1.6231), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:29:04,827 - root - INFO - Step 2330: lr=1.00E-05, loss= 1.1133 (max= 1.6231), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:29:04,827 - root - INFO - Step 2330: lr=1.00E-05, loss= 1.1133 (max= 1.6231), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:29:04,827 - root - INFO - Step 2330: lr=1.00E-05, loss= 1.1133 (max= 1.6231), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:29:04,827 - root - INFO - Step 2330: lr=1.00E-05, loss= 1.1133 (max= 1.6231), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:29:04,827 - root - INFO - Step 2330: lr=1.00E-05, loss= 1.1133 (max= 1.6231), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:29:04,827 - root - INFO - Step 2330: lr=1.00E-05, loss= 1.1133 (max= 1.6231), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:29:04,827 - root - INFO - Step 2330: lr=1.00E-05, loss= 1.1133 (max= 1.6231), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:29:36,600 - root - INFO - Step 2340: lr=1.00E-05, loss= 1.1092 (max= 1.5418), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:29:36,600 - root - INFO - Step 2340: lr=1.00E-05, 
loss= 1.1092 (max= 1.5418), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:29:36,600 - root - INFO - Step 2340: lr=1.00E-05, loss= 1.1092 (max= 1.5418), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:29:36,600 - root - INFO - Step 2340: lr=1.00E-05, loss= 1.1092 (max= 1.5418), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:29:36,600 - root - INFO - Step 2340: lr=1.00E-05, loss= 1.1092 (max= 1.5418), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:29:36,600 - root - INFO - Step 2340: lr=1.00E-05, loss= 1.1092 (max= 1.5418), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:29:36,600 - root - INFO - Step 2340: lr=1.00E-05, loss= 1.1092 (max= 1.5418), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:29:36,600 - root - INFO - Step 2340: lr=1.00E-05, loss= 1.1092 (max= 1.5418), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:30:08,361 - root - INFO - Step 2350: lr=1.00E-05, loss= 1.1144 (max= 1.6269), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:30:08,361 - root - INFO - Step 2350: lr=1.00E-05, loss= 1.1144 (max= 1.6269), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:30:08,361 - root - INFO - Step 2350: lr=1.00E-05, loss= 1.1144 (max= 1.6269), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:30:08,361 - root - INFO - Step 2350: lr=1.00E-05, loss= 1.1144 (max= 1.6269), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:30:08,361 - 
root - INFO - Step 2350: lr=1.00E-05, loss= 1.1144 (max= 1.6269), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:30:08,361 - root - INFO - Step 2350: lr=1.00E-05, loss= 1.1144 (max= 1.6269), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:30:08,361 - root - INFO - Step 2350: lr=1.00E-05, loss= 1.1144 (max= 1.6269), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:30:08,361 - root - INFO - Step 2350: lr=1.00E-05, loss= 1.1144 (max= 1.6269), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:30:40,246 - root - INFO - Step 2360: lr=1.00E-05, loss= 1.1184 (max= 1.6319), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:30:40,246 - root - INFO - Step 2360: lr=1.00E-05, loss= 1.1184 (max= 1.6319), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:30:40,246 - root - INFO - Step 2360: lr=1.00E-05, loss= 1.1184 (max= 1.6319), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:30:40,246 - root - INFO - Step 2360: lr=1.00E-05, loss= 1.1184 (max= 1.6319), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:30:40,246 - root - INFO - Step 2360: lr=1.00E-05, loss= 1.1184 (max= 1.6319), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:30:40,246 - root - INFO - Step 2360: lr=1.00E-05, loss= 1.1184 (max= 1.6319), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:30:40,246 - root - INFO - Step 2360: lr=1.00E-05, loss= 1.1184 (max= 1.6319), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 12:30:40,246 - root - INFO - Step 2360: lr=1.00E-05, loss= 1.1184 (max= 1.6319), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:31:12,057 - root - INFO - Step 2370: lr=1.00E-05, loss= 1.0995 (max= 1.5927), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:31:12,057 - root - INFO - Step 2370: lr=1.00E-05, loss= 1.0995 (max= 1.5927), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:31:12,057 - root - INFO - Step 2370: lr=1.00E-05, loss= 1.0995 (max= 1.5927), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:31:12,057 - root - INFO - Step 2370: lr=1.00E-05, loss= 1.0995 (max= 1.5927), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:31:12,057 - root - INFO - Step 2370: lr=1.00E-05, loss= 1.0995 (max= 1.5927), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:31:12,057 - root - INFO - Step 2370: lr=1.00E-05, loss= 1.0995 (max= 1.5927), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:31:12,057 - root - INFO - Step 2370: lr=1.00E-05, loss= 1.0995 (max= 1.5927), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:31:12,057 - root - INFO - Step 2370: lr=1.00E-05, loss= 1.0995 (max= 1.5927), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:31:43,921 - root - INFO - Step 2380: lr=1.00E-05, loss= 1.1248 (max= 1.6205), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:31:43,921 - root - INFO - Step 2380: lr=1.00E-05, loss= 1.1248 (max= 1.6205), tps=20570, mfu=42.86%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:31:43,921 - root - INFO - Step 2380: lr=1.00E-05, loss= 1.1248 (max= 1.6205), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:31:43,921 - root - INFO - Step 2380: lr=1.00E-05, loss= 1.1248 (max= 1.6205), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:31:43,921 - root - INFO - Step 2380: lr=1.00E-05, loss= 1.1248 (max= 1.6205), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:31:43,921 - root - INFO - Step 2380: lr=1.00E-05, loss= 1.1248 (max= 1.6205), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:31:43,921 - root - INFO - Step 2380: lr=1.00E-05, loss= 1.1248 (max= 1.6205), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:31:43,921 - root - INFO - Step 2380: lr=1.00E-05, loss= 1.1248 (max= 1.6205), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:32:15,789 - root - INFO - Step 2390: lr=1.00E-05, loss= 1.1044 (max= 1.6033), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:32:15,789 - root - INFO - Step 2390: lr=1.00E-05, loss= 1.1044 (max= 1.6033), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:32:15,789 - root - INFO - Step 2390: lr=1.00E-05, loss= 1.1044 (max= 1.6033), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:32:15,789 - root - INFO - Step 2390: lr=1.00E-05, loss= 1.1044 (max= 1.6033), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:32:15,789 - root - INFO - Step 2390: lr=1.00E-05, loss= 1.1044 (max= 
1.6033), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:32:15,789 - root - INFO - Step 2390: lr=1.00E-05, loss= 1.1044 (max= 1.6033), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:32:15,789 - root - INFO - Step 2390: lr=1.00E-05, loss= 1.1044 (max= 1.6033), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:32:15,789 - root - INFO - Step 2390: lr=1.00E-05, loss= 1.1044 (max= 1.6033), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:32:47,584 - root - INFO - Step 2400: lr=1.00E-05, loss= 1.1089 (max= 1.7929), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:32:47,584 - root - INFO - Step 2400: lr=1.00E-05, loss= 1.1089 (max= 1.7929), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:32:47,584 - root - INFO - Step 2400: lr=1.00E-05, loss= 1.1089 (max= 1.7929), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:32:47,584 - root - INFO - Step 2400: lr=1.00E-05, loss= 1.1089 (max= 1.7929), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:32:47,584 - root - INFO - Step 2400: lr=1.00E-05, loss= 1.1089 (max= 1.7929), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:32:47,584 - root - INFO - Step 2400: lr=1.00E-05, loss= 1.1089 (max= 1.7929), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:32:47,584 - root - INFO - Step 2400: lr=1.00E-05, loss= 1.1089 (max= 1.7929), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:32:47,584 - root - INFO - Step 
2400: lr=1.00E-05, loss= 1.1089 (max= 1.7929), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:33:19,448 - root - INFO - Step 2410: lr=1.00E-05, loss= 1.0950 (max= 1.6448), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:33:19,448 - root - INFO - Step 2410: lr=1.00E-05, loss= 1.0950 (max= 1.6448), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:33:19,448 - root - INFO - Step 2410: lr=1.00E-05, loss= 1.0950 (max= 1.6448), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:33:19,448 - root - INFO - Step 2410: lr=1.00E-05, loss= 1.0950 (max= 1.6448), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:33:19,448 - root - INFO - Step 2410: lr=1.00E-05, loss= 1.0950 (max= 1.6448), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:33:19,448 - root - INFO - Step 2410: lr=1.00E-05, loss= 1.0950 (max= 1.6448), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:33:19,448 - root - INFO - Step 2410: lr=1.00E-05, loss= 1.0950 (max= 1.6448), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:33:19,448 - root - INFO - Step 2410: lr=1.00E-05, loss= 1.0950 (max= 1.6448), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:33:51,257 - root - INFO - Step 2420: lr=1.00E-05, loss= 1.1303 (max= 1.5139), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:33:51,257 - root - INFO - Step 2420: lr=1.00E-05, loss= 1.1303 (max= 1.5139), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 12:33:51,257 - root - INFO - Step 2420: lr=1.00E-05, loss= 1.1303 (max= 1.5139), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:33:51,257 - root - INFO - Step 2420: lr=1.00E-05, loss= 1.1303 (max= 1.5139), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:33:51,257 - root - INFO - Step 2420: lr=1.00E-05, loss= 1.1303 (max= 1.5139), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:33:51,257 - root - INFO - Step 2420: lr=1.00E-05, loss= 1.1303 (max= 1.5139), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:33:51,257 - root - INFO - Step 2420: lr=1.00E-05, loss= 1.1303 (max= 1.5139), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:33:51,257 - root - INFO - Step 2420: lr=1.00E-05, loss= 1.1303 (max= 1.5139), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:34:23,125 - root - INFO - Step 2430: lr=1.00E-05, loss= 1.1088 (max= 1.6990), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:34:23,125 - root - INFO - Step 2430: lr=1.00E-05, loss= 1.1088 (max= 1.6990), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:34:23,125 - root - INFO - Step 2430: lr=1.00E-05, loss= 1.1088 (max= 1.6990), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:34:23,125 - root - INFO - Step 2430: lr=1.00E-05, loss= 1.1088 (max= 1.6990), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:34:23,125 - root - INFO - Step 2430: lr=1.00E-05, loss= 1.1088 (max= 1.6990), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:34:23,125 - root - INFO - Step 2430: lr=1.00E-05, loss= 1.1088 (max= 1.6990), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:34:23,125 - root - INFO - Step 2430: lr=1.00E-05, loss= 1.1088 (max= 1.6990), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:34:23,125 - root - INFO - Step 2430: lr=1.00E-05, loss= 1.1088 (max= 1.6990), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:34:55,006 - root - INFO - Step 2440: lr=1.00E-05, loss= 1.1035 (max= 1.4940), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:34:55,006 - root - INFO - Step 2440: lr=1.00E-05, loss= 1.1035 (max= 1.4940), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:34:55,006 - root - INFO - Step 2440: lr=1.00E-05, loss= 1.1035 (max= 1.4940), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:34:55,006 - root - INFO - Step 2440: lr=1.00E-05, loss= 1.1035 (max= 1.4940), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:34:55,006 - root - INFO - Step 2440: lr=1.00E-05, loss= 1.1035 (max= 1.4940), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:34:55,006 - root - INFO - Step 2440: lr=1.00E-05, loss= 1.1035 (max= 1.4940), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:34:55,006 - root - INFO - Step 2440: lr=1.00E-05, loss= 1.1035 (max= 1.4940), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:34:55,007 - root - INFO - Step 2440: lr=1.00E-05, loss= 1.1035 (max= 1.4940), tps=20558, 
mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:35:26,840 - root - INFO - Step 2450: lr=1.00E-05, loss= 1.0843 (max= 1.4639), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:35:26,840 - root - INFO - Step 2450: lr=1.00E-05, loss= 1.0843 (max= 1.4639), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:35:26,840 - root - INFO - Step 2450: lr=1.00E-05, loss= 1.0843 (max= 1.4639), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:35:26,841 - root - INFO - Step 2450: lr=1.00E-05, loss= 1.0843 (max= 1.4639), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:35:26,841 - root - INFO - Step 2450: lr=1.00E-05, loss= 1.0843 (max= 1.4639), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:35:26,841 - root - INFO - Step 2450: lr=1.00E-05, loss= 1.0843 (max= 1.4639), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:35:26,841 - root - INFO - Step 2450: lr=1.00E-05, loss= 1.0843 (max= 1.4639), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:35:26,841 - root - INFO - Step 2450: lr=1.00E-05, loss= 1.0843 (max= 1.4639), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:35:58,695 - root - INFO - Step 2460: lr=1.00E-05, loss= 1.1047 (max= 1.6086), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:35:58,696 - root - INFO - Step 2460: lr=1.00E-05, loss= 1.1047 (max= 1.6086), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:35:58,696 - root - INFO - Step 2460: lr=1.00E-05, 
loss= 1.1047 (max= 1.6086), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:35:58,696 - root - INFO - Step 2460: lr=1.00E-05, loss= 1.1047 (max= 1.6086), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:35:58,696 - root - INFO - Step 2460: lr=1.00E-05, loss= 1.1047 (max= 1.6086), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:35:58,696 - root - INFO - Step 2460: lr=1.00E-05, loss= 1.1047 (max= 1.6086), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:35:58,696 - root - INFO - Step 2460: lr=1.00E-05, loss= 1.1047 (max= 1.6086), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:35:58,696 - root - INFO - Step 2460: lr=1.00E-05, loss= 1.1047 (max= 1.6086), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:36:30,581 - root - INFO - Step 2470: lr=1.00E-05, loss= 1.1011 (max= 1.4956), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:36:30,581 - root - INFO - Step 2470: lr=1.00E-05, loss= 1.1011 (max= 1.4956), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:36:30,582 - root - INFO - Step 2470: lr=1.00E-05, loss= 1.1011 (max= 1.4956), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:36:30,582 - root - INFO - Step 2470: lr=1.00E-05, loss= 1.1011 (max= 1.4956), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:36:30,582 - root - INFO - Step 2470: lr=1.00E-05, loss= 1.1011 (max= 1.4956), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:36:30,582 - 
root - INFO - Step 2470: lr=1.00E-05, loss= 1.1011 (max= 1.4956), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:36:30,582 - root - INFO - Step 2470: lr=1.00E-05, loss= 1.1011 (max= 1.4956), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:36:30,582 - root - INFO - Step 2470: lr=1.00E-05, loss= 1.1011 (max= 1.4956), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:37:02,524 - root - INFO - Step 2480: lr=1.00E-05, loss= 1.0918 (max= 1.4828), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:37:02,524 - root - INFO - Step 2480: lr=1.00E-05, loss= 1.0918 (max= 1.4828), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:37:02,524 - root - INFO - Step 2480: lr=1.00E-05, loss= 1.0918 (max= 1.4828), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:37:02,524 - root - INFO - Step 2480: lr=1.00E-05, loss= 1.0918 (max= 1.4828), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:37:02,524 - root - INFO - Step 2480: lr=1.00E-05, loss= 1.0918 (max= 1.4828), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:37:02,524 - root - INFO - Step 2480: lr=1.00E-05, loss= 1.0918 (max= 1.4828), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:37:02,524 - root - INFO - Step 2480: lr=1.00E-05, loss= 1.0918 (max= 1.4828), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:37:02,525 - root - INFO - Step 2480: lr=1.00E-05, loss= 1.0918 (max= 1.4828), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 12:37:34,389 - root - INFO - Step 2490: lr=1.00E-05, loss= 1.1068 (max= 1.4798), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:37:34,389 - root - INFO - Step 2490: lr=1.00E-05, loss= 1.1068 (max= 1.4798), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:37:34,389 - root - INFO - Step 2490: lr=1.00E-05, loss= 1.1068 (max= 1.4798), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:37:34,389 - root - INFO - Step 2490: lr=1.00E-05, loss= 1.1068 (max= 1.4798), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:37:34,389 - root - INFO - Step 2490: lr=1.00E-05, loss= 1.1068 (max= 1.4798), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:37:34,389 - root - INFO - Step 2490: lr=1.00E-05, loss= 1.1068 (max= 1.4798), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:37:34,389 - root - INFO - Step 2490: lr=1.00E-05, loss= 1.1068 (max= 1.4798), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:37:34,389 - root - INFO - Step 2490: lr=1.00E-05, loss= 1.1068 (max= 1.4798), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:38:06,265 - root - INFO - Step 2500: lr=1.00E-05, loss= 1.1166 (max= 1.5456), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:38:06,265 - root - INFO - Step 2500: lr=1.00E-05, loss= 1.1166 (max= 1.5456), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:38:06,265 - root - INFO - Step 2500: lr=1.00E-05, loss= 1.1166 (max= 1.5456), tps=20562, mfu=42.84%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:38:06,265 - root - INFO - Step 2500: lr=1.00E-05, loss= 1.1166 (max= 1.5456), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:38:06,265 - root - INFO - Step 2500: lr=1.00E-05, loss= 1.1166 (max= 1.5456), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:38:06,265 - root - INFO - Step 2500: lr=1.00E-05, loss= 1.1166 (max= 1.5456), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:38:06,265 - root - INFO - Step 2500: lr=1.00E-05, loss= 1.1166 (max= 1.5456), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:38:06,265 - root - INFO - Step 2500: lr=1.00E-05, loss= 1.1166 (max= 1.5456), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:38:38,066 - root - INFO - Step 2510: lr=1.00E-05, loss= 1.1046 (max= 1.5470), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:38:38,066 - root - INFO - Step 2510: lr=1.00E-05, loss= 1.1046 (max= 1.5470), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:38:38,066 - root - INFO - Step 2510: lr=1.00E-05, loss= 1.1046 (max= 1.5470), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:38:38,066 - root - INFO - Step 2510: lr=1.00E-05, loss= 1.1046 (max= 1.5470), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:38:38,066 - root - INFO - Step 2510: lr=1.00E-05, loss= 1.1046 (max= 1.5470), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:38:38,066 - root - INFO - Step 2510: lr=1.00E-05, loss= 1.1046 (max= 
1.5470), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:38:38,066 - root - INFO - Step 2510: lr=1.00E-05, loss= 1.1046 (max= 1.5470), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:38:38,066 - root - INFO - Step 2510: lr=1.00E-05, loss= 1.1046 (max= 1.5470), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:39:09,884 - root - INFO - Step 2520: lr=1.00E-05, loss= 1.1092 (max= 1.5097), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:39:09,884 - root - INFO - Step 2520: lr=1.00E-05, loss= 1.1092 (max= 1.5097), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:39:09,884 - root - INFO - Step 2520: lr=1.00E-05, loss= 1.1092 (max= 1.5097), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:39:09,884 - root - INFO - Step 2520: lr=1.00E-05, loss= 1.1092 (max= 1.5097), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:39:09,884 - root - INFO - Step 2520: lr=1.00E-05, loss= 1.1092 (max= 1.5097), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:39:09,885 - root - INFO - Step 2520: lr=1.00E-05, loss= 1.1092 (max= 1.5097), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:39:09,885 - root - INFO - Step 2520: lr=1.00E-05, loss= 1.1092 (max= 1.5097), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:39:09,885 - root - INFO - Step 2520: lr=1.00E-05, loss= 1.1092 (max= 1.5097), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:39:41,760 - root - INFO - Step 
2530: lr=1.00E-05, loss= 1.1224 (max= 1.5546), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:39:41,760 - root - INFO - Step 2530: lr=1.00E-05, loss= 1.1224 (max= 1.5546), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:39:41,760 - root - INFO - Step 2530: lr=1.00E-05, loss= 1.1224 (max= 1.5546), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:39:41,760 - root - INFO - Step 2530: lr=1.00E-05, loss= 1.1224 (max= 1.5546), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:39:41,760 - root - INFO - Step 2530: lr=1.00E-05, loss= 1.1224 (max= 1.5546), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:39:41,760 - root - INFO - Step 2530: lr=1.00E-05, loss= 1.1224 (max= 1.5546), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:39:41,760 - root - INFO - Step 2530: lr=1.00E-05, loss= 1.1224 (max= 1.5546), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:39:41,760 - root - INFO - Step 2530: lr=1.00E-05, loss= 1.1224 (max= 1.5546), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:40:13,600 - root - INFO - Step 2540: lr=1.00E-05, loss= 1.1081 (max= 1.7631), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:40:13,600 - root - INFO - Step 2540: lr=1.00E-05, loss= 1.1081 (max= 1.7631), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:40:13,600 - root - INFO - Step 2540: lr=1.00E-05, loss= 1.1081 (max= 1.7631), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 12:40:13,600 - root - INFO - Step 2540: lr=1.00E-05, loss= 1.1081 (max= 1.7631), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:40:13,600 - root - INFO - Step 2540: lr=1.00E-05, loss= 1.1081 (max= 1.7631), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:40:13,600 - root - INFO - Step 2540: lr=1.00E-05, loss= 1.1081 (max= 1.7631), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:40:13,601 - root - INFO - Step 2540: lr=1.00E-05, loss= 1.1081 (max= 1.7631), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:40:13,601 - root - INFO - Step 2540: lr=1.00E-05, loss= 1.1081 (max= 1.7631), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:40:45,464 - root - INFO - Step 2550: lr=1.00E-05, loss= 1.1048 (max= 1.7563), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:40:45,464 - root - INFO - Step 2550: lr=1.00E-05, loss= 1.1048 (max= 1.7563), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:40:45,464 - root - INFO - Step 2550: lr=1.00E-05, loss= 1.1048 (max= 1.7563), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:40:45,464 - root - INFO - Step 2550: lr=1.00E-05, loss= 1.1048 (max= 1.7563), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:40:45,464 - root - INFO - Step 2550: lr=1.00E-05, loss= 1.1048 (max= 1.7563), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:40:45,464 - root - INFO - Step 2550: lr=1.00E-05, loss= 1.1048 (max= 1.7563), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:40:45,464 - root - INFO - Step 2550: lr=1.00E-05, loss= 1.1048 (max= 1.7563), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:40:45,465 - root - INFO - Step 2550: lr=1.00E-05, loss= 1.1048 (max= 1.7563), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:41:17,319 - root - INFO - Step 2560: lr=1.00E-05, loss= 1.1241 (max= 1.6069), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:41:17,319 - root - INFO - Step 2560: lr=1.00E-05, loss= 1.1241 (max= 1.6069), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:41:17,319 - root - INFO - Step 2560: lr=1.00E-05, loss= 1.1241 (max= 1.6069), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:41:17,319 - root - INFO - Step 2560: lr=1.00E-05, loss= 1.1241 (max= 1.6069), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:41:17,319 - root - INFO - Step 2560: lr=1.00E-05, loss= 1.1241 (max= 1.6069), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:41:17,319 - root - INFO - Step 2560: lr=1.00E-05, loss= 1.1241 (max= 1.6069), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:41:17,319 - root - INFO - Step 2560: lr=1.00E-05, loss= 1.1241 (max= 1.6069), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:41:17,320 - root - INFO - Step 2560: lr=1.00E-05, loss= 1.1241 (max= 1.6069), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:41:49,280 - root - INFO - Step 2570: lr=1.00E-05, loss= 1.1083 (max= 1.5831), tps=20507, 
mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:41:49,280 - root - INFO - Step 2570: lr=1.00E-05, loss= 1.1083 (max= 1.5831), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:41:49,280 - root - INFO - Step 2570: lr=1.00E-05, loss= 1.1083 (max= 1.5831), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:41:49,280 - root - INFO - Step 2570: lr=1.00E-05, loss= 1.1083 (max= 1.5831), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:41:49,280 - root - INFO - Step 2570: lr=1.00E-05, loss= 1.1083 (max= 1.5831), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:41:49,280 - root - INFO - Step 2570: lr=1.00E-05, loss= 1.1083 (max= 1.5831), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:41:49,280 - root - INFO - Step 2570: lr=1.00E-05, loss= 1.1083 (max= 1.5831), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:41:49,281 - root - INFO - Step 2570: lr=1.00E-05, loss= 1.1083 (max= 1.5831), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:42:21,084 - root - INFO - Step 2580: lr=1.00E-05, loss= 1.1178 (max= 1.7840), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:42:21,084 - root - INFO - Step 2580: lr=1.00E-05, loss= 1.1178 (max= 1.7840), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:42:21,084 - root - INFO - Step 2580: lr=1.00E-05, loss= 1.1178 (max= 1.7840), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:42:21,084 - root - INFO - Step 2580: lr=1.00E-05, 
loss= 1.1178 (max= 1.7840), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:42:21,084 - root - INFO - Step 2580: lr=1.00E-05, loss= 1.1178 (max= 1.7840), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:42:21,084 - root - INFO - Step 2580: lr=1.00E-05, loss= 1.1178 (max= 1.7840), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:42:21,084 - root - INFO - Step 2580: lr=1.00E-05, loss= 1.1178 (max= 1.7840), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:42:21,084 - root - INFO - Step 2580: lr=1.00E-05, loss= 1.1178 (max= 1.7840), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:42:52,951 - root - INFO - Step 2590: lr=1.00E-05, loss= 1.0935 (max= 1.6181), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:42:52,951 - root - INFO - Step 2590: lr=1.00E-05, loss= 1.0935 (max= 1.6181), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:42:52,951 - root - INFO - Step 2590: lr=1.00E-05, loss= 1.0935 (max= 1.6181), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:42:52,951 - root - INFO - Step 2590: lr=1.00E-05, loss= 1.0935 (max= 1.6181), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:42:52,951 - root - INFO - Step 2590: lr=1.00E-05, loss= 1.0935 (max= 1.6181), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:42:52,951 - root - INFO - Step 2590: lr=1.00E-05, loss= 1.0935 (max= 1.6181), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:42:52,951 - 
root - INFO - Step 2590: lr=1.00E-05, loss= 1.0935 (max= 1.6181), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:42:52,951 - root - INFO - Step 2590: lr=1.00E-05, loss= 1.0935 (max= 1.6181), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:43:24,915 - root - INFO - Step 2600: lr=1.00E-05, loss= 1.1183 (max= 1.5288), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:43:24,915 - root - INFO - Step 2600: lr=1.00E-05, loss= 1.1183 (max= 1.5288), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:43:24,915 - root - INFO - Step 2600: lr=1.00E-05, loss= 1.1183 (max= 1.5288), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:43:24,915 - root - INFO - Step 2600: lr=1.00E-05, loss= 1.1183 (max= 1.5288), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:43:24,915 - root - INFO - Step 2600: lr=1.00E-05, loss= 1.1183 (max= 1.5288), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:43:24,915 - root - INFO - Step 2600: lr=1.00E-05, loss= 1.1183 (max= 1.5288), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:43:24,915 - root - INFO - Step 2600: lr=1.00E-05, loss= 1.1183 (max= 1.5288), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:43:24,915 - root - INFO - Step 2600: lr=1.00E-05, loss= 1.1183 (max= 1.5288), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:43:56,759 - root - INFO - Step 2610: lr=1.00E-05, loss= 1.1108 (max= 1.5320), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 12:43:56,760 - root - INFO - Step 2610: lr=1.00E-05, loss= 1.1108 (max= 1.5320), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:43:56,760 - root - INFO - Step 2610: lr=1.00E-05, loss= 1.1108 (max= 1.5320), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:43:56,760 - root - INFO - Step 2610: lr=1.00E-05, loss= 1.1108 (max= 1.5320), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:43:56,760 - root - INFO - Step 2610: lr=1.00E-05, loss= 1.1108 (max= 1.5320), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:43:56,760 - root - INFO - Step 2610: lr=1.00E-05, loss= 1.1108 (max= 1.5320), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:43:56,760 - root - INFO - Step 2610: lr=1.00E-05, loss= 1.1108 (max= 1.5320), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:43:56,760 - root - INFO - Step 2610: lr=1.00E-05, loss= 1.1108 (max= 1.5320), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:44:28,630 - root - INFO - Step 2620: lr=1.00E-05, loss= 1.0907 (max= 1.5128), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:44:28,630 - root - INFO - Step 2620: lr=1.00E-05, loss= 1.0907 (max= 1.5128), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:44:28,630 - root - INFO - Step 2620: lr=1.00E-05, loss= 1.0907 (max= 1.5128), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:44:28,630 - root - INFO - Step 2620: lr=1.00E-05, loss= 1.0907 (max= 1.5128), tps=20566, mfu=42.85%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:44:28,630 - root - INFO - Step 2620: lr=1.00E-05, loss= 1.0907 (max= 1.5128), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:44:28,630 - root - INFO - Step 2620: lr=1.00E-05, loss= 1.0907 (max= 1.5128), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:44:28,630 - root - INFO - Step 2620: lr=1.00E-05, loss= 1.0907 (max= 1.5128), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:44:28,630 - root - INFO - Step 2620: lr=1.00E-05, loss= 1.0907 (max= 1.5128), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:45:00,416 - root - INFO - Step 2630: lr=1.00E-05, loss= 1.0942 (max= 1.5901), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:45:00,416 - root - INFO - Step 2630: lr=1.00E-05, loss= 1.0942 (max= 1.5901), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:45:00,416 - root - INFO - Step 2630: lr=1.00E-05, loss= 1.0942 (max= 1.5901), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:45:00,416 - root - INFO - Step 2630: lr=1.00E-05, loss= 1.0942 (max= 1.5901), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:45:00,416 - root - INFO - Step 2630: lr=1.00E-05, loss= 1.0942 (max= 1.5901), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:45:00,416 - root - INFO - Step 2630: lr=1.00E-05, loss= 1.0942 (max= 1.5901), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:45:00,416 - root - INFO - Step 2630: lr=1.00E-05, loss= 1.0942 (max= 
1.5901), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:45:00,416 - root - INFO - Step 2630: lr=1.00E-05, loss= 1.0942 (max= 1.5901), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:45:32,269 - root - INFO - Step 2640: lr=1.00E-05, loss= 1.1047 (max= 1.5746), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:45:32,269 - root - INFO - Step 2640: lr=1.00E-05, loss= 1.1047 (max= 1.5746), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:45:32,269 - root - INFO - Step 2640: lr=1.00E-05, loss= 1.1047 (max= 1.5746), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:45:32,269 - root - INFO - Step 2640: lr=1.00E-05, loss= 1.1047 (max= 1.5746), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:45:32,270 - root - INFO - Step 2640: lr=1.00E-05, loss= 1.1047 (max= 1.5746), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:45:32,270 - root - INFO - Step 2640: lr=1.00E-05, loss= 1.1047 (max= 1.5746), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:45:32,270 - root - INFO - Step 2640: lr=1.00E-05, loss= 1.1047 (max= 1.5746), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:45:32,270 - root - INFO - Step 2640: lr=1.00E-05, loss= 1.1047 (max= 1.5746), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:46:04,090 - root - INFO - Step 2650: lr=1.00E-05, loss= 1.1149 (max= 1.5003), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:46:04,091 - root - INFO - Step 
2650: lr=1.00E-05, loss= 1.1149 (max= 1.5003), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:46:04,091 - root - INFO - Step 2650: lr=1.00E-05, loss= 1.1149 (max= 1.5003), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:46:04,091 - root - INFO - Step 2650: lr=1.00E-05, loss= 1.1149 (max= 1.5003), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:46:04,091 - root - INFO - Step 2650: lr=1.00E-05, loss= 1.1149 (max= 1.5003), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:46:04,091 - root - INFO - Step 2650: lr=1.00E-05, loss= 1.1149 (max= 1.5003), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:46:04,091 - root - INFO - Step 2650: lr=1.00E-05, loss= 1.1149 (max= 1.5003), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:46:04,091 - root - INFO - Step 2650: lr=1.00E-05, loss= 1.1149 (max= 1.5003), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:46:36,014 - root - INFO - Step 2660: lr=1.00E-05, loss= 1.1005 (max= 1.4691), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:46:36,014 - root - INFO - Step 2660: lr=1.00E-05, loss= 1.1005 (max= 1.4691), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:46:36,014 - root - INFO - Step 2660: lr=1.00E-05, loss= 1.1005 (max= 1.4691), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:46:36,014 - root - INFO - Step 2660: lr=1.00E-05, loss= 1.1005 (max= 1.4691), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 12:46:36,014 - root - INFO - Step 2660: lr=1.00E-05, loss= 1.1005 (max= 1.4691), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:46:36,014 - root - INFO - Step 2660: lr=1.00E-05, loss= 1.1005 (max= 1.4691), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:46:36,014 - root - INFO - Step 2660: lr=1.00E-05, loss= 1.1005 (max= 1.4691), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:46:36,014 - root - INFO - Step 2660: lr=1.00E-05, loss= 1.1005 (max= 1.4691), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:47:07,908 - root - INFO - Step 2670: lr=1.00E-05, loss= 1.0815 (max= 1.5748), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:47:07,908 - root - INFO - Step 2670: lr=1.00E-05, loss= 1.0815 (max= 1.5748), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:47:07,908 - root - INFO - Step 2670: lr=1.00E-05, loss= 1.0815 (max= 1.5748), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:47:07,908 - root - INFO - Step 2670: lr=1.00E-05, loss= 1.0815 (max= 1.5748), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:47:07,908 - root - INFO - Step 2670: lr=1.00E-05, loss= 1.0815 (max= 1.5748), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:47:07,908 - root - INFO - Step 2670: lr=1.00E-05, loss= 1.0815 (max= 1.5748), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:47:07,908 - root - INFO - Step 2670: lr=1.00E-05, loss= 1.0815 (max= 1.5748), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:47:07,908 - root - INFO - Step 2670: lr=1.00E-05, loss= 1.0815 (max= 1.5748), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:47:39,768 - root - INFO - Step 2680: lr=1.00E-05, loss= 1.0867 (max= 1.4647), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:47:39,768 - root - INFO - Step 2680: lr=1.00E-05, loss= 1.0867 (max= 1.4647), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:47:39,768 - root - INFO - Step 2680: lr=1.00E-05, loss= 1.0867 (max= 1.4647), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:47:39,768 - root - INFO - Step 2680: lr=1.00E-05, loss= 1.0867 (max= 1.4647), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:47:39,769 - root - INFO - Step 2680: lr=1.00E-05, loss= 1.0867 (max= 1.4647), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:47:39,769 - root - INFO - Step 2680: lr=1.00E-05, loss= 1.0867 (max= 1.4647), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:47:39,769 - root - INFO - Step 2680: lr=1.00E-05, loss= 1.0867 (max= 1.4647), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:47:39,769 - root - INFO - Step 2680: lr=1.00E-05, loss= 1.0867 (max= 1.4647), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:48:11,672 - root - INFO - Step 2690: lr=1.00E-05, loss= 1.1097 (max= 1.5458), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:48:11,672 - root - INFO - Step 2690: lr=1.00E-05, loss= 1.1097 (max= 1.5458), tps=20544, 
mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:48:11,672 - root - INFO - Step 2690: lr=1.00E-05, loss= 1.1097 (max= 1.5458), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:48:11,672 - root - INFO - Step 2690: lr=1.00E-05, loss= 1.1097 (max= 1.5458), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:48:11,672 - root - INFO - Step 2690: lr=1.00E-05, loss= 1.1097 (max= 1.5458), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:48:11,672 - root - INFO - Step 2690: lr=1.00E-05, loss= 1.1097 (max= 1.5458), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:48:11,672 - root - INFO - Step 2690: lr=1.00E-05, loss= 1.1097 (max= 1.5458), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:48:11,672 - root - INFO - Step 2690: lr=1.00E-05, loss= 1.1097 (max= 1.5458), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:48:43,500 - root - INFO - Step 2700: lr=1.00E-05, loss= 1.1065 (max= 2.0225), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:48:43,500 - root - INFO - Step 2700: lr=1.00E-05, loss= 1.1065 (max= 2.0225), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:48:43,500 - root - INFO - Step 2700: lr=1.00E-05, loss= 1.1065 (max= 2.0225), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:48:43,500 - root - INFO - Step 2700: lr=1.00E-05, loss= 1.1065 (max= 2.0225), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:48:43,500 - root - INFO - Step 2700: lr=1.00E-05, 
loss= 1.1065 (max= 2.0225), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:48:43,500 - root - INFO - Step 2700: lr=1.00E-05, loss= 1.1065 (max= 2.0225), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:48:43,500 - root - INFO - Step 2700: lr=1.00E-05, loss= 1.1065 (max= 2.0225), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:48:43,500 - root - INFO - Step 2700: lr=1.00E-05, loss= 1.1065 (max= 2.0225), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:49:15,364 - root - INFO - Step 2710: lr=1.00E-05, loss= 1.1129 (max= 1.5072), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:49:15,364 - root - INFO - Step 2710: lr=1.00E-05, loss= 1.1129 (max= 1.5072), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:49:15,364 - root - INFO - Step 2710: lr=1.00E-05, loss= 1.1129 (max= 1.5072), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:49:15,364 - root - INFO - Step 2710: lr=1.00E-05, loss= 1.1129 (max= 1.5072), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:49:15,364 - root - INFO - Step 2710: lr=1.00E-05, loss= 1.1129 (max= 1.5072), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:49:15,364 - root - INFO - Step 2710: lr=1.00E-05, loss= 1.1129 (max= 1.5072), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:49:15,364 - root - INFO - Step 2710: lr=1.00E-05, loss= 1.1129 (max= 1.5072), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:49:15,365 - 
root - INFO - Step 2710: lr=1.00E-05, loss= 1.1129 (max= 1.5072), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:49:47,297 - root - INFO - Step 2720: lr=1.00E-05, loss= 1.1325 (max= 1.5791), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:49:47,298 - root - INFO - Step 2720: lr=1.00E-05, loss= 1.1325 (max= 1.5791), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:49:47,298 - root - INFO - Step 2720: lr=1.00E-05, loss= 1.1325 (max= 1.5791), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:49:47,298 - root - INFO - Step 2720: lr=1.00E-05, loss= 1.1325 (max= 1.5791), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:49:47,298 - root - INFO - Step 2720: lr=1.00E-05, loss= 1.1325 (max= 1.5791), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:49:47,298 - root - INFO - Step 2720: lr=1.00E-05, loss= 1.1325 (max= 1.5791), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:49:47,298 - root - INFO - Step 2720: lr=1.00E-05, loss= 1.1325 (max= 1.5791), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:49:47,298 - root - INFO - Step 2720: lr=1.00E-05, loss= 1.1325 (max= 1.5791), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:50:19,085 - root - INFO - Step 2730: lr=1.00E-05, loss= 1.1391 (max= 1.5637), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:50:19,085 - root - INFO - Step 2730: lr=1.00E-05, loss= 1.1391 (max= 1.5637), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 12:50:19,085 - root - INFO - Step 2730: lr=1.00E-05, loss= 1.1391 (max= 1.5637), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:50:19,085 - root - INFO - Step 2730: lr=1.00E-05, loss= 1.1391 (max= 1.5637), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:50:19,086 - root - INFO - Step 2730: lr=1.00E-05, loss= 1.1391 (max= 1.5637), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:50:19,086 - root - INFO - Step 2730: lr=1.00E-05, loss= 1.1391 (max= 1.5637), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:50:19,086 - root - INFO - Step 2730: lr=1.00E-05, loss= 1.1391 (max= 1.5637), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:50:19,086 - root - INFO - Step 2730: lr=1.00E-05, loss= 1.1391 (max= 1.5637), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:50:50,898 - root - INFO - Step 2740: lr=1.00E-05, loss= 1.1009 (max= 1.4806), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:50:50,898 - root - INFO - Step 2740: lr=1.00E-05, loss= 1.1009 (max= 1.4806), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:50:50,898 - root - INFO - Step 2740: lr=1.00E-05, loss= 1.1009 (max= 1.4806), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:50:50,898 - root - INFO - Step 2740: lr=1.00E-05, loss= 1.1009 (max= 1.4806), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:50:50,898 - root - INFO - Step 2740: lr=1.00E-05, loss= 1.1009 (max= 1.4806), tps=20604, mfu=42.93%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:50:50,898 - root - INFO - Step 2740: lr=1.00E-05, loss= 1.1009 (max= 1.4806), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:50:50,898 - root - INFO - Step 2740: lr=1.00E-05, loss= 1.1009 (max= 1.4806), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:50:50,898 - root - INFO - Step 2740: lr=1.00E-05, loss= 1.1009 (max= 1.4806), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:51:22,767 - root - INFO - Step 2750: lr=1.00E-05, loss= 1.1242 (max= 1.8634), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:51:22,768 - root - INFO - Step 2750: lr=1.00E-05, loss= 1.1242 (max= 1.8634), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:51:22,768 - root - INFO - Step 2750: lr=1.00E-05, loss= 1.1242 (max= 1.8634), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:51:22,768 - root - INFO - Step 2750: lr=1.00E-05, loss= 1.1242 (max= 1.8634), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:51:22,768 - root - INFO - Step 2750: lr=1.00E-05, loss= 1.1242 (max= 1.8634), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:51:22,768 - root - INFO - Step 2750: lr=1.00E-05, loss= 1.1242 (max= 1.8634), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:51:22,768 - root - INFO - Step 2750: lr=1.00E-05, loss= 1.1242 (max= 1.8634), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:51:22,768 - root - INFO - Step 2750: lr=1.00E-05, loss= 1.1242 (max= 
1.8634), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:51:54,594 - root - INFO - Step 2760: lr=1.00E-05, loss= 1.1064 (max= 1.4598), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:51:54,594 - root - INFO - Step 2760: lr=1.00E-05, loss= 1.1064 (max= 1.4598), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:51:54,594 - root - INFO - Step 2760: lr=1.00E-05, loss= 1.1064 (max= 1.4598), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:51:54,594 - root - INFO - Step 2760: lr=1.00E-05, loss= 1.1064 (max= 1.4598), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:51:54,595 - root - INFO - Step 2760: lr=1.00E-05, loss= 1.1064 (max= 1.4598), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:51:54,595 - root - INFO - Step 2760: lr=1.00E-05, loss= 1.1064 (max= 1.4598), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:51:54,595 - root - INFO - Step 2760: lr=1.00E-05, loss= 1.1064 (max= 1.4598), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:51:54,595 - root - INFO - Step 2760: lr=1.00E-05, loss= 1.1064 (max= 1.4598), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:52:26,501 - root - INFO - Step 2770: lr=1.00E-05, loss= 1.1190 (max= 1.5858), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:52:26,501 - root - INFO - Step 2770: lr=1.00E-05, loss= 1.1190 (max= 1.5858), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:52:26,501 - root - INFO - Step 
2770: lr=1.00E-05, loss= 1.1190 (max= 1.5858), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:52:26,501 - root - INFO - Step 2770: lr=1.00E-05, loss= 1.1190 (max= 1.5858), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:52:26,501 - root - INFO - Step 2770: lr=1.00E-05, loss= 1.1190 (max= 1.5858), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:52:26,501 - root - INFO - Step 2770: lr=1.00E-05, loss= 1.1190 (max= 1.5858), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:52:26,501 - root - INFO - Step 2770: lr=1.00E-05, loss= 1.1190 (max= 1.5858), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:52:26,501 - root - INFO - Step 2770: lr=1.00E-05, loss= 1.1190 (max= 1.5858), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:52:58,365 - root - INFO - Step 2780: lr=1.00E-05, loss= 1.1175 (max= 1.5925), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:52:58,365 - root - INFO - Step 2780: lr=1.00E-05, loss= 1.1175 (max= 1.5925), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:52:58,365 - root - INFO - Step 2780: lr=1.00E-05, loss= 1.1175 (max= 1.5925), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:52:58,366 - root - INFO - Step 2780: lr=1.00E-05, loss= 1.1175 (max= 1.5925), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:52:58,366 - root - INFO - Step 2780: lr=1.00E-05, loss= 1.1175 (max= 1.5925), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 12:52:58,366 - root - INFO - Step 2780: lr=1.00E-05, loss= 1.1175 (max= 1.5925), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:52:58,366 - root - INFO - Step 2780: lr=1.00E-05, loss= 1.1175 (max= 1.5925), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:52:58,366 - root - INFO - Step 2780: lr=1.00E-05, loss= 1.1175 (max= 1.5925), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:53:30,232 - root - INFO - Step 2790: lr=1.00E-05, loss= 1.1531 (max= 1.6393), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:53:30,232 - root - INFO - Step 2790: lr=1.00E-05, loss= 1.1531 (max= 1.6393), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:53:30,232 - root - INFO - Step 2790: lr=1.00E-05, loss= 1.1531 (max= 1.6393), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:53:30,232 - root - INFO - Step 2790: lr=1.00E-05, loss= 1.1531 (max= 1.6393), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:53:30,232 - root - INFO - Step 2790: lr=1.00E-05, loss= 1.1531 (max= 1.6393), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:53:30,232 - root - INFO - Step 2790: lr=1.00E-05, loss= 1.1531 (max= 1.6393), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:53:30,232 - root - INFO - Step 2790: lr=1.00E-05, loss= 1.1531 (max= 1.6393), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:53:30,233 - root - INFO - Step 2790: lr=1.00E-05, loss= 1.1531 (max= 1.6393), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:54:02,078 - root - INFO - Step 2800: lr=1.00E-05, loss= 1.1084 (max= 1.8585), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:54:02,078 - root - INFO - Step 2800: lr=1.00E-05, loss= 1.1084 (max= 1.8585), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:54:02,078 - root - INFO - Step 2800: lr=1.00E-05, loss= 1.1084 (max= 1.8585), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:54:02,078 - root - INFO - Step 2800: lr=1.00E-05, loss= 1.1084 (max= 1.8585), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:54:02,078 - root - INFO - Step 2800: lr=1.00E-05, loss= 1.1084 (max= 1.8585), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:54:02,078 - root - INFO - Step 2800: lr=1.00E-05, loss= 1.1084 (max= 1.8585), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:54:02,078 - root - INFO - Step 2800: lr=1.00E-05, loss= 1.1084 (max= 1.8585), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:54:02,078 - root - INFO - Step 2800: lr=1.00E-05, loss= 1.1084 (max= 1.8585), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:54:33,915 - root - INFO - Step 2810: lr=1.00E-05, loss= 1.1094 (max= 1.7118), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:54:33,916 - root - INFO - Step 2810: lr=1.00E-05, loss= 1.1094 (max= 1.7118), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:54:33,916 - root - INFO - Step 2810: lr=1.00E-05, loss= 1.1094 (max= 1.7118), tps=20586, 
mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:54:33,916 - root - INFO - Step 2810: lr=1.00E-05, loss= 1.1094 (max= 1.7118), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:54:33,916 - root - INFO - Step 2810: lr=1.00E-05, loss= 1.1094 (max= 1.7118), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:54:33,916 - root - INFO - Step 2810: lr=1.00E-05, loss= 1.1094 (max= 1.7118), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:54:33,916 - root - INFO - Step 2810: lr=1.00E-05, loss= 1.1094 (max= 1.7118), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:54:33,916 - root - INFO - Step 2810: lr=1.00E-05, loss= 1.1094 (max= 1.7118), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:55:05,849 - root - INFO - Step 2820: lr=1.00E-05, loss= 1.1410 (max= 1.5990), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:55:05,849 - root - INFO - Step 2820: lr=1.00E-05, loss= 1.1410 (max= 1.5990), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:55:05,849 - root - INFO - Step 2820: lr=1.00E-05, loss= 1.1410 (max= 1.5990), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:55:05,849 - root - INFO - Step 2820: lr=1.00E-05, loss= 1.1410 (max= 1.5990), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:55:05,849 - root - INFO - Step 2820: lr=1.00E-05, loss= 1.1410 (max= 1.5990), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:55:05,849 - root - INFO - Step 2820: lr=1.00E-05, 
loss= 1.1410 (max= 1.5990), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:55:05,849 - root - INFO - Step 2820: lr=1.00E-05, loss= 1.1410 (max= 1.5990), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:55:05,850 - root - INFO - Step 2820: lr=1.00E-05, loss= 1.1410 (max= 1.5990), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:55:37,734 - root - INFO - Step 2830: lr=1.00E-05, loss= 1.1280 (max= 1.6979), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:55:37,734 - root - INFO - Step 2830: lr=1.00E-05, loss= 1.1280 (max= 1.6979), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:55:37,734 - root - INFO - Step 2830: lr=1.00E-05, loss= 1.1280 (max= 1.6979), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:55:37,734 - root - INFO - Step 2830: lr=1.00E-05, loss= 1.1280 (max= 1.6979), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:55:37,734 - root - INFO - Step 2830: lr=1.00E-05, loss= 1.1280 (max= 1.6979), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:55:37,734 - root - INFO - Step 2830: lr=1.00E-05, loss= 1.1280 (max= 1.6979), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:55:37,734 - root - INFO - Step 2830: lr=1.00E-05, loss= 1.1280 (max= 1.6979), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:55:37,734 - root - INFO - Step 2830: lr=1.00E-05, loss= 1.1280 (max= 1.6979), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:55:44,691 - 
root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:904495 +2025-10-25 12:55:54,228 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:4429741 +2025-10-25 12:56:09,535 - root - INFO - Step 2840: lr=1.00E-05, loss= 1.1165 (max= 1.5985), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:56:09,535 - root - INFO - Step 2840: lr=1.00E-05, loss= 1.1165 (max= 1.5985), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:56:09,535 - root - INFO - Step 2840: lr=1.00E-05, loss= 1.1165 (max= 1.5985), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:56:09,535 - root - INFO - Step 2840: lr=1.00E-05, loss= 1.1165 (max= 1.5985), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:56:09,535 - root - INFO - Step 2840: lr=1.00E-05, loss= 1.1165 (max= 1.5985), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:56:09,535 - root - INFO - Step 2840: lr=1.00E-05, loss= 1.1165 (max= 1.5985), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:56:09,535 - root - INFO - Step 2840: lr=1.00E-05, loss= 1.1165 (max= 1.5985), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:56:09,535 - root - INFO - Step 2840: lr=1.00E-05, loss= 1.1165 (max= 1.5985), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:56:41,412 - root - INFO - Step 2850: lr=1.00E-05, loss= 1.0983 (max= 1.4443), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 
0.01%) +2025-10-25 12:56:41,412 - root - INFO - Step 2850: lr=1.00E-05, loss= 1.0983 (max= 1.4443), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:56:41,413 - root - INFO - Step 2850: lr=1.00E-05, loss= 1.0983 (max= 1.4443), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:56:41,413 - root - INFO - Step 2850: lr=1.00E-05, loss= 1.0983 (max= 1.4443), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:56:41,413 - root - INFO - Step 2850: lr=1.00E-05, loss= 1.0983 (max= 1.4443), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:56:41,413 - root - INFO - Step 2850: lr=1.00E-05, loss= 1.0983 (max= 1.4443), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:56:41,413 - root - INFO - Step 2850: lr=1.00E-05, loss= 1.0983 (max= 1.4443), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:56:41,413 - root - INFO - Step 2850: lr=1.00E-05, loss= 1.0983 (max= 1.4443), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:57:13,990 - root - INFO - Step 2860: lr=1.00E-05, loss= 1.1361 (max= 1.6306), tps=20119, mfu=41.92%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.08s, 5.12%) +2025-10-25 12:57:13,990 - root - INFO - Step 2860: lr=1.00E-05, loss= 1.1361 (max= 1.6306), tps=20119, mfu=41.92%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.08s, 5.12%) +2025-10-25 12:57:13,990 - root - INFO - Step 2860: lr=1.00E-05, loss= 1.1361 (max= 1.6306), tps=20119, mfu=41.92%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.08s, 5.12%) +2025-10-25 12:57:13,990 - root - INFO - Step 2860: lr=1.00E-05, loss= 1.1361 (max= 1.6306), tps=20119, mfu=41.92%, memory: 154.31GiB(86.51%) 
time/data_loading=0.01s (max=0.08s, 5.12%) +2025-10-25 12:57:13,990 - root - INFO - Step 2860: lr=1.00E-05, loss= 1.1361 (max= 1.6306), tps=20119, mfu=41.92%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.08s, 5.12%) +2025-10-25 12:57:13,990 - root - INFO - Step 2860: lr=1.00E-05, loss= 1.1361 (max= 1.6306), tps=20119, mfu=41.92%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.08s, 5.12%) +2025-10-25 12:57:13,990 - root - INFO - Step 2860: lr=1.00E-05, loss= 1.1361 (max= 1.6306), tps=20119, mfu=41.92%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.08s, 5.12%) +2025-10-25 12:57:13,990 - root - INFO - Step 2860: lr=1.00E-05, loss= 1.1361 (max= 1.6306), tps=20119, mfu=41.92%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.08s, 5.12%) +2025-10-25 12:57:45,902 - root - INFO - Step 2870: lr=1.00E-05, loss= 1.1148 (max= 1.5743), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:57:45,902 - root - INFO - Step 2870: lr=1.00E-05, loss= 1.1148 (max= 1.5743), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:57:45,903 - root - INFO - Step 2870: lr=1.00E-05, loss= 1.1148 (max= 1.5743), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:57:45,903 - root - INFO - Step 2870: lr=1.00E-05, loss= 1.1148 (max= 1.5743), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:57:45,903 - root - INFO - Step 2870: lr=1.00E-05, loss= 1.1148 (max= 1.5743), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:57:45,903 - root - INFO - Step 2870: lr=1.00E-05, loss= 1.1148 (max= 1.5743), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:57:45,903 - root - INFO - Step 2870: lr=1.00E-05, loss= 1.1148 (max= 1.5743), tps=20538, 
mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:57:45,903 - root - INFO - Step 2870: lr=1.00E-05, loss= 1.1148 (max= 1.5743), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:58:17,672 - root - INFO - Step 2880: lr=1.00E-05, loss= 1.1304 (max= 1.6039), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:58:17,672 - root - INFO - Step 2880: lr=1.00E-05, loss= 1.1304 (max= 1.6039), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:58:17,672 - root - INFO - Step 2880: lr=1.00E-05, loss= 1.1304 (max= 1.6039), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:58:17,672 - root - INFO - Step 2880: lr=1.00E-05, loss= 1.1304 (max= 1.6039), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:58:17,672 - root - INFO - Step 2880: lr=1.00E-05, loss= 1.1304 (max= 1.6039), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:58:17,672 - root - INFO - Step 2880: lr=1.00E-05, loss= 1.1304 (max= 1.6039), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:58:17,672 - root - INFO - Step 2880: lr=1.00E-05, loss= 1.1304 (max= 1.6039), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:58:17,672 - root - INFO - Step 2880: lr=1.00E-05, loss= 1.1304 (max= 1.6039), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:58:49,497 - root - INFO - Step 2890: lr=1.00E-05, loss= 1.0994 (max= 1.8988), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:58:49,497 - root - INFO - Step 2890: lr=1.00E-05, 
loss= 1.0994 (max= 1.8988), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:58:49,498 - root - INFO - Step 2890: lr=1.00E-05, loss= 1.0994 (max= 1.8988), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:58:49,498 - root - INFO - Step 2890: lr=1.00E-05, loss= 1.0994 (max= 1.8988), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:58:49,498 - root - INFO - Step 2890: lr=1.00E-05, loss= 1.0994 (max= 1.8988), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:58:49,498 - root - INFO - Step 2890: lr=1.00E-05, loss= 1.0994 (max= 1.8988), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:58:49,498 - root - INFO - Step 2890: lr=1.00E-05, loss= 1.0994 (max= 1.8988), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:58:49,498 - root - INFO - Step 2890: lr=1.00E-05, loss= 1.0994 (max= 1.8988), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:59:21,309 - root - INFO - Step 2900: lr=1.00E-05, loss= 1.1080 (max= 1.6015), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:59:21,309 - root - INFO - Step 2900: lr=1.00E-05, loss= 1.1080 (max= 1.6015), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:59:21,309 - root - INFO - Step 2900: lr=1.00E-05, loss= 1.1080 (max= 1.6015), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:59:21,309 - root - INFO - Step 2900: lr=1.00E-05, loss= 1.1080 (max= 1.6015), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:59:21,309 - 
root - INFO - Step 2900: lr=1.00E-05, loss= 1.1080 (max= 1.6015), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:59:21,309 - root - INFO - Step 2900: lr=1.00E-05, loss= 1.1080 (max= 1.6015), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:59:21,309 - root - INFO - Step 2900: lr=1.00E-05, loss= 1.1080 (max= 1.6015), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:59:21,309 - root - INFO - Step 2900: lr=1.00E-05, loss= 1.1080 (max= 1.6015), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:59:53,113 - root - INFO - Step 2910: lr=1.00E-05, loss= 1.1143 (max= 1.5729), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:59:53,114 - root - INFO - Step 2910: lr=1.00E-05, loss= 1.1143 (max= 1.5729), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:59:53,114 - root - INFO - Step 2910: lr=1.00E-05, loss= 1.1143 (max= 1.5729), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:59:53,114 - root - INFO - Step 2910: lr=1.00E-05, loss= 1.1143 (max= 1.5729), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:59:53,114 - root - INFO - Step 2910: lr=1.00E-05, loss= 1.1143 (max= 1.5729), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:59:53,114 - root - INFO - Step 2910: lr=1.00E-05, loss= 1.1143 (max= 1.5729), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 12:59:53,114 - root - INFO - Step 2910: lr=1.00E-05, loss= 1.1143 (max= 1.5729), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 12:59:53,114 - root - INFO - Step 2910: lr=1.00E-05, loss= 1.1143 (max= 1.5729), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:00:24,938 - root - INFO - Step 2920: lr=1.00E-05, loss= 1.1069 (max= 1.5747), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:00:24,939 - root - INFO - Step 2920: lr=1.00E-05, loss= 1.1069 (max= 1.5747), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:00:24,939 - root - INFO - Step 2920: lr=1.00E-05, loss= 1.1069 (max= 1.5747), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:00:24,939 - root - INFO - Step 2920: lr=1.00E-05, loss= 1.1069 (max= 1.5747), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:00:24,939 - root - INFO - Step 2920: lr=1.00E-05, loss= 1.1069 (max= 1.5747), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:00:24,939 - root - INFO - Step 2920: lr=1.00E-05, loss= 1.1069 (max= 1.5747), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:00:24,939 - root - INFO - Step 2920: lr=1.00E-05, loss= 1.1069 (max= 1.5747), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:00:24,939 - root - INFO - Step 2920: lr=1.00E-05, loss= 1.1069 (max= 1.5747), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:00:56,803 - root - INFO - Step 2930: lr=1.00E-05, loss= 1.1216 (max= 1.5206), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:00:56,803 - root - INFO - Step 2930: lr=1.00E-05, loss= 1.1216 (max= 1.5206), tps=20570, mfu=42.86%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:00:56,803 - root - INFO - Step 2930: lr=1.00E-05, loss= 1.1216 (max= 1.5206), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:00:56,803 - root - INFO - Step 2930: lr=1.00E-05, loss= 1.1216 (max= 1.5206), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:00:56,803 - root - INFO - Step 2930: lr=1.00E-05, loss= 1.1216 (max= 1.5206), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:00:56,803 - root - INFO - Step 2930: lr=1.00E-05, loss= 1.1216 (max= 1.5206), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:00:56,803 - root - INFO - Step 2930: lr=1.00E-05, loss= 1.1216 (max= 1.5206), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:00:56,803 - root - INFO - Step 2930: lr=1.00E-05, loss= 1.1216 (max= 1.5206), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:01:28,589 - root - INFO - Step 2940: lr=1.00E-05, loss= 1.1238 (max= 1.5876), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:01:28,589 - root - INFO - Step 2940: lr=1.00E-05, loss= 1.1238 (max= 1.5876), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:01:28,589 - root - INFO - Step 2940: lr=1.00E-05, loss= 1.1238 (max= 1.5876), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:01:28,589 - root - INFO - Step 2940: lr=1.00E-05, loss= 1.1238 (max= 1.5876), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:01:28,589 - root - INFO - Step 2940: lr=1.00E-05, loss= 1.1238 (max= 
1.5876), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:01:28,589 - root - INFO - Step 2940: lr=1.00E-05, loss= 1.1238 (max= 1.5876), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:01:28,589 - root - INFO - Step 2940: lr=1.00E-05, loss= 1.1238 (max= 1.5876), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:01:28,589 - root - INFO - Step 2940: lr=1.00E-05, loss= 1.1238 (max= 1.5876), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:02:00,419 - root - INFO - Step 2950: lr=1.00E-05, loss= 1.0990 (max= 1.6033), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:02:00,419 - root - INFO - Step 2950: lr=1.00E-05, loss= 1.0990 (max= 1.6033), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:02:00,419 - root - INFO - Step 2950: lr=1.00E-05, loss= 1.0990 (max= 1.6033), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:02:00,419 - root - INFO - Step 2950: lr=1.00E-05, loss= 1.0990 (max= 1.6033), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:02:00,419 - root - INFO - Step 2950: lr=1.00E-05, loss= 1.0990 (max= 1.6033), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:02:00,419 - root - INFO - Step 2950: lr=1.00E-05, loss= 1.0990 (max= 1.6033), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:02:00,419 - root - INFO - Step 2950: lr=1.00E-05, loss= 1.0990 (max= 1.6033), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:02:00,419 - root - INFO - Step 
2950: lr=1.00E-05, loss= 1.0990 (max= 1.6033), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:02:32,279 - root - INFO - Step 2960: lr=1.00E-05, loss= 1.1043 (max= 1.5101), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:02:32,279 - root - INFO - Step 2960: lr=1.00E-05, loss= 1.1043 (max= 1.5101), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:02:32,279 - root - INFO - Step 2960: lr=1.00E-05, loss= 1.1043 (max= 1.5101), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:02:32,279 - root - INFO - Step 2960: lr=1.00E-05, loss= 1.1043 (max= 1.5101), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:02:32,279 - root - INFO - Step 2960: lr=1.00E-05, loss= 1.1043 (max= 1.5101), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:02:32,280 - root - INFO - Step 2960: lr=1.00E-05, loss= 1.1043 (max= 1.5101), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:02:32,280 - root - INFO - Step 2960: lr=1.00E-05, loss= 1.1043 (max= 1.5101), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:02:32,280 - root - INFO - Step 2960: lr=1.00E-05, loss= 1.1043 (max= 1.5101), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:03:04,162 - root - INFO - Step 2970: lr=1.00E-05, loss= 1.1341 (max= 1.5554), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:03:04,162 - root - INFO - Step 2970: lr=1.00E-05, loss= 1.1341 (max= 1.5554), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 13:03:04,162 - root - INFO - Step 2970: lr=1.00E-05, loss= 1.1341 (max= 1.5554), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:03:04,162 - root - INFO - Step 2970: lr=1.00E-05, loss= 1.1341 (max= 1.5554), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:03:04,162 - root - INFO - Step 2970: lr=1.00E-05, loss= 1.1341 (max= 1.5554), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:03:04,162 - root - INFO - Step 2970: lr=1.00E-05, loss= 1.1341 (max= 1.5554), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:03:04,162 - root - INFO - Step 2970: lr=1.00E-05, loss= 1.1341 (max= 1.5554), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:03:04,163 - root - INFO - Step 2970: lr=1.00E-05, loss= 1.1341 (max= 1.5554), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:03:36,068 - root - INFO - Step 2980: lr=1.00E-05, loss= 1.1044 (max= 1.5747), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:03:36,068 - root - INFO - Step 2980: lr=1.00E-05, loss= 1.1044 (max= 1.5747), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:03:36,068 - root - INFO - Step 2980: lr=1.00E-05, loss= 1.1044 (max= 1.5747), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:03:36,068 - root - INFO - Step 2980: lr=1.00E-05, loss= 1.1044 (max= 1.5747), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:03:36,068 - root - INFO - Step 2980: lr=1.00E-05, loss= 1.1044 (max= 1.5747), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:03:36,068 - root - INFO - Step 2980: lr=1.00E-05, loss= 1.1044 (max= 1.5747), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:03:36,068 - root - INFO - Step 2980: lr=1.00E-05, loss= 1.1044 (max= 1.5747), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:03:36,068 - root - INFO - Step 2980: lr=1.00E-05, loss= 1.1044 (max= 1.5747), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:04:07,941 - root - INFO - Step 2990: lr=1.00E-05, loss= 1.1189 (max= 1.4943), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:04:07,941 - root - INFO - Step 2990: lr=1.00E-05, loss= 1.1189 (max= 1.4943), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:04:07,941 - root - INFO - Step 2990: lr=1.00E-05, loss= 1.1189 (max= 1.4943), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:04:07,941 - root - INFO - Step 2990: lr=1.00E-05, loss= 1.1189 (max= 1.4943), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:04:07,941 - root - INFO - Step 2990: lr=1.00E-05, loss= 1.1189 (max= 1.4943), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:04:07,941 - root - INFO - Step 2990: lr=1.00E-05, loss= 1.1189 (max= 1.4943), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:04:07,941 - root - INFO - Step 2990: lr=1.00E-05, loss= 1.1189 (max= 1.4943), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:04:07,941 - root - INFO - Step 2990: lr=1.00E-05, loss= 1.1189 (max= 1.4943), tps=20564, 
mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-pt/checkpoints/dataloader/step-3000 +Dataset successfully saved to jobs/munin-7b-open-pt/checkpoints/dataloader/step-3000! Save time: 4.40405011177063 +2025-10-25 13:04:39,829 - root - INFO - Step 3000: lr=1.00E-05, loss= 1.1114 (max= 1.6964), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:04:39,829 - root - INFO - Step 3000: lr=1.00E-05, loss= 1.1114 (max= 1.6964), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:04:39,829 - root - INFO - Saving a full checkpoint at step 3000 +2025-10-25 13:04:39,829 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 13:04:39,829 - root - INFO - Saving a full checkpoint at step 3000 +2025-10-25 13:04:39,829 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 13:04:39,829 - root - INFO - Step 3000: lr=1.00E-05, loss= 1.1114 (max= 1.6964), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:04:39,829 - root - INFO - Step 3000: lr=1.00E-05, loss= 1.1114 (max= 1.6964), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:04:39,829 - root - INFO - Step 3000: lr=1.00E-05, loss= 1.1114 (max= 1.6964), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:04:39,829 - root - INFO - Step 3000: lr=1.00E-05, loss= 1.1114 (max= 1.6964), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:04:39,829 - root - INFO - Saving a full checkpoint at step 3000 +2025-10-25 13:04:39,829 - root - INFO - Saving a full checkpoint at step 3000 +2025-10-25 
13:04:39,829 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 13:04:39,829 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 13:04:39,829 - root - INFO - Saving a full checkpoint at step 3000 +2025-10-25 13:04:39,829 - root - INFO - Saving a full checkpoint at step 3000 +2025-10-25 13:04:39,829 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 13:04:39,829 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 13:04:39,829 - root - INFO - Step 3000: lr=1.00E-05, loss= 1.1114 (max= 1.6964), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:04:39,829 - root - INFO - Saving a full checkpoint at step 3000 +2025-10-25 13:04:39,829 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 13:04:39,829 - root - INFO - Step 3000: lr=1.00E-05, loss= 1.1114 (max= 1.6964), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:04:39,830 - root - INFO - Saving a full checkpoint at step 3000 +2025-10-25 13:04:39,830 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 13:04:55,338 - root - INFO - Finished saving the checkpoint in 15.51 seconds +2025-10-25 13:04:55,344 - root - INFO - Finished saving the checkpoint in 15.52 seconds +2025-10-25 13:04:55,345 - root - INFO - Finished saving the checkpoint in 15.52 seconds +2025-10-25 13:04:55,345 - root - INFO - Finished saving the checkpoint in 15.52 seconds +2025-10-25 13:04:55,346 - root - INFO - Finished saving the checkpoint in 15.52 seconds +2025-10-25 13:04:55,346 - 
root - INFO - Finished saving the checkpoint in 15.52 seconds +2025-10-25 13:04:55,346 - root - INFO - Finished saving the checkpoint in 15.52 seconds +2025-10-25 13:04:55,346 - root - INFO - Finished saving the checkpoint in 15.52 seconds +2025-10-25 13:05:27,164 - root - INFO - Step 3010: lr=1.00E-05, loss= 1.1206 (max= 1.5708), tps=13846, mfu=28.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:05:27,164 - root - INFO - Step 3010: lr=1.00E-05, loss= 1.1206 (max= 1.5708), tps=13846, mfu=28.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:05:27,164 - root - INFO - Step 3010: lr=1.00E-05, loss= 1.1206 (max= 1.5708), tps=13846, mfu=28.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:05:27,164 - root - INFO - Step 3010: lr=1.00E-05, loss= 1.1206 (max= 1.5708), tps=13846, mfu=28.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:05:27,164 - root - INFO - Step 3010: lr=1.00E-05, loss= 1.1206 (max= 1.5708), tps=13846, mfu=28.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:05:27,164 - root - INFO - Step 3010: lr=1.00E-05, loss= 1.1206 (max= 1.5708), tps=13846, mfu=28.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:05:27,164 - root - INFO - Step 3010: lr=1.00E-05, loss= 1.1206 (max= 1.5708), tps=13846, mfu=28.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:05:27,164 - root - INFO - Step 3010: lr=1.00E-05, loss= 1.1206 (max= 1.5708), tps=13846, mfu=28.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:05:59,004 - root - INFO - Step 3020: lr=1.00E-05, loss= 1.1233 (max= 1.5650), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:05:59,004 - root - INFO - Step 3020: lr=1.00E-05, loss= 1.1233 (max= 1.5650), 
tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:05:59,004 - root - INFO - Step 3020: lr=1.00E-05, loss= 1.1233 (max= 1.5650), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:05:59,004 - root - INFO - Step 3020: lr=1.00E-05, loss= 1.1233 (max= 1.5650), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:05:59,004 - root - INFO - Step 3020: lr=1.00E-05, loss= 1.1233 (max= 1.5650), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:05:59,004 - root - INFO - Step 3020: lr=1.00E-05, loss= 1.1233 (max= 1.5650), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:05:59,004 - root - INFO - Step 3020: lr=1.00E-05, loss= 1.1233 (max= 1.5650), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:05:59,004 - root - INFO - Step 3020: lr=1.00E-05, loss= 1.1233 (max= 1.5650), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:06:30,841 - root - INFO - Step 3030: lr=1.00E-05, loss= 1.1198 (max= 1.5987), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:06:30,841 - root - INFO - Step 3030: lr=1.00E-05, loss= 1.1198 (max= 1.5987), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:06:30,841 - root - INFO - Step 3030: lr=1.00E-05, loss= 1.1198 (max= 1.5987), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:06:30,841 - root - INFO - Step 3030: lr=1.00E-05, loss= 1.1198 (max= 1.5987), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:06:30,841 - root - INFO - Step 3030: 
lr=1.00E-05, loss= 1.1198 (max= 1.5987), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:06:30,841 - root - INFO - Step 3030: lr=1.00E-05, loss= 1.1198 (max= 1.5987), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:06:30,841 - root - INFO - Step 3030: lr=1.00E-05, loss= 1.1198 (max= 1.5987), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:06:30,841 - root - INFO - Step 3030: lr=1.00E-05, loss= 1.1198 (max= 1.5987), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:07:02,662 - root - INFO - Step 3040: lr=1.00E-05, loss= 1.1293 (max= 1.5148), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:07:02,662 - root - INFO - Step 3040: lr=1.00E-05, loss= 1.1293 (max= 1.5148), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:07:02,662 - root - INFO - Step 3040: lr=1.00E-05, loss= 1.1293 (max= 1.5148), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:07:02,662 - root - INFO - Step 3040: lr=1.00E-05, loss= 1.1293 (max= 1.5148), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:07:02,662 - root - INFO - Step 3040: lr=1.00E-05, loss= 1.1293 (max= 1.5148), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:07:02,662 - root - INFO - Step 3040: lr=1.00E-05, loss= 1.1293 (max= 1.5148), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:07:02,662 - root - INFO - Step 3040: lr=1.00E-05, loss= 1.1293 (max= 1.5148), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
13:07:02,662 - root - INFO - Step 3040: lr=1.00E-05, loss= 1.1293 (max= 1.5148), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:07:34,552 - root - INFO - Step 3050: lr=1.00E-05, loss= 1.0997 (max= 2.0066), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:07:34,552 - root - INFO - Step 3050: lr=1.00E-05, loss= 1.0997 (max= 2.0066), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:07:34,552 - root - INFO - Step 3050: lr=1.00E-05, loss= 1.0997 (max= 2.0066), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:07:34,552 - root - INFO - Step 3050: lr=1.00E-05, loss= 1.0997 (max= 2.0066), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:07:34,552 - root - INFO - Step 3050: lr=1.00E-05, loss= 1.0997 (max= 2.0066), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:07:34,552 - root - INFO - Step 3050: lr=1.00E-05, loss= 1.0997 (max= 2.0066), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:07:34,552 - root - INFO - Step 3050: lr=1.00E-05, loss= 1.0997 (max= 2.0066), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:07:34,552 - root - INFO - Step 3050: lr=1.00E-05, loss= 1.0997 (max= 2.0066), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:08:06,356 - root - INFO - Step 3060: lr=1.00E-05, loss= 1.1033 (max= 1.5620), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:08:06,356 - root - INFO - Step 3060: lr=1.00E-05, loss= 1.1033 (max= 1.5620), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:08:06,356 - root - INFO - Step 3060: lr=1.00E-05, loss= 1.1033 (max= 1.5620), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:08:06,356 - root - INFO - Step 3060: lr=1.00E-05, loss= 1.1033 (max= 1.5620), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:08:06,356 - root - INFO - Step 3060: lr=1.00E-05, loss= 1.1033 (max= 1.5620), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:08:06,356 - root - INFO - Step 3060: lr=1.00E-05, loss= 1.1033 (max= 1.5620), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:08:06,356 - root - INFO - Step 3060: lr=1.00E-05, loss= 1.1033 (max= 1.5620), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:08:06,357 - root - INFO - Step 3060: lr=1.00E-05, loss= 1.1033 (max= 1.5620), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:08:38,251 - root - INFO - Step 3070: lr=1.00E-05, loss= 1.1326 (max= 1.5878), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:08:38,251 - root - INFO - Step 3070: lr=1.00E-05, loss= 1.1326 (max= 1.5878), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:08:38,251 - root - INFO - Step 3070: lr=1.00E-05, loss= 1.1326 (max= 1.5878), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:08:38,251 - root - INFO - Step 3070: lr=1.00E-05, loss= 1.1326 (max= 1.5878), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:08:38,251 - root - INFO - Step 3070: lr=1.00E-05, loss= 1.1326 (max= 1.5878), tps=20550, 
mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:08:38,251 - root - INFO - Step 3070: lr=1.00E-05, loss= 1.1326 (max= 1.5878), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:08:38,251 - root - INFO - Step 3070: lr=1.00E-05, loss= 1.1326 (max= 1.5878), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:08:38,251 - root - INFO - Step 3070: lr=1.00E-05, loss= 1.1326 (max= 1.5878), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:09:10,078 - root - INFO - Step 3080: lr=1.00E-05, loss= 1.1115 (max= 1.7089), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:09:10,078 - root - INFO - Step 3080: lr=1.00E-05, loss= 1.1115 (max= 1.7089), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:09:10,078 - root - INFO - Step 3080: lr=1.00E-05, loss= 1.1115 (max= 1.7089), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:09:10,078 - root - INFO - Step 3080: lr=1.00E-05, loss= 1.1115 (max= 1.7089), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:09:10,078 - root - INFO - Step 3080: lr=1.00E-05, loss= 1.1115 (max= 1.7089), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:09:10,078 - root - INFO - Step 3080: lr=1.00E-05, loss= 1.1115 (max= 1.7089), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:09:10,078 - root - INFO - Step 3080: lr=1.00E-05, loss= 1.1115 (max= 1.7089), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:09:10,079 - root - INFO - Step 3080: lr=1.00E-05, 
loss= 1.1115 (max= 1.7089), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:09:41,998 - root - INFO - Step 3090: lr=1.00E-05, loss= 1.1236 (max= 1.7096), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:09:41,998 - root - INFO - Step 3090: lr=1.00E-05, loss= 1.1236 (max= 1.7096), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:09:41,998 - root - INFO - Step 3090: lr=1.00E-05, loss= 1.1236 (max= 1.7096), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:09:41,998 - root - INFO - Step 3090: lr=1.00E-05, loss= 1.1236 (max= 1.7096), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:09:41,999 - root - INFO - Step 3090: lr=1.00E-05, loss= 1.1236 (max= 1.7096), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:09:41,999 - root - INFO - Step 3090: lr=1.00E-05, loss= 1.1236 (max= 1.7096), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:09:41,999 - root - INFO - Step 3090: lr=1.00E-05, loss= 1.1236 (max= 1.7096), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:09:41,999 - root - INFO - Step 3090: lr=1.00E-05, loss= 1.1236 (max= 1.7096), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:10:13,859 - root - INFO - Step 3100: lr=1.00E-05, loss= 1.1011 (max= 1.5931), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:10:13,859 - root - INFO - Step 3100: lr=1.00E-05, loss= 1.1011 (max= 1.5931), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:10:13,859 - 
root - INFO - Step 3100: lr=1.00E-05, loss= 1.1011 (max= 1.5931), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:10:13,859 - root - INFO - Step 3100: lr=1.00E-05, loss= 1.1011 (max= 1.5931), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:10:13,859 - root - INFO - Step 3100: lr=1.00E-05, loss= 1.1011 (max= 1.5931), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:10:13,859 - root - INFO - Step 3100: lr=1.00E-05, loss= 1.1011 (max= 1.5931), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:10:13,859 - root - INFO - Step 3100: lr=1.00E-05, loss= 1.1011 (max= 1.5931), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:10:13,859 - root - INFO - Step 3100: lr=1.00E-05, loss= 1.1011 (max= 1.5931), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:10:45,751 - root - INFO - Step 3110: lr=1.00E-05, loss= 1.1193 (max= 1.5221), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:10:45,751 - root - INFO - Step 3110: lr=1.00E-05, loss= 1.1193 (max= 1.5221), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:10:45,751 - root - INFO - Step 3110: lr=1.00E-05, loss= 1.1193 (max= 1.5221), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:10:45,751 - root - INFO - Step 3110: lr=1.00E-05, loss= 1.1193 (max= 1.5221), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:10:45,752 - root - INFO - Step 3110: lr=1.00E-05, loss= 1.1193 (max= 1.5221), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 13:10:45,752 - root - INFO - Step 3110: lr=1.00E-05, loss= 1.1193 (max= 1.5221), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:10:45,752 - root - INFO - Step 3110: lr=1.00E-05, loss= 1.1193 (max= 1.5221), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:10:45,752 - root - INFO - Step 3110: lr=1.00E-05, loss= 1.1193 (max= 1.5221), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:11:17,678 - root - INFO - Step 3120: lr=1.00E-05, loss= 1.1170 (max= 1.6568), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:11:17,678 - root - INFO - Step 3120: lr=1.00E-05, loss= 1.1170 (max= 1.6568), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:11:17,678 - root - INFO - Step 3120: lr=1.00E-05, loss= 1.1170 (max= 1.6568), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:11:17,678 - root - INFO - Step 3120: lr=1.00E-05, loss= 1.1170 (max= 1.6568), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:11:17,678 - root - INFO - Step 3120: lr=1.00E-05, loss= 1.1170 (max= 1.6568), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:11:17,678 - root - INFO - Step 3120: lr=1.00E-05, loss= 1.1170 (max= 1.6568), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:11:17,678 - root - INFO - Step 3120: lr=1.00E-05, loss= 1.1170 (max= 1.6568), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:11:17,679 - root - INFO - Step 3120: lr=1.00E-05, loss= 1.1170 (max= 1.6568), tps=20529, mfu=42.77%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:11:49,532 - root - INFO - Step 3130: lr=1.00E-05, loss= 1.1303 (max= 1.6317), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:11:49,532 - root - INFO - Step 3130: lr=1.00E-05, loss= 1.1303 (max= 1.6317), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:11:49,532 - root - INFO - Step 3130: lr=1.00E-05, loss= 1.1303 (max= 1.6317), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:11:49,532 - root - INFO - Step 3130: lr=1.00E-05, loss= 1.1303 (max= 1.6317), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:11:49,533 - root - INFO - Step 3130: lr=1.00E-05, loss= 1.1303 (max= 1.6317), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:11:49,533 - root - INFO - Step 3130: lr=1.00E-05, loss= 1.1303 (max= 1.6317), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:11:49,533 - root - INFO - Step 3130: lr=1.00E-05, loss= 1.1303 (max= 1.6317), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:11:49,533 - root - INFO - Step 3130: lr=1.00E-05, loss= 1.1303 (max= 1.6317), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:12:21,334 - root - INFO - Step 3140: lr=1.00E-05, loss= 1.1272 (max= 1.5468), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:12:21,334 - root - INFO - Step 3140: lr=1.00E-05, loss= 1.1272 (max= 1.5468), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:12:21,334 - root - INFO - Step 3140: lr=1.00E-05, loss= 1.1272 (max= 
1.5468), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:12:21,334 - root - INFO - Step 3140: lr=1.00E-05, loss= 1.1272 (max= 1.5468), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:12:21,334 - root - INFO - Step 3140: lr=1.00E-05, loss= 1.1272 (max= 1.5468), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:12:21,334 - root - INFO - Step 3140: lr=1.00E-05, loss= 1.1272 (max= 1.5468), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:12:21,334 - root - INFO - Step 3140: lr=1.00E-05, loss= 1.1272 (max= 1.5468), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:12:21,334 - root - INFO - Step 3140: lr=1.00E-05, loss= 1.1272 (max= 1.5468), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:12:53,158 - root - INFO - Step 3150: lr=1.00E-05, loss= 1.1201 (max= 1.6291), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:12:53,158 - root - INFO - Step 3150: lr=1.00E-05, loss= 1.1201 (max= 1.6291), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:12:53,158 - root - INFO - Step 3150: lr=1.00E-05, loss= 1.1201 (max= 1.6291), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:12:53,158 - root - INFO - Step 3150: lr=1.00E-05, loss= 1.1201 (max= 1.6291), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:12:53,158 - root - INFO - Step 3150: lr=1.00E-05, loss= 1.1201 (max= 1.6291), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:12:53,158 - root - INFO - Step 
3150: lr=1.00E-05, loss= 1.1201 (max= 1.6291), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:12:53,158 - root - INFO - Step 3150: lr=1.00E-05, loss= 1.1201 (max= 1.6291), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:12:53,158 - root - INFO - Step 3150: lr=1.00E-05, loss= 1.1201 (max= 1.6291), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:13:25,064 - root - INFO - Step 3160: lr=1.00E-05, loss= 1.1215 (max= 1.6002), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:13:25,064 - root - INFO - Step 3160: lr=1.00E-05, loss= 1.1215 (max= 1.6002), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:13:25,064 - root - INFO - Step 3160: lr=1.00E-05, loss= 1.1215 (max= 1.6002), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:13:25,064 - root - INFO - Step 3160: lr=1.00E-05, loss= 1.1215 (max= 1.6002), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:13:25,064 - root - INFO - Step 3160: lr=1.00E-05, loss= 1.1215 (max= 1.6002), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:13:25,065 - root - INFO - Step 3160: lr=1.00E-05, loss= 1.1215 (max= 1.6002), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:13:25,065 - root - INFO - Step 3160: lr=1.00E-05, loss= 1.1215 (max= 1.6002), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:13:25,065 - root - INFO - Step 3160: lr=1.00E-05, loss= 1.1215 (max= 1.6002), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 13:13:56,870 - root - INFO - Step 3170: lr=1.00E-05, loss= 1.1305 (max= 1.5498), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:13:56,870 - root - INFO - Step 3170: lr=1.00E-05, loss= 1.1305 (max= 1.5498), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:13:56,870 - root - INFO - Step 3170: lr=1.00E-05, loss= 1.1305 (max= 1.5498), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:13:56,870 - root - INFO - Step 3170: lr=1.00E-05, loss= 1.1305 (max= 1.5498), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:13:56,870 - root - INFO - Step 3170: lr=1.00E-05, loss= 1.1305 (max= 1.5498), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:13:56,870 - root - INFO - Step 3170: lr=1.00E-05, loss= 1.1305 (max= 1.5498), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:13:56,870 - root - INFO - Step 3170: lr=1.00E-05, loss= 1.1305 (max= 1.5498), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:13:56,871 - root - INFO - Step 3170: lr=1.00E-05, loss= 1.1305 (max= 1.5498), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:14:28,766 - root - INFO - Step 3180: lr=1.00E-05, loss= 1.1214 (max= 1.6989), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:14:28,766 - root - INFO - Step 3180: lr=1.00E-05, loss= 1.1214 (max= 1.6989), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:14:28,766 - root - INFO - Step 3180: lr=1.00E-05, loss= 1.1214 (max= 1.6989), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:14:28,766 - root - INFO - Step 3180: lr=1.00E-05, loss= 1.1214 (max= 1.6989), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:14:28,766 - root - INFO - Step 3180: lr=1.00E-05, loss= 1.1214 (max= 1.6989), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:14:28,766 - root - INFO - Step 3180: lr=1.00E-05, loss= 1.1214 (max= 1.6989), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:14:28,766 - root - INFO - Step 3180: lr=1.00E-05, loss= 1.1214 (max= 1.6989), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:14:28,766 - root - INFO - Step 3180: lr=1.00E-05, loss= 1.1214 (max= 1.6989), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:15:00,540 - root - INFO - Step 3190: lr=1.00E-05, loss= 1.1295 (max= 1.6828), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:15:00,540 - root - INFO - Step 3190: lr=1.00E-05, loss= 1.1295 (max= 1.6828), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:15:00,540 - root - INFO - Step 3190: lr=1.00E-05, loss= 1.1295 (max= 1.6828), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:15:00,540 - root - INFO - Step 3190: lr=1.00E-05, loss= 1.1295 (max= 1.6828), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:15:00,540 - root - INFO - Step 3190: lr=1.00E-05, loss= 1.1295 (max= 1.6828), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:15:00,540 - root - INFO - Step 3190: lr=1.00E-05, loss= 1.1295 (max= 1.6828), tps=20628, 
mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:15:00,540 - root - INFO - Step 3190: lr=1.00E-05, loss= 1.1295 (max= 1.6828), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:15:00,540 - root - INFO - Step 3190: lr=1.00E-05, loss= 1.1295 (max= 1.6828), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:15:32,401 - root - INFO - Step 3200: lr=1.00E-05, loss= 1.1373 (max= 1.6229), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:15:32,401 - root - INFO - Step 3200: lr=1.00E-05, loss= 1.1373 (max= 1.6229), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:15:32,401 - root - INFO - Step 3200: lr=1.00E-05, loss= 1.1373 (max= 1.6229), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:15:32,401 - root - INFO - Step 3200: lr=1.00E-05, loss= 1.1373 (max= 1.6229), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:15:32,401 - root - INFO - Step 3200: lr=1.00E-05, loss= 1.1373 (max= 1.6229), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:15:32,401 - root - INFO - Step 3200: lr=1.00E-05, loss= 1.1373 (max= 1.6229), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:15:32,401 - root - INFO - Step 3200: lr=1.00E-05, loss= 1.1373 (max= 1.6229), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:15:32,401 - root - INFO - Step 3200: lr=1.00E-05, loss= 1.1373 (max= 1.6229), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:16:04,267 - root - INFO - Step 3210: lr=1.00E-05, 
loss= 1.1047 (max= 1.5978), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:16:04,267 - root - INFO - Step 3210: lr=1.00E-05, loss= 1.1047 (max= 1.5978), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:16:04,267 - root - INFO - Step 3210: lr=1.00E-05, loss= 1.1047 (max= 1.5978), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:16:04,267 - root - INFO - Step 3210: lr=1.00E-05, loss= 1.1047 (max= 1.5978), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:16:04,267 - root - INFO - Step 3210: lr=1.00E-05, loss= 1.1047 (max= 1.5978), tps=20569, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:16:04,267 - root - INFO - Step 3210: lr=1.00E-05, loss= 1.1047 (max= 1.5978), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:16:04,267 - root - INFO - Step 3210: lr=1.00E-05, loss= 1.1047 (max= 1.5978), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:16:04,268 - root - INFO - Step 3210: lr=1.00E-05, loss= 1.1047 (max= 1.5978), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:16:36,074 - root - INFO - Step 3220: lr=1.00E-05, loss= 1.1041 (max= 1.5754), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:16:36,074 - root - INFO - Step 3220: lr=1.00E-05, loss= 1.1041 (max= 1.5754), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:16:36,074 - root - INFO - Step 3220: lr=1.00E-05, loss= 1.1041 (max= 1.5754), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:16:36,074 - 
root - INFO - Step 3220: lr=1.00E-05, loss= 1.1041 (max= 1.5754), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:16:36,074 - root - INFO - Step 3220: lr=1.00E-05, loss= 1.1041 (max= 1.5754), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:16:36,074 - root - INFO - Step 3220: lr=1.00E-05, loss= 1.1041 (max= 1.5754), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:16:36,074 - root - INFO - Step 3220: lr=1.00E-05, loss= 1.1041 (max= 1.5754), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:16:36,074 - root - INFO - Step 3220: lr=1.00E-05, loss= 1.1041 (max= 1.5754), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:17:07,963 - root - INFO - Step 3230: lr=1.00E-05, loss= 1.1329 (max= 1.6034), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:17:07,963 - root - INFO - Step 3230: lr=1.00E-05, loss= 1.1329 (max= 1.6034), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:17:07,963 - root - INFO - Step 3230: lr=1.00E-05, loss= 1.1329 (max= 1.6034), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:17:07,963 - root - INFO - Step 3230: lr=1.00E-05, loss= 1.1329 (max= 1.6034), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:17:07,963 - root - INFO - Step 3230: lr=1.00E-05, loss= 1.1329 (max= 1.6034), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:17:07,963 - root - INFO - Step 3230: lr=1.00E-05, loss= 1.1329 (max= 1.6034), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 13:17:07,963 - root - INFO - Step 3230: lr=1.00E-05, loss= 1.1329 (max= 1.6034), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:17:07,963 - root - INFO - Step 3230: lr=1.00E-05, loss= 1.1329 (max= 1.6034), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:17:39,770 - root - INFO - Step 3240: lr=1.00E-05, loss= 1.1500 (max= 1.5355), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:17:39,770 - root - INFO - Step 3240: lr=1.00E-05, loss= 1.1500 (max= 1.5355), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:17:39,770 - root - INFO - Step 3240: lr=1.00E-05, loss= 1.1500 (max= 1.5355), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:17:39,770 - root - INFO - Step 3240: lr=1.00E-05, loss= 1.1500 (max= 1.5355), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:17:39,770 - root - INFO - Step 3240: lr=1.00E-05, loss= 1.1500 (max= 1.5355), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:17:39,770 - root - INFO - Step 3240: lr=1.00E-05, loss= 1.1500 (max= 1.5355), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:17:39,770 - root - INFO - Step 3240: lr=1.00E-05, loss= 1.1500 (max= 1.5355), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:17:39,770 - root - INFO - Step 3240: lr=1.00E-05, loss= 1.1500 (max= 1.5355), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:18:11,579 - root - INFO - Step 3250: lr=1.00E-05, loss= 1.1261 (max= 1.8887), tps=20605, mfu=42.93%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:18:11,579 - root - INFO - Step 3250: lr=1.00E-05, loss= 1.1261 (max= 1.8887), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:18:11,579 - root - INFO - Step 3250: lr=1.00E-05, loss= 1.1261 (max= 1.8887), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:18:11,579 - root - INFO - Step 3250: lr=1.00E-05, loss= 1.1261 (max= 1.8887), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:18:11,579 - root - INFO - Step 3250: lr=1.00E-05, loss= 1.1261 (max= 1.8887), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:18:11,579 - root - INFO - Step 3250: lr=1.00E-05, loss= 1.1261 (max= 1.8887), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:18:11,579 - root - INFO - Step 3250: lr=1.00E-05, loss= 1.1261 (max= 1.8887), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:18:11,579 - root - INFO - Step 3250: lr=1.00E-05, loss= 1.1261 (max= 1.8887), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:18:43,434 - root - INFO - Step 3260: lr=1.00E-05, loss= 1.0895 (max= 1.6204), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:18:43,434 - root - INFO - Step 3260: lr=1.00E-05, loss= 1.0895 (max= 1.6204), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:18:43,434 - root - INFO - Step 3260: lr=1.00E-05, loss= 1.0895 (max= 1.6204), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:18:43,434 - root - INFO - Step 3260: lr=1.00E-05, loss= 1.0895 (max= 
1.6204), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:18:43,434 - root - INFO - Step 3260: lr=1.00E-05, loss= 1.0895 (max= 1.6204), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:18:43,434 - root - INFO - Step 3260: lr=1.00E-05, loss= 1.0895 (max= 1.6204), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:18:43,435 - root - INFO - Step 3260: lr=1.00E-05, loss= 1.0895 (max= 1.6204), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:18:43,435 - root - INFO - Step 3260: lr=1.00E-05, loss= 1.0895 (max= 1.6204), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:19:15,340 - root - INFO - Step 3270: lr=1.00E-05, loss= 1.1405 (max= 1.6284), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:19:15,340 - root - INFO - Step 3270: lr=1.00E-05, loss= 1.1405 (max= 1.6284), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:19:15,340 - root - INFO - Step 3270: lr=1.00E-05, loss= 1.1405 (max= 1.6284), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:19:15,340 - root - INFO - Step 3270: lr=1.00E-05, loss= 1.1405 (max= 1.6284), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:19:15,340 - root - INFO - Step 3270: lr=1.00E-05, loss= 1.1405 (max= 1.6284), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:19:15,340 - root - INFO - Step 3270: lr=1.00E-05, loss= 1.1405 (max= 1.6284), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:19:15,340 - root - INFO - Step 
3270: lr=1.00E-05, loss= 1.1405 (max= 1.6284), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:19:15,340 - root - INFO - Step 3270: lr=1.00E-05, loss= 1.1405 (max= 1.6284), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:19:47,252 - root - INFO - Step 3280: lr=1.00E-05, loss= 1.1272 (max= 1.5909), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:19:47,252 - root - INFO - Step 3280: lr=1.00E-05, loss= 1.1272 (max= 1.5909), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:19:47,252 - root - INFO - Step 3280: lr=1.00E-05, loss= 1.1272 (max= 1.5909), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:19:47,252 - root - INFO - Step 3280: lr=1.00E-05, loss= 1.1272 (max= 1.5909), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:19:47,253 - root - INFO - Step 3280: lr=1.00E-05, loss= 1.1272 (max= 1.5909), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:19:47,253 - root - INFO - Step 3280: lr=1.00E-05, loss= 1.1272 (max= 1.5909), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:19:47,253 - root - INFO - Step 3280: lr=1.00E-05, loss= 1.1272 (max= 1.5909), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:19:47,253 - root - INFO - Step 3280: lr=1.00E-05, loss= 1.1272 (max= 1.5909), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:20:19,026 - root - INFO - Step 3290: lr=1.00E-05, loss= 1.1185 (max= 1.7277), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 13:20:19,026 - root - INFO - Step 3290: lr=1.00E-05, loss= 1.1185 (max= 1.7277), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:20:19,026 - root - INFO - Step 3290: lr=1.00E-05, loss= 1.1185 (max= 1.7277), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:20:19,026 - root - INFO - Step 3290: lr=1.00E-05, loss= 1.1185 (max= 1.7277), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:20:19,027 - root - INFO - Step 3290: lr=1.00E-05, loss= 1.1185 (max= 1.7277), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:20:19,027 - root - INFO - Step 3290: lr=1.00E-05, loss= 1.1185 (max= 1.7277), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:20:19,027 - root - INFO - Step 3290: lr=1.00E-05, loss= 1.1185 (max= 1.7277), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:20:19,027 - root - INFO - Step 3290: lr=1.00E-05, loss= 1.1185 (max= 1.7277), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:20:50,854 - root - INFO - Step 3300: lr=1.00E-05, loss= 1.1280 (max= 1.4932), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:20:50,854 - root - INFO - Step 3300: lr=1.00E-05, loss= 1.1280 (max= 1.4932), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:20:50,854 - root - INFO - Step 3300: lr=1.00E-05, loss= 1.1280 (max= 1.4932), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:20:50,854 - root - INFO - Step 3300: lr=1.00E-05, loss= 1.1280 (max= 1.4932), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:20:50,854 - root - INFO - Step 3300: lr=1.00E-05, loss= 1.1280 (max= 1.4932), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:20:50,854 - root - INFO - Step 3300: lr=1.00E-05, loss= 1.1280 (max= 1.4932), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:20:50,854 - root - INFO - Step 3300: lr=1.00E-05, loss= 1.1280 (max= 1.4932), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:20:50,854 - root - INFO - Step 3300: lr=1.00E-05, loss= 1.1280 (max= 1.4932), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:21:22,794 - root - INFO - Step 3310: lr=1.00E-05, loss= 1.1184 (max= 1.6871), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:21:22,794 - root - INFO - Step 3310: lr=1.00E-05, loss= 1.1184 (max= 1.6871), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:21:22,794 - root - INFO - Step 3310: lr=1.00E-05, loss= 1.1184 (max= 1.6871), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:21:22,794 - root - INFO - Step 3310: lr=1.00E-05, loss= 1.1184 (max= 1.6871), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:21:22,794 - root - INFO - Step 3310: lr=1.00E-05, loss= 1.1184 (max= 1.6871), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:21:22,794 - root - INFO - Step 3310: lr=1.00E-05, loss= 1.1184 (max= 1.6871), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:21:22,794 - root - INFO - Step 3310: lr=1.00E-05, loss= 1.1184 (max= 1.6871), tps=20521, 
mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:21:22,794 - root - INFO - Step 3310: lr=1.00E-05, loss= 1.1184 (max= 1.6871), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:21:54,575 - root - INFO - Step 3320: lr=1.00E-05, loss= 1.1161 (max= 1.6356), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:21:54,575 - root - INFO - Step 3320: lr=1.00E-05, loss= 1.1161 (max= 1.6356), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:21:54,575 - root - INFO - Step 3320: lr=1.00E-05, loss= 1.1161 (max= 1.6356), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:21:54,576 - root - INFO - Step 3320: lr=1.00E-05, loss= 1.1161 (max= 1.6356), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:21:54,576 - root - INFO - Step 3320: lr=1.00E-05, loss= 1.1161 (max= 1.6356), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:21:54,576 - root - INFO - Step 3320: lr=1.00E-05, loss= 1.1161 (max= 1.6356), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:21:54,576 - root - INFO - Step 3320: lr=1.00E-05, loss= 1.1161 (max= 1.6356), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:21:54,576 - root - INFO - Step 3320: lr=1.00E-05, loss= 1.1161 (max= 1.6356), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:22:26,397 - root - INFO - Step 3330: lr=1.00E-05, loss= 1.1329 (max= 1.5085), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:22:26,398 - root - INFO - Step 3330: lr=1.00E-05, 
loss= 1.1329 (max= 1.5085), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:22:26,398 - root - INFO - Step 3330: lr=1.00E-05, loss= 1.1329 (max= 1.5085), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:22:26,398 - root - INFO - Step 3330: lr=1.00E-05, loss= 1.1329 (max= 1.5085), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:22:26,398 - root - INFO - Step 3330: lr=1.00E-05, loss= 1.1329 (max= 1.5085), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:22:26,398 - root - INFO - Step 3330: lr=1.00E-05, loss= 1.1329 (max= 1.5085), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:22:26,398 - root - INFO - Step 3330: lr=1.00E-05, loss= 1.1329 (max= 1.5085), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:22:26,398 - root - INFO - Step 3330: lr=1.00E-05, loss= 1.1329 (max= 1.5085), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:22:58,221 - root - INFO - Step 3340: lr=1.00E-05, loss= 1.1185 (max= 1.6091), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:22:58,221 - root - INFO - Step 3340: lr=1.00E-05, loss= 1.1185 (max= 1.6091), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:22:58,221 - root - INFO - Step 3340: lr=1.00E-05, loss= 1.1185 (max= 1.6091), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:22:58,221 - root - INFO - Step 3340: lr=1.00E-05, loss= 1.1185 (max= 1.6091), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:22:58,221 - 
root - INFO - Step 3340: lr=1.00E-05, loss= 1.1185 (max= 1.6091), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:22:58,221 - root - INFO - Step 3340: lr=1.00E-05, loss= 1.1185 (max= 1.6091), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:22:58,221 - root - INFO - Step 3340: lr=1.00E-05, loss= 1.1185 (max= 1.6091), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:22:58,221 - root - INFO - Step 3340: lr=1.00E-05, loss= 1.1185 (max= 1.6091), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:23:30,037 - root - INFO - Step 3350: lr=1.00E-05, loss= 1.1276 (max= 1.5357), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:23:30,038 - root - INFO - Step 3350: lr=1.00E-05, loss= 1.1276 (max= 1.5357), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:23:30,038 - root - INFO - Step 3350: lr=1.00E-05, loss= 1.1276 (max= 1.5357), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:23:30,038 - root - INFO - Step 3350: lr=1.00E-05, loss= 1.1276 (max= 1.5357), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:23:30,038 - root - INFO - Step 3350: lr=1.00E-05, loss= 1.1276 (max= 1.5357), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:23:30,038 - root - INFO - Step 3350: lr=1.00E-05, loss= 1.1276 (max= 1.5357), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:23:30,038 - root - INFO - Step 3350: lr=1.00E-05, loss= 1.1276 (max= 1.5357), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 13:23:30,038 - root - INFO - Step 3350: lr=1.00E-05, loss= 1.1276 (max= 1.5357), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:24:01,819 - root - INFO - Step 3360: lr=1.00E-05, loss= 1.1293 (max= 1.6032), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:24:01,819 - root - INFO - Step 3360: lr=1.00E-05, loss= 1.1293 (max= 1.6032), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:24:01,819 - root - INFO - Step 3360: lr=1.00E-05, loss= 1.1293 (max= 1.6032), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:24:01,819 - root - INFO - Step 3360: lr=1.00E-05, loss= 1.1293 (max= 1.6032), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:24:01,819 - root - INFO - Step 3360: lr=1.00E-05, loss= 1.1293 (max= 1.6032), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:24:01,819 - root - INFO - Step 3360: lr=1.00E-05, loss= 1.1293 (max= 1.6032), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:24:01,819 - root - INFO - Step 3360: lr=1.00E-05, loss= 1.1293 (max= 1.6032), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:24:01,819 - root - INFO - Step 3360: lr=1.00E-05, loss= 1.1293 (max= 1.6032), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:24:33,726 - root - INFO - Step 3370: lr=1.00E-05, loss= 1.1145 (max= 1.5580), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:24:33,726 - root - INFO - Step 3370: lr=1.00E-05, loss= 1.1145 (max= 1.5580), tps=20542, mfu=42.80%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:24:33,726 - root - INFO - Step 3370: lr=1.00E-05, loss= 1.1145 (max= 1.5580), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:24:33,726 - root - INFO - Step 3370: lr=1.00E-05, loss= 1.1145 (max= 1.5580), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:24:33,726 - root - INFO - Step 3370: lr=1.00E-05, loss= 1.1145 (max= 1.5580), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:24:33,726 - root - INFO - Step 3370: lr=1.00E-05, loss= 1.1145 (max= 1.5580), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:24:33,726 - root - INFO - Step 3370: lr=1.00E-05, loss= 1.1145 (max= 1.5580), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:24:33,727 - root - INFO - Step 3370: lr=1.00E-05, loss= 1.1145 (max= 1.5580), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:25:05,563 - root - INFO - Step 3380: lr=1.00E-05, loss= 1.1394 (max= 1.6043), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:25:05,563 - root - INFO - Step 3380: lr=1.00E-05, loss= 1.1394 (max= 1.6043), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:25:05,563 - root - INFO - Step 3380: lr=1.00E-05, loss= 1.1394 (max= 1.6043), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:25:05,563 - root - INFO - Step 3380: lr=1.00E-05, loss= 1.1394 (max= 1.6043), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:25:05,563 - root - INFO - Step 3380: lr=1.00E-05, loss= 1.1394 (max= 
1.6043), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:25:05,563 - root - INFO - Step 3380: lr=1.00E-05, loss= 1.1394 (max= 1.6043), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:25:05,563 - root - INFO - Step 3380: lr=1.00E-05, loss= 1.1394 (max= 1.6043), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:25:05,563 - root - INFO - Step 3380: lr=1.00E-05, loss= 1.1394 (max= 1.6043), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:25:37,493 - root - INFO - Step 3390: lr=1.00E-05, loss= 1.1413 (max= 1.6018), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:25:37,493 - root - INFO - Step 3390: lr=1.00E-05, loss= 1.1413 (max= 1.6018), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:25:37,493 - root - INFO - Step 3390: lr=1.00E-05, loss= 1.1413 (max= 1.6018), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:25:37,493 - root - INFO - Step 3390: lr=1.00E-05, loss= 1.1413 (max= 1.6018), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:25:37,493 - root - INFO - Step 3390: lr=1.00E-05, loss= 1.1413 (max= 1.6018), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:25:37,493 - root - INFO - Step 3390: lr=1.00E-05, loss= 1.1413 (max= 1.6018), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:25:37,493 - root - INFO - Step 3390: lr=1.00E-05, loss= 1.1413 (max= 1.6018), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:25:37,493 - root - INFO - Step 
3390: lr=1.00E-05, loss= 1.1413 (max= 1.6018), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:26:09,368 - root - INFO - Step 3400: lr=1.00E-05, loss= 1.1271 (max= 1.7684), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:26:09,368 - root - INFO - Step 3400: lr=1.00E-05, loss= 1.1271 (max= 1.7684), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:26:09,368 - root - INFO - Step 3400: lr=1.00E-05, loss= 1.1271 (max= 1.7684), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:26:09,368 - root - INFO - Step 3400: lr=1.00E-05, loss= 1.1271 (max= 1.7684), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:26:09,368 - root - INFO - Step 3400: lr=1.00E-05, loss= 1.1271 (max= 1.7684), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:26:09,368 - root - INFO - Step 3400: lr=1.00E-05, loss= 1.1271 (max= 1.7684), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:26:09,368 - root - INFO - Step 3400: lr=1.00E-05, loss= 1.1271 (max= 1.7684), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:26:09,368 - root - INFO - Step 3400: lr=1.00E-05, loss= 1.1271 (max= 1.7684), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:26:41,276 - root - INFO - Step 3410: lr=1.00E-05, loss= 1.1069 (max= 1.5340), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:26:41,276 - root - INFO - Step 3410: lr=1.00E-05, loss= 1.1069 (max= 1.5340), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 13:26:41,276 - root - INFO - Step 3410: lr=1.00E-05, loss= 1.1069 (max= 1.5340), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:26:41,276 - root - INFO - Step 3410: lr=1.00E-05, loss= 1.1069 (max= 1.5340), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:26:41,276 - root - INFO - Step 3410: lr=1.00E-05, loss= 1.1069 (max= 1.5340), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:26:41,276 - root - INFO - Step 3410: lr=1.00E-05, loss= 1.1069 (max= 1.5340), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:26:41,276 - root - INFO - Step 3410: lr=1.00E-05, loss= 1.1069 (max= 1.5340), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:26:41,276 - root - INFO - Step 3410: lr=1.00E-05, loss= 1.1069 (max= 1.5340), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:27:13,097 - root - INFO - Step 3420: lr=1.00E-05, loss= 1.0983 (max= 1.5735), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:27:13,097 - root - INFO - Step 3420: lr=1.00E-05, loss= 1.0983 (max= 1.5735), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:27:13,097 - root - INFO - Step 3420: lr=1.00E-05, loss= 1.0983 (max= 1.5735), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:27:13,097 - root - INFO - Step 3420: lr=1.00E-05, loss= 1.0983 (max= 1.5735), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:27:13,097 - root - INFO - Step 3420: lr=1.00E-05, loss= 1.0983 (max= 1.5735), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:27:13,097 - root - INFO - Step 3420: lr=1.00E-05, loss= 1.0983 (max= 1.5735), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:27:13,097 - root - INFO - Step 3420: lr=1.00E-05, loss= 1.0983 (max= 1.5735), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:27:13,097 - root - INFO - Step 3420: lr=1.00E-05, loss= 1.0983 (max= 1.5735), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:27:44,951 - root - INFO - Step 3430: lr=1.00E-05, loss= 1.1041 (max= 1.5519), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:27:44,951 - root - INFO - Step 3430: lr=1.00E-05, loss= 1.1041 (max= 1.5519), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:27:44,951 - root - INFO - Step 3430: lr=1.00E-05, loss= 1.1041 (max= 1.5519), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:27:44,951 - root - INFO - Step 3430: lr=1.00E-05, loss= 1.1041 (max= 1.5519), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:27:44,951 - root - INFO - Step 3430: lr=1.00E-05, loss= 1.1041 (max= 1.5519), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:27:44,951 - root - INFO - Step 3430: lr=1.00E-05, loss= 1.1041 (max= 1.5519), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:27:44,951 - root - INFO - Step 3430: lr=1.00E-05, loss= 1.1041 (max= 1.5519), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:27:44,951 - root - INFO - Step 3430: lr=1.00E-05, loss= 1.1041 (max= 1.5519), tps=20576, 
mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:28:16,822 - root - INFO - Step 3440: lr=1.00E-05, loss= 1.1388 (max= 1.7069), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:28:16,822 - root - INFO - Step 3440: lr=1.00E-05, loss= 1.1388 (max= 1.7069), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:28:16,822 - root - INFO - Step 3440: lr=1.00E-05, loss= 1.1388 (max= 1.7069), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:28:16,822 - root - INFO - Step 3440: lr=1.00E-05, loss= 1.1388 (max= 1.7069), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:28:16,822 - root - INFO - Step 3440: lr=1.00E-05, loss= 1.1388 (max= 1.7069), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:28:16,822 - root - INFO - Step 3440: lr=1.00E-05, loss= 1.1388 (max= 1.7069), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:28:16,822 - root - INFO - Step 3440: lr=1.00E-05, loss= 1.1388 (max= 1.7069), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:28:16,822 - root - INFO - Step 3440: lr=1.00E-05, loss= 1.1388 (max= 1.7069), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:28:48,611 - root - INFO - Step 3450: lr=1.00E-05, loss= 1.1139 (max= 1.7077), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:28:48,611 - root - INFO - Step 3450: lr=1.00E-05, loss= 1.1139 (max= 1.7077), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:28:48,611 - root - INFO - Step 3450: lr=1.00E-05, 
loss= 1.1139 (max= 1.7077), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:28:48,612 - root - INFO - Step 3450: lr=1.00E-05, loss= 1.1139 (max= 1.7077), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:28:48,612 - root - INFO - Step 3450: lr=1.00E-05, loss= 1.1139 (max= 1.7077), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:28:48,612 - root - INFO - Step 3450: lr=1.00E-05, loss= 1.1139 (max= 1.7077), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:28:48,612 - root - INFO - Step 3450: lr=1.00E-05, loss= 1.1139 (max= 1.7077), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:28:48,612 - root - INFO - Step 3450: lr=1.00E-05, loss= 1.1139 (max= 1.7077), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:29:20,503 - root - INFO - Step 3460: lr=1.00E-05, loss= 1.1309 (max= 1.4583), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:29:20,503 - root - INFO - Step 3460: lr=1.00E-05, loss= 1.1309 (max= 1.4583), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:29:20,503 - root - INFO - Step 3460: lr=1.00E-05, loss= 1.1309 (max= 1.4583), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:29:20,503 - root - INFO - Step 3460: lr=1.00E-05, loss= 1.1309 (max= 1.4583), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:29:20,503 - root - INFO - Step 3460: lr=1.00E-05, loss= 1.1309 (max= 1.4583), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:29:20,503 - 
root - INFO - Step 3460: lr=1.00E-05, loss= 1.1309 (max= 1.4583), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:29:20,503 - root - INFO - Step 3460: lr=1.00E-05, loss= 1.1309 (max= 1.4583), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:29:20,503 - root - INFO - Step 3460: lr=1.00E-05, loss= 1.1309 (max= 1.4583), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:29:52,377 - root - INFO - Step 3470: lr=1.00E-05, loss= 1.1117 (max= 1.7213), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:29:52,377 - root - INFO - Step 3470: lr=1.00E-05, loss= 1.1117 (max= 1.7213), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:29:52,377 - root - INFO - Step 3470: lr=1.00E-05, loss= 1.1117 (max= 1.7213), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:29:52,377 - root - INFO - Step 3470: lr=1.00E-05, loss= 1.1117 (max= 1.7213), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:29:52,377 - root - INFO - Step 3470: lr=1.00E-05, loss= 1.1117 (max= 1.7213), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:29:52,377 - root - INFO - Step 3470: lr=1.00E-05, loss= 1.1117 (max= 1.7213), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:29:52,378 - root - INFO - Step 3470: lr=1.00E-05, loss= 1.1117 (max= 1.7213), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:29:52,378 - root - INFO - Step 3470: lr=1.00E-05, loss= 1.1117 (max= 1.7213), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 13:30:24,254 - root - INFO - Step 3480: lr=1.00E-05, loss= 1.1211 (max= 1.5747), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:30:24,254 - root - INFO - Step 3480: lr=1.00E-05, loss= 1.1211 (max= 1.5747), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:30:24,254 - root - INFO - Step 3480: lr=1.00E-05, loss= 1.1211 (max= 1.5747), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:30:24,254 - root - INFO - Step 3480: lr=1.00E-05, loss= 1.1211 (max= 1.5747), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:30:24,254 - root - INFO - Step 3480: lr=1.00E-05, loss= 1.1211 (max= 1.5747), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:30:24,254 - root - INFO - Step 3480: lr=1.00E-05, loss= 1.1211 (max= 1.5747), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:30:24,254 - root - INFO - Step 3480: lr=1.00E-05, loss= 1.1211 (max= 1.5747), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:30:24,254 - root - INFO - Step 3480: lr=1.00E-05, loss= 1.1211 (max= 1.5747), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:30:56,065 - root - INFO - Step 3490: lr=1.00E-05, loss= 1.1207 (max= 1.5076), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:30:56,065 - root - INFO - Step 3490: lr=1.00E-05, loss= 1.1207 (max= 1.5076), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:30:56,065 - root - INFO - Step 3490: lr=1.00E-05, loss= 1.1207 (max= 1.5076), tps=20604, mfu=42.93%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:30:56,065 - root - INFO - Step 3490: lr=1.00E-05, loss= 1.1207 (max= 1.5076), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:30:56,065 - root - INFO - Step 3490: lr=1.00E-05, loss= 1.1207 (max= 1.5076), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:30:56,065 - root - INFO - Step 3490: lr=1.00E-05, loss= 1.1207 (max= 1.5076), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:30:56,065 - root - INFO - Step 3490: lr=1.00E-05, loss= 1.1207 (max= 1.5076), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:30:56,065 - root - INFO - Step 3490: lr=1.00E-05, loss= 1.1207 (max= 1.5076), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:31:27,972 - root - INFO - Step 3500: lr=1.00E-05, loss= 1.1619 (max= 1.5882), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:31:27,972 - root - INFO - Step 3500: lr=1.00E-05, loss= 1.1619 (max= 1.5882), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:31:27,972 - root - INFO - Step 3500: lr=1.00E-05, loss= 1.1619 (max= 1.5882), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:31:27,972 - root - INFO - Step 3500: lr=1.00E-05, loss= 1.1619 (max= 1.5882), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:31:27,972 - root - INFO - Step 3500: lr=1.00E-05, loss= 1.1619 (max= 1.5882), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:31:27,972 - root - INFO - Step 3500: lr=1.00E-05, loss= 1.1619 (max= 
1.5882), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:31:27,972 - root - INFO - Step 3500: lr=1.00E-05, loss= 1.1619 (max= 1.5882), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:31:27,972 - root - INFO - Step 3500: lr=1.00E-05, loss= 1.1619 (max= 1.5882), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:31:59,897 - root - INFO - Step 3510: lr=1.00E-05, loss= 1.1127 (max= 1.5750), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:31:59,898 - root - INFO - Step 3510: lr=1.00E-05, loss= 1.1127 (max= 1.5750), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:31:59,898 - root - INFO - Step 3510: lr=1.00E-05, loss= 1.1127 (max= 1.5750), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:31:59,898 - root - INFO - Step 3510: lr=1.00E-05, loss= 1.1127 (max= 1.5750), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:31:59,898 - root - INFO - Step 3510: lr=1.00E-05, loss= 1.1127 (max= 1.5750), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:31:59,898 - root - INFO - Step 3510: lr=1.00E-05, loss= 1.1127 (max= 1.5750), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:31:59,898 - root - INFO - Step 3510: lr=1.00E-05, loss= 1.1127 (max= 1.5750), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:31:59,898 - root - INFO - Step 3510: lr=1.00E-05, loss= 1.1127 (max= 1.5750), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:32:31,758 - root - INFO - Step 
3520: lr=1.00E-05, loss= 1.1114 (max= 1.5648), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:32:31,758 - root - INFO - Step 3520: lr=1.00E-05, loss= 1.1114 (max= 1.5648), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:32:31,758 - root - INFO - Step 3520: lr=1.00E-05, loss= 1.1114 (max= 1.5648), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:32:31,758 - root - INFO - Step 3520: lr=1.00E-05, loss= 1.1114 (max= 1.5648), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:32:31,758 - root - INFO - Step 3520: lr=1.00E-05, loss= 1.1114 (max= 1.5648), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:32:31,758 - root - INFO - Step 3520: lr=1.00E-05, loss= 1.1114 (max= 1.5648), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:32:31,758 - root - INFO - Step 3520: lr=1.00E-05, loss= 1.1114 (max= 1.5648), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:32:31,758 - root - INFO - Step 3520: lr=1.00E-05, loss= 1.1114 (max= 1.5648), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:33:03,632 - root - INFO - Step 3530: lr=1.00E-05, loss= 1.1285 (max= 1.6891), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:33:03,632 - root - INFO - Step 3530: lr=1.00E-05, loss= 1.1285 (max= 1.6891), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:33:03,632 - root - INFO - Step 3530: lr=1.00E-05, loss= 1.1285 (max= 1.6891), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 13:33:03,632 - root - INFO - Step 3530: lr=1.00E-05, loss= 1.1285 (max= 1.6891), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:33:03,632 - root - INFO - Step 3530: lr=1.00E-05, loss= 1.1285 (max= 1.6891), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:33:03,632 - root - INFO - Step 3530: lr=1.00E-05, loss= 1.1285 (max= 1.6891), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:33:03,632 - root - INFO - Step 3530: lr=1.00E-05, loss= 1.1285 (max= 1.6891), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:33:03,632 - root - INFO - Step 3530: lr=1.00E-05, loss= 1.1285 (max= 1.6891), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:33:35,506 - root - INFO - Step 3540: lr=1.00E-05, loss= 1.1292 (max= 1.6304), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:33:35,506 - root - INFO - Step 3540: lr=1.00E-05, loss= 1.1292 (max= 1.6304), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:33:35,506 - root - INFO - Step 3540: lr=1.00E-05, loss= 1.1292 (max= 1.6304), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:33:35,506 - root - INFO - Step 3540: lr=1.00E-05, loss= 1.1292 (max= 1.6304), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:33:35,506 - root - INFO - Step 3540: lr=1.00E-05, loss= 1.1292 (max= 1.6304), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:33:35,506 - root - INFO - Step 3540: lr=1.00E-05, loss= 1.1292 (max= 1.6304), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:33:35,506 - root - INFO - Step 3540: lr=1.00E-05, loss= 1.1292 (max= 1.6304), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:33:35,506 - root - INFO - Step 3540: lr=1.00E-05, loss= 1.1292 (max= 1.6304), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:34:07,313 - root - INFO - Step 3550: lr=1.00E-05, loss= 1.1258 (max= 1.5927), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:34:07,313 - root - INFO - Step 3550: lr=1.00E-05, loss= 1.1258 (max= 1.5927), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:34:07,313 - root - INFO - Step 3550: lr=1.00E-05, loss= 1.1258 (max= 1.5927), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:34:07,313 - root - INFO - Step 3550: lr=1.00E-05, loss= 1.1258 (max= 1.5927), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:34:07,313 - root - INFO - Step 3550: lr=1.00E-05, loss= 1.1258 (max= 1.5927), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:34:07,314 - root - INFO - Step 3550: lr=1.00E-05, loss= 1.1258 (max= 1.5927), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:34:07,314 - root - INFO - Step 3550: lr=1.00E-05, loss= 1.1258 (max= 1.5927), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:34:07,314 - root - INFO - Step 3550: lr=1.00E-05, loss= 1.1258 (max= 1.5927), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:34:39,168 - root - INFO - Step 3560: lr=1.00E-05, loss= 1.1299 (max= 1.4828), tps=20576, 
mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:34:39,168 - root - INFO - Step 3560: lr=1.00E-05, loss= 1.1299 (max= 1.4828), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:34:39,168 - root - INFO - Step 3560: lr=1.00E-05, loss= 1.1299 (max= 1.4828), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:34:39,168 - root - INFO - Step 3560: lr=1.00E-05, loss= 1.1299 (max= 1.4828), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:34:39,168 - root - INFO - Step 3560: lr=1.00E-05, loss= 1.1299 (max= 1.4828), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:34:39,168 - root - INFO - Step 3560: lr=1.00E-05, loss= 1.1299 (max= 1.4828), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:34:39,168 - root - INFO - Step 3560: lr=1.00E-05, loss= 1.1299 (max= 1.4828), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:34:39,169 - root - INFO - Step 3560: lr=1.00E-05, loss= 1.1299 (max= 1.4828), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:35:11,010 - root - INFO - Step 3570: lr=1.00E-05, loss= 1.1354 (max= 1.6221), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:35:11,010 - root - INFO - Step 3570: lr=1.00E-05, loss= 1.1354 (max= 1.6221), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:35:11,010 - root - INFO - Step 3570: lr=1.00E-05, loss= 1.1354 (max= 1.6221), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:35:11,010 - root - INFO - Step 3570: lr=1.00E-05, 
loss= 1.1354 (max= 1.6221), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:35:11,010 - root - INFO - Step 3570: lr=1.00E-05, loss= 1.1354 (max= 1.6221), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:35:11,010 - root - INFO - Step 3570: lr=1.00E-05, loss= 1.1354 (max= 1.6221), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:35:11,010 - root - INFO - Step 3570: lr=1.00E-05, loss= 1.1354 (max= 1.6221), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:35:11,010 - root - INFO - Step 3570: lr=1.00E-05, loss= 1.1354 (max= 1.6221), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:35:42,864 - root - INFO - Step 3580: lr=1.00E-05, loss= 1.1009 (max= 1.8996), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:35:42,864 - root - INFO - Step 3580: lr=1.00E-05, loss= 1.1009 (max= 1.8996), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:35:42,864 - root - INFO - Step 3580: lr=1.00E-05, loss= 1.1009 (max= 1.8996), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:35:42,864 - root - INFO - Step 3580: lr=1.00E-05, loss= 1.1009 (max= 1.8996), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:35:42,864 - root - INFO - Step 3580: lr=1.00E-05, loss= 1.1009 (max= 1.8996), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:35:42,864 - root - INFO - Step 3580: lr=1.00E-05, loss= 1.1009 (max= 1.8996), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:35:42,864 - 
root - INFO - Step 3580: lr=1.00E-05, loss= 1.1009 (max= 1.8996), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:35:42,864 - root - INFO - Step 3580: lr=1.00E-05, loss= 1.1009 (max= 1.8996), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:36:14,768 - root - INFO - Step 3590: lr=1.00E-05, loss= 1.1150 (max= 1.5341), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:36:14,768 - root - INFO - Step 3590: lr=1.00E-05, loss= 1.1150 (max= 1.5341), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:36:14,768 - root - INFO - Step 3590: lr=1.00E-05, loss= 1.1150 (max= 1.5341), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:36:14,768 - root - INFO - Step 3590: lr=1.00E-05, loss= 1.1150 (max= 1.5341), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:36:14,768 - root - INFO - Step 3590: lr=1.00E-05, loss= 1.1150 (max= 1.5341), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:36:14,768 - root - INFO - Step 3590: lr=1.00E-05, loss= 1.1150 (max= 1.5341), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:36:14,768 - root - INFO - Step 3590: lr=1.00E-05, loss= 1.1150 (max= 1.5341), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:36:14,768 - root - INFO - Step 3590: lr=1.00E-05, loss= 1.1150 (max= 1.5341), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:36:46,621 - root - INFO - Step 3600: lr=1.00E-05, loss= 1.1163 (max= 1.5905), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 13:36:46,621 - root - INFO - Step 3600: lr=1.00E-05, loss= 1.1163 (max= 1.5905), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:36:46,621 - root - INFO - Step 3600: lr=1.00E-05, loss= 1.1163 (max= 1.5905), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:36:46,621 - root - INFO - Step 3600: lr=1.00E-05, loss= 1.1163 (max= 1.5905), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:36:46,621 - root - INFO - Step 3600: lr=1.00E-05, loss= 1.1163 (max= 1.5905), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:36:46,621 - root - INFO - Step 3600: lr=1.00E-05, loss= 1.1163 (max= 1.5905), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:36:46,621 - root - INFO - Step 3600: lr=1.00E-05, loss= 1.1163 (max= 1.5905), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:36:46,621 - root - INFO - Step 3600: lr=1.00E-05, loss= 1.1163 (max= 1.5905), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:37:18,483 - root - INFO - Step 3610: lr=1.00E-05, loss= 1.1032 (max= 1.6156), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:37:18,483 - root - INFO - Step 3610: lr=1.00E-05, loss= 1.1032 (max= 1.6156), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:37:18,483 - root - INFO - Step 3610: lr=1.00E-05, loss= 1.1032 (max= 1.6156), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:37:18,483 - root - INFO - Step 3610: lr=1.00E-05, loss= 1.1032 (max= 1.6156), tps=20571, mfu=42.86%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:37:18,483 - root - INFO - Step 3610: lr=1.00E-05, loss= 1.1032 (max= 1.6156), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:37:18,484 - root - INFO - Step 3610: lr=1.00E-05, loss= 1.1032 (max= 1.6156), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:37:18,484 - root - INFO - Step 3610: lr=1.00E-05, loss= 1.1032 (max= 1.6156), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:37:18,484 - root - INFO - Step 3610: lr=1.00E-05, loss= 1.1032 (max= 1.6156), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:37:50,323 - root - INFO - Step 3620: lr=1.00E-05, loss= 1.0915 (max= 1.6108), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:37:50,323 - root - INFO - Step 3620: lr=1.00E-05, loss= 1.0915 (max= 1.6108), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:37:50,323 - root - INFO - Step 3620: lr=1.00E-05, loss= 1.0915 (max= 1.6108), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:37:50,323 - root - INFO - Step 3620: lr=1.00E-05, loss= 1.0915 (max= 1.6108), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:37:50,323 - root - INFO - Step 3620: lr=1.00E-05, loss= 1.0915 (max= 1.6108), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:37:50,323 - root - INFO - Step 3620: lr=1.00E-05, loss= 1.0915 (max= 1.6108), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:37:50,323 - root - INFO - Step 3620: lr=1.00E-05, loss= 1.0915 (max= 
1.6108), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:37:50,323 - root - INFO - Step 3620: lr=1.00E-05, loss= 1.0915 (max= 1.6108), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:38:22,179 - root - INFO - Step 3630: lr=1.00E-05, loss= 1.1030 (max= 1.4982), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:38:22,179 - root - INFO - Step 3630: lr=1.00E-05, loss= 1.1030 (max= 1.4982), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:38:22,179 - root - INFO - Step 3630: lr=1.00E-05, loss= 1.1030 (max= 1.4982), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:38:22,179 - root - INFO - Step 3630: lr=1.00E-05, loss= 1.1030 (max= 1.4982), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:38:22,179 - root - INFO - Step 3630: lr=1.00E-05, loss= 1.1030 (max= 1.4982), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:38:22,179 - root - INFO - Step 3630: lr=1.00E-05, loss= 1.1030 (max= 1.4982), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:38:22,179 - root - INFO - Step 3630: lr=1.00E-05, loss= 1.1030 (max= 1.4982), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:38:22,179 - root - INFO - Step 3630: lr=1.00E-05, loss= 1.1030 (max= 1.4982), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:38:54,074 - root - INFO - Step 3640: lr=1.00E-05, loss= 1.1031 (max= 1.9180), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:38:54,074 - root - INFO - Step 
3640: lr=1.00E-05, loss= 1.1031 (max= 1.9180), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:38:54,074 - root - INFO - Step 3640: lr=1.00E-05, loss= 1.1031 (max= 1.9180), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:38:54,074 - root - INFO - Step 3640: lr=1.00E-05, loss= 1.1031 (max= 1.9180), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:38:54,075 - root - INFO - Step 3640: lr=1.00E-05, loss= 1.1031 (max= 1.9180), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:38:54,075 - root - INFO - Step 3640: lr=1.00E-05, loss= 1.1031 (max= 1.9180), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:38:54,075 - root - INFO - Step 3640: lr=1.00E-05, loss= 1.1031 (max= 1.9180), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:38:54,075 - root - INFO - Step 3640: lr=1.00E-05, loss= 1.1031 (max= 1.9180), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:39:25,910 - root - INFO - Step 3650: lr=1.00E-05, loss= 1.1296 (max= 1.6018), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:39:25,910 - root - INFO - Step 3650: lr=1.00E-05, loss= 1.1296 (max= 1.6018), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:39:25,910 - root - INFO - Step 3650: lr=1.00E-05, loss= 1.1296 (max= 1.6018), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:39:25,910 - root - INFO - Step 3650: lr=1.00E-05, loss= 1.1296 (max= 1.6018), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 13:39:25,910 - root - INFO - Step 3650: lr=1.00E-05, loss= 1.1296 (max= 1.6018), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:39:25,910 - root - INFO - Step 3650: lr=1.00E-05, loss= 1.1296 (max= 1.6018), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:39:25,910 - root - INFO - Step 3650: lr=1.00E-05, loss= 1.1296 (max= 1.6018), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:39:25,910 - root - INFO - Step 3650: lr=1.00E-05, loss= 1.1296 (max= 1.6018), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:39:57,750 - root - INFO - Step 3660: lr=1.00E-05, loss= 1.1229 (max= 1.5984), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:39:57,750 - root - INFO - Step 3660: lr=1.00E-05, loss= 1.1229 (max= 1.5984), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:39:57,750 - root - INFO - Step 3660: lr=1.00E-05, loss= 1.1229 (max= 1.5984), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:39:57,750 - root - INFO - Step 3660: lr=1.00E-05, loss= 1.1229 (max= 1.5984), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:39:57,750 - root - INFO - Step 3660: lr=1.00E-05, loss= 1.1229 (max= 1.5984), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:39:57,750 - root - INFO - Step 3660: lr=1.00E-05, loss= 1.1229 (max= 1.5984), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:39:57,750 - root - INFO - Step 3660: lr=1.00E-05, loss= 1.1229 (max= 1.5984), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:39:57,750 - root - INFO - Step 3660: lr=1.00E-05, loss= 1.1229 (max= 1.5984), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:40:29,538 - root - INFO - Step 3670: lr=1.00E-05, loss= 1.1339 (max= 1.6572), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:40:29,538 - root - INFO - Step 3670: lr=1.00E-05, loss= 1.1339 (max= 1.6572), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:40:29,538 - root - INFO - Step 3670: lr=1.00E-05, loss= 1.1339 (max= 1.6572), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:40:29,538 - root - INFO - Step 3670: lr=1.00E-05, loss= 1.1339 (max= 1.6572), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:40:29,538 - root - INFO - Step 3670: lr=1.00E-05, loss= 1.1339 (max= 1.6572), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:40:29,538 - root - INFO - Step 3670: lr=1.00E-05, loss= 1.1339 (max= 1.6572), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:40:29,539 - root - INFO - Step 3670: lr=1.00E-05, loss= 1.1339 (max= 1.6572), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:40:29,539 - root - INFO - Step 3670: lr=1.00E-05, loss= 1.1339 (max= 1.6572), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:41:01,344 - root - INFO - Step 3680: lr=1.00E-05, loss= 1.1126 (max= 1.6058), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:41:01,344 - root - INFO - Step 3680: lr=1.00E-05, loss= 1.1126 (max= 1.6058), tps=20607, 
mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:41:01,344 - root - INFO - Step 3680: lr=1.00E-05, loss= 1.1126 (max= 1.6058), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:41:01,344 - root - INFO - Step 3680: lr=1.00E-05, loss= 1.1126 (max= 1.6058), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:41:01,345 - root - INFO - Step 3680: lr=1.00E-05, loss= 1.1126 (max= 1.6058), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:41:01,345 - root - INFO - Step 3680: lr=1.00E-05, loss= 1.1126 (max= 1.6058), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:41:01,345 - root - INFO - Step 3680: lr=1.00E-05, loss= 1.1126 (max= 1.6058), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:41:01,345 - root - INFO - Step 3680: lr=1.00E-05, loss= 1.1126 (max= 1.6058), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:41:33,212 - root - INFO - Step 3690: lr=1.00E-05, loss= 1.1076 (max= 1.5010), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:41:33,213 - root - INFO - Step 3690: lr=1.00E-05, loss= 1.1076 (max= 1.5010), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:41:33,213 - root - INFO - Step 3690: lr=1.00E-05, loss= 1.1076 (max= 1.5010), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:41:33,213 - root - INFO - Step 3690: lr=1.00E-05, loss= 1.1076 (max= 1.5010), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:41:33,213 - root - INFO - Step 3690: lr=1.00E-05, 
loss= 1.1076 (max= 1.5010), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:41:33,213 - root - INFO - Step 3690: lr=1.00E-05, loss= 1.1076 (max= 1.5010), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:41:33,213 - root - INFO - Step 3690: lr=1.00E-05, loss= 1.1076 (max= 1.5010), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:41:33,213 - root - INFO - Step 3690: lr=1.00E-05, loss= 1.1076 (max= 1.5010), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:42:05,143 - root - INFO - Step 3700: lr=1.00E-05, loss= 1.1392 (max= 1.6353), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:42:05,143 - root - INFO - Step 3700: lr=1.00E-05, loss= 1.1392 (max= 1.6353), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:42:05,143 - root - INFO - Step 3700: lr=1.00E-05, loss= 1.1392 (max= 1.6353), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:42:05,143 - root - INFO - Step 3700: lr=1.00E-05, loss= 1.1392 (max= 1.6353), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:42:05,144 - root - INFO - Step 3700: lr=1.00E-05, loss= 1.1392 (max= 1.6353), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:42:05,144 - root - INFO - Step 3700: lr=1.00E-05, loss= 1.1392 (max= 1.6353), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:42:05,144 - root - INFO - Step 3700: lr=1.00E-05, loss= 1.1392 (max= 1.6353), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:42:05,144 - 
root - INFO - Step 3700: lr=1.00E-05, loss= 1.1392 (max= 1.6353), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:42:36,954 - root - INFO - Step 3710: lr=1.00E-05, loss= 1.1208 (max= 1.8052), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:42:36,954 - root - INFO - Step 3710: lr=1.00E-05, loss= 1.1208 (max= 1.8052), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:42:36,954 - root - INFO - Step 3710: lr=1.00E-05, loss= 1.1208 (max= 1.8052), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:42:36,954 - root - INFO - Step 3710: lr=1.00E-05, loss= 1.1208 (max= 1.8052), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:42:36,955 - root - INFO - Step 3710: lr=1.00E-05, loss= 1.1208 (max= 1.8052), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:42:36,955 - root - INFO - Step 3710: lr=1.00E-05, loss= 1.1208 (max= 1.8052), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:42:36,955 - root - INFO - Step 3710: lr=1.00E-05, loss= 1.1208 (max= 1.8052), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:42:36,955 - root - INFO - Step 3710: lr=1.00E-05, loss= 1.1208 (max= 1.8052), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:43:08,823 - root - INFO - Step 3720: lr=1.00E-05, loss= 1.1249 (max= 1.5077), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:43:08,823 - root - INFO - Step 3720: lr=1.00E-05, loss= 1.1249 (max= 1.5077), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 13:43:08,823 - root - INFO - Step 3720: lr=1.00E-05, loss= 1.1249 (max= 1.5077), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:43:08,823 - root - INFO - Step 3720: lr=1.00E-05, loss= 1.1249 (max= 1.5077), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:43:08,823 - root - INFO - Step 3720: lr=1.00E-05, loss= 1.1249 (max= 1.5077), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:43:08,823 - root - INFO - Step 3720: lr=1.00E-05, loss= 1.1249 (max= 1.5077), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:43:08,824 - root - INFO - Step 3720: lr=1.00E-05, loss= 1.1249 (max= 1.5077), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:43:08,824 - root - INFO - Step 3720: lr=1.00E-05, loss= 1.1249 (max= 1.5077), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:43:40,721 - root - INFO - Step 3730: lr=1.00E-05, loss= 1.1085 (max= 1.5465), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:43:40,721 - root - INFO - Step 3730: lr=1.00E-05, loss= 1.1085 (max= 1.5465), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:43:40,721 - root - INFO - Step 3730: lr=1.00E-05, loss= 1.1085 (max= 1.5465), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:43:40,721 - root - INFO - Step 3730: lr=1.00E-05, loss= 1.1085 (max= 1.5465), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:43:40,721 - root - INFO - Step 3730: lr=1.00E-05, loss= 1.1085 (max= 1.5465), tps=20548, mfu=42.81%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:43:40,721 - root - INFO - Step 3730: lr=1.00E-05, loss= 1.1085 (max= 1.5465), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:43:40,721 - root - INFO - Step 3730: lr=1.00E-05, loss= 1.1085 (max= 1.5465), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:43:40,721 - root - INFO - Step 3730: lr=1.00E-05, loss= 1.1085 (max= 1.5465), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:44:12,522 - root - INFO - Step 3740: lr=1.00E-05, loss= 1.0902 (max= 1.6180), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:44:12,522 - root - INFO - Step 3740: lr=1.00E-05, loss= 1.0902 (max= 1.6180), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:44:12,522 - root - INFO - Step 3740: lr=1.00E-05, loss= 1.0902 (max= 1.6180), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:44:12,522 - root - INFO - Step 3740: lr=1.00E-05, loss= 1.0902 (max= 1.6180), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:44:12,522 - root - INFO - Step 3740: lr=1.00E-05, loss= 1.0902 (max= 1.6180), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:44:12,522 - root - INFO - Step 3740: lr=1.00E-05, loss= 1.0902 (max= 1.6180), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:44:12,522 - root - INFO - Step 3740: lr=1.00E-05, loss= 1.0902 (max= 1.6180), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:44:12,522 - root - INFO - Step 3740: lr=1.00E-05, loss= 1.0902 (max= 
1.6180), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:44:44,398 - root - INFO - Step 3750: lr=1.00E-05, loss= 1.1240 (max= 1.4919), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:44:44,398 - root - INFO - Step 3750: lr=1.00E-05, loss= 1.1240 (max= 1.4919), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:44:44,398 - root - INFO - Step 3750: lr=1.00E-05, loss= 1.1240 (max= 1.4919), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:44:44,398 - root - INFO - Step 3750: lr=1.00E-05, loss= 1.1240 (max= 1.4919), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:44:44,398 - root - INFO - Step 3750: lr=1.00E-05, loss= 1.1240 (max= 1.4919), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:44:44,398 - root - INFO - Step 3750: lr=1.00E-05, loss= 1.1240 (max= 1.4919), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:44:44,399 - root - INFO - Step 3750: lr=1.00E-05, loss= 1.1240 (max= 1.4919), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:44:44,399 - root - INFO - Step 3750: lr=1.00E-05, loss= 1.1240 (max= 1.4919), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:45:16,313 - root - INFO - Step 3760: lr=1.00E-05, loss= 1.1328 (max= 1.6798), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:45:16,313 - root - INFO - Step 3760: lr=1.00E-05, loss= 1.1328 (max= 1.6798), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:45:16,313 - root - INFO - Step 
3760: lr=1.00E-05, loss= 1.1328 (max= 1.6798), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:45:16,313 - root - INFO - Step 3760: lr=1.00E-05, loss= 1.1328 (max= 1.6798), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:45:16,313 - root - INFO - Step 3760: lr=1.00E-05, loss= 1.1328 (max= 1.6798), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:45:16,313 - root - INFO - Step 3760: lr=1.00E-05, loss= 1.1328 (max= 1.6798), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:45:16,313 - root - INFO - Step 3760: lr=1.00E-05, loss= 1.1328 (max= 1.6798), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:45:16,313 - root - INFO - Step 3760: lr=1.00E-05, loss= 1.1328 (max= 1.6798), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:45:48,201 - root - INFO - Step 3770: lr=1.00E-05, loss= 1.1151 (max= 1.5245), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:45:48,201 - root - INFO - Step 3770: lr=1.00E-05, loss= 1.1151 (max= 1.5245), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:45:48,201 - root - INFO - Step 3770: lr=1.00E-05, loss= 1.1151 (max= 1.5245), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:45:48,201 - root - INFO - Step 3770: lr=1.00E-05, loss= 1.1151 (max= 1.5245), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:45:48,201 - root - INFO - Step 3770: lr=1.00E-05, loss= 1.1151 (max= 1.5245), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 13:45:48,201 - root - INFO - Step 3770: lr=1.00E-05, loss= 1.1151 (max= 1.5245), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:45:48,201 - root - INFO - Step 3770: lr=1.00E-05, loss= 1.1151 (max= 1.5245), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:45:48,201 - root - INFO - Step 3770: lr=1.00E-05, loss= 1.1151 (max= 1.5245), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:46:20,031 - root - INFO - Step 3780: lr=1.00E-05, loss= 1.1109 (max= 1.5715), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:46:20,031 - root - INFO - Step 3780: lr=1.00E-05, loss= 1.1109 (max= 1.5715), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:46:20,031 - root - INFO - Step 3780: lr=1.00E-05, loss= 1.1109 (max= 1.5715), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:46:20,031 - root - INFO - Step 3780: lr=1.00E-05, loss= 1.1109 (max= 1.5715), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:46:20,031 - root - INFO - Step 3780: lr=1.00E-05, loss= 1.1109 (max= 1.5715), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:46:20,031 - root - INFO - Step 3780: lr=1.00E-05, loss= 1.1109 (max= 1.5715), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:46:20,031 - root - INFO - Step 3780: lr=1.00E-05, loss= 1.1109 (max= 1.5715), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:46:20,031 - root - INFO - Step 3780: lr=1.00E-05, loss= 1.1109 (max= 1.5715), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:46:51,835 - root - INFO - Step 3790: lr=1.00E-05, loss= 1.1280 (max= 1.7894), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:46:51,835 - root - INFO - Step 3790: lr=1.00E-05, loss= 1.1280 (max= 1.7894), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:46:51,835 - root - INFO - Step 3790: lr=1.00E-05, loss= 1.1280 (max= 1.7894), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:46:51,835 - root - INFO - Step 3790: lr=1.00E-05, loss= 1.1280 (max= 1.7894), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:46:51,836 - root - INFO - Step 3790: lr=1.00E-05, loss= 1.1280 (max= 1.7894), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:46:51,836 - root - INFO - Step 3790: lr=1.00E-05, loss= 1.1280 (max= 1.7894), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:46:51,836 - root - INFO - Step 3790: lr=1.00E-05, loss= 1.1280 (max= 1.7894), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:46:51,836 - root - INFO - Step 3790: lr=1.00E-05, loss= 1.1280 (max= 1.7894), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:47:23,647 - root - INFO - Step 3800: lr=1.00E-05, loss= 1.0993 (max= 1.7672), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:47:23,647 - root - INFO - Step 3800: lr=1.00E-05, loss= 1.0993 (max= 1.7672), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:47:23,647 - root - INFO - Step 3800: lr=1.00E-05, loss= 1.0993 (max= 1.7672), tps=20603, 
mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:47:23,647 - root - INFO - Step 3800: lr=1.00E-05, loss= 1.0993 (max= 1.7672), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:47:23,647 - root - INFO - Step 3800: lr=1.00E-05, loss= 1.0993 (max= 1.7672), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:47:23,647 - root - INFO - Step 3800: lr=1.00E-05, loss= 1.0993 (max= 1.7672), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:47:23,648 - root - INFO - Step 3800: lr=1.00E-05, loss= 1.0993 (max= 1.7672), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:47:23,648 - root - INFO - Step 3800: lr=1.00E-05, loss= 1.0993 (max= 1.7672), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:47:55,493 - root - INFO - Step 3810: lr=1.00E-05, loss= 1.1406 (max= 1.5659), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:47:55,493 - root - INFO - Step 3810: lr=1.00E-05, loss= 1.1406 (max= 1.5659), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:47:55,493 - root - INFO - Step 3810: lr=1.00E-05, loss= 1.1406 (max= 1.5659), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:47:55,493 - root - INFO - Step 3810: lr=1.00E-05, loss= 1.1406 (max= 1.5659), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:47:55,493 - root - INFO - Step 3810: lr=1.00E-05, loss= 1.1406 (max= 1.5659), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:47:55,493 - root - INFO - Step 3810: lr=1.00E-05, 
loss= 1.1406 (max= 1.5659), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:47:55,494 - root - INFO - Step 3810: lr=1.00E-05, loss= 1.1406 (max= 1.5659), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:47:55,494 - root - INFO - Step 3810: lr=1.00E-05, loss= 1.1406 (max= 1.5659), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:48:27,388 - root - INFO - Step 3820: lr=1.00E-05, loss= 1.0851 (max= 1.6130), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:48:27,389 - root - INFO - Step 3820: lr=1.00E-05, loss= 1.0851 (max= 1.6130), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:48:27,389 - root - INFO - Step 3820: lr=1.00E-05, loss= 1.0851 (max= 1.6130), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:48:27,389 - root - INFO - Step 3820: lr=1.00E-05, loss= 1.0851 (max= 1.6130), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:48:27,389 - root - INFO - Step 3820: lr=1.00E-05, loss= 1.0851 (max= 1.6130), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:48:27,389 - root - INFO - Step 3820: lr=1.00E-05, loss= 1.0851 (max= 1.6130), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:48:27,389 - root - INFO - Step 3820: lr=1.00E-05, loss= 1.0851 (max= 1.6130), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:48:27,389 - root - INFO - Step 3820: lr=1.00E-05, loss= 1.0851 (max= 1.6130), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:48:59,225 - 
root - INFO - Step 3830: lr=1.00E-05, loss= 1.1177 (max= 1.7857), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:48:59,225 - root - INFO - Step 3830: lr=1.00E-05, loss= 1.1177 (max= 1.7857), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:48:59,225 - root - INFO - Step 3830: lr=1.00E-05, loss= 1.1177 (max= 1.7857), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:48:59,225 - root - INFO - Step 3830: lr=1.00E-05, loss= 1.1177 (max= 1.7857), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:48:59,225 - root - INFO - Step 3830: lr=1.00E-05, loss= 1.1177 (max= 1.7857), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:48:59,225 - root - INFO - Step 3830: lr=1.00E-05, loss= 1.1177 (max= 1.7857), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:48:59,225 - root - INFO - Step 3830: lr=1.00E-05, loss= 1.1177 (max= 1.7857), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:48:59,225 - root - INFO - Step 3830: lr=1.00E-05, loss= 1.1177 (max= 1.7857), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:49:31,153 - root - INFO - Step 3840: lr=1.00E-05, loss= 1.1333 (max= 1.5074), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:49:31,153 - root - INFO - Step 3840: lr=1.00E-05, loss= 1.1333 (max= 1.5074), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:49:31,153 - root - INFO - Step 3840: lr=1.00E-05, loss= 1.1333 (max= 1.5074), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 13:49:31,153 - root - INFO - Step 3840: lr=1.00E-05, loss= 1.1333 (max= 1.5074), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:49:31,154 - root - INFO - Step 3840: lr=1.00E-05, loss= 1.1333 (max= 1.5074), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:49:31,154 - root - INFO - Step 3840: lr=1.00E-05, loss= 1.1333 (max= 1.5074), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:49:31,154 - root - INFO - Step 3840: lr=1.00E-05, loss= 1.1333 (max= 1.5074), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:49:31,154 - root - INFO - Step 3840: lr=1.00E-05, loss= 1.1333 (max= 1.5074), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:50:02,978 - root - INFO - Step 3850: lr=1.00E-05, loss= 1.1342 (max= 1.6107), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:50:02,978 - root - INFO - Step 3850: lr=1.00E-05, loss= 1.1342 (max= 1.6107), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:50:02,979 - root - INFO - Step 3850: lr=1.00E-05, loss= 1.1342 (max= 1.6107), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:50:02,979 - root - INFO - Step 3850: lr=1.00E-05, loss= 1.1342 (max= 1.6107), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:50:02,979 - root - INFO - Step 3850: lr=1.00E-05, loss= 1.1342 (max= 1.6107), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:50:02,979 - root - INFO - Step 3850: lr=1.00E-05, loss= 1.1342 (max= 1.6107), tps=20595, mfu=42.91%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:50:02,979 - root - INFO - Step 3850: lr=1.00E-05, loss= 1.1342 (max= 1.6107), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:50:02,979 - root - INFO - Step 3850: lr=1.00E-05, loss= 1.1342 (max= 1.6107), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:50:34,918 - root - INFO - Step 3860: lr=1.00E-05, loss= 1.1137 (max= 1.4987), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:50:34,918 - root - INFO - Step 3860: lr=1.00E-05, loss= 1.1137 (max= 1.4987), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:50:34,918 - root - INFO - Step 3860: lr=1.00E-05, loss= 1.1137 (max= 1.4987), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:50:34,918 - root - INFO - Step 3860: lr=1.00E-05, loss= 1.1137 (max= 1.4987), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:50:34,918 - root - INFO - Step 3860: lr=1.00E-05, loss= 1.1137 (max= 1.4987), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:50:34,918 - root - INFO - Step 3860: lr=1.00E-05, loss= 1.1137 (max= 1.4987), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:50:34,918 - root - INFO - Step 3860: lr=1.00E-05, loss= 1.1137 (max= 1.4987), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:50:34,918 - root - INFO - Step 3860: lr=1.00E-05, loss= 1.1137 (max= 1.4987), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:51:06,796 - root - INFO - Step 3870: lr=1.00E-05, loss= 1.1085 (max= 
1.5596), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:51:06,796 - root - INFO - Step 3870: lr=1.00E-05, loss= 1.1085 (max= 1.5596), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:51:06,796 - root - INFO - Step 3870: lr=1.00E-05, loss= 1.1085 (max= 1.5596), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:51:06,796 - root - INFO - Step 3870: lr=1.00E-05, loss= 1.1085 (max= 1.5596), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:51:06,796 - root - INFO - Step 3870: lr=1.00E-05, loss= 1.1085 (max= 1.5596), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:51:06,796 - root - INFO - Step 3870: lr=1.00E-05, loss= 1.1085 (max= 1.5596), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:51:06,796 - root - INFO - Step 3870: lr=1.00E-05, loss= 1.1085 (max= 1.5596), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:51:06,796 - root - INFO - Step 3870: lr=1.00E-05, loss= 1.1085 (max= 1.5596), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:51:38,657 - root - INFO - Step 3880: lr=1.00E-05, loss= 1.1348 (max= 1.6659), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:51:38,657 - root - INFO - Step 3880: lr=1.00E-05, loss= 1.1348 (max= 1.6659), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:51:38,657 - root - INFO - Step 3880: lr=1.00E-05, loss= 1.1348 (max= 1.6659), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:51:38,657 - root - INFO - Step 
3880: lr=1.00E-05, loss= 1.1348 (max= 1.6659), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:51:38,657 - root - INFO - Step 3880: lr=1.00E-05, loss= 1.1348 (max= 1.6659), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:51:38,657 - root - INFO - Step 3880: lr=1.00E-05, loss= 1.1348 (max= 1.6659), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:51:38,657 - root - INFO - Step 3880: lr=1.00E-05, loss= 1.1348 (max= 1.6659), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:51:38,657 - root - INFO - Step 3880: lr=1.00E-05, loss= 1.1348 (max= 1.6659), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:52:10,467 - root - INFO - Step 3890: lr=1.00E-05, loss= 1.1244 (max= 1.8306), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:52:10,467 - root - INFO - Step 3890: lr=1.00E-05, loss= 1.1244 (max= 1.8306), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:52:10,467 - root - INFO - Step 3890: lr=1.00E-05, loss= 1.1244 (max= 1.8306), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:52:10,467 - root - INFO - Step 3890: lr=1.00E-05, loss= 1.1244 (max= 1.8306), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:52:10,467 - root - INFO - Step 3890: lr=1.00E-05, loss= 1.1244 (max= 1.8306), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:52:10,467 - root - INFO - Step 3890: lr=1.00E-05, loss= 1.1244 (max= 1.8306), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 13:52:10,467 - root - INFO - Step 3890: lr=1.00E-05, loss= 1.1244 (max= 1.8306), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:52:10,467 - root - INFO - Step 3890: lr=1.00E-05, loss= 1.1244 (max= 1.8306), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:52:42,304 - root - INFO - Step 3900: lr=1.00E-05, loss= 1.1079 (max= 1.4952), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:52:42,304 - root - INFO - Step 3900: lr=1.00E-05, loss= 1.1079 (max= 1.4952), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:52:42,304 - root - INFO - Step 3900: lr=1.00E-05, loss= 1.1079 (max= 1.4952), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:52:42,304 - root - INFO - Step 3900: lr=1.00E-05, loss= 1.1079 (max= 1.4952), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:52:42,304 - root - INFO - Step 3900: lr=1.00E-05, loss= 1.1079 (max= 1.4952), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:52:42,304 - root - INFO - Step 3900: lr=1.00E-05, loss= 1.1079 (max= 1.4952), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:52:42,304 - root - INFO - Step 3900: lr=1.00E-05, loss= 1.1079 (max= 1.4952), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:52:42,304 - root - INFO - Step 3900: lr=1.00E-05, loss= 1.1079 (max= 1.4952), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:53:14,166 - root - INFO - Step 3910: lr=1.00E-05, loss= 1.1134 (max= 1.6622), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:53:14,167 - root - INFO - Step 3910: lr=1.00E-05, loss= 1.1134 (max= 1.6622), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:53:14,167 - root - INFO - Step 3910: lr=1.00E-05, loss= 1.1134 (max= 1.6622), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:53:14,167 - root - INFO - Step 3910: lr=1.00E-05, loss= 1.1134 (max= 1.6622), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:53:14,167 - root - INFO - Step 3910: lr=1.00E-05, loss= 1.1134 (max= 1.6622), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:53:14,167 - root - INFO - Step 3910: lr=1.00E-05, loss= 1.1134 (max= 1.6622), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:53:14,167 - root - INFO - Step 3910: lr=1.00E-05, loss= 1.1134 (max= 1.6622), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:53:14,167 - root - INFO - Step 3910: lr=1.00E-05, loss= 1.1134 (max= 1.6622), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:53:46,051 - root - INFO - Step 3920: lr=1.00E-05, loss= 1.1316 (max= 1.6126), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:53:46,051 - root - INFO - Step 3920: lr=1.00E-05, loss= 1.1316 (max= 1.6126), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:53:46,051 - root - INFO - Step 3920: lr=1.00E-05, loss= 1.1316 (max= 1.6126), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:53:46,051 - root - INFO - Step 3920: lr=1.00E-05, loss= 1.1316 (max= 1.6126), tps=20557, 
mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:53:46,051 - root - INFO - Step 3920: lr=1.00E-05, loss= 1.1316 (max= 1.6126), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:53:46,051 - root - INFO - Step 3920: lr=1.00E-05, loss= 1.1316 (max= 1.6126), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:53:46,051 - root - INFO - Step 3920: lr=1.00E-05, loss= 1.1316 (max= 1.6126), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:53:46,051 - root - INFO - Step 3920: lr=1.00E-05, loss= 1.1316 (max= 1.6126), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:54:17,904 - root - INFO - Step 3930: lr=1.00E-05, loss= 1.1219 (max= 1.5876), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:54:17,905 - root - INFO - Step 3930: lr=1.00E-05, loss= 1.1219 (max= 1.5876), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:54:17,905 - root - INFO - Step 3930: lr=1.00E-05, loss= 1.1219 (max= 1.5876), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:54:17,905 - root - INFO - Step 3930: lr=1.00E-05, loss= 1.1219 (max= 1.5876), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:54:17,905 - root - INFO - Step 3930: lr=1.00E-05, loss= 1.1219 (max= 1.5876), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:54:17,905 - root - INFO - Step 3930: lr=1.00E-05, loss= 1.1219 (max= 1.5876), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:54:17,905 - root - INFO - Step 3930: lr=1.00E-05, 
loss= 1.1219 (max= 1.5876), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:54:17,905 - root - INFO - Step 3930: lr=1.00E-05, loss= 1.1219 (max= 1.5876), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:54:49,809 - root - INFO - Step 3940: lr=1.00E-05, loss= 1.1196 (max= 1.6554), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:54:49,809 - root - INFO - Step 3940: lr=1.00E-05, loss= 1.1196 (max= 1.6554), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:54:49,809 - root - INFO - Step 3940: lr=1.00E-05, loss= 1.1196 (max= 1.6554), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:54:49,809 - root - INFO - Step 3940: lr=1.00E-05, loss= 1.1196 (max= 1.6554), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:54:49,809 - root - INFO - Step 3940: lr=1.00E-05, loss= 1.1196 (max= 1.6554), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:54:49,809 - root - INFO - Step 3940: lr=1.00E-05, loss= 1.1196 (max= 1.6554), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:54:49,809 - root - INFO - Step 3940: lr=1.00E-05, loss= 1.1196 (max= 1.6554), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:54:49,809 - root - INFO - Step 3940: lr=1.00E-05, loss= 1.1196 (max= 1.6554), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:55:21,604 - root - INFO - Step 3950: lr=1.00E-05, loss= 1.1243 (max= 1.5381), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:55:21,604 - 
root - INFO - Step 3950: lr=1.00E-05, loss= 1.1243 (max= 1.5381), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:55:21,604 - root - INFO - Step 3950: lr=1.00E-05, loss= 1.1243 (max= 1.5381), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:55:21,604 - root - INFO - Step 3950: lr=1.00E-05, loss= 1.1243 (max= 1.5381), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:55:21,604 - root - INFO - Step 3950: lr=1.00E-05, loss= 1.1243 (max= 1.5381), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:55:21,604 - root - INFO - Step 3950: lr=1.00E-05, loss= 1.1243 (max= 1.5381), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:55:21,605 - root - INFO - Step 3950: lr=1.00E-05, loss= 1.1243 (max= 1.5381), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:55:21,605 - root - INFO - Step 3950: lr=1.00E-05, loss= 1.1243 (max= 1.5381), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:55:53,443 - root - INFO - Step 3960: lr=1.00E-05, loss= 1.1153 (max= 1.5701), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:55:53,443 - root - INFO - Step 3960: lr=1.00E-05, loss= 1.1153 (max= 1.5701), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:55:53,444 - root - INFO - Step 3960: lr=1.00E-05, loss= 1.1153 (max= 1.5701), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:55:53,444 - root - INFO - Step 3960: lr=1.00E-05, loss= 1.1153 (max= 1.5701), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 13:55:53,444 - root - INFO - Step 3960: lr=1.00E-05, loss= 1.1153 (max= 1.5701), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:55:53,444 - root - INFO - Step 3960: lr=1.00E-05, loss= 1.1153 (max= 1.5701), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:55:53,444 - root - INFO - Step 3960: lr=1.00E-05, loss= 1.1153 (max= 1.5701), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:55:53,444 - root - INFO - Step 3960: lr=1.00E-05, loss= 1.1153 (max= 1.5701), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:56:25,276 - root - INFO - Step 3970: lr=1.00E-05, loss= 1.1360 (max= 1.6505), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:56:25,277 - root - INFO - Step 3970: lr=1.00E-05, loss= 1.1360 (max= 1.6505), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:56:25,277 - root - INFO - Step 3970: lr=1.00E-05, loss= 1.1360 (max= 1.6505), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:56:25,277 - root - INFO - Step 3970: lr=1.00E-05, loss= 1.1360 (max= 1.6505), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:56:25,277 - root - INFO - Step 3970: lr=1.00E-05, loss= 1.1360 (max= 1.6505), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:56:25,277 - root - INFO - Step 3970: lr=1.00E-05, loss= 1.1360 (max= 1.6505), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:56:25,277 - root - INFO - Step 3970: lr=1.00E-05, loss= 1.1360 (max= 1.6505), tps=20590, mfu=42.90%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:56:25,277 - root - INFO - Step 3970: lr=1.00E-05, loss= 1.1360 (max= 1.6505), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:56:57,088 - root - INFO - Step 3980: lr=1.00E-05, loss= 1.1300 (max= 1.4751), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:56:57,088 - root - INFO - Step 3980: lr=1.00E-05, loss= 1.1300 (max= 1.4751), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:56:57,088 - root - INFO - Step 3980: lr=1.00E-05, loss= 1.1300 (max= 1.4751), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:56:57,088 - root - INFO - Step 3980: lr=1.00E-05, loss= 1.1300 (max= 1.4751), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:56:57,088 - root - INFO - Step 3980: lr=1.00E-05, loss= 1.1300 (max= 1.4751), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:56:57,088 - root - INFO - Step 3980: lr=1.00E-05, loss= 1.1300 (max= 1.4751), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:56:57,088 - root - INFO - Step 3980: lr=1.00E-05, loss= 1.1300 (max= 1.4751), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:56:57,089 - root - INFO - Step 3980: lr=1.00E-05, loss= 1.1300 (max= 1.4751), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:57:29,013 - root - INFO - Step 3990: lr=1.00E-05, loss= 1.1235 (max= 1.6648), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:57:29,013 - root - INFO - Step 3990: lr=1.00E-05, loss= 1.1235 (max= 
1.6648), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:57:29,013 - root - INFO - Step 3990: lr=1.00E-05, loss= 1.1235 (max= 1.6648), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:57:29,013 - root - INFO - Step 3990: lr=1.00E-05, loss= 1.1235 (max= 1.6648), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:57:29,013 - root - INFO - Step 3990: lr=1.00E-05, loss= 1.1235 (max= 1.6648), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:57:29,013 - root - INFO - Step 3990: lr=1.00E-05, loss= 1.1235 (max= 1.6648), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:57:29,013 - root - INFO - Step 3990: lr=1.00E-05, loss= 1.1235 (max= 1.6648), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:57:29,013 - root - INFO - Step 3990: lr=1.00E-05, loss= 1.1235 (max= 1.6648), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-pt/checkpoints/dataloader/step-4000 +Dataset successfully saved to jobs/munin-7b-open-pt/checkpoints/dataloader/step-4000! 
Save time: 4.486633539199829 +2025-10-25 13:58:00,883 - root - INFO - Step 4000: lr=1.00E-05, loss= 1.1343 (max= 1.7393), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:58:00,883 - root - INFO - Saving a full checkpoint at step 4000 +2025-10-25 13:58:00,883 - root - INFO - Step 4000: lr=1.00E-05, loss= 1.1343 (max= 1.7393), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:58:00,883 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 13:58:00,883 - root - INFO - Step 4000: lr=1.00E-05, loss= 1.1343 (max= 1.7393), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:58:00,883 - root - INFO - Saving a full checkpoint at step 4000 +2025-10-25 13:58:00,883 - root - INFO - Saving a full checkpoint at step 4000 +2025-10-25 13:58:00,883 - root - INFO - Step 4000: lr=1.00E-05, loss= 1.1343 (max= 1.7393), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:58:00,883 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 13:58:00,883 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 13:58:00,883 - root - INFO - Saving a full checkpoint at step 4000 +2025-10-25 13:58:00,883 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 13:58:00,883 - root - INFO - Step 4000: lr=1.00E-05, loss= 1.1343 (max= 1.7393), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:58:00,883 - root - INFO - Step 4000: lr=1.00E-05, loss= 1.1343 (max= 1.7393), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:58:00,883 - root - INFO - Step 4000: lr=1.00E-05, loss= 1.1343 (max= 1.7393), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:58:00,883 - root - INFO - Step 4000: lr=1.00E-05, loss= 1.1343 (max= 1.7393), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:58:00,883 - root - INFO - Saving a full checkpoint at step 4000 +2025-10-25 13:58:00,883 - root - INFO - Saving a full checkpoint at step 4000 +2025-10-25 13:58:00,883 - root - INFO - Saving a full checkpoint at step 4000 +2025-10-25 13:58:00,883 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 13:58:00,883 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 13:58:00,883 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 13:58:00,883 - root - INFO - Saving a full checkpoint at step 4000 +2025-10-25 13:58:00,883 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 13:58:15,937 - root - INFO - Finished saving the checkpoint in 15.05 seconds +2025-10-25 13:58:15,944 - root - INFO - Finished saving the checkpoint in 15.06 seconds +2025-10-25 13:58:15,944 - root - INFO - Finished saving the checkpoint in 15.06 seconds +2025-10-25 13:58:15,944 - root - INFO - Finished saving the checkpoint in 15.06 seconds +2025-10-25 13:58:15,945 - root - INFO - Finished saving the checkpoint in 15.06 seconds +2025-10-25 13:58:15,945 - root - INFO - Finished saving the checkpoint in 15.06 seconds +2025-10-25 13:58:15,945 - root - INFO - Finished saving the checkpoint in 15.06 seconds +2025-10-25 13:58:15,946 - root - INFO - Finished saving the checkpoint in 
15.06 seconds +2025-10-25 13:58:47,764 - root - INFO - Step 4010: lr=1.00E-05, loss= 1.1596 (max= 1.6448), tps=13980, mfu=29.13%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:58:47,764 - root - INFO - Step 4010: lr=1.00E-05, loss= 1.1596 (max= 1.6448), tps=13980, mfu=29.13%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:58:47,764 - root - INFO - Step 4010: lr=1.00E-05, loss= 1.1596 (max= 1.6448), tps=13980, mfu=29.13%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:58:47,764 - root - INFO - Step 4010: lr=1.00E-05, loss= 1.1596 (max= 1.6448), tps=13980, mfu=29.13%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:58:47,764 - root - INFO - Step 4010: lr=1.00E-05, loss= 1.1596 (max= 1.6448), tps=13980, mfu=29.13%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:58:47,764 - root - INFO - Step 4010: lr=1.00E-05, loss= 1.1596 (max= 1.6448), tps=13980, mfu=29.13%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:58:47,764 - root - INFO - Step 4010: lr=1.00E-05, loss= 1.1596 (max= 1.6448), tps=13980, mfu=29.13%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:58:47,764 - root - INFO - Step 4010: lr=1.00E-05, loss= 1.1596 (max= 1.6448), tps=13980, mfu=29.13%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:59:19,592 - root - INFO - Step 4020: lr=1.00E-05, loss= 1.1276 (max= 1.5557), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:59:19,592 - root - INFO - Step 4020: lr=1.00E-05, loss= 1.1276 (max= 1.5557), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:59:19,592 - root - INFO - Step 4020: lr=1.00E-05, loss= 1.1276 (max= 1.5557), tps=20593, mfu=42.91%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:59:19,592 - root - INFO - Step 4020: lr=1.00E-05, loss= 1.1276 (max= 1.5557), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:59:19,592 - root - INFO - Step 4020: lr=1.00E-05, loss= 1.1276 (max= 1.5557), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:59:19,592 - root - INFO - Step 4020: lr=1.00E-05, loss= 1.1276 (max= 1.5557), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:59:19,592 - root - INFO - Step 4020: lr=1.00E-05, loss= 1.1276 (max= 1.5557), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:59:19,592 - root - INFO - Step 4020: lr=1.00E-05, loss= 1.1276 (max= 1.5557), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:59:51,461 - root - INFO - Step 4030: lr=1.00E-05, loss= 1.1514 (max= 1.5503), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:59:51,461 - root - INFO - Step 4030: lr=1.00E-05, loss= 1.1514 (max= 1.5503), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:59:51,461 - root - INFO - Step 4030: lr=1.00E-05, loss= 1.1514 (max= 1.5503), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:59:51,461 - root - INFO - Step 4030: lr=1.00E-05, loss= 1.1514 (max= 1.5503), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:59:51,461 - root - INFO - Step 4030: lr=1.00E-05, loss= 1.1514 (max= 1.5503), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:59:51,461 - root - INFO - Step 4030: lr=1.00E-05, loss= 1.1514 (max= 
1.5503), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:59:51,462 - root - INFO - Step 4030: lr=1.00E-05, loss= 1.1514 (max= 1.5503), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 13:59:51,462 - root - INFO - Step 4030: lr=1.00E-05, loss= 1.1514 (max= 1.5503), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:00:23,265 - root - INFO - Step 4040: lr=1.00E-05, loss= 1.1348 (max= 1.5389), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:00:23,265 - root - INFO - Step 4040: lr=1.00E-05, loss= 1.1348 (max= 1.5389), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:00:23,265 - root - INFO - Step 4040: lr=1.00E-05, loss= 1.1348 (max= 1.5389), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:00:23,265 - root - INFO - Step 4040: lr=1.00E-05, loss= 1.1348 (max= 1.5389), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:00:23,265 - root - INFO - Step 4040: lr=1.00E-05, loss= 1.1348 (max= 1.5389), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:00:23,265 - root - INFO - Step 4040: lr=1.00E-05, loss= 1.1348 (max= 1.5389), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:00:23,265 - root - INFO - Step 4040: lr=1.00E-05, loss= 1.1348 (max= 1.5389), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:00:23,265 - root - INFO - Step 4040: lr=1.00E-05, loss= 1.1348 (max= 1.5389), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:00:55,202 - root - INFO - Step 
4050: lr=1.00E-05, loss= 1.1547 (max= 1.6122), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:00:55,202 - root - INFO - Step 4050: lr=1.00E-05, loss= 1.1547 (max= 1.6122), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:00:55,202 - root - INFO - Step 4050: lr=1.00E-05, loss= 1.1547 (max= 1.6122), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:00:55,202 - root - INFO - Step 4050: lr=1.00E-05, loss= 1.1547 (max= 1.6122), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:00:55,202 - root - INFO - Step 4050: lr=1.00E-05, loss= 1.1547 (max= 1.6122), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:00:55,202 - root - INFO - Step 4050: lr=1.00E-05, loss= 1.1547 (max= 1.6122), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:00:55,202 - root - INFO - Step 4050: lr=1.00E-05, loss= 1.1547 (max= 1.6122), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:00:55,202 - root - INFO - Step 4050: lr=1.00E-05, loss= 1.1547 (max= 1.6122), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:01:27,048 - root - INFO - Step 4060: lr=1.00E-05, loss= 1.1466 (max= 1.6434), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:01:27,048 - root - INFO - Step 4060: lr=1.00E-05, loss= 1.1466 (max= 1.6434), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:01:27,048 - root - INFO - Step 4060: lr=1.00E-05, loss= 1.1466 (max= 1.6434), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 14:01:27,048 - root - INFO - Step 4060: lr=1.00E-05, loss= 1.1466 (max= 1.6434), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:01:27,048 - root - INFO - Step 4060: lr=1.00E-05, loss= 1.1466 (max= 1.6434), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:01:27,048 - root - INFO - Step 4060: lr=1.00E-05, loss= 1.1466 (max= 1.6434), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:01:27,048 - root - INFO - Step 4060: lr=1.00E-05, loss= 1.1466 (max= 1.6434), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:01:27,048 - root - INFO - Step 4060: lr=1.00E-05, loss= 1.1466 (max= 1.6434), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:01:58,829 - root - INFO - Step 4070: lr=1.00E-05, loss= 1.1472 (max= 1.7627), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:01:58,829 - root - INFO - Step 4070: lr=1.00E-05, loss= 1.1472 (max= 1.7627), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:01:58,829 - root - INFO - Step 4070: lr=1.00E-05, loss= 1.1472 (max= 1.7627), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:01:58,829 - root - INFO - Step 4070: lr=1.00E-05, loss= 1.1472 (max= 1.7627), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:01:58,830 - root - INFO - Step 4070: lr=1.00E-05, loss= 1.1472 (max= 1.7627), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:01:58,830 - root - INFO - Step 4070: lr=1.00E-05, loss= 1.1472 (max= 1.7627), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:01:58,830 - root - INFO - Step 4070: lr=1.00E-05, loss= 1.1472 (max= 1.7627), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:01:58,830 - root - INFO - Step 4070: lr=1.00E-05, loss= 1.1472 (max= 1.7627), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:02:30,660 - root - INFO - Step 4080: lr=1.00E-05, loss= 1.1335 (max= 1.5083), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:02:30,660 - root - INFO - Step 4080: lr=1.00E-05, loss= 1.1335 (max= 1.5083), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:02:30,660 - root - INFO - Step 4080: lr=1.00E-05, loss= 1.1335 (max= 1.5083), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:02:30,660 - root - INFO - Step 4080: lr=1.00E-05, loss= 1.1335 (max= 1.5083), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:02:30,660 - root - INFO - Step 4080: lr=1.00E-05, loss= 1.1335 (max= 1.5083), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:02:30,660 - root - INFO - Step 4080: lr=1.00E-05, loss= 1.1335 (max= 1.5083), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:02:30,660 - root - INFO - Step 4080: lr=1.00E-05, loss= 1.1335 (max= 1.5083), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:02:30,661 - root - INFO - Step 4080: lr=1.00E-05, loss= 1.1335 (max= 1.5083), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:03:02,549 - root - INFO - Step 4090: lr=1.00E-05, loss= 1.1462 (max= 1.6077), tps=20553, 
mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:03:02,550 - root - INFO - Step 4090: lr=1.00E-05, loss= 1.1462 (max= 1.6077), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:03:02,550 - root - INFO - Step 4090: lr=1.00E-05, loss= 1.1462 (max= 1.6077), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:03:02,550 - root - INFO - Step 4090: lr=1.00E-05, loss= 1.1462 (max= 1.6077), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:03:02,550 - root - INFO - Step 4090: lr=1.00E-05, loss= 1.1462 (max= 1.6077), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:03:02,550 - root - INFO - Step 4090: lr=1.00E-05, loss= 1.1462 (max= 1.6077), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:03:02,550 - root - INFO - Step 4090: lr=1.00E-05, loss= 1.1462 (max= 1.6077), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:03:02,550 - root - INFO - Step 4090: lr=1.00E-05, loss= 1.1462 (max= 1.6077), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:03:34,445 - root - INFO - Step 4100: lr=1.00E-05, loss= 1.1179 (max= 1.5654), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:03:34,445 - root - INFO - Step 4100: lr=1.00E-05, loss= 1.1179 (max= 1.5654), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:03:34,445 - root - INFO - Step 4100: lr=1.00E-05, loss= 1.1179 (max= 1.5654), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:03:34,445 - root - INFO - Step 4100: lr=1.00E-05, 
loss= 1.1179 (max= 1.5654), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:03:34,445 - root - INFO - Step 4100: lr=1.00E-05, loss= 1.1179 (max= 1.5654), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:03:34,446 - root - INFO - Step 4100: lr=1.00E-05, loss= 1.1179 (max= 1.5654), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:03:34,446 - root - INFO - Step 4100: lr=1.00E-05, loss= 1.1179 (max= 1.5654), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:03:34,446 - root - INFO - Step 4100: lr=1.00E-05, loss= 1.1179 (max= 1.5654), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:04:06,305 - root - INFO - Step 4110: lr=1.00E-05, loss= 1.1647 (max= 1.5850), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:04:06,305 - root - INFO - Step 4110: lr=1.00E-05, loss= 1.1647 (max= 1.5850), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:04:06,305 - root - INFO - Step 4110: lr=1.00E-05, loss= 1.1647 (max= 1.5850), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:04:06,305 - root - INFO - Step 4110: lr=1.00E-05, loss= 1.1647 (max= 1.5850), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:04:06,305 - root - INFO - Step 4110: lr=1.00E-05, loss= 1.1647 (max= 1.5850), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:04:06,306 - root - INFO - Step 4110: lr=1.00E-05, loss= 1.1647 (max= 1.5850), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:04:06,306 - 
root - INFO - Step 4110: lr=1.00E-05, loss= 1.1647 (max= 1.5850), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:04:06,306 - root - INFO - Step 4110: lr=1.00E-05, loss= 1.1647 (max= 1.5850), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:04:38,188 - root - INFO - Step 4120: lr=1.00E-05, loss= 1.1735 (max= 1.6271), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:04:38,188 - root - INFO - Step 4120: lr=1.00E-05, loss= 1.1735 (max= 1.6271), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:04:38,188 - root - INFO - Step 4120: lr=1.00E-05, loss= 1.1735 (max= 1.6271), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:04:38,188 - root - INFO - Step 4120: lr=1.00E-05, loss= 1.1735 (max= 1.6271), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:04:38,188 - root - INFO - Step 4120: lr=1.00E-05, loss= 1.1735 (max= 1.6271), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:04:38,188 - root - INFO - Step 4120: lr=1.00E-05, loss= 1.1735 (max= 1.6271), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:04:38,188 - root - INFO - Step 4120: lr=1.00E-05, loss= 1.1735 (max= 1.6271), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:04:38,188 - root - INFO - Step 4120: lr=1.00E-05, loss= 1.1735 (max= 1.6271), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:05:09,977 - root - INFO - Step 4130: lr=1.00E-05, loss= 1.1669 (max= 1.5725), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 14:05:09,977 - root - INFO - Step 4130: lr=1.00E-05, loss= 1.1669 (max= 1.5725), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:05:09,977 - root - INFO - Step 4130: lr=1.00E-05, loss= 1.1669 (max= 1.5725), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:05:09,977 - root - INFO - Step 4130: lr=1.00E-05, loss= 1.1669 (max= 1.5725), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:05:09,977 - root - INFO - Step 4130: lr=1.00E-05, loss= 1.1669 (max= 1.5725), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:05:09,977 - root - INFO - Step 4130: lr=1.00E-05, loss= 1.1669 (max= 1.5725), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:05:09,977 - root - INFO - Step 4130: lr=1.00E-05, loss= 1.1669 (max= 1.5725), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:05:09,978 - root - INFO - Step 4130: lr=1.00E-05, loss= 1.1669 (max= 1.5725), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:05:41,802 - root - INFO - Step 4140: lr=1.00E-05, loss= 1.1532 (max= 1.5076), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:05:41,802 - root - INFO - Step 4140: lr=1.00E-05, loss= 1.1532 (max= 1.5076), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:05:41,802 - root - INFO - Step 4140: lr=1.00E-05, loss= 1.1532 (max= 1.5076), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:05:41,802 - root - INFO - Step 4140: lr=1.00E-05, loss= 1.1532 (max= 1.5076), tps=20595, mfu=42.91%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:05:41,802 - root - INFO - Step 4140: lr=1.00E-05, loss= 1.1532 (max= 1.5076), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:05:41,802 - root - INFO - Step 4140: lr=1.00E-05, loss= 1.1532 (max= 1.5076), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:05:41,802 - root - INFO - Step 4140: lr=1.00E-05, loss= 1.1532 (max= 1.5076), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:05:41,802 - root - INFO - Step 4140: lr=1.00E-05, loss= 1.1532 (max= 1.5076), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:06:13,647 - root - INFO - Step 4150: lr=1.00E-05, loss= 1.1349 (max= 1.5055), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:06:13,647 - root - INFO - Step 4150: lr=1.00E-05, loss= 1.1349 (max= 1.5055), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:06:13,647 - root - INFO - Step 4150: lr=1.00E-05, loss= 1.1349 (max= 1.5055), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:06:13,647 - root - INFO - Step 4150: lr=1.00E-05, loss= 1.1349 (max= 1.5055), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:06:13,647 - root - INFO - Step 4150: lr=1.00E-05, loss= 1.1349 (max= 1.5055), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:06:13,647 - root - INFO - Step 4150: lr=1.00E-05, loss= 1.1349 (max= 1.5055), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:06:13,647 - root - INFO - Step 4150: lr=1.00E-05, loss= 1.1349 (max= 
1.5055), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:06:13,647 - root - INFO - Step 4150: lr=1.00E-05, loss= 1.1349 (max= 1.5055), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:06:45,536 - root - INFO - Step 4160: lr=1.00E-05, loss= 1.1294 (max= 1.5804), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:06:45,536 - root - INFO - Step 4160: lr=1.00E-05, loss= 1.1294 (max= 1.5804), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:06:45,536 - root - INFO - Step 4160: lr=1.00E-05, loss= 1.1294 (max= 1.5804), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:06:45,536 - root - INFO - Step 4160: lr=1.00E-05, loss= 1.1294 (max= 1.5804), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:06:45,536 - root - INFO - Step 4160: lr=1.00E-05, loss= 1.1294 (max= 1.5804), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:06:45,536 - root - INFO - Step 4160: lr=1.00E-05, loss= 1.1294 (max= 1.5804), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:06:45,536 - root - INFO - Step 4160: lr=1.00E-05, loss= 1.1294 (max= 1.5804), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:06:45,536 - root - INFO - Step 4160: lr=1.00E-05, loss= 1.1294 (max= 1.5804), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:07:17,342 - root - INFO - Step 4170: lr=1.00E-05, loss= 1.1394 (max= 1.5227), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:07:17,342 - root - INFO - Step 
4170: lr=1.00E-05, loss= 1.1394 (max= 1.5227), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:07:17,342 - root - INFO - Step 4170: lr=1.00E-05, loss= 1.1394 (max= 1.5227), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:07:17,342 - root - INFO - Step 4170: lr=1.00E-05, loss= 1.1394 (max= 1.5227), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:07:17,342 - root - INFO - Step 4170: lr=1.00E-05, loss= 1.1394 (max= 1.5227), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:07:17,342 - root - INFO - Step 4170: lr=1.00E-05, loss= 1.1394 (max= 1.5227), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:07:17,342 - root - INFO - Step 4170: lr=1.00E-05, loss= 1.1394 (max= 1.5227), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:07:17,344 - root - INFO - Step 4170: lr=1.00E-05, loss= 1.1394 (max= 1.5227), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:07:49,182 - root - INFO - Step 4180: lr=1.00E-05, loss= 1.1184 (max= 1.6238), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:07:49,182 - root - INFO - Step 4180: lr=1.00E-05, loss= 1.1184 (max= 1.6238), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:07:49,182 - root - INFO - Step 4180: lr=1.00E-05, loss= 1.1184 (max= 1.6238), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:07:49,182 - root - INFO - Step 4180: lr=1.00E-05, loss= 1.1184 (max= 1.6238), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 14:07:49,183 - root - INFO - Step 4180: lr=1.00E-05, loss= 1.1184 (max= 1.6238), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:07:49,183 - root - INFO - Step 4180: lr=1.00E-05, loss= 1.1184 (max= 1.6238), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:07:49,183 - root - INFO - Step 4180: lr=1.00E-05, loss= 1.1184 (max= 1.6238), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:07:49,183 - root - INFO - Step 4180: lr=1.00E-05, loss= 1.1184 (max= 1.6238), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:08:20,999 - root - INFO - Step 4190: lr=1.00E-05, loss= 1.1450 (max= 1.6569), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:08:20,999 - root - INFO - Step 4190: lr=1.00E-05, loss= 1.1450 (max= 1.6569), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:08:20,999 - root - INFO - Step 4190: lr=1.00E-05, loss= 1.1450 (max= 1.6569), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:08:20,999 - root - INFO - Step 4190: lr=1.00E-05, loss= 1.1450 (max= 1.6569), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:08:20,999 - root - INFO - Step 4190: lr=1.00E-05, loss= 1.1450 (max= 1.6569), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:08:20,999 - root - INFO - Step 4190: lr=1.00E-05, loss= 1.1450 (max= 1.6569), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:08:20,999 - root - INFO - Step 4190: lr=1.00E-05, loss= 1.1450 (max= 1.6569), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:08:20,999 - root - INFO - Step 4190: lr=1.00E-05, loss= 1.1450 (max= 1.6569), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:08:52,836 - root - INFO - Step 4200: lr=1.00E-05, loss= 1.1352 (max= 1.6530), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:08:52,837 - root - INFO - Step 4200: lr=1.00E-05, loss= 1.1352 (max= 1.6530), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:08:52,837 - root - INFO - Step 4200: lr=1.00E-05, loss= 1.1352 (max= 1.6530), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:08:52,837 - root - INFO - Step 4200: lr=1.00E-05, loss= 1.1352 (max= 1.6530), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:08:52,837 - root - INFO - Step 4200: lr=1.00E-05, loss= 1.1352 (max= 1.6530), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:08:52,837 - root - INFO - Step 4200: lr=1.00E-05, loss= 1.1352 (max= 1.6530), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:08:52,837 - root - INFO - Step 4200: lr=1.00E-05, loss= 1.1352 (max= 1.6530), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:08:52,837 - root - INFO - Step 4200: lr=1.00E-05, loss= 1.1352 (max= 1.6530), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:09:24,638 - root - INFO - Step 4210: lr=1.00E-05, loss= 1.1412 (max= 1.7043), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:09:24,638 - root - INFO - Step 4210: lr=1.00E-05, loss= 1.1412 (max= 1.7043), tps=20610, 
mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:09:24,638 - root - INFO - Step 4210: lr=1.00E-05, loss= 1.1412 (max= 1.7043), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:09:24,638 - root - INFO - Step 4210: lr=1.00E-05, loss= 1.1412 (max= 1.7043), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:09:24,638 - root - INFO - Step 4210: lr=1.00E-05, loss= 1.1412 (max= 1.7043), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:09:24,638 - root - INFO - Step 4210: lr=1.00E-05, loss= 1.1412 (max= 1.7043), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:09:24,638 - root - INFO - Step 4210: lr=1.00E-05, loss= 1.1412 (max= 1.7043), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:09:24,638 - root - INFO - Step 4210: lr=1.00E-05, loss= 1.1412 (max= 1.7043), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:09:56,478 - root - INFO - Step 4220: lr=1.00E-05, loss= 1.1412 (max= 1.5692), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:09:56,478 - root - INFO - Step 4220: lr=1.00E-05, loss= 1.1412 (max= 1.5692), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:09:56,479 - root - INFO - Step 4220: lr=1.00E-05, loss= 1.1412 (max= 1.5692), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:09:56,479 - root - INFO - Step 4220: lr=1.00E-05, loss= 1.1412 (max= 1.5692), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:09:56,479 - root - INFO - Step 4220: lr=1.00E-05, 
loss= 1.1412 (max= 1.5692), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:09:56,479 - root - INFO - Step 4220: lr=1.00E-05, loss= 1.1412 (max= 1.5692), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:09:56,479 - root - INFO - Step 4220: lr=1.00E-05, loss= 1.1412 (max= 1.5692), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:09:56,479 - root - INFO - Step 4220: lr=1.00E-05, loss= 1.1412 (max= 1.5692), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:10:28,241 - root - INFO - Step 4230: lr=1.00E-05, loss= 1.1148 (max= 1.5319), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:10:28,241 - root - INFO - Step 4230: lr=1.00E-05, loss= 1.1148 (max= 1.5319), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:10:28,241 - root - INFO - Step 4230: lr=1.00E-05, loss= 1.1148 (max= 1.5319), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:10:28,241 - root - INFO - Step 4230: lr=1.00E-05, loss= 1.1148 (max= 1.5319), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:10:28,241 - root - INFO - Step 4230: lr=1.00E-05, loss= 1.1148 (max= 1.5319), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:10:28,241 - root - INFO - Step 4230: lr=1.00E-05, loss= 1.1148 (max= 1.5319), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:10:28,241 - root - INFO - Step 4230: lr=1.00E-05, loss= 1.1148 (max= 1.5319), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:10:28,241 - 
root - INFO - Step 4230: lr=1.00E-05, loss= 1.1148 (max= 1.5319), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:11:00,181 - root - INFO - Step 4240: lr=1.00E-05, loss= 1.1327 (max= 1.7264), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:11:00,181 - root - INFO - Step 4240: lr=1.00E-05, loss= 1.1327 (max= 1.7264), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:11:00,181 - root - INFO - Step 4240: lr=1.00E-05, loss= 1.1327 (max= 1.7264), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:11:00,181 - root - INFO - Step 4240: lr=1.00E-05, loss= 1.1327 (max= 1.7264), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:11:00,181 - root - INFO - Step 4240: lr=1.00E-05, loss= 1.1327 (max= 1.7264), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:11:00,181 - root - INFO - Step 4240: lr=1.00E-05, loss= 1.1327 (max= 1.7264), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:11:00,181 - root - INFO - Step 4240: lr=1.00E-05, loss= 1.1327 (max= 1.7264), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:11:00,182 - root - INFO - Step 4240: lr=1.00E-05, loss= 1.1327 (max= 1.7264), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:11:32,083 - root - INFO - Step 4250: lr=1.00E-05, loss= 1.1484 (max= 1.5243), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:11:32,083 - root - INFO - Step 4250: lr=1.00E-05, loss= 1.1484 (max= 1.5243), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 14:11:32,083 - root - INFO - Step 4250: lr=1.00E-05, loss= 1.1484 (max= 1.5243), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:11:32,083 - root - INFO - Step 4250: lr=1.00E-05, loss= 1.1484 (max= 1.5243), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:11:32,084 - root - INFO - Step 4250: lr=1.00E-05, loss= 1.1484 (max= 1.5243), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:11:32,084 - root - INFO - Step 4250: lr=1.00E-05, loss= 1.1484 (max= 1.5243), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:11:32,084 - root - INFO - Step 4250: lr=1.00E-05, loss= 1.1484 (max= 1.5243), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:11:32,084 - root - INFO - Step 4250: lr=1.00E-05, loss= 1.1484 (max= 1.5243), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:12:03,951 - root - INFO - Step 4260: lr=1.00E-05, loss= 1.1375 (max= 1.5807), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:12:03,952 - root - INFO - Step 4260: lr=1.00E-05, loss= 1.1375 (max= 1.5807), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:12:03,952 - root - INFO - Step 4260: lr=1.00E-05, loss= 1.1375 (max= 1.5807), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:12:03,952 - root - INFO - Step 4260: lr=1.00E-05, loss= 1.1375 (max= 1.5807), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:12:03,952 - root - INFO - Step 4260: lr=1.00E-05, loss= 1.1375 (max= 1.5807), tps=20567, mfu=42.85%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:12:03,952 - root - INFO - Step 4260: lr=1.00E-05, loss= 1.1375 (max= 1.5807), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:12:03,952 - root - INFO - Step 4260: lr=1.00E-05, loss= 1.1375 (max= 1.5807), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:12:03,952 - root - INFO - Step 4260: lr=1.00E-05, loss= 1.1375 (max= 1.5807), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:12:35,731 - root - INFO - Step 4270: lr=1.00E-05, loss= 1.1445 (max= 1.4773), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:12:35,731 - root - INFO - Step 4270: lr=1.00E-05, loss= 1.1445 (max= 1.4773), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:12:35,732 - root - INFO - Step 4270: lr=1.00E-05, loss= 1.1445 (max= 1.4773), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:12:35,732 - root - INFO - Step 4270: lr=1.00E-05, loss= 1.1445 (max= 1.4773), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:12:35,732 - root - INFO - Step 4270: lr=1.00E-05, loss= 1.1445 (max= 1.4773), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:12:35,732 - root - INFO - Step 4270: lr=1.00E-05, loss= 1.1445 (max= 1.4773), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:12:35,732 - root - INFO - Step 4270: lr=1.00E-05, loss= 1.1445 (max= 1.4773), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:12:35,732 - root - INFO - Step 4270: lr=1.00E-05, loss= 1.1445 (max= 
1.4773), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:13:07,628 - root - INFO - Step 4280: lr=1.00E-05, loss= 1.1258 (max= 1.5279), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:13:07,628 - root - INFO - Step 4280: lr=1.00E-05, loss= 1.1258 (max= 1.5279), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:13:07,628 - root - INFO - Step 4280: lr=1.00E-05, loss= 1.1258 (max= 1.5279), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:13:07,628 - root - INFO - Step 4280: lr=1.00E-05, loss= 1.1258 (max= 1.5279), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:13:07,628 - root - INFO - Step 4280: lr=1.00E-05, loss= 1.1258 (max= 1.5279), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:13:07,628 - root - INFO - Step 4280: lr=1.00E-05, loss= 1.1258 (max= 1.5279), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:13:07,629 - root - INFO - Step 4280: lr=1.00E-05, loss= 1.1258 (max= 1.5279), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:13:07,629 - root - INFO - Step 4280: lr=1.00E-05, loss= 1.1258 (max= 1.5279), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:13:39,456 - root - INFO - Step 4290: lr=1.00E-05, loss= 1.1391 (max= 1.5908), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:13:39,457 - root - INFO - Step 4290: lr=1.00E-05, loss= 1.1391 (max= 1.5908), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:13:39,457 - root - INFO - Step 
4290: lr=1.00E-05, loss= 1.1391 (max= 1.5908), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:13:39,457 - root - INFO - Step 4290: lr=1.00E-05, loss= 1.1391 (max= 1.5908), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:13:39,457 - root - INFO - Step 4290: lr=1.00E-05, loss= 1.1391 (max= 1.5908), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:13:39,457 - root - INFO - Step 4290: lr=1.00E-05, loss= 1.1391 (max= 1.5908), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:13:39,457 - root - INFO - Step 4290: lr=1.00E-05, loss= 1.1391 (max= 1.5908), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:13:39,457 - root - INFO - Step 4290: lr=1.00E-05, loss= 1.1391 (max= 1.5908), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:14:11,386 - root - INFO - Step 4300: lr=1.00E-05, loss= 1.1620 (max= 1.5990), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:14:11,387 - root - INFO - Step 4300: lr=1.00E-05, loss= 1.1620 (max= 1.5990), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:14:11,387 - root - INFO - Step 4300: lr=1.00E-05, loss= 1.1620 (max= 1.5990), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:14:11,387 - root - INFO - Step 4300: lr=1.00E-05, loss= 1.1620 (max= 1.5990), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:14:11,387 - root - INFO - Step 4300: lr=1.00E-05, loss= 1.1620 (max= 1.5990), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 14:14:11,387 - root - INFO - Step 4300: lr=1.00E-05, loss= 1.1620 (max= 1.5990), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:14:11,387 - root - INFO - Step 4300: lr=1.00E-05, loss= 1.1620 (max= 1.5990), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:14:11,387 - root - INFO - Step 4300: lr=1.00E-05, loss= 1.1620 (max= 1.5990), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:14:43,251 - root - INFO - Step 4310: lr=1.00E-05, loss= 1.1384 (max= 1.5843), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:14:43,252 - root - INFO - Step 4310: lr=1.00E-05, loss= 1.1384 (max= 1.5843), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:14:43,252 - root - INFO - Step 4310: lr=1.00E-05, loss= 1.1384 (max= 1.5843), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:14:43,252 - root - INFO - Step 4310: lr=1.00E-05, loss= 1.1384 (max= 1.5843), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:14:43,252 - root - INFO - Step 4310: lr=1.00E-05, loss= 1.1384 (max= 1.5843), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:14:43,252 - root - INFO - Step 4310: lr=1.00E-05, loss= 1.1384 (max= 1.5843), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:14:43,252 - root - INFO - Step 4310: lr=1.00E-05, loss= 1.1384 (max= 1.5843), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:14:43,252 - root - INFO - Step 4310: lr=1.00E-05, loss= 1.1384 (max= 1.5843), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:15:15,111 - root - INFO - Step 4320: lr=1.00E-05, loss= 1.1291 (max= 1.6478), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:15:15,111 - root - INFO - Step 4320: lr=1.00E-05, loss= 1.1291 (max= 1.6478), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:15:15,111 - root - INFO - Step 4320: lr=1.00E-05, loss= 1.1291 (max= 1.6478), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:15:15,111 - root - INFO - Step 4320: lr=1.00E-05, loss= 1.1291 (max= 1.6478), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:15:15,111 - root - INFO - Step 4320: lr=1.00E-05, loss= 1.1291 (max= 1.6478), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:15:15,111 - root - INFO - Step 4320: lr=1.00E-05, loss= 1.1291 (max= 1.6478), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:15:15,112 - root - INFO - Step 4320: lr=1.00E-05, loss= 1.1291 (max= 1.6478), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:15:15,112 - root - INFO - Step 4320: lr=1.00E-05, loss= 1.1291 (max= 1.6478), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:15:46,917 - root - INFO - Step 4330: lr=1.00E-05, loss= 1.1247 (max= 1.5522), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:15:46,918 - root - INFO - Step 4330: lr=1.00E-05, loss= 1.1247 (max= 1.5522), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:15:46,918 - root - INFO - Step 4330: lr=1.00E-05, loss= 1.1247 (max= 1.5522), tps=20607, 
mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:15:46,918 - root - INFO - Step 4330: lr=1.00E-05, loss= 1.1247 (max= 1.5522), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:15:46,918 - root - INFO - Step 4330: lr=1.00E-05, loss= 1.1247 (max= 1.5522), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:15:46,918 - root - INFO - Step 4330: lr=1.00E-05, loss= 1.1247 (max= 1.5522), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:15:46,918 - root - INFO - Step 4330: lr=1.00E-05, loss= 1.1247 (max= 1.5522), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:15:46,918 - root - INFO - Step 4330: lr=1.00E-05, loss= 1.1247 (max= 1.5522), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:16:18,714 - root - INFO - Step 4340: lr=1.00E-05, loss= 1.1198 (max= 1.5604), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:16:18,714 - root - INFO - Step 4340: lr=1.00E-05, loss= 1.1198 (max= 1.5604), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:16:18,714 - root - INFO - Step 4340: lr=1.00E-05, loss= 1.1198 (max= 1.5604), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:16:18,714 - root - INFO - Step 4340: lr=1.00E-05, loss= 1.1198 (max= 1.5604), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:16:18,714 - root - INFO - Step 4340: lr=1.00E-05, loss= 1.1198 (max= 1.5604), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:16:18,714 - root - INFO - Step 4340: lr=1.00E-05, 
loss= 1.1198 (max= 1.5604), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:16:18,714 - root - INFO - Step 4340: lr=1.00E-05, loss= 1.1198 (max= 1.5604), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:16:18,714 - root - INFO - Step 4340: lr=1.00E-05, loss= 1.1198 (max= 1.5604), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:16:50,572 - root - INFO - Step 4350: lr=1.00E-05, loss= 1.1288 (max= 1.6701), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:16:50,572 - root - INFO - Step 4350: lr=1.00E-05, loss= 1.1288 (max= 1.6701), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:16:50,572 - root - INFO - Step 4350: lr=1.00E-05, loss= 1.1288 (max= 1.6701), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:16:50,572 - root - INFO - Step 4350: lr=1.00E-05, loss= 1.1288 (max= 1.6701), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:16:50,573 - root - INFO - Step 4350: lr=1.00E-05, loss= 1.1288 (max= 1.6701), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:16:50,573 - root - INFO - Step 4350: lr=1.00E-05, loss= 1.1288 (max= 1.6701), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:16:50,573 - root - INFO - Step 4350: lr=1.00E-05, loss= 1.1288 (max= 1.6701), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:16:50,573 - root - INFO - Step 4350: lr=1.00E-05, loss= 1.1288 (max= 1.6701), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:17:22,451 - 
root - INFO - Step 4360: lr=1.00E-05, loss= 1.1026 (max= 1.4985), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:17:22,452 - root - INFO - Step 4360: lr=1.00E-05, loss= 1.1026 (max= 1.4985), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:17:22,452 - root - INFO - Step 4360: lr=1.00E-05, loss= 1.1026 (max= 1.4985), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:17:22,452 - root - INFO - Step 4360: lr=1.00E-05, loss= 1.1026 (max= 1.4985), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:17:22,452 - root - INFO - Step 4360: lr=1.00E-05, loss= 1.1026 (max= 1.4985), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:17:22,452 - root - INFO - Step 4360: lr=1.00E-05, loss= 1.1026 (max= 1.4985), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:17:22,452 - root - INFO - Step 4360: lr=1.00E-05, loss= 1.1026 (max= 1.4985), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:17:22,452 - root - INFO - Step 4360: lr=1.00E-05, loss= 1.1026 (max= 1.4985), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:17:54,217 - root - INFO - Step 4370: lr=1.00E-05, loss= 1.1365 (max= 1.5090), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:17:54,217 - root - INFO - Step 4370: lr=1.00E-05, loss= 1.1365 (max= 1.5090), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:17:54,217 - root - INFO - Step 4370: lr=1.00E-05, loss= 1.1365 (max= 1.5090), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 14:17:54,217 - root - INFO - Step 4370: lr=1.00E-05, loss= 1.1365 (max= 1.5090), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:17:54,217 - root - INFO - Step 4370: lr=1.00E-05, loss= 1.1365 (max= 1.5090), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:17:54,217 - root - INFO - Step 4370: lr=1.00E-05, loss= 1.1365 (max= 1.5090), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:17:54,217 - root - INFO - Step 4370: lr=1.00E-05, loss= 1.1365 (max= 1.5090), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:17:54,217 - root - INFO - Step 4370: lr=1.00E-05, loss= 1.1365 (max= 1.5090), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:18:26,112 - root - INFO - Step 4380: lr=1.00E-05, loss= 1.1512 (max= 1.5851), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:18:26,112 - root - INFO - Step 4380: lr=1.00E-05, loss= 1.1512 (max= 1.5851), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:18:26,112 - root - INFO - Step 4380: lr=1.00E-05, loss= 1.1512 (max= 1.5851), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:18:26,112 - root - INFO - Step 4380: lr=1.00E-05, loss= 1.1512 (max= 1.5851), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:18:26,112 - root - INFO - Step 4380: lr=1.00E-05, loss= 1.1512 (max= 1.5851), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:18:26,112 - root - INFO - Step 4380: lr=1.00E-05, loss= 1.1512 (max= 1.5851), tps=20550, mfu=42.82%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:18:26,112 - root - INFO - Step 4380: lr=1.00E-05, loss= 1.1512 (max= 1.5851), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:18:26,112 - root - INFO - Step 4380: lr=1.00E-05, loss= 1.1512 (max= 1.5851), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:18:57,959 - root - INFO - Step 4390: lr=1.00E-05, loss= 1.1561 (max= 1.6456), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:18:57,959 - root - INFO - Step 4390: lr=1.00E-05, loss= 1.1561 (max= 1.6456), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:18:57,959 - root - INFO - Step 4390: lr=1.00E-05, loss= 1.1561 (max= 1.6456), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:18:57,959 - root - INFO - Step 4390: lr=1.00E-05, loss= 1.1561 (max= 1.6456), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:18:57,959 - root - INFO - Step 4390: lr=1.00E-05, loss= 1.1561 (max= 1.6456), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:18:57,959 - root - INFO - Step 4390: lr=1.00E-05, loss= 1.1561 (max= 1.6456), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:18:57,959 - root - INFO - Step 4390: lr=1.00E-05, loss= 1.1561 (max= 1.6456), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:18:57,959 - root - INFO - Step 4390: lr=1.00E-05, loss= 1.1561 (max= 1.6456), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:19:29,910 - root - INFO - Step 4400: lr=1.00E-05, loss= 1.1211 (max= 
1.7857), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:19:29,910 - root - INFO - Step 4400: lr=1.00E-05, loss= 1.1211 (max= 1.7857), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:19:29,910 - root - INFO - Step 4400: lr=1.00E-05, loss= 1.1211 (max= 1.7857), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:19:29,910 - root - INFO - Step 4400: lr=1.00E-05, loss= 1.1211 (max= 1.7857), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:19:29,911 - root - INFO - Step 4400: lr=1.00E-05, loss= 1.1211 (max= 1.7857), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:19:29,911 - root - INFO - Step 4400: lr=1.00E-05, loss= 1.1211 (max= 1.7857), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:19:29,911 - root - INFO - Step 4400: lr=1.00E-05, loss= 1.1211 (max= 1.7857), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:19:29,911 - root - INFO - Step 4400: lr=1.00E-05, loss= 1.1211 (max= 1.7857), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:20:01,739 - root - INFO - Step 4410: lr=1.00E-05, loss= 1.1329 (max= 1.5734), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:20:01,739 - root - INFO - Step 4410: lr=1.00E-05, loss= 1.1329 (max= 1.5734), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:20:01,739 - root - INFO - Step 4410: lr=1.00E-05, loss= 1.1329 (max= 1.5734), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:20:01,739 - root - INFO - Step 
4410: lr=1.00E-05, loss= 1.1329 (max= 1.5734), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:20:01,739 - root - INFO - Step 4410: lr=1.00E-05, loss= 1.1329 (max= 1.5734), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:20:01,739 - root - INFO - Step 4410: lr=1.00E-05, loss= 1.1329 (max= 1.5734), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:20:01,739 - root - INFO - Step 4410: lr=1.00E-05, loss= 1.1329 (max= 1.5734), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:20:01,739 - root - INFO - Step 4410: lr=1.00E-05, loss= 1.1329 (max= 1.5734), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:20:33,638 - root - INFO - Step 4420: lr=1.00E-05, loss= 1.1502 (max= 1.5993), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:20:33,638 - root - INFO - Step 4420: lr=1.00E-05, loss= 1.1502 (max= 1.5993), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:20:33,638 - root - INFO - Step 4420: lr=1.00E-05, loss= 1.1502 (max= 1.5993), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:20:33,638 - root - INFO - Step 4420: lr=1.00E-05, loss= 1.1502 (max= 1.5993), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:20:33,638 - root - INFO - Step 4420: lr=1.00E-05, loss= 1.1502 (max= 1.5993), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:20:33,638 - root - INFO - Step 4420: lr=1.00E-05, loss= 1.1502 (max= 1.5993), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 14:20:33,638 - root - INFO - Step 4420: lr=1.00E-05, loss= 1.1502 (max= 1.5993), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:20:33,638 - root - INFO - Step 4420: lr=1.00E-05, loss= 1.1502 (max= 1.5993), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:21:05,492 - root - INFO - Step 4430: lr=1.00E-05, loss= 1.1484 (max= 1.5664), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:21:05,492 - root - INFO - Step 4430: lr=1.00E-05, loss= 1.1484 (max= 1.5664), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:21:05,492 - root - INFO - Step 4430: lr=1.00E-05, loss= 1.1484 (max= 1.5664), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:21:05,492 - root - INFO - Step 4430: lr=1.00E-05, loss= 1.1484 (max= 1.5664), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:21:05,492 - root - INFO - Step 4430: lr=1.00E-05, loss= 1.1484 (max= 1.5664), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:21:05,492 - root - INFO - Step 4430: lr=1.00E-05, loss= 1.1484 (max= 1.5664), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:21:05,492 - root - INFO - Step 4430: lr=1.00E-05, loss= 1.1484 (max= 1.5664), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:21:05,492 - root - INFO - Step 4430: lr=1.00E-05, loss= 1.1484 (max= 1.5664), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:21:37,362 - root - INFO - Step 4440: lr=1.00E-05, loss= 1.1423 (max= 1.7547), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:21:37,362 - root - INFO - Step 4440: lr=1.00E-05, loss= 1.1423 (max= 1.7547), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:21:37,362 - root - INFO - Step 4440: lr=1.00E-05, loss= 1.1423 (max= 1.7547), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:21:37,362 - root - INFO - Step 4440: lr=1.00E-05, loss= 1.1423 (max= 1.7547), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:21:37,362 - root - INFO - Step 4440: lr=1.00E-05, loss= 1.1423 (max= 1.7547), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:21:37,362 - root - INFO - Step 4440: lr=1.00E-05, loss= 1.1423 (max= 1.7547), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:21:37,362 - root - INFO - Step 4440: lr=1.00E-05, loss= 1.1423 (max= 1.7547), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:21:37,363 - root - INFO - Step 4440: lr=1.00E-05, loss= 1.1423 (max= 1.7547), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:22:09,159 - root - INFO - Step 4450: lr=1.00E-05, loss= 1.1213 (max= 1.5547), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:22:09,159 - root - INFO - Step 4450: lr=1.00E-05, loss= 1.1213 (max= 1.5547), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:22:09,159 - root - INFO - Step 4450: lr=1.00E-05, loss= 1.1213 (max= 1.5547), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:22:09,159 - root - INFO - Step 4450: lr=1.00E-05, loss= 1.1213 (max= 1.5547), tps=20613, 
mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:22:09,159 - root - INFO - Step 4450: lr=1.00E-05, loss= 1.1213 (max= 1.5547), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:22:09,159 - root - INFO - Step 4450: lr=1.00E-05, loss= 1.1213 (max= 1.5547), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:22:09,159 - root - INFO - Step 4450: lr=1.00E-05, loss= 1.1213 (max= 1.5547), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:22:09,159 - root - INFO - Step 4450: lr=1.00E-05, loss= 1.1213 (max= 1.5547), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:22:41,110 - root - INFO - Step 4460: lr=1.00E-05, loss= 1.1212 (max= 1.6019), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:22:41,110 - root - INFO - Step 4460: lr=1.00E-05, loss= 1.1212 (max= 1.6019), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:22:41,110 - root - INFO - Step 4460: lr=1.00E-05, loss= 1.1212 (max= 1.6019), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:22:41,111 - root - INFO - Step 4460: lr=1.00E-05, loss= 1.1212 (max= 1.6019), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:22:41,111 - root - INFO - Step 4460: lr=1.00E-05, loss= 1.1212 (max= 1.6019), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:22:41,111 - root - INFO - Step 4460: lr=1.00E-05, loss= 1.1212 (max= 1.6019), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:22:41,111 - root - INFO - Step 4460: lr=1.00E-05, 
loss= 1.1212 (max= 1.6019), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:22:41,111 - root - INFO - Step 4460: lr=1.00E-05, loss= 1.1212 (max= 1.6019), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:23:12,953 - root - INFO - Step 4470: lr=1.00E-05, loss= 1.1433 (max= 1.5572), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:23:12,953 - root - INFO - Step 4470: lr=1.00E-05, loss= 1.1433 (max= 1.5572), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:23:12,953 - root - INFO - Step 4470: lr=1.00E-05, loss= 1.1433 (max= 1.5572), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:23:12,953 - root - INFO - Step 4470: lr=1.00E-05, loss= 1.1433 (max= 1.5572), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:23:12,953 - root - INFO - Step 4470: lr=1.00E-05, loss= 1.1433 (max= 1.5572), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:23:12,953 - root - INFO - Step 4470: lr=1.00E-05, loss= 1.1433 (max= 1.5572), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:23:12,954 - root - INFO - Step 4470: lr=1.00E-05, loss= 1.1433 (max= 1.5572), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:23:12,954 - root - INFO - Step 4470: lr=1.00E-05, loss= 1.1433 (max= 1.5572), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:23:44,787 - root - INFO - Step 4480: lr=1.00E-05, loss= 1.1322 (max= 1.6425), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:23:44,787 - 
root - INFO - Step 4480: lr=1.00E-05, loss= 1.1322 (max= 1.6425), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:23:44,787 - root - INFO - Step 4480: lr=1.00E-05, loss= 1.1322 (max= 1.6425), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:23:44,787 - root - INFO - Step 4480: lr=1.00E-05, loss= 1.1322 (max= 1.6425), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:23:44,788 - root - INFO - Step 4480: lr=1.00E-05, loss= 1.1322 (max= 1.6425), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:23:44,788 - root - INFO - Step 4480: lr=1.00E-05, loss= 1.1322 (max= 1.6425), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:23:44,788 - root - INFO - Step 4480: lr=1.00E-05, loss= 1.1322 (max= 1.6425), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:23:44,788 - root - INFO - Step 4480: lr=1.00E-05, loss= 1.1322 (max= 1.6425), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:24:16,698 - root - INFO - Step 4490: lr=1.00E-05, loss= 1.1094 (max= 1.6013), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:24:16,698 - root - INFO - Step 4490: lr=1.00E-05, loss= 1.1094 (max= 1.6013), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:24:16,698 - root - INFO - Step 4490: lr=1.00E-05, loss= 1.1094 (max= 1.6013), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:24:16,698 - root - INFO - Step 4490: lr=1.00E-05, loss= 1.1094 (max= 1.6013), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 14:24:16,698 - root - INFO - Step 4490: lr=1.00E-05, loss= 1.1094 (max= 1.6013), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:24:16,698 - root - INFO - Step 4490: lr=1.00E-05, loss= 1.1094 (max= 1.6013), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:24:16,698 - root - INFO - Step 4490: lr=1.00E-05, loss= 1.1094 (max= 1.6013), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:24:16,698 - root - INFO - Step 4490: lr=1.00E-05, loss= 1.1094 (max= 1.6013), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:24:39,524 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:4250249 +2025-10-25 14:24:48,474 - root - INFO - Step 4500: lr=1.00E-05, loss= 1.1281 (max= 1.4924), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:24:48,474 - root - INFO - Step 4500: lr=1.00E-05, loss= 1.1281 (max= 1.4924), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:24:48,474 - root - INFO - Step 4500: lr=1.00E-05, loss= 1.1281 (max= 1.4924), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:24:48,474 - root - INFO - Step 4500: lr=1.00E-05, loss= 1.1281 (max= 1.4924), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:24:48,474 - root - INFO - Step 4500: lr=1.00E-05, loss= 1.1281 (max= 1.4924), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:24:48,474 - root - INFO - Step 4500: lr=1.00E-05, loss= 1.1281 (max= 1.4924), tps=20627, mfu=42.98%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:24:48,474 - root - INFO - Step 4500: lr=1.00E-05, loss= 1.1281 (max= 1.4924), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:24:48,474 - root - INFO - Step 4500: lr=1.00E-05, loss= 1.1281 (max= 1.4924), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:25:20,332 - root - INFO - Step 4510: lr=1.00E-05, loss= 1.1207 (max= 1.5569), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:25:20,332 - root - INFO - Step 4510: lr=1.00E-05, loss= 1.1207 (max= 1.5569), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:25:20,332 - root - INFO - Step 4510: lr=1.00E-05, loss= 1.1207 (max= 1.5569), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:25:20,332 - root - INFO - Step 4510: lr=1.00E-05, loss= 1.1207 (max= 1.5569), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:25:20,332 - root - INFO - Step 4510: lr=1.00E-05, loss= 1.1207 (max= 1.5569), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:25:20,332 - root - INFO - Step 4510: lr=1.00E-05, loss= 1.1207 (max= 1.5569), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:25:20,332 - root - INFO - Step 4510: lr=1.00E-05, loss= 1.1207 (max= 1.5569), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:25:20,332 - root - INFO - Step 4510: lr=1.00E-05, loss= 1.1207 (max= 1.5569), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:25:52,227 - root - INFO - Step 4520: lr=1.00E-05, loss= 1.1133 (max= 
1.6613), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-25 14:25:52,227 - root - INFO - Step 4520: lr=1.00E-05, loss= 1.1133 (max= 1.6613), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-25 14:25:52,227 - root - INFO - Step 4520: lr=1.00E-05, loss= 1.1133 (max= 1.6613), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-25 14:25:52,227 - root - INFO - Step 4520: lr=1.00E-05, loss= 1.1133 (max= 1.6613), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-25 14:25:52,227 - root - INFO - Step 4520: lr=1.00E-05, loss= 1.1133 (max= 1.6613), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-25 14:25:52,227 - root - INFO - Step 4520: lr=1.00E-05, loss= 1.1133 (max= 1.6613), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-25 14:25:52,227 - root - INFO - Step 4520: lr=1.00E-05, loss= 1.1133 (max= 1.6613), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-25 14:25:52,227 - root - INFO - Step 4520: lr=1.00E-05, loss= 1.1133 (max= 1.6613), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-25 14:26:24,161 - root - INFO - Step 4530: lr=1.00E-05, loss= 1.1381 (max= 1.5797), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:26:24,161 - root - INFO - Step 4530: lr=1.00E-05, loss= 1.1381 (max= 1.5797), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:26:24,161 - root - INFO - Step 4530: lr=1.00E-05, loss= 1.1381 (max= 1.5797), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:26:24,161 - root - INFO - Step 
4530: lr=1.00E-05, loss= 1.1381 (max= 1.5797), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:26:24,161 - root - INFO - Step 4530: lr=1.00E-05, loss= 1.1381 (max= 1.5797), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:26:24,161 - root - INFO - Step 4530: lr=1.00E-05, loss= 1.1381 (max= 1.5797), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:26:24,162 - root - INFO - Step 4530: lr=1.00E-05, loss= 1.1381 (max= 1.5797), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:26:24,162 - root - INFO - Step 4530: lr=1.00E-05, loss= 1.1381 (max= 1.5797), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:26:56,082 - root - INFO - Step 4540: lr=1.00E-05, loss= 1.1423 (max= 1.7550), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:26:56,082 - root - INFO - Step 4540: lr=1.00E-05, loss= 1.1423 (max= 1.7550), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:26:56,083 - root - INFO - Step 4540: lr=1.00E-05, loss= 1.1423 (max= 1.7550), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:26:56,083 - root - INFO - Step 4540: lr=1.00E-05, loss= 1.1423 (max= 1.7550), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:26:56,083 - root - INFO - Step 4540: lr=1.00E-05, loss= 1.1423 (max= 1.7550), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:26:56,083 - root - INFO - Step 4540: lr=1.00E-05, loss= 1.1423 (max= 1.7550), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 14:26:56,083 - root - INFO - Step 4540: lr=1.00E-05, loss= 1.1423 (max= 1.7550), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:26:56,083 - root - INFO - Step 4540: lr=1.00E-05, loss= 1.1423 (max= 1.7550), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:27:27,971 - root - INFO - Step 4550: lr=1.00E-05, loss= 1.1346 (max= 1.8767), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:27:27,971 - root - INFO - Step 4550: lr=1.00E-05, loss= 1.1346 (max= 1.8767), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:27:27,971 - root - INFO - Step 4550: lr=1.00E-05, loss= 1.1346 (max= 1.8767), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:27:27,971 - root - INFO - Step 4550: lr=1.00E-05, loss= 1.1346 (max= 1.8767), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:27:27,971 - root - INFO - Step 4550: lr=1.00E-05, loss= 1.1346 (max= 1.8767), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:27:27,972 - root - INFO - Step 4550: lr=1.00E-05, loss= 1.1346 (max= 1.8767), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:27:27,972 - root - INFO - Step 4550: lr=1.00E-05, loss= 1.1346 (max= 1.8767), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:27:27,972 - root - INFO - Step 4550: lr=1.00E-05, loss= 1.1346 (max= 1.8767), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:27:59,725 - root - INFO - Step 4560: lr=1.00E-05, loss= 1.1282 (max= 1.6602), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:27:59,725 - root - INFO - Step 4560: lr=1.00E-05, loss= 1.1282 (max= 1.6602), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:27:59,725 - root - INFO - Step 4560: lr=1.00E-05, loss= 1.1282 (max= 1.6602), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:27:59,725 - root - INFO - Step 4560: lr=1.00E-05, loss= 1.1282 (max= 1.6602), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:27:59,725 - root - INFO - Step 4560: lr=1.00E-05, loss= 1.1282 (max= 1.6602), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:27:59,725 - root - INFO - Step 4560: lr=1.00E-05, loss= 1.1282 (max= 1.6602), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:27:59,725 - root - INFO - Step 4560: lr=1.00E-05, loss= 1.1282 (max= 1.6602), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:27:59,725 - root - INFO - Step 4560: lr=1.00E-05, loss= 1.1282 (max= 1.6602), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:28:31,578 - root - INFO - Step 4570: lr=1.00E-05, loss= 1.1106 (max= 1.8744), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:28:31,578 - root - INFO - Step 4570: lr=1.00E-05, loss= 1.1106 (max= 1.8744), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:28:31,578 - root - INFO - Step 4570: lr=1.00E-05, loss= 1.1106 (max= 1.8744), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:28:31,578 - root - INFO - Step 4570: lr=1.00E-05, loss= 1.1106 (max= 1.8744), tps=20577, 
mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:28:31,578 - root - INFO - Step 4570: lr=1.00E-05, loss= 1.1106 (max= 1.8744), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:28:31,578 - root - INFO - Step 4570: lr=1.00E-05, loss= 1.1106 (max= 1.8744), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:28:31,578 - root - INFO - Step 4570: lr=1.00E-05, loss= 1.1106 (max= 1.8744), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:28:31,578 - root - INFO - Step 4570: lr=1.00E-05, loss= 1.1106 (max= 1.8744), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:29:03,445 - root - INFO - Step 4580: lr=1.00E-05, loss= 1.1058 (max= 1.5659), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:29:03,445 - root - INFO - Step 4580: lr=1.00E-05, loss= 1.1058 (max= 1.5659), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:29:03,445 - root - INFO - Step 4580: lr=1.00E-05, loss= 1.1058 (max= 1.5659), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:29:03,445 - root - INFO - Step 4580: lr=1.00E-05, loss= 1.1058 (max= 1.5659), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:29:03,445 - root - INFO - Step 4580: lr=1.00E-05, loss= 1.1058 (max= 1.5659), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:29:03,445 - root - INFO - Step 4580: lr=1.00E-05, loss= 1.1058 (max= 1.5659), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:29:03,445 - root - INFO - Step 4580: lr=1.00E-05, 
loss= 1.1058 (max= 1.5659), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:29:03,445 - root - INFO - Step 4580: lr=1.00E-05, loss= 1.1058 (max= 1.5659), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:29:35,313 - root - INFO - Step 4590: lr=1.00E-05, loss= 1.1329 (max= 1.5832), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:29:35,313 - root - INFO - Step 4590: lr=1.00E-05, loss= 1.1329 (max= 1.5832), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:29:35,313 - root - INFO - Step 4590: lr=1.00E-05, loss= 1.1329 (max= 1.5832), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:29:35,313 - root - INFO - Step 4590: lr=1.00E-05, loss= 1.1329 (max= 1.5832), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:29:35,313 - root - INFO - Step 4590: lr=1.00E-05, loss= 1.1329 (max= 1.5832), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:29:35,313 - root - INFO - Step 4590: lr=1.00E-05, loss= 1.1329 (max= 1.5832), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:29:35,313 - root - INFO - Step 4590: lr=1.00E-05, loss= 1.1329 (max= 1.5832), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:29:35,313 - root - INFO - Step 4590: lr=1.00E-05, loss= 1.1329 (max= 1.5832), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:30:07,162 - root - INFO - Step 4600: lr=1.00E-05, loss= 1.1181 (max= 1.5780), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:30:07,162 - 
root - INFO - Step 4600: lr=1.00E-05, loss= 1.1181 (max= 1.5780), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:30:07,162 - root - INFO - Step 4600: lr=1.00E-05, loss= 1.1181 (max= 1.5780), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:30:07,162 - root - INFO - Step 4600: lr=1.00E-05, loss= 1.1181 (max= 1.5780), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:30:07,162 - root - INFO - Step 4600: lr=1.00E-05, loss= 1.1181 (max= 1.5780), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:30:07,163 - root - INFO - Step 4600: lr=1.00E-05, loss= 1.1181 (max= 1.5780), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:30:07,163 - root - INFO - Step 4600: lr=1.00E-05, loss= 1.1181 (max= 1.5780), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:30:07,163 - root - INFO - Step 4600: lr=1.00E-05, loss= 1.1181 (max= 1.5780), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:30:38,916 - root - INFO - Step 4610: lr=1.00E-05, loss= 1.1242 (max= 1.7929), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:30:38,916 - root - INFO - Step 4610: lr=1.00E-05, loss= 1.1242 (max= 1.7929), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:30:38,917 - root - INFO - Step 4610: lr=1.00E-05, loss= 1.1242 (max= 1.7929), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:30:38,917 - root - INFO - Step 4610: lr=1.00E-05, loss= 1.1242 (max= 1.7929), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 14:30:38,917 - root - INFO - Step 4610: lr=1.00E-05, loss= 1.1242 (max= 1.7929), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:30:38,917 - root - INFO - Step 4610: lr=1.00E-05, loss= 1.1242 (max= 1.7929), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:30:38,917 - root - INFO - Step 4610: lr=1.00E-05, loss= 1.1242 (max= 1.7929), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:30:38,917 - root - INFO - Step 4610: lr=1.00E-05, loss= 1.1242 (max= 1.7929), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:31:10,771 - root - INFO - Step 4620: lr=1.00E-05, loss= 1.1202 (max= 1.6289), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:31:10,771 - root - INFO - Step 4620: lr=1.00E-05, loss= 1.1202 (max= 1.6289), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:31:10,771 - root - INFO - Step 4620: lr=1.00E-05, loss= 1.1202 (max= 1.6289), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:31:10,771 - root - INFO - Step 4620: lr=1.00E-05, loss= 1.1202 (max= 1.6289), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:31:10,771 - root - INFO - Step 4620: lr=1.00E-05, loss= 1.1202 (max= 1.6289), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:31:10,771 - root - INFO - Step 4620: lr=1.00E-05, loss= 1.1202 (max= 1.6289), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:31:10,771 - root - INFO - Step 4620: lr=1.00E-05, loss= 1.1202 (max= 1.6289), tps=20576, mfu=42.87%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:31:10,772 - root - INFO - Step 4620: lr=1.00E-05, loss= 1.1202 (max= 1.6289), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:31:42,589 - root - INFO - Step 4630: lr=1.00E-05, loss= 1.1520 (max= 1.5751), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:31:42,589 - root - INFO - Step 4630: lr=1.00E-05, loss= 1.1520 (max= 1.5751), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:31:42,589 - root - INFO - Step 4630: lr=1.00E-05, loss= 1.1520 (max= 1.5751), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:31:42,590 - root - INFO - Step 4630: lr=1.00E-05, loss= 1.1520 (max= 1.5751), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:31:42,590 - root - INFO - Step 4630: lr=1.00E-05, loss= 1.1520 (max= 1.5751), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:31:42,590 - root - INFO - Step 4630: lr=1.00E-05, loss= 1.1520 (max= 1.5751), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:31:42,590 - root - INFO - Step 4630: lr=1.00E-05, loss= 1.1520 (max= 1.5751), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:31:42,590 - root - INFO - Step 4630: lr=1.00E-05, loss= 1.1520 (max= 1.5751), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:32:14,450 - root - INFO - Step 4640: lr=1.00E-05, loss= 1.1276 (max= 1.5129), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:32:14,450 - root - INFO - Step 4640: lr=1.00E-05, loss= 1.1276 (max= 
1.5129), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:32:14,450 - root - INFO - Step 4640: lr=1.00E-05, loss= 1.1276 (max= 1.5129), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:32:14,450 - root - INFO - Step 4640: lr=1.00E-05, loss= 1.1276 (max= 1.5129), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:32:14,450 - root - INFO - Step 4640: lr=1.00E-05, loss= 1.1276 (max= 1.5129), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:32:14,450 - root - INFO - Step 4640: lr=1.00E-05, loss= 1.1276 (max= 1.5129), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:32:14,450 - root - INFO - Step 4640: lr=1.00E-05, loss= 1.1276 (max= 1.5129), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:32:14,450 - root - INFO - Step 4640: lr=1.00E-05, loss= 1.1276 (max= 1.5129), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:32:46,356 - root - INFO - Step 4650: lr=1.00E-05, loss= 1.1210 (max= 1.7729), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:32:46,356 - root - INFO - Step 4650: lr=1.00E-05, loss= 1.1210 (max= 1.7729), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:32:46,356 - root - INFO - Step 4650: lr=1.00E-05, loss= 1.1210 (max= 1.7729), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:32:46,356 - root - INFO - Step 4650: lr=1.00E-05, loss= 1.1210 (max= 1.7729), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:32:46,356 - root - INFO - Step 
4650: lr=1.00E-05, loss= 1.1210 (max= 1.7729), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:32:46,356 - root - INFO - Step 4650: lr=1.00E-05, loss= 1.1210 (max= 1.7729), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:32:46,356 - root - INFO - Step 4650: lr=1.00E-05, loss= 1.1210 (max= 1.7729), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:32:46,357 - root - INFO - Step 4650: lr=1.00E-05, loss= 1.1210 (max= 1.7729), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:33:18,278 - root - INFO - Step 4660: lr=1.00E-05, loss= 1.1298 (max= 1.6121), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:33:18,278 - root - INFO - Step 4660: lr=1.00E-05, loss= 1.1298 (max= 1.6121), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:33:18,278 - root - INFO - Step 4660: lr=1.00E-05, loss= 1.1298 (max= 1.6121), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:33:18,278 - root - INFO - Step 4660: lr=1.00E-05, loss= 1.1298 (max= 1.6121), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:33:18,278 - root - INFO - Step 4660: lr=1.00E-05, loss= 1.1298 (max= 1.6121), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:33:18,278 - root - INFO - Step 4660: lr=1.00E-05, loss= 1.1298 (max= 1.6121), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:33:18,278 - root - INFO - Step 4660: lr=1.00E-05, loss= 1.1298 (max= 1.6121), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 14:33:18,278 - root - INFO - Step 4660: lr=1.00E-05, loss= 1.1298 (max= 1.6121), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:33:50,173 - root - INFO - Step 4670: lr=1.00E-05, loss= 1.1316 (max= 1.6034), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:33:50,173 - root - INFO - Step 4670: lr=1.00E-05, loss= 1.1316 (max= 1.6034), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:33:50,173 - root - INFO - Step 4670: lr=1.00E-05, loss= 1.1316 (max= 1.6034), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:33:50,173 - root - INFO - Step 4670: lr=1.00E-05, loss= 1.1316 (max= 1.6034), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:33:50,173 - root - INFO - Step 4670: lr=1.00E-05, loss= 1.1316 (max= 1.6034), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:33:50,174 - root - INFO - Step 4670: lr=1.00E-05, loss= 1.1316 (max= 1.6034), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:33:50,174 - root - INFO - Step 4670: lr=1.00E-05, loss= 1.1316 (max= 1.6034), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:33:50,174 - root - INFO - Step 4670: lr=1.00E-05, loss= 1.1316 (max= 1.6034), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:34:22,038 - root - INFO - Step 4680: lr=1.00E-05, loss= 1.1030 (max= 1.5573), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:34:22,038 - root - INFO - Step 4680: lr=1.00E-05, loss= 1.1030 (max= 1.5573), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:34:22,038 - root - INFO - Step 4680: lr=1.00E-05, loss= 1.1030 (max= 1.5573), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:34:22,038 - root - INFO - Step 4680: lr=1.00E-05, loss= 1.1030 (max= 1.5573), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:34:22,038 - root - INFO - Step 4680: lr=1.00E-05, loss= 1.1030 (max= 1.5573), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:34:22,038 - root - INFO - Step 4680: lr=1.00E-05, loss= 1.1030 (max= 1.5573), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:34:22,038 - root - INFO - Step 4680: lr=1.00E-05, loss= 1.1030 (max= 1.5573), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:34:22,038 - root - INFO - Step 4680: lr=1.00E-05, loss= 1.1030 (max= 1.5573), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:34:53,838 - root - INFO - Step 4690: lr=1.00E-05, loss= 1.1363 (max= 1.5691), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:34:53,838 - root - INFO - Step 4690: lr=1.00E-05, loss= 1.1363 (max= 1.5691), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:34:53,838 - root - INFO - Step 4690: lr=1.00E-05, loss= 1.1363 (max= 1.5691), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:34:53,838 - root - INFO - Step 4690: lr=1.00E-05, loss= 1.1363 (max= 1.5691), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:34:53,838 - root - INFO - Step 4690: lr=1.00E-05, loss= 1.1363 (max= 1.5691), tps=20611, 
mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:34:53,838 - root - INFO - Step 4690: lr=1.00E-05, loss= 1.1363 (max= 1.5691), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:34:53,838 - root - INFO - Step 4690: lr=1.00E-05, loss= 1.1363 (max= 1.5691), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:34:53,838 - root - INFO - Step 4690: lr=1.00E-05, loss= 1.1363 (max= 1.5691), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:35:25,737 - root - INFO - Step 4700: lr=1.00E-05, loss= 1.1329 (max= 1.4882), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:35:25,737 - root - INFO - Step 4700: lr=1.00E-05, loss= 1.1329 (max= 1.4882), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:35:25,737 - root - INFO - Step 4700: lr=1.00E-05, loss= 1.1329 (max= 1.4882), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:35:25,737 - root - INFO - Step 4700: lr=1.00E-05, loss= 1.1329 (max= 1.4882), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:35:25,737 - root - INFO - Step 4700: lr=1.00E-05, loss= 1.1329 (max= 1.4882), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:35:25,737 - root - INFO - Step 4700: lr=1.00E-05, loss= 1.1329 (max= 1.4882), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:35:25,737 - root - INFO - Step 4700: lr=1.00E-05, loss= 1.1329 (max= 1.4882), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:35:25,737 - root - INFO - Step 4700: lr=1.00E-05, 
loss= 1.1329 (max= 1.4882), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:35:57,572 - root - INFO - Step 4710: lr=1.00E-05, loss= 1.1263 (max= 1.5927), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:35:57,572 - root - INFO - Step 4710: lr=1.00E-05, loss= 1.1263 (max= 1.5927), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:35:57,572 - root - INFO - Step 4710: lr=1.00E-05, loss= 1.1263 (max= 1.5927), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:35:57,572 - root - INFO - Step 4710: lr=1.00E-05, loss= 1.1263 (max= 1.5927), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:35:57,572 - root - INFO - Step 4710: lr=1.00E-05, loss= 1.1263 (max= 1.5927), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:35:57,572 - root - INFO - Step 4710: lr=1.00E-05, loss= 1.1263 (max= 1.5927), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:35:57,572 - root - INFO - Step 4710: lr=1.00E-05, loss= 1.1263 (max= 1.5927), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:35:57,572 - root - INFO - Step 4710: lr=1.00E-05, loss= 1.1263 (max= 1.5927), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:36:29,438 - root - INFO - Step 4720: lr=1.00E-05, loss= 1.0870 (max= 1.6062), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:36:29,438 - root - INFO - Step 4720: lr=1.00E-05, loss= 1.0870 (max= 1.6062), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:36:29,438 - 
root - INFO - Step 4720: lr=1.00E-05, loss= 1.0870 (max= 1.6062), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:36:29,438 - root - INFO - Step 4720: lr=1.00E-05, loss= 1.0870 (max= 1.6062), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:36:29,438 - root - INFO - Step 4720: lr=1.00E-05, loss= 1.0870 (max= 1.6062), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:36:29,438 - root - INFO - Step 4720: lr=1.00E-05, loss= 1.0870 (max= 1.6062), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:36:29,438 - root - INFO - Step 4720: lr=1.00E-05, loss= 1.0870 (max= 1.6062), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:36:29,439 - root - INFO - Step 4720: lr=1.00E-05, loss= 1.0870 (max= 1.6062), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:37:01,384 - root - INFO - Step 4730: lr=1.00E-05, loss= 1.1280 (max= 1.5720), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:37:01,384 - root - INFO - Step 4730: lr=1.00E-05, loss= 1.1280 (max= 1.5720), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:37:01,384 - root - INFO - Step 4730: lr=1.00E-05, loss= 1.1280 (max= 1.5720), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:37:01,384 - root - INFO - Step 4730: lr=1.00E-05, loss= 1.1280 (max= 1.5720), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:37:01,384 - root - INFO - Step 4730: lr=1.00E-05, loss= 1.1280 (max= 1.5720), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 14:37:01,384 - root - INFO - Step 4730: lr=1.00E-05, loss= 1.1280 (max= 1.5720), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:37:01,384 - root - INFO - Step 4730: lr=1.00E-05, loss= 1.1280 (max= 1.5720), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:37:01,384 - root - INFO - Step 4730: lr=1.00E-05, loss= 1.1280 (max= 1.5720), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:37:33,300 - root - INFO - Step 4740: lr=1.00E-05, loss= 1.1166 (max= 1.6998), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:37:33,300 - root - INFO - Step 4740: lr=1.00E-05, loss= 1.1166 (max= 1.6998), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:37:33,300 - root - INFO - Step 4740: lr=1.00E-05, loss= 1.1166 (max= 1.6998), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:37:33,300 - root - INFO - Step 4740: lr=1.00E-05, loss= 1.1166 (max= 1.6998), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:37:33,300 - root - INFO - Step 4740: lr=1.00E-05, loss= 1.1166 (max= 1.6998), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:37:33,300 - root - INFO - Step 4740: lr=1.00E-05, loss= 1.1166 (max= 1.6998), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:37:33,300 - root - INFO - Step 4740: lr=1.00E-05, loss= 1.1166 (max= 1.6998), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:37:33,300 - root - INFO - Step 4740: lr=1.00E-05, loss= 1.1166 (max= 1.6998), tps=20536, mfu=42.79%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:38:05,148 - root - INFO - Step 4750: lr=1.00E-05, loss= 1.1174 (max= 1.5396), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:38:05,149 - root - INFO - Step 4750: lr=1.00E-05, loss= 1.1174 (max= 1.5396), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:38:05,149 - root - INFO - Step 4750: lr=1.00E-05, loss= 1.1174 (max= 1.5396), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:38:05,149 - root - INFO - Step 4750: lr=1.00E-05, loss= 1.1174 (max= 1.5396), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:38:05,149 - root - INFO - Step 4750: lr=1.00E-05, loss= 1.1174 (max= 1.5396), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:38:05,149 - root - INFO - Step 4750: lr=1.00E-05, loss= 1.1174 (max= 1.5396), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:38:05,149 - root - INFO - Step 4750: lr=1.00E-05, loss= 1.1174 (max= 1.5396), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:38:05,149 - root - INFO - Step 4750: lr=1.00E-05, loss= 1.1174 (max= 1.5396), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:38:37,037 - root - INFO - Step 4760: lr=1.00E-05, loss= 1.1333 (max= 1.4871), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:38:37,037 - root - INFO - Step 4760: lr=1.00E-05, loss= 1.1333 (max= 1.4871), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:38:37,037 - root - INFO - Step 4760: lr=1.00E-05, loss= 1.1333 (max= 
1.4871), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:38:37,037 - root - INFO - Step 4760: lr=1.00E-05, loss= 1.1333 (max= 1.4871), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:38:37,037 - root - INFO - Step 4760: lr=1.00E-05, loss= 1.1333 (max= 1.4871), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:38:37,037 - root - INFO - Step 4760: lr=1.00E-05, loss= 1.1333 (max= 1.4871), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:38:37,037 - root - INFO - Step 4760: lr=1.00E-05, loss= 1.1333 (max= 1.4871), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:38:37,037 - root - INFO - Step 4760: lr=1.00E-05, loss= 1.1333 (max= 1.4871), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:39:08,971 - root - INFO - Step 4770: lr=1.00E-05, loss= 1.1406 (max= 1.6310), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:39:08,971 - root - INFO - Step 4770: lr=1.00E-05, loss= 1.1406 (max= 1.6310), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:39:08,971 - root - INFO - Step 4770: lr=1.00E-05, loss= 1.1406 (max= 1.6310), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:39:08,971 - root - INFO - Step 4770: lr=1.00E-05, loss= 1.1406 (max= 1.6310), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:39:08,971 - root - INFO - Step 4770: lr=1.00E-05, loss= 1.1406 (max= 1.6310), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:39:08,971 - root - INFO - Step 
4770: lr=1.00E-05, loss= 1.1406 (max= 1.6310), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:39:08,971 - root - INFO - Step 4770: lr=1.00E-05, loss= 1.1406 (max= 1.6310), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:39:08,971 - root - INFO - Step 4770: lr=1.00E-05, loss= 1.1406 (max= 1.6310), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:39:40,769 - root - INFO - Step 4780: lr=1.00E-05, loss= 1.1064 (max= 1.7322), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:39:40,769 - root - INFO - Step 4780: lr=1.00E-05, loss= 1.1064 (max= 1.7322), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:39:40,769 - root - INFO - Step 4780: lr=1.00E-05, loss= 1.1064 (max= 1.7322), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:39:40,769 - root - INFO - Step 4780: lr=1.00E-05, loss= 1.1064 (max= 1.7322), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:39:40,770 - root - INFO - Step 4780: lr=1.00E-05, loss= 1.1064 (max= 1.7322), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:39:40,770 - root - INFO - Step 4780: lr=1.00E-05, loss= 1.1064 (max= 1.7322), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:39:40,770 - root - INFO - Step 4780: lr=1.00E-05, loss= 1.1064 (max= 1.7322), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:39:40,770 - root - INFO - Step 4780: lr=1.00E-05, loss= 1.1064 (max= 1.7322), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 14:40:12,656 - root - INFO - Step 4790: lr=1.00E-05, loss= 1.1475 (max= 1.7633), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:40:12,656 - root - INFO - Step 4790: lr=1.00E-05, loss= 1.1475 (max= 1.7633), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:40:12,656 - root - INFO - Step 4790: lr=1.00E-05, loss= 1.1475 (max= 1.7633), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:40:12,656 - root - INFO - Step 4790: lr=1.00E-05, loss= 1.1475 (max= 1.7633), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:40:12,656 - root - INFO - Step 4790: lr=1.00E-05, loss= 1.1475 (max= 1.7633), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:40:12,656 - root - INFO - Step 4790: lr=1.00E-05, loss= 1.1475 (max= 1.7633), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:40:12,656 - root - INFO - Step 4790: lr=1.00E-05, loss= 1.1475 (max= 1.7633), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:40:12,656 - root - INFO - Step 4790: lr=1.00E-05, loss= 1.1475 (max= 1.7633), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:40:44,448 - root - INFO - Step 4800: lr=1.00E-05, loss= 1.1460 (max= 1.6562), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:40:44,448 - root - INFO - Step 4800: lr=1.00E-05, loss= 1.1460 (max= 1.6562), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:40:44,448 - root - INFO - Step 4800: lr=1.00E-05, loss= 1.1460 (max= 1.6562), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:40:44,448 - root - INFO - Step 4800: lr=1.00E-05, loss= 1.1460 (max= 1.6562), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:40:44,448 - root - INFO - Step 4800: lr=1.00E-05, loss= 1.1460 (max= 1.6562), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:40:44,448 - root - INFO - Step 4800: lr=1.00E-05, loss= 1.1460 (max= 1.6562), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:40:44,448 - root - INFO - Step 4800: lr=1.00E-05, loss= 1.1460 (max= 1.6562), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:40:44,448 - root - INFO - Step 4800: lr=1.00E-05, loss= 1.1460 (max= 1.6562), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:41:16,311 - root - INFO - Step 4810: lr=1.00E-05, loss= 1.1411 (max= 1.5256), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:41:16,311 - root - INFO - Step 4810: lr=1.00E-05, loss= 1.1411 (max= 1.5256), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:41:16,311 - root - INFO - Step 4810: lr=1.00E-05, loss= 1.1411 (max= 1.5256), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:41:16,311 - root - INFO - Step 4810: lr=1.00E-05, loss= 1.1411 (max= 1.5256), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:41:16,311 - root - INFO - Step 4810: lr=1.00E-05, loss= 1.1411 (max= 1.5256), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:41:16,312 - root - INFO - Step 4810: lr=1.00E-05, loss= 1.1411 (max= 1.5256), tps=20570, 
mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:41:16,312 - root - INFO - Step 4810: lr=1.00E-05, loss= 1.1411 (max= 1.5256), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:41:16,312 - root - INFO - Step 4810: lr=1.00E-05, loss= 1.1411 (max= 1.5256), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:41:48,242 - root - INFO - Step 4820: lr=1.00E-05, loss= 1.1563 (max= 1.7830), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:41:48,242 - root - INFO - Step 4820: lr=1.00E-05, loss= 1.1563 (max= 1.7830), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:41:48,242 - root - INFO - Step 4820: lr=1.00E-05, loss= 1.1563 (max= 1.7830), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:41:48,242 - root - INFO - Step 4820: lr=1.00E-05, loss= 1.1563 (max= 1.7830), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:41:48,242 - root - INFO - Step 4820: lr=1.00E-05, loss= 1.1563 (max= 1.7830), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:41:48,242 - root - INFO - Step 4820: lr=1.00E-05, loss= 1.1563 (max= 1.7830), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:41:48,242 - root - INFO - Step 4820: lr=1.00E-05, loss= 1.1563 (max= 1.7830), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:41:48,242 - root - INFO - Step 4820: lr=1.00E-05, loss= 1.1563 (max= 1.7830), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:42:20,118 - root - INFO - Step 4830: lr=1.00E-05, 
loss= 1.1264 (max= 1.6158), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:42:20,118 - root - INFO - Step 4830: lr=1.00E-05, loss= 1.1264 (max= 1.6158), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:42:20,118 - root - INFO - Step 4830: lr=1.00E-05, loss= 1.1264 (max= 1.6158), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:42:20,118 - root - INFO - Step 4830: lr=1.00E-05, loss= 1.1264 (max= 1.6158), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:42:20,118 - root - INFO - Step 4830: lr=1.00E-05, loss= 1.1264 (max= 1.6158), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:42:20,119 - root - INFO - Step 4830: lr=1.00E-05, loss= 1.1264 (max= 1.6158), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:42:20,119 - root - INFO - Step 4830: lr=1.00E-05, loss= 1.1264 (max= 1.6158), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:42:20,119 - root - INFO - Step 4830: lr=1.00E-05, loss= 1.1264 (max= 1.6158), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:42:51,988 - root - INFO - Step 4840: lr=1.00E-05, loss= 1.1185 (max= 1.8180), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:42:51,988 - root - INFO - Step 4840: lr=1.00E-05, loss= 1.1185 (max= 1.8180), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:42:51,988 - root - INFO - Step 4840: lr=1.00E-05, loss= 1.1185 (max= 1.8180), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:42:51,988 - 
root - INFO - Step 4840: lr=1.00E-05, loss= 1.1185 (max= 1.8180), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:42:51,988 - root - INFO - Step 4840: lr=1.00E-05, loss= 1.1185 (max= 1.8180), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:42:51,989 - root - INFO - Step 4840: lr=1.00E-05, loss= 1.1185 (max= 1.8180), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:42:51,989 - root - INFO - Step 4840: lr=1.00E-05, loss= 1.1185 (max= 1.8180), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:42:51,989 - root - INFO - Step 4840: lr=1.00E-05, loss= 1.1185 (max= 1.8180), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:43:23,803 - root - INFO - Step 4850: lr=1.00E-05, loss= 1.1470 (max= 1.5279), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:43:23,804 - root - INFO - Step 4850: lr=1.00E-05, loss= 1.1470 (max= 1.5279), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:43:23,804 - root - INFO - Step 4850: lr=1.00E-05, loss= 1.1470 (max= 1.5279), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:43:23,804 - root - INFO - Step 4850: lr=1.00E-05, loss= 1.1470 (max= 1.5279), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:43:23,804 - root - INFO - Step 4850: lr=1.00E-05, loss= 1.1470 (max= 1.5279), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:43:23,804 - root - INFO - Step 4850: lr=1.00E-05, loss= 1.1470 (max= 1.5279), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 14:43:23,804 - root - INFO - Step 4850: lr=1.00E-05, loss= 1.1470 (max= 1.5279), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:43:23,804 - root - INFO - Step 4850: lr=1.00E-05, loss= 1.1470 (max= 1.5279), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:43:55,668 - root - INFO - Step 4860: lr=1.00E-05, loss= 1.1245 (max= 1.5717), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:43:55,668 - root - INFO - Step 4860: lr=1.00E-05, loss= 1.1245 (max= 1.5717), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:43:55,668 - root - INFO - Step 4860: lr=1.00E-05, loss= 1.1245 (max= 1.5717), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:43:55,668 - root - INFO - Step 4860: lr=1.00E-05, loss= 1.1245 (max= 1.5717), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:43:55,668 - root - INFO - Step 4860: lr=1.00E-05, loss= 1.1245 (max= 1.5717), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:43:55,668 - root - INFO - Step 4860: lr=1.00E-05, loss= 1.1245 (max= 1.5717), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:43:55,668 - root - INFO - Step 4860: lr=1.00E-05, loss= 1.1245 (max= 1.5717), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:43:55,668 - root - INFO - Step 4860: lr=1.00E-05, loss= 1.1245 (max= 1.5717), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:44:27,522 - root - INFO - Step 4870: lr=1.00E-05, loss= 1.1336 (max= 1.4881), tps=20576, mfu=42.87%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:44:27,523 - root - INFO - Step 4870: lr=1.00E-05, loss= 1.1336 (max= 1.4881), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:44:27,523 - root - INFO - Step 4870: lr=1.00E-05, loss= 1.1336 (max= 1.4881), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:44:27,523 - root - INFO - Step 4870: lr=1.00E-05, loss= 1.1336 (max= 1.4881), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:44:27,523 - root - INFO - Step 4870: lr=1.00E-05, loss= 1.1336 (max= 1.4881), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:44:27,523 - root - INFO - Step 4870: lr=1.00E-05, loss= 1.1336 (max= 1.4881), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:44:27,523 - root - INFO - Step 4870: lr=1.00E-05, loss= 1.1336 (max= 1.4881), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:44:27,523 - root - INFO - Step 4870: lr=1.00E-05, loss= 1.1336 (max= 1.4881), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:44:59,348 - root - INFO - Step 4880: lr=1.00E-05, loss= 1.1439 (max= 1.5868), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:44:59,348 - root - INFO - Step 4880: lr=1.00E-05, loss= 1.1439 (max= 1.5868), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:44:59,348 - root - INFO - Step 4880: lr=1.00E-05, loss= 1.1439 (max= 1.5868), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:44:59,348 - root - INFO - Step 4880: lr=1.00E-05, loss= 1.1439 (max= 
1.5868), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:44:59,348 - root - INFO - Step 4880: lr=1.00E-05, loss= 1.1439 (max= 1.5868), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:44:59,348 - root - INFO - Step 4880: lr=1.00E-05, loss= 1.1439 (max= 1.5868), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:44:59,348 - root - INFO - Step 4880: lr=1.00E-05, loss= 1.1439 (max= 1.5868), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:44:59,348 - root - INFO - Step 4880: lr=1.00E-05, loss= 1.1439 (max= 1.5868), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:45:31,183 - root - INFO - Step 4890: lr=1.00E-05, loss= 1.1311 (max= 1.6156), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:45:31,183 - root - INFO - Step 4890: lr=1.00E-05, loss= 1.1311 (max= 1.6156), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:45:31,183 - root - INFO - Step 4890: lr=1.00E-05, loss= 1.1311 (max= 1.6156), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:45:31,183 - root - INFO - Step 4890: lr=1.00E-05, loss= 1.1311 (max= 1.6156), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:45:31,183 - root - INFO - Step 4890: lr=1.00E-05, loss= 1.1311 (max= 1.6156), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:45:31,183 - root - INFO - Step 4890: lr=1.00E-05, loss= 1.1311 (max= 1.6156), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:45:31,183 - root - INFO - Step 
4890: lr=1.00E-05, loss= 1.1311 (max= 1.6156), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:45:31,183 - root - INFO - Step 4890: lr=1.00E-05, loss= 1.1311 (max= 1.6156), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:46:03,040 - root - INFO - Step 4900: lr=1.00E-05, loss= 1.1312 (max= 1.4852), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:46:03,040 - root - INFO - Step 4900: lr=1.00E-05, loss= 1.1312 (max= 1.4852), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:46:03,040 - root - INFO - Step 4900: lr=1.00E-05, loss= 1.1312 (max= 1.4852), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:46:03,040 - root - INFO - Step 4900: lr=1.00E-05, loss= 1.1312 (max= 1.4852), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:46:03,040 - root - INFO - Step 4900: lr=1.00E-05, loss= 1.1312 (max= 1.4852), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:46:03,040 - root - INFO - Step 4900: lr=1.00E-05, loss= 1.1312 (max= 1.4852), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:46:03,040 - root - INFO - Step 4900: lr=1.00E-05, loss= 1.1312 (max= 1.4852), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:46:03,040 - root - INFO - Step 4900: lr=1.00E-05, loss= 1.1312 (max= 1.4852), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:46:34,892 - root - INFO - Step 4910: lr=1.00E-05, loss= 1.1620 (max= 1.5473), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 14:46:34,892 - root - INFO - Step 4910: lr=1.00E-05, loss= 1.1620 (max= 1.5473), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:46:34,892 - root - INFO - Step 4910: lr=1.00E-05, loss= 1.1620 (max= 1.5473), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:46:34,892 - root - INFO - Step 4910: lr=1.00E-05, loss= 1.1620 (max= 1.5473), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:46:34,892 - root - INFO - Step 4910: lr=1.00E-05, loss= 1.1620 (max= 1.5473), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:46:34,892 - root - INFO - Step 4910: lr=1.00E-05, loss= 1.1620 (max= 1.5473), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:46:34,892 - root - INFO - Step 4910: lr=1.00E-05, loss= 1.1620 (max= 1.5473), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:46:34,892 - root - INFO - Step 4910: lr=1.00E-05, loss= 1.1620 (max= 1.5473), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:47:06,821 - root - INFO - Step 4920: lr=1.00E-05, loss= 1.1343 (max= 1.5891), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:47:06,821 - root - INFO - Step 4920: lr=1.00E-05, loss= 1.1343 (max= 1.5891), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:47:06,821 - root - INFO - Step 4920: lr=1.00E-05, loss= 1.1343 (max= 1.5891), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:47:06,821 - root - INFO - Step 4920: lr=1.00E-05, loss= 1.1343 (max= 1.5891), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:47:06,821 - root - INFO - Step 4920: lr=1.00E-05, loss= 1.1343 (max= 1.5891), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:47:06,821 - root - INFO - Step 4920: lr=1.00E-05, loss= 1.1343 (max= 1.5891), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:47:06,821 - root - INFO - Step 4920: lr=1.00E-05, loss= 1.1343 (max= 1.5891), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:47:06,821 - root - INFO - Step 4920: lr=1.00E-05, loss= 1.1343 (max= 1.5891), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:47:38,668 - root - INFO - Step 4930: lr=1.00E-05, loss= 1.1373 (max= 1.6219), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:47:38,668 - root - INFO - Step 4930: lr=1.00E-05, loss= 1.1373 (max= 1.6219), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:47:38,668 - root - INFO - Step 4930: lr=1.00E-05, loss= 1.1373 (max= 1.6219), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:47:38,668 - root - INFO - Step 4930: lr=1.00E-05, loss= 1.1373 (max= 1.6219), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:47:38,668 - root - INFO - Step 4930: lr=1.00E-05, loss= 1.1373 (max= 1.6219), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:47:38,668 - root - INFO - Step 4930: lr=1.00E-05, loss= 1.1373 (max= 1.6219), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:47:38,668 - root - INFO - Step 4930: lr=1.00E-05, loss= 1.1373 (max= 1.6219), tps=20581, 
mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:47:38,668 - root - INFO - Step 4930: lr=1.00E-05, loss= 1.1373 (max= 1.6219), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:48:10,601 - root - INFO - Step 4940: lr=1.00E-05, loss= 1.1484 (max= 1.8253), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:48:10,602 - root - INFO - Step 4940: lr=1.00E-05, loss= 1.1484 (max= 1.8253), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:48:10,602 - root - INFO - Step 4940: lr=1.00E-05, loss= 1.1484 (max= 1.8253), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:48:10,602 - root - INFO - Step 4940: lr=1.00E-05, loss= 1.1484 (max= 1.8253), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:48:10,602 - root - INFO - Step 4940: lr=1.00E-05, loss= 1.1484 (max= 1.8253), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:48:10,602 - root - INFO - Step 4940: lr=1.00E-05, loss= 1.1484 (max= 1.8253), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:48:10,602 - root - INFO - Step 4940: lr=1.00E-05, loss= 1.1484 (max= 1.8253), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:48:10,602 - root - INFO - Step 4940: lr=1.00E-05, loss= 1.1484 (max= 1.8253), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:48:42,510 - root - INFO - Step 4950: lr=1.00E-05, loss= 1.1281 (max= 1.5809), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:48:42,510 - root - INFO - Step 4950: lr=1.00E-05, 
loss= 1.1281 (max= 1.5809), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:48:42,510 - root - INFO - Step 4950: lr=1.00E-05, loss= 1.1281 (max= 1.5809), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:48:42,510 - root - INFO - Step 4950: lr=1.00E-05, loss= 1.1281 (max= 1.5809), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:48:42,510 - root - INFO - Step 4950: lr=1.00E-05, loss= 1.1281 (max= 1.5809), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:48:42,510 - root - INFO - Step 4950: lr=1.00E-05, loss= 1.1281 (max= 1.5809), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:48:42,510 - root - INFO - Step 4950: lr=1.00E-05, loss= 1.1281 (max= 1.5809), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:48:42,510 - root - INFO - Step 4950: lr=1.00E-05, loss= 1.1281 (max= 1.5809), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:49:14,397 - root - INFO - Step 4960: lr=1.00E-05, loss= 1.1512 (max= 1.5156), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:49:14,397 - root - INFO - Step 4960: lr=1.00E-05, loss= 1.1512 (max= 1.5156), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:49:14,397 - root - INFO - Step 4960: lr=1.00E-05, loss= 1.1512 (max= 1.5156), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:49:14,397 - root - INFO - Step 4960: lr=1.00E-05, loss= 1.1512 (max= 1.5156), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:49:14,397 - 
root - INFO - Step 4960: lr=1.00E-05, loss= 1.1512 (max= 1.5156), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:49:14,397 - root - INFO - Step 4960: lr=1.00E-05, loss= 1.1512 (max= 1.5156), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:49:14,397 - root - INFO - Step 4960: lr=1.00E-05, loss= 1.1512 (max= 1.5156), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:49:14,397 - root - INFO - Step 4960: lr=1.00E-05, loss= 1.1512 (max= 1.5156), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:49:46,266 - root - INFO - Step 4970: lr=1.00E-05, loss= 1.1611 (max= 1.7091), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:49:46,266 - root - INFO - Step 4970: lr=1.00E-05, loss= 1.1611 (max= 1.7091), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:49:46,266 - root - INFO - Step 4970: lr=1.00E-05, loss= 1.1611 (max= 1.7091), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:49:46,266 - root - INFO - Step 4970: lr=1.00E-05, loss= 1.1611 (max= 1.7091), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:49:46,266 - root - INFO - Step 4970: lr=1.00E-05, loss= 1.1611 (max= 1.7091), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:49:46,266 - root - INFO - Step 4970: lr=1.00E-05, loss= 1.1611 (max= 1.7091), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:49:46,266 - root - INFO - Step 4970: lr=1.00E-05, loss= 1.1611 (max= 1.7091), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 14:49:46,266 - root - INFO - Step 4970: lr=1.00E-05, loss= 1.1611 (max= 1.7091), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:50:18,070 - root - INFO - Step 4980: lr=1.00E-05, loss= 1.1606 (max= 1.5809), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:50:18,070 - root - INFO - Step 4980: lr=1.00E-05, loss= 1.1606 (max= 1.5809), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:50:18,070 - root - INFO - Step 4980: lr=1.00E-05, loss= 1.1606 (max= 1.5809), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:50:18,070 - root - INFO - Step 4980: lr=1.00E-05, loss= 1.1606 (max= 1.5809), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:50:18,070 - root - INFO - Step 4980: lr=1.00E-05, loss= 1.1606 (max= 1.5809), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:50:18,070 - root - INFO - Step 4980: lr=1.00E-05, loss= 1.1606 (max= 1.5809), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:50:18,070 - root - INFO - Step 4980: lr=1.00E-05, loss= 1.1606 (max= 1.5809), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:50:18,070 - root - INFO - Step 4980: lr=1.00E-05, loss= 1.1606 (max= 1.5809), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:50:49,907 - root - INFO - Step 4990: lr=1.00E-05, loss= 1.1635 (max= 1.6138), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:50:49,907 - root - INFO - Step 4990: lr=1.00E-05, loss= 1.1635 (max= 1.6138), tps=20587, mfu=42.89%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:50:49,907 - root - INFO - Step 4990: lr=1.00E-05, loss= 1.1635 (max= 1.6138), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:50:49,907 - root - INFO - Step 4990: lr=1.00E-05, loss= 1.1635 (max= 1.6138), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:50:49,907 - root - INFO - Step 4990: lr=1.00E-05, loss= 1.1635 (max= 1.6138), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:50:49,907 - root - INFO - Step 4990: lr=1.00E-05, loss= 1.1635 (max= 1.6138), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:50:49,907 - root - INFO - Step 4990: lr=1.00E-05, loss= 1.1635 (max= 1.6138), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:50:49,907 - root - INFO - Step 4990: lr=1.00E-05, loss= 1.1635 (max= 1.6138), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-pt/checkpoints/dataloader/step-5000 +Dataset successfully saved to jobs/munin-7b-open-pt/checkpoints/dataloader/step-5000! 
Save time: 4.479858875274658 +2025-10-25 14:51:21,854 - root - INFO - Step 5000: lr=1.00E-05, loss= 1.1551 (max= 1.5560), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:51:21,854 - root - INFO - Step 5000: lr=1.00E-05, loss= 1.1551 (max= 1.5560), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:51:21,854 - root - INFO - Saving a full checkpoint at step 5000 +2025-10-25 14:51:21,854 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 14:51:21,854 - root - INFO - Saving a full checkpoint at step 5000 +2025-10-25 14:51:21,854 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 14:51:21,855 - root - INFO - Step 5000: lr=1.00E-05, loss= 1.1551 (max= 1.5560), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:51:21,855 - root - INFO - Step 5000: lr=1.00E-05, loss= 1.1551 (max= 1.5560), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:51:21,855 - root - INFO - Step 5000: lr=1.00E-05, loss= 1.1551 (max= 1.5560), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:51:21,855 - root - INFO - Saving a full checkpoint at step 5000 +2025-10-25 14:51:21,855 - root - INFO - Saving a full checkpoint at step 5000 +2025-10-25 14:51:21,855 - root - INFO - Step 5000: lr=1.00E-05, loss= 1.1551 (max= 1.5560), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:51:21,855 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 14:51:21,855 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 
'optimizer', 'lr_scheduler']) +2025-10-25 14:51:21,855 - root - INFO - Saving a full checkpoint at step 5000 +2025-10-25 14:51:21,855 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 14:51:21,855 - root - INFO - Saving a full checkpoint at step 5000 +2025-10-25 14:51:21,855 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 14:51:21,855 - root - INFO - Step 5000: lr=1.00E-05, loss= 1.1551 (max= 1.5560), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:51:21,855 - root - INFO - Saving a full checkpoint at step 5000 +2025-10-25 14:51:21,855 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 14:51:21,855 - root - INFO - Step 5000: lr=1.00E-05, loss= 1.1551 (max= 1.5560), tps=20516, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:51:21,855 - root - INFO - Saving a full checkpoint at step 5000 +2025-10-25 14:51:21,855 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 14:51:36,890 - root - INFO - Finished saving the checkpoint in 15.03 seconds +2025-10-25 14:51:36,898 - root - INFO - Finished saving the checkpoint in 15.04 seconds +2025-10-25 14:51:36,898 - root - INFO - Finished saving the checkpoint in 15.04 seconds +2025-10-25 14:51:36,898 - root - INFO - Finished saving the checkpoint in 15.04 seconds +2025-10-25 14:51:36,899 - root - INFO - Finished saving the checkpoint in 15.04 seconds +2025-10-25 14:51:36,899 - root - INFO - Finished saving the checkpoint in 15.04 seconds +2025-10-25 14:51:36,899 - root - INFO - Finished saving the checkpoint in 15.04 seconds +2025-10-25 14:51:36,899 - root - INFO - Finished saving the checkpoint in 15.04 seconds 
+2025-10-25 14:52:08,759 - root - INFO - Step 5010: lr=1.00E-05, loss= 1.1534 (max= 1.6897), tps=13973, mfu=29.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:52:08,760 - root - INFO - Step 5010: lr=1.00E-05, loss= 1.1534 (max= 1.6897), tps=13973, mfu=29.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:52:08,760 - root - INFO - Step 5010: lr=1.00E-05, loss= 1.1534 (max= 1.6897), tps=13973, mfu=29.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:52:08,760 - root - INFO - Step 5010: lr=1.00E-05, loss= 1.1534 (max= 1.6897), tps=13973, mfu=29.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:52:08,760 - root - INFO - Step 5010: lr=1.00E-05, loss= 1.1534 (max= 1.6897), tps=13973, mfu=29.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:52:08,760 - root - INFO - Step 5010: lr=1.00E-05, loss= 1.1534 (max= 1.6897), tps=13973, mfu=29.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:52:08,760 - root - INFO - Step 5010: lr=1.00E-05, loss= 1.1534 (max= 1.6897), tps=13973, mfu=29.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:52:08,760 - root - INFO - Step 5010: lr=1.00E-05, loss= 1.1534 (max= 1.6897), tps=13973, mfu=29.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:52:40,638 - root - INFO - Step 5020: lr=1.00E-05, loss= 1.1570 (max= 1.7244), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:52:40,638 - root - INFO - Step 5020: lr=1.00E-05, loss= 1.1570 (max= 1.7244), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:52:40,638 - root - INFO - Step 5020: lr=1.00E-05, loss= 1.1570 (max= 1.7244), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:52:40,638 - root - INFO - Step 5020: lr=1.00E-05, loss= 1.1570 (max= 1.7244), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:52:40,638 - root - INFO - Step 5020: lr=1.00E-05, loss= 1.1570 (max= 1.7244), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:52:40,638 - root - INFO - Step 5020: lr=1.00E-05, loss= 1.1570 (max= 1.7244), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:52:40,638 - root - INFO - Step 5020: lr=1.00E-05, loss= 1.1570 (max= 1.7244), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:52:40,638 - root - INFO - Step 5020: lr=1.00E-05, loss= 1.1570 (max= 1.7244), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:53:12,516 - root - INFO - Step 5030: lr=1.00E-05, loss= 1.1257 (max= 1.6245), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:53:12,516 - root - INFO - Step 5030: lr=1.00E-05, loss= 1.1257 (max= 1.6245), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:53:12,516 - root - INFO - Step 5030: lr=1.00E-05, loss= 1.1257 (max= 1.6245), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:53:12,516 - root - INFO - Step 5030: lr=1.00E-05, loss= 1.1257 (max= 1.6245), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:53:12,516 - root - INFO - Step 5030: lr=1.00E-05, loss= 1.1257 (max= 1.6245), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:53:12,516 - root - INFO - Step 5030: lr=1.00E-05, loss= 1.1257 (max= 1.6245), tps=20560, 
mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:53:12,516 - root - INFO - Step 5030: lr=1.00E-05, loss= 1.1257 (max= 1.6245), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:53:12,516 - root - INFO - Step 5030: lr=1.00E-05, loss= 1.1257 (max= 1.6245), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:53:44,350 - root - INFO - Step 5040: lr=1.00E-05, loss= 1.1608 (max= 1.5388), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:53:44,350 - root - INFO - Step 5040: lr=1.00E-05, loss= 1.1608 (max= 1.5388), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:53:44,350 - root - INFO - Step 5040: lr=1.00E-05, loss= 1.1608 (max= 1.5388), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:53:44,350 - root - INFO - Step 5040: lr=1.00E-05, loss= 1.1608 (max= 1.5388), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:53:44,350 - root - INFO - Step 5040: lr=1.00E-05, loss= 1.1608 (max= 1.5388), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:53:44,350 - root - INFO - Step 5040: lr=1.00E-05, loss= 1.1608 (max= 1.5388), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:53:44,350 - root - INFO - Step 5040: lr=1.00E-05, loss= 1.1608 (max= 1.5388), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:53:44,350 - root - INFO - Step 5040: lr=1.00E-05, loss= 1.1608 (max= 1.5388), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:54:16,183 - root - INFO - Step 5050: lr=1.00E-05, 
loss= 1.1630 (max= 1.5775), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:54:16,183 - root - INFO - Step 5050: lr=1.00E-05, loss= 1.1630 (max= 1.5775), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:54:16,183 - root - INFO - Step 5050: lr=1.00E-05, loss= 1.1630 (max= 1.5775), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:54:16,184 - root - INFO - Step 5050: lr=1.00E-05, loss= 1.1630 (max= 1.5775), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:54:16,184 - root - INFO - Step 5050: lr=1.00E-05, loss= 1.1630 (max= 1.5775), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:54:16,184 - root - INFO - Step 5050: lr=1.00E-05, loss= 1.1630 (max= 1.5775), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:54:16,184 - root - INFO - Step 5050: lr=1.00E-05, loss= 1.1630 (max= 1.5775), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:54:16,184 - root - INFO - Step 5050: lr=1.00E-05, loss= 1.1630 (max= 1.5775), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:54:48,021 - root - INFO - Step 5060: lr=1.00E-05, loss= 1.1492 (max= 1.6991), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:54:48,021 - root - INFO - Step 5060: lr=1.00E-05, loss= 1.1492 (max= 1.6991), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:54:48,021 - root - INFO - Step 5060: lr=1.00E-05, loss= 1.1492 (max= 1.6991), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:54:48,021 - 
root - INFO - Step 5060: lr=1.00E-05, loss= 1.1492 (max= 1.6991), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:54:48,021 - root - INFO - Step 5060: lr=1.00E-05, loss= 1.1492 (max= 1.6991), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:54:48,021 - root - INFO - Step 5060: lr=1.00E-05, loss= 1.1492 (max= 1.6991), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:54:48,021 - root - INFO - Step 5060: lr=1.00E-05, loss= 1.1492 (max= 1.6991), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:54:48,021 - root - INFO - Step 5060: lr=1.00E-05, loss= 1.1492 (max= 1.6991), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:55:19,863 - root - INFO - Step 5070: lr=1.00E-05, loss= 1.1498 (max= 1.5785), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:55:19,863 - root - INFO - Step 5070: lr=1.00E-05, loss= 1.1498 (max= 1.5785), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:55:19,863 - root - INFO - Step 5070: lr=1.00E-05, loss= 1.1498 (max= 1.5785), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:55:19,863 - root - INFO - Step 5070: lr=1.00E-05, loss= 1.1498 (max= 1.5785), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:55:19,863 - root - INFO - Step 5070: lr=1.00E-05, loss= 1.1498 (max= 1.5785), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:55:19,863 - root - INFO - Step 5070: lr=1.00E-05, loss= 1.1498 (max= 1.5785), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 14:55:19,863 - root - INFO - Step 5070: lr=1.00E-05, loss= 1.1498 (max= 1.5785), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:55:19,864 - root - INFO - Step 5070: lr=1.00E-05, loss= 1.1498 (max= 1.5785), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:55:51,653 - root - INFO - Step 5080: lr=1.00E-05, loss= 1.1478 (max= 1.6059), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:55:51,653 - root - INFO - Step 5080: lr=1.00E-05, loss= 1.1478 (max= 1.6059), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:55:51,653 - root - INFO - Step 5080: lr=1.00E-05, loss= 1.1478 (max= 1.6059), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:55:51,653 - root - INFO - Step 5080: lr=1.00E-05, loss= 1.1478 (max= 1.6059), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:55:51,653 - root - INFO - Step 5080: lr=1.00E-05, loss= 1.1478 (max= 1.6059), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:55:51,653 - root - INFO - Step 5080: lr=1.00E-05, loss= 1.1478 (max= 1.6059), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:55:51,653 - root - INFO - Step 5080: lr=1.00E-05, loss= 1.1478 (max= 1.6059), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:55:51,653 - root - INFO - Step 5080: lr=1.00E-05, loss= 1.1478 (max= 1.6059), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:56:23,516 - root - INFO - Step 5090: lr=1.00E-05, loss= 1.1349 (max= 1.5797), tps=20570, mfu=42.86%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:56:23,516 - root - INFO - Step 5090: lr=1.00E-05, loss= 1.1349 (max= 1.5797), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:56:23,516 - root - INFO - Step 5090: lr=1.00E-05, loss= 1.1349 (max= 1.5797), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:56:23,516 - root - INFO - Step 5090: lr=1.00E-05, loss= 1.1349 (max= 1.5797), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:56:23,516 - root - INFO - Step 5090: lr=1.00E-05, loss= 1.1349 (max= 1.5797), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:56:23,516 - root - INFO - Step 5090: lr=1.00E-05, loss= 1.1349 (max= 1.5797), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:56:23,516 - root - INFO - Step 5090: lr=1.00E-05, loss= 1.1349 (max= 1.5797), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:56:23,516 - root - INFO - Step 5090: lr=1.00E-05, loss= 1.1349 (max= 1.5797), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:56:55,407 - root - INFO - Step 5100: lr=1.00E-05, loss= 1.1571 (max= 1.6031), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:56:55,407 - root - INFO - Step 5100: lr=1.00E-05, loss= 1.1571 (max= 1.6031), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:56:55,407 - root - INFO - Step 5100: lr=1.00E-05, loss= 1.1571 (max= 1.6031), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:56:55,407 - root - INFO - Step 5100: lr=1.00E-05, loss= 1.1571 (max= 
1.6031), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:56:55,407 - root - INFO - Step 5100: lr=1.00E-05, loss= 1.1571 (max= 1.6031), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:56:55,407 - root - INFO - Step 5100: lr=1.00E-05, loss= 1.1571 (max= 1.6031), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:56:55,407 - root - INFO - Step 5100: lr=1.00E-05, loss= 1.1571 (max= 1.6031), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:56:55,407 - root - INFO - Step 5100: lr=1.00E-05, loss= 1.1571 (max= 1.6031), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:57:27,305 - root - INFO - Step 5110: lr=1.00E-05, loss= 1.1341 (max= 1.5914), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:57:27,305 - root - INFO - Step 5110: lr=1.00E-05, loss= 1.1341 (max= 1.5914), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:57:27,305 - root - INFO - Step 5110: lr=1.00E-05, loss= 1.1341 (max= 1.5914), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:57:27,305 - root - INFO - Step 5110: lr=1.00E-05, loss= 1.1341 (max= 1.5914), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:57:27,305 - root - INFO - Step 5110: lr=1.00E-05, loss= 1.1341 (max= 1.5914), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:57:27,305 - root - INFO - Step 5110: lr=1.00E-05, loss= 1.1341 (max= 1.5914), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:57:27,305 - root - INFO - Step 
5110: lr=1.00E-05, loss= 1.1341 (max= 1.5914), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:57:27,305 - root - INFO - Step 5110: lr=1.00E-05, loss= 1.1341 (max= 1.5914), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:57:59,132 - root - INFO - Step 5120: lr=1.00E-05, loss= 1.1407 (max= 1.5201), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:57:59,132 - root - INFO - Step 5120: lr=1.00E-05, loss= 1.1407 (max= 1.5201), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:57:59,132 - root - INFO - Step 5120: lr=1.00E-05, loss= 1.1407 (max= 1.5201), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:57:59,132 - root - INFO - Step 5120: lr=1.00E-05, loss= 1.1407 (max= 1.5201), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:57:59,132 - root - INFO - Step 5120: lr=1.00E-05, loss= 1.1407 (max= 1.5201), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:57:59,132 - root - INFO - Step 5120: lr=1.00E-05, loss= 1.1407 (max= 1.5201), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:57:59,132 - root - INFO - Step 5120: lr=1.00E-05, loss= 1.1407 (max= 1.5201), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:57:59,132 - root - INFO - Step 5120: lr=1.00E-05, loss= 1.1407 (max= 1.5201), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:58:31,062 - root - INFO - Step 5130: lr=1.00E-05, loss= 1.1178 (max= 1.8409), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 14:58:31,063 - root - INFO - Step 5130: lr=1.00E-05, loss= 1.1178 (max= 1.8409), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:58:31,063 - root - INFO - Step 5130: lr=1.00E-05, loss= 1.1178 (max= 1.8409), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:58:31,063 - root - INFO - Step 5130: lr=1.00E-05, loss= 1.1178 (max= 1.8409), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:58:31,063 - root - INFO - Step 5130: lr=1.00E-05, loss= 1.1178 (max= 1.8409), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:58:31,063 - root - INFO - Step 5130: lr=1.00E-05, loss= 1.1178 (max= 1.8409), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:58:31,063 - root - INFO - Step 5130: lr=1.00E-05, loss= 1.1178 (max= 1.8409), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:58:31,063 - root - INFO - Step 5130: lr=1.00E-05, loss= 1.1178 (max= 1.8409), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:59:02,954 - root - INFO - Step 5140: lr=1.00E-05, loss= 1.1684 (max= 1.6896), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:59:02,954 - root - INFO - Step 5140: lr=1.00E-05, loss= 1.1684 (max= 1.6896), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:59:02,954 - root - INFO - Step 5140: lr=1.00E-05, loss= 1.1684 (max= 1.6896), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:59:02,954 - root - INFO - Step 5140: lr=1.00E-05, loss= 1.1684 (max= 1.6896), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:59:02,954 - root - INFO - Step 5140: lr=1.00E-05, loss= 1.1684 (max= 1.6896), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:59:02,954 - root - INFO - Step 5140: lr=1.00E-05, loss= 1.1684 (max= 1.6896), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:59:02,954 - root - INFO - Step 5140: lr=1.00E-05, loss= 1.1684 (max= 1.6896), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:59:02,954 - root - INFO - Step 5140: lr=1.00E-05, loss= 1.1684 (max= 1.6896), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:59:34,894 - root - INFO - Step 5150: lr=1.00E-05, loss= 1.1626 (max= 1.7580), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:59:34,894 - root - INFO - Step 5150: lr=1.00E-05, loss= 1.1626 (max= 1.7580), tps=20521, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:59:34,894 - root - INFO - Step 5150: lr=1.00E-05, loss= 1.1626 (max= 1.7580), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:59:34,894 - root - INFO - Step 5150: lr=1.00E-05, loss= 1.1626 (max= 1.7580), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:59:34,894 - root - INFO - Step 5150: lr=1.00E-05, loss= 1.1626 (max= 1.7580), tps=20521, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:59:34,894 - root - INFO - Step 5150: lr=1.00E-05, loss= 1.1626 (max= 1.7580), tps=20521, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:59:34,894 - root - INFO - Step 5150: lr=1.00E-05, loss= 1.1626 (max= 1.7580), tps=20521, 
mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 14:59:34,894 - root - INFO - Step 5150: lr=1.00E-05, loss= 1.1626 (max= 1.7580), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:00:06,719 - root - INFO - Step 5160: lr=1.00E-05, loss= 1.1519 (max= 1.5736), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:00:06,719 - root - INFO - Step 5160: lr=1.00E-05, loss= 1.1519 (max= 1.5736), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:00:06,719 - root - INFO - Step 5160: lr=1.00E-05, loss= 1.1519 (max= 1.5736), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:00:06,719 - root - INFO - Step 5160: lr=1.00E-05, loss= 1.1519 (max= 1.5736), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:00:06,719 - root - INFO - Step 5160: lr=1.00E-05, loss= 1.1519 (max= 1.5736), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:00:06,719 - root - INFO - Step 5160: lr=1.00E-05, loss= 1.1519 (max= 1.5736), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:00:06,719 - root - INFO - Step 5160: lr=1.00E-05, loss= 1.1519 (max= 1.5736), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:00:06,719 - root - INFO - Step 5160: lr=1.00E-05, loss= 1.1519 (max= 1.5736), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:00:38,564 - root - INFO - Step 5170: lr=1.00E-05, loss= 1.1404 (max= 1.5668), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:00:38,564 - root - INFO - Step 5170: lr=1.00E-05, 
loss= 1.1404 (max= 1.5668), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:00:38,564 - root - INFO - Step 5170: lr=1.00E-05, loss= 1.1404 (max= 1.5668), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:00:38,564 - root - INFO - Step 5170: lr=1.00E-05, loss= 1.1404 (max= 1.5668), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:00:38,564 - root - INFO - Step 5170: lr=1.00E-05, loss= 1.1404 (max= 1.5668), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:00:38,564 - root - INFO - Step 5170: lr=1.00E-05, loss= 1.1404 (max= 1.5668), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:00:38,564 - root - INFO - Step 5170: lr=1.00E-05, loss= 1.1404 (max= 1.5668), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:00:38,564 - root - INFO - Step 5170: lr=1.00E-05, loss= 1.1404 (max= 1.5668), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:01:10,454 - root - INFO - Step 5180: lr=1.00E-05, loss= 1.1561 (max= 1.7778), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:01:10,455 - root - INFO - Step 5180: lr=1.00E-05, loss= 1.1561 (max= 1.7778), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:01:10,455 - root - INFO - Step 5180: lr=1.00E-05, loss= 1.1561 (max= 1.7778), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:01:10,455 - root - INFO - Step 5180: lr=1.00E-05, loss= 1.1561 (max= 1.7778), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:01:10,455 - 
root - INFO - Step 5180: lr=1.00E-05, loss= 1.1561 (max= 1.7778), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:01:10,455 - root - INFO - Step 5180: lr=1.00E-05, loss= 1.1561 (max= 1.7778), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:01:10,455 - root - INFO - Step 5180: lr=1.00E-05, loss= 1.1561 (max= 1.7778), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:01:10,455 - root - INFO - Step 5180: lr=1.00E-05, loss= 1.1561 (max= 1.7778), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:01:42,372 - root - INFO - Step 5190: lr=1.00E-05, loss= 1.1576 (max= 1.5782), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:01:42,372 - root - INFO - Step 5190: lr=1.00E-05, loss= 1.1576 (max= 1.5782), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:01:42,372 - root - INFO - Step 5190: lr=1.00E-05, loss= 1.1576 (max= 1.5782), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:01:42,372 - root - INFO - Step 5190: lr=1.00E-05, loss= 1.1576 (max= 1.5782), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:01:42,372 - root - INFO - Step 5190: lr=1.00E-05, loss= 1.1576 (max= 1.5782), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:01:42,372 - root - INFO - Step 5190: lr=1.00E-05, loss= 1.1576 (max= 1.5782), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:01:42,372 - root - INFO - Step 5190: lr=1.00E-05, loss= 1.1576 (max= 1.5782), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 15:01:42,372 - root - INFO - Step 5190: lr=1.00E-05, loss= 1.1576 (max= 1.5782), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:02:14,275 - root - INFO - Step 5200: lr=1.00E-05, loss= 1.1547 (max= 1.6566), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:02:14,275 - root - INFO - Step 5200: lr=1.00E-05, loss= 1.1547 (max= 1.6566), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:02:14,275 - root - INFO - Step 5200: lr=1.00E-05, loss= 1.1547 (max= 1.6566), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:02:14,275 - root - INFO - Step 5200: lr=1.00E-05, loss= 1.1547 (max= 1.6566), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:02:14,275 - root - INFO - Step 5200: lr=1.00E-05, loss= 1.1547 (max= 1.6566), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:02:14,275 - root - INFO - Step 5200: lr=1.00E-05, loss= 1.1547 (max= 1.6566), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:02:14,275 - root - INFO - Step 5200: lr=1.00E-05, loss= 1.1547 (max= 1.6566), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:02:14,275 - root - INFO - Step 5200: lr=1.00E-05, loss= 1.1547 (max= 1.6566), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:02:46,079 - root - INFO - Step 5210: lr=1.00E-05, loss= 1.1460 (max= 2.0945), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:02:46,079 - root - INFO - Step 5210: lr=1.00E-05, loss= 1.1460 (max= 2.0945), tps=20608, mfu=42.94%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:02:46,079 - root - INFO - Step 5210: lr=1.00E-05, loss= 1.1460 (max= 2.0945), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:02:46,079 - root - INFO - Step 5210: lr=1.00E-05, loss= 1.1460 (max= 2.0945), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:02:46,079 - root - INFO - Step 5210: lr=1.00E-05, loss= 1.1460 (max= 2.0945), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:02:46,079 - root - INFO - Step 5210: lr=1.00E-05, loss= 1.1460 (max= 2.0945), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:02:46,079 - root - INFO - Step 5210: lr=1.00E-05, loss= 1.1460 (max= 2.0945), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:02:46,079 - root - INFO - Step 5210: lr=1.00E-05, loss= 1.1460 (max= 2.0945), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:03:17,913 - root - INFO - Step 5220: lr=1.00E-05, loss= 1.1546 (max= 1.6146), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:03:17,913 - root - INFO - Step 5220: lr=1.00E-05, loss= 1.1546 (max= 1.6146), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:03:17,913 - root - INFO - Step 5220: lr=1.00E-05, loss= 1.1546 (max= 1.6146), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:03:17,913 - root - INFO - Step 5220: lr=1.00E-05, loss= 1.1546 (max= 1.6146), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:03:17,913 - root - INFO - Step 5220: lr=1.00E-05, loss= 1.1546 (max= 
1.6146), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:03:17,913 - root - INFO - Step 5220: lr=1.00E-05, loss= 1.1546 (max= 1.6146), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:03:17,913 - root - INFO - Step 5220: lr=1.00E-05, loss= 1.1546 (max= 1.6146), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:03:17,913 - root - INFO - Step 5220: lr=1.00E-05, loss= 1.1546 (max= 1.6146), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:03:49,843 - root - INFO - Step 5230: lr=1.00E-05, loss= 1.1440 (max= 1.7494), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:03:49,843 - root - INFO - Step 5230: lr=1.00E-05, loss= 1.1440 (max= 1.7494), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:03:49,843 - root - INFO - Step 5230: lr=1.00E-05, loss= 1.1440 (max= 1.7494), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:03:49,843 - root - INFO - Step 5230: lr=1.00E-05, loss= 1.1440 (max= 1.7494), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:03:49,843 - root - INFO - Step 5230: lr=1.00E-05, loss= 1.1440 (max= 1.7494), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:03:49,843 - root - INFO - Step 5230: lr=1.00E-05, loss= 1.1440 (max= 1.7494), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:03:49,843 - root - INFO - Step 5230: lr=1.00E-05, loss= 1.1440 (max= 1.7494), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:03:49,843 - root - INFO - Step 
5230: lr=1.00E-05, loss= 1.1440 (max= 1.7494), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:04:21,641 - root - INFO - Step 5240: lr=1.00E-05, loss= 1.1411 (max= 1.8632), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:04:21,641 - root - INFO - Step 5240: lr=1.00E-05, loss= 1.1411 (max= 1.8632), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:04:21,641 - root - INFO - Step 5240: lr=1.00E-05, loss= 1.1411 (max= 1.8632), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:04:21,641 - root - INFO - Step 5240: lr=1.00E-05, loss= 1.1411 (max= 1.8632), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:04:21,641 - root - INFO - Step 5240: lr=1.00E-05, loss= 1.1411 (max= 1.8632), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:04:21,641 - root - INFO - Step 5240: lr=1.00E-05, loss= 1.1411 (max= 1.8632), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:04:21,641 - root - INFO - Step 5240: lr=1.00E-05, loss= 1.1411 (max= 1.8632), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:04:21,641 - root - INFO - Step 5240: lr=1.00E-05, loss= 1.1411 (max= 1.8632), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:04:53,557 - root - INFO - Step 5250: lr=1.00E-05, loss= 1.1578 (max= 1.6894), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:04:53,557 - root - INFO - Step 5250: lr=1.00E-05, loss= 1.1578 (max= 1.6894), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 15:04:53,557 - root - INFO - Step 5250: lr=1.00E-05, loss= 1.1578 (max= 1.6894), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:04:53,557 - root - INFO - Step 5250: lr=1.00E-05, loss= 1.1578 (max= 1.6894), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:04:53,557 - root - INFO - Step 5250: lr=1.00E-05, loss= 1.1578 (max= 1.6894), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:04:53,557 - root - INFO - Step 5250: lr=1.00E-05, loss= 1.1578 (max= 1.6894), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:04:53,558 - root - INFO - Step 5250: lr=1.00E-05, loss= 1.1578 (max= 1.6894), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:04:53,558 - root - INFO - Step 5250: lr=1.00E-05, loss= 1.1578 (max= 1.6894), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:05:25,422 - root - INFO - Step 5260: lr=1.00E-05, loss= 1.1390 (max= 1.7343), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:05:25,422 - root - INFO - Step 5260: lr=1.00E-05, loss= 1.1390 (max= 1.7343), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:05:25,422 - root - INFO - Step 5260: lr=1.00E-05, loss= 1.1390 (max= 1.7343), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:05:25,422 - root - INFO - Step 5260: lr=1.00E-05, loss= 1.1390 (max= 1.7343), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:05:25,422 - root - INFO - Step 5260: lr=1.00E-05, loss= 1.1390 (max= 1.7343), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:05:25,422 - root - INFO - Step 5260: lr=1.00E-05, loss= 1.1390 (max= 1.7343), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:05:25,422 - root - INFO - Step 5260: lr=1.00E-05, loss= 1.1390 (max= 1.7343), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:05:25,423 - root - INFO - Step 5260: lr=1.00E-05, loss= 1.1390 (max= 1.7343), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:05:57,278 - root - INFO - Step 5270: lr=1.00E-05, loss= 1.1577 (max= 1.7456), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:05:57,278 - root - INFO - Step 5270: lr=1.00E-05, loss= 1.1577 (max= 1.7456), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:05:57,279 - root - INFO - Step 5270: lr=1.00E-05, loss= 1.1577 (max= 1.7456), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:05:57,279 - root - INFO - Step 5270: lr=1.00E-05, loss= 1.1577 (max= 1.7456), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:05:57,279 - root - INFO - Step 5270: lr=1.00E-05, loss= 1.1577 (max= 1.7456), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:05:57,279 - root - INFO - Step 5270: lr=1.00E-05, loss= 1.1577 (max= 1.7456), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:05:57,279 - root - INFO - Step 5270: lr=1.00E-05, loss= 1.1577 (max= 1.7456), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:05:57,279 - root - INFO - Step 5270: lr=1.00E-05, loss= 1.1577 (max= 1.7456), tps=20575, 
mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:06:29,039 - root - INFO - Step 5280: lr=1.00E-05, loss= 1.1158 (max= 1.5441), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:06:29,039 - root - INFO - Step 5280: lr=1.00E-05, loss= 1.1158 (max= 1.5441), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:06:29,039 - root - INFO - Step 5280: lr=1.00E-05, loss= 1.1158 (max= 1.5441), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:06:29,039 - root - INFO - Step 5280: lr=1.00E-05, loss= 1.1158 (max= 1.5441), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:06:29,039 - root - INFO - Step 5280: lr=1.00E-05, loss= 1.1158 (max= 1.5441), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:06:29,040 - root - INFO - Step 5280: lr=1.00E-05, loss= 1.1158 (max= 1.5441), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:06:29,040 - root - INFO - Step 5280: lr=1.00E-05, loss= 1.1158 (max= 1.5441), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:06:29,040 - root - INFO - Step 5280: lr=1.00E-05, loss= 1.1158 (max= 1.5441), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:07:00,896 - root - INFO - Step 5290: lr=1.00E-05, loss= 1.1612 (max= 1.5403), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:07:00,896 - root - INFO - Step 5290: lr=1.00E-05, loss= 1.1612 (max= 1.5403), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:07:00,896 - root - INFO - Step 5290: lr=1.00E-05, 
loss= 1.1612 (max= 1.5403), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:07:00,896 - root - INFO - Step 5290: lr=1.00E-05, loss= 1.1612 (max= 1.5403), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:07:00,896 - root - INFO - Step 5290: lr=1.00E-05, loss= 1.1612 (max= 1.5403), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:07:00,896 - root - INFO - Step 5290: lr=1.00E-05, loss= 1.1612 (max= 1.5403), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:07:00,896 - root - INFO - Step 5290: lr=1.00E-05, loss= 1.1612 (max= 1.5403), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:07:00,896 - root - INFO - Step 5290: lr=1.00E-05, loss= 1.1612 (max= 1.5403), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:07:32,781 - root - INFO - Step 5300: lr=1.00E-05, loss= 1.1279 (max= 1.7256), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:07:32,781 - root - INFO - Step 5300: lr=1.00E-05, loss= 1.1279 (max= 1.7256), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:07:32,781 - root - INFO - Step 5300: lr=1.00E-05, loss= 1.1279 (max= 1.7256), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:07:32,781 - root - INFO - Step 5300: lr=1.00E-05, loss= 1.1279 (max= 1.7256), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:07:32,781 - root - INFO - Step 5300: lr=1.00E-05, loss= 1.1279 (max= 1.7256), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:07:32,781 - 
root - INFO - Step 5300: lr=1.00E-05, loss= 1.1279 (max= 1.7256), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:07:32,781 - root - INFO - Step 5300: lr=1.00E-05, loss= 1.1279 (max= 1.7256), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:07:32,781 - root - INFO - Step 5300: lr=1.00E-05, loss= 1.1279 (max= 1.7256), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:08:04,623 - root - INFO - Step 5310: lr=1.00E-05, loss= 1.1483 (max= 1.6870), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:08:04,623 - root - INFO - Step 5310: lr=1.00E-05, loss= 1.1483 (max= 1.6870), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:08:04,623 - root - INFO - Step 5310: lr=1.00E-05, loss= 1.1483 (max= 1.6870), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:08:04,623 - root - INFO - Step 5310: lr=1.00E-05, loss= 1.1483 (max= 1.6870), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:08:04,623 - root - INFO - Step 5310: lr=1.00E-05, loss= 1.1483 (max= 1.6870), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:08:04,623 - root - INFO - Step 5310: lr=1.00E-05, loss= 1.1483 (max= 1.6870), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:08:04,623 - root - INFO - Step 5310: lr=1.00E-05, loss= 1.1483 (max= 1.6870), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:08:04,623 - root - INFO - Step 5310: lr=1.00E-05, loss= 1.1483 (max= 1.6870), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 15:08:36,501 - root - INFO - Step 5320: lr=1.00E-05, loss= 1.1551 (max= 1.8044), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:08:36,501 - root - INFO - Step 5320: lr=1.00E-05, loss= 1.1551 (max= 1.8044), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:08:36,501 - root - INFO - Step 5320: lr=1.00E-05, loss= 1.1551 (max= 1.8044), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:08:36,501 - root - INFO - Step 5320: lr=1.00E-05, loss= 1.1551 (max= 1.8044), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:08:36,501 - root - INFO - Step 5320: lr=1.00E-05, loss= 1.1551 (max= 1.8044), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:08:36,501 - root - INFO - Step 5320: lr=1.00E-05, loss= 1.1551 (max= 1.8044), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:08:36,501 - root - INFO - Step 5320: lr=1.00E-05, loss= 1.1551 (max= 1.8044), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:08:36,501 - root - INFO - Step 5320: lr=1.00E-05, loss= 1.1551 (max= 1.8044), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:09:08,400 - root - INFO - Step 5330: lr=1.00E-05, loss= 1.1389 (max= 1.6784), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:09:08,400 - root - INFO - Step 5330: lr=1.00E-05, loss= 1.1389 (max= 1.6784), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:09:08,400 - root - INFO - Step 5330: lr=1.00E-05, loss= 1.1389 (max= 1.6784), tps=20546, mfu=42.81%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:09:08,400 - root - INFO - Step 5330: lr=1.00E-05, loss= 1.1389 (max= 1.6784), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:09:08,400 - root - INFO - Step 5330: lr=1.00E-05, loss= 1.1389 (max= 1.6784), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:09:08,400 - root - INFO - Step 5330: lr=1.00E-05, loss= 1.1389 (max= 1.6784), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:09:08,400 - root - INFO - Step 5330: lr=1.00E-05, loss= 1.1389 (max= 1.6784), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:09:08,400 - root - INFO - Step 5330: lr=1.00E-05, loss= 1.1389 (max= 1.6784), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:09:40,259 - root - INFO - Step 5340: lr=1.00E-05, loss= 1.1657 (max= 2.0614), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:09:40,259 - root - INFO - Step 5340: lr=1.00E-05, loss= 1.1657 (max= 2.0614), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:09:40,259 - root - INFO - Step 5340: lr=1.00E-05, loss= 1.1657 (max= 2.0614), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:09:40,259 - root - INFO - Step 5340: lr=1.00E-05, loss= 1.1657 (max= 2.0614), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:09:40,259 - root - INFO - Step 5340: lr=1.00E-05, loss= 1.1657 (max= 2.0614), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:09:40,259 - root - INFO - Step 5340: lr=1.00E-05, loss= 1.1657 (max= 
2.0614), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:09:40,259 - root - INFO - Step 5340: lr=1.00E-05, loss= 1.1657 (max= 2.0614), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:09:40,259 - root - INFO - Step 5340: lr=1.00E-05, loss= 1.1657 (max= 2.0614), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:10:12,068 - root - INFO - Step 5350: lr=1.00E-05, loss= 1.1423 (max= 1.5902), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:10:12,068 - root - INFO - Step 5350: lr=1.00E-05, loss= 1.1423 (max= 1.5902), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:10:12,068 - root - INFO - Step 5350: lr=1.00E-05, loss= 1.1423 (max= 1.5902), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:10:12,068 - root - INFO - Step 5350: lr=1.00E-05, loss= 1.1423 (max= 1.5902), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:10:12,068 - root - INFO - Step 5350: lr=1.00E-05, loss= 1.1423 (max= 1.5902), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:10:12,068 - root - INFO - Step 5350: lr=1.00E-05, loss= 1.1423 (max= 1.5902), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:10:12,068 - root - INFO - Step 5350: lr=1.00E-05, loss= 1.1423 (max= 1.5902), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:10:12,068 - root - INFO - Step 5350: lr=1.00E-05, loss= 1.1423 (max= 1.5902), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:10:43,919 - root - INFO - Step 
5360: lr=1.00E-05, loss= 1.1420 (max= 1.6675), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:10:43,919 - root - INFO - Step 5360: lr=1.00E-05, loss= 1.1420 (max= 1.6675), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:10:43,919 - root - INFO - Step 5360: lr=1.00E-05, loss= 1.1420 (max= 1.6675), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:10:43,919 - root - INFO - Step 5360: lr=1.00E-05, loss= 1.1420 (max= 1.6675), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:10:43,919 - root - INFO - Step 5360: lr=1.00E-05, loss= 1.1420 (max= 1.6675), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:10:43,919 - root - INFO - Step 5360: lr=1.00E-05, loss= 1.1420 (max= 1.6675), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:10:43,919 - root - INFO - Step 5360: lr=1.00E-05, loss= 1.1420 (max= 1.6675), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:10:43,920 - root - INFO - Step 5360: lr=1.00E-05, loss= 1.1420 (max= 1.6675), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:11:15,838 - root - INFO - Step 5370: lr=1.00E-05, loss= 1.1491 (max= 1.5670), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:11:15,838 - root - INFO - Step 5370: lr=1.00E-05, loss= 1.1491 (max= 1.5670), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:11:15,838 - root - INFO - Step 5370: lr=1.00E-05, loss= 1.1491 (max= 1.5670), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 15:11:15,838 - root - INFO - Step 5370: lr=1.00E-05, loss= 1.1491 (max= 1.5670), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:11:15,838 - root - INFO - Step 5370: lr=1.00E-05, loss= 1.1491 (max= 1.5670), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:11:15,838 - root - INFO - Step 5370: lr=1.00E-05, loss= 1.1491 (max= 1.5670), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:11:15,839 - root - INFO - Step 5370: lr=1.00E-05, loss= 1.1491 (max= 1.5670), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:11:15,839 - root - INFO - Step 5370: lr=1.00E-05, loss= 1.1491 (max= 1.5670), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:11:47,649 - root - INFO - Step 5380: lr=1.00E-05, loss= 1.1422 (max= 1.5447), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:11:47,649 - root - INFO - Step 5380: lr=1.00E-05, loss= 1.1422 (max= 1.5447), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:11:47,649 - root - INFO - Step 5380: lr=1.00E-05, loss= 1.1422 (max= 1.5447), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:11:47,649 - root - INFO - Step 5380: lr=1.00E-05, loss= 1.1422 (max= 1.5447), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:11:47,649 - root - INFO - Step 5380: lr=1.00E-05, loss= 1.1422 (max= 1.5447), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:11:47,649 - root - INFO - Step 5380: lr=1.00E-05, loss= 1.1422 (max= 1.5447), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:11:47,649 - root - INFO - Step 5380: lr=1.00E-05, loss= 1.1422 (max= 1.5447), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:11:47,649 - root - INFO - Step 5380: lr=1.00E-05, loss= 1.1422 (max= 1.5447), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:12:19,529 - root - INFO - Step 5390: lr=1.00E-05, loss= 1.1423 (max= 1.7791), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:12:19,529 - root - INFO - Step 5390: lr=1.00E-05, loss= 1.1423 (max= 1.7791), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:12:19,529 - root - INFO - Step 5390: lr=1.00E-05, loss= 1.1423 (max= 1.7791), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:12:19,529 - root - INFO - Step 5390: lr=1.00E-05, loss= 1.1423 (max= 1.7791), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:12:19,529 - root - INFO - Step 5390: lr=1.00E-05, loss= 1.1423 (max= 1.7791), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:12:19,529 - root - INFO - Step 5390: lr=1.00E-05, loss= 1.1423 (max= 1.7791), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:12:19,530 - root - INFO - Step 5390: lr=1.00E-05, loss= 1.1423 (max= 1.7791), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:12:19,530 - root - INFO - Step 5390: lr=1.00E-05, loss= 1.1423 (max= 1.7791), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:12:51,376 - root - INFO - Step 5400: lr=1.00E-05, loss= 1.1422 (max= 1.6251), tps=20580, 
mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:12:51,376 - root - INFO - Step 5400: lr=1.00E-05, loss= 1.1422 (max= 1.6251), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:12:51,376 - root - INFO - Step 5400: lr=1.00E-05, loss= 1.1422 (max= 1.6251), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:12:51,376 - root - INFO - Step 5400: lr=1.00E-05, loss= 1.1422 (max= 1.6251), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:12:51,376 - root - INFO - Step 5400: lr=1.00E-05, loss= 1.1422 (max= 1.6251), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:12:51,376 - root - INFO - Step 5400: lr=1.00E-05, loss= 1.1422 (max= 1.6251), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:12:51,376 - root - INFO - Step 5400: lr=1.00E-05, loss= 1.1422 (max= 1.6251), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:12:51,377 - root - INFO - Step 5400: lr=1.00E-05, loss= 1.1422 (max= 1.6251), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:13:23,267 - root - INFO - Step 5410: lr=1.00E-05, loss= 1.1520 (max= 1.6605), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:13:23,267 - root - INFO - Step 5410: lr=1.00E-05, loss= 1.1520 (max= 1.6605), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:13:23,267 - root - INFO - Step 5410: lr=1.00E-05, loss= 1.1520 (max= 1.6605), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:13:23,267 - root - INFO - Step 5410: lr=1.00E-05, 
loss= 1.1520 (max= 1.6605), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:13:23,267 - root - INFO - Step 5410: lr=1.00E-05, loss= 1.1520 (max= 1.6605), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:13:23,267 - root - INFO - Step 5410: lr=1.00E-05, loss= 1.1520 (max= 1.6605), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:13:23,267 - root - INFO - Step 5410: lr=1.00E-05, loss= 1.1520 (max= 1.6605), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:13:23,267 - root - INFO - Step 5410: lr=1.00E-05, loss= 1.1520 (max= 1.6605), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:13:55,182 - root - INFO - Step 5420: lr=1.00E-05, loss= 1.1148 (max= 1.5861), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:13:55,182 - root - INFO - Step 5420: lr=1.00E-05, loss= 1.1148 (max= 1.5861), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:13:55,182 - root - INFO - Step 5420: lr=1.00E-05, loss= 1.1148 (max= 1.5861), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:13:55,182 - root - INFO - Step 5420: lr=1.00E-05, loss= 1.1148 (max= 1.5861), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:13:55,182 - root - INFO - Step 5420: lr=1.00E-05, loss= 1.1148 (max= 1.5861), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:13:55,182 - root - INFO - Step 5420: lr=1.00E-05, loss= 1.1148 (max= 1.5861), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:13:55,182 - 
root - INFO - Step 5420: lr=1.00E-05, loss= 1.1148 (max= 1.5861), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:13:55,182 - root - INFO - Step 5420: lr=1.00E-05, loss= 1.1148 (max= 1.5861), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:14:26,928 - root - INFO - Step 5430: lr=1.00E-05, loss= 1.1133 (max= 1.5959), tps=20646, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:14:26,928 - root - INFO - Step 5430: lr=1.00E-05, loss= 1.1133 (max= 1.5959), tps=20646, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:14:26,928 - root - INFO - Step 5430: lr=1.00E-05, loss= 1.1133 (max= 1.5959), tps=20646, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:14:26,928 - root - INFO - Step 5430: lr=1.00E-05, loss= 1.1133 (max= 1.5959), tps=20646, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:14:26,928 - root - INFO - Step 5430: lr=1.00E-05, loss= 1.1133 (max= 1.5959), tps=20646, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:14:26,928 - root - INFO - Step 5430: lr=1.00E-05, loss= 1.1133 (max= 1.5959), tps=20646, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:14:26,928 - root - INFO - Step 5430: lr=1.00E-05, loss= 1.1133 (max= 1.5959), tps=20646, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:14:26,928 - root - INFO - Step 5430: lr=1.00E-05, loss= 1.1133 (max= 1.5959), tps=20646, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:14:58,749 - root - INFO - Step 5440: lr=1.00E-05, loss= 1.1258 (max= 1.7835), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 15:14:58,749 - root - INFO - Step 5440: lr=1.00E-05, loss= 1.1258 (max= 1.7835), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:14:58,749 - root - INFO - Step 5440: lr=1.00E-05, loss= 1.1258 (max= 1.7835), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:14:58,749 - root - INFO - Step 5440: lr=1.00E-05, loss= 1.1258 (max= 1.7835), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:14:58,749 - root - INFO - Step 5440: lr=1.00E-05, loss= 1.1258 (max= 1.7835), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:14:58,749 - root - INFO - Step 5440: lr=1.00E-05, loss= 1.1258 (max= 1.7835), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:14:58,749 - root - INFO - Step 5440: lr=1.00E-05, loss= 1.1258 (max= 1.7835), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:14:58,749 - root - INFO - Step 5440: lr=1.00E-05, loss= 1.1258 (max= 1.7835), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:15:30,593 - root - INFO - Step 5450: lr=1.00E-05, loss= 1.1363 (max= 1.5074), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:15:30,593 - root - INFO - Step 5450: lr=1.00E-05, loss= 1.1363 (max= 1.5074), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:15:30,594 - root - INFO - Step 5450: lr=1.00E-05, loss= 1.1363 (max= 1.5074), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:15:30,594 - root - INFO - Step 5450: lr=1.00E-05, loss= 1.1363 (max= 1.5074), tps=20582, mfu=42.88%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:15:30,594 - root - INFO - Step 5450: lr=1.00E-05, loss= 1.1363 (max= 1.5074), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:15:30,594 - root - INFO - Step 5450: lr=1.00E-05, loss= 1.1363 (max= 1.5074), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:15:30,594 - root - INFO - Step 5450: lr=1.00E-05, loss= 1.1363 (max= 1.5074), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:15:30,594 - root - INFO - Step 5450: lr=1.00E-05, loss= 1.1363 (max= 1.5074), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:16:02,449 - root - INFO - Step 5460: lr=1.00E-05, loss= 1.1335 (max= 1.7602), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:16:02,449 - root - INFO - Step 5460: lr=1.00E-05, loss= 1.1335 (max= 1.7602), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:16:02,449 - root - INFO - Step 5460: lr=1.00E-05, loss= 1.1335 (max= 1.7602), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:16:02,449 - root - INFO - Step 5460: lr=1.00E-05, loss= 1.1335 (max= 1.7602), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:16:02,449 - root - INFO - Step 5460: lr=1.00E-05, loss= 1.1335 (max= 1.7602), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:16:02,449 - root - INFO - Step 5460: lr=1.00E-05, loss= 1.1335 (max= 1.7602), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:16:02,449 - root - INFO - Step 5460: lr=1.00E-05, loss= 1.1335 (max= 
1.7602), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:16:02,449 - root - INFO - Step 5460: lr=1.00E-05, loss= 1.1335 (max= 1.7602), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:16:34,324 - root - INFO - Step 5470: lr=1.00E-05, loss= 1.1392 (max= 1.6769), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:16:34,324 - root - INFO - Step 5470: lr=1.00E-05, loss= 1.1392 (max= 1.6769), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:16:34,324 - root - INFO - Step 5470: lr=1.00E-05, loss= 1.1392 (max= 1.6769), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:16:34,324 - root - INFO - Step 5470: lr=1.00E-05, loss= 1.1392 (max= 1.6769), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:16:34,325 - root - INFO - Step 5470: lr=1.00E-05, loss= 1.1392 (max= 1.6769), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:16:34,325 - root - INFO - Step 5470: lr=1.00E-05, loss= 1.1392 (max= 1.6769), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:16:34,325 - root - INFO - Step 5470: lr=1.00E-05, loss= 1.1392 (max= 1.6769), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:16:34,325 - root - INFO - Step 5470: lr=1.00E-05, loss= 1.1392 (max= 1.6769), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:17:06,133 - root - INFO - Step 5480: lr=1.00E-05, loss= 1.1403 (max= 1.6751), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:17:06,133 - root - INFO - Step 
5480: lr=1.00E-05, loss= 1.1403 (max= 1.6751), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:17:06,133 - root - INFO - Step 5480: lr=1.00E-05, loss= 1.1403 (max= 1.6751), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:17:06,133 - root - INFO - Step 5480: lr=1.00E-05, loss= 1.1403 (max= 1.6751), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:17:06,133 - root - INFO - Step 5480: lr=1.00E-05, loss= 1.1403 (max= 1.6751), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:17:06,133 - root - INFO - Step 5480: lr=1.00E-05, loss= 1.1403 (max= 1.6751), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:17:06,133 - root - INFO - Step 5480: lr=1.00E-05, loss= 1.1403 (max= 1.6751), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:17:06,133 - root - INFO - Step 5480: lr=1.00E-05, loss= 1.1403 (max= 1.6751), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:17:37,965 - root - INFO - Step 5490: lr=1.00E-05, loss= 1.1023 (max= 1.5923), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:17:37,965 - root - INFO - Step 5490: lr=1.00E-05, loss= 1.1023 (max= 1.5923), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:17:37,965 - root - INFO - Step 5490: lr=1.00E-05, loss= 1.1023 (max= 1.5923), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:17:37,965 - root - INFO - Step 5490: lr=1.00E-05, loss= 1.1023 (max= 1.5923), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 15:17:37,965 - root - INFO - Step 5490: lr=1.00E-05, loss= 1.1023 (max= 1.5923), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:17:37,965 - root - INFO - Step 5490: lr=1.00E-05, loss= 1.1023 (max= 1.5923), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:17:37,965 - root - INFO - Step 5490: lr=1.00E-05, loss= 1.1023 (max= 1.5923), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:17:37,966 - root - INFO - Step 5490: lr=1.00E-05, loss= 1.1023 (max= 1.5923), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:18:09,749 - root - INFO - Step 5500: lr=1.00E-05, loss= 1.0918 (max= 1.5967), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:18:09,749 - root - INFO - Step 5500: lr=1.00E-05, loss= 1.0918 (max= 1.5967), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:18:09,749 - root - INFO - Step 5500: lr=1.00E-05, loss= 1.0918 (max= 1.5967), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:18:09,749 - root - INFO - Step 5500: lr=1.00E-05, loss= 1.0918 (max= 1.5967), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:18:09,749 - root - INFO - Step 5500: lr=1.00E-05, loss= 1.0918 (max= 1.5967), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:18:09,749 - root - INFO - Step 5500: lr=1.00E-05, loss= 1.0918 (max= 1.5967), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:18:09,749 - root - INFO - Step 5500: lr=1.00E-05, loss= 1.0918 (max= 1.5967), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:18:09,750 - root - INFO - Step 5500: lr=1.00E-05, loss= 1.0918 (max= 1.5967), tps=20621, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:18:41,536 - root - INFO - Step 5510: lr=1.00E-05, loss= 1.1325 (max= 1.7513), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:18:41,537 - root - INFO - Step 5510: lr=1.00E-05, loss= 1.1325 (max= 1.7513), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:18:41,537 - root - INFO - Step 5510: lr=1.00E-05, loss= 1.1325 (max= 1.7513), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:18:41,537 - root - INFO - Step 5510: lr=1.00E-05, loss= 1.1325 (max= 1.7513), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:18:41,537 - root - INFO - Step 5510: lr=1.00E-05, loss= 1.1325 (max= 1.7513), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:18:41,537 - root - INFO - Step 5510: lr=1.00E-05, loss= 1.1325 (max= 1.7513), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:18:41,537 - root - INFO - Step 5510: lr=1.00E-05, loss= 1.1325 (max= 1.7513), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:18:41,537 - root - INFO - Step 5510: lr=1.00E-05, loss= 1.1325 (max= 1.7513), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:19:13,379 - root - INFO - Step 5520: lr=1.00E-05, loss= 1.1383 (max= 1.7946), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:19:13,379 - root - INFO - Step 5520: lr=1.00E-05, loss= 1.1383 (max= 1.7946), tps=20584, 
mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:19:13,379 - root - INFO - Step 5520: lr=1.00E-05, loss= 1.1383 (max= 1.7946), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:19:13,379 - root - INFO - Step 5520: lr=1.00E-05, loss= 1.1383 (max= 1.7946), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:19:13,379 - root - INFO - Step 5520: lr=1.00E-05, loss= 1.1383 (max= 1.7946), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:19:13,379 - root - INFO - Step 5520: lr=1.00E-05, loss= 1.1383 (max= 1.7946), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:19:13,379 - root - INFO - Step 5520: lr=1.00E-05, loss= 1.1383 (max= 1.7946), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:19:13,380 - root - INFO - Step 5520: lr=1.00E-05, loss= 1.1383 (max= 1.7946), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:19:45,258 - root - INFO - Step 5530: lr=1.00E-05, loss= 1.1404 (max= 1.6002), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:19:45,258 - root - INFO - Step 5530: lr=1.00E-05, loss= 1.1404 (max= 1.6002), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:19:45,258 - root - INFO - Step 5530: lr=1.00E-05, loss= 1.1404 (max= 1.6002), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:19:45,258 - root - INFO - Step 5530: lr=1.00E-05, loss= 1.1404 (max= 1.6002), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:19:45,258 - root - INFO - Step 5530: lr=1.00E-05, 
loss= 1.1404 (max= 1.6002), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:19:45,258 - root - INFO - Step 5530: lr=1.00E-05, loss= 1.1404 (max= 1.6002), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:19:45,258 - root - INFO - Step 5530: lr=1.00E-05, loss= 1.1404 (max= 1.6002), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:19:45,258 - root - INFO - Step 5530: lr=1.00E-05, loss= 1.1404 (max= 1.6002), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:20:17,163 - root - INFO - Step 5540: lr=1.00E-05, loss= 1.1135 (max= 1.6494), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:20:17,164 - root - INFO - Step 5540: lr=1.00E-05, loss= 1.1135 (max= 1.6494), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:20:17,164 - root - INFO - Step 5540: lr=1.00E-05, loss= 1.1135 (max= 1.6494), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:20:17,164 - root - INFO - Step 5540: lr=1.00E-05, loss= 1.1135 (max= 1.6494), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:20:17,164 - root - INFO - Step 5540: lr=1.00E-05, loss= 1.1135 (max= 1.6494), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:20:17,164 - root - INFO - Step 5540: lr=1.00E-05, loss= 1.1135 (max= 1.6494), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:20:17,164 - root - INFO - Step 5540: lr=1.00E-05, loss= 1.1135 (max= 1.6494), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:20:17,164 - 
root - INFO - Step 5540: lr=1.00E-05, loss= 1.1135 (max= 1.6494), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:20:49,003 - root - INFO - Step 5550: lr=1.00E-05, loss= 1.1069 (max= 1.5285), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:20:49,004 - root - INFO - Step 5550: lr=1.00E-05, loss= 1.1069 (max= 1.5285), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:20:49,004 - root - INFO - Step 5550: lr=1.00E-05, loss= 1.1069 (max= 1.5285), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:20:49,004 - root - INFO - Step 5550: lr=1.00E-05, loss= 1.1069 (max= 1.5285), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:20:49,004 - root - INFO - Step 5550: lr=1.00E-05, loss= 1.1069 (max= 1.5285), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:20:49,004 - root - INFO - Step 5550: lr=1.00E-05, loss= 1.1069 (max= 1.5285), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:20:49,004 - root - INFO - Step 5550: lr=1.00E-05, loss= 1.1069 (max= 1.5285), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:20:49,004 - root - INFO - Step 5550: lr=1.00E-05, loss= 1.1069 (max= 1.5285), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:21:20,812 - root - INFO - Step 5560: lr=1.00E-05, loss= 1.1038 (max= 1.5507), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:21:20,812 - root - INFO - Step 5560: lr=1.00E-05, loss= 1.1038 (max= 1.5507), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 15:21:20,812 - root - INFO - Step 5560: lr=1.00E-05, loss= 1.1038 (max= 1.5507), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:21:20,812 - root - INFO - Step 5560: lr=1.00E-05, loss= 1.1038 (max= 1.5507), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:21:20,812 - root - INFO - Step 5560: lr=1.00E-05, loss= 1.1038 (max= 1.5507), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:21:20,812 - root - INFO - Step 5560: lr=1.00E-05, loss= 1.1038 (max= 1.5507), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:21:20,812 - root - INFO - Step 5560: lr=1.00E-05, loss= 1.1038 (max= 1.5507), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:21:20,812 - root - INFO - Step 5560: lr=1.00E-05, loss= 1.1038 (max= 1.5507), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:21:52,666 - root - INFO - Step 5570: lr=1.00E-05, loss= 1.0897 (max= 1.6521), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:21:52,666 - root - INFO - Step 5570: lr=1.00E-05, loss= 1.0897 (max= 1.6521), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:21:52,666 - root - INFO - Step 5570: lr=1.00E-05, loss= 1.0897 (max= 1.6521), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:21:52,666 - root - INFO - Step 5570: lr=1.00E-05, loss= 1.0897 (max= 1.6521), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:21:52,666 - root - INFO - Step 5570: lr=1.00E-05, loss= 1.0897 (max= 1.6521), tps=20576, mfu=42.87%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:21:52,666 - root - INFO - Step 5570: lr=1.00E-05, loss= 1.0897 (max= 1.6521), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:21:52,666 - root - INFO - Step 5570: lr=1.00E-05, loss= 1.0897 (max= 1.6521), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:21:52,666 - root - INFO - Step 5570: lr=1.00E-05, loss= 1.0897 (max= 1.6521), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:22:24,485 - root - INFO - Step 5580: lr=1.00E-05, loss= 1.1054 (max= 1.9664), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:22:24,485 - root - INFO - Step 5580: lr=1.00E-05, loss= 1.1054 (max= 1.9664), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:22:24,485 - root - INFO - Step 5580: lr=1.00E-05, loss= 1.1054 (max= 1.9664), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:22:24,485 - root - INFO - Step 5580: lr=1.00E-05, loss= 1.1054 (max= 1.9664), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:22:24,485 - root - INFO - Step 5580: lr=1.00E-05, loss= 1.1054 (max= 1.9664), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:22:24,485 - root - INFO - Step 5580: lr=1.00E-05, loss= 1.1054 (max= 1.9664), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:22:24,485 - root - INFO - Step 5580: lr=1.00E-05, loss= 1.1054 (max= 1.9664), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:22:24,485 - root - INFO - Step 5580: lr=1.00E-05, loss= 1.1054 (max= 
1.9664), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:22:56,355 - root - INFO - Step 5590: lr=1.00E-05, loss= 1.1196 (max= 1.7241), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:22:56,355 - root - INFO - Step 5590: lr=1.00E-05, loss= 1.1196 (max= 1.7241), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:22:56,355 - root - INFO - Step 5590: lr=1.00E-05, loss= 1.1196 (max= 1.7241), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:22:56,355 - root - INFO - Step 5590: lr=1.00E-05, loss= 1.1196 (max= 1.7241), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:22:56,355 - root - INFO - Step 5590: lr=1.00E-05, loss= 1.1196 (max= 1.7241), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:22:56,355 - root - INFO - Step 5590: lr=1.00E-05, loss= 1.1196 (max= 1.7241), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:22:56,355 - root - INFO - Step 5590: lr=1.00E-05, loss= 1.1196 (max= 1.7241), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:22:56,356 - root - INFO - Step 5590: lr=1.00E-05, loss= 1.1196 (max= 1.7241), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:23:28,280 - root - INFO - Step 5600: lr=1.00E-05, loss= 1.1197 (max= 1.5547), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:23:28,280 - root - INFO - Step 5600: lr=1.00E-05, loss= 1.1197 (max= 1.5547), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:23:28,280 - root - INFO - Step 
5600: lr=1.00E-05, loss= 1.1197 (max= 1.5547), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:23:28,280 - root - INFO - Step 5600: lr=1.00E-05, loss= 1.1197 (max= 1.5547), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:23:28,280 - root - INFO - Step 5600: lr=1.00E-05, loss= 1.1197 (max= 1.5547), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:23:28,280 - root - INFO - Step 5600: lr=1.00E-05, loss= 1.1197 (max= 1.5547), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:23:28,280 - root - INFO - Step 5600: lr=1.00E-05, loss= 1.1197 (max= 1.5547), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:23:28,281 - root - INFO - Step 5600: lr=1.00E-05, loss= 1.1197 (max= 1.5547), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:24:00,183 - root - INFO - Step 5610: lr=1.00E-05, loss= 1.1239 (max= 1.6040), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:24:00,183 - root - INFO - Step 5610: lr=1.00E-05, loss= 1.1239 (max= 1.6040), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:24:00,183 - root - INFO - Step 5610: lr=1.00E-05, loss= 1.1239 (max= 1.6040), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:24:00,183 - root - INFO - Step 5610: lr=1.00E-05, loss= 1.1239 (max= 1.6040), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:24:00,183 - root - INFO - Step 5610: lr=1.00E-05, loss= 1.1239 (max= 1.6040), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 15:24:00,183 - root - INFO - Step 5610: lr=1.00E-05, loss= 1.1239 (max= 1.6040), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:24:00,183 - root - INFO - Step 5610: lr=1.00E-05, loss= 1.1239 (max= 1.6040), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:24:00,184 - root - INFO - Step 5610: lr=1.00E-05, loss= 1.1239 (max= 1.6040), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:24:31,968 - root - INFO - Step 5620: lr=1.00E-05, loss= 1.1178 (max= 1.6207), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:24:31,968 - root - INFO - Step 5620: lr=1.00E-05, loss= 1.1178 (max= 1.6207), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:24:31,968 - root - INFO - Step 5620: lr=1.00E-05, loss= 1.1178 (max= 1.6207), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:24:31,968 - root - INFO - Step 5620: lr=1.00E-05, loss= 1.1178 (max= 1.6207), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:24:31,968 - root - INFO - Step 5620: lr=1.00E-05, loss= 1.1178 (max= 1.6207), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:24:31,968 - root - INFO - Step 5620: lr=1.00E-05, loss= 1.1178 (max= 1.6207), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:24:31,968 - root - INFO - Step 5620: lr=1.00E-05, loss= 1.1178 (max= 1.6207), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:24:31,968 - root - INFO - Step 5620: lr=1.00E-05, loss= 1.1178 (max= 1.6207), tps=20621, mfu=42.97%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:25:03,851 - root - INFO - Step 5630: lr=1.00E-05, loss= 1.1083 (max= 1.7000), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:25:03,851 - root - INFO - Step 5630: lr=1.00E-05, loss= 1.1083 (max= 1.7000), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:25:03,851 - root - INFO - Step 5630: lr=1.00E-05, loss= 1.1083 (max= 1.7000), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:25:03,851 - root - INFO - Step 5630: lr=1.00E-05, loss= 1.1083 (max= 1.7000), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:25:03,851 - root - INFO - Step 5630: lr=1.00E-05, loss= 1.1083 (max= 1.7000), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:25:03,851 - root - INFO - Step 5630: lr=1.00E-05, loss= 1.1083 (max= 1.7000), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:25:03,851 - root - INFO - Step 5630: lr=1.00E-05, loss= 1.1083 (max= 1.7000), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:25:03,851 - root - INFO - Step 5630: lr=1.00E-05, loss= 1.1083 (max= 1.7000), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:25:35,698 - root - INFO - Step 5640: lr=1.00E-05, loss= 1.1242 (max= 1.6817), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:25:35,698 - root - INFO - Step 5640: lr=1.00E-05, loss= 1.1242 (max= 1.6817), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:25:35,698 - root - INFO - Step 5640: lr=1.00E-05, loss= 1.1242 (max= 1.6817), tps=20581, 
mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:25:35,698 - root - INFO - Step 5640: lr=1.00E-05, loss= 1.1242 (max= 1.6817), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:25:35,698 - root - INFO - Step 5640: lr=1.00E-05, loss= 1.1242 (max= 1.6817), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:25:35,698 - root - INFO - Step 5640: lr=1.00E-05, loss= 1.1242 (max= 1.6817), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:25:35,698 - root - INFO - Step 5640: lr=1.00E-05, loss= 1.1242 (max= 1.6817), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:25:35,698 - root - INFO - Step 5640: lr=1.00E-05, loss= 1.1242 (max= 1.6817), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:26:07,600 - root - INFO - Step 5650: lr=1.00E-05, loss= 1.1278 (max= 1.6011), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:26:07,601 - root - INFO - Step 5650: lr=1.00E-05, loss= 1.1278 (max= 1.6011), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:26:07,601 - root - INFO - Step 5650: lr=1.00E-05, loss= 1.1278 (max= 1.6011), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:26:07,601 - root - INFO - Step 5650: lr=1.00E-05, loss= 1.1278 (max= 1.6011), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:26:07,601 - root - INFO - Step 5650: lr=1.00E-05, loss= 1.1278 (max= 1.6011), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:26:07,601 - root - INFO - Step 5650: lr=1.00E-05, 
loss= 1.1278 (max= 1.6011), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:26:07,601 - root - INFO - Step 5650: lr=1.00E-05, loss= 1.1278 (max= 1.6011), tps=20545, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:26:07,601 - root - INFO - Step 5650: lr=1.00E-05, loss= 1.1278 (max= 1.6011), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:26:39,461 - root - INFO - Step 5660: lr=1.00E-05, loss= 1.1431 (max= 1.7640), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:26:39,461 - root - INFO - Step 5660: lr=1.00E-05, loss= 1.1431 (max= 1.7640), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:26:39,461 - root - INFO - Step 5660: lr=1.00E-05, loss= 1.1431 (max= 1.7640), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:26:39,461 - root - INFO - Step 5660: lr=1.00E-05, loss= 1.1431 (max= 1.7640), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:26:39,461 - root - INFO - Step 5660: lr=1.00E-05, loss= 1.1431 (max= 1.7640), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:26:39,461 - root - INFO - Step 5660: lr=1.00E-05, loss= 1.1431 (max= 1.7640), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:26:39,461 - root - INFO - Step 5660: lr=1.00E-05, loss= 1.1431 (max= 1.7640), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:26:39,461 - root - INFO - Step 5660: lr=1.00E-05, loss= 1.1431 (max= 1.7640), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:27:11,415 - 
root - INFO - Step 5670: lr=1.00E-05, loss= 1.1626 (max= 1.6665), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:27:11,415 - root - INFO - Step 5670: lr=1.00E-05, loss= 1.1626 (max= 1.6665), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:27:11,415 - root - INFO - Step 5670: lr=1.00E-05, loss= 1.1626 (max= 1.6665), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:27:11,415 - root - INFO - Step 5670: lr=1.00E-05, loss= 1.1626 (max= 1.6665), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:27:11,415 - root - INFO - Step 5670: lr=1.00E-05, loss= 1.1626 (max= 1.6665), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:27:11,416 - root - INFO - Step 5670: lr=1.00E-05, loss= 1.1626 (max= 1.6665), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:27:11,416 - root - INFO - Step 5670: lr=1.00E-05, loss= 1.1626 (max= 1.6665), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:27:11,416 - root - INFO - Step 5670: lr=1.00E-05, loss= 1.1626 (max= 1.6665), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:27:43,312 - root - INFO - Step 5680: lr=1.00E-05, loss= 1.1053 (max= 1.5569), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:27:43,312 - root - INFO - Step 5680: lr=1.00E-05, loss= 1.1053 (max= 1.5569), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:27:43,312 - root - INFO - Step 5680: lr=1.00E-05, loss= 1.1053 (max= 1.5569), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 15:27:43,312 - root - INFO - Step 5680: lr=1.00E-05, loss= 1.1053 (max= 1.5569), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:27:43,312 - root - INFO - Step 5680: lr=1.00E-05, loss= 1.1053 (max= 1.5569), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:27:43,312 - root - INFO - Step 5680: lr=1.00E-05, loss= 1.1053 (max= 1.5569), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:27:43,313 - root - INFO - Step 5680: lr=1.00E-05, loss= 1.1053 (max= 1.5569), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:27:43,313 - root - INFO - Step 5680: lr=1.00E-05, loss= 1.1053 (max= 1.5569), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:28:15,147 - root - INFO - Step 5690: lr=1.00E-05, loss= 1.1099 (max= 1.5361), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:28:15,147 - root - INFO - Step 5690: lr=1.00E-05, loss= 1.1099 (max= 1.5361), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:28:15,147 - root - INFO - Step 5690: lr=1.00E-05, loss= 1.1099 (max= 1.5361), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:28:15,147 - root - INFO - Step 5690: lr=1.00E-05, loss= 1.1099 (max= 1.5361), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:28:15,147 - root - INFO - Step 5690: lr=1.00E-05, loss= 1.1099 (max= 1.5361), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:28:15,147 - root - INFO - Step 5690: lr=1.00E-05, loss= 1.1099 (max= 1.5361), tps=20589, mfu=42.90%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:28:15,147 - root - INFO - Step 5690: lr=1.00E-05, loss= 1.1099 (max= 1.5361), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:28:15,147 - root - INFO - Step 5690: lr=1.00E-05, loss= 1.1099 (max= 1.5361), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:28:47,038 - root - INFO - Step 5700: lr=1.00E-05, loss= 1.0894 (max= 1.5274), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:28:47,038 - root - INFO - Step 5700: lr=1.00E-05, loss= 1.0894 (max= 1.5274), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:28:47,038 - root - INFO - Step 5700: lr=1.00E-05, loss= 1.0894 (max= 1.5274), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:28:47,038 - root - INFO - Step 5700: lr=1.00E-05, loss= 1.0894 (max= 1.5274), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:28:47,038 - root - INFO - Step 5700: lr=1.00E-05, loss= 1.0894 (max= 1.5274), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:28:47,038 - root - INFO - Step 5700: lr=1.00E-05, loss= 1.0894 (max= 1.5274), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:28:47,038 - root - INFO - Step 5700: lr=1.00E-05, loss= 1.0894 (max= 1.5274), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:28:47,038 - root - INFO - Step 5700: lr=1.00E-05, loss= 1.0894 (max= 1.5274), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:29:18,887 - root - INFO - Step 5710: lr=1.00E-05, loss= 1.1012 (max= 
1.7135), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:29:18,887 - root - INFO - Step 5710: lr=1.00E-05, loss= 1.1012 (max= 1.7135), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:29:18,887 - root - INFO - Step 5710: lr=1.00E-05, loss= 1.1012 (max= 1.7135), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:29:18,887 - root - INFO - Step 5710: lr=1.00E-05, loss= 1.1012 (max= 1.7135), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:29:18,887 - root - INFO - Step 5710: lr=1.00E-05, loss= 1.1012 (max= 1.7135), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:29:18,887 - root - INFO - Step 5710: lr=1.00E-05, loss= 1.1012 (max= 1.7135), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:29:18,887 - root - INFO - Step 5710: lr=1.00E-05, loss= 1.1012 (max= 1.7135), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:29:18,887 - root - INFO - Step 5710: lr=1.00E-05, loss= 1.1012 (max= 1.7135), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:29:50,794 - root - INFO - Step 5720: lr=1.00E-05, loss= 1.1186 (max= 1.6440), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:29:50,794 - root - INFO - Step 5720: lr=1.00E-05, loss= 1.1186 (max= 1.6440), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:29:50,794 - root - INFO - Step 5720: lr=1.00E-05, loss= 1.1186 (max= 1.6440), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:29:50,794 - root - INFO - Step 
5720: lr=1.00E-05, loss= 1.1186 (max= 1.6440), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:29:50,794 - root - INFO - Step 5720: lr=1.00E-05, loss= 1.1186 (max= 1.6440), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:29:50,794 - root - INFO - Step 5720: lr=1.00E-05, loss= 1.1186 (max= 1.6440), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:29:50,794 - root - INFO - Step 5720: lr=1.00E-05, loss= 1.1186 (max= 1.6440), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:29:50,794 - root - INFO - Step 5720: lr=1.00E-05, loss= 1.1186 (max= 1.6440), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:30:22,551 - root - INFO - Step 5730: lr=1.00E-05, loss= 1.1439 (max= 1.5668), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:30:22,551 - root - INFO - Step 5730: lr=1.00E-05, loss= 1.1439 (max= 1.5668), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:30:22,551 - root - INFO - Step 5730: lr=1.00E-05, loss= 1.1439 (max= 1.5668), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:30:22,551 - root - INFO - Step 5730: lr=1.00E-05, loss= 1.1439 (max= 1.5668), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:30:22,551 - root - INFO - Step 5730: lr=1.00E-05, loss= 1.1439 (max= 1.5668), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:30:22,551 - root - INFO - Step 5730: lr=1.00E-05, loss= 1.1439 (max= 1.5668), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 15:30:22,552 - root - INFO - Step 5730: lr=1.00E-05, loss= 1.1439 (max= 1.5668), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:30:22,552 - root - INFO - Step 5730: lr=1.00E-05, loss= 1.1439 (max= 1.5668), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:30:24,883 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:1243078 +2025-10-25 15:30:54,369 - root - INFO - Step 5740: lr=1.00E-05, loss= 1.1013 (max= 1.5308), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:30:54,369 - root - INFO - Step 5740: lr=1.00E-05, loss= 1.1013 (max= 1.5308), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:30:54,369 - root - INFO - Step 5740: lr=1.00E-05, loss= 1.1013 (max= 1.5308), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:30:54,369 - root - INFO - Step 5740: lr=1.00E-05, loss= 1.1013 (max= 1.5308), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:30:54,369 - root - INFO - Step 5740: lr=1.00E-05, loss= 1.1013 (max= 1.5308), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:30:54,369 - root - INFO - Step 5740: lr=1.00E-05, loss= 1.1013 (max= 1.5308), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:30:54,369 - root - INFO - Step 5740: lr=1.00E-05, loss= 1.1013 (max= 1.5308), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:30:54,369 - root - INFO - Step 5740: lr=1.00E-05, loss= 1.1013 (max= 1.5308), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:31:26,250 - root - INFO - Step 5750: lr=1.00E-05, loss= 1.0985 (max= 1.5612), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:31:26,250 - root - INFO - Step 5750: lr=1.00E-05, loss= 1.0985 (max= 1.5612), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:31:26,250 - root - INFO - Step 5750: lr=1.00E-05, loss= 1.0985 (max= 1.5612), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:31:26,250 - root - INFO - Step 5750: lr=1.00E-05, loss= 1.0985 (max= 1.5612), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:31:26,250 - root - INFO - Step 5750: lr=1.00E-05, loss= 1.0985 (max= 1.5612), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:31:26,250 - root - INFO - Step 5750: lr=1.00E-05, loss= 1.0985 (max= 1.5612), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:31:26,250 - root - INFO - Step 5750: lr=1.00E-05, loss= 1.0985 (max= 1.5612), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:31:26,250 - root - INFO - Step 5750: lr=1.00E-05, loss= 1.0985 (max= 1.5612), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:31:58,056 - root - INFO - Step 5760: lr=1.00E-05, loss= 1.1307 (max= 1.6020), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:31:58,056 - root - INFO - Step 5760: lr=1.00E-05, loss= 1.1307 (max= 1.6020), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:31:58,056 - root - INFO - Step 5760: lr=1.00E-05, loss= 1.1307 (max= 1.6020), tps=20607, 
mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:31:58,056 - root - INFO - Step 5760: lr=1.00E-05, loss= 1.1307 (max= 1.6020), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:31:58,056 - root - INFO - Step 5760: lr=1.00E-05, loss= 1.1307 (max= 1.6020), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:31:58,056 - root - INFO - Step 5760: lr=1.00E-05, loss= 1.1307 (max= 1.6020), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:31:58,056 - root - INFO - Step 5760: lr=1.00E-05, loss= 1.1307 (max= 1.6020), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:31:58,056 - root - INFO - Step 5760: lr=1.00E-05, loss= 1.1307 (max= 1.6020), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:32:29,893 - root - INFO - Step 5770: lr=1.00E-05, loss= 1.1046 (max= 1.6627), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:32:29,893 - root - INFO - Step 5770: lr=1.00E-05, loss= 1.1046 (max= 1.6627), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:32:29,893 - root - INFO - Step 5770: lr=1.00E-05, loss= 1.1046 (max= 1.6627), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:32:29,893 - root - INFO - Step 5770: lr=1.00E-05, loss= 1.1046 (max= 1.6627), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:32:29,893 - root - INFO - Step 5770: lr=1.00E-05, loss= 1.1046 (max= 1.6627), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:32:29,893 - root - INFO - Step 5770: lr=1.00E-05, 
loss= 1.1046 (max= 1.6627), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:32:29,893 - root - INFO - Step 5770: lr=1.00E-05, loss= 1.1046 (max= 1.6627), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:32:29,893 - root - INFO - Step 5770: lr=1.00E-05, loss= 1.1046 (max= 1.6627), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:33:01,703 - root - INFO - Step 5780: lr=1.00E-05, loss= 1.1173 (max= 1.5225), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:33:01,703 - root - INFO - Step 5780: lr=1.00E-05, loss= 1.1173 (max= 1.5225), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:33:01,703 - root - INFO - Step 5780: lr=1.00E-05, loss= 1.1173 (max= 1.5225), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:33:01,703 - root - INFO - Step 5780: lr=1.00E-05, loss= 1.1173 (max= 1.5225), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:33:01,703 - root - INFO - Step 5780: lr=1.00E-05, loss= 1.1173 (max= 1.5225), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:33:01,703 - root - INFO - Step 5780: lr=1.00E-05, loss= 1.1173 (max= 1.5225), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:33:01,703 - root - INFO - Step 5780: lr=1.00E-05, loss= 1.1173 (max= 1.5225), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:33:01,703 - root - INFO - Step 5780: lr=1.00E-05, loss= 1.1173 (max= 1.5225), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:33:33,526 - 
root - INFO - Step 5790: lr=1.00E-05, loss= 1.1158 (max= 1.5353), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:33:33,526 - root - INFO - Step 5790: lr=1.00E-05, loss= 1.1158 (max= 1.5353), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:33:33,526 - root - INFO - Step 5790: lr=1.00E-05, loss= 1.1158 (max= 1.5353), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:33:33,526 - root - INFO - Step 5790: lr=1.00E-05, loss= 1.1158 (max= 1.5353), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:33:33,526 - root - INFO - Step 5790: lr=1.00E-05, loss= 1.1158 (max= 1.5353), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:33:33,527 - root - INFO - Step 5790: lr=1.00E-05, loss= 1.1158 (max= 1.5353), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:33:33,527 - root - INFO - Step 5790: lr=1.00E-05, loss= 1.1158 (max= 1.5353), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:33:33,527 - root - INFO - Step 5790: lr=1.00E-05, loss= 1.1158 (max= 1.5353), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:34:05,340 - root - INFO - Step 5800: lr=1.00E-05, loss= 1.1033 (max= 1.5976), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:34:05,340 - root - INFO - Step 5800: lr=1.00E-05, loss= 1.1033 (max= 1.5976), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:34:05,340 - root - INFO - Step 5800: lr=1.00E-05, loss= 1.1033 (max= 1.5976), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 15:34:05,340 - root - INFO - Step 5800: lr=1.00E-05, loss= 1.1033 (max= 1.5976), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:34:05,340 - root - INFO - Step 5800: lr=1.00E-05, loss= 1.1033 (max= 1.5976), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:34:05,340 - root - INFO - Step 5800: lr=1.00E-05, loss= 1.1033 (max= 1.5976), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:34:05,340 - root - INFO - Step 5800: lr=1.00E-05, loss= 1.1033 (max= 1.5976), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:34:05,340 - root - INFO - Step 5800: lr=1.00E-05, loss= 1.1033 (max= 1.5976), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:34:37,207 - root - INFO - Step 5810: lr=1.00E-05, loss= 1.1216 (max= 1.6049), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:34:37,207 - root - INFO - Step 5810: lr=1.00E-05, loss= 1.1216 (max= 1.6049), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:34:37,207 - root - INFO - Step 5810: lr=1.00E-05, loss= 1.1216 (max= 1.6049), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:34:37,207 - root - INFO - Step 5810: lr=1.00E-05, loss= 1.1216 (max= 1.6049), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:34:37,207 - root - INFO - Step 5810: lr=1.00E-05, loss= 1.1216 (max= 1.6049), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:34:37,207 - root - INFO - Step 5810: lr=1.00E-05, loss= 1.1216 (max= 1.6049), tps=20568, mfu=42.85%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:34:37,207 - root - INFO - Step 5810: lr=1.00E-05, loss= 1.1216 (max= 1.6049), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:34:37,207 - root - INFO - Step 5810: lr=1.00E-05, loss= 1.1216 (max= 1.6049), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:35:09,016 - root - INFO - Step 5820: lr=1.00E-05, loss= 1.1078 (max= 1.4758), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:35:09,016 - root - INFO - Step 5820: lr=1.00E-05, loss= 1.1078 (max= 1.4758), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:35:09,016 - root - INFO - Step 5820: lr=1.00E-05, loss= 1.1078 (max= 1.4758), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:35:09,016 - root - INFO - Step 5820: lr=1.00E-05, loss= 1.1078 (max= 1.4758), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:35:09,016 - root - INFO - Step 5820: lr=1.00E-05, loss= 1.1078 (max= 1.4758), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:35:09,016 - root - INFO - Step 5820: lr=1.00E-05, loss= 1.1078 (max= 1.4758), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:35:09,016 - root - INFO - Step 5820: lr=1.00E-05, loss= 1.1078 (max= 1.4758), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:35:09,016 - root - INFO - Step 5820: lr=1.00E-05, loss= 1.1078 (max= 1.4758), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:35:40,885 - root - INFO - Step 5830: lr=1.00E-05, loss= 1.1317 (max= 
1.7537), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:35:40,885 - root - INFO - Step 5830: lr=1.00E-05, loss= 1.1317 (max= 1.7537), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:35:40,885 - root - INFO - Step 5830: lr=1.00E-05, loss= 1.1317 (max= 1.7537), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:35:40,885 - root - INFO - Step 5830: lr=1.00E-05, loss= 1.1317 (max= 1.7537), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:35:40,885 - root - INFO - Step 5830: lr=1.00E-05, loss= 1.1317 (max= 1.7537), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:35:40,885 - root - INFO - Step 5830: lr=1.00E-05, loss= 1.1317 (max= 1.7537), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:35:40,885 - root - INFO - Step 5830: lr=1.00E-05, loss= 1.1317 (max= 1.7537), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:35:40,886 - root - INFO - Step 5830: lr=1.00E-05, loss= 1.1317 (max= 1.7537), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:36:12,739 - root - INFO - Step 5840: lr=1.00E-05, loss= 1.1222 (max= 1.5366), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:36:12,739 - root - INFO - Step 5840: lr=1.00E-05, loss= 1.1222 (max= 1.5366), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:36:12,739 - root - INFO - Step 5840: lr=1.00E-05, loss= 1.1222 (max= 1.5366), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:36:12,739 - root - INFO - Step 
5840: lr=1.00E-05, loss= 1.1222 (max= 1.5366), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:36:12,739 - root - INFO - Step 5840: lr=1.00E-05, loss= 1.1222 (max= 1.5366), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:36:12,739 - root - INFO - Step 5840: lr=1.00E-05, loss= 1.1222 (max= 1.5366), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:36:12,740 - root - INFO - Step 5840: lr=1.00E-05, loss= 1.1222 (max= 1.5366), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:36:12,740 - root - INFO - Step 5840: lr=1.00E-05, loss= 1.1222 (max= 1.5366), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:36:44,587 - root - INFO - Step 5850: lr=1.00E-05, loss= 1.1153 (max= 1.6698), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:36:44,587 - root - INFO - Step 5850: lr=1.00E-05, loss= 1.1153 (max= 1.6698), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:36:44,587 - root - INFO - Step 5850: lr=1.00E-05, loss= 1.1153 (max= 1.6698), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:36:44,587 - root - INFO - Step 5850: lr=1.00E-05, loss= 1.1153 (max= 1.6698), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:36:44,587 - root - INFO - Step 5850: lr=1.00E-05, loss= 1.1153 (max= 1.6698), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:36:44,587 - root - INFO - Step 5850: lr=1.00E-05, loss= 1.1153 (max= 1.6698), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 15:36:44,587 - root - INFO - Step 5850: lr=1.00E-05, loss= 1.1153 (max= 1.6698), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:36:44,588 - root - INFO - Step 5850: lr=1.00E-05, loss= 1.1153 (max= 1.6698), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:37:16,420 - root - INFO - Step 5860: lr=1.00E-05, loss= 1.0960 (max= 1.6000), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:37:16,420 - root - INFO - Step 5860: lr=1.00E-05, loss= 1.0960 (max= 1.6000), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:37:16,420 - root - INFO - Step 5860: lr=1.00E-05, loss= 1.0960 (max= 1.6000), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:37:16,420 - root - INFO - Step 5860: lr=1.00E-05, loss= 1.0960 (max= 1.6000), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:37:16,421 - root - INFO - Step 5860: lr=1.00E-05, loss= 1.0960 (max= 1.6000), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:37:16,421 - root - INFO - Step 5860: lr=1.00E-05, loss= 1.0960 (max= 1.6000), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:37:16,421 - root - INFO - Step 5860: lr=1.00E-05, loss= 1.0960 (max= 1.6000), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:37:16,421 - root - INFO - Step 5860: lr=1.00E-05, loss= 1.0960 (max= 1.6000), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:37:48,278 - root - INFO - Step 5870: lr=1.00E-05, loss= 1.1244 (max= 1.7182), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:37:48,278 - root - INFO - Step 5870: lr=1.00E-05, loss= 1.1244 (max= 1.7182), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:37:48,278 - root - INFO - Step 5870: lr=1.00E-05, loss= 1.1244 (max= 1.7182), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:37:48,278 - root - INFO - Step 5870: lr=1.00E-05, loss= 1.1244 (max= 1.7182), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:37:48,278 - root - INFO - Step 5870: lr=1.00E-05, loss= 1.1244 (max= 1.7182), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:37:48,278 - root - INFO - Step 5870: lr=1.00E-05, loss= 1.1244 (max= 1.7182), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:37:48,278 - root - INFO - Step 5870: lr=1.00E-05, loss= 1.1244 (max= 1.7182), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:37:48,278 - root - INFO - Step 5870: lr=1.00E-05, loss= 1.1244 (max= 1.7182), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:38:20,142 - root - INFO - Step 5880: lr=1.00E-05, loss= 1.1288 (max= 1.5109), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:38:20,142 - root - INFO - Step 5880: lr=1.00E-05, loss= 1.1288 (max= 1.5109), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:38:20,142 - root - INFO - Step 5880: lr=1.00E-05, loss= 1.1288 (max= 1.5109), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:38:20,142 - root - INFO - Step 5880: lr=1.00E-05, loss= 1.1288 (max= 1.5109), tps=20569, 
mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:38:20,142 - root - INFO - Step 5880: lr=1.00E-05, loss= 1.1288 (max= 1.5109), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:38:20,143 - root - INFO - Step 5880: lr=1.00E-05, loss= 1.1288 (max= 1.5109), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:38:20,143 - root - INFO - Step 5880: lr=1.00E-05, loss= 1.1288 (max= 1.5109), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:38:20,143 - root - INFO - Step 5880: lr=1.00E-05, loss= 1.1288 (max= 1.5109), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:38:52,022 - root - INFO - Step 5890: lr=1.00E-05, loss= 1.1362 (max= 1.6630), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:38:52,022 - root - INFO - Step 5890: lr=1.00E-05, loss= 1.1362 (max= 1.6630), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:38:52,022 - root - INFO - Step 5890: lr=1.00E-05, loss= 1.1362 (max= 1.6630), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:38:52,022 - root - INFO - Step 5890: lr=1.00E-05, loss= 1.1362 (max= 1.6630), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:38:52,022 - root - INFO - Step 5890: lr=1.00E-05, loss= 1.1362 (max= 1.6630), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:38:52,022 - root - INFO - Step 5890: lr=1.00E-05, loss= 1.1362 (max= 1.6630), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:38:52,022 - root - INFO - Step 5890: lr=1.00E-05, 
loss= 1.1362 (max= 1.6630), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:38:52,022 - root - INFO - Step 5890: lr=1.00E-05, loss= 1.1362 (max= 1.6630), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:39:23,878 - root - INFO - Step 5900: lr=1.00E-05, loss= 1.1087 (max= 1.5066), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:39:23,878 - root - INFO - Step 5900: lr=1.00E-05, loss= 1.1087 (max= 1.5066), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:39:23,878 - root - INFO - Step 5900: lr=1.00E-05, loss= 1.1087 (max= 1.5066), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:39:23,879 - root - INFO - Step 5900: lr=1.00E-05, loss= 1.1087 (max= 1.5066), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:39:23,879 - root - INFO - Step 5900: lr=1.00E-05, loss= 1.1087 (max= 1.5066), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:39:23,879 - root - INFO - Step 5900: lr=1.00E-05, loss= 1.1087 (max= 1.5066), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:39:23,879 - root - INFO - Step 5900: lr=1.00E-05, loss= 1.1087 (max= 1.5066), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:39:23,879 - root - INFO - Step 5900: lr=1.00E-05, loss= 1.1087 (max= 1.5066), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:39:55,777 - root - INFO - Step 5910: lr=1.00E-05, loss= 1.1284 (max= 1.4809), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:39:55,777 - 
root - INFO - Step 5910: lr=1.00E-05, loss= 1.1284 (max= 1.4809), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:39:55,777 - root - INFO - Step 5910: lr=1.00E-05, loss= 1.1284 (max= 1.4809), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:39:55,777 - root - INFO - Step 5910: lr=1.00E-05, loss= 1.1284 (max= 1.4809), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:39:55,777 - root - INFO - Step 5910: lr=1.00E-05, loss= 1.1284 (max= 1.4809), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:39:55,777 - root - INFO - Step 5910: lr=1.00E-05, loss= 1.1284 (max= 1.4809), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:39:55,777 - root - INFO - Step 5910: lr=1.00E-05, loss= 1.1284 (max= 1.4809), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:39:55,777 - root - INFO - Step 5910: lr=1.00E-05, loss= 1.1284 (max= 1.4809), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:40:27,637 - root - INFO - Step 5920: lr=1.00E-05, loss= 1.1283 (max= 1.6108), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:40:27,637 - root - INFO - Step 5920: lr=1.00E-05, loss= 1.1283 (max= 1.6108), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:40:27,638 - root - INFO - Step 5920: lr=1.00E-05, loss= 1.1283 (max= 1.6108), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:40:27,638 - root - INFO - Step 5920: lr=1.00E-05, loss= 1.1283 (max= 1.6108), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 15:40:27,638 - root - INFO - Step 5920: lr=1.00E-05, loss= 1.1283 (max= 1.6108), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:40:27,638 - root - INFO - Step 5920: lr=1.00E-05, loss= 1.1283 (max= 1.6108), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:40:27,638 - root - INFO - Step 5920: lr=1.00E-05, loss= 1.1283 (max= 1.6108), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:40:27,638 - root - INFO - Step 5920: lr=1.00E-05, loss= 1.1283 (max= 1.6108), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:40:59,412 - root - INFO - Step 5930: lr=1.00E-05, loss= 1.1291 (max= 1.5598), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:40:59,412 - root - INFO - Step 5930: lr=1.00E-05, loss= 1.1291 (max= 1.5598), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:40:59,412 - root - INFO - Step 5930: lr=1.00E-05, loss= 1.1291 (max= 1.5598), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:40:59,412 - root - INFO - Step 5930: lr=1.00E-05, loss= 1.1291 (max= 1.5598), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:40:59,412 - root - INFO - Step 5930: lr=1.00E-05, loss= 1.1291 (max= 1.5598), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:40:59,412 - root - INFO - Step 5930: lr=1.00E-05, loss= 1.1291 (max= 1.5598), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:40:59,412 - root - INFO - Step 5930: lr=1.00E-05, loss= 1.1291 (max= 1.5598), tps=20628, mfu=42.98%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:40:59,412 - root - INFO - Step 5930: lr=1.00E-05, loss= 1.1291 (max= 1.5598), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:41:31,223 - root - INFO - Step 5940: lr=1.00E-05, loss= 1.1121 (max= 1.5338), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:41:31,223 - root - INFO - Step 5940: lr=1.00E-05, loss= 1.1121 (max= 1.5338), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:41:31,223 - root - INFO - Step 5940: lr=1.00E-05, loss= 1.1121 (max= 1.5338), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:41:31,223 - root - INFO - Step 5940: lr=1.00E-05, loss= 1.1121 (max= 1.5338), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:41:31,223 - root - INFO - Step 5940: lr=1.00E-05, loss= 1.1121 (max= 1.5338), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:41:31,223 - root - INFO - Step 5940: lr=1.00E-05, loss= 1.1121 (max= 1.5338), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:41:31,223 - root - INFO - Step 5940: lr=1.00E-05, loss= 1.1121 (max= 1.5338), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:41:31,223 - root - INFO - Step 5940: lr=1.00E-05, loss= 1.1121 (max= 1.5338), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:42:03,068 - root - INFO - Step 5950: lr=1.00E-05, loss= 1.1239 (max= 1.5124), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:42:03,068 - root - INFO - Step 5950: lr=1.00E-05, loss= 1.1239 (max= 
1.5124), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:42:03,068 - root - INFO - Step 5950: lr=1.00E-05, loss= 1.1239 (max= 1.5124), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:42:03,068 - root - INFO - Step 5950: lr=1.00E-05, loss= 1.1239 (max= 1.5124), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:42:03,068 - root - INFO - Step 5950: lr=1.00E-05, loss= 1.1239 (max= 1.5124), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:42:03,068 - root - INFO - Step 5950: lr=1.00E-05, loss= 1.1239 (max= 1.5124), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:42:03,068 - root - INFO - Step 5950: lr=1.00E-05, loss= 1.1239 (max= 1.5124), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:42:03,068 - root - INFO - Step 5950: lr=1.00E-05, loss= 1.1239 (max= 1.5124), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:42:34,975 - root - INFO - Step 5960: lr=1.00E-05, loss= 1.1083 (max= 1.5448), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:42:34,975 - root - INFO - Step 5960: lr=1.00E-05, loss= 1.1083 (max= 1.5448), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:42:34,975 - root - INFO - Step 5960: lr=1.00E-05, loss= 1.1083 (max= 1.5448), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:42:34,975 - root - INFO - Step 5960: lr=1.00E-05, loss= 1.1083 (max= 1.5448), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:42:34,975 - root - INFO - Step 
5960: lr=1.00E-05, loss= 1.1083 (max= 1.5448), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:42:34,975 - root - INFO - Step 5960: lr=1.00E-05, loss= 1.1083 (max= 1.5448), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:42:34,975 - root - INFO - Step 5960: lr=1.00E-05, loss= 1.1083 (max= 1.5448), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:42:34,975 - root - INFO - Step 5960: lr=1.00E-05, loss= 1.1083 (max= 1.5448), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:43:06,872 - root - INFO - Step 5970: lr=1.00E-05, loss= 1.1401 (max= 1.7393), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:43:06,872 - root - INFO - Step 5970: lr=1.00E-05, loss= 1.1401 (max= 1.7393), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:43:06,872 - root - INFO - Step 5970: lr=1.00E-05, loss= 1.1401 (max= 1.7393), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:43:06,872 - root - INFO - Step 5970: lr=1.00E-05, loss= 1.1401 (max= 1.7393), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:43:06,872 - root - INFO - Step 5970: lr=1.00E-05, loss= 1.1401 (max= 1.7393), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:43:06,872 - root - INFO - Step 5970: lr=1.00E-05, loss= 1.1401 (max= 1.7393), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:43:06,872 - root - INFO - Step 5970: lr=1.00E-05, loss= 1.1401 (max= 1.7393), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 15:43:06,872 - root - INFO - Step 5970: lr=1.00E-05, loss= 1.1401 (max= 1.7393), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:43:38,750 - root - INFO - Step 5980: lr=1.00E-05, loss= 1.1160 (max= 1.6139), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:43:38,750 - root - INFO - Step 5980: lr=1.00E-05, loss= 1.1160 (max= 1.6139), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:43:38,750 - root - INFO - Step 5980: lr=1.00E-05, loss= 1.1160 (max= 1.6139), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:43:38,750 - root - INFO - Step 5980: lr=1.00E-05, loss= 1.1160 (max= 1.6139), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:43:38,750 - root - INFO - Step 5980: lr=1.00E-05, loss= 1.1160 (max= 1.6139), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:43:38,750 - root - INFO - Step 5980: lr=1.00E-05, loss= 1.1160 (max= 1.6139), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:43:38,750 - root - INFO - Step 5980: lr=1.00E-05, loss= 1.1160 (max= 1.6139), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:43:38,750 - root - INFO - Step 5980: lr=1.00E-05, loss= 1.1160 (max= 1.6139), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:44:10,519 - root - INFO - Step 5990: lr=1.00E-05, loss= 1.1082 (max= 1.6506), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:44:10,519 - root - INFO - Step 5990: lr=1.00E-05, loss= 1.1082 (max= 1.6506), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:44:10,519 - root - INFO - Step 5990: lr=1.00E-05, loss= 1.1082 (max= 1.6506), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:44:10,519 - root - INFO - Step 5990: lr=1.00E-05, loss= 1.1082 (max= 1.6506), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:44:10,519 - root - INFO - Step 5990: lr=1.00E-05, loss= 1.1082 (max= 1.6506), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:44:10,519 - root - INFO - Step 5990: lr=1.00E-05, loss= 1.1082 (max= 1.6506), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:44:10,519 - root - INFO - Step 5990: lr=1.00E-05, loss= 1.1082 (max= 1.6506), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:44:10,519 - root - INFO - Step 5990: lr=1.00E-05, loss= 1.1082 (max= 1.6506), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-pt/checkpoints/dataloader/step-6000 +Dataset successfully saved to jobs/munin-7b-open-pt/checkpoints/dataloader/step-6000! 
Save time: 4.465630769729614 +2025-10-25 15:44:42,365 - root - INFO - Step 6000: lr=1.00E-05, loss= 1.0937 (max= 1.5318), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:44:42,365 - root - INFO - Step 6000: lr=1.00E-05, loss= 1.0937 (max= 1.5318), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:44:42,365 - root - INFO - Step 6000: lr=1.00E-05, loss= 1.0937 (max= 1.5318), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:44:42,365 - root - INFO - Saving a full checkpoint at step 6000 +2025-10-25 15:44:42,365 - root - INFO - Saving a full checkpoint at step 6000 +2025-10-25 15:44:42,365 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 15:44:42,365 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 15:44:42,365 - root - INFO - Saving a full checkpoint at step 6000 +2025-10-25 15:44:42,365 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 15:44:42,365 - root - INFO - Step 6000: lr=1.00E-05, loss= 1.0937 (max= 1.5318), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:44:42,365 - root - INFO - Saving a full checkpoint at step 6000 +2025-10-25 15:44:42,365 - root - INFO - Step 6000: lr=1.00E-05, loss= 1.0937 (max= 1.5318), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:44:42,365 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 15:44:42,365 - root - INFO - Step 6000: lr=1.00E-05, loss= 1.0937 (max= 1.5318), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:44:42,365 - root - INFO - Step 6000: lr=1.00E-05, loss= 1.0937 (max= 1.5318), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:44:42,365 - root - INFO - Saving a full checkpoint at step 6000 +2025-10-25 15:44:42,365 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 15:44:42,365 - root - INFO - Saving a full checkpoint at step 6000 +2025-10-25 15:44:42,365 - root - INFO - Saving a full checkpoint at step 6000 +2025-10-25 15:44:42,365 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 15:44:42,365 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 15:44:42,365 - root - INFO - Step 6000: lr=1.00E-05, loss= 1.0937 (max= 1.5318), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:44:42,365 - root - INFO - Saving a full checkpoint at step 6000 +2025-10-25 15:44:42,365 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 15:44:56,488 - root - INFO - Finished saving the checkpoint in 14.12 seconds +2025-10-25 15:44:56,496 - root - INFO - Finished saving the checkpoint in 14.13 seconds +2025-10-25 15:44:56,496 - root - INFO - Finished saving the checkpoint in 14.13 seconds +2025-10-25 15:44:56,497 - root - INFO - Finished saving the checkpoint in 14.13 seconds +2025-10-25 15:44:56,497 - root - INFO - Finished saving the checkpoint in 14.13 seconds +2025-10-25 15:44:56,497 - root - INFO - Finished saving the checkpoint in 14.13 seconds +2025-10-25 15:44:56,498 - root - INFO - Finished saving the checkpoint in 14.13 seconds +2025-10-25 15:44:56,498 - root - INFO - Finished saving the checkpoint in 
14.13 seconds +2025-10-25 15:45:28,335 - root - INFO - Step 6010: lr=1.00E-05, loss= 1.0979 (max= 1.5769), tps=14257, mfu=29.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:45:28,335 - root - INFO - Step 6010: lr=1.00E-05, loss= 1.0979 (max= 1.5769), tps=14257, mfu=29.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:45:28,335 - root - INFO - Step 6010: lr=1.00E-05, loss= 1.0979 (max= 1.5769), tps=14257, mfu=29.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:45:28,336 - root - INFO - Step 6010: lr=1.00E-05, loss= 1.0979 (max= 1.5769), tps=14257, mfu=29.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:45:28,336 - root - INFO - Step 6010: lr=1.00E-05, loss= 1.0979 (max= 1.5769), tps=14257, mfu=29.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:45:28,336 - root - INFO - Step 6010: lr=1.00E-05, loss= 1.0979 (max= 1.5769), tps=14257, mfu=29.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:45:28,336 - root - INFO - Step 6010: lr=1.00E-05, loss= 1.0979 (max= 1.5769), tps=14257, mfu=29.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:45:28,336 - root - INFO - Step 6010: lr=1.00E-05, loss= 1.0979 (max= 1.5769), tps=14257, mfu=29.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:46:00,248 - root - INFO - Step 6020: lr=1.00E-05, loss= 1.1105 (max= 1.4989), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:46:00,248 - root - INFO - Step 6020: lr=1.00E-05, loss= 1.1105 (max= 1.4989), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:46:00,248 - root - INFO - Step 6020: lr=1.00E-05, loss= 1.1105 (max= 1.4989), tps=20538, mfu=42.79%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:46:00,248 - root - INFO - Step 6020: lr=1.00E-05, loss= 1.1105 (max= 1.4989), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:46:00,249 - root - INFO - Step 6020: lr=1.00E-05, loss= 1.1105 (max= 1.4989), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:46:00,249 - root - INFO - Step 6020: lr=1.00E-05, loss= 1.1105 (max= 1.4989), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:46:00,249 - root - INFO - Step 6020: lr=1.00E-05, loss= 1.1105 (max= 1.4989), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:46:00,249 - root - INFO - Step 6020: lr=1.00E-05, loss= 1.1105 (max= 1.4989), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:46:32,131 - root - INFO - Step 6030: lr=1.00E-05, loss= 1.1124 (max= 1.5030), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:46:32,131 - root - INFO - Step 6030: lr=1.00E-05, loss= 1.1124 (max= 1.5030), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:46:32,131 - root - INFO - Step 6030: lr=1.00E-05, loss= 1.1124 (max= 1.5030), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:46:32,131 - root - INFO - Step 6030: lr=1.00E-05, loss= 1.1124 (max= 1.5030), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:46:32,131 - root - INFO - Step 6030: lr=1.00E-05, loss= 1.1124 (max= 1.5030), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:46:32,132 - root - INFO - Step 6030: lr=1.00E-05, loss= 1.1124 (max= 
1.5030), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:46:32,132 - root - INFO - Step 6030: lr=1.00E-05, loss= 1.1124 (max= 1.5030), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:46:32,132 - root - INFO - Step 6030: lr=1.00E-05, loss= 1.1124 (max= 1.5030), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:47:04,060 - root - INFO - Step 6040: lr=1.00E-05, loss= 1.0976 (max= 1.4391), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:47:04,060 - root - INFO - Step 6040: lr=1.00E-05, loss= 1.0976 (max= 1.4391), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:47:04,060 - root - INFO - Step 6040: lr=1.00E-05, loss= 1.0976 (max= 1.4391), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:47:04,060 - root - INFO - Step 6040: lr=1.00E-05, loss= 1.0976 (max= 1.4391), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:47:04,060 - root - INFO - Step 6040: lr=1.00E-05, loss= 1.0976 (max= 1.4391), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:47:04,060 - root - INFO - Step 6040: lr=1.00E-05, loss= 1.0976 (max= 1.4391), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:47:04,060 - root - INFO - Step 6040: lr=1.00E-05, loss= 1.0976 (max= 1.4391), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:47:04,060 - root - INFO - Step 6040: lr=1.00E-05, loss= 1.0976 (max= 1.4391), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:47:35,872 - root - INFO - Step 
6050: lr=1.00E-05, loss= 1.1136 (max= 1.6462), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:47:35,872 - root - INFO - Step 6050: lr=1.00E-05, loss= 1.1136 (max= 1.6462), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:47:35,872 - root - INFO - Step 6050: lr=1.00E-05, loss= 1.1136 (max= 1.6462), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:47:35,872 - root - INFO - Step 6050: lr=1.00E-05, loss= 1.1136 (max= 1.6462), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:47:35,872 - root - INFO - Step 6050: lr=1.00E-05, loss= 1.1136 (max= 1.6462), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:47:35,872 - root - INFO - Step 6050: lr=1.00E-05, loss= 1.1136 (max= 1.6462), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:47:35,872 - root - INFO - Step 6050: lr=1.00E-05, loss= 1.1136 (max= 1.6462), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:47:35,872 - root - INFO - Step 6050: lr=1.00E-05, loss= 1.1136 (max= 1.6462), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:48:07,678 - root - INFO - Step 6060: lr=1.00E-05, loss= 1.0997 (max= 1.5186), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:48:07,678 - root - INFO - Step 6060: lr=1.00E-05, loss= 1.0997 (max= 1.5186), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:48:07,678 - root - INFO - Step 6060: lr=1.00E-05, loss= 1.0997 (max= 1.5186), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 15:48:07,678 - root - INFO - Step 6060: lr=1.00E-05, loss= 1.0997 (max= 1.5186), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:48:07,678 - root - INFO - Step 6060: lr=1.00E-05, loss= 1.0997 (max= 1.5186), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:48:07,678 - root - INFO - Step 6060: lr=1.00E-05, loss= 1.0997 (max= 1.5186), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:48:07,678 - root - INFO - Step 6060: lr=1.00E-05, loss= 1.0997 (max= 1.5186), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:48:07,678 - root - INFO - Step 6060: lr=1.00E-05, loss= 1.0997 (max= 1.5186), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:48:22,748 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:7133860 +2025-10-25 15:48:39,547 - root - INFO - Step 6070: lr=1.00E-05, loss= 1.1203 (max= 1.5399), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:48:39,547 - root - INFO - Step 6070: lr=1.00E-05, loss= 1.1203 (max= 1.5399), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:48:39,547 - root - INFO - Step 6070: lr=1.00E-05, loss= 1.1203 (max= 1.5399), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:48:39,548 - root - INFO - Step 6070: lr=1.00E-05, loss= 1.1203 (max= 1.5399), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:48:39,548 - root - INFO - Step 6070: lr=1.00E-05, loss= 1.1203 (max= 1.5399), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:48:39,548 - root - INFO - Step 6070: lr=1.00E-05, loss= 1.1203 (max= 1.5399), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:48:39,548 - root - INFO - Step 6070: lr=1.00E-05, loss= 1.1203 (max= 1.5399), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:48:39,548 - root - INFO - Step 6070: lr=1.00E-05, loss= 1.1203 (max= 1.5399), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:49:11,450 - root - INFO - Step 6080: lr=1.00E-05, loss= 1.1221 (max= 1.6944), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:49:11,450 - root - INFO - Step 6080: lr=1.00E-05, loss= 1.1221 (max= 1.6944), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:49:11,450 - root - INFO - Step 6080: lr=1.00E-05, loss= 1.1221 (max= 1.6944), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:49:11,450 - root - INFO - Step 6080: lr=1.00E-05, loss= 1.1221 (max= 1.6944), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:49:11,450 - root - INFO - Step 6080: lr=1.00E-05, loss= 1.1221 (max= 1.6944), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:49:11,450 - root - INFO - Step 6080: lr=1.00E-05, loss= 1.1221 (max= 1.6944), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:49:11,450 - root - INFO - Step 6080: lr=1.00E-05, loss= 1.1221 (max= 1.6944), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:49:11,450 - root - INFO - Step 6080: lr=1.00E-05, loss= 1.1221 (max= 1.6944), tps=20545, 
mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:49:43,257 - root - INFO - Step 6090: lr=1.00E-05, loss= 1.1271 (max= 1.5475), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:49:43,257 - root - INFO - Step 6090: lr=1.00E-05, loss= 1.1271 (max= 1.5475), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:49:43,257 - root - INFO - Step 6090: lr=1.00E-05, loss= 1.1271 (max= 1.5475), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:49:43,257 - root - INFO - Step 6090: lr=1.00E-05, loss= 1.1271 (max= 1.5475), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:49:43,258 - root - INFO - Step 6090: lr=1.00E-05, loss= 1.1271 (max= 1.5475), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:49:43,258 - root - INFO - Step 6090: lr=1.00E-05, loss= 1.1271 (max= 1.5475), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:49:43,258 - root - INFO - Step 6090: lr=1.00E-05, loss= 1.1271 (max= 1.5475), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:49:43,258 - root - INFO - Step 6090: lr=1.00E-05, loss= 1.1271 (max= 1.5475), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:50:15,133 - root - INFO - Step 6100: lr=1.00E-05, loss= 1.1091 (max= 1.5951), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:50:15,133 - root - INFO - Step 6100: lr=1.00E-05, loss= 1.1091 (max= 1.5951), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:50:15,133 - root - INFO - Step 6100: lr=1.00E-05, 
loss= 1.1091 (max= 1.5951), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:50:15,133 - root - INFO - Step 6100: lr=1.00E-05, loss= 1.1091 (max= 1.5951), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:50:15,133 - root - INFO - Step 6100: lr=1.00E-05, loss= 1.1091 (max= 1.5951), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:50:15,133 - root - INFO - Step 6100: lr=1.00E-05, loss= 1.1091 (max= 1.5951), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:50:15,133 - root - INFO - Step 6100: lr=1.00E-05, loss= 1.1091 (max= 1.5951), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:50:15,133 - root - INFO - Step 6100: lr=1.00E-05, loss= 1.1091 (max= 1.5951), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:50:47,043 - root - INFO - Step 6110: lr=1.00E-05, loss= 1.1222 (max= 1.6485), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:50:47,043 - root - INFO - Step 6110: lr=1.00E-05, loss= 1.1222 (max= 1.6485), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:50:47,043 - root - INFO - Step 6110: lr=1.00E-05, loss= 1.1222 (max= 1.6485), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:50:47,043 - root - INFO - Step 6110: lr=1.00E-05, loss= 1.1222 (max= 1.6485), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:50:47,043 - root - INFO - Step 6110: lr=1.00E-05, loss= 1.1222 (max= 1.6485), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:50:47,043 - 
root - INFO - Step 6110: lr=1.00E-05, loss= 1.1222 (max= 1.6485), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:50:47,043 - root - INFO - Step 6110: lr=1.00E-05, loss= 1.1222 (max= 1.6485), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:50:47,043 - root - INFO - Step 6110: lr=1.00E-05, loss= 1.1222 (max= 1.6485), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:51:18,888 - root - INFO - Step 6120: lr=1.00E-05, loss= 1.1138 (max= 1.5540), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:51:18,888 - root - INFO - Step 6120: lr=1.00E-05, loss= 1.1138 (max= 1.5540), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:51:18,888 - root - INFO - Step 6120: lr=1.00E-05, loss= 1.1138 (max= 1.5540), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:51:18,888 - root - INFO - Step 6120: lr=1.00E-05, loss= 1.1138 (max= 1.5540), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:51:18,888 - root - INFO - Step 6120: lr=1.00E-05, loss= 1.1138 (max= 1.5540), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:51:18,888 - root - INFO - Step 6120: lr=1.00E-05, loss= 1.1138 (max= 1.5540), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:51:18,888 - root - INFO - Step 6120: lr=1.00E-05, loss= 1.1138 (max= 1.5540), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:51:18,888 - root - INFO - Step 6120: lr=1.00E-05, loss= 1.1138 (max= 1.5540), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 15:51:50,764 - root - INFO - Step 6130: lr=1.00E-05, loss= 1.1311 (max= 1.6260), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:51:50,764 - root - INFO - Step 6130: lr=1.00E-05, loss= 1.1311 (max= 1.6260), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:51:50,764 - root - INFO - Step 6130: lr=1.00E-05, loss= 1.1311 (max= 1.6260), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:51:50,764 - root - INFO - Step 6130: lr=1.00E-05, loss= 1.1311 (max= 1.6260), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:51:50,764 - root - INFO - Step 6130: lr=1.00E-05, loss= 1.1311 (max= 1.6260), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:51:50,764 - root - INFO - Step 6130: lr=1.00E-05, loss= 1.1311 (max= 1.6260), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:51:50,765 - root - INFO - Step 6130: lr=1.00E-05, loss= 1.1311 (max= 1.6260), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:51:50,765 - root - INFO - Step 6130: lr=1.00E-05, loss= 1.1311 (max= 1.6260), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:52:22,653 - root - INFO - Step 6140: lr=1.00E-05, loss= 1.0821 (max= 1.5716), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:52:22,654 - root - INFO - Step 6140: lr=1.00E-05, loss= 1.0821 (max= 1.5716), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:52:22,654 - root - INFO - Step 6140: lr=1.00E-05, loss= 1.0821 (max= 1.5716), tps=20554, mfu=42.82%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:52:22,654 - root - INFO - Step 6140: lr=1.00E-05, loss= 1.0821 (max= 1.5716), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:52:22,654 - root - INFO - Step 6140: lr=1.00E-05, loss= 1.0821 (max= 1.5716), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:52:22,654 - root - INFO - Step 6140: lr=1.00E-05, loss= 1.0821 (max= 1.5716), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:52:22,654 - root - INFO - Step 6140: lr=1.00E-05, loss= 1.0821 (max= 1.5716), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:52:22,654 - root - INFO - Step 6140: lr=1.00E-05, loss= 1.0821 (max= 1.5716), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:52:54,505 - root - INFO - Step 6150: lr=1.00E-05, loss= 1.0920 (max= 1.6232), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:52:54,505 - root - INFO - Step 6150: lr=1.00E-05, loss= 1.0920 (max= 1.6232), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:52:54,505 - root - INFO - Step 6150: lr=1.00E-05, loss= 1.0920 (max= 1.6232), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:52:54,505 - root - INFO - Step 6150: lr=1.00E-05, loss= 1.0920 (max= 1.6232), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:52:54,505 - root - INFO - Step 6150: lr=1.00E-05, loss= 1.0920 (max= 1.6232), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:52:54,505 - root - INFO - Step 6150: lr=1.00E-05, loss= 1.0920 (max= 
1.6232), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:52:54,505 - root - INFO - Step 6150: lr=1.00E-05, loss= 1.0920 (max= 1.6232), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:52:54,505 - root - INFO - Step 6150: lr=1.00E-05, loss= 1.0920 (max= 1.6232), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:53:26,392 - root - INFO - Step 6160: lr=1.00E-05, loss= 1.1544 (max= 1.7880), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:53:26,392 - root - INFO - Step 6160: lr=1.00E-05, loss= 1.1544 (max= 1.7880), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:53:26,392 - root - INFO - Step 6160: lr=1.00E-05, loss= 1.1544 (max= 1.7880), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:53:26,392 - root - INFO - Step 6160: lr=1.00E-05, loss= 1.1544 (max= 1.7880), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:53:26,392 - root - INFO - Step 6160: lr=1.00E-05, loss= 1.1544 (max= 1.7880), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:53:26,392 - root - INFO - Step 6160: lr=1.00E-05, loss= 1.1544 (max= 1.7880), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:53:26,392 - root - INFO - Step 6160: lr=1.00E-05, loss= 1.1544 (max= 1.7880), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:53:26,392 - root - INFO - Step 6160: lr=1.00E-05, loss= 1.1544 (max= 1.7880), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:53:58,224 - root - INFO - Step 
6170: lr=1.00E-05, loss= 1.1204 (max= 1.5246), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:53:58,224 - root - INFO - Step 6170: lr=1.00E-05, loss= 1.1204 (max= 1.5246), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:53:58,224 - root - INFO - Step 6170: lr=1.00E-05, loss= 1.1204 (max= 1.5246), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:53:58,224 - root - INFO - Step 6170: lr=1.00E-05, loss= 1.1204 (max= 1.5246), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:53:58,224 - root - INFO - Step 6170: lr=1.00E-05, loss= 1.1204 (max= 1.5246), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:53:58,224 - root - INFO - Step 6170: lr=1.00E-05, loss= 1.1204 (max= 1.5246), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:53:58,224 - root - INFO - Step 6170: lr=1.00E-05, loss= 1.1204 (max= 1.5246), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:53:58,225 - root - INFO - Step 6170: lr=1.00E-05, loss= 1.1204 (max= 1.5246), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:54:30,100 - root - INFO - Step 6180: lr=1.00E-05, loss= 1.1044 (max= 1.5358), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:54:30,100 - root - INFO - Step 6180: lr=1.00E-05, loss= 1.1044 (max= 1.5358), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:54:30,101 - root - INFO - Step 6180: lr=1.00E-05, loss= 1.1044 (max= 1.5358), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 15:54:30,101 - root - INFO - Step 6180: lr=1.00E-05, loss= 1.1044 (max= 1.5358), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:54:30,101 - root - INFO - Step 6180: lr=1.00E-05, loss= 1.1044 (max= 1.5358), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:54:30,101 - root - INFO - Step 6180: lr=1.00E-05, loss= 1.1044 (max= 1.5358), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:54:30,101 - root - INFO - Step 6180: lr=1.00E-05, loss= 1.1044 (max= 1.5358), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:54:30,101 - root - INFO - Step 6180: lr=1.00E-05, loss= 1.1044 (max= 1.5358), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:55:01,948 - root - INFO - Step 6190: lr=1.00E-05, loss= 1.1141 (max= 1.8148), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:55:01,948 - root - INFO - Step 6190: lr=1.00E-05, loss= 1.1141 (max= 1.8148), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:55:01,948 - root - INFO - Step 6190: lr=1.00E-05, loss= 1.1141 (max= 1.8148), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:55:01,948 - root - INFO - Step 6190: lr=1.00E-05, loss= 1.1141 (max= 1.8148), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:55:01,948 - root - INFO - Step 6190: lr=1.00E-05, loss= 1.1141 (max= 1.8148), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:55:01,948 - root - INFO - Step 6190: lr=1.00E-05, loss= 1.1141 (max= 1.8148), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:55:01,948 - root - INFO - Step 6190: lr=1.00E-05, loss= 1.1141 (max= 1.8148), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:55:01,948 - root - INFO - Step 6190: lr=1.00E-05, loss= 1.1141 (max= 1.8148), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:55:33,802 - root - INFO - Step 6200: lr=1.00E-05, loss= 1.1030 (max= 1.5079), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:55:33,802 - root - INFO - Step 6200: lr=1.00E-05, loss= 1.1030 (max= 1.5079), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:55:33,802 - root - INFO - Step 6200: lr=1.00E-05, loss= 1.1030 (max= 1.5079), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:55:33,802 - root - INFO - Step 6200: lr=1.00E-05, loss= 1.1030 (max= 1.5079), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:55:33,802 - root - INFO - Step 6200: lr=1.00E-05, loss= 1.1030 (max= 1.5079), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:55:33,802 - root - INFO - Step 6200: lr=1.00E-05, loss= 1.1030 (max= 1.5079), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:55:33,802 - root - INFO - Step 6200: lr=1.00E-05, loss= 1.1030 (max= 1.5079), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:55:33,803 - root - INFO - Step 6200: lr=1.00E-05, loss= 1.1030 (max= 1.5079), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:56:05,671 - root - INFO - Step 6210: lr=1.00E-05, loss= 1.1095 (max= 1.6153), tps=20567, 
mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:56:05,671 - root - INFO - Step 6210: lr=1.00E-05, loss= 1.1095 (max= 1.6153), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:56:05,671 - root - INFO - Step 6210: lr=1.00E-05, loss= 1.1095 (max= 1.6153), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:56:05,671 - root - INFO - Step 6210: lr=1.00E-05, loss= 1.1095 (max= 1.6153), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:56:05,671 - root - INFO - Step 6210: lr=1.00E-05, loss= 1.1095 (max= 1.6153), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:56:05,671 - root - INFO - Step 6210: lr=1.00E-05, loss= 1.1095 (max= 1.6153), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:56:05,671 - root - INFO - Step 6210: lr=1.00E-05, loss= 1.1095 (max= 1.6153), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:56:05,671 - root - INFO - Step 6210: lr=1.00E-05, loss= 1.1095 (max= 1.6153), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:56:37,501 - root - INFO - Step 6220: lr=1.00E-05, loss= 1.1391 (max= 1.5889), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:56:37,501 - root - INFO - Step 6220: lr=1.00E-05, loss= 1.1391 (max= 1.5889), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:56:37,501 - root - INFO - Step 6220: lr=1.00E-05, loss= 1.1391 (max= 1.5889), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:56:37,501 - root - INFO - Step 6220: lr=1.00E-05, 
loss= 1.1391 (max= 1.5889), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:56:37,501 - root - INFO - Step 6220: lr=1.00E-05, loss= 1.1391 (max= 1.5889), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:56:37,501 - root - INFO - Step 6220: lr=1.00E-05, loss= 1.1391 (max= 1.5889), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:56:37,501 - root - INFO - Step 6220: lr=1.00E-05, loss= 1.1391 (max= 1.5889), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:56:37,501 - root - INFO - Step 6220: lr=1.00E-05, loss= 1.1391 (max= 1.5889), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:57:09,284 - root - INFO - Step 6230: lr=1.00E-05, loss= 1.1036 (max= 1.5432), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:57:09,284 - root - INFO - Step 6230: lr=1.00E-05, loss= 1.1036 (max= 1.5432), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:57:09,284 - root - INFO - Step 6230: lr=1.00E-05, loss= 1.1036 (max= 1.5432), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:57:09,284 - root - INFO - Step 6230: lr=1.00E-05, loss= 1.1036 (max= 1.5432), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:57:09,284 - root - INFO - Step 6230: lr=1.00E-05, loss= 1.1036 (max= 1.5432), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:57:09,284 - root - INFO - Step 6230: lr=1.00E-05, loss= 1.1036 (max= 1.5432), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:57:09,284 - 
root - INFO - Step 6230: lr=1.00E-05, loss= 1.1036 (max= 1.5432), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:57:09,284 - root - INFO - Step 6230: lr=1.00E-05, loss= 1.1036 (max= 1.5432), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:57:41,266 - root - INFO - Step 6240: lr=1.00E-05, loss= 1.0977 (max= 1.6310), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:57:41,266 - root - INFO - Step 6240: lr=1.00E-05, loss= 1.0977 (max= 1.6310), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:57:41,266 - root - INFO - Step 6240: lr=1.00E-05, loss= 1.0977 (max= 1.6310), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:57:41,266 - root - INFO - Step 6240: lr=1.00E-05, loss= 1.0977 (max= 1.6310), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:57:41,266 - root - INFO - Step 6240: lr=1.00E-05, loss= 1.0977 (max= 1.6310), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:57:41,266 - root - INFO - Step 6240: lr=1.00E-05, loss= 1.0977 (max= 1.6310), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:57:41,266 - root - INFO - Step 6240: lr=1.00E-05, loss= 1.0977 (max= 1.6310), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:57:41,267 - root - INFO - Step 6240: lr=1.00E-05, loss= 1.0977 (max= 1.6310), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:58:13,117 - root - INFO - Step 6250: lr=1.00E-05, loss= 1.1006 (max= 1.6060), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 15:58:13,117 - root - INFO - Step 6250: lr=1.00E-05, loss= 1.1006 (max= 1.6060), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:58:13,117 - root - INFO - Step 6250: lr=1.00E-05, loss= 1.1006 (max= 1.6060), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:58:13,118 - root - INFO - Step 6250: lr=1.00E-05, loss= 1.1006 (max= 1.6060), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:58:13,118 - root - INFO - Step 6250: lr=1.00E-05, loss= 1.1006 (max= 1.6060), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:58:13,118 - root - INFO - Step 6250: lr=1.00E-05, loss= 1.1006 (max= 1.6060), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:58:13,118 - root - INFO - Step 6250: lr=1.00E-05, loss= 1.1006 (max= 1.6060), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:58:13,118 - root - INFO - Step 6250: lr=1.00E-05, loss= 1.1006 (max= 1.6060), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:58:44,976 - root - INFO - Step 6260: lr=1.00E-05, loss= 1.1251 (max= 1.4977), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:58:44,976 - root - INFO - Step 6260: lr=1.00E-05, loss= 1.1251 (max= 1.4977), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:58:44,976 - root - INFO - Step 6260: lr=1.00E-05, loss= 1.1251 (max= 1.4977), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:58:44,976 - root - INFO - Step 6260: lr=1.00E-05, loss= 1.1251 (max= 1.4977), tps=20573, mfu=42.86%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:58:44,976 - root - INFO - Step 6260: lr=1.00E-05, loss= 1.1251 (max= 1.4977), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:58:44,976 - root - INFO - Step 6260: lr=1.00E-05, loss= 1.1251 (max= 1.4977), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:58:44,976 - root - INFO - Step 6260: lr=1.00E-05, loss= 1.1251 (max= 1.4977), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:58:44,977 - root - INFO - Step 6260: lr=1.00E-05, loss= 1.1251 (max= 1.4977), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:59:16,847 - root - INFO - Step 6270: lr=1.00E-05, loss= 1.1096 (max= 1.5814), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:59:16,848 - root - INFO - Step 6270: lr=1.00E-05, loss= 1.1096 (max= 1.5814), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:59:16,848 - root - INFO - Step 6270: lr=1.00E-05, loss= 1.1096 (max= 1.5814), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:59:16,848 - root - INFO - Step 6270: lr=1.00E-05, loss= 1.1096 (max= 1.5814), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:59:16,848 - root - INFO - Step 6270: lr=1.00E-05, loss= 1.1096 (max= 1.5814), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:59:16,848 - root - INFO - Step 6270: lr=1.00E-05, loss= 1.1096 (max= 1.5814), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:59:16,848 - root - INFO - Step 6270: lr=1.00E-05, loss= 1.1096 (max= 
1.5814), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:59:16,848 - root - INFO - Step 6270: lr=1.00E-05, loss= 1.1096 (max= 1.5814), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:59:48,663 - root - INFO - Step 6280: lr=1.00E-05, loss= 1.1202 (max= 1.6087), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:59:48,663 - root - INFO - Step 6280: lr=1.00E-05, loss= 1.1202 (max= 1.6087), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:59:48,663 - root - INFO - Step 6280: lr=1.00E-05, loss= 1.1202 (max= 1.6087), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:59:48,663 - root - INFO - Step 6280: lr=1.00E-05, loss= 1.1202 (max= 1.6087), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:59:48,663 - root - INFO - Step 6280: lr=1.00E-05, loss= 1.1202 (max= 1.6087), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:59:48,663 - root - INFO - Step 6280: lr=1.00E-05, loss= 1.1202 (max= 1.6087), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:59:48,664 - root - INFO - Step 6280: lr=1.00E-05, loss= 1.1202 (max= 1.6087), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 15:59:48,664 - root - INFO - Step 6280: lr=1.00E-05, loss= 1.1202 (max= 1.6087), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:00:20,469 - root - INFO - Step 6290: lr=1.00E-05, loss= 1.1213 (max= 1.5515), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:00:20,469 - root - INFO - Step 
6290: lr=1.00E-05, loss= 1.1213 (max= 1.5515), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:00:20,469 - root - INFO - Step 6290: lr=1.00E-05, loss= 1.1213 (max= 1.5515), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:00:20,470 - root - INFO - Step 6290: lr=1.00E-05, loss= 1.1213 (max= 1.5515), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:00:20,470 - root - INFO - Step 6290: lr=1.00E-05, loss= 1.1213 (max= 1.5515), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:00:20,470 - root - INFO - Step 6290: lr=1.00E-05, loss= 1.1213 (max= 1.5515), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:00:20,470 - root - INFO - Step 6290: lr=1.00E-05, loss= 1.1213 (max= 1.5515), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:00:20,470 - root - INFO - Step 6290: lr=1.00E-05, loss= 1.1213 (max= 1.5515), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:00:52,328 - root - INFO - Step 6300: lr=1.00E-05, loss= 1.1153 (max= 1.5945), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:00:52,328 - root - INFO - Step 6300: lr=1.00E-05, loss= 1.1153 (max= 1.5945), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:00:52,328 - root - INFO - Step 6300: lr=1.00E-05, loss= 1.1153 (max= 1.5945), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:00:52,328 - root - INFO - Step 6300: lr=1.00E-05, loss= 1.1153 (max= 1.5945), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 16:00:52,328 - root - INFO - Step 6300: lr=1.00E-05, loss= 1.1153 (max= 1.5945), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:00:52,328 - root - INFO - Step 6300: lr=1.00E-05, loss= 1.1153 (max= 1.5945), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:00:52,328 - root - INFO - Step 6300: lr=1.00E-05, loss= 1.1153 (max= 1.5945), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:00:52,328 - root - INFO - Step 6300: lr=1.00E-05, loss= 1.1153 (max= 1.5945), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:01:24,213 - root - INFO - Step 6310: lr=1.00E-05, loss= 1.1251 (max= 1.6221), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:01:24,213 - root - INFO - Step 6310: lr=1.00E-05, loss= 1.1251 (max= 1.6221), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:01:24,213 - root - INFO - Step 6310: lr=1.00E-05, loss= 1.1251 (max= 1.6221), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:01:24,213 - root - INFO - Step 6310: lr=1.00E-05, loss= 1.1251 (max= 1.6221), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:01:24,213 - root - INFO - Step 6310: lr=1.00E-05, loss= 1.1251 (max= 1.6221), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:01:24,213 - root - INFO - Step 6310: lr=1.00E-05, loss= 1.1251 (max= 1.6221), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:01:24,213 - root - INFO - Step 6310: lr=1.00E-05, loss= 1.1251 (max= 1.6221), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:01:24,213 - root - INFO - Step 6310: lr=1.00E-05, loss= 1.1251 (max= 1.6221), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:01:56,077 - root - INFO - Step 6320: lr=1.00E-05, loss= 1.1331 (max= 1.6081), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.05s, 3.12%) +2025-10-25 16:01:56,077 - root - INFO - Step 6320: lr=1.00E-05, loss= 1.1331 (max= 1.6081), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.05s, 3.12%) +2025-10-25 16:01:56,077 - root - INFO - Step 6320: lr=1.00E-05, loss= 1.1331 (max= 1.6081), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.05s, 3.12%) +2025-10-25 16:01:56,077 - root - INFO - Step 6320: lr=1.00E-05, loss= 1.1331 (max= 1.6081), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.05s, 3.12%) +2025-10-25 16:01:56,077 - root - INFO - Step 6320: lr=1.00E-05, loss= 1.1331 (max= 1.6081), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.05s, 3.12%) +2025-10-25 16:01:56,077 - root - INFO - Step 6320: lr=1.00E-05, loss= 1.1331 (max= 1.6081), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.05s, 3.12%) +2025-10-25 16:01:56,077 - root - INFO - Step 6320: lr=1.00E-05, loss= 1.1331 (max= 1.6081), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.05s, 3.12%) +2025-10-25 16:01:56,077 - root - INFO - Step 6320: lr=1.00E-05, loss= 1.1331 (max= 1.6081), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.05s, 3.12%) +2025-10-25 16:02:27,909 - root - INFO - Step 6330: lr=1.00E-05, loss= 1.1115 (max= 1.5955), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:02:27,909 - root - INFO - Step 6330: lr=1.00E-05, loss= 1.1115 (max= 1.5955), tps=20590, 
mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:02:27,909 - root - INFO - Step 6330: lr=1.00E-05, loss= 1.1115 (max= 1.5955), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:02:27,909 - root - INFO - Step 6330: lr=1.00E-05, loss= 1.1115 (max= 1.5955), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:02:27,909 - root - INFO - Step 6330: lr=1.00E-05, loss= 1.1115 (max= 1.5955), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:02:27,909 - root - INFO - Step 6330: lr=1.00E-05, loss= 1.1115 (max= 1.5955), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:02:27,909 - root - INFO - Step 6330: lr=1.00E-05, loss= 1.1115 (max= 1.5955), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:02:27,910 - root - INFO - Step 6330: lr=1.00E-05, loss= 1.1115 (max= 1.5955), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:02:59,817 - root - INFO - Step 6340: lr=1.00E-05, loss= 1.0996 (max= 1.5364), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:02:59,817 - root - INFO - Step 6340: lr=1.00E-05, loss= 1.0996 (max= 1.5364), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:02:59,817 - root - INFO - Step 6340: lr=1.00E-05, loss= 1.0996 (max= 1.5364), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:02:59,817 - root - INFO - Step 6340: lr=1.00E-05, loss= 1.0996 (max= 1.5364), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:02:59,817 - root - INFO - Step 6340: lr=1.00E-05, 
loss= 1.0996 (max= 1.5364), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:02:59,818 - root - INFO - Step 6340: lr=1.00E-05, loss= 1.0996 (max= 1.5364), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:02:59,819 - root - INFO - Step 6340: lr=1.00E-05, loss= 1.0996 (max= 1.5364), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:02:59,819 - root - INFO - Step 6340: lr=1.00E-05, loss= 1.0996 (max= 1.5364), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:03:31,737 - root - INFO - Step 6350: lr=1.00E-05, loss= 1.1265 (max= 1.6298), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:03:31,737 - root - INFO - Step 6350: lr=1.00E-05, loss= 1.1265 (max= 1.6298), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:03:31,737 - root - INFO - Step 6350: lr=1.00E-05, loss= 1.1265 (max= 1.6298), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:03:31,737 - root - INFO - Step 6350: lr=1.00E-05, loss= 1.1265 (max= 1.6298), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:03:31,737 - root - INFO - Step 6350: lr=1.00E-05, loss= 1.1265 (max= 1.6298), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:03:31,737 - root - INFO - Step 6350: lr=1.00E-05, loss= 1.1265 (max= 1.6298), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:03:31,737 - root - INFO - Step 6350: lr=1.00E-05, loss= 1.1265 (max= 1.6298), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:03:31,737 - 
root - INFO - Step 6350: lr=1.00E-05, loss= 1.1265 (max= 1.6298), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:04:03,646 - root - INFO - Step 6360: lr=1.00E-05, loss= 1.1132 (max= 1.4393), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:04:03,647 - root - INFO - Step 6360: lr=1.00E-05, loss= 1.1132 (max= 1.4393), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:04:03,647 - root - INFO - Step 6360: lr=1.00E-05, loss= 1.1132 (max= 1.4393), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:04:03,647 - root - INFO - Step 6360: lr=1.00E-05, loss= 1.1132 (max= 1.4393), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:04:03,647 - root - INFO - Step 6360: lr=1.00E-05, loss= 1.1132 (max= 1.4393), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:04:03,647 - root - INFO - Step 6360: lr=1.00E-05, loss= 1.1132 (max= 1.4393), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:04:03,647 - root - INFO - Step 6360: lr=1.00E-05, loss= 1.1132 (max= 1.4393), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:04:03,647 - root - INFO - Step 6360: lr=1.00E-05, loss= 1.1132 (max= 1.4393), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:04:12,358 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:7058825 +2025-10-25 16:04:35,469 - root - INFO - Step 6370: lr=1.00E-05, loss= 1.1258 (max= 1.5994), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 16:04:35,469 - root - INFO - Step 6370: lr=1.00E-05, loss= 1.1258 (max= 1.5994), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:04:35,469 - root - INFO - Step 6370: lr=1.00E-05, loss= 1.1258 (max= 1.5994), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:04:35,469 - root - INFO - Step 6370: lr=1.00E-05, loss= 1.1258 (max= 1.5994), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:04:35,469 - root - INFO - Step 6370: lr=1.00E-05, loss= 1.1258 (max= 1.5994), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:04:35,469 - root - INFO - Step 6370: lr=1.00E-05, loss= 1.1258 (max= 1.5994), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:04:35,469 - root - INFO - Step 6370: lr=1.00E-05, loss= 1.1258 (max= 1.5994), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:04:35,469 - root - INFO - Step 6370: lr=1.00E-05, loss= 1.1258 (max= 1.5994), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:05:07,308 - root - INFO - Step 6380: lr=1.00E-05, loss= 1.1154 (max= 1.5696), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:05:07,308 - root - INFO - Step 6380: lr=1.00E-05, loss= 1.1154 (max= 1.5696), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:05:07,308 - root - INFO - Step 6380: lr=1.00E-05, loss= 1.1154 (max= 1.5696), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:05:07,308 - root - INFO - Step 6380: lr=1.00E-05, loss= 1.1154 (max= 1.5696), tps=20587, mfu=42.89%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:05:07,308 - root - INFO - Step 6380: lr=1.00E-05, loss= 1.1154 (max= 1.5696), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:05:07,308 - root - INFO - Step 6380: lr=1.00E-05, loss= 1.1154 (max= 1.5696), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:05:07,308 - root - INFO - Step 6380: lr=1.00E-05, loss= 1.1154 (max= 1.5696), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:05:07,310 - root - INFO - Step 6380: lr=1.00E-05, loss= 1.1154 (max= 1.5696), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:05:39,139 - root - INFO - Step 6390: lr=1.00E-05, loss= 1.1237 (max= 1.6159), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:05:39,139 - root - INFO - Step 6390: lr=1.00E-05, loss= 1.1237 (max= 1.6159), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:05:39,139 - root - INFO - Step 6390: lr=1.00E-05, loss= 1.1237 (max= 1.6159), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:05:39,139 - root - INFO - Step 6390: lr=1.00E-05, loss= 1.1237 (max= 1.6159), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:05:39,139 - root - INFO - Step 6390: lr=1.00E-05, loss= 1.1237 (max= 1.6159), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:05:39,139 - root - INFO - Step 6390: lr=1.00E-05, loss= 1.1237 (max= 1.6159), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:05:39,139 - root - INFO - Step 6390: lr=1.00E-05, loss= 1.1237 (max= 
1.6159), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:05:39,139 - root - INFO - Step 6390: lr=1.00E-05, loss= 1.1237 (max= 1.6159), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:06:10,976 - root - INFO - Step 6400: lr=1.00E-05, loss= 1.1200 (max= 1.7747), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:06:10,976 - root - INFO - Step 6400: lr=1.00E-05, loss= 1.1200 (max= 1.7747), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:06:10,976 - root - INFO - Step 6400: lr=1.00E-05, loss= 1.1200 (max= 1.7747), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:06:10,976 - root - INFO - Step 6400: lr=1.00E-05, loss= 1.1200 (max= 1.7747), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:06:10,976 - root - INFO - Step 6400: lr=1.00E-05, loss= 1.1200 (max= 1.7747), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:06:10,976 - root - INFO - Step 6400: lr=1.00E-05, loss= 1.1200 (max= 1.7747), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:06:10,977 - root - INFO - Step 6400: lr=1.00E-05, loss= 1.1200 (max= 1.7747), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:06:10,977 - root - INFO - Step 6400: lr=1.00E-05, loss= 1.1200 (max= 1.7747), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:06:42,728 - root - INFO - Step 6410: lr=1.00E-05, loss= 1.1082 (max= 1.4954), tps=20642, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:06:42,728 - root - INFO - Step 
6410: lr=1.00E-05, loss= 1.1082 (max= 1.4954), tps=20642, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:06:42,728 - root - INFO - Step 6410: lr=1.00E-05, loss= 1.1082 (max= 1.4954), tps=20642, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:06:42,728 - root - INFO - Step 6410: lr=1.00E-05, loss= 1.1082 (max= 1.4954), tps=20642, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:06:42,728 - root - INFO - Step 6410: lr=1.00E-05, loss= 1.1082 (max= 1.4954), tps=20642, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:06:42,728 - root - INFO - Step 6410: lr=1.00E-05, loss= 1.1082 (max= 1.4954), tps=20642, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:06:42,728 - root - INFO - Step 6410: lr=1.00E-05, loss= 1.1082 (max= 1.4954), tps=20642, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:06:42,728 - root - INFO - Step 6410: lr=1.00E-05, loss= 1.1082 (max= 1.4954), tps=20642, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:07:14,547 - root - INFO - Step 6420: lr=1.00E-05, loss= 1.1376 (max= 1.5530), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:07:14,547 - root - INFO - Step 6420: lr=1.00E-05, loss= 1.1376 (max= 1.5530), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:07:14,547 - root - INFO - Step 6420: lr=1.00E-05, loss= 1.1376 (max= 1.5530), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:07:14,547 - root - INFO - Step 6420: lr=1.00E-05, loss= 1.1376 (max= 1.5530), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 16:07:14,548 - root - INFO - Step 6420: lr=1.00E-05, loss= 1.1376 (max= 1.5530), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:07:14,548 - root - INFO - Step 6420: lr=1.00E-05, loss= 1.1376 (max= 1.5530), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:07:14,548 - root - INFO - Step 6420: lr=1.00E-05, loss= 1.1376 (max= 1.5530), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:07:14,548 - root - INFO - Step 6420: lr=1.00E-05, loss= 1.1376 (max= 1.5530), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:07:46,375 - root - INFO - Step 6430: lr=1.00E-05, loss= 1.1243 (max= 1.5737), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:07:46,375 - root - INFO - Step 6430: lr=1.00E-05, loss= 1.1243 (max= 1.5737), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:07:46,375 - root - INFO - Step 6430: lr=1.00E-05, loss= 1.1243 (max= 1.5737), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:07:46,375 - root - INFO - Step 6430: lr=1.00E-05, loss= 1.1243 (max= 1.5737), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:07:46,375 - root - INFO - Step 6430: lr=1.00E-05, loss= 1.1243 (max= 1.5737), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:07:46,375 - root - INFO - Step 6430: lr=1.00E-05, loss= 1.1243 (max= 1.5737), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:07:46,375 - root - INFO - Step 6430: lr=1.00E-05, loss= 1.1243 (max= 1.5737), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:07:46,375 - root - INFO - Step 6430: lr=1.00E-05, loss= 1.1243 (max= 1.5737), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:08:18,213 - root - INFO - Step 6440: lr=1.00E-05, loss= 1.1184 (max= 1.5625), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:08:18,213 - root - INFO - Step 6440: lr=1.00E-05, loss= 1.1184 (max= 1.5625), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:08:18,213 - root - INFO - Step 6440: lr=1.00E-05, loss= 1.1184 (max= 1.5625), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:08:18,213 - root - INFO - Step 6440: lr=1.00E-05, loss= 1.1184 (max= 1.5625), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:08:18,213 - root - INFO - Step 6440: lr=1.00E-05, loss= 1.1184 (max= 1.5625), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:08:18,213 - root - INFO - Step 6440: lr=1.00E-05, loss= 1.1184 (max= 1.5625), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:08:18,213 - root - INFO - Step 6440: lr=1.00E-05, loss= 1.1184 (max= 1.5625), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:08:18,213 - root - INFO - Step 6440: lr=1.00E-05, loss= 1.1184 (max= 1.5625), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:08:50,067 - root - INFO - Step 6450: lr=1.00E-05, loss= 1.1216 (max= 1.4931), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:08:50,067 - root - INFO - Step 6450: lr=1.00E-05, loss= 1.1216 (max= 1.4931), tps=20577, 
mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:08:50,067 - root - INFO - Step 6450: lr=1.00E-05, loss= 1.1216 (max= 1.4931), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:08:50,067 - root - INFO - Step 6450: lr=1.00E-05, loss= 1.1216 (max= 1.4931), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:08:50,067 - root - INFO - Step 6450: lr=1.00E-05, loss= 1.1216 (max= 1.4931), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:08:50,067 - root - INFO - Step 6450: lr=1.00E-05, loss= 1.1216 (max= 1.4931), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:08:50,067 - root - INFO - Step 6450: lr=1.00E-05, loss= 1.1216 (max= 1.4931), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:08:50,069 - root - INFO - Step 6450: lr=1.00E-05, loss= 1.1216 (max= 1.4931), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:09:21,962 - root - INFO - Step 6460: lr=1.00E-05, loss= 1.1255 (max= 1.6060), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:09:21,962 - root - INFO - Step 6460: lr=1.00E-05, loss= 1.1255 (max= 1.6060), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:09:21,962 - root - INFO - Step 6460: lr=1.00E-05, loss= 1.1255 (max= 1.6060), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:09:21,962 - root - INFO - Step 6460: lr=1.00E-05, loss= 1.1255 (max= 1.6060), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:09:21,962 - root - INFO - Step 6460: lr=1.00E-05, 
loss= 1.1255 (max= 1.6060), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:09:21,962 - root - INFO - Step 6460: lr=1.00E-05, loss= 1.1255 (max= 1.6060), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:09:21,962 - root - INFO - Step 6460: lr=1.00E-05, loss= 1.1255 (max= 1.6060), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:09:21,962 - root - INFO - Step 6460: lr=1.00E-05, loss= 1.1255 (max= 1.6060), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:09:53,841 - root - INFO - Step 6470: lr=1.00E-05, loss= 1.1300 (max= 1.5188), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:09:53,841 - root - INFO - Step 6470: lr=1.00E-05, loss= 1.1300 (max= 1.5188), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:09:53,841 - root - INFO - Step 6470: lr=1.00E-05, loss= 1.1300 (max= 1.5188), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:09:53,841 - root - INFO - Step 6470: lr=1.00E-05, loss= 1.1300 (max= 1.5188), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:09:53,841 - root - INFO - Step 6470: lr=1.00E-05, loss= 1.1300 (max= 1.5188), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:09:53,841 - root - INFO - Step 6470: lr=1.00E-05, loss= 1.1300 (max= 1.5188), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:09:53,841 - root - INFO - Step 6470: lr=1.00E-05, loss= 1.1300 (max= 1.5188), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:09:53,842 - 
root - INFO - Step 6470: lr=1.00E-05, loss= 1.1300 (max= 1.5188), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:10:25,611 - root - INFO - Step 6480: lr=1.00E-05, loss= 1.1477 (max= 1.6847), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:10:25,611 - root - INFO - Step 6480: lr=1.00E-05, loss= 1.1477 (max= 1.6847), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:10:25,611 - root - INFO - Step 6480: lr=1.00E-05, loss= 1.1477 (max= 1.6847), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:10:25,611 - root - INFO - Step 6480: lr=1.00E-05, loss= 1.1477 (max= 1.6847), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:10:25,611 - root - INFO - Step 6480: lr=1.00E-05, loss= 1.1477 (max= 1.6847), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:10:25,611 - root - INFO - Step 6480: lr=1.00E-05, loss= 1.1477 (max= 1.6847), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:10:25,611 - root - INFO - Step 6480: lr=1.00E-05, loss= 1.1477 (max= 1.6847), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:10:25,611 - root - INFO - Step 6480: lr=1.00E-05, loss= 1.1477 (max= 1.6847), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:10:37,465 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:5252136 +2025-10-25 16:10:57,480 - root - INFO - Step 6490: lr=1.00E-05, loss= 1.1328 (max= 1.6822), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 16:10:57,480 - root - INFO - Step 6490: lr=1.00E-05, loss= 1.1328 (max= 1.6822), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:10:57,480 - root - INFO - Step 6490: lr=1.00E-05, loss= 1.1328 (max= 1.6822), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:10:57,480 - root - INFO - Step 6490: lr=1.00E-05, loss= 1.1328 (max= 1.6822), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:10:57,480 - root - INFO - Step 6490: lr=1.00E-05, loss= 1.1328 (max= 1.6822), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:10:57,480 - root - INFO - Step 6490: lr=1.00E-05, loss= 1.1328 (max= 1.6822), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:10:57,480 - root - INFO - Step 6490: lr=1.00E-05, loss= 1.1328 (max= 1.6822), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:10:57,480 - root - INFO - Step 6490: lr=1.00E-05, loss= 1.1328 (max= 1.6822), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:11:29,259 - root - INFO - Step 6500: lr=1.00E-05, loss= 1.1123 (max= 1.5477), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:11:29,259 - root - INFO - Step 6500: lr=1.00E-05, loss= 1.1123 (max= 1.5477), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:11:29,259 - root - INFO - Step 6500: lr=1.00E-05, loss= 1.1123 (max= 1.5477), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:11:29,259 - root - INFO - Step 6500: lr=1.00E-05, loss= 1.1123 (max= 1.5477), tps=20624, mfu=42.97%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:11:29,259 - root - INFO - Step 6500: lr=1.00E-05, loss= 1.1123 (max= 1.5477), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:11:29,259 - root - INFO - Step 6500: lr=1.00E-05, loss= 1.1123 (max= 1.5477), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:11:29,259 - root - INFO - Step 6500: lr=1.00E-05, loss= 1.1123 (max= 1.5477), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:11:29,260 - root - INFO - Step 6500: lr=1.00E-05, loss= 1.1123 (max= 1.5477), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:12:01,067 - root - INFO - Step 6510: lr=1.00E-05, loss= 1.1393 (max= 1.6908), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:12:01,067 - root - INFO - Step 6510: lr=1.00E-05, loss= 1.1393 (max= 1.6908), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:12:01,067 - root - INFO - Step 6510: lr=1.00E-05, loss= 1.1393 (max= 1.6908), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:12:01,067 - root - INFO - Step 6510: lr=1.00E-05, loss= 1.1393 (max= 1.6908), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:12:01,067 - root - INFO - Step 6510: lr=1.00E-05, loss= 1.1393 (max= 1.6908), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:12:01,067 - root - INFO - Step 6510: lr=1.00E-05, loss= 1.1393 (max= 1.6908), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:12:01,067 - root - INFO - Step 6510: lr=1.00E-05, loss= 1.1393 (max= 
1.6908), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:12:01,067 - root - INFO - Step 6510: lr=1.00E-05, loss= 1.1393 (max= 1.6908), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:12:32,988 - root - INFO - Step 6520: lr=1.00E-05, loss= 1.1292 (max= 1.6258), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:12:32,988 - root - INFO - Step 6520: lr=1.00E-05, loss= 1.1292 (max= 1.6258), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:12:32,988 - root - INFO - Step 6520: lr=1.00E-05, loss= 1.1292 (max= 1.6258), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:12:32,988 - root - INFO - Step 6520: lr=1.00E-05, loss= 1.1292 (max= 1.6258), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:12:32,988 - root - INFO - Step 6520: lr=1.00E-05, loss= 1.1292 (max= 1.6258), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:12:32,988 - root - INFO - Step 6520: lr=1.00E-05, loss= 1.1292 (max= 1.6258), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:12:32,988 - root - INFO - Step 6520: lr=1.00E-05, loss= 1.1292 (max= 1.6258), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:12:32,988 - root - INFO - Step 6520: lr=1.00E-05, loss= 1.1292 (max= 1.6258), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:13:04,828 - root - INFO - Step 6530: lr=1.00E-05, loss= 1.1078 (max= 1.5103), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:13:04,828 - root - INFO - Step 
6530: lr=1.00E-05, loss= 1.1078 (max= 1.5103), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:13:04,828 - root - INFO - Step 6530: lr=1.00E-05, loss= 1.1078 (max= 1.5103), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:13:04,828 - root - INFO - Step 6530: lr=1.00E-05, loss= 1.1078 (max= 1.5103), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:13:04,828 - root - INFO - Step 6530: lr=1.00E-05, loss= 1.1078 (max= 1.5103), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:13:04,828 - root - INFO - Step 6530: lr=1.00E-05, loss= 1.1078 (max= 1.5103), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:13:04,828 - root - INFO - Step 6530: lr=1.00E-05, loss= 1.1078 (max= 1.5103), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:13:04,828 - root - INFO - Step 6530: lr=1.00E-05, loss= 1.1078 (max= 1.5103), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:13:16,693 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:5894016 +2025-10-25 16:13:36,730 - root - INFO - Step 6540: lr=1.00E-05, loss= 1.1084 (max= 1.8184), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:13:36,730 - root - INFO - Step 6540: lr=1.00E-05, loss= 1.1084 (max= 1.8184), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:13:36,730 - root - INFO - Step 6540: lr=1.00E-05, loss= 1.1084 (max= 1.8184), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 16:13:36,730 - root - INFO - Step 6540: lr=1.00E-05, loss= 1.1084 (max= 1.8184), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:13:36,730 - root - INFO - Step 6540: lr=1.00E-05, loss= 1.1084 (max= 1.8184), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:13:36,730 - root - INFO - Step 6540: lr=1.00E-05, loss= 1.1084 (max= 1.8184), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:13:36,730 - root - INFO - Step 6540: lr=1.00E-05, loss= 1.1084 (max= 1.8184), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:13:36,730 - root - INFO - Step 6540: lr=1.00E-05, loss= 1.1084 (max= 1.8184), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:14:08,603 - root - INFO - Step 6550: lr=1.00E-05, loss= 1.1155 (max= 1.5381), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:14:08,603 - root - INFO - Step 6550: lr=1.00E-05, loss= 1.1155 (max= 1.5381), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:14:08,603 - root - INFO - Step 6550: lr=1.00E-05, loss= 1.1155 (max= 1.5381), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:14:08,603 - root - INFO - Step 6550: lr=1.00E-05, loss= 1.1155 (max= 1.5381), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:14:08,604 - root - INFO - Step 6550: lr=1.00E-05, loss= 1.1155 (max= 1.5381), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:14:08,604 - root - INFO - Step 6550: lr=1.00E-05, loss= 1.1155 (max= 1.5381), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:14:08,604 - root - INFO - Step 6550: lr=1.00E-05, loss= 1.1155 (max= 1.5381), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:14:08,604 - root - INFO - Step 6550: lr=1.00E-05, loss= 1.1155 (max= 1.5381), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:14:40,395 - root - INFO - Step 6560: lr=1.00E-05, loss= 1.1093 (max= 1.5994), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:14:40,395 - root - INFO - Step 6560: lr=1.00E-05, loss= 1.1093 (max= 1.5994), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:14:40,396 - root - INFO - Step 6560: lr=1.00E-05, loss= 1.1093 (max= 1.5994), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:14:40,396 - root - INFO - Step 6560: lr=1.00E-05, loss= 1.1093 (max= 1.5994), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:14:40,396 - root - INFO - Step 6560: lr=1.00E-05, loss= 1.1093 (max= 1.5994), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:14:40,396 - root - INFO - Step 6560: lr=1.00E-05, loss= 1.1093 (max= 1.5994), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:14:40,396 - root - INFO - Step 6560: lr=1.00E-05, loss= 1.1093 (max= 1.5994), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:14:40,396 - root - INFO - Step 6560: lr=1.00E-05, loss= 1.1093 (max= 1.5994), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:15:12,237 - root - INFO - Step 6570: lr=1.00E-05, loss= 1.1297 (max= 1.5773), tps=20584, 
mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:15:12,237 - root - INFO - Step 6570: lr=1.00E-05, loss= 1.1297 (max= 1.5773), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:15:12,237 - root - INFO - Step 6570: lr=1.00E-05, loss= 1.1297 (max= 1.5773), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:15:12,237 - root - INFO - Step 6570: lr=1.00E-05, loss= 1.1297 (max= 1.5773), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:15:12,237 - root - INFO - Step 6570: lr=1.00E-05, loss= 1.1297 (max= 1.5773), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:15:12,237 - root - INFO - Step 6570: lr=1.00E-05, loss= 1.1297 (max= 1.5773), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:15:12,237 - root - INFO - Step 6570: lr=1.00E-05, loss= 1.1297 (max= 1.5773), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:15:12,237 - root - INFO - Step 6570: lr=1.00E-05, loss= 1.1297 (max= 1.5773), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:15:44,016 - root - INFO - Step 6580: lr=1.00E-05, loss= 1.1232 (max= 1.5938), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:15:44,016 - root - INFO - Step 6580: lr=1.00E-05, loss= 1.1232 (max= 1.5938), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:15:44,016 - root - INFO - Step 6580: lr=1.00E-05, loss= 1.1232 (max= 1.5938), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:15:44,016 - root - INFO - Step 6580: lr=1.00E-05, 
loss= 1.1232 (max= 1.5938), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:15:44,016 - root - INFO - Step 6580: lr=1.00E-05, loss= 1.1232 (max= 1.5938), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:15:44,017 - root - INFO - Step 6580: lr=1.00E-05, loss= 1.1232 (max= 1.5938), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:15:44,017 - root - INFO - Step 6580: lr=1.00E-05, loss= 1.1232 (max= 1.5938), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:15:44,017 - root - INFO - Step 6580: lr=1.00E-05, loss= 1.1232 (max= 1.5938), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:16:15,916 - root - INFO - Step 6590: lr=1.00E-05, loss= 1.1672 (max= 1.6385), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:16:15,916 - root - INFO - Step 6590: lr=1.00E-05, loss= 1.1672 (max= 1.6385), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:16:15,916 - root - INFO - Step 6590: lr=1.00E-05, loss= 1.1672 (max= 1.6385), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:16:15,916 - root - INFO - Step 6590: lr=1.00E-05, loss= 1.1672 (max= 1.6385), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:16:15,916 - root - INFO - Step 6590: lr=1.00E-05, loss= 1.1672 (max= 1.6385), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:16:15,916 - root - INFO - Step 6590: lr=1.00E-05, loss= 1.1672 (max= 1.6385), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:16:15,916 - 
root - INFO - Step 6590: lr=1.00E-05, loss= 1.1672 (max= 1.6385), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:16:15,916 - root - INFO - Step 6590: lr=1.00E-05, loss= 1.1672 (max= 1.6385), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:16:47,694 - root - INFO - Step 6600: lr=1.00E-05, loss= 1.1167 (max= 1.5337), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:16:47,694 - root - INFO - Step 6600: lr=1.00E-05, loss= 1.1167 (max= 1.5337), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:16:47,694 - root - INFO - Step 6600: lr=1.00E-05, loss= 1.1167 (max= 1.5337), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:16:47,694 - root - INFO - Step 6600: lr=1.00E-05, loss= 1.1167 (max= 1.5337), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:16:47,694 - root - INFO - Step 6600: lr=1.00E-05, loss= 1.1167 (max= 1.5337), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:16:47,694 - root - INFO - Step 6600: lr=1.00E-05, loss= 1.1167 (max= 1.5337), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:16:47,694 - root - INFO - Step 6600: lr=1.00E-05, loss= 1.1167 (max= 1.5337), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:16:47,694 - root - INFO - Step 6600: lr=1.00E-05, loss= 1.1167 (max= 1.5337), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:17:19,575 - root - INFO - Step 6610: lr=1.00E-05, loss= 1.1373 (max= 1.5563), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 16:17:19,575 - root - INFO - Step 6610: lr=1.00E-05, loss= 1.1373 (max= 1.5563), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:17:19,575 - root - INFO - Step 6610: lr=1.00E-05, loss= 1.1373 (max= 1.5563), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:17:19,575 - root - INFO - Step 6610: lr=1.00E-05, loss= 1.1373 (max= 1.5563), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:17:19,575 - root - INFO - Step 6610: lr=1.00E-05, loss= 1.1373 (max= 1.5563), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:17:19,575 - root - INFO - Step 6610: lr=1.00E-05, loss= 1.1373 (max= 1.5563), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:17:19,575 - root - INFO - Step 6610: lr=1.00E-05, loss= 1.1373 (max= 1.5563), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:17:19,575 - root - INFO - Step 6610: lr=1.00E-05, loss= 1.1373 (max= 1.5563), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:17:48,736 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:5382151 +2025-10-25 16:17:51,364 - root - INFO - Step 6620: lr=1.00E-05, loss= 1.1335 (max= 1.5804), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:17:51,364 - root - INFO - Step 6620: lr=1.00E-05, loss= 1.1335 (max= 1.5804), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:17:51,364 - root - INFO - Step 6620: lr=1.00E-05, loss= 1.1335 (max= 1.5804), tps=20618, mfu=42.96%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:17:51,364 - root - INFO - Step 6620: lr=1.00E-05, loss= 1.1335 (max= 1.5804), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:17:51,364 - root - INFO - Step 6620: lr=1.00E-05, loss= 1.1335 (max= 1.5804), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:17:51,364 - root - INFO - Step 6620: lr=1.00E-05, loss= 1.1335 (max= 1.5804), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:17:51,364 - root - INFO - Step 6620: lr=1.00E-05, loss= 1.1335 (max= 1.5804), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:17:51,364 - root - INFO - Step 6620: lr=1.00E-05, loss= 1.1335 (max= 1.5804), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:18:23,222 - root - INFO - Step 6630: lr=1.00E-05, loss= 1.1304 (max= 1.5363), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:18:23,222 - root - INFO - Step 6630: lr=1.00E-05, loss= 1.1304 (max= 1.5363), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:18:23,222 - root - INFO - Step 6630: lr=1.00E-05, loss= 1.1304 (max= 1.5363), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:18:23,222 - root - INFO - Step 6630: lr=1.00E-05, loss= 1.1304 (max= 1.5363), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:18:23,222 - root - INFO - Step 6630: lr=1.00E-05, loss= 1.1304 (max= 1.5363), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:18:23,222 - root - INFO - Step 6630: lr=1.00E-05, loss= 1.1304 (max= 
1.5363), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:18:23,222 - root - INFO - Step 6630: lr=1.00E-05, loss= 1.1304 (max= 1.5363), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:18:23,222 - root - INFO - Step 6630: lr=1.00E-05, loss= 1.1304 (max= 1.5363), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:18:55,117 - root - INFO - Step 6640: lr=1.00E-05, loss= 1.1151 (max= 1.5149), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:18:55,117 - root - INFO - Step 6640: lr=1.00E-05, loss= 1.1151 (max= 1.5149), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:18:55,117 - root - INFO - Step 6640: lr=1.00E-05, loss= 1.1151 (max= 1.5149), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:18:55,117 - root - INFO - Step 6640: lr=1.00E-05, loss= 1.1151 (max= 1.5149), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:18:55,117 - root - INFO - Step 6640: lr=1.00E-05, loss= 1.1151 (max= 1.5149), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:18:55,117 - root - INFO - Step 6640: lr=1.00E-05, loss= 1.1151 (max= 1.5149), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:18:55,117 - root - INFO - Step 6640: lr=1.00E-05, loss= 1.1151 (max= 1.5149), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:18:55,117 - root - INFO - Step 6640: lr=1.00E-05, loss= 1.1151 (max= 1.5149), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:19:26,963 - root - INFO - Step 
6650: lr=1.00E-05, loss= 1.1254 (max= 1.6812), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:19:26,963 - root - INFO - Step 6650: lr=1.00E-05, loss= 1.1254 (max= 1.6812), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:19:26,963 - root - INFO - Step 6650: lr=1.00E-05, loss= 1.1254 (max= 1.6812), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:19:26,963 - root - INFO - Step 6650: lr=1.00E-05, loss= 1.1254 (max= 1.6812), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:19:26,963 - root - INFO - Step 6650: lr=1.00E-05, loss= 1.1254 (max= 1.6812), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:19:26,963 - root - INFO - Step 6650: lr=1.00E-05, loss= 1.1254 (max= 1.6812), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:19:26,963 - root - INFO - Step 6650: lr=1.00E-05, loss= 1.1254 (max= 1.6812), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:19:26,963 - root - INFO - Step 6650: lr=1.00E-05, loss= 1.1254 (max= 1.6812), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:19:58,850 - root - INFO - Step 6660: lr=1.00E-05, loss= 1.1041 (max= 1.6173), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:19:58,850 - root - INFO - Step 6660: lr=1.00E-05, loss= 1.1041 (max= 1.6173), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:19:58,850 - root - INFO - Step 6660: lr=1.00E-05, loss= 1.1041 (max= 1.6173), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 16:19:58,850 - root - INFO - Step 6660: lr=1.00E-05, loss= 1.1041 (max= 1.6173), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:19:58,850 - root - INFO - Step 6660: lr=1.00E-05, loss= 1.1041 (max= 1.6173), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:19:58,850 - root - INFO - Step 6660: lr=1.00E-05, loss= 1.1041 (max= 1.6173), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:19:58,850 - root - INFO - Step 6660: lr=1.00E-05, loss= 1.1041 (max= 1.6173), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:19:58,851 - root - INFO - Step 6660: lr=1.00E-05, loss= 1.1041 (max= 1.6173), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:20:30,678 - root - INFO - Step 6670: lr=1.00E-05, loss= 1.1118 (max= 1.4631), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:20:30,678 - root - INFO - Step 6670: lr=1.00E-05, loss= 1.1118 (max= 1.4631), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:20:30,678 - root - INFO - Step 6670: lr=1.00E-05, loss= 1.1118 (max= 1.4631), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:20:30,678 - root - INFO - Step 6670: lr=1.00E-05, loss= 1.1118 (max= 1.4631), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:20:30,678 - root - INFO - Step 6670: lr=1.00E-05, loss= 1.1118 (max= 1.4631), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:20:30,678 - root - INFO - Step 6670: lr=1.00E-05, loss= 1.1118 (max= 1.4631), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:20:30,678 - root - INFO - Step 6670: lr=1.00E-05, loss= 1.1118 (max= 1.4631), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:20:30,678 - root - INFO - Step 6670: lr=1.00E-05, loss= 1.1118 (max= 1.4631), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:21:02,642 - root - INFO - Step 6680: lr=1.00E-05, loss= 1.1045 (max= 1.6087), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:21:02,642 - root - INFO - Step 6680: lr=1.00E-05, loss= 1.1045 (max= 1.6087), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:21:02,642 - root - INFO - Step 6680: lr=1.00E-05, loss= 1.1045 (max= 1.6087), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:21:02,642 - root - INFO - Step 6680: lr=1.00E-05, loss= 1.1045 (max= 1.6087), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:21:02,642 - root - INFO - Step 6680: lr=1.00E-05, loss= 1.1045 (max= 1.6087), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:21:02,642 - root - INFO - Step 6680: lr=1.00E-05, loss= 1.1045 (max= 1.6087), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:21:02,642 - root - INFO - Step 6680: lr=1.00E-05, loss= 1.1045 (max= 1.6087), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:21:02,642 - root - INFO - Step 6680: lr=1.00E-05, loss= 1.1045 (max= 1.6087), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:21:34,517 - root - INFO - Step 6690: lr=1.00E-05, loss= 1.0936 (max= 1.5074), tps=20562, 
mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:21:34,517 - root - INFO - Step 6690: lr=1.00E-05, loss= 1.0936 (max= 1.5074), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:21:34,517 - root - INFO - Step 6690: lr=1.00E-05, loss= 1.0936 (max= 1.5074), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:21:34,517 - root - INFO - Step 6690: lr=1.00E-05, loss= 1.0936 (max= 1.5074), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:21:34,517 - root - INFO - Step 6690: lr=1.00E-05, loss= 1.0936 (max= 1.5074), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:21:34,517 - root - INFO - Step 6690: lr=1.00E-05, loss= 1.0936 (max= 1.5074), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:21:34,517 - root - INFO - Step 6690: lr=1.00E-05, loss= 1.0936 (max= 1.5074), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:21:34,517 - root - INFO - Step 6690: lr=1.00E-05, loss= 1.0936 (max= 1.5074), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:22:06,381 - root - INFO - Step 6700: lr=1.00E-05, loss= 1.1206 (max= 1.4831), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:22:06,381 - root - INFO - Step 6700: lr=1.00E-05, loss= 1.1206 (max= 1.4831), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:22:06,381 - root - INFO - Step 6700: lr=1.00E-05, loss= 1.1206 (max= 1.4831), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:22:06,381 - root - INFO - Step 6700: lr=1.00E-05, 
loss= 1.1206 (max= 1.4831), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:22:06,381 - root - INFO - Step 6700: lr=1.00E-05, loss= 1.1206 (max= 1.4831), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:22:06,381 - root - INFO - Step 6700: lr=1.00E-05, loss= 1.1206 (max= 1.4831), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:22:06,381 - root - INFO - Step 6700: lr=1.00E-05, loss= 1.1206 (max= 1.4831), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:22:06,381 - root - INFO - Step 6700: lr=1.00E-05, loss= 1.1206 (max= 1.4831), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:22:38,227 - root - INFO - Step 6710: lr=1.00E-05, loss= 1.1032 (max= 1.4710), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:22:38,227 - root - INFO - Step 6710: lr=1.00E-05, loss= 1.1032 (max= 1.4710), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:22:38,228 - root - INFO - Step 6710: lr=1.00E-05, loss= 1.1032 (max= 1.4710), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:22:38,228 - root - INFO - Step 6710: lr=1.00E-05, loss= 1.1032 (max= 1.4710), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:22:38,228 - root - INFO - Step 6710: lr=1.00E-05, loss= 1.1032 (max= 1.4710), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:22:38,228 - root - INFO - Step 6710: lr=1.00E-05, loss= 1.1032 (max= 1.4710), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:22:38,228 - 
root - INFO - Step 6710: lr=1.00E-05, loss= 1.1032 (max= 1.4710), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:22:38,228 - root - INFO - Step 6710: lr=1.00E-05, loss= 1.1032 (max= 1.4710), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:23:10,058 - root - INFO - Step 6720: lr=1.00E-05, loss= 1.1115 (max= 1.6395), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:23:10,059 - root - INFO - Step 6720: lr=1.00E-05, loss= 1.1115 (max= 1.6395), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:23:10,059 - root - INFO - Step 6720: lr=1.00E-05, loss= 1.1115 (max= 1.6395), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:23:10,059 - root - INFO - Step 6720: lr=1.00E-05, loss= 1.1115 (max= 1.6395), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:23:10,059 - root - INFO - Step 6720: lr=1.00E-05, loss= 1.1115 (max= 1.6395), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:23:10,059 - root - INFO - Step 6720: lr=1.00E-05, loss= 1.1115 (max= 1.6395), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:23:10,059 - root - INFO - Step 6720: lr=1.00E-05, loss= 1.1115 (max= 1.6395), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:23:10,059 - root - INFO - Step 6720: lr=1.00E-05, loss= 1.1115 (max= 1.6395), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:23:41,875 - root - INFO - Step 6730: lr=1.00E-05, loss= 1.1049 (max= 1.5537), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 16:23:41,875 - root - INFO - Step 6730: lr=1.00E-05, loss= 1.1049 (max= 1.5537), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:23:41,875 - root - INFO - Step 6730: lr=1.00E-05, loss= 1.1049 (max= 1.5537), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:23:41,875 - root - INFO - Step 6730: lr=1.00E-05, loss= 1.1049 (max= 1.5537), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:23:41,875 - root - INFO - Step 6730: lr=1.00E-05, loss= 1.1049 (max= 1.5537), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:23:41,875 - root - INFO - Step 6730: lr=1.00E-05, loss= 1.1049 (max= 1.5537), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:23:41,875 - root - INFO - Step 6730: lr=1.00E-05, loss= 1.1049 (max= 1.5537), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:23:41,875 - root - INFO - Step 6730: lr=1.00E-05, loss= 1.1049 (max= 1.5537), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:24:13,799 - root - INFO - Step 6740: lr=1.00E-05, loss= 1.1288 (max= 1.4820), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:24:13,799 - root - INFO - Step 6740: lr=1.00E-05, loss= 1.1288 (max= 1.4820), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:24:13,800 - root - INFO - Step 6740: lr=1.00E-05, loss= 1.1288 (max= 1.4820), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:24:13,800 - root - INFO - Step 6740: lr=1.00E-05, loss= 1.1288 (max= 1.4820), tps=20531, mfu=42.78%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:24:13,800 - root - INFO - Step 6740: lr=1.00E-05, loss= 1.1288 (max= 1.4820), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:24:13,800 - root - INFO - Step 6740: lr=1.00E-05, loss= 1.1288 (max= 1.4820), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:24:13,800 - root - INFO - Step 6740: lr=1.00E-05, loss= 1.1288 (max= 1.4820), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:24:13,800 - root - INFO - Step 6740: lr=1.00E-05, loss= 1.1288 (max= 1.4820), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:24:45,694 - root - INFO - Step 6750: lr=1.00E-05, loss= 1.1175 (max= 1.5865), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:24:45,694 - root - INFO - Step 6750: lr=1.00E-05, loss= 1.1175 (max= 1.5865), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:24:45,694 - root - INFO - Step 6750: lr=1.00E-05, loss= 1.1175 (max= 1.5865), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:24:45,694 - root - INFO - Step 6750: lr=1.00E-05, loss= 1.1175 (max= 1.5865), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:24:45,694 - root - INFO - Step 6750: lr=1.00E-05, loss= 1.1175 (max= 1.5865), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:24:45,694 - root - INFO - Step 6750: lr=1.00E-05, loss= 1.1175 (max= 1.5865), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:24:45,694 - root - INFO - Step 6750: lr=1.00E-05, loss= 1.1175 (max= 
1.5865), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:24:45,695 - root - INFO - Step 6750: lr=1.00E-05, loss= 1.1175 (max= 1.5865), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:25:17,640 - root - INFO - Step 6760: lr=1.00E-05, loss= 1.1184 (max= 1.5389), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:25:17,640 - root - INFO - Step 6760: lr=1.00E-05, loss= 1.1184 (max= 1.5389), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:25:17,640 - root - INFO - Step 6760: lr=1.00E-05, loss= 1.1184 (max= 1.5389), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:25:17,640 - root - INFO - Step 6760: lr=1.00E-05, loss= 1.1184 (max= 1.5389), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:25:17,640 - root - INFO - Step 6760: lr=1.00E-05, loss= 1.1184 (max= 1.5389), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:25:17,640 - root - INFO - Step 6760: lr=1.00E-05, loss= 1.1184 (max= 1.5389), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:25:17,640 - root - INFO - Step 6760: lr=1.00E-05, loss= 1.1184 (max= 1.5389), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:25:17,640 - root - INFO - Step 6760: lr=1.00E-05, loss= 1.1184 (max= 1.5389), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:25:49,466 - root - INFO - Step 6770: lr=1.00E-05, loss= 1.1156 (max= 1.6988), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:25:49,466 - root - INFO - Step 
6770: lr=1.00E-05, loss= 1.1156 (max= 1.6988), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:25:49,466 - root - INFO - Step 6770: lr=1.00E-05, loss= 1.1156 (max= 1.6988), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:25:49,466 - root - INFO - Step 6770: lr=1.00E-05, loss= 1.1156 (max= 1.6988), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:25:49,466 - root - INFO - Step 6770: lr=1.00E-05, loss= 1.1156 (max= 1.6988), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:25:49,466 - root - INFO - Step 6770: lr=1.00E-05, loss= 1.1156 (max= 1.6988), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:25:49,466 - root - INFO - Step 6770: lr=1.00E-05, loss= 1.1156 (max= 1.6988), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:25:49,467 - root - INFO - Step 6770: lr=1.00E-05, loss= 1.1156 (max= 1.6988), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:26:21,275 - root - INFO - Step 6780: lr=1.00E-05, loss= 1.1074 (max= 1.5463), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:26:21,275 - root - INFO - Step 6780: lr=1.00E-05, loss= 1.1074 (max= 1.5463), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:26:21,275 - root - INFO - Step 6780: lr=1.00E-05, loss= 1.1074 (max= 1.5463), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:26:21,275 - root - INFO - Step 6780: lr=1.00E-05, loss= 1.1074 (max= 1.5463), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 16:26:21,275 - root - INFO - Step 6780: lr=1.00E-05, loss= 1.1074 (max= 1.5463), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:26:21,275 - root - INFO - Step 6780: lr=1.00E-05, loss= 1.1074 (max= 1.5463), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:26:21,275 - root - INFO - Step 6780: lr=1.00E-05, loss= 1.1074 (max= 1.5463), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:26:21,275 - root - INFO - Step 6780: lr=1.00E-05, loss= 1.1074 (max= 1.5463), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:26:53,046 - root - INFO - Step 6790: lr=1.00E-05, loss= 1.1283 (max= 1.6006), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:26:53,046 - root - INFO - Step 6790: lr=1.00E-05, loss= 1.1283 (max= 1.6006), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:26:53,046 - root - INFO - Step 6790: lr=1.00E-05, loss= 1.1283 (max= 1.6006), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:26:53,046 - root - INFO - Step 6790: lr=1.00E-05, loss= 1.1283 (max= 1.6006), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:26:53,046 - root - INFO - Step 6790: lr=1.00E-05, loss= 1.1283 (max= 1.6006), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:26:53,046 - root - INFO - Step 6790: lr=1.00E-05, loss= 1.1283 (max= 1.6006), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:26:53,046 - root - INFO - Step 6790: lr=1.00E-05, loss= 1.1283 (max= 1.6006), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:26:53,046 - root - INFO - Step 6790: lr=1.00E-05, loss= 1.1283 (max= 1.6006), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:27:24,865 - root - INFO - Step 6800: lr=1.00E-05, loss= 1.1274 (max= 1.5517), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:27:24,865 - root - INFO - Step 6800: lr=1.00E-05, loss= 1.1274 (max= 1.5517), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:27:24,865 - root - INFO - Step 6800: lr=1.00E-05, loss= 1.1274 (max= 1.5517), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:27:24,865 - root - INFO - Step 6800: lr=1.00E-05, loss= 1.1274 (max= 1.5517), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:27:24,865 - root - INFO - Step 6800: lr=1.00E-05, loss= 1.1274 (max= 1.5517), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:27:24,865 - root - INFO - Step 6800: lr=1.00E-05, loss= 1.1274 (max= 1.5517), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:27:24,865 - root - INFO - Step 6800: lr=1.00E-05, loss= 1.1274 (max= 1.5517), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:27:24,865 - root - INFO - Step 6800: lr=1.00E-05, loss= 1.1274 (max= 1.5517), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:27:56,710 - root - INFO - Step 6810: lr=1.00E-05, loss= 1.1045 (max= 1.6554), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:27:56,710 - root - INFO - Step 6810: lr=1.00E-05, loss= 1.1045 (max= 1.6554), tps=20582, 
mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:27:56,710 - root - INFO - Step 6810: lr=1.00E-05, loss= 1.1045 (max= 1.6554), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:27:56,710 - root - INFO - Step 6810: lr=1.00E-05, loss= 1.1045 (max= 1.6554), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:27:56,710 - root - INFO - Step 6810: lr=1.00E-05, loss= 1.1045 (max= 1.6554), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:27:56,710 - root - INFO - Step 6810: lr=1.00E-05, loss= 1.1045 (max= 1.6554), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:27:56,710 - root - INFO - Step 6810: lr=1.00E-05, loss= 1.1045 (max= 1.6554), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:27:56,710 - root - INFO - Step 6810: lr=1.00E-05, loss= 1.1045 (max= 1.6554), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:28:28,568 - root - INFO - Step 6820: lr=1.00E-05, loss= 1.1067 (max= 1.4552), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:28:28,568 - root - INFO - Step 6820: lr=1.00E-05, loss= 1.1067 (max= 1.4552), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:28:28,568 - root - INFO - Step 6820: lr=1.00E-05, loss= 1.1067 (max= 1.4552), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:28:28,568 - root - INFO - Step 6820: lr=1.00E-05, loss= 1.1067 (max= 1.4552), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:28:28,568 - root - INFO - Step 6820: lr=1.00E-05, 
loss= 1.1067 (max= 1.4552), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:28:28,568 - root - INFO - Step 6820: lr=1.00E-05, loss= 1.1067 (max= 1.4552), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:28:28,568 - root - INFO - Step 6820: lr=1.00E-05, loss= 1.1067 (max= 1.4552), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:28:28,568 - root - INFO - Step 6820: lr=1.00E-05, loss= 1.1067 (max= 1.4552), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:29:00,476 - root - INFO - Step 6830: lr=1.00E-05, loss= 1.1073 (max= 1.5483), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:29:00,476 - root - INFO - Step 6830: lr=1.00E-05, loss= 1.1073 (max= 1.5483), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:29:00,476 - root - INFO - Step 6830: lr=1.00E-05, loss= 1.1073 (max= 1.5483), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:29:00,476 - root - INFO - Step 6830: lr=1.00E-05, loss= 1.1073 (max= 1.5483), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:29:00,476 - root - INFO - Step 6830: lr=1.00E-05, loss= 1.1073 (max= 1.5483), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:29:00,476 - root - INFO - Step 6830: lr=1.00E-05, loss= 1.1073 (max= 1.5483), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:29:00,476 - root - INFO - Step 6830: lr=1.00E-05, loss= 1.1073 (max= 1.5483), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:29:00,476 - 
root - INFO - Step 6830: lr=1.00E-05, loss= 1.1073 (max= 1.5483), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:29:32,354 - root - INFO - Step 6840: lr=1.00E-05, loss= 1.1350 (max= 1.6131), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:29:32,354 - root - INFO - Step 6840: lr=1.00E-05, loss= 1.1350 (max= 1.6131), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:29:32,354 - root - INFO - Step 6840: lr=1.00E-05, loss= 1.1350 (max= 1.6131), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:29:32,355 - root - INFO - Step 6840: lr=1.00E-05, loss= 1.1350 (max= 1.6131), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:29:32,355 - root - INFO - Step 6840: lr=1.00E-05, loss= 1.1350 (max= 1.6131), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:29:32,355 - root - INFO - Step 6840: lr=1.00E-05, loss= 1.1350 (max= 1.6131), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:29:32,355 - root - INFO - Step 6840: lr=1.00E-05, loss= 1.1350 (max= 1.6131), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:29:32,355 - root - INFO - Step 6840: lr=1.00E-05, loss= 1.1350 (max= 1.6131), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:30:04,173 - root - INFO - Step 6850: lr=1.00E-05, loss= 1.1281 (max= 1.5003), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:30:04,173 - root - INFO - Step 6850: lr=1.00E-05, loss= 1.1281 (max= 1.5003), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 16:30:04,173 - root - INFO - Step 6850: lr=1.00E-05, loss= 1.1281 (max= 1.5003), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:30:04,174 - root - INFO - Step 6850: lr=1.00E-05, loss= 1.1281 (max= 1.5003), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:30:04,174 - root - INFO - Step 6850: lr=1.00E-05, loss= 1.1281 (max= 1.5003), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:30:04,174 - root - INFO - Step 6850: lr=1.00E-05, loss= 1.1281 (max= 1.5003), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:30:04,174 - root - INFO - Step 6850: lr=1.00E-05, loss= 1.1281 (max= 1.5003), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:30:04,174 - root - INFO - Step 6850: lr=1.00E-05, loss= 1.1281 (max= 1.5003), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:30:36,082 - root - INFO - Step 6860: lr=1.00E-05, loss= 1.1130 (max= 1.5911), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:30:36,082 - root - INFO - Step 6860: lr=1.00E-05, loss= 1.1130 (max= 1.5911), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:30:36,082 - root - INFO - Step 6860: lr=1.00E-05, loss= 1.1130 (max= 1.5911), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:30:36,082 - root - INFO - Step 6860: lr=1.00E-05, loss= 1.1130 (max= 1.5911), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:30:36,082 - root - INFO - Step 6860: lr=1.00E-05, loss= 1.1130 (max= 1.5911), tps=20541, mfu=42.80%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:30:36,082 - root - INFO - Step 6860: lr=1.00E-05, loss= 1.1130 (max= 1.5911), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:30:36,082 - root - INFO - Step 6860: lr=1.00E-05, loss= 1.1130 (max= 1.5911), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:30:36,082 - root - INFO - Step 6860: lr=1.00E-05, loss= 1.1130 (max= 1.5911), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:31:07,942 - root - INFO - Step 6870: lr=1.00E-05, loss= 1.1413 (max= 1.6016), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:31:07,943 - root - INFO - Step 6870: lr=1.00E-05, loss= 1.1413 (max= 1.6016), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:31:07,943 - root - INFO - Step 6870: lr=1.00E-05, loss= 1.1413 (max= 1.6016), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:31:07,943 - root - INFO - Step 6870: lr=1.00E-05, loss= 1.1413 (max= 1.6016), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:31:07,943 - root - INFO - Step 6870: lr=1.00E-05, loss= 1.1413 (max= 1.6016), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:31:07,943 - root - INFO - Step 6870: lr=1.00E-05, loss= 1.1413 (max= 1.6016), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:31:07,943 - root - INFO - Step 6870: lr=1.00E-05, loss= 1.1413 (max= 1.6016), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:31:07,943 - root - INFO - Step 6870: lr=1.00E-05, loss= 1.1413 (max= 
1.6016), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:31:39,806 - root - INFO - Step 6880: lr=1.00E-05, loss= 1.1012 (max= 1.6259), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:31:39,806 - root - INFO - Step 6880: lr=1.00E-05, loss= 1.1012 (max= 1.6259), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:31:39,806 - root - INFO - Step 6880: lr=1.00E-05, loss= 1.1012 (max= 1.6259), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:31:39,806 - root - INFO - Step 6880: lr=1.00E-05, loss= 1.1012 (max= 1.6259), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:31:39,806 - root - INFO - Step 6880: lr=1.00E-05, loss= 1.1012 (max= 1.6259), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:31:39,806 - root - INFO - Step 6880: lr=1.00E-05, loss= 1.1012 (max= 1.6259), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:31:39,806 - root - INFO - Step 6880: lr=1.00E-05, loss= 1.1012 (max= 1.6259), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:31:39,806 - root - INFO - Step 6880: lr=1.00E-05, loss= 1.1012 (max= 1.6259), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:32:11,661 - root - INFO - Step 6890: lr=1.00E-05, loss= 1.1044 (max= 1.6819), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:32:11,661 - root - INFO - Step 6890: lr=1.00E-05, loss= 1.1044 (max= 1.6819), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:32:11,661 - root - INFO - Step 
6890: lr=1.00E-05, loss= 1.1044 (max= 1.6819), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:32:11,661 - root - INFO - Step 6890: lr=1.00E-05, loss= 1.1044 (max= 1.6819), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:32:11,661 - root - INFO - Step 6890: lr=1.00E-05, loss= 1.1044 (max= 1.6819), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:32:11,661 - root - INFO - Step 6890: lr=1.00E-05, loss= 1.1044 (max= 1.6819), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:32:11,661 - root - INFO - Step 6890: lr=1.00E-05, loss= 1.1044 (max= 1.6819), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:32:11,661 - root - INFO - Step 6890: lr=1.00E-05, loss= 1.1044 (max= 1.6819), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:32:43,546 - root - INFO - Step 6900: lr=1.00E-05, loss= 1.1366 (max= 1.7330), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:32:43,546 - root - INFO - Step 6900: lr=1.00E-05, loss= 1.1366 (max= 1.7330), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:32:43,546 - root - INFO - Step 6900: lr=1.00E-05, loss= 1.1366 (max= 1.7330), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:32:43,546 - root - INFO - Step 6900: lr=1.00E-05, loss= 1.1366 (max= 1.7330), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:32:43,546 - root - INFO - Step 6900: lr=1.00E-05, loss= 1.1366 (max= 1.7330), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 16:32:43,546 - root - INFO - Step 6900: lr=1.00E-05, loss= 1.1366 (max= 1.7330), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:32:43,546 - root - INFO - Step 6900: lr=1.00E-05, loss= 1.1366 (max= 1.7330), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:32:43,546 - root - INFO - Step 6900: lr=1.00E-05, loss= 1.1366 (max= 1.7330), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:33:15,329 - root - INFO - Step 6910: lr=1.00E-05, loss= 1.0962 (max= 1.5514), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:33:15,329 - root - INFO - Step 6910: lr=1.00E-05, loss= 1.0962 (max= 1.5514), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:33:15,329 - root - INFO - Step 6910: lr=1.00E-05, loss= 1.0962 (max= 1.5514), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:33:15,329 - root - INFO - Step 6910: lr=1.00E-05, loss= 1.0962 (max= 1.5514), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:33:15,329 - root - INFO - Step 6910: lr=1.00E-05, loss= 1.0962 (max= 1.5514), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:33:15,329 - root - INFO - Step 6910: lr=1.00E-05, loss= 1.0962 (max= 1.5514), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:33:15,329 - root - INFO - Step 6910: lr=1.00E-05, loss= 1.0962 (max= 1.5514), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:33:15,329 - root - INFO - Step 6910: lr=1.00E-05, loss= 1.0962 (max= 1.5514), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:33:47,175 - root - INFO - Step 6920: lr=1.00E-05, loss= 1.1383 (max= 1.5640), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:33:47,175 - root - INFO - Step 6920: lr=1.00E-05, loss= 1.1383 (max= 1.5640), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:33:47,175 - root - INFO - Step 6920: lr=1.00E-05, loss= 1.1383 (max= 1.5640), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:33:47,175 - root - INFO - Step 6920: lr=1.00E-05, loss= 1.1383 (max= 1.5640), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:33:47,175 - root - INFO - Step 6920: lr=1.00E-05, loss= 1.1383 (max= 1.5640), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:33:47,175 - root - INFO - Step 6920: lr=1.00E-05, loss= 1.1383 (max= 1.5640), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:33:47,175 - root - INFO - Step 6920: lr=1.00E-05, loss= 1.1383 (max= 1.5640), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:33:47,175 - root - INFO - Step 6920: lr=1.00E-05, loss= 1.1383 (max= 1.5640), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:34:19,057 - root - INFO - Step 6930: lr=1.00E-05, loss= 1.1156 (max= 1.5186), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:34:19,057 - root - INFO - Step 6930: lr=1.00E-05, loss= 1.1156 (max= 1.5186), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:34:19,057 - root - INFO - Step 6930: lr=1.00E-05, loss= 1.1156 (max= 1.5186), tps=20558, 
mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:34:19,057 - root - INFO - Step 6930: lr=1.00E-05, loss= 1.1156 (max= 1.5186), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:34:19,057 - root - INFO - Step 6930: lr=1.00E-05, loss= 1.1156 (max= 1.5186), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:34:19,057 - root - INFO - Step 6930: lr=1.00E-05, loss= 1.1156 (max= 1.5186), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:34:19,058 - root - INFO - Step 6930: lr=1.00E-05, loss= 1.1156 (max= 1.5186), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:34:19,058 - root - INFO - Step 6930: lr=1.00E-05, loss= 1.1156 (max= 1.5186), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:34:50,897 - root - INFO - Step 6940: lr=1.00E-05, loss= 1.1105 (max= 1.6219), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:34:50,897 - root - INFO - Step 6940: lr=1.00E-05, loss= 1.1105 (max= 1.6219), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:34:50,897 - root - INFO - Step 6940: lr=1.00E-05, loss= 1.1105 (max= 1.6219), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:34:50,897 - root - INFO - Step 6940: lr=1.00E-05, loss= 1.1105 (max= 1.6219), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:34:50,897 - root - INFO - Step 6940: lr=1.00E-05, loss= 1.1105 (max= 1.6219), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:34:50,897 - root - INFO - Step 6940: lr=1.00E-05, 
loss= 1.1105 (max= 1.6219), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:34:50,897 - root - INFO - Step 6940: lr=1.00E-05, loss= 1.1105 (max= 1.6219), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:34:50,897 - root - INFO - Step 6940: lr=1.00E-05, loss= 1.1105 (max= 1.6219), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:35:22,654 - root - INFO - Step 6950: lr=1.00E-05, loss= 1.1057 (max= 1.5439), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:35:22,654 - root - INFO - Step 6950: lr=1.00E-05, loss= 1.1057 (max= 1.5439), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:35:22,654 - root - INFO - Step 6950: lr=1.00E-05, loss= 1.1057 (max= 1.5439), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:35:22,654 - root - INFO - Step 6950: lr=1.00E-05, loss= 1.1057 (max= 1.5439), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:35:22,654 - root - INFO - Step 6950: lr=1.00E-05, loss= 1.1057 (max= 1.5439), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:35:22,654 - root - INFO - Step 6950: lr=1.00E-05, loss= 1.1057 (max= 1.5439), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:35:22,655 - root - INFO - Step 6950: lr=1.00E-05, loss= 1.1057 (max= 1.5439), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:35:22,655 - root - INFO - Step 6950: lr=1.00E-05, loss= 1.1057 (max= 1.5439), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:35:54,455 - 
root - INFO - Step 6960: lr=1.00E-05, loss= 1.0981 (max= 1.5196), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:35:54,455 - root - INFO - Step 6960: lr=1.00E-05, loss= 1.0981 (max= 1.5196), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:35:54,456 - root - INFO - Step 6960: lr=1.00E-05, loss= 1.0981 (max= 1.5196), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:35:54,456 - root - INFO - Step 6960: lr=1.00E-05, loss= 1.0981 (max= 1.5196), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:35:54,456 - root - INFO - Step 6960: lr=1.00E-05, loss= 1.0981 (max= 1.5196), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:35:54,456 - root - INFO - Step 6960: lr=1.00E-05, loss= 1.0981 (max= 1.5196), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:35:54,456 - root - INFO - Step 6960: lr=1.00E-05, loss= 1.0981 (max= 1.5196), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:35:54,456 - root - INFO - Step 6960: lr=1.00E-05, loss= 1.0981 (max= 1.5196), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:36:26,365 - root - INFO - Step 6970: lr=1.00E-05, loss= 1.1313 (max= 1.5641), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:36:26,365 - root - INFO - Step 6970: lr=1.00E-05, loss= 1.1313 (max= 1.5641), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:36:26,365 - root - INFO - Step 6970: lr=1.00E-05, loss= 1.1313 (max= 1.5641), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 16:36:26,366 - root - INFO - Step 6970: lr=1.00E-05, loss= 1.1313 (max= 1.5641), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:36:26,366 - root - INFO - Step 6970: lr=1.00E-05, loss= 1.1313 (max= 1.5641), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:36:26,366 - root - INFO - Step 6970: lr=1.00E-05, loss= 1.1313 (max= 1.5641), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:36:26,366 - root - INFO - Step 6970: lr=1.00E-05, loss= 1.1313 (max= 1.5641), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:36:26,366 - root - INFO - Step 6970: lr=1.00E-05, loss= 1.1313 (max= 1.5641), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:36:58,223 - root - INFO - Step 6980: lr=1.00E-05, loss= 1.0947 (max= 1.7173), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:36:58,223 - root - INFO - Step 6980: lr=1.00E-05, loss= 1.0947 (max= 1.7173), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:36:58,223 - root - INFO - Step 6980: lr=1.00E-05, loss= 1.0947 (max= 1.7173), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:36:58,223 - root - INFO - Step 6980: lr=1.00E-05, loss= 1.0947 (max= 1.7173), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:36:58,223 - root - INFO - Step 6980: lr=1.00E-05, loss= 1.0947 (max= 1.7173), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:36:58,223 - root - INFO - Step 6980: lr=1.00E-05, loss= 1.0947 (max= 1.7173), tps=20574, mfu=42.87%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:36:58,223 - root - INFO - Step 6980: lr=1.00E-05, loss= 1.0947 (max= 1.7173), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:36:58,223 - root - INFO - Step 6980: lr=1.00E-05, loss= 1.0947 (max= 1.7173), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:37:30,084 - root - INFO - Step 6990: lr=1.00E-05, loss= 1.1268 (max= 1.6044), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:37:30,084 - root - INFO - Step 6990: lr=1.00E-05, loss= 1.1268 (max= 1.6044), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:37:30,084 - root - INFO - Step 6990: lr=1.00E-05, loss= 1.1268 (max= 1.6044), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:37:30,084 - root - INFO - Step 6990: lr=1.00E-05, loss= 1.1268 (max= 1.6044), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:37:30,084 - root - INFO - Step 6990: lr=1.00E-05, loss= 1.1268 (max= 1.6044), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:37:30,084 - root - INFO - Step 6990: lr=1.00E-05, loss= 1.1268 (max= 1.6044), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:37:30,084 - root - INFO - Step 6990: lr=1.00E-05, loss= 1.1268 (max= 1.6044), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:37:30,084 - root - INFO - Step 6990: lr=1.00E-05, loss= 1.1268 (max= 1.6044), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-pt/checkpoints/dataloader/step-7000 +Dataset 
successfully saved to jobs/munin-7b-open-pt/checkpoints/dataloader/step-7000! Save time: 4.475318193435669 +2025-10-25 16:38:01,944 - root - INFO - Step 7000: lr=1.00E-05, loss= 1.1185 (max= 1.5246), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:38:01,944 - root - INFO - Saving a full checkpoint at step 7000 +2025-10-25 16:38:01,944 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 16:38:01,944 - root - INFO - Step 7000: lr=1.00E-05, loss= 1.1185 (max= 1.5246), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:38:01,944 - root - INFO - Step 7000: lr=1.00E-05, loss= 1.1185 (max= 1.5246), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:38:01,944 - root - INFO - Step 7000: lr=1.00E-05, loss= 1.1185 (max= 1.5246), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:38:01,944 - root - INFO - Step 7000: lr=1.00E-05, loss= 1.1185 (max= 1.5246), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:38:01,944 - root - INFO - Saving a full checkpoint at step 7000 +2025-10-25 16:38:01,944 - root - INFO - Step 7000: lr=1.00E-05, loss= 1.1185 (max= 1.5246), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:38:01,944 - root - INFO - Step 7000: lr=1.00E-05, loss= 1.1185 (max= 1.5246), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:38:01,944 - root - INFO - Saving a full checkpoint at step 7000 +2025-10-25 16:38:01,944 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 16:38:01,944 - root - INFO - CheckpointManager: State dict keys: 
dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 16:38:01,944 - root - INFO - Saving a full checkpoint at step 7000 +2025-10-25 16:38:01,944 - root - INFO - Saving a full checkpoint at step 7000 +2025-10-25 16:38:01,944 - root - INFO - Saving a full checkpoint at step 7000 +2025-10-25 16:38:01,944 - root - INFO - Saving a full checkpoint at step 7000 +2025-10-25 16:38:01,944 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 16:38:01,944 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 16:38:01,944 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 16:38:01,944 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 16:38:01,944 - root - INFO - Step 7000: lr=1.00E-05, loss= 1.1185 (max= 1.5246), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:38:01,944 - root - INFO - Saving a full checkpoint at step 7000 +2025-10-25 16:38:01,944 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 16:38:17,849 - root - INFO - Finished saving the checkpoint in 15.90 seconds +2025-10-25 16:38:17,856 - root - INFO - Finished saving the checkpoint in 15.91 seconds +2025-10-25 16:38:17,857 - root - INFO - Finished saving the checkpoint in 15.91 seconds +2025-10-25 16:38:17,857 - root - INFO - Finished saving the checkpoint in 15.91 seconds +2025-10-25 16:38:17,857 - root - INFO - Finished saving the checkpoint in 15.91 seconds +2025-10-25 16:38:17,857 - root - INFO - Finished saving the checkpoint in 15.91 seconds +2025-10-25 16:38:17,858 - root - INFO - Finished saving the checkpoint in 15.91 seconds +2025-10-25 
16:38:17,859 - root - INFO - Finished saving the checkpoint in 15.91 seconds +2025-10-25 16:38:49,720 - root - INFO - Step 7010: lr=1.00E-05, loss= 1.1240 (max= 1.6672), tps=13718, mfu=28.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:38:49,721 - root - INFO - Step 7010: lr=1.00E-05, loss= 1.1240 (max= 1.6672), tps=13718, mfu=28.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:38:49,721 - root - INFO - Step 7010: lr=1.00E-05, loss= 1.1240 (max= 1.6672), tps=13718, mfu=28.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:38:49,721 - root - INFO - Step 7010: lr=1.00E-05, loss= 1.1240 (max= 1.6672), tps=13718, mfu=28.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:38:49,721 - root - INFO - Step 7010: lr=1.00E-05, loss= 1.1240 (max= 1.6672), tps=13718, mfu=28.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:38:49,721 - root - INFO - Step 7010: lr=1.00E-05, loss= 1.1240 (max= 1.6672), tps=13718, mfu=28.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:38:49,721 - root - INFO - Step 7010: lr=1.00E-05, loss= 1.1240 (max= 1.6672), tps=13718, mfu=28.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:38:49,721 - root - INFO - Step 7010: lr=1.00E-05, loss= 1.1240 (max= 1.6672), tps=13718, mfu=28.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:39:21,686 - root - INFO - Step 7020: lr=1.00E-05, loss= 1.1284 (max= 1.5989), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:39:21,686 - root - INFO - Step 7020: lr=1.00E-05, loss= 1.1284 (max= 1.5989), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:39:21,686 - root - INFO - Step 7020: lr=1.00E-05, loss= 
1.1284 (max= 1.5989), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:39:21,686 - root - INFO - Step 7020: lr=1.00E-05, loss= 1.1284 (max= 1.5989), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:39:21,686 - root - INFO - Step 7020: lr=1.00E-05, loss= 1.1284 (max= 1.5989), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:39:21,686 - root - INFO - Step 7020: lr=1.00E-05, loss= 1.1284 (max= 1.5989), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:39:21,686 - root - INFO - Step 7020: lr=1.00E-05, loss= 1.1284 (max= 1.5989), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:39:21,686 - root - INFO - Step 7020: lr=1.00E-05, loss= 1.1284 (max= 1.5989), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:39:53,566 - root - INFO - Step 7030: lr=1.00E-05, loss= 1.1108 (max= 1.7446), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:39:53,566 - root - INFO - Step 7030: lr=1.00E-05, loss= 1.1108 (max= 1.7446), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:39:53,567 - root - INFO - Step 7030: lr=1.00E-05, loss= 1.1108 (max= 1.7446), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:39:53,567 - root - INFO - Step 7030: lr=1.00E-05, loss= 1.1108 (max= 1.7446), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:39:53,567 - root - INFO - Step 7030: lr=1.00E-05, loss= 1.1108 (max= 1.7446), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:39:53,567 - root - 
INFO - Step 7030: lr=1.00E-05, loss= 1.1108 (max= 1.7446), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:39:53,567 - root - INFO - Step 7030: lr=1.00E-05, loss= 1.1108 (max= 1.7446), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:39:53,567 - root - INFO - Step 7030: lr=1.00E-05, loss= 1.1108 (max= 1.7446), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:40:25,482 - root - INFO - Step 7040: lr=1.00E-05, loss= 1.1083 (max= 1.4756), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:40:25,482 - root - INFO - Step 7040: lr=1.00E-05, loss= 1.1083 (max= 1.4756), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:40:25,482 - root - INFO - Step 7040: lr=1.00E-05, loss= 1.1083 (max= 1.4756), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:40:25,482 - root - INFO - Step 7040: lr=1.00E-05, loss= 1.1083 (max= 1.4756), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:40:25,482 - root - INFO - Step 7040: lr=1.00E-05, loss= 1.1083 (max= 1.4756), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:40:25,482 - root - INFO - Step 7040: lr=1.00E-05, loss= 1.1083 (max= 1.4756), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:40:25,482 - root - INFO - Step 7040: lr=1.00E-05, loss= 1.1083 (max= 1.4756), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:40:25,482 - root - INFO - Step 7040: lr=1.00E-05, loss= 1.1083 (max= 1.4756), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 
0.01%) +2025-10-25 16:40:57,286 - root - INFO - Step 7050: lr=1.00E-05, loss= 1.1068 (max= 1.4983), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:40:57,286 - root - INFO - Step 7050: lr=1.00E-05, loss= 1.1068 (max= 1.4983), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:40:57,286 - root - INFO - Step 7050: lr=1.00E-05, loss= 1.1068 (max= 1.4983), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:40:57,286 - root - INFO - Step 7050: lr=1.00E-05, loss= 1.1068 (max= 1.4983), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:40:57,286 - root - INFO - Step 7050: lr=1.00E-05, loss= 1.1068 (max= 1.4983), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:40:57,286 - root - INFO - Step 7050: lr=1.00E-05, loss= 1.1068 (max= 1.4983), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:40:57,286 - root - INFO - Step 7050: lr=1.00E-05, loss= 1.1068 (max= 1.4983), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:40:57,286 - root - INFO - Step 7050: lr=1.00E-05, loss= 1.1068 (max= 1.4983), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:41:29,215 - root - INFO - Step 7060: lr=1.00E-05, loss= 1.1045 (max= 1.5639), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:41:29,215 - root - INFO - Step 7060: lr=1.00E-05, loss= 1.1045 (max= 1.5639), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:41:29,215 - root - INFO - Step 7060: lr=1.00E-05, loss= 1.1045 (max= 1.5639), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:41:29,215 - root - INFO - Step 7060: lr=1.00E-05, loss= 1.1045 (max= 1.5639), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:41:29,215 - root - INFO - Step 7060: lr=1.00E-05, loss= 1.1045 (max= 1.5639), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:41:29,215 - root - INFO - Step 7060: lr=1.00E-05, loss= 1.1045 (max= 1.5639), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:41:29,215 - root - INFO - Step 7060: lr=1.00E-05, loss= 1.1045 (max= 1.5639), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:41:29,215 - root - INFO - Step 7060: lr=1.00E-05, loss= 1.1045 (max= 1.5639), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:42:01,047 - root - INFO - Step 7070: lr=1.00E-05, loss= 1.1161 (max= 1.6567), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:42:01,047 - root - INFO - Step 7070: lr=1.00E-05, loss= 1.1161 (max= 1.6567), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:42:01,047 - root - INFO - Step 7070: lr=1.00E-05, loss= 1.1161 (max= 1.6567), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:42:01,047 - root - INFO - Step 7070: lr=1.00E-05, loss= 1.1161 (max= 1.6567), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:42:01,047 - root - INFO - Step 7070: lr=1.00E-05, loss= 1.1161 (max= 1.6567), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:42:01,047 - root - INFO - Step 7070: lr=1.00E-05, loss= 1.1161 (max= 1.6567), tps=20590, 
mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:42:01,047 - root - INFO - Step 7070: lr=1.00E-05, loss= 1.1161 (max= 1.6567), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:42:01,048 - root - INFO - Step 7070: lr=1.00E-05, loss= 1.1161 (max= 1.6567), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:42:06,602 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:7010376 +2025-10-25 16:42:32,924 - root - INFO - Step 7080: lr=1.00E-05, loss= 1.1285 (max= 1.5809), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:42:32,924 - root - INFO - Step 7080: lr=1.00E-05, loss= 1.1285 (max= 1.5809), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:42:32,924 - root - INFO - Step 7080: lr=1.00E-05, loss= 1.1285 (max= 1.5809), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:42:32,924 - root - INFO - Step 7080: lr=1.00E-05, loss= 1.1285 (max= 1.5809), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:42:32,924 - root - INFO - Step 7080: lr=1.00E-05, loss= 1.1285 (max= 1.5809), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:42:32,925 - root - INFO - Step 7080: lr=1.00E-05, loss= 1.1285 (max= 1.5809), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:42:32,925 - root - INFO - Step 7080: lr=1.00E-05, loss= 1.1285 (max= 1.5809), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:42:32,925 - root - INFO - Step 7080: lr=1.00E-05, 
loss= 1.1285 (max= 1.5809), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:43:04,759 - root - INFO - Step 7090: lr=1.00E-05, loss= 1.1141 (max= 1.8522), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:43:04,759 - root - INFO - Step 7090: lr=1.00E-05, loss= 1.1141 (max= 1.8522), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:43:04,759 - root - INFO - Step 7090: lr=1.00E-05, loss= 1.1141 (max= 1.8522), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:43:04,759 - root - INFO - Step 7090: lr=1.00E-05, loss= 1.1141 (max= 1.8522), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:43:04,759 - root - INFO - Step 7090: lr=1.00E-05, loss= 1.1141 (max= 1.8522), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:43:04,759 - root - INFO - Step 7090: lr=1.00E-05, loss= 1.1141 (max= 1.8522), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:43:04,759 - root - INFO - Step 7090: lr=1.00E-05, loss= 1.1141 (max= 1.8522), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:43:04,759 - root - INFO - Step 7090: lr=1.00E-05, loss= 1.1141 (max= 1.8522), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:43:36,661 - root - INFO - Step 7100: lr=1.00E-05, loss= 1.1083 (max= 1.5221), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:43:36,661 - root - INFO - Step 7100: lr=1.00E-05, loss= 1.1083 (max= 1.5221), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:43:36,661 - 
root - INFO - Step 7100: lr=1.00E-05, loss= 1.1083 (max= 1.5221), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:43:36,661 - root - INFO - Step 7100: lr=1.00E-05, loss= 1.1083 (max= 1.5221), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:43:36,661 - root - INFO - Step 7100: lr=1.00E-05, loss= 1.1083 (max= 1.5221), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:43:36,661 - root - INFO - Step 7100: lr=1.00E-05, loss= 1.1083 (max= 1.5221), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:43:36,661 - root - INFO - Step 7100: lr=1.00E-05, loss= 1.1083 (max= 1.5221), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:43:36,661 - root - INFO - Step 7100: lr=1.00E-05, loss= 1.1083 (max= 1.5221), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:44:08,500 - root - INFO - Step 7110: lr=1.00E-05, loss= 1.1096 (max= 1.5654), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:44:08,500 - root - INFO - Step 7110: lr=1.00E-05, loss= 1.1096 (max= 1.5654), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:44:08,500 - root - INFO - Step 7110: lr=1.00E-05, loss= 1.1096 (max= 1.5654), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:44:08,500 - root - INFO - Step 7110: lr=1.00E-05, loss= 1.1096 (max= 1.5654), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:44:08,500 - root - INFO - Step 7110: lr=1.00E-05, loss= 1.1096 (max= 1.5654), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 16:44:08,500 - root - INFO - Step 7110: lr=1.00E-05, loss= 1.1096 (max= 1.5654), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:44:08,500 - root - INFO - Step 7110: lr=1.00E-05, loss= 1.1096 (max= 1.5654), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:44:08,500 - root - INFO - Step 7110: lr=1.00E-05, loss= 1.1096 (max= 1.5654), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:44:40,379 - root - INFO - Step 7120: lr=1.00E-05, loss= 1.1176 (max= 1.6141), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:44:40,379 - root - INFO - Step 7120: lr=1.00E-05, loss= 1.1176 (max= 1.6141), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:44:40,379 - root - INFO - Step 7120: lr=1.00E-05, loss= 1.1176 (max= 1.6141), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:44:40,379 - root - INFO - Step 7120: lr=1.00E-05, loss= 1.1176 (max= 1.6141), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:44:40,379 - root - INFO - Step 7120: lr=1.00E-05, loss= 1.1176 (max= 1.6141), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:44:40,379 - root - INFO - Step 7120: lr=1.00E-05, loss= 1.1176 (max= 1.6141), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:44:40,379 - root - INFO - Step 7120: lr=1.00E-05, loss= 1.1176 (max= 1.6141), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:44:40,379 - root - INFO - Step 7120: lr=1.00E-05, loss= 1.1176 (max= 1.6141), tps=20560, mfu=42.84%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:45:12,188 - root - INFO - Step 7130: lr=1.00E-05, loss= 1.1118 (max= 1.6097), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:45:12,188 - root - INFO - Step 7130: lr=1.00E-05, loss= 1.1118 (max= 1.6097), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:45:12,188 - root - INFO - Step 7130: lr=1.00E-05, loss= 1.1118 (max= 1.6097), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:45:12,188 - root - INFO - Step 7130: lr=1.00E-05, loss= 1.1118 (max= 1.6097), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:45:12,188 - root - INFO - Step 7130: lr=1.00E-05, loss= 1.1118 (max= 1.6097), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:45:12,188 - root - INFO - Step 7130: lr=1.00E-05, loss= 1.1118 (max= 1.6097), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:45:12,188 - root - INFO - Step 7130: lr=1.00E-05, loss= 1.1118 (max= 1.6097), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:45:12,188 - root - INFO - Step 7130: lr=1.00E-05, loss= 1.1118 (max= 1.6097), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:45:44,015 - root - INFO - Step 7140: lr=1.00E-05, loss= 1.1107 (max= 1.6221), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:45:44,015 - root - INFO - Step 7140: lr=1.00E-05, loss= 1.1107 (max= 1.6221), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:45:44,015 - root - INFO - Step 7140: lr=1.00E-05, loss= 1.1107 (max= 
1.6221), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:45:44,015 - root - INFO - Step 7140: lr=1.00E-05, loss= 1.1107 (max= 1.6221), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:45:44,015 - root - INFO - Step 7140: lr=1.00E-05, loss= 1.1107 (max= 1.6221), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:45:44,015 - root - INFO - Step 7140: lr=1.00E-05, loss= 1.1107 (max= 1.6221), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:45:44,015 - root - INFO - Step 7140: lr=1.00E-05, loss= 1.1107 (max= 1.6221), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:45:44,015 - root - INFO - Step 7140: lr=1.00E-05, loss= 1.1107 (max= 1.6221), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:46:15,839 - root - INFO - Step 7150: lr=1.00E-05, loss= 1.1046 (max= 1.5441), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:46:15,839 - root - INFO - Step 7150: lr=1.00E-05, loss= 1.1046 (max= 1.5441), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:46:15,839 - root - INFO - Step 7150: lr=1.00E-05, loss= 1.1046 (max= 1.5441), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:46:15,839 - root - INFO - Step 7150: lr=1.00E-05, loss= 1.1046 (max= 1.5441), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:46:15,839 - root - INFO - Step 7150: lr=1.00E-05, loss= 1.1046 (max= 1.5441), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:46:15,839 - root - INFO - Step 
7150: lr=1.00E-05, loss= 1.1046 (max= 1.5441), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:46:15,839 - root - INFO - Step 7150: lr=1.00E-05, loss= 1.1046 (max= 1.5441), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:46:15,839 - root - INFO - Step 7150: lr=1.00E-05, loss= 1.1046 (max= 1.5441), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:46:47,665 - root - INFO - Step 7160: lr=1.00E-05, loss= 1.0944 (max= 1.4951), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:46:47,665 - root - INFO - Step 7160: lr=1.00E-05, loss= 1.0944 (max= 1.4951), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:46:47,665 - root - INFO - Step 7160: lr=1.00E-05, loss= 1.0944 (max= 1.4951), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:46:47,665 - root - INFO - Step 7160: lr=1.00E-05, loss= 1.0944 (max= 1.4951), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:46:47,665 - root - INFO - Step 7160: lr=1.00E-05, loss= 1.0944 (max= 1.4951), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:46:47,665 - root - INFO - Step 7160: lr=1.00E-05, loss= 1.0944 (max= 1.4951), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:46:47,665 - root - INFO - Step 7160: lr=1.00E-05, loss= 1.0944 (max= 1.4951), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:46:47,665 - root - INFO - Step 7160: lr=1.00E-05, loss= 1.0944 (max= 1.4951), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 16:47:19,540 - root - INFO - Step 7170: lr=1.00E-05, loss= 1.1064 (max= 1.7453), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:47:19,540 - root - INFO - Step 7170: lr=1.00E-05, loss= 1.1064 (max= 1.7453), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:47:19,540 - root - INFO - Step 7170: lr=1.00E-05, loss= 1.1064 (max= 1.7453), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:47:19,540 - root - INFO - Step 7170: lr=1.00E-05, loss= 1.1064 (max= 1.7453), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:47:19,540 - root - INFO - Step 7170: lr=1.00E-05, loss= 1.1064 (max= 1.7453), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:47:19,540 - root - INFO - Step 7170: lr=1.00E-05, loss= 1.1064 (max= 1.7453), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:47:19,540 - root - INFO - Step 7170: lr=1.00E-05, loss= 1.1064 (max= 1.7453), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:47:19,540 - root - INFO - Step 7170: lr=1.00E-05, loss= 1.1064 (max= 1.7453), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:47:51,435 - root - INFO - Step 7180: lr=1.00E-05, loss= 1.1168 (max= 1.5421), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:47:51,435 - root - INFO - Step 7180: lr=1.00E-05, loss= 1.1168 (max= 1.5421), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:47:51,435 - root - INFO - Step 7180: lr=1.00E-05, loss= 1.1168 (max= 1.5421), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:47:51,435 - root - INFO - Step 7180: lr=1.00E-05, loss= 1.1168 (max= 1.5421), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:47:51,435 - root - INFO - Step 7180: lr=1.00E-05, loss= 1.1168 (max= 1.5421), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:47:51,435 - root - INFO - Step 7180: lr=1.00E-05, loss= 1.1168 (max= 1.5421), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:47:51,435 - root - INFO - Step 7180: lr=1.00E-05, loss= 1.1168 (max= 1.5421), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:47:51,435 - root - INFO - Step 7180: lr=1.00E-05, loss= 1.1168 (max= 1.5421), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:48:23,265 - root - INFO - Step 7190: lr=1.00E-05, loss= 1.0830 (max= 1.6558), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:48:23,265 - root - INFO - Step 7190: lr=1.00E-05, loss= 1.0830 (max= 1.6558), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:48:23,265 - root - INFO - Step 7190: lr=1.00E-05, loss= 1.0830 (max= 1.6558), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:48:23,265 - root - INFO - Step 7190: lr=1.00E-05, loss= 1.0830 (max= 1.6558), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:48:23,265 - root - INFO - Step 7190: lr=1.00E-05, loss= 1.0830 (max= 1.6558), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:48:23,265 - root - INFO - Step 7190: lr=1.00E-05, loss= 1.0830 (max= 1.6558), tps=20591, 
mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:48:23,265 - root - INFO - Step 7190: lr=1.00E-05, loss= 1.0830 (max= 1.6558), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:48:23,265 - root - INFO - Step 7190: lr=1.00E-05, loss= 1.0830 (max= 1.6558), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:48:55,171 - root - INFO - Step 7200: lr=1.00E-05, loss= 1.1122 (max= 1.8786), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:48:55,171 - root - INFO - Step 7200: lr=1.00E-05, loss= 1.1122 (max= 1.8786), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:48:55,171 - root - INFO - Step 7200: lr=1.00E-05, loss= 1.1122 (max= 1.8786), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:48:55,171 - root - INFO - Step 7200: lr=1.00E-05, loss= 1.1122 (max= 1.8786), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:48:55,171 - root - INFO - Step 7200: lr=1.00E-05, loss= 1.1122 (max= 1.8786), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:48:55,171 - root - INFO - Step 7200: lr=1.00E-05, loss= 1.1122 (max= 1.8786), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:48:55,171 - root - INFO - Step 7200: lr=1.00E-05, loss= 1.1122 (max= 1.8786), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:48:55,171 - root - INFO - Step 7200: lr=1.00E-05, loss= 1.1122 (max= 1.8786), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:49:27,027 - root - INFO - Step 7210: lr=1.00E-05, 
loss= 1.1089 (max= 1.5690), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:49:27,027 - root - INFO - Step 7210: lr=1.00E-05, loss= 1.1089 (max= 1.5690), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:49:27,027 - root - INFO - Step 7210: lr=1.00E-05, loss= 1.1089 (max= 1.5690), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:49:27,027 - root - INFO - Step 7210: lr=1.00E-05, loss= 1.1089 (max= 1.5690), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:49:27,027 - root - INFO - Step 7210: lr=1.00E-05, loss= 1.1089 (max= 1.5690), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:49:27,027 - root - INFO - Step 7210: lr=1.00E-05, loss= 1.1089 (max= 1.5690), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:49:27,027 - root - INFO - Step 7210: lr=1.00E-05, loss= 1.1089 (max= 1.5690), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:49:27,027 - root - INFO - Step 7210: lr=1.00E-05, loss= 1.1089 (max= 1.5690), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:49:58,857 - root - INFO - Step 7220: lr=1.00E-05, loss= 1.1046 (max= 1.4573), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:49:58,857 - root - INFO - Step 7220: lr=1.00E-05, loss= 1.1046 (max= 1.4573), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:49:58,857 - root - INFO - Step 7220: lr=1.00E-05, loss= 1.1046 (max= 1.4573), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:49:58,857 - 
root - INFO - Step 7220: lr=1.00E-05, loss= 1.1046 (max= 1.4573), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:49:58,857 - root - INFO - Step 7220: lr=1.00E-05, loss= 1.1046 (max= 1.4573), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:49:58,857 - root - INFO - Step 7220: lr=1.00E-05, loss= 1.1046 (max= 1.4573), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:49:58,857 - root - INFO - Step 7220: lr=1.00E-05, loss= 1.1046 (max= 1.4573), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:49:58,857 - root - INFO - Step 7220: lr=1.00E-05, loss= 1.1046 (max= 1.4573), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:50:30,730 - root - INFO - Step 7230: lr=1.00E-05, loss= 1.1066 (max= 1.6976), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:50:30,730 - root - INFO - Step 7230: lr=1.00E-05, loss= 1.1066 (max= 1.6976), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:50:30,730 - root - INFO - Step 7230: lr=1.00E-05, loss= 1.1066 (max= 1.6976), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:50:30,730 - root - INFO - Step 7230: lr=1.00E-05, loss= 1.1066 (max= 1.6976), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:50:30,730 - root - INFO - Step 7230: lr=1.00E-05, loss= 1.1066 (max= 1.6976), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:50:30,730 - root - INFO - Step 7230: lr=1.00E-05, loss= 1.1066 (max= 1.6976), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 16:50:30,730 - root - INFO - Step 7230: lr=1.00E-05, loss= 1.1066 (max= 1.6976), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:50:30,731 - root - INFO - Step 7230: lr=1.00E-05, loss= 1.1066 (max= 1.6976), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:51:02,678 - root - INFO - Step 7240: lr=1.00E-05, loss= 1.1034 (max= 1.7475), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:51:02,678 - root - INFO - Step 7240: lr=1.00E-05, loss= 1.1034 (max= 1.7475), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:51:02,678 - root - INFO - Step 7240: lr=1.00E-05, loss= 1.1034 (max= 1.7475), tps=20516, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:51:02,678 - root - INFO - Step 7240: lr=1.00E-05, loss= 1.1034 (max= 1.7475), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:51:02,678 - root - INFO - Step 7240: lr=1.00E-05, loss= 1.1034 (max= 1.7475), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:51:02,678 - root - INFO - Step 7240: lr=1.00E-05, loss= 1.1034 (max= 1.7475), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:51:02,678 - root - INFO - Step 7240: lr=1.00E-05, loss= 1.1034 (max= 1.7475), tps=20516, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:51:02,678 - root - INFO - Step 7240: lr=1.00E-05, loss= 1.1034 (max= 1.7475), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:51:34,472 - root - INFO - Step 7250: lr=1.00E-05, loss= 1.1247 (max= 1.6138), tps=20615, mfu=42.95%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:51:34,472 - root - INFO - Step 7250: lr=1.00E-05, loss= 1.1247 (max= 1.6138), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:51:34,472 - root - INFO - Step 7250: lr=1.00E-05, loss= 1.1247 (max= 1.6138), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:51:34,472 - root - INFO - Step 7250: lr=1.00E-05, loss= 1.1247 (max= 1.6138), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:51:34,472 - root - INFO - Step 7250: lr=1.00E-05, loss= 1.1247 (max= 1.6138), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:51:34,472 - root - INFO - Step 7250: lr=1.00E-05, loss= 1.1247 (max= 1.6138), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:51:34,472 - root - INFO - Step 7250: lr=1.00E-05, loss= 1.1247 (max= 1.6138), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:51:34,472 - root - INFO - Step 7250: lr=1.00E-05, loss= 1.1247 (max= 1.6138), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:52:06,328 - root - INFO - Step 7260: lr=1.00E-05, loss= 1.1316 (max= 1.6238), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:52:06,328 - root - INFO - Step 7260: lr=1.00E-05, loss= 1.1316 (max= 1.6238), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:52:06,328 - root - INFO - Step 7260: lr=1.00E-05, loss= 1.1316 (max= 1.6238), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:52:06,328 - root - INFO - Step 7260: lr=1.00E-05, loss= 1.1316 (max= 
1.6238), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:52:06,328 - root - INFO - Step 7260: lr=1.00E-05, loss= 1.1316 (max= 1.6238), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:52:06,328 - root - INFO - Step 7260: lr=1.00E-05, loss= 1.1316 (max= 1.6238), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:52:06,328 - root - INFO - Step 7260: lr=1.00E-05, loss= 1.1316 (max= 1.6238), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:52:06,328 - root - INFO - Step 7260: lr=1.00E-05, loss= 1.1316 (max= 1.6238), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:52:38,173 - root - INFO - Step 7270: lr=1.00E-05, loss= 1.0921 (max= 1.4929), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:52:38,173 - root - INFO - Step 7270: lr=1.00E-05, loss= 1.0921 (max= 1.4929), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:52:38,173 - root - INFO - Step 7270: lr=1.00E-05, loss= 1.0921 (max= 1.4929), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:52:38,173 - root - INFO - Step 7270: lr=1.00E-05, loss= 1.0921 (max= 1.4929), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:52:38,173 - root - INFO - Step 7270: lr=1.00E-05, loss= 1.0921 (max= 1.4929), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:52:38,173 - root - INFO - Step 7270: lr=1.00E-05, loss= 1.0921 (max= 1.4929), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:52:38,173 - root - INFO - Step 
7270: lr=1.00E-05, loss= 1.0921 (max= 1.4929), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:52:38,174 - root - INFO - Step 7270: lr=1.00E-05, loss= 1.0921 (max= 1.4929), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:53:09,994 - root - INFO - Step 7280: lr=1.00E-05, loss= 1.0778 (max= 1.4592), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:53:09,994 - root - INFO - Step 7280: lr=1.00E-05, loss= 1.0778 (max= 1.4592), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:53:09,994 - root - INFO - Step 7280: lr=1.00E-05, loss= 1.0778 (max= 1.4592), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:53:09,994 - root - INFO - Step 7280: lr=1.00E-05, loss= 1.0778 (max= 1.4592), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:53:09,994 - root - INFO - Step 7280: lr=1.00E-05, loss= 1.0778 (max= 1.4592), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:53:09,994 - root - INFO - Step 7280: lr=1.00E-05, loss= 1.0778 (max= 1.4592), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:53:09,994 - root - INFO - Step 7280: lr=1.00E-05, loss= 1.0778 (max= 1.4592), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:53:09,994 - root - INFO - Step 7280: lr=1.00E-05, loss= 1.0778 (max= 1.4592), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:53:41,867 - root - INFO - Step 7290: lr=1.00E-05, loss= 1.1201 (max= 1.6192), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 16:53:41,867 - root - INFO - Step 7290: lr=1.00E-05, loss= 1.1201 (max= 1.6192), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:53:41,867 - root - INFO - Step 7290: lr=1.00E-05, loss= 1.1201 (max= 1.6192), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:53:41,867 - root - INFO - Step 7290: lr=1.00E-05, loss= 1.1201 (max= 1.6192), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:53:41,867 - root - INFO - Step 7290: lr=1.00E-05, loss= 1.1201 (max= 1.6192), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:53:41,867 - root - INFO - Step 7290: lr=1.00E-05, loss= 1.1201 (max= 1.6192), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:53:41,867 - root - INFO - Step 7290: lr=1.00E-05, loss= 1.1201 (max= 1.6192), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:53:41,867 - root - INFO - Step 7290: lr=1.00E-05, loss= 1.1201 (max= 1.6192), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:54:13,690 - root - INFO - Step 7300: lr=1.00E-05, loss= 1.0874 (max= 1.5597), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:54:13,691 - root - INFO - Step 7300: lr=1.00E-05, loss= 1.0874 (max= 1.5597), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:54:13,691 - root - INFO - Step 7300: lr=1.00E-05, loss= 1.0874 (max= 1.5597), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:54:13,691 - root - INFO - Step 7300: lr=1.00E-05, loss= 1.0874 (max= 1.5597), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:54:13,691 - root - INFO - Step 7300: lr=1.00E-05, loss= 1.0874 (max= 1.5597), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:54:13,691 - root - INFO - Step 7300: lr=1.00E-05, loss= 1.0874 (max= 1.5597), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:54:13,691 - root - INFO - Step 7300: lr=1.00E-05, loss= 1.0874 (max= 1.5597), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:54:13,691 - root - INFO - Step 7300: lr=1.00E-05, loss= 1.0874 (max= 1.5597), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:54:45,685 - root - INFO - Step 7310: lr=1.00E-05, loss= 1.0869 (max= 1.4012), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:54:45,685 - root - INFO - Step 7310: lr=1.00E-05, loss= 1.0869 (max= 1.4012), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:54:45,685 - root - INFO - Step 7310: lr=1.00E-05, loss= 1.0869 (max= 1.4012), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:54:45,685 - root - INFO - Step 7310: lr=1.00E-05, loss= 1.0869 (max= 1.4012), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:54:45,685 - root - INFO - Step 7310: lr=1.00E-05, loss= 1.0869 (max= 1.4012), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:54:45,685 - root - INFO - Step 7310: lr=1.00E-05, loss= 1.0869 (max= 1.4012), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:54:45,685 - root - INFO - Step 7310: lr=1.00E-05, loss= 1.0869 (max= 1.4012), tps=20486, 
mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:54:45,685 - root - INFO - Step 7310: lr=1.00E-05, loss= 1.0869 (max= 1.4012), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:55:17,660 - root - INFO - Step 7320: lr=1.00E-05, loss= 1.1274 (max= 1.5279), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:55:17,660 - root - INFO - Step 7320: lr=1.00E-05, loss= 1.1274 (max= 1.5279), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:55:17,660 - root - INFO - Step 7320: lr=1.00E-05, loss= 1.1274 (max= 1.5279), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:55:17,660 - root - INFO - Step 7320: lr=1.00E-05, loss= 1.1274 (max= 1.5279), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:55:17,660 - root - INFO - Step 7320: lr=1.00E-05, loss= 1.1274 (max= 1.5279), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:55:17,660 - root - INFO - Step 7320: lr=1.00E-05, loss= 1.1274 (max= 1.5279), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:55:17,660 - root - INFO - Step 7320: lr=1.00E-05, loss= 1.1274 (max= 1.5279), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:55:17,660 - root - INFO - Step 7320: lr=1.00E-05, loss= 1.1274 (max= 1.5279), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:55:49,504 - root - INFO - Step 7330: lr=1.00E-05, loss= 1.0998 (max= 1.7081), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:55:49,504 - root - INFO - Step 7330: lr=1.00E-05, 
loss= 1.0998 (max= 1.7081), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:55:49,505 - root - INFO - Step 7330: lr=1.00E-05, loss= 1.0998 (max= 1.7081), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:55:49,505 - root - INFO - Step 7330: lr=1.00E-05, loss= 1.0998 (max= 1.7081), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:55:49,505 - root - INFO - Step 7330: lr=1.00E-05, loss= 1.0998 (max= 1.7081), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:55:49,505 - root - INFO - Step 7330: lr=1.00E-05, loss= 1.0998 (max= 1.7081), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:55:49,505 - root - INFO - Step 7330: lr=1.00E-05, loss= 1.0998 (max= 1.7081), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:55:49,505 - root - INFO - Step 7330: lr=1.00E-05, loss= 1.0998 (max= 1.7081), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:56:21,362 - root - INFO - Step 7340: lr=1.00E-05, loss= 1.1058 (max= 1.4934), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:56:21,362 - root - INFO - Step 7340: lr=1.00E-05, loss= 1.1058 (max= 1.4934), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:56:21,363 - root - INFO - Step 7340: lr=1.00E-05, loss= 1.1058 (max= 1.4934), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:56:21,363 - root - INFO - Step 7340: lr=1.00E-05, loss= 1.1058 (max= 1.4934), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:56:21,363 - 
root - INFO - Step 7340: lr=1.00E-05, loss= 1.1058 (max= 1.4934), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:56:21,363 - root - INFO - Step 7340: lr=1.00E-05, loss= 1.1058 (max= 1.4934), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:56:21,363 - root - INFO - Step 7340: lr=1.00E-05, loss= 1.1058 (max= 1.4934), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:56:21,363 - root - INFO - Step 7340: lr=1.00E-05, loss= 1.1058 (max= 1.4934), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:56:53,226 - root - INFO - Step 7350: lr=1.00E-05, loss= 1.1001 (max= 1.5656), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:56:53,226 - root - INFO - Step 7350: lr=1.00E-05, loss= 1.1001 (max= 1.5656), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:56:53,226 - root - INFO - Step 7350: lr=1.00E-05, loss= 1.1001 (max= 1.5656), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:56:53,226 - root - INFO - Step 7350: lr=1.00E-05, loss= 1.1001 (max= 1.5656), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:56:53,226 - root - INFO - Step 7350: lr=1.00E-05, loss= 1.1001 (max= 1.5656), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:56:53,226 - root - INFO - Step 7350: lr=1.00E-05, loss= 1.1001 (max= 1.5656), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:56:53,226 - root - INFO - Step 7350: lr=1.00E-05, loss= 1.1001 (max= 1.5656), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 16:56:53,226 - root - INFO - Step 7350: lr=1.00E-05, loss= 1.1001 (max= 1.5656), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:57:25,179 - root - INFO - Step 7360: lr=1.00E-05, loss= 1.0904 (max= 1.5184), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:57:25,180 - root - INFO - Step 7360: lr=1.00E-05, loss= 1.0904 (max= 1.5184), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:57:25,180 - root - INFO - Step 7360: lr=1.00E-05, loss= 1.0904 (max= 1.5184), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:57:25,180 - root - INFO - Step 7360: lr=1.00E-05, loss= 1.0904 (max= 1.5184), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:57:25,180 - root - INFO - Step 7360: lr=1.00E-05, loss= 1.0904 (max= 1.5184), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:57:25,180 - root - INFO - Step 7360: lr=1.00E-05, loss= 1.0904 (max= 1.5184), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:57:25,180 - root - INFO - Step 7360: lr=1.00E-05, loss= 1.0904 (max= 1.5184), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:57:25,180 - root - INFO - Step 7360: lr=1.00E-05, loss= 1.0904 (max= 1.5184), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:57:57,020 - root - INFO - Step 7370: lr=1.00E-05, loss= 1.1093 (max= 1.4899), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:57:57,020 - root - INFO - Step 7370: lr=1.00E-05, loss= 1.1093 (max= 1.4899), tps=20585, mfu=42.89%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:57:57,020 - root - INFO - Step 7370: lr=1.00E-05, loss= 1.1093 (max= 1.4899), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:57:57,020 - root - INFO - Step 7370: lr=1.00E-05, loss= 1.1093 (max= 1.4899), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:57:57,020 - root - INFO - Step 7370: lr=1.00E-05, loss= 1.1093 (max= 1.4899), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:57:57,020 - root - INFO - Step 7370: lr=1.00E-05, loss= 1.1093 (max= 1.4899), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:57:57,020 - root - INFO - Step 7370: lr=1.00E-05, loss= 1.1093 (max= 1.4899), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:57:57,020 - root - INFO - Step 7370: lr=1.00E-05, loss= 1.1093 (max= 1.4899), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:58:28,750 - root - INFO - Step 7380: lr=1.00E-05, loss= 1.1083 (max= 1.5155), tps=20656, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:58:28,750 - root - INFO - Step 7380: lr=1.00E-05, loss= 1.1083 (max= 1.5155), tps=20656, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:58:28,750 - root - INFO - Step 7380: lr=1.00E-05, loss= 1.1083 (max= 1.5155), tps=20656, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:58:28,750 - root - INFO - Step 7380: lr=1.00E-05, loss= 1.1083 (max= 1.5155), tps=20656, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:58:28,751 - root - INFO - Step 7380: lr=1.00E-05, loss= 1.1083 (max= 
1.5155), tps=20656, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:58:28,751 - root - INFO - Step 7380: lr=1.00E-05, loss= 1.1083 (max= 1.5155), tps=20656, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:58:28,751 - root - INFO - Step 7380: lr=1.00E-05, loss= 1.1083 (max= 1.5155), tps=20656, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:58:28,751 - root - INFO - Step 7380: lr=1.00E-05, loss= 1.1083 (max= 1.5155), tps=20656, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:59:00,627 - root - INFO - Step 7390: lr=1.00E-05, loss= 1.1146 (max= 1.6011), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:59:00,627 - root - INFO - Step 7390: lr=1.00E-05, loss= 1.1146 (max= 1.6011), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:59:00,627 - root - INFO - Step 7390: lr=1.00E-05, loss= 1.1146 (max= 1.6011), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:59:00,627 - root - INFO - Step 7390: lr=1.00E-05, loss= 1.1146 (max= 1.6011), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:59:00,627 - root - INFO - Step 7390: lr=1.00E-05, loss= 1.1146 (max= 1.6011), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:59:00,627 - root - INFO - Step 7390: lr=1.00E-05, loss= 1.1146 (max= 1.6011), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:59:00,627 - root - INFO - Step 7390: lr=1.00E-05, loss= 1.1146 (max= 1.6011), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:59:00,627 - root - INFO - Step 
7390: lr=1.00E-05, loss= 1.1146 (max= 1.6011), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:59:32,626 - root - INFO - Step 7400: lr=1.00E-05, loss= 1.1192 (max= 1.6880), tps=20482, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:59:32,626 - root - INFO - Step 7400: lr=1.00E-05, loss= 1.1192 (max= 1.6880), tps=20483, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:59:32,626 - root - INFO - Step 7400: lr=1.00E-05, loss= 1.1192 (max= 1.6880), tps=20483, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:59:32,626 - root - INFO - Step 7400: lr=1.00E-05, loss= 1.1192 (max= 1.6880), tps=20483, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:59:32,626 - root - INFO - Step 7400: lr=1.00E-05, loss= 1.1192 (max= 1.6880), tps=20483, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:59:32,626 - root - INFO - Step 7400: lr=1.00E-05, loss= 1.1192 (max= 1.6880), tps=20482, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:59:32,626 - root - INFO - Step 7400: lr=1.00E-05, loss= 1.1192 (max= 1.6880), tps=20483, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 16:59:32,626 - root - INFO - Step 7400: lr=1.00E-05, loss= 1.1192 (max= 1.6880), tps=20483, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:00:04,426 - root - INFO - Step 7410: lr=1.00E-05, loss= 1.0822 (max= 1.4569), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:00:04,426 - root - INFO - Step 7410: lr=1.00E-05, loss= 1.0822 (max= 1.4569), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 17:00:04,426 - root - INFO - Step 7410: lr=1.00E-05, loss= 1.0822 (max= 1.4569), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:00:04,426 - root - INFO - Step 7410: lr=1.00E-05, loss= 1.0822 (max= 1.4569), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:00:04,426 - root - INFO - Step 7410: lr=1.00E-05, loss= 1.0822 (max= 1.4569), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:00:04,426 - root - INFO - Step 7410: lr=1.00E-05, loss= 1.0822 (max= 1.4569), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:00:04,426 - root - INFO - Step 7410: lr=1.00E-05, loss= 1.0822 (max= 1.4569), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:00:04,426 - root - INFO - Step 7410: lr=1.00E-05, loss= 1.0822 (max= 1.4569), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:00:36,186 - root - INFO - Step 7420: lr=1.00E-05, loss= 1.1050 (max= 1.4477), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:00:36,186 - root - INFO - Step 7420: lr=1.00E-05, loss= 1.1050 (max= 1.4477), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:00:36,186 - root - INFO - Step 7420: lr=1.00E-05, loss= 1.1050 (max= 1.4477), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:00:36,186 - root - INFO - Step 7420: lr=1.00E-05, loss= 1.1050 (max= 1.4477), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:00:36,186 - root - INFO - Step 7420: lr=1.00E-05, loss= 1.1050 (max= 1.4477), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:00:36,186 - root - INFO - Step 7420: lr=1.00E-05, loss= 1.1050 (max= 1.4477), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:00:36,186 - root - INFO - Step 7420: lr=1.00E-05, loss= 1.1050 (max= 1.4477), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:00:36,186 - root - INFO - Step 7420: lr=1.00E-05, loss= 1.1050 (max= 1.4477), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:00:51,215 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:6795936 +2025-10-25 17:01:08,016 - root - INFO - Step 7430: lr=1.00E-05, loss= 1.1148 (max= 1.5883), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:01:08,016 - root - INFO - Step 7430: lr=1.00E-05, loss= 1.1148 (max= 1.5883), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:01:08,016 - root - INFO - Step 7430: lr=1.00E-05, loss= 1.1148 (max= 1.5883), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:01:08,016 - root - INFO - Step 7430: lr=1.00E-05, loss= 1.1148 (max= 1.5883), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:01:08,016 - root - INFO - Step 7430: lr=1.00E-05, loss= 1.1148 (max= 1.5883), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:01:08,016 - root - INFO - Step 7430: lr=1.00E-05, loss= 1.1148 (max= 1.5883), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:01:08,016 - root - INFO - Step 7430: lr=1.00E-05, loss= 1.1148 (max= 1.5883), tps=20592, 
mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:01:08,016 - root - INFO - Step 7430: lr=1.00E-05, loss= 1.1148 (max= 1.5883), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:01:39,861 - root - INFO - Step 7440: lr=1.00E-05, loss= 1.0984 (max= 1.6038), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:01:39,861 - root - INFO - Step 7440: lr=1.00E-05, loss= 1.0984 (max= 1.6038), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:01:39,861 - root - INFO - Step 7440: lr=1.00E-05, loss= 1.0984 (max= 1.6038), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:01:39,861 - root - INFO - Step 7440: lr=1.00E-05, loss= 1.0984 (max= 1.6038), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:01:39,861 - root - INFO - Step 7440: lr=1.00E-05, loss= 1.0984 (max= 1.6038), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:01:39,861 - root - INFO - Step 7440: lr=1.00E-05, loss= 1.0984 (max= 1.6038), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:01:39,861 - root - INFO - Step 7440: lr=1.00E-05, loss= 1.0984 (max= 1.6038), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:01:39,861 - root - INFO - Step 7440: lr=1.00E-05, loss= 1.0984 (max= 1.6038), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:02:11,690 - root - INFO - Step 7450: lr=1.00E-05, loss= 1.0990 (max= 1.5952), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:02:11,691 - root - INFO - Step 7450: lr=1.00E-05, 
loss= 1.0990 (max= 1.5952), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:02:11,691 - root - INFO - Step 7450: lr=1.00E-05, loss= 1.0990 (max= 1.5952), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:02:11,691 - root - INFO - Step 7450: lr=1.00E-05, loss= 1.0990 (max= 1.5952), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:02:11,691 - root - INFO - Step 7450: lr=1.00E-05, loss= 1.0990 (max= 1.5952), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:02:11,691 - root - INFO - Step 7450: lr=1.00E-05, loss= 1.0990 (max= 1.5952), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:02:11,691 - root - INFO - Step 7450: lr=1.00E-05, loss= 1.0990 (max= 1.5952), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:02:11,691 - root - INFO - Step 7450: lr=1.00E-05, loss= 1.0990 (max= 1.5952), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:02:43,495 - root - INFO - Step 7460: lr=1.00E-05, loss= 1.0840 (max= 1.6144), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:02:43,495 - root - INFO - Step 7460: lr=1.00E-05, loss= 1.0840 (max= 1.6144), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:02:43,496 - root - INFO - Step 7460: lr=1.00E-05, loss= 1.0840 (max= 1.6144), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:02:43,496 - root - INFO - Step 7460: lr=1.00E-05, loss= 1.0840 (max= 1.6144), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:02:43,496 - 
root - INFO - Step 7460: lr=1.00E-05, loss= 1.0840 (max= 1.6144), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:02:43,496 - root - INFO - Step 7460: lr=1.00E-05, loss= 1.0840 (max= 1.6144), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:02:43,496 - root - INFO - Step 7460: lr=1.00E-05, loss= 1.0840 (max= 1.6144), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:02:43,496 - root - INFO - Step 7460: lr=1.00E-05, loss= 1.0840 (max= 1.6144), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:03:08,167 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:578883 +2025-10-25 17:03:15,400 - root - INFO - Step 7470: lr=1.00E-05, loss= 1.0955 (max= 1.4729), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:03:15,400 - root - INFO - Step 7470: lr=1.00E-05, loss= 1.0955 (max= 1.4729), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:03:15,401 - root - INFO - Step 7470: lr=1.00E-05, loss= 1.0955 (max= 1.4729), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:03:15,401 - root - INFO - Step 7470: lr=1.00E-05, loss= 1.0955 (max= 1.4729), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:03:15,401 - root - INFO - Step 7470: lr=1.00E-05, loss= 1.0955 (max= 1.4729), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:03:15,401 - root - INFO - Step 7470: lr=1.00E-05, loss= 1.0955 (max= 1.4729), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 
0.01%) +2025-10-25 17:03:15,401 - root - INFO - Step 7470: lr=1.00E-05, loss= 1.0955 (max= 1.4729), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:03:15,401 - root - INFO - Step 7470: lr=1.00E-05, loss= 1.0955 (max= 1.4729), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:03:47,189 - root - INFO - Step 7480: lr=1.00E-05, loss= 1.0805 (max= 1.5850), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:03:47,189 - root - INFO - Step 7480: lr=1.00E-05, loss= 1.0805 (max= 1.5850), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:03:47,190 - root - INFO - Step 7480: lr=1.00E-05, loss= 1.0805 (max= 1.5850), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:03:47,190 - root - INFO - Step 7480: lr=1.00E-05, loss= 1.0805 (max= 1.5850), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:03:47,190 - root - INFO - Step 7480: lr=1.00E-05, loss= 1.0805 (max= 1.5850), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:03:47,190 - root - INFO - Step 7480: lr=1.00E-05, loss= 1.0805 (max= 1.5850), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:03:47,190 - root - INFO - Step 7480: lr=1.00E-05, loss= 1.0805 (max= 1.5850), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:03:47,190 - root - INFO - Step 7480: lr=1.00E-05, loss= 1.0805 (max= 1.5850), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:04:19,104 - root - INFO - Step 7490: lr=1.00E-05, loss= 1.1227 (max= 1.5369), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:04:19,104 - root - INFO - Step 7490: lr=1.00E-05, loss= 1.1227 (max= 1.5369), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:04:19,104 - root - INFO - Step 7490: lr=1.00E-05, loss= 1.1227 (max= 1.5369), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:04:19,104 - root - INFO - Step 7490: lr=1.00E-05, loss= 1.1227 (max= 1.5369), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:04:19,104 - root - INFO - Step 7490: lr=1.00E-05, loss= 1.1227 (max= 1.5369), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:04:19,104 - root - INFO - Step 7490: lr=1.00E-05, loss= 1.1227 (max= 1.5369), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:04:19,104 - root - INFO - Step 7490: lr=1.00E-05, loss= 1.1227 (max= 1.5369), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:04:19,104 - root - INFO - Step 7490: lr=1.00E-05, loss= 1.1227 (max= 1.5369), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:04:50,935 - root - INFO - Step 7500: lr=1.00E-05, loss= 1.0904 (max= 1.4595), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:04:50,935 - root - INFO - Step 7500: lr=1.00E-05, loss= 1.0904 (max= 1.4595), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:04:50,935 - root - INFO - Step 7500: lr=1.00E-05, loss= 1.0904 (max= 1.4595), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:04:50,935 - root - INFO - Step 7500: lr=1.00E-05, loss= 1.0904 (max= 1.4595), tps=20591, 
mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:04:50,935 - root - INFO - Step 7500: lr=1.00E-05, loss= 1.0904 (max= 1.4595), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:04:50,935 - root - INFO - Step 7500: lr=1.00E-05, loss= 1.0904 (max= 1.4595), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:04:50,935 - root - INFO - Step 7500: lr=1.00E-05, loss= 1.0904 (max= 1.4595), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:04:50,935 - root - INFO - Step 7500: lr=1.00E-05, loss= 1.0904 (max= 1.4595), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:05:22,877 - root - INFO - Step 7510: lr=1.00E-05, loss= 1.1082 (max= 1.5268), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:05:22,877 - root - INFO - Step 7510: lr=1.00E-05, loss= 1.1082 (max= 1.5268), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:05:22,877 - root - INFO - Step 7510: lr=1.00E-05, loss= 1.1082 (max= 1.5268), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:05:22,878 - root - INFO - Step 7510: lr=1.00E-05, loss= 1.1082 (max= 1.5268), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:05:22,878 - root - INFO - Step 7510: lr=1.00E-05, loss= 1.1082 (max= 1.5268), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:05:22,878 - root - INFO - Step 7510: lr=1.00E-05, loss= 1.1082 (max= 1.5268), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:05:22,878 - root - INFO - Step 7510: lr=1.00E-05, 
loss= 1.1082 (max= 1.5268), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:05:22,878 - root - INFO - Step 7510: lr=1.00E-05, loss= 1.1082 (max= 1.5268), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:05:54,813 - root - INFO - Step 7520: lr=1.00E-05, loss= 1.1139 (max= 1.6410), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:05:54,813 - root - INFO - Step 7520: lr=1.00E-05, loss= 1.1139 (max= 1.6410), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:05:54,813 - root - INFO - Step 7520: lr=1.00E-05, loss= 1.1139 (max= 1.6410), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:05:54,813 - root - INFO - Step 7520: lr=1.00E-05, loss= 1.1139 (max= 1.6410), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:05:54,813 - root - INFO - Step 7520: lr=1.00E-05, loss= 1.1139 (max= 1.6410), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:05:54,813 - root - INFO - Step 7520: lr=1.00E-05, loss= 1.1139 (max= 1.6410), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:05:54,813 - root - INFO - Step 7520: lr=1.00E-05, loss= 1.1139 (max= 1.6410), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:05:54,813 - root - INFO - Step 7520: lr=1.00E-05, loss= 1.1139 (max= 1.6410), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:06:26,643 - root - INFO - Step 7530: lr=1.00E-05, loss= 1.0866 (max= 1.4993), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:06:26,643 - 
root - INFO - Step 7530: lr=1.00E-05, loss= 1.0866 (max= 1.4993), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:06:26,643 - root - INFO - Step 7530: lr=1.00E-05, loss= 1.0866 (max= 1.4993), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:06:26,643 - root - INFO - Step 7530: lr=1.00E-05, loss= 1.0866 (max= 1.4993), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:06:26,643 - root - INFO - Step 7530: lr=1.00E-05, loss= 1.0866 (max= 1.4993), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:06:26,643 - root - INFO - Step 7530: lr=1.00E-05, loss= 1.0866 (max= 1.4993), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:06:26,643 - root - INFO - Step 7530: lr=1.00E-05, loss= 1.0866 (max= 1.4993), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:06:26,643 - root - INFO - Step 7530: lr=1.00E-05, loss= 1.0866 (max= 1.4993), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:06:58,448 - root - INFO - Step 7540: lr=1.00E-05, loss= 1.0941 (max= 1.6025), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:06:58,448 - root - INFO - Step 7540: lr=1.00E-05, loss= 1.0941 (max= 1.6025), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:06:58,448 - root - INFO - Step 7540: lr=1.00E-05, loss= 1.0941 (max= 1.6025), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:06:58,448 - root - INFO - Step 7540: lr=1.00E-05, loss= 1.0941 (max= 1.6025), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 17:06:58,448 - root - INFO - Step 7540: lr=1.00E-05, loss= 1.0941 (max= 1.6025), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:06:58,448 - root - INFO - Step 7540: lr=1.00E-05, loss= 1.0941 (max= 1.6025), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:06:58,448 - root - INFO - Step 7540: lr=1.00E-05, loss= 1.0941 (max= 1.6025), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:06:58,448 - root - INFO - Step 7540: lr=1.00E-05, loss= 1.0941 (max= 1.6025), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:07:30,311 - root - INFO - Step 7550: lr=1.00E-05, loss= 1.0980 (max= 1.6855), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:07:30,311 - root - INFO - Step 7550: lr=1.00E-05, loss= 1.0980 (max= 1.6855), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:07:30,311 - root - INFO - Step 7550: lr=1.00E-05, loss= 1.0980 (max= 1.6855), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:07:30,311 - root - INFO - Step 7550: lr=1.00E-05, loss= 1.0980 (max= 1.6855), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:07:30,311 - root - INFO - Step 7550: lr=1.00E-05, loss= 1.0980 (max= 1.6855), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:07:30,311 - root - INFO - Step 7550: lr=1.00E-05, loss= 1.0980 (max= 1.6855), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:07:30,311 - root - INFO - Step 7550: lr=1.00E-05, loss= 1.0980 (max= 1.6855), tps=20570, mfu=42.86%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:07:30,311 - root - INFO - Step 7550: lr=1.00E-05, loss= 1.0980 (max= 1.6855), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:08:02,191 - root - INFO - Step 7560: lr=1.00E-05, loss= 1.1232 (max= 1.5581), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:08:02,191 - root - INFO - Step 7560: lr=1.00E-05, loss= 1.1232 (max= 1.5581), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:08:02,191 - root - INFO - Step 7560: lr=1.00E-05, loss= 1.1232 (max= 1.5581), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:08:02,191 - root - INFO - Step 7560: lr=1.00E-05, loss= 1.1232 (max= 1.5581), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:08:02,191 - root - INFO - Step 7560: lr=1.00E-05, loss= 1.1232 (max= 1.5581), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:08:02,191 - root - INFO - Step 7560: lr=1.00E-05, loss= 1.1232 (max= 1.5581), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:08:02,191 - root - INFO - Step 7560: lr=1.00E-05, loss= 1.1232 (max= 1.5581), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:08:02,191 - root - INFO - Step 7560: lr=1.00E-05, loss= 1.1232 (max= 1.5581), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:08:34,046 - root - INFO - Step 7570: lr=1.00E-05, loss= 1.1173 (max= 1.5304), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:08:34,046 - root - INFO - Step 7570: lr=1.00E-05, loss= 1.1173 (max= 
1.5304), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:08:34,046 - root - INFO - Step 7570: lr=1.00E-05, loss= 1.1173 (max= 1.5304), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:08:34,047 - root - INFO - Step 7570: lr=1.00E-05, loss= 1.1173 (max= 1.5304), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:08:34,047 - root - INFO - Step 7570: lr=1.00E-05, loss= 1.1173 (max= 1.5304), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:08:34,047 - root - INFO - Step 7570: lr=1.00E-05, loss= 1.1173 (max= 1.5304), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:08:34,047 - root - INFO - Step 7570: lr=1.00E-05, loss= 1.1173 (max= 1.5304), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:08:34,047 - root - INFO - Step 7570: lr=1.00E-05, loss= 1.1173 (max= 1.5304), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:09:05,912 - root - INFO - Step 7580: lr=1.00E-05, loss= 1.0963 (max= 1.6070), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:09:05,912 - root - INFO - Step 7580: lr=1.00E-05, loss= 1.0963 (max= 1.6070), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:09:05,912 - root - INFO - Step 7580: lr=1.00E-05, loss= 1.0963 (max= 1.6070), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:09:05,912 - root - INFO - Step 7580: lr=1.00E-05, loss= 1.0963 (max= 1.6070), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:09:05,912 - root - INFO - Step 
7580: lr=1.00E-05, loss= 1.0963 (max= 1.6070), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:09:05,913 - root - INFO - Step 7580: lr=1.00E-05, loss= 1.0963 (max= 1.6070), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:09:05,913 - root - INFO - Step 7580: lr=1.00E-05, loss= 1.0963 (max= 1.6070), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:09:05,913 - root - INFO - Step 7580: lr=1.00E-05, loss= 1.0963 (max= 1.6070), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:09:37,823 - root - INFO - Step 7590: lr=1.00E-05, loss= 1.0929 (max= 1.5786), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:09:37,823 - root - INFO - Step 7590: lr=1.00E-05, loss= 1.0929 (max= 1.5786), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:09:37,823 - root - INFO - Step 7590: lr=1.00E-05, loss= 1.0929 (max= 1.5786), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:09:37,823 - root - INFO - Step 7590: lr=1.00E-05, loss= 1.0929 (max= 1.5786), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:09:37,823 - root - INFO - Step 7590: lr=1.00E-05, loss= 1.0929 (max= 1.5786), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:09:37,823 - root - INFO - Step 7590: lr=1.00E-05, loss= 1.0929 (max= 1.5786), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:09:37,823 - root - INFO - Step 7590: lr=1.00E-05, loss= 1.0929 (max= 1.5786), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 17:09:37,823 - root - INFO - Step 7590: lr=1.00E-05, loss= 1.0929 (max= 1.5786), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:10:09,668 - root - INFO - Step 7600: lr=1.00E-05, loss= 1.1082 (max= 1.5665), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:10:09,668 - root - INFO - Step 7600: lr=1.00E-05, loss= 1.1082 (max= 1.5665), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:10:09,668 - root - INFO - Step 7600: lr=1.00E-05, loss= 1.1082 (max= 1.5665), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:10:09,668 - root - INFO - Step 7600: lr=1.00E-05, loss= 1.1082 (max= 1.5665), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:10:09,668 - root - INFO - Step 7600: lr=1.00E-05, loss= 1.1082 (max= 1.5665), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:10:09,668 - root - INFO - Step 7600: lr=1.00E-05, loss= 1.1082 (max= 1.5665), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:10:09,668 - root - INFO - Step 7600: lr=1.00E-05, loss= 1.1082 (max= 1.5665), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:10:09,668 - root - INFO - Step 7600: lr=1.00E-05, loss= 1.1082 (max= 1.5665), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:10:41,521 - root - INFO - Step 7610: lr=1.00E-05, loss= 1.1043 (max= 1.6117), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:10:41,521 - root - INFO - Step 7610: lr=1.00E-05, loss= 1.1043 (max= 1.6117), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:10:41,521 - root - INFO - Step 7610: lr=1.00E-05, loss= 1.1043 (max= 1.6117), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:10:41,521 - root - INFO - Step 7610: lr=1.00E-05, loss= 1.1043 (max= 1.6117), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:10:41,521 - root - INFO - Step 7610: lr=1.00E-05, loss= 1.1043 (max= 1.6117), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:10:41,521 - root - INFO - Step 7610: lr=1.00E-05, loss= 1.1043 (max= 1.6117), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:10:41,521 - root - INFO - Step 7610: lr=1.00E-05, loss= 1.1043 (max= 1.6117), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:10:41,521 - root - INFO - Step 7610: lr=1.00E-05, loss= 1.1043 (max= 1.6117), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:11:13,378 - root - INFO - Step 7620: lr=1.00E-05, loss= 1.0935 (max= 1.6092), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:11:13,378 - root - INFO - Step 7620: lr=1.00E-05, loss= 1.0935 (max= 1.6092), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:11:13,378 - root - INFO - Step 7620: lr=1.00E-05, loss= 1.0935 (max= 1.6092), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:11:13,378 - root - INFO - Step 7620: lr=1.00E-05, loss= 1.0935 (max= 1.6092), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:11:13,378 - root - INFO - Step 7620: lr=1.00E-05, loss= 1.0935 (max= 1.6092), tps=20574, 
mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:11:13,378 - root - INFO - Step 7620: lr=1.00E-05, loss= 1.0935 (max= 1.6092), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:11:13,378 - root - INFO - Step 7620: lr=1.00E-05, loss= 1.0935 (max= 1.6092), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:11:13,378 - root - INFO - Step 7620: lr=1.00E-05, loss= 1.0935 (max= 1.6092), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:11:45,241 - root - INFO - Step 7630: lr=1.00E-05, loss= 1.0931 (max= 1.5170), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:11:45,241 - root - INFO - Step 7630: lr=1.00E-05, loss= 1.0931 (max= 1.5170), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:11:45,241 - root - INFO - Step 7630: lr=1.00E-05, loss= 1.0931 (max= 1.5170), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:11:45,241 - root - INFO - Step 7630: lr=1.00E-05, loss= 1.0931 (max= 1.5170), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:11:45,241 - root - INFO - Step 7630: lr=1.00E-05, loss= 1.0931 (max= 1.5170), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:11:45,241 - root - INFO - Step 7630: lr=1.00E-05, loss= 1.0931 (max= 1.5170), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:11:45,241 - root - INFO - Step 7630: lr=1.00E-05, loss= 1.0931 (max= 1.5170), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:11:45,241 - root - INFO - Step 7630: lr=1.00E-05, 
loss= 1.0931 (max= 1.5170), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:12:17,040 - root - INFO - Step 7640: lr=1.00E-05, loss= 1.0953 (max= 1.6062), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:12:17,040 - root - INFO - Step 7640: lr=1.00E-05, loss= 1.0953 (max= 1.6062), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:12:17,040 - root - INFO - Step 7640: lr=1.00E-05, loss= 1.0953 (max= 1.6062), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:12:17,040 - root - INFO - Step 7640: lr=1.00E-05, loss= 1.0953 (max= 1.6062), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:12:17,040 - root - INFO - Step 7640: lr=1.00E-05, loss= 1.0953 (max= 1.6062), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:12:17,040 - root - INFO - Step 7640: lr=1.00E-05, loss= 1.0953 (max= 1.6062), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:12:17,040 - root - INFO - Step 7640: lr=1.00E-05, loss= 1.0953 (max= 1.6062), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:12:17,040 - root - INFO - Step 7640: lr=1.00E-05, loss= 1.0953 (max= 1.6062), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:12:48,924 - root - INFO - Step 7650: lr=1.00E-05, loss= 1.1174 (max= 1.5310), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:12:48,924 - root - INFO - Step 7650: lr=1.00E-05, loss= 1.1174 (max= 1.5310), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:12:48,924 - 
root - INFO - Step 7650: lr=1.00E-05, loss= 1.1174 (max= 1.5310), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:12:48,924 - root - INFO - Step 7650: lr=1.00E-05, loss= 1.1174 (max= 1.5310), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:12:48,924 - root - INFO - Step 7650: lr=1.00E-05, loss= 1.1174 (max= 1.5310), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:12:48,924 - root - INFO - Step 7650: lr=1.00E-05, loss= 1.1174 (max= 1.5310), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:12:48,924 - root - INFO - Step 7650: lr=1.00E-05, loss= 1.1174 (max= 1.5310), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:12:48,924 - root - INFO - Step 7650: lr=1.00E-05, loss= 1.1174 (max= 1.5310), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:13:20,799 - root - INFO - Step 7660: lr=1.00E-05, loss= 1.0996 (max= 1.4785), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:13:20,799 - root - INFO - Step 7660: lr=1.00E-05, loss= 1.0996 (max= 1.4785), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:13:20,799 - root - INFO - Step 7660: lr=1.00E-05, loss= 1.0996 (max= 1.4785), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:13:20,799 - root - INFO - Step 7660: lr=1.00E-05, loss= 1.0996 (max= 1.4785), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:13:20,799 - root - INFO - Step 7660: lr=1.00E-05, loss= 1.0996 (max= 1.4785), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 17:13:20,799 - root - INFO - Step 7660: lr=1.00E-05, loss= 1.0996 (max= 1.4785), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:13:20,800 - root - INFO - Step 7660: lr=1.00E-05, loss= 1.0996 (max= 1.4785), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:13:20,800 - root - INFO - Step 7660: lr=1.00E-05, loss= 1.0996 (max= 1.4785), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:13:52,702 - root - INFO - Step 7670: lr=1.00E-05, loss= 1.0814 (max= 1.5054), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:13:52,702 - root - INFO - Step 7670: lr=1.00E-05, loss= 1.0814 (max= 1.5054), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:13:52,702 - root - INFO - Step 7670: lr=1.00E-05, loss= 1.0814 (max= 1.5054), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:13:52,702 - root - INFO - Step 7670: lr=1.00E-05, loss= 1.0814 (max= 1.5054), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:13:52,702 - root - INFO - Step 7670: lr=1.00E-05, loss= 1.0814 (max= 1.5054), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:13:52,702 - root - INFO - Step 7670: lr=1.00E-05, loss= 1.0814 (max= 1.5054), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:13:52,703 - root - INFO - Step 7670: lr=1.00E-05, loss= 1.0814 (max= 1.5054), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:13:52,703 - root - INFO - Step 7670: lr=1.00E-05, loss= 1.0814 (max= 1.5054), tps=20544, mfu=42.80%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:14:24,592 - root - INFO - Step 7680: lr=1.00E-05, loss= 1.0883 (max= 1.6884), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:14:24,592 - root - INFO - Step 7680: lr=1.00E-05, loss= 1.0883 (max= 1.6884), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:14:24,592 - root - INFO - Step 7680: lr=1.00E-05, loss= 1.0883 (max= 1.6884), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:14:24,592 - root - INFO - Step 7680: lr=1.00E-05, loss= 1.0883 (max= 1.6884), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:14:24,592 - root - INFO - Step 7680: lr=1.00E-05, loss= 1.0883 (max= 1.6884), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:14:24,592 - root - INFO - Step 7680: lr=1.00E-05, loss= 1.0883 (max= 1.6884), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:14:24,592 - root - INFO - Step 7680: lr=1.00E-05, loss= 1.0883 (max= 1.6884), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:14:24,592 - root - INFO - Step 7680: lr=1.00E-05, loss= 1.0883 (max= 1.6884), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:14:56,467 - root - INFO - Step 7690: lr=1.00E-05, loss= 1.0694 (max= 1.6180), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:14:56,468 - root - INFO - Step 7690: lr=1.00E-05, loss= 1.0694 (max= 1.6180), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:14:56,468 - root - INFO - Step 7690: lr=1.00E-05, loss= 1.0694 (max= 
1.6180), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:14:56,468 - root - INFO - Step 7690: lr=1.00E-05, loss= 1.0694 (max= 1.6180), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:14:56,468 - root - INFO - Step 7690: lr=1.00E-05, loss= 1.0694 (max= 1.6180), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:14:56,468 - root - INFO - Step 7690: lr=1.00E-05, loss= 1.0694 (max= 1.6180), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:14:56,468 - root - INFO - Step 7690: lr=1.00E-05, loss= 1.0694 (max= 1.6180), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:14:56,468 - root - INFO - Step 7690: lr=1.00E-05, loss= 1.0694 (max= 1.6180), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:15:28,371 - root - INFO - Step 7700: lr=1.00E-05, loss= 1.0906 (max= 1.5681), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:15:28,371 - root - INFO - Step 7700: lr=1.00E-05, loss= 1.0906 (max= 1.5681), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:15:28,371 - root - INFO - Step 7700: lr=1.00E-05, loss= 1.0906 (max= 1.5681), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:15:28,371 - root - INFO - Step 7700: lr=1.00E-05, loss= 1.0906 (max= 1.5681), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:15:28,371 - root - INFO - Step 7700: lr=1.00E-05, loss= 1.0906 (max= 1.5681), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:15:28,371 - root - INFO - Step 
7700: lr=1.00E-05, loss= 1.0906 (max= 1.5681), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:15:28,371 - root - INFO - Step 7700: lr=1.00E-05, loss= 1.0906 (max= 1.5681), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:15:28,371 - root - INFO - Step 7700: lr=1.00E-05, loss= 1.0906 (max= 1.5681), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:16:00,223 - root - INFO - Step 7710: lr=1.00E-05, loss= 1.1151 (max= 1.5994), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:16:00,223 - root - INFO - Step 7710: lr=1.00E-05, loss= 1.1151 (max= 1.5994), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:16:00,223 - root - INFO - Step 7710: lr=1.00E-05, loss= 1.1151 (max= 1.5994), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:16:00,223 - root - INFO - Step 7710: lr=1.00E-05, loss= 1.1151 (max= 1.5994), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:16:00,223 - root - INFO - Step 7710: lr=1.00E-05, loss= 1.1151 (max= 1.5994), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:16:00,223 - root - INFO - Step 7710: lr=1.00E-05, loss= 1.1151 (max= 1.5994), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:16:00,223 - root - INFO - Step 7710: lr=1.00E-05, loss= 1.1151 (max= 1.5994), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:16:00,224 - root - INFO - Step 7710: lr=1.00E-05, loss= 1.1151 (max= 1.5994), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 17:16:32,104 - root - INFO - Step 7720: lr=1.00E-05, loss= 1.0994 (max= 1.5129), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:16:32,104 - root - INFO - Step 7720: lr=1.00E-05, loss= 1.0994 (max= 1.5129), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:16:32,104 - root - INFO - Step 7720: lr=1.00E-05, loss= 1.0994 (max= 1.5129), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:16:32,104 - root - INFO - Step 7720: lr=1.00E-05, loss= 1.0994 (max= 1.5129), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:16:32,104 - root - INFO - Step 7720: lr=1.00E-05, loss= 1.0994 (max= 1.5129), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:16:32,104 - root - INFO - Step 7720: lr=1.00E-05, loss= 1.0994 (max= 1.5129), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:16:32,104 - root - INFO - Step 7720: lr=1.00E-05, loss= 1.0994 (max= 1.5129), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:16:32,104 - root - INFO - Step 7720: lr=1.00E-05, loss= 1.0994 (max= 1.5129), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:17:04,051 - root - INFO - Step 7730: lr=1.00E-05, loss= 1.1021 (max= 1.5469), tps=20516, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:17:04,051 - root - INFO - Step 7730: lr=1.00E-05, loss= 1.1021 (max= 1.5469), tps=20516, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:17:04,051 - root - INFO - Step 7730: lr=1.00E-05, loss= 1.1021 (max= 1.5469), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:17:04,051 - root - INFO - Step 7730: lr=1.00E-05, loss= 1.1021 (max= 1.5469), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:17:04,051 - root - INFO - Step 7730: lr=1.00E-05, loss= 1.1021 (max= 1.5469), tps=20516, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:17:04,052 - root - INFO - Step 7730: lr=1.00E-05, loss= 1.1021 (max= 1.5469), tps=20516, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:17:04,052 - root - INFO - Step 7730: lr=1.00E-05, loss= 1.1021 (max= 1.5469), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:17:04,052 - root - INFO - Step 7730: lr=1.00E-05, loss= 1.1021 (max= 1.5469), tps=20516, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:17:35,926 - root - INFO - Step 7740: lr=1.00E-05, loss= 1.0905 (max= 1.4860), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:17:35,926 - root - INFO - Step 7740: lr=1.00E-05, loss= 1.0905 (max= 1.4860), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:17:35,926 - root - INFO - Step 7740: lr=1.00E-05, loss= 1.0905 (max= 1.4860), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:17:35,926 - root - INFO - Step 7740: lr=1.00E-05, loss= 1.0905 (max= 1.4860), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:17:35,926 - root - INFO - Step 7740: lr=1.00E-05, loss= 1.0905 (max= 1.4860), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:17:35,926 - root - INFO - Step 7740: lr=1.00E-05, loss= 1.0905 (max= 1.4860), tps=20563, 
mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:17:35,926 - root - INFO - Step 7740: lr=1.00E-05, loss= 1.0905 (max= 1.4860), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:17:35,926 - root - INFO - Step 7740: lr=1.00E-05, loss= 1.0905 (max= 1.4860), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:18:07,812 - root - INFO - Step 7750: lr=1.00E-05, loss= 1.1058 (max= 1.5690), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:18:07,812 - root - INFO - Step 7750: lr=1.00E-05, loss= 1.1058 (max= 1.5690), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:18:07,812 - root - INFO - Step 7750: lr=1.00E-05, loss= 1.1058 (max= 1.5690), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:18:07,812 - root - INFO - Step 7750: lr=1.00E-05, loss= 1.1058 (max= 1.5690), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:18:07,812 - root - INFO - Step 7750: lr=1.00E-05, loss= 1.1058 (max= 1.5690), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:18:07,812 - root - INFO - Step 7750: lr=1.00E-05, loss= 1.1058 (max= 1.5690), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:18:07,812 - root - INFO - Step 7750: lr=1.00E-05, loss= 1.1058 (max= 1.5690), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:18:07,812 - root - INFO - Step 7750: lr=1.00E-05, loss= 1.1058 (max= 1.5690), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:18:39,694 - root - INFO - Step 7760: lr=1.00E-05, 
loss= 1.1047 (max= 1.5272), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:18:39,694 - root - INFO - Step 7760: lr=1.00E-05, loss= 1.1047 (max= 1.5272), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:18:39,694 - root - INFO - Step 7760: lr=1.00E-05, loss= 1.1047 (max= 1.5272), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:18:39,694 - root - INFO - Step 7760: lr=1.00E-05, loss= 1.1047 (max= 1.5272), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:18:39,694 - root - INFO - Step 7760: lr=1.00E-05, loss= 1.1047 (max= 1.5272), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:18:39,695 - root - INFO - Step 7760: lr=1.00E-05, loss= 1.1047 (max= 1.5272), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:18:39,695 - root - INFO - Step 7760: lr=1.00E-05, loss= 1.1047 (max= 1.5272), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:18:39,695 - root - INFO - Step 7760: lr=1.00E-05, loss= 1.1047 (max= 1.5272), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:19:11,559 - root - INFO - Step 7770: lr=1.00E-05, loss= 1.1095 (max= 1.5823), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:19:11,559 - root - INFO - Step 7770: lr=1.00E-05, loss= 1.1095 (max= 1.5823), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:19:11,559 - root - INFO - Step 7770: lr=1.00E-05, loss= 1.1095 (max= 1.5823), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:19:11,559 - 
root - INFO - Step 7770: lr=1.00E-05, loss= 1.1095 (max= 1.5823), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:19:11,559 - root - INFO - Step 7770: lr=1.00E-05, loss= 1.1095 (max= 1.5823), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:19:11,559 - root - INFO - Step 7770: lr=1.00E-05, loss= 1.1095 (max= 1.5823), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:19:11,559 - root - INFO - Step 7770: lr=1.00E-05, loss= 1.1095 (max= 1.5823), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:19:11,560 - root - INFO - Step 7770: lr=1.00E-05, loss= 1.1095 (max= 1.5823), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:19:43,369 - root - INFO - Step 7780: lr=1.00E-05, loss= 1.0912 (max= 1.5990), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:19:43,369 - root - INFO - Step 7780: lr=1.00E-05, loss= 1.0912 (max= 1.5990), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:19:43,369 - root - INFO - Step 7780: lr=1.00E-05, loss= 1.0912 (max= 1.5990), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:19:43,369 - root - INFO - Step 7780: lr=1.00E-05, loss= 1.0912 (max= 1.5990), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:19:43,369 - root - INFO - Step 7780: lr=1.00E-05, loss= 1.0912 (max= 1.5990), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:19:43,369 - root - INFO - Step 7780: lr=1.00E-05, loss= 1.0912 (max= 1.5990), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 17:19:43,369 - root - INFO - Step 7780: lr=1.00E-05, loss= 1.0912 (max= 1.5990), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:19:43,369 - root - INFO - Step 7780: lr=1.00E-05, loss= 1.0912 (max= 1.5990), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:20:15,214 - root - INFO - Step 7790: lr=1.00E-05, loss= 1.0942 (max= 1.6361), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:20:15,214 - root - INFO - Step 7790: lr=1.00E-05, loss= 1.0942 (max= 1.6361), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:20:15,214 - root - INFO - Step 7790: lr=1.00E-05, loss= 1.0942 (max= 1.6361), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:20:15,214 - root - INFO - Step 7790: lr=1.00E-05, loss= 1.0942 (max= 1.6361), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:20:15,214 - root - INFO - Step 7790: lr=1.00E-05, loss= 1.0942 (max= 1.6361), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:20:15,214 - root - INFO - Step 7790: lr=1.00E-05, loss= 1.0942 (max= 1.6361), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:20:15,214 - root - INFO - Step 7790: lr=1.00E-05, loss= 1.0942 (max= 1.6361), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:20:15,214 - root - INFO - Step 7790: lr=1.00E-05, loss= 1.0942 (max= 1.6361), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:20:47,064 - root - INFO - Step 7800: lr=1.00E-05, loss= 1.1287 (max= 1.8160), tps=20578, mfu=42.87%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:20:47,064 - root - INFO - Step 7800: lr=1.00E-05, loss= 1.1287 (max= 1.8160), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:20:47,064 - root - INFO - Step 7800: lr=1.00E-05, loss= 1.1287 (max= 1.8160), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:20:47,065 - root - INFO - Step 7800: lr=1.00E-05, loss= 1.1287 (max= 1.8160), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:20:47,065 - root - INFO - Step 7800: lr=1.00E-05, loss= 1.1287 (max= 1.8160), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:20:47,065 - root - INFO - Step 7800: lr=1.00E-05, loss= 1.1287 (max= 1.8160), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:20:47,065 - root - INFO - Step 7800: lr=1.00E-05, loss= 1.1287 (max= 1.8160), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:20:47,065 - root - INFO - Step 7800: lr=1.00E-05, loss= 1.1287 (max= 1.8160), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:21:18,934 - root - INFO - Step 7810: lr=1.00E-05, loss= 1.0947 (max= 1.4970), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:21:18,934 - root - INFO - Step 7810: lr=1.00E-05, loss= 1.0947 (max= 1.4970), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:21:18,934 - root - INFO - Step 7810: lr=1.00E-05, loss= 1.0947 (max= 1.4970), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:21:18,934 - root - INFO - Step 7810: lr=1.00E-05, loss= 1.0947 (max= 
1.4970), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:21:18,934 - root - INFO - Step 7810: lr=1.00E-05, loss= 1.0947 (max= 1.4970), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:21:18,934 - root - INFO - Step 7810: lr=1.00E-05, loss= 1.0947 (max= 1.4970), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:21:18,934 - root - INFO - Step 7810: lr=1.00E-05, loss= 1.0947 (max= 1.4970), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:21:18,934 - root - INFO - Step 7810: lr=1.00E-05, loss= 1.0947 (max= 1.4970), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:21:50,711 - root - INFO - Step 7820: lr=1.00E-05, loss= 1.1044 (max= 1.4867), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:21:50,711 - root - INFO - Step 7820: lr=1.00E-05, loss= 1.1044 (max= 1.4867), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:21:50,711 - root - INFO - Step 7820: lr=1.00E-05, loss= 1.1044 (max= 1.4867), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:21:50,711 - root - INFO - Step 7820: lr=1.00E-05, loss= 1.1044 (max= 1.4867), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:21:50,711 - root - INFO - Step 7820: lr=1.00E-05, loss= 1.1044 (max= 1.4867), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:21:50,711 - root - INFO - Step 7820: lr=1.00E-05, loss= 1.1044 (max= 1.4867), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:21:50,711 - root - INFO - Step 
7820: lr=1.00E-05, loss= 1.1044 (max= 1.4867), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:21:50,711 - root - INFO - Step 7820: lr=1.00E-05, loss= 1.1044 (max= 1.4867), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:22:22,528 - root - INFO - Step 7830: lr=1.00E-05, loss= 1.1196 (max= 1.6708), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:22:22,528 - root - INFO - Step 7830: lr=1.00E-05, loss= 1.1196 (max= 1.6708), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:22:22,528 - root - INFO - Step 7830: lr=1.00E-05, loss= 1.1196 (max= 1.6708), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:22:22,528 - root - INFO - Step 7830: lr=1.00E-05, loss= 1.1196 (max= 1.6708), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:22:22,528 - root - INFO - Step 7830: lr=1.00E-05, loss= 1.1196 (max= 1.6708), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:22:22,528 - root - INFO - Step 7830: lr=1.00E-05, loss= 1.1196 (max= 1.6708), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:22:22,528 - root - INFO - Step 7830: lr=1.00E-05, loss= 1.1196 (max= 1.6708), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:22:22,528 - root - INFO - Step 7830: lr=1.00E-05, loss= 1.1196 (max= 1.6708), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:22:54,387 - root - INFO - Step 7840: lr=1.00E-05, loss= 1.1260 (max= 1.8934), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 17:22:54,387 - root - INFO - Step 7840: lr=1.00E-05, loss= 1.1260 (max= 1.8934), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:22:54,387 - root - INFO - Step 7840: lr=1.00E-05, loss= 1.1260 (max= 1.8934), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:22:54,387 - root - INFO - Step 7840: lr=1.00E-05, loss= 1.1260 (max= 1.8934), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:22:54,387 - root - INFO - Step 7840: lr=1.00E-05, loss= 1.1260 (max= 1.8934), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:22:54,387 - root - INFO - Step 7840: lr=1.00E-05, loss= 1.1260 (max= 1.8934), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:22:54,387 - root - INFO - Step 7840: lr=1.00E-05, loss= 1.1260 (max= 1.8934), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:22:54,387 - root - INFO - Step 7840: lr=1.00E-05, loss= 1.1260 (max= 1.8934), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:23:26,253 - root - INFO - Step 7850: lr=1.00E-05, loss= 1.0865 (max= 1.5259), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:23:26,253 - root - INFO - Step 7850: lr=1.00E-05, loss= 1.0865 (max= 1.5259), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:23:26,253 - root - INFO - Step 7850: lr=1.00E-05, loss= 1.0865 (max= 1.5259), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:23:26,253 - root - INFO - Step 7850: lr=1.00E-05, loss= 1.0865 (max= 1.5259), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:23:26,253 - root - INFO - Step 7850: lr=1.00E-05, loss= 1.0865 (max= 1.5259), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:23:26,253 - root - INFO - Step 7850: lr=1.00E-05, loss= 1.0865 (max= 1.5259), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:23:26,253 - root - INFO - Step 7850: lr=1.00E-05, loss= 1.0865 (max= 1.5259), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:23:26,253 - root - INFO - Step 7850: lr=1.00E-05, loss= 1.0865 (max= 1.5259), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:23:58,165 - root - INFO - Step 7860: lr=1.00E-05, loss= 1.0965 (max= 1.6204), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:23:58,165 - root - INFO - Step 7860: lr=1.00E-05, loss= 1.0965 (max= 1.6204), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:23:58,165 - root - INFO - Step 7860: lr=1.00E-05, loss= 1.0965 (max= 1.6204), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:23:58,165 - root - INFO - Step 7860: lr=1.00E-05, loss= 1.0965 (max= 1.6204), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:23:58,165 - root - INFO - Step 7860: lr=1.00E-05, loss= 1.0965 (max= 1.6204), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:23:58,165 - root - INFO - Step 7860: lr=1.00E-05, loss= 1.0965 (max= 1.6204), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:23:58,165 - root - INFO - Step 7860: lr=1.00E-05, loss= 1.0965 (max= 1.6204), tps=20539, 
mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:23:58,165 - root - INFO - Step 7860: lr=1.00E-05, loss= 1.0965 (max= 1.6204), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:24:30,053 - root - INFO - Step 7870: lr=1.00E-05, loss= 1.1161 (max= 1.5186), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:24:30,053 - root - INFO - Step 7870: lr=1.00E-05, loss= 1.1161 (max= 1.5186), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:24:30,053 - root - INFO - Step 7870: lr=1.00E-05, loss= 1.1161 (max= 1.5186), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:24:30,053 - root - INFO - Step 7870: lr=1.00E-05, loss= 1.1161 (max= 1.5186), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:24:30,053 - root - INFO - Step 7870: lr=1.00E-05, loss= 1.1161 (max= 1.5186), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:24:30,053 - root - INFO - Step 7870: lr=1.00E-05, loss= 1.1161 (max= 1.5186), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:24:30,053 - root - INFO - Step 7870: lr=1.00E-05, loss= 1.1161 (max= 1.5186), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:24:30,054 - root - INFO - Step 7870: lr=1.00E-05, loss= 1.1161 (max= 1.5186), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:25:01,843 - root - INFO - Step 7880: lr=1.00E-05, loss= 1.1107 (max= 1.5300), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:25:01,843 - root - INFO - Step 7880: lr=1.00E-05, 
loss= 1.1107 (max= 1.5300), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:25:01,843 - root - INFO - Step 7880: lr=1.00E-05, loss= 1.1107 (max= 1.5300), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:25:01,843 - root - INFO - Step 7880: lr=1.00E-05, loss= 1.1107 (max= 1.5300), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:25:01,843 - root - INFO - Step 7880: lr=1.00E-05, loss= 1.1107 (max= 1.5300), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:25:01,843 - root - INFO - Step 7880: lr=1.00E-05, loss= 1.1107 (max= 1.5300), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:25:01,843 - root - INFO - Step 7880: lr=1.00E-05, loss= 1.1107 (max= 1.5300), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:25:01,843 - root - INFO - Step 7880: lr=1.00E-05, loss= 1.1107 (max= 1.5300), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:25:33,647 - root - INFO - Step 7890: lr=1.00E-05, loss= 1.1099 (max= 1.7518), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:25:33,647 - root - INFO - Step 7890: lr=1.00E-05, loss= 1.1099 (max= 1.7518), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:25:33,647 - root - INFO - Step 7890: lr=1.00E-05, loss= 1.1099 (max= 1.7518), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:25:33,647 - root - INFO - Step 7890: lr=1.00E-05, loss= 1.1099 (max= 1.7518), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:25:33,647 - 
root - INFO - Step 7890: lr=1.00E-05, loss= 1.1099 (max= 1.7518), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:25:33,647 - root - INFO - Step 7890: lr=1.00E-05, loss= 1.1099 (max= 1.7518), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:25:33,647 - root - INFO - Step 7890: lr=1.00E-05, loss= 1.1099 (max= 1.7518), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:25:33,647 - root - INFO - Step 7890: lr=1.00E-05, loss= 1.1099 (max= 1.7518), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:26:05,553 - root - INFO - Step 7900: lr=1.00E-05, loss= 1.1349 (max= 1.5700), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:26:05,553 - root - INFO - Step 7900: lr=1.00E-05, loss= 1.1349 (max= 1.5700), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:26:05,553 - root - INFO - Step 7900: lr=1.00E-05, loss= 1.1349 (max= 1.5700), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:26:05,553 - root - INFO - Step 7900: lr=1.00E-05, loss= 1.1349 (max= 1.5700), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:26:05,553 - root - INFO - Step 7900: lr=1.00E-05, loss= 1.1349 (max= 1.5700), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:26:05,553 - root - INFO - Step 7900: lr=1.00E-05, loss= 1.1349 (max= 1.5700), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:26:05,553 - root - INFO - Step 7900: lr=1.00E-05, loss= 1.1349 (max= 1.5700), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 17:26:05,553 - root - INFO - Step 7900: lr=1.00E-05, loss= 1.1349 (max= 1.5700), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:26:37,424 - root - INFO - Step 7910: lr=1.00E-05, loss= 1.1196 (max= 1.4757), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:26:37,424 - root - INFO - Step 7910: lr=1.00E-05, loss= 1.1196 (max= 1.4757), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:26:37,424 - root - INFO - Step 7910: lr=1.00E-05, loss= 1.1196 (max= 1.4757), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:26:37,424 - root - INFO - Step 7910: lr=1.00E-05, loss= 1.1196 (max= 1.4757), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:26:37,424 - root - INFO - Step 7910: lr=1.00E-05, loss= 1.1196 (max= 1.4757), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:26:37,424 - root - INFO - Step 7910: lr=1.00E-05, loss= 1.1196 (max= 1.4757), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:26:37,424 - root - INFO - Step 7910: lr=1.00E-05, loss= 1.1196 (max= 1.4757), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:26:37,424 - root - INFO - Step 7910: lr=1.00E-05, loss= 1.1196 (max= 1.4757), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:27:09,265 - root - INFO - Step 7920: lr=1.00E-05, loss= 1.1002 (max= 1.6061), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:27:09,265 - root - INFO - Step 7920: lr=1.00E-05, loss= 1.1002 (max= 1.6061), tps=20585, mfu=42.89%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:27:09,265 - root - INFO - Step 7920: lr=1.00E-05, loss= 1.1002 (max= 1.6061), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:27:09,265 - root - INFO - Step 7920: lr=1.00E-05, loss= 1.1002 (max= 1.6061), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:27:09,265 - root - INFO - Step 7920: lr=1.00E-05, loss= 1.1002 (max= 1.6061), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:27:09,265 - root - INFO - Step 7920: lr=1.00E-05, loss= 1.1002 (max= 1.6061), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:27:09,266 - root - INFO - Step 7920: lr=1.00E-05, loss= 1.1002 (max= 1.6061), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:27:09,266 - root - INFO - Step 7920: lr=1.00E-05, loss= 1.1002 (max= 1.6061), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:27:41,118 - root - INFO - Step 7930: lr=1.00E-05, loss= 1.1135 (max= 1.7215), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:27:41,118 - root - INFO - Step 7930: lr=1.00E-05, loss= 1.1135 (max= 1.7215), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:27:41,118 - root - INFO - Step 7930: lr=1.00E-05, loss= 1.1135 (max= 1.7215), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:27:41,118 - root - INFO - Step 7930: lr=1.00E-05, loss= 1.1135 (max= 1.7215), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:27:41,118 - root - INFO - Step 7930: lr=1.00E-05, loss= 1.1135 (max= 
1.7215), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:27:41,118 - root - INFO - Step 7930: lr=1.00E-05, loss= 1.1135 (max= 1.7215), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:27:41,118 - root - INFO - Step 7930: lr=1.00E-05, loss= 1.1135 (max= 1.7215), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:27:41,118 - root - INFO - Step 7930: lr=1.00E-05, loss= 1.1135 (max= 1.7215), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:28:12,991 - root - INFO - Step 7940: lr=1.00E-05, loss= 1.1166 (max= 1.5030), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:28:12,991 - root - INFO - Step 7940: lr=1.00E-05, loss= 1.1166 (max= 1.5030), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:28:12,991 - root - INFO - Step 7940: lr=1.00E-05, loss= 1.1166 (max= 1.5030), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:28:12,991 - root - INFO - Step 7940: lr=1.00E-05, loss= 1.1166 (max= 1.5030), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:28:12,991 - root - INFO - Step 7940: lr=1.00E-05, loss= 1.1166 (max= 1.5030), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:28:12,991 - root - INFO - Step 7940: lr=1.00E-05, loss= 1.1166 (max= 1.5030), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:28:12,991 - root - INFO - Step 7940: lr=1.00E-05, loss= 1.1166 (max= 1.5030), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:28:12,991 - root - INFO - Step 
7940: lr=1.00E-05, loss= 1.1166 (max= 1.5030), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:28:44,750 - root - INFO - Step 7950: lr=1.00E-05, loss= 1.1220 (max= 1.7340), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:28:44,750 - root - INFO - Step 7950: lr=1.00E-05, loss= 1.1220 (max= 1.7340), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:28:44,750 - root - INFO - Step 7950: lr=1.00E-05, loss= 1.1220 (max= 1.7340), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:28:44,750 - root - INFO - Step 7950: lr=1.00E-05, loss= 1.1220 (max= 1.7340), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:28:44,750 - root - INFO - Step 7950: lr=1.00E-05, loss= 1.1220 (max= 1.7340), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:28:44,750 - root - INFO - Step 7950: lr=1.00E-05, loss= 1.1220 (max= 1.7340), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:28:44,750 - root - INFO - Step 7950: lr=1.00E-05, loss= 1.1220 (max= 1.7340), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:28:44,750 - root - INFO - Step 7950: lr=1.00E-05, loss= 1.1220 (max= 1.7340), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:29:16,645 - root - INFO - Step 7960: lr=1.00E-05, loss= 1.1083 (max= 1.5861), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:29:16,645 - root - INFO - Step 7960: lr=1.00E-05, loss= 1.1083 (max= 1.5861), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 17:29:16,645 - root - INFO - Step 7960: lr=1.00E-05, loss= 1.1083 (max= 1.5861), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:29:16,646 - root - INFO - Step 7960: lr=1.00E-05, loss= 1.1083 (max= 1.5861), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:29:16,646 - root - INFO - Step 7960: lr=1.00E-05, loss= 1.1083 (max= 1.5861), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:29:16,646 - root - INFO - Step 7960: lr=1.00E-05, loss= 1.1083 (max= 1.5861), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:29:16,646 - root - INFO - Step 7960: lr=1.00E-05, loss= 1.1083 (max= 1.5861), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:29:16,646 - root - INFO - Step 7960: lr=1.00E-05, loss= 1.1083 (max= 1.5861), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:29:48,555 - root - INFO - Step 7970: lr=1.00E-05, loss= 1.1243 (max= 1.5630), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:29:48,555 - root - INFO - Step 7970: lr=1.00E-05, loss= 1.1243 (max= 1.5630), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:29:48,555 - root - INFO - Step 7970: lr=1.00E-05, loss= 1.1243 (max= 1.5630), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:29:48,555 - root - INFO - Step 7970: lr=1.00E-05, loss= 1.1243 (max= 1.5630), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:29:48,555 - root - INFO - Step 7970: lr=1.00E-05, loss= 1.1243 (max= 1.5630), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:29:48,555 - root - INFO - Step 7970: lr=1.00E-05, loss= 1.1243 (max= 1.5630), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:29:48,555 - root - INFO - Step 7970: lr=1.00E-05, loss= 1.1243 (max= 1.5630), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:29:48,555 - root - INFO - Step 7970: lr=1.00E-05, loss= 1.1243 (max= 1.5630), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:30:20,394 - root - INFO - Step 7980: lr=1.00E-05, loss= 1.1229 (max= 1.7264), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:30:20,394 - root - INFO - Step 7980: lr=1.00E-05, loss= 1.1229 (max= 1.7264), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:30:20,394 - root - INFO - Step 7980: lr=1.00E-05, loss= 1.1229 (max= 1.7264), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:30:20,394 - root - INFO - Step 7980: lr=1.00E-05, loss= 1.1229 (max= 1.7264), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:30:20,394 - root - INFO - Step 7980: lr=1.00E-05, loss= 1.1229 (max= 1.7264), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:30:20,394 - root - INFO - Step 7980: lr=1.00E-05, loss= 1.1229 (max= 1.7264), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:30:20,394 - root - INFO - Step 7980: lr=1.00E-05, loss= 1.1229 (max= 1.7264), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:30:20,394 - root - INFO - Step 7980: lr=1.00E-05, loss= 1.1229 (max= 1.7264), tps=20585, 
mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:30:52,251 - root - INFO - Step 7990: lr=1.00E-05, loss= 1.1102 (max= 1.5017), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:30:52,251 - root - INFO - Step 7990: lr=1.00E-05, loss= 1.1102 (max= 1.5017), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:30:52,251 - root - INFO - Step 7990: lr=1.00E-05, loss= 1.1102 (max= 1.5017), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:30:52,251 - root - INFO - Step 7990: lr=1.00E-05, loss= 1.1102 (max= 1.5017), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:30:52,251 - root - INFO - Step 7990: lr=1.00E-05, loss= 1.1102 (max= 1.5017), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:30:52,251 - root - INFO - Step 7990: lr=1.00E-05, loss= 1.1102 (max= 1.5017), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:30:52,251 - root - INFO - Step 7990: lr=1.00E-05, loss= 1.1102 (max= 1.5017), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:30:52,251 - root - INFO - Step 7990: lr=1.00E-05, loss= 1.1102 (max= 1.5017), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-pt/checkpoints/dataloader/step-8000 +Dataset successfully saved to jobs/munin-7b-open-pt/checkpoints/dataloader/step-8000! 
Save time: 4.495220899581909 +2025-10-25 17:31:24,114 - root - INFO - Step 8000: lr=1.00E-05, loss= 1.1110 (max= 1.5223), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:31:24,114 - root - INFO - Step 8000: lr=1.00E-05, loss= 1.1110 (max= 1.5223), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:31:24,114 - root - INFO - Saving a full checkpoint at step 8000 +2025-10-25 17:31:24,114 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 17:31:24,114 - root - INFO - Saving a full checkpoint at step 8000 +2025-10-25 17:31:24,114 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 17:31:24,114 - root - INFO - Step 8000: lr=1.00E-05, loss= 1.1110 (max= 1.5223), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:31:24,114 - root - INFO - Step 8000: lr=1.00E-05, loss= 1.1110 (max= 1.5223), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:31:24,114 - root - INFO - Step 8000: lr=1.00E-05, loss= 1.1110 (max= 1.5223), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:31:24,114 - root - INFO - Saving a full checkpoint at step 8000 +2025-10-25 17:31:24,114 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 17:31:24,114 - root - INFO - Saving a full checkpoint at step 8000 +2025-10-25 17:31:24,114 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 17:31:24,114 - root - INFO - Saving a full checkpoint at step 8000 +2025-10-25 17:31:24,114 - root - INFO - Step 8000: lr=1.00E-05, loss= 1.1110 
(max= 1.5223), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:31:24,114 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 17:31:24,114 - root - INFO - Step 8000: lr=1.00E-05, loss= 1.1110 (max= 1.5223), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:31:24,114 - root - INFO - Step 8000: lr=1.00E-05, loss= 1.1110 (max= 1.5223), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:31:24,114 - root - INFO - Saving a full checkpoint at step 8000 +2025-10-25 17:31:24,114 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 17:31:24,114 - root - INFO - Saving a full checkpoint at step 8000 +2025-10-25 17:31:24,114 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 17:31:24,114 - root - INFO - Saving a full checkpoint at step 8000 +2025-10-25 17:31:24,114 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 17:31:40,009 - root - INFO - Finished saving the checkpoint in 15.90 seconds +2025-10-25 17:31:40,017 - root - INFO - Finished saving the checkpoint in 15.90 seconds +2025-10-25 17:31:40,017 - root - INFO - Finished saving the checkpoint in 15.90 seconds +2025-10-25 17:31:40,018 - root - INFO - Finished saving the checkpoint in 15.90 seconds +2025-10-25 17:31:40,018 - root - INFO - Finished saving the checkpoint in 15.90 seconds +2025-10-25 17:31:40,019 - root - INFO - Finished saving the checkpoint in 15.90 seconds +2025-10-25 17:31:40,019 - root - INFO - Finished saving the checkpoint in 15.90 seconds +2025-10-25 17:31:40,020 - root - INFO - Finished saving the checkpoint in 15.91 seconds 
+2025-10-25 17:32:11,798 - root - INFO - Step 8010: lr=1.00E-05, loss= 1.0918 (max= 1.4760), tps=13744, mfu=28.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:32:11,798 - root - INFO - Step 8010: lr=1.00E-05, loss= 1.0918 (max= 1.4760), tps=13744, mfu=28.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:32:11,799 - root - INFO - Step 8010: lr=1.00E-05, loss= 1.0918 (max= 1.4760), tps=13744, mfu=28.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:32:11,799 - root - INFO - Step 8010: lr=1.00E-05, loss= 1.0918 (max= 1.4760), tps=13745, mfu=28.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:32:11,799 - root - INFO - Step 8010: lr=1.00E-05, loss= 1.0918 (max= 1.4760), tps=13745, mfu=28.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:32:11,799 - root - INFO - Step 8010: lr=1.00E-05, loss= 1.0918 (max= 1.4760), tps=13744, mfu=28.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:32:11,799 - root - INFO - Step 8010: lr=1.00E-05, loss= 1.0918 (max= 1.4760), tps=13745, mfu=28.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:32:11,799 - root - INFO - Step 8010: lr=1.00E-05, loss= 1.0918 (max= 1.4760), tps=13744, mfu=28.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:32:43,652 - root - INFO - Step 8020: lr=1.00E-05, loss= 1.1117 (max= 1.5009), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:32:43,652 - root - INFO - Step 8020: lr=1.00E-05, loss= 1.1117 (max= 1.5009), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:32:43,652 - root - INFO - Step 8020: lr=1.00E-05, loss= 1.1117 (max= 1.5009), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:32:43,652 - root - INFO - Step 8020: lr=1.00E-05, loss= 1.1117 (max= 1.5009), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:32:43,652 - root - INFO - Step 8020: lr=1.00E-05, loss= 1.1117 (max= 1.5009), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:32:43,652 - root - INFO - Step 8020: lr=1.00E-05, loss= 1.1117 (max= 1.5009), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:32:43,652 - root - INFO - Step 8020: lr=1.00E-05, loss= 1.1117 (max= 1.5009), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:32:43,652 - root - INFO - Step 8020: lr=1.00E-05, loss= 1.1117 (max= 1.5009), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:33:15,525 - root - INFO - Step 8030: lr=1.00E-05, loss= 1.1176 (max= 1.5047), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:33:15,525 - root - INFO - Step 8030: lr=1.00E-05, loss= 1.1176 (max= 1.5047), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:33:15,525 - root - INFO - Step 8030: lr=1.00E-05, loss= 1.1176 (max= 1.5047), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:33:15,526 - root - INFO - Step 8030: lr=1.00E-05, loss= 1.1176 (max= 1.5047), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:33:15,526 - root - INFO - Step 8030: lr=1.00E-05, loss= 1.1176 (max= 1.5047), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:33:15,526 - root - INFO - Step 8030: lr=1.00E-05, loss= 1.1176 (max= 1.5047), tps=20564, 
mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:33:15,526 - root - INFO - Step 8030: lr=1.00E-05, loss= 1.1176 (max= 1.5047), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:33:15,526 - root - INFO - Step 8030: lr=1.00E-05, loss= 1.1176 (max= 1.5047), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:33:47,331 - root - INFO - Step 8040: lr=1.00E-05, loss= 1.1082 (max= 1.5479), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:33:47,331 - root - INFO - Step 8040: lr=1.00E-05, loss= 1.1082 (max= 1.5479), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:33:47,331 - root - INFO - Step 8040: lr=1.00E-05, loss= 1.1082 (max= 1.5479), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:33:47,331 - root - INFO - Step 8040: lr=1.00E-05, loss= 1.1082 (max= 1.5479), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:33:47,331 - root - INFO - Step 8040: lr=1.00E-05, loss= 1.1082 (max= 1.5479), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:33:47,331 - root - INFO - Step 8040: lr=1.00E-05, loss= 1.1082 (max= 1.5479), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:33:47,331 - root - INFO - Step 8040: lr=1.00E-05, loss= 1.1082 (max= 1.5479), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:33:47,331 - root - INFO - Step 8040: lr=1.00E-05, loss= 1.1082 (max= 1.5479), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:34:19,188 - root - INFO - Step 8050: lr=1.00E-05, 
loss= 1.1081 (max= 1.6245), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:34:19,188 - root - INFO - Step 8050: lr=1.00E-05, loss= 1.1081 (max= 1.6245), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:34:19,188 - root - INFO - Step 8050: lr=1.00E-05, loss= 1.1081 (max= 1.6245), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:34:19,189 - root - INFO - Step 8050: lr=1.00E-05, loss= 1.1081 (max= 1.6245), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:34:19,189 - root - INFO - Step 8050: lr=1.00E-05, loss= 1.1081 (max= 1.6245), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:34:19,189 - root - INFO - Step 8050: lr=1.00E-05, loss= 1.1081 (max= 1.6245), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:34:19,189 - root - INFO - Step 8050: lr=1.00E-05, loss= 1.1081 (max= 1.6245), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:34:19,189 - root - INFO - Step 8050: lr=1.00E-05, loss= 1.1081 (max= 1.6245), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:34:51,010 - root - INFO - Step 8060: lr=1.00E-05, loss= 1.1098 (max= 1.5144), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:34:51,010 - root - INFO - Step 8060: lr=1.00E-05, loss= 1.1098 (max= 1.5144), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:34:51,010 - root - INFO - Step 8060: lr=1.00E-05, loss= 1.1098 (max= 1.5144), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:34:51,010 - 
root - INFO - Step 8060: lr=1.00E-05, loss= 1.1098 (max= 1.5144), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:34:51,010 - root - INFO - Step 8060: lr=1.00E-05, loss= 1.1098 (max= 1.5144), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:34:51,010 - root - INFO - Step 8060: lr=1.00E-05, loss= 1.1098 (max= 1.5144), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:34:51,010 - root - INFO - Step 8060: lr=1.00E-05, loss= 1.1098 (max= 1.5144), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:34:51,010 - root - INFO - Step 8060: lr=1.00E-05, loss= 1.1098 (max= 1.5144), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:35:22,828 - root - INFO - Step 8070: lr=1.00E-05, loss= 1.1076 (max= 1.5294), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:35:22,828 - root - INFO - Step 8070: lr=1.00E-05, loss= 1.1076 (max= 1.5294), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:35:22,828 - root - INFO - Step 8070: lr=1.00E-05, loss= 1.1076 (max= 1.5294), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:35:22,828 - root - INFO - Step 8070: lr=1.00E-05, loss= 1.1076 (max= 1.5294), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:35:22,828 - root - INFO - Step 8070: lr=1.00E-05, loss= 1.1076 (max= 1.5294), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:35:22,828 - root - INFO - Step 8070: lr=1.00E-05, loss= 1.1076 (max= 1.5294), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 17:35:22,828 - root - INFO - Step 8070: lr=1.00E-05, loss= 1.1076 (max= 1.5294), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:35:22,828 - root - INFO - Step 8070: lr=1.00E-05, loss= 1.1076 (max= 1.5294), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:35:54,650 - root - INFO - Step 8080: lr=1.00E-05, loss= 1.1021 (max= 1.5241), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:35:54,650 - root - INFO - Step 8080: lr=1.00E-05, loss= 1.1021 (max= 1.5241), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:35:54,650 - root - INFO - Step 8080: lr=1.00E-05, loss= 1.1021 (max= 1.5241), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:35:54,650 - root - INFO - Step 8080: lr=1.00E-05, loss= 1.1021 (max= 1.5241), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:35:54,650 - root - INFO - Step 8080: lr=1.00E-05, loss= 1.1021 (max= 1.5241), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:35:54,650 - root - INFO - Step 8080: lr=1.00E-05, loss= 1.1021 (max= 1.5241), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:35:54,650 - root - INFO - Step 8080: lr=1.00E-05, loss= 1.1021 (max= 1.5241), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:35:54,650 - root - INFO - Step 8080: lr=1.00E-05, loss= 1.1021 (max= 1.5241), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:36:26,422 - root - INFO - Step 8090: lr=1.00E-05, loss= 1.1133 (max= 1.5212), tps=20629, mfu=42.98%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:36:26,422 - root - INFO - Step 8090: lr=1.00E-05, loss= 1.1133 (max= 1.5212), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:36:26,422 - root - INFO - Step 8090: lr=1.00E-05, loss= 1.1133 (max= 1.5212), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:36:26,422 - root - INFO - Step 8090: lr=1.00E-05, loss= 1.1133 (max= 1.5212), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:36:26,422 - root - INFO - Step 8090: lr=1.00E-05, loss= 1.1133 (max= 1.5212), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:36:26,422 - root - INFO - Step 8090: lr=1.00E-05, loss= 1.1133 (max= 1.5212), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:36:26,422 - root - INFO - Step 8090: lr=1.00E-05, loss= 1.1133 (max= 1.5212), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:36:26,422 - root - INFO - Step 8090: lr=1.00E-05, loss= 1.1133 (max= 1.5212), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:36:58,247 - root - INFO - Step 8100: lr=1.00E-05, loss= 1.0949 (max= 1.5742), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:36:58,247 - root - INFO - Step 8100: lr=1.00E-05, loss= 1.0949 (max= 1.5742), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:36:58,247 - root - INFO - Step 8100: lr=1.00E-05, loss= 1.0949 (max= 1.5742), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:36:58,247 - root - INFO - Step 8100: lr=1.00E-05, loss= 1.0949 (max= 
1.5742), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:36:58,248 - root - INFO - Step 8100: lr=1.00E-05, loss= 1.0949 (max= 1.5742), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:36:58,248 - root - INFO - Step 8100: lr=1.00E-05, loss= 1.0949 (max= 1.5742), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:36:58,248 - root - INFO - Step 8100: lr=1.00E-05, loss= 1.0949 (max= 1.5742), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:36:58,248 - root - INFO - Step 8100: lr=1.00E-05, loss= 1.0949 (max= 1.5742), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:37:30,080 - root - INFO - Step 8110: lr=1.00E-05, loss= 1.1041 (max= 1.5814), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:37:30,080 - root - INFO - Step 8110: lr=1.00E-05, loss= 1.1041 (max= 1.5814), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:37:30,080 - root - INFO - Step 8110: lr=1.00E-05, loss= 1.1041 (max= 1.5814), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:37:30,081 - root - INFO - Step 8110: lr=1.00E-05, loss= 1.1041 (max= 1.5814), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:37:30,081 - root - INFO - Step 8110: lr=1.00E-05, loss= 1.1041 (max= 1.5814), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:37:30,081 - root - INFO - Step 8110: lr=1.00E-05, loss= 1.1041 (max= 1.5814), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:37:30,081 - root - INFO - Step 
8110: lr=1.00E-05, loss= 1.1041 (max= 1.5814), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:37:30,081 - root - INFO - Step 8110: lr=1.00E-05, loss= 1.1041 (max= 1.5814), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:38:01,926 - root - INFO - Step 8120: lr=1.00E-05, loss= 1.1243 (max= 1.6615), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:38:01,926 - root - INFO - Step 8120: lr=1.00E-05, loss= 1.1243 (max= 1.6615), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:38:01,926 - root - INFO - Step 8120: lr=1.00E-05, loss= 1.1243 (max= 1.6615), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:38:01,926 - root - INFO - Step 8120: lr=1.00E-05, loss= 1.1243 (max= 1.6615), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:38:01,926 - root - INFO - Step 8120: lr=1.00E-05, loss= 1.1243 (max= 1.6615), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:38:01,926 - root - INFO - Step 8120: lr=1.00E-05, loss= 1.1243 (max= 1.6615), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:38:01,926 - root - INFO - Step 8120: lr=1.00E-05, loss= 1.1243 (max= 1.6615), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:38:01,926 - root - INFO - Step 8120: lr=1.00E-05, loss= 1.1243 (max= 1.6615), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:38:33,786 - root - INFO - Step 8130: lr=1.00E-05, loss= 1.1081 (max= 1.4649), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 17:38:33,786 - root - INFO - Step 8130: lr=1.00E-05, loss= 1.1081 (max= 1.4649), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:38:33,786 - root - INFO - Step 8130: lr=1.00E-05, loss= 1.1081 (max= 1.4649), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:38:33,787 - root - INFO - Step 8130: lr=1.00E-05, loss= 1.1081 (max= 1.4649), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:38:33,787 - root - INFO - Step 8130: lr=1.00E-05, loss= 1.1081 (max= 1.4649), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:38:33,787 - root - INFO - Step 8130: lr=1.00E-05, loss= 1.1081 (max= 1.4649), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:38:33,787 - root - INFO - Step 8130: lr=1.00E-05, loss= 1.1081 (max= 1.4649), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:38:33,787 - root - INFO - Step 8130: lr=1.00E-05, loss= 1.1081 (max= 1.4649), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:39:05,559 - root - INFO - Step 8140: lr=1.00E-05, loss= 1.1139 (max= 1.5341), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:39:05,559 - root - INFO - Step 8140: lr=1.00E-05, loss= 1.1139 (max= 1.5341), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:39:05,559 - root - INFO - Step 8140: lr=1.00E-05, loss= 1.1139 (max= 1.5341), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:39:05,559 - root - INFO - Step 8140: lr=1.00E-05, loss= 1.1139 (max= 1.5341), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:39:05,559 - root - INFO - Step 8140: lr=1.00E-05, loss= 1.1139 (max= 1.5341), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:39:05,559 - root - INFO - Step 8140: lr=1.00E-05, loss= 1.1139 (max= 1.5341), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:39:05,559 - root - INFO - Step 8140: lr=1.00E-05, loss= 1.1139 (max= 1.5341), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:39:05,559 - root - INFO - Step 8140: lr=1.00E-05, loss= 1.1139 (max= 1.5341), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:39:37,481 - root - INFO - Step 8150: lr=1.00E-05, loss= 1.1245 (max= 1.5850), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:39:37,481 - root - INFO - Step 8150: lr=1.00E-05, loss= 1.1245 (max= 1.5850), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:39:37,482 - root - INFO - Step 8150: lr=1.00E-05, loss= 1.1245 (max= 1.5850), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:39:37,482 - root - INFO - Step 8150: lr=1.00E-05, loss= 1.1245 (max= 1.5850), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:39:37,482 - root - INFO - Step 8150: lr=1.00E-05, loss= 1.1245 (max= 1.5850), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:39:37,482 - root - INFO - Step 8150: lr=1.00E-05, loss= 1.1245 (max= 1.5850), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:39:37,482 - root - INFO - Step 8150: lr=1.00E-05, loss= 1.1245 (max= 1.5850), tps=20532, 
mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:39:37,482 - root - INFO - Step 8150: lr=1.00E-05, loss= 1.1245 (max= 1.5850), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:40:09,427 - root - INFO - Step 8160: lr=1.00E-05, loss= 1.1336 (max= 1.5178), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:40:09,427 - root - INFO - Step 8160: lr=1.00E-05, loss= 1.1336 (max= 1.5178), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:40:09,427 - root - INFO - Step 8160: lr=1.00E-05, loss= 1.1336 (max= 1.5178), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:40:09,427 - root - INFO - Step 8160: lr=1.00E-05, loss= 1.1336 (max= 1.5178), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:40:09,427 - root - INFO - Step 8160: lr=1.00E-05, loss= 1.1336 (max= 1.5178), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:40:09,427 - root - INFO - Step 8160: lr=1.00E-05, loss= 1.1336 (max= 1.5178), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:40:09,427 - root - INFO - Step 8160: lr=1.00E-05, loss= 1.1336 (max= 1.5178), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:40:09,427 - root - INFO - Step 8160: lr=1.00E-05, loss= 1.1336 (max= 1.5178), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:40:41,320 - root - INFO - Step 8170: lr=1.00E-05, loss= 1.1160 (max= 1.5079), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:40:41,320 - root - INFO - Step 8170: lr=1.00E-05, 
loss= 1.1160 (max= 1.5079), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:40:41,320 - root - INFO - Step 8170: lr=1.00E-05, loss= 1.1160 (max= 1.5079), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:40:41,320 - root - INFO - Step 8170: lr=1.00E-05, loss= 1.1160 (max= 1.5079), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:40:41,320 - root - INFO - Step 8170: lr=1.00E-05, loss= 1.1160 (max= 1.5079), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:40:41,320 - root - INFO - Step 8170: lr=1.00E-05, loss= 1.1160 (max= 1.5079), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:40:41,320 - root - INFO - Step 8170: lr=1.00E-05, loss= 1.1160 (max= 1.5079), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:40:41,320 - root - INFO - Step 8170: lr=1.00E-05, loss= 1.1160 (max= 1.5079), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:41:13,180 - root - INFO - Step 8180: lr=1.00E-05, loss= 1.1191 (max= 1.5618), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:41:13,180 - root - INFO - Step 8180: lr=1.00E-05, loss= 1.1191 (max= 1.5618), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:41:13,180 - root - INFO - Step 8180: lr=1.00E-05, loss= 1.1191 (max= 1.5618), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:41:13,180 - root - INFO - Step 8180: lr=1.00E-05, loss= 1.1191 (max= 1.5618), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:41:13,180 - 
root - INFO - Step 8180: lr=1.00E-05, loss= 1.1191 (max= 1.5618), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:41:13,180 - root - INFO - Step 8180: lr=1.00E-05, loss= 1.1191 (max= 1.5618), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:41:13,180 - root - INFO - Step 8180: lr=1.00E-05, loss= 1.1191 (max= 1.5618), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:41:13,180 - root - INFO - Step 8180: lr=1.00E-05, loss= 1.1191 (max= 1.5618), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:41:45,078 - root - INFO - Step 8190: lr=1.00E-05, loss= 1.1316 (max= 1.4909), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:41:45,078 - root - INFO - Step 8190: lr=1.00E-05, loss= 1.1316 (max= 1.4909), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:41:45,078 - root - INFO - Step 8190: lr=1.00E-05, loss= 1.1316 (max= 1.4909), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:41:45,078 - root - INFO - Step 8190: lr=1.00E-05, loss= 1.1316 (max= 1.4909), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:41:45,078 - root - INFO - Step 8190: lr=1.00E-05, loss= 1.1316 (max= 1.4909), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:41:45,078 - root - INFO - Step 8190: lr=1.00E-05, loss= 1.1316 (max= 1.4909), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:41:45,078 - root - INFO - Step 8190: lr=1.00E-05, loss= 1.1316 (max= 1.4909), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 17:41:45,078 - root - INFO - Step 8190: lr=1.00E-05, loss= 1.1316 (max= 1.4909), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:42:16,960 - root - INFO - Step 8200: lr=1.00E-05, loss= 1.1023 (max= 1.4931), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:42:16,960 - root - INFO - Step 8200: lr=1.00E-05, loss= 1.1023 (max= 1.4931), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:42:16,960 - root - INFO - Step 8200: lr=1.00E-05, loss= 1.1023 (max= 1.4931), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:42:16,960 - root - INFO - Step 8200: lr=1.00E-05, loss= 1.1023 (max= 1.4931), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:42:16,960 - root - INFO - Step 8200: lr=1.00E-05, loss= 1.1023 (max= 1.4931), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:42:16,960 - root - INFO - Step 8200: lr=1.00E-05, loss= 1.1023 (max= 1.4931), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:42:16,960 - root - INFO - Step 8200: lr=1.00E-05, loss= 1.1023 (max= 1.4931), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:42:16,960 - root - INFO - Step 8200: lr=1.00E-05, loss= 1.1023 (max= 1.4931), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:42:48,770 - root - INFO - Step 8210: lr=1.00E-05, loss= 1.1290 (max= 1.6262), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:42:48,770 - root - INFO - Step 8210: lr=1.00E-05, loss= 1.1290 (max= 1.6262), tps=20604, mfu=42.93%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:42:48,771 - root - INFO - Step 8210: lr=1.00E-05, loss= 1.1290 (max= 1.6262), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:42:48,771 - root - INFO - Step 8210: lr=1.00E-05, loss= 1.1290 (max= 1.6262), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:42:48,771 - root - INFO - Step 8210: lr=1.00E-05, loss= 1.1290 (max= 1.6262), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:42:48,771 - root - INFO - Step 8210: lr=1.00E-05, loss= 1.1290 (max= 1.6262), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:42:48,771 - root - INFO - Step 8210: lr=1.00E-05, loss= 1.1290 (max= 1.6262), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:42:48,771 - root - INFO - Step 8210: lr=1.00E-05, loss= 1.1290 (max= 1.6262), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:43:20,718 - root - INFO - Step 8220: lr=1.00E-05, loss= 1.1223 (max= 1.4880), tps=20516, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:43:20,718 - root - INFO - Step 8220: lr=1.00E-05, loss= 1.1223 (max= 1.4880), tps=20516, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:43:20,719 - root - INFO - Step 8220: lr=1.00E-05, loss= 1.1223 (max= 1.4880), tps=20516, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:43:20,719 - root - INFO - Step 8220: lr=1.00E-05, loss= 1.1223 (max= 1.4880), tps=20516, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:43:20,719 - root - INFO - Step 8220: lr=1.00E-05, loss= 1.1223 (max= 
1.4880), tps=20516, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:43:20,719 - root - INFO - Step 8220: lr=1.00E-05, loss= 1.1223 (max= 1.4880), tps=20516, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:43:20,719 - root - INFO - Step 8220: lr=1.00E-05, loss= 1.1223 (max= 1.4880), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:43:20,719 - root - INFO - Step 8220: lr=1.00E-05, loss= 1.1223 (max= 1.4880), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:43:52,590 - root - INFO - Step 8230: lr=1.00E-05, loss= 1.0881 (max= 1.4727), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:43:52,590 - root - INFO - Step 8230: lr=1.00E-05, loss= 1.0881 (max= 1.4727), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:43:52,590 - root - INFO - Step 8230: lr=1.00E-05, loss= 1.0881 (max= 1.4727), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:43:52,590 - root - INFO - Step 8230: lr=1.00E-05, loss= 1.0881 (max= 1.4727), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:43:52,590 - root - INFO - Step 8230: lr=1.00E-05, loss= 1.0881 (max= 1.4727), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:43:52,590 - root - INFO - Step 8230: lr=1.00E-05, loss= 1.0881 (max= 1.4727), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:43:52,590 - root - INFO - Step 8230: lr=1.00E-05, loss= 1.0881 (max= 1.4727), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:43:52,590 - root - INFO - Step 
8230: lr=1.00E-05, loss= 1.0881 (max= 1.4727), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:44:24,478 - root - INFO - Step 8240: lr=1.00E-05, loss= 1.1221 (max= 1.6123), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:44:24,478 - root - INFO - Step 8240: lr=1.00E-05, loss= 1.1221 (max= 1.6123), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:44:24,478 - root - INFO - Step 8240: lr=1.00E-05, loss= 1.1221 (max= 1.6123), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:44:24,478 - root - INFO - Step 8240: lr=1.00E-05, loss= 1.1221 (max= 1.6123), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:44:24,478 - root - INFO - Step 8240: lr=1.00E-05, loss= 1.1221 (max= 1.6123), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:44:24,478 - root - INFO - Step 8240: lr=1.00E-05, loss= 1.1221 (max= 1.6123), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:44:24,478 - root - INFO - Step 8240: lr=1.00E-05, loss= 1.1221 (max= 1.6123), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:44:24,478 - root - INFO - Step 8240: lr=1.00E-05, loss= 1.1221 (max= 1.6123), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:44:56,352 - root - INFO - Step 8250: lr=1.00E-05, loss= 1.1157 (max= 1.5452), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:44:56,352 - root - INFO - Step 8250: lr=1.00E-05, loss= 1.1157 (max= 1.5452), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 17:44:56,352 - root - INFO - Step 8250: lr=1.00E-05, loss= 1.1157 (max= 1.5452), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:44:56,352 - root - INFO - Step 8250: lr=1.00E-05, loss= 1.1157 (max= 1.5452), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:44:56,352 - root - INFO - Step 8250: lr=1.00E-05, loss= 1.1157 (max= 1.5452), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:44:56,352 - root - INFO - Step 8250: lr=1.00E-05, loss= 1.1157 (max= 1.5452), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:44:56,352 - root - INFO - Step 8250: lr=1.00E-05, loss= 1.1157 (max= 1.5452), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:44:56,352 - root - INFO - Step 8250: lr=1.00E-05, loss= 1.1157 (max= 1.5452), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:45:03,318 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:5457976 +2025-10-25 17:45:28,233 - root - INFO - Step 8260: lr=1.00E-05, loss= 1.1324 (max= 1.4751), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:45:28,234 - root - INFO - Step 8260: lr=1.00E-05, loss= 1.1324 (max= 1.4751), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:45:28,234 - root - INFO - Step 8260: lr=1.00E-05, loss= 1.1324 (max= 1.4751), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:45:28,234 - root - INFO - Step 8260: lr=1.00E-05, loss= 1.1324 (max= 1.4751), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:45:28,234 - root - INFO - Step 8260: lr=1.00E-05, loss= 1.1324 (max= 1.4751), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:45:28,234 - root - INFO - Step 8260: lr=1.00E-05, loss= 1.1324 (max= 1.4751), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:45:28,234 - root - INFO - Step 8260: lr=1.00E-05, loss= 1.1324 (max= 1.4751), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:45:28,234 - root - INFO - Step 8260: lr=1.00E-05, loss= 1.1324 (max= 1.4751), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:46:00,224 - root - INFO - Step 8270: lr=1.00E-05, loss= 1.1242 (max= 1.5013), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:46:00,224 - root - INFO - Step 8270: lr=1.00E-05, loss= 1.1242 (max= 1.5013), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:46:00,224 - root - INFO - Step 8270: lr=1.00E-05, loss= 1.1242 (max= 1.5013), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:46:00,224 - root - INFO - Step 8270: lr=1.00E-05, loss= 1.1242 (max= 1.5013), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:46:00,224 - root - INFO - Step 8270: lr=1.00E-05, loss= 1.1242 (max= 1.5013), tps=20488, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:46:00,224 - root - INFO - Step 8270: lr=1.00E-05, loss= 1.1242 (max= 1.5013), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:46:00,224 - root - INFO - Step 8270: lr=1.00E-05, loss= 1.1242 (max= 1.5013), tps=20489, 
mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:46:00,224 - root - INFO - Step 8270: lr=1.00E-05, loss= 1.1242 (max= 1.5013), tps=20488, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:46:32,106 - root - INFO - Step 8280: lr=1.00E-05, loss= 1.1046 (max= 1.4888), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:46:32,106 - root - INFO - Step 8280: lr=1.00E-05, loss= 1.1046 (max= 1.4888), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:46:32,106 - root - INFO - Step 8280: lr=1.00E-05, loss= 1.1046 (max= 1.4888), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:46:32,106 - root - INFO - Step 8280: lr=1.00E-05, loss= 1.1046 (max= 1.4888), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:46:32,106 - root - INFO - Step 8280: lr=1.00E-05, loss= 1.1046 (max= 1.4888), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:46:32,106 - root - INFO - Step 8280: lr=1.00E-05, loss= 1.1046 (max= 1.4888), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:46:32,106 - root - INFO - Step 8280: lr=1.00E-05, loss= 1.1046 (max= 1.4888), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:46:32,107 - root - INFO - Step 8280: lr=1.00E-05, loss= 1.1046 (max= 1.4888), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:47:04,011 - root - INFO - Step 8290: lr=1.00E-05, loss= 1.1256 (max= 1.5785), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:47:04,011 - root - INFO - Step 8290: lr=1.00E-05, 
loss= 1.1256 (max= 1.5785), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:47:04,011 - root - INFO - Step 8290: lr=1.00E-05, loss= 1.1256 (max= 1.5785), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:47:04,011 - root - INFO - Step 8290: lr=1.00E-05, loss= 1.1256 (max= 1.5785), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:47:04,011 - root - INFO - Step 8290: lr=1.00E-05, loss= 1.1256 (max= 1.5785), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:47:04,011 - root - INFO - Step 8290: lr=1.00E-05, loss= 1.1256 (max= 1.5785), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:47:04,011 - root - INFO - Step 8290: lr=1.00E-05, loss= 1.1256 (max= 1.5785), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:47:04,011 - root - INFO - Step 8290: lr=1.00E-05, loss= 1.1256 (max= 1.5785), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:47:35,872 - root - INFO - Step 8300: lr=1.00E-05, loss= 1.1265 (max= 1.5512), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:47:35,872 - root - INFO - Step 8300: lr=1.00E-05, loss= 1.1265 (max= 1.5512), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:47:35,872 - root - INFO - Step 8300: lr=1.00E-05, loss= 1.1265 (max= 1.5512), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:47:35,872 - root - INFO - Step 8300: lr=1.00E-05, loss= 1.1265 (max= 1.5512), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:47:35,872 - 
root - INFO - Step 8300: lr=1.00E-05, loss= 1.1265 (max= 1.5512), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:47:35,872 - root - INFO - Step 8300: lr=1.00E-05, loss= 1.1265 (max= 1.5512), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:47:35,872 - root - INFO - Step 8300: lr=1.00E-05, loss= 1.1265 (max= 1.5512), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:47:35,872 - root - INFO - Step 8300: lr=1.00E-05, loss= 1.1265 (max= 1.5512), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:48:07,695 - root - INFO - Step 8310: lr=1.00E-05, loss= 1.1251 (max= 1.4931), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:48:07,695 - root - INFO - Step 8310: lr=1.00E-05, loss= 1.1251 (max= 1.4931), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:48:07,695 - root - INFO - Step 8310: lr=1.00E-05, loss= 1.1251 (max= 1.4931), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:48:07,695 - root - INFO - Step 8310: lr=1.00E-05, loss= 1.1251 (max= 1.4931), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:48:07,695 - root - INFO - Step 8310: lr=1.00E-05, loss= 1.1251 (max= 1.4931), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:48:07,695 - root - INFO - Step 8310: lr=1.00E-05, loss= 1.1251 (max= 1.4931), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:48:07,695 - root - INFO - Step 8310: lr=1.00E-05, loss= 1.1251 (max= 1.4931), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 17:48:07,695 - root - INFO - Step 8310: lr=1.00E-05, loss= 1.1251 (max= 1.4931), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:48:39,542 - root - INFO - Step 8320: lr=1.00E-05, loss= 1.1372 (max= 1.7220), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:48:39,542 - root - INFO - Step 8320: lr=1.00E-05, loss= 1.1372 (max= 1.7220), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:48:39,542 - root - INFO - Step 8320: lr=1.00E-05, loss= 1.1372 (max= 1.7220), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:48:39,542 - root - INFO - Step 8320: lr=1.00E-05, loss= 1.1372 (max= 1.7220), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:48:39,542 - root - INFO - Step 8320: lr=1.00E-05, loss= 1.1372 (max= 1.7220), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:48:39,542 - root - INFO - Step 8320: lr=1.00E-05, loss= 1.1372 (max= 1.7220), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:48:39,542 - root - INFO - Step 8320: lr=1.00E-05, loss= 1.1372 (max= 1.7220), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:48:39,542 - root - INFO - Step 8320: lr=1.00E-05, loss= 1.1372 (max= 1.7220), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:49:11,456 - root - INFO - Step 8330: lr=1.00E-05, loss= 1.1138 (max= 1.6602), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:49:11,456 - root - INFO - Step 8330: lr=1.00E-05, loss= 1.1138 (max= 1.6602), tps=20538, mfu=42.79%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:49:11,456 - root - INFO - Step 8330: lr=1.00E-05, loss= 1.1138 (max= 1.6602), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:49:11,456 - root - INFO - Step 8330: lr=1.00E-05, loss= 1.1138 (max= 1.6602), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:49:11,456 - root - INFO - Step 8330: lr=1.00E-05, loss= 1.1138 (max= 1.6602), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:49:11,456 - root - INFO - Step 8330: lr=1.00E-05, loss= 1.1138 (max= 1.6602), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:49:11,456 - root - INFO - Step 8330: lr=1.00E-05, loss= 1.1138 (max= 1.6602), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:49:11,456 - root - INFO - Step 8330: lr=1.00E-05, loss= 1.1138 (max= 1.6602), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:49:43,268 - root - INFO - Step 8340: lr=1.00E-05, loss= 1.1324 (max= 1.6573), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:49:43,268 - root - INFO - Step 8340: lr=1.00E-05, loss= 1.1324 (max= 1.6573), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:49:43,268 - root - INFO - Step 8340: lr=1.00E-05, loss= 1.1324 (max= 1.6573), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:49:43,268 - root - INFO - Step 8340: lr=1.00E-05, loss= 1.1324 (max= 1.6573), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:49:43,268 - root - INFO - Step 8340: lr=1.00E-05, loss= 1.1324 (max= 
1.6573), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:49:43,268 - root - INFO - Step 8340: lr=1.00E-05, loss= 1.1324 (max= 1.6573), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:49:43,268 - root - INFO - Step 8340: lr=1.00E-05, loss= 1.1324 (max= 1.6573), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:49:43,268 - root - INFO - Step 8340: lr=1.00E-05, loss= 1.1324 (max= 1.6573), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:50:15,049 - root - INFO - Step 8350: lr=1.00E-05, loss= 1.1167 (max= 1.4739), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:50:15,049 - root - INFO - Step 8350: lr=1.00E-05, loss= 1.1167 (max= 1.4739), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:50:15,050 - root - INFO - Step 8350: lr=1.00E-05, loss= 1.1167 (max= 1.4739), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:50:15,050 - root - INFO - Step 8350: lr=1.00E-05, loss= 1.1167 (max= 1.4739), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:50:15,050 - root - INFO - Step 8350: lr=1.00E-05, loss= 1.1167 (max= 1.4739), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:50:15,050 - root - INFO - Step 8350: lr=1.00E-05, loss= 1.1167 (max= 1.4739), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:50:15,050 - root - INFO - Step 8350: lr=1.00E-05, loss= 1.1167 (max= 1.4739), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:50:15,050 - root - INFO - Step 
8350: lr=1.00E-05, loss= 1.1167 (max= 1.4739), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:50:30,113 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:4849137 +2025-10-25 17:50:46,973 - root - INFO - Step 8360: lr=1.00E-05, loss= 1.1060 (max= 1.6094), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:50:46,973 - root - INFO - Step 8360: lr=1.00E-05, loss= 1.1060 (max= 1.6094), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:50:46,973 - root - INFO - Step 8360: lr=1.00E-05, loss= 1.1060 (max= 1.6094), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:50:46,973 - root - INFO - Step 8360: lr=1.00E-05, loss= 1.1060 (max= 1.6094), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:50:46,973 - root - INFO - Step 8360: lr=1.00E-05, loss= 1.1060 (max= 1.6094), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:50:46,973 - root - INFO - Step 8360: lr=1.00E-05, loss= 1.1060 (max= 1.6094), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:50:46,973 - root - INFO - Step 8360: lr=1.00E-05, loss= 1.1060 (max= 1.6094), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:50:46,973 - root - INFO - Step 8360: lr=1.00E-05, loss= 1.1060 (max= 1.6094), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:51:18,825 - root - INFO - Step 8370: lr=1.00E-05, loss= 1.1251 (max= 1.5940), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 17:51:18,825 - root - INFO - Step 8370: lr=1.00E-05, loss= 1.1251 (max= 1.5940), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:51:18,825 - root - INFO - Step 8370: lr=1.00E-05, loss= 1.1251 (max= 1.5940), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:51:18,825 - root - INFO - Step 8370: lr=1.00E-05, loss= 1.1251 (max= 1.5940), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:51:18,825 - root - INFO - Step 8370: lr=1.00E-05, loss= 1.1251 (max= 1.5940), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:51:18,825 - root - INFO - Step 8370: lr=1.00E-05, loss= 1.1251 (max= 1.5940), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:51:18,825 - root - INFO - Step 8370: lr=1.00E-05, loss= 1.1251 (max= 1.5940), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:51:18,825 - root - INFO - Step 8370: lr=1.00E-05, loss= 1.1251 (max= 1.5940), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:51:50,677 - root - INFO - Step 8380: lr=1.00E-05, loss= 1.1120 (max= 1.5280), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:51:50,677 - root - INFO - Step 8380: lr=1.00E-05, loss= 1.1120 (max= 1.5280), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:51:50,678 - root - INFO - Step 8380: lr=1.00E-05, loss= 1.1120 (max= 1.5280), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:51:50,678 - root - INFO - Step 8380: lr=1.00E-05, loss= 1.1120 (max= 1.5280), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:51:50,678 - root - INFO - Step 8380: lr=1.00E-05, loss= 1.1120 (max= 1.5280), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:51:50,678 - root - INFO - Step 8380: lr=1.00E-05, loss= 1.1120 (max= 1.5280), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:51:50,678 - root - INFO - Step 8380: lr=1.00E-05, loss= 1.1120 (max= 1.5280), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:51:50,678 - root - INFO - Step 8380: lr=1.00E-05, loss= 1.1120 (max= 1.5280), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:52:22,524 - root - INFO - Step 8390: lr=1.00E-05, loss= 1.1102 (max= 1.4924), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:52:22,524 - root - INFO - Step 8390: lr=1.00E-05, loss= 1.1102 (max= 1.4924), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:52:22,524 - root - INFO - Step 8390: lr=1.00E-05, loss= 1.1102 (max= 1.4924), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:52:22,524 - root - INFO - Step 8390: lr=1.00E-05, loss= 1.1102 (max= 1.4924), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:52:22,524 - root - INFO - Step 8390: lr=1.00E-05, loss= 1.1102 (max= 1.4924), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:52:22,524 - root - INFO - Step 8390: lr=1.00E-05, loss= 1.1102 (max= 1.4924), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:52:22,524 - root - INFO - Step 8390: lr=1.00E-05, loss= 1.1102 (max= 1.4924), tps=20581, 
mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:52:22,524 - root - INFO - Step 8390: lr=1.00E-05, loss= 1.1102 (max= 1.4924), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:52:54,435 - root - INFO - Step 8400: lr=1.00E-05, loss= 1.1200 (max= 1.5399), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:52:54,435 - root - INFO - Step 8400: lr=1.00E-05, loss= 1.1200 (max= 1.5399), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:52:54,435 - root - INFO - Step 8400: lr=1.00E-05, loss= 1.1200 (max= 1.5399), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:52:54,435 - root - INFO - Step 8400: lr=1.00E-05, loss= 1.1200 (max= 1.5399), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:52:54,435 - root - INFO - Step 8400: lr=1.00E-05, loss= 1.1200 (max= 1.5399), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:52:54,435 - root - INFO - Step 8400: lr=1.00E-05, loss= 1.1200 (max= 1.5399), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:52:54,435 - root - INFO - Step 8400: lr=1.00E-05, loss= 1.1200 (max= 1.5399), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:52:54,435 - root - INFO - Step 8400: lr=1.00E-05, loss= 1.1200 (max= 1.5399), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:53:26,330 - root - INFO - Step 8410: lr=1.00E-05, loss= 1.1119 (max= 1.4584), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:53:26,330 - root - INFO - Step 8410: lr=1.00E-05, 
loss= 1.1119 (max= 1.4584), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:53:26,330 - root - INFO - Step 8410: lr=1.00E-05, loss= 1.1119 (max= 1.4584), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:53:26,330 - root - INFO - Step 8410: lr=1.00E-05, loss= 1.1119 (max= 1.4584), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:53:26,330 - root - INFO - Step 8410: lr=1.00E-05, loss= 1.1119 (max= 1.4584), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:53:26,330 - root - INFO - Step 8410: lr=1.00E-05, loss= 1.1119 (max= 1.4584), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:53:26,330 - root - INFO - Step 8410: lr=1.00E-05, loss= 1.1119 (max= 1.4584), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:53:26,330 - root - INFO - Step 8410: lr=1.00E-05, loss= 1.1119 (max= 1.4584), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:53:58,180 - root - INFO - Step 8420: lr=1.00E-05, loss= 1.1129 (max= 1.6035), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:53:58,181 - root - INFO - Step 8420: lr=1.00E-05, loss= 1.1129 (max= 1.6035), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:53:58,181 - root - INFO - Step 8420: lr=1.00E-05, loss= 1.1129 (max= 1.6035), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:53:58,181 - root - INFO - Step 8420: lr=1.00E-05, loss= 1.1129 (max= 1.6035), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:53:58,181 - 
root - INFO - Step 8420: lr=1.00E-05, loss= 1.1129 (max= 1.6035), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:53:58,181 - root - INFO - Step 8420: lr=1.00E-05, loss= 1.1129 (max= 1.6035), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:53:58,181 - root - INFO - Step 8420: lr=1.00E-05, loss= 1.1129 (max= 1.6035), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:53:58,181 - root - INFO - Step 8420: lr=1.00E-05, loss= 1.1129 (max= 1.6035), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:54:30,012 - root - INFO - Step 8430: lr=1.00E-05, loss= 1.1232 (max= 1.5055), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:54:30,012 - root - INFO - Step 8430: lr=1.00E-05, loss= 1.1232 (max= 1.5055), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:54:30,012 - root - INFO - Step 8430: lr=1.00E-05, loss= 1.1232 (max= 1.5055), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:54:30,012 - root - INFO - Step 8430: lr=1.00E-05, loss= 1.1232 (max= 1.5055), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:54:30,012 - root - INFO - Step 8430: lr=1.00E-05, loss= 1.1232 (max= 1.5055), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:54:30,012 - root - INFO - Step 8430: lr=1.00E-05, loss= 1.1232 (max= 1.5055), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:54:30,012 - root - INFO - Step 8430: lr=1.00E-05, loss= 1.1232 (max= 1.5055), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 17:54:30,013 - root - INFO - Step 8430: lr=1.00E-05, loss= 1.1232 (max= 1.5055), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:55:01,896 - root - INFO - Step 8440: lr=1.00E-05, loss= 1.1068 (max= 1.5574), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:55:01,897 - root - INFO - Step 8440: lr=1.00E-05, loss= 1.1068 (max= 1.5574), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:55:01,897 - root - INFO - Step 8440: lr=1.00E-05, loss= 1.1068 (max= 1.5574), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:55:01,897 - root - INFO - Step 8440: lr=1.00E-05, loss= 1.1068 (max= 1.5574), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:55:01,897 - root - INFO - Step 8440: lr=1.00E-05, loss= 1.1068 (max= 1.5574), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:55:01,897 - root - INFO - Step 8440: lr=1.00E-05, loss= 1.1068 (max= 1.5574), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:55:01,897 - root - INFO - Step 8440: lr=1.00E-05, loss= 1.1068 (max= 1.5574), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:55:01,897 - root - INFO - Step 8440: lr=1.00E-05, loss= 1.1068 (max= 1.5574), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:55:33,735 - root - INFO - Step 8450: lr=1.00E-05, loss= 1.0968 (max= 1.5342), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:55:33,736 - root - INFO - Step 8450: lr=1.00E-05, loss= 1.0968 (max= 1.5342), tps=20586, mfu=42.89%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:55:33,736 - root - INFO - Step 8450: lr=1.00E-05, loss= 1.0968 (max= 1.5342), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:55:33,736 - root - INFO - Step 8450: lr=1.00E-05, loss= 1.0968 (max= 1.5342), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:55:33,736 - root - INFO - Step 8450: lr=1.00E-05, loss= 1.0968 (max= 1.5342), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:55:33,736 - root - INFO - Step 8450: lr=1.00E-05, loss= 1.0968 (max= 1.5342), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:55:33,736 - root - INFO - Step 8450: lr=1.00E-05, loss= 1.0968 (max= 1.5342), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:55:33,736 - root - INFO - Step 8450: lr=1.00E-05, loss= 1.0968 (max= 1.5342), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:56:05,639 - root - INFO - Step 8460: lr=1.00E-05, loss= 1.1086 (max= 1.6455), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:56:05,639 - root - INFO - Step 8460: lr=1.00E-05, loss= 1.1086 (max= 1.6455), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:56:05,639 - root - INFO - Step 8460: lr=1.00E-05, loss= 1.1086 (max= 1.6455), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:56:05,639 - root - INFO - Step 8460: lr=1.00E-05, loss= 1.1086 (max= 1.6455), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:56:05,639 - root - INFO - Step 8460: lr=1.00E-05, loss= 1.1086 (max= 
1.6455), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:56:05,639 - root - INFO - Step 8460: lr=1.00E-05, loss= 1.1086 (max= 1.6455), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:56:05,639 - root - INFO - Step 8460: lr=1.00E-05, loss= 1.1086 (max= 1.6455), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:56:05,639 - root - INFO - Step 8460: lr=1.00E-05, loss= 1.1086 (max= 1.6455), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:56:37,527 - root - INFO - Step 8470: lr=1.00E-05, loss= 1.1103 (max= 1.5293), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:56:37,528 - root - INFO - Step 8470: lr=1.00E-05, loss= 1.1103 (max= 1.5293), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:56:37,528 - root - INFO - Step 8470: lr=1.00E-05, loss= 1.1103 (max= 1.5293), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:56:37,528 - root - INFO - Step 8470: lr=1.00E-05, loss= 1.1103 (max= 1.5293), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:56:37,528 - root - INFO - Step 8470: lr=1.00E-05, loss= 1.1103 (max= 1.5293), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:56:37,528 - root - INFO - Step 8470: lr=1.00E-05, loss= 1.1103 (max= 1.5293), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:56:37,528 - root - INFO - Step 8470: lr=1.00E-05, loss= 1.1103 (max= 1.5293), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:56:37,528 - root - INFO - Step 
8470: lr=1.00E-05, loss= 1.1103 (max= 1.5293), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:57:09,444 - root - INFO - Step 8480: lr=1.00E-05, loss= 1.1282 (max= 1.5345), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:57:09,444 - root - INFO - Step 8480: lr=1.00E-05, loss= 1.1282 (max= 1.5345), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:57:09,444 - root - INFO - Step 8480: lr=1.00E-05, loss= 1.1282 (max= 1.5345), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:57:09,444 - root - INFO - Step 8480: lr=1.00E-05, loss= 1.1282 (max= 1.5345), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:57:09,444 - root - INFO - Step 8480: lr=1.00E-05, loss= 1.1282 (max= 1.5345), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:57:09,444 - root - INFO - Step 8480: lr=1.00E-05, loss= 1.1282 (max= 1.5345), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:57:09,444 - root - INFO - Step 8480: lr=1.00E-05, loss= 1.1282 (max= 1.5345), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:57:09,444 - root - INFO - Step 8480: lr=1.00E-05, loss= 1.1282 (max= 1.5345), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:57:13,217 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:164873 +2025-10-25 17:57:41,285 - root - INFO - Step 8490: lr=1.00E-05, loss= 1.1576 (max= 1.5301), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
17:57:41,285 - root - INFO - Step 8490: lr=1.00E-05, loss= 1.1576 (max= 1.5301), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:57:41,285 - root - INFO - Step 8490: lr=1.00E-05, loss= 1.1576 (max= 1.5301), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:57:41,285 - root - INFO - Step 8490: lr=1.00E-05, loss= 1.1576 (max= 1.5301), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:57:41,285 - root - INFO - Step 8490: lr=1.00E-05, loss= 1.1576 (max= 1.5301), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:57:41,285 - root - INFO - Step 8490: lr=1.00E-05, loss= 1.1576 (max= 1.5301), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:57:41,285 - root - INFO - Step 8490: lr=1.00E-05, loss= 1.1576 (max= 1.5301), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:57:41,285 - root - INFO - Step 8490: lr=1.00E-05, loss= 1.1576 (max= 1.5301), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:58:13,159 - root - INFO - Step 8500: lr=1.00E-05, loss= 1.1330 (max= 1.6203), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:58:13,159 - root - INFO - Step 8500: lr=1.00E-05, loss= 1.1330 (max= 1.6203), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:58:13,159 - root - INFO - Step 8500: lr=1.00E-05, loss= 1.1330 (max= 1.6203), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:58:13,159 - root - INFO - Step 8500: lr=1.00E-05, loss= 1.1330 (max= 1.6203), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:58:13,159 - root - INFO - Step 8500: lr=1.00E-05, loss= 1.1330 (max= 1.6203), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:58:13,159 - root - INFO - Step 8500: lr=1.00E-05, loss= 1.1330 (max= 1.6203), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:58:13,159 - root - INFO - Step 8500: lr=1.00E-05, loss= 1.1330 (max= 1.6203), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:58:13,159 - root - INFO - Step 8500: lr=1.00E-05, loss= 1.1330 (max= 1.6203), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:58:45,067 - root - INFO - Step 8510: lr=1.00E-05, loss= 1.1364 (max= 1.7180), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:58:45,067 - root - INFO - Step 8510: lr=1.00E-05, loss= 1.1364 (max= 1.7180), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:58:45,067 - root - INFO - Step 8510: lr=1.00E-05, loss= 1.1364 (max= 1.7180), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:58:45,067 - root - INFO - Step 8510: lr=1.00E-05, loss= 1.1364 (max= 1.7180), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:58:45,067 - root - INFO - Step 8510: lr=1.00E-05, loss= 1.1364 (max= 1.7180), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:58:45,067 - root - INFO - Step 8510: lr=1.00E-05, loss= 1.1364 (max= 1.7180), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:58:45,067 - root - INFO - Step 8510: lr=1.00E-05, loss= 1.1364 (max= 1.7180), tps=20542, 
mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:58:45,067 - root - INFO - Step 8510: lr=1.00E-05, loss= 1.1364 (max= 1.7180), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:59:16,927 - root - INFO - Step 8520: lr=1.00E-05, loss= 1.1271 (max= 1.6051), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:59:16,927 - root - INFO - Step 8520: lr=1.00E-05, loss= 1.1271 (max= 1.6051), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:59:16,927 - root - INFO - Step 8520: lr=1.00E-05, loss= 1.1271 (max= 1.6051), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:59:16,927 - root - INFO - Step 8520: lr=1.00E-05, loss= 1.1271 (max= 1.6051), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:59:16,927 - root - INFO - Step 8520: lr=1.00E-05, loss= 1.1271 (max= 1.6051), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:59:16,927 - root - INFO - Step 8520: lr=1.00E-05, loss= 1.1271 (max= 1.6051), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:59:16,927 - root - INFO - Step 8520: lr=1.00E-05, loss= 1.1271 (max= 1.6051), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:59:16,927 - root - INFO - Step 8520: lr=1.00E-05, loss= 1.1271 (max= 1.6051), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:59:48,836 - root - INFO - Step 8530: lr=1.00E-05, loss= 1.1071 (max= 1.6537), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:59:48,836 - root - INFO - Step 8530: lr=1.00E-05, 
loss= 1.1071 (max= 1.6537), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:59:48,836 - root - INFO - Step 8530: lr=1.00E-05, loss= 1.1071 (max= 1.6537), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:59:48,836 - root - INFO - Step 8530: lr=1.00E-05, loss= 1.1071 (max= 1.6537), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:59:48,836 - root - INFO - Step 8530: lr=1.00E-05, loss= 1.1071 (max= 1.6537), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:59:48,836 - root - INFO - Step 8530: lr=1.00E-05, loss= 1.1071 (max= 1.6537), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:59:48,836 - root - INFO - Step 8530: lr=1.00E-05, loss= 1.1071 (max= 1.6537), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 17:59:48,836 - root - INFO - Step 8530: lr=1.00E-05, loss= 1.1071 (max= 1.6537), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:00:20,706 - root - INFO - Step 8540: lr=1.00E-05, loss= 1.1110 (max= 1.7123), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:00:20,706 - root - INFO - Step 8540: lr=1.00E-05, loss= 1.1110 (max= 1.7123), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:00:20,706 - root - INFO - Step 8540: lr=1.00E-05, loss= 1.1110 (max= 1.7123), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:00:20,706 - root - INFO - Step 8540: lr=1.00E-05, loss= 1.1110 (max= 1.7123), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:00:20,706 - 
root - INFO - Step 8540: lr=1.00E-05, loss= 1.1110 (max= 1.7123), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:00:20,706 - root - INFO - Step 8540: lr=1.00E-05, loss= 1.1110 (max= 1.7123), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:00:20,706 - root - INFO - Step 8540: lr=1.00E-05, loss= 1.1110 (max= 1.7123), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:00:20,706 - root - INFO - Step 8540: lr=1.00E-05, loss= 1.1110 (max= 1.7123), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:00:52,557 - root - INFO - Step 8550: lr=1.00E-05, loss= 1.1149 (max= 1.4930), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:00:52,557 - root - INFO - Step 8550: lr=1.00E-05, loss= 1.1149 (max= 1.4930), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:00:52,557 - root - INFO - Step 8550: lr=1.00E-05, loss= 1.1149 (max= 1.4930), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:00:52,557 - root - INFO - Step 8550: lr=1.00E-05, loss= 1.1149 (max= 1.4930), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:00:52,557 - root - INFO - Step 8550: lr=1.00E-05, loss= 1.1149 (max= 1.4930), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:00:52,557 - root - INFO - Step 8550: lr=1.00E-05, loss= 1.1149 (max= 1.4930), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:00:52,557 - root - INFO - Step 8550: lr=1.00E-05, loss= 1.1149 (max= 1.4930), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 18:00:52,557 - root - INFO - Step 8550: lr=1.00E-05, loss= 1.1149 (max= 1.4930), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:01:24,423 - root - INFO - Step 8560: lr=1.00E-05, loss= 1.1225 (max= 1.5218), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:01:24,423 - root - INFO - Step 8560: lr=1.00E-05, loss= 1.1225 (max= 1.5218), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:01:24,423 - root - INFO - Step 8560: lr=1.00E-05, loss= 1.1225 (max= 1.5218), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:01:24,423 - root - INFO - Step 8560: lr=1.00E-05, loss= 1.1225 (max= 1.5218), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:01:24,423 - root - INFO - Step 8560: lr=1.00E-05, loss= 1.1225 (max= 1.5218), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:01:24,423 - root - INFO - Step 8560: lr=1.00E-05, loss= 1.1225 (max= 1.5218), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:01:24,423 - root - INFO - Step 8560: lr=1.00E-05, loss= 1.1225 (max= 1.5218), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:01:24,423 - root - INFO - Step 8560: lr=1.00E-05, loss= 1.1225 (max= 1.5218), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:01:56,277 - root - INFO - Step 8570: lr=1.00E-05, loss= 1.1444 (max= 1.5427), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:01:56,277 - root - INFO - Step 8570: lr=1.00E-05, loss= 1.1444 (max= 1.5427), tps=20576, mfu=42.87%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:01:56,277 - root - INFO - Step 8570: lr=1.00E-05, loss= 1.1444 (max= 1.5427), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:01:56,277 - root - INFO - Step 8570: lr=1.00E-05, loss= 1.1444 (max= 1.5427), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:01:56,277 - root - INFO - Step 8570: lr=1.00E-05, loss= 1.1444 (max= 1.5427), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:01:56,277 - root - INFO - Step 8570: lr=1.00E-05, loss= 1.1444 (max= 1.5427), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:01:56,277 - root - INFO - Step 8570: lr=1.00E-05, loss= 1.1444 (max= 1.5427), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:01:56,277 - root - INFO - Step 8570: lr=1.00E-05, loss= 1.1444 (max= 1.5427), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:02:28,131 - root - INFO - Step 8580: lr=1.00E-05, loss= 1.1214 (max= 1.5333), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:02:28,131 - root - INFO - Step 8580: lr=1.00E-05, loss= 1.1214 (max= 1.5333), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:02:28,131 - root - INFO - Step 8580: lr=1.00E-05, loss= 1.1214 (max= 1.5333), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:02:28,131 - root - INFO - Step 8580: lr=1.00E-05, loss= 1.1214 (max= 1.5333), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:02:28,131 - root - INFO - Step 8580: lr=1.00E-05, loss= 1.1214 (max= 
1.5333), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:02:28,131 - root - INFO - Step 8580: lr=1.00E-05, loss= 1.1214 (max= 1.5333), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:02:28,131 - root - INFO - Step 8580: lr=1.00E-05, loss= 1.1214 (max= 1.5333), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:02:28,131 - root - INFO - Step 8580: lr=1.00E-05, loss= 1.1214 (max= 1.5333), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:03:00,004 - root - INFO - Step 8590: lr=1.00E-05, loss= 1.1105 (max= 1.4712), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:03:00,004 - root - INFO - Step 8590: lr=1.00E-05, loss= 1.1105 (max= 1.4712), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:03:00,004 - root - INFO - Step 8590: lr=1.00E-05, loss= 1.1105 (max= 1.4712), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:03:00,004 - root - INFO - Step 8590: lr=1.00E-05, loss= 1.1105 (max= 1.4712), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:03:00,004 - root - INFO - Step 8590: lr=1.00E-05, loss= 1.1105 (max= 1.4712), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:03:00,004 - root - INFO - Step 8590: lr=1.00E-05, loss= 1.1105 (max= 1.4712), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:03:00,004 - root - INFO - Step 8590: lr=1.00E-05, loss= 1.1105 (max= 1.4712), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:03:00,004 - root - INFO - Step 
8590: lr=1.00E-05, loss= 1.1105 (max= 1.4712), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:03:31,822 - root - INFO - Step 8600: lr=1.00E-05, loss= 1.1241 (max= 1.5581), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:03:31,822 - root - INFO - Step 8600: lr=1.00E-05, loss= 1.1241 (max= 1.5581), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:03:31,822 - root - INFO - Step 8600: lr=1.00E-05, loss= 1.1241 (max= 1.5581), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:03:31,822 - root - INFO - Step 8600: lr=1.00E-05, loss= 1.1241 (max= 1.5581), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:03:31,822 - root - INFO - Step 8600: lr=1.00E-05, loss= 1.1241 (max= 1.5581), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:03:31,822 - root - INFO - Step 8600: lr=1.00E-05, loss= 1.1241 (max= 1.5581), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:03:31,822 - root - INFO - Step 8600: lr=1.00E-05, loss= 1.1241 (max= 1.5581), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:03:31,822 - root - INFO - Step 8600: lr=1.00E-05, loss= 1.1241 (max= 1.5581), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:04:03,702 - root - INFO - Step 8610: lr=1.00E-05, loss= 1.1367 (max= 1.5625), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:04:03,702 - root - INFO - Step 8610: lr=1.00E-05, loss= 1.1367 (max= 1.5625), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 18:04:03,702 - root - INFO - Step 8610: lr=1.00E-05, loss= 1.1367 (max= 1.5625), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:04:03,702 - root - INFO - Step 8610: lr=1.00E-05, loss= 1.1367 (max= 1.5625), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:04:03,702 - root - INFO - Step 8610: lr=1.00E-05, loss= 1.1367 (max= 1.5625), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:04:03,702 - root - INFO - Step 8610: lr=1.00E-05, loss= 1.1367 (max= 1.5625), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:04:03,702 - root - INFO - Step 8610: lr=1.00E-05, loss= 1.1367 (max= 1.5625), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:04:03,702 - root - INFO - Step 8610: lr=1.00E-05, loss= 1.1367 (max= 1.5625), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:04:35,581 - root - INFO - Step 8620: lr=1.00E-05, loss= 1.1146 (max= 1.7293), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:04:35,581 - root - INFO - Step 8620: lr=1.00E-05, loss= 1.1146 (max= 1.7293), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:04:35,581 - root - INFO - Step 8620: lr=1.00E-05, loss= 1.1146 (max= 1.7293), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:04:35,581 - root - INFO - Step 8620: lr=1.00E-05, loss= 1.1146 (max= 1.7293), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:04:35,581 - root - INFO - Step 8620: lr=1.00E-05, loss= 1.1146 (max= 1.7293), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:04:35,581 - root - INFO - Step 8620: lr=1.00E-05, loss= 1.1146 (max= 1.7293), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:04:35,581 - root - INFO - Step 8620: lr=1.00E-05, loss= 1.1146 (max= 1.7293), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:04:35,582 - root - INFO - Step 8620: lr=1.00E-05, loss= 1.1146 (max= 1.7293), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:05:07,571 - root - INFO - Step 8630: lr=1.00E-05, loss= 1.1152 (max= 1.5261), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:05:07,571 - root - INFO - Step 8630: lr=1.00E-05, loss= 1.1152 (max= 1.5261), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:05:07,571 - root - INFO - Step 8630: lr=1.00E-05, loss= 1.1152 (max= 1.5261), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:05:07,571 - root - INFO - Step 8630: lr=1.00E-05, loss= 1.1152 (max= 1.5261), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:05:07,571 - root - INFO - Step 8630: lr=1.00E-05, loss= 1.1152 (max= 1.5261), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:05:07,571 - root - INFO - Step 8630: lr=1.00E-05, loss= 1.1152 (max= 1.5261), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:05:07,571 - root - INFO - Step 8630: lr=1.00E-05, loss= 1.1152 (max= 1.5261), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:05:07,571 - root - INFO - Step 8630: lr=1.00E-05, loss= 1.1152 (max= 1.5261), tps=20489, 
mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:05:39,403 - root - INFO - Step 8640: lr=1.00E-05, loss= 1.1267 (max= 1.6803), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:05:39,404 - root - INFO - Step 8640: lr=1.00E-05, loss= 1.1267 (max= 1.6803), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:05:39,404 - root - INFO - Step 8640: lr=1.00E-05, loss= 1.1267 (max= 1.6803), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:05:39,404 - root - INFO - Step 8640: lr=1.00E-05, loss= 1.1267 (max= 1.6803), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:05:39,404 - root - INFO - Step 8640: lr=1.00E-05, loss= 1.1267 (max= 1.6803), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:05:39,404 - root - INFO - Step 8640: lr=1.00E-05, loss= 1.1267 (max= 1.6803), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:05:39,404 - root - INFO - Step 8640: lr=1.00E-05, loss= 1.1267 (max= 1.6803), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:05:39,404 - root - INFO - Step 8640: lr=1.00E-05, loss= 1.1267 (max= 1.6803), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:06:11,246 - root - INFO - Step 8650: lr=1.00E-05, loss= 1.1109 (max= 1.5731), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:06:11,246 - root - INFO - Step 8650: lr=1.00E-05, loss= 1.1109 (max= 1.5731), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:06:11,246 - root - INFO - Step 8650: lr=1.00E-05, 
loss= 1.1109 (max= 1.5731), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:06:11,246 - root - INFO - Step 8650: lr=1.00E-05, loss= 1.1109 (max= 1.5731), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:06:11,246 - root - INFO - Step 8650: lr=1.00E-05, loss= 1.1109 (max= 1.5731), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:06:11,246 - root - INFO - Step 8650: lr=1.00E-05, loss= 1.1109 (max= 1.5731), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:06:11,246 - root - INFO - Step 8650: lr=1.00E-05, loss= 1.1109 (max= 1.5731), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:06:11,246 - root - INFO - Step 8650: lr=1.00E-05, loss= 1.1109 (max= 1.5731), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:06:43,080 - root - INFO - Step 8660: lr=1.00E-05, loss= 1.1289 (max= 1.5160), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:06:43,080 - root - INFO - Step 8660: lr=1.00E-05, loss= 1.1289 (max= 1.5160), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:06:43,080 - root - INFO - Step 8660: lr=1.00E-05, loss= 1.1289 (max= 1.5160), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:06:43,080 - root - INFO - Step 8660: lr=1.00E-05, loss= 1.1289 (max= 1.5160), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:06:43,080 - root - INFO - Step 8660: lr=1.00E-05, loss= 1.1289 (max= 1.5160), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:06:43,080 - 
root - INFO - Step 8660: lr=1.00E-05, loss= 1.1289 (max= 1.5160), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:06:43,080 - root - INFO - Step 8660: lr=1.00E-05, loss= 1.1289 (max= 1.5160), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:06:43,080 - root - INFO - Step 8660: lr=1.00E-05, loss= 1.1289 (max= 1.5160), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:07:14,915 - root - INFO - Step 8670: lr=1.00E-05, loss= 1.1214 (max= 1.6153), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:07:14,915 - root - INFO - Step 8670: lr=1.00E-05, loss= 1.1214 (max= 1.6153), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:07:14,915 - root - INFO - Step 8670: lr=1.00E-05, loss= 1.1214 (max= 1.6153), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:07:14,915 - root - INFO - Step 8670: lr=1.00E-05, loss= 1.1214 (max= 1.6153), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:07:14,915 - root - INFO - Step 8670: lr=1.00E-05, loss= 1.1214 (max= 1.6153), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:07:14,915 - root - INFO - Step 8670: lr=1.00E-05, loss= 1.1214 (max= 1.6153), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:07:14,915 - root - INFO - Step 8670: lr=1.00E-05, loss= 1.1214 (max= 1.6153), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:07:14,915 - root - INFO - Step 8670: lr=1.00E-05, loss= 1.1214 (max= 1.6153), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 18:07:46,817 - root - INFO - Step 8680: lr=1.00E-05, loss= 1.1034 (max= 1.5436), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:07:46,817 - root - INFO - Step 8680: lr=1.00E-05, loss= 1.1034 (max= 1.5436), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:07:46,817 - root - INFO - Step 8680: lr=1.00E-05, loss= 1.1034 (max= 1.5436), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:07:46,817 - root - INFO - Step 8680: lr=1.00E-05, loss= 1.1034 (max= 1.5436), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:07:46,817 - root - INFO - Step 8680: lr=1.00E-05, loss= 1.1034 (max= 1.5436), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:07:46,817 - root - INFO - Step 8680: lr=1.00E-05, loss= 1.1034 (max= 1.5436), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:07:46,817 - root - INFO - Step 8680: lr=1.00E-05, loss= 1.1034 (max= 1.5436), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:07:46,817 - root - INFO - Step 8680: lr=1.00E-05, loss= 1.1034 (max= 1.5436), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:08:18,735 - root - INFO - Step 8690: lr=1.00E-05, loss= 1.1174 (max= 1.6170), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:08:18,735 - root - INFO - Step 8690: lr=1.00E-05, loss= 1.1174 (max= 1.6170), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:08:18,735 - root - INFO - Step 8690: lr=1.00E-05, loss= 1.1174 (max= 1.6170), tps=20535, mfu=42.78%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:08:18,735 - root - INFO - Step 8690: lr=1.00E-05, loss= 1.1174 (max= 1.6170), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:08:18,735 - root - INFO - Step 8690: lr=1.00E-05, loss= 1.1174 (max= 1.6170), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:08:18,735 - root - INFO - Step 8690: lr=1.00E-05, loss= 1.1174 (max= 1.6170), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:08:18,735 - root - INFO - Step 8690: lr=1.00E-05, loss= 1.1174 (max= 1.6170), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:08:18,735 - root - INFO - Step 8690: lr=1.00E-05, loss= 1.1174 (max= 1.6170), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:08:50,595 - root - INFO - Step 8700: lr=1.00E-05, loss= 1.1421 (max= 1.4880), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:08:50,595 - root - INFO - Step 8700: lr=1.00E-05, loss= 1.1421 (max= 1.4880), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:08:50,595 - root - INFO - Step 8700: lr=1.00E-05, loss= 1.1421 (max= 1.4880), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:08:50,595 - root - INFO - Step 8700: lr=1.00E-05, loss= 1.1421 (max= 1.4880), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:08:50,595 - root - INFO - Step 8700: lr=1.00E-05, loss= 1.1421 (max= 1.4880), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:08:50,595 - root - INFO - Step 8700: lr=1.00E-05, loss= 1.1421 (max= 
1.4880), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:08:50,595 - root - INFO - Step 8700: lr=1.00E-05, loss= 1.1421 (max= 1.4880), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:08:50,595 - root - INFO - Step 8700: lr=1.00E-05, loss= 1.1421 (max= 1.4880), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:09:15,212 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:5744034 +2025-10-25 18:09:22,455 - root - INFO - Step 8710: lr=1.00E-05, loss= 1.1109 (max= 1.5712), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:09:22,455 - root - INFO - Step 8710: lr=1.00E-05, loss= 1.1109 (max= 1.5712), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:09:22,455 - root - INFO - Step 8710: lr=1.00E-05, loss= 1.1109 (max= 1.5712), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:09:22,455 - root - INFO - Step 8710: lr=1.00E-05, loss= 1.1109 (max= 1.5712), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:09:22,455 - root - INFO - Step 8710: lr=1.00E-05, loss= 1.1109 (max= 1.5712), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:09:22,455 - root - INFO - Step 8710: lr=1.00E-05, loss= 1.1109 (max= 1.5712), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:09:22,455 - root - INFO - Step 8710: lr=1.00E-05, loss= 1.1109 (max= 1.5712), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:09:22,456 - root - INFO - Step 
8710: lr=1.00E-05, loss= 1.1109 (max= 1.5712), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:09:54,366 - root - INFO - Step 8720: lr=1.00E-05, loss= 1.1383 (max= 1.5931), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:09:54,366 - root - INFO - Step 8720: lr=1.00E-05, loss= 1.1383 (max= 1.5931), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:09:54,367 - root - INFO - Step 8720: lr=1.00E-05, loss= 1.1383 (max= 1.5931), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:09:54,367 - root - INFO - Step 8720: lr=1.00E-05, loss= 1.1383 (max= 1.5931), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:09:54,367 - root - INFO - Step 8720: lr=1.00E-05, loss= 1.1383 (max= 1.5931), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:09:54,367 - root - INFO - Step 8720: lr=1.00E-05, loss= 1.1383 (max= 1.5931), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:09:54,367 - root - INFO - Step 8720: lr=1.00E-05, loss= 1.1383 (max= 1.5931), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:09:54,367 - root - INFO - Step 8720: lr=1.00E-05, loss= 1.1383 (max= 1.5931), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:10:26,231 - root - INFO - Step 8730: lr=1.00E-05, loss= 1.1101 (max= 1.5781), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:10:26,231 - root - INFO - Step 8730: lr=1.00E-05, loss= 1.1101 (max= 1.5781), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 18:10:26,231 - root - INFO - Step 8730: lr=1.00E-05, loss= 1.1101 (max= 1.5781), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:10:26,231 - root - INFO - Step 8730: lr=1.00E-05, loss= 1.1101 (max= 1.5781), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:10:26,231 - root - INFO - Step 8730: lr=1.00E-05, loss= 1.1101 (max= 1.5781), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:10:26,231 - root - INFO - Step 8730: lr=1.00E-05, loss= 1.1101 (max= 1.5781), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:10:26,231 - root - INFO - Step 8730: lr=1.00E-05, loss= 1.1101 (max= 1.5781), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:10:26,231 - root - INFO - Step 8730: lr=1.00E-05, loss= 1.1101 (max= 1.5781), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:10:58,182 - root - INFO - Step 8740: lr=1.00E-05, loss= 1.1259 (max= 1.4282), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:10:58,182 - root - INFO - Step 8740: lr=1.00E-05, loss= 1.1259 (max= 1.4282), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:10:58,182 - root - INFO - Step 8740: lr=1.00E-05, loss= 1.1259 (max= 1.4282), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:10:58,182 - root - INFO - Step 8740: lr=1.00E-05, loss= 1.1259 (max= 1.4282), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:10:58,182 - root - INFO - Step 8740: lr=1.00E-05, loss= 1.1259 (max= 1.4282), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:10:58,182 - root - INFO - Step 8740: lr=1.00E-05, loss= 1.1259 (max= 1.4282), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:10:58,182 - root - INFO - Step 8740: lr=1.00E-05, loss= 1.1259 (max= 1.4282), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:10:58,182 - root - INFO - Step 8740: lr=1.00E-05, loss= 1.1259 (max= 1.4282), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:11:30,117 - root - INFO - Step 8750: lr=1.00E-05, loss= 1.0961 (max= 1.5515), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:11:30,117 - root - INFO - Step 8750: lr=1.00E-05, loss= 1.0961 (max= 1.5515), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:11:30,117 - root - INFO - Step 8750: lr=1.00E-05, loss= 1.0961 (max= 1.5515), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:11:30,117 - root - INFO - Step 8750: lr=1.00E-05, loss= 1.0961 (max= 1.5515), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:11:30,117 - root - INFO - Step 8750: lr=1.00E-05, loss= 1.0961 (max= 1.5515), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:11:30,117 - root - INFO - Step 8750: lr=1.00E-05, loss= 1.0961 (max= 1.5515), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:11:30,118 - root - INFO - Step 8750: lr=1.00E-05, loss= 1.0961 (max= 1.5515), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:11:30,118 - root - INFO - Step 8750: lr=1.00E-05, loss= 1.0961 (max= 1.5515), tps=20524, 
mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:12:02,007 - root - INFO - Step 8760: lr=1.00E-05, loss= 1.1142 (max= 1.5375), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:12:02,007 - root - INFO - Step 8760: lr=1.00E-05, loss= 1.1142 (max= 1.5375), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:12:02,007 - root - INFO - Step 8760: lr=1.00E-05, loss= 1.1142 (max= 1.5375), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:12:02,007 - root - INFO - Step 8760: lr=1.00E-05, loss= 1.1142 (max= 1.5375), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:12:02,007 - root - INFO - Step 8760: lr=1.00E-05, loss= 1.1142 (max= 1.5375), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:12:02,007 - root - INFO - Step 8760: lr=1.00E-05, loss= 1.1142 (max= 1.5375), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:12:02,007 - root - INFO - Step 8760: lr=1.00E-05, loss= 1.1142 (max= 1.5375), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:12:02,007 - root - INFO - Step 8760: lr=1.00E-05, loss= 1.1142 (max= 1.5375), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:12:33,849 - root - INFO - Step 8770: lr=1.00E-05, loss= 1.1054 (max= 1.6065), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:12:33,849 - root - INFO - Step 8770: lr=1.00E-05, loss= 1.1054 (max= 1.6065), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:12:33,849 - root - INFO - Step 8770: lr=1.00E-05, 
loss= 1.1054 (max= 1.6065), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:12:33,849 - root - INFO - Step 8770: lr=1.00E-05, loss= 1.1054 (max= 1.6065), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:12:33,849 - root - INFO - Step 8770: lr=1.00E-05, loss= 1.1054 (max= 1.6065), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:12:33,849 - root - INFO - Step 8770: lr=1.00E-05, loss= 1.1054 (max= 1.6065), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:12:33,849 - root - INFO - Step 8770: lr=1.00E-05, loss= 1.1054 (max= 1.6065), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:12:33,849 - root - INFO - Step 8770: lr=1.00E-05, loss= 1.1054 (max= 1.6065), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:13:05,624 - root - INFO - Step 8780: lr=1.00E-05, loss= 1.1120 (max= 1.6408), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:13:05,624 - root - INFO - Step 8780: lr=1.00E-05, loss= 1.1120 (max= 1.6408), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:13:05,624 - root - INFO - Step 8780: lr=1.00E-05, loss= 1.1120 (max= 1.6408), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:13:05,624 - root - INFO - Step 8780: lr=1.00E-05, loss= 1.1120 (max= 1.6408), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:13:05,624 - root - INFO - Step 8780: lr=1.00E-05, loss= 1.1120 (max= 1.6408), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:13:05,624 - 
root - INFO - Step 8780: lr=1.00E-05, loss= 1.1120 (max= 1.6408), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:13:05,624 - root - INFO - Step 8780: lr=1.00E-05, loss= 1.1120 (max= 1.6408), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:13:05,624 - root - INFO - Step 8780: lr=1.00E-05, loss= 1.1120 (max= 1.6408), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:13:37,537 - root - INFO - Step 8790: lr=1.00E-05, loss= 1.1212 (max= 1.5525), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:13:37,537 - root - INFO - Step 8790: lr=1.00E-05, loss= 1.1212 (max= 1.5525), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:13:37,537 - root - INFO - Step 8790: lr=1.00E-05, loss= 1.1212 (max= 1.5525), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:13:37,537 - root - INFO - Step 8790: lr=1.00E-05, loss= 1.1212 (max= 1.5525), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:13:37,537 - root - INFO - Step 8790: lr=1.00E-05, loss= 1.1212 (max= 1.5525), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:13:37,537 - root - INFO - Step 8790: lr=1.00E-05, loss= 1.1212 (max= 1.5525), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:13:37,538 - root - INFO - Step 8790: lr=1.00E-05, loss= 1.1212 (max= 1.5525), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:13:37,538 - root - INFO - Step 8790: lr=1.00E-05, loss= 1.1212 (max= 1.5525), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 18:14:09,364 - root - INFO - Step 8800: lr=1.00E-05, loss= 1.1206 (max= 1.4799), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:14:09,364 - root - INFO - Step 8800: lr=1.00E-05, loss= 1.1206 (max= 1.4799), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:14:09,364 - root - INFO - Step 8800: lr=1.00E-05, loss= 1.1206 (max= 1.4799), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:14:09,364 - root - INFO - Step 8800: lr=1.00E-05, loss= 1.1206 (max= 1.4799), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:14:09,364 - root - INFO - Step 8800: lr=1.00E-05, loss= 1.1206 (max= 1.4799), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:14:09,364 - root - INFO - Step 8800: lr=1.00E-05, loss= 1.1206 (max= 1.4799), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:14:09,364 - root - INFO - Step 8800: lr=1.00E-05, loss= 1.1206 (max= 1.4799), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:14:09,365 - root - INFO - Step 8800: lr=1.00E-05, loss= 1.1206 (max= 1.4799), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:14:41,190 - root - INFO - Step 8810: lr=1.00E-05, loss= 1.1065 (max= 1.6290), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:14:41,190 - root - INFO - Step 8810: lr=1.00E-05, loss= 1.1065 (max= 1.6290), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:14:41,190 - root - INFO - Step 8810: lr=1.00E-05, loss= 1.1065 (max= 1.6290), tps=20594, mfu=42.91%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:14:41,190 - root - INFO - Step 8810: lr=1.00E-05, loss= 1.1065 (max= 1.6290), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:14:41,190 - root - INFO - Step 8810: lr=1.00E-05, loss= 1.1065 (max= 1.6290), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:14:41,190 - root - INFO - Step 8810: lr=1.00E-05, loss= 1.1065 (max= 1.6290), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:14:41,190 - root - INFO - Step 8810: lr=1.00E-05, loss= 1.1065 (max= 1.6290), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:14:41,190 - root - INFO - Step 8810: lr=1.00E-05, loss= 1.1065 (max= 1.6290), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:15:13,045 - root - INFO - Step 8820: lr=1.00E-05, loss= 1.1239 (max= 1.5405), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:15:13,045 - root - INFO - Step 8820: lr=1.00E-05, loss= 1.1239 (max= 1.5405), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:15:13,046 - root - INFO - Step 8820: lr=1.00E-05, loss= 1.1239 (max= 1.5405), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:15:13,046 - root - INFO - Step 8820: lr=1.00E-05, loss= 1.1239 (max= 1.5405), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:15:13,046 - root - INFO - Step 8820: lr=1.00E-05, loss= 1.1239 (max= 1.5405), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:15:13,046 - root - INFO - Step 8820: lr=1.00E-05, loss= 1.1239 (max= 
1.5405), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:15:13,046 - root - INFO - Step 8820: lr=1.00E-05, loss= 1.1239 (max= 1.5405), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:15:13,046 - root - INFO - Step 8820: lr=1.00E-05, loss= 1.1239 (max= 1.5405), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:15:44,863 - root - INFO - Step 8830: lr=1.00E-05, loss= 1.1219 (max= 1.6013), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:15:44,863 - root - INFO - Step 8830: lr=1.00E-05, loss= 1.1219 (max= 1.6013), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:15:44,863 - root - INFO - Step 8830: lr=1.00E-05, loss= 1.1219 (max= 1.6013), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:15:44,863 - root - INFO - Step 8830: lr=1.00E-05, loss= 1.1219 (max= 1.6013), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:15:44,863 - root - INFO - Step 8830: lr=1.00E-05, loss= 1.1219 (max= 1.6013), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:15:44,863 - root - INFO - Step 8830: lr=1.00E-05, loss= 1.1219 (max= 1.6013), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:15:44,863 - root - INFO - Step 8830: lr=1.00E-05, loss= 1.1219 (max= 1.6013), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:15:44,863 - root - INFO - Step 8830: lr=1.00E-05, loss= 1.1219 (max= 1.6013), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:16:16,694 - root - INFO - Step 
8840: lr=1.00E-05, loss= 1.0885 (max= 1.6533), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:16:16,694 - root - INFO - Step 8840: lr=1.00E-05, loss= 1.0885 (max= 1.6533), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:16:16,694 - root - INFO - Step 8840: lr=1.00E-05, loss= 1.0885 (max= 1.6533), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:16:16,694 - root - INFO - Step 8840: lr=1.00E-05, loss= 1.0885 (max= 1.6533), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:16:16,694 - root - INFO - Step 8840: lr=1.00E-05, loss= 1.0885 (max= 1.6533), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:16:16,694 - root - INFO - Step 8840: lr=1.00E-05, loss= 1.0885 (max= 1.6533), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:16:16,694 - root - INFO - Step 8840: lr=1.00E-05, loss= 1.0885 (max= 1.6533), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:16:16,695 - root - INFO - Step 8840: lr=1.00E-05, loss= 1.0885 (max= 1.6533), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:16:48,539 - root - INFO - Step 8850: lr=1.00E-05, loss= 1.1160 (max= 1.5638), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:16:48,539 - root - INFO - Step 8850: lr=1.00E-05, loss= 1.1160 (max= 1.5638), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:16:48,540 - root - INFO - Step 8850: lr=1.00E-05, loss= 1.1160 (max= 1.5638), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 18:16:48,540 - root - INFO - Step 8850: lr=1.00E-05, loss= 1.1160 (max= 1.5638), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:16:48,540 - root - INFO - Step 8850: lr=1.00E-05, loss= 1.1160 (max= 1.5638), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:16:48,540 - root - INFO - Step 8850: lr=1.00E-05, loss= 1.1160 (max= 1.5638), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:16:48,540 - root - INFO - Step 8850: lr=1.00E-05, loss= 1.1160 (max= 1.5638), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:16:48,540 - root - INFO - Step 8850: lr=1.00E-05, loss= 1.1160 (max= 1.5638), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:17:20,449 - root - INFO - Step 8860: lr=1.00E-05, loss= 1.1228 (max= 1.4930), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:17:20,449 - root - INFO - Step 8860: lr=1.00E-05, loss= 1.1228 (max= 1.4930), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:17:20,449 - root - INFO - Step 8860: lr=1.00E-05, loss= 1.1228 (max= 1.4930), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:17:20,449 - root - INFO - Step 8860: lr=1.00E-05, loss= 1.1228 (max= 1.4930), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:17:20,449 - root - INFO - Step 8860: lr=1.00E-05, loss= 1.1228 (max= 1.4930), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:17:20,449 - root - INFO - Step 8860: lr=1.00E-05, loss= 1.1228 (max= 1.4930), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:17:20,449 - root - INFO - Step 8860: lr=1.00E-05, loss= 1.1228 (max= 1.4930), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:17:20,449 - root - INFO - Step 8860: lr=1.00E-05, loss= 1.1228 (max= 1.4930), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:17:52,403 - root - INFO - Step 8870: lr=1.00E-05, loss= 1.1126 (max= 1.5505), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:17:52,403 - root - INFO - Step 8870: lr=1.00E-05, loss= 1.1126 (max= 1.5505), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:17:52,403 - root - INFO - Step 8870: lr=1.00E-05, loss= 1.1126 (max= 1.5505), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:17:52,403 - root - INFO - Step 8870: lr=1.00E-05, loss= 1.1126 (max= 1.5505), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:17:52,403 - root - INFO - Step 8870: lr=1.00E-05, loss= 1.1126 (max= 1.5505), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:17:52,403 - root - INFO - Step 8870: lr=1.00E-05, loss= 1.1126 (max= 1.5505), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:17:52,403 - root - INFO - Step 8870: lr=1.00E-05, loss= 1.1126 (max= 1.5505), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:17:52,403 - root - INFO - Step 8870: lr=1.00E-05, loss= 1.1126 (max= 1.5505), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:18:24,157 - root - INFO - Step 8880: lr=1.00E-05, loss= 1.1115 (max= 1.4671), tps=20641, 
mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:18:24,157 - root - INFO - Step 8880: lr=1.00E-05, loss= 1.1115 (max= 1.4671), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:18:24,157 - root - INFO - Step 8880: lr=1.00E-05, loss= 1.1115 (max= 1.4671), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:18:24,157 - root - INFO - Step 8880: lr=1.00E-05, loss= 1.1115 (max= 1.4671), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:18:24,157 - root - INFO - Step 8880: lr=1.00E-05, loss= 1.1115 (max= 1.4671), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:18:24,157 - root - INFO - Step 8880: lr=1.00E-05, loss= 1.1115 (max= 1.4671), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:18:24,157 - root - INFO - Step 8880: lr=1.00E-05, loss= 1.1115 (max= 1.4671), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:18:24,157 - root - INFO - Step 8880: lr=1.00E-05, loss= 1.1115 (max= 1.4671), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:18:56,040 - root - INFO - Step 8890: lr=1.00E-05, loss= 1.1025 (max= 1.5093), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:18:56,040 - root - INFO - Step 8890: lr=1.00E-05, loss= 1.1025 (max= 1.5093), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:18:56,041 - root - INFO - Step 8890: lr=1.00E-05, loss= 1.1025 (max= 1.5093), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:18:56,041 - root - INFO - Step 8890: lr=1.00E-05, 
loss= 1.1025 (max= 1.5093), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:18:56,041 - root - INFO - Step 8890: lr=1.00E-05, loss= 1.1025 (max= 1.5093), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:18:56,041 - root - INFO - Step 8890: lr=1.00E-05, loss= 1.1025 (max= 1.5093), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:18:56,041 - root - INFO - Step 8890: lr=1.00E-05, loss= 1.1025 (max= 1.5093), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:18:56,041 - root - INFO - Step 8890: lr=1.00E-05, loss= 1.1025 (max= 1.5093), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:19:27,841 - root - INFO - Step 8900: lr=1.00E-05, loss= 1.1174 (max= 1.5267), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:19:27,841 - root - INFO - Step 8900: lr=1.00E-05, loss= 1.1174 (max= 1.5267), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:19:27,841 - root - INFO - Step 8900: lr=1.00E-05, loss= 1.1174 (max= 1.5267), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:19:27,841 - root - INFO - Step 8900: lr=1.00E-05, loss= 1.1174 (max= 1.5267), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:19:27,841 - root - INFO - Step 8900: lr=1.00E-05, loss= 1.1174 (max= 1.5267), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:19:27,841 - root - INFO - Step 8900: lr=1.00E-05, loss= 1.1174 (max= 1.5267), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:19:27,841 - 
root - INFO - Step 8900: lr=1.00E-05, loss= 1.1174 (max= 1.5267), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:19:27,841 - root - INFO - Step 8900: lr=1.00E-05, loss= 1.1174 (max= 1.5267), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:19:59,694 - root - INFO - Step 8910: lr=1.00E-05, loss= 1.1278 (max= 1.5216), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:19:59,694 - root - INFO - Step 8910: lr=1.00E-05, loss= 1.1278 (max= 1.5216), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:19:59,694 - root - INFO - Step 8910: lr=1.00E-05, loss= 1.1278 (max= 1.5216), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:19:59,694 - root - INFO - Step 8910: lr=1.00E-05, loss= 1.1278 (max= 1.5216), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:19:59,694 - root - INFO - Step 8910: lr=1.00E-05, loss= 1.1278 (max= 1.5216), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:19:59,694 - root - INFO - Step 8910: lr=1.00E-05, loss= 1.1278 (max= 1.5216), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:19:59,694 - root - INFO - Step 8910: lr=1.00E-05, loss= 1.1278 (max= 1.5216), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:19:59,695 - root - INFO - Step 8910: lr=1.00E-05, loss= 1.1278 (max= 1.5216), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:20:31,530 - root - INFO - Step 8920: lr=1.00E-05, loss= 1.1068 (max= 1.7018), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 18:20:31,530 - root - INFO - Step 8920: lr=1.00E-05, loss= 1.1068 (max= 1.7018), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:20:31,530 - root - INFO - Step 8920: lr=1.00E-05, loss= 1.1068 (max= 1.7018), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:20:31,530 - root - INFO - Step 8920: lr=1.00E-05, loss= 1.1068 (max= 1.7018), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:20:31,530 - root - INFO - Step 8920: lr=1.00E-05, loss= 1.1068 (max= 1.7018), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:20:31,530 - root - INFO - Step 8920: lr=1.00E-05, loss= 1.1068 (max= 1.7018), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:20:31,530 - root - INFO - Step 8920: lr=1.00E-05, loss= 1.1068 (max= 1.7018), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:20:31,530 - root - INFO - Step 8920: lr=1.00E-05, loss= 1.1068 (max= 1.7018), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:21:03,370 - root - INFO - Step 8930: lr=1.00E-05, loss= 1.1132 (max= 1.5130), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:21:03,370 - root - INFO - Step 8930: lr=1.00E-05, loss= 1.1132 (max= 1.5130), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:21:03,370 - root - INFO - Step 8930: lr=1.00E-05, loss= 1.1132 (max= 1.5130), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:21:03,370 - root - INFO - Step 8930: lr=1.00E-05, loss= 1.1132 (max= 1.5130), tps=20585, mfu=42.89%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:21:03,371 - root - INFO - Step 8930: lr=1.00E-05, loss= 1.1132 (max= 1.5130), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:21:03,371 - root - INFO - Step 8930: lr=1.00E-05, loss= 1.1132 (max= 1.5130), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:21:03,371 - root - INFO - Step 8930: lr=1.00E-05, loss= 1.1132 (max= 1.5130), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:21:03,371 - root - INFO - Step 8930: lr=1.00E-05, loss= 1.1132 (max= 1.5130), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:21:35,196 - root - INFO - Step 8940: lr=1.00E-05, loss= 1.1236 (max= 1.5829), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:21:35,196 - root - INFO - Step 8940: lr=1.00E-05, loss= 1.1236 (max= 1.5829), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:21:35,196 - root - INFO - Step 8940: lr=1.00E-05, loss= 1.1236 (max= 1.5829), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:21:35,196 - root - INFO - Step 8940: lr=1.00E-05, loss= 1.1236 (max= 1.5829), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:21:35,196 - root - INFO - Step 8940: lr=1.00E-05, loss= 1.1236 (max= 1.5829), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:21:35,196 - root - INFO - Step 8940: lr=1.00E-05, loss= 1.1236 (max= 1.5829), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:21:35,196 - root - INFO - Step 8940: lr=1.00E-05, loss= 1.1236 (max= 
1.5829), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:21:35,196 - root - INFO - Step 8940: lr=1.00E-05, loss= 1.1236 (max= 1.5829), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:22:07,034 - root - INFO - Step 8950: lr=1.00E-05, loss= 1.1107 (max= 1.4522), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:22:07,034 - root - INFO - Step 8950: lr=1.00E-05, loss= 1.1107 (max= 1.4522), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:22:07,034 - root - INFO - Step 8950: lr=1.00E-05, loss= 1.1107 (max= 1.4522), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:22:07,034 - root - INFO - Step 8950: lr=1.00E-05, loss= 1.1107 (max= 1.4522), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:22:07,034 - root - INFO - Step 8950: lr=1.00E-05, loss= 1.1107 (max= 1.4522), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:22:07,034 - root - INFO - Step 8950: lr=1.00E-05, loss= 1.1107 (max= 1.4522), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:22:07,034 - root - INFO - Step 8950: lr=1.00E-05, loss= 1.1107 (max= 1.4522), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:22:07,034 - root - INFO - Step 8950: lr=1.00E-05, loss= 1.1107 (max= 1.4522), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:22:38,910 - root - INFO - Step 8960: lr=1.00E-05, loss= 1.1115 (max= 1.7652), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:22:38,910 - root - INFO - Step 
8960: lr=1.00E-05, loss= 1.1115 (max= 1.7652), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:22:38,910 - root - INFO - Step 8960: lr=1.00E-05, loss= 1.1115 (max= 1.7652), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:22:38,910 - root - INFO - Step 8960: lr=1.00E-05, loss= 1.1115 (max= 1.7652), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:22:38,910 - root - INFO - Step 8960: lr=1.00E-05, loss= 1.1115 (max= 1.7652), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:22:38,910 - root - INFO - Step 8960: lr=1.00E-05, loss= 1.1115 (max= 1.7652), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:22:38,910 - root - INFO - Step 8960: lr=1.00E-05, loss= 1.1115 (max= 1.7652), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:22:38,910 - root - INFO - Step 8960: lr=1.00E-05, loss= 1.1115 (max= 1.7652), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:23:10,702 - root - INFO - Step 8970: lr=1.00E-05, loss= 1.1323 (max= 1.5028), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:23:10,702 - root - INFO - Step 8970: lr=1.00E-05, loss= 1.1323 (max= 1.5028), tps=20617, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:23:10,702 - root - INFO - Step 8970: lr=1.00E-05, loss= 1.1323 (max= 1.5028), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:23:10,702 - root - INFO - Step 8970: lr=1.00E-05, loss= 1.1323 (max= 1.5028), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 18:23:10,702 - root - INFO - Step 8970: lr=1.00E-05, loss= 1.1323 (max= 1.5028), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:23:10,702 - root - INFO - Step 8970: lr=1.00E-05, loss= 1.1323 (max= 1.5028), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:23:10,702 - root - INFO - Step 8970: lr=1.00E-05, loss= 1.1323 (max= 1.5028), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:23:10,702 - root - INFO - Step 8970: lr=1.00E-05, loss= 1.1323 (max= 1.5028), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:23:42,522 - root - INFO - Step 8980: lr=1.00E-05, loss= 1.1091 (max= 1.4717), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:23:42,522 - root - INFO - Step 8980: lr=1.00E-05, loss= 1.1091 (max= 1.4717), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:23:42,522 - root - INFO - Step 8980: lr=1.00E-05, loss= 1.1091 (max= 1.4717), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:23:42,522 - root - INFO - Step 8980: lr=1.00E-05, loss= 1.1091 (max= 1.4717), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:23:42,522 - root - INFO - Step 8980: lr=1.00E-05, loss= 1.1091 (max= 1.4717), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:23:42,522 - root - INFO - Step 8980: lr=1.00E-05, loss= 1.1091 (max= 1.4717), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:23:42,522 - root - INFO - Step 8980: lr=1.00E-05, loss= 1.1091 (max= 1.4717), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:23:42,522 - root - INFO - Step 8980: lr=1.00E-05, loss= 1.1091 (max= 1.4717), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:24:14,384 - root - INFO - Step 8990: lr=1.00E-05, loss= 1.1258 (max= 1.6200), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:24:14,385 - root - INFO - Step 8990: lr=1.00E-05, loss= 1.1258 (max= 1.6200), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:24:14,385 - root - INFO - Step 8990: lr=1.00E-05, loss= 1.1258 (max= 1.6200), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:24:14,385 - root - INFO - Step 8990: lr=1.00E-05, loss= 1.1258 (max= 1.6200), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:24:14,385 - root - INFO - Step 8990: lr=1.00E-05, loss= 1.1258 (max= 1.6200), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:24:14,385 - root - INFO - Step 8990: lr=1.00E-05, loss= 1.1258 (max= 1.6200), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:24:14,385 - root - INFO - Step 8990: lr=1.00E-05, loss= 1.1258 (max= 1.6200), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:24:14,385 - root - INFO - Step 8990: lr=1.00E-05, loss= 1.1258 (max= 1.6200), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-pt/checkpoints/dataloader/step-9000 +Dataset successfully saved to jobs/munin-7b-open-pt/checkpoints/dataloader/step-9000! 
Save time: 4.531578302383423 +2025-10-25 18:24:46,304 - root - INFO - Step 9000: lr=1.00E-05, loss= 1.1276 (max= 1.5590), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:24:46,304 - root - INFO - Step 9000: lr=1.00E-05, loss= 1.1276 (max= 1.5590), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:24:46,304 - root - INFO - Saving a full checkpoint at step 9000 +2025-10-25 18:24:46,304 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 18:24:46,305 - root - INFO - Saving a full checkpoint at step 9000 +2025-10-25 18:24:46,305 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 18:24:46,304 - root - INFO - Step 9000: lr=1.00E-05, loss= 1.1276 (max= 1.5590), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:24:46,305 - root - INFO - Step 9000: lr=1.00E-05, loss= 1.1276 (max= 1.5590), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:24:46,305 - root - INFO - Saving a full checkpoint at step 9000 +2025-10-25 18:24:46,305 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 18:24:46,305 - root - INFO - Step 9000: lr=1.00E-05, loss= 1.1276 (max= 1.5590), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:24:46,305 - root - INFO - Saving a full checkpoint at step 9000 +2025-10-25 18:24:46,305 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 18:24:46,305 - root - INFO - Step 9000: lr=1.00E-05, loss= 1.1276 (max= 1.5590), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:24:46,305 - root - INFO - Saving a full checkpoint at step 9000 +2025-10-25 18:24:46,305 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 18:24:46,305 - root - INFO - Saving a full checkpoint at step 9000 +2025-10-25 18:24:46,305 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 18:24:46,305 - root - INFO - Step 9000: lr=1.00E-05, loss= 1.1276 (max= 1.5590), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:24:46,305 - root - INFO - Step 9000: lr=1.00E-05, loss= 1.1276 (max= 1.5590), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:24:46,305 - root - INFO - Saving a full checkpoint at step 9000 +2025-10-25 18:24:46,305 - root - INFO - Saving a full checkpoint at step 9000 +2025-10-25 18:24:46,305 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 18:24:46,305 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 18:25:01,692 - root - INFO - Finished saving the checkpoint in 15.39 seconds +2025-10-25 18:25:01,700 - root - INFO - Finished saving the checkpoint in 15.40 seconds +2025-10-25 18:25:01,700 - root - INFO - Finished saving the checkpoint in 15.40 seconds +2025-10-25 18:25:01,700 - root - INFO - Finished saving the checkpoint in 15.40 seconds +2025-10-25 18:25:01,700 - root - INFO - Finished saving the checkpoint in 15.40 seconds +2025-10-25 18:25:01,700 - root - INFO - Finished saving the checkpoint in 15.40 seconds +2025-10-25 18:25:01,701 - root - INFO - Finished saving the checkpoint in 15.40 seconds +2025-10-25 18:25:01,704 - root - INFO - Finished saving the checkpoint in 
15.40 seconds +2025-10-25 18:25:33,530 - root - INFO - Step 9010: lr=1.00E-05, loss= 1.1292 (max= 1.4947), tps=13878, mfu=28.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:25:33,531 - root - INFO - Step 9010: lr=1.00E-05, loss= 1.1292 (max= 1.4947), tps=13878, mfu=28.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:25:33,531 - root - INFO - Step 9010: lr=1.00E-05, loss= 1.1292 (max= 1.4947), tps=13878, mfu=28.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:25:33,531 - root - INFO - Step 9010: lr=1.00E-05, loss= 1.1292 (max= 1.4947), tps=13878, mfu=28.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:25:33,531 - root - INFO - Step 9010: lr=1.00E-05, loss= 1.1292 (max= 1.4947), tps=13878, mfu=28.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:25:33,531 - root - INFO - Step 9010: lr=1.00E-05, loss= 1.1292 (max= 1.4947), tps=13878, mfu=28.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:25:33,531 - root - INFO - Step 9010: lr=1.00E-05, loss= 1.1292 (max= 1.4947), tps=13878, mfu=28.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:25:33,531 - root - INFO - Step 9010: lr=1.00E-05, loss= 1.1292 (max= 1.4947), tps=13878, mfu=28.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:26:05,394 - root - INFO - Step 9020: lr=1.00E-05, loss= 1.1301 (max= 1.4989), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:26:05,394 - root - INFO - Step 9020: lr=1.00E-05, loss= 1.1301 (max= 1.4989), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:26:05,394 - root - INFO - Step 9020: lr=1.00E-05, loss= 1.1301 (max= 1.4989), tps=20570, mfu=42.86%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:26:05,394 - root - INFO - Step 9020: lr=1.00E-05, loss= 1.1301 (max= 1.4989), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:26:05,394 - root - INFO - Step 9020: lr=1.00E-05, loss= 1.1301 (max= 1.4989), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:26:05,394 - root - INFO - Step 9020: lr=1.00E-05, loss= 1.1301 (max= 1.4989), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:26:05,394 - root - INFO - Step 9020: lr=1.00E-05, loss= 1.1301 (max= 1.4989), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:26:05,394 - root - INFO - Step 9020: lr=1.00E-05, loss= 1.1301 (max= 1.4989), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:26:37,224 - root - INFO - Step 9030: lr=1.00E-05, loss= 1.1239 (max= 1.6366), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:26:37,224 - root - INFO - Step 9030: lr=1.00E-05, loss= 1.1239 (max= 1.6366), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:26:37,224 - root - INFO - Step 9030: lr=1.00E-05, loss= 1.1239 (max= 1.6366), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:26:37,225 - root - INFO - Step 9030: lr=1.00E-05, loss= 1.1239 (max= 1.6366), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:26:37,225 - root - INFO - Step 9030: lr=1.00E-05, loss= 1.1239 (max= 1.6366), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:26:37,225 - root - INFO - Step 9030: lr=1.00E-05, loss= 1.1239 (max= 
1.6366), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:26:37,225 - root - INFO - Step 9030: lr=1.00E-05, loss= 1.1239 (max= 1.6366), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:26:37,225 - root - INFO - Step 9030: lr=1.00E-05, loss= 1.1239 (max= 1.6366), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:27:09,096 - root - INFO - Step 9040: lr=1.00E-05, loss= 1.1200 (max= 1.5783), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:27:09,096 - root - INFO - Step 9040: lr=1.00E-05, loss= 1.1200 (max= 1.5783), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:27:09,096 - root - INFO - Step 9040: lr=1.00E-05, loss= 1.1200 (max= 1.5783), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:27:09,096 - root - INFO - Step 9040: lr=1.00E-05, loss= 1.1200 (max= 1.5783), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:27:09,096 - root - INFO - Step 9040: lr=1.00E-05, loss= 1.1200 (max= 1.5783), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:27:09,096 - root - INFO - Step 9040: lr=1.00E-05, loss= 1.1200 (max= 1.5783), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:27:09,096 - root - INFO - Step 9040: lr=1.00E-05, loss= 1.1200 (max= 1.5783), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:27:09,096 - root - INFO - Step 9040: lr=1.00E-05, loss= 1.1200 (max= 1.5783), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:27:40,940 - root - INFO - Step 
9050: lr=1.00E-05, loss= 1.1407 (max= 1.5364), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:27:40,940 - root - INFO - Step 9050: lr=1.00E-05, loss= 1.1407 (max= 1.5364), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:27:40,940 - root - INFO - Step 9050: lr=1.00E-05, loss= 1.1407 (max= 1.5364), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:27:40,940 - root - INFO - Step 9050: lr=1.00E-05, loss= 1.1407 (max= 1.5364), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:27:40,940 - root - INFO - Step 9050: lr=1.00E-05, loss= 1.1407 (max= 1.5364), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:27:40,940 - root - INFO - Step 9050: lr=1.00E-05, loss= 1.1407 (max= 1.5364), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:27:40,940 - root - INFO - Step 9050: lr=1.00E-05, loss= 1.1407 (max= 1.5364), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:27:40,940 - root - INFO - Step 9050: lr=1.00E-05, loss= 1.1407 (max= 1.5364), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:28:12,777 - root - INFO - Step 9060: lr=1.00E-05, loss= 1.1092 (max= 1.7184), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:28:12,777 - root - INFO - Step 9060: lr=1.00E-05, loss= 1.1092 (max= 1.7184), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:28:12,777 - root - INFO - Step 9060: lr=1.00E-05, loss= 1.1092 (max= 1.7184), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 18:28:12,777 - root - INFO - Step 9060: lr=1.00E-05, loss= 1.1092 (max= 1.7184), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:28:12,777 - root - INFO - Step 9060: lr=1.00E-05, loss= 1.1092 (max= 1.7184), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:28:12,777 - root - INFO - Step 9060: lr=1.00E-05, loss= 1.1092 (max= 1.7184), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:28:12,777 - root - INFO - Step 9060: lr=1.00E-05, loss= 1.1092 (max= 1.7184), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:28:12,777 - root - INFO - Step 9060: lr=1.00E-05, loss= 1.1092 (max= 1.7184), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:28:44,678 - root - INFO - Step 9070: lr=1.00E-05, loss= 1.1137 (max= 1.5354), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:28:44,678 - root - INFO - Step 9070: lr=1.00E-05, loss= 1.1137 (max= 1.5354), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:28:44,678 - root - INFO - Step 9070: lr=1.00E-05, loss= 1.1137 (max= 1.5354), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:28:44,678 - root - INFO - Step 9070: lr=1.00E-05, loss= 1.1137 (max= 1.5354), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:28:44,678 - root - INFO - Step 9070: lr=1.00E-05, loss= 1.1137 (max= 1.5354), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:28:44,678 - root - INFO - Step 9070: lr=1.00E-05, loss= 1.1137 (max= 1.5354), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:28:44,678 - root - INFO - Step 9070: lr=1.00E-05, loss= 1.1137 (max= 1.5354), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:28:44,678 - root - INFO - Step 9070: lr=1.00E-05, loss= 1.1137 (max= 1.5354), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:29:16,616 - root - INFO - Step 9080: lr=1.00E-05, loss= 1.1126 (max= 1.6603), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:29:16,616 - root - INFO - Step 9080: lr=1.00E-05, loss= 1.1126 (max= 1.6603), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:29:16,616 - root - INFO - Step 9080: lr=1.00E-05, loss= 1.1126 (max= 1.6603), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:29:16,616 - root - INFO - Step 9080: lr=1.00E-05, loss= 1.1126 (max= 1.6603), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:29:16,616 - root - INFO - Step 9080: lr=1.00E-05, loss= 1.1126 (max= 1.6603), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:29:16,616 - root - INFO - Step 9080: lr=1.00E-05, loss= 1.1126 (max= 1.6603), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:29:16,616 - root - INFO - Step 9080: lr=1.00E-05, loss= 1.1126 (max= 1.6603), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:29:16,617 - root - INFO - Step 9080: lr=1.00E-05, loss= 1.1126 (max= 1.6603), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:29:48,480 - root - INFO - Step 9090: lr=1.00E-05, loss= 1.1143 (max= 1.5416), tps=20570, 
mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:29:48,480 - root - INFO - Step 9090: lr=1.00E-05, loss= 1.1143 (max= 1.5416), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:29:48,480 - root - INFO - Step 9090: lr=1.00E-05, loss= 1.1143 (max= 1.5416), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:29:48,480 - root - INFO - Step 9090: lr=1.00E-05, loss= 1.1143 (max= 1.5416), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:29:48,480 - root - INFO - Step 9090: lr=1.00E-05, loss= 1.1143 (max= 1.5416), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:29:48,480 - root - INFO - Step 9090: lr=1.00E-05, loss= 1.1143 (max= 1.5416), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:29:48,480 - root - INFO - Step 9090: lr=1.00E-05, loss= 1.1143 (max= 1.5416), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:29:48,480 - root - INFO - Step 9090: lr=1.00E-05, loss= 1.1143 (max= 1.5416), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:30:20,280 - root - INFO - Step 9100: lr=1.00E-05, loss= 1.1421 (max= 1.5849), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:30:20,280 - root - INFO - Step 9100: lr=1.00E-05, loss= 1.1421 (max= 1.5849), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:30:20,280 - root - INFO - Step 9100: lr=1.00E-05, loss= 1.1421 (max= 1.5849), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:30:20,280 - root - INFO - Step 9100: lr=1.00E-05, 
loss= 1.1421 (max= 1.5849), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:30:20,280 - root - INFO - Step 9100: lr=1.00E-05, loss= 1.1421 (max= 1.5849), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:30:20,280 - root - INFO - Step 9100: lr=1.00E-05, loss= 1.1421 (max= 1.5849), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:30:20,280 - root - INFO - Step 9100: lr=1.00E-05, loss= 1.1421 (max= 1.5849), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:30:20,280 - root - INFO - Step 9100: lr=1.00E-05, loss= 1.1421 (max= 1.5849), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:30:52,115 - root - INFO - Step 9110: lr=1.00E-05, loss= 1.1097 (max= 1.6481), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:30:52,115 - root - INFO - Step 9110: lr=1.00E-05, loss= 1.1097 (max= 1.6481), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:30:52,115 - root - INFO - Step 9110: lr=1.00E-05, loss= 1.1097 (max= 1.6481), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:30:52,116 - root - INFO - Step 9110: lr=1.00E-05, loss= 1.1097 (max= 1.6481), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:30:52,116 - root - INFO - Step 9110: lr=1.00E-05, loss= 1.1097 (max= 1.6481), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:30:52,116 - root - INFO - Step 9110: lr=1.00E-05, loss= 1.1097 (max= 1.6481), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:30:52,116 - 
root - INFO - Step 9110: lr=1.00E-05, loss= 1.1097 (max= 1.6481), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:30:52,116 - root - INFO - Step 9110: lr=1.00E-05, loss= 1.1097 (max= 1.6481), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:31:23,978 - root - INFO - Step 9120: lr=1.00E-05, loss= 1.1160 (max= 1.6270), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:31:23,978 - root - INFO - Step 9120: lr=1.00E-05, loss= 1.1160 (max= 1.6270), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:31:23,978 - root - INFO - Step 9120: lr=1.00E-05, loss= 1.1160 (max= 1.6270), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:31:23,978 - root - INFO - Step 9120: lr=1.00E-05, loss= 1.1160 (max= 1.6270), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:31:23,978 - root - INFO - Step 9120: lr=1.00E-05, loss= 1.1160 (max= 1.6270), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:31:23,978 - root - INFO - Step 9120: lr=1.00E-05, loss= 1.1160 (max= 1.6270), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:31:23,978 - root - INFO - Step 9120: lr=1.00E-05, loss= 1.1160 (max= 1.6270), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:31:23,978 - root - INFO - Step 9120: lr=1.00E-05, loss= 1.1160 (max= 1.6270), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:31:55,785 - root - INFO - Step 9130: lr=1.00E-05, loss= 1.1105 (max= 1.5198), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 18:31:55,785 - root - INFO - Step 9130: lr=1.00E-05, loss= 1.1105 (max= 1.5198), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:31:55,785 - root - INFO - Step 9130: lr=1.00E-05, loss= 1.1105 (max= 1.5198), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:31:55,785 - root - INFO - Step 9130: lr=1.00E-05, loss= 1.1105 (max= 1.5198), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:31:55,785 - root - INFO - Step 9130: lr=1.00E-05, loss= 1.1105 (max= 1.5198), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:31:55,785 - root - INFO - Step 9130: lr=1.00E-05, loss= 1.1105 (max= 1.5198), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:31:55,785 - root - INFO - Step 9130: lr=1.00E-05, loss= 1.1105 (max= 1.5198), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:31:55,785 - root - INFO - Step 9130: lr=1.00E-05, loss= 1.1105 (max= 1.5198), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:32:27,696 - root - INFO - Step 9140: lr=1.00E-05, loss= 1.1230 (max= 1.6393), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:32:27,696 - root - INFO - Step 9140: lr=1.00E-05, loss= 1.1230 (max= 1.6393), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:32:27,696 - root - INFO - Step 9140: lr=1.00E-05, loss= 1.1230 (max= 1.6393), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:32:27,696 - root - INFO - Step 9140: lr=1.00E-05, loss= 1.1230 (max= 1.6393), tps=20539, mfu=42.79%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:32:27,696 - root - INFO - Step 9140: lr=1.00E-05, loss= 1.1230 (max= 1.6393), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:32:27,696 - root - INFO - Step 9140: lr=1.00E-05, loss= 1.1230 (max= 1.6393), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:32:27,696 - root - INFO - Step 9140: lr=1.00E-05, loss= 1.1230 (max= 1.6393), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:32:27,696 - root - INFO - Step 9140: lr=1.00E-05, loss= 1.1230 (max= 1.6393), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:32:59,589 - root - INFO - Step 9150: lr=1.00E-05, loss= 1.1086 (max= 1.4982), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:32:59,589 - root - INFO - Step 9150: lr=1.00E-05, loss= 1.1086 (max= 1.4982), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:32:59,589 - root - INFO - Step 9150: lr=1.00E-05, loss= 1.1086 (max= 1.4982), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:32:59,589 - root - INFO - Step 9150: lr=1.00E-05, loss= 1.1086 (max= 1.4982), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:32:59,589 - root - INFO - Step 9150: lr=1.00E-05, loss= 1.1086 (max= 1.4982), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:32:59,589 - root - INFO - Step 9150: lr=1.00E-05, loss= 1.1086 (max= 1.4982), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:32:59,589 - root - INFO - Step 9150: lr=1.00E-05, loss= 1.1086 (max= 
1.4982), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:32:59,589 - root - INFO - Step 9150: lr=1.00E-05, loss= 1.1086 (max= 1.4982), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:33:31,514 - root - INFO - Step 9160: lr=1.00E-05, loss= 1.1298 (max= 1.6441), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:33:31,514 - root - INFO - Step 9160: lr=1.00E-05, loss= 1.1298 (max= 1.6441), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:33:31,514 - root - INFO - Step 9160: lr=1.00E-05, loss= 1.1298 (max= 1.6441), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:33:31,515 - root - INFO - Step 9160: lr=1.00E-05, loss= 1.1298 (max= 1.6441), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:33:31,515 - root - INFO - Step 9160: lr=1.00E-05, loss= 1.1298 (max= 1.6441), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:33:31,515 - root - INFO - Step 9160: lr=1.00E-05, loss= 1.1298 (max= 1.6441), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:33:31,515 - root - INFO - Step 9160: lr=1.00E-05, loss= 1.1298 (max= 1.6441), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:33:31,515 - root - INFO - Step 9160: lr=1.00E-05, loss= 1.1298 (max= 1.6441), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:34:03,465 - root - INFO - Step 9170: lr=1.00E-05, loss= 1.1385 (max= 1.6527), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:34:03,465 - root - INFO - Step 
9170: lr=1.00E-05, loss= 1.1385 (max= 1.6527), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:34:03,465 - root - INFO - Step 9170: lr=1.00E-05, loss= 1.1385 (max= 1.6527), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:34:03,465 - root - INFO - Step 9170: lr=1.00E-05, loss= 1.1385 (max= 1.6527), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:34:03,465 - root - INFO - Step 9170: lr=1.00E-05, loss= 1.1385 (max= 1.6527), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:34:03,465 - root - INFO - Step 9170: lr=1.00E-05, loss= 1.1385 (max= 1.6527), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:34:03,465 - root - INFO - Step 9170: lr=1.00E-05, loss= 1.1385 (max= 1.6527), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:34:03,465 - root - INFO - Step 9170: lr=1.00E-05, loss= 1.1385 (max= 1.6527), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:34:35,393 - root - INFO - Step 9180: lr=1.00E-05, loss= 1.1077 (max= 1.5270), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:34:35,393 - root - INFO - Step 9180: lr=1.00E-05, loss= 1.1077 (max= 1.5270), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:34:35,393 - root - INFO - Step 9180: lr=1.00E-05, loss= 1.1077 (max= 1.5270), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:34:35,393 - root - INFO - Step 9180: lr=1.00E-05, loss= 1.1077 (max= 1.5270), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 18:34:35,393 - root - INFO - Step 9180: lr=1.00E-05, loss= 1.1077 (max= 1.5270), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:34:35,393 - root - INFO - Step 9180: lr=1.00E-05, loss= 1.1077 (max= 1.5270), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:34:35,393 - root - INFO - Step 9180: lr=1.00E-05, loss= 1.1077 (max= 1.5270), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:34:35,393 - root - INFO - Step 9180: lr=1.00E-05, loss= 1.1077 (max= 1.5270), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:35:07,227 - root - INFO - Step 9190: lr=1.00E-05, loss= 1.1168 (max= 1.5358), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:35:07,227 - root - INFO - Step 9190: lr=1.00E-05, loss= 1.1168 (max= 1.5358), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:35:07,227 - root - INFO - Step 9190: lr=1.00E-05, loss= 1.1168 (max= 1.5358), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:35:07,227 - root - INFO - Step 9190: lr=1.00E-05, loss= 1.1168 (max= 1.5358), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:35:07,227 - root - INFO - Step 9190: lr=1.00E-05, loss= 1.1168 (max= 1.5358), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:35:07,227 - root - INFO - Step 9190: lr=1.00E-05, loss= 1.1168 (max= 1.5358), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:35:07,227 - root - INFO - Step 9190: lr=1.00E-05, loss= 1.1168 (max= 1.5358), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:35:07,227 - root - INFO - Step 9190: lr=1.00E-05, loss= 1.1168 (max= 1.5358), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:35:39,120 - root - INFO - Step 9200: lr=1.00E-05, loss= 1.1297 (max= 1.5494), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:35:39,120 - root - INFO - Step 9200: lr=1.00E-05, loss= 1.1297 (max= 1.5494), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:35:39,120 - root - INFO - Step 9200: lr=1.00E-05, loss= 1.1297 (max= 1.5494), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:35:39,120 - root - INFO - Step 9200: lr=1.00E-05, loss= 1.1297 (max= 1.5494), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:35:39,120 - root - INFO - Step 9200: lr=1.00E-05, loss= 1.1297 (max= 1.5494), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:35:39,120 - root - INFO - Step 9200: lr=1.00E-05, loss= 1.1297 (max= 1.5494), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:35:39,120 - root - INFO - Step 9200: lr=1.00E-05, loss= 1.1297 (max= 1.5494), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:35:39,121 - root - INFO - Step 9200: lr=1.00E-05, loss= 1.1297 (max= 1.5494), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:36:10,948 - root - INFO - Step 9210: lr=1.00E-05, loss= 1.1062 (max= 1.5221), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:36:10,948 - root - INFO - Step 9210: lr=1.00E-05, loss= 1.1062 (max= 1.5221), tps=20593, 
mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:36:10,948 - root - INFO - Step 9210: lr=1.00E-05, loss= 1.1062 (max= 1.5221), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:36:10,948 - root - INFO - Step 9210: lr=1.00E-05, loss= 1.1062 (max= 1.5221), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:36:10,948 - root - INFO - Step 9210: lr=1.00E-05, loss= 1.1062 (max= 1.5221), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:36:10,948 - root - INFO - Step 9210: lr=1.00E-05, loss= 1.1062 (max= 1.5221), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:36:10,949 - root - INFO - Step 9210: lr=1.00E-05, loss= 1.1062 (max= 1.5221), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:36:10,949 - root - INFO - Step 9210: lr=1.00E-05, loss= 1.1062 (max= 1.5221), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:36:42,748 - root - INFO - Step 9220: lr=1.00E-05, loss= 1.1106 (max= 1.5579), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:36:42,748 - root - INFO - Step 9220: lr=1.00E-05, loss= 1.1106 (max= 1.5579), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:36:42,748 - root - INFO - Step 9220: lr=1.00E-05, loss= 1.1106 (max= 1.5579), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:36:42,749 - root - INFO - Step 9220: lr=1.00E-05, loss= 1.1106 (max= 1.5579), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:36:42,749 - root - INFO - Step 9220: lr=1.00E-05, 
loss= 1.1106 (max= 1.5579), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:36:42,749 - root - INFO - Step 9220: lr=1.00E-05, loss= 1.1106 (max= 1.5579), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:36:42,749 - root - INFO - Step 9220: lr=1.00E-05, loss= 1.1106 (max= 1.5579), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:36:42,749 - root - INFO - Step 9220: lr=1.00E-05, loss= 1.1106 (max= 1.5579), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:37:14,614 - root - INFO - Step 9230: lr=1.00E-05, loss= 1.1222 (max= 1.5302), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:37:14,614 - root - INFO - Step 9230: lr=1.00E-05, loss= 1.1222 (max= 1.5302), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:37:14,614 - root - INFO - Step 9230: lr=1.00E-05, loss= 1.1222 (max= 1.5302), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:37:14,614 - root - INFO - Step 9230: lr=1.00E-05, loss= 1.1222 (max= 1.5302), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:37:14,614 - root - INFO - Step 9230: lr=1.00E-05, loss= 1.1222 (max= 1.5302), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:37:14,614 - root - INFO - Step 9230: lr=1.00E-05, loss= 1.1222 (max= 1.5302), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:37:14,614 - root - INFO - Step 9230: lr=1.00E-05, loss= 1.1222 (max= 1.5302), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:37:14,614 - 
root - INFO - Step 9230: lr=1.00E-05, loss= 1.1222 (max= 1.5302), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:37:46,526 - root - INFO - Step 9240: lr=1.00E-05, loss= 1.1322 (max= 1.5119), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:37:46,526 - root - INFO - Step 9240: lr=1.00E-05, loss= 1.1322 (max= 1.5119), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:37:46,526 - root - INFO - Step 9240: lr=1.00E-05, loss= 1.1322 (max= 1.5119), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:37:46,526 - root - INFO - Step 9240: lr=1.00E-05, loss= 1.1322 (max= 1.5119), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:37:46,526 - root - INFO - Step 9240: lr=1.00E-05, loss= 1.1322 (max= 1.5119), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:37:46,526 - root - INFO - Step 9240: lr=1.00E-05, loss= 1.1322 (max= 1.5119), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:37:46,526 - root - INFO - Step 9240: lr=1.00E-05, loss= 1.1322 (max= 1.5119), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:37:46,526 - root - INFO - Step 9240: lr=1.00E-05, loss= 1.1322 (max= 1.5119), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:38:18,377 - root - INFO - Step 9250: lr=1.00E-05, loss= 1.1340 (max= 1.4748), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:38:18,377 - root - INFO - Step 9250: lr=1.00E-05, loss= 1.1340 (max= 1.4748), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 18:38:18,377 - root - INFO - Step 9250: lr=1.00E-05, loss= 1.1340 (max= 1.4748), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:38:18,377 - root - INFO - Step 9250: lr=1.00E-05, loss= 1.1340 (max= 1.4748), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:38:18,377 - root - INFO - Step 9250: lr=1.00E-05, loss= 1.1340 (max= 1.4748), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:38:18,377 - root - INFO - Step 9250: lr=1.00E-05, loss= 1.1340 (max= 1.4748), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:38:18,377 - root - INFO - Step 9250: lr=1.00E-05, loss= 1.1340 (max= 1.4748), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:38:18,377 - root - INFO - Step 9250: lr=1.00E-05, loss= 1.1340 (max= 1.4748), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:38:50,272 - root - INFO - Step 9260: lr=1.00E-05, loss= 1.1286 (max= 1.6218), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:38:50,272 - root - INFO - Step 9260: lr=1.00E-05, loss= 1.1286 (max= 1.6218), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:38:50,272 - root - INFO - Step 9260: lr=1.00E-05, loss= 1.1286 (max= 1.6218), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:38:50,272 - root - INFO - Step 9260: lr=1.00E-05, loss= 1.1286 (max= 1.6218), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:38:50,272 - root - INFO - Step 9260: lr=1.00E-05, loss= 1.1286 (max= 1.6218), tps=20550, mfu=42.82%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:38:50,272 - root - INFO - Step 9260: lr=1.00E-05, loss= 1.1286 (max= 1.6218), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:38:50,272 - root - INFO - Step 9260: lr=1.00E-05, loss= 1.1286 (max= 1.6218), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:38:50,272 - root - INFO - Step 9260: lr=1.00E-05, loss= 1.1286 (max= 1.6218), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:39:22,139 - root - INFO - Step 9270: lr=1.00E-05, loss= 1.1279 (max= 1.5247), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:39:22,139 - root - INFO - Step 9270: lr=1.00E-05, loss= 1.1279 (max= 1.5247), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:39:22,139 - root - INFO - Step 9270: lr=1.00E-05, loss= 1.1279 (max= 1.5247), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:39:22,139 - root - INFO - Step 9270: lr=1.00E-05, loss= 1.1279 (max= 1.5247), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:39:22,139 - root - INFO - Step 9270: lr=1.00E-05, loss= 1.1279 (max= 1.5247), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:39:22,139 - root - INFO - Step 9270: lr=1.00E-05, loss= 1.1279 (max= 1.5247), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:39:22,139 - root - INFO - Step 9270: lr=1.00E-05, loss= 1.1279 (max= 1.5247), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:39:22,139 - root - INFO - Step 9270: lr=1.00E-05, loss= 1.1279 (max= 
1.5247), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:39:54,049 - root - INFO - Step 9280: lr=1.00E-05, loss= 1.1076 (max= 1.5018), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:39:54,049 - root - INFO - Step 9280: lr=1.00E-05, loss= 1.1076 (max= 1.5018), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:39:54,049 - root - INFO - Step 9280: lr=1.00E-05, loss= 1.1076 (max= 1.5018), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:39:54,049 - root - INFO - Step 9280: lr=1.00E-05, loss= 1.1076 (max= 1.5018), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:39:54,049 - root - INFO - Step 9280: lr=1.00E-05, loss= 1.1076 (max= 1.5018), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:39:54,049 - root - INFO - Step 9280: lr=1.00E-05, loss= 1.1076 (max= 1.5018), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:39:54,049 - root - INFO - Step 9280: lr=1.00E-05, loss= 1.1076 (max= 1.5018), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:39:54,049 - root - INFO - Step 9280: lr=1.00E-05, loss= 1.1076 (max= 1.5018), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:40:25,947 - root - INFO - Step 9290: lr=1.00E-05, loss= 1.1310 (max= 1.5383), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:40:25,947 - root - INFO - Step 9290: lr=1.00E-05, loss= 1.1310 (max= 1.5383), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:40:25,948 - root - INFO - Step 
9290: lr=1.00E-05, loss= 1.1310 (max= 1.5383), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:40:25,948 - root - INFO - Step 9290: lr=1.00E-05, loss= 1.1310 (max= 1.5383), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:40:25,948 - root - INFO - Step 9290: lr=1.00E-05, loss= 1.1310 (max= 1.5383), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:40:25,948 - root - INFO - Step 9290: lr=1.00E-05, loss= 1.1310 (max= 1.5383), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:40:25,948 - root - INFO - Step 9290: lr=1.00E-05, loss= 1.1310 (max= 1.5383), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:40:25,948 - root - INFO - Step 9290: lr=1.00E-05, loss= 1.1310 (max= 1.5383), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:40:57,812 - root - INFO - Step 9300: lr=1.00E-05, loss= 1.1292 (max= 1.7575), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:40:57,812 - root - INFO - Step 9300: lr=1.00E-05, loss= 1.1292 (max= 1.7575), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:40:57,812 - root - INFO - Step 9300: lr=1.00E-05, loss= 1.1292 (max= 1.7575), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:40:57,813 - root - INFO - Step 9300: lr=1.00E-05, loss= 1.1292 (max= 1.7575), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:40:57,813 - root - INFO - Step 9300: lr=1.00E-05, loss= 1.1292 (max= 1.7575), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 18:40:57,813 - root - INFO - Step 9300: lr=1.00E-05, loss= 1.1292 (max= 1.7575), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:40:57,813 - root - INFO - Step 9300: lr=1.00E-05, loss= 1.1292 (max= 1.7575), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:40:57,813 - root - INFO - Step 9300: lr=1.00E-05, loss= 1.1292 (max= 1.7575), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:41:29,707 - root - INFO - Step 9310: lr=1.00E-05, loss= 1.1728 (max= 2.0837), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:41:29,707 - root - INFO - Step 9310: lr=1.00E-05, loss= 1.1728 (max= 2.0837), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:41:29,707 - root - INFO - Step 9310: lr=1.00E-05, loss= 1.1728 (max= 2.0837), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:41:29,707 - root - INFO - Step 9310: lr=1.00E-05, loss= 1.1728 (max= 2.0837), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:41:29,707 - root - INFO - Step 9310: lr=1.00E-05, loss= 1.1728 (max= 2.0837), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:41:29,707 - root - INFO - Step 9310: lr=1.00E-05, loss= 1.1728 (max= 2.0837), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:41:29,707 - root - INFO - Step 9310: lr=1.00E-05, loss= 1.1728 (max= 2.0837), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:41:29,707 - root - INFO - Step 9310: lr=1.00E-05, loss= 1.1728 (max= 2.0837), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:42:01,541 - root - INFO - Step 9320: lr=1.00E-05, loss= 1.1454 (max= 1.5850), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:42:01,541 - root - INFO - Step 9320: lr=1.00E-05, loss= 1.1454 (max= 1.5850), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:42:01,541 - root - INFO - Step 9320: lr=1.00E-05, loss= 1.1454 (max= 1.5850), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:42:01,542 - root - INFO - Step 9320: lr=1.00E-05, loss= 1.1454 (max= 1.5850), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:42:01,542 - root - INFO - Step 9320: lr=1.00E-05, loss= 1.1454 (max= 1.5850), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:42:01,542 - root - INFO - Step 9320: lr=1.00E-05, loss= 1.1454 (max= 1.5850), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:42:01,542 - root - INFO - Step 9320: lr=1.00E-05, loss= 1.1454 (max= 1.5850), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:42:01,542 - root - INFO - Step 9320: lr=1.00E-05, loss= 1.1454 (max= 1.5850), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:42:33,384 - root - INFO - Step 9330: lr=1.00E-05, loss= 1.0991 (max= 1.6700), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:42:33,384 - root - INFO - Step 9330: lr=1.00E-05, loss= 1.0991 (max= 1.6700), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:42:33,384 - root - INFO - Step 9330: lr=1.00E-05, loss= 1.0991 (max= 1.6700), tps=20583, 
mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:42:33,384 - root - INFO - Step 9330: lr=1.00E-05, loss= 1.0991 (max= 1.6700), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:42:33,384 - root - INFO - Step 9330: lr=1.00E-05, loss= 1.0991 (max= 1.6700), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:42:33,384 - root - INFO - Step 9330: lr=1.00E-05, loss= 1.0991 (max= 1.6700), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:42:33,384 - root - INFO - Step 9330: lr=1.00E-05, loss= 1.0991 (max= 1.6700), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:42:33,384 - root - INFO - Step 9330: lr=1.00E-05, loss= 1.0991 (max= 1.6700), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:43:05,209 - root - INFO - Step 9340: lr=1.00E-05, loss= 1.1140 (max= 1.6165), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:43:05,210 - root - INFO - Step 9340: lr=1.00E-05, loss= 1.1140 (max= 1.6165), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:43:05,210 - root - INFO - Step 9340: lr=1.00E-05, loss= 1.1140 (max= 1.6165), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:43:05,210 - root - INFO - Step 9340: lr=1.00E-05, loss= 1.1140 (max= 1.6165), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:43:05,210 - root - INFO - Step 9340: lr=1.00E-05, loss= 1.1140 (max= 1.6165), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:43:05,210 - root - INFO - Step 9340: lr=1.00E-05, 
loss= 1.1140 (max= 1.6165), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:43:05,210 - root - INFO - Step 9340: lr=1.00E-05, loss= 1.1140 (max= 1.6165), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:43:05,210 - root - INFO - Step 9340: lr=1.00E-05, loss= 1.1140 (max= 1.6165), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:43:37,080 - root - INFO - Step 9350: lr=1.00E-05, loss= 1.1259 (max= 1.5831), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:43:37,080 - root - INFO - Step 9350: lr=1.00E-05, loss= 1.1259 (max= 1.5831), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:43:37,080 - root - INFO - Step 9350: lr=1.00E-05, loss= 1.1259 (max= 1.5831), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:43:37,080 - root - INFO - Step 9350: lr=1.00E-05, loss= 1.1259 (max= 1.5831), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:43:37,080 - root - INFO - Step 9350: lr=1.00E-05, loss= 1.1259 (max= 1.5831), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:43:37,080 - root - INFO - Step 9350: lr=1.00E-05, loss= 1.1259 (max= 1.5831), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:43:37,080 - root - INFO - Step 9350: lr=1.00E-05, loss= 1.1259 (max= 1.5831), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:43:37,080 - root - INFO - Step 9350: lr=1.00E-05, loss= 1.1259 (max= 1.5831), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:44:09,007 - 
root - INFO - Step 9360: lr=1.00E-05, loss= 1.1287 (max= 1.6964), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:44:09,007 - root - INFO - Step 9360: lr=1.00E-05, loss= 1.1287 (max= 1.6964), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:44:09,007 - root - INFO - Step 9360: lr=1.00E-05, loss= 1.1287 (max= 1.6964), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:44:09,007 - root - INFO - Step 9360: lr=1.00E-05, loss= 1.1287 (max= 1.6964), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:44:09,007 - root - INFO - Step 9360: lr=1.00E-05, loss= 1.1287 (max= 1.6964), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:44:09,007 - root - INFO - Step 9360: lr=1.00E-05, loss= 1.1287 (max= 1.6964), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:44:09,007 - root - INFO - Step 9360: lr=1.00E-05, loss= 1.1287 (max= 1.6964), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:44:09,007 - root - INFO - Step 9360: lr=1.00E-05, loss= 1.1287 (max= 1.6964), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:44:40,842 - root - INFO - Step 9370: lr=1.00E-05, loss= 1.1281 (max= 1.5610), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:44:40,842 - root - INFO - Step 9370: lr=1.00E-05, loss= 1.1281 (max= 1.5610), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:44:40,842 - root - INFO - Step 9370: lr=1.00E-05, loss= 1.1281 (max= 1.5610), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 18:44:40,843 - root - INFO - Step 9370: lr=1.00E-05, loss= 1.1281 (max= 1.5610), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:44:40,843 - root - INFO - Step 9370: lr=1.00E-05, loss= 1.1281 (max= 1.5610), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:44:40,843 - root - INFO - Step 9370: lr=1.00E-05, loss= 1.1281 (max= 1.5610), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:44:40,843 - root - INFO - Step 9370: lr=1.00E-05, loss= 1.1281 (max= 1.5610), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:44:40,843 - root - INFO - Step 9370: lr=1.00E-05, loss= 1.1281 (max= 1.5610), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:45:12,741 - root - INFO - Step 9380: lr=1.00E-05, loss= 1.1158 (max= 1.6576), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:45:12,741 - root - INFO - Step 9380: lr=1.00E-05, loss= 1.1158 (max= 1.6576), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:45:12,741 - root - INFO - Step 9380: lr=1.00E-05, loss= 1.1158 (max= 1.6576), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:45:12,741 - root - INFO - Step 9380: lr=1.00E-05, loss= 1.1158 (max= 1.6576), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:45:12,741 - root - INFO - Step 9380: lr=1.00E-05, loss= 1.1158 (max= 1.6576), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:45:12,741 - root - INFO - Step 9380: lr=1.00E-05, loss= 1.1158 (max= 1.6576), tps=20548, mfu=42.81%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:45:12,741 - root - INFO - Step 9380: lr=1.00E-05, loss= 1.1158 (max= 1.6576), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:45:12,741 - root - INFO - Step 9380: lr=1.00E-05, loss= 1.1158 (max= 1.6576), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:45:12,750 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:5464900 +2025-10-25 18:45:44,644 - root - INFO - Step 9390: lr=1.00E-05, loss= 1.0962 (max= 1.5181), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:45:44,644 - root - INFO - Step 9390: lr=1.00E-05, loss= 1.0962 (max= 1.5181), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:45:44,644 - root - INFO - Step 9390: lr=1.00E-05, loss= 1.0962 (max= 1.5181), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:45:44,645 - root - INFO - Step 9390: lr=1.00E-05, loss= 1.0962 (max= 1.5181), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:45:44,645 - root - INFO - Step 9390: lr=1.00E-05, loss= 1.0962 (max= 1.5181), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:45:44,645 - root - INFO - Step 9390: lr=1.00E-05, loss= 1.0962 (max= 1.5181), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:45:44,645 - root - INFO - Step 9390: lr=1.00E-05, loss= 1.0962 (max= 1.5181), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:45:44,645 - root - INFO - Step 9390: lr=1.00E-05, loss= 1.0962 (max= 
1.5181), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:46:16,495 - root - INFO - Step 9400: lr=1.00E-05, loss= 1.1274 (max= 1.5618), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:46:16,495 - root - INFO - Step 9400: lr=1.00E-05, loss= 1.1274 (max= 1.5618), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:46:16,495 - root - INFO - Step 9400: lr=1.00E-05, loss= 1.1274 (max= 1.5618), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:46:16,495 - root - INFO - Step 9400: lr=1.00E-05, loss= 1.1274 (max= 1.5618), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:46:16,495 - root - INFO - Step 9400: lr=1.00E-05, loss= 1.1274 (max= 1.5618), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:46:16,495 - root - INFO - Step 9400: lr=1.00E-05, loss= 1.1274 (max= 1.5618), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:46:16,495 - root - INFO - Step 9400: lr=1.00E-05, loss= 1.1274 (max= 1.5618), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:46:16,495 - root - INFO - Step 9400: lr=1.00E-05, loss= 1.1274 (max= 1.5618), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:46:48,262 - root - INFO - Step 9410: lr=1.00E-05, loss= 1.1288 (max= 1.4404), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:46:48,262 - root - INFO - Step 9410: lr=1.00E-05, loss= 1.1288 (max= 1.4404), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:46:48,262 - root - INFO - Step 
9410: lr=1.00E-05, loss= 1.1288 (max= 1.4404), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:46:48,262 - root - INFO - Step 9410: lr=1.00E-05, loss= 1.1288 (max= 1.4404), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:46:48,262 - root - INFO - Step 9410: lr=1.00E-05, loss= 1.1288 (max= 1.4404), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:46:48,262 - root - INFO - Step 9410: lr=1.00E-05, loss= 1.1288 (max= 1.4404), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:46:48,262 - root - INFO - Step 9410: lr=1.00E-05, loss= 1.1288 (max= 1.4404), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:46:48,262 - root - INFO - Step 9410: lr=1.00E-05, loss= 1.1288 (max= 1.4404), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:47:20,118 - root - INFO - Step 9420: lr=1.00E-05, loss= 1.1275 (max= 1.8049), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:47:20,118 - root - INFO - Step 9420: lr=1.00E-05, loss= 1.1275 (max= 1.8049), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:47:20,118 - root - INFO - Step 9420: lr=1.00E-05, loss= 1.1275 (max= 1.8049), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:47:20,118 - root - INFO - Step 9420: lr=1.00E-05, loss= 1.1275 (max= 1.8049), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:47:20,118 - root - INFO - Step 9420: lr=1.00E-05, loss= 1.1275 (max= 1.8049), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 18:47:20,118 - root - INFO - Step 9420: lr=1.00E-05, loss= 1.1275 (max= 1.8049), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:47:20,118 - root - INFO - Step 9420: lr=1.00E-05, loss= 1.1275 (max= 1.8049), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:47:20,118 - root - INFO - Step 9420: lr=1.00E-05, loss= 1.1275 (max= 1.8049), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:47:52,024 - root - INFO - Step 9430: lr=1.00E-05, loss= 1.1172 (max= 1.4803), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:47:52,024 - root - INFO - Step 9430: lr=1.00E-05, loss= 1.1172 (max= 1.4803), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:47:52,024 - root - INFO - Step 9430: lr=1.00E-05, loss= 1.1172 (max= 1.4803), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:47:52,024 - root - INFO - Step 9430: lr=1.00E-05, loss= 1.1172 (max= 1.4803), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:47:52,024 - root - INFO - Step 9430: lr=1.00E-05, loss= 1.1172 (max= 1.4803), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:47:52,024 - root - INFO - Step 9430: lr=1.00E-05, loss= 1.1172 (max= 1.4803), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:47:52,024 - root - INFO - Step 9430: lr=1.00E-05, loss= 1.1172 (max= 1.4803), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:47:52,024 - root - INFO - Step 9430: lr=1.00E-05, loss= 1.1172 (max= 1.4803), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:48:23,912 - root - INFO - Step 9440: lr=1.00E-05, loss= 1.1344 (max= 1.8257), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:48:23,912 - root - INFO - Step 9440: lr=1.00E-05, loss= 1.1344 (max= 1.8257), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:48:23,912 - root - INFO - Step 9440: lr=1.00E-05, loss= 1.1344 (max= 1.8257), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:48:23,912 - root - INFO - Step 9440: lr=1.00E-05, loss= 1.1344 (max= 1.8257), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:48:23,912 - root - INFO - Step 9440: lr=1.00E-05, loss= 1.1344 (max= 1.8257), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:48:23,912 - root - INFO - Step 9440: lr=1.00E-05, loss= 1.1344 (max= 1.8257), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:48:23,912 - root - INFO - Step 9440: lr=1.00E-05, loss= 1.1344 (max= 1.8257), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:48:23,912 - root - INFO - Step 9440: lr=1.00E-05, loss= 1.1344 (max= 1.8257), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:48:55,853 - root - INFO - Step 9450: lr=1.00E-05, loss= 1.1088 (max= 1.5405), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:48:55,853 - root - INFO - Step 9450: lr=1.00E-05, loss= 1.1088 (max= 1.5405), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:48:55,853 - root - INFO - Step 9450: lr=1.00E-05, loss= 1.1088 (max= 1.5405), tps=20520, 
mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:48:55,853 - root - INFO - Step 9450: lr=1.00E-05, loss= 1.1088 (max= 1.5405), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:48:55,853 - root - INFO - Step 9450: lr=1.00E-05, loss= 1.1088 (max= 1.5405), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:48:55,853 - root - INFO - Step 9450: lr=1.00E-05, loss= 1.1088 (max= 1.5405), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:48:55,853 - root - INFO - Step 9450: lr=1.00E-05, loss= 1.1088 (max= 1.5405), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:48:55,854 - root - INFO - Step 9450: lr=1.00E-05, loss= 1.1088 (max= 1.5405), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:49:27,771 - root - INFO - Step 9460: lr=1.00E-05, loss= 1.0854 (max= 1.5530), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:49:27,771 - root - INFO - Step 9460: lr=1.00E-05, loss= 1.0854 (max= 1.5530), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:49:27,772 - root - INFO - Step 9460: lr=1.00E-05, loss= 1.0854 (max= 1.5530), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:49:27,772 - root - INFO - Step 9460: lr=1.00E-05, loss= 1.0854 (max= 1.5530), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:49:27,772 - root - INFO - Step 9460: lr=1.00E-05, loss= 1.0854 (max= 1.5530), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:49:27,772 - root - INFO - Step 9460: lr=1.00E-05, 
loss= 1.0854 (max= 1.5530), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:49:27,772 - root - INFO - Step 9460: lr=1.00E-05, loss= 1.0854 (max= 1.5530), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:49:27,772 - root - INFO - Step 9460: lr=1.00E-05, loss= 1.0854 (max= 1.5530), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:49:59,621 - root - INFO - Step 9470: lr=1.00E-05, loss= 1.1141 (max= 1.5239), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:49:59,621 - root - INFO - Step 9470: lr=1.00E-05, loss= 1.1141 (max= 1.5239), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:49:59,621 - root - INFO - Step 9470: lr=1.00E-05, loss= 1.1141 (max= 1.5239), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:49:59,621 - root - INFO - Step 9470: lr=1.00E-05, loss= 1.1141 (max= 1.5239), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:49:59,621 - root - INFO - Step 9470: lr=1.00E-05, loss= 1.1141 (max= 1.5239), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:49:59,621 - root - INFO - Step 9470: lr=1.00E-05, loss= 1.1141 (max= 1.5239), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:49:59,621 - root - INFO - Step 9470: lr=1.00E-05, loss= 1.1141 (max= 1.5239), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:49:59,621 - root - INFO - Step 9470: lr=1.00E-05, loss= 1.1141 (max= 1.5239), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:50:31,514 - 
root - INFO - Step 9480: lr=1.00E-05, loss= 1.1260 (max= 1.7647), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:50:31,514 - root - INFO - Step 9480: lr=1.00E-05, loss= 1.1260 (max= 1.7647), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:50:31,514 - root - INFO - Step 9480: lr=1.00E-05, loss= 1.1260 (max= 1.7647), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:50:31,514 - root - INFO - Step 9480: lr=1.00E-05, loss= 1.1260 (max= 1.7647), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:50:31,514 - root - INFO - Step 9480: lr=1.00E-05, loss= 1.1260 (max= 1.7647), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:50:31,514 - root - INFO - Step 9480: lr=1.00E-05, loss= 1.1260 (max= 1.7647), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:50:31,514 - root - INFO - Step 9480: lr=1.00E-05, loss= 1.1260 (max= 1.7647), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:50:31,514 - root - INFO - Step 9480: lr=1.00E-05, loss= 1.1260 (max= 1.7647), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:51:03,347 - root - INFO - Step 9490: lr=1.00E-05, loss= 1.1094 (max= 1.6149), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:51:03,347 - root - INFO - Step 9490: lr=1.00E-05, loss= 1.1094 (max= 1.6149), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:51:03,347 - root - INFO - Step 9490: lr=1.00E-05, loss= 1.1094 (max= 1.6149), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 18:51:03,347 - root - INFO - Step 9490: lr=1.00E-05, loss= 1.1094 (max= 1.6149), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:51:03,347 - root - INFO - Step 9490: lr=1.00E-05, loss= 1.1094 (max= 1.6149), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:51:03,347 - root - INFO - Step 9490: lr=1.00E-05, loss= 1.1094 (max= 1.6149), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:51:03,347 - root - INFO - Step 9490: lr=1.00E-05, loss= 1.1094 (max= 1.6149), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:51:03,347 - root - INFO - Step 9490: lr=1.00E-05, loss= 1.1094 (max= 1.6149), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:51:35,246 - root - INFO - Step 9500: lr=1.00E-05, loss= 1.1030 (max= 1.5565), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:51:35,246 - root - INFO - Step 9500: lr=1.00E-05, loss= 1.1030 (max= 1.5565), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:51:35,246 - root - INFO - Step 9500: lr=1.00E-05, loss= 1.1030 (max= 1.5565), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:51:35,247 - root - INFO - Step 9500: lr=1.00E-05, loss= 1.1030 (max= 1.5565), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:51:35,247 - root - INFO - Step 9500: lr=1.00E-05, loss= 1.1030 (max= 1.5565), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:51:35,247 - root - INFO - Step 9500: lr=1.00E-05, loss= 1.1030 (max= 1.5565), tps=20547, mfu=42.81%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:51:35,247 - root - INFO - Step 9500: lr=1.00E-05, loss= 1.1030 (max= 1.5565), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:51:35,247 - root - INFO - Step 9500: lr=1.00E-05, loss= 1.1030 (max= 1.5565), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:52:07,096 - root - INFO - Step 9510: lr=1.00E-05, loss= 1.1192 (max= 1.5864), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:52:07,096 - root - INFO - Step 9510: lr=1.00E-05, loss= 1.1192 (max= 1.5864), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:52:07,096 - root - INFO - Step 9510: lr=1.00E-05, loss= 1.1192 (max= 1.5864), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:52:07,096 - root - INFO - Step 9510: lr=1.00E-05, loss= 1.1192 (max= 1.5864), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:52:07,096 - root - INFO - Step 9510: lr=1.00E-05, loss= 1.1192 (max= 1.5864), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:52:07,096 - root - INFO - Step 9510: lr=1.00E-05, loss= 1.1192 (max= 1.5864), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:52:07,096 - root - INFO - Step 9510: lr=1.00E-05, loss= 1.1192 (max= 1.5864), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:52:07,096 - root - INFO - Step 9510: lr=1.00E-05, loss= 1.1192 (max= 1.5864), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:52:38,938 - root - INFO - Step 9520: lr=1.00E-05, loss= 1.1036 (max= 
1.8993), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:52:38,938 - root - INFO - Step 9520: lr=1.00E-05, loss= 1.1036 (max= 1.8993), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:52:38,938 - root - INFO - Step 9520: lr=1.00E-05, loss= 1.1036 (max= 1.8993), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:52:38,939 - root - INFO - Step 9520: lr=1.00E-05, loss= 1.1036 (max= 1.8993), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:52:38,939 - root - INFO - Step 9520: lr=1.00E-05, loss= 1.1036 (max= 1.8993), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:52:38,939 - root - INFO - Step 9520: lr=1.00E-05, loss= 1.1036 (max= 1.8993), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:52:38,939 - root - INFO - Step 9520: lr=1.00E-05, loss= 1.1036 (max= 1.8993), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:52:38,939 - root - INFO - Step 9520: lr=1.00E-05, loss= 1.1036 (max= 1.8993), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:53:10,744 - root - INFO - Step 9530: lr=1.00E-05, loss= 1.1319 (max= 1.7202), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:53:10,744 - root - INFO - Step 9530: lr=1.00E-05, loss= 1.1319 (max= 1.7202), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:53:10,744 - root - INFO - Step 9530: lr=1.00E-05, loss= 1.1319 (max= 1.7202), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:53:10,744 - root - INFO - Step 
9530: lr=1.00E-05, loss= 1.1319 (max= 1.7202), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:53:10,744 - root - INFO - Step 9530: lr=1.00E-05, loss= 1.1319 (max= 1.7202), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:53:10,744 - root - INFO - Step 9530: lr=1.00E-05, loss= 1.1319 (max= 1.7202), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:53:10,744 - root - INFO - Step 9530: lr=1.00E-05, loss= 1.1319 (max= 1.7202), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:53:10,744 - root - INFO - Step 9530: lr=1.00E-05, loss= 1.1319 (max= 1.7202), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:53:42,624 - root - INFO - Step 9540: lr=1.00E-05, loss= 1.1256 (max= 1.6938), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:53:42,624 - root - INFO - Step 9540: lr=1.00E-05, loss= 1.1256 (max= 1.6938), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:53:42,624 - root - INFO - Step 9540: lr=1.00E-05, loss= 1.1256 (max= 1.6938), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:53:42,624 - root - INFO - Step 9540: lr=1.00E-05, loss= 1.1256 (max= 1.6938), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:53:42,624 - root - INFO - Step 9540: lr=1.00E-05, loss= 1.1256 (max= 1.6938), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:53:42,624 - root - INFO - Step 9540: lr=1.00E-05, loss= 1.1256 (max= 1.6938), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 18:53:42,624 - root - INFO - Step 9540: lr=1.00E-05, loss= 1.1256 (max= 1.6938), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:53:42,625 - root - INFO - Step 9540: lr=1.00E-05, loss= 1.1256 (max= 1.6938), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:54:14,528 - root - INFO - Step 9550: lr=1.00E-05, loss= 1.1185 (max= 1.5464), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:54:14,529 - root - INFO - Step 9550: lr=1.00E-05, loss= 1.1185 (max= 1.5464), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:54:14,529 - root - INFO - Step 9550: lr=1.00E-05, loss= 1.1185 (max= 1.5464), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:54:14,529 - root - INFO - Step 9550: lr=1.00E-05, loss= 1.1185 (max= 1.5464), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:54:14,529 - root - INFO - Step 9550: lr=1.00E-05, loss= 1.1185 (max= 1.5464), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:54:14,529 - root - INFO - Step 9550: lr=1.00E-05, loss= 1.1185 (max= 1.5464), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:54:14,529 - root - INFO - Step 9550: lr=1.00E-05, loss= 1.1185 (max= 1.5464), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:54:14,529 - root - INFO - Step 9550: lr=1.00E-05, loss= 1.1185 (max= 1.5464), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:54:46,343 - root - INFO - Step 9560: lr=1.00E-05, loss= 1.1053 (max= 1.6496), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:54:46,343 - root - INFO - Step 9560: lr=1.00E-05, loss= 1.1053 (max= 1.6496), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:54:46,343 - root - INFO - Step 9560: lr=1.00E-05, loss= 1.1053 (max= 1.6496), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:54:46,343 - root - INFO - Step 9560: lr=1.00E-05, loss= 1.1053 (max= 1.6496), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:54:46,343 - root - INFO - Step 9560: lr=1.00E-05, loss= 1.1053 (max= 1.6496), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:54:46,343 - root - INFO - Step 9560: lr=1.00E-05, loss= 1.1053 (max= 1.6496), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:54:46,343 - root - INFO - Step 9560: lr=1.00E-05, loss= 1.1053 (max= 1.6496), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:54:46,343 - root - INFO - Step 9560: lr=1.00E-05, loss= 1.1053 (max= 1.6496), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:55:18,234 - root - INFO - Step 9570: lr=1.00E-05, loss= 1.1226 (max= 1.5458), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:55:18,234 - root - INFO - Step 9570: lr=1.00E-05, loss= 1.1226 (max= 1.5458), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:55:18,234 - root - INFO - Step 9570: lr=1.00E-05, loss= 1.1226 (max= 1.5458), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:55:18,234 - root - INFO - Step 9570: lr=1.00E-05, loss= 1.1226 (max= 1.5458), tps=20552, 
mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:55:18,234 - root - INFO - Step 9570: lr=1.00E-05, loss= 1.1226 (max= 1.5458), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:55:18,234 - root - INFO - Step 9570: lr=1.00E-05, loss= 1.1226 (max= 1.5458), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:55:18,234 - root - INFO - Step 9570: lr=1.00E-05, loss= 1.1226 (max= 1.5458), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:55:18,234 - root - INFO - Step 9570: lr=1.00E-05, loss= 1.1226 (max= 1.5458), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:55:50,126 - root - INFO - Step 9580: lr=1.00E-05, loss= 1.1126 (max= 1.6233), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:55:50,126 - root - INFO - Step 9580: lr=1.00E-05, loss= 1.1126 (max= 1.6233), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:55:50,127 - root - INFO - Step 9580: lr=1.00E-05, loss= 1.1126 (max= 1.6233), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:55:50,127 - root - INFO - Step 9580: lr=1.00E-05, loss= 1.1126 (max= 1.6233), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:55:50,127 - root - INFO - Step 9580: lr=1.00E-05, loss= 1.1126 (max= 1.6233), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:55:50,127 - root - INFO - Step 9580: lr=1.00E-05, loss= 1.1126 (max= 1.6233), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:55:50,127 - root - INFO - Step 9580: lr=1.00E-05, 
loss= 1.1126 (max= 1.6233), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:55:50,127 - root - INFO - Step 9580: lr=1.00E-05, loss= 1.1126 (max= 1.6233), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:56:22,063 - root - INFO - Step 9590: lr=1.00E-05, loss= 1.1192 (max= 1.6611), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:56:22,063 - root - INFO - Step 9590: lr=1.00E-05, loss= 1.1192 (max= 1.6611), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:56:22,063 - root - INFO - Step 9590: lr=1.00E-05, loss= 1.1192 (max= 1.6611), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:56:22,063 - root - INFO - Step 9590: lr=1.00E-05, loss= 1.1192 (max= 1.6611), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:56:22,063 - root - INFO - Step 9590: lr=1.00E-05, loss= 1.1192 (max= 1.6611), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:56:22,063 - root - INFO - Step 9590: lr=1.00E-05, loss= 1.1192 (max= 1.6611), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:56:22,063 - root - INFO - Step 9590: lr=1.00E-05, loss= 1.1192 (max= 1.6611), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:56:22,063 - root - INFO - Step 9590: lr=1.00E-05, loss= 1.1192 (max= 1.6611), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:56:53,875 - root - INFO - Step 9600: lr=1.00E-05, loss= 1.0999 (max= 1.6144), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:56:53,875 - 
root - INFO - Step 9600: lr=1.00E-05, loss= 1.0999 (max= 1.6144), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:56:53,875 - root - INFO - Step 9600: lr=1.00E-05, loss= 1.0999 (max= 1.6144), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:56:53,875 - root - INFO - Step 9600: lr=1.00E-05, loss= 1.0999 (max= 1.6144), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:56:53,875 - root - INFO - Step 9600: lr=1.00E-05, loss= 1.0999 (max= 1.6144), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:56:53,875 - root - INFO - Step 9600: lr=1.00E-05, loss= 1.0999 (max= 1.6144), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:56:53,875 - root - INFO - Step 9600: lr=1.00E-05, loss= 1.0999 (max= 1.6144), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:56:53,875 - root - INFO - Step 9600: lr=1.00E-05, loss= 1.0999 (max= 1.6144), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:57:25,764 - root - INFO - Step 9610: lr=1.00E-05, loss= 1.1118 (max= 1.6052), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:57:25,764 - root - INFO - Step 9610: lr=1.00E-05, loss= 1.1118 (max= 1.6052), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:57:25,764 - root - INFO - Step 9610: lr=1.00E-05, loss= 1.1118 (max= 1.6052), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:57:25,764 - root - INFO - Step 9610: lr=1.00E-05, loss= 1.1118 (max= 1.6052), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 18:57:25,764 - root - INFO - Step 9610: lr=1.00E-05, loss= 1.1118 (max= 1.6052), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:57:25,764 - root - INFO - Step 9610: lr=1.00E-05, loss= 1.1118 (max= 1.6052), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:57:25,764 - root - INFO - Step 9610: lr=1.00E-05, loss= 1.1118 (max= 1.6052), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:57:25,764 - root - INFO - Step 9610: lr=1.00E-05, loss= 1.1118 (max= 1.6052), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:57:57,705 - root - INFO - Step 9620: lr=1.00E-05, loss= 1.0928 (max= 1.5413), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:57:57,705 - root - INFO - Step 9620: lr=1.00E-05, loss= 1.0928 (max= 1.5413), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:57:57,706 - root - INFO - Step 9620: lr=1.00E-05, loss= 1.0928 (max= 1.5413), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:57:57,706 - root - INFO - Step 9620: lr=1.00E-05, loss= 1.0928 (max= 1.5413), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:57:57,706 - root - INFO - Step 9620: lr=1.00E-05, loss= 1.0928 (max= 1.5413), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:57:57,706 - root - INFO - Step 9620: lr=1.00E-05, loss= 1.0928 (max= 1.5413), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:57:57,706 - root - INFO - Step 9620: lr=1.00E-05, loss= 1.0928 (max= 1.5413), tps=20520, mfu=42.75%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:57:57,706 - root - INFO - Step 9620: lr=1.00E-05, loss= 1.0928 (max= 1.5413), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:58:29,637 - root - INFO - Step 9630: lr=1.00E-05, loss= 1.1151 (max= 1.7361), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.03s, 1.78%) +2025-10-25 18:58:29,637 - root - INFO - Step 9630: lr=1.00E-05, loss= 1.1151 (max= 1.7361), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.03s, 1.78%) +2025-10-25 18:58:29,637 - root - INFO - Step 9630: lr=1.00E-05, loss= 1.1151 (max= 1.7361), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.03s, 1.78%) +2025-10-25 18:58:29,637 - root - INFO - Step 9630: lr=1.00E-05, loss= 1.1151 (max= 1.7361), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.03s, 1.78%) +2025-10-25 18:58:29,637 - root - INFO - Step 9630: lr=1.00E-05, loss= 1.1151 (max= 1.7361), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.03s, 1.78%) +2025-10-25 18:58:29,637 - root - INFO - Step 9630: lr=1.00E-05, loss= 1.1151 (max= 1.7361), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.03s, 1.78%) +2025-10-25 18:58:29,637 - root - INFO - Step 9630: lr=1.00E-05, loss= 1.1151 (max= 1.7361), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.03s, 1.78%) +2025-10-25 18:58:29,637 - root - INFO - Step 9630: lr=1.00E-05, loss= 1.1151 (max= 1.7361), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.03s, 1.78%) +2025-10-25 18:59:01,517 - root - INFO - Step 9640: lr=1.00E-05, loss= 1.0800 (max= 1.6694), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:59:01,517 - root - INFO - Step 9640: lr=1.00E-05, loss= 1.0800 (max= 
1.6694), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:59:01,517 - root - INFO - Step 9640: lr=1.00E-05, loss= 1.0800 (max= 1.6694), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:59:01,517 - root - INFO - Step 9640: lr=1.00E-05, loss= 1.0800 (max= 1.6694), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:59:01,517 - root - INFO - Step 9640: lr=1.00E-05, loss= 1.0800 (max= 1.6694), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:59:01,517 - root - INFO - Step 9640: lr=1.00E-05, loss= 1.0800 (max= 1.6694), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:59:01,517 - root - INFO - Step 9640: lr=1.00E-05, loss= 1.0800 (max= 1.6694), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:59:01,517 - root - INFO - Step 9640: lr=1.00E-05, loss= 1.0800 (max= 1.6694), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:59:33,415 - root - INFO - Step 9650: lr=1.00E-05, loss= 1.1128 (max= 1.5852), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:59:33,415 - root - INFO - Step 9650: lr=1.00E-05, loss= 1.1128 (max= 1.5852), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:59:33,415 - root - INFO - Step 9650: lr=1.00E-05, loss= 1.1128 (max= 1.5852), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:59:33,415 - root - INFO - Step 9650: lr=1.00E-05, loss= 1.1128 (max= 1.5852), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:59:33,415 - root - INFO - Step 
9650: lr=1.00E-05, loss= 1.1128 (max= 1.5852), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:59:33,416 - root - INFO - Step 9650: lr=1.00E-05, loss= 1.1128 (max= 1.5852), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:59:33,416 - root - INFO - Step 9650: lr=1.00E-05, loss= 1.1128 (max= 1.5852), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 18:59:33,416 - root - INFO - Step 9650: lr=1.00E-05, loss= 1.1128 (max= 1.5852), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:00:05,294 - root - INFO - Step 9660: lr=1.00E-05, loss= 1.1161 (max= 1.4614), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:00:05,294 - root - INFO - Step 9660: lr=1.00E-05, loss= 1.1161 (max= 1.4614), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:00:05,294 - root - INFO - Step 9660: lr=1.00E-05, loss= 1.1161 (max= 1.4614), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:00:05,294 - root - INFO - Step 9660: lr=1.00E-05, loss= 1.1161 (max= 1.4614), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:00:05,294 - root - INFO - Step 9660: lr=1.00E-05, loss= 1.1161 (max= 1.4614), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:00:05,294 - root - INFO - Step 9660: lr=1.00E-05, loss= 1.1161 (max= 1.4614), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:00:05,294 - root - INFO - Step 9660: lr=1.00E-05, loss= 1.1161 (max= 1.4614), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 19:00:05,294 - root - INFO - Step 9660: lr=1.00E-05, loss= 1.1161 (max= 1.4614), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:00:37,131 - root - INFO - Step 9670: lr=1.00E-05, loss= 1.1130 (max= 1.7245), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:00:37,131 - root - INFO - Step 9670: lr=1.00E-05, loss= 1.1130 (max= 1.7245), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:00:37,131 - root - INFO - Step 9670: lr=1.00E-05, loss= 1.1130 (max= 1.7245), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:00:37,131 - root - INFO - Step 9670: lr=1.00E-05, loss= 1.1130 (max= 1.7245), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:00:37,131 - root - INFO - Step 9670: lr=1.00E-05, loss= 1.1130 (max= 1.7245), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:00:37,131 - root - INFO - Step 9670: lr=1.00E-05, loss= 1.1130 (max= 1.7245), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:00:37,132 - root - INFO - Step 9670: lr=1.00E-05, loss= 1.1130 (max= 1.7245), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:00:37,132 - root - INFO - Step 9670: lr=1.00E-05, loss= 1.1130 (max= 1.7245), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:01:08,938 - root - INFO - Step 9680: lr=1.00E-05, loss= 1.1120 (max= 1.6575), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:01:08,939 - root - INFO - Step 9680: lr=1.00E-05, loss= 1.1120 (max= 1.6575), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:01:08,939 - root - INFO - Step 9680: lr=1.00E-05, loss= 1.1120 (max= 1.6575), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:01:08,939 - root - INFO - Step 9680: lr=1.00E-05, loss= 1.1120 (max= 1.6575), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:01:08,939 - root - INFO - Step 9680: lr=1.00E-05, loss= 1.1120 (max= 1.6575), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:01:08,939 - root - INFO - Step 9680: lr=1.00E-05, loss= 1.1120 (max= 1.6575), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:01:08,939 - root - INFO - Step 9680: lr=1.00E-05, loss= 1.1120 (max= 1.6575), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:01:08,939 - root - INFO - Step 9680: lr=1.00E-05, loss= 1.1120 (max= 1.6575), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:01:40,797 - root - INFO - Step 9690: lr=1.00E-05, loss= 1.1256 (max= 1.7987), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:01:40,797 - root - INFO - Step 9690: lr=1.00E-05, loss= 1.1256 (max= 1.7987), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:01:40,797 - root - INFO - Step 9690: lr=1.00E-05, loss= 1.1256 (max= 1.7987), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:01:40,798 - root - INFO - Step 9690: lr=1.00E-05, loss= 1.1256 (max= 1.7987), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:01:40,798 - root - INFO - Step 9690: lr=1.00E-05, loss= 1.1256 (max= 1.7987), tps=20573, 
mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:01:40,798 - root - INFO - Step 9690: lr=1.00E-05, loss= 1.1256 (max= 1.7987), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:01:40,798 - root - INFO - Step 9690: lr=1.00E-05, loss= 1.1256 (max= 1.7987), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:01:40,798 - root - INFO - Step 9690: lr=1.00E-05, loss= 1.1256 (max= 1.7987), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:02:12,641 - root - INFO - Step 9700: lr=1.00E-05, loss= 1.1204 (max= 1.5844), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:02:12,641 - root - INFO - Step 9700: lr=1.00E-05, loss= 1.1204 (max= 1.5844), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:02:12,641 - root - INFO - Step 9700: lr=1.00E-05, loss= 1.1204 (max= 1.5844), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:02:12,641 - root - INFO - Step 9700: lr=1.00E-05, loss= 1.1204 (max= 1.5844), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:02:12,641 - root - INFO - Step 9700: lr=1.00E-05, loss= 1.1204 (max= 1.5844), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:02:12,641 - root - INFO - Step 9700: lr=1.00E-05, loss= 1.1204 (max= 1.5844), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:02:12,641 - root - INFO - Step 9700: lr=1.00E-05, loss= 1.1204 (max= 1.5844), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:02:12,641 - root - INFO - Step 9700: lr=1.00E-05, 
loss= 1.1204 (max= 1.5844), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:02:44,448 - root - INFO - Step 9710: lr=1.00E-05, loss= 1.0824 (max= 1.4800), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:02:44,449 - root - INFO - Step 9710: lr=1.00E-05, loss= 1.0824 (max= 1.4800), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:02:44,449 - root - INFO - Step 9710: lr=1.00E-05, loss= 1.0824 (max= 1.4800), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:02:44,449 - root - INFO - Step 9710: lr=1.00E-05, loss= 1.0824 (max= 1.4800), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:02:44,449 - root - INFO - Step 9710: lr=1.00E-05, loss= 1.0824 (max= 1.4800), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:02:44,449 - root - INFO - Step 9710: lr=1.00E-05, loss= 1.0824 (max= 1.4800), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:02:44,449 - root - INFO - Step 9710: lr=1.00E-05, loss= 1.0824 (max= 1.4800), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:02:44,449 - root - INFO - Step 9710: lr=1.00E-05, loss= 1.0824 (max= 1.4800), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:03:16,291 - root - INFO - Step 9720: lr=1.00E-05, loss= 1.1090 (max= 1.9147), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:03:16,291 - root - INFO - Step 9720: lr=1.00E-05, loss= 1.1090 (max= 1.9147), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:03:16,291 - 
root - INFO - Step 9720: lr=1.00E-05, loss= 1.1090 (max= 1.9147), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:03:16,291 - root - INFO - Step 9720: lr=1.00E-05, loss= 1.1090 (max= 1.9147), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:03:16,291 - root - INFO - Step 9720: lr=1.00E-05, loss= 1.1090 (max= 1.9147), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:03:16,291 - root - INFO - Step 9720: lr=1.00E-05, loss= 1.1090 (max= 1.9147), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:03:16,291 - root - INFO - Step 9720: lr=1.00E-05, loss= 1.1090 (max= 1.9147), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:03:16,291 - root - INFO - Step 9720: lr=1.00E-05, loss= 1.1090 (max= 1.9147), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:03:48,165 - root - INFO - Step 9730: lr=1.00E-05, loss= 1.1154 (max= 1.6295), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:03:48,165 - root - INFO - Step 9730: lr=1.00E-05, loss= 1.1154 (max= 1.6295), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:03:48,165 - root - INFO - Step 9730: lr=1.00E-05, loss= 1.1154 (max= 1.6295), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:03:48,165 - root - INFO - Step 9730: lr=1.00E-05, loss= 1.1154 (max= 1.6295), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:03:48,165 - root - INFO - Step 9730: lr=1.00E-05, loss= 1.1154 (max= 1.6295), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 19:03:48,165 - root - INFO - Step 9730: lr=1.00E-05, loss= 1.1154 (max= 1.6295), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:03:48,165 - root - INFO - Step 9730: lr=1.00E-05, loss= 1.1154 (max= 1.6295), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:03:48,165 - root - INFO - Step 9730: lr=1.00E-05, loss= 1.1154 (max= 1.6295), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:04:20,063 - root - INFO - Step 9740: lr=1.00E-05, loss= 1.1003 (max= 1.6373), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:04:20,063 - root - INFO - Step 9740: lr=1.00E-05, loss= 1.1003 (max= 1.6373), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:04:20,063 - root - INFO - Step 9740: lr=1.00E-05, loss= 1.1003 (max= 1.6373), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:04:20,063 - root - INFO - Step 9740: lr=1.00E-05, loss= 1.1003 (max= 1.6373), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:04:20,063 - root - INFO - Step 9740: lr=1.00E-05, loss= 1.1003 (max= 1.6373), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:04:20,063 - root - INFO - Step 9740: lr=1.00E-05, loss= 1.1003 (max= 1.6373), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:04:20,063 - root - INFO - Step 9740: lr=1.00E-05, loss= 1.1003 (max= 1.6373), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:04:20,063 - root - INFO - Step 9740: lr=1.00E-05, loss= 1.1003 (max= 1.6373), tps=20548, mfu=42.81%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:04:51,922 - root - INFO - Step 9750: lr=1.00E-05, loss= 1.1049 (max= 1.5862), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:04:51,922 - root - INFO - Step 9750: lr=1.00E-05, loss= 1.1049 (max= 1.5862), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:04:51,922 - root - INFO - Step 9750: lr=1.00E-05, loss= 1.1049 (max= 1.5862), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:04:51,922 - root - INFO - Step 9750: lr=1.00E-05, loss= 1.1049 (max= 1.5862), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:04:51,922 - root - INFO - Step 9750: lr=1.00E-05, loss= 1.1049 (max= 1.5862), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:04:51,922 - root - INFO - Step 9750: lr=1.00E-05, loss= 1.1049 (max= 1.5862), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:04:51,922 - root - INFO - Step 9750: lr=1.00E-05, loss= 1.1049 (max= 1.5862), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:04:51,922 - root - INFO - Step 9750: lr=1.00E-05, loss= 1.1049 (max= 1.5862), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:05:23,767 - root - INFO - Step 9760: lr=1.00E-05, loss= 1.1170 (max= 1.5099), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:05:23,768 - root - INFO - Step 9760: lr=1.00E-05, loss= 1.1170 (max= 1.5099), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:05:23,768 - root - INFO - Step 9760: lr=1.00E-05, loss= 1.1170 (max= 
1.5099), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:05:23,768 - root - INFO - Step 9760: lr=1.00E-05, loss= 1.1170 (max= 1.5099), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:05:23,768 - root - INFO - Step 9760: lr=1.00E-05, loss= 1.1170 (max= 1.5099), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:05:23,768 - root - INFO - Step 9760: lr=1.00E-05, loss= 1.1170 (max= 1.5099), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:05:23,768 - root - INFO - Step 9760: lr=1.00E-05, loss= 1.1170 (max= 1.5099), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:05:23,768 - root - INFO - Step 9760: lr=1.00E-05, loss= 1.1170 (max= 1.5099), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:05:55,649 - root - INFO - Step 9770: lr=1.00E-05, loss= 1.1202 (max= 1.6584), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:05:55,649 - root - INFO - Step 9770: lr=1.00E-05, loss= 1.1202 (max= 1.6584), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:05:55,649 - root - INFO - Step 9770: lr=1.00E-05, loss= 1.1202 (max= 1.6584), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:05:55,649 - root - INFO - Step 9770: lr=1.00E-05, loss= 1.1202 (max= 1.6584), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:05:55,649 - root - INFO - Step 9770: lr=1.00E-05, loss= 1.1202 (max= 1.6584), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:05:55,649 - root - INFO - Step 
9770: lr=1.00E-05, loss= 1.1202 (max= 1.6584), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:05:55,649 - root - INFO - Step 9770: lr=1.00E-05, loss= 1.1202 (max= 1.6584), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:05:55,650 - root - INFO - Step 9770: lr=1.00E-05, loss= 1.1202 (max= 1.6584), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:06:27,484 - root - INFO - Step 9780: lr=1.00E-05, loss= 1.1335 (max= 1.7498), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:06:27,484 - root - INFO - Step 9780: lr=1.00E-05, loss= 1.1335 (max= 1.7498), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:06:27,484 - root - INFO - Step 9780: lr=1.00E-05, loss= 1.1335 (max= 1.7498), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:06:27,484 - root - INFO - Step 9780: lr=1.00E-05, loss= 1.1335 (max= 1.7498), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:06:27,484 - root - INFO - Step 9780: lr=1.00E-05, loss= 1.1335 (max= 1.7498), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:06:27,484 - root - INFO - Step 9780: lr=1.00E-05, loss= 1.1335 (max= 1.7498), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:06:27,484 - root - INFO - Step 9780: lr=1.00E-05, loss= 1.1335 (max= 1.7498), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:06:27,484 - root - INFO - Step 9780: lr=1.00E-05, loss= 1.1335 (max= 1.7498), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 19:06:59,363 - root - INFO - Step 9790: lr=1.00E-05, loss= 1.1257 (max= 1.8000), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:06:59,363 - root - INFO - Step 9790: lr=1.00E-05, loss= 1.1257 (max= 1.8000), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:06:59,363 - root - INFO - Step 9790: lr=1.00E-05, loss= 1.1257 (max= 1.8000), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:06:59,363 - root - INFO - Step 9790: lr=1.00E-05, loss= 1.1257 (max= 1.8000), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:06:59,363 - root - INFO - Step 9790: lr=1.00E-05, loss= 1.1257 (max= 1.8000), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:06:59,363 - root - INFO - Step 9790: lr=1.00E-05, loss= 1.1257 (max= 1.8000), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:06:59,363 - root - INFO - Step 9790: lr=1.00E-05, loss= 1.1257 (max= 1.8000), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:06:59,363 - root - INFO - Step 9790: lr=1.00E-05, loss= 1.1257 (max= 1.8000), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:07:31,118 - root - INFO - Step 9800: lr=1.00E-05, loss= 1.1472 (max= 1.6732), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:07:31,118 - root - INFO - Step 9800: lr=1.00E-05, loss= 1.1472 (max= 1.6732), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:07:31,118 - root - INFO - Step 9800: lr=1.00E-05, loss= 1.1472 (max= 1.6732), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:07:31,118 - root - INFO - Step 9800: lr=1.00E-05, loss= 1.1472 (max= 1.6732), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:07:31,118 - root - INFO - Step 9800: lr=1.00E-05, loss= 1.1472 (max= 1.6732), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:07:31,118 - root - INFO - Step 9800: lr=1.00E-05, loss= 1.1472 (max= 1.6732), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:07:31,118 - root - INFO - Step 9800: lr=1.00E-05, loss= 1.1472 (max= 1.6732), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:07:31,119 - root - INFO - Step 9800: lr=1.00E-05, loss= 1.1472 (max= 1.6732), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:08:02,939 - root - INFO - Step 9810: lr=1.00E-05, loss= 1.1312 (max= 1.5769), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:08:02,939 - root - INFO - Step 9810: lr=1.00E-05, loss= 1.1312 (max= 1.5769), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:08:02,939 - root - INFO - Step 9810: lr=1.00E-05, loss= 1.1312 (max= 1.5769), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:08:02,939 - root - INFO - Step 9810: lr=1.00E-05, loss= 1.1312 (max= 1.5769), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:08:02,939 - root - INFO - Step 9810: lr=1.00E-05, loss= 1.1312 (max= 1.5769), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:08:02,939 - root - INFO - Step 9810: lr=1.00E-05, loss= 1.1312 (max= 1.5769), tps=20598, 
mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:08:02,939 - root - INFO - Step 9810: lr=1.00E-05, loss= 1.1312 (max= 1.5769), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:08:02,939 - root - INFO - Step 9810: lr=1.00E-05, loss= 1.1312 (max= 1.5769), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:08:34,783 - root - INFO - Step 9820: lr=1.00E-05, loss= 1.1354 (max= 1.5491), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:08:34,783 - root - INFO - Step 9820: lr=1.00E-05, loss= 1.1354 (max= 1.5491), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:08:34,783 - root - INFO - Step 9820: lr=1.00E-05, loss= 1.1354 (max= 1.5491), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:08:34,783 - root - INFO - Step 9820: lr=1.00E-05, loss= 1.1354 (max= 1.5491), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:08:34,783 - root - INFO - Step 9820: lr=1.00E-05, loss= 1.1354 (max= 1.5491), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:08:34,783 - root - INFO - Step 9820: lr=1.00E-05, loss= 1.1354 (max= 1.5491), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:08:34,783 - root - INFO - Step 9820: lr=1.00E-05, loss= 1.1354 (max= 1.5491), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:08:34,783 - root - INFO - Step 9820: lr=1.00E-05, loss= 1.1354 (max= 1.5491), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:09:06,548 - root - INFO - Step 9830: lr=1.00E-05, 
loss= 1.0997 (max= 1.5697), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:09:06,548 - root - INFO - Step 9830: lr=1.00E-05, loss= 1.0997 (max= 1.5697), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:09:06,548 - root - INFO - Step 9830: lr=1.00E-05, loss= 1.0997 (max= 1.5697), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:09:06,548 - root - INFO - Step 9830: lr=1.00E-05, loss= 1.0997 (max= 1.5697), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:09:06,548 - root - INFO - Step 9830: lr=1.00E-05, loss= 1.0997 (max= 1.5697), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:09:06,548 - root - INFO - Step 9830: lr=1.00E-05, loss= 1.0997 (max= 1.5697), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:09:06,548 - root - INFO - Step 9830: lr=1.00E-05, loss= 1.0997 (max= 1.5697), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:09:06,548 - root - INFO - Step 9830: lr=1.00E-05, loss= 1.0997 (max= 1.5697), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:09:38,385 - root - INFO - Step 9840: lr=1.00E-05, loss= 1.1026 (max= 1.4967), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:09:38,385 - root - INFO - Step 9840: lr=1.00E-05, loss= 1.1026 (max= 1.4967), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:09:38,385 - root - INFO - Step 9840: lr=1.00E-05, loss= 1.1026 (max= 1.4967), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:09:38,385 - 
root - INFO - Step 9840: lr=1.00E-05, loss= 1.1026 (max= 1.4967), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:09:38,385 - root - INFO - Step 9840: lr=1.00E-05, loss= 1.1026 (max= 1.4967), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:09:38,385 - root - INFO - Step 9840: lr=1.00E-05, loss= 1.1026 (max= 1.4967), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:09:38,385 - root - INFO - Step 9840: lr=1.00E-05, loss= 1.1026 (max= 1.4967), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:09:38,385 - root - INFO - Step 9840: lr=1.00E-05, loss= 1.1026 (max= 1.4967), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:10:10,290 - root - INFO - Step 9850: lr=1.00E-05, loss= 1.1353 (max= 1.8025), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:10:10,291 - root - INFO - Step 9850: lr=1.00E-05, loss= 1.1353 (max= 1.8025), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:10:10,291 - root - INFO - Step 9850: lr=1.00E-05, loss= 1.1353 (max= 1.8025), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:10:10,291 - root - INFO - Step 9850: lr=1.00E-05, loss= 1.1353 (max= 1.8025), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:10:10,291 - root - INFO - Step 9850: lr=1.00E-05, loss= 1.1353 (max= 1.8025), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:10:10,291 - root - INFO - Step 9850: lr=1.00E-05, loss= 1.1353 (max= 1.8025), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 19:10:10,291 - root - INFO - Step 9850: lr=1.00E-05, loss= 1.1353 (max= 1.8025), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:10:10,291 - root - INFO - Step 9850: lr=1.00E-05, loss= 1.1353 (max= 1.8025), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:10:42,132 - root - INFO - Step 9860: lr=1.00E-05, loss= 1.1287 (max= 1.5763), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:10:42,132 - root - INFO - Step 9860: lr=1.00E-05, loss= 1.1287 (max= 1.5763), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:10:42,132 - root - INFO - Step 9860: lr=1.00E-05, loss= 1.1287 (max= 1.5763), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:10:42,132 - root - INFO - Step 9860: lr=1.00E-05, loss= 1.1287 (max= 1.5763), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:10:42,132 - root - INFO - Step 9860: lr=1.00E-05, loss= 1.1287 (max= 1.5763), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:10:42,132 - root - INFO - Step 9860: lr=1.00E-05, loss= 1.1287 (max= 1.5763), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:10:42,132 - root - INFO - Step 9860: lr=1.00E-05, loss= 1.1287 (max= 1.5763), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:10:42,132 - root - INFO - Step 9860: lr=1.00E-05, loss= 1.1287 (max= 1.5763), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:11:14,018 - root - INFO - Step 9870: lr=1.00E-05, loss= 1.1270 (max= 1.6806), tps=20556, mfu=42.83%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:11:14,018 - root - INFO - Step 9870: lr=1.00E-05, loss= 1.1270 (max= 1.6806), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:11:14,018 - root - INFO - Step 9870: lr=1.00E-05, loss= 1.1270 (max= 1.6806), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:11:14,018 - root - INFO - Step 9870: lr=1.00E-05, loss= 1.1270 (max= 1.6806), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:11:14,018 - root - INFO - Step 9870: lr=1.00E-05, loss= 1.1270 (max= 1.6806), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:11:14,018 - root - INFO - Step 9870: lr=1.00E-05, loss= 1.1270 (max= 1.6806), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:11:14,018 - root - INFO - Step 9870: lr=1.00E-05, loss= 1.1270 (max= 1.6806), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:11:14,018 - root - INFO - Step 9870: lr=1.00E-05, loss= 1.1270 (max= 1.6806), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:11:45,923 - root - INFO - Step 9880: lr=1.00E-05, loss= 1.1074 (max= 1.6483), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:11:45,923 - root - INFO - Step 9880: lr=1.00E-05, loss= 1.1074 (max= 1.6483), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:11:45,923 - root - INFO - Step 9880: lr=1.00E-05, loss= 1.1074 (max= 1.6483), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:11:45,923 - root - INFO - Step 9880: lr=1.00E-05, loss= 1.1074 (max= 
1.6483), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:11:45,923 - root - INFO - Step 9880: lr=1.00E-05, loss= 1.1074 (max= 1.6483), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:11:45,923 - root - INFO - Step 9880: lr=1.00E-05, loss= 1.1074 (max= 1.6483), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:11:45,923 - root - INFO - Step 9880: lr=1.00E-05, loss= 1.1074 (max= 1.6483), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:11:45,923 - root - INFO - Step 9880: lr=1.00E-05, loss= 1.1074 (max= 1.6483), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:12:17,745 - root - INFO - Step 9890: lr=1.00E-05, loss= 1.1349 (max= 1.5333), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:12:17,745 - root - INFO - Step 9890: lr=1.00E-05, loss= 1.1349 (max= 1.5333), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:12:17,745 - root - INFO - Step 9890: lr=1.00E-05, loss= 1.1349 (max= 1.5333), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:12:17,745 - root - INFO - Step 9890: lr=1.00E-05, loss= 1.1349 (max= 1.5333), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:12:17,745 - root - INFO - Step 9890: lr=1.00E-05, loss= 1.1349 (max= 1.5333), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:12:17,745 - root - INFO - Step 9890: lr=1.00E-05, loss= 1.1349 (max= 1.5333), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:12:17,745 - root - INFO - Step 
9890: lr=1.00E-05, loss= 1.1349 (max= 1.5333), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:12:17,745 - root - INFO - Step 9890: lr=1.00E-05, loss= 1.1349 (max= 1.5333), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:12:49,571 - root - INFO - Step 9900: lr=1.00E-05, loss= 1.1362 (max= 1.5288), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:12:49,571 - root - INFO - Step 9900: lr=1.00E-05, loss= 1.1362 (max= 1.5288), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:12:49,571 - root - INFO - Step 9900: lr=1.00E-05, loss= 1.1362 (max= 1.5288), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:12:49,571 - root - INFO - Step 9900: lr=1.00E-05, loss= 1.1362 (max= 1.5288), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:12:49,571 - root - INFO - Step 9900: lr=1.00E-05, loss= 1.1362 (max= 1.5288), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:12:49,571 - root - INFO - Step 9900: lr=1.00E-05, loss= 1.1362 (max= 1.5288), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:12:49,571 - root - INFO - Step 9900: lr=1.00E-05, loss= 1.1362 (max= 1.5288), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:12:49,572 - root - INFO - Step 9900: lr=1.00E-05, loss= 1.1362 (max= 1.5288), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:13:21,508 - root - INFO - Step 9910: lr=1.00E-05, loss= 1.1503 (max= 1.7037), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 19:13:21,508 - root - INFO - Step 9910: lr=1.00E-05, loss= 1.1503 (max= 1.7037), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:13:21,508 - root - INFO - Step 9910: lr=1.00E-05, loss= 1.1503 (max= 1.7037), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:13:21,508 - root - INFO - Step 9910: lr=1.00E-05, loss= 1.1503 (max= 1.7037), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:13:21,508 - root - INFO - Step 9910: lr=1.00E-05, loss= 1.1503 (max= 1.7037), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:13:21,509 - root - INFO - Step 9910: lr=1.00E-05, loss= 1.1503 (max= 1.7037), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:13:21,509 - root - INFO - Step 9910: lr=1.00E-05, loss= 1.1503 (max= 1.7037), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:13:21,509 - root - INFO - Step 9910: lr=1.00E-05, loss= 1.1503 (max= 1.7037), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:13:53,364 - root - INFO - Step 9920: lr=1.00E-05, loss= 1.1234 (max= 1.5496), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:13:53,364 - root - INFO - Step 9920: lr=1.00E-05, loss= 1.1234 (max= 1.5496), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:13:53,364 - root - INFO - Step 9920: lr=1.00E-05, loss= 1.1234 (max= 1.5496), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:13:53,364 - root - INFO - Step 9920: lr=1.00E-05, loss= 1.1234 (max= 1.5496), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:13:53,364 - root - INFO - Step 9920: lr=1.00E-05, loss= 1.1234 (max= 1.5496), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:13:53,364 - root - INFO - Step 9920: lr=1.00E-05, loss= 1.1234 (max= 1.5496), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:13:53,364 - root - INFO - Step 9920: lr=1.00E-05, loss= 1.1234 (max= 1.5496), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:13:53,364 - root - INFO - Step 9920: lr=1.00E-05, loss= 1.1234 (max= 1.5496), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:14:25,176 - root - INFO - Step 9930: lr=1.00E-05, loss= 1.1273 (max= 1.5624), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:14:25,176 - root - INFO - Step 9930: lr=1.00E-05, loss= 1.1273 (max= 1.5624), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:14:25,176 - root - INFO - Step 9930: lr=1.00E-05, loss= 1.1273 (max= 1.5624), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:14:25,176 - root - INFO - Step 9930: lr=1.00E-05, loss= 1.1273 (max= 1.5624), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:14:25,176 - root - INFO - Step 9930: lr=1.00E-05, loss= 1.1273 (max= 1.5624), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:14:25,176 - root - INFO - Step 9930: lr=1.00E-05, loss= 1.1273 (max= 1.5624), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:14:25,176 - root - INFO - Step 9930: lr=1.00E-05, loss= 1.1273 (max= 1.5624), tps=20603, 
mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:14:25,176 - root - INFO - Step 9930: lr=1.00E-05, loss= 1.1273 (max= 1.5624), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:14:57,049 - root - INFO - Step 9940: lr=1.00E-05, loss= 1.1346 (max= 1.8089), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:14:57,049 - root - INFO - Step 9940: lr=1.00E-05, loss= 1.1346 (max= 1.8089), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:14:57,049 - root - INFO - Step 9940: lr=1.00E-05, loss= 1.1346 (max= 1.8089), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:14:57,049 - root - INFO - Step 9940: lr=1.00E-05, loss= 1.1346 (max= 1.8089), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:14:57,049 - root - INFO - Step 9940: lr=1.00E-05, loss= 1.1346 (max= 1.8089), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:14:57,049 - root - INFO - Step 9940: lr=1.00E-05, loss= 1.1346 (max= 1.8089), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:14:57,049 - root - INFO - Step 9940: lr=1.00E-05, loss= 1.1346 (max= 1.8089), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:14:57,049 - root - INFO - Step 9940: lr=1.00E-05, loss= 1.1346 (max= 1.8089), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:15:28,834 - root - INFO - Step 9950: lr=1.00E-05, loss= 1.1323 (max= 1.5455), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:15:28,834 - root - INFO - Step 9950: lr=1.00E-05, 
loss= 1.1323 (max= 1.5455), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:15:28,834 - root - INFO - Step 9950: lr=1.00E-05, loss= 1.1323 (max= 1.5455), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:15:28,834 - root - INFO - Step 9950: lr=1.00E-05, loss= 1.1323 (max= 1.5455), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:15:28,834 - root - INFO - Step 9950: lr=1.00E-05, loss= 1.1323 (max= 1.5455), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:15:28,834 - root - INFO - Step 9950: lr=1.00E-05, loss= 1.1323 (max= 1.5455), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:15:28,834 - root - INFO - Step 9950: lr=1.00E-05, loss= 1.1323 (max= 1.5455), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:15:28,834 - root - INFO - Step 9950: lr=1.00E-05, loss= 1.1323 (max= 1.5455), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:16:00,689 - root - INFO - Step 9960: lr=1.00E-05, loss= 1.1207 (max= 1.7199), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:16:00,689 - root - INFO - Step 9960: lr=1.00E-05, loss= 1.1207 (max= 1.7199), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:16:00,689 - root - INFO - Step 9960: lr=1.00E-05, loss= 1.1207 (max= 1.7199), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:16:00,689 - root - INFO - Step 9960: lr=1.00E-05, loss= 1.1207 (max= 1.7199), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:16:00,689 - 
root - INFO - Step 9960: lr=1.00E-05, loss= 1.1207 (max= 1.7199), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:16:00,689 - root - INFO - Step 9960: lr=1.00E-05, loss= 1.1207 (max= 1.7199), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:16:00,689 - root - INFO - Step 9960: lr=1.00E-05, loss= 1.1207 (max= 1.7199), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:16:00,689 - root - INFO - Step 9960: lr=1.00E-05, loss= 1.1207 (max= 1.7199), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:16:32,507 - root - INFO - Step 9970: lr=1.00E-05, loss= 1.1294 (max= 1.6637), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:16:32,507 - root - INFO - Step 9970: lr=1.00E-05, loss= 1.1294 (max= 1.6637), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:16:32,507 - root - INFO - Step 9970: lr=1.00E-05, loss= 1.1294 (max= 1.6637), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:16:32,507 - root - INFO - Step 9970: lr=1.00E-05, loss= 1.1294 (max= 1.6637), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:16:32,508 - root - INFO - Step 9970: lr=1.00E-05, loss= 1.1294 (max= 1.6637), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:16:32,508 - root - INFO - Step 9970: lr=1.00E-05, loss= 1.1294 (max= 1.6637), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:16:32,508 - root - INFO - Step 9970: lr=1.00E-05, loss= 1.1294 (max= 1.6637), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 19:16:32,508 - root - INFO - Step 9970: lr=1.00E-05, loss= 1.1294 (max= 1.6637), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:17:04,423 - root - INFO - Step 9980: lr=1.00E-05, loss= 1.1337 (max= 1.6701), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:17:04,423 - root - INFO - Step 9980: lr=1.00E-05, loss= 1.1337 (max= 1.6701), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:17:04,423 - root - INFO - Step 9980: lr=1.00E-05, loss= 1.1337 (max= 1.6701), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:17:04,423 - root - INFO - Step 9980: lr=1.00E-05, loss= 1.1337 (max= 1.6701), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:17:04,423 - root - INFO - Step 9980: lr=1.00E-05, loss= 1.1337 (max= 1.6701), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:17:04,423 - root - INFO - Step 9980: lr=1.00E-05, loss= 1.1337 (max= 1.6701), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:17:04,423 - root - INFO - Step 9980: lr=1.00E-05, loss= 1.1337 (max= 1.6701), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:17:04,423 - root - INFO - Step 9980: lr=1.00E-05, loss= 1.1337 (max= 1.6701), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:17:36,336 - root - INFO - Step 9990: lr=1.00E-05, loss= 1.1108 (max= 1.5221), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:17:36,336 - root - INFO - Step 9990: lr=1.00E-05, loss= 1.1108 (max= 1.5221), tps=20538, mfu=42.79%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:17:36,336 - root - INFO - Step 9990: lr=1.00E-05, loss= 1.1108 (max= 1.5221), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:17:36,336 - root - INFO - Step 9990: lr=1.00E-05, loss= 1.1108 (max= 1.5221), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:17:36,336 - root - INFO - Step 9990: lr=1.00E-05, loss= 1.1108 (max= 1.5221), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:17:36,336 - root - INFO - Step 9990: lr=1.00E-05, loss= 1.1108 (max= 1.5221), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:17:36,336 - root - INFO - Step 9990: lr=1.00E-05, loss= 1.1108 (max= 1.5221), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:17:36,337 - root - INFO - Step 9990: lr=1.00E-05, loss= 1.1108 (max= 1.5221), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-pt/checkpoints/dataloader/step-10000 +Dataset successfully saved to jobs/munin-7b-open-pt/checkpoints/dataloader/step-10000! 
Save time: 4.51492166519165 +2025-10-25 19:18:08,226 - root - INFO - Step 10000: lr=1.00E-05, loss= 1.1345 (max= 1.6278), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:18:08,226 - root - INFO - Saving a full checkpoint at step 10000 +2025-10-25 19:18:08,226 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 19:18:08,227 - root - INFO - Step 10000: lr=1.00E-05, loss= 1.1345 (max= 1.6278), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:18:08,227 - root - INFO - Step 10000: lr=1.00E-05, loss= 1.1345 (max= 1.6278), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:18:08,227 - root - INFO - Step 10000: lr=1.00E-05, loss= 1.1345 (max= 1.6278), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:18:08,227 - root - INFO - Step 10000: lr=1.00E-05, loss= 1.1345 (max= 1.6278), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:18:08,227 - root - INFO - Step 10000: lr=1.00E-05, loss= 1.1345 (max= 1.6278), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:18:08,227 - root - INFO - Saving a full checkpoint at step 10000 +2025-10-25 19:18:08,227 - root - INFO - Saving a full checkpoint at step 10000 +2025-10-25 19:18:08,227 - root - INFO - Step 10000: lr=1.00E-05, loss= 1.1345 (max= 1.6278), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:18:08,227 - root - INFO - Saving a full checkpoint at step 10000 +2025-10-25 19:18:08,227 - root - INFO - Step 10000: lr=1.00E-05, loss= 1.1345 (max= 1.6278), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
19:18:08,227 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 19:18:08,227 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 19:18:08,227 - root - INFO - Saving a full checkpoint at step 10000 +2025-10-25 19:18:08,227 - root - INFO - Saving a full checkpoint at step 10000 +2025-10-25 19:18:08,227 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 19:18:08,227 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 19:18:08,227 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 19:18:08,227 - root - INFO - Saving a full checkpoint at step 10000 +2025-10-25 19:18:08,227 - root - INFO - Saving a full checkpoint at step 10000 +2025-10-25 19:18:08,227 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 19:18:08,227 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 19:18:23,676 - root - INFO - Finished saving the checkpoint in 15.45 seconds +2025-10-25 19:18:23,683 - root - INFO - Finished saving the checkpoint in 15.46 seconds +2025-10-25 19:18:23,684 - root - INFO - Finished saving the checkpoint in 15.46 seconds +2025-10-25 19:18:23,684 - root - INFO - Finished saving the checkpoint in 15.46 seconds +2025-10-25 19:18:23,684 - root - INFO - Finished saving the checkpoint in 15.46 seconds +2025-10-25 19:18:23,684 - root - INFO - Finished saving the checkpoint in 15.46 seconds +2025-10-25 19:18:23,685 - root - INFO - Finished saving the checkpoint in 15.46 seconds +2025-10-25 19:18:23,685 - root - INFO - Finished saving the checkpoint in 
15.46 seconds +2025-10-25 19:18:55,513 - root - INFO - Step 10010: lr=1.00E-05, loss= 1.1377 (max= 1.6166), tps=13860, mfu=28.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:18:55,513 - root - INFO - Step 10010: lr=1.00E-05, loss= 1.1377 (max= 1.6166), tps=13860, mfu=28.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:18:55,513 - root - INFO - Step 10010: lr=1.00E-05, loss= 1.1377 (max= 1.6166), tps=13860, mfu=28.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:18:55,513 - root - INFO - Step 10010: lr=1.00E-05, loss= 1.1377 (max= 1.6166), tps=13860, mfu=28.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:18:55,513 - root - INFO - Step 10010: lr=1.00E-05, loss= 1.1377 (max= 1.6166), tps=13860, mfu=28.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:18:55,513 - root - INFO - Step 10010: lr=1.00E-05, loss= 1.1377 (max= 1.6166), tps=13860, mfu=28.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:18:55,513 - root - INFO - Step 10010: lr=1.00E-05, loss= 1.1377 (max= 1.6166), tps=13860, mfu=28.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:18:55,513 - root - INFO - Step 10010: lr=1.00E-05, loss= 1.1377 (max= 1.6166), tps=13860, mfu=28.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:19:27,410 - root - INFO - Step 10020: lr=1.00E-05, loss= 1.1455 (max= 1.6563), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:19:27,410 - root - INFO - Step 10020: lr=1.00E-05, loss= 1.1455 (max= 1.6563), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:19:27,410 - root - INFO - Step 10020: lr=1.00E-05, loss= 1.1455 (max= 1.6563), tps=20548, mfu=42.81%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:19:27,410 - root - INFO - Step 10020: lr=1.00E-05, loss= 1.1455 (max= 1.6563), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:19:27,410 - root - INFO - Step 10020: lr=1.00E-05, loss= 1.1455 (max= 1.6563), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:19:27,410 - root - INFO - Step 10020: lr=1.00E-05, loss= 1.1455 (max= 1.6563), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:19:27,410 - root - INFO - Step 10020: lr=1.00E-05, loss= 1.1455 (max= 1.6563), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:19:27,410 - root - INFO - Step 10020: lr=1.00E-05, loss= 1.1455 (max= 1.6563), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:19:59,260 - root - INFO - Step 10030: lr=1.00E-05, loss= 1.1237 (max= 1.7068), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:19:59,260 - root - INFO - Step 10030: lr=1.00E-05, loss= 1.1237 (max= 1.7068), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:19:59,260 - root - INFO - Step 10030: lr=1.00E-05, loss= 1.1237 (max= 1.7068), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:19:59,260 - root - INFO - Step 10030: lr=1.00E-05, loss= 1.1237 (max= 1.7068), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:19:59,260 - root - INFO - Step 10030: lr=1.00E-05, loss= 1.1237 (max= 1.7068), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:19:59,260 - root - INFO - Step 10030: lr=1.00E-05, loss= 1.1237 
(max= 1.7068), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:19:59,260 - root - INFO - Step 10030: lr=1.00E-05, loss= 1.1237 (max= 1.7068), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:19:59,260 - root - INFO - Step 10030: lr=1.00E-05, loss= 1.1237 (max= 1.7068), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:20:31,096 - root - INFO - Step 10040: lr=1.00E-05, loss= 1.1332 (max= 1.5237), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:20:31,096 - root - INFO - Step 10040: lr=1.00E-05, loss= 1.1332 (max= 1.5237), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:20:31,096 - root - INFO - Step 10040: lr=1.00E-05, loss= 1.1332 (max= 1.5237), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:20:31,096 - root - INFO - Step 10040: lr=1.00E-05, loss= 1.1332 (max= 1.5237), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:20:31,097 - root - INFO - Step 10040: lr=1.00E-05, loss= 1.1332 (max= 1.5237), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:20:31,097 - root - INFO - Step 10040: lr=1.00E-05, loss= 1.1332 (max= 1.5237), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:20:31,097 - root - INFO - Step 10040: lr=1.00E-05, loss= 1.1332 (max= 1.5237), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:20:31,097 - root - INFO - Step 10040: lr=1.00E-05, loss= 1.1332 (max= 1.5237), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:21:02,968 - root 
- INFO - Step 10050: lr=1.00E-05, loss= 1.1327 (max= 1.5813), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:21:02,968 - root - INFO - Step 10050: lr=1.00E-05, loss= 1.1327 (max= 1.5813), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:21:02,968 - root - INFO - Step 10050: lr=1.00E-05, loss= 1.1327 (max= 1.5813), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:21:02,968 - root - INFO - Step 10050: lr=1.00E-05, loss= 1.1327 (max= 1.5813), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:21:02,968 - root - INFO - Step 10050: lr=1.00E-05, loss= 1.1327 (max= 1.5813), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:21:02,968 - root - INFO - Step 10050: lr=1.00E-05, loss= 1.1327 (max= 1.5813), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:21:02,968 - root - INFO - Step 10050: lr=1.00E-05, loss= 1.1327 (max= 1.5813), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:21:02,969 - root - INFO - Step 10050: lr=1.00E-05, loss= 1.1327 (max= 1.5813), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:21:34,820 - root - INFO - Step 10060: lr=1.00E-05, loss= 1.1189 (max= 1.5273), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:21:34,820 - root - INFO - Step 10060: lr=1.00E-05, loss= 1.1189 (max= 1.5273), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:21:34,820 - root - INFO - Step 10060: lr=1.00E-05, loss= 1.1189 (max= 1.5273), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 19:21:34,820 - root - INFO - Step 10060: lr=1.00E-05, loss= 1.1189 (max= 1.5273), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:21:34,820 - root - INFO - Step 10060: lr=1.00E-05, loss= 1.1189 (max= 1.5273), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:21:34,820 - root - INFO - Step 10060: lr=1.00E-05, loss= 1.1189 (max= 1.5273), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:21:34,820 - root - INFO - Step 10060: lr=1.00E-05, loss= 1.1189 (max= 1.5273), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:21:34,820 - root - INFO - Step 10060: lr=1.00E-05, loss= 1.1189 (max= 1.5273), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:22:06,700 - root - INFO - Step 10070: lr=1.00E-05, loss= 1.1014 (max= 1.6247), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:22:06,700 - root - INFO - Step 10070: lr=1.00E-05, loss= 1.1014 (max= 1.6247), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:22:06,700 - root - INFO - Step 10070: lr=1.00E-05, loss= 1.1014 (max= 1.6247), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:22:06,700 - root - INFO - Step 10070: lr=1.00E-05, loss= 1.1014 (max= 1.6247), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:22:06,700 - root - INFO - Step 10070: lr=1.00E-05, loss= 1.1014 (max= 1.6247), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:22:06,700 - root - INFO - Step 10070: lr=1.00E-05, loss= 1.1014 (max= 1.6247), tps=20559, mfu=42.84%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:22:06,700 - root - INFO - Step 10070: lr=1.00E-05, loss= 1.1014 (max= 1.6247), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:22:06,700 - root - INFO - Step 10070: lr=1.00E-05, loss= 1.1014 (max= 1.6247), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:22:38,527 - root - INFO - Step 10080: lr=1.00E-05, loss= 1.1324 (max= 1.4936), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:22:38,527 - root - INFO - Step 10080: lr=1.00E-05, loss= 1.1324 (max= 1.4936), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:22:38,527 - root - INFO - Step 10080: lr=1.00E-05, loss= 1.1324 (max= 1.4936), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:22:38,527 - root - INFO - Step 10080: lr=1.00E-05, loss= 1.1324 (max= 1.4936), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:22:38,527 - root - INFO - Step 10080: lr=1.00E-05, loss= 1.1324 (max= 1.4936), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:22:38,527 - root - INFO - Step 10080: lr=1.00E-05, loss= 1.1324 (max= 1.4936), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:22:38,527 - root - INFO - Step 10080: lr=1.00E-05, loss= 1.1324 (max= 1.4936), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:22:38,527 - root - INFO - Step 10080: lr=1.00E-05, loss= 1.1324 (max= 1.4936), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:23:10,422 - root - INFO - Step 10090: lr=1.00E-05, 
loss= 1.1103 (max= 1.7129), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:23:10,422 - root - INFO - Step 10090: lr=1.00E-05, loss= 1.1103 (max= 1.7129), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:23:10,422 - root - INFO - Step 10090: lr=1.00E-05, loss= 1.1103 (max= 1.7129), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:23:10,422 - root - INFO - Step 10090: lr=1.00E-05, loss= 1.1103 (max= 1.7129), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:23:10,422 - root - INFO - Step 10090: lr=1.00E-05, loss= 1.1103 (max= 1.7129), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:23:10,422 - root - INFO - Step 10090: lr=1.00E-05, loss= 1.1103 (max= 1.7129), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:23:10,422 - root - INFO - Step 10090: lr=1.00E-05, loss= 1.1103 (max= 1.7129), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:23:10,422 - root - INFO - Step 10090: lr=1.00E-05, loss= 1.1103 (max= 1.7129), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:23:42,297 - root - INFO - Step 10100: lr=1.00E-05, loss= 1.1268 (max= 1.7488), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:23:42,297 - root - INFO - Step 10100: lr=1.00E-05, loss= 1.1268 (max= 1.7488), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:23:42,297 - root - INFO - Step 10100: lr=1.00E-05, loss= 1.1268 (max= 1.7488), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
19:23:42,297 - root - INFO - Step 10100: lr=1.00E-05, loss= 1.1268 (max= 1.7488), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:23:42,297 - root - INFO - Step 10100: lr=1.00E-05, loss= 1.1268 (max= 1.7488), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:23:42,297 - root - INFO - Step 10100: lr=1.00E-05, loss= 1.1268 (max= 1.7488), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:23:42,297 - root - INFO - Step 10100: lr=1.00E-05, loss= 1.1268 (max= 1.7488), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:23:42,298 - root - INFO - Step 10100: lr=1.00E-05, loss= 1.1268 (max= 1.7488), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:24:14,144 - root - INFO - Step 10110: lr=1.00E-05, loss= 1.1241 (max= 1.5634), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:24:14,144 - root - INFO - Step 10110: lr=1.00E-05, loss= 1.1241 (max= 1.5634), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:24:14,144 - root - INFO - Step 10110: lr=1.00E-05, loss= 1.1241 (max= 1.5634), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:24:14,144 - root - INFO - Step 10110: lr=1.00E-05, loss= 1.1241 (max= 1.5634), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:24:14,144 - root - INFO - Step 10110: lr=1.00E-05, loss= 1.1241 (max= 1.5634), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:24:14,144 - root - INFO - Step 10110: lr=1.00E-05, loss= 1.1241 (max= 1.5634), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:24:14,144 - root - INFO - Step 10110: lr=1.00E-05, loss= 1.1241 (max= 1.5634), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:24:14,144 - root - INFO - Step 10110: lr=1.00E-05, loss= 1.1241 (max= 1.5634), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:24:45,935 - root - INFO - Step 10120: lr=1.00E-05, loss= 1.1167 (max= 1.7021), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:24:45,935 - root - INFO - Step 10120: lr=1.00E-05, loss= 1.1167 (max= 1.7021), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:24:45,935 - root - INFO - Step 10120: lr=1.00E-05, loss= 1.1167 (max= 1.7021), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:24:45,935 - root - INFO - Step 10120: lr=1.00E-05, loss= 1.1167 (max= 1.7021), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:24:45,935 - root - INFO - Step 10120: lr=1.00E-05, loss= 1.1167 (max= 1.7021), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:24:45,935 - root - INFO - Step 10120: lr=1.00E-05, loss= 1.1167 (max= 1.7021), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:24:45,935 - root - INFO - Step 10120: lr=1.00E-05, loss= 1.1167 (max= 1.7021), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:24:45,935 - root - INFO - Step 10120: lr=1.00E-05, loss= 1.1167 (max= 1.7021), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:25:17,774 - root - INFO - Step 10130: lr=1.00E-05, loss= 1.0955 (max= 1.5242), 
tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:25:17,774 - root - INFO - Step 10130: lr=1.00E-05, loss= 1.0955 (max= 1.5242), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:25:17,774 - root - INFO - Step 10130: lr=1.00E-05, loss= 1.0955 (max= 1.5242), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:25:17,774 - root - INFO - Step 10130: lr=1.00E-05, loss= 1.0955 (max= 1.5242), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:25:17,774 - root - INFO - Step 10130: lr=1.00E-05, loss= 1.0955 (max= 1.5242), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:25:17,774 - root - INFO - Step 10130: lr=1.00E-05, loss= 1.0955 (max= 1.5242), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:25:17,774 - root - INFO - Step 10130: lr=1.00E-05, loss= 1.0955 (max= 1.5242), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:25:17,774 - root - INFO - Step 10130: lr=1.00E-05, loss= 1.0955 (max= 1.5242), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:25:49,631 - root - INFO - Step 10140: lr=1.00E-05, loss= 1.1272 (max= 1.6470), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:25:49,632 - root - INFO - Step 10140: lr=1.00E-05, loss= 1.1272 (max= 1.6470), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:25:49,632 - root - INFO - Step 10140: lr=1.00E-05, loss= 1.1272 (max= 1.6470), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:25:49,632 - root - INFO - Step 
10140: lr=1.00E-05, loss= 1.1272 (max= 1.6470), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:25:49,632 - root - INFO - Step 10140: lr=1.00E-05, loss= 1.1272 (max= 1.6470), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:25:49,632 - root - INFO - Step 10140: lr=1.00E-05, loss= 1.1272 (max= 1.6470), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:25:49,632 - root - INFO - Step 10140: lr=1.00E-05, loss= 1.1272 (max= 1.6470), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:25:49,632 - root - INFO - Step 10140: lr=1.00E-05, loss= 1.1272 (max= 1.6470), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:26:21,506 - root - INFO - Step 10150: lr=1.00E-05, loss= 1.1179 (max= 1.5436), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:26:21,506 - root - INFO - Step 10150: lr=1.00E-05, loss= 1.1179 (max= 1.5436), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:26:21,506 - root - INFO - Step 10150: lr=1.00E-05, loss= 1.1179 (max= 1.5436), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:26:21,506 - root - INFO - Step 10150: lr=1.00E-05, loss= 1.1179 (max= 1.5436), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:26:21,506 - root - INFO - Step 10150: lr=1.00E-05, loss= 1.1179 (max= 1.5436), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:26:21,506 - root - INFO - Step 10150: lr=1.00E-05, loss= 1.1179 (max= 1.5436), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 19:26:21,506 - root - INFO - Step 10150: lr=1.00E-05, loss= 1.1179 (max= 1.5436), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:26:21,506 - root - INFO - Step 10150: lr=1.00E-05, loss= 1.1179 (max= 1.5436), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:26:53,344 - root - INFO - Step 10160: lr=1.00E-05, loss= 1.1081 (max= 1.5332), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:26:53,344 - root - INFO - Step 10160: lr=1.00E-05, loss= 1.1081 (max= 1.5332), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:26:53,344 - root - INFO - Step 10160: lr=1.00E-05, loss= 1.1081 (max= 1.5332), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:26:53,344 - root - INFO - Step 10160: lr=1.00E-05, loss= 1.1081 (max= 1.5332), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:26:53,344 - root - INFO - Step 10160: lr=1.00E-05, loss= 1.1081 (max= 1.5332), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:26:53,344 - root - INFO - Step 10160: lr=1.00E-05, loss= 1.1081 (max= 1.5332), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:26:53,344 - root - INFO - Step 10160: lr=1.00E-05, loss= 1.1081 (max= 1.5332), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:26:53,345 - root - INFO - Step 10160: lr=1.00E-05, loss= 1.1081 (max= 1.5332), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:27:25,246 - root - INFO - Step 10170: lr=1.00E-05, loss= 1.1328 (max= 1.6618), tps=20545, mfu=42.81%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:27:25,246 - root - INFO - Step 10170: lr=1.00E-05, loss= 1.1328 (max= 1.6618), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:27:25,246 - root - INFO - Step 10170: lr=1.00E-05, loss= 1.1328 (max= 1.6618), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:27:25,246 - root - INFO - Step 10170: lr=1.00E-05, loss= 1.1328 (max= 1.6618), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:27:25,246 - root - INFO - Step 10170: lr=1.00E-05, loss= 1.1328 (max= 1.6618), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:27:25,246 - root - INFO - Step 10170: lr=1.00E-05, loss= 1.1328 (max= 1.6618), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:27:25,246 - root - INFO - Step 10170: lr=1.00E-05, loss= 1.1328 (max= 1.6618), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:27:25,246 - root - INFO - Step 10170: lr=1.00E-05, loss= 1.1328 (max= 1.6618), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:27:57,123 - root - INFO - Step 10180: lr=1.00E-05, loss= 1.1122 (max= 1.5258), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:27:57,123 - root - INFO - Step 10180: lr=1.00E-05, loss= 1.1122 (max= 1.5258), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:27:57,123 - root - INFO - Step 10180: lr=1.00E-05, loss= 1.1122 (max= 1.5258), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:27:57,123 - root - INFO - Step 10180: lr=1.00E-05, loss= 1.1122 
(max= 1.5258), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:27:57,123 - root - INFO - Step 10180: lr=1.00E-05, loss= 1.1122 (max= 1.5258), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:27:57,123 - root - INFO - Step 10180: lr=1.00E-05, loss= 1.1122 (max= 1.5258), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:27:57,123 - root - INFO - Step 10180: lr=1.00E-05, loss= 1.1122 (max= 1.5258), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:27:57,123 - root - INFO - Step 10180: lr=1.00E-05, loss= 1.1122 (max= 1.5258), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:28:28,946 - root - INFO - Step 10190: lr=1.00E-05, loss= 1.1114 (max= 1.5504), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:28:28,946 - root - INFO - Step 10190: lr=1.00E-05, loss= 1.1114 (max= 1.5504), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:28:28,946 - root - INFO - Step 10190: lr=1.00E-05, loss= 1.1114 (max= 1.5504), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:28:28,946 - root - INFO - Step 10190: lr=1.00E-05, loss= 1.1114 (max= 1.5504), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:28:28,946 - root - INFO - Step 10190: lr=1.00E-05, loss= 1.1114 (max= 1.5504), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:28:28,947 - root - INFO - Step 10190: lr=1.00E-05, loss= 1.1114 (max= 1.5504), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:28:28,947 - root 
- INFO - Step 10190: lr=1.00E-05, loss= 1.1114 (max= 1.5504), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:28:28,947 - root - INFO - Step 10190: lr=1.00E-05, loss= 1.1114 (max= 1.5504), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:29:00,847 - root - INFO - Step 10200: lr=1.00E-05, loss= 1.1219 (max= 1.5482), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:29:00,847 - root - INFO - Step 10200: lr=1.00E-05, loss= 1.1219 (max= 1.5482), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:29:00,847 - root - INFO - Step 10200: lr=1.00E-05, loss= 1.1219 (max= 1.5482), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:29:00,847 - root - INFO - Step 10200: lr=1.00E-05, loss= 1.1219 (max= 1.5482), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:29:00,847 - root - INFO - Step 10200: lr=1.00E-05, loss= 1.1219 (max= 1.5482), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:29:00,847 - root - INFO - Step 10200: lr=1.00E-05, loss= 1.1219 (max= 1.5482), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:29:00,848 - root - INFO - Step 10200: lr=1.00E-05, loss= 1.1219 (max= 1.5482), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:29:00,848 - root - INFO - Step 10200: lr=1.00E-05, loss= 1.1219 (max= 1.5482), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:29:32,653 - root - INFO - Step 10210: lr=1.00E-05, loss= 1.1411 (max= 1.5314), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 19:29:32,653 - root - INFO - Step 10210: lr=1.00E-05, loss= 1.1411 (max= 1.5314), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:29:32,653 - root - INFO - Step 10210: lr=1.00E-05, loss= 1.1411 (max= 1.5314), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:29:32,653 - root - INFO - Step 10210: lr=1.00E-05, loss= 1.1411 (max= 1.5314), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:29:32,653 - root - INFO - Step 10210: lr=1.00E-05, loss= 1.1411 (max= 1.5314), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:29:32,653 - root - INFO - Step 10210: lr=1.00E-05, loss= 1.1411 (max= 1.5314), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:29:32,653 - root - INFO - Step 10210: lr=1.00E-05, loss= 1.1411 (max= 1.5314), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:29:32,654 - root - INFO - Step 10210: lr=1.00E-05, loss= 1.1411 (max= 1.5314), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:30:04,483 - root - INFO - Step 10220: lr=1.00E-05, loss= 1.0948 (max= 1.4971), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:30:04,483 - root - INFO - Step 10220: lr=1.00E-05, loss= 1.0948 (max= 1.4971), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:30:04,483 - root - INFO - Step 10220: lr=1.00E-05, loss= 1.0948 (max= 1.4971), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:30:04,483 - root - INFO - Step 10220: lr=1.00E-05, loss= 1.0948 (max= 1.4971), tps=20592, mfu=42.90%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:30:04,483 - root - INFO - Step 10220: lr=1.00E-05, loss= 1.0948 (max= 1.4971), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:30:04,483 - root - INFO - Step 10220: lr=1.00E-05, loss= 1.0948 (max= 1.4971), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:30:04,483 - root - INFO - Step 10220: lr=1.00E-05, loss= 1.0948 (max= 1.4971), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:30:04,483 - root - INFO - Step 10220: lr=1.00E-05, loss= 1.0948 (max= 1.4971), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:30:36,278 - root - INFO - Step 10230: lr=1.00E-05, loss= 1.1316 (max= 1.7311), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:30:36,278 - root - INFO - Step 10230: lr=1.00E-05, loss= 1.1316 (max= 1.7311), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:30:36,278 - root - INFO - Step 10230: lr=1.00E-05, loss= 1.1316 (max= 1.7311), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:30:36,278 - root - INFO - Step 10230: lr=1.00E-05, loss= 1.1316 (max= 1.7311), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:30:36,278 - root - INFO - Step 10230: lr=1.00E-05, loss= 1.1316 (max= 1.7311), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:30:36,278 - root - INFO - Step 10230: lr=1.00E-05, loss= 1.1316 (max= 1.7311), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:30:36,278 - root - INFO - Step 10230: lr=1.00E-05, 
loss= 1.1316 (max= 1.7311), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:30:36,278 - root - INFO - Step 10230: lr=1.00E-05, loss= 1.1316 (max= 1.7311), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:31:08,145 - root - INFO - Step 10240: lr=1.00E-05, loss= 1.1094 (max= 1.5235), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:31:08,145 - root - INFO - Step 10240: lr=1.00E-05, loss= 1.1094 (max= 1.5235), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:31:08,145 - root - INFO - Step 10240: lr=1.00E-05, loss= 1.1094 (max= 1.5235), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:31:08,145 - root - INFO - Step 10240: lr=1.00E-05, loss= 1.1094 (max= 1.5235), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:31:08,145 - root - INFO - Step 10240: lr=1.00E-05, loss= 1.1094 (max= 1.5235), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:31:08,145 - root - INFO - Step 10240: lr=1.00E-05, loss= 1.1094 (max= 1.5235), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:31:08,145 - root - INFO - Step 10240: lr=1.00E-05, loss= 1.1094 (max= 1.5235), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:31:08,145 - root - INFO - Step 10240: lr=1.00E-05, loss= 1.1094 (max= 1.5235), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:31:40,078 - root - INFO - Step 10250: lr=1.00E-05, loss= 1.1372 (max= 1.6875), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
19:31:40,078 - root - INFO - Step 10250: lr=1.00E-05, loss= 1.1372 (max= 1.6875), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:31:40,079 - root - INFO - Step 10250: lr=1.00E-05, loss= 1.1372 (max= 1.6875), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:31:40,079 - root - INFO - Step 10250: lr=1.00E-05, loss= 1.1372 (max= 1.6875), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:31:40,079 - root - INFO - Step 10250: lr=1.00E-05, loss= 1.1372 (max= 1.6875), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:31:40,079 - root - INFO - Step 10250: lr=1.00E-05, loss= 1.1372 (max= 1.6875), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:31:40,079 - root - INFO - Step 10250: lr=1.00E-05, loss= 1.1372 (max= 1.6875), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:31:40,079 - root - INFO - Step 10250: lr=1.00E-05, loss= 1.1372 (max= 1.6875), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:32:12,028 - root - INFO - Step 10260: lr=1.00E-05, loss= 1.0999 (max= 1.5329), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:32:12,029 - root - INFO - Step 10260: lr=1.00E-05, loss= 1.0999 (max= 1.5329), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:32:12,029 - root - INFO - Step 10260: lr=1.00E-05, loss= 1.0999 (max= 1.5329), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:32:12,029 - root - INFO - Step 10260: lr=1.00E-05, loss= 1.0999 (max= 1.5329), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:32:12,029 - root - INFO - Step 10260: lr=1.00E-05, loss= 1.0999 (max= 1.5329), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:32:12,029 - root - INFO - Step 10260: lr=1.00E-05, loss= 1.0999 (max= 1.5329), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:32:12,029 - root - INFO - Step 10260: lr=1.00E-05, loss= 1.0999 (max= 1.5329), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:32:12,029 - root - INFO - Step 10260: lr=1.00E-05, loss= 1.0999 (max= 1.5329), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:32:43,879 - root - INFO - Step 10270: lr=1.00E-05, loss= 1.1406 (max= 1.5865), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:32:43,879 - root - INFO - Step 10270: lr=1.00E-05, loss= 1.1406 (max= 1.5865), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:32:43,879 - root - INFO - Step 10270: lr=1.00E-05, loss= 1.1406 (max= 1.5865), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:32:43,879 - root - INFO - Step 10270: lr=1.00E-05, loss= 1.1406 (max= 1.5865), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:32:43,879 - root - INFO - Step 10270: lr=1.00E-05, loss= 1.1406 (max= 1.5865), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:32:43,879 - root - INFO - Step 10270: lr=1.00E-05, loss= 1.1406 (max= 1.5865), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:32:43,879 - root - INFO - Step 10270: lr=1.00E-05, loss= 1.1406 (max= 1.5865), 
tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:32:43,880 - root - INFO - Step 10270: lr=1.00E-05, loss= 1.1406 (max= 1.5865), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:33:15,692 - root - INFO - Step 10280: lr=1.00E-05, loss= 1.1276 (max= 1.5341), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:33:15,692 - root - INFO - Step 10280: lr=1.00E-05, loss= 1.1276 (max= 1.5341), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:33:15,692 - root - INFO - Step 10280: lr=1.00E-05, loss= 1.1276 (max= 1.5341), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:33:15,692 - root - INFO - Step 10280: lr=1.00E-05, loss= 1.1276 (max= 1.5341), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:33:15,692 - root - INFO - Step 10280: lr=1.00E-05, loss= 1.1276 (max= 1.5341), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:33:15,692 - root - INFO - Step 10280: lr=1.00E-05, loss= 1.1276 (max= 1.5341), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:33:15,692 - root - INFO - Step 10280: lr=1.00E-05, loss= 1.1276 (max= 1.5341), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:33:15,692 - root - INFO - Step 10280: lr=1.00E-05, loss= 1.1276 (max= 1.5341), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:33:47,527 - root - INFO - Step 10290: lr=1.00E-05, loss= 1.1306 (max= 1.6154), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:33:47,527 - root - INFO - Step 
10290: lr=1.00E-05, loss= 1.1306 (max= 1.6154), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:33:47,528 - root - INFO - Step 10290: lr=1.00E-05, loss= 1.1306 (max= 1.6154), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:33:47,528 - root - INFO - Step 10290: lr=1.00E-05, loss= 1.1306 (max= 1.6154), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:33:47,528 - root - INFO - Step 10290: lr=1.00E-05, loss= 1.1306 (max= 1.6154), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:33:47,528 - root - INFO - Step 10290: lr=1.00E-05, loss= 1.1306 (max= 1.6154), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:33:47,528 - root - INFO - Step 10290: lr=1.00E-05, loss= 1.1306 (max= 1.6154), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:33:47,528 - root - INFO - Step 10290: lr=1.00E-05, loss= 1.1306 (max= 1.6154), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:34:19,310 - root - INFO - Step 10300: lr=1.00E-05, loss= 1.1160 (max= 1.6307), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:34:19,310 - root - INFO - Step 10300: lr=1.00E-05, loss= 1.1160 (max= 1.6307), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:34:19,310 - root - INFO - Step 10300: lr=1.00E-05, loss= 1.1160 (max= 1.6307), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:34:19,310 - root - INFO - Step 10300: lr=1.00E-05, loss= 1.1160 (max= 1.6307), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 19:34:19,310 - root - INFO - Step 10300: lr=1.00E-05, loss= 1.1160 (max= 1.6307), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:34:19,310 - root - INFO - Step 10300: lr=1.00E-05, loss= 1.1160 (max= 1.6307), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:34:19,310 - root - INFO - Step 10300: lr=1.00E-05, loss= 1.1160 (max= 1.6307), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:34:19,310 - root - INFO - Step 10300: lr=1.00E-05, loss= 1.1160 (max= 1.6307), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:34:51,121 - root - INFO - Step 10310: lr=1.00E-05, loss= 1.1115 (max= 1.4983), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:34:51,121 - root - INFO - Step 10310: lr=1.00E-05, loss= 1.1115 (max= 1.4983), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:34:51,121 - root - INFO - Step 10310: lr=1.00E-05, loss= 1.1115 (max= 1.4983), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:34:51,121 - root - INFO - Step 10310: lr=1.00E-05, loss= 1.1115 (max= 1.4983), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:34:51,121 - root - INFO - Step 10310: lr=1.00E-05, loss= 1.1115 (max= 1.4983), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:34:51,121 - root - INFO - Step 10310: lr=1.00E-05, loss= 1.1115 (max= 1.4983), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:34:51,121 - root - INFO - Step 10310: lr=1.00E-05, loss= 1.1115 (max= 1.4983), tps=20604, mfu=42.93%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:34:51,121 - root - INFO - Step 10310: lr=1.00E-05, loss= 1.1115 (max= 1.4983), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:35:23,013 - root - INFO - Step 10320: lr=1.00E-05, loss= 1.1153 (max= 1.5718), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:35:23,013 - root - INFO - Step 10320: lr=1.00E-05, loss= 1.1153 (max= 1.5718), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:35:23,013 - root - INFO - Step 10320: lr=1.00E-05, loss= 1.1153 (max= 1.5718), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:35:23,013 - root - INFO - Step 10320: lr=1.00E-05, loss= 1.1153 (max= 1.5718), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:35:23,013 - root - INFO - Step 10320: lr=1.00E-05, loss= 1.1153 (max= 1.5718), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:35:23,013 - root - INFO - Step 10320: lr=1.00E-05, loss= 1.1153 (max= 1.5718), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:35:23,013 - root - INFO - Step 10320: lr=1.00E-05, loss= 1.1153 (max= 1.5718), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:35:23,013 - root - INFO - Step 10320: lr=1.00E-05, loss= 1.1153 (max= 1.5718), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:35:54,875 - root - INFO - Step 10330: lr=1.00E-05, loss= 1.0997 (max= 1.4561), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:35:54,875 - root - INFO - Step 10330: lr=1.00E-05, loss= 1.0997 
(max= 1.4561), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:35:54,875 - root - INFO - Step 10330: lr=1.00E-05, loss= 1.0997 (max= 1.4561), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:35:54,875 - root - INFO - Step 10330: lr=1.00E-05, loss= 1.0997 (max= 1.4561), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:35:54,875 - root - INFO - Step 10330: lr=1.00E-05, loss= 1.0997 (max= 1.4561), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:35:54,875 - root - INFO - Step 10330: lr=1.00E-05, loss= 1.0997 (max= 1.4561), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:35:54,875 - root - INFO - Step 10330: lr=1.00E-05, loss= 1.0997 (max= 1.4561), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:35:54,876 - root - INFO - Step 10330: lr=1.00E-05, loss= 1.0997 (max= 1.4561), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:36:26,780 - root - INFO - Step 10340: lr=1.00E-05, loss= 1.1329 (max= 1.7489), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:36:26,780 - root - INFO - Step 10340: lr=1.00E-05, loss= 1.1329 (max= 1.7489), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:36:26,780 - root - INFO - Step 10340: lr=1.00E-05, loss= 1.1329 (max= 1.7489), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:36:26,780 - root - INFO - Step 10340: lr=1.00E-05, loss= 1.1329 (max= 1.7489), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:36:26,780 - root 
- INFO - Step 10340: lr=1.00E-05, loss= 1.1329 (max= 1.7489), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:36:26,780 - root - INFO - Step 10340: lr=1.00E-05, loss= 1.1329 (max= 1.7489), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:36:26,780 - root - INFO - Step 10340: lr=1.00E-05, loss= 1.1329 (max= 1.7489), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:36:26,780 - root - INFO - Step 10340: lr=1.00E-05, loss= 1.1329 (max= 1.7489), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:36:58,690 - root - INFO - Step 10350: lr=1.00E-05, loss= 1.1108 (max= 1.5022), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:36:58,690 - root - INFO - Step 10350: lr=1.00E-05, loss= 1.1108 (max= 1.5022), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:36:58,690 - root - INFO - Step 10350: lr=1.00E-05, loss= 1.1108 (max= 1.5022), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:36:58,690 - root - INFO - Step 10350: lr=1.00E-05, loss= 1.1108 (max= 1.5022), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:36:58,690 - root - INFO - Step 10350: lr=1.00E-05, loss= 1.1108 (max= 1.5022), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:36:58,690 - root - INFO - Step 10350: lr=1.00E-05, loss= 1.1108 (max= 1.5022), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:36:58,690 - root - INFO - Step 10350: lr=1.00E-05, loss= 1.1108 (max= 1.5022), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 19:36:58,690 - root - INFO - Step 10350: lr=1.00E-05, loss= 1.1108 (max= 1.5022), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:37:30,482 - root - INFO - Step 10360: lr=1.00E-05, loss= 1.0880 (max= 1.4546), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:37:30,482 - root - INFO - Step 10360: lr=1.00E-05, loss= 1.0880 (max= 1.4546), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:37:30,482 - root - INFO - Step 10360: lr=1.00E-05, loss= 1.0880 (max= 1.4546), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:37:30,482 - root - INFO - Step 10360: lr=1.00E-05, loss= 1.0880 (max= 1.4546), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:37:30,482 - root - INFO - Step 10360: lr=1.00E-05, loss= 1.0880 (max= 1.4546), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:37:30,482 - root - INFO - Step 10360: lr=1.00E-05, loss= 1.0880 (max= 1.4546), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:37:30,482 - root - INFO - Step 10360: lr=1.00E-05, loss= 1.0880 (max= 1.4546), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:37:30,482 - root - INFO - Step 10360: lr=1.00E-05, loss= 1.0880 (max= 1.4546), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:38:02,289 - root - INFO - Step 10370: lr=1.00E-05, loss= 1.1167 (max= 1.5612), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:38:02,289 - root - INFO - Step 10370: lr=1.00E-05, loss= 1.1167 (max= 1.5612), tps=20606, mfu=42.93%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:38:02,290 - root - INFO - Step 10370: lr=1.00E-05, loss= 1.1167 (max= 1.5612), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:38:02,290 - root - INFO - Step 10370: lr=1.00E-05, loss= 1.1167 (max= 1.5612), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:38:02,290 - root - INFO - Step 10370: lr=1.00E-05, loss= 1.1167 (max= 1.5612), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:38:02,290 - root - INFO - Step 10370: lr=1.00E-05, loss= 1.1167 (max= 1.5612), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:38:02,290 - root - INFO - Step 10370: lr=1.00E-05, loss= 1.1167 (max= 1.5612), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:38:02,290 - root - INFO - Step 10370: lr=1.00E-05, loss= 1.1167 (max= 1.5612), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:38:34,174 - root - INFO - Step 10380: lr=1.00E-05, loss= 1.1279 (max= 1.6377), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:38:34,174 - root - INFO - Step 10380: lr=1.00E-05, loss= 1.1279 (max= 1.6377), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:38:34,174 - root - INFO - Step 10380: lr=1.00E-05, loss= 1.1279 (max= 1.6377), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:38:34,174 - root - INFO - Step 10380: lr=1.00E-05, loss= 1.1279 (max= 1.6377), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:38:34,174 - root - INFO - Step 10380: lr=1.00E-05, 
loss= 1.1279 (max= 1.6377), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:38:34,174 - root - INFO - Step 10380: lr=1.00E-05, loss= 1.1279 (max= 1.6377), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:38:34,174 - root - INFO - Step 10380: lr=1.00E-05, loss= 1.1279 (max= 1.6377), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:38:34,174 - root - INFO - Step 10380: lr=1.00E-05, loss= 1.1279 (max= 1.6377), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:39:05,968 - root - INFO - Step 10390: lr=1.00E-05, loss= 1.1130 (max= 1.5422), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:39:05,968 - root - INFO - Step 10390: lr=1.00E-05, loss= 1.1130 (max= 1.5422), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:39:05,968 - root - INFO - Step 10390: lr=1.00E-05, loss= 1.1130 (max= 1.5422), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:39:05,968 - root - INFO - Step 10390: lr=1.00E-05, loss= 1.1130 (max= 1.5422), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:39:05,968 - root - INFO - Step 10390: lr=1.00E-05, loss= 1.1130 (max= 1.5422), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:39:05,968 - root - INFO - Step 10390: lr=1.00E-05, loss= 1.1130 (max= 1.5422), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:39:05,968 - root - INFO - Step 10390: lr=1.00E-05, loss= 1.1130 (max= 1.5422), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
19:39:05,968 - root - INFO - Step 10390: lr=1.00E-05, loss= 1.1130 (max= 1.5422), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:39:37,789 - root - INFO - Step 10400: lr=1.00E-05, loss= 1.1023 (max= 1.5318), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:39:37,789 - root - INFO - Step 10400: lr=1.00E-05, loss= 1.1023 (max= 1.5318), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:39:37,789 - root - INFO - Step 10400: lr=1.00E-05, loss= 1.1023 (max= 1.5318), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:39:37,789 - root - INFO - Step 10400: lr=1.00E-05, loss= 1.1023 (max= 1.5318), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:39:37,789 - root - INFO - Step 10400: lr=1.00E-05, loss= 1.1023 (max= 1.5318), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:39:37,789 - root - INFO - Step 10400: lr=1.00E-05, loss= 1.1023 (max= 1.5318), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:39:37,789 - root - INFO - Step 10400: lr=1.00E-05, loss= 1.1023 (max= 1.5318), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:39:37,789 - root - INFO - Step 10400: lr=1.00E-05, loss= 1.1023 (max= 1.5318), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:40:09,628 - root - INFO - Step 10410: lr=1.00E-05, loss= 1.1179 (max= 1.6181), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:40:09,628 - root - INFO - Step 10410: lr=1.00E-05, loss= 1.1179 (max= 1.6181), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:40:09,628 - root - INFO - Step 10410: lr=1.00E-05, loss= 1.1179 (max= 1.6181), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:40:09,628 - root - INFO - Step 10410: lr=1.00E-05, loss= 1.1179 (max= 1.6181), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:40:09,628 - root - INFO - Step 10410: lr=1.00E-05, loss= 1.1179 (max= 1.6181), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:40:09,628 - root - INFO - Step 10410: lr=1.00E-05, loss= 1.1179 (max= 1.6181), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:40:09,628 - root - INFO - Step 10410: lr=1.00E-05, loss= 1.1179 (max= 1.6181), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:40:09,628 - root - INFO - Step 10410: lr=1.00E-05, loss= 1.1179 (max= 1.6181), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:40:41,515 - root - INFO - Step 10420: lr=1.00E-05, loss= 1.1279 (max= 1.6010), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:40:41,515 - root - INFO - Step 10420: lr=1.00E-05, loss= 1.1279 (max= 1.6010), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:40:41,515 - root - INFO - Step 10420: lr=1.00E-05, loss= 1.1279 (max= 1.6010), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:40:41,515 - root - INFO - Step 10420: lr=1.00E-05, loss= 1.1279 (max= 1.6010), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:40:41,515 - root - INFO - Step 10420: lr=1.00E-05, loss= 1.1279 (max= 1.6010), 
tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:40:41,515 - root - INFO - Step 10420: lr=1.00E-05, loss= 1.1279 (max= 1.6010), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:40:41,515 - root - INFO - Step 10420: lr=1.00E-05, loss= 1.1279 (max= 1.6010), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:40:41,515 - root - INFO - Step 10420: lr=1.00E-05, loss= 1.1279 (max= 1.6010), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:41:13,472 - root - INFO - Step 10430: lr=1.00E-05, loss= 1.1072 (max= 1.6304), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:41:13,472 - root - INFO - Step 10430: lr=1.00E-05, loss= 1.1072 (max= 1.6304), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:41:13,472 - root - INFO - Step 10430: lr=1.00E-05, loss= 1.1072 (max= 1.6304), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:41:13,472 - root - INFO - Step 10430: lr=1.00E-05, loss= 1.1072 (max= 1.6304), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:41:13,472 - root - INFO - Step 10430: lr=1.00E-05, loss= 1.1072 (max= 1.6304), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:41:13,472 - root - INFO - Step 10430: lr=1.00E-05, loss= 1.1072 (max= 1.6304), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:41:13,472 - root - INFO - Step 10430: lr=1.00E-05, loss= 1.1072 (max= 1.6304), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:41:13,472 - root - INFO - Step 
10430: lr=1.00E-05, loss= 1.1072 (max= 1.6304), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:41:45,375 - root - INFO - Step 10440: lr=1.00E-05, loss= 1.1148 (max= 1.6611), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:41:45,375 - root - INFO - Step 10440: lr=1.00E-05, loss= 1.1148 (max= 1.6611), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:41:45,375 - root - INFO - Step 10440: lr=1.00E-05, loss= 1.1148 (max= 1.6611), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:41:45,375 - root - INFO - Step 10440: lr=1.00E-05, loss= 1.1148 (max= 1.6611), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:41:45,375 - root - INFO - Step 10440: lr=1.00E-05, loss= 1.1148 (max= 1.6611), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:41:45,375 - root - INFO - Step 10440: lr=1.00E-05, loss= 1.1148 (max= 1.6611), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:41:45,375 - root - INFO - Step 10440: lr=1.00E-05, loss= 1.1148 (max= 1.6611), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:41:45,375 - root - INFO - Step 10440: lr=1.00E-05, loss= 1.1148 (max= 1.6611), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:42:17,319 - root - INFO - Step 10450: lr=1.00E-05, loss= 1.1127 (max= 1.5470), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:42:17,319 - root - INFO - Step 10450: lr=1.00E-05, loss= 1.1127 (max= 1.5470), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 19:42:17,319 - root - INFO - Step 10450: lr=1.00E-05, loss= 1.1127 (max= 1.5470), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:42:17,319 - root - INFO - Step 10450: lr=1.00E-05, loss= 1.1127 (max= 1.5470), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:42:17,320 - root - INFO - Step 10450: lr=1.00E-05, loss= 1.1127 (max= 1.5470), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:42:17,320 - root - INFO - Step 10450: lr=1.00E-05, loss= 1.1127 (max= 1.5470), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:42:17,320 - root - INFO - Step 10450: lr=1.00E-05, loss= 1.1127 (max= 1.5470), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:42:17,320 - root - INFO - Step 10450: lr=1.00E-05, loss= 1.1127 (max= 1.5470), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:42:49,194 - root - INFO - Step 10460: lr=1.00E-05, loss= 1.1176 (max= 1.4899), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:42:49,195 - root - INFO - Step 10460: lr=1.00E-05, loss= 1.1176 (max= 1.4899), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:42:49,195 - root - INFO - Step 10460: lr=1.00E-05, loss= 1.1176 (max= 1.4899), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:42:49,195 - root - INFO - Step 10460: lr=1.00E-05, loss= 1.1176 (max= 1.4899), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:42:49,195 - root - INFO - Step 10460: lr=1.00E-05, loss= 1.1176 (max= 1.4899), tps=20563, mfu=42.84%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:42:49,195 - root - INFO - Step 10460: lr=1.00E-05, loss= 1.1176 (max= 1.4899), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:42:49,195 - root - INFO - Step 10460: lr=1.00E-05, loss= 1.1176 (max= 1.4899), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:42:49,195 - root - INFO - Step 10460: lr=1.00E-05, loss= 1.1176 (max= 1.4899), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:43:21,024 - root - INFO - Step 10470: lr=1.00E-05, loss= 1.1206 (max= 1.7417), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:43:21,024 - root - INFO - Step 10470: lr=1.00E-05, loss= 1.1206 (max= 1.7417), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:43:21,024 - root - INFO - Step 10470: lr=1.00E-05, loss= 1.1206 (max= 1.7417), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:43:21,024 - root - INFO - Step 10470: lr=1.00E-05, loss= 1.1206 (max= 1.7417), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:43:21,024 - root - INFO - Step 10470: lr=1.00E-05, loss= 1.1206 (max= 1.7417), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:43:21,024 - root - INFO - Step 10470: lr=1.00E-05, loss= 1.1206 (max= 1.7417), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:43:21,024 - root - INFO - Step 10470: lr=1.00E-05, loss= 1.1206 (max= 1.7417), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:43:21,024 - root - INFO - Step 10470: lr=1.00E-05, loss= 1.1206 
(max= 1.7417), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:43:52,906 - root - INFO - Step 10480: lr=1.00E-05, loss= 1.1079 (max= 1.6096), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:43:52,906 - root - INFO - Step 10480: lr=1.00E-05, loss= 1.1079 (max= 1.6096), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:43:52,906 - root - INFO - Step 10480: lr=1.00E-05, loss= 1.1079 (max= 1.6096), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:43:52,906 - root - INFO - Step 10480: lr=1.00E-05, loss= 1.1079 (max= 1.6096), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:43:52,906 - root - INFO - Step 10480: lr=1.00E-05, loss= 1.1079 (max= 1.6096), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:43:52,906 - root - INFO - Step 10480: lr=1.00E-05, loss= 1.1079 (max= 1.6096), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:43:52,906 - root - INFO - Step 10480: lr=1.00E-05, loss= 1.1079 (max= 1.6096), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:43:52,906 - root - INFO - Step 10480: lr=1.00E-05, loss= 1.1079 (max= 1.6096), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:44:24,720 - root - INFO - Step 10490: lr=1.00E-05, loss= 1.1418 (max= 1.5266), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:44:24,720 - root - INFO - Step 10490: lr=1.00E-05, loss= 1.1418 (max= 1.5266), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:44:24,720 - root 
- INFO - Step 10490: lr=1.00E-05, loss= 1.1418 (max= 1.5266), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:44:24,720 - root - INFO - Step 10490: lr=1.00E-05, loss= 1.1418 (max= 1.5266), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:44:24,720 - root - INFO - Step 10490: lr=1.00E-05, loss= 1.1418 (max= 1.5266), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:44:24,720 - root - INFO - Step 10490: lr=1.00E-05, loss= 1.1418 (max= 1.5266), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:44:24,720 - root - INFO - Step 10490: lr=1.00E-05, loss= 1.1418 (max= 1.5266), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:44:24,720 - root - INFO - Step 10490: lr=1.00E-05, loss= 1.1418 (max= 1.5266), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:44:56,565 - root - INFO - Step 10500: lr=1.00E-05, loss= 1.0906 (max= 1.5160), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:44:56,565 - root - INFO - Step 10500: lr=1.00E-05, loss= 1.0906 (max= 1.5160), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:44:56,565 - root - INFO - Step 10500: lr=1.00E-05, loss= 1.0906 (max= 1.5160), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:44:56,565 - root - INFO - Step 10500: lr=1.00E-05, loss= 1.0906 (max= 1.5160), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:44:56,565 - root - INFO - Step 10500: lr=1.00E-05, loss= 1.0906 (max= 1.5160), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 19:44:56,565 - root - INFO - Step 10500: lr=1.00E-05, loss= 1.0906 (max= 1.5160), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:44:56,565 - root - INFO - Step 10500: lr=1.00E-05, loss= 1.0906 (max= 1.5160), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:44:56,565 - root - INFO - Step 10500: lr=1.00E-05, loss= 1.0906 (max= 1.5160), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:45:28,430 - root - INFO - Step 10510: lr=1.00E-05, loss= 1.1333 (max= 1.5242), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:45:28,430 - root - INFO - Step 10510: lr=1.00E-05, loss= 1.1333 (max= 1.5242), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:45:28,430 - root - INFO - Step 10510: lr=1.00E-05, loss= 1.1333 (max= 1.5242), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:45:28,430 - root - INFO - Step 10510: lr=1.00E-05, loss= 1.1333 (max= 1.5242), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:45:28,430 - root - INFO - Step 10510: lr=1.00E-05, loss= 1.1333 (max= 1.5242), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:45:28,430 - root - INFO - Step 10510: lr=1.00E-05, loss= 1.1333 (max= 1.5242), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:45:28,430 - root - INFO - Step 10510: lr=1.00E-05, loss= 1.1333 (max= 1.5242), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:45:28,430 - root - INFO - Step 10510: lr=1.00E-05, loss= 1.1333 (max= 1.5242), tps=20569, mfu=42.86%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:46:00,248 - root - INFO - Step 10520: lr=1.00E-05, loss= 1.1102 (max= 1.5729), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:46:00,248 - root - INFO - Step 10520: lr=1.00E-05, loss= 1.1102 (max= 1.5729), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:46:00,248 - root - INFO - Step 10520: lr=1.00E-05, loss= 1.1102 (max= 1.5729), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:46:00,248 - root - INFO - Step 10520: lr=1.00E-05, loss= 1.1102 (max= 1.5729), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:46:00,248 - root - INFO - Step 10520: lr=1.00E-05, loss= 1.1102 (max= 1.5729), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:46:00,249 - root - INFO - Step 10520: lr=1.00E-05, loss= 1.1102 (max= 1.5729), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:46:00,249 - root - INFO - Step 10520: lr=1.00E-05, loss= 1.1102 (max= 1.5729), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:46:00,249 - root - INFO - Step 10520: lr=1.00E-05, loss= 1.1102 (max= 1.5729), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:46:32,046 - root - INFO - Step 10530: lr=1.00E-05, loss= 1.1071 (max= 1.4910), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:46:32,046 - root - INFO - Step 10530: lr=1.00E-05, loss= 1.1071 (max= 1.4910), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:46:32,046 - root - INFO - Step 10530: lr=1.00E-05, 
loss= 1.1071 (max= 1.4910), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:46:32,046 - root - INFO - Step 10530: lr=1.00E-05, loss= 1.1071 (max= 1.4910), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:46:32,046 - root - INFO - Step 10530: lr=1.00E-05, loss= 1.1071 (max= 1.4910), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:46:32,046 - root - INFO - Step 10530: lr=1.00E-05, loss= 1.1071 (max= 1.4910), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:46:32,046 - root - INFO - Step 10530: lr=1.00E-05, loss= 1.1071 (max= 1.4910), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:46:32,046 - root - INFO - Step 10530: lr=1.00E-05, loss= 1.1071 (max= 1.4910), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:47:03,885 - root - INFO - Step 10540: lr=1.00E-05, loss= 1.1028 (max= 1.5598), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:47:03,885 - root - INFO - Step 10540: lr=1.00E-05, loss= 1.1028 (max= 1.5598), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:47:03,885 - root - INFO - Step 10540: lr=1.00E-05, loss= 1.1028 (max= 1.5598), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:47:03,885 - root - INFO - Step 10540: lr=1.00E-05, loss= 1.1028 (max= 1.5598), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:47:03,885 - root - INFO - Step 10540: lr=1.00E-05, loss= 1.1028 (max= 1.5598), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
19:47:03,885 - root - INFO - Step 10540: lr=1.00E-05, loss= 1.1028 (max= 1.5598), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:47:03,885 - root - INFO - Step 10540: lr=1.00E-05, loss= 1.1028 (max= 1.5598), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:47:03,885 - root - INFO - Step 10540: lr=1.00E-05, loss= 1.1028 (max= 1.5598), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:47:35,711 - root - INFO - Step 10550: lr=1.00E-05, loss= 1.1030 (max= 1.5332), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:47:35,711 - root - INFO - Step 10550: lr=1.00E-05, loss= 1.1030 (max= 1.5332), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:47:35,711 - root - INFO - Step 10550: lr=1.00E-05, loss= 1.1030 (max= 1.5332), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:47:35,711 - root - INFO - Step 10550: lr=1.00E-05, loss= 1.1030 (max= 1.5332), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:47:35,711 - root - INFO - Step 10550: lr=1.00E-05, loss= 1.1030 (max= 1.5332), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:47:35,711 - root - INFO - Step 10550: lr=1.00E-05, loss= 1.1030 (max= 1.5332), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:47:35,711 - root - INFO - Step 10550: lr=1.00E-05, loss= 1.1030 (max= 1.5332), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:47:35,711 - root - INFO - Step 10550: lr=1.00E-05, loss= 1.1030 (max= 1.5332), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:48:07,616 - root - INFO - Step 10560: lr=1.00E-05, loss= 1.1128 (max= 1.6420), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:48:07,616 - root - INFO - Step 10560: lr=1.00E-05, loss= 1.1128 (max= 1.6420), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:48:07,616 - root - INFO - Step 10560: lr=1.00E-05, loss= 1.1128 (max= 1.6420), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:48:07,616 - root - INFO - Step 10560: lr=1.00E-05, loss= 1.1128 (max= 1.6420), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:48:07,616 - root - INFO - Step 10560: lr=1.00E-05, loss= 1.1128 (max= 1.6420), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:48:07,617 - root - INFO - Step 10560: lr=1.00E-05, loss= 1.1128 (max= 1.6420), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:48:07,617 - root - INFO - Step 10560: lr=1.00E-05, loss= 1.1128 (max= 1.6420), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:48:07,617 - root - INFO - Step 10560: lr=1.00E-05, loss= 1.1128 (max= 1.6420), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:48:39,456 - root - INFO - Step 10570: lr=1.00E-05, loss= 1.1098 (max= 1.6394), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:48:39,456 - root - INFO - Step 10570: lr=1.00E-05, loss= 1.1098 (max= 1.6394), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:48:39,456 - root - INFO - Step 10570: lr=1.00E-05, loss= 1.1098 (max= 1.6394), 
tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:48:39,456 - root - INFO - Step 10570: lr=1.00E-05, loss= 1.1098 (max= 1.6394), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:48:39,456 - root - INFO - Step 10570: lr=1.00E-05, loss= 1.1098 (max= 1.6394), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:48:39,456 - root - INFO - Step 10570: lr=1.00E-05, loss= 1.1098 (max= 1.6394), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:48:39,456 - root - INFO - Step 10570: lr=1.00E-05, loss= 1.1098 (max= 1.6394), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:48:39,456 - root - INFO - Step 10570: lr=1.00E-05, loss= 1.1098 (max= 1.6394), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:49:11,323 - root - INFO - Step 10580: lr=1.00E-05, loss= 1.1216 (max= 1.7513), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:49:11,323 - root - INFO - Step 10580: lr=1.00E-05, loss= 1.1216 (max= 1.7513), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:49:11,323 - root - INFO - Step 10580: lr=1.00E-05, loss= 1.1216 (max= 1.7513), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:49:11,323 - root - INFO - Step 10580: lr=1.00E-05, loss= 1.1216 (max= 1.7513), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:49:11,323 - root - INFO - Step 10580: lr=1.00E-05, loss= 1.1216 (max= 1.7513), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:49:11,324 - root - INFO - Step 
10580: lr=1.00E-05, loss= 1.1216 (max= 1.7513), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:49:11,324 - root - INFO - Step 10580: lr=1.00E-05, loss= 1.1216 (max= 1.7513), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:49:11,324 - root - INFO - Step 10580: lr=1.00E-05, loss= 1.1216 (max= 1.7513), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:49:43,245 - root - INFO - Step 10590: lr=1.00E-05, loss= 1.1382 (max= 1.7256), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:49:43,245 - root - INFO - Step 10590: lr=1.00E-05, loss= 1.1382 (max= 1.7256), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:49:43,245 - root - INFO - Step 10590: lr=1.00E-05, loss= 1.1382 (max= 1.7256), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:49:43,245 - root - INFO - Step 10590: lr=1.00E-05, loss= 1.1382 (max= 1.7256), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:49:43,245 - root - INFO - Step 10590: lr=1.00E-05, loss= 1.1382 (max= 1.7256), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:49:43,245 - root - INFO - Step 10590: lr=1.00E-05, loss= 1.1382 (max= 1.7256), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:49:43,245 - root - INFO - Step 10590: lr=1.00E-05, loss= 1.1382 (max= 1.7256), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:49:43,245 - root - INFO - Step 10590: lr=1.00E-05, loss= 1.1382 (max= 1.7256), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 19:50:15,147 - root - INFO - Step 10600: lr=1.00E-05, loss= 1.1117 (max= 1.6607), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:50:15,147 - root - INFO - Step 10600: lr=1.00E-05, loss= 1.1117 (max= 1.6607), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:50:15,147 - root - INFO - Step 10600: lr=1.00E-05, loss= 1.1117 (max= 1.6607), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:50:15,147 - root - INFO - Step 10600: lr=1.00E-05, loss= 1.1117 (max= 1.6607), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:50:15,147 - root - INFO - Step 10600: lr=1.00E-05, loss= 1.1117 (max= 1.6607), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:50:15,147 - root - INFO - Step 10600: lr=1.00E-05, loss= 1.1117 (max= 1.6607), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:50:15,147 - root - INFO - Step 10600: lr=1.00E-05, loss= 1.1117 (max= 1.6607), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:50:15,147 - root - INFO - Step 10600: lr=1.00E-05, loss= 1.1117 (max= 1.6607), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:50:47,036 - root - INFO - Step 10610: lr=1.00E-05, loss= 1.1008 (max= 1.4303), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:50:47,036 - root - INFO - Step 10610: lr=1.00E-05, loss= 1.1008 (max= 1.4303), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:50:47,036 - root - INFO - Step 10610: lr=1.00E-05, loss= 1.1008 (max= 1.4303), tps=20553, mfu=42.82%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:50:47,036 - root - INFO - Step 10610: lr=1.00E-05, loss= 1.1008 (max= 1.4303), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:50:47,036 - root - INFO - Step 10610: lr=1.00E-05, loss= 1.1008 (max= 1.4303), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:50:47,036 - root - INFO - Step 10610: lr=1.00E-05, loss= 1.1008 (max= 1.4303), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:50:47,036 - root - INFO - Step 10610: lr=1.00E-05, loss= 1.1008 (max= 1.4303), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:50:47,036 - root - INFO - Step 10610: lr=1.00E-05, loss= 1.1008 (max= 1.4303), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:51:18,921 - root - INFO - Step 10620: lr=1.00E-05, loss= 1.1334 (max= 1.7434), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:51:18,921 - root - INFO - Step 10620: lr=1.00E-05, loss= 1.1334 (max= 1.7434), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:51:18,921 - root - INFO - Step 10620: lr=1.00E-05, loss= 1.1334 (max= 1.7434), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:51:18,921 - root - INFO - Step 10620: lr=1.00E-05, loss= 1.1334 (max= 1.7434), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:51:18,921 - root - INFO - Step 10620: lr=1.00E-05, loss= 1.1334 (max= 1.7434), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:51:18,921 - root - INFO - Step 10620: lr=1.00E-05, loss= 1.1334 
(max= 1.7434), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:51:18,921 - root - INFO - Step 10620: lr=1.00E-05, loss= 1.1334 (max= 1.7434), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:51:18,921 - root - INFO - Step 10620: lr=1.00E-05, loss= 1.1334 (max= 1.7434), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:51:50,786 - root - INFO - Step 10630: lr=1.00E-05, loss= 1.1453 (max= 1.5546), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:51:50,786 - root - INFO - Step 10630: lr=1.00E-05, loss= 1.1453 (max= 1.5546), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:51:50,787 - root - INFO - Step 10630: lr=1.00E-05, loss= 1.1453 (max= 1.5546), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:51:50,787 - root - INFO - Step 10630: lr=1.00E-05, loss= 1.1453 (max= 1.5546), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:51:50,787 - root - INFO - Step 10630: lr=1.00E-05, loss= 1.1453 (max= 1.5546), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:51:50,787 - root - INFO - Step 10630: lr=1.00E-05, loss= 1.1453 (max= 1.5546), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:51:50,787 - root - INFO - Step 10630: lr=1.00E-05, loss= 1.1453 (max= 1.5546), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:51:50,787 - root - INFO - Step 10630: lr=1.00E-05, loss= 1.1453 (max= 1.5546), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:52:22,680 - root 
- INFO - Step 10640: lr=1.00E-05, loss= 1.1189 (max= 1.4692), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:52:22,681 - root - INFO - Step 10640: lr=1.00E-05, loss= 1.1189 (max= 1.4692), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:52:22,681 - root - INFO - Step 10640: lr=1.00E-05, loss= 1.1189 (max= 1.4692), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:52:22,681 - root - INFO - Step 10640: lr=1.00E-05, loss= 1.1189 (max= 1.4692), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:52:22,681 - root - INFO - Step 10640: lr=1.00E-05, loss= 1.1189 (max= 1.4692), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:52:22,681 - root - INFO - Step 10640: lr=1.00E-05, loss= 1.1189 (max= 1.4692), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:52:22,681 - root - INFO - Step 10640: lr=1.00E-05, loss= 1.1189 (max= 1.4692), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:52:22,681 - root - INFO - Step 10640: lr=1.00E-05, loss= 1.1189 (max= 1.4692), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:52:54,668 - root - INFO - Step 10650: lr=1.00E-05, loss= 1.1282 (max= 1.5870), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:52:54,668 - root - INFO - Step 10650: lr=1.00E-05, loss= 1.1282 (max= 1.5870), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:52:54,668 - root - INFO - Step 10650: lr=1.00E-05, loss= 1.1282 (max= 1.5870), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 19:52:54,668 - root - INFO - Step 10650: lr=1.00E-05, loss= 1.1282 (max= 1.5870), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:52:54,668 - root - INFO - Step 10650: lr=1.00E-05, loss= 1.1282 (max= 1.5870), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:52:54,668 - root - INFO - Step 10650: lr=1.00E-05, loss= 1.1282 (max= 1.5870), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:52:54,668 - root - INFO - Step 10650: lr=1.00E-05, loss= 1.1282 (max= 1.5870), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:52:54,668 - root - INFO - Step 10650: lr=1.00E-05, loss= 1.1282 (max= 1.5870), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:53:26,513 - root - INFO - Step 10660: lr=1.00E-05, loss= 1.1225 (max= 1.6366), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:53:26,513 - root - INFO - Step 10660: lr=1.00E-05, loss= 1.1225 (max= 1.6366), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:53:26,513 - root - INFO - Step 10660: lr=1.00E-05, loss= 1.1225 (max= 1.6366), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:53:26,513 - root - INFO - Step 10660: lr=1.00E-05, loss= 1.1225 (max= 1.6366), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:53:26,513 - root - INFO - Step 10660: lr=1.00E-05, loss= 1.1225 (max= 1.6366), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:53:26,513 - root - INFO - Step 10660: lr=1.00E-05, loss= 1.1225 (max= 1.6366), tps=20582, mfu=42.88%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:53:26,513 - root - INFO - Step 10660: lr=1.00E-05, loss= 1.1225 (max= 1.6366), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:53:26,514 - root - INFO - Step 10660: lr=1.00E-05, loss= 1.1225 (max= 1.6366), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:53:58,401 - root - INFO - Step 10670: lr=1.00E-05, loss= 1.1446 (max= 1.5550), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:53:58,401 - root - INFO - Step 10670: lr=1.00E-05, loss= 1.1446 (max= 1.5550), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:53:58,401 - root - INFO - Step 10670: lr=1.00E-05, loss= 1.1446 (max= 1.5550), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:53:58,401 - root - INFO - Step 10670: lr=1.00E-05, loss= 1.1446 (max= 1.5550), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:53:58,401 - root - INFO - Step 10670: lr=1.00E-05, loss= 1.1446 (max= 1.5550), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:53:58,401 - root - INFO - Step 10670: lr=1.00E-05, loss= 1.1446 (max= 1.5550), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:53:58,401 - root - INFO - Step 10670: lr=1.00E-05, loss= 1.1446 (max= 1.5550), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:53:58,402 - root - INFO - Step 10670: lr=1.00E-05, loss= 1.1446 (max= 1.5550), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:54:30,279 - root - INFO - Step 10680: lr=1.00E-05, 
loss= 1.1159 (max= 1.4685), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:54:30,279 - root - INFO - Step 10680: lr=1.00E-05, loss= 1.1159 (max= 1.4685), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:54:30,279 - root - INFO - Step 10680: lr=1.00E-05, loss= 1.1159 (max= 1.4685), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:54:30,279 - root - INFO - Step 10680: lr=1.00E-05, loss= 1.1159 (max= 1.4685), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:54:30,279 - root - INFO - Step 10680: lr=1.00E-05, loss= 1.1159 (max= 1.4685), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:54:30,279 - root - INFO - Step 10680: lr=1.00E-05, loss= 1.1159 (max= 1.4685), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:54:30,279 - root - INFO - Step 10680: lr=1.00E-05, loss= 1.1159 (max= 1.4685), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:54:30,279 - root - INFO - Step 10680: lr=1.00E-05, loss= 1.1159 (max= 1.4685), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:55:02,115 - root - INFO - Step 10690: lr=1.00E-05, loss= 1.1145 (max= 1.5865), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:55:02,115 - root - INFO - Step 10690: lr=1.00E-05, loss= 1.1145 (max= 1.5865), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:55:02,115 - root - INFO - Step 10690: lr=1.00E-05, loss= 1.1145 (max= 1.5865), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
19:55:02,115 - root - INFO - Step 10690: lr=1.00E-05, loss= 1.1145 (max= 1.5865), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:55:02,115 - root - INFO - Step 10690: lr=1.00E-05, loss= 1.1145 (max= 1.5865), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:55:02,115 - root - INFO - Step 10690: lr=1.00E-05, loss= 1.1145 (max= 1.5865), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:55:02,115 - root - INFO - Step 10690: lr=1.00E-05, loss= 1.1145 (max= 1.5865), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:55:02,116 - root - INFO - Step 10690: lr=1.00E-05, loss= 1.1145 (max= 1.5865), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:55:33,985 - root - INFO - Step 10700: lr=1.00E-05, loss= 1.1285 (max= 1.6435), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:55:33,985 - root - INFO - Step 10700: lr=1.00E-05, loss= 1.1285 (max= 1.6435), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:55:33,986 - root - INFO - Step 10700: lr=1.00E-05, loss= 1.1285 (max= 1.6435), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:55:33,986 - root - INFO - Step 10700: lr=1.00E-05, loss= 1.1285 (max= 1.6435), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:55:33,986 - root - INFO - Step 10700: lr=1.00E-05, loss= 1.1285 (max= 1.6435), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:55:33,986 - root - INFO - Step 10700: lr=1.00E-05, loss= 1.1285 (max= 1.6435), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:55:33,986 - root - INFO - Step 10700: lr=1.00E-05, loss= 1.1285 (max= 1.6435), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:55:33,986 - root - INFO - Step 10700: lr=1.00E-05, loss= 1.1285 (max= 1.6435), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:56:05,807 - root - INFO - Step 10710: lr=1.00E-05, loss= 1.1421 (max= 1.7435), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:56:05,807 - root - INFO - Step 10710: lr=1.00E-05, loss= 1.1421 (max= 1.7435), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:56:05,807 - root - INFO - Step 10710: lr=1.00E-05, loss= 1.1421 (max= 1.7435), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:56:05,807 - root - INFO - Step 10710: lr=1.00E-05, loss= 1.1421 (max= 1.7435), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:56:05,807 - root - INFO - Step 10710: lr=1.00E-05, loss= 1.1421 (max= 1.7435), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:56:05,807 - root - INFO - Step 10710: lr=1.00E-05, loss= 1.1421 (max= 1.7435), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:56:05,807 - root - INFO - Step 10710: lr=1.00E-05, loss= 1.1421 (max= 1.7435), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:56:05,807 - root - INFO - Step 10710: lr=1.00E-05, loss= 1.1421 (max= 1.7435), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:56:37,639 - root - INFO - Step 10720: lr=1.00E-05, loss= 1.1323 (max= 1.5137), 
tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:56:37,639 - root - INFO - Step 10720: lr=1.00E-05, loss= 1.1323 (max= 1.5137), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:56:37,639 - root - INFO - Step 10720: lr=1.00E-05, loss= 1.1323 (max= 1.5137), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:56:37,639 - root - INFO - Step 10720: lr=1.00E-05, loss= 1.1323 (max= 1.5137), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:56:37,639 - root - INFO - Step 10720: lr=1.00E-05, loss= 1.1323 (max= 1.5137), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:56:37,639 - root - INFO - Step 10720: lr=1.00E-05, loss= 1.1323 (max= 1.5137), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:56:37,639 - root - INFO - Step 10720: lr=1.00E-05, loss= 1.1323 (max= 1.5137), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:56:37,639 - root - INFO - Step 10720: lr=1.00E-05, loss= 1.1323 (max= 1.5137), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:57:09,535 - root - INFO - Step 10730: lr=1.00E-05, loss= 1.1260 (max= 1.6652), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:57:09,535 - root - INFO - Step 10730: lr=1.00E-05, loss= 1.1260 (max= 1.6652), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:57:09,535 - root - INFO - Step 10730: lr=1.00E-05, loss= 1.1260 (max= 1.6652), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:57:09,536 - root - INFO - Step 
10730: lr=1.00E-05, loss= 1.1260 (max= 1.6652), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:57:09,536 - root - INFO - Step 10730: lr=1.00E-05, loss= 1.1260 (max= 1.6652), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:57:09,536 - root - INFO - Step 10730: lr=1.00E-05, loss= 1.1260 (max= 1.6652), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:57:09,536 - root - INFO - Step 10730: lr=1.00E-05, loss= 1.1260 (max= 1.6652), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:57:09,536 - root - INFO - Step 10730: lr=1.00E-05, loss= 1.1260 (max= 1.6652), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:57:41,514 - root - INFO - Step 10740: lr=1.00E-05, loss= 1.1193 (max= 1.6650), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:57:41,514 - root - INFO - Step 10740: lr=1.00E-05, loss= 1.1193 (max= 1.6650), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:57:41,514 - root - INFO - Step 10740: lr=1.00E-05, loss= 1.1193 (max= 1.6650), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:57:41,514 - root - INFO - Step 10740: lr=1.00E-05, loss= 1.1193 (max= 1.6650), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:57:41,514 - root - INFO - Step 10740: lr=1.00E-05, loss= 1.1193 (max= 1.6650), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:57:41,514 - root - INFO - Step 10740: lr=1.00E-05, loss= 1.1193 (max= 1.6650), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 19:57:41,514 - root - INFO - Step 10740: lr=1.00E-05, loss= 1.1193 (max= 1.6650), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:57:41,514 - root - INFO - Step 10740: lr=1.00E-05, loss= 1.1193 (max= 1.6650), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:58:13,352 - root - INFO - Step 10750: lr=1.00E-05, loss= 1.1277 (max= 1.5455), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:58:13,352 - root - INFO - Step 10750: lr=1.00E-05, loss= 1.1277 (max= 1.5455), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:58:13,352 - root - INFO - Step 10750: lr=1.00E-05, loss= 1.1277 (max= 1.5455), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:58:13,352 - root - INFO - Step 10750: lr=1.00E-05, loss= 1.1277 (max= 1.5455), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:58:13,352 - root - INFO - Step 10750: lr=1.00E-05, loss= 1.1277 (max= 1.5455), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:58:13,352 - root - INFO - Step 10750: lr=1.00E-05, loss= 1.1277 (max= 1.5455), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:58:13,352 - root - INFO - Step 10750: lr=1.00E-05, loss= 1.1277 (max= 1.5455), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:58:13,352 - root - INFO - Step 10750: lr=1.00E-05, loss= 1.1277 (max= 1.5455), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:58:45,170 - root - INFO - Step 10760: lr=1.00E-05, loss= 1.1228 (max= 1.5385), tps=20599, mfu=42.92%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:58:45,170 - root - INFO - Step 10760: lr=1.00E-05, loss= 1.1228 (max= 1.5385), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:58:45,170 - root - INFO - Step 10760: lr=1.00E-05, loss= 1.1228 (max= 1.5385), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:58:45,170 - root - INFO - Step 10760: lr=1.00E-05, loss= 1.1228 (max= 1.5385), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:58:45,170 - root - INFO - Step 10760: lr=1.00E-05, loss= 1.1228 (max= 1.5385), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:58:45,170 - root - INFO - Step 10760: lr=1.00E-05, loss= 1.1228 (max= 1.5385), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:58:45,170 - root - INFO - Step 10760: lr=1.00E-05, loss= 1.1228 (max= 1.5385), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:58:45,170 - root - INFO - Step 10760: lr=1.00E-05, loss= 1.1228 (max= 1.5385), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:59:17,070 - root - INFO - Step 10770: lr=1.00E-05, loss= 1.1268 (max= 1.6551), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:59:17,070 - root - INFO - Step 10770: lr=1.00E-05, loss= 1.1268 (max= 1.6551), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:59:17,071 - root - INFO - Step 10770: lr=1.00E-05, loss= 1.1268 (max= 1.6551), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:59:17,071 - root - INFO - Step 10770: lr=1.00E-05, loss= 1.1268 
(max= 1.6551), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:59:17,071 - root - INFO - Step 10770: lr=1.00E-05, loss= 1.1268 (max= 1.6551), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:59:17,071 - root - INFO - Step 10770: lr=1.00E-05, loss= 1.1268 (max= 1.6551), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:59:17,071 - root - INFO - Step 10770: lr=1.00E-05, loss= 1.1268 (max= 1.6551), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:59:17,071 - root - INFO - Step 10770: lr=1.00E-05, loss= 1.1268 (max= 1.6551), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:59:48,932 - root - INFO - Step 10780: lr=1.00E-05, loss= 1.1368 (max= 1.5957), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:59:48,932 - root - INFO - Step 10780: lr=1.00E-05, loss= 1.1368 (max= 1.5957), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:59:48,932 - root - INFO - Step 10780: lr=1.00E-05, loss= 1.1368 (max= 1.5957), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:59:48,933 - root - INFO - Step 10780: lr=1.00E-05, loss= 1.1368 (max= 1.5957), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:59:48,933 - root - INFO - Step 10780: lr=1.00E-05, loss= 1.1368 (max= 1.5957), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:59:48,933 - root - INFO - Step 10780: lr=1.00E-05, loss= 1.1368 (max= 1.5957), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:59:48,933 - root 
- INFO - Step 10780: lr=1.00E-05, loss= 1.1368 (max= 1.5957), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 19:59:48,933 - root - INFO - Step 10780: lr=1.00E-05, loss= 1.1368 (max= 1.5957), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:00:20,796 - root - INFO - Step 10790: lr=1.00E-05, loss= 1.1375 (max= 1.5959), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:00:20,796 - root - INFO - Step 10790: lr=1.00E-05, loss= 1.1375 (max= 1.5959), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:00:20,797 - root - INFO - Step 10790: lr=1.00E-05, loss= 1.1375 (max= 1.5959), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:00:20,797 - root - INFO - Step 10790: lr=1.00E-05, loss= 1.1375 (max= 1.5959), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:00:20,797 - root - INFO - Step 10790: lr=1.00E-05, loss= 1.1375 (max= 1.5959), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:00:20,797 - root - INFO - Step 10790: lr=1.00E-05, loss= 1.1375 (max= 1.5959), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:00:20,797 - root - INFO - Step 10790: lr=1.00E-05, loss= 1.1375 (max= 1.5959), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:00:20,797 - root - INFO - Step 10790: lr=1.00E-05, loss= 1.1375 (max= 1.5959), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:00:52,645 - root - INFO - Step 10800: lr=1.00E-05, loss= 1.1033 (max= 1.5031), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 20:00:52,645 - root - INFO - Step 10800: lr=1.00E-05, loss= 1.1033 (max= 1.5031), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:00:52,645 - root - INFO - Step 10800: lr=1.00E-05, loss= 1.1033 (max= 1.5031), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:00:52,645 - root - INFO - Step 10800: lr=1.00E-05, loss= 1.1033 (max= 1.5031), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:00:52,645 - root - INFO - Step 10800: lr=1.00E-05, loss= 1.1033 (max= 1.5031), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:00:52,645 - root - INFO - Step 10800: lr=1.00E-05, loss= 1.1033 (max= 1.5031), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:00:52,645 - root - INFO - Step 10800: lr=1.00E-05, loss= 1.1033 (max= 1.5031), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:00:52,645 - root - INFO - Step 10800: lr=1.00E-05, loss= 1.1033 (max= 1.5031), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:01:24,540 - root - INFO - Step 10810: lr=1.00E-05, loss= 1.1136 (max= 1.6868), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:01:24,540 - root - INFO - Step 10810: lr=1.00E-05, loss= 1.1136 (max= 1.6868), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:01:24,541 - root - INFO - Step 10810: lr=1.00E-05, loss= 1.1136 (max= 1.6868), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:01:24,541 - root - INFO - Step 10810: lr=1.00E-05, loss= 1.1136 (max= 1.6868), tps=20549, mfu=42.81%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:01:24,541 - root - INFO - Step 10810: lr=1.00E-05, loss= 1.1136 (max= 1.6868), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:01:24,541 - root - INFO - Step 10810: lr=1.00E-05, loss= 1.1136 (max= 1.6868), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:01:24,541 - root - INFO - Step 10810: lr=1.00E-05, loss= 1.1136 (max= 1.6868), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:01:24,541 - root - INFO - Step 10810: lr=1.00E-05, loss= 1.1136 (max= 1.6868), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:01:56,451 - root - INFO - Step 10820: lr=1.00E-05, loss= 1.1323 (max= 1.6557), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:01:56,451 - root - INFO - Step 10820: lr=1.00E-05, loss= 1.1323 (max= 1.6557), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:01:56,451 - root - INFO - Step 10820: lr=1.00E-05, loss= 1.1323 (max= 1.6557), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:01:56,451 - root - INFO - Step 10820: lr=1.00E-05, loss= 1.1323 (max= 1.6557), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:01:56,451 - root - INFO - Step 10820: lr=1.00E-05, loss= 1.1323 (max= 1.6557), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:01:56,451 - root - INFO - Step 10820: lr=1.00E-05, loss= 1.1323 (max= 1.6557), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:01:56,451 - root - INFO - Step 10820: lr=1.00E-05, 
loss= 1.1323 (max= 1.6557), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:01:56,451 - root - INFO - Step 10820: lr=1.00E-05, loss= 1.1323 (max= 1.6557), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:02:28,273 - root - INFO - Step 10830: lr=1.00E-05, loss= 1.1128 (max= 1.5706), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:02:28,273 - root - INFO - Step 10830: lr=1.00E-05, loss= 1.1128 (max= 1.5706), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:02:28,273 - root - INFO - Step 10830: lr=1.00E-05, loss= 1.1128 (max= 1.5706), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:02:28,273 - root - INFO - Step 10830: lr=1.00E-05, loss= 1.1128 (max= 1.5706), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:02:28,273 - root - INFO - Step 10830: lr=1.00E-05, loss= 1.1128 (max= 1.5706), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:02:28,273 - root - INFO - Step 10830: lr=1.00E-05, loss= 1.1128 (max= 1.5706), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:02:28,273 - root - INFO - Step 10830: lr=1.00E-05, loss= 1.1128 (max= 1.5706), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:02:28,273 - root - INFO - Step 10830: lr=1.00E-05, loss= 1.1128 (max= 1.5706), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:03:00,148 - root - INFO - Step 10840: lr=1.00E-05, loss= 1.1213 (max= 1.5802), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
20:03:00,148 - root - INFO - Step 10840: lr=1.00E-05, loss= 1.1213 (max= 1.5802), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:03:00,148 - root - INFO - Step 10840: lr=1.00E-05, loss= 1.1213 (max= 1.5802), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:03:00,148 - root - INFO - Step 10840: lr=1.00E-05, loss= 1.1213 (max= 1.5802), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:03:00,148 - root - INFO - Step 10840: lr=1.00E-05, loss= 1.1213 (max= 1.5802), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:03:00,148 - root - INFO - Step 10840: lr=1.00E-05, loss= 1.1213 (max= 1.5802), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:03:00,148 - root - INFO - Step 10840: lr=1.00E-05, loss= 1.1213 (max= 1.5802), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:03:00,148 - root - INFO - Step 10840: lr=1.00E-05, loss= 1.1213 (max= 1.5802), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:03:32,011 - root - INFO - Step 10850: lr=1.00E-05, loss= 1.1127 (max= 1.5494), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:03:32,011 - root - INFO - Step 10850: lr=1.00E-05, loss= 1.1127 (max= 1.5494), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:03:32,011 - root - INFO - Step 10850: lr=1.00E-05, loss= 1.1127 (max= 1.5494), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:03:32,012 - root - INFO - Step 10850: lr=1.00E-05, loss= 1.1127 (max= 1.5494), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:03:32,012 - root - INFO - Step 10850: lr=1.00E-05, loss= 1.1127 (max= 1.5494), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:03:32,012 - root - INFO - Step 10850: lr=1.00E-05, loss= 1.1127 (max= 1.5494), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:03:32,012 - root - INFO - Step 10850: lr=1.00E-05, loss= 1.1127 (max= 1.5494), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:03:32,012 - root - INFO - Step 10850: lr=1.00E-05, loss= 1.1127 (max= 1.5494), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:04:03,890 - root - INFO - Step 10860: lr=1.00E-05, loss= 1.1233 (max= 1.4871), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:04:03,890 - root - INFO - Step 10860: lr=1.00E-05, loss= 1.1233 (max= 1.4871), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:04:03,890 - root - INFO - Step 10860: lr=1.00E-05, loss= 1.1233 (max= 1.4871), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:04:03,890 - root - INFO - Step 10860: lr=1.00E-05, loss= 1.1233 (max= 1.4871), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:04:03,890 - root - INFO - Step 10860: lr=1.00E-05, loss= 1.1233 (max= 1.4871), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:04:03,890 - root - INFO - Step 10860: lr=1.00E-05, loss= 1.1233 (max= 1.4871), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:04:03,890 - root - INFO - Step 10860: lr=1.00E-05, loss= 1.1233 (max= 1.4871), 
tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:04:03,890 - root - INFO - Step 10860: lr=1.00E-05, loss= 1.1233 (max= 1.4871), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:04:35,736 - root - INFO - Step 10870: lr=1.00E-05, loss= 1.1444 (max= 1.7645), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:04:35,736 - root - INFO - Step 10870: lr=1.00E-05, loss= 1.1444 (max= 1.7645), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:04:35,736 - root - INFO - Step 10870: lr=1.00E-05, loss= 1.1444 (max= 1.7645), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:04:35,736 - root - INFO - Step 10870: lr=1.00E-05, loss= 1.1444 (max= 1.7645), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:04:35,736 - root - INFO - Step 10870: lr=1.00E-05, loss= 1.1444 (max= 1.7645), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:04:35,736 - root - INFO - Step 10870: lr=1.00E-05, loss= 1.1444 (max= 1.7645), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:04:35,736 - root - INFO - Step 10870: lr=1.00E-05, loss= 1.1444 (max= 1.7645), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:04:35,736 - root - INFO - Step 10870: lr=1.00E-05, loss= 1.1444 (max= 1.7645), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:05:07,559 - root - INFO - Step 10880: lr=1.00E-05, loss= 1.1174 (max= 1.5974), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:05:07,559 - root - INFO - Step 
10880: lr=1.00E-05, loss= 1.1174 (max= 1.5974), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:05:07,559 - root - INFO - Step 10880: lr=1.00E-05, loss= 1.1174 (max= 1.5974), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:05:07,559 - root - INFO - Step 10880: lr=1.00E-05, loss= 1.1174 (max= 1.5974), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:05:07,559 - root - INFO - Step 10880: lr=1.00E-05, loss= 1.1174 (max= 1.5974), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:05:07,559 - root - INFO - Step 10880: lr=1.00E-05, loss= 1.1174 (max= 1.5974), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:05:07,559 - root - INFO - Step 10880: lr=1.00E-05, loss= 1.1174 (max= 1.5974), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:05:07,559 - root - INFO - Step 10880: lr=1.00E-05, loss= 1.1174 (max= 1.5974), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:05:39,369 - root - INFO - Step 10890: lr=1.00E-05, loss= 1.0973 (max= 1.5674), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:05:39,369 - root - INFO - Step 10890: lr=1.00E-05, loss= 1.0973 (max= 1.5674), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:05:39,369 - root - INFO - Step 10890: lr=1.00E-05, loss= 1.0973 (max= 1.5674), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:05:39,369 - root - INFO - Step 10890: lr=1.00E-05, loss= 1.0973 (max= 1.5674), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 20:05:39,369 - root - INFO - Step 10890: lr=1.00E-05, loss= 1.0973 (max= 1.5674), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:05:39,369 - root - INFO - Step 10890: lr=1.00E-05, loss= 1.0973 (max= 1.5674), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:05:39,369 - root - INFO - Step 10890: lr=1.00E-05, loss= 1.0973 (max= 1.5674), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:05:39,369 - root - INFO - Step 10890: lr=1.00E-05, loss= 1.0973 (max= 1.5674), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:06:11,194 - root - INFO - Step 10900: lr=1.00E-05, loss= 1.1210 (max= 1.5253), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:06:11,194 - root - INFO - Step 10900: lr=1.00E-05, loss= 1.1210 (max= 1.5253), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:06:11,194 - root - INFO - Step 10900: lr=1.00E-05, loss= 1.1210 (max= 1.5253), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:06:11,194 - root - INFO - Step 10900: lr=1.00E-05, loss= 1.1210 (max= 1.5253), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:06:11,195 - root - INFO - Step 10900: lr=1.00E-05, loss= 1.1210 (max= 1.5253), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:06:11,195 - root - INFO - Step 10900: lr=1.00E-05, loss= 1.1210 (max= 1.5253), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:06:11,195 - root - INFO - Step 10900: lr=1.00E-05, loss= 1.1210 (max= 1.5253), tps=20594, mfu=42.91%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:06:11,195 - root - INFO - Step 10900: lr=1.00E-05, loss= 1.1210 (max= 1.5253), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:06:43,109 - root - INFO - Step 10910: lr=1.00E-05, loss= 1.1234 (max= 1.5489), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:06:43,109 - root - INFO - Step 10910: lr=1.00E-05, loss= 1.1234 (max= 1.5489), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:06:43,109 - root - INFO - Step 10910: lr=1.00E-05, loss= 1.1234 (max= 1.5489), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:06:43,109 - root - INFO - Step 10910: lr=1.00E-05, loss= 1.1234 (max= 1.5489), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:06:43,109 - root - INFO - Step 10910: lr=1.00E-05, loss= 1.1234 (max= 1.5489), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:06:43,109 - root - INFO - Step 10910: lr=1.00E-05, loss= 1.1234 (max= 1.5489), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:06:43,109 - root - INFO - Step 10910: lr=1.00E-05, loss= 1.1234 (max= 1.5489), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:06:43,109 - root - INFO - Step 10910: lr=1.00E-05, loss= 1.1234 (max= 1.5489), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:07:14,983 - root - INFO - Step 10920: lr=1.00E-05, loss= 1.1112 (max= 1.7363), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:07:14,983 - root - INFO - Step 10920: lr=1.00E-05, loss= 1.1112 
(max= 1.7363), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:07:14,983 - root - INFO - Step 10920: lr=1.00E-05, loss= 1.1112 (max= 1.7363), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:07:14,983 - root - INFO - Step 10920: lr=1.00E-05, loss= 1.1112 (max= 1.7363), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:07:14,983 - root - INFO - Step 10920: lr=1.00E-05, loss= 1.1112 (max= 1.7363), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:07:14,983 - root - INFO - Step 10920: lr=1.00E-05, loss= 1.1112 (max= 1.7363), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:07:14,983 - root - INFO - Step 10920: lr=1.00E-05, loss= 1.1112 (max= 1.7363), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:07:14,983 - root - INFO - Step 10920: lr=1.00E-05, loss= 1.1112 (max= 1.7363), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:07:46,857 - root - INFO - Step 10930: lr=1.00E-05, loss= 1.0938 (max= 1.5214), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:07:46,857 - root - INFO - Step 10930: lr=1.00E-05, loss= 1.0938 (max= 1.5214), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:07:46,857 - root - INFO - Step 10930: lr=1.00E-05, loss= 1.0938 (max= 1.5214), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:07:46,857 - root - INFO - Step 10930: lr=1.00E-05, loss= 1.0938 (max= 1.5214), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:07:46,857 - root 
- INFO - Step 10930: lr=1.00E-05, loss= 1.0938 (max= 1.5214), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:07:46,857 - root - INFO - Step 10930: lr=1.00E-05, loss= 1.0938 (max= 1.5214), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:07:46,857 - root - INFO - Step 10930: lr=1.00E-05, loss= 1.0938 (max= 1.5214), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:07:46,857 - root - INFO - Step 10930: lr=1.00E-05, loss= 1.0938 (max= 1.5214), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:08:18,746 - root - INFO - Step 10940: lr=1.00E-05, loss= 1.1117 (max= 1.6677), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:08:18,746 - root - INFO - Step 10940: lr=1.00E-05, loss= 1.1117 (max= 1.6677), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:08:18,746 - root - INFO - Step 10940: lr=1.00E-05, loss= 1.1117 (max= 1.6677), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:08:18,746 - root - INFO - Step 10940: lr=1.00E-05, loss= 1.1117 (max= 1.6677), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:08:18,746 - root - INFO - Step 10940: lr=1.00E-05, loss= 1.1117 (max= 1.6677), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:08:18,746 - root - INFO - Step 10940: lr=1.00E-05, loss= 1.1117 (max= 1.6677), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:08:18,746 - root - INFO - Step 10940: lr=1.00E-05, loss= 1.1117 (max= 1.6677), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 20:08:18,746 - root - INFO - Step 10940: lr=1.00E-05, loss= 1.1117 (max= 1.6677), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:08:50,605 - root - INFO - Step 10950: lr=1.00E-05, loss= 1.1241 (max= 1.5831), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:08:50,605 - root - INFO - Step 10950: lr=1.00E-05, loss= 1.1241 (max= 1.5831), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:08:50,605 - root - INFO - Step 10950: lr=1.00E-05, loss= 1.1241 (max= 1.5831), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:08:50,605 - root - INFO - Step 10950: lr=1.00E-05, loss= 1.1241 (max= 1.5831), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:08:50,605 - root - INFO - Step 10950: lr=1.00E-05, loss= 1.1241 (max= 1.5831), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:08:50,605 - root - INFO - Step 10950: lr=1.00E-05, loss= 1.1241 (max= 1.5831), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:08:50,605 - root - INFO - Step 10950: lr=1.00E-05, loss= 1.1241 (max= 1.5831), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:08:50,605 - root - INFO - Step 10950: lr=1.00E-05, loss= 1.1241 (max= 1.5831), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:09:22,423 - root - INFO - Step 10960: lr=1.00E-05, loss= 1.1203 (max= 1.5320), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:09:22,423 - root - INFO - Step 10960: lr=1.00E-05, loss= 1.1203 (max= 1.5320), tps=20599, mfu=42.92%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:09:22,423 - root - INFO - Step 10960: lr=1.00E-05, loss= 1.1203 (max= 1.5320), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:09:22,423 - root - INFO - Step 10960: lr=1.00E-05, loss= 1.1203 (max= 1.5320), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:09:22,423 - root - INFO - Step 10960: lr=1.00E-05, loss= 1.1203 (max= 1.5320), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:09:22,423 - root - INFO - Step 10960: lr=1.00E-05, loss= 1.1203 (max= 1.5320), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:09:22,423 - root - INFO - Step 10960: lr=1.00E-05, loss= 1.1203 (max= 1.5320), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:09:22,424 - root - INFO - Step 10960: lr=1.00E-05, loss= 1.1203 (max= 1.5320), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:09:54,274 - root - INFO - Step 10970: lr=1.00E-05, loss= 1.1032 (max= 1.5529), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:09:54,274 - root - INFO - Step 10970: lr=1.00E-05, loss= 1.1032 (max= 1.5529), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:09:54,274 - root - INFO - Step 10970: lr=1.00E-05, loss= 1.1032 (max= 1.5529), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:09:54,274 - root - INFO - Step 10970: lr=1.00E-05, loss= 1.1032 (max= 1.5529), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:09:54,274 - root - INFO - Step 10970: lr=1.00E-05, 
loss= 1.1032 (max= 1.5529), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:09:54,274 - root - INFO - Step 10970: lr=1.00E-05, loss= 1.1032 (max= 1.5529), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:09:54,274 - root - INFO - Step 10970: lr=1.00E-05, loss= 1.1032 (max= 1.5529), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:09:54,274 - root - INFO - Step 10970: lr=1.00E-05, loss= 1.1032 (max= 1.5529), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:10:26,130 - root - INFO - Step 10980: lr=1.00E-05, loss= 1.1186 (max= 1.5799), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:10:26,130 - root - INFO - Step 10980: lr=1.00E-05, loss= 1.1186 (max= 1.5799), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:10:26,130 - root - INFO - Step 10980: lr=1.00E-05, loss= 1.1186 (max= 1.5799), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:10:26,130 - root - INFO - Step 10980: lr=1.00E-05, loss= 1.1186 (max= 1.5799), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:10:26,130 - root - INFO - Step 10980: lr=1.00E-05, loss= 1.1186 (max= 1.5799), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:10:26,130 - root - INFO - Step 10980: lr=1.00E-05, loss= 1.1186 (max= 1.5799), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:10:26,130 - root - INFO - Step 10980: lr=1.00E-05, loss= 1.1186 (max= 1.5799), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
20:10:26,130 - root - INFO - Step 10980: lr=1.00E-05, loss= 1.1186 (max= 1.5799), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:10:58,025 - root - INFO - Step 10990: lr=1.00E-05, loss= 1.1019 (max= 1.5395), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:10:58,025 - root - INFO - Step 10990: lr=1.00E-05, loss= 1.1019 (max= 1.5395), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:10:58,025 - root - INFO - Step 10990: lr=1.00E-05, loss= 1.1019 (max= 1.5395), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:10:58,025 - root - INFO - Step 10990: lr=1.00E-05, loss= 1.1019 (max= 1.5395), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:10:58,025 - root - INFO - Step 10990: lr=1.00E-05, loss= 1.1019 (max= 1.5395), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:10:58,026 - root - INFO - Step 10990: lr=1.00E-05, loss= 1.1019 (max= 1.5395), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:10:58,026 - root - INFO - Step 10990: lr=1.00E-05, loss= 1.1019 (max= 1.5395), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:10:58,026 - root - INFO - Step 10990: lr=1.00E-05, loss= 1.1019 (max= 1.5395), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-pt/checkpoints/dataloader/step-11000 +Dataset successfully saved to jobs/munin-7b-open-pt/checkpoints/dataloader/step-11000! 
Save time: 4.580239772796631 +2025-10-25 20:11:29,880 - root - INFO - Step 11000: lr=1.00E-05, loss= 1.1142 (max= 1.5283), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:11:29,880 - root - INFO - Step 11000: lr=1.00E-05, loss= 1.1142 (max= 1.5283), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:11:29,880 - root - INFO - Step 11000: lr=1.00E-05, loss= 1.1142 (max= 1.5283), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:11:29,880 - root - INFO - Saving a full checkpoint at step 11000 +2025-10-25 20:11:29,880 - root - INFO - Step 11000: lr=1.00E-05, loss= 1.1142 (max= 1.5283), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:11:29,880 - root - INFO - Saving a full checkpoint at step 11000 +2025-10-25 20:11:29,880 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 20:11:29,880 - root - INFO - Saving a full checkpoint at step 11000 +2025-10-25 20:11:29,880 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 20:11:29,880 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 20:11:29,880 - root - INFO - Saving a full checkpoint at step 11000 +2025-10-25 20:11:29,880 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 20:11:29,880 - root - INFO - Step 11000: lr=1.00E-05, loss= 1.1142 (max= 1.5283), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:11:29,880 - root - INFO - Step 11000: lr=1.00E-05, loss= 1.1142 (max= 1.5283), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:11:29,880 - root - INFO - Step 11000: lr=1.00E-05, loss= 1.1142 (max= 1.5283), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:11:29,880 - root - INFO - Saving a full checkpoint at step 11000 +2025-10-25 20:11:29,880 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 20:11:29,880 - root - INFO - Saving a full checkpoint at step 11000 +2025-10-25 20:11:29,880 - root - INFO - Saving a full checkpoint at step 11000 +2025-10-25 20:11:29,880 - root - INFO - Step 11000: lr=1.00E-05, loss= 1.1142 (max= 1.5283), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:11:29,880 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 20:11:29,880 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 20:11:29,880 - root - INFO - Saving a full checkpoint at step 11000 +2025-10-25 20:11:29,881 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 20:11:43,523 - root - INFO - Finished saving the checkpoint in 13.64 seconds +2025-10-25 20:11:43,531 - root - INFO - Finished saving the checkpoint in 13.65 seconds +2025-10-25 20:11:43,531 - root - INFO - Finished saving the checkpoint in 13.65 seconds +2025-10-25 20:11:43,531 - root - INFO - Finished saving the checkpoint in 13.65 seconds +2025-10-25 20:11:43,532 - root - INFO - Finished saving the checkpoint in 13.65 seconds +2025-10-25 20:11:43,532 - root - INFO - Finished saving the checkpoint in 13.65 seconds +2025-10-25 20:11:43,532 - root - INFO - Finished saving the checkpoint in 13.65 seconds +2025-10-25 20:11:43,532 - root - INFO - Finished saving the checkpoint 
in 13.65 seconds +2025-10-25 20:12:15,291 - root - INFO - Step 11010: lr=1.00E-05, loss= 1.1408 (max= 1.5830), tps=14433, mfu=30.07%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:12:15,291 - root - INFO - Step 11010: lr=1.00E-05, loss= 1.1408 (max= 1.5830), tps=14433, mfu=30.07%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:12:15,291 - root - INFO - Step 11010: lr=1.00E-05, loss= 1.1408 (max= 1.5830), tps=14433, mfu=30.07%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:12:15,291 - root - INFO - Step 11010: lr=1.00E-05, loss= 1.1408 (max= 1.5830), tps=14433, mfu=30.07%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:12:15,291 - root - INFO - Step 11010: lr=1.00E-05, loss= 1.1408 (max= 1.5830), tps=14433, mfu=30.07%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:12:15,291 - root - INFO - Step 11010: lr=1.00E-05, loss= 1.1408 (max= 1.5830), tps=14433, mfu=30.07%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:12:15,291 - root - INFO - Step 11010: lr=1.00E-05, loss= 1.1408 (max= 1.5830), tps=14433, mfu=30.07%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:12:15,291 - root - INFO - Step 11010: lr=1.00E-05, loss= 1.1408 (max= 1.5830), tps=14433, mfu=30.07%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:12:47,165 - root - INFO - Step 11020: lr=1.00E-05, loss= 1.0958 (max= 1.5796), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:12:47,165 - root - INFO - Step 11020: lr=1.00E-05, loss= 1.0958 (max= 1.5796), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:12:47,165 - root - INFO - Step 11020: lr=1.00E-05, loss= 1.0958 (max= 1.5796), tps=20563, mfu=42.84%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:12:47,166 - root - INFO - Step 11020: lr=1.00E-05, loss= 1.0958 (max= 1.5796), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:12:47,166 - root - INFO - Step 11020: lr=1.00E-05, loss= 1.0958 (max= 1.5796), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:12:47,166 - root - INFO - Step 11020: lr=1.00E-05, loss= 1.0958 (max= 1.5796), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:12:47,166 - root - INFO - Step 11020: lr=1.00E-05, loss= 1.0958 (max= 1.5796), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:12:47,166 - root - INFO - Step 11020: lr=1.00E-05, loss= 1.0958 (max= 1.5796), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:13:19,055 - root - INFO - Step 11030: lr=1.00E-05, loss= 1.0987 (max= 1.5881), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:13:19,055 - root - INFO - Step 11030: lr=1.00E-05, loss= 1.0987 (max= 1.5881), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:13:19,055 - root - INFO - Step 11030: lr=1.00E-05, loss= 1.0987 (max= 1.5881), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:13:19,055 - root - INFO - Step 11030: lr=1.00E-05, loss= 1.0987 (max= 1.5881), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:13:19,055 - root - INFO - Step 11030: lr=1.00E-05, loss= 1.0987 (max= 1.5881), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:13:19,055 - root - INFO - Step 11030: lr=1.00E-05, 
loss= 1.0987 (max= 1.5881), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:13:19,055 - root - INFO - Step 11030: lr=1.00E-05, loss= 1.0987 (max= 1.5881), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:13:19,055 - root - INFO - Step 11030: lr=1.00E-05, loss= 1.0987 (max= 1.5881), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:13:50,931 - root - INFO - Step 11040: lr=1.00E-05, loss= 1.1049 (max= 1.4950), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:13:50,931 - root - INFO - Step 11040: lr=1.00E-05, loss= 1.1049 (max= 1.4950), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:13:50,932 - root - INFO - Step 11040: lr=1.00E-05, loss= 1.1049 (max= 1.4950), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:13:50,932 - root - INFO - Step 11040: lr=1.00E-05, loss= 1.1049 (max= 1.4950), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:13:50,932 - root - INFO - Step 11040: lr=1.00E-05, loss= 1.1049 (max= 1.4950), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:13:50,932 - root - INFO - Step 11040: lr=1.00E-05, loss= 1.1049 (max= 1.4950), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:13:50,932 - root - INFO - Step 11040: lr=1.00E-05, loss= 1.1049 (max= 1.4950), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:13:50,932 - root - INFO - Step 11040: lr=1.00E-05, loss= 1.1049 (max= 1.4950), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
20:14:22,806 - root - INFO - Step 11050: lr=1.00E-05, loss= 1.1360 (max= 1.4884), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:14:22,806 - root - INFO - Step 11050: lr=1.00E-05, loss= 1.1360 (max= 1.4884), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:14:22,806 - root - INFO - Step 11050: lr=1.00E-05, loss= 1.1360 (max= 1.4884), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:14:22,806 - root - INFO - Step 11050: lr=1.00E-05, loss= 1.1360 (max= 1.4884), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:14:22,806 - root - INFO - Step 11050: lr=1.00E-05, loss= 1.1360 (max= 1.4884), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:14:22,806 - root - INFO - Step 11050: lr=1.00E-05, loss= 1.1360 (max= 1.4884), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:14:22,806 - root - INFO - Step 11050: lr=1.00E-05, loss= 1.1360 (max= 1.4884), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:14:22,806 - root - INFO - Step 11050: lr=1.00E-05, loss= 1.1360 (max= 1.4884), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:14:54,646 - root - INFO - Step 11060: lr=1.00E-05, loss= 1.1029 (max= 1.4902), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:14:54,646 - root - INFO - Step 11060: lr=1.00E-05, loss= 1.1029 (max= 1.4902), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:14:54,646 - root - INFO - Step 11060: lr=1.00E-05, loss= 1.1029 (max= 1.4902), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:14:54,646 - root - INFO - Step 11060: lr=1.00E-05, loss= 1.1029 (max= 1.4902), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:14:54,646 - root - INFO - Step 11060: lr=1.00E-05, loss= 1.1029 (max= 1.4902), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:14:54,646 - root - INFO - Step 11060: lr=1.00E-05, loss= 1.1029 (max= 1.4902), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:14:54,646 - root - INFO - Step 11060: lr=1.00E-05, loss= 1.1029 (max= 1.4902), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:14:54,646 - root - INFO - Step 11060: lr=1.00E-05, loss= 1.1029 (max= 1.4902), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:15:26,518 - root - INFO - Step 11070: lr=1.00E-05, loss= 1.1145 (max= 1.4738), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:15:26,518 - root - INFO - Step 11070: lr=1.00E-05, loss= 1.1145 (max= 1.4738), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:15:26,518 - root - INFO - Step 11070: lr=1.00E-05, loss= 1.1145 (max= 1.4738), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:15:26,518 - root - INFO - Step 11070: lr=1.00E-05, loss= 1.1145 (max= 1.4738), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:15:26,518 - root - INFO - Step 11070: lr=1.00E-05, loss= 1.1145 (max= 1.4738), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:15:26,518 - root - INFO - Step 11070: lr=1.00E-05, loss= 1.1145 (max= 1.4738), 
tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:15:26,518 - root - INFO - Step 11070: lr=1.00E-05, loss= 1.1145 (max= 1.4738), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:15:26,518 - root - INFO - Step 11070: lr=1.00E-05, loss= 1.1145 (max= 1.4738), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:15:58,415 - root - INFO - Step 11080: lr=1.00E-05, loss= 1.1327 (max= 1.6423), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:15:58,415 - root - INFO - Step 11080: lr=1.00E-05, loss= 1.1327 (max= 1.6423), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:15:58,415 - root - INFO - Step 11080: lr=1.00E-05, loss= 1.1327 (max= 1.6423), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:15:58,415 - root - INFO - Step 11080: lr=1.00E-05, loss= 1.1327 (max= 1.6423), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:15:58,415 - root - INFO - Step 11080: lr=1.00E-05, loss= 1.1327 (max= 1.6423), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:15:58,415 - root - INFO - Step 11080: lr=1.00E-05, loss= 1.1327 (max= 1.6423), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:15:58,415 - root - INFO - Step 11080: lr=1.00E-05, loss= 1.1327 (max= 1.6423), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:15:58,415 - root - INFO - Step 11080: lr=1.00E-05, loss= 1.1327 (max= 1.6423), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:16:30,274 - root - INFO - Step 
11090: lr=1.00E-05, loss= 1.1137 (max= 1.4561), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:16:30,274 - root - INFO - Step 11090: lr=1.00E-05, loss= 1.1137 (max= 1.4561), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:16:30,274 - root - INFO - Step 11090: lr=1.00E-05, loss= 1.1137 (max= 1.4561), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:16:30,274 - root - INFO - Step 11090: lr=1.00E-05, loss= 1.1137 (max= 1.4561), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:16:30,274 - root - INFO - Step 11090: lr=1.00E-05, loss= 1.1137 (max= 1.4561), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:16:30,274 - root - INFO - Step 11090: lr=1.00E-05, loss= 1.1137 (max= 1.4561), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:16:30,274 - root - INFO - Step 11090: lr=1.00E-05, loss= 1.1137 (max= 1.4561), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:16:30,275 - root - INFO - Step 11090: lr=1.00E-05, loss= 1.1137 (max= 1.4561), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:17:02,135 - root - INFO - Step 11100: lr=1.00E-05, loss= 1.1044 (max= 1.5811), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:17:02,135 - root - INFO - Step 11100: lr=1.00E-05, loss= 1.1044 (max= 1.5811), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:17:02,135 - root - INFO - Step 11100: lr=1.00E-05, loss= 1.1044 (max= 1.5811), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 20:17:02,136 - root - INFO - Step 11100: lr=1.00E-05, loss= 1.1044 (max= 1.5811), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:17:02,136 - root - INFO - Step 11100: lr=1.00E-05, loss= 1.1044 (max= 1.5811), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:17:02,136 - root - INFO - Step 11100: lr=1.00E-05, loss= 1.1044 (max= 1.5811), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:17:02,136 - root - INFO - Step 11100: lr=1.00E-05, loss= 1.1044 (max= 1.5811), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:17:02,136 - root - INFO - Step 11100: lr=1.00E-05, loss= 1.1044 (max= 1.5811), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:17:33,982 - root - INFO - Step 11110: lr=1.00E-05, loss= 1.1118 (max= 1.4817), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:17:33,982 - root - INFO - Step 11110: lr=1.00E-05, loss= 1.1118 (max= 1.4817), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:17:33,982 - root - INFO - Step 11110: lr=1.00E-05, loss= 1.1118 (max= 1.4817), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:17:33,982 - root - INFO - Step 11110: lr=1.00E-05, loss= 1.1118 (max= 1.4817), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:17:33,982 - root - INFO - Step 11110: lr=1.00E-05, loss= 1.1118 (max= 1.4817), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:17:33,982 - root - INFO - Step 11110: lr=1.00E-05, loss= 1.1118 (max= 1.4817), tps=20581, mfu=42.88%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:17:33,982 - root - INFO - Step 11110: lr=1.00E-05, loss= 1.1118 (max= 1.4817), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:17:33,982 - root - INFO - Step 11110: lr=1.00E-05, loss= 1.1118 (max= 1.4817), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:18:05,768 - root - INFO - Step 11120: lr=1.00E-05, loss= 1.0987 (max= 1.5381), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:18:05,768 - root - INFO - Step 11120: lr=1.00E-05, loss= 1.0987 (max= 1.5381), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:18:05,768 - root - INFO - Step 11120: lr=1.00E-05, loss= 1.0987 (max= 1.5381), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:18:05,768 - root - INFO - Step 11120: lr=1.00E-05, loss= 1.0987 (max= 1.5381), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:18:05,768 - root - INFO - Step 11120: lr=1.00E-05, loss= 1.0987 (max= 1.5381), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:18:05,768 - root - INFO - Step 11120: lr=1.00E-05, loss= 1.0987 (max= 1.5381), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:18:05,768 - root - INFO - Step 11120: lr=1.00E-05, loss= 1.0987 (max= 1.5381), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:18:05,768 - root - INFO - Step 11120: lr=1.00E-05, loss= 1.0987 (max= 1.5381), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:18:37,602 - root - INFO - Step 11130: lr=1.00E-05, loss= 1.1009 
(max= 1.7419), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:18:37,602 - root - INFO - Step 11130: lr=1.00E-05, loss= 1.1009 (max= 1.7419), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:18:37,602 - root - INFO - Step 11130: lr=1.00E-05, loss= 1.1009 (max= 1.7419), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:18:37,603 - root - INFO - Step 11130: lr=1.00E-05, loss= 1.1009 (max= 1.7419), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:18:37,603 - root - INFO - Step 11130: lr=1.00E-05, loss= 1.1009 (max= 1.7419), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:18:37,603 - root - INFO - Step 11130: lr=1.00E-05, loss= 1.1009 (max= 1.7419), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:18:37,603 - root - INFO - Step 11130: lr=1.00E-05, loss= 1.1009 (max= 1.7419), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:18:37,603 - root - INFO - Step 11130: lr=1.00E-05, loss= 1.1009 (max= 1.7419), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:19:09,484 - root - INFO - Step 11140: lr=1.00E-05, loss= 1.1040 (max= 1.6403), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:19:09,484 - root - INFO - Step 11140: lr=1.00E-05, loss= 1.1040 (max= 1.6403), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:19:09,484 - root - INFO - Step 11140: lr=1.00E-05, loss= 1.1040 (max= 1.6403), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:19:09,484 - root 
- INFO - Step 11140: lr=1.00E-05, loss= 1.1040 (max= 1.6403), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:19:09,484 - root - INFO - Step 11140: lr=1.00E-05, loss= 1.1040 (max= 1.6403), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:19:09,484 - root - INFO - Step 11140: lr=1.00E-05, loss= 1.1040 (max= 1.6403), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:19:09,485 - root - INFO - Step 11140: lr=1.00E-05, loss= 1.1040 (max= 1.6403), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:19:09,485 - root - INFO - Step 11140: lr=1.00E-05, loss= 1.1040 (max= 1.6403), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:19:41,365 - root - INFO - Step 11150: lr=1.00E-05, loss= 1.1109 (max= 1.4753), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:19:41,365 - root - INFO - Step 11150: lr=1.00E-05, loss= 1.1109 (max= 1.4753), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:19:41,365 - root - INFO - Step 11150: lr=1.00E-05, loss= 1.1109 (max= 1.4753), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:19:41,365 - root - INFO - Step 11150: lr=1.00E-05, loss= 1.1109 (max= 1.4753), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:19:41,365 - root - INFO - Step 11150: lr=1.00E-05, loss= 1.1109 (max= 1.4753), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:19:41,365 - root - INFO - Step 11150: lr=1.00E-05, loss= 1.1109 (max= 1.4753), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 20:19:41,365 - root - INFO - Step 11150: lr=1.00E-05, loss= 1.1109 (max= 1.4753), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:19:41,365 - root - INFO - Step 11150: lr=1.00E-05, loss= 1.1109 (max= 1.4753), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:20:13,221 - root - INFO - Step 11160: lr=1.00E-05, loss= 1.1191 (max= 1.6012), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:20:13,221 - root - INFO - Step 11160: lr=1.00E-05, loss= 1.1191 (max= 1.6012), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:20:13,221 - root - INFO - Step 11160: lr=1.00E-05, loss= 1.1191 (max= 1.6012), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:20:13,221 - root - INFO - Step 11160: lr=1.00E-05, loss= 1.1191 (max= 1.6012), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:20:13,221 - root - INFO - Step 11160: lr=1.00E-05, loss= 1.1191 (max= 1.6012), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:20:13,221 - root - INFO - Step 11160: lr=1.00E-05, loss= 1.1191 (max= 1.6012), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:20:13,221 - root - INFO - Step 11160: lr=1.00E-05, loss= 1.1191 (max= 1.6012), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:20:13,221 - root - INFO - Step 11160: lr=1.00E-05, loss= 1.1191 (max= 1.6012), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:20:45,092 - root - INFO - Step 11170: lr=1.00E-05, loss= 1.1006 (max= 1.5753), tps=20565, mfu=42.85%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:20:45,092 - root - INFO - Step 11170: lr=1.00E-05, loss= 1.1006 (max= 1.5753), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:20:45,092 - root - INFO - Step 11170: lr=1.00E-05, loss= 1.1006 (max= 1.5753), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:20:45,092 - root - INFO - Step 11170: lr=1.00E-05, loss= 1.1006 (max= 1.5753), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:20:45,093 - root - INFO - Step 11170: lr=1.00E-05, loss= 1.1006 (max= 1.5753), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:20:45,093 - root - INFO - Step 11170: lr=1.00E-05, loss= 1.1006 (max= 1.5753), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:20:45,093 - root - INFO - Step 11170: lr=1.00E-05, loss= 1.1006 (max= 1.5753), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:20:45,093 - root - INFO - Step 11170: lr=1.00E-05, loss= 1.1006 (max= 1.5753), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:21:16,974 - root - INFO - Step 11180: lr=1.00E-05, loss= 1.0959 (max= 1.6062), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:21:16,974 - root - INFO - Step 11180: lr=1.00E-05, loss= 1.0959 (max= 1.6062), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:21:16,974 - root - INFO - Step 11180: lr=1.00E-05, loss= 1.0959 (max= 1.6062), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:21:16,974 - root - INFO - Step 11180: lr=1.00E-05, 
loss= 1.0959 (max= 1.6062), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:21:16,974 - root - INFO - Step 11180: lr=1.00E-05, loss= 1.0959 (max= 1.6062), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:21:16,974 - root - INFO - Step 11180: lr=1.00E-05, loss= 1.0959 (max= 1.6062), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:21:16,974 - root - INFO - Step 11180: lr=1.00E-05, loss= 1.0959 (max= 1.6062), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:21:16,974 - root - INFO - Step 11180: lr=1.00E-05, loss= 1.0959 (max= 1.6062), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:21:48,857 - root - INFO - Step 11190: lr=1.00E-05, loss= 1.1219 (max= 1.5882), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:21:48,857 - root - INFO - Step 11190: lr=1.00E-05, loss= 1.1219 (max= 1.5882), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:21:48,857 - root - INFO - Step 11190: lr=1.00E-05, loss= 1.1219 (max= 1.5882), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:21:48,857 - root - INFO - Step 11190: lr=1.00E-05, loss= 1.1219 (max= 1.5882), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:21:48,858 - root - INFO - Step 11190: lr=1.00E-05, loss= 1.1219 (max= 1.5882), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:21:48,858 - root - INFO - Step 11190: lr=1.00E-05, loss= 1.1219 (max= 1.5882), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
20:21:48,858 - root - INFO - Step 11190: lr=1.00E-05, loss= 1.1219 (max= 1.5882), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:21:48,858 - root - INFO - Step 11190: lr=1.00E-05, loss= 1.1219 (max= 1.5882), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:22:20,713 - root - INFO - Step 11200: lr=1.00E-05, loss= 1.1037 (max= 1.5398), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:22:20,713 - root - INFO - Step 11200: lr=1.00E-05, loss= 1.1037 (max= 1.5398), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:22:20,713 - root - INFO - Step 11200: lr=1.00E-05, loss= 1.1037 (max= 1.5398), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:22:20,713 - root - INFO - Step 11200: lr=1.00E-05, loss= 1.1037 (max= 1.5398), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:22:20,713 - root - INFO - Step 11200: lr=1.00E-05, loss= 1.1037 (max= 1.5398), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:22:20,713 - root - INFO - Step 11200: lr=1.00E-05, loss= 1.1037 (max= 1.5398), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:22:20,713 - root - INFO - Step 11200: lr=1.00E-05, loss= 1.1037 (max= 1.5398), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:22:20,713 - root - INFO - Step 11200: lr=1.00E-05, loss= 1.1037 (max= 1.5398), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:22:52,515 - root - INFO - Step 11210: lr=1.00E-05, loss= 1.0824 (max= 1.5208), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:22:52,515 - root - INFO - Step 11210: lr=1.00E-05, loss= 1.0824 (max= 1.5208), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:22:52,516 - root - INFO - Step 11210: lr=1.00E-05, loss= 1.0824 (max= 1.5208), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:22:52,516 - root - INFO - Step 11210: lr=1.00E-05, loss= 1.0824 (max= 1.5208), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:22:52,516 - root - INFO - Step 11210: lr=1.00E-05, loss= 1.0824 (max= 1.5208), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:22:52,516 - root - INFO - Step 11210: lr=1.00E-05, loss= 1.0824 (max= 1.5208), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:22:52,516 - root - INFO - Step 11210: lr=1.00E-05, loss= 1.0824 (max= 1.5208), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:22:52,516 - root - INFO - Step 11210: lr=1.00E-05, loss= 1.0824 (max= 1.5208), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:23:24,347 - root - INFO - Step 11220: lr=1.00E-05, loss= 1.0991 (max= 1.6227), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:23:24,347 - root - INFO - Step 11220: lr=1.00E-05, loss= 1.0991 (max= 1.6227), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:23:24,347 - root - INFO - Step 11220: lr=1.00E-05, loss= 1.0991 (max= 1.6227), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:23:24,347 - root - INFO - Step 11220: lr=1.00E-05, loss= 1.0991 (max= 1.6227), 
tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:23:24,347 - root - INFO - Step 11220: lr=1.00E-05, loss= 1.0991 (max= 1.6227), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:23:24,347 - root - INFO - Step 11220: lr=1.00E-05, loss= 1.0991 (max= 1.6227), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:23:24,347 - root - INFO - Step 11220: lr=1.00E-05, loss= 1.0991 (max= 1.6227), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:23:24,347 - root - INFO - Step 11220: lr=1.00E-05, loss= 1.0991 (max= 1.6227), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:23:56,307 - root - INFO - Step 11230: lr=1.00E-05, loss= 1.0970 (max= 1.4901), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:23:56,307 - root - INFO - Step 11230: lr=1.00E-05, loss= 1.0970 (max= 1.4901), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:23:56,307 - root - INFO - Step 11230: lr=1.00E-05, loss= 1.0970 (max= 1.4901), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:23:56,307 - root - INFO - Step 11230: lr=1.00E-05, loss= 1.0970 (max= 1.4901), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:23:56,307 - root - INFO - Step 11230: lr=1.00E-05, loss= 1.0970 (max= 1.4901), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:23:56,307 - root - INFO - Step 11230: lr=1.00E-05, loss= 1.0970 (max= 1.4901), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:23:56,307 - root - INFO - Step 
11230: lr=1.00E-05, loss= 1.0970 (max= 1.4901), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:23:56,307 - root - INFO - Step 11230: lr=1.00E-05, loss= 1.0970 (max= 1.4901), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:24:28,168 - root - INFO - Step 11240: lr=1.00E-05, loss= 1.1095 (max= 1.5643), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:24:28,168 - root - INFO - Step 11240: lr=1.00E-05, loss= 1.1095 (max= 1.5643), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:24:28,168 - root - INFO - Step 11240: lr=1.00E-05, loss= 1.1095 (max= 1.5643), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:24:28,168 - root - INFO - Step 11240: lr=1.00E-05, loss= 1.1095 (max= 1.5643), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:24:28,168 - root - INFO - Step 11240: lr=1.00E-05, loss= 1.1095 (max= 1.5643), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:24:28,168 - root - INFO - Step 11240: lr=1.00E-05, loss= 1.1095 (max= 1.5643), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:24:28,168 - root - INFO - Step 11240: lr=1.00E-05, loss= 1.1095 (max= 1.5643), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:24:28,169 - root - INFO - Step 11240: lr=1.00E-05, loss= 1.1095 (max= 1.5643), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:25:00,064 - root - INFO - Step 11250: lr=1.00E-05, loss= 1.1092 (max= 1.5408), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 20:25:00,064 - root - INFO - Step 11250: lr=1.00E-05, loss= 1.1092 (max= 1.5408), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:25:00,064 - root - INFO - Step 11250: lr=1.00E-05, loss= 1.1092 (max= 1.5408), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:25:00,064 - root - INFO - Step 11250: lr=1.00E-05, loss= 1.1092 (max= 1.5408), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:25:00,064 - root - INFO - Step 11250: lr=1.00E-05, loss= 1.1092 (max= 1.5408), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:25:00,064 - root - INFO - Step 11250: lr=1.00E-05, loss= 1.1092 (max= 1.5408), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:25:00,064 - root - INFO - Step 11250: lr=1.00E-05, loss= 1.1092 (max= 1.5408), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:25:00,064 - root - INFO - Step 11250: lr=1.00E-05, loss= 1.1092 (max= 1.5408), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:25:31,915 - root - INFO - Step 11260: lr=1.00E-05, loss= 1.0951 (max= 1.5650), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:25:31,915 - root - INFO - Step 11260: lr=1.00E-05, loss= 1.0951 (max= 1.5650), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:25:31,915 - root - INFO - Step 11260: lr=1.00E-05, loss= 1.0951 (max= 1.5650), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:25:31,915 - root - INFO - Step 11260: lr=1.00E-05, loss= 1.0951 (max= 1.5650), tps=20578, mfu=42.87%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:25:31,915 - root - INFO - Step 11260: lr=1.00E-05, loss= 1.0951 (max= 1.5650), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:25:31,915 - root - INFO - Step 11260: lr=1.00E-05, loss= 1.0951 (max= 1.5650), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:25:31,915 - root - INFO - Step 11260: lr=1.00E-05, loss= 1.0951 (max= 1.5650), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:25:31,915 - root - INFO - Step 11260: lr=1.00E-05, loss= 1.0951 (max= 1.5650), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:26:03,802 - root - INFO - Step 11270: lr=1.00E-05, loss= 1.0888 (max= 1.4125), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:26:03,802 - root - INFO - Step 11270: lr=1.00E-05, loss= 1.0888 (max= 1.4125), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:26:03,802 - root - INFO - Step 11270: lr=1.00E-05, loss= 1.0888 (max= 1.4125), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:26:03,802 - root - INFO - Step 11270: lr=1.00E-05, loss= 1.0888 (max= 1.4125), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:26:03,802 - root - INFO - Step 11270: lr=1.00E-05, loss= 1.0888 (max= 1.4125), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:26:03,802 - root - INFO - Step 11270: lr=1.00E-05, loss= 1.0888 (max= 1.4125), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:26:03,802 - root - INFO - Step 11270: lr=1.00E-05, loss= 1.0888 
(max= 1.4125), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:26:03,803 - root - INFO - Step 11270: lr=1.00E-05, loss= 1.0888 (max= 1.4125), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:26:35,681 - root - INFO - Step 11280: lr=1.00E-05, loss= 1.1193 (max= 1.5335), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:26:35,681 - root - INFO - Step 11280: lr=1.00E-05, loss= 1.1193 (max= 1.5335), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:26:35,682 - root - INFO - Step 11280: lr=1.00E-05, loss= 1.1193 (max= 1.5335), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:26:35,682 - root - INFO - Step 11280: lr=1.00E-05, loss= 1.1193 (max= 1.5335), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:26:35,682 - root - INFO - Step 11280: lr=1.00E-05, loss= 1.1193 (max= 1.5335), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:26:35,682 - root - INFO - Step 11280: lr=1.00E-05, loss= 1.1193 (max= 1.5335), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:26:35,682 - root - INFO - Step 11280: lr=1.00E-05, loss= 1.1193 (max= 1.5335), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:26:35,682 - root - INFO - Step 11280: lr=1.00E-05, loss= 1.1193 (max= 1.5335), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:27:07,570 - root - INFO - Step 11290: lr=1.00E-05, loss= 1.1113 (max= 1.4598), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:27:07,570 - root 
- INFO - Step 11290: lr=1.00E-05, loss= 1.1113 (max= 1.4598), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:27:07,570 - root - INFO - Step 11290: lr=1.00E-05, loss= 1.1113 (max= 1.4598), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:27:07,570 - root - INFO - Step 11290: lr=1.00E-05, loss= 1.1113 (max= 1.4598), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:27:07,570 - root - INFO - Step 11290: lr=1.00E-05, loss= 1.1113 (max= 1.4598), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:27:07,570 - root - INFO - Step 11290: lr=1.00E-05, loss= 1.1113 (max= 1.4598), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:27:07,570 - root - INFO - Step 11290: lr=1.00E-05, loss= 1.1113 (max= 1.4598), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:27:07,570 - root - INFO - Step 11290: lr=1.00E-05, loss= 1.1113 (max= 1.4598), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:27:39,466 - root - INFO - Step 11300: lr=1.00E-05, loss= 1.1100 (max= 1.5328), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:27:39,467 - root - INFO - Step 11300: lr=1.00E-05, loss= 1.1100 (max= 1.5328), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:27:39,467 - root - INFO - Step 11300: lr=1.00E-05, loss= 1.1100 (max= 1.5328), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:27:39,467 - root - INFO - Step 11300: lr=1.00E-05, loss= 1.1100 (max= 1.5328), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 20:27:39,467 - root - INFO - Step 11300: lr=1.00E-05, loss= 1.1100 (max= 1.5328), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:27:39,467 - root - INFO - Step 11300: lr=1.00E-05, loss= 1.1100 (max= 1.5328), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:27:39,467 - root - INFO - Step 11300: lr=1.00E-05, loss= 1.1100 (max= 1.5328), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:27:39,467 - root - INFO - Step 11300: lr=1.00E-05, loss= 1.1100 (max= 1.5328), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:28:11,304 - root - INFO - Step 11310: lr=1.00E-05, loss= 1.1043 (max= 1.4812), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:28:11,304 - root - INFO - Step 11310: lr=1.00E-05, loss= 1.1043 (max= 1.4812), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:28:11,304 - root - INFO - Step 11310: lr=1.00E-05, loss= 1.1043 (max= 1.4812), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:28:11,305 - root - INFO - Step 11310: lr=1.00E-05, loss= 1.1043 (max= 1.4812), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:28:11,305 - root - INFO - Step 11310: lr=1.00E-05, loss= 1.1043 (max= 1.4812), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:28:11,305 - root - INFO - Step 11310: lr=1.00E-05, loss= 1.1043 (max= 1.4812), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:28:11,305 - root - INFO - Step 11310: lr=1.00E-05, loss= 1.1043 (max= 1.4812), tps=20587, mfu=42.89%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:28:11,305 - root - INFO - Step 11310: lr=1.00E-05, loss= 1.1043 (max= 1.4812), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:28:43,180 - root - INFO - Step 11320: lr=1.00E-05, loss= 1.1179 (max= 1.6193), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:28:43,180 - root - INFO - Step 11320: lr=1.00E-05, loss= 1.1179 (max= 1.6193), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:28:43,180 - root - INFO - Step 11320: lr=1.00E-05, loss= 1.1179 (max= 1.6193), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:28:43,180 - root - INFO - Step 11320: lr=1.00E-05, loss= 1.1179 (max= 1.6193), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:28:43,180 - root - INFO - Step 11320: lr=1.00E-05, loss= 1.1179 (max= 1.6193), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:28:43,180 - root - INFO - Step 11320: lr=1.00E-05, loss= 1.1179 (max= 1.6193), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:28:43,180 - root - INFO - Step 11320: lr=1.00E-05, loss= 1.1179 (max= 1.6193), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:28:43,180 - root - INFO - Step 11320: lr=1.00E-05, loss= 1.1179 (max= 1.6193), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:29:15,044 - root - INFO - Step 11330: lr=1.00E-05, loss= 1.0910 (max= 1.4363), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:29:15,044 - root - INFO - Step 11330: lr=1.00E-05, 
loss= 1.0910 (max= 1.4363), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:29:15,044 - root - INFO - Step 11330: lr=1.00E-05, loss= 1.0910 (max= 1.4363), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:29:15,044 - root - INFO - Step 11330: lr=1.00E-05, loss= 1.0910 (max= 1.4363), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:29:15,044 - root - INFO - Step 11330: lr=1.00E-05, loss= 1.0910 (max= 1.4363), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:29:15,044 - root - INFO - Step 11330: lr=1.00E-05, loss= 1.0910 (max= 1.4363), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:29:15,044 - root - INFO - Step 11330: lr=1.00E-05, loss= 1.0910 (max= 1.4363), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:29:15,044 - root - INFO - Step 11330: lr=1.00E-05, loss= 1.0910 (max= 1.4363), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:29:46,915 - root - INFO - Step 11340: lr=1.00E-05, loss= 1.1106 (max= 1.5451), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:29:46,916 - root - INFO - Step 11340: lr=1.00E-05, loss= 1.1106 (max= 1.5451), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:29:46,916 - root - INFO - Step 11340: lr=1.00E-05, loss= 1.1106 (max= 1.5451), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:29:46,916 - root - INFO - Step 11340: lr=1.00E-05, loss= 1.1106 (max= 1.5451), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
20:29:46,916 - root - INFO - Step 11340: lr=1.00E-05, loss= 1.1106 (max= 1.5451), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:29:46,916 - root - INFO - Step 11340: lr=1.00E-05, loss= 1.1106 (max= 1.5451), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:29:46,916 - root - INFO - Step 11340: lr=1.00E-05, loss= 1.1106 (max= 1.5451), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:29:46,916 - root - INFO - Step 11340: lr=1.00E-05, loss= 1.1106 (max= 1.5451), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:30:18,805 - root - INFO - Step 11350: lr=1.00E-05, loss= 1.1015 (max= 1.5650), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:30:18,805 - root - INFO - Step 11350: lr=1.00E-05, loss= 1.1015 (max= 1.5650), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:30:18,806 - root - INFO - Step 11350: lr=1.00E-05, loss= 1.1015 (max= 1.5650), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:30:18,806 - root - INFO - Step 11350: lr=1.00E-05, loss= 1.1015 (max= 1.5650), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:30:18,806 - root - INFO - Step 11350: lr=1.00E-05, loss= 1.1015 (max= 1.5650), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:30:18,806 - root - INFO - Step 11350: lr=1.00E-05, loss= 1.1015 (max= 1.5650), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:30:18,806 - root - INFO - Step 11350: lr=1.00E-05, loss= 1.1015 (max= 1.5650), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:30:18,806 - root - INFO - Step 11350: lr=1.00E-05, loss= 1.1015 (max= 1.5650), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:30:50,679 - root - INFO - Step 11360: lr=1.00E-05, loss= 1.0966 (max= 1.5024), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:30:50,679 - root - INFO - Step 11360: lr=1.00E-05, loss= 1.0966 (max= 1.5024), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:30:50,680 - root - INFO - Step 11360: lr=1.00E-05, loss= 1.0966 (max= 1.5024), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:30:50,680 - root - INFO - Step 11360: lr=1.00E-05, loss= 1.0966 (max= 1.5024), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:30:50,680 - root - INFO - Step 11360: lr=1.00E-05, loss= 1.0966 (max= 1.5024), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:30:50,680 - root - INFO - Step 11360: lr=1.00E-05, loss= 1.0966 (max= 1.5024), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:30:50,680 - root - INFO - Step 11360: lr=1.00E-05, loss= 1.0966 (max= 1.5024), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:30:50,680 - root - INFO - Step 11360: lr=1.00E-05, loss= 1.0966 (max= 1.5024), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:31:22,545 - root - INFO - Step 11370: lr=1.00E-05, loss= 1.1018 (max= 1.5500), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:31:22,545 - root - INFO - Step 11370: lr=1.00E-05, loss= 1.1018 (max= 1.5500), 
tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:31:22,545 - root - INFO - Step 11370: lr=1.00E-05, loss= 1.1018 (max= 1.5500), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:31:22,545 - root - INFO - Step 11370: lr=1.00E-05, loss= 1.1018 (max= 1.5500), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:31:22,545 - root - INFO - Step 11370: lr=1.00E-05, loss= 1.1018 (max= 1.5500), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:31:22,545 - root - INFO - Step 11370: lr=1.00E-05, loss= 1.1018 (max= 1.5500), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:31:22,545 - root - INFO - Step 11370: lr=1.00E-05, loss= 1.1018 (max= 1.5500), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:31:22,545 - root - INFO - Step 11370: lr=1.00E-05, loss= 1.1018 (max= 1.5500), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:31:54,471 - root - INFO - Step 11380: lr=1.00E-05, loss= 1.0949 (max= 1.5267), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:31:54,471 - root - INFO - Step 11380: lr=1.00E-05, loss= 1.0949 (max= 1.5267), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:31:54,471 - root - INFO - Step 11380: lr=1.00E-05, loss= 1.0949 (max= 1.5267), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:31:54,471 - root - INFO - Step 11380: lr=1.00E-05, loss= 1.0949 (max= 1.5267), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:31:54,471 - root - INFO - Step 
11380: lr=1.00E-05, loss= 1.0949 (max= 1.5267), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:31:54,471 - root - INFO - Step 11380: lr=1.00E-05, loss= 1.0949 (max= 1.5267), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:31:54,471 - root - INFO - Step 11380: lr=1.00E-05, loss= 1.0949 (max= 1.5267), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:31:54,471 - root - INFO - Step 11380: lr=1.00E-05, loss= 1.0949 (max= 1.5267), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:32:26,481 - root - INFO - Step 11390: lr=1.00E-05, loss= 1.1070 (max= 1.4734), tps=20476, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:32:26,481 - root - INFO - Step 11390: lr=1.00E-05, loss= 1.1070 (max= 1.4734), tps=20476, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:32:26,481 - root - INFO - Step 11390: lr=1.00E-05, loss= 1.1070 (max= 1.4734), tps=20476, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:32:26,481 - root - INFO - Step 11390: lr=1.00E-05, loss= 1.1070 (max= 1.4734), tps=20476, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:32:26,481 - root - INFO - Step 11390: lr=1.00E-05, loss= 1.1070 (max= 1.4734), tps=20476, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:32:26,481 - root - INFO - Step 11390: lr=1.00E-05, loss= 1.1070 (max= 1.4734), tps=20476, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:32:26,481 - root - INFO - Step 11390: lr=1.00E-05, loss= 1.1070 (max= 1.4734), tps=20476, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 20:32:26,481 - root - INFO - Step 11390: lr=1.00E-05, loss= 1.1070 (max= 1.4734), tps=20476, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:32:54,355 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:5032272 +2025-10-25 20:32:58,418 - root - INFO - Step 11400: lr=1.00E-05, loss= 1.1023 (max= 1.7631), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:32:58,418 - root - INFO - Step 11400: lr=1.00E-05, loss= 1.1023 (max= 1.7631), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:32:58,418 - root - INFO - Step 11400: lr=1.00E-05, loss= 1.1023 (max= 1.7631), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:32:58,418 - root - INFO - Step 11400: lr=1.00E-05, loss= 1.1023 (max= 1.7631), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:32:58,418 - root - INFO - Step 11400: lr=1.00E-05, loss= 1.1023 (max= 1.7631), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:32:58,418 - root - INFO - Step 11400: lr=1.00E-05, loss= 1.1023 (max= 1.7631), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:32:58,418 - root - INFO - Step 11400: lr=1.00E-05, loss= 1.1023 (max= 1.7631), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:32:58,419 - root - INFO - Step 11400: lr=1.00E-05, loss= 1.1023 (max= 1.7631), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:33:30,323 - root - INFO - Step 11410: lr=1.00E-05, loss= 1.0875 (max= 1.6715), tps=20543, mfu=42.80%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:33:30,323 - root - INFO - Step 11410: lr=1.00E-05, loss= 1.0875 (max= 1.6715), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:33:30,323 - root - INFO - Step 11410: lr=1.00E-05, loss= 1.0875 (max= 1.6715), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:33:30,323 - root - INFO - Step 11410: lr=1.00E-05, loss= 1.0875 (max= 1.6715), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:33:30,323 - root - INFO - Step 11410: lr=1.00E-05, loss= 1.0875 (max= 1.6715), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:33:30,323 - root - INFO - Step 11410: lr=1.00E-05, loss= 1.0875 (max= 1.6715), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:33:30,323 - root - INFO - Step 11410: lr=1.00E-05, loss= 1.0875 (max= 1.6715), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:33:30,323 - root - INFO - Step 11410: lr=1.00E-05, loss= 1.0875 (max= 1.6715), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:34:02,224 - root - INFO - Step 11420: lr=1.00E-05, loss= 1.1051 (max= 1.5246), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:34:02,224 - root - INFO - Step 11420: lr=1.00E-05, loss= 1.1051 (max= 1.5246), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:34:02,224 - root - INFO - Step 11420: lr=1.00E-05, loss= 1.1051 (max= 1.5246), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:34:02,225 - root - INFO - Step 11420: lr=1.00E-05, loss= 1.1051 
(max= 1.5246), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:34:02,225 - root - INFO - Step 11420: lr=1.00E-05, loss= 1.1051 (max= 1.5246), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:34:02,225 - root - INFO - Step 11420: lr=1.00E-05, loss= 1.1051 (max= 1.5246), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:34:02,225 - root - INFO - Step 11420: lr=1.00E-05, loss= 1.1051 (max= 1.5246), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:34:02,225 - root - INFO - Step 11420: lr=1.00E-05, loss= 1.1051 (max= 1.5246), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:34:34,114 - root - INFO - Step 11430: lr=1.00E-05, loss= 1.1007 (max= 1.5089), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:34:34,114 - root - INFO - Step 11430: lr=1.00E-05, loss= 1.1007 (max= 1.5089), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:34:34,114 - root - INFO - Step 11430: lr=1.00E-05, loss= 1.1007 (max= 1.5089), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:34:34,114 - root - INFO - Step 11430: lr=1.00E-05, loss= 1.1007 (max= 1.5089), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:34:34,114 - root - INFO - Step 11430: lr=1.00E-05, loss= 1.1007 (max= 1.5089), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:34:34,114 - root - INFO - Step 11430: lr=1.00E-05, loss= 1.1007 (max= 1.5089), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:34:34,114 - root 
- INFO - Step 11430: lr=1.00E-05, loss= 1.1007 (max= 1.5089), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:34:34,114 - root - INFO - Step 11430: lr=1.00E-05, loss= 1.1007 (max= 1.5089), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:35:06,036 - root - INFO - Step 11440: lr=1.00E-05, loss= 1.0921 (max= 1.5825), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:35:06,036 - root - INFO - Step 11440: lr=1.00E-05, loss= 1.0921 (max= 1.5825), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:35:06,036 - root - INFO - Step 11440: lr=1.00E-05, loss= 1.0921 (max= 1.5825), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:35:06,036 - root - INFO - Step 11440: lr=1.00E-05, loss= 1.0921 (max= 1.5825), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:35:06,036 - root - INFO - Step 11440: lr=1.00E-05, loss= 1.0921 (max= 1.5825), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:35:06,036 - root - INFO - Step 11440: lr=1.00E-05, loss= 1.0921 (max= 1.5825), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:35:06,036 - root - INFO - Step 11440: lr=1.00E-05, loss= 1.0921 (max= 1.5825), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:35:06,036 - root - INFO - Step 11440: lr=1.00E-05, loss= 1.0921 (max= 1.5825), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:35:37,922 - root - INFO - Step 11450: lr=1.00E-05, loss= 1.0897 (max= 1.5381), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 20:35:37,922 - root - INFO - Step 11450: lr=1.00E-05, loss= 1.0897 (max= 1.5381), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:35:37,922 - root - INFO - Step 11450: lr=1.00E-05, loss= 1.0897 (max= 1.5381), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:35:37,922 - root - INFO - Step 11450: lr=1.00E-05, loss= 1.0897 (max= 1.5381), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:35:37,922 - root - INFO - Step 11450: lr=1.00E-05, loss= 1.0897 (max= 1.5381), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:35:37,922 - root - INFO - Step 11450: lr=1.00E-05, loss= 1.0897 (max= 1.5381), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:35:37,922 - root - INFO - Step 11450: lr=1.00E-05, loss= 1.0897 (max= 1.5381), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:35:37,922 - root - INFO - Step 11450: lr=1.00E-05, loss= 1.0897 (max= 1.5381), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:36:09,814 - root - INFO - Step 11460: lr=1.00E-05, loss= 1.0920 (max= 1.6242), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:36:09,814 - root - INFO - Step 11460: lr=1.00E-05, loss= 1.0920 (max= 1.6242), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:36:09,814 - root - INFO - Step 11460: lr=1.00E-05, loss= 1.0920 (max= 1.6242), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:36:09,814 - root - INFO - Step 11460: lr=1.00E-05, loss= 1.0920 (max= 1.6242), tps=20552, mfu=42.82%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:36:09,814 - root - INFO - Step 11460: lr=1.00E-05, loss= 1.0920 (max= 1.6242), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:36:09,814 - root - INFO - Step 11460: lr=1.00E-05, loss= 1.0920 (max= 1.6242), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:36:09,814 - root - INFO - Step 11460: lr=1.00E-05, loss= 1.0920 (max= 1.6242), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:36:09,814 - root - INFO - Step 11460: lr=1.00E-05, loss= 1.0920 (max= 1.6242), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:36:41,699 - root - INFO - Step 11470: lr=1.00E-05, loss= 1.1088 (max= 1.5389), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:36:41,699 - root - INFO - Step 11470: lr=1.00E-05, loss= 1.1088 (max= 1.5389), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:36:41,699 - root - INFO - Step 11470: lr=1.00E-05, loss= 1.1088 (max= 1.5389), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:36:41,699 - root - INFO - Step 11470: lr=1.00E-05, loss= 1.1088 (max= 1.5389), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:36:41,699 - root - INFO - Step 11470: lr=1.00E-05, loss= 1.1088 (max= 1.5389), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:36:41,699 - root - INFO - Step 11470: lr=1.00E-05, loss= 1.1088 (max= 1.5389), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:36:41,699 - root - INFO - Step 11470: lr=1.00E-05, 
loss= 1.1088 (max= 1.5389), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:36:41,699 - root - INFO - Step 11470: lr=1.00E-05, loss= 1.1088 (max= 1.5389), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:37:13,535 - root - INFO - Step 11480: lr=1.00E-05, loss= 1.1240 (max= 1.5106), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:37:13,535 - root - INFO - Step 11480: lr=1.00E-05, loss= 1.1240 (max= 1.5106), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:37:13,535 - root - INFO - Step 11480: lr=1.00E-05, loss= 1.1240 (max= 1.5106), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:37:13,535 - root - INFO - Step 11480: lr=1.00E-05, loss= 1.1240 (max= 1.5106), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:37:13,535 - root - INFO - Step 11480: lr=1.00E-05, loss= 1.1240 (max= 1.5106), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:37:13,535 - root - INFO - Step 11480: lr=1.00E-05, loss= 1.1240 (max= 1.5106), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:37:13,535 - root - INFO - Step 11480: lr=1.00E-05, loss= 1.1240 (max= 1.5106), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:37:13,535 - root - INFO - Step 11480: lr=1.00E-05, loss= 1.1240 (max= 1.5106), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:37:45,370 - root - INFO - Step 11490: lr=1.00E-05, loss= 1.0943 (max= 1.4680), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
20:37:45,370 - root - INFO - Step 11490: lr=1.00E-05, loss= 1.0943 (max= 1.4680), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:37:45,370 - root - INFO - Step 11490: lr=1.00E-05, loss= 1.0943 (max= 1.4680), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:37:45,370 - root - INFO - Step 11490: lr=1.00E-05, loss= 1.0943 (max= 1.4680), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:37:45,370 - root - INFO - Step 11490: lr=1.00E-05, loss= 1.0943 (max= 1.4680), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:37:45,370 - root - INFO - Step 11490: lr=1.00E-05, loss= 1.0943 (max= 1.4680), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:37:45,370 - root - INFO - Step 11490: lr=1.00E-05, loss= 1.0943 (max= 1.4680), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:37:45,370 - root - INFO - Step 11490: lr=1.00E-05, loss= 1.0943 (max= 1.4680), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:38:17,181 - root - INFO - Step 11500: lr=1.00E-05, loss= 1.1052 (max= 1.6138), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:38:17,181 - root - INFO - Step 11500: lr=1.00E-05, loss= 1.1052 (max= 1.6138), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:38:17,182 - root - INFO - Step 11500: lr=1.00E-05, loss= 1.1052 (max= 1.6138), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:38:17,182 - root - INFO - Step 11500: lr=1.00E-05, loss= 1.1052 (max= 1.6138), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:38:17,182 - root - INFO - Step 11500: lr=1.00E-05, loss= 1.1052 (max= 1.6138), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:38:17,182 - root - INFO - Step 11500: lr=1.00E-05, loss= 1.1052 (max= 1.6138), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:38:17,182 - root - INFO - Step 11500: lr=1.00E-05, loss= 1.1052 (max= 1.6138), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:38:17,182 - root - INFO - Step 11500: lr=1.00E-05, loss= 1.1052 (max= 1.6138), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:38:48,987 - root - INFO - Step 11510: lr=1.00E-05, loss= 1.0823 (max= 1.4449), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:38:48,987 - root - INFO - Step 11510: lr=1.00E-05, loss= 1.0823 (max= 1.4449), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:38:48,987 - root - INFO - Step 11510: lr=1.00E-05, loss= 1.0823 (max= 1.4449), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:38:48,987 - root - INFO - Step 11510: lr=1.00E-05, loss= 1.0823 (max= 1.4449), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:38:48,987 - root - INFO - Step 11510: lr=1.00E-05, loss= 1.0823 (max= 1.4449), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:38:48,988 - root - INFO - Step 11510: lr=1.00E-05, loss= 1.0823 (max= 1.4449), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:38:48,988 - root - INFO - Step 11510: lr=1.00E-05, loss= 1.0823 (max= 1.4449), 
tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:38:48,988 - root - INFO - Step 11510: lr=1.00E-05, loss= 1.0823 (max= 1.4449), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:39:20,900 - root - INFO - Step 11520: lr=1.00E-05, loss= 1.1070 (max= 1.4922), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:39:20,900 - root - INFO - Step 11520: lr=1.00E-05, loss= 1.1070 (max= 1.4922), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:39:20,900 - root - INFO - Step 11520: lr=1.00E-05, loss= 1.1070 (max= 1.4922), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:39:20,900 - root - INFO - Step 11520: lr=1.00E-05, loss= 1.1070 (max= 1.4922), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:39:20,900 - root - INFO - Step 11520: lr=1.00E-05, loss= 1.1070 (max= 1.4922), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:39:20,900 - root - INFO - Step 11520: lr=1.00E-05, loss= 1.1070 (max= 1.4922), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:39:20,900 - root - INFO - Step 11520: lr=1.00E-05, loss= 1.1070 (max= 1.4922), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:39:20,900 - root - INFO - Step 11520: lr=1.00E-05, loss= 1.1070 (max= 1.4922), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:39:52,729 - root - INFO - Step 11530: lr=1.00E-05, loss= 1.0783 (max= 1.6238), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:39:52,729 - root - INFO - Step 
11530: lr=1.00E-05, loss= 1.0783 (max= 1.6238), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:39:52,729 - root - INFO - Step 11530: lr=1.00E-05, loss= 1.0783 (max= 1.6238), tps=20593, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:39:52,729 - root - INFO - Step 11530: lr=1.00E-05, loss= 1.0783 (max= 1.6238), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:39:52,729 - root - INFO - Step 11530: lr=1.00E-05, loss= 1.0783 (max= 1.6238), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:39:52,729 - root - INFO - Step 11530: lr=1.00E-05, loss= 1.0783 (max= 1.6238), tps=20593, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:39:52,729 - root - INFO - Step 11530: lr=1.00E-05, loss= 1.0783 (max= 1.6238), tps=20593, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:39:52,730 - root - INFO - Step 11530: lr=1.00E-05, loss= 1.0783 (max= 1.6238), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:40:24,630 - root - INFO - Step 11540: lr=1.00E-05, loss= 1.0724 (max= 1.5088), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:40:24,630 - root - INFO - Step 11540: lr=1.00E-05, loss= 1.0724 (max= 1.5088), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:40:24,630 - root - INFO - Step 11540: lr=1.00E-05, loss= 1.0724 (max= 1.5088), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:40:24,630 - root - INFO - Step 11540: lr=1.00E-05, loss= 1.0724 (max= 1.5088), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 20:40:24,630 - root - INFO - Step 11540: lr=1.00E-05, loss= 1.0724 (max= 1.5088), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:40:24,630 - root - INFO - Step 11540: lr=1.00E-05, loss= 1.0724 (max= 1.5088), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:40:24,630 - root - INFO - Step 11540: lr=1.00E-05, loss= 1.0724 (max= 1.5088), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:40:24,630 - root - INFO - Step 11540: lr=1.00E-05, loss= 1.0724 (max= 1.5088), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:40:56,508 - root - INFO - Step 11550: lr=1.00E-05, loss= 1.0842 (max= 1.5359), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:40:56,508 - root - INFO - Step 11550: lr=1.00E-05, loss= 1.0842 (max= 1.5359), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:40:56,508 - root - INFO - Step 11550: lr=1.00E-05, loss= 1.0842 (max= 1.5359), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:40:56,508 - root - INFO - Step 11550: lr=1.00E-05, loss= 1.0842 (max= 1.5359), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:40:56,508 - root - INFO - Step 11550: lr=1.00E-05, loss= 1.0842 (max= 1.5359), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:40:56,508 - root - INFO - Step 11550: lr=1.00E-05, loss= 1.0842 (max= 1.5359), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:40:56,508 - root - INFO - Step 11550: lr=1.00E-05, loss= 1.0842 (max= 1.5359), tps=20561, mfu=42.84%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:40:56,508 - root - INFO - Step 11550: lr=1.00E-05, loss= 1.0842 (max= 1.5359), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:41:28,414 - root - INFO - Step 11560: lr=1.00E-05, loss= 1.1034 (max= 1.4941), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:41:28,414 - root - INFO - Step 11560: lr=1.00E-05, loss= 1.1034 (max= 1.4941), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:41:28,414 - root - INFO - Step 11560: lr=1.00E-05, loss= 1.1034 (max= 1.4941), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:41:28,414 - root - INFO - Step 11560: lr=1.00E-05, loss= 1.1034 (max= 1.4941), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:41:28,414 - root - INFO - Step 11560: lr=1.00E-05, loss= 1.1034 (max= 1.4941), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:41:28,414 - root - INFO - Step 11560: lr=1.00E-05, loss= 1.1034 (max= 1.4941), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:41:28,414 - root - INFO - Step 11560: lr=1.00E-05, loss= 1.1034 (max= 1.4941), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:41:28,414 - root - INFO - Step 11560: lr=1.00E-05, loss= 1.1034 (max= 1.4941), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:42:00,320 - root - INFO - Step 11570: lr=1.00E-05, loss= 1.1099 (max= 1.5642), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:42:00,320 - root - INFO - Step 11570: lr=1.00E-05, loss= 1.1099 
(max= 1.5642), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:42:00,320 - root - INFO - Step 11570: lr=1.00E-05, loss= 1.1099 (max= 1.5642), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:42:00,320 - root - INFO - Step 11570: lr=1.00E-05, loss= 1.1099 (max= 1.5642), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:42:00,320 - root - INFO - Step 11570: lr=1.00E-05, loss= 1.1099 (max= 1.5642), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:42:00,320 - root - INFO - Step 11570: lr=1.00E-05, loss= 1.1099 (max= 1.5642), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:42:00,320 - root - INFO - Step 11570: lr=1.00E-05, loss= 1.1099 (max= 1.5642), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:42:00,321 - root - INFO - Step 11570: lr=1.00E-05, loss= 1.1099 (max= 1.5642), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:42:32,155 - root - INFO - Step 11580: lr=1.00E-05, loss= 1.1052 (max= 1.5453), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:42:32,155 - root - INFO - Step 11580: lr=1.00E-05, loss= 1.1052 (max= 1.5453), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:42:32,155 - root - INFO - Step 11580: lr=1.00E-05, loss= 1.1052 (max= 1.5453), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:42:32,155 - root - INFO - Step 11580: lr=1.00E-05, loss= 1.1052 (max= 1.5453), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:42:32,155 - root 
- INFO - Step 11580: lr=1.00E-05, loss= 1.1052 (max= 1.5453), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:42:32,155 - root - INFO - Step 11580: lr=1.00E-05, loss= 1.1052 (max= 1.5453), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:42:32,155 - root - INFO - Step 11580: lr=1.00E-05, loss= 1.1052 (max= 1.5453), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:42:32,155 - root - INFO - Step 11580: lr=1.00E-05, loss= 1.1052 (max= 1.5453), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:43:04,047 - root - INFO - Step 11590: lr=1.00E-05, loss= 1.0970 (max= 1.5718), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:43:04,047 - root - INFO - Step 11590: lr=1.00E-05, loss= 1.0970 (max= 1.5718), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:43:04,047 - root - INFO - Step 11590: lr=1.00E-05, loss= 1.0970 (max= 1.5718), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:43:04,047 - root - INFO - Step 11590: lr=1.00E-05, loss= 1.0970 (max= 1.5718), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:43:04,047 - root - INFO - Step 11590: lr=1.00E-05, loss= 1.0970 (max= 1.5718), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:43:04,047 - root - INFO - Step 11590: lr=1.00E-05, loss= 1.0970 (max= 1.5718), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:43:04,047 - root - INFO - Step 11590: lr=1.00E-05, loss= 1.0970 (max= 1.5718), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 20:43:04,047 - root - INFO - Step 11590: lr=1.00E-05, loss= 1.0970 (max= 1.5718), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:43:23,756 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:6152410 +2025-10-25 20:43:35,900 - root - INFO - Step 11600: lr=1.00E-05, loss= 1.1097 (max= 1.5171), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:43:35,900 - root - INFO - Step 11600: lr=1.00E-05, loss= 1.1097 (max= 1.5171), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:43:35,900 - root - INFO - Step 11600: lr=1.00E-05, loss= 1.1097 (max= 1.5171), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:43:35,900 - root - INFO - Step 11600: lr=1.00E-05, loss= 1.1097 (max= 1.5171), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:43:35,900 - root - INFO - Step 11600: lr=1.00E-05, loss= 1.1097 (max= 1.5171), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:43:35,900 - root - INFO - Step 11600: lr=1.00E-05, loss= 1.1097 (max= 1.5171), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:43:35,900 - root - INFO - Step 11600: lr=1.00E-05, loss= 1.1097 (max= 1.5171), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:43:35,900 - root - INFO - Step 11600: lr=1.00E-05, loss= 1.1097 (max= 1.5171), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:44:07,675 - root - INFO - Step 11610: lr=1.00E-05, loss= 1.0982 (max= 1.5454), tps=20627, mfu=42.98%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:44:07,675 - root - INFO - Step 11610: lr=1.00E-05, loss= 1.0982 (max= 1.5454), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:44:07,675 - root - INFO - Step 11610: lr=1.00E-05, loss= 1.0982 (max= 1.5454), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:44:07,675 - root - INFO - Step 11610: lr=1.00E-05, loss= 1.0982 (max= 1.5454), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:44:07,675 - root - INFO - Step 11610: lr=1.00E-05, loss= 1.0982 (max= 1.5454), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:44:07,675 - root - INFO - Step 11610: lr=1.00E-05, loss= 1.0982 (max= 1.5454), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:44:07,675 - root - INFO - Step 11610: lr=1.00E-05, loss= 1.0982 (max= 1.5454), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:44:07,675 - root - INFO - Step 11610: lr=1.00E-05, loss= 1.0982 (max= 1.5454), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:44:39,462 - root - INFO - Step 11620: lr=1.00E-05, loss= 1.0719 (max= 1.5149), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:44:39,462 - root - INFO - Step 11620: lr=1.00E-05, loss= 1.0719 (max= 1.5149), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:44:39,462 - root - INFO - Step 11620: lr=1.00E-05, loss= 1.0719 (max= 1.5149), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:44:39,462 - root - INFO - Step 11620: lr=1.00E-05, 
loss= 1.0719 (max= 1.5149), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:44:39,462 - root - INFO - Step 11620: lr=1.00E-05, loss= 1.0719 (max= 1.5149), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:44:39,462 - root - INFO - Step 11620: lr=1.00E-05, loss= 1.0719 (max= 1.5149), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:44:39,462 - root - INFO - Step 11620: lr=1.00E-05, loss= 1.0719 (max= 1.5149), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:44:39,462 - root - INFO - Step 11620: lr=1.00E-05, loss= 1.0719 (max= 1.5149), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:45:11,313 - root - INFO - Step 11630: lr=1.00E-05, loss= 1.1030 (max= 1.5439), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:45:11,313 - root - INFO - Step 11630: lr=1.00E-05, loss= 1.1030 (max= 1.5439), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:45:11,313 - root - INFO - Step 11630: lr=1.00E-05, loss= 1.1030 (max= 1.5439), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:45:11,313 - root - INFO - Step 11630: lr=1.00E-05, loss= 1.1030 (max= 1.5439), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:45:11,313 - root - INFO - Step 11630: lr=1.00E-05, loss= 1.1030 (max= 1.5439), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:45:11,313 - root - INFO - Step 11630: lr=1.00E-05, loss= 1.1030 (max= 1.5439), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
20:45:11,313 - root - INFO - Step 11630: lr=1.00E-05, loss= 1.1030 (max= 1.5439), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:45:11,313 - root - INFO - Step 11630: lr=1.00E-05, loss= 1.1030 (max= 1.5439), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:45:43,184 - root - INFO - Step 11640: lr=1.00E-05, loss= 1.0997 (max= 1.6383), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:45:43,184 - root - INFO - Step 11640: lr=1.00E-05, loss= 1.0997 (max= 1.6383), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:45:43,184 - root - INFO - Step 11640: lr=1.00E-05, loss= 1.0997 (max= 1.6383), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:45:43,184 - root - INFO - Step 11640: lr=1.00E-05, loss= 1.0997 (max= 1.6383), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:45:43,184 - root - INFO - Step 11640: lr=1.00E-05, loss= 1.0997 (max= 1.6383), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:45:43,184 - root - INFO - Step 11640: lr=1.00E-05, loss= 1.0997 (max= 1.6383), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:45:43,184 - root - INFO - Step 11640: lr=1.00E-05, loss= 1.0997 (max= 1.6383), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:45:43,184 - root - INFO - Step 11640: lr=1.00E-05, loss= 1.0997 (max= 1.6383), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:46:15,120 - root - INFO - Step 11650: lr=1.00E-05, loss= 1.0881 (max= 1.6200), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:46:15,120 - root - INFO - Step 11650: lr=1.00E-05, loss= 1.0881 (max= 1.6200), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:46:15,120 - root - INFO - Step 11650: lr=1.00E-05, loss= 1.0881 (max= 1.6200), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:46:15,121 - root - INFO - Step 11650: lr=1.00E-05, loss= 1.0881 (max= 1.6200), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:46:15,121 - root - INFO - Step 11650: lr=1.00E-05, loss= 1.0881 (max= 1.6200), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:46:15,121 - root - INFO - Step 11650: lr=1.00E-05, loss= 1.0881 (max= 1.6200), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:46:15,121 - root - INFO - Step 11650: lr=1.00E-05, loss= 1.0881 (max= 1.6200), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:46:15,121 - root - INFO - Step 11650: lr=1.00E-05, loss= 1.0881 (max= 1.6200), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:46:46,915 - root - INFO - Step 11660: lr=1.00E-05, loss= 1.0756 (max= 1.7441), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:46:46,915 - root - INFO - Step 11660: lr=1.00E-05, loss= 1.0756 (max= 1.7441), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:46:46,915 - root - INFO - Step 11660: lr=1.00E-05, loss= 1.0756 (max= 1.7441), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:46:46,916 - root - INFO - Step 11660: lr=1.00E-05, loss= 1.0756 (max= 1.7441), 
tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:46:46,916 - root - INFO - Step 11660: lr=1.00E-05, loss= 1.0756 (max= 1.7441), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:46:46,916 - root - INFO - Step 11660: lr=1.00E-05, loss= 1.0756 (max= 1.7441), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:46:46,916 - root - INFO - Step 11660: lr=1.00E-05, loss= 1.0756 (max= 1.7441), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:46:46,916 - root - INFO - Step 11660: lr=1.00E-05, loss= 1.0756 (max= 1.7441), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:47:18,800 - root - INFO - Step 11670: lr=1.00E-05, loss= 1.1025 (max= 1.5343), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:47:18,800 - root - INFO - Step 11670: lr=1.00E-05, loss= 1.1025 (max= 1.5343), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:47:18,800 - root - INFO - Step 11670: lr=1.00E-05, loss= 1.1025 (max= 1.5343), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:47:18,800 - root - INFO - Step 11670: lr=1.00E-05, loss= 1.1025 (max= 1.5343), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:47:18,800 - root - INFO - Step 11670: lr=1.00E-05, loss= 1.1025 (max= 1.5343), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:47:18,800 - root - INFO - Step 11670: lr=1.00E-05, loss= 1.1025 (max= 1.5343), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:47:18,800 - root - INFO - Step 
11670: lr=1.00E-05, loss= 1.1025 (max= 1.5343), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:47:18,800 - root - INFO - Step 11670: lr=1.00E-05, loss= 1.1025 (max= 1.5343), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:47:50,692 - root - INFO - Step 11680: lr=1.00E-05, loss= 1.0986 (max= 1.7667), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:47:50,692 - root - INFO - Step 11680: lr=1.00E-05, loss= 1.0986 (max= 1.7667), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:47:50,692 - root - INFO - Step 11680: lr=1.00E-05, loss= 1.0986 (max= 1.7667), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:47:50,692 - root - INFO - Step 11680: lr=1.00E-05, loss= 1.0986 (max= 1.7667), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:47:50,692 - root - INFO - Step 11680: lr=1.00E-05, loss= 1.0986 (max= 1.7667), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:47:50,692 - root - INFO - Step 11680: lr=1.00E-05, loss= 1.0986 (max= 1.7667), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:47:50,692 - root - INFO - Step 11680: lr=1.00E-05, loss= 1.0986 (max= 1.7667), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:47:50,692 - root - INFO - Step 11680: lr=1.00E-05, loss= 1.0986 (max= 1.7667), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:48:22,582 - root - INFO - Step 11690: lr=1.00E-05, loss= 1.0982 (max= 1.6664), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 20:48:22,583 - root - INFO - Step 11690: lr=1.00E-05, loss= 1.0982 (max= 1.6664), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:48:22,583 - root - INFO - Step 11690: lr=1.00E-05, loss= 1.0982 (max= 1.6664), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:48:22,583 - root - INFO - Step 11690: lr=1.00E-05, loss= 1.0982 (max= 1.6664), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:48:22,583 - root - INFO - Step 11690: lr=1.00E-05, loss= 1.0982 (max= 1.6664), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:48:22,583 - root - INFO - Step 11690: lr=1.00E-05, loss= 1.0982 (max= 1.6664), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:48:22,583 - root - INFO - Step 11690: lr=1.00E-05, loss= 1.0982 (max= 1.6664), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:48:22,583 - root - INFO - Step 11690: lr=1.00E-05, loss= 1.0982 (max= 1.6664), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:48:54,530 - root - INFO - Step 11700: lr=1.00E-05, loss= 1.0994 (max= 1.5748), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:48:54,531 - root - INFO - Step 11700: lr=1.00E-05, loss= 1.0994 (max= 1.5748), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:48:54,531 - root - INFO - Step 11700: lr=1.00E-05, loss= 1.0994 (max= 1.5748), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:48:54,531 - root - INFO - Step 11700: lr=1.00E-05, loss= 1.0994 (max= 1.5748), tps=20515, mfu=42.74%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:48:54,531 - root - INFO - Step 11700: lr=1.00E-05, loss= 1.0994 (max= 1.5748), tps=20516, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:48:54,531 - root - INFO - Step 11700: lr=1.00E-05, loss= 1.0994 (max= 1.5748), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:48:54,531 - root - INFO - Step 11700: lr=1.00E-05, loss= 1.0994 (max= 1.5748), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:48:54,531 - root - INFO - Step 11700: lr=1.00E-05, loss= 1.0994 (max= 1.5748), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:49:26,329 - root - INFO - Step 11710: lr=1.00E-05, loss= 1.1418 (max= 1.6506), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:49:26,329 - root - INFO - Step 11710: lr=1.00E-05, loss= 1.1418 (max= 1.6506), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:49:26,329 - root - INFO - Step 11710: lr=1.00E-05, loss= 1.1418 (max= 1.6506), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:49:26,329 - root - INFO - Step 11710: lr=1.00E-05, loss= 1.1418 (max= 1.6506), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:49:26,329 - root - INFO - Step 11710: lr=1.00E-05, loss= 1.1418 (max= 1.6506), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:49:26,329 - root - INFO - Step 11710: lr=1.00E-05, loss= 1.1418 (max= 1.6506), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:49:26,329 - root - INFO - Step 11710: lr=1.00E-05, loss= 1.1418 
(max= 1.6506), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:49:26,329 - root - INFO - Step 11710: lr=1.00E-05, loss= 1.1418 (max= 1.6506), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:49:58,160 - root - INFO - Step 11720: lr=1.00E-05, loss= 1.1144 (max= 1.4497), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:49:58,160 - root - INFO - Step 11720: lr=1.00E-05, loss= 1.1144 (max= 1.4497), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:49:58,160 - root - INFO - Step 11720: lr=1.00E-05, loss= 1.1144 (max= 1.4497), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:49:58,160 - root - INFO - Step 11720: lr=1.00E-05, loss= 1.1144 (max= 1.4497), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:49:58,161 - root - INFO - Step 11720: lr=1.00E-05, loss= 1.1144 (max= 1.4497), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:49:58,161 - root - INFO - Step 11720: lr=1.00E-05, loss= 1.1144 (max= 1.4497), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:49:58,161 - root - INFO - Step 11720: lr=1.00E-05, loss= 1.1144 (max= 1.4497), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:49:58,161 - root - INFO - Step 11720: lr=1.00E-05, loss= 1.1144 (max= 1.4497), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:50:30,080 - root - INFO - Step 11730: lr=1.00E-05, loss= 1.0913 (max= 1.5708), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:50:30,080 - root 
- INFO - Step 11730: lr=1.00E-05, loss= 1.0913 (max= 1.5708), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:50:30,080 - root - INFO - Step 11730: lr=1.00E-05, loss= 1.0913 (max= 1.5708), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:50:30,080 - root - INFO - Step 11730: lr=1.00E-05, loss= 1.0913 (max= 1.5708), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:50:30,080 - root - INFO - Step 11730: lr=1.00E-05, loss= 1.0913 (max= 1.5708), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:50:30,080 - root - INFO - Step 11730: lr=1.00E-05, loss= 1.0913 (max= 1.5708), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:50:30,081 - root - INFO - Step 11730: lr=1.00E-05, loss= 1.0913 (max= 1.5708), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:50:30,081 - root - INFO - Step 11730: lr=1.00E-05, loss= 1.0913 (max= 1.5708), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:51:01,980 - root - INFO - Step 11740: lr=1.00E-05, loss= 1.1117 (max= 1.5634), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:51:01,980 - root - INFO - Step 11740: lr=1.00E-05, loss= 1.1117 (max= 1.5634), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:51:01,980 - root - INFO - Step 11740: lr=1.00E-05, loss= 1.1117 (max= 1.5634), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:51:01,980 - root - INFO - Step 11740: lr=1.00E-05, loss= 1.1117 (max= 1.5634), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 20:51:01,980 - root - INFO - Step 11740: lr=1.00E-05, loss= 1.1117 (max= 1.5634), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:51:01,980 - root - INFO - Step 11740: lr=1.00E-05, loss= 1.1117 (max= 1.5634), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:51:01,980 - root - INFO - Step 11740: lr=1.00E-05, loss= 1.1117 (max= 1.5634), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:51:01,980 - root - INFO - Step 11740: lr=1.00E-05, loss= 1.1117 (max= 1.5634), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:51:33,794 - root - INFO - Step 11750: lr=1.00E-05, loss= 1.1149 (max= 1.5337), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:51:33,794 - root - INFO - Step 11750: lr=1.00E-05, loss= 1.1149 (max= 1.5337), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:51:33,794 - root - INFO - Step 11750: lr=1.00E-05, loss= 1.1149 (max= 1.5337), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:51:33,794 - root - INFO - Step 11750: lr=1.00E-05, loss= 1.1149 (max= 1.5337), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:51:33,794 - root - INFO - Step 11750: lr=1.00E-05, loss= 1.1149 (max= 1.5337), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:51:33,794 - root - INFO - Step 11750: lr=1.00E-05, loss= 1.1149 (max= 1.5337), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:51:33,794 - root - INFO - Step 11750: lr=1.00E-05, loss= 1.1149 (max= 1.5337), tps=20602, mfu=42.92%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:51:33,794 - root - INFO - Step 11750: lr=1.00E-05, loss= 1.1149 (max= 1.5337), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:52:05,596 - root - INFO - Step 11760: lr=1.00E-05, loss= 1.1229 (max= 1.5676), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:52:05,596 - root - INFO - Step 11760: lr=1.00E-05, loss= 1.1229 (max= 1.5676), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:52:05,596 - root - INFO - Step 11760: lr=1.00E-05, loss= 1.1229 (max= 1.5676), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:52:05,596 - root - INFO - Step 11760: lr=1.00E-05, loss= 1.1229 (max= 1.5676), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:52:05,596 - root - INFO - Step 11760: lr=1.00E-05, loss= 1.1229 (max= 1.5676), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:52:05,596 - root - INFO - Step 11760: lr=1.00E-05, loss= 1.1229 (max= 1.5676), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:52:05,596 - root - INFO - Step 11760: lr=1.00E-05, loss= 1.1229 (max= 1.5676), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:52:05,597 - root - INFO - Step 11760: lr=1.00E-05, loss= 1.1229 (max= 1.5676), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:52:37,437 - root - INFO - Step 11770: lr=1.00E-05, loss= 1.1306 (max= 1.4925), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:52:37,437 - root - INFO - Step 11770: lr=1.00E-05, 
loss= 1.1306 (max= 1.4925), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:52:37,437 - root - INFO - Step 11770: lr=1.00E-05, loss= 1.1306 (max= 1.4925), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:52:37,437 - root - INFO - Step 11770: lr=1.00E-05, loss= 1.1306 (max= 1.4925), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:52:37,437 - root - INFO - Step 11770: lr=1.00E-05, loss= 1.1306 (max= 1.4925), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:52:37,437 - root - INFO - Step 11770: lr=1.00E-05, loss= 1.1306 (max= 1.4925), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:52:37,437 - root - INFO - Step 11770: lr=1.00E-05, loss= 1.1306 (max= 1.4925), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:52:37,437 - root - INFO - Step 11770: lr=1.00E-05, loss= 1.1306 (max= 1.4925), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:53:09,256 - root - INFO - Step 11780: lr=1.00E-05, loss= 1.1255 (max= 1.5435), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:53:09,256 - root - INFO - Step 11780: lr=1.00E-05, loss= 1.1255 (max= 1.5435), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:53:09,256 - root - INFO - Step 11780: lr=1.00E-05, loss= 1.1255 (max= 1.5435), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:53:09,256 - root - INFO - Step 11780: lr=1.00E-05, loss= 1.1255 (max= 1.5435), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
20:53:09,256 - root - INFO - Step 11780: lr=1.00E-05, loss= 1.1255 (max= 1.5435), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:53:09,256 - root - INFO - Step 11780: lr=1.00E-05, loss= 1.1255 (max= 1.5435), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:53:09,257 - root - INFO - Step 11780: lr=1.00E-05, loss= 1.1255 (max= 1.5435), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:53:09,257 - root - INFO - Step 11780: lr=1.00E-05, loss= 1.1255 (max= 1.5435), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:53:41,165 - root - INFO - Step 11790: lr=1.00E-05, loss= 1.1145 (max= 1.6946), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:53:41,165 - root - INFO - Step 11790: lr=1.00E-05, loss= 1.1145 (max= 1.6946), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:53:41,165 - root - INFO - Step 11790: lr=1.00E-05, loss= 1.1145 (max= 1.6946), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:53:41,165 - root - INFO - Step 11790: lr=1.00E-05, loss= 1.1145 (max= 1.6946), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:53:41,165 - root - INFO - Step 11790: lr=1.00E-05, loss= 1.1145 (max= 1.6946), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:53:41,165 - root - INFO - Step 11790: lr=1.00E-05, loss= 1.1145 (max= 1.6946), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:53:41,165 - root - INFO - Step 11790: lr=1.00E-05, loss= 1.1145 (max= 1.6946), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:53:41,166 - root - INFO - Step 11790: lr=1.00E-05, loss= 1.1145 (max= 1.6946), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:54:13,127 - root - INFO - Step 11800: lr=1.00E-05, loss= 1.1016 (max= 1.5196), tps=20506, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:54:13,127 - root - INFO - Step 11800: lr=1.00E-05, loss= 1.1016 (max= 1.5196), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:54:13,128 - root - INFO - Step 11800: lr=1.00E-05, loss= 1.1016 (max= 1.5196), tps=20506, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:54:13,128 - root - INFO - Step 11800: lr=1.00E-05, loss= 1.1016 (max= 1.5196), tps=20506, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:54:13,128 - root - INFO - Step 11800: lr=1.00E-05, loss= 1.1016 (max= 1.5196), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:54:13,128 - root - INFO - Step 11800: lr=1.00E-05, loss= 1.1016 (max= 1.5196), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:54:13,128 - root - INFO - Step 11800: lr=1.00E-05, loss= 1.1016 (max= 1.5196), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:54:13,128 - root - INFO - Step 11800: lr=1.00E-05, loss= 1.1016 (max= 1.5196), tps=20506, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:54:44,996 - root - INFO - Step 11810: lr=1.00E-05, loss= 1.1051 (max= 1.5007), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:54:44,996 - root - INFO - Step 11810: lr=1.00E-05, loss= 1.1051 (max= 1.5007), 
tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:54:44,996 - root - INFO - Step 11810: lr=1.00E-05, loss= 1.1051 (max= 1.5007), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:54:44,996 - root - INFO - Step 11810: lr=1.00E-05, loss= 1.1051 (max= 1.5007), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:54:44,996 - root - INFO - Step 11810: lr=1.00E-05, loss= 1.1051 (max= 1.5007), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:54:44,996 - root - INFO - Step 11810: lr=1.00E-05, loss= 1.1051 (max= 1.5007), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:54:44,996 - root - INFO - Step 11810: lr=1.00E-05, loss= 1.1051 (max= 1.5007), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:54:44,996 - root - INFO - Step 11810: lr=1.00E-05, loss= 1.1051 (max= 1.5007), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:55:16,873 - root - INFO - Step 11820: lr=1.00E-05, loss= 1.1186 (max= 1.5411), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:55:16,873 - root - INFO - Step 11820: lr=1.00E-05, loss= 1.1186 (max= 1.5411), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:55:16,873 - root - INFO - Step 11820: lr=1.00E-05, loss= 1.1186 (max= 1.5411), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:55:16,873 - root - INFO - Step 11820: lr=1.00E-05, loss= 1.1186 (max= 1.5411), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:55:16,873 - root - INFO - Step 
11820: lr=1.00E-05, loss= 1.1186 (max= 1.5411), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:55:16,873 - root - INFO - Step 11820: lr=1.00E-05, loss= 1.1186 (max= 1.5411), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:55:16,873 - root - INFO - Step 11820: lr=1.00E-05, loss= 1.1186 (max= 1.5411), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:55:16,873 - root - INFO - Step 11820: lr=1.00E-05, loss= 1.1186 (max= 1.5411), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:55:48,756 - root - INFO - Step 11830: lr=1.00E-05, loss= 1.1452 (max= 1.5782), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:55:48,756 - root - INFO - Step 11830: lr=1.00E-05, loss= 1.1452 (max= 1.5782), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:55:48,756 - root - INFO - Step 11830: lr=1.00E-05, loss= 1.1452 (max= 1.5782), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:55:48,756 - root - INFO - Step 11830: lr=1.00E-05, loss= 1.1452 (max= 1.5782), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:55:48,756 - root - INFO - Step 11830: lr=1.00E-05, loss= 1.1452 (max= 1.5782), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:55:48,756 - root - INFO - Step 11830: lr=1.00E-05, loss= 1.1452 (max= 1.5782), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:55:48,756 - root - INFO - Step 11830: lr=1.00E-05, loss= 1.1452 (max= 1.5782), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 20:55:48,756 - root - INFO - Step 11830: lr=1.00E-05, loss= 1.1452 (max= 1.5782), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:56:20,663 - root - INFO - Step 11840: lr=1.00E-05, loss= 1.1172 (max= 1.6242), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:56:20,663 - root - INFO - Step 11840: lr=1.00E-05, loss= 1.1172 (max= 1.6242), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:56:20,663 - root - INFO - Step 11840: lr=1.00E-05, loss= 1.1172 (max= 1.6242), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:56:20,663 - root - INFO - Step 11840: lr=1.00E-05, loss= 1.1172 (max= 1.6242), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:56:20,663 - root - INFO - Step 11840: lr=1.00E-05, loss= 1.1172 (max= 1.6242), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:56:20,663 - root - INFO - Step 11840: lr=1.00E-05, loss= 1.1172 (max= 1.6242), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:56:20,663 - root - INFO - Step 11840: lr=1.00E-05, loss= 1.1172 (max= 1.6242), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:56:20,663 - root - INFO - Step 11840: lr=1.00E-05, loss= 1.1172 (max= 1.6242), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:56:52,563 - root - INFO - Step 11850: lr=1.00E-05, loss= 1.1010 (max= 1.5824), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:56:52,563 - root - INFO - Step 11850: lr=1.00E-05, loss= 1.1010 (max= 1.5824), tps=20546, mfu=42.81%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:56:52,563 - root - INFO - Step 11850: lr=1.00E-05, loss= 1.1010 (max= 1.5824), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:56:52,563 - root - INFO - Step 11850: lr=1.00E-05, loss= 1.1010 (max= 1.5824), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:56:52,563 - root - INFO - Step 11850: lr=1.00E-05, loss= 1.1010 (max= 1.5824), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:56:52,563 - root - INFO - Step 11850: lr=1.00E-05, loss= 1.1010 (max= 1.5824), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:56:52,563 - root - INFO - Step 11850: lr=1.00E-05, loss= 1.1010 (max= 1.5824), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:56:52,563 - root - INFO - Step 11850: lr=1.00E-05, loss= 1.1010 (max= 1.5824), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:57:24,446 - root - INFO - Step 11860: lr=1.00E-05, loss= 1.1069 (max= 1.5282), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:57:24,446 - root - INFO - Step 11860: lr=1.00E-05, loss= 1.1069 (max= 1.5282), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:57:24,446 - root - INFO - Step 11860: lr=1.00E-05, loss= 1.1069 (max= 1.5282), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:57:24,446 - root - INFO - Step 11860: lr=1.00E-05, loss= 1.1069 (max= 1.5282), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:57:24,446 - root - INFO - Step 11860: lr=1.00E-05, loss= 1.1069 
(max= 1.5282), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:57:24,446 - root - INFO - Step 11860: lr=1.00E-05, loss= 1.1069 (max= 1.5282), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:57:24,446 - root - INFO - Step 11860: lr=1.00E-05, loss= 1.1069 (max= 1.5282), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:57:24,446 - root - INFO - Step 11860: lr=1.00E-05, loss= 1.1069 (max= 1.5282), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:57:56,247 - root - INFO - Step 11870: lr=1.00E-05, loss= 1.1065 (max= 1.5290), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:57:56,247 - root - INFO - Step 11870: lr=1.00E-05, loss= 1.1065 (max= 1.5290), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:57:56,247 - root - INFO - Step 11870: lr=1.00E-05, loss= 1.1065 (max= 1.5290), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:57:56,247 - root - INFO - Step 11870: lr=1.00E-05, loss= 1.1065 (max= 1.5290), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:57:56,247 - root - INFO - Step 11870: lr=1.00E-05, loss= 1.1065 (max= 1.5290), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:57:56,247 - root - INFO - Step 11870: lr=1.00E-05, loss= 1.1065 (max= 1.5290), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:57:56,247 - root - INFO - Step 11870: lr=1.00E-05, loss= 1.1065 (max= 1.5290), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:57:56,247 - root 
- INFO - Step 11870: lr=1.00E-05, loss= 1.1065 (max= 1.5290), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:58:28,068 - root - INFO - Step 11880: lr=1.00E-05, loss= 1.1271 (max= 1.5550), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:58:28,068 - root - INFO - Step 11880: lr=1.00E-05, loss= 1.1271 (max= 1.5550), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:58:28,068 - root - INFO - Step 11880: lr=1.00E-05, loss= 1.1271 (max= 1.5550), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:58:28,068 - root - INFO - Step 11880: lr=1.00E-05, loss= 1.1271 (max= 1.5550), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:58:28,068 - root - INFO - Step 11880: lr=1.00E-05, loss= 1.1271 (max= 1.5550), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:58:28,068 - root - INFO - Step 11880: lr=1.00E-05, loss= 1.1271 (max= 1.5550), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:58:28,068 - root - INFO - Step 11880: lr=1.00E-05, loss= 1.1271 (max= 1.5550), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:58:28,069 - root - INFO - Step 11880: lr=1.00E-05, loss= 1.1271 (max= 1.5550), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:59:00,001 - root - INFO - Step 11890: lr=1.00E-05, loss= 1.1085 (max= 1.7167), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:59:00,001 - root - INFO - Step 11890: lr=1.00E-05, loss= 1.1085 (max= 1.7167), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 20:59:00,001 - root - INFO - Step 11890: lr=1.00E-05, loss= 1.1085 (max= 1.7167), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:59:00,001 - root - INFO - Step 11890: lr=1.00E-05, loss= 1.1085 (max= 1.7167), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:59:00,001 - root - INFO - Step 11890: lr=1.00E-05, loss= 1.1085 (max= 1.7167), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:59:00,001 - root - INFO - Step 11890: lr=1.00E-05, loss= 1.1085 (max= 1.7167), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:59:00,002 - root - INFO - Step 11890: lr=1.00E-05, loss= 1.1085 (max= 1.7167), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:59:00,002 - root - INFO - Step 11890: lr=1.00E-05, loss= 1.1085 (max= 1.7167), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:59:31,886 - root - INFO - Step 11900: lr=1.00E-05, loss= 1.1073 (max= 1.5146), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:59:31,886 - root - INFO - Step 11900: lr=1.00E-05, loss= 1.1073 (max= 1.5146), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:59:31,886 - root - INFO - Step 11900: lr=1.00E-05, loss= 1.1073 (max= 1.5146), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:59:31,886 - root - INFO - Step 11900: lr=1.00E-05, loss= 1.1073 (max= 1.5146), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:59:31,887 - root - INFO - Step 11900: lr=1.00E-05, loss= 1.1073 (max= 1.5146), tps=20556, mfu=42.83%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:59:31,887 - root - INFO - Step 11900: lr=1.00E-05, loss= 1.1073 (max= 1.5146), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:59:31,887 - root - INFO - Step 11900: lr=1.00E-05, loss= 1.1073 (max= 1.5146), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 20:59:31,887 - root - INFO - Step 11900: lr=1.00E-05, loss= 1.1073 (max= 1.5146), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:00:03,698 - root - INFO - Step 11910: lr=1.00E-05, loss= 1.1095 (max= 1.6977), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:00:03,698 - root - INFO - Step 11910: lr=1.00E-05, loss= 1.1095 (max= 1.6977), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:00:03,698 - root - INFO - Step 11910: lr=1.00E-05, loss= 1.1095 (max= 1.6977), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:00:03,698 - root - INFO - Step 11910: lr=1.00E-05, loss= 1.1095 (max= 1.6977), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:00:03,699 - root - INFO - Step 11910: lr=1.00E-05, loss= 1.1095 (max= 1.6977), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:00:03,699 - root - INFO - Step 11910: lr=1.00E-05, loss= 1.1095 (max= 1.6977), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:00:03,699 - root - INFO - Step 11910: lr=1.00E-05, loss= 1.1095 (max= 1.6977), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:00:03,699 - root - INFO - Step 11910: lr=1.00E-05, 
loss= 1.1095 (max= 1.6977), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:00:35,514 - root - INFO - Step 11920: lr=1.00E-05, loss= 1.1201 (max= 1.5451), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:00:35,514 - root - INFO - Step 11920: lr=1.00E-05, loss= 1.1201 (max= 1.5451), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:00:35,514 - root - INFO - Step 11920: lr=1.00E-05, loss= 1.1201 (max= 1.5451), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:00:35,514 - root - INFO - Step 11920: lr=1.00E-05, loss= 1.1201 (max= 1.5451), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:00:35,514 - root - INFO - Step 11920: lr=1.00E-05, loss= 1.1201 (max= 1.5451), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:00:35,514 - root - INFO - Step 11920: lr=1.00E-05, loss= 1.1201 (max= 1.5451), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:00:35,514 - root - INFO - Step 11920: lr=1.00E-05, loss= 1.1201 (max= 1.5451), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:00:35,515 - root - INFO - Step 11920: lr=1.00E-05, loss= 1.1201 (max= 1.5451), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:01:07,358 - root - INFO - Step 11930: lr=1.00E-05, loss= 1.1048 (max= 1.5126), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:01:07,358 - root - INFO - Step 11930: lr=1.00E-05, loss= 1.1048 (max= 1.5126), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
21:01:07,358 - root - INFO - Step 11930: lr=1.00E-05, loss= 1.1048 (max= 1.5126), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:01:07,358 - root - INFO - Step 11930: lr=1.00E-05, loss= 1.1048 (max= 1.5126), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:01:07,358 - root - INFO - Step 11930: lr=1.00E-05, loss= 1.1048 (max= 1.5126), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:01:07,358 - root - INFO - Step 11930: lr=1.00E-05, loss= 1.1048 (max= 1.5126), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:01:07,359 - root - INFO - Step 11930: lr=1.00E-05, loss= 1.1048 (max= 1.5126), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:01:07,359 - root - INFO - Step 11930: lr=1.00E-05, loss= 1.1048 (max= 1.5126), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:01:39,202 - root - INFO - Step 11940: lr=1.00E-05, loss= 1.1240 (max= 1.5347), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:01:39,202 - root - INFO - Step 11940: lr=1.00E-05, loss= 1.1240 (max= 1.5347), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:01:39,202 - root - INFO - Step 11940: lr=1.00E-05, loss= 1.1240 (max= 1.5347), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:01:39,202 - root - INFO - Step 11940: lr=1.00E-05, loss= 1.1240 (max= 1.5347), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:01:39,202 - root - INFO - Step 11940: lr=1.00E-05, loss= 1.1240 (max= 1.5347), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:01:39,202 - root - INFO - Step 11940: lr=1.00E-05, loss= 1.1240 (max= 1.5347), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:01:39,202 - root - INFO - Step 11940: lr=1.00E-05, loss= 1.1240 (max= 1.5347), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:01:39,202 - root - INFO - Step 11940: lr=1.00E-05, loss= 1.1240 (max= 1.5347), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:02:11,102 - root - INFO - Step 11950: lr=1.00E-05, loss= 1.1034 (max= 1.5575), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:02:11,102 - root - INFO - Step 11950: lr=1.00E-05, loss= 1.1034 (max= 1.5575), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:02:11,102 - root - INFO - Step 11950: lr=1.00E-05, loss= 1.1034 (max= 1.5575), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:02:11,102 - root - INFO - Step 11950: lr=1.00E-05, loss= 1.1034 (max= 1.5575), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:02:11,102 - root - INFO - Step 11950: lr=1.00E-05, loss= 1.1034 (max= 1.5575), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:02:11,102 - root - INFO - Step 11950: lr=1.00E-05, loss= 1.1034 (max= 1.5575), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:02:11,102 - root - INFO - Step 11950: lr=1.00E-05, loss= 1.1034 (max= 1.5575), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:02:11,102 - root - INFO - Step 11950: lr=1.00E-05, loss= 1.1034 (max= 1.5575), 
tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:02:42,929 - root - INFO - Step 11960: lr=1.00E-05, loss= 1.0950 (max= 1.5323), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:02:42,929 - root - INFO - Step 11960: lr=1.00E-05, loss= 1.0950 (max= 1.5323), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:02:42,929 - root - INFO - Step 11960: lr=1.00E-05, loss= 1.0950 (max= 1.5323), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:02:42,929 - root - INFO - Step 11960: lr=1.00E-05, loss= 1.0950 (max= 1.5323), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:02:42,929 - root - INFO - Step 11960: lr=1.00E-05, loss= 1.0950 (max= 1.5323), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:02:42,929 - root - INFO - Step 11960: lr=1.00E-05, loss= 1.0950 (max= 1.5323), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:02:42,929 - root - INFO - Step 11960: lr=1.00E-05, loss= 1.0950 (max= 1.5323), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:02:42,929 - root - INFO - Step 11960: lr=1.00E-05, loss= 1.0950 (max= 1.5323), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:03:14,810 - root - INFO - Step 11970: lr=1.00E-05, loss= 1.1117 (max= 1.6080), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:03:14,810 - root - INFO - Step 11970: lr=1.00E-05, loss= 1.1117 (max= 1.6080), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:03:14,810 - root - INFO - Step 
11970: lr=1.00E-05, loss= 1.1117 (max= 1.6080), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:03:14,810 - root - INFO - Step 11970: lr=1.00E-05, loss= 1.1117 (max= 1.6080), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:03:14,810 - root - INFO - Step 11970: lr=1.00E-05, loss= 1.1117 (max= 1.6080), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:03:14,810 - root - INFO - Step 11970: lr=1.00E-05, loss= 1.1117 (max= 1.6080), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:03:14,810 - root - INFO - Step 11970: lr=1.00E-05, loss= 1.1117 (max= 1.6080), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:03:14,810 - root - INFO - Step 11970: lr=1.00E-05, loss= 1.1117 (max= 1.6080), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:03:46,576 - root - INFO - Step 11980: lr=1.00E-05, loss= 1.1174 (max= 1.6970), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:03:46,576 - root - INFO - Step 11980: lr=1.00E-05, loss= 1.1174 (max= 1.6970), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:03:46,576 - root - INFO - Step 11980: lr=1.00E-05, loss= 1.1174 (max= 1.6970), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:03:46,576 - root - INFO - Step 11980: lr=1.00E-05, loss= 1.1174 (max= 1.6970), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:03:46,576 - root - INFO - Step 11980: lr=1.00E-05, loss= 1.1174 (max= 1.6970), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 21:03:46,576 - root - INFO - Step 11980: lr=1.00E-05, loss= 1.1174 (max= 1.6970), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:03:46,576 - root - INFO - Step 11980: lr=1.00E-05, loss= 1.1174 (max= 1.6970), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:03:46,577 - root - INFO - Step 11980: lr=1.00E-05, loss= 1.1174 (max= 1.6970), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:04:18,496 - root - INFO - Step 11990: lr=1.00E-05, loss= 1.1006 (max= 1.5913), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:04:18,496 - root - INFO - Step 11990: lr=1.00E-05, loss= 1.1006 (max= 1.5913), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:04:18,497 - root - INFO - Step 11990: lr=1.00E-05, loss= 1.1006 (max= 1.5913), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:04:18,497 - root - INFO - Step 11990: lr=1.00E-05, loss= 1.1006 (max= 1.5913), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:04:18,497 - root - INFO - Step 11990: lr=1.00E-05, loss= 1.1006 (max= 1.5913), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:04:18,497 - root - INFO - Step 11990: lr=1.00E-05, loss= 1.1006 (max= 1.5913), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:04:18,497 - root - INFO - Step 11990: lr=1.00E-05, loss= 1.1006 (max= 1.5913), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:04:18,497 - root - INFO - Step 11990: lr=1.00E-05, loss= 1.1006 (max= 1.5913), tps=20533, mfu=42.78%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-pt/checkpoints/dataloader/step-12000 +Dataset successfully saved to jobs/munin-7b-open-pt/checkpoints/dataloader/step-12000! Save time: 4.54196310043335 +2025-10-25 21:04:50,423 - root - INFO - Step 12000: lr=1.00E-05, loss= 1.0822 (max= 1.5207), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:04:50,423 - root - INFO - Step 12000: lr=1.00E-05, loss= 1.0822 (max= 1.5207), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:04:50,423 - root - INFO - Saving a full checkpoint at step 12000 +2025-10-25 21:04:50,424 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 21:04:50,423 - root - INFO - Step 12000: lr=1.00E-05, loss= 1.0822 (max= 1.5207), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:04:50,424 - root - INFO - Saving a full checkpoint at step 12000 +2025-10-25 21:04:50,424 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 21:04:50,424 - root - INFO - Saving a full checkpoint at step 12000 +2025-10-25 21:04:50,424 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 21:04:50,424 - root - INFO - Step 12000: lr=1.00E-05, loss= 1.0822 (max= 1.5207), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:04:50,424 - root - INFO - Step 12000: lr=1.00E-05, loss= 1.0822 (max= 1.5207), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:04:50,424 - root - INFO - Step 12000: lr=1.00E-05, loss= 1.0822 (max= 1.5207), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:04:50,424 - root - INFO - Step 12000: lr=1.00E-05, loss= 1.0822 (max= 1.5207), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:04:50,424 - root - INFO - Step 12000: lr=1.00E-05, loss= 1.0822 (max= 1.5207), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:04:50,424 - root - INFO - Saving a full checkpoint at step 12000 +2025-10-25 21:04:50,424 - root - INFO - Saving a full checkpoint at step 12000 +2025-10-25 21:04:50,424 - root - INFO - Saving a full checkpoint at step 12000 +2025-10-25 21:04:50,424 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 21:04:50,424 - root - INFO - Saving a full checkpoint at step 12000 +2025-10-25 21:04:50,424 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 21:04:50,424 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 21:04:50,424 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 21:04:50,424 - root - INFO - Saving a full checkpoint at step 12000 +2025-10-25 21:04:50,424 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 21:05:04,001 - root - INFO - Finished saving the checkpoint in 13.58 seconds +2025-10-25 21:05:04,009 - root - INFO - Finished saving the checkpoint in 13.59 seconds +2025-10-25 21:05:04,009 - root - INFO - Finished saving the checkpoint in 13.59 seconds +2025-10-25 21:05:04,009 - root - INFO - Finished saving the checkpoint in 13.59 seconds +2025-10-25 21:05:04,010 - root - INFO - Finished saving the checkpoint in 13.59 seconds +2025-10-25 21:05:04,010 - 
root - INFO - Finished saving the checkpoint in 13.59 seconds +2025-10-25 21:05:04,010 - root - INFO - Finished saving the checkpoint in 13.59 seconds +2025-10-25 21:05:04,011 - root - INFO - Finished saving the checkpoint in 13.59 seconds +2025-10-25 21:05:35,804 - root - INFO - Step 12010: lr=1.00E-05, loss= 1.0963 (max= 1.5349), tps=14443, mfu=30.09%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:05:35,804 - root - INFO - Step 12010: lr=1.00E-05, loss= 1.0963 (max= 1.5349), tps=14443, mfu=30.09%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:05:35,804 - root - INFO - Step 12010: lr=1.00E-05, loss= 1.0963 (max= 1.5349), tps=14443, mfu=30.09%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:05:35,804 - root - INFO - Step 12010: lr=1.00E-05, loss= 1.0963 (max= 1.5349), tps=14443, mfu=30.09%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:05:35,804 - root - INFO - Step 12010: lr=1.00E-05, loss= 1.0963 (max= 1.5349), tps=14443, mfu=30.09%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:05:35,804 - root - INFO - Step 12010: lr=1.00E-05, loss= 1.0963 (max= 1.5349), tps=14443, mfu=30.09%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:05:35,804 - root - INFO - Step 12010: lr=1.00E-05, loss= 1.0963 (max= 1.5349), tps=14443, mfu=30.09%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:05:35,805 - root - INFO - Step 12010: lr=1.00E-05, loss= 1.0963 (max= 1.5349), tps=14442, mfu=30.09%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:06:07,717 - root - INFO - Step 12020: lr=1.00E-05, loss= 1.1329 (max= 1.6129), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:06:07,717 - root - INFO - Step 12020: lr=1.00E-05, loss= 1.1329 (max= 
1.6129), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:06:07,717 - root - INFO - Step 12020: lr=1.00E-05, loss= 1.1329 (max= 1.6129), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:06:07,717 - root - INFO - Step 12020: lr=1.00E-05, loss= 1.1329 (max= 1.6129), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:06:07,717 - root - INFO - Step 12020: lr=1.00E-05, loss= 1.1329 (max= 1.6129), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:06:07,717 - root - INFO - Step 12020: lr=1.00E-05, loss= 1.1329 (max= 1.6129), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:06:07,717 - root - INFO - Step 12020: lr=1.00E-05, loss= 1.1329 (max= 1.6129), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:06:07,717 - root - INFO - Step 12020: lr=1.00E-05, loss= 1.1329 (max= 1.6129), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:06:39,532 - root - INFO - Step 12030: lr=1.00E-05, loss= 1.1076 (max= 1.5654), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:06:39,532 - root - INFO - Step 12030: lr=1.00E-05, loss= 1.1076 (max= 1.5654), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:06:39,532 - root - INFO - Step 12030: lr=1.00E-05, loss= 1.1076 (max= 1.5654), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:06:39,532 - root - INFO - Step 12030: lr=1.00E-05, loss= 1.1076 (max= 1.5654), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:06:39,532 - root - INFO 
- Step 12030: lr=1.00E-05, loss= 1.1076 (max= 1.5654), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:06:39,533 - root - INFO - Step 12030: lr=1.00E-05, loss= 1.1076 (max= 1.5654), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:06:39,533 - root - INFO - Step 12030: lr=1.00E-05, loss= 1.1076 (max= 1.5654), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:06:39,533 - root - INFO - Step 12030: lr=1.00E-05, loss= 1.1076 (max= 1.5654), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:07:11,439 - root - INFO - Step 12040: lr=1.00E-05, loss= 1.1178 (max= 1.5907), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:07:11,439 - root - INFO - Step 12040: lr=1.00E-05, loss= 1.1178 (max= 1.5907), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:07:11,439 - root - INFO - Step 12040: lr=1.00E-05, loss= 1.1178 (max= 1.5907), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:07:11,439 - root - INFO - Step 12040: lr=1.00E-05, loss= 1.1178 (max= 1.5907), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:07:11,439 - root - INFO - Step 12040: lr=1.00E-05, loss= 1.1178 (max= 1.5907), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:07:11,439 - root - INFO - Step 12040: lr=1.00E-05, loss= 1.1178 (max= 1.5907), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:07:11,439 - root - INFO - Step 12040: lr=1.00E-05, loss= 1.1178 (max= 1.5907), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 
0.01%) +2025-10-25 21:07:11,439 - root - INFO - Step 12040: lr=1.00E-05, loss= 1.1178 (max= 1.5907), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:07:43,313 - root - INFO - Step 12050: lr=1.00E-05, loss= 1.1196 (max= 1.5216), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:07:43,313 - root - INFO - Step 12050: lr=1.00E-05, loss= 1.1196 (max= 1.5216), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:07:43,313 - root - INFO - Step 12050: lr=1.00E-05, loss= 1.1196 (max= 1.5216), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:07:43,313 - root - INFO - Step 12050: lr=1.00E-05, loss= 1.1196 (max= 1.5216), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:07:43,313 - root - INFO - Step 12050: lr=1.00E-05, loss= 1.1196 (max= 1.5216), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:07:43,313 - root - INFO - Step 12050: lr=1.00E-05, loss= 1.1196 (max= 1.5216), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:07:43,313 - root - INFO - Step 12050: lr=1.00E-05, loss= 1.1196 (max= 1.5216), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:07:43,313 - root - INFO - Step 12050: lr=1.00E-05, loss= 1.1196 (max= 1.5216), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:08:15,122 - root - INFO - Step 12060: lr=1.00E-05, loss= 1.1136 (max= 1.5710), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:08:15,122 - root - INFO - Step 12060: lr=1.00E-05, loss= 1.1136 (max= 1.5710), tps=20605, mfu=42.93%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:08:15,122 - root - INFO - Step 12060: lr=1.00E-05, loss= 1.1136 (max= 1.5710), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:08:15,122 - root - INFO - Step 12060: lr=1.00E-05, loss= 1.1136 (max= 1.5710), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:08:15,122 - root - INFO - Step 12060: lr=1.00E-05, loss= 1.1136 (max= 1.5710), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:08:15,122 - root - INFO - Step 12060: lr=1.00E-05, loss= 1.1136 (max= 1.5710), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:08:15,122 - root - INFO - Step 12060: lr=1.00E-05, loss= 1.1136 (max= 1.5710), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:08:15,122 - root - INFO - Step 12060: lr=1.00E-05, loss= 1.1136 (max= 1.5710), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:08:47,044 - root - INFO - Step 12070: lr=1.00E-05, loss= 1.1011 (max= 1.4764), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:08:47,044 - root - INFO - Step 12070: lr=1.00E-05, loss= 1.1011 (max= 1.4764), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:08:47,044 - root - INFO - Step 12070: lr=1.00E-05, loss= 1.1011 (max= 1.4764), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:08:47,044 - root - INFO - Step 12070: lr=1.00E-05, loss= 1.1011 (max= 1.4764), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:08:47,044 - root - INFO - Step 12070: lr=1.00E-05, loss= 1.1011 
(max= 1.4764), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:08:47,044 - root - INFO - Step 12070: lr=1.00E-05, loss= 1.1011 (max= 1.4764), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:08:47,044 - root - INFO - Step 12070: lr=1.00E-05, loss= 1.1011 (max= 1.4764), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:08:47,044 - root - INFO - Step 12070: lr=1.00E-05, loss= 1.1011 (max= 1.4764), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:09:18,843 - root - INFO - Step 12080: lr=1.00E-05, loss= 1.0914 (max= 1.6175), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:09:18,843 - root - INFO - Step 12080: lr=1.00E-05, loss= 1.0914 (max= 1.6175), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:09:18,843 - root - INFO - Step 12080: lr=1.00E-05, loss= 1.0914 (max= 1.6175), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:09:18,843 - root - INFO - Step 12080: lr=1.00E-05, loss= 1.0914 (max= 1.6175), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:09:18,843 - root - INFO - Step 12080: lr=1.00E-05, loss= 1.0914 (max= 1.6175), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:09:18,843 - root - INFO - Step 12080: lr=1.00E-05, loss= 1.0914 (max= 1.6175), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:09:18,843 - root - INFO - Step 12080: lr=1.00E-05, loss= 1.0914 (max= 1.6175), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:09:18,843 - root 
- INFO - Step 12080: lr=1.00E-05, loss= 1.0914 (max= 1.6175), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:09:50,673 - root - INFO - Step 12090: lr=1.00E-05, loss= 1.0933 (max= 1.6978), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:09:50,673 - root - INFO - Step 12090: lr=1.00E-05, loss= 1.0933 (max= 1.6978), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:09:50,673 - root - INFO - Step 12090: lr=1.00E-05, loss= 1.0933 (max= 1.6978), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:09:50,673 - root - INFO - Step 12090: lr=1.00E-05, loss= 1.0933 (max= 1.6978), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:09:50,673 - root - INFO - Step 12090: lr=1.00E-05, loss= 1.0933 (max= 1.6978), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:09:50,673 - root - INFO - Step 12090: lr=1.00E-05, loss= 1.0933 (max= 1.6978), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:09:50,673 - root - INFO - Step 12090: lr=1.00E-05, loss= 1.0933 (max= 1.6978), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:09:50,673 - root - INFO - Step 12090: lr=1.00E-05, loss= 1.0933 (max= 1.6978), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:10:22,520 - root - INFO - Step 12100: lr=1.00E-05, loss= 1.0855 (max= 1.5740), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:10:22,520 - root - INFO - Step 12100: lr=1.00E-05, loss= 1.0855 (max= 1.5740), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 21:10:22,520 - root - INFO - Step 12100: lr=1.00E-05, loss= 1.0855 (max= 1.5740), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:10:22,520 - root - INFO - Step 12100: lr=1.00E-05, loss= 1.0855 (max= 1.5740), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:10:22,520 - root - INFO - Step 12100: lr=1.00E-05, loss= 1.0855 (max= 1.5740), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:10:22,520 - root - INFO - Step 12100: lr=1.00E-05, loss= 1.0855 (max= 1.5740), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:10:22,521 - root - INFO - Step 12100: lr=1.00E-05, loss= 1.0855 (max= 1.5740), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:10:22,521 - root - INFO - Step 12100: lr=1.00E-05, loss= 1.0855 (max= 1.5740), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:10:54,409 - root - INFO - Step 12110: lr=1.00E-05, loss= 1.0961 (max= 1.4797), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:10:54,409 - root - INFO - Step 12110: lr=1.00E-05, loss= 1.0961 (max= 1.4797), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:10:54,409 - root - INFO - Step 12110: lr=1.00E-05, loss= 1.0961 (max= 1.4797), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:10:54,409 - root - INFO - Step 12110: lr=1.00E-05, loss= 1.0961 (max= 1.4797), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:10:54,409 - root - INFO - Step 12110: lr=1.00E-05, loss= 1.0961 (max= 1.4797), tps=20553, mfu=42.82%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:10:54,410 - root - INFO - Step 12110: lr=1.00E-05, loss= 1.0961 (max= 1.4797), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:10:54,410 - root - INFO - Step 12110: lr=1.00E-05, loss= 1.0961 (max= 1.4797), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:10:54,410 - root - INFO - Step 12110: lr=1.00E-05, loss= 1.0961 (max= 1.4797), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:11:26,233 - root - INFO - Step 12120: lr=1.00E-05, loss= 1.1193 (max= 1.5683), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:11:26,233 - root - INFO - Step 12120: lr=1.00E-05, loss= 1.1193 (max= 1.5683), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:11:26,233 - root - INFO - Step 12120: lr=1.00E-05, loss= 1.1193 (max= 1.5683), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:11:26,233 - root - INFO - Step 12120: lr=1.00E-05, loss= 1.1193 (max= 1.5683), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:11:26,233 - root - INFO - Step 12120: lr=1.00E-05, loss= 1.1193 (max= 1.5683), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:11:26,233 - root - INFO - Step 12120: lr=1.00E-05, loss= 1.1193 (max= 1.5683), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:11:26,233 - root - INFO - Step 12120: lr=1.00E-05, loss= 1.1193 (max= 1.5683), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:11:26,233 - root - INFO - Step 12120: lr=1.00E-05, 
loss= 1.1193 (max= 1.5683), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:11:58,149 - root - INFO - Step 12130: lr=1.00E-05, loss= 1.1081 (max= 1.6068), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:11:58,150 - root - INFO - Step 12130: lr=1.00E-05, loss= 1.1081 (max= 1.6068), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:11:58,150 - root - INFO - Step 12130: lr=1.00E-05, loss= 1.1081 (max= 1.6068), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:11:58,150 - root - INFO - Step 12130: lr=1.00E-05, loss= 1.1081 (max= 1.6068), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:11:58,150 - root - INFO - Step 12130: lr=1.00E-05, loss= 1.1081 (max= 1.6068), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:11:58,150 - root - INFO - Step 12130: lr=1.00E-05, loss= 1.1081 (max= 1.6068), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:11:58,150 - root - INFO - Step 12130: lr=1.00E-05, loss= 1.1081 (max= 1.6068), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:11:58,150 - root - INFO - Step 12130: lr=1.00E-05, loss= 1.1081 (max= 1.6068), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:12:29,990 - root - INFO - Step 12140: lr=1.00E-05, loss= 1.0905 (max= 1.4392), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:12:29,990 - root - INFO - Step 12140: lr=1.00E-05, loss= 1.0905 (max= 1.4392), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
21:12:29,990 - root - INFO - Step 12140: lr=1.00E-05, loss= 1.0905 (max= 1.4392), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:12:29,990 - root - INFO - Step 12140: lr=1.00E-05, loss= 1.0905 (max= 1.4392), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:12:29,990 - root - INFO - Step 12140: lr=1.00E-05, loss= 1.0905 (max= 1.4392), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:12:29,990 - root - INFO - Step 12140: lr=1.00E-05, loss= 1.0905 (max= 1.4392), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:12:29,990 - root - INFO - Step 12140: lr=1.00E-05, loss= 1.0905 (max= 1.4392), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:12:29,990 - root - INFO - Step 12140: lr=1.00E-05, loss= 1.0905 (max= 1.4392), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:13:01,840 - root - INFO - Step 12150: lr=1.00E-05, loss= 1.1021 (max= 1.6817), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:13:01,840 - root - INFO - Step 12150: lr=1.00E-05, loss= 1.1021 (max= 1.6817), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:13:01,840 - root - INFO - Step 12150: lr=1.00E-05, loss= 1.1021 (max= 1.6817), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:13:01,840 - root - INFO - Step 12150: lr=1.00E-05, loss= 1.1021 (max= 1.6817), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:13:01,840 - root - INFO - Step 12150: lr=1.00E-05, loss= 1.1021 (max= 1.6817), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:13:01,840 - root - INFO - Step 12150: lr=1.00E-05, loss= 1.1021 (max= 1.6817), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:13:01,840 - root - INFO - Step 12150: lr=1.00E-05, loss= 1.1021 (max= 1.6817), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:13:01,840 - root - INFO - Step 12150: lr=1.00E-05, loss= 1.1021 (max= 1.6817), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:13:33,685 - root - INFO - Step 12160: lr=1.00E-05, loss= 1.1114 (max= 1.6070), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:13:33,686 - root - INFO - Step 12160: lr=1.00E-05, loss= 1.1114 (max= 1.6070), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:13:33,686 - root - INFO - Step 12160: lr=1.00E-05, loss= 1.1114 (max= 1.6070), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:13:33,686 - root - INFO - Step 12160: lr=1.00E-05, loss= 1.1114 (max= 1.6070), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:13:33,686 - root - INFO - Step 12160: lr=1.00E-05, loss= 1.1114 (max= 1.6070), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:13:33,686 - root - INFO - Step 12160: lr=1.00E-05, loss= 1.1114 (max= 1.6070), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:13:33,686 - root - INFO - Step 12160: lr=1.00E-05, loss= 1.1114 (max= 1.6070), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:13:33,686 - root - INFO - Step 12160: lr=1.00E-05, loss= 1.1114 (max= 1.6070), 
tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:14:05,528 - root - INFO - Step 12170: lr=1.00E-05, loss= 1.1144 (max= 1.5546), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:14:05,528 - root - INFO - Step 12170: lr=1.00E-05, loss= 1.1144 (max= 1.5546), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:14:05,528 - root - INFO - Step 12170: lr=1.00E-05, loss= 1.1144 (max= 1.5546), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:14:05,528 - root - INFO - Step 12170: lr=1.00E-05, loss= 1.1144 (max= 1.5546), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:14:05,528 - root - INFO - Step 12170: lr=1.00E-05, loss= 1.1144 (max= 1.5546), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:14:05,528 - root - INFO - Step 12170: lr=1.00E-05, loss= 1.1144 (max= 1.5546), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:14:05,528 - root - INFO - Step 12170: lr=1.00E-05, loss= 1.1144 (max= 1.5546), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:14:05,528 - root - INFO - Step 12170: lr=1.00E-05, loss= 1.1144 (max= 1.5546), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:14:37,428 - root - INFO - Step 12180: lr=1.00E-05, loss= 1.1072 (max= 1.5997), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:14:37,428 - root - INFO - Step 12180: lr=1.00E-05, loss= 1.1072 (max= 1.5997), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:14:37,428 - root - INFO - Step 
12180: lr=1.00E-05, loss= 1.1072 (max= 1.5997), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:14:37,428 - root - INFO - Step 12180: lr=1.00E-05, loss= 1.1072 (max= 1.5997), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:14:37,428 - root - INFO - Step 12180: lr=1.00E-05, loss= 1.1072 (max= 1.5997), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:14:37,428 - root - INFO - Step 12180: lr=1.00E-05, loss= 1.1072 (max= 1.5997), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:14:37,428 - root - INFO - Step 12180: lr=1.00E-05, loss= 1.1072 (max= 1.5997), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:14:37,428 - root - INFO - Step 12180: lr=1.00E-05, loss= 1.1072 (max= 1.5997), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:15:09,296 - root - INFO - Step 12190: lr=1.00E-05, loss= 1.1097 (max= 1.6590), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:15:09,296 - root - INFO - Step 12190: lr=1.00E-05, loss= 1.1097 (max= 1.6590), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:15:09,296 - root - INFO - Step 12190: lr=1.00E-05, loss= 1.1097 (max= 1.6590), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:15:09,296 - root - INFO - Step 12190: lr=1.00E-05, loss= 1.1097 (max= 1.6590), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:15:09,296 - root - INFO - Step 12190: lr=1.00E-05, loss= 1.1097 (max= 1.6590), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 21:15:09,296 - root - INFO - Step 12190: lr=1.00E-05, loss= 1.1097 (max= 1.6590), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:15:09,297 - root - INFO - Step 12190: lr=1.00E-05, loss= 1.1097 (max= 1.6590), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:15:09,297 - root - INFO - Step 12190: lr=1.00E-05, loss= 1.1097 (max= 1.6590), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:15:41,198 - root - INFO - Step 12200: lr=1.00E-05, loss= 1.1049 (max= 1.5419), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:15:41,198 - root - INFO - Step 12200: lr=1.00E-05, loss= 1.1049 (max= 1.5419), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:15:41,198 - root - INFO - Step 12200: lr=1.00E-05, loss= 1.1049 (max= 1.5419), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:15:41,198 - root - INFO - Step 12200: lr=1.00E-05, loss= 1.1049 (max= 1.5419), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:15:41,198 - root - INFO - Step 12200: lr=1.00E-05, loss= 1.1049 (max= 1.5419), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:15:41,198 - root - INFO - Step 12200: lr=1.00E-05, loss= 1.1049 (max= 1.5419), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:15:41,198 - root - INFO - Step 12200: lr=1.00E-05, loss= 1.1049 (max= 1.5419), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:15:41,198 - root - INFO - Step 12200: lr=1.00E-05, loss= 1.1049 (max= 1.5419), tps=20545, mfu=42.81%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:16:13,046 - root - INFO - Step 12210: lr=1.00E-05, loss= 1.1068 (max= 1.6876), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:16:13,046 - root - INFO - Step 12210: lr=1.00E-05, loss= 1.1068 (max= 1.6876), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:16:13,046 - root - INFO - Step 12210: lr=1.00E-05, loss= 1.1068 (max= 1.6876), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:16:13,046 - root - INFO - Step 12210: lr=1.00E-05, loss= 1.1068 (max= 1.6876), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:16:13,046 - root - INFO - Step 12210: lr=1.00E-05, loss= 1.1068 (max= 1.6876), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:16:13,046 - root - INFO - Step 12210: lr=1.00E-05, loss= 1.1068 (max= 1.6876), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:16:13,046 - root - INFO - Step 12210: lr=1.00E-05, loss= 1.1068 (max= 1.6876), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:16:13,046 - root - INFO - Step 12210: lr=1.00E-05, loss= 1.1068 (max= 1.6876), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:16:44,915 - root - INFO - Step 12220: lr=1.00E-05, loss= 1.0900 (max= 1.5012), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:16:44,915 - root - INFO - Step 12220: lr=1.00E-05, loss= 1.0900 (max= 1.5012), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:16:44,915 - root - INFO - Step 12220: lr=1.00E-05, loss= 1.0900 
(max= 1.5012), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:16:44,915 - root - INFO - Step 12220: lr=1.00E-05, loss= 1.0900 (max= 1.5012), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:16:44,915 - root - INFO - Step 12220: lr=1.00E-05, loss= 1.0900 (max= 1.5012), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:16:44,915 - root - INFO - Step 12220: lr=1.00E-05, loss= 1.0900 (max= 1.5012), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:16:44,915 - root - INFO - Step 12220: lr=1.00E-05, loss= 1.0900 (max= 1.5012), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:16:44,915 - root - INFO - Step 12220: lr=1.00E-05, loss= 1.0900 (max= 1.5012), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:17:16,785 - root - INFO - Step 12230: lr=1.00E-05, loss= 1.0967 (max= 1.4921), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:17:16,785 - root - INFO - Step 12230: lr=1.00E-05, loss= 1.0967 (max= 1.4921), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:17:16,785 - root - INFO - Step 12230: lr=1.00E-05, loss= 1.0967 (max= 1.4921), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:17:16,785 - root - INFO - Step 12230: lr=1.00E-05, loss= 1.0967 (max= 1.4921), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:17:16,785 - root - INFO - Step 12230: lr=1.00E-05, loss= 1.0967 (max= 1.4921), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:17:16,786 - root 
- INFO - Step 12230: lr=1.00E-05, loss= 1.0967 (max= 1.4921), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:17:16,786 - root - INFO - Step 12230: lr=1.00E-05, loss= 1.0967 (max= 1.4921), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:17:16,786 - root - INFO - Step 12230: lr=1.00E-05, loss= 1.0967 (max= 1.4921), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:17:48,552 - root - INFO - Step 12240: lr=1.00E-05, loss= 1.1089 (max= 1.4608), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:17:48,552 - root - INFO - Step 12240: lr=1.00E-05, loss= 1.1089 (max= 1.4608), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:17:48,552 - root - INFO - Step 12240: lr=1.00E-05, loss= 1.1089 (max= 1.4608), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:17:48,552 - root - INFO - Step 12240: lr=1.00E-05, loss= 1.1089 (max= 1.4608), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:17:48,552 - root - INFO - Step 12240: lr=1.00E-05, loss= 1.1089 (max= 1.4608), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:17:48,552 - root - INFO - Step 12240: lr=1.00E-05, loss= 1.1089 (max= 1.4608), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:17:48,552 - root - INFO - Step 12240: lr=1.00E-05, loss= 1.1089 (max= 1.4608), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:17:48,552 - root - INFO - Step 12240: lr=1.00E-05, loss= 1.1089 (max= 1.4608), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 21:18:20,405 - root - INFO - Step 12250: lr=1.00E-05, loss= 1.0928 (max= 1.4535), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:18:20,405 - root - INFO - Step 12250: lr=1.00E-05, loss= 1.0928 (max= 1.4535), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:18:20,405 - root - INFO - Step 12250: lr=1.00E-05, loss= 1.0928 (max= 1.4535), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:18:20,405 - root - INFO - Step 12250: lr=1.00E-05, loss= 1.0928 (max= 1.4535), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:18:20,405 - root - INFO - Step 12250: lr=1.00E-05, loss= 1.0928 (max= 1.4535), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:18:20,405 - root - INFO - Step 12250: lr=1.00E-05, loss= 1.0928 (max= 1.4535), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:18:20,405 - root - INFO - Step 12250: lr=1.00E-05, loss= 1.0928 (max= 1.4535), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:18:20,405 - root - INFO - Step 12250: lr=1.00E-05, loss= 1.0928 (max= 1.4535), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:18:52,340 - root - INFO - Step 12260: lr=1.00E-05, loss= 1.1211 (max= 1.5207), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:18:52,340 - root - INFO - Step 12260: lr=1.00E-05, loss= 1.1211 (max= 1.5207), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:18:52,340 - root - INFO - Step 12260: lr=1.00E-05, loss= 1.1211 (max= 1.5207), tps=20523, mfu=42.76%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:18:52,340 - root - INFO - Step 12260: lr=1.00E-05, loss= 1.1211 (max= 1.5207), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:18:52,340 - root - INFO - Step 12260: lr=1.00E-05, loss= 1.1211 (max= 1.5207), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:18:52,340 - root - INFO - Step 12260: lr=1.00E-05, loss= 1.1211 (max= 1.5207), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:18:52,340 - root - INFO - Step 12260: lr=1.00E-05, loss= 1.1211 (max= 1.5207), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:18:52,340 - root - INFO - Step 12260: lr=1.00E-05, loss= 1.1211 (max= 1.5207), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:19:24,202 - root - INFO - Step 12270: lr=1.00E-05, loss= 1.1231 (max= 1.6106), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:19:24,202 - root - INFO - Step 12270: lr=1.00E-05, loss= 1.1231 (max= 1.6106), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:19:24,202 - root - INFO - Step 12270: lr=1.00E-05, loss= 1.1231 (max= 1.6106), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:19:24,203 - root - INFO - Step 12270: lr=1.00E-05, loss= 1.1231 (max= 1.6106), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:19:24,203 - root - INFO - Step 12270: lr=1.00E-05, loss= 1.1231 (max= 1.6106), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:19:24,203 - root - INFO - Step 12270: lr=1.00E-05, 
loss= 1.1231 (max= 1.6106), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:19:24,203 - root - INFO - Step 12270: lr=1.00E-05, loss= 1.1231 (max= 1.6106), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:19:24,203 - root - INFO - Step 12270: lr=1.00E-05, loss= 1.1231 (max= 1.6106), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:19:56,047 - root - INFO - Step 12280: lr=1.00E-05, loss= 1.1098 (max= 1.5726), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:19:56,047 - root - INFO - Step 12280: lr=1.00E-05, loss= 1.1098 (max= 1.5726), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:19:56,047 - root - INFO - Step 12280: lr=1.00E-05, loss= 1.1098 (max= 1.5726), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:19:56,047 - root - INFO - Step 12280: lr=1.00E-05, loss= 1.1098 (max= 1.5726), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:19:56,047 - root - INFO - Step 12280: lr=1.00E-05, loss= 1.1098 (max= 1.5726), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:19:56,047 - root - INFO - Step 12280: lr=1.00E-05, loss= 1.1098 (max= 1.5726), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:19:56,047 - root - INFO - Step 12280: lr=1.00E-05, loss= 1.1098 (max= 1.5726), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:19:56,047 - root - INFO - Step 12280: lr=1.00E-05, loss= 1.1098 (max= 1.5726), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
21:20:27,919 - root - INFO - Step 12290: lr=1.00E-05, loss= 1.0964 (max= 1.6157), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:20:27,919 - root - INFO - Step 12290: lr=1.00E-05, loss= 1.0964 (max= 1.6157), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:20:27,919 - root - INFO - Step 12290: lr=1.00E-05, loss= 1.0964 (max= 1.6157), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:20:27,919 - root - INFO - Step 12290: lr=1.00E-05, loss= 1.0964 (max= 1.6157), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:20:27,919 - root - INFO - Step 12290: lr=1.00E-05, loss= 1.0964 (max= 1.6157), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:20:27,919 - root - INFO - Step 12290: lr=1.00E-05, loss= 1.0964 (max= 1.6157), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:20:27,919 - root - INFO - Step 12290: lr=1.00E-05, loss= 1.0964 (max= 1.6157), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:20:27,919 - root - INFO - Step 12290: lr=1.00E-05, loss= 1.0964 (max= 1.6157), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:20:59,772 - root - INFO - Step 12300: lr=1.00E-05, loss= 1.0856 (max= 1.5462), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:20:59,772 - root - INFO - Step 12300: lr=1.00E-05, loss= 1.0856 (max= 1.5462), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:20:59,772 - root - INFO - Step 12300: lr=1.00E-05, loss= 1.0856 (max= 1.5462), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:20:59,772 - root - INFO - Step 12300: lr=1.00E-05, loss= 1.0856 (max= 1.5462), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:20:59,772 - root - INFO - Step 12300: lr=1.00E-05, loss= 1.0856 (max= 1.5462), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:20:59,772 - root - INFO - Step 12300: lr=1.00E-05, loss= 1.0856 (max= 1.5462), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:20:59,772 - root - INFO - Step 12300: lr=1.00E-05, loss= 1.0856 (max= 1.5462), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:20:59,772 - root - INFO - Step 12300: lr=1.00E-05, loss= 1.0856 (max= 1.5462), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:21:31,649 - root - INFO - Step 12310: lr=1.00E-05, loss= 1.1115 (max= 1.5841), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:21:31,649 - root - INFO - Step 12310: lr=1.00E-05, loss= 1.1115 (max= 1.5841), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:21:31,649 - root - INFO - Step 12310: lr=1.00E-05, loss= 1.1115 (max= 1.5841), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:21:31,649 - root - INFO - Step 12310: lr=1.00E-05, loss= 1.1115 (max= 1.5841), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:21:31,649 - root - INFO - Step 12310: lr=1.00E-05, loss= 1.1115 (max= 1.5841), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:21:31,649 - root - INFO - Step 12310: lr=1.00E-05, loss= 1.1115 (max= 1.5841), 
tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:21:31,650 - root - INFO - Step 12310: lr=1.00E-05, loss= 1.1115 (max= 1.5841), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:21:31,650 - root - INFO - Step 12310: lr=1.00E-05, loss= 1.1115 (max= 1.5841), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:22:03,502 - root - INFO - Step 12320: lr=1.00E-05, loss= 1.1176 (max= 1.4978), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:22:03,502 - root - INFO - Step 12320: lr=1.00E-05, loss= 1.1176 (max= 1.4978), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:22:03,502 - root - INFO - Step 12320: lr=1.00E-05, loss= 1.1176 (max= 1.4978), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:22:03,502 - root - INFO - Step 12320: lr=1.00E-05, loss= 1.1176 (max= 1.4978), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:22:03,502 - root - INFO - Step 12320: lr=1.00E-05, loss= 1.1176 (max= 1.4978), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:22:03,502 - root - INFO - Step 12320: lr=1.00E-05, loss= 1.1176 (max= 1.4978), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:22:03,502 - root - INFO - Step 12320: lr=1.00E-05, loss= 1.1176 (max= 1.4978), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:22:03,502 - root - INFO - Step 12320: lr=1.00E-05, loss= 1.1176 (max= 1.4978), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:22:35,332 - root - INFO - Step 
12330: lr=1.00E-05, loss= 1.1134 (max= 1.5583), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:22:35,332 - root - INFO - Step 12330: lr=1.00E-05, loss= 1.1134 (max= 1.5583), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:22:35,332 - root - INFO - Step 12330: lr=1.00E-05, loss= 1.1134 (max= 1.5583), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:22:35,332 - root - INFO - Step 12330: lr=1.00E-05, loss= 1.1134 (max= 1.5583), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:22:35,332 - root - INFO - Step 12330: lr=1.00E-05, loss= 1.1134 (max= 1.5583), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:22:35,332 - root - INFO - Step 12330: lr=1.00E-05, loss= 1.1134 (max= 1.5583), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:22:35,332 - root - INFO - Step 12330: lr=1.00E-05, loss= 1.1134 (max= 1.5583), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:22:35,332 - root - INFO - Step 12330: lr=1.00E-05, loss= 1.1134 (max= 1.5583), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:23:07,183 - root - INFO - Step 12340: lr=1.00E-05, loss= 1.1210 (max= 1.5401), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:23:07,183 - root - INFO - Step 12340: lr=1.00E-05, loss= 1.1210 (max= 1.5401), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:23:07,184 - root - INFO - Step 12340: lr=1.00E-05, loss= 1.1210 (max= 1.5401), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 21:23:07,184 - root - INFO - Step 12340: lr=1.00E-05, loss= 1.1210 (max= 1.5401), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:23:07,184 - root - INFO - Step 12340: lr=1.00E-05, loss= 1.1210 (max= 1.5401), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:23:07,184 - root - INFO - Step 12340: lr=1.00E-05, loss= 1.1210 (max= 1.5401), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:23:07,184 - root - INFO - Step 12340: lr=1.00E-05, loss= 1.1210 (max= 1.5401), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:23:07,184 - root - INFO - Step 12340: lr=1.00E-05, loss= 1.1210 (max= 1.5401), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:23:39,007 - root - INFO - Step 12350: lr=1.00E-05, loss= 1.1024 (max= 1.5317), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:23:39,007 - root - INFO - Step 12350: lr=1.00E-05, loss= 1.1024 (max= 1.5317), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:23:39,007 - root - INFO - Step 12350: lr=1.00E-05, loss= 1.1024 (max= 1.5317), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:23:39,007 - root - INFO - Step 12350: lr=1.00E-05, loss= 1.1024 (max= 1.5317), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:23:39,007 - root - INFO - Step 12350: lr=1.00E-05, loss= 1.1024 (max= 1.5317), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:23:39,007 - root - INFO - Step 12350: lr=1.00E-05, loss= 1.1024 (max= 1.5317), tps=20596, mfu=42.91%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:23:39,008 - root - INFO - Step 12350: lr=1.00E-05, loss= 1.1024 (max= 1.5317), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:23:39,008 - root - INFO - Step 12350: lr=1.00E-05, loss= 1.1024 (max= 1.5317), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:24:10,854 - root - INFO - Step 12360: lr=1.00E-05, loss= 1.0987 (max= 1.5516), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:24:10,854 - root - INFO - Step 12360: lr=1.00E-05, loss= 1.0987 (max= 1.5516), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:24:10,854 - root - INFO - Step 12360: lr=1.00E-05, loss= 1.0987 (max= 1.5516), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:24:10,854 - root - INFO - Step 12360: lr=1.00E-05, loss= 1.0987 (max= 1.5516), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:24:10,854 - root - INFO - Step 12360: lr=1.00E-05, loss= 1.0987 (max= 1.5516), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:24:10,854 - root - INFO - Step 12360: lr=1.00E-05, loss= 1.0987 (max= 1.5516), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:24:10,854 - root - INFO - Step 12360: lr=1.00E-05, loss= 1.0987 (max= 1.5516), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:24:10,854 - root - INFO - Step 12360: lr=1.00E-05, loss= 1.0987 (max= 1.5516), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:24:42,749 - root - INFO - Step 12370: lr=1.00E-05, loss= 1.0986 
(max= 1.4699), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:24:42,749 - root - INFO - Step 12370: lr=1.00E-05, loss= 1.0986 (max= 1.4699), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:24:42,749 - root - INFO - Step 12370: lr=1.00E-05, loss= 1.0986 (max= 1.4699), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:24:42,750 - root - INFO - Step 12370: lr=1.00E-05, loss= 1.0986 (max= 1.4699), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:24:42,750 - root - INFO - Step 12370: lr=1.00E-05, loss= 1.0986 (max= 1.4699), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:24:42,750 - root - INFO - Step 12370: lr=1.00E-05, loss= 1.0986 (max= 1.4699), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:24:42,750 - root - INFO - Step 12370: lr=1.00E-05, loss= 1.0986 (max= 1.4699), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:24:42,750 - root - INFO - Step 12370: lr=1.00E-05, loss= 1.0986 (max= 1.4699), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:25:14,654 - root - INFO - Step 12380: lr=1.00E-05, loss= 1.1033 (max= 1.5005), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:25:14,654 - root - INFO - Step 12380: lr=1.00E-05, loss= 1.1033 (max= 1.5005), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:25:14,654 - root - INFO - Step 12380: lr=1.00E-05, loss= 1.1033 (max= 1.5005), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:25:14,654 - root 
- INFO - Step 12380: lr=1.00E-05, loss= 1.1033 (max= 1.5005), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:25:14,655 - root - INFO - Step 12380: lr=1.00E-05, loss= 1.1033 (max= 1.5005), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:25:14,655 - root - INFO - Step 12380: lr=1.00E-05, loss= 1.1033 (max= 1.5005), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:25:14,655 - root - INFO - Step 12380: lr=1.00E-05, loss= 1.1033 (max= 1.5005), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:25:14,655 - root - INFO - Step 12380: lr=1.00E-05, loss= 1.1033 (max= 1.5005), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:25:46,464 - root - INFO - Step 12390: lr=1.00E-05, loss= 1.0979 (max= 1.6414), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:25:46,464 - root - INFO - Step 12390: lr=1.00E-05, loss= 1.0979 (max= 1.6414), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:25:46,464 - root - INFO - Step 12390: lr=1.00E-05, loss= 1.0979 (max= 1.6414), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:25:46,464 - root - INFO - Step 12390: lr=1.00E-05, loss= 1.0979 (max= 1.6414), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:25:46,464 - root - INFO - Step 12390: lr=1.00E-05, loss= 1.0979 (max= 1.6414), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:25:46,464 - root - INFO - Step 12390: lr=1.00E-05, loss= 1.0979 (max= 1.6414), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 21:25:46,464 - root - INFO - Step 12390: lr=1.00E-05, loss= 1.0979 (max= 1.6414), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:25:46,464 - root - INFO - Step 12390: lr=1.00E-05, loss= 1.0979 (max= 1.6414), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:26:18,391 - root - INFO - Step 12400: lr=1.00E-05, loss= 1.1046 (max= 1.5472), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:26:18,391 - root - INFO - Step 12400: lr=1.00E-05, loss= 1.1046 (max= 1.5472), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:26:18,391 - root - INFO - Step 12400: lr=1.00E-05, loss= 1.1046 (max= 1.5472), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:26:18,391 - root - INFO - Step 12400: lr=1.00E-05, loss= 1.1046 (max= 1.5472), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:26:18,391 - root - INFO - Step 12400: lr=1.00E-05, loss= 1.1046 (max= 1.5472), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:26:18,391 - root - INFO - Step 12400: lr=1.00E-05, loss= 1.1046 (max= 1.5472), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:26:18,391 - root - INFO - Step 12400: lr=1.00E-05, loss= 1.1046 (max= 1.5472), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:26:18,392 - root - INFO - Step 12400: lr=1.00E-05, loss= 1.1046 (max= 1.5472), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:26:50,270 - root - INFO - Step 12410: lr=1.00E-05, loss= 1.0803 (max= 1.5587), tps=20560, mfu=42.84%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:26:50,270 - root - INFO - Step 12410: lr=1.00E-05, loss= 1.0803 (max= 1.5587), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:26:50,270 - root - INFO - Step 12410: lr=1.00E-05, loss= 1.0803 (max= 1.5587), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:26:50,270 - root - INFO - Step 12410: lr=1.00E-05, loss= 1.0803 (max= 1.5587), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:26:50,270 - root - INFO - Step 12410: lr=1.00E-05, loss= 1.0803 (max= 1.5587), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:26:50,270 - root - INFO - Step 12410: lr=1.00E-05, loss= 1.0803 (max= 1.5587), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:26:50,270 - root - INFO - Step 12410: lr=1.00E-05, loss= 1.0803 (max= 1.5587), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:26:50,270 - root - INFO - Step 12410: lr=1.00E-05, loss= 1.0803 (max= 1.5587), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:27:22,135 - root - INFO - Step 12420: lr=1.00E-05, loss= 1.1066 (max= 1.5714), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:27:22,135 - root - INFO - Step 12420: lr=1.00E-05, loss= 1.1066 (max= 1.5714), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:27:22,135 - root - INFO - Step 12420: lr=1.00E-05, loss= 1.1066 (max= 1.5714), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:27:22,135 - root - INFO - Step 12420: lr=1.00E-05, 
loss= 1.1066 (max= 1.5714), tps=20569, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:27:22,135 - root - INFO - Step 12420: lr=1.00E-05, loss= 1.1066 (max= 1.5714), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:27:22,135 - root - INFO - Step 12420: lr=1.00E-05, loss= 1.1066 (max= 1.5714), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:27:22,135 - root - INFO - Step 12420: lr=1.00E-05, loss= 1.1066 (max= 1.5714), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:27:22,135 - root - INFO - Step 12420: lr=1.00E-05, loss= 1.1066 (max= 1.5714), tps=20569, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:27:54,006 - root - INFO - Step 12430: lr=1.00E-05, loss= 1.0881 (max= 1.6102), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:27:54,006 - root - INFO - Step 12430: lr=1.00E-05, loss= 1.0881 (max= 1.6102), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:27:54,006 - root - INFO - Step 12430: lr=1.00E-05, loss= 1.0881 (max= 1.6102), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:27:54,006 - root - INFO - Step 12430: lr=1.00E-05, loss= 1.0881 (max= 1.6102), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:27:54,006 - root - INFO - Step 12430: lr=1.00E-05, loss= 1.0881 (max= 1.6102), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:27:54,006 - root - INFO - Step 12430: lr=1.00E-05, loss= 1.0881 (max= 1.6102), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
21:27:54,006 - root - INFO - Step 12430: lr=1.00E-05, loss= 1.0881 (max= 1.6102), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:27:54,006 - root - INFO - Step 12430: lr=1.00E-05, loss= 1.0881 (max= 1.6102), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:28:25,869 - root - INFO - Step 12440: lr=1.00E-05, loss= 1.0928 (max= 1.4532), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:28:25,869 - root - INFO - Step 12440: lr=1.00E-05, loss= 1.0928 (max= 1.4532), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:28:25,869 - root - INFO - Step 12440: lr=1.00E-05, loss= 1.0928 (max= 1.4532), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:28:25,869 - root - INFO - Step 12440: lr=1.00E-05, loss= 1.0928 (max= 1.4532), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:28:25,869 - root - INFO - Step 12440: lr=1.00E-05, loss= 1.0928 (max= 1.4532), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:28:25,870 - root - INFO - Step 12440: lr=1.00E-05, loss= 1.0928 (max= 1.4532), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:28:25,870 - root - INFO - Step 12440: lr=1.00E-05, loss= 1.0928 (max= 1.4532), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:28:25,870 - root - INFO - Step 12440: lr=1.00E-05, loss= 1.0928 (max= 1.4532), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:28:57,722 - root - INFO - Step 12450: lr=1.00E-05, loss= 1.1210 (max= 1.5232), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:28:57,722 - root - INFO - Step 12450: lr=1.00E-05, loss= 1.1210 (max= 1.5232), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:28:57,722 - root - INFO - Step 12450: lr=1.00E-05, loss= 1.1210 (max= 1.5232), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:28:57,722 - root - INFO - Step 12450: lr=1.00E-05, loss= 1.1210 (max= 1.5232), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:28:57,722 - root - INFO - Step 12450: lr=1.00E-05, loss= 1.1210 (max= 1.5232), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:28:57,722 - root - INFO - Step 12450: lr=1.00E-05, loss= 1.1210 (max= 1.5232), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:28:57,722 - root - INFO - Step 12450: lr=1.00E-05, loss= 1.1210 (max= 1.5232), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:28:57,723 - root - INFO - Step 12450: lr=1.00E-05, loss= 1.1210 (max= 1.5232), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:29:29,582 - root - INFO - Step 12460: lr=1.00E-05, loss= 1.0927 (max= 1.5756), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:29:29,582 - root - INFO - Step 12460: lr=1.00E-05, loss= 1.0927 (max= 1.5756), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:29:29,582 - root - INFO - Step 12460: lr=1.00E-05, loss= 1.0927 (max= 1.5756), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:29:29,582 - root - INFO - Step 12460: lr=1.00E-05, loss= 1.0927 (max= 1.5756), 
tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:29:29,582 - root - INFO - Step 12460: lr=1.00E-05, loss= 1.0927 (max= 1.5756), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:29:29,582 - root - INFO - Step 12460: lr=1.00E-05, loss= 1.0927 (max= 1.5756), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:29:29,582 - root - INFO - Step 12460: lr=1.00E-05, loss= 1.0927 (max= 1.5756), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:29:29,582 - root - INFO - Step 12460: lr=1.00E-05, loss= 1.0927 (max= 1.5756), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:30:01,506 - root - INFO - Step 12470: lr=1.00E-05, loss= 1.0943 (max= 1.5629), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:30:01,506 - root - INFO - Step 12470: lr=1.00E-05, loss= 1.0943 (max= 1.5629), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:30:01,506 - root - INFO - Step 12470: lr=1.00E-05, loss= 1.0943 (max= 1.5629), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:30:01,506 - root - INFO - Step 12470: lr=1.00E-05, loss= 1.0943 (max= 1.5629), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:30:01,506 - root - INFO - Step 12470: lr=1.00E-05, loss= 1.0943 (max= 1.5629), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:30:01,506 - root - INFO - Step 12470: lr=1.00E-05, loss= 1.0943 (max= 1.5629), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:30:01,507 - root - INFO - Step 
12470: lr=1.00E-05, loss= 1.0943 (max= 1.5629), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:30:01,507 - root - INFO - Step 12470: lr=1.00E-05, loss= 1.0943 (max= 1.5629), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:30:33,377 - root - INFO - Step 12480: lr=1.00E-05, loss= 1.0979 (max= 1.4973), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:30:33,377 - root - INFO - Step 12480: lr=1.00E-05, loss= 1.0979 (max= 1.4973), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:30:33,377 - root - INFO - Step 12480: lr=1.00E-05, loss= 1.0979 (max= 1.4973), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:30:33,377 - root - INFO - Step 12480: lr=1.00E-05, loss= 1.0979 (max= 1.4973), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:30:33,377 - root - INFO - Step 12480: lr=1.00E-05, loss= 1.0979 (max= 1.4973), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:30:33,377 - root - INFO - Step 12480: lr=1.00E-05, loss= 1.0979 (max= 1.4973), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:30:33,377 - root - INFO - Step 12480: lr=1.00E-05, loss= 1.0979 (max= 1.4973), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:30:33,377 - root - INFO - Step 12480: lr=1.00E-05, loss= 1.0979 (max= 1.4973), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:31:05,122 - root - INFO - Step 12490: lr=1.00E-05, loss= 1.1427 (max= 1.4908), tps=20646, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 21:31:05,122 - root - INFO - Step 12490: lr=1.00E-05, loss= 1.1427 (max= 1.4908), tps=20646, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:31:05,122 - root - INFO - Step 12490: lr=1.00E-05, loss= 1.1427 (max= 1.4908), tps=20646, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:31:05,123 - root - INFO - Step 12490: lr=1.00E-05, loss= 1.1427 (max= 1.4908), tps=20646, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:31:05,123 - root - INFO - Step 12490: lr=1.00E-05, loss= 1.1427 (max= 1.4908), tps=20646, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:31:05,123 - root - INFO - Step 12490: lr=1.00E-05, loss= 1.1427 (max= 1.4908), tps=20646, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:31:05,123 - root - INFO - Step 12490: lr=1.00E-05, loss= 1.1427 (max= 1.4908), tps=20646, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:31:05,123 - root - INFO - Step 12490: lr=1.00E-05, loss= 1.1427 (max= 1.4908), tps=20646, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:31:36,945 - root - INFO - Step 12500: lr=1.00E-05, loss= 1.0709 (max= 1.5587), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:31:36,945 - root - INFO - Step 12500: lr=1.00E-05, loss= 1.0709 (max= 1.5587), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:31:36,945 - root - INFO - Step 12500: lr=1.00E-05, loss= 1.0709 (max= 1.5587), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:31:36,945 - root - INFO - Step 12500: lr=1.00E-05, loss= 1.0709 (max= 1.5587), tps=20596, mfu=42.91%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:31:36,945 - root - INFO - Step 12500: lr=1.00E-05, loss= 1.0709 (max= 1.5587), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:31:36,945 - root - INFO - Step 12500: lr=1.00E-05, loss= 1.0709 (max= 1.5587), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:31:36,945 - root - INFO - Step 12500: lr=1.00E-05, loss= 1.0709 (max= 1.5587), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:31:36,945 - root - INFO - Step 12500: lr=1.00E-05, loss= 1.0709 (max= 1.5587), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:32:08,844 - root - INFO - Step 12510: lr=1.00E-05, loss= 1.0980 (max= 1.5308), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:32:08,844 - root - INFO - Step 12510: lr=1.00E-05, loss= 1.0980 (max= 1.5308), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:32:08,844 - root - INFO - Step 12510: lr=1.00E-05, loss= 1.0980 (max= 1.5308), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:32:08,845 - root - INFO - Step 12510: lr=1.00E-05, loss= 1.0980 (max= 1.5308), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:32:08,845 - root - INFO - Step 12510: lr=1.00E-05, loss= 1.0980 (max= 1.5308), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:32:08,845 - root - INFO - Step 12510: lr=1.00E-05, loss= 1.0980 (max= 1.5308), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:32:08,845 - root - INFO - Step 12510: lr=1.00E-05, loss= 1.0980 
(max= 1.5308), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:32:08,845 - root - INFO - Step 12510: lr=1.00E-05, loss= 1.0980 (max= 1.5308), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:32:40,740 - root - INFO - Step 12520: lr=1.00E-05, loss= 1.1170 (max= 1.6448), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:32:40,740 - root - INFO - Step 12520: lr=1.00E-05, loss= 1.1170 (max= 1.6448), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:32:40,741 - root - INFO - Step 12520: lr=1.00E-05, loss= 1.1170 (max= 1.6448), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:32:40,741 - root - INFO - Step 12520: lr=1.00E-05, loss= 1.1170 (max= 1.6448), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:32:40,741 - root - INFO - Step 12520: lr=1.00E-05, loss= 1.1170 (max= 1.6448), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:32:40,741 - root - INFO - Step 12520: lr=1.00E-05, loss= 1.1170 (max= 1.6448), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:32:40,741 - root - INFO - Step 12520: lr=1.00E-05, loss= 1.1170 (max= 1.6448), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:32:40,741 - root - INFO - Step 12520: lr=1.00E-05, loss= 1.1170 (max= 1.6448), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:33:12,523 - root - INFO - Step 12530: lr=1.00E-05, loss= 1.0969 (max= 1.8400), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:33:12,523 - root 
- INFO - Step 12530: lr=1.00E-05, loss= 1.0969 (max= 1.8400), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:33:12,524 - root - INFO - Step 12530: lr=1.00E-05, loss= 1.0969 (max= 1.8400), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:33:12,524 - root - INFO - Step 12530: lr=1.00E-05, loss= 1.0969 (max= 1.8400), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:33:12,524 - root - INFO - Step 12530: lr=1.00E-05, loss= 1.0969 (max= 1.8400), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:33:12,524 - root - INFO - Step 12530: lr=1.00E-05, loss= 1.0969 (max= 1.8400), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:33:12,524 - root - INFO - Step 12530: lr=1.00E-05, loss= 1.0969 (max= 1.8400), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:33:12,524 - root - INFO - Step 12530: lr=1.00E-05, loss= 1.0969 (max= 1.8400), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:33:44,357 - root - INFO - Step 12540: lr=1.00E-05, loss= 1.0651 (max= 1.6718), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:33:44,357 - root - INFO - Step 12540: lr=1.00E-05, loss= 1.0651 (max= 1.6718), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:33:44,357 - root - INFO - Step 12540: lr=1.00E-05, loss= 1.0651 (max= 1.6718), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:33:44,357 - root - INFO - Step 12540: lr=1.00E-05, loss= 1.0651 (max= 1.6718), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 21:33:44,358 - root - INFO - Step 12540: lr=1.00E-05, loss= 1.0651 (max= 1.6718), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:33:44,358 - root - INFO - Step 12540: lr=1.00E-05, loss= 1.0651 (max= 1.6718), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:33:44,358 - root - INFO - Step 12540: lr=1.00E-05, loss= 1.0651 (max= 1.6718), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:33:44,358 - root - INFO - Step 12540: lr=1.00E-05, loss= 1.0651 (max= 1.6718), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:34:16,219 - root - INFO - Step 12550: lr=1.00E-05, loss= 1.1059 (max= 1.6172), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:34:16,219 - root - INFO - Step 12550: lr=1.00E-05, loss= 1.1059 (max= 1.6172), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:34:16,219 - root - INFO - Step 12550: lr=1.00E-05, loss= 1.1059 (max= 1.6172), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:34:16,219 - root - INFO - Step 12550: lr=1.00E-05, loss= 1.1059 (max= 1.6172), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:34:16,219 - root - INFO - Step 12550: lr=1.00E-05, loss= 1.1059 (max= 1.6172), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:34:16,219 - root - INFO - Step 12550: lr=1.00E-05, loss= 1.1059 (max= 1.6172), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:34:16,219 - root - INFO - Step 12550: lr=1.00E-05, loss= 1.1059 (max= 1.6172), tps=20571, mfu=42.86%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:34:16,219 - root - INFO - Step 12550: lr=1.00E-05, loss= 1.1059 (max= 1.6172), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:34:48,097 - root - INFO - Step 12560: lr=1.00E-05, loss= 1.0947 (max= 1.6494), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:34:48,098 - root - INFO - Step 12560: lr=1.00E-05, loss= 1.0947 (max= 1.6494), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:34:48,098 - root - INFO - Step 12560: lr=1.00E-05, loss= 1.0947 (max= 1.6494), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:34:48,098 - root - INFO - Step 12560: lr=1.00E-05, loss= 1.0947 (max= 1.6494), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:34:48,098 - root - INFO - Step 12560: lr=1.00E-05, loss= 1.0947 (max= 1.6494), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:34:48,098 - root - INFO - Step 12560: lr=1.00E-05, loss= 1.0947 (max= 1.6494), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:34:48,098 - root - INFO - Step 12560: lr=1.00E-05, loss= 1.0947 (max= 1.6494), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:34:48,098 - root - INFO - Step 12560: lr=1.00E-05, loss= 1.0947 (max= 1.6494), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:35:19,956 - root - INFO - Step 12570: lr=1.00E-05, loss= 1.0943 (max= 1.4677), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:35:19,956 - root - INFO - Step 12570: lr=1.00E-05, 
loss= 1.0943 (max= 1.4677), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:35:19,956 - root - INFO - Step 12570: lr=1.00E-05, loss= 1.0943 (max= 1.4677), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:35:19,957 - root - INFO - Step 12570: lr=1.00E-05, loss= 1.0943 (max= 1.4677), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:35:19,957 - root - INFO - Step 12570: lr=1.00E-05, loss= 1.0943 (max= 1.4677), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:35:19,957 - root - INFO - Step 12570: lr=1.00E-05, loss= 1.0943 (max= 1.4677), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:35:19,957 - root - INFO - Step 12570: lr=1.00E-05, loss= 1.0943 (max= 1.4677), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:35:19,957 - root - INFO - Step 12570: lr=1.00E-05, loss= 1.0943 (max= 1.4677), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:35:51,767 - root - INFO - Step 12580: lr=1.00E-05, loss= 1.1221 (max= 1.5373), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:35:51,767 - root - INFO - Step 12580: lr=1.00E-05, loss= 1.1221 (max= 1.5373), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:35:51,767 - root - INFO - Step 12580: lr=1.00E-05, loss= 1.1221 (max= 1.5373), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:35:51,767 - root - INFO - Step 12580: lr=1.00E-05, loss= 1.1221 (max= 1.5373), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
21:35:51,767 - root - INFO - Step 12580: lr=1.00E-05, loss= 1.1221 (max= 1.5373), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:35:51,767 - root - INFO - Step 12580: lr=1.00E-05, loss= 1.1221 (max= 1.5373), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:35:51,767 - root - INFO - Step 12580: lr=1.00E-05, loss= 1.1221 (max= 1.5373), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:35:51,767 - root - INFO - Step 12580: lr=1.00E-05, loss= 1.1221 (max= 1.5373), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:36:23,622 - root - INFO - Step 12590: lr=1.00E-05, loss= 1.1021 (max= 1.5896), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:36:23,622 - root - INFO - Step 12590: lr=1.00E-05, loss= 1.1021 (max= 1.5896), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:36:23,622 - root - INFO - Step 12590: lr=1.00E-05, loss= 1.1021 (max= 1.5896), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:36:23,622 - root - INFO - Step 12590: lr=1.00E-05, loss= 1.1021 (max= 1.5896), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:36:23,622 - root - INFO - Step 12590: lr=1.00E-05, loss= 1.1021 (max= 1.5896), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:36:23,622 - root - INFO - Step 12590: lr=1.00E-05, loss= 1.1021 (max= 1.5896), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:36:23,622 - root - INFO - Step 12590: lr=1.00E-05, loss= 1.1021 (max= 1.5896), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:36:23,622 - root - INFO - Step 12590: lr=1.00E-05, loss= 1.1021 (max= 1.5896), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:36:55,505 - root - INFO - Step 12600: lr=1.00E-05, loss= 1.0961 (max= 1.5528), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:36:55,505 - root - INFO - Step 12600: lr=1.00E-05, loss= 1.0961 (max= 1.5528), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:36:55,505 - root - INFO - Step 12600: lr=1.00E-05, loss= 1.0961 (max= 1.5528), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:36:55,505 - root - INFO - Step 12600: lr=1.00E-05, loss= 1.0961 (max= 1.5528), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:36:55,506 - root - INFO - Step 12600: lr=1.00E-05, loss= 1.0961 (max= 1.5528), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:36:55,506 - root - INFO - Step 12600: lr=1.00E-05, loss= 1.0961 (max= 1.5528), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:36:55,506 - root - INFO - Step 12600: lr=1.00E-05, loss= 1.0961 (max= 1.5528), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:36:55,506 - root - INFO - Step 12600: lr=1.00E-05, loss= 1.0961 (max= 1.5528), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:37:27,318 - root - INFO - Step 12610: lr=1.00E-05, loss= 1.0997 (max= 1.4752), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:37:27,318 - root - INFO - Step 12610: lr=1.00E-05, loss= 1.0997 (max= 1.4752), 
tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:37:27,318 - root - INFO - Step 12610: lr=1.00E-05, loss= 1.0997 (max= 1.4752), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:37:27,318 - root - INFO - Step 12610: lr=1.00E-05, loss= 1.0997 (max= 1.4752), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:37:27,318 - root - INFO - Step 12610: lr=1.00E-05, loss= 1.0997 (max= 1.4752), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:37:27,318 - root - INFO - Step 12610: lr=1.00E-05, loss= 1.0997 (max= 1.4752), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:37:27,318 - root - INFO - Step 12610: lr=1.00E-05, loss= 1.0997 (max= 1.4752), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:37:27,319 - root - INFO - Step 12610: lr=1.00E-05, loss= 1.0997 (max= 1.4752), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:37:59,118 - root - INFO - Step 12620: lr=1.00E-05, loss= 1.1027 (max= 1.6296), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:37:59,118 - root - INFO - Step 12620: lr=1.00E-05, loss= 1.1027 (max= 1.6296), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:37:59,118 - root - INFO - Step 12620: lr=1.00E-05, loss= 1.1027 (max= 1.6296), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:37:59,118 - root - INFO - Step 12620: lr=1.00E-05, loss= 1.1027 (max= 1.6296), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:37:59,119 - root - INFO - Step 
12620: lr=1.00E-05, loss= 1.1027 (max= 1.6296), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:37:59,119 - root - INFO - Step 12620: lr=1.00E-05, loss= 1.1027 (max= 1.6296), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:37:59,119 - root - INFO - Step 12620: lr=1.00E-05, loss= 1.1027 (max= 1.6296), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:37:59,119 - root - INFO - Step 12620: lr=1.00E-05, loss= 1.1027 (max= 1.6296), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:38:30,970 - root - INFO - Step 12630: lr=1.00E-05, loss= 1.0922 (max= 1.5498), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:38:30,970 - root - INFO - Step 12630: lr=1.00E-05, loss= 1.0922 (max= 1.5498), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:38:30,970 - root - INFO - Step 12630: lr=1.00E-05, loss= 1.0922 (max= 1.5498), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:38:30,970 - root - INFO - Step 12630: lr=1.00E-05, loss= 1.0922 (max= 1.5498), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:38:30,970 - root - INFO - Step 12630: lr=1.00E-05, loss= 1.0922 (max= 1.5498), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:38:30,970 - root - INFO - Step 12630: lr=1.00E-05, loss= 1.0922 (max= 1.5498), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:38:30,970 - root - INFO - Step 12630: lr=1.00E-05, loss= 1.0922 (max= 1.5498), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 21:38:30,970 - root - INFO - Step 12630: lr=1.00E-05, loss= 1.0922 (max= 1.5498), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:39:02,828 - root - INFO - Step 12640: lr=1.00E-05, loss= 1.1100 (max= 1.7712), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:39:02,828 - root - INFO - Step 12640: lr=1.00E-05, loss= 1.1100 (max= 1.7712), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:39:02,828 - root - INFO - Step 12640: lr=1.00E-05, loss= 1.1100 (max= 1.7712), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:39:02,828 - root - INFO - Step 12640: lr=1.00E-05, loss= 1.1100 (max= 1.7712), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:39:02,828 - root - INFO - Step 12640: lr=1.00E-05, loss= 1.1100 (max= 1.7712), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:39:02,828 - root - INFO - Step 12640: lr=1.00E-05, loss= 1.1100 (max= 1.7712), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:39:02,828 - root - INFO - Step 12640: lr=1.00E-05, loss= 1.1100 (max= 1.7712), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:39:02,828 - root - INFO - Step 12640: lr=1.00E-05, loss= 1.1100 (max= 1.7712), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:39:24,254 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:3689696 +2025-10-25 21:39:34,658 - root - INFO - Step 12650: lr=1.00E-05, loss= 1.1033 (max= 1.5504), tps=20592, mfu=42.90%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:39:34,658 - root - INFO - Step 12650: lr=1.00E-05, loss= 1.1033 (max= 1.5504), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:39:34,658 - root - INFO - Step 12650: lr=1.00E-05, loss= 1.1033 (max= 1.5504), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:39:34,658 - root - INFO - Step 12650: lr=1.00E-05, loss= 1.1033 (max= 1.5504), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:39:34,658 - root - INFO - Step 12650: lr=1.00E-05, loss= 1.1033 (max= 1.5504), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:39:34,658 - root - INFO - Step 12650: lr=1.00E-05, loss= 1.1033 (max= 1.5504), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:39:34,658 - root - INFO - Step 12650: lr=1.00E-05, loss= 1.1033 (max= 1.5504), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:39:34,658 - root - INFO - Step 12650: lr=1.00E-05, loss= 1.1033 (max= 1.5504), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:40:06,503 - root - INFO - Step 12660: lr=1.00E-05, loss= 1.1127 (max= 1.6206), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:40:06,503 - root - INFO - Step 12660: lr=1.00E-05, loss= 1.1127 (max= 1.6206), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:40:06,504 - root - INFO - Step 12660: lr=1.00E-05, loss= 1.1127 (max= 1.6206), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:40:06,504 - root - INFO - Step 12660: lr=1.00E-05, loss= 1.1127 
(max= 1.6206), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:40:06,504 - root - INFO - Step 12660: lr=1.00E-05, loss= 1.1127 (max= 1.6206), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:40:06,504 - root - INFO - Step 12660: lr=1.00E-05, loss= 1.1127 (max= 1.6206), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:40:06,504 - root - INFO - Step 12660: lr=1.00E-05, loss= 1.1127 (max= 1.6206), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:40:06,504 - root - INFO - Step 12660: lr=1.00E-05, loss= 1.1127 (max= 1.6206), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:40:38,422 - root - INFO - Step 12670: lr=1.00E-05, loss= 1.1128 (max= 1.5480), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:40:38,422 - root - INFO - Step 12670: lr=1.00E-05, loss= 1.1128 (max= 1.5480), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:40:38,422 - root - INFO - Step 12670: lr=1.00E-05, loss= 1.1128 (max= 1.5480), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:40:38,422 - root - INFO - Step 12670: lr=1.00E-05, loss= 1.1128 (max= 1.5480), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:40:38,422 - root - INFO - Step 12670: lr=1.00E-05, loss= 1.1128 (max= 1.5480), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:40:38,422 - root - INFO - Step 12670: lr=1.00E-05, loss= 1.1128 (max= 1.5480), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:40:38,422 - root 
- INFO - Step 12670: lr=1.00E-05, loss= 1.1128 (max= 1.5480), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:40:38,423 - root - INFO - Step 12670: lr=1.00E-05, loss= 1.1128 (max= 1.5480), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:41:10,316 - root - INFO - Step 12680: lr=1.00E-05, loss= 1.0719 (max= 1.4940), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:41:10,316 - root - INFO - Step 12680: lr=1.00E-05, loss= 1.0719 (max= 1.4940), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:41:10,316 - root - INFO - Step 12680: lr=1.00E-05, loss= 1.0719 (max= 1.4940), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:41:10,316 - root - INFO - Step 12680: lr=1.00E-05, loss= 1.0719 (max= 1.4940), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:41:10,316 - root - INFO - Step 12680: lr=1.00E-05, loss= 1.0719 (max= 1.4940), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:41:10,316 - root - INFO - Step 12680: lr=1.00E-05, loss= 1.0719 (max= 1.4940), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:41:10,316 - root - INFO - Step 12680: lr=1.00E-05, loss= 1.0719 (max= 1.4940), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:41:10,316 - root - INFO - Step 12680: lr=1.00E-05, loss= 1.0719 (max= 1.4940), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:41:42,156 - root - INFO - Step 12690: lr=1.00E-05, loss= 1.0888 (max= 1.4333), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 21:41:42,156 - root - INFO - Step 12690: lr=1.00E-05, loss= 1.0888 (max= 1.4333), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:41:42,156 - root - INFO - Step 12690: lr=1.00E-05, loss= 1.0888 (max= 1.4333), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:41:42,156 - root - INFO - Step 12690: lr=1.00E-05, loss= 1.0888 (max= 1.4333), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:41:42,156 - root - INFO - Step 12690: lr=1.00E-05, loss= 1.0888 (max= 1.4333), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:41:42,156 - root - INFO - Step 12690: lr=1.00E-05, loss= 1.0888 (max= 1.4333), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:41:42,156 - root - INFO - Step 12690: lr=1.00E-05, loss= 1.0888 (max= 1.4333), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:41:42,156 - root - INFO - Step 12690: lr=1.00E-05, loss= 1.0888 (max= 1.4333), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:42:14,019 - root - INFO - Step 12700: lr=1.00E-05, loss= 1.1079 (max= 1.5513), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:42:14,019 - root - INFO - Step 12700: lr=1.00E-05, loss= 1.1079 (max= 1.5513), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:42:14,019 - root - INFO - Step 12700: lr=1.00E-05, loss= 1.1079 (max= 1.5513), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:42:14,019 - root - INFO - Step 12700: lr=1.00E-05, loss= 1.1079 (max= 1.5513), tps=20570, mfu=42.86%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:42:14,019 - root - INFO - Step 12700: lr=1.00E-05, loss= 1.1079 (max= 1.5513), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:42:14,019 - root - INFO - Step 12700: lr=1.00E-05, loss= 1.1079 (max= 1.5513), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:42:14,019 - root - INFO - Step 12700: lr=1.00E-05, loss= 1.1079 (max= 1.5513), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:42:14,019 - root - INFO - Step 12700: lr=1.00E-05, loss= 1.1079 (max= 1.5513), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:42:45,891 - root - INFO - Step 12710: lr=1.00E-05, loss= 1.1021 (max= 1.6969), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:42:45,891 - root - INFO - Step 12710: lr=1.00E-05, loss= 1.1021 (max= 1.6969), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:42:45,891 - root - INFO - Step 12710: lr=1.00E-05, loss= 1.1021 (max= 1.6969), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:42:45,891 - root - INFO - Step 12710: lr=1.00E-05, loss= 1.1021 (max= 1.6969), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:42:45,891 - root - INFO - Step 12710: lr=1.00E-05, loss= 1.1021 (max= 1.6969), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:42:45,891 - root - INFO - Step 12710: lr=1.00E-05, loss= 1.1021 (max= 1.6969), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:42:45,891 - root - INFO - Step 12710: lr=1.00E-05, 
loss= 1.1021 (max= 1.6969), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:42:45,891 - root - INFO - Step 12710: lr=1.00E-05, loss= 1.1021 (max= 1.6969), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:43:17,794 - root - INFO - Step 12720: lr=1.00E-05, loss= 1.1009 (max= 1.6158), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:43:17,794 - root - INFO - Step 12720: lr=1.00E-05, loss= 1.1009 (max= 1.6158), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:43:17,794 - root - INFO - Step 12720: lr=1.00E-05, loss= 1.1009 (max= 1.6158), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:43:17,794 - root - INFO - Step 12720: lr=1.00E-05, loss= 1.1009 (max= 1.6158), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:43:17,794 - root - INFO - Step 12720: lr=1.00E-05, loss= 1.1009 (max= 1.6158), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:43:17,794 - root - INFO - Step 12720: lr=1.00E-05, loss= 1.1009 (max= 1.6158), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:43:17,794 - root - INFO - Step 12720: lr=1.00E-05, loss= 1.1009 (max= 1.6158), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:43:17,794 - root - INFO - Step 12720: lr=1.00E-05, loss= 1.1009 (max= 1.6158), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:43:49,711 - root - INFO - Step 12730: lr=1.00E-05, loss= 1.1007 (max= 1.5219), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
21:43:49,711 - root - INFO - Step 12730: lr=1.00E-05, loss= 1.1007 (max= 1.5219), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:43:49,712 - root - INFO - Step 12730: lr=1.00E-05, loss= 1.1007 (max= 1.5219), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:43:49,712 - root - INFO - Step 12730: lr=1.00E-05, loss= 1.1007 (max= 1.5219), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:43:49,712 - root - INFO - Step 12730: lr=1.00E-05, loss= 1.1007 (max= 1.5219), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:43:49,712 - root - INFO - Step 12730: lr=1.00E-05, loss= 1.1007 (max= 1.5219), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:43:49,712 - root - INFO - Step 12730: lr=1.00E-05, loss= 1.1007 (max= 1.5219), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:43:49,712 - root - INFO - Step 12730: lr=1.00E-05, loss= 1.1007 (max= 1.5219), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:44:21,614 - root - INFO - Step 12740: lr=1.00E-05, loss= 1.0650 (max= 1.5398), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:44:21,614 - root - INFO - Step 12740: lr=1.00E-05, loss= 1.0650 (max= 1.5398), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:44:21,614 - root - INFO - Step 12740: lr=1.00E-05, loss= 1.0650 (max= 1.5398), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:44:21,614 - root - INFO - Step 12740: lr=1.00E-05, loss= 1.0650 (max= 1.5398), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:44:21,614 - root - INFO - Step 12740: lr=1.00E-05, loss= 1.0650 (max= 1.5398), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:44:21,614 - root - INFO - Step 12740: lr=1.00E-05, loss= 1.0650 (max= 1.5398), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:44:21,614 - root - INFO - Step 12740: lr=1.00E-05, loss= 1.0650 (max= 1.5398), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:44:21,614 - root - INFO - Step 12740: lr=1.00E-05, loss= 1.0650 (max= 1.5398), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:44:53,492 - root - INFO - Step 12750: lr=1.00E-05, loss= 1.1118 (max= 1.6874), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:44:53,492 - root - INFO - Step 12750: lr=1.00E-05, loss= 1.1118 (max= 1.6874), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:44:53,492 - root - INFO - Step 12750: lr=1.00E-05, loss= 1.1118 (max= 1.6874), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:44:53,492 - root - INFO - Step 12750: lr=1.00E-05, loss= 1.1118 (max= 1.6874), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:44:53,492 - root - INFO - Step 12750: lr=1.00E-05, loss= 1.1118 (max= 1.6874), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:44:53,492 - root - INFO - Step 12750: lr=1.00E-05, loss= 1.1118 (max= 1.6874), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:44:53,492 - root - INFO - Step 12750: lr=1.00E-05, loss= 1.1118 (max= 1.6874), 
tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:44:53,492 - root - INFO - Step 12750: lr=1.00E-05, loss= 1.1118 (max= 1.6874), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:45:25,311 - root - INFO - Step 12760: lr=1.00E-05, loss= 1.1135 (max= 1.5471), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:45:25,311 - root - INFO - Step 12760: lr=1.00E-05, loss= 1.1135 (max= 1.5471), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:45:25,311 - root - INFO - Step 12760: lr=1.00E-05, loss= 1.1135 (max= 1.5471), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:45:25,311 - root - INFO - Step 12760: lr=1.00E-05, loss= 1.1135 (max= 1.5471), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:45:25,311 - root - INFO - Step 12760: lr=1.00E-05, loss= 1.1135 (max= 1.5471), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:45:25,311 - root - INFO - Step 12760: lr=1.00E-05, loss= 1.1135 (max= 1.5471), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:45:25,311 - root - INFO - Step 12760: lr=1.00E-05, loss= 1.1135 (max= 1.5471), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:45:25,311 - root - INFO - Step 12760: lr=1.00E-05, loss= 1.1135 (max= 1.5471), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:45:57,101 - root - INFO - Step 12770: lr=1.00E-05, loss= 1.0798 (max= 1.5247), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:45:57,102 - root - INFO - Step 
12770: lr=1.00E-05, loss= 1.0798 (max= 1.5247), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:45:57,102 - root - INFO - Step 12770: lr=1.00E-05, loss= 1.0798 (max= 1.5247), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:45:57,102 - root - INFO - Step 12770: lr=1.00E-05, loss= 1.0798 (max= 1.5247), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:45:57,102 - root - INFO - Step 12770: lr=1.00E-05, loss= 1.0798 (max= 1.5247), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:45:57,102 - root - INFO - Step 12770: lr=1.00E-05, loss= 1.0798 (max= 1.5247), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:45:57,102 - root - INFO - Step 12770: lr=1.00E-05, loss= 1.0798 (max= 1.5247), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:45:57,102 - root - INFO - Step 12770: lr=1.00E-05, loss= 1.0798 (max= 1.5247), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:46:28,962 - root - INFO - Step 12780: lr=1.00E-05, loss= 1.1091 (max= 1.6500), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:46:28,962 - root - INFO - Step 12780: lr=1.00E-05, loss= 1.1091 (max= 1.6500), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:46:28,962 - root - INFO - Step 12780: lr=1.00E-05, loss= 1.1091 (max= 1.6500), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:46:28,962 - root - INFO - Step 12780: lr=1.00E-05, loss= 1.1091 (max= 1.6500), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 21:46:28,962 - root - INFO - Step 12780: lr=1.00E-05, loss= 1.1091 (max= 1.6500), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:46:28,962 - root - INFO - Step 12780: lr=1.00E-05, loss= 1.1091 (max= 1.6500), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:46:28,962 - root - INFO - Step 12780: lr=1.00E-05, loss= 1.1091 (max= 1.6500), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:46:28,962 - root - INFO - Step 12780: lr=1.00E-05, loss= 1.1091 (max= 1.6500), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:47:00,764 - root - INFO - Step 12790: lr=1.00E-05, loss= 1.1089 (max= 1.4908), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:47:00,764 - root - INFO - Step 12790: lr=1.00E-05, loss= 1.1089 (max= 1.4908), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:47:00,764 - root - INFO - Step 12790: lr=1.00E-05, loss= 1.1089 (max= 1.4908), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:47:00,764 - root - INFO - Step 12790: lr=1.00E-05, loss= 1.1089 (max= 1.4908), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:47:00,764 - root - INFO - Step 12790: lr=1.00E-05, loss= 1.1089 (max= 1.4908), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:47:00,764 - root - INFO - Step 12790: lr=1.00E-05, loss= 1.1089 (max= 1.4908), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:47:00,764 - root - INFO - Step 12790: lr=1.00E-05, loss= 1.1089 (max= 1.4908), tps=20610, mfu=42.94%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:47:00,764 - root - INFO - Step 12790: lr=1.00E-05, loss= 1.1089 (max= 1.4908), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:47:32,652 - root - INFO - Step 12800: lr=1.00E-05, loss= 1.1095 (max= 1.5008), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:47:32,652 - root - INFO - Step 12800: lr=1.00E-05, loss= 1.1095 (max= 1.5008), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:47:32,652 - root - INFO - Step 12800: lr=1.00E-05, loss= 1.1095 (max= 1.5008), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:47:32,652 - root - INFO - Step 12800: lr=1.00E-05, loss= 1.1095 (max= 1.5008), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:47:32,652 - root - INFO - Step 12800: lr=1.00E-05, loss= 1.1095 (max= 1.5008), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:47:32,652 - root - INFO - Step 12800: lr=1.00E-05, loss= 1.1095 (max= 1.5008), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:47:32,652 - root - INFO - Step 12800: lr=1.00E-05, loss= 1.1095 (max= 1.5008), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:47:32,652 - root - INFO - Step 12800: lr=1.00E-05, loss= 1.1095 (max= 1.5008), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:48:04,511 - root - INFO - Step 12810: lr=1.00E-05, loss= 1.1179 (max= 1.8283), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:48:04,511 - root - INFO - Step 12810: lr=1.00E-05, loss= 1.1179 
(max= 1.8283), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:48:04,511 - root - INFO - Step 12810: lr=1.00E-05, loss= 1.1179 (max= 1.8283), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:48:04,511 - root - INFO - Step 12810: lr=1.00E-05, loss= 1.1179 (max= 1.8283), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:48:04,511 - root - INFO - Step 12810: lr=1.00E-05, loss= 1.1179 (max= 1.8283), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:48:04,511 - root - INFO - Step 12810: lr=1.00E-05, loss= 1.1179 (max= 1.8283), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:48:04,511 - root - INFO - Step 12810: lr=1.00E-05, loss= 1.1179 (max= 1.8283), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:48:04,511 - root - INFO - Step 12810: lr=1.00E-05, loss= 1.1179 (max= 1.8283), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:48:36,286 - root - INFO - Step 12820: lr=1.00E-05, loss= 1.0971 (max= 1.5227), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:48:36,286 - root - INFO - Step 12820: lr=1.00E-05, loss= 1.0971 (max= 1.5227), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:48:36,286 - root - INFO - Step 12820: lr=1.00E-05, loss= 1.0971 (max= 1.5227), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:48:36,286 - root - INFO - Step 12820: lr=1.00E-05, loss= 1.0971 (max= 1.5227), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:48:36,286 - root 
- INFO - Step 12820: lr=1.00E-05, loss= 1.0971 (max= 1.5227), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:48:36,286 - root - INFO - Step 12820: lr=1.00E-05, loss= 1.0971 (max= 1.5227), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:48:36,286 - root - INFO - Step 12820: lr=1.00E-05, loss= 1.0971 (max= 1.5227), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:48:36,286 - root - INFO - Step 12820: lr=1.00E-05, loss= 1.0971 (max= 1.5227), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:49:08,118 - root - INFO - Step 12830: lr=1.00E-05, loss= 1.0881 (max= 1.5253), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:49:08,118 - root - INFO - Step 12830: lr=1.00E-05, loss= 1.0881 (max= 1.5253), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:49:08,118 - root - INFO - Step 12830: lr=1.00E-05, loss= 1.0881 (max= 1.5253), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:49:08,118 - root - INFO - Step 12830: lr=1.00E-05, loss= 1.0881 (max= 1.5253), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:49:08,118 - root - INFO - Step 12830: lr=1.00E-05, loss= 1.0881 (max= 1.5253), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:49:08,118 - root - INFO - Step 12830: lr=1.00E-05, loss= 1.0881 (max= 1.5253), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:49:08,118 - root - INFO - Step 12830: lr=1.00E-05, loss= 1.0881 (max= 1.5253), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 21:49:08,118 - root - INFO - Step 12830: lr=1.00E-05, loss= 1.0881 (max= 1.5253), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:49:39,945 - root - INFO - Step 12840: lr=1.00E-05, loss= 1.0747 (max= 1.4676), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:49:39,945 - root - INFO - Step 12840: lr=1.00E-05, loss= 1.0747 (max= 1.4676), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:49:39,945 - root - INFO - Step 12840: lr=1.00E-05, loss= 1.0747 (max= 1.4676), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:49:39,945 - root - INFO - Step 12840: lr=1.00E-05, loss= 1.0747 (max= 1.4676), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:49:39,945 - root - INFO - Step 12840: lr=1.00E-05, loss= 1.0747 (max= 1.4676), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:49:39,945 - root - INFO - Step 12840: lr=1.00E-05, loss= 1.0747 (max= 1.4676), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:49:39,946 - root - INFO - Step 12840: lr=1.00E-05, loss= 1.0747 (max= 1.4676), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:49:39,946 - root - INFO - Step 12840: lr=1.00E-05, loss= 1.0747 (max= 1.4676), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:50:11,775 - root - INFO - Step 12850: lr=1.00E-05, loss= 1.0964 (max= 1.4928), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:50:11,776 - root - INFO - Step 12850: lr=1.00E-05, loss= 1.0964 (max= 1.4928), tps=20592, mfu=42.90%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:50:11,776 - root - INFO - Step 12850: lr=1.00E-05, loss= 1.0964 (max= 1.4928), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:50:11,776 - root - INFO - Step 12850: lr=1.00E-05, loss= 1.0964 (max= 1.4928), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:50:11,776 - root - INFO - Step 12850: lr=1.00E-05, loss= 1.0964 (max= 1.4928), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:50:11,776 - root - INFO - Step 12850: lr=1.00E-05, loss= 1.0964 (max= 1.4928), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:50:11,776 - root - INFO - Step 12850: lr=1.00E-05, loss= 1.0964 (max= 1.4928), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:50:11,776 - root - INFO - Step 12850: lr=1.00E-05, loss= 1.0964 (max= 1.4928), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:50:43,643 - root - INFO - Step 12860: lr=1.00E-05, loss= 1.0945 (max= 1.6760), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:50:43,643 - root - INFO - Step 12860: lr=1.00E-05, loss= 1.0945 (max= 1.6760), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:50:43,643 - root - INFO - Step 12860: lr=1.00E-05, loss= 1.0945 (max= 1.6760), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:50:43,643 - root - INFO - Step 12860: lr=1.00E-05, loss= 1.0945 (max= 1.6760), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:50:43,643 - root - INFO - Step 12860: lr=1.00E-05, 
loss= 1.0945 (max= 1.6760), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:50:43,643 - root - INFO - Step 12860: lr=1.00E-05, loss= 1.0945 (max= 1.6760), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:50:43,643 - root - INFO - Step 12860: lr=1.00E-05, loss= 1.0945 (max= 1.6760), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:50:43,643 - root - INFO - Step 12860: lr=1.00E-05, loss= 1.0945 (max= 1.6760), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:50:50,598 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:4735268 +2025-10-25 21:51:15,505 - root - INFO - Step 12870: lr=1.00E-05, loss= 1.1032 (max= 1.5396), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:51:15,505 - root - INFO - Step 12870: lr=1.00E-05, loss= 1.1032 (max= 1.5396), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:51:15,505 - root - INFO - Step 12870: lr=1.00E-05, loss= 1.1032 (max= 1.5396), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:51:15,505 - root - INFO - Step 12870: lr=1.00E-05, loss= 1.1032 (max= 1.5396), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:51:15,505 - root - INFO - Step 12870: lr=1.00E-05, loss= 1.1032 (max= 1.5396), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:51:15,505 - root - INFO - Step 12870: lr=1.00E-05, loss= 1.1032 (max= 1.5396), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
21:51:15,505 - root - INFO - Step 12870: lr=1.00E-05, loss= 1.1032 (max= 1.5396), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:51:15,505 - root - INFO - Step 12870: lr=1.00E-05, loss= 1.1032 (max= 1.5396), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:51:47,358 - root - INFO - Step 12880: lr=1.00E-05, loss= 1.0918 (max= 1.5289), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:51:47,358 - root - INFO - Step 12880: lr=1.00E-05, loss= 1.0918 (max= 1.5289), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:51:47,358 - root - INFO - Step 12880: lr=1.00E-05, loss= 1.0918 (max= 1.5289), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:51:47,359 - root - INFO - Step 12880: lr=1.00E-05, loss= 1.0918 (max= 1.5289), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:51:47,359 - root - INFO - Step 12880: lr=1.00E-05, loss= 1.0918 (max= 1.5289), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:51:47,359 - root - INFO - Step 12880: lr=1.00E-05, loss= 1.0918 (max= 1.5289), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:51:47,359 - root - INFO - Step 12880: lr=1.00E-05, loss= 1.0918 (max= 1.5289), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:51:47,359 - root - INFO - Step 12880: lr=1.00E-05, loss= 1.0918 (max= 1.5289), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:52:19,228 - root - INFO - Step 12890: lr=1.00E-05, loss= 1.0849 (max= 1.5186), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:52:19,228 - root - INFO - Step 12890: lr=1.00E-05, loss= 1.0849 (max= 1.5186), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:52:19,228 - root - INFO - Step 12890: lr=1.00E-05, loss= 1.0849 (max= 1.5186), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:52:19,228 - root - INFO - Step 12890: lr=1.00E-05, loss= 1.0849 (max= 1.5186), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:52:19,228 - root - INFO - Step 12890: lr=1.00E-05, loss= 1.0849 (max= 1.5186), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:52:19,228 - root - INFO - Step 12890: lr=1.00E-05, loss= 1.0849 (max= 1.5186), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:52:19,228 - root - INFO - Step 12890: lr=1.00E-05, loss= 1.0849 (max= 1.5186), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:52:19,228 - root - INFO - Step 12890: lr=1.00E-05, loss= 1.0849 (max= 1.5186), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:52:51,093 - root - INFO - Step 12900: lr=1.00E-05, loss= 1.0948 (max= 1.5954), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:52:51,094 - root - INFO - Step 12900: lr=1.00E-05, loss= 1.0948 (max= 1.5954), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:52:51,094 - root - INFO - Step 12900: lr=1.00E-05, loss= 1.0948 (max= 1.5954), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:52:51,094 - root - INFO - Step 12900: lr=1.00E-05, loss= 1.0948 (max= 1.5954), 
tps=20569, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:52:51,094 - root - INFO - Step 12900: lr=1.00E-05, loss= 1.0948 (max= 1.5954), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:52:51,094 - root - INFO - Step 12900: lr=1.00E-05, loss= 1.0948 (max= 1.5954), tps=20569, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:52:51,094 - root - INFO - Step 12900: lr=1.00E-05, loss= 1.0948 (max= 1.5954), tps=20569, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:52:51,094 - root - INFO - Step 12900: lr=1.00E-05, loss= 1.0948 (max= 1.5954), tps=20569, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:53:22,999 - root - INFO - Step 12910: lr=1.00E-05, loss= 1.1020 (max= 1.6144), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:53:22,999 - root - INFO - Step 12910: lr=1.00E-05, loss= 1.1020 (max= 1.6144), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:53:22,999 - root - INFO - Step 12910: lr=1.00E-05, loss= 1.1020 (max= 1.6144), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:53:22,999 - root - INFO - Step 12910: lr=1.00E-05, loss= 1.1020 (max= 1.6144), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:53:22,999 - root - INFO - Step 12910: lr=1.00E-05, loss= 1.1020 (max= 1.6144), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:53:22,999 - root - INFO - Step 12910: lr=1.00E-05, loss= 1.1020 (max= 1.6144), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:53:22,999 - root - INFO - Step 
12910: lr=1.00E-05, loss= 1.1020 (max= 1.6144), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:53:22,999 - root - INFO - Step 12910: lr=1.00E-05, loss= 1.1020 (max= 1.6144), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:53:54,876 - root - INFO - Step 12920: lr=1.00E-05, loss= 1.1153 (max= 1.4649), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:53:54,877 - root - INFO - Step 12920: lr=1.00E-05, loss= 1.1153 (max= 1.4649), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:53:54,877 - root - INFO - Step 12920: lr=1.00E-05, loss= 1.1153 (max= 1.4649), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:53:54,877 - root - INFO - Step 12920: lr=1.00E-05, loss= 1.1153 (max= 1.4649), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:53:54,877 - root - INFO - Step 12920: lr=1.00E-05, loss= 1.1153 (max= 1.4649), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:53:54,877 - root - INFO - Step 12920: lr=1.00E-05, loss= 1.1153 (max= 1.4649), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:53:54,877 - root - INFO - Step 12920: lr=1.00E-05, loss= 1.1153 (max= 1.4649), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:53:54,877 - root - INFO - Step 12920: lr=1.00E-05, loss= 1.1153 (max= 1.4649), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:54:26,761 - root - INFO - Step 12930: lr=1.00E-05, loss= 1.1093 (max= 1.5988), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 21:54:26,761 - root - INFO - Step 12930: lr=1.00E-05, loss= 1.1093 (max= 1.5988), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:54:26,762 - root - INFO - Step 12930: lr=1.00E-05, loss= 1.1093 (max= 1.5988), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:54:26,762 - root - INFO - Step 12930: lr=1.00E-05, loss= 1.1093 (max= 1.5988), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:54:26,762 - root - INFO - Step 12930: lr=1.00E-05, loss= 1.1093 (max= 1.5988), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:54:26,762 - root - INFO - Step 12930: lr=1.00E-05, loss= 1.1093 (max= 1.5988), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:54:26,762 - root - INFO - Step 12930: lr=1.00E-05, loss= 1.1093 (max= 1.5988), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:54:26,762 - root - INFO - Step 12930: lr=1.00E-05, loss= 1.1093 (max= 1.5988), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:54:58,633 - root - INFO - Step 12940: lr=1.00E-05, loss= 1.0854 (max= 1.4404), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:54:58,633 - root - INFO - Step 12940: lr=1.00E-05, loss= 1.0854 (max= 1.4404), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:54:58,633 - root - INFO - Step 12940: lr=1.00E-05, loss= 1.0854 (max= 1.4404), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:54:58,633 - root - INFO - Step 12940: lr=1.00E-05, loss= 1.0854 (max= 1.4404), tps=20565, mfu=42.85%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:54:58,633 - root - INFO - Step 12940: lr=1.00E-05, loss= 1.0854 (max= 1.4404), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:54:58,633 - root - INFO - Step 12940: lr=1.00E-05, loss= 1.0854 (max= 1.4404), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:54:58,633 - root - INFO - Step 12940: lr=1.00E-05, loss= 1.0854 (max= 1.4404), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:54:58,633 - root - INFO - Step 12940: lr=1.00E-05, loss= 1.0854 (max= 1.4404), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:55:30,418 - root - INFO - Step 12950: lr=1.00E-05, loss= 1.0807 (max= 1.5335), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:55:30,419 - root - INFO - Step 12950: lr=1.00E-05, loss= 1.0807 (max= 1.5335), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:55:30,419 - root - INFO - Step 12950: lr=1.00E-05, loss= 1.0807 (max= 1.5335), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:55:30,419 - root - INFO - Step 12950: lr=1.00E-05, loss= 1.0807 (max= 1.5335), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:55:30,419 - root - INFO - Step 12950: lr=1.00E-05, loss= 1.0807 (max= 1.5335), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:55:30,419 - root - INFO - Step 12950: lr=1.00E-05, loss= 1.0807 (max= 1.5335), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:55:30,419 - root - INFO - Step 12950: lr=1.00E-05, loss= 1.0807 
(max= 1.5335), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:55:30,419 - root - INFO - Step 12950: lr=1.00E-05, loss= 1.0807 (max= 1.5335), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:56:02,198 - root - INFO - Step 12960: lr=1.00E-05, loss= 1.1214 (max= 1.4558), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:56:02,198 - root - INFO - Step 12960: lr=1.00E-05, loss= 1.1214 (max= 1.4558), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:56:02,198 - root - INFO - Step 12960: lr=1.00E-05, loss= 1.1214 (max= 1.4558), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:56:02,198 - root - INFO - Step 12960: lr=1.00E-05, loss= 1.1214 (max= 1.4558), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:56:02,198 - root - INFO - Step 12960: lr=1.00E-05, loss= 1.1214 (max= 1.4558), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:56:02,198 - root - INFO - Step 12960: lr=1.00E-05, loss= 1.1214 (max= 1.4558), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:56:02,198 - root - INFO - Step 12960: lr=1.00E-05, loss= 1.1214 (max= 1.4558), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:56:02,198 - root - INFO - Step 12960: lr=1.00E-05, loss= 1.1214 (max= 1.4558), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:56:34,000 - root - INFO - Step 12970: lr=1.00E-05, loss= 1.1084 (max= 1.5443), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:56:34,000 - root 
- INFO - Step 12970: lr=1.00E-05, loss= 1.1084 (max= 1.5443), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:56:34,000 - root - INFO - Step 12970: lr=1.00E-05, loss= 1.1084 (max= 1.5443), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:56:34,000 - root - INFO - Step 12970: lr=1.00E-05, loss= 1.1084 (max= 1.5443), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:56:34,000 - root - INFO - Step 12970: lr=1.00E-05, loss= 1.1084 (max= 1.5443), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:56:34,000 - root - INFO - Step 12970: lr=1.00E-05, loss= 1.1084 (max= 1.5443), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:56:34,000 - root - INFO - Step 12970: lr=1.00E-05, loss= 1.1084 (max= 1.5443), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:56:34,000 - root - INFO - Step 12970: lr=1.00E-05, loss= 1.1084 (max= 1.5443), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:57:05,792 - root - INFO - Step 12980: lr=1.00E-05, loss= 1.1086 (max= 1.4366), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:57:05,792 - root - INFO - Step 12980: lr=1.00E-05, loss= 1.1086 (max= 1.4366), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:57:05,792 - root - INFO - Step 12980: lr=1.00E-05, loss= 1.1086 (max= 1.4366), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:57:05,792 - root - INFO - Step 12980: lr=1.00E-05, loss= 1.1086 (max= 1.4366), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 21:57:05,792 - root - INFO - Step 12980: lr=1.00E-05, loss= 1.1086 (max= 1.4366), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:57:05,792 - root - INFO - Step 12980: lr=1.00E-05, loss= 1.1086 (max= 1.4366), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:57:05,792 - root - INFO - Step 12980: lr=1.00E-05, loss= 1.1086 (max= 1.4366), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:57:05,793 - root - INFO - Step 12980: lr=1.00E-05, loss= 1.1086 (max= 1.4366), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:57:37,637 - root - INFO - Step 12990: lr=1.00E-05, loss= 1.1063 (max= 1.5902), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:57:37,637 - root - INFO - Step 12990: lr=1.00E-05, loss= 1.1063 (max= 1.5902), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:57:37,638 - root - INFO - Step 12990: lr=1.00E-05, loss= 1.1063 (max= 1.5902), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:57:37,638 - root - INFO - Step 12990: lr=1.00E-05, loss= 1.1063 (max= 1.5902), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:57:37,638 - root - INFO - Step 12990: lr=1.00E-05, loss= 1.1063 (max= 1.5902), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:57:37,638 - root - INFO - Step 12990: lr=1.00E-05, loss= 1.1063 (max= 1.5902), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:57:37,638 - root - INFO - Step 12990: lr=1.00E-05, loss= 1.1063 (max= 1.5902), tps=20582, mfu=42.88%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:57:37,638 - root - INFO - Step 12990: lr=1.00E-05, loss= 1.1063 (max= 1.5902), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-pt/checkpoints/dataloader/step-13000 +Dataset successfully saved to jobs/munin-7b-open-pt/checkpoints/dataloader/step-13000! Save time: 4.557758569717407 +2025-10-25 21:58:09,556 - root - INFO - Step 13000: lr=1.00E-05, loss= 1.1143 (max= 1.8197), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:58:09,557 - root - INFO - Saving a full checkpoint at step 13000 +2025-10-25 21:58:09,557 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 21:58:09,557 - root - INFO - Step 13000: lr=1.00E-05, loss= 1.1143 (max= 1.8197), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:58:09,557 - root - INFO - Step 13000: lr=1.00E-05, loss= 1.1143 (max= 1.8197), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:58:09,557 - root - INFO - Step 13000: lr=1.00E-05, loss= 1.1143 (max= 1.8197), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:58:09,557 - root - INFO - Saving a full checkpoint at step 13000 +2025-10-25 21:58:09,557 - root - INFO - Saving a full checkpoint at step 13000 +2025-10-25 21:58:09,557 - root - INFO - Step 13000: lr=1.00E-05, loss= 1.1143 (max= 1.8197), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:58:09,557 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 21:58:09,557 - root - INFO - CheckpointManager: State dict keys: 
dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 21:58:09,557 - root - INFO - Saving a full checkpoint at step 13000 +2025-10-25 21:58:09,557 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 21:58:09,557 - root - INFO - Saving a full checkpoint at step 13000 +2025-10-25 21:58:09,557 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 21:58:09,557 - root - INFO - Step 13000: lr=1.00E-05, loss= 1.1143 (max= 1.8197), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:58:09,557 - root - INFO - Step 13000: lr=1.00E-05, loss= 1.1143 (max= 1.8197), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:58:09,557 - root - INFO - Saving a full checkpoint at step 13000 +2025-10-25 21:58:09,557 - root - INFO - Saving a full checkpoint at step 13000 +2025-10-25 21:58:09,557 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 21:58:09,557 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 21:58:09,557 - root - INFO - Step 13000: lr=1.00E-05, loss= 1.1143 (max= 1.8197), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:58:09,557 - root - INFO - Saving a full checkpoint at step 13000 +2025-10-25 21:58:09,557 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 21:58:23,332 - root - INFO - Finished saving the checkpoint in 13.78 seconds +2025-10-25 21:58:23,340 - root - INFO - Finished saving the checkpoint in 13.78 seconds +2025-10-25 21:58:23,340 - root - INFO - Finished saving the checkpoint in 13.78 seconds 
+2025-10-25 21:58:23,340 - root - INFO - Finished saving the checkpoint in 13.78 seconds +2025-10-25 21:58:23,341 - root - INFO - Finished saving the checkpoint in 13.78 seconds +2025-10-25 21:58:23,341 - root - INFO - Finished saving the checkpoint in 13.78 seconds +2025-10-25 21:58:23,342 - root - INFO - Finished saving the checkpoint in 13.78 seconds +2025-10-25 21:58:23,342 - root - INFO - Finished saving the checkpoint in 13.78 seconds +2025-10-25 21:58:55,186 - root - INFO - Step 13010: lr=1.00E-05, loss= 1.1067 (max= 1.4926), tps=14364, mfu=29.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:58:55,186 - root - INFO - Step 13010: lr=1.00E-05, loss= 1.1067 (max= 1.4926), tps=14364, mfu=29.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:58:55,186 - root - INFO - Step 13010: lr=1.00E-05, loss= 1.1067 (max= 1.4926), tps=14364, mfu=29.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:58:55,186 - root - INFO - Step 13010: lr=1.00E-05, loss= 1.1067 (max= 1.4926), tps=14364, mfu=29.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:58:55,186 - root - INFO - Step 13010: lr=1.00E-05, loss= 1.1067 (max= 1.4926), tps=14364, mfu=29.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:58:55,186 - root - INFO - Step 13010: lr=1.00E-05, loss= 1.1067 (max= 1.4926), tps=14364, mfu=29.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:58:55,186 - root - INFO - Step 13010: lr=1.00E-05, loss= 1.1067 (max= 1.4926), tps=14364, mfu=29.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:58:55,186 - root - INFO - Step 13010: lr=1.00E-05, loss= 1.1067 (max= 1.4926), tps=14364, mfu=29.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:59:27,106 - root - INFO - Step 13020: lr=1.00E-05, 
loss= 1.0930 (max= 1.6251), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:59:27,107 - root - INFO - Step 13020: lr=1.00E-05, loss= 1.0930 (max= 1.6251), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:59:27,107 - root - INFO - Step 13020: lr=1.00E-05, loss= 1.0930 (max= 1.6251), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:59:27,107 - root - INFO - Step 13020: lr=1.00E-05, loss= 1.0930 (max= 1.6251), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:59:27,107 - root - INFO - Step 13020: lr=1.00E-05, loss= 1.0930 (max= 1.6251), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:59:27,107 - root - INFO - Step 13020: lr=1.00E-05, loss= 1.0930 (max= 1.6251), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:59:27,107 - root - INFO - Step 13020: lr=1.00E-05, loss= 1.0930 (max= 1.6251), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:59:27,107 - root - INFO - Step 13020: lr=1.00E-05, loss= 1.0930 (max= 1.6251), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:59:58,876 - root - INFO - Step 13030: lr=1.00E-05, loss= 1.1112 (max= 1.4758), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:59:58,876 - root - INFO - Step 13030: lr=1.00E-05, loss= 1.1112 (max= 1.4758), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:59:58,876 - root - INFO - Step 13030: lr=1.00E-05, loss= 1.1112 (max= 1.4758), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
21:59:58,876 - root - INFO - Step 13030: lr=1.00E-05, loss= 1.1112 (max= 1.4758), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:59:58,876 - root - INFO - Step 13030: lr=1.00E-05, loss= 1.1112 (max= 1.4758), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:59:58,876 - root - INFO - Step 13030: lr=1.00E-05, loss= 1.1112 (max= 1.4758), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:59:58,876 - root - INFO - Step 13030: lr=1.00E-05, loss= 1.1112 (max= 1.4758), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 21:59:58,876 - root - INFO - Step 13030: lr=1.00E-05, loss= 1.1112 (max= 1.4758), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:00:30,806 - root - INFO - Step 13040: lr=1.00E-05, loss= 1.0942 (max= 1.5085), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:00:30,806 - root - INFO - Step 13040: lr=1.00E-05, loss= 1.0942 (max= 1.5085), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:00:30,807 - root - INFO - Step 13040: lr=1.00E-05, loss= 1.0942 (max= 1.5085), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:00:30,807 - root - INFO - Step 13040: lr=1.00E-05, loss= 1.0942 (max= 1.5085), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:00:30,807 - root - INFO - Step 13040: lr=1.00E-05, loss= 1.0942 (max= 1.5085), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:00:30,807 - root - INFO - Step 13040: lr=1.00E-05, loss= 1.0942 (max= 1.5085), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:00:30,807 - root - INFO - Step 13040: lr=1.00E-05, loss= 1.0942 (max= 1.5085), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:00:30,807 - root - INFO - Step 13040: lr=1.00E-05, loss= 1.0942 (max= 1.5085), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:01:02,636 - root - INFO - Step 13050: lr=1.00E-05, loss= 1.0936 (max= 1.5983), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:01:02,636 - root - INFO - Step 13050: lr=1.00E-05, loss= 1.0936 (max= 1.5983), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:01:02,636 - root - INFO - Step 13050: lr=1.00E-05, loss= 1.0936 (max= 1.5983), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:01:02,636 - root - INFO - Step 13050: lr=1.00E-05, loss= 1.0936 (max= 1.5983), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:01:02,636 - root - INFO - Step 13050: lr=1.00E-05, loss= 1.0936 (max= 1.5983), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:01:02,636 - root - INFO - Step 13050: lr=1.00E-05, loss= 1.0936 (max= 1.5983), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:01:02,636 - root - INFO - Step 13050: lr=1.00E-05, loss= 1.0936 (max= 1.5983), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:01:02,636 - root - INFO - Step 13050: lr=1.00E-05, loss= 1.0936 (max= 1.5983), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:01:34,488 - root - INFO - Step 13060: lr=1.00E-05, loss= 1.1194 (max= 1.4809), 
tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:01:34,488 - root - INFO - Step 13060: lr=1.00E-05, loss= 1.1194 (max= 1.4809), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:01:34,488 - root - INFO - Step 13060: lr=1.00E-05, loss= 1.1194 (max= 1.4809), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:01:34,488 - root - INFO - Step 13060: lr=1.00E-05, loss= 1.1194 (max= 1.4809), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:01:34,488 - root - INFO - Step 13060: lr=1.00E-05, loss= 1.1194 (max= 1.4809), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:01:34,488 - root - INFO - Step 13060: lr=1.00E-05, loss= 1.1194 (max= 1.4809), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:01:34,488 - root - INFO - Step 13060: lr=1.00E-05, loss= 1.1194 (max= 1.4809), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:01:34,488 - root - INFO - Step 13060: lr=1.00E-05, loss= 1.1194 (max= 1.4809), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:02:06,414 - root - INFO - Step 13070: lr=1.00E-05, loss= 1.1049 (max= 1.7833), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:02:06,414 - root - INFO - Step 13070: lr=1.00E-05, loss= 1.1049 (max= 1.7833), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:02:06,414 - root - INFO - Step 13070: lr=1.00E-05, loss= 1.1049 (max= 1.7833), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:02:06,414 - root - INFO - Step 
13070: lr=1.00E-05, loss= 1.1049 (max= 1.7833), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:02:06,414 - root - INFO - Step 13070: lr=1.00E-05, loss= 1.1049 (max= 1.7833), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:02:06,414 - root - INFO - Step 13070: lr=1.00E-05, loss= 1.1049 (max= 1.7833), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:02:06,414 - root - INFO - Step 13070: lr=1.00E-05, loss= 1.1049 (max= 1.7833), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:02:06,414 - root - INFO - Step 13070: lr=1.00E-05, loss= 1.1049 (max= 1.7833), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:02:38,248 - root - INFO - Step 13080: lr=1.00E-05, loss= 1.0896 (max= 1.6204), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:02:38,248 - root - INFO - Step 13080: lr=1.00E-05, loss= 1.0896 (max= 1.6204), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:02:38,248 - root - INFO - Step 13080: lr=1.00E-05, loss= 1.0896 (max= 1.6204), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:02:38,248 - root - INFO - Step 13080: lr=1.00E-05, loss= 1.0896 (max= 1.6204), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:02:38,248 - root - INFO - Step 13080: lr=1.00E-05, loss= 1.0896 (max= 1.6204), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:02:38,248 - root - INFO - Step 13080: lr=1.00E-05, loss= 1.0896 (max= 1.6204), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 22:02:38,248 - root - INFO - Step 13080: lr=1.00E-05, loss= 1.0896 (max= 1.6204), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:02:38,248 - root - INFO - Step 13080: lr=1.00E-05, loss= 1.0896 (max= 1.6204), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:03:10,113 - root - INFO - Step 13090: lr=1.00E-05, loss= 1.0862 (max= 1.5190), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:03:10,113 - root - INFO - Step 13090: lr=1.00E-05, loss= 1.0862 (max= 1.5190), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:03:10,113 - root - INFO - Step 13090: lr=1.00E-05, loss= 1.0862 (max= 1.5190), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:03:10,113 - root - INFO - Step 13090: lr=1.00E-05, loss= 1.0862 (max= 1.5190), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:03:10,113 - root - INFO - Step 13090: lr=1.00E-05, loss= 1.0862 (max= 1.5190), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:03:10,113 - root - INFO - Step 13090: lr=1.00E-05, loss= 1.0862 (max= 1.5190), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:03:10,113 - root - INFO - Step 13090: lr=1.00E-05, loss= 1.0862 (max= 1.5190), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:03:10,113 - root - INFO - Step 13090: lr=1.00E-05, loss= 1.0862 (max= 1.5190), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:03:42,056 - root - INFO - Step 13100: lr=1.00E-05, loss= 1.1115 (max= 1.6350), tps=20519, mfu=42.75%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:03:42,056 - root - INFO - Step 13100: lr=1.00E-05, loss= 1.1115 (max= 1.6350), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:03:42,056 - root - INFO - Step 13100: lr=1.00E-05, loss= 1.1115 (max= 1.6350), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:03:42,056 - root - INFO - Step 13100: lr=1.00E-05, loss= 1.1115 (max= 1.6350), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:03:42,056 - root - INFO - Step 13100: lr=1.00E-05, loss= 1.1115 (max= 1.6350), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:03:42,056 - root - INFO - Step 13100: lr=1.00E-05, loss= 1.1115 (max= 1.6350), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:03:42,056 - root - INFO - Step 13100: lr=1.00E-05, loss= 1.1115 (max= 1.6350), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:03:42,056 - root - INFO - Step 13100: lr=1.00E-05, loss= 1.1115 (max= 1.6350), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:04:13,877 - root - INFO - Step 13110: lr=1.00E-05, loss= 1.0923 (max= 1.4712), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:04:13,877 - root - INFO - Step 13110: lr=1.00E-05, loss= 1.0923 (max= 1.4712), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:04:13,877 - root - INFO - Step 13110: lr=1.00E-05, loss= 1.0923 (max= 1.4712), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:04:13,877 - root - INFO - Step 13110: lr=1.00E-05, loss= 1.0923 
(max= 1.4712), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:04:13,877 - root - INFO - Step 13110: lr=1.00E-05, loss= 1.0923 (max= 1.4712), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:04:13,877 - root - INFO - Step 13110: lr=1.00E-05, loss= 1.0923 (max= 1.4712), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:04:13,877 - root - INFO - Step 13110: lr=1.00E-05, loss= 1.0923 (max= 1.4712), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:04:13,877 - root - INFO - Step 13110: lr=1.00E-05, loss= 1.0923 (max= 1.4712), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:04:45,778 - root - INFO - Step 13120: lr=1.00E-05, loss= 1.1210 (max= 1.5453), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:04:45,778 - root - INFO - Step 13120: lr=1.00E-05, loss= 1.1210 (max= 1.5453), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:04:45,778 - root - INFO - Step 13120: lr=1.00E-05, loss= 1.1210 (max= 1.5453), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:04:45,778 - root - INFO - Step 13120: lr=1.00E-05, loss= 1.1210 (max= 1.5453), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:04:45,778 - root - INFO - Step 13120: lr=1.00E-05, loss= 1.1210 (max= 1.5453), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:04:45,778 - root - INFO - Step 13120: lr=1.00E-05, loss= 1.1210 (max= 1.5453), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:04:45,778 - root 
- INFO - Step 13120: lr=1.00E-05, loss= 1.1210 (max= 1.5453), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:04:45,778 - root - INFO - Step 13120: lr=1.00E-05, loss= 1.1210 (max= 1.5453), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:05:17,678 - root - INFO - Step 13130: lr=1.00E-05, loss= 1.1219 (max= 1.5885), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:05:17,678 - root - INFO - Step 13130: lr=1.00E-05, loss= 1.1219 (max= 1.5885), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:05:17,678 - root - INFO - Step 13130: lr=1.00E-05, loss= 1.1219 (max= 1.5885), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:05:17,678 - root - INFO - Step 13130: lr=1.00E-05, loss= 1.1219 (max= 1.5885), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:05:17,678 - root - INFO - Step 13130: lr=1.00E-05, loss= 1.1219 (max= 1.5885), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:05:17,678 - root - INFO - Step 13130: lr=1.00E-05, loss= 1.1219 (max= 1.5885), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:05:17,678 - root - INFO - Step 13130: lr=1.00E-05, loss= 1.1219 (max= 1.5885), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:05:17,679 - root - INFO - Step 13130: lr=1.00E-05, loss= 1.1219 (max= 1.5885), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:05:49,572 - root - INFO - Step 13140: lr=1.00E-05, loss= 1.0973 (max= 1.5099), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 22:05:49,572 - root - INFO - Step 13140: lr=1.00E-05, loss= 1.0973 (max= 1.5099), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:05:49,572 - root - INFO - Step 13140: lr=1.00E-05, loss= 1.0973 (max= 1.5099), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:05:49,572 - root - INFO - Step 13140: lr=1.00E-05, loss= 1.0973 (max= 1.5099), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:05:49,572 - root - INFO - Step 13140: lr=1.00E-05, loss= 1.0973 (max= 1.5099), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:05:49,572 - root - INFO - Step 13140: lr=1.00E-05, loss= 1.0973 (max= 1.5099), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:05:49,572 - root - INFO - Step 13140: lr=1.00E-05, loss= 1.0973 (max= 1.5099), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:05:49,572 - root - INFO - Step 13140: lr=1.00E-05, loss= 1.0973 (max= 1.5099), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:06:21,370 - root - INFO - Step 13150: lr=1.00E-05, loss= 1.1014 (max= 1.5404), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:06:21,370 - root - INFO - Step 13150: lr=1.00E-05, loss= 1.1014 (max= 1.5404), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:06:21,370 - root - INFO - Step 13150: lr=1.00E-05, loss= 1.1014 (max= 1.5404), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:06:21,370 - root - INFO - Step 13150: lr=1.00E-05, loss= 1.1014 (max= 1.5404), tps=20612, mfu=42.95%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:06:21,370 - root - INFO - Step 13150: lr=1.00E-05, loss= 1.1014 (max= 1.5404), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:06:21,370 - root - INFO - Step 13150: lr=1.00E-05, loss= 1.1014 (max= 1.5404), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:06:21,370 - root - INFO - Step 13150: lr=1.00E-05, loss= 1.1014 (max= 1.5404), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:06:21,370 - root - INFO - Step 13150: lr=1.00E-05, loss= 1.1014 (max= 1.5404), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:06:53,191 - root - INFO - Step 13160: lr=1.00E-05, loss= 1.0868 (max= 1.5699), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:06:53,191 - root - INFO - Step 13160: lr=1.00E-05, loss= 1.0868 (max= 1.5699), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:06:53,191 - root - INFO - Step 13160: lr=1.00E-05, loss= 1.0868 (max= 1.5699), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:06:53,192 - root - INFO - Step 13160: lr=1.00E-05, loss= 1.0868 (max= 1.5699), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:06:53,192 - root - INFO - Step 13160: lr=1.00E-05, loss= 1.0868 (max= 1.5699), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:06:53,192 - root - INFO - Step 13160: lr=1.00E-05, loss= 1.0868 (max= 1.5699), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:06:53,192 - root - INFO - Step 13160: lr=1.00E-05, 
loss= 1.0868 (max= 1.5699), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:06:53,192 - root - INFO - Step 13160: lr=1.00E-05, loss= 1.0868 (max= 1.5699), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:07:25,056 - root - INFO - Step 13170: lr=1.00E-05, loss= 1.0981 (max= 1.5409), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:07:25,057 - root - INFO - Step 13170: lr=1.00E-05, loss= 1.0981 (max= 1.5409), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:07:25,057 - root - INFO - Step 13170: lr=1.00E-05, loss= 1.0981 (max= 1.5409), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:07:25,057 - root - INFO - Step 13170: lr=1.00E-05, loss= 1.0981 (max= 1.5409), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:07:25,057 - root - INFO - Step 13170: lr=1.00E-05, loss= 1.0981 (max= 1.5409), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:07:25,057 - root - INFO - Step 13170: lr=1.00E-05, loss= 1.0981 (max= 1.5409), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:07:25,057 - root - INFO - Step 13170: lr=1.00E-05, loss= 1.0981 (max= 1.5409), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:07:25,057 - root - INFO - Step 13170: lr=1.00E-05, loss= 1.0981 (max= 1.5409), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:07:56,846 - root - INFO - Step 13180: lr=1.00E-05, loss= 1.1116 (max= 1.4536), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
22:07:56,846 - root - INFO - Step 13180: lr=1.00E-05, loss= 1.1116 (max= 1.4536), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:07:56,846 - root - INFO - Step 13180: lr=1.00E-05, loss= 1.1116 (max= 1.4536), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:07:56,846 - root - INFO - Step 13180: lr=1.00E-05, loss= 1.1116 (max= 1.4536), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:07:56,846 - root - INFO - Step 13180: lr=1.00E-05, loss= 1.1116 (max= 1.4536), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:07:56,846 - root - INFO - Step 13180: lr=1.00E-05, loss= 1.1116 (max= 1.4536), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:07:56,846 - root - INFO - Step 13180: lr=1.00E-05, loss= 1.1116 (max= 1.4536), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:07:56,846 - root - INFO - Step 13180: lr=1.00E-05, loss= 1.1116 (max= 1.4536), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:08:28,750 - root - INFO - Step 13190: lr=1.00E-05, loss= 1.1423 (max= 1.6662), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:08:28,750 - root - INFO - Step 13190: lr=1.00E-05, loss= 1.1423 (max= 1.6662), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:08:28,750 - root - INFO - Step 13190: lr=1.00E-05, loss= 1.1423 (max= 1.6662), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:08:28,750 - root - INFO - Step 13190: lr=1.00E-05, loss= 1.1423 (max= 1.6662), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:08:28,750 - root - INFO - Step 13190: lr=1.00E-05, loss= 1.1423 (max= 1.6662), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:08:28,750 - root - INFO - Step 13190: lr=1.00E-05, loss= 1.1423 (max= 1.6662), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:08:28,750 - root - INFO - Step 13190: lr=1.00E-05, loss= 1.1423 (max= 1.6662), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:08:28,750 - root - INFO - Step 13190: lr=1.00E-05, loss= 1.1423 (max= 1.6662), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:09:00,606 - root - INFO - Step 13200: lr=1.00E-05, loss= 1.1064 (max= 1.4995), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:09:00,606 - root - INFO - Step 13200: lr=1.00E-05, loss= 1.1064 (max= 1.4995), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:09:00,606 - root - INFO - Step 13200: lr=1.00E-05, loss= 1.1064 (max= 1.4995), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:09:00,606 - root - INFO - Step 13200: lr=1.00E-05, loss= 1.1064 (max= 1.4995), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:09:00,606 - root - INFO - Step 13200: lr=1.00E-05, loss= 1.1064 (max= 1.4995), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:09:00,606 - root - INFO - Step 13200: lr=1.00E-05, loss= 1.1064 (max= 1.4995), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:09:00,606 - root - INFO - Step 13200: lr=1.00E-05, loss= 1.1064 (max= 1.4995), 
tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:09:00,606 - root - INFO - Step 13200: lr=1.00E-05, loss= 1.1064 (max= 1.4995), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:09:32,548 - root - INFO - Step 13210: lr=1.00E-05, loss= 1.0982 (max= 1.5931), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:09:32,548 - root - INFO - Step 13210: lr=1.00E-05, loss= 1.0982 (max= 1.5931), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:09:32,548 - root - INFO - Step 13210: lr=1.00E-05, loss= 1.0982 (max= 1.5931), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:09:32,548 - root - INFO - Step 13210: lr=1.00E-05, loss= 1.0982 (max= 1.5931), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:09:32,548 - root - INFO - Step 13210: lr=1.00E-05, loss= 1.0982 (max= 1.5931), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:09:32,548 - root - INFO - Step 13210: lr=1.00E-05, loss= 1.0982 (max= 1.5931), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:09:32,548 - root - INFO - Step 13210: lr=1.00E-05, loss= 1.0982 (max= 1.5931), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:09:32,548 - root - INFO - Step 13210: lr=1.00E-05, loss= 1.0982 (max= 1.5931), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:10:04,390 - root - INFO - Step 13220: lr=1.00E-05, loss= 1.1005 (max= 1.5905), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:10:04,390 - root - INFO - Step 
13220: lr=1.00E-05, loss= 1.1005 (max= 1.5905), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:10:04,390 - root - INFO - Step 13220: lr=1.00E-05, loss= 1.1005 (max= 1.5905), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:10:04,390 - root - INFO - Step 13220: lr=1.00E-05, loss= 1.1005 (max= 1.5905), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:10:04,390 - root - INFO - Step 13220: lr=1.00E-05, loss= 1.1005 (max= 1.5905), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:10:04,390 - root - INFO - Step 13220: lr=1.00E-05, loss= 1.1005 (max= 1.5905), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:10:04,390 - root - INFO - Step 13220: lr=1.00E-05, loss= 1.1005 (max= 1.5905), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:10:04,390 - root - INFO - Step 13220: lr=1.00E-05, loss= 1.1005 (max= 1.5905), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:10:36,248 - root - INFO - Step 13230: lr=1.00E-05, loss= 1.1266 (max= 1.5370), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:10:36,249 - root - INFO - Step 13230: lr=1.00E-05, loss= 1.1266 (max= 1.5370), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:10:36,249 - root - INFO - Step 13230: lr=1.00E-05, loss= 1.1266 (max= 1.5370), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:10:36,249 - root - INFO - Step 13230: lr=1.00E-05, loss= 1.1266 (max= 1.5370), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 22:10:36,249 - root - INFO - Step 13230: lr=1.00E-05, loss= 1.1266 (max= 1.5370), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:10:36,249 - root - INFO - Step 13230: lr=1.00E-05, loss= 1.1266 (max= 1.5370), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:10:36,249 - root - INFO - Step 13230: lr=1.00E-05, loss= 1.1266 (max= 1.5370), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:10:36,249 - root - INFO - Step 13230: lr=1.00E-05, loss= 1.1266 (max= 1.5370), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:11:08,074 - root - INFO - Step 13240: lr=1.00E-05, loss= 1.1116 (max= 1.4995), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:11:08,074 - root - INFO - Step 13240: lr=1.00E-05, loss= 1.1116 (max= 1.4995), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:11:08,074 - root - INFO - Step 13240: lr=1.00E-05, loss= 1.1116 (max= 1.4995), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:11:08,074 - root - INFO - Step 13240: lr=1.00E-05, loss= 1.1116 (max= 1.4995), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:11:08,074 - root - INFO - Step 13240: lr=1.00E-05, loss= 1.1116 (max= 1.4995), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:11:08,074 - root - INFO - Step 13240: lr=1.00E-05, loss= 1.1116 (max= 1.4995), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:11:08,074 - root - INFO - Step 13240: lr=1.00E-05, loss= 1.1116 (max= 1.4995), tps=20594, mfu=42.91%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:11:08,074 - root - INFO - Step 13240: lr=1.00E-05, loss= 1.1116 (max= 1.4995), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:11:39,950 - root - INFO - Step 13250: lr=1.00E-05, loss= 1.1301 (max= 1.7718), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:11:39,950 - root - INFO - Step 13250: lr=1.00E-05, loss= 1.1301 (max= 1.7718), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:11:39,950 - root - INFO - Step 13250: lr=1.00E-05, loss= 1.1301 (max= 1.7718), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:11:39,950 - root - INFO - Step 13250: lr=1.00E-05, loss= 1.1301 (max= 1.7718), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:11:39,951 - root - INFO - Step 13250: lr=1.00E-05, loss= 1.1301 (max= 1.7718), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:11:39,951 - root - INFO - Step 13250: lr=1.00E-05, loss= 1.1301 (max= 1.7718), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:11:39,951 - root - INFO - Step 13250: lr=1.00E-05, loss= 1.1301 (max= 1.7718), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:11:39,951 - root - INFO - Step 13250: lr=1.00E-05, loss= 1.1301 (max= 1.7718), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:12:11,818 - root - INFO - Step 13260: lr=1.00E-05, loss= 1.1031 (max= 1.5993), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:12:11,818 - root - INFO - Step 13260: lr=1.00E-05, loss= 1.1031 
(max= 1.5993), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:12:11,818 - root - INFO - Step 13260: lr=1.00E-05, loss= 1.1031 (max= 1.5993), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:12:11,818 - root - INFO - Step 13260: lr=1.00E-05, loss= 1.1031 (max= 1.5993), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:12:11,818 - root - INFO - Step 13260: lr=1.00E-05, loss= 1.1031 (max= 1.5993), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:12:11,818 - root - INFO - Step 13260: lr=1.00E-05, loss= 1.1031 (max= 1.5993), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:12:11,818 - root - INFO - Step 13260: lr=1.00E-05, loss= 1.1031 (max= 1.5993), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:12:11,818 - root - INFO - Step 13260: lr=1.00E-05, loss= 1.1031 (max= 1.5993), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:12:43,659 - root - INFO - Step 13270: lr=1.00E-05, loss= 1.1150 (max= 1.5043), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:12:43,659 - root - INFO - Step 13270: lr=1.00E-05, loss= 1.1150 (max= 1.5043), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:12:43,659 - root - INFO - Step 13270: lr=1.00E-05, loss= 1.1150 (max= 1.5043), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:12:43,659 - root - INFO - Step 13270: lr=1.00E-05, loss= 1.1150 (max= 1.5043), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:12:43,659 - root 
- INFO - Step 13270: lr=1.00E-05, loss= 1.1150 (max= 1.5043), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:12:43,659 - root - INFO - Step 13270: lr=1.00E-05, loss= 1.1150 (max= 1.5043), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:12:43,659 - root - INFO - Step 13270: lr=1.00E-05, loss= 1.1150 (max= 1.5043), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:12:43,660 - root - INFO - Step 13270: lr=1.00E-05, loss= 1.1150 (max= 1.5043), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:13:15,542 - root - INFO - Step 13280: lr=1.00E-05, loss= 1.0857 (max= 1.5650), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:13:15,542 - root - INFO - Step 13280: lr=1.00E-05, loss= 1.0857 (max= 1.5650), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:13:15,542 - root - INFO - Step 13280: lr=1.00E-05, loss= 1.0857 (max= 1.5650), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:13:15,542 - root - INFO - Step 13280: lr=1.00E-05, loss= 1.0857 (max= 1.5650), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:13:15,542 - root - INFO - Step 13280: lr=1.00E-05, loss= 1.0857 (max= 1.5650), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:13:15,542 - root - INFO - Step 13280: lr=1.00E-05, loss= 1.0857 (max= 1.5650), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:13:15,543 - root - INFO - Step 13280: lr=1.00E-05, loss= 1.0857 (max= 1.5650), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 22:13:15,543 - root - INFO - Step 13280: lr=1.00E-05, loss= 1.0857 (max= 1.5650), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:13:47,426 - root - INFO - Step 13290: lr=1.00E-05, loss= 1.1037 (max= 1.5440), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:13:47,426 - root - INFO - Step 13290: lr=1.00E-05, loss= 1.1037 (max= 1.5440), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:13:47,426 - root - INFO - Step 13290: lr=1.00E-05, loss= 1.1037 (max= 1.5440), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:13:47,426 - root - INFO - Step 13290: lr=1.00E-05, loss= 1.1037 (max= 1.5440), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:13:47,426 - root - INFO - Step 13290: lr=1.00E-05, loss= 1.1037 (max= 1.5440), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:13:47,426 - root - INFO - Step 13290: lr=1.00E-05, loss= 1.1037 (max= 1.5440), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:13:47,427 - root - INFO - Step 13290: lr=1.00E-05, loss= 1.1037 (max= 1.5440), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:13:47,427 - root - INFO - Step 13290: lr=1.00E-05, loss= 1.1037 (max= 1.5440), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:14:19,298 - root - INFO - Step 13300: lr=1.00E-05, loss= 1.1044 (max= 1.4814), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:14:19,298 - root - INFO - Step 13300: lr=1.00E-05, loss= 1.1044 (max= 1.4814), tps=20565, mfu=42.85%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:14:19,298 - root - INFO - Step 13300: lr=1.00E-05, loss= 1.1044 (max= 1.4814), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:14:19,298 - root - INFO - Step 13300: lr=1.00E-05, loss= 1.1044 (max= 1.4814), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:14:19,298 - root - INFO - Step 13300: lr=1.00E-05, loss= 1.1044 (max= 1.4814), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:14:19,298 - root - INFO - Step 13300: lr=1.00E-05, loss= 1.1044 (max= 1.4814), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:14:19,298 - root - INFO - Step 13300: lr=1.00E-05, loss= 1.1044 (max= 1.4814), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:14:19,298 - root - INFO - Step 13300: lr=1.00E-05, loss= 1.1044 (max= 1.4814), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:14:51,147 - root - INFO - Step 13310: lr=1.00E-05, loss= 1.1000 (max= 1.5050), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:14:51,147 - root - INFO - Step 13310: lr=1.00E-05, loss= 1.1000 (max= 1.5050), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:14:51,147 - root - INFO - Step 13310: lr=1.00E-05, loss= 1.1000 (max= 1.5050), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:14:51,147 - root - INFO - Step 13310: lr=1.00E-05, loss= 1.1000 (max= 1.5050), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:14:51,147 - root - INFO - Step 13310: lr=1.00E-05, 
loss= 1.1000 (max= 1.5050), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:14:51,147 - root - INFO - Step 13310: lr=1.00E-05, loss= 1.1000 (max= 1.5050), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:14:51,147 - root - INFO - Step 13310: lr=1.00E-05, loss= 1.1000 (max= 1.5050), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:14:51,147 - root - INFO - Step 13310: lr=1.00E-05, loss= 1.1000 (max= 1.5050), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:15:09,360 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:224572 +2025-10-25 22:15:22,940 - root - INFO - Step 13320: lr=1.00E-05, loss= 1.1004 (max= 1.6382), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:15:22,940 - root - INFO - Step 13320: lr=1.00E-05, loss= 1.1004 (max= 1.6382), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:15:22,940 - root - INFO - Step 13320: lr=1.00E-05, loss= 1.1004 (max= 1.6382), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:15:22,940 - root - INFO - Step 13320: lr=1.00E-05, loss= 1.1004 (max= 1.6382), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:15:22,940 - root - INFO - Step 13320: lr=1.00E-05, loss= 1.1004 (max= 1.6382), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:15:22,940 - root - INFO - Step 13320: lr=1.00E-05, loss= 1.1004 (max= 1.6382), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
22:15:22,940 - root - INFO - Step 13320: lr=1.00E-05, loss= 1.1004 (max= 1.6382), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:15:22,940 - root - INFO - Step 13320: lr=1.00E-05, loss= 1.1004 (max= 1.6382), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:15:54,785 - root - INFO - Step 13330: lr=1.00E-05, loss= 1.1028 (max= 1.5825), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:15:54,785 - root - INFO - Step 13330: lr=1.00E-05, loss= 1.1028 (max= 1.5825), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:15:54,785 - root - INFO - Step 13330: lr=1.00E-05, loss= 1.1028 (max= 1.5825), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:15:54,785 - root - INFO - Step 13330: lr=1.00E-05, loss= 1.1028 (max= 1.5825), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:15:54,785 - root - INFO - Step 13330: lr=1.00E-05, loss= 1.1028 (max= 1.5825), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:15:54,785 - root - INFO - Step 13330: lr=1.00E-05, loss= 1.1028 (max= 1.5825), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:15:54,785 - root - INFO - Step 13330: lr=1.00E-05, loss= 1.1028 (max= 1.5825), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:15:54,785 - root - INFO - Step 13330: lr=1.00E-05, loss= 1.1028 (max= 1.5825), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:16:26,674 - root - INFO - Step 13340: lr=1.00E-05, loss= 1.1019 (max= 1.5955), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:16:26,674 - root - INFO - Step 13340: lr=1.00E-05, loss= 1.1019 (max= 1.5955), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:16:26,674 - root - INFO - Step 13340: lr=1.00E-05, loss= 1.1019 (max= 1.5955), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:16:26,674 - root - INFO - Step 13340: lr=1.00E-05, loss= 1.1019 (max= 1.5955), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:16:26,674 - root - INFO - Step 13340: lr=1.00E-05, loss= 1.1019 (max= 1.5955), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:16:26,674 - root - INFO - Step 13340: lr=1.00E-05, loss= 1.1019 (max= 1.5955), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:16:26,674 - root - INFO - Step 13340: lr=1.00E-05, loss= 1.1019 (max= 1.5955), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:16:26,674 - root - INFO - Step 13340: lr=1.00E-05, loss= 1.1019 (max= 1.5955), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:16:58,515 - root - INFO - Step 13350: lr=1.00E-05, loss= 1.0926 (max= 1.4596), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:16:58,515 - root - INFO - Step 13350: lr=1.00E-05, loss= 1.0926 (max= 1.4596), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:16:58,515 - root - INFO - Step 13350: lr=1.00E-05, loss= 1.0926 (max= 1.4596), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:16:58,515 - root - INFO - Step 13350: lr=1.00E-05, loss= 1.0926 (max= 1.4596), 
tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:16:58,516 - root - INFO - Step 13350: lr=1.00E-05, loss= 1.0926 (max= 1.4596), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:16:58,516 - root - INFO - Step 13350: lr=1.00E-05, loss= 1.0926 (max= 1.4596), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:16:58,516 - root - INFO - Step 13350: lr=1.00E-05, loss= 1.0926 (max= 1.4596), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:16:58,516 - root - INFO - Step 13350: lr=1.00E-05, loss= 1.0926 (max= 1.4596), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:17:30,432 - root - INFO - Step 13360: lr=1.00E-05, loss= 1.1037 (max= 1.4657), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:17:30,432 - root - INFO - Step 13360: lr=1.00E-05, loss= 1.1037 (max= 1.4657), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:17:30,432 - root - INFO - Step 13360: lr=1.00E-05, loss= 1.1037 (max= 1.4657), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:17:30,432 - root - INFO - Step 13360: lr=1.00E-05, loss= 1.1037 (max= 1.4657), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:17:30,432 - root - INFO - Step 13360: lr=1.00E-05, loss= 1.1037 (max= 1.4657), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:17:30,432 - root - INFO - Step 13360: lr=1.00E-05, loss= 1.1037 (max= 1.4657), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:17:30,432 - root - INFO - Step 
13360: lr=1.00E-05, loss= 1.1037 (max= 1.4657), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:17:30,432 - root - INFO - Step 13360: lr=1.00E-05, loss= 1.1037 (max= 1.4657), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:18:02,254 - root - INFO - Step 13370: lr=1.00E-05, loss= 1.1176 (max= 1.5707), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:18:02,254 - root - INFO - Step 13370: lr=1.00E-05, loss= 1.1176 (max= 1.5707), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:18:02,254 - root - INFO - Step 13370: lr=1.00E-05, loss= 1.1176 (max= 1.5707), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:18:02,254 - root - INFO - Step 13370: lr=1.00E-05, loss= 1.1176 (max= 1.5707), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:18:02,254 - root - INFO - Step 13370: lr=1.00E-05, loss= 1.1176 (max= 1.5707), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:18:02,254 - root - INFO - Step 13370: lr=1.00E-05, loss= 1.1176 (max= 1.5707), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:18:02,254 - root - INFO - Step 13370: lr=1.00E-05, loss= 1.1176 (max= 1.5707), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:18:02,254 - root - INFO - Step 13370: lr=1.00E-05, loss= 1.1176 (max= 1.5707), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:18:34,094 - root - INFO - Step 13380: lr=1.00E-05, loss= 1.1168 (max= 1.5395), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 22:18:34,094 - root - INFO - Step 13380: lr=1.00E-05, loss= 1.1168 (max= 1.5395), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:18:34,094 - root - INFO - Step 13380: lr=1.00E-05, loss= 1.1168 (max= 1.5395), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:18:34,094 - root - INFO - Step 13380: lr=1.00E-05, loss= 1.1168 (max= 1.5395), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:18:34,094 - root - INFO - Step 13380: lr=1.00E-05, loss= 1.1168 (max= 1.5395), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:18:34,094 - root - INFO - Step 13380: lr=1.00E-05, loss= 1.1168 (max= 1.5395), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:18:34,094 - root - INFO - Step 13380: lr=1.00E-05, loss= 1.1168 (max= 1.5395), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:18:34,094 - root - INFO - Step 13380: lr=1.00E-05, loss= 1.1168 (max= 1.5395), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:19:05,962 - root - INFO - Step 13390: lr=1.00E-05, loss= 1.0816 (max= 1.5237), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:19:05,962 - root - INFO - Step 13390: lr=1.00E-05, loss= 1.0816 (max= 1.5237), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:19:05,962 - root - INFO - Step 13390: lr=1.00E-05, loss= 1.0816 (max= 1.5237), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:19:05,962 - root - INFO - Step 13390: lr=1.00E-05, loss= 1.0816 (max= 1.5237), tps=20567, mfu=42.85%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:19:05,962 - root - INFO - Step 13390: lr=1.00E-05, loss= 1.0816 (max= 1.5237), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:19:05,962 - root - INFO - Step 13390: lr=1.00E-05, loss= 1.0816 (max= 1.5237), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:19:05,962 - root - INFO - Step 13390: lr=1.00E-05, loss= 1.0816 (max= 1.5237), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:19:05,962 - root - INFO - Step 13390: lr=1.00E-05, loss= 1.0816 (max= 1.5237), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:19:37,858 - root - INFO - Step 13400: lr=1.00E-05, loss= 1.1305 (max= 1.5229), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:19:37,858 - root - INFO - Step 13400: lr=1.00E-05, loss= 1.1305 (max= 1.5229), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:19:37,858 - root - INFO - Step 13400: lr=1.00E-05, loss= 1.1305 (max= 1.5229), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:19:37,858 - root - INFO - Step 13400: lr=1.00E-05, loss= 1.1305 (max= 1.5229), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:19:37,858 - root - INFO - Step 13400: lr=1.00E-05, loss= 1.1305 (max= 1.5229), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:19:37,858 - root - INFO - Step 13400: lr=1.00E-05, loss= 1.1305 (max= 1.5229), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:19:37,858 - root - INFO - Step 13400: lr=1.00E-05, loss= 1.1305 
(max= 1.5229), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:19:37,858 - root - INFO - Step 13400: lr=1.00E-05, loss= 1.1305 (max= 1.5229), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:20:09,830 - root - INFO - Step 13410: lr=1.00E-05, loss= 1.1033 (max= 1.7155), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:20:09,830 - root - INFO - Step 13410: lr=1.00E-05, loss= 1.1033 (max= 1.7155), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:20:09,830 - root - INFO - Step 13410: lr=1.00E-05, loss= 1.1033 (max= 1.7155), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:20:09,830 - root - INFO - Step 13410: lr=1.00E-05, loss= 1.1033 (max= 1.7155), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:20:09,830 - root - INFO - Step 13410: lr=1.00E-05, loss= 1.1033 (max= 1.7155), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:20:09,830 - root - INFO - Step 13410: lr=1.00E-05, loss= 1.1033 (max= 1.7155), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:20:09,830 - root - INFO - Step 13410: lr=1.00E-05, loss= 1.1033 (max= 1.7155), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:20:09,830 - root - INFO - Step 13410: lr=1.00E-05, loss= 1.1033 (max= 1.7155), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:20:41,683 - root - INFO - Step 13420: lr=1.00E-05, loss= 1.0718 (max= 1.5539), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:20:41,683 - root 
- INFO - Step 13420: lr=1.00E-05, loss= 1.0718 (max= 1.5539), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:20:41,683 - root - INFO - Step 13420: lr=1.00E-05, loss= 1.0718 (max= 1.5539), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:20:41,683 - root - INFO - Step 13420: lr=1.00E-05, loss= 1.0718 (max= 1.5539), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:20:41,683 - root - INFO - Step 13420: lr=1.00E-05, loss= 1.0718 (max= 1.5539), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:20:41,684 - root - INFO - Step 13420: lr=1.00E-05, loss= 1.0718 (max= 1.5539), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:20:41,684 - root - INFO - Step 13420: lr=1.00E-05, loss= 1.0718 (max= 1.5539), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:20:41,684 - root - INFO - Step 13420: lr=1.00E-05, loss= 1.0718 (max= 1.5539), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:21:13,567 - root - INFO - Step 13430: lr=1.00E-05, loss= 1.1084 (max= 1.5235), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:21:13,567 - root - INFO - Step 13430: lr=1.00E-05, loss= 1.1084 (max= 1.5235), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:21:13,567 - root - INFO - Step 13430: lr=1.00E-05, loss= 1.1084 (max= 1.5235), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:21:13,567 - root - INFO - Step 13430: lr=1.00E-05, loss= 1.1084 (max= 1.5235), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 22:21:13,568 - root - INFO - Step 13430: lr=1.00E-05, loss= 1.1084 (max= 1.5235), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:21:13,568 - root - INFO - Step 13430: lr=1.00E-05, loss= 1.1084 (max= 1.5235), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:21:13,568 - root - INFO - Step 13430: lr=1.00E-05, loss= 1.1084 (max= 1.5235), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:21:13,568 - root - INFO - Step 13430: lr=1.00E-05, loss= 1.1084 (max= 1.5235), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:21:45,433 - root - INFO - Step 13440: lr=1.00E-05, loss= 1.1054 (max= 1.5227), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:21:45,433 - root - INFO - Step 13440: lr=1.00E-05, loss= 1.1054 (max= 1.5227), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:21:45,434 - root - INFO - Step 13440: lr=1.00E-05, loss= 1.1054 (max= 1.5227), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:21:45,434 - root - INFO - Step 13440: lr=1.00E-05, loss= 1.1054 (max= 1.5227), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:21:45,434 - root - INFO - Step 13440: lr=1.00E-05, loss= 1.1054 (max= 1.5227), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:21:45,434 - root - INFO - Step 13440: lr=1.00E-05, loss= 1.1054 (max= 1.5227), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:21:45,434 - root - INFO - Step 13440: lr=1.00E-05, loss= 1.1054 (max= 1.5227), tps=20568, mfu=42.85%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:21:45,434 - root - INFO - Step 13440: lr=1.00E-05, loss= 1.1054 (max= 1.5227), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:22:17,329 - root - INFO - Step 13450: lr=1.00E-05, loss= 1.1312 (max= 1.5444), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:22:17,329 - root - INFO - Step 13450: lr=1.00E-05, loss= 1.1312 (max= 1.5444), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:22:17,329 - root - INFO - Step 13450: lr=1.00E-05, loss= 1.1312 (max= 1.5444), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:22:17,329 - root - INFO - Step 13450: lr=1.00E-05, loss= 1.1312 (max= 1.5444), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:22:17,329 - root - INFO - Step 13450: lr=1.00E-05, loss= 1.1312 (max= 1.5444), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:22:17,329 - root - INFO - Step 13450: lr=1.00E-05, loss= 1.1312 (max= 1.5444), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:22:17,329 - root - INFO - Step 13450: lr=1.00E-05, loss= 1.1312 (max= 1.5444), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:22:17,329 - root - INFO - Step 13450: lr=1.00E-05, loss= 1.1312 (max= 1.5444), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:22:49,207 - root - INFO - Step 13460: lr=1.00E-05, loss= 1.1137 (max= 1.5617), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:22:49,207 - root - INFO - Step 13460: lr=1.00E-05, 
loss= 1.1137 (max= 1.5617), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:22:49,208 - root - INFO - Step 13460: lr=1.00E-05, loss= 1.1137 (max= 1.5617), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:22:49,208 - root - INFO - Step 13460: lr=1.00E-05, loss= 1.1137 (max= 1.5617), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:22:49,208 - root - INFO - Step 13460: lr=1.00E-05, loss= 1.1137 (max= 1.5617), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:22:49,208 - root - INFO - Step 13460: lr=1.00E-05, loss= 1.1137 (max= 1.5617), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:22:49,208 - root - INFO - Step 13460: lr=1.00E-05, loss= 1.1137 (max= 1.5617), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:22:49,208 - root - INFO - Step 13460: lr=1.00E-05, loss= 1.1137 (max= 1.5617), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:23:21,039 - root - INFO - Step 13470: lr=1.00E-05, loss= 1.0903 (max= 1.6346), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:23:21,039 - root - INFO - Step 13470: lr=1.00E-05, loss= 1.0903 (max= 1.6346), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:23:21,039 - root - INFO - Step 13470: lr=1.00E-05, loss= 1.0903 (max= 1.6346), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:23:21,039 - root - INFO - Step 13470: lr=1.00E-05, loss= 1.0903 (max= 1.6346), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
22:23:21,039 - root - INFO - Step 13470: lr=1.00E-05, loss= 1.0903 (max= 1.6346), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:23:21,039 - root - INFO - Step 13470: lr=1.00E-05, loss= 1.0903 (max= 1.6346), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:23:21,039 - root - INFO - Step 13470: lr=1.00E-05, loss= 1.0903 (max= 1.6346), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:23:21,039 - root - INFO - Step 13470: lr=1.00E-05, loss= 1.0903 (max= 1.6346), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:23:52,943 - root - INFO - Step 13480: lr=1.00E-05, loss= 1.1269 (max= 1.5357), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:23:52,943 - root - INFO - Step 13480: lr=1.00E-05, loss= 1.1269 (max= 1.5357), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:23:52,944 - root - INFO - Step 13480: lr=1.00E-05, loss= 1.1269 (max= 1.5357), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:23:52,944 - root - INFO - Step 13480: lr=1.00E-05, loss= 1.1269 (max= 1.5357), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:23:52,944 - root - INFO - Step 13480: lr=1.00E-05, loss= 1.1269 (max= 1.5357), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:23:52,944 - root - INFO - Step 13480: lr=1.00E-05, loss= 1.1269 (max= 1.5357), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:23:52,944 - root - INFO - Step 13480: lr=1.00E-05, loss= 1.1269 (max= 1.5357), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:23:52,944 - root - INFO - Step 13480: lr=1.00E-05, loss= 1.1269 (max= 1.5357), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:24:24,750 - root - INFO - Step 13490: lr=1.00E-05, loss= 1.0847 (max= 1.5465), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:24:24,750 - root - INFO - Step 13490: lr=1.00E-05, loss= 1.0847 (max= 1.5465), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:24:24,750 - root - INFO - Step 13490: lr=1.00E-05, loss= 1.0847 (max= 1.5465), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:24:24,750 - root - INFO - Step 13490: lr=1.00E-05, loss= 1.0847 (max= 1.5465), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:24:24,750 - root - INFO - Step 13490: lr=1.00E-05, loss= 1.0847 (max= 1.5465), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:24:24,750 - root - INFO - Step 13490: lr=1.00E-05, loss= 1.0847 (max= 1.5465), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:24:24,750 - root - INFO - Step 13490: lr=1.00E-05, loss= 1.0847 (max= 1.5465), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:24:24,750 - root - INFO - Step 13490: lr=1.00E-05, loss= 1.0847 (max= 1.5465), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:24:56,670 - root - INFO - Step 13500: lr=1.00E-05, loss= 1.1065 (max= 1.5873), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:24:56,670 - root - INFO - Step 13500: lr=1.00E-05, loss= 1.1065 (max= 1.5873), 
tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:24:56,670 - root - INFO - Step 13500: lr=1.00E-05, loss= 1.1065 (max= 1.5873), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:24:56,670 - root - INFO - Step 13500: lr=1.00E-05, loss= 1.1065 (max= 1.5873), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:24:56,670 - root - INFO - Step 13500: lr=1.00E-05, loss= 1.1065 (max= 1.5873), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:24:56,670 - root - INFO - Step 13500: lr=1.00E-05, loss= 1.1065 (max= 1.5873), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:24:56,670 - root - INFO - Step 13500: lr=1.00E-05, loss= 1.1065 (max= 1.5873), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:24:56,670 - root - INFO - Step 13500: lr=1.00E-05, loss= 1.1065 (max= 1.5873), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:25:28,460 - root - INFO - Step 13510: lr=1.00E-05, loss= 1.1294 (max= 1.6306), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:25:28,460 - root - INFO - Step 13510: lr=1.00E-05, loss= 1.1294 (max= 1.6306), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:25:28,460 - root - INFO - Step 13510: lr=1.00E-05, loss= 1.1294 (max= 1.6306), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:25:28,460 - root - INFO - Step 13510: lr=1.00E-05, loss= 1.1294 (max= 1.6306), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:25:28,460 - root - INFO - Step 
13510: lr=1.00E-05, loss= 1.1294 (max= 1.6306), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:25:28,460 - root - INFO - Step 13510: lr=1.00E-05, loss= 1.1294 (max= 1.6306), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:25:28,460 - root - INFO - Step 13510: lr=1.00E-05, loss= 1.1294 (max= 1.6306), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:25:28,460 - root - INFO - Step 13510: lr=1.00E-05, loss= 1.1294 (max= 1.6306), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:26:00,305 - root - INFO - Step 13520: lr=1.00E-05, loss= 1.0975 (max= 1.5340), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:26:00,305 - root - INFO - Step 13520: lr=1.00E-05, loss= 1.0975 (max= 1.5340), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:26:00,305 - root - INFO - Step 13520: lr=1.00E-05, loss= 1.0975 (max= 1.5340), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:26:00,305 - root - INFO - Step 13520: lr=1.00E-05, loss= 1.0975 (max= 1.5340), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:26:00,305 - root - INFO - Step 13520: lr=1.00E-05, loss= 1.0975 (max= 1.5340), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:26:00,305 - root - INFO - Step 13520: lr=1.00E-05, loss= 1.0975 (max= 1.5340), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:26:00,306 - root - INFO - Step 13520: lr=1.00E-05, loss= 1.0975 (max= 1.5340), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 22:26:00,306 - root - INFO - Step 13520: lr=1.00E-05, loss= 1.0975 (max= 1.5340), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:26:32,173 - root - INFO - Step 13530: lr=1.00E-05, loss= 1.0991 (max= 1.5939), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:26:32,173 - root - INFO - Step 13530: lr=1.00E-05, loss= 1.0991 (max= 1.5939), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:26:32,173 - root - INFO - Step 13530: lr=1.00E-05, loss= 1.0991 (max= 1.5939), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:26:32,173 - root - INFO - Step 13530: lr=1.00E-05, loss= 1.0991 (max= 1.5939), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:26:32,173 - root - INFO - Step 13530: lr=1.00E-05, loss= 1.0991 (max= 1.5939), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:26:32,173 - root - INFO - Step 13530: lr=1.00E-05, loss= 1.0991 (max= 1.5939), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:26:32,173 - root - INFO - Step 13530: lr=1.00E-05, loss= 1.0991 (max= 1.5939), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:26:32,173 - root - INFO - Step 13530: lr=1.00E-05, loss= 1.0991 (max= 1.5939), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:27:04,048 - root - INFO - Step 13540: lr=1.00E-05, loss= 1.1006 (max= 1.5545), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:27:04,048 - root - INFO - Step 13540: lr=1.00E-05, loss= 1.1006 (max= 1.5545), tps=20562, mfu=42.84%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:27:04,048 - root - INFO - Step 13540: lr=1.00E-05, loss= 1.1006 (max= 1.5545), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:27:04,048 - root - INFO - Step 13540: lr=1.00E-05, loss= 1.1006 (max= 1.5545), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:27:04,048 - root - INFO - Step 13540: lr=1.00E-05, loss= 1.1006 (max= 1.5545), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:27:04,048 - root - INFO - Step 13540: lr=1.00E-05, loss= 1.1006 (max= 1.5545), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:27:04,048 - root - INFO - Step 13540: lr=1.00E-05, loss= 1.1006 (max= 1.5545), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:27:04,048 - root - INFO - Step 13540: lr=1.00E-05, loss= 1.1006 (max= 1.5545), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:27:35,844 - root - INFO - Step 13550: lr=1.00E-05, loss= 1.1162 (max= 1.5033), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:27:35,844 - root - INFO - Step 13550: lr=1.00E-05, loss= 1.1162 (max= 1.5033), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:27:35,844 - root - INFO - Step 13550: lr=1.00E-05, loss= 1.1162 (max= 1.5033), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:27:35,844 - root - INFO - Step 13550: lr=1.00E-05, loss= 1.1162 (max= 1.5033), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:27:35,844 - root - INFO - Step 13550: lr=1.00E-05, loss= 1.1162 
(max= 1.5033), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:27:35,844 - root - INFO - Step 13550: lr=1.00E-05, loss= 1.1162 (max= 1.5033), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:27:35,844 - root - INFO - Step 13550: lr=1.00E-05, loss= 1.1162 (max= 1.5033), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:27:35,844 - root - INFO - Step 13550: lr=1.00E-05, loss= 1.1162 (max= 1.5033), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:28:07,761 - root - INFO - Step 13560: lr=1.00E-05, loss= 1.0974 (max= 1.6451), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:28:07,761 - root - INFO - Step 13560: lr=1.00E-05, loss= 1.0974 (max= 1.6451), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:28:07,761 - root - INFO - Step 13560: lr=1.00E-05, loss= 1.0974 (max= 1.6451), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:28:07,761 - root - INFO - Step 13560: lr=1.00E-05, loss= 1.0974 (max= 1.6451), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:28:07,761 - root - INFO - Step 13560: lr=1.00E-05, loss= 1.0974 (max= 1.6451), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:28:07,761 - root - INFO - Step 13560: lr=1.00E-05, loss= 1.0974 (max= 1.6451), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:28:07,761 - root - INFO - Step 13560: lr=1.00E-05, loss= 1.0974 (max= 1.6451), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:28:07,762 - root 
- INFO - Step 13560: lr=1.00E-05, loss= 1.0974 (max= 1.6451), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:28:39,575 - root - INFO - Step 13570: lr=1.00E-05, loss= 1.0980 (max= 1.6402), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:28:39,575 - root - INFO - Step 13570: lr=1.00E-05, loss= 1.0980 (max= 1.6402), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:28:39,575 - root - INFO - Step 13570: lr=1.00E-05, loss= 1.0980 (max= 1.6402), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:28:39,575 - root - INFO - Step 13570: lr=1.00E-05, loss= 1.0980 (max= 1.6402), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:28:39,575 - root - INFO - Step 13570: lr=1.00E-05, loss= 1.0980 (max= 1.6402), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:28:39,575 - root - INFO - Step 13570: lr=1.00E-05, loss= 1.0980 (max= 1.6402), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:28:39,576 - root - INFO - Step 13570: lr=1.00E-05, loss= 1.0980 (max= 1.6402), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:28:39,576 - root - INFO - Step 13570: lr=1.00E-05, loss= 1.0980 (max= 1.6402), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:29:07,363 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:5003341 +2025-10-25 22:29:11,372 - root - INFO - Step 13580: lr=1.00E-05, loss= 1.1085 (max= 1.6427), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 22:29:11,372 - root - INFO - Step 13580: lr=1.00E-05, loss= 1.1085 (max= 1.6427), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:29:11,372 - root - INFO - Step 13580: lr=1.00E-05, loss= 1.1085 (max= 1.6427), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:29:11,372 - root - INFO - Step 13580: lr=1.00E-05, loss= 1.1085 (max= 1.6427), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:29:11,372 - root - INFO - Step 13580: lr=1.00E-05, loss= 1.1085 (max= 1.6427), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:29:11,372 - root - INFO - Step 13580: lr=1.00E-05, loss= 1.1085 (max= 1.6427), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:29:11,372 - root - INFO - Step 13580: lr=1.00E-05, loss= 1.1085 (max= 1.6427), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:29:11,372 - root - INFO - Step 13580: lr=1.00E-05, loss= 1.1085 (max= 1.6427), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:29:43,237 - root - INFO - Step 13590: lr=1.00E-05, loss= 1.1235 (max= 1.8253), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:29:43,237 - root - INFO - Step 13590: lr=1.00E-05, loss= 1.1235 (max= 1.8253), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:29:43,237 - root - INFO - Step 13590: lr=1.00E-05, loss= 1.1235 (max= 1.8253), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:29:43,237 - root - INFO - Step 13590: lr=1.00E-05, loss= 1.1235 (max= 1.8253), tps=20569, mfu=42.85%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:29:43,237 - root - INFO - Step 13590: lr=1.00E-05, loss= 1.1235 (max= 1.8253), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:29:43,237 - root - INFO - Step 13590: lr=1.00E-05, loss= 1.1235 (max= 1.8253), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:29:43,237 - root - INFO - Step 13590: lr=1.00E-05, loss= 1.1235 (max= 1.8253), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:29:43,238 - root - INFO - Step 13590: lr=1.00E-05, loss= 1.1235 (max= 1.8253), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:30:15,024 - root - INFO - Step 13600: lr=1.00E-05, loss= 1.0989 (max= 1.5509), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:30:15,024 - root - INFO - Step 13600: lr=1.00E-05, loss= 1.0989 (max= 1.5509), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:30:15,024 - root - INFO - Step 13600: lr=1.00E-05, loss= 1.0989 (max= 1.5509), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:30:15,024 - root - INFO - Step 13600: lr=1.00E-05, loss= 1.0989 (max= 1.5509), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:30:15,024 - root - INFO - Step 13600: lr=1.00E-05, loss= 1.0989 (max= 1.5509), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:30:15,024 - root - INFO - Step 13600: lr=1.00E-05, loss= 1.0989 (max= 1.5509), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:30:15,024 - root - INFO - Step 13600: lr=1.00E-05, 
loss= 1.0989 (max= 1.5509), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:30:15,024 - root - INFO - Step 13600: lr=1.00E-05, loss= 1.0989 (max= 1.5509), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:30:46,972 - root - INFO - Step 13610: lr=1.00E-05, loss= 1.1023 (max= 1.5174), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:30:46,973 - root - INFO - Step 13610: lr=1.00E-05, loss= 1.1023 (max= 1.5174), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:30:46,973 - root - INFO - Step 13610: lr=1.00E-05, loss= 1.1023 (max= 1.5174), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:30:46,973 - root - INFO - Step 13610: lr=1.00E-05, loss= 1.1023 (max= 1.5174), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:30:46,973 - root - INFO - Step 13610: lr=1.00E-05, loss= 1.1023 (max= 1.5174), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:30:46,973 - root - INFO - Step 13610: lr=1.00E-05, loss= 1.1023 (max= 1.5174), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:30:46,973 - root - INFO - Step 13610: lr=1.00E-05, loss= 1.1023 (max= 1.5174), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:30:46,973 - root - INFO - Step 13610: lr=1.00E-05, loss= 1.1023 (max= 1.5174), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:31:18,772 - root - INFO - Step 13620: lr=1.00E-05, loss= 1.1144 (max= 1.4809), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
22:31:18,772 - root - INFO - Step 13620: lr=1.00E-05, loss= 1.1144 (max= 1.4809), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:31:18,772 - root - INFO - Step 13620: lr=1.00E-05, loss= 1.1144 (max= 1.4809), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:31:18,772 - root - INFO - Step 13620: lr=1.00E-05, loss= 1.1144 (max= 1.4809), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:31:18,772 - root - INFO - Step 13620: lr=1.00E-05, loss= 1.1144 (max= 1.4809), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:31:18,772 - root - INFO - Step 13620: lr=1.00E-05, loss= 1.1144 (max= 1.4809), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:31:18,772 - root - INFO - Step 13620: lr=1.00E-05, loss= 1.1144 (max= 1.4809), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:31:18,772 - root - INFO - Step 13620: lr=1.00E-05, loss= 1.1144 (max= 1.4809), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:31:50,682 - root - INFO - Step 13630: lr=1.00E-05, loss= 1.1049 (max= 1.4788), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:31:50,682 - root - INFO - Step 13630: lr=1.00E-05, loss= 1.1049 (max= 1.4788), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:31:50,682 - root - INFO - Step 13630: lr=1.00E-05, loss= 1.1049 (max= 1.4788), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:31:50,682 - root - INFO - Step 13630: lr=1.00E-05, loss= 1.1049 (max= 1.4788), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:31:50,682 - root - INFO - Step 13630: lr=1.00E-05, loss= 1.1049 (max= 1.4788), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:31:50,682 - root - INFO - Step 13630: lr=1.00E-05, loss= 1.1049 (max= 1.4788), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:31:50,682 - root - INFO - Step 13630: lr=1.00E-05, loss= 1.1049 (max= 1.4788), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:31:50,682 - root - INFO - Step 13630: lr=1.00E-05, loss= 1.1049 (max= 1.4788), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:32:22,480 - root - INFO - Step 13640: lr=1.00E-05, loss= 1.1283 (max= 1.4936), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:32:22,480 - root - INFO - Step 13640: lr=1.00E-05, loss= 1.1283 (max= 1.4936), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:32:22,481 - root - INFO - Step 13640: lr=1.00E-05, loss= 1.1283 (max= 1.4936), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:32:22,481 - root - INFO - Step 13640: lr=1.00E-05, loss= 1.1283 (max= 1.4936), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:32:22,481 - root - INFO - Step 13640: lr=1.00E-05, loss= 1.1283 (max= 1.4936), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:32:22,481 - root - INFO - Step 13640: lr=1.00E-05, loss= 1.1283 (max= 1.4936), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:32:22,481 - root - INFO - Step 13640: lr=1.00E-05, loss= 1.1283 (max= 1.4936), 
tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:32:22,481 - root - INFO - Step 13640: lr=1.00E-05, loss= 1.1283 (max= 1.4936), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:32:54,380 - root - INFO - Step 13650: lr=1.00E-05, loss= 1.1199 (max= 1.6394), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:32:54,380 - root - INFO - Step 13650: lr=1.00E-05, loss= 1.1199 (max= 1.6394), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:32:54,381 - root - INFO - Step 13650: lr=1.00E-05, loss= 1.1199 (max= 1.6394), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:32:54,381 - root - INFO - Step 13650: lr=1.00E-05, loss= 1.1199 (max= 1.6394), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:32:54,381 - root - INFO - Step 13650: lr=1.00E-05, loss= 1.1199 (max= 1.6394), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:32:54,381 - root - INFO - Step 13650: lr=1.00E-05, loss= 1.1199 (max= 1.6394), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:32:54,381 - root - INFO - Step 13650: lr=1.00E-05, loss= 1.1199 (max= 1.6394), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:32:54,381 - root - INFO - Step 13650: lr=1.00E-05, loss= 1.1199 (max= 1.6394), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:33:26,201 - root - INFO - Step 13660: lr=1.00E-05, loss= 1.0868 (max= 1.6515), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:33:26,201 - root - INFO - Step 
13660: lr=1.00E-05, loss= 1.0868 (max= 1.6515), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:33:26,202 - root - INFO - Step 13660: lr=1.00E-05, loss= 1.0868 (max= 1.6515), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:33:26,202 - root - INFO - Step 13660: lr=1.00E-05, loss= 1.0868 (max= 1.6515), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:33:26,202 - root - INFO - Step 13660: lr=1.00E-05, loss= 1.0868 (max= 1.6515), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:33:26,202 - root - INFO - Step 13660: lr=1.00E-05, loss= 1.0868 (max= 1.6515), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:33:26,202 - root - INFO - Step 13660: lr=1.00E-05, loss= 1.0868 (max= 1.6515), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:33:26,202 - root - INFO - Step 13660: lr=1.00E-05, loss= 1.0868 (max= 1.6515), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:33:58,021 - root - INFO - Step 13670: lr=1.00E-05, loss= 1.1033 (max= 1.5753), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:33:58,021 - root - INFO - Step 13670: lr=1.00E-05, loss= 1.1033 (max= 1.5753), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:33:58,021 - root - INFO - Step 13670: lr=1.00E-05, loss= 1.1033 (max= 1.5753), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:33:58,021 - root - INFO - Step 13670: lr=1.00E-05, loss= 1.1033 (max= 1.5753), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 22:33:58,021 - root - INFO - Step 13670: lr=1.00E-05, loss= 1.1033 (max= 1.5753), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:33:58,021 - root - INFO - Step 13670: lr=1.00E-05, loss= 1.1033 (max= 1.5753), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:33:58,021 - root - INFO - Step 13670: lr=1.00E-05, loss= 1.1033 (max= 1.5753), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:33:58,021 - root - INFO - Step 13670: lr=1.00E-05, loss= 1.1033 (max= 1.5753), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:34:29,906 - root - INFO - Step 13680: lr=1.00E-05, loss= 1.0898 (max= 1.5319), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:34:29,906 - root - INFO - Step 13680: lr=1.00E-05, loss= 1.0898 (max= 1.5319), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:34:29,906 - root - INFO - Step 13680: lr=1.00E-05, loss= 1.0898 (max= 1.5319), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:34:29,906 - root - INFO - Step 13680: lr=1.00E-05, loss= 1.0898 (max= 1.5319), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:34:29,906 - root - INFO - Step 13680: lr=1.00E-05, loss= 1.0898 (max= 1.5319), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:34:29,906 - root - INFO - Step 13680: lr=1.00E-05, loss= 1.0898 (max= 1.5319), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:34:29,906 - root - INFO - Step 13680: lr=1.00E-05, loss= 1.0898 (max= 1.5319), tps=20557, mfu=42.83%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:34:29,906 - root - INFO - Step 13680: lr=1.00E-05, loss= 1.0898 (max= 1.5319), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:35:01,779 - root - INFO - Step 13690: lr=1.00E-05, loss= 1.1054 (max= 1.4380), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:35:01,779 - root - INFO - Step 13690: lr=1.00E-05, loss= 1.1054 (max= 1.4380), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:35:01,779 - root - INFO - Step 13690: lr=1.00E-05, loss= 1.1054 (max= 1.4380), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:35:01,780 - root - INFO - Step 13690: lr=1.00E-05, loss= 1.1054 (max= 1.4380), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:35:01,780 - root - INFO - Step 13690: lr=1.00E-05, loss= 1.1054 (max= 1.4380), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:35:01,780 - root - INFO - Step 13690: lr=1.00E-05, loss= 1.1054 (max= 1.4380), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:35:01,780 - root - INFO - Step 13690: lr=1.00E-05, loss= 1.1054 (max= 1.4380), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:35:01,780 - root - INFO - Step 13690: lr=1.00E-05, loss= 1.1054 (max= 1.4380), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:35:33,663 - root - INFO - Step 13700: lr=1.00E-05, loss= 1.1120 (max= 1.4893), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:35:33,663 - root - INFO - Step 13700: lr=1.00E-05, loss= 1.1120 
(max= 1.4893), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:35:33,663 - root - INFO - Step 13700: lr=1.00E-05, loss= 1.1120 (max= 1.4893), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:35:33,663 - root - INFO - Step 13700: lr=1.00E-05, loss= 1.1120 (max= 1.4893), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:35:33,663 - root - INFO - Step 13700: lr=1.00E-05, loss= 1.1120 (max= 1.4893), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:35:33,663 - root - INFO - Step 13700: lr=1.00E-05, loss= 1.1120 (max= 1.4893), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:35:33,663 - root - INFO - Step 13700: lr=1.00E-05, loss= 1.1120 (max= 1.4893), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:35:33,663 - root - INFO - Step 13700: lr=1.00E-05, loss= 1.1120 (max= 1.4893), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:36:05,451 - root - INFO - Step 13710: lr=1.00E-05, loss= 1.1161 (max= 1.5688), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:36:05,451 - root - INFO - Step 13710: lr=1.00E-05, loss= 1.1161 (max= 1.5688), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:36:05,451 - root - INFO - Step 13710: lr=1.00E-05, loss= 1.1161 (max= 1.5688), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:36:05,451 - root - INFO - Step 13710: lr=1.00E-05, loss= 1.1161 (max= 1.5688), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:36:05,451 - root 
- INFO - Step 13710: lr=1.00E-05, loss= 1.1161 (max= 1.5688), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:36:05,451 - root - INFO - Step 13710: lr=1.00E-05, loss= 1.1161 (max= 1.5688), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:36:05,451 - root - INFO - Step 13710: lr=1.00E-05, loss= 1.1161 (max= 1.5688), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:36:05,451 - root - INFO - Step 13710: lr=1.00E-05, loss= 1.1161 (max= 1.5688), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:36:37,323 - root - INFO - Step 13720: lr=1.00E-05, loss= 1.1089 (max= 1.6351), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:36:37,323 - root - INFO - Step 13720: lr=1.00E-05, loss= 1.1089 (max= 1.6351), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:36:37,323 - root - INFO - Step 13720: lr=1.00E-05, loss= 1.1089 (max= 1.6351), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:36:37,324 - root - INFO - Step 13720: lr=1.00E-05, loss= 1.1089 (max= 1.6351), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:36:37,324 - root - INFO - Step 13720: lr=1.00E-05, loss= 1.1089 (max= 1.6351), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:36:37,324 - root - INFO - Step 13720: lr=1.00E-05, loss= 1.1089 (max= 1.6351), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:36:37,324 - root - INFO - Step 13720: lr=1.00E-05, loss= 1.1089 (max= 1.6351), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 22:36:37,324 - root - INFO - Step 13720: lr=1.00E-05, loss= 1.1089 (max= 1.6351), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:37:09,246 - root - INFO - Step 13730: lr=1.00E-05, loss= 1.1172 (max= 1.4918), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:37:09,246 - root - INFO - Step 13730: lr=1.00E-05, loss= 1.1172 (max= 1.4918), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:37:09,246 - root - INFO - Step 13730: lr=1.00E-05, loss= 1.1172 (max= 1.4918), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:37:09,246 - root - INFO - Step 13730: lr=1.00E-05, loss= 1.1172 (max= 1.4918), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:37:09,246 - root - INFO - Step 13730: lr=1.00E-05, loss= 1.1172 (max= 1.4918), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:37:09,246 - root - INFO - Step 13730: lr=1.00E-05, loss= 1.1172 (max= 1.4918), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:37:09,246 - root - INFO - Step 13730: lr=1.00E-05, loss= 1.1172 (max= 1.4918), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:37:09,246 - root - INFO - Step 13730: lr=1.00E-05, loss= 1.1172 (max= 1.4918), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:37:41,223 - root - INFO - Step 13740: lr=1.00E-05, loss= 1.1058 (max= 1.6293), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:37:41,223 - root - INFO - Step 13740: lr=1.00E-05, loss= 1.1058 (max= 1.6293), tps=20497, mfu=42.71%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:37:41,223 - root - INFO - Step 13740: lr=1.00E-05, loss= 1.1058 (max= 1.6293), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:37:41,223 - root - INFO - Step 13740: lr=1.00E-05, loss= 1.1058 (max= 1.6293), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:37:41,223 - root - INFO - Step 13740: lr=1.00E-05, loss= 1.1058 (max= 1.6293), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:37:41,224 - root - INFO - Step 13740: lr=1.00E-05, loss= 1.1058 (max= 1.6293), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:37:41,224 - root - INFO - Step 13740: lr=1.00E-05, loss= 1.1058 (max= 1.6293), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:37:41,224 - root - INFO - Step 13740: lr=1.00E-05, loss= 1.1058 (max= 1.6293), tps=20497, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:38:13,050 - root - INFO - Step 13750: lr=1.00E-05, loss= 1.0972 (max= 1.5334), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:38:13,050 - root - INFO - Step 13750: lr=1.00E-05, loss= 1.0972 (max= 1.5334), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:38:13,050 - root - INFO - Step 13750: lr=1.00E-05, loss= 1.0972 (max= 1.5334), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:38:13,050 - root - INFO - Step 13750: lr=1.00E-05, loss= 1.0972 (max= 1.5334), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:38:13,050 - root - INFO - Step 13750: lr=1.00E-05, 
loss= 1.0972 (max= 1.5334), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:38:13,050 - root - INFO - Step 13750: lr=1.00E-05, loss= 1.0972 (max= 1.5334), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:38:13,050 - root - INFO - Step 13750: lr=1.00E-05, loss= 1.0972 (max= 1.5334), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:38:13,050 - root - INFO - Step 13750: lr=1.00E-05, loss= 1.0972 (max= 1.5334), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:38:44,925 - root - INFO - Step 13760: lr=1.00E-05, loss= 1.1051 (max= 1.6287), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:38:44,925 - root - INFO - Step 13760: lr=1.00E-05, loss= 1.1051 (max= 1.6287), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:38:44,925 - root - INFO - Step 13760: lr=1.00E-05, loss= 1.1051 (max= 1.6287), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:38:44,925 - root - INFO - Step 13760: lr=1.00E-05, loss= 1.1051 (max= 1.6287), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:38:44,925 - root - INFO - Step 13760: lr=1.00E-05, loss= 1.1051 (max= 1.6287), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:38:44,925 - root - INFO - Step 13760: lr=1.00E-05, loss= 1.1051 (max= 1.6287), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:38:44,925 - root - INFO - Step 13760: lr=1.00E-05, loss= 1.1051 (max= 1.6287), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
22:38:44,925 - root - INFO - Step 13760: lr=1.00E-05, loss= 1.1051 (max= 1.6287), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:39:16,777 - root - INFO - Step 13770: lr=1.00E-05, loss= 1.1165 (max= 1.5603), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:39:16,777 - root - INFO - Step 13770: lr=1.00E-05, loss= 1.1165 (max= 1.5603), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:39:16,777 - root - INFO - Step 13770: lr=1.00E-05, loss= 1.1165 (max= 1.5603), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:39:16,777 - root - INFO - Step 13770: lr=1.00E-05, loss= 1.1165 (max= 1.5603), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:39:16,777 - root - INFO - Step 13770: lr=1.00E-05, loss= 1.1165 (max= 1.5603), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:39:16,777 - root - INFO - Step 13770: lr=1.00E-05, loss= 1.1165 (max= 1.5603), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:39:16,777 - root - INFO - Step 13770: lr=1.00E-05, loss= 1.1165 (max= 1.5603), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:39:16,777 - root - INFO - Step 13770: lr=1.00E-05, loss= 1.1165 (max= 1.5603), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:39:48,587 - root - INFO - Step 13780: lr=1.00E-05, loss= 1.0942 (max= 1.5458), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:39:48,587 - root - INFO - Step 13780: lr=1.00E-05, loss= 1.0942 (max= 1.5458), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:39:48,587 - root - INFO - Step 13780: lr=1.00E-05, loss= 1.0942 (max= 1.5458), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:39:48,587 - root - INFO - Step 13780: lr=1.00E-05, loss= 1.0942 (max= 1.5458), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:39:48,587 - root - INFO - Step 13780: lr=1.00E-05, loss= 1.0942 (max= 1.5458), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:39:48,587 - root - INFO - Step 13780: lr=1.00E-05, loss= 1.0942 (max= 1.5458), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:39:48,587 - root - INFO - Step 13780: lr=1.00E-05, loss= 1.0942 (max= 1.5458), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:39:48,587 - root - INFO - Step 13780: lr=1.00E-05, loss= 1.0942 (max= 1.5458), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:40:20,422 - root - INFO - Step 13790: lr=1.00E-05, loss= 1.1138 (max= 1.6329), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:40:20,423 - root - INFO - Step 13790: lr=1.00E-05, loss= 1.1138 (max= 1.6329), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:40:20,423 - root - INFO - Step 13790: lr=1.00E-05, loss= 1.1138 (max= 1.6329), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:40:20,423 - root - INFO - Step 13790: lr=1.00E-05, loss= 1.1138 (max= 1.6329), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:40:20,423 - root - INFO - Step 13790: lr=1.00E-05, loss= 1.1138 (max= 1.6329), 
tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:40:20,423 - root - INFO - Step 13790: lr=1.00E-05, loss= 1.1138 (max= 1.6329), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:40:20,423 - root - INFO - Step 13790: lr=1.00E-05, loss= 1.1138 (max= 1.6329), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:40:20,423 - root - INFO - Step 13790: lr=1.00E-05, loss= 1.1138 (max= 1.6329), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:40:52,262 - root - INFO - Step 13800: lr=1.00E-05, loss= 1.1345 (max= 1.5020), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:40:52,262 - root - INFO - Step 13800: lr=1.00E-05, loss= 1.1345 (max= 1.5020), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:40:52,262 - root - INFO - Step 13800: lr=1.00E-05, loss= 1.1345 (max= 1.5020), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:40:52,262 - root - INFO - Step 13800: lr=1.00E-05, loss= 1.1345 (max= 1.5020), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:40:52,262 - root - INFO - Step 13800: lr=1.00E-05, loss= 1.1345 (max= 1.5020), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:40:52,262 - root - INFO - Step 13800: lr=1.00E-05, loss= 1.1345 (max= 1.5020), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:40:52,262 - root - INFO - Step 13800: lr=1.00E-05, loss= 1.1345 (max= 1.5020), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:40:52,263 - root - INFO - Step 
13800: lr=1.00E-05, loss= 1.1345 (max= 1.5020), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:41:24,045 - root - INFO - Step 13810: lr=1.00E-05, loss= 1.1283 (max= 1.6248), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:41:24,045 - root - INFO - Step 13810: lr=1.00E-05, loss= 1.1283 (max= 1.6248), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:41:24,045 - root - INFO - Step 13810: lr=1.00E-05, loss= 1.1283 (max= 1.6248), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:41:24,045 - root - INFO - Step 13810: lr=1.00E-05, loss= 1.1283 (max= 1.6248), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:41:24,045 - root - INFO - Step 13810: lr=1.00E-05, loss= 1.1283 (max= 1.6248), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:41:24,045 - root - INFO - Step 13810: lr=1.00E-05, loss= 1.1283 (max= 1.6248), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:41:24,045 - root - INFO - Step 13810: lr=1.00E-05, loss= 1.1283 (max= 1.6248), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:41:24,045 - root - INFO - Step 13810: lr=1.00E-05, loss= 1.1283 (max= 1.6248), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:41:55,911 - root - INFO - Step 13820: lr=1.00E-05, loss= 1.1079 (max= 1.4957), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:41:55,911 - root - INFO - Step 13820: lr=1.00E-05, loss= 1.1079 (max= 1.4957), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 22:41:55,911 - root - INFO - Step 13820: lr=1.00E-05, loss= 1.1079 (max= 1.4957), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:41:55,911 - root - INFO - Step 13820: lr=1.00E-05, loss= 1.1079 (max= 1.4957), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:41:55,911 - root - INFO - Step 13820: lr=1.00E-05, loss= 1.1079 (max= 1.4957), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:41:55,911 - root - INFO - Step 13820: lr=1.00E-05, loss= 1.1079 (max= 1.4957), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:41:55,911 - root - INFO - Step 13820: lr=1.00E-05, loss= 1.1079 (max= 1.4957), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:41:55,911 - root - INFO - Step 13820: lr=1.00E-05, loss= 1.1079 (max= 1.4957), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:42:27,740 - root - INFO - Step 13830: lr=1.00E-05, loss= 1.1261 (max= 1.6236), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:42:27,740 - root - INFO - Step 13830: lr=1.00E-05, loss= 1.1261 (max= 1.6236), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:42:27,740 - root - INFO - Step 13830: lr=1.00E-05, loss= 1.1261 (max= 1.6236), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:42:27,740 - root - INFO - Step 13830: lr=1.00E-05, loss= 1.1261 (max= 1.6236), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:42:27,740 - root - INFO - Step 13830: lr=1.00E-05, loss= 1.1261 (max= 1.6236), tps=20593, mfu=42.91%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:42:27,740 - root - INFO - Step 13830: lr=1.00E-05, loss= 1.1261 (max= 1.6236), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:42:27,740 - root - INFO - Step 13830: lr=1.00E-05, loss= 1.1261 (max= 1.6236), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:42:27,740 - root - INFO - Step 13830: lr=1.00E-05, loss= 1.1261 (max= 1.6236), tps=20593, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:42:59,673 - root - INFO - Step 13840: lr=1.00E-05, loss= 1.1121 (max= 1.4895), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:42:59,673 - root - INFO - Step 13840: lr=1.00E-05, loss= 1.1121 (max= 1.4895), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:42:59,673 - root - INFO - Step 13840: lr=1.00E-05, loss= 1.1121 (max= 1.4895), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:42:59,673 - root - INFO - Step 13840: lr=1.00E-05, loss= 1.1121 (max= 1.4895), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:42:59,673 - root - INFO - Step 13840: lr=1.00E-05, loss= 1.1121 (max= 1.4895), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:42:59,673 - root - INFO - Step 13840: lr=1.00E-05, loss= 1.1121 (max= 1.4895), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:42:59,673 - root - INFO - Step 13840: lr=1.00E-05, loss= 1.1121 (max= 1.4895), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:42:59,673 - root - INFO - Step 13840: lr=1.00E-05, loss= 1.1121 
(max= 1.4895), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:43:31,509 - root - INFO - Step 13850: lr=1.00E-05, loss= 1.1016 (max= 1.4478), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:43:31,509 - root - INFO - Step 13850: lr=1.00E-05, loss= 1.1016 (max= 1.4478), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:43:31,509 - root - INFO - Step 13850: lr=1.00E-05, loss= 1.1016 (max= 1.4478), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:43:31,509 - root - INFO - Step 13850: lr=1.00E-05, loss= 1.1016 (max= 1.4478), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:43:31,509 - root - INFO - Step 13850: lr=1.00E-05, loss= 1.1016 (max= 1.4478), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:43:31,509 - root - INFO - Step 13850: lr=1.00E-05, loss= 1.1016 (max= 1.4478), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:43:31,509 - root - INFO - Step 13850: lr=1.00E-05, loss= 1.1016 (max= 1.4478), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:43:31,509 - root - INFO - Step 13850: lr=1.00E-05, loss= 1.1016 (max= 1.4478), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:44:03,470 - root - INFO - Step 13860: lr=1.00E-05, loss= 1.1091 (max= 1.5969), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:44:03,470 - root - INFO - Step 13860: lr=1.00E-05, loss= 1.1091 (max= 1.5969), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:44:03,470 - root 
- INFO - Step 13860: lr=1.00E-05, loss= 1.1091 (max= 1.5969), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:44:03,470 - root - INFO - Step 13860: lr=1.00E-05, loss= 1.1091 (max= 1.5969), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:44:03,470 - root - INFO - Step 13860: lr=1.00E-05, loss= 1.1091 (max= 1.5969), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:44:03,470 - root - INFO - Step 13860: lr=1.00E-05, loss= 1.1091 (max= 1.5969), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:44:03,470 - root - INFO - Step 13860: lr=1.00E-05, loss= 1.1091 (max= 1.5969), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:44:03,470 - root - INFO - Step 13860: lr=1.00E-05, loss= 1.1091 (max= 1.5969), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:44:35,304 - root - INFO - Step 13870: lr=1.00E-05, loss= 1.1101 (max= 1.5481), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:44:35,304 - root - INFO - Step 13870: lr=1.00E-05, loss= 1.1101 (max= 1.5481), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:44:35,304 - root - INFO - Step 13870: lr=1.00E-05, loss= 1.1101 (max= 1.5481), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:44:35,304 - root - INFO - Step 13870: lr=1.00E-05, loss= 1.1101 (max= 1.5481), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:44:35,304 - root - INFO - Step 13870: lr=1.00E-05, loss= 1.1101 (max= 1.5481), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 22:44:35,304 - root - INFO - Step 13870: lr=1.00E-05, loss= 1.1101 (max= 1.5481), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:44:35,304 - root - INFO - Step 13870: lr=1.00E-05, loss= 1.1101 (max= 1.5481), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:44:35,305 - root - INFO - Step 13870: lr=1.00E-05, loss= 1.1101 (max= 1.5481), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:45:07,108 - root - INFO - Step 13880: lr=1.00E-05, loss= 1.1327 (max= 1.5086), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:45:07,108 - root - INFO - Step 13880: lr=1.00E-05, loss= 1.1327 (max= 1.5086), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:45:07,108 - root - INFO - Step 13880: lr=1.00E-05, loss= 1.1327 (max= 1.5086), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:45:07,108 - root - INFO - Step 13880: lr=1.00E-05, loss= 1.1327 (max= 1.5086), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:45:07,108 - root - INFO - Step 13880: lr=1.00E-05, loss= 1.1327 (max= 1.5086), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:45:07,108 - root - INFO - Step 13880: lr=1.00E-05, loss= 1.1327 (max= 1.5086), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:45:07,108 - root - INFO - Step 13880: lr=1.00E-05, loss= 1.1327 (max= 1.5086), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:45:07,108 - root - INFO - Step 13880: lr=1.00E-05, loss= 1.1327 (max= 1.5086), tps=20609, mfu=42.94%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:45:39,039 - root - INFO - Step 13890: lr=1.00E-05, loss= 1.1163 (max= 1.5411), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:45:39,040 - root - INFO - Step 13890: lr=1.00E-05, loss= 1.1163 (max= 1.5411), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:45:39,040 - root - INFO - Step 13890: lr=1.00E-05, loss= 1.1163 (max= 1.5411), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:45:39,040 - root - INFO - Step 13890: lr=1.00E-05, loss= 1.1163 (max= 1.5411), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:45:39,040 - root - INFO - Step 13890: lr=1.00E-05, loss= 1.1163 (max= 1.5411), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:45:39,040 - root - INFO - Step 13890: lr=1.00E-05, loss= 1.1163 (max= 1.5411), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:45:39,040 - root - INFO - Step 13890: lr=1.00E-05, loss= 1.1163 (max= 1.5411), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:45:39,040 - root - INFO - Step 13890: lr=1.00E-05, loss= 1.1163 (max= 1.5411), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:46:10,881 - root - INFO - Step 13900: lr=1.00E-05, loss= 1.1234 (max= 1.5361), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:46:10,881 - root - INFO - Step 13900: lr=1.00E-05, loss= 1.1234 (max= 1.5361), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:46:10,881 - root - INFO - Step 13900: lr=1.00E-05, 
loss= 1.1234 (max= 1.5361), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:46:10,881 - root - INFO - Step 13900: lr=1.00E-05, loss= 1.1234 (max= 1.5361), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:46:10,881 - root - INFO - Step 13900: lr=1.00E-05, loss= 1.1234 (max= 1.5361), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:46:10,881 - root - INFO - Step 13900: lr=1.00E-05, loss= 1.1234 (max= 1.5361), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:46:10,881 - root - INFO - Step 13900: lr=1.00E-05, loss= 1.1234 (max= 1.5361), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:46:10,881 - root - INFO - Step 13900: lr=1.00E-05, loss= 1.1234 (max= 1.5361), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:46:42,732 - root - INFO - Step 13910: lr=1.00E-05, loss= 1.1067 (max= 1.5391), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:46:42,732 - root - INFO - Step 13910: lr=1.00E-05, loss= 1.1067 (max= 1.5391), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:46:42,732 - root - INFO - Step 13910: lr=1.00E-05, loss= 1.1067 (max= 1.5391), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:46:42,732 - root - INFO - Step 13910: lr=1.00E-05, loss= 1.1067 (max= 1.5391), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:46:42,732 - root - INFO - Step 13910: lr=1.00E-05, loss= 1.1067 (max= 1.5391), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
22:46:42,732 - root - INFO - Step 13910: lr=1.00E-05, loss= 1.1067 (max= 1.5391), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:46:42,732 - root - INFO - Step 13910: lr=1.00E-05, loss= 1.1067 (max= 1.5391), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:46:42,732 - root - INFO - Step 13910: lr=1.00E-05, loss= 1.1067 (max= 1.5391), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:47:14,609 - root - INFO - Step 13920: lr=1.00E-05, loss= 1.1151 (max= 1.6032), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:47:14,610 - root - INFO - Step 13920: lr=1.00E-05, loss= 1.1151 (max= 1.6032), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:47:14,610 - root - INFO - Step 13920: lr=1.00E-05, loss= 1.1151 (max= 1.6032), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:47:14,610 - root - INFO - Step 13920: lr=1.00E-05, loss= 1.1151 (max= 1.6032), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:47:14,610 - root - INFO - Step 13920: lr=1.00E-05, loss= 1.1151 (max= 1.6032), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:47:14,610 - root - INFO - Step 13920: lr=1.00E-05, loss= 1.1151 (max= 1.6032), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:47:14,610 - root - INFO - Step 13920: lr=1.00E-05, loss= 1.1151 (max= 1.6032), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:47:14,610 - root - INFO - Step 13920: lr=1.00E-05, loss= 1.1151 (max= 1.6032), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:47:46,440 - root - INFO - Step 13930: lr=1.00E-05, loss= 1.1256 (max= 1.5278), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:47:46,440 - root - INFO - Step 13930: lr=1.00E-05, loss= 1.1256 (max= 1.5278), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:47:46,440 - root - INFO - Step 13930: lr=1.00E-05, loss= 1.1256 (max= 1.5278), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:47:46,440 - root - INFO - Step 13930: lr=1.00E-05, loss= 1.1256 (max= 1.5278), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:47:46,440 - root - INFO - Step 13930: lr=1.00E-05, loss= 1.1256 (max= 1.5278), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:47:46,440 - root - INFO - Step 13930: lr=1.00E-05, loss= 1.1256 (max= 1.5278), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:47:46,440 - root - INFO - Step 13930: lr=1.00E-05, loss= 1.1256 (max= 1.5278), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:47:46,440 - root - INFO - Step 13930: lr=1.00E-05, loss= 1.1256 (max= 1.5278), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:48:18,274 - root - INFO - Step 13940: lr=1.00E-05, loss= 1.1170 (max= 1.5005), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:48:18,274 - root - INFO - Step 13940: lr=1.00E-05, loss= 1.1170 (max= 1.5005), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:48:18,274 - root - INFO - Step 13940: lr=1.00E-05, loss= 1.1170 (max= 1.5005), 
tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:48:18,274 - root - INFO - Step 13940: lr=1.00E-05, loss= 1.1170 (max= 1.5005), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:48:18,274 - root - INFO - Step 13940: lr=1.00E-05, loss= 1.1170 (max= 1.5005), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:48:18,274 - root - INFO - Step 13940: lr=1.00E-05, loss= 1.1170 (max= 1.5005), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:48:18,274 - root - INFO - Step 13940: lr=1.00E-05, loss= 1.1170 (max= 1.5005), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:48:18,274 - root - INFO - Step 13940: lr=1.00E-05, loss= 1.1170 (max= 1.5005), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:48:50,129 - root - INFO - Step 13950: lr=1.00E-05, loss= 1.1167 (max= 1.6170), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:48:50,129 - root - INFO - Step 13950: lr=1.00E-05, loss= 1.1167 (max= 1.6170), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:48:50,129 - root - INFO - Step 13950: lr=1.00E-05, loss= 1.1167 (max= 1.6170), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:48:50,130 - root - INFO - Step 13950: lr=1.00E-05, loss= 1.1167 (max= 1.6170), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:48:50,130 - root - INFO - Step 13950: lr=1.00E-05, loss= 1.1167 (max= 1.6170), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:48:50,130 - root - INFO - Step 
13950: lr=1.00E-05, loss= 1.1167 (max= 1.6170), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:48:50,130 - root - INFO - Step 13950: lr=1.00E-05, loss= 1.1167 (max= 1.6170), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:48:50,130 - root - INFO - Step 13950: lr=1.00E-05, loss= 1.1167 (max= 1.6170), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:49:22,074 - root - INFO - Step 13960: lr=1.00E-05, loss= 1.1181 (max= 1.5074), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:49:22,075 - root - INFO - Step 13960: lr=1.00E-05, loss= 1.1181 (max= 1.5074), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:49:22,075 - root - INFO - Step 13960: lr=1.00E-05, loss= 1.1181 (max= 1.5074), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:49:22,075 - root - INFO - Step 13960: lr=1.00E-05, loss= 1.1181 (max= 1.5074), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:49:22,075 - root - INFO - Step 13960: lr=1.00E-05, loss= 1.1181 (max= 1.5074), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:49:22,075 - root - INFO - Step 13960: lr=1.00E-05, loss= 1.1181 (max= 1.5074), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:49:22,075 - root - INFO - Step 13960: lr=1.00E-05, loss= 1.1181 (max= 1.5074), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:49:22,075 - root - INFO - Step 13960: lr=1.00E-05, loss= 1.1181 (max= 1.5074), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 22:49:53,914 - root - INFO - Step 13970: lr=1.00E-05, loss= 1.0876 (max= 1.4927), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:49:53,914 - root - INFO - Step 13970: lr=1.00E-05, loss= 1.0876 (max= 1.4927), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:49:53,914 - root - INFO - Step 13970: lr=1.00E-05, loss= 1.0876 (max= 1.4927), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:49:53,914 - root - INFO - Step 13970: lr=1.00E-05, loss= 1.0876 (max= 1.4927), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:49:53,914 - root - INFO - Step 13970: lr=1.00E-05, loss= 1.0876 (max= 1.4927), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:49:53,914 - root - INFO - Step 13970: lr=1.00E-05, loss= 1.0876 (max= 1.4927), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:49:53,914 - root - INFO - Step 13970: lr=1.00E-05, loss= 1.0876 (max= 1.4927), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:49:53,914 - root - INFO - Step 13970: lr=1.00E-05, loss= 1.0876 (max= 1.4927), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:50:25,759 - root - INFO - Step 13980: lr=1.00E-05, loss= 1.1128 (max= 1.6217), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:50:25,759 - root - INFO - Step 13980: lr=1.00E-05, loss= 1.1128 (max= 1.6217), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:50:25,759 - root - INFO - Step 13980: lr=1.00E-05, loss= 1.1128 (max= 1.6217), tps=20582, mfu=42.88%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:50:25,759 - root - INFO - Step 13980: lr=1.00E-05, loss= 1.1128 (max= 1.6217), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:50:25,759 - root - INFO - Step 13980: lr=1.00E-05, loss= 1.1128 (max= 1.6217), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:50:25,759 - root - INFO - Step 13980: lr=1.00E-05, loss= 1.1128 (max= 1.6217), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:50:25,759 - root - INFO - Step 13980: lr=1.00E-05, loss= 1.1128 (max= 1.6217), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:50:25,759 - root - INFO - Step 13980: lr=1.00E-05, loss= 1.1128 (max= 1.6217), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:50:57,574 - root - INFO - Step 13990: lr=1.00E-05, loss= 1.1249 (max= 1.6106), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:50:57,574 - root - INFO - Step 13990: lr=1.00E-05, loss= 1.1249 (max= 1.6106), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:50:57,574 - root - INFO - Step 13990: lr=1.00E-05, loss= 1.1249 (max= 1.6106), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:50:57,574 - root - INFO - Step 13990: lr=1.00E-05, loss= 1.1249 (max= 1.6106), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:50:57,574 - root - INFO - Step 13990: lr=1.00E-05, loss= 1.1249 (max= 1.6106), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:50:57,574 - root - INFO - Step 13990: lr=1.00E-05, loss= 1.1249 
(max= 1.6106), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:50:57,574 - root - INFO - Step 13990: lr=1.00E-05, loss= 1.1249 (max= 1.6106), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:50:57,574 - root - INFO - Step 13990: lr=1.00E-05, loss= 1.1249 (max= 1.6106), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-pt/checkpoints/dataloader/step-14000 +Dataset successfully saved to jobs/munin-7b-open-pt/checkpoints/dataloader/step-14000! Save time: 4.5226850509643555 +2025-10-25 22:51:29,446 - root - INFO - Step 14000: lr=1.00E-05, loss= 1.0889 (max= 1.4772), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:51:29,446 - root - INFO - Saving a full checkpoint at step 14000 +2025-10-25 22:51:29,446 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 22:51:29,446 - root - INFO - Step 14000: lr=1.00E-05, loss= 1.0889 (max= 1.4772), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:51:29,446 - root - INFO - Step 14000: lr=1.00E-05, loss= 1.0889 (max= 1.4772), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:51:29,446 - root - INFO - Step 14000: lr=1.00E-05, loss= 1.0889 (max= 1.4772), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:51:29,446 - root - INFO - Step 14000: lr=1.00E-05, loss= 1.0889 (max= 1.4772), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:51:29,446 - root - INFO - Saving a full checkpoint at step 14000 +2025-10-25 22:51:29,446 - root - INFO - CheckpointManager: State dict keys: 
dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 22:51:29,446 - root - INFO - Step 14000: lr=1.00E-05, loss= 1.0889 (max= 1.4772), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:51:29,446 - root - INFO - Saving a full checkpoint at step 14000 +2025-10-25 22:51:29,446 - root - INFO - Saving a full checkpoint at step 14000 +2025-10-25 22:51:29,446 - root - INFO - Step 14000: lr=1.00E-05, loss= 1.0889 (max= 1.4772), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:51:29,446 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 22:51:29,446 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 22:51:29,446 - root - INFO - Saving a full checkpoint at step 14000 +2025-10-25 22:51:29,446 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 22:51:29,446 - root - INFO - Saving a full checkpoint at step 14000 +2025-10-25 22:51:29,446 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 22:51:29,446 - root - INFO - Saving a full checkpoint at step 14000 +2025-10-25 22:51:29,446 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 22:51:29,446 - root - INFO - Step 14000: lr=1.00E-05, loss= 1.0889 (max= 1.4772), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:51:29,446 - root - INFO - Saving a full checkpoint at step 14000 +2025-10-25 22:51:29,446 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 22:51:46,169 - root - INFO - 
Finished saving the checkpoint in 16.72 seconds +2025-10-25 22:51:46,177 - root - INFO - Finished saving the checkpoint in 16.73 seconds +2025-10-25 22:51:46,177 - root - INFO - Finished saving the checkpoint in 16.73 seconds +2025-10-25 22:51:46,178 - root - INFO - Finished saving the checkpoint in 16.73 seconds +2025-10-25 22:51:46,178 - root - INFO - Finished saving the checkpoint in 16.73 seconds +2025-10-25 22:51:46,179 - root - INFO - Finished saving the checkpoint in 16.73 seconds +2025-10-25 22:51:46,179 - root - INFO - Finished saving the checkpoint in 16.73 seconds +2025-10-25 22:51:46,179 - root - INFO - Finished saving the checkpoint in 16.73 seconds +2025-10-25 22:52:18,017 - root - INFO - Step 14010: lr=1.00E-05, loss= 1.1107 (max= 1.7805), tps=13494, mfu=28.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:52:18,018 - root - INFO - Step 14010: lr=1.00E-05, loss= 1.1107 (max= 1.7805), tps=13494, mfu=28.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:52:18,018 - root - INFO - Step 14010: lr=1.00E-05, loss= 1.1107 (max= 1.7805), tps=13494, mfu=28.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:52:18,018 - root - INFO - Step 14010: lr=1.00E-05, loss= 1.1107 (max= 1.7805), tps=13494, mfu=28.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:52:18,018 - root - INFO - Step 14010: lr=1.00E-05, loss= 1.1107 (max= 1.7805), tps=13494, mfu=28.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:52:18,018 - root - INFO - Step 14010: lr=1.00E-05, loss= 1.1107 (max= 1.7805), tps=13494, mfu=28.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:52:18,018 - root - INFO - Step 14010: lr=1.00E-05, loss= 1.1107 (max= 1.7805), tps=13494, mfu=28.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:52:18,018 - 
root - INFO - Step 14010: lr=1.00E-05, loss= 1.1107 (max= 1.7805), tps=13494, mfu=28.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:52:49,861 - root - INFO - Step 14020: lr=1.00E-05, loss= 1.1169 (max= 1.8225), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:52:49,861 - root - INFO - Step 14020: lr=1.00E-05, loss= 1.1169 (max= 1.8225), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:52:49,861 - root - INFO - Step 14020: lr=1.00E-05, loss= 1.1169 (max= 1.8225), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:52:49,861 - root - INFO - Step 14020: lr=1.00E-05, loss= 1.1169 (max= 1.8225), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:52:49,861 - root - INFO - Step 14020: lr=1.00E-05, loss= 1.1169 (max= 1.8225), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:52:49,861 - root - INFO - Step 14020: lr=1.00E-05, loss= 1.1169 (max= 1.8225), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:52:49,861 - root - INFO - Step 14020: lr=1.00E-05, loss= 1.1169 (max= 1.8225), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:52:49,861 - root - INFO - Step 14020: lr=1.00E-05, loss= 1.1169 (max= 1.8225), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:53:21,751 - root - INFO - Step 14030: lr=1.00E-05, loss= 1.1137 (max= 1.8408), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:53:21,751 - root - INFO - Step 14030: lr=1.00E-05, loss= 1.1137 (max= 1.8408), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 22:53:21,751 - root - INFO - Step 14030: lr=1.00E-05, loss= 1.1137 (max= 1.8408), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:53:21,751 - root - INFO - Step 14030: lr=1.00E-05, loss= 1.1137 (max= 1.8408), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:53:21,752 - root - INFO - Step 14030: lr=1.00E-05, loss= 1.1137 (max= 1.8408), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:53:21,752 - root - INFO - Step 14030: lr=1.00E-05, loss= 1.1137 (max= 1.8408), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:53:21,752 - root - INFO - Step 14030: lr=1.00E-05, loss= 1.1137 (max= 1.8408), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:53:21,752 - root - INFO - Step 14030: lr=1.00E-05, loss= 1.1137 (max= 1.8408), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:53:53,622 - root - INFO - Step 14040: lr=1.00E-05, loss= 1.1276 (max= 1.6254), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:53:53,622 - root - INFO - Step 14040: lr=1.00E-05, loss= 1.1276 (max= 1.6254), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:53:53,622 - root - INFO - Step 14040: lr=1.00E-05, loss= 1.1276 (max= 1.6254), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:53:53,623 - root - INFO - Step 14040: lr=1.00E-05, loss= 1.1276 (max= 1.6254), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:53:53,623 - root - INFO - Step 14040: lr=1.00E-05, loss= 1.1276 (max= 1.6254), tps=20565, mfu=42.85%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:53:53,623 - root - INFO - Step 14040: lr=1.00E-05, loss= 1.1276 (max= 1.6254), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:53:53,623 - root - INFO - Step 14040: lr=1.00E-05, loss= 1.1276 (max= 1.6254), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:53:53,623 - root - INFO - Step 14040: lr=1.00E-05, loss= 1.1276 (max= 1.6254), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:54:25,462 - root - INFO - Step 14050: lr=1.00E-05, loss= 1.1274 (max= 1.7276), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:54:25,462 - root - INFO - Step 14050: lr=1.00E-05, loss= 1.1274 (max= 1.7276), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:54:25,462 - root - INFO - Step 14050: lr=1.00E-05, loss= 1.1274 (max= 1.7276), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:54:25,462 - root - INFO - Step 14050: lr=1.00E-05, loss= 1.1274 (max= 1.7276), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:54:25,462 - root - INFO - Step 14050: lr=1.00E-05, loss= 1.1274 (max= 1.7276), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:54:25,462 - root - INFO - Step 14050: lr=1.00E-05, loss= 1.1274 (max= 1.7276), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:54:25,462 - root - INFO - Step 14050: lr=1.00E-05, loss= 1.1274 (max= 1.7276), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:54:25,462 - root - INFO - Step 14050: lr=1.00E-05, 
loss= 1.1274 (max= 1.7276), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:54:57,328 - root - INFO - Step 14060: lr=1.00E-05, loss= 1.1577 (max= 1.7462), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:54:57,328 - root - INFO - Step 14060: lr=1.00E-05, loss= 1.1577 (max= 1.7462), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:54:57,328 - root - INFO - Step 14060: lr=1.00E-05, loss= 1.1577 (max= 1.7462), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:54:57,328 - root - INFO - Step 14060: lr=1.00E-05, loss= 1.1577 (max= 1.7462), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:54:57,328 - root - INFO - Step 14060: lr=1.00E-05, loss= 1.1577 (max= 1.7462), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:54:57,328 - root - INFO - Step 14060: lr=1.00E-05, loss= 1.1577 (max= 1.7462), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:54:57,328 - root - INFO - Step 14060: lr=1.00E-05, loss= 1.1577 (max= 1.7462), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:54:57,328 - root - INFO - Step 14060: lr=1.00E-05, loss= 1.1577 (max= 1.7462), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:55:29,252 - root - INFO - Step 14070: lr=1.00E-05, loss= 1.1375 (max= 1.8437), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:55:29,253 - root - INFO - Step 14070: lr=1.00E-05, loss= 1.1375 (max= 1.8437), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
22:55:29,253 - root - INFO - Step 14070: lr=1.00E-05, loss= 1.1375 (max= 1.8437), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:55:29,253 - root - INFO - Step 14070: lr=1.00E-05, loss= 1.1375 (max= 1.8437), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:55:29,253 - root - INFO - Step 14070: lr=1.00E-05, loss= 1.1375 (max= 1.8437), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:55:29,253 - root - INFO - Step 14070: lr=1.00E-05, loss= 1.1375 (max= 1.8437), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:55:29,253 - root - INFO - Step 14070: lr=1.00E-05, loss= 1.1375 (max= 1.8437), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:55:29,253 - root - INFO - Step 14070: lr=1.00E-05, loss= 1.1375 (max= 1.8437), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:56:01,169 - root - INFO - Step 14080: lr=1.00E-05, loss= 1.1499 (max= 1.9861), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:56:01,169 - root - INFO - Step 14080: lr=1.00E-05, loss= 1.1499 (max= 1.9861), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:56:01,169 - root - INFO - Step 14080: lr=1.00E-05, loss= 1.1499 (max= 1.9861), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:56:01,169 - root - INFO - Step 14080: lr=1.00E-05, loss= 1.1499 (max= 1.9861), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:56:01,169 - root - INFO - Step 14080: lr=1.00E-05, loss= 1.1499 (max= 1.9861), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:56:01,169 - root - INFO - Step 14080: lr=1.00E-05, loss= 1.1499 (max= 1.9861), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:56:01,169 - root - INFO - Step 14080: lr=1.00E-05, loss= 1.1499 (max= 1.9861), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:56:01,169 - root - INFO - Step 14080: lr=1.00E-05, loss= 1.1499 (max= 1.9861), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:56:33,062 - root - INFO - Step 14090: lr=1.00E-05, loss= 1.1282 (max= 1.7847), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:56:33,062 - root - INFO - Step 14090: lr=1.00E-05, loss= 1.1282 (max= 1.7847), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:56:33,062 - root - INFO - Step 14090: lr=1.00E-05, loss= 1.1282 (max= 1.7847), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:56:33,062 - root - INFO - Step 14090: lr=1.00E-05, loss= 1.1282 (max= 1.7847), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:56:33,062 - root - INFO - Step 14090: lr=1.00E-05, loss= 1.1282 (max= 1.7847), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:56:33,062 - root - INFO - Step 14090: lr=1.00E-05, loss= 1.1282 (max= 1.7847), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:56:33,062 - root - INFO - Step 14090: lr=1.00E-05, loss= 1.1282 (max= 1.7847), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:56:33,062 - root - INFO - Step 14090: lr=1.00E-05, loss= 1.1282 (max= 1.7847), 
tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:57:04,918 - root - INFO - Step 14100: lr=1.00E-05, loss= 1.1630 (max= 2.2589), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:57:04,918 - root - INFO - Step 14100: lr=1.00E-05, loss= 1.1630 (max= 2.2589), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:57:04,918 - root - INFO - Step 14100: lr=1.00E-05, loss= 1.1630 (max= 2.2589), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:57:04,918 - root - INFO - Step 14100: lr=1.00E-05, loss= 1.1630 (max= 2.2589), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:57:04,918 - root - INFO - Step 14100: lr=1.00E-05, loss= 1.1630 (max= 2.2589), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:57:04,918 - root - INFO - Step 14100: lr=1.00E-05, loss= 1.1630 (max= 2.2589), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:57:04,918 - root - INFO - Step 14100: lr=1.00E-05, loss= 1.1630 (max= 2.2589), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:57:04,919 - root - INFO - Step 14100: lr=1.00E-05, loss= 1.1630 (max= 2.2589), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:57:36,706 - root - INFO - Step 14110: lr=1.00E-05, loss= 1.1412 (max= 1.9739), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:57:36,707 - root - INFO - Step 14110: lr=1.00E-05, loss= 1.1412 (max= 1.9739), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:57:36,707 - root - INFO - Step 
14110: lr=1.00E-05, loss= 1.1412 (max= 1.9739), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:57:36,707 - root - INFO - Step 14110: lr=1.00E-05, loss= 1.1412 (max= 1.9739), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:57:36,707 - root - INFO - Step 14110: lr=1.00E-05, loss= 1.1412 (max= 1.9739), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:57:36,707 - root - INFO - Step 14110: lr=1.00E-05, loss= 1.1412 (max= 1.9739), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:57:36,707 - root - INFO - Step 14110: lr=1.00E-05, loss= 1.1412 (max= 1.9739), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:57:36,707 - root - INFO - Step 14110: lr=1.00E-05, loss= 1.1412 (max= 1.9739), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:58:08,539 - root - INFO - Step 14120: lr=1.00E-05, loss= 1.1402 (max= 1.6889), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:58:08,539 - root - INFO - Step 14120: lr=1.00E-05, loss= 1.1402 (max= 1.6889), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:58:08,539 - root - INFO - Step 14120: lr=1.00E-05, loss= 1.1402 (max= 1.6889), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:58:08,539 - root - INFO - Step 14120: lr=1.00E-05, loss= 1.1402 (max= 1.6889), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:58:08,539 - root - INFO - Step 14120: lr=1.00E-05, loss= 1.1402 (max= 1.6889), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 22:58:08,539 - root - INFO - Step 14120: lr=1.00E-05, loss= 1.1402 (max= 1.6889), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:58:08,539 - root - INFO - Step 14120: lr=1.00E-05, loss= 1.1402 (max= 1.6889), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:58:08,539 - root - INFO - Step 14120: lr=1.00E-05, loss= 1.1402 (max= 1.6889), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:58:40,338 - root - INFO - Step 14130: lr=1.00E-05, loss= 1.1568 (max= 2.3138), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:58:40,338 - root - INFO - Step 14130: lr=1.00E-05, loss= 1.1568 (max= 2.3138), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:58:40,338 - root - INFO - Step 14130: lr=1.00E-05, loss= 1.1568 (max= 2.3138), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:58:40,338 - root - INFO - Step 14130: lr=1.00E-05, loss= 1.1568 (max= 2.3138), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:58:40,338 - root - INFO - Step 14130: lr=1.00E-05, loss= 1.1568 (max= 2.3138), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:58:40,338 - root - INFO - Step 14130: lr=1.00E-05, loss= 1.1568 (max= 2.3138), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:58:40,338 - root - INFO - Step 14130: lr=1.00E-05, loss= 1.1568 (max= 2.3138), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:58:40,338 - root - INFO - Step 14130: lr=1.00E-05, loss= 1.1568 (max= 2.3138), tps=20612, mfu=42.95%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:59:12,212 - root - INFO - Step 14140: lr=1.00E-05, loss= 1.1607 (max= 1.9373), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:59:12,212 - root - INFO - Step 14140: lr=1.00E-05, loss= 1.1607 (max= 1.9373), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:59:12,212 - root - INFO - Step 14140: lr=1.00E-05, loss= 1.1607 (max= 1.9373), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:59:12,212 - root - INFO - Step 14140: lr=1.00E-05, loss= 1.1607 (max= 1.9373), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:59:12,212 - root - INFO - Step 14140: lr=1.00E-05, loss= 1.1607 (max= 1.9373), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:59:12,212 - root - INFO - Step 14140: lr=1.00E-05, loss= 1.1607 (max= 1.9373), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:59:12,212 - root - INFO - Step 14140: lr=1.00E-05, loss= 1.1607 (max= 1.9373), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:59:12,212 - root - INFO - Step 14140: lr=1.00E-05, loss= 1.1607 (max= 1.9373), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:59:44,081 - root - INFO - Step 14150: lr=1.00E-05, loss= 1.1428 (max= 1.9952), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:59:44,081 - root - INFO - Step 14150: lr=1.00E-05, loss= 1.1428 (max= 1.9952), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:59:44,081 - root - INFO - Step 14150: lr=1.00E-05, loss= 1.1428 
(max= 1.9952), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:59:44,082 - root - INFO - Step 14150: lr=1.00E-05, loss= 1.1428 (max= 1.9952), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:59:44,082 - root - INFO - Step 14150: lr=1.00E-05, loss= 1.1428 (max= 1.9952), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:59:44,082 - root - INFO - Step 14150: lr=1.00E-05, loss= 1.1428 (max= 1.9952), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:59:44,082 - root - INFO - Step 14150: lr=1.00E-05, loss= 1.1428 (max= 1.9952), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 22:59:44,082 - root - INFO - Step 14150: lr=1.00E-05, loss= 1.1428 (max= 1.9952), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:00:10,184 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:7085502 +2025-10-25 23:00:16,011 - root - INFO - Step 14160: lr=1.00E-05, loss= 1.1551 (max= 1.8120), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:00:16,011 - root - INFO - Step 14160: lr=1.00E-05, loss= 1.1551 (max= 1.8120), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:00:16,011 - root - INFO - Step 14160: lr=1.00E-05, loss= 1.1551 (max= 1.8120), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:00:16,011 - root - INFO - Step 14160: lr=1.00E-05, loss= 1.1551 (max= 1.8120), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:00:16,011 - root - 
INFO - Step 14160: lr=1.00E-05, loss= 1.1551 (max= 1.8120), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:00:16,011 - root - INFO - Step 14160: lr=1.00E-05, loss= 1.1551 (max= 1.8120), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:00:16,011 - root - INFO - Step 14160: lr=1.00E-05, loss= 1.1551 (max= 1.8120), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:00:16,011 - root - INFO - Step 14160: lr=1.00E-05, loss= 1.1551 (max= 1.8120), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:00:47,822 - root - INFO - Step 14170: lr=1.00E-05, loss= 1.1666 (max= 2.2382), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:00:47,822 - root - INFO - Step 14170: lr=1.00E-05, loss= 1.1666 (max= 2.2382), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:00:47,823 - root - INFO - Step 14170: lr=1.00E-05, loss= 1.1666 (max= 2.2382), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:00:47,823 - root - INFO - Step 14170: lr=1.00E-05, loss= 1.1666 (max= 2.2382), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:00:47,823 - root - INFO - Step 14170: lr=1.00E-05, loss= 1.1666 (max= 2.2382), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:00:47,823 - root - INFO - Step 14170: lr=1.00E-05, loss= 1.1666 (max= 2.2382), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:00:47,823 - root - INFO - Step 14170: lr=1.00E-05, loss= 1.1666 (max= 2.2382), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 23:00:47,823 - root - INFO - Step 14170: lr=1.00E-05, loss= 1.1666 (max= 2.2382), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:01:13,876 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:3006627 +2025-10-25 23:01:19,685 - root - INFO - Step 14180: lr=1.00E-05, loss= 1.1669 (max= 1.6485), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:01:19,685 - root - INFO - Step 14180: lr=1.00E-05, loss= 1.1669 (max= 1.6485), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:01:19,685 - root - INFO - Step 14180: lr=1.00E-05, loss= 1.1669 (max= 1.6485), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:01:19,685 - root - INFO - Step 14180: lr=1.00E-05, loss= 1.1669 (max= 1.6485), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:01:19,685 - root - INFO - Step 14180: lr=1.00E-05, loss= 1.1669 (max= 1.6485), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:01:19,685 - root - INFO - Step 14180: lr=1.00E-05, loss= 1.1669 (max= 1.6485), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:01:19,685 - root - INFO - Step 14180: lr=1.00E-05, loss= 1.1669 (max= 1.6485), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:01:19,685 - root - INFO - Step 14180: lr=1.00E-05, loss= 1.1669 (max= 1.6485), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:01:20,249 - root - WARNING - Empty document detected at 
/work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:6230317 +2025-10-25 23:01:51,618 - root - INFO - Step 14190: lr=1.00E-05, loss= 1.1585 (max= 1.6879), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:01:51,618 - root - INFO - Step 14190: lr=1.00E-05, loss= 1.1585 (max= 1.6879), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:01:51,618 - root - INFO - Step 14190: lr=1.00E-05, loss= 1.1585 (max= 1.6879), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:01:51,618 - root - INFO - Step 14190: lr=1.00E-05, loss= 1.1585 (max= 1.6879), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:01:51,618 - root - INFO - Step 14190: lr=1.00E-05, loss= 1.1585 (max= 1.6879), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:01:51,618 - root - INFO - Step 14190: lr=1.00E-05, loss= 1.1585 (max= 1.6879), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:01:51,618 - root - INFO - Step 14190: lr=1.00E-05, loss= 1.1585 (max= 1.6879), tps=20525, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:01:51,618 - root - INFO - Step 14190: lr=1.00E-05, loss= 1.1585 (max= 1.6879), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:02:23,482 - root - INFO - Step 14200: lr=1.00E-05, loss= 1.1232 (max= 1.8596), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:02:23,482 - root - INFO - Step 14200: lr=1.00E-05, loss= 1.1232 (max= 1.8596), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:02:23,482 
- root - INFO - Step 14200: lr=1.00E-05, loss= 1.1232 (max= 1.8596), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:02:23,482 - root - INFO - Step 14200: lr=1.00E-05, loss= 1.1232 (max= 1.8596), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:02:23,482 - root - INFO - Step 14200: lr=1.00E-05, loss= 1.1232 (max= 1.8596), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:02:23,482 - root - INFO - Step 14200: lr=1.00E-05, loss= 1.1232 (max= 1.8596), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:02:23,482 - root - INFO - Step 14200: lr=1.00E-05, loss= 1.1232 (max= 1.8596), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:02:23,482 - root - INFO - Step 14200: lr=1.00E-05, loss= 1.1232 (max= 1.8596), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:02:55,343 - root - INFO - Step 14210: lr=1.00E-05, loss= 1.1444 (max= 2.0957), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:02:55,343 - root - INFO - Step 14210: lr=1.00E-05, loss= 1.1444 (max= 2.0957), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:02:55,343 - root - INFO - Step 14210: lr=1.00E-05, loss= 1.1444 (max= 2.0957), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:02:55,343 - root - INFO - Step 14210: lr=1.00E-05, loss= 1.1444 (max= 2.0957), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:02:55,343 - root - INFO - Step 14210: lr=1.00E-05, loss= 1.1444 (max= 2.0957), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:02:55,343 - root - INFO - Step 14210: lr=1.00E-05, loss= 1.1444 (max= 2.0957), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:02:55,343 - root - INFO - Step 14210: lr=1.00E-05, loss= 1.1444 (max= 2.0957), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:02:55,343 - root - INFO - Step 14210: lr=1.00E-05, loss= 1.1444 (max= 2.0957), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:03:27,188 - root - INFO - Step 14220: lr=1.00E-05, loss= 1.1534 (max= 2.0413), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:03:27,189 - root - INFO - Step 14220: lr=1.00E-05, loss= 1.1534 (max= 2.0413), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:03:27,189 - root - INFO - Step 14220: lr=1.00E-05, loss= 1.1534 (max= 2.0413), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:03:27,189 - root - INFO - Step 14220: lr=1.00E-05, loss= 1.1534 (max= 2.0413), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:03:27,189 - root - INFO - Step 14220: lr=1.00E-05, loss= 1.1534 (max= 2.0413), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:03:27,189 - root - INFO - Step 14220: lr=1.00E-05, loss= 1.1534 (max= 2.0413), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:03:27,189 - root - INFO - Step 14220: lr=1.00E-05, loss= 1.1534 (max= 2.0413), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:03:27,189 - root - INFO - Step 14220: lr=1.00E-05, loss= 1.1534 (max= 2.0413), 
tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:03:59,003 - root - INFO - Step 14230: lr=1.00E-05, loss= 1.1686 (max= 1.8164), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:03:59,003 - root - INFO - Step 14230: lr=1.00E-05, loss= 1.1686 (max= 1.8164), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:03:59,003 - root - INFO - Step 14230: lr=1.00E-05, loss= 1.1686 (max= 1.8164), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:03:59,003 - root - INFO - Step 14230: lr=1.00E-05, loss= 1.1686 (max= 1.8164), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:03:59,003 - root - INFO - Step 14230: lr=1.00E-05, loss= 1.1686 (max= 1.8164), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:03:59,003 - root - INFO - Step 14230: lr=1.00E-05, loss= 1.1686 (max= 1.8164), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:03:59,003 - root - INFO - Step 14230: lr=1.00E-05, loss= 1.1686 (max= 1.8164), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:03:59,003 - root - INFO - Step 14230: lr=1.00E-05, loss= 1.1686 (max= 1.8164), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:04:30,923 - root - INFO - Step 14240: lr=1.00E-05, loss= 1.1467 (max= 1.9911), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:04:30,923 - root - INFO - Step 14240: lr=1.00E-05, loss= 1.1467 (max= 1.9911), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:04:30,923 - root - INFO - Step 
14240: lr=1.00E-05, loss= 1.1467 (max= 1.9911), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:04:30,923 - root - INFO - Step 14240: lr=1.00E-05, loss= 1.1467 (max= 1.9911), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:04:30,923 - root - INFO - Step 14240: lr=1.00E-05, loss= 1.1467 (max= 1.9911), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:04:30,923 - root - INFO - Step 14240: lr=1.00E-05, loss= 1.1467 (max= 1.9911), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:04:30,923 - root - INFO - Step 14240: lr=1.00E-05, loss= 1.1467 (max= 1.9911), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:04:30,923 - root - INFO - Step 14240: lr=1.00E-05, loss= 1.1467 (max= 1.9911), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:05:02,726 - root - INFO - Step 14250: lr=1.00E-05, loss= 1.1421 (max= 2.0218), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:05:02,726 - root - INFO - Step 14250: lr=1.00E-05, loss= 1.1421 (max= 2.0218), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:05:02,726 - root - INFO - Step 14250: lr=1.00E-05, loss= 1.1421 (max= 2.0218), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:05:02,726 - root - INFO - Step 14250: lr=1.00E-05, loss= 1.1421 (max= 2.0218), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:05:02,726 - root - INFO - Step 14250: lr=1.00E-05, loss= 1.1421 (max= 2.0218), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 23:05:02,726 - root - INFO - Step 14250: lr=1.00E-05, loss= 1.1421 (max= 2.0218), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:05:02,726 - root - INFO - Step 14250: lr=1.00E-05, loss= 1.1421 (max= 2.0218), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:05:02,726 - root - INFO - Step 14250: lr=1.00E-05, loss= 1.1421 (max= 2.0218), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:05:34,556 - root - INFO - Step 14260: lr=1.00E-05, loss= 1.1446 (max= 1.9534), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:05:34,556 - root - INFO - Step 14260: lr=1.00E-05, loss= 1.1446 (max= 1.9534), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:05:34,556 - root - INFO - Step 14260: lr=1.00E-05, loss= 1.1446 (max= 1.9534), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:05:34,556 - root - INFO - Step 14260: lr=1.00E-05, loss= 1.1446 (max= 1.9534), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:05:34,556 - root - INFO - Step 14260: lr=1.00E-05, loss= 1.1446 (max= 1.9534), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:05:34,556 - root - INFO - Step 14260: lr=1.00E-05, loss= 1.1446 (max= 1.9534), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:05:34,556 - root - INFO - Step 14260: lr=1.00E-05, loss= 1.1446 (max= 1.9534), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:05:34,556 - root - INFO - Step 14260: lr=1.00E-05, loss= 1.1446 (max= 1.9534), tps=20591, mfu=42.90%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:06:06,427 - root - INFO - Step 14270: lr=1.00E-05, loss= 1.1337 (max= 1.8616), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:06:06,427 - root - INFO - Step 14270: lr=1.00E-05, loss= 1.1337 (max= 1.8616), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:06:06,427 - root - INFO - Step 14270: lr=1.00E-05, loss= 1.1337 (max= 1.8616), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:06:06,427 - root - INFO - Step 14270: lr=1.00E-05, loss= 1.1337 (max= 1.8616), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:06:06,427 - root - INFO - Step 14270: lr=1.00E-05, loss= 1.1337 (max= 1.8616), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:06:06,427 - root - INFO - Step 14270: lr=1.00E-05, loss= 1.1337 (max= 1.8616), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:06:06,427 - root - INFO - Step 14270: lr=1.00E-05, loss= 1.1337 (max= 1.8616), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:06:06,427 - root - INFO - Step 14270: lr=1.00E-05, loss= 1.1337 (max= 1.8616), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:06:38,247 - root - INFO - Step 14280: lr=1.00E-05, loss= 1.1358 (max= 1.8354), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:06:38,247 - root - INFO - Step 14280: lr=1.00E-05, loss= 1.1358 (max= 1.8354), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:06:38,248 - root - INFO - Step 14280: lr=1.00E-05, loss= 1.1358 
(max= 1.8354), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:06:38,248 - root - INFO - Step 14280: lr=1.00E-05, loss= 1.1358 (max= 1.8354), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:06:38,248 - root - INFO - Step 14280: lr=1.00E-05, loss= 1.1358 (max= 1.8354), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:06:38,248 - root - INFO - Step 14280: lr=1.00E-05, loss= 1.1358 (max= 1.8354), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:06:38,248 - root - INFO - Step 14280: lr=1.00E-05, loss= 1.1358 (max= 1.8354), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:06:38,248 - root - INFO - Step 14280: lr=1.00E-05, loss= 1.1358 (max= 1.8354), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:07:10,057 - root - INFO - Step 14290: lr=1.00E-05, loss= 1.1515 (max= 1.9546), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:07:10,058 - root - INFO - Step 14290: lr=1.00E-05, loss= 1.1515 (max= 1.9546), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:07:10,058 - root - INFO - Step 14290: lr=1.00E-05, loss= 1.1515 (max= 1.9546), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:07:10,058 - root - INFO - Step 14290: lr=1.00E-05, loss= 1.1515 (max= 1.9546), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:07:10,058 - root - INFO - Step 14290: lr=1.00E-05, loss= 1.1515 (max= 1.9546), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:07:10,058 - root 
- INFO - Step 14290: lr=1.00E-05, loss= 1.1515 (max= 1.9546), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:07:10,058 - root - INFO - Step 14290: lr=1.00E-05, loss= 1.1515 (max= 1.9546), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:07:10,058 - root - INFO - Step 14290: lr=1.00E-05, loss= 1.1515 (max= 1.9546), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:07:41,942 - root - INFO - Step 14300: lr=1.00E-05, loss= 1.1481 (max= 2.0521), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:07:41,942 - root - INFO - Step 14300: lr=1.00E-05, loss= 1.1481 (max= 2.0521), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:07:41,942 - root - INFO - Step 14300: lr=1.00E-05, loss= 1.1481 (max= 2.0521), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:07:41,942 - root - INFO - Step 14300: lr=1.00E-05, loss= 1.1481 (max= 2.0521), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:07:41,942 - root - INFO - Step 14300: lr=1.00E-05, loss= 1.1481 (max= 2.0521), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:07:41,942 - root - INFO - Step 14300: lr=1.00E-05, loss= 1.1481 (max= 2.0521), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:07:41,942 - root - INFO - Step 14300: lr=1.00E-05, loss= 1.1481 (max= 2.0521), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:07:41,942 - root - INFO - Step 14300: lr=1.00E-05, loss= 1.1481 (max= 2.0521), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 23:08:13,784 - root - INFO - Step 14310: lr=1.00E-05, loss= 1.1318 (max= 1.8252), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:08:13,784 - root - INFO - Step 14310: lr=1.00E-05, loss= 1.1318 (max= 1.8252), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:08:13,784 - root - INFO - Step 14310: lr=1.00E-05, loss= 1.1318 (max= 1.8252), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:08:13,784 - root - INFO - Step 14310: lr=1.00E-05, loss= 1.1318 (max= 1.8252), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:08:13,785 - root - INFO - Step 14310: lr=1.00E-05, loss= 1.1318 (max= 1.8252), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:08:13,785 - root - INFO - Step 14310: lr=1.00E-05, loss= 1.1318 (max= 1.8252), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:08:13,785 - root - INFO - Step 14310: lr=1.00E-05, loss= 1.1318 (max= 1.8252), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:08:13,785 - root - INFO - Step 14310: lr=1.00E-05, loss= 1.1318 (max= 1.8252), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:08:45,555 - root - INFO - Step 14320: lr=1.00E-05, loss= 1.1143 (max= 2.1958), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:08:45,556 - root - INFO - Step 14320: lr=1.00E-05, loss= 1.1143 (max= 2.1958), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:08:45,556 - root - INFO - Step 14320: lr=1.00E-05, loss= 1.1143 (max= 2.1958), tps=20630, mfu=42.98%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:08:45,556 - root - INFO - Step 14320: lr=1.00E-05, loss= 1.1143 (max= 2.1958), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:08:45,556 - root - INFO - Step 14320: lr=1.00E-05, loss= 1.1143 (max= 2.1958), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:08:45,556 - root - INFO - Step 14320: lr=1.00E-05, loss= 1.1143 (max= 2.1958), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:08:45,556 - root - INFO - Step 14320: lr=1.00E-05, loss= 1.1143 (max= 2.1958), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:08:45,556 - root - INFO - Step 14320: lr=1.00E-05, loss= 1.1143 (max= 2.1958), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:09:17,406 - root - INFO - Step 14330: lr=1.00E-05, loss= 1.1456 (max= 1.8237), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:09:17,406 - root - INFO - Step 14330: lr=1.00E-05, loss= 1.1456 (max= 1.8237), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:09:17,406 - root - INFO - Step 14330: lr=1.00E-05, loss= 1.1456 (max= 1.8237), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:09:17,406 - root - INFO - Step 14330: lr=1.00E-05, loss= 1.1456 (max= 1.8237), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:09:17,406 - root - INFO - Step 14330: lr=1.00E-05, loss= 1.1456 (max= 1.8237), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:09:17,406 - root - INFO - Step 14330: lr=1.00E-05, 
loss= 1.1456 (max= 1.8237), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:09:17,406 - root - INFO - Step 14330: lr=1.00E-05, loss= 1.1456 (max= 1.8237), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:09:17,406 - root - INFO - Step 14330: lr=1.00E-05, loss= 1.1456 (max= 1.8237), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:09:49,348 - root - INFO - Step 14340: lr=1.00E-05, loss= 1.1480 (max= 1.5895), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:09:49,348 - root - INFO - Step 14340: lr=1.00E-05, loss= 1.1480 (max= 1.5895), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:09:49,348 - root - INFO - Step 14340: lr=1.00E-05, loss= 1.1480 (max= 1.5895), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:09:49,348 - root - INFO - Step 14340: lr=1.00E-05, loss= 1.1480 (max= 1.5895), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:09:49,348 - root - INFO - Step 14340: lr=1.00E-05, loss= 1.1480 (max= 1.5895), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:09:49,348 - root - INFO - Step 14340: lr=1.00E-05, loss= 1.1480 (max= 1.5895), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:09:49,348 - root - INFO - Step 14340: lr=1.00E-05, loss= 1.1480 (max= 1.5895), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:09:49,348 - root - INFO - Step 14340: lr=1.00E-05, loss= 1.1480 (max= 1.5895), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
23:10:21,155 - root - INFO - Step 14350: lr=1.00E-05, loss= 1.1438 (max= 1.8984), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:10:21,155 - root - INFO - Step 14350: lr=1.00E-05, loss= 1.1438 (max= 1.8984), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:10:21,155 - root - INFO - Step 14350: lr=1.00E-05, loss= 1.1438 (max= 1.8984), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:10:21,155 - root - INFO - Step 14350: lr=1.00E-05, loss= 1.1438 (max= 1.8984), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:10:21,155 - root - INFO - Step 14350: lr=1.00E-05, loss= 1.1438 (max= 1.8984), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:10:21,155 - root - INFO - Step 14350: lr=1.00E-05, loss= 1.1438 (max= 1.8984), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:10:21,155 - root - INFO - Step 14350: lr=1.00E-05, loss= 1.1438 (max= 1.8984), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:10:21,156 - root - INFO - Step 14350: lr=1.00E-05, loss= 1.1438 (max= 1.8984), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:10:52,990 - root - INFO - Step 14360: lr=1.00E-05, loss= 1.1459 (max= 2.2308), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:10:52,990 - root - INFO - Step 14360: lr=1.00E-05, loss= 1.1459 (max= 2.2308), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:10:52,990 - root - INFO - Step 14360: lr=1.00E-05, loss= 1.1459 (max= 2.2308), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:10:52,990 - root - INFO - Step 14360: lr=1.00E-05, loss= 1.1459 (max= 2.2308), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:10:52,990 - root - INFO - Step 14360: lr=1.00E-05, loss= 1.1459 (max= 2.2308), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:10:52,990 - root - INFO - Step 14360: lr=1.00E-05, loss= 1.1459 (max= 2.2308), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:10:52,990 - root - INFO - Step 14360: lr=1.00E-05, loss= 1.1459 (max= 2.2308), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:10:52,990 - root - INFO - Step 14360: lr=1.00E-05, loss= 1.1459 (max= 2.2308), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:11:24,838 - root - INFO - Step 14370: lr=1.00E-05, loss= 1.1867 (max= 2.0209), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:11:24,838 - root - INFO - Step 14370: lr=1.00E-05, loss= 1.1867 (max= 2.0209), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:11:24,838 - root - INFO - Step 14370: lr=1.00E-05, loss= 1.1867 (max= 2.0209), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:11:24,838 - root - INFO - Step 14370: lr=1.00E-05, loss= 1.1867 (max= 2.0209), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:11:24,838 - root - INFO - Step 14370: lr=1.00E-05, loss= 1.1867 (max= 2.0209), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:11:24,838 - root - INFO - Step 14370: lr=1.00E-05, loss= 1.1867 (max= 2.0209), 
tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:11:24,838 - root - INFO - Step 14370: lr=1.00E-05, loss= 1.1867 (max= 2.0209), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:11:24,838 - root - INFO - Step 14370: lr=1.00E-05, loss= 1.1867 (max= 2.0209), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:11:56,665 - root - INFO - Step 14380: lr=1.00E-05, loss= 1.1247 (max= 1.7571), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:11:56,665 - root - INFO - Step 14380: lr=1.00E-05, loss= 1.1247 (max= 1.7571), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:11:56,666 - root - INFO - Step 14380: lr=1.00E-05, loss= 1.1247 (max= 1.7571), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:11:56,666 - root - INFO - Step 14380: lr=1.00E-05, loss= 1.1247 (max= 1.7571), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:11:56,666 - root - INFO - Step 14380: lr=1.00E-05, loss= 1.1247 (max= 1.7571), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:11:56,666 - root - INFO - Step 14380: lr=1.00E-05, loss= 1.1247 (max= 1.7571), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:11:56,666 - root - INFO - Step 14380: lr=1.00E-05, loss= 1.1247 (max= 1.7571), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:11:56,666 - root - INFO - Step 14380: lr=1.00E-05, loss= 1.1247 (max= 1.7571), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:12:28,474 - root - INFO - Step 
14390: lr=1.00E-05, loss= 1.1150 (max= 1.8566), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:12:28,474 - root - INFO - Step 14390: lr=1.00E-05, loss= 1.1150 (max= 1.8566), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:12:28,474 - root - INFO - Step 14390: lr=1.00E-05, loss= 1.1150 (max= 1.8566), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:12:28,474 - root - INFO - Step 14390: lr=1.00E-05, loss= 1.1150 (max= 1.8566), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:12:28,474 - root - INFO - Step 14390: lr=1.00E-05, loss= 1.1150 (max= 1.8566), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:12:28,474 - root - INFO - Step 14390: lr=1.00E-05, loss= 1.1150 (max= 1.8566), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:12:28,474 - root - INFO - Step 14390: lr=1.00E-05, loss= 1.1150 (max= 1.8566), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:12:28,474 - root - INFO - Step 14390: lr=1.00E-05, loss= 1.1150 (max= 1.8566), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:13:00,375 - root - INFO - Step 14400: lr=1.00E-05, loss= 1.1385 (max= 1.7931), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:13:00,375 - root - INFO - Step 14400: lr=1.00E-05, loss= 1.1385 (max= 1.7931), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:13:00,375 - root - INFO - Step 14400: lr=1.00E-05, loss= 1.1385 (max= 1.7931), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 23:13:00,375 - root - INFO - Step 14400: lr=1.00E-05, loss= 1.1385 (max= 1.7931), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:13:00,375 - root - INFO - Step 14400: lr=1.00E-05, loss= 1.1385 (max= 1.7931), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:13:00,375 - root - INFO - Step 14400: lr=1.00E-05, loss= 1.1385 (max= 1.7931), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:13:00,376 - root - INFO - Step 14400: lr=1.00E-05, loss= 1.1385 (max= 1.7931), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:13:00,376 - root - INFO - Step 14400: lr=1.00E-05, loss= 1.1385 (max= 1.7931), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:13:32,257 - root - INFO - Step 14410: lr=1.00E-05, loss= 1.1478 (max= 1.9279), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:13:32,257 - root - INFO - Step 14410: lr=1.00E-05, loss= 1.1478 (max= 1.9279), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:13:32,257 - root - INFO - Step 14410: lr=1.00E-05, loss= 1.1478 (max= 1.9279), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:13:32,257 - root - INFO - Step 14410: lr=1.00E-05, loss= 1.1478 (max= 1.9279), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:13:32,257 - root - INFO - Step 14410: lr=1.00E-05, loss= 1.1478 (max= 1.9279), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:13:32,257 - root - INFO - Step 14410: lr=1.00E-05, loss= 1.1478 (max= 1.9279), tps=20558, mfu=42.83%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:13:32,257 - root - INFO - Step 14410: lr=1.00E-05, loss= 1.1478 (max= 1.9279), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:13:32,258 - root - INFO - Step 14410: lr=1.00E-05, loss= 1.1478 (max= 1.9279), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:14:04,106 - root - INFO - Step 14420: lr=1.00E-05, loss= 1.1505 (max= 1.8588), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:14:04,106 - root - INFO - Step 14420: lr=1.00E-05, loss= 1.1505 (max= 1.8588), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:14:04,106 - root - INFO - Step 14420: lr=1.00E-05, loss= 1.1505 (max= 1.8588), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:14:04,106 - root - INFO - Step 14420: lr=1.00E-05, loss= 1.1505 (max= 1.8588), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:14:04,106 - root - INFO - Step 14420: lr=1.00E-05, loss= 1.1505 (max= 1.8588), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:14:04,106 - root - INFO - Step 14420: lr=1.00E-05, loss= 1.1505 (max= 1.8588), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:14:04,106 - root - INFO - Step 14420: lr=1.00E-05, loss= 1.1505 (max= 1.8588), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:14:04,106 - root - INFO - Step 14420: lr=1.00E-05, loss= 1.1505 (max= 1.8588), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:14:36,026 - root - INFO - Step 14430: lr=1.00E-05, loss= 1.1245 
(max= 1.7127), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:14:36,026 - root - INFO - Step 14430: lr=1.00E-05, loss= 1.1245 (max= 1.7127), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:14:36,026 - root - INFO - Step 14430: lr=1.00E-05, loss= 1.1245 (max= 1.7127), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:14:36,026 - root - INFO - Step 14430: lr=1.00E-05, loss= 1.1245 (max= 1.7127), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:14:36,026 - root - INFO - Step 14430: lr=1.00E-05, loss= 1.1245 (max= 1.7127), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:14:36,026 - root - INFO - Step 14430: lr=1.00E-05, loss= 1.1245 (max= 1.7127), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:14:36,026 - root - INFO - Step 14430: lr=1.00E-05, loss= 1.1245 (max= 1.7127), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:14:36,026 - root - INFO - Step 14430: lr=1.00E-05, loss= 1.1245 (max= 1.7127), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:15:07,843 - root - INFO - Step 14440: lr=1.00E-05, loss= 1.1416 (max= 1.7526), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:15:07,843 - root - INFO - Step 14440: lr=1.00E-05, loss= 1.1416 (max= 1.7526), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:15:07,843 - root - INFO - Step 14440: lr=1.00E-05, loss= 1.1416 (max= 1.7526), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:15:07,843 - root 
- INFO - Step 14440: lr=1.00E-05, loss= 1.1416 (max= 1.7526), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:15:07,843 - root - INFO - Step 14440: lr=1.00E-05, loss= 1.1416 (max= 1.7526), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:15:07,843 - root - INFO - Step 14440: lr=1.00E-05, loss= 1.1416 (max= 1.7526), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:15:07,843 - root - INFO - Step 14440: lr=1.00E-05, loss= 1.1416 (max= 1.7526), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:15:07,843 - root - INFO - Step 14440: lr=1.00E-05, loss= 1.1416 (max= 1.7526), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:15:39,756 - root - INFO - Step 14450: lr=1.00E-05, loss= 1.1273 (max= 1.6759), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:15:39,756 - root - INFO - Step 14450: lr=1.00E-05, loss= 1.1273 (max= 1.6759), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:15:39,756 - root - INFO - Step 14450: lr=1.00E-05, loss= 1.1273 (max= 1.6759), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:15:39,756 - root - INFO - Step 14450: lr=1.00E-05, loss= 1.1273 (max= 1.6759), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:15:39,756 - root - INFO - Step 14450: lr=1.00E-05, loss= 1.1273 (max= 1.6759), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:15:39,756 - root - INFO - Step 14450: lr=1.00E-05, loss= 1.1273 (max= 1.6759), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 23:15:39,756 - root - INFO - Step 14450: lr=1.00E-05, loss= 1.1273 (max= 1.6759), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:15:39,756 - root - INFO - Step 14450: lr=1.00E-05, loss= 1.1273 (max= 1.6759), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:16:11,677 - root - INFO - Step 14460: lr=1.00E-05, loss= 1.1159 (max= 1.9318), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:16:11,677 - root - INFO - Step 14460: lr=1.00E-05, loss= 1.1159 (max= 1.9318), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:16:11,677 - root - INFO - Step 14460: lr=1.00E-05, loss= 1.1159 (max= 1.9318), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:16:11,677 - root - INFO - Step 14460: lr=1.00E-05, loss= 1.1159 (max= 1.9318), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:16:11,677 - root - INFO - Step 14460: lr=1.00E-05, loss= 1.1159 (max= 1.9318), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:16:11,677 - root - INFO - Step 14460: lr=1.00E-05, loss= 1.1159 (max= 1.9318), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:16:11,677 - root - INFO - Step 14460: lr=1.00E-05, loss= 1.1159 (max= 1.9318), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:16:11,677 - root - INFO - Step 14460: lr=1.00E-05, loss= 1.1159 (max= 1.9318), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:16:43,483 - root - INFO - Step 14470: lr=1.00E-05, loss= 1.1575 (max= 1.6666), tps=20607, mfu=42.94%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:16:43,483 - root - INFO - Step 14470: lr=1.00E-05, loss= 1.1575 (max= 1.6666), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:16:43,483 - root - INFO - Step 14470: lr=1.00E-05, loss= 1.1575 (max= 1.6666), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:16:43,483 - root - INFO - Step 14470: lr=1.00E-05, loss= 1.1575 (max= 1.6666), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:16:43,483 - root - INFO - Step 14470: lr=1.00E-05, loss= 1.1575 (max= 1.6666), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:16:43,483 - root - INFO - Step 14470: lr=1.00E-05, loss= 1.1575 (max= 1.6666), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:16:43,483 - root - INFO - Step 14470: lr=1.00E-05, loss= 1.1575 (max= 1.6666), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:16:43,483 - root - INFO - Step 14470: lr=1.00E-05, loss= 1.1575 (max= 1.6666), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:17:15,328 - root - INFO - Step 14480: lr=1.00E-05, loss= 1.1303 (max= 1.7895), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:17:15,328 - root - INFO - Step 14480: lr=1.00E-05, loss= 1.1303 (max= 1.7895), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:17:15,328 - root - INFO - Step 14480: lr=1.00E-05, loss= 1.1303 (max= 1.7895), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:17:15,328 - root - INFO - Step 14480: lr=1.00E-05, 
loss= 1.1303 (max= 1.7895), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:17:15,328 - root - INFO - Step 14480: lr=1.00E-05, loss= 1.1303 (max= 1.7895), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:17:15,328 - root - INFO - Step 14480: lr=1.00E-05, loss= 1.1303 (max= 1.7895), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:17:15,328 - root - INFO - Step 14480: lr=1.00E-05, loss= 1.1303 (max= 1.7895), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:17:15,328 - root - INFO - Step 14480: lr=1.00E-05, loss= 1.1303 (max= 1.7895), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:17:47,198 - root - INFO - Step 14490: lr=1.00E-05, loss= 1.1482 (max= 1.8514), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:17:47,198 - root - INFO - Step 14490: lr=1.00E-05, loss= 1.1482 (max= 1.8514), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:17:47,198 - root - INFO - Step 14490: lr=1.00E-05, loss= 1.1482 (max= 1.8514), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:17:47,198 - root - INFO - Step 14490: lr=1.00E-05, loss= 1.1482 (max= 1.8514), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:17:47,198 - root - INFO - Step 14490: lr=1.00E-05, loss= 1.1482 (max= 1.8514), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:17:47,198 - root - INFO - Step 14490: lr=1.00E-05, loss= 1.1482 (max= 1.8514), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
23:17:47,198 - root - INFO - Step 14490: lr=1.00E-05, loss= 1.1482 (max= 1.8514), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:17:47,198 - root - INFO - Step 14490: lr=1.00E-05, loss= 1.1482 (max= 1.8514), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:18:19,032 - root - INFO - Step 14500: lr=1.00E-05, loss= 1.1379 (max= 1.8401), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:18:19,032 - root - INFO - Step 14500: lr=1.00E-05, loss= 1.1379 (max= 1.8401), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:18:19,032 - root - INFO - Step 14500: lr=1.00E-05, loss= 1.1379 (max= 1.8401), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:18:19,032 - root - INFO - Step 14500: lr=1.00E-05, loss= 1.1379 (max= 1.8401), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:18:19,032 - root - INFO - Step 14500: lr=1.00E-05, loss= 1.1379 (max= 1.8401), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:18:19,032 - root - INFO - Step 14500: lr=1.00E-05, loss= 1.1379 (max= 1.8401), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:18:19,032 - root - INFO - Step 14500: lr=1.00E-05, loss= 1.1379 (max= 1.8401), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:18:19,033 - root - INFO - Step 14500: lr=1.00E-05, loss= 1.1379 (max= 1.8401), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:18:50,898 - root - INFO - Step 14510: lr=1.00E-05, loss= 1.1689 (max= 1.8954), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:18:50,898 - root - INFO - Step 14510: lr=1.00E-05, loss= 1.1689 (max= 1.8954), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:18:50,898 - root - INFO - Step 14510: lr=1.00E-05, loss= 1.1689 (max= 1.8954), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:18:50,898 - root - INFO - Step 14510: lr=1.00E-05, loss= 1.1689 (max= 1.8954), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:18:50,898 - root - INFO - Step 14510: lr=1.00E-05, loss= 1.1689 (max= 1.8954), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:18:50,898 - root - INFO - Step 14510: lr=1.00E-05, loss= 1.1689 (max= 1.8954), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:18:50,898 - root - INFO - Step 14510: lr=1.00E-05, loss= 1.1689 (max= 1.8954), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:18:50,898 - root - INFO - Step 14510: lr=1.00E-05, loss= 1.1689 (max= 1.8954), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:19:22,783 - root - INFO - Step 14520: lr=1.00E-05, loss= 1.1073 (max= 1.7602), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:19:22,783 - root - INFO - Step 14520: lr=1.00E-05, loss= 1.1073 (max= 1.7602), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:19:22,783 - root - INFO - Step 14520: lr=1.00E-05, loss= 1.1073 (max= 1.7602), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:19:22,783 - root - INFO - Step 14520: lr=1.00E-05, loss= 1.1073 (max= 1.7602), 
tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:19:22,783 - root - INFO - Step 14520: lr=1.00E-05, loss= 1.1073 (max= 1.7602), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:19:22,783 - root - INFO - Step 14520: lr=1.00E-05, loss= 1.1073 (max= 1.7602), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:19:22,783 - root - INFO - Step 14520: lr=1.00E-05, loss= 1.1073 (max= 1.7602), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:19:22,783 - root - INFO - Step 14520: lr=1.00E-05, loss= 1.1073 (max= 1.7602), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:19:54,677 - root - INFO - Step 14530: lr=1.00E-05, loss= 1.1180 (max= 1.5088), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:19:54,678 - root - INFO - Step 14530: lr=1.00E-05, loss= 1.1180 (max= 1.5088), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:19:54,678 - root - INFO - Step 14530: lr=1.00E-05, loss= 1.1180 (max= 1.5088), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:19:54,678 - root - INFO - Step 14530: lr=1.00E-05, loss= 1.1180 (max= 1.5088), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:19:54,678 - root - INFO - Step 14530: lr=1.00E-05, loss= 1.1180 (max= 1.5088), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:19:54,678 - root - INFO - Step 14530: lr=1.00E-05, loss= 1.1180 (max= 1.5088), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:19:54,678 - root - INFO - Step 
14530: lr=1.00E-05, loss= 1.1180 (max= 1.5088), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:19:54,678 - root - INFO - Step 14530: lr=1.00E-05, loss= 1.1180 (max= 1.5088), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:20:26,518 - root - INFO - Step 14540: lr=1.00E-05, loss= 1.1278 (max= 1.7225), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:20:26,518 - root - INFO - Step 14540: lr=1.00E-05, loss= 1.1278 (max= 1.7225), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:20:26,518 - root - INFO - Step 14540: lr=1.00E-05, loss= 1.1278 (max= 1.7225), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:20:26,518 - root - INFO - Step 14540: lr=1.00E-05, loss= 1.1278 (max= 1.7225), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:20:26,518 - root - INFO - Step 14540: lr=1.00E-05, loss= 1.1278 (max= 1.7225), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:20:26,518 - root - INFO - Step 14540: lr=1.00E-05, loss= 1.1278 (max= 1.7225), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:20:26,518 - root - INFO - Step 14540: lr=1.00E-05, loss= 1.1278 (max= 1.7225), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:20:26,518 - root - INFO - Step 14540: lr=1.00E-05, loss= 1.1278 (max= 1.7225), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:20:58,357 - root - INFO - Step 14550: lr=1.00E-05, loss= 1.1509 (max= 1.6954), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 23:20:58,357 - root - INFO - Step 14550: lr=1.00E-05, loss= 1.1509 (max= 1.6954), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:20:58,357 - root - INFO - Step 14550: lr=1.00E-05, loss= 1.1509 (max= 1.6954), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:20:58,357 - root - INFO - Step 14550: lr=1.00E-05, loss= 1.1509 (max= 1.6954), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:20:58,357 - root - INFO - Step 14550: lr=1.00E-05, loss= 1.1509 (max= 1.6954), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:20:58,357 - root - INFO - Step 14550: lr=1.00E-05, loss= 1.1509 (max= 1.6954), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:20:58,357 - root - INFO - Step 14550: lr=1.00E-05, loss= 1.1509 (max= 1.6954), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:20:58,357 - root - INFO - Step 14550: lr=1.00E-05, loss= 1.1509 (max= 1.6954), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:21:30,201 - root - INFO - Step 14560: lr=1.00E-05, loss= 1.1306 (max= 1.6833), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:21:30,201 - root - INFO - Step 14560: lr=1.00E-05, loss= 1.1306 (max= 1.6833), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:21:30,202 - root - INFO - Step 14560: lr=1.00E-05, loss= 1.1306 (max= 1.6833), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:21:30,202 - root - INFO - Step 14560: lr=1.00E-05, loss= 1.1306 (max= 1.6833), tps=20582, mfu=42.88%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:21:30,202 - root - INFO - Step 14560: lr=1.00E-05, loss= 1.1306 (max= 1.6833), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:21:30,202 - root - INFO - Step 14560: lr=1.00E-05, loss= 1.1306 (max= 1.6833), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:21:30,202 - root - INFO - Step 14560: lr=1.00E-05, loss= 1.1306 (max= 1.6833), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:21:30,202 - root - INFO - Step 14560: lr=1.00E-05, loss= 1.1306 (max= 1.6833), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:22:02,050 - root - INFO - Step 14570: lr=1.00E-05, loss= 1.1317 (max= 1.9948), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:22:02,050 - root - INFO - Step 14570: lr=1.00E-05, loss= 1.1317 (max= 1.9948), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:22:02,050 - root - INFO - Step 14570: lr=1.00E-05, loss= 1.1317 (max= 1.9948), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:22:02,050 - root - INFO - Step 14570: lr=1.00E-05, loss= 1.1317 (max= 1.9948), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:22:02,050 - root - INFO - Step 14570: lr=1.00E-05, loss= 1.1317 (max= 1.9948), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:22:02,050 - root - INFO - Step 14570: lr=1.00E-05, loss= 1.1317 (max= 1.9948), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:22:02,050 - root - INFO - Step 14570: lr=1.00E-05, loss= 1.1317 
(max= 1.9948), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:22:02,050 - root - INFO - Step 14570: lr=1.00E-05, loss= 1.1317 (max= 1.9948), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:22:33,875 - root - INFO - Step 14580: lr=1.00E-05, loss= 1.1145 (max= 1.6711), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:22:33,876 - root - INFO - Step 14580: lr=1.00E-05, loss= 1.1145 (max= 1.6711), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:22:33,876 - root - INFO - Step 14580: lr=1.00E-05, loss= 1.1145 (max= 1.6711), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:22:33,876 - root - INFO - Step 14580: lr=1.00E-05, loss= 1.1145 (max= 1.6711), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:22:33,876 - root - INFO - Step 14580: lr=1.00E-05, loss= 1.1145 (max= 1.6711), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:22:33,876 - root - INFO - Step 14580: lr=1.00E-05, loss= 1.1145 (max= 1.6711), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:22:33,876 - root - INFO - Step 14580: lr=1.00E-05, loss= 1.1145 (max= 1.6711), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:22:33,876 - root - INFO - Step 14580: lr=1.00E-05, loss= 1.1145 (max= 1.6711), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:23:05,843 - root - INFO - Step 14590: lr=1.00E-05, loss= 1.1319 (max= 1.8040), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:23:05,843 - root 
- INFO - Step 14590: lr=1.00E-05, loss= 1.1319 (max= 1.8040), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:23:05,843 - root - INFO - Step 14590: lr=1.00E-05, loss= 1.1319 (max= 1.8040), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:23:05,843 - root - INFO - Step 14590: lr=1.00E-05, loss= 1.1319 (max= 1.8040), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:23:05,843 - root - INFO - Step 14590: lr=1.00E-05, loss= 1.1319 (max= 1.8040), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:23:05,843 - root - INFO - Step 14590: lr=1.00E-05, loss= 1.1319 (max= 1.8040), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:23:05,844 - root - INFO - Step 14590: lr=1.00E-05, loss= 1.1319 (max= 1.8040), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:23:05,844 - root - INFO - Step 14590: lr=1.00E-05, loss= 1.1319 (max= 1.8040), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:23:37,712 - root - INFO - Step 14600: lr=1.00E-05, loss= 1.0992 (max= 1.8266), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:23:37,712 - root - INFO - Step 14600: lr=1.00E-05, loss= 1.0992 (max= 1.8266), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:23:37,712 - root - INFO - Step 14600: lr=1.00E-05, loss= 1.0992 (max= 1.8266), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:23:37,712 - root - INFO - Step 14600: lr=1.00E-05, loss= 1.0992 (max= 1.8266), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 23:23:37,712 - root - INFO - Step 14600: lr=1.00E-05, loss= 1.0992 (max= 1.8266), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:23:37,712 - root - INFO - Step 14600: lr=1.00E-05, loss= 1.0992 (max= 1.8266), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:23:37,712 - root - INFO - Step 14600: lr=1.00E-05, loss= 1.0992 (max= 1.8266), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:23:37,712 - root - INFO - Step 14600: lr=1.00E-05, loss= 1.0992 (max= 1.8266), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:24:09,569 - root - INFO - Step 14610: lr=1.00E-05, loss= 1.0981 (max= 1.6116), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:24:09,570 - root - INFO - Step 14610: lr=1.00E-05, loss= 1.0981 (max= 1.6116), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:24:09,570 - root - INFO - Step 14610: lr=1.00E-05, loss= 1.0981 (max= 1.6116), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:24:09,570 - root - INFO - Step 14610: lr=1.00E-05, loss= 1.0981 (max= 1.6116), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:24:09,570 - root - INFO - Step 14610: lr=1.00E-05, loss= 1.0981 (max= 1.6116), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:24:09,570 - root - INFO - Step 14610: lr=1.00E-05, loss= 1.0981 (max= 1.6116), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:24:09,570 - root - INFO - Step 14610: lr=1.00E-05, loss= 1.0981 (max= 1.6116), tps=20574, mfu=42.87%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:24:09,570 - root - INFO - Step 14610: lr=1.00E-05, loss= 1.0981 (max= 1.6116), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:24:41,492 - root - INFO - Step 14620: lr=1.00E-05, loss= 1.1185 (max= 1.6567), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:24:41,492 - root - INFO - Step 14620: lr=1.00E-05, loss= 1.1185 (max= 1.6567), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:24:41,492 - root - INFO - Step 14620: lr=1.00E-05, loss= 1.1185 (max= 1.6567), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:24:41,492 - root - INFO - Step 14620: lr=1.00E-05, loss= 1.1185 (max= 1.6567), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:24:41,492 - root - INFO - Step 14620: lr=1.00E-05, loss= 1.1185 (max= 1.6567), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:24:41,492 - root - INFO - Step 14620: lr=1.00E-05, loss= 1.1185 (max= 1.6567), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:24:41,492 - root - INFO - Step 14620: lr=1.00E-05, loss= 1.1185 (max= 1.6567), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:24:41,493 - root - INFO - Step 14620: lr=1.00E-05, loss= 1.1185 (max= 1.6567), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:25:13,354 - root - INFO - Step 14630: lr=1.00E-05, loss= 1.1151 (max= 1.5929), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:25:13,354 - root - INFO - Step 14630: lr=1.00E-05, 
loss= 1.1151 (max= 1.5929), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:25:13,354 - root - INFO - Step 14630: lr=1.00E-05, loss= 1.1151 (max= 1.5929), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:25:13,354 - root - INFO - Step 14630: lr=1.00E-05, loss= 1.1151 (max= 1.5929), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:25:13,354 - root - INFO - Step 14630: lr=1.00E-05, loss= 1.1151 (max= 1.5929), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:25:13,354 - root - INFO - Step 14630: lr=1.00E-05, loss= 1.1151 (max= 1.5929), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:25:13,354 - root - INFO - Step 14630: lr=1.00E-05, loss= 1.1151 (max= 1.5929), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:25:13,354 - root - INFO - Step 14630: lr=1.00E-05, loss= 1.1151 (max= 1.5929), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:25:45,162 - root - INFO - Step 14640: lr=1.00E-05, loss= 1.1550 (max= 1.5718), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:25:45,163 - root - INFO - Step 14640: lr=1.00E-05, loss= 1.1550 (max= 1.5718), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:25:45,163 - root - INFO - Step 14640: lr=1.00E-05, loss= 1.1550 (max= 1.5718), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:25:45,163 - root - INFO - Step 14640: lr=1.00E-05, loss= 1.1550 (max= 1.5718), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
23:25:45,163 - root - INFO - Step 14640: lr=1.00E-05, loss= 1.1550 (max= 1.5718), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:25:45,163 - root - INFO - Step 14640: lr=1.00E-05, loss= 1.1550 (max= 1.5718), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:25:45,163 - root - INFO - Step 14640: lr=1.00E-05, loss= 1.1550 (max= 1.5718), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:25:45,163 - root - INFO - Step 14640: lr=1.00E-05, loss= 1.1550 (max= 1.5718), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:26:17,054 - root - INFO - Step 14650: lr=1.00E-05, loss= 1.1263 (max= 1.7007), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:26:17,054 - root - INFO - Step 14650: lr=1.00E-05, loss= 1.1263 (max= 1.7007), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:26:17,054 - root - INFO - Step 14650: lr=1.00E-05, loss= 1.1263 (max= 1.7007), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:26:17,054 - root - INFO - Step 14650: lr=1.00E-05, loss= 1.1263 (max= 1.7007), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:26:17,054 - root - INFO - Step 14650: lr=1.00E-05, loss= 1.1263 (max= 1.7007), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:26:17,054 - root - INFO - Step 14650: lr=1.00E-05, loss= 1.1263 (max= 1.7007), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:26:17,054 - root - INFO - Step 14650: lr=1.00E-05, loss= 1.1263 (max= 1.7007), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:26:17,054 - root - INFO - Step 14650: lr=1.00E-05, loss= 1.1263 (max= 1.7007), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:26:48,963 - root - INFO - Step 14660: lr=1.00E-05, loss= 1.1282 (max= 1.7009), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:26:48,963 - root - INFO - Step 14660: lr=1.00E-05, loss= 1.1282 (max= 1.7009), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:26:48,963 - root - INFO - Step 14660: lr=1.00E-05, loss= 1.1282 (max= 1.7009), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:26:48,963 - root - INFO - Step 14660: lr=1.00E-05, loss= 1.1282 (max= 1.7009), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:26:48,963 - root - INFO - Step 14660: lr=1.00E-05, loss= 1.1282 (max= 1.7009), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:26:48,963 - root - INFO - Step 14660: lr=1.00E-05, loss= 1.1282 (max= 1.7009), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:26:48,963 - root - INFO - Step 14660: lr=1.00E-05, loss= 1.1282 (max= 1.7009), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:26:48,964 - root - INFO - Step 14660: lr=1.00E-05, loss= 1.1282 (max= 1.7009), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:27:20,857 - root - INFO - Step 14670: lr=1.00E-05, loss= 1.1251 (max= 1.5369), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:27:20,857 - root - INFO - Step 14670: lr=1.00E-05, loss= 1.1251 (max= 1.5369), 
tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:27:20,857 - root - INFO - Step 14670: lr=1.00E-05, loss= 1.1251 (max= 1.5369), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:27:20,857 - root - INFO - Step 14670: lr=1.00E-05, loss= 1.1251 (max= 1.5369), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:27:20,857 - root - INFO - Step 14670: lr=1.00E-05, loss= 1.1251 (max= 1.5369), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:27:20,857 - root - INFO - Step 14670: lr=1.00E-05, loss= 1.1251 (max= 1.5369), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:27:20,857 - root - INFO - Step 14670: lr=1.00E-05, loss= 1.1251 (max= 1.5369), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:27:20,858 - root - INFO - Step 14670: lr=1.00E-05, loss= 1.1251 (max= 1.5369), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:27:52,709 - root - INFO - Step 14680: lr=1.00E-05, loss= 1.1129 (max= 1.7728), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:27:52,709 - root - INFO - Step 14680: lr=1.00E-05, loss= 1.1129 (max= 1.7728), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:27:52,709 - root - INFO - Step 14680: lr=1.00E-05, loss= 1.1129 (max= 1.7728), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:27:52,709 - root - INFO - Step 14680: lr=1.00E-05, loss= 1.1129 (max= 1.7728), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:27:52,709 - root - INFO - Step 
14680: lr=1.00E-05, loss= 1.1129 (max= 1.7728), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:27:52,709 - root - INFO - Step 14680: lr=1.00E-05, loss= 1.1129 (max= 1.7728), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:27:52,709 - root - INFO - Step 14680: lr=1.00E-05, loss= 1.1129 (max= 1.7728), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:27:52,709 - root - INFO - Step 14680: lr=1.00E-05, loss= 1.1129 (max= 1.7728), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:28:24,497 - root - INFO - Step 14690: lr=1.00E-05, loss= 1.1204 (max= 1.5869), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:28:24,497 - root - INFO - Step 14690: lr=1.00E-05, loss= 1.1204 (max= 1.5869), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:28:24,497 - root - INFO - Step 14690: lr=1.00E-05, loss= 1.1204 (max= 1.5869), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:28:24,497 - root - INFO - Step 14690: lr=1.00E-05, loss= 1.1204 (max= 1.5869), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:28:24,497 - root - INFO - Step 14690: lr=1.00E-05, loss= 1.1204 (max= 1.5869), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:28:24,497 - root - INFO - Step 14690: lr=1.00E-05, loss= 1.1204 (max= 1.5869), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:28:24,497 - root - INFO - Step 14690: lr=1.00E-05, loss= 1.1204 (max= 1.5869), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 23:28:24,497 - root - INFO - Step 14690: lr=1.00E-05, loss= 1.1204 (max= 1.5869), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:28:56,360 - root - INFO - Step 14700: lr=1.00E-05, loss= 1.1007 (max= 1.8471), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:28:56,360 - root - INFO - Step 14700: lr=1.00E-05, loss= 1.1007 (max= 1.8471), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:28:56,361 - root - INFO - Step 14700: lr=1.00E-05, loss= 1.1007 (max= 1.8471), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:28:56,361 - root - INFO - Step 14700: lr=1.00E-05, loss= 1.1007 (max= 1.8471), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:28:56,361 - root - INFO - Step 14700: lr=1.00E-05, loss= 1.1007 (max= 1.8471), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:28:56,361 - root - INFO - Step 14700: lr=1.00E-05, loss= 1.1007 (max= 1.8471), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:28:56,361 - root - INFO - Step 14700: lr=1.00E-05, loss= 1.1007 (max= 1.8471), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:28:56,361 - root - INFO - Step 14700: lr=1.00E-05, loss= 1.1007 (max= 1.8471), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:29:28,165 - root - INFO - Step 14710: lr=1.00E-05, loss= 1.1079 (max= 1.8854), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:29:28,165 - root - INFO - Step 14710: lr=1.00E-05, loss= 1.1079 (max= 1.8854), tps=20608, mfu=42.94%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:29:28,165 - root - INFO - Step 14710: lr=1.00E-05, loss= 1.1079 (max= 1.8854), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:29:28,165 - root - INFO - Step 14710: lr=1.00E-05, loss= 1.1079 (max= 1.8854), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:29:28,165 - root - INFO - Step 14710: lr=1.00E-05, loss= 1.1079 (max= 1.8854), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:29:28,165 - root - INFO - Step 14710: lr=1.00E-05, loss= 1.1079 (max= 1.8854), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:29:28,165 - root - INFO - Step 14710: lr=1.00E-05, loss= 1.1079 (max= 1.8854), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:29:28,165 - root - INFO - Step 14710: lr=1.00E-05, loss= 1.1079 (max= 1.8854), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:29:59,952 - root - INFO - Step 14720: lr=1.00E-05, loss= 1.1119 (max= 1.8094), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:29:59,952 - root - INFO - Step 14720: lr=1.00E-05, loss= 1.1119 (max= 1.8094), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:29:59,952 - root - INFO - Step 14720: lr=1.00E-05, loss= 1.1119 (max= 1.8094), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:29:59,952 - root - INFO - Step 14720: lr=1.00E-05, loss= 1.1119 (max= 1.8094), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:29:59,952 - root - INFO - Step 14720: lr=1.00E-05, loss= 1.1119 
(max= 1.8094), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:29:59,952 - root - INFO - Step 14720: lr=1.00E-05, loss= 1.1119 (max= 1.8094), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:29:59,952 - root - INFO - Step 14720: lr=1.00E-05, loss= 1.1119 (max= 1.8094), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:29:59,952 - root - INFO - Step 14720: lr=1.00E-05, loss= 1.1119 (max= 1.8094), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:30:31,725 - root - INFO - Step 14730: lr=1.00E-05, loss= 1.1381 (max= 1.6442), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:30:31,725 - root - INFO - Step 14730: lr=1.00E-05, loss= 1.1381 (max= 1.6442), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:30:31,726 - root - INFO - Step 14730: lr=1.00E-05, loss= 1.1381 (max= 1.6442), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:30:31,726 - root - INFO - Step 14730: lr=1.00E-05, loss= 1.1381 (max= 1.6442), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:30:31,726 - root - INFO - Step 14730: lr=1.00E-05, loss= 1.1381 (max= 1.6442), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:30:31,726 - root - INFO - Step 14730: lr=1.00E-05, loss= 1.1381 (max= 1.6442), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:30:31,726 - root - INFO - Step 14730: lr=1.00E-05, loss= 1.1381 (max= 1.6442), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:30:31,726 - root 
- INFO - Step 14730: lr=1.00E-05, loss= 1.1381 (max= 1.6442), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:31:03,542 - root - INFO - Step 14740: lr=1.00E-05, loss= 1.1457 (max= 1.7354), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:31:03,542 - root - INFO - Step 14740: lr=1.00E-05, loss= 1.1457 (max= 1.7354), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:31:03,542 - root - INFO - Step 14740: lr=1.00E-05, loss= 1.1457 (max= 1.7354), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:31:03,542 - root - INFO - Step 14740: lr=1.00E-05, loss= 1.1457 (max= 1.7354), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:31:03,542 - root - INFO - Step 14740: lr=1.00E-05, loss= 1.1457 (max= 1.7354), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:31:03,542 - root - INFO - Step 14740: lr=1.00E-05, loss= 1.1457 (max= 1.7354), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:31:03,542 - root - INFO - Step 14740: lr=1.00E-05, loss= 1.1457 (max= 1.7354), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:31:03,542 - root - INFO - Step 14740: lr=1.00E-05, loss= 1.1457 (max= 1.7354), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:31:35,357 - root - INFO - Step 14750: lr=1.00E-05, loss= 1.1330 (max= 1.8178), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:31:35,357 - root - INFO - Step 14750: lr=1.00E-05, loss= 1.1330 (max= 1.8178), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 23:31:35,357 - root - INFO - Step 14750: lr=1.00E-05, loss= 1.1330 (max= 1.8178), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:31:35,357 - root - INFO - Step 14750: lr=1.00E-05, loss= 1.1330 (max= 1.8178), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:31:35,357 - root - INFO - Step 14750: lr=1.00E-05, loss= 1.1330 (max= 1.8178), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:31:35,357 - root - INFO - Step 14750: lr=1.00E-05, loss= 1.1330 (max= 1.8178), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:31:35,357 - root - INFO - Step 14750: lr=1.00E-05, loss= 1.1330 (max= 1.8178), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:31:35,358 - root - INFO - Step 14750: lr=1.00E-05, loss= 1.1330 (max= 1.8178), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:32:07,281 - root - INFO - Step 14760: lr=1.00E-05, loss= 1.1269 (max= 1.6454), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:32:07,281 - root - INFO - Step 14760: lr=1.00E-05, loss= 1.1269 (max= 1.6454), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:32:07,281 - root - INFO - Step 14760: lr=1.00E-05, loss= 1.1269 (max= 1.6454), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:32:07,281 - root - INFO - Step 14760: lr=1.00E-05, loss= 1.1269 (max= 1.6454), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:32:07,281 - root - INFO - Step 14760: lr=1.00E-05, loss= 1.1269 (max= 1.6454), tps=20531, mfu=42.78%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:32:07,281 - root - INFO - Step 14760: lr=1.00E-05, loss= 1.1269 (max= 1.6454), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:32:07,281 - root - INFO - Step 14760: lr=1.00E-05, loss= 1.1269 (max= 1.6454), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:32:07,282 - root - INFO - Step 14760: lr=1.00E-05, loss= 1.1269 (max= 1.6454), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:32:39,104 - root - INFO - Step 14770: lr=1.00E-05, loss= 1.1274 (max= 1.8443), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:32:39,104 - root - INFO - Step 14770: lr=1.00E-05, loss= 1.1274 (max= 1.8443), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:32:39,104 - root - INFO - Step 14770: lr=1.00E-05, loss= 1.1274 (max= 1.8443), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:32:39,104 - root - INFO - Step 14770: lr=1.00E-05, loss= 1.1274 (max= 1.8443), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:32:39,104 - root - INFO - Step 14770: lr=1.00E-05, loss= 1.1274 (max= 1.8443), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:32:39,104 - root - INFO - Step 14770: lr=1.00E-05, loss= 1.1274 (max= 1.8443), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:32:39,104 - root - INFO - Step 14770: lr=1.00E-05, loss= 1.1274 (max= 1.8443), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:32:39,104 - root - INFO - Step 14770: lr=1.00E-05, 
loss= 1.1274 (max= 1.8443), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:33:10,918 - root - INFO - Step 14780: lr=1.00E-05, loss= 1.1261 (max= 1.6525), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:33:10,918 - root - INFO - Step 14780: lr=1.00E-05, loss= 1.1261 (max= 1.6525), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:33:10,918 - root - INFO - Step 14780: lr=1.00E-05, loss= 1.1261 (max= 1.6525), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:33:10,918 - root - INFO - Step 14780: lr=1.00E-05, loss= 1.1261 (max= 1.6525), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:33:10,918 - root - INFO - Step 14780: lr=1.00E-05, loss= 1.1261 (max= 1.6525), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:33:10,918 - root - INFO - Step 14780: lr=1.00E-05, loss= 1.1261 (max= 1.6525), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:33:10,918 - root - INFO - Step 14780: lr=1.00E-05, loss= 1.1261 (max= 1.6525), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:33:10,918 - root - INFO - Step 14780: lr=1.00E-05, loss= 1.1261 (max= 1.6525), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:33:42,828 - root - INFO - Step 14790: lr=1.00E-05, loss= 1.1464 (max= 1.5763), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:33:42,828 - root - INFO - Step 14790: lr=1.00E-05, loss= 1.1464 (max= 1.5763), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
23:33:42,828 - root - INFO - Step 14790: lr=1.00E-05, loss= 1.1464 (max= 1.5763), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:33:42,828 - root - INFO - Step 14790: lr=1.00E-05, loss= 1.1464 (max= 1.5763), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:33:42,828 - root - INFO - Step 14790: lr=1.00E-05, loss= 1.1464 (max= 1.5763), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:33:42,828 - root - INFO - Step 14790: lr=1.00E-05, loss= 1.1464 (max= 1.5763), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:33:42,828 - root - INFO - Step 14790: lr=1.00E-05, loss= 1.1464 (max= 1.5763), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:33:42,828 - root - INFO - Step 14790: lr=1.00E-05, loss= 1.1464 (max= 1.5763), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:34:14,714 - root - INFO - Step 14800: lr=1.00E-05, loss= 1.1214 (max= 1.7495), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:34:14,714 - root - INFO - Step 14800: lr=1.00E-05, loss= 1.1214 (max= 1.7495), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:34:14,714 - root - INFO - Step 14800: lr=1.00E-05, loss= 1.1214 (max= 1.7495), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:34:14,714 - root - INFO - Step 14800: lr=1.00E-05, loss= 1.1214 (max= 1.7495), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:34:14,714 - root - INFO - Step 14800: lr=1.00E-05, loss= 1.1214 (max= 1.7495), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:34:14,714 - root - INFO - Step 14800: lr=1.00E-05, loss= 1.1214 (max= 1.7495), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:34:14,714 - root - INFO - Step 14800: lr=1.00E-05, loss= 1.1214 (max= 1.7495), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:34:14,714 - root - INFO - Step 14800: lr=1.00E-05, loss= 1.1214 (max= 1.7495), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:34:46,670 - root - INFO - Step 14810: lr=1.00E-05, loss= 1.1046 (max= 1.6748), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:34:46,670 - root - INFO - Step 14810: lr=1.00E-05, loss= 1.1046 (max= 1.6748), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:34:46,670 - root - INFO - Step 14810: lr=1.00E-05, loss= 1.1046 (max= 1.6748), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:34:46,670 - root - INFO - Step 14810: lr=1.00E-05, loss= 1.1046 (max= 1.6748), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:34:46,670 - root - INFO - Step 14810: lr=1.00E-05, loss= 1.1046 (max= 1.6748), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:34:46,670 - root - INFO - Step 14810: lr=1.00E-05, loss= 1.1046 (max= 1.6748), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:34:46,670 - root - INFO - Step 14810: lr=1.00E-05, loss= 1.1046 (max= 1.6748), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:34:46,670 - root - INFO - Step 14810: lr=1.00E-05, loss= 1.1046 (max= 1.6748), 
tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:35:18,541 - root - INFO - Step 14820: lr=1.00E-05, loss= 1.1227 (max= 1.5696), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:35:18,541 - root - INFO - Step 14820: lr=1.00E-05, loss= 1.1227 (max= 1.5696), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:35:18,541 - root - INFO - Step 14820: lr=1.00E-05, loss= 1.1227 (max= 1.5696), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:35:18,541 - root - INFO - Step 14820: lr=1.00E-05, loss= 1.1227 (max= 1.5696), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:35:18,541 - root - INFO - Step 14820: lr=1.00E-05, loss= 1.1227 (max= 1.5696), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:35:18,541 - root - INFO - Step 14820: lr=1.00E-05, loss= 1.1227 (max= 1.5696), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:35:18,541 - root - INFO - Step 14820: lr=1.00E-05, loss= 1.1227 (max= 1.5696), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:35:18,541 - root - INFO - Step 14820: lr=1.00E-05, loss= 1.1227 (max= 1.5696), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:35:50,417 - root - INFO - Step 14830: lr=1.00E-05, loss= 1.1209 (max= 1.8951), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:35:50,418 - root - INFO - Step 14830: lr=1.00E-05, loss= 1.1209 (max= 1.8951), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:35:50,418 - root - INFO - Step 
14830: lr=1.00E-05, loss= 1.1209 (max= 1.8951), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:35:50,418 - root - INFO - Step 14830: lr=1.00E-05, loss= 1.1209 (max= 1.8951), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:35:50,418 - root - INFO - Step 14830: lr=1.00E-05, loss= 1.1209 (max= 1.8951), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:35:50,418 - root - INFO - Step 14830: lr=1.00E-05, loss= 1.1209 (max= 1.8951), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:35:50,418 - root - INFO - Step 14830: lr=1.00E-05, loss= 1.1209 (max= 1.8951), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:35:50,418 - root - INFO - Step 14830: lr=1.00E-05, loss= 1.1209 (max= 1.8951), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:36:22,211 - root - INFO - Step 14840: lr=1.00E-05, loss= 1.1418 (max= 1.5834), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:36:22,211 - root - INFO - Step 14840: lr=1.00E-05, loss= 1.1418 (max= 1.5834), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:36:22,212 - root - INFO - Step 14840: lr=1.00E-05, loss= 1.1418 (max= 1.5834), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:36:22,212 - root - INFO - Step 14840: lr=1.00E-05, loss= 1.1418 (max= 1.5834), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:36:22,212 - root - INFO - Step 14840: lr=1.00E-05, loss= 1.1418 (max= 1.5834), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 23:36:22,212 - root - INFO - Step 14840: lr=1.00E-05, loss= 1.1418 (max= 1.5834), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:36:22,212 - root - INFO - Step 14840: lr=1.00E-05, loss= 1.1418 (max= 1.5834), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:36:22,212 - root - INFO - Step 14840: lr=1.00E-05, loss= 1.1418 (max= 1.5834), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:36:54,058 - root - INFO - Step 14850: lr=1.00E-05, loss= 1.1206 (max= 1.5900), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:36:54,058 - root - INFO - Step 14850: lr=1.00E-05, loss= 1.1206 (max= 1.5900), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:36:54,058 - root - INFO - Step 14850: lr=1.00E-05, loss= 1.1206 (max= 1.5900), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:36:54,058 - root - INFO - Step 14850: lr=1.00E-05, loss= 1.1206 (max= 1.5900), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:36:54,058 - root - INFO - Step 14850: lr=1.00E-05, loss= 1.1206 (max= 1.5900), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:36:54,058 - root - INFO - Step 14850: lr=1.00E-05, loss= 1.1206 (max= 1.5900), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:36:54,058 - root - INFO - Step 14850: lr=1.00E-05, loss= 1.1206 (max= 1.5900), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:36:54,058 - root - INFO - Step 14850: lr=1.00E-05, loss= 1.1206 (max= 1.5900), tps=20581, mfu=42.88%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:37:25,867 - root - INFO - Step 14860: lr=1.00E-05, loss= 1.1429 (max= 2.0388), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:37:25,867 - root - INFO - Step 14860: lr=1.00E-05, loss= 1.1429 (max= 2.0388), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:37:25,867 - root - INFO - Step 14860: lr=1.00E-05, loss= 1.1429 (max= 2.0388), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:37:25,867 - root - INFO - Step 14860: lr=1.00E-05, loss= 1.1429 (max= 2.0388), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:37:25,867 - root - INFO - Step 14860: lr=1.00E-05, loss= 1.1429 (max= 2.0388), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:37:25,867 - root - INFO - Step 14860: lr=1.00E-05, loss= 1.1429 (max= 2.0388), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:37:25,867 - root - INFO - Step 14860: lr=1.00E-05, loss= 1.1429 (max= 2.0388), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:37:25,868 - root - INFO - Step 14860: lr=1.00E-05, loss= 1.1429 (max= 2.0388), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:37:57,777 - root - INFO - Step 14870: lr=1.00E-05, loss= 1.1270 (max= 1.6784), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:37:57,777 - root - INFO - Step 14870: lr=1.00E-05, loss= 1.1270 (max= 1.6784), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:37:57,777 - root - INFO - Step 14870: lr=1.00E-05, loss= 1.1270 
(max= 1.6784), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:37:57,777 - root - INFO - Step 14870: lr=1.00E-05, loss= 1.1270 (max= 1.6784), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:37:57,777 - root - INFO - Step 14870: lr=1.00E-05, loss= 1.1270 (max= 1.6784), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:37:57,777 - root - INFO - Step 14870: lr=1.00E-05, loss= 1.1270 (max= 1.6784), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:37:57,777 - root - INFO - Step 14870: lr=1.00E-05, loss= 1.1270 (max= 1.6784), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:37:57,777 - root - INFO - Step 14870: lr=1.00E-05, loss= 1.1270 (max= 1.6784), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:38:29,615 - root - INFO - Step 14880: lr=1.00E-05, loss= 1.1452 (max= 1.6311), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:38:29,615 - root - INFO - Step 14880: lr=1.00E-05, loss= 1.1452 (max= 1.6311), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:38:29,615 - root - INFO - Step 14880: lr=1.00E-05, loss= 1.1452 (max= 1.6311), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:38:29,615 - root - INFO - Step 14880: lr=1.00E-05, loss= 1.1452 (max= 1.6311), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:38:29,615 - root - INFO - Step 14880: lr=1.00E-05, loss= 1.1452 (max= 1.6311), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:38:29,615 - root 
- INFO - Step 14880: lr=1.00E-05, loss= 1.1452 (max= 1.6311), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:38:29,615 - root - INFO - Step 14880: lr=1.00E-05, loss= 1.1452 (max= 1.6311), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:38:29,615 - root - INFO - Step 14880: lr=1.00E-05, loss= 1.1452 (max= 1.6311), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:39:01,528 - root - INFO - Step 14890: lr=1.00E-05, loss= 1.1163 (max= 1.8537), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:39:01,528 - root - INFO - Step 14890: lr=1.00E-05, loss= 1.1163 (max= 1.8537), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:39:01,528 - root - INFO - Step 14890: lr=1.00E-05, loss= 1.1163 (max= 1.8537), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:39:01,528 - root - INFO - Step 14890: lr=1.00E-05, loss= 1.1163 (max= 1.8537), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:39:01,528 - root - INFO - Step 14890: lr=1.00E-05, loss= 1.1163 (max= 1.8537), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:39:01,528 - root - INFO - Step 14890: lr=1.00E-05, loss= 1.1163 (max= 1.8537), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:39:01,528 - root - INFO - Step 14890: lr=1.00E-05, loss= 1.1163 (max= 1.8537), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:39:01,528 - root - INFO - Step 14890: lr=1.00E-05, loss= 1.1163 (max= 1.8537), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 23:39:33,480 - root - INFO - Step 14900: lr=1.00E-05, loss= 1.1296 (max= 1.5629), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:39:33,480 - root - INFO - Step 14900: lr=1.00E-05, loss= 1.1296 (max= 1.5629), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:39:33,480 - root - INFO - Step 14900: lr=1.00E-05, loss= 1.1296 (max= 1.5629), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:39:33,480 - root - INFO - Step 14900: lr=1.00E-05, loss= 1.1296 (max= 1.5629), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:39:33,480 - root - INFO - Step 14900: lr=1.00E-05, loss= 1.1296 (max= 1.5629), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:39:33,480 - root - INFO - Step 14900: lr=1.00E-05, loss= 1.1296 (max= 1.5629), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:39:33,480 - root - INFO - Step 14900: lr=1.00E-05, loss= 1.1296 (max= 1.5629), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:39:33,480 - root - INFO - Step 14900: lr=1.00E-05, loss= 1.1296 (max= 1.5629), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:40:05,308 - root - INFO - Step 14910: lr=1.00E-05, loss= 1.1364 (max= 1.6786), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:40:05,308 - root - INFO - Step 14910: lr=1.00E-05, loss= 1.1364 (max= 1.6786), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:40:05,308 - root - INFO - Step 14910: lr=1.00E-05, loss= 1.1364 (max= 1.6786), tps=20593, mfu=42.91%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:40:05,308 - root - INFO - Step 14910: lr=1.00E-05, loss= 1.1364 (max= 1.6786), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:40:05,308 - root - INFO - Step 14910: lr=1.00E-05, loss= 1.1364 (max= 1.6786), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:40:05,308 - root - INFO - Step 14910: lr=1.00E-05, loss= 1.1364 (max= 1.6786), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:40:05,309 - root - INFO - Step 14910: lr=1.00E-05, loss= 1.1364 (max= 1.6786), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:40:05,309 - root - INFO - Step 14910: lr=1.00E-05, loss= 1.1364 (max= 1.6786), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:40:37,188 - root - INFO - Step 14920: lr=1.00E-05, loss= 1.1229 (max= 1.4564), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:40:37,188 - root - INFO - Step 14920: lr=1.00E-05, loss= 1.1229 (max= 1.4564), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:40:37,188 - root - INFO - Step 14920: lr=1.00E-05, loss= 1.1229 (max= 1.4564), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:40:37,188 - root - INFO - Step 14920: lr=1.00E-05, loss= 1.1229 (max= 1.4564), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:40:37,188 - root - INFO - Step 14920: lr=1.00E-05, loss= 1.1229 (max= 1.4564), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:40:37,188 - root - INFO - Step 14920: lr=1.00E-05, 
loss= 1.1229 (max= 1.4564), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:40:37,188 - root - INFO - Step 14920: lr=1.00E-05, loss= 1.1229 (max= 1.4564), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:40:37,188 - root - INFO - Step 14920: lr=1.00E-05, loss= 1.1229 (max= 1.4564), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:41:08,994 - root - INFO - Step 14930: lr=1.00E-05, loss= 1.1198 (max= 1.5746), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:41:08,994 - root - INFO - Step 14930: lr=1.00E-05, loss= 1.1198 (max= 1.5746), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:41:08,994 - root - INFO - Step 14930: lr=1.00E-05, loss= 1.1198 (max= 1.5746), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:41:08,994 - root - INFO - Step 14930: lr=1.00E-05, loss= 1.1198 (max= 1.5746), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:41:08,994 - root - INFO - Step 14930: lr=1.00E-05, loss= 1.1198 (max= 1.5746), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:41:08,994 - root - INFO - Step 14930: lr=1.00E-05, loss= 1.1198 (max= 1.5746), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:41:08,995 - root - INFO - Step 14930: lr=1.00E-05, loss= 1.1198 (max= 1.5746), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:41:08,995 - root - INFO - Step 14930: lr=1.00E-05, loss= 1.1198 (max= 1.5746), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
23:41:40,882 - root - INFO - Step 14940: lr=1.00E-05, loss= 1.1487 (max= 1.5973), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:41:40,882 - root - INFO - Step 14940: lr=1.00E-05, loss= 1.1487 (max= 1.5973), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:41:40,882 - root - INFO - Step 14940: lr=1.00E-05, loss= 1.1487 (max= 1.5973), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:41:40,882 - root - INFO - Step 14940: lr=1.00E-05, loss= 1.1487 (max= 1.5973), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:41:40,882 - root - INFO - Step 14940: lr=1.00E-05, loss= 1.1487 (max= 1.5973), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:41:40,882 - root - INFO - Step 14940: lr=1.00E-05, loss= 1.1487 (max= 1.5973), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:41:40,882 - root - INFO - Step 14940: lr=1.00E-05, loss= 1.1487 (max= 1.5973), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:41:40,883 - root - INFO - Step 14940: lr=1.00E-05, loss= 1.1487 (max= 1.5973), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:42:12,753 - root - INFO - Step 14950: lr=1.00E-05, loss= 1.1276 (max= 1.7579), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:42:12,753 - root - INFO - Step 14950: lr=1.00E-05, loss= 1.1276 (max= 1.7579), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:42:12,754 - root - INFO - Step 14950: lr=1.00E-05, loss= 1.1276 (max= 1.7579), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:42:12,754 - root - INFO - Step 14950: lr=1.00E-05, loss= 1.1276 (max= 1.7579), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:42:12,754 - root - INFO - Step 14950: lr=1.00E-05, loss= 1.1276 (max= 1.7579), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:42:12,754 - root - INFO - Step 14950: lr=1.00E-05, loss= 1.1276 (max= 1.7579), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:42:12,754 - root - INFO - Step 14950: lr=1.00E-05, loss= 1.1276 (max= 1.7579), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:42:12,754 - root - INFO - Step 14950: lr=1.00E-05, loss= 1.1276 (max= 1.7579), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:42:44,624 - root - INFO - Step 14960: lr=1.00E-05, loss= 1.1233 (max= 1.5960), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:42:44,624 - root - INFO - Step 14960: lr=1.00E-05, loss= 1.1233 (max= 1.5960), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:42:44,624 - root - INFO - Step 14960: lr=1.00E-05, loss= 1.1233 (max= 1.5960), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:42:44,624 - root - INFO - Step 14960: lr=1.00E-05, loss= 1.1233 (max= 1.5960), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:42:44,624 - root - INFO - Step 14960: lr=1.00E-05, loss= 1.1233 (max= 1.5960), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:42:44,624 - root - INFO - Step 14960: lr=1.00E-05, loss= 1.1233 (max= 1.5960), 
tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:42:44,624 - root - INFO - Step 14960: lr=1.00E-05, loss= 1.1233 (max= 1.5960), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:42:44,624 - root - INFO - Step 14960: lr=1.00E-05, loss= 1.1233 (max= 1.5960), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:43:16,446 - root - INFO - Step 14970: lr=1.00E-05, loss= 1.1370 (max= 1.6383), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:43:16,446 - root - INFO - Step 14970: lr=1.00E-05, loss= 1.1370 (max= 1.6383), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:43:16,446 - root - INFO - Step 14970: lr=1.00E-05, loss= 1.1370 (max= 1.6383), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:43:16,446 - root - INFO - Step 14970: lr=1.00E-05, loss= 1.1370 (max= 1.6383), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:43:16,446 - root - INFO - Step 14970: lr=1.00E-05, loss= 1.1370 (max= 1.6383), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:43:16,446 - root - INFO - Step 14970: lr=1.00E-05, loss= 1.1370 (max= 1.6383), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:43:16,446 - root - INFO - Step 14970: lr=1.00E-05, loss= 1.1370 (max= 1.6383), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:43:16,447 - root - INFO - Step 14970: lr=1.00E-05, loss= 1.1370 (max= 1.6383), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:43:48,272 - root - INFO - Step 
14980: lr=1.00E-05, loss= 1.0984 (max= 1.6144), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:43:48,272 - root - INFO - Step 14980: lr=1.00E-05, loss= 1.0984 (max= 1.6144), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:43:48,272 - root - INFO - Step 14980: lr=1.00E-05, loss= 1.0984 (max= 1.6144), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:43:48,272 - root - INFO - Step 14980: lr=1.00E-05, loss= 1.0984 (max= 1.6144), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:43:48,272 - root - INFO - Step 14980: lr=1.00E-05, loss= 1.0984 (max= 1.6144), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:43:48,273 - root - INFO - Step 14980: lr=1.00E-05, loss= 1.0984 (max= 1.6144), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:43:48,273 - root - INFO - Step 14980: lr=1.00E-05, loss= 1.0984 (max= 1.6144), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:43:48,273 - root - INFO - Step 14980: lr=1.00E-05, loss= 1.0984 (max= 1.6144), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:44:20,211 - root - INFO - Step 14990: lr=1.00E-05, loss= 1.1245 (max= 1.5190), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:44:20,211 - root - INFO - Step 14990: lr=1.00E-05, loss= 1.1245 (max= 1.5190), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:44:20,211 - root - INFO - Step 14990: lr=1.00E-05, loss= 1.1245 (max= 1.5190), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 23:44:20,211 - root - INFO - Step 14990: lr=1.00E-05, loss= 1.1245 (max= 1.5190), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:44:20,211 - root - INFO - Step 14990: lr=1.00E-05, loss= 1.1245 (max= 1.5190), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:44:20,211 - root - INFO - Step 14990: lr=1.00E-05, loss= 1.1245 (max= 1.5190), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:44:20,212 - root - INFO - Step 14990: lr=1.00E-05, loss= 1.1245 (max= 1.5190), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:44:20,212 - root - INFO - Step 14990: lr=1.00E-05, loss= 1.1245 (max= 1.5190), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-pt/checkpoints/dataloader/step-15000 +Dataset successfully saved to jobs/munin-7b-open-pt/checkpoints/dataloader/step-15000! 
Save time: 4.515306234359741 +2025-10-25 23:44:52,084 - root - INFO - Step 15000: lr=1.00E-05, loss= 1.1182 (max= 1.6178), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:44:52,084 - root - INFO - Saving a full checkpoint at step 15000 +2025-10-25 23:44:52,085 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 23:44:52,085 - root - INFO - Step 15000: lr=1.00E-05, loss= 1.1182 (max= 1.6178), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:44:52,085 - root - INFO - Step 15000: lr=1.00E-05, loss= 1.1182 (max= 1.6178), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:44:52,085 - root - INFO - Step 15000: lr=1.00E-05, loss= 1.1182 (max= 1.6178), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:44:52,085 - root - INFO - Step 15000: lr=1.00E-05, loss= 1.1182 (max= 1.6178), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:44:52,085 - root - INFO - Step 15000: lr=1.00E-05, loss= 1.1182 (max= 1.6178), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:44:52,085 - root - INFO - Saving a full checkpoint at step 15000 +2025-10-25 23:44:52,085 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 23:44:52,085 - root - INFO - Step 15000: lr=1.00E-05, loss= 1.1182 (max= 1.6178), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:44:52,085 - root - INFO - Saving a full checkpoint at step 15000 +2025-10-25 23:44:52,085 - root - INFO - Saving a full checkpoint at step 15000 +2025-10-25 23:44:52,085 - root - INFO - Saving a full 
checkpoint at step 15000 +2025-10-25 23:44:52,085 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 23:44:52,085 - root - INFO - Saving a full checkpoint at step 15000 +2025-10-25 23:44:52,085 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 23:44:52,085 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 23:44:52,085 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 23:44:52,085 - root - INFO - Saving a full checkpoint at step 15000 +2025-10-25 23:44:52,085 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 23:44:52,085 - root - INFO - Step 15000: lr=1.00E-05, loss= 1.1182 (max= 1.6178), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:44:52,085 - root - INFO - Saving a full checkpoint at step 15000 +2025-10-25 23:44:52,085 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 23:45:06,727 - root - INFO - Finished saving the checkpoint in 14.64 seconds +2025-10-25 23:45:06,735 - root - INFO - Finished saving the checkpoint in 14.65 seconds +2025-10-25 23:45:06,736 - root - INFO - Finished saving the checkpoint in 14.65 seconds +2025-10-25 23:45:06,736 - root - INFO - Finished saving the checkpoint in 14.65 seconds +2025-10-25 23:45:06,736 - root - INFO - Finished saving the checkpoint in 14.65 seconds +2025-10-25 23:45:06,736 - root - INFO - Finished saving the checkpoint in 14.65 seconds +2025-10-25 23:45:06,737 - root - INFO - Finished saving the checkpoint in 14.65 seconds +2025-10-25 23:45:06,737 - root - INFO - Finished saving the checkpoint 
in 14.65 seconds +2025-10-25 23:45:38,557 - root - INFO - Step 15010: lr=1.00E-05, loss= 1.1190 (max= 1.5201), tps=14103, mfu=29.38%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:45:38,557 - root - INFO - Step 15010: lr=1.00E-05, loss= 1.1190 (max= 1.5201), tps=14103, mfu=29.38%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:45:38,557 - root - INFO - Step 15010: lr=1.00E-05, loss= 1.1190 (max= 1.5201), tps=14103, mfu=29.38%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:45:38,557 - root - INFO - Step 15010: lr=1.00E-05, loss= 1.1190 (max= 1.5201), tps=14103, mfu=29.38%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:45:38,557 - root - INFO - Step 15010: lr=1.00E-05, loss= 1.1190 (max= 1.5201), tps=14103, mfu=29.38%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:45:38,557 - root - INFO - Step 15010: lr=1.00E-05, loss= 1.1190 (max= 1.5201), tps=14103, mfu=29.38%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:45:38,557 - root - INFO - Step 15010: lr=1.00E-05, loss= 1.1190 (max= 1.5201), tps=14103, mfu=29.38%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:45:38,557 - root - INFO - Step 15010: lr=1.00E-05, loss= 1.1190 (max= 1.5201), tps=14103, mfu=29.38%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:46:10,436 - root - INFO - Step 15020: lr=1.00E-05, loss= 1.1347 (max= 1.5805), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:46:10,436 - root - INFO - Step 15020: lr=1.00E-05, loss= 1.1347 (max= 1.5805), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:46:10,436 - root - INFO - Step 15020: lr=1.00E-05, loss= 1.1347 (max= 1.5805), tps=20560, mfu=42.84%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:46:10,436 - root - INFO - Step 15020: lr=1.00E-05, loss= 1.1347 (max= 1.5805), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:46:10,436 - root - INFO - Step 15020: lr=1.00E-05, loss= 1.1347 (max= 1.5805), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:46:10,436 - root - INFO - Step 15020: lr=1.00E-05, loss= 1.1347 (max= 1.5805), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:46:10,436 - root - INFO - Step 15020: lr=1.00E-05, loss= 1.1347 (max= 1.5805), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:46:10,436 - root - INFO - Step 15020: lr=1.00E-05, loss= 1.1347 (max= 1.5805), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:46:42,255 - root - INFO - Step 15030: lr=1.00E-05, loss= 1.1321 (max= 1.5870), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:46:42,256 - root - INFO - Step 15030: lr=1.00E-05, loss= 1.1321 (max= 1.5870), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:46:42,256 - root - INFO - Step 15030: lr=1.00E-05, loss= 1.1321 (max= 1.5870), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:46:42,256 - root - INFO - Step 15030: lr=1.00E-05, loss= 1.1321 (max= 1.5870), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:46:42,256 - root - INFO - Step 15030: lr=1.00E-05, loss= 1.1321 (max= 1.5870), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:46:42,256 - root - INFO - Step 15030: lr=1.00E-05, 
loss= 1.1321 (max= 1.5870), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:46:42,256 - root - INFO - Step 15030: lr=1.00E-05, loss= 1.1321 (max= 1.5870), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:46:42,256 - root - INFO - Step 15030: lr=1.00E-05, loss= 1.1321 (max= 1.5870), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:47:14,124 - root - INFO - Step 15040: lr=1.00E-05, loss= 1.1112 (max= 1.5736), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:47:14,124 - root - INFO - Step 15040: lr=1.00E-05, loss= 1.1112 (max= 1.5736), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:47:14,124 - root - INFO - Step 15040: lr=1.00E-05, loss= 1.1112 (max= 1.5736), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:47:14,124 - root - INFO - Step 15040: lr=1.00E-05, loss= 1.1112 (max= 1.5736), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:47:14,124 - root - INFO - Step 15040: lr=1.00E-05, loss= 1.1112 (max= 1.5736), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:47:14,124 - root - INFO - Step 15040: lr=1.00E-05, loss= 1.1112 (max= 1.5736), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:47:14,124 - root - INFO - Step 15040: lr=1.00E-05, loss= 1.1112 (max= 1.5736), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:47:14,124 - root - INFO - Step 15040: lr=1.00E-05, loss= 1.1112 (max= 1.5736), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
23:47:45,934 - root - INFO - Step 15050: lr=1.00E-05, loss= 1.1209 (max= 1.5224), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:47:45,934 - root - INFO - Step 15050: lr=1.00E-05, loss= 1.1209 (max= 1.5224), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:47:45,934 - root - INFO - Step 15050: lr=1.00E-05, loss= 1.1209 (max= 1.5224), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:47:45,934 - root - INFO - Step 15050: lr=1.00E-05, loss= 1.1209 (max= 1.5224), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:47:45,934 - root - INFO - Step 15050: lr=1.00E-05, loss= 1.1209 (max= 1.5224), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:47:45,934 - root - INFO - Step 15050: lr=1.00E-05, loss= 1.1209 (max= 1.5224), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:47:45,934 - root - INFO - Step 15050: lr=1.00E-05, loss= 1.1209 (max= 1.5224), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:47:45,935 - root - INFO - Step 15050: lr=1.00E-05, loss= 1.1209 (max= 1.5224), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:48:17,694 - root - INFO - Step 15060: lr=1.00E-05, loss= 1.1612 (max= 2.0717), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:48:17,694 - root - INFO - Step 15060: lr=1.00E-05, loss= 1.1612 (max= 2.0717), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:48:17,694 - root - INFO - Step 15060: lr=1.00E-05, loss= 1.1612 (max= 2.0717), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:48:17,694 - root - INFO - Step 15060: lr=1.00E-05, loss= 1.1612 (max= 2.0717), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:48:17,694 - root - INFO - Step 15060: lr=1.00E-05, loss= 1.1612 (max= 2.0717), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:48:17,694 - root - INFO - Step 15060: lr=1.00E-05, loss= 1.1612 (max= 2.0717), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:48:17,694 - root - INFO - Step 15060: lr=1.00E-05, loss= 1.1612 (max= 2.0717), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:48:17,694 - root - INFO - Step 15060: lr=1.00E-05, loss= 1.1612 (max= 2.0717), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:48:49,593 - root - INFO - Step 15070: lr=1.00E-05, loss= 1.1293 (max= 1.6622), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:48:49,593 - root - INFO - Step 15070: lr=1.00E-05, loss= 1.1293 (max= 1.6622), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:48:49,593 - root - INFO - Step 15070: lr=1.00E-05, loss= 1.1293 (max= 1.6622), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:48:49,593 - root - INFO - Step 15070: lr=1.00E-05, loss= 1.1293 (max= 1.6622), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:48:49,593 - root - INFO - Step 15070: lr=1.00E-05, loss= 1.1293 (max= 1.6622), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:48:49,593 - root - INFO - Step 15070: lr=1.00E-05, loss= 1.1293 (max= 1.6622), 
tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:48:49,593 - root - INFO - Step 15070: lr=1.00E-05, loss= 1.1293 (max= 1.6622), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:48:49,594 - root - INFO - Step 15070: lr=1.00E-05, loss= 1.1293 (max= 1.6622), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:49:21,449 - root - INFO - Step 15080: lr=1.00E-05, loss= 1.1372 (max= 1.6197), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:49:21,449 - root - INFO - Step 15080: lr=1.00E-05, loss= 1.1372 (max= 1.6197), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:49:21,449 - root - INFO - Step 15080: lr=1.00E-05, loss= 1.1372 (max= 1.6197), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:49:21,449 - root - INFO - Step 15080: lr=1.00E-05, loss= 1.1372 (max= 1.6197), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:49:21,449 - root - INFO - Step 15080: lr=1.00E-05, loss= 1.1372 (max= 1.6197), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:49:21,449 - root - INFO - Step 15080: lr=1.00E-05, loss= 1.1372 (max= 1.6197), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:49:21,449 - root - INFO - Step 15080: lr=1.00E-05, loss= 1.1372 (max= 1.6197), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:49:21,449 - root - INFO - Step 15080: lr=1.00E-05, loss= 1.1372 (max= 1.6197), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:49:53,235 - root - INFO - Step 
15090: lr=1.00E-05, loss= 1.1118 (max= 1.5845), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:49:53,235 - root - INFO - Step 15090: lr=1.00E-05, loss= 1.1118 (max= 1.5845), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:49:53,236 - root - INFO - Step 15090: lr=1.00E-05, loss= 1.1118 (max= 1.5845), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:49:53,236 - root - INFO - Step 15090: lr=1.00E-05, loss= 1.1118 (max= 1.5845), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:49:53,236 - root - INFO - Step 15090: lr=1.00E-05, loss= 1.1118 (max= 1.5845), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:49:53,236 - root - INFO - Step 15090: lr=1.00E-05, loss= 1.1118 (max= 1.5845), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:49:53,236 - root - INFO - Step 15090: lr=1.00E-05, loss= 1.1118 (max= 1.5845), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:49:53,236 - root - INFO - Step 15090: lr=1.00E-05, loss= 1.1118 (max= 1.5845), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:50:25,065 - root - INFO - Step 15100: lr=1.00E-05, loss= 1.1017 (max= 1.5492), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:50:25,065 - root - INFO - Step 15100: lr=1.00E-05, loss= 1.1017 (max= 1.5492), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:50:25,065 - root - INFO - Step 15100: lr=1.00E-05, loss= 1.1017 (max= 1.5492), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 23:50:25,065 - root - INFO - Step 15100: lr=1.00E-05, loss= 1.1017 (max= 1.5492), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:50:25,065 - root - INFO - Step 15100: lr=1.00E-05, loss= 1.1017 (max= 1.5492), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:50:25,065 - root - INFO - Step 15100: lr=1.00E-05, loss= 1.1017 (max= 1.5492), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:50:25,066 - root - INFO - Step 15100: lr=1.00E-05, loss= 1.1017 (max= 1.5492), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:50:25,066 - root - INFO - Step 15100: lr=1.00E-05, loss= 1.1017 (max= 1.5492), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:50:56,907 - root - INFO - Step 15110: lr=1.00E-05, loss= 1.1110 (max= 1.5278), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:50:56,907 - root - INFO - Step 15110: lr=1.00E-05, loss= 1.1110 (max= 1.5278), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:50:56,907 - root - INFO - Step 15110: lr=1.00E-05, loss= 1.1110 (max= 1.5278), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:50:56,907 - root - INFO - Step 15110: lr=1.00E-05, loss= 1.1110 (max= 1.5278), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:50:56,907 - root - INFO - Step 15110: lr=1.00E-05, loss= 1.1110 (max= 1.5278), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:50:56,907 - root - INFO - Step 15110: lr=1.00E-05, loss= 1.1110 (max= 1.5278), tps=20584, mfu=42.89%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:50:56,907 - root - INFO - Step 15110: lr=1.00E-05, loss= 1.1110 (max= 1.5278), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:50:56,907 - root - INFO - Step 15110: lr=1.00E-05, loss= 1.1110 (max= 1.5278), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:51:28,872 - root - INFO - Step 15120: lr=1.00E-05, loss= 1.1143 (max= 1.4840), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:51:28,872 - root - INFO - Step 15120: lr=1.00E-05, loss= 1.1143 (max= 1.4840), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:51:28,872 - root - INFO - Step 15120: lr=1.00E-05, loss= 1.1143 (max= 1.4840), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:51:28,872 - root - INFO - Step 15120: lr=1.00E-05, loss= 1.1143 (max= 1.4840), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:51:28,872 - root - INFO - Step 15120: lr=1.00E-05, loss= 1.1143 (max= 1.4840), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:51:28,872 - root - INFO - Step 15120: lr=1.00E-05, loss= 1.1143 (max= 1.4840), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:51:28,872 - root - INFO - Step 15120: lr=1.00E-05, loss= 1.1143 (max= 1.4840), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:51:28,873 - root - INFO - Step 15120: lr=1.00E-05, loss= 1.1143 (max= 1.4840), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:52:00,768 - root - INFO - Step 15130: lr=1.00E-05, loss= 1.1186 
(max= 1.7368), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:52:00,768 - root - INFO - Step 15130: lr=1.00E-05, loss= 1.1186 (max= 1.7368), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:52:00,768 - root - INFO - Step 15130: lr=1.00E-05, loss= 1.1186 (max= 1.7368), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:52:00,768 - root - INFO - Step 15130: lr=1.00E-05, loss= 1.1186 (max= 1.7368), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:52:00,768 - root - INFO - Step 15130: lr=1.00E-05, loss= 1.1186 (max= 1.7368), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:52:00,768 - root - INFO - Step 15130: lr=1.00E-05, loss= 1.1186 (max= 1.7368), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:52:00,768 - root - INFO - Step 15130: lr=1.00E-05, loss= 1.1186 (max= 1.7368), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:52:00,768 - root - INFO - Step 15130: lr=1.00E-05, loss= 1.1186 (max= 1.7368), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:52:32,558 - root - INFO - Step 15140: lr=1.00E-05, loss= 1.0919 (max= 1.5526), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:52:32,558 - root - INFO - Step 15140: lr=1.00E-05, loss= 1.0919 (max= 1.5526), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:52:32,559 - root - INFO - Step 15140: lr=1.00E-05, loss= 1.0919 (max= 1.5526), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:52:32,559 - root 
- INFO - Step 15140: lr=1.00E-05, loss= 1.0919 (max= 1.5526), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:52:32,559 - root - INFO - Step 15140: lr=1.00E-05, loss= 1.0919 (max= 1.5526), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:52:32,559 - root - INFO - Step 15140: lr=1.00E-05, loss= 1.0919 (max= 1.5526), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:52:32,559 - root - INFO - Step 15140: lr=1.00E-05, loss= 1.0919 (max= 1.5526), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:52:32,559 - root - INFO - Step 15140: lr=1.00E-05, loss= 1.0919 (max= 1.5526), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:53:04,441 - root - INFO - Step 15150: lr=1.00E-05, loss= 1.1288 (max= 1.5170), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:53:04,442 - root - INFO - Step 15150: lr=1.00E-05, loss= 1.1288 (max= 1.5170), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:53:04,442 - root - INFO - Step 15150: lr=1.00E-05, loss= 1.1288 (max= 1.5170), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:53:04,442 - root - INFO - Step 15150: lr=1.00E-05, loss= 1.1288 (max= 1.5170), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:53:04,442 - root - INFO - Step 15150: lr=1.00E-05, loss= 1.1288 (max= 1.5170), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:53:04,442 - root - INFO - Step 15150: lr=1.00E-05, loss= 1.1288 (max= 1.5170), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 23:53:04,442 - root - INFO - Step 15150: lr=1.00E-05, loss= 1.1288 (max= 1.5170), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:53:04,442 - root - INFO - Step 15150: lr=1.00E-05, loss= 1.1288 (max= 1.5170), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:53:36,244 - root - INFO - Step 15160: lr=1.00E-05, loss= 1.1394 (max= 1.6645), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:53:36,244 - root - INFO - Step 15160: lr=1.00E-05, loss= 1.1394 (max= 1.6645), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:53:36,244 - root - INFO - Step 15160: lr=1.00E-05, loss= 1.1394 (max= 1.6645), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:53:36,244 - root - INFO - Step 15160: lr=1.00E-05, loss= 1.1394 (max= 1.6645), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:53:36,244 - root - INFO - Step 15160: lr=1.00E-05, loss= 1.1394 (max= 1.6645), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:53:36,244 - root - INFO - Step 15160: lr=1.00E-05, loss= 1.1394 (max= 1.6645), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:53:36,244 - root - INFO - Step 15160: lr=1.00E-05, loss= 1.1394 (max= 1.6645), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:53:36,244 - root - INFO - Step 15160: lr=1.00E-05, loss= 1.1394 (max= 1.6645), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:54:08,107 - root - INFO - Step 15170: lr=1.00E-05, loss= 1.1379 (max= 1.5821), tps=20570, mfu=42.86%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:54:08,107 - root - INFO - Step 15170: lr=1.00E-05, loss= 1.1379 (max= 1.5821), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:54:08,107 - root - INFO - Step 15170: lr=1.00E-05, loss= 1.1379 (max= 1.5821), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:54:08,107 - root - INFO - Step 15170: lr=1.00E-05, loss= 1.1379 (max= 1.5821), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:54:08,107 - root - INFO - Step 15170: lr=1.00E-05, loss= 1.1379 (max= 1.5821), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:54:08,107 - root - INFO - Step 15170: lr=1.00E-05, loss= 1.1379 (max= 1.5821), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:54:08,107 - root - INFO - Step 15170: lr=1.00E-05, loss= 1.1379 (max= 1.5821), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:54:08,108 - root - INFO - Step 15170: lr=1.00E-05, loss= 1.1379 (max= 1.5821), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:54:39,911 - root - INFO - Step 15180: lr=1.00E-05, loss= 1.1225 (max= 1.7460), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:54:39,911 - root - INFO - Step 15180: lr=1.00E-05, loss= 1.1225 (max= 1.7460), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:54:39,911 - root - INFO - Step 15180: lr=1.00E-05, loss= 1.1225 (max= 1.7460), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:54:39,911 - root - INFO - Step 15180: lr=1.00E-05, 
loss= 1.1225 (max= 1.7460), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:54:39,911 - root - INFO - Step 15180: lr=1.00E-05, loss= 1.1225 (max= 1.7460), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:54:39,911 - root - INFO - Step 15180: lr=1.00E-05, loss= 1.1225 (max= 1.7460), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:54:39,911 - root - INFO - Step 15180: lr=1.00E-05, loss= 1.1225 (max= 1.7460), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:54:39,911 - root - INFO - Step 15180: lr=1.00E-05, loss= 1.1225 (max= 1.7460), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:55:11,691 - root - INFO - Step 15190: lr=1.00E-05, loss= 1.0987 (max= 1.5341), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:55:11,692 - root - INFO - Step 15190: lr=1.00E-05, loss= 1.0987 (max= 1.5341), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:55:11,692 - root - INFO - Step 15190: lr=1.00E-05, loss= 1.0987 (max= 1.5341), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:55:11,692 - root - INFO - Step 15190: lr=1.00E-05, loss= 1.0987 (max= 1.5341), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:55:11,692 - root - INFO - Step 15190: lr=1.00E-05, loss= 1.0987 (max= 1.5341), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:55:11,692 - root - INFO - Step 15190: lr=1.00E-05, loss= 1.0987 (max= 1.5341), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
23:55:11,692 - root - INFO - Step 15190: lr=1.00E-05, loss= 1.0987 (max= 1.5341), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:55:11,692 - root - INFO - Step 15190: lr=1.00E-05, loss= 1.0987 (max= 1.5341), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:55:43,653 - root - INFO - Step 15200: lr=1.00E-05, loss= 1.1090 (max= 1.5977), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:55:43,653 - root - INFO - Step 15200: lr=1.00E-05, loss= 1.1090 (max= 1.5977), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:55:43,653 - root - INFO - Step 15200: lr=1.00E-05, loss= 1.1090 (max= 1.5977), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:55:43,653 - root - INFO - Step 15200: lr=1.00E-05, loss= 1.1090 (max= 1.5977), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:55:43,653 - root - INFO - Step 15200: lr=1.00E-05, loss= 1.1090 (max= 1.5977), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:55:43,653 - root - INFO - Step 15200: lr=1.00E-05, loss= 1.1090 (max= 1.5977), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:55:43,653 - root - INFO - Step 15200: lr=1.00E-05, loss= 1.1090 (max= 1.5977), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:55:43,653 - root - INFO - Step 15200: lr=1.00E-05, loss= 1.1090 (max= 1.5977), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:56:15,533 - root - INFO - Step 15210: lr=1.00E-05, loss= 1.1023 (max= 1.5770), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:56:15,534 - root - INFO - Step 15210: lr=1.00E-05, loss= 1.1023 (max= 1.5770), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:56:15,534 - root - INFO - Step 15210: lr=1.00E-05, loss= 1.1023 (max= 1.5770), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:56:15,534 - root - INFO - Step 15210: lr=1.00E-05, loss= 1.1023 (max= 1.5770), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:56:15,534 - root - INFO - Step 15210: lr=1.00E-05, loss= 1.1023 (max= 1.5770), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:56:15,534 - root - INFO - Step 15210: lr=1.00E-05, loss= 1.1023 (max= 1.5770), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:56:15,534 - root - INFO - Step 15210: lr=1.00E-05, loss= 1.1023 (max= 1.5770), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:56:15,534 - root - INFO - Step 15210: lr=1.00E-05, loss= 1.1023 (max= 1.5770), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:56:47,353 - root - INFO - Step 15220: lr=1.00E-05, loss= 1.1363 (max= 1.6764), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:56:47,353 - root - INFO - Step 15220: lr=1.00E-05, loss= 1.1363 (max= 1.6764), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:56:47,353 - root - INFO - Step 15220: lr=1.00E-05, loss= 1.1363 (max= 1.6764), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:56:47,353 - root - INFO - Step 15220: lr=1.00E-05, loss= 1.1363 (max= 1.6764), 
tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:56:47,353 - root - INFO - Step 15220: lr=1.00E-05, loss= 1.1363 (max= 1.6764), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:56:47,353 - root - INFO - Step 15220: lr=1.00E-05, loss= 1.1363 (max= 1.6764), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:56:47,353 - root - INFO - Step 15220: lr=1.00E-05, loss= 1.1363 (max= 1.6764), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:56:47,353 - root - INFO - Step 15220: lr=1.00E-05, loss= 1.1363 (max= 1.6764), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:57:19,278 - root - INFO - Step 15230: lr=1.00E-05, loss= 1.1045 (max= 1.6640), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:57:19,278 - root - INFO - Step 15230: lr=1.00E-05, loss= 1.1045 (max= 1.6640), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:57:19,278 - root - INFO - Step 15230: lr=1.00E-05, loss= 1.1045 (max= 1.6640), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:57:19,278 - root - INFO - Step 15230: lr=1.00E-05, loss= 1.1045 (max= 1.6640), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:57:19,278 - root - INFO - Step 15230: lr=1.00E-05, loss= 1.1045 (max= 1.6640), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:57:19,279 - root - INFO - Step 15230: lr=1.00E-05, loss= 1.1045 (max= 1.6640), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:57:19,279 - root - INFO - Step 
15230: lr=1.00E-05, loss= 1.1045 (max= 1.6640), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:57:19,279 - root - INFO - Step 15230: lr=1.00E-05, loss= 1.1045 (max= 1.6640), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:57:51,179 - root - INFO - Step 15240: lr=1.00E-05, loss= 1.1020 (max= 1.5316), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:57:51,180 - root - INFO - Step 15240: lr=1.00E-05, loss= 1.1020 (max= 1.5316), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:57:51,180 - root - INFO - Step 15240: lr=1.00E-05, loss= 1.1020 (max= 1.5316), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:57:51,180 - root - INFO - Step 15240: lr=1.00E-05, loss= 1.1020 (max= 1.5316), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:57:51,180 - root - INFO - Step 15240: lr=1.00E-05, loss= 1.1020 (max= 1.5316), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:57:51,180 - root - INFO - Step 15240: lr=1.00E-05, loss= 1.1020 (max= 1.5316), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:57:51,180 - root - INFO - Step 15240: lr=1.00E-05, loss= 1.1020 (max= 1.5316), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:57:51,180 - root - INFO - Step 15240: lr=1.00E-05, loss= 1.1020 (max= 1.5316), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:58:22,938 - root - INFO - Step 15250: lr=1.00E-05, loss= 1.1298 (max= 1.6227), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 23:58:22,938 - root - INFO - Step 15250: lr=1.00E-05, loss= 1.1298 (max= 1.6227), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:58:22,938 - root - INFO - Step 15250: lr=1.00E-05, loss= 1.1298 (max= 1.6227), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:58:22,938 - root - INFO - Step 15250: lr=1.00E-05, loss= 1.1298 (max= 1.6227), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:58:22,938 - root - INFO - Step 15250: lr=1.00E-05, loss= 1.1298 (max= 1.6227), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:58:22,938 - root - INFO - Step 15250: lr=1.00E-05, loss= 1.1298 (max= 1.6227), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:58:22,938 - root - INFO - Step 15250: lr=1.00E-05, loss= 1.1298 (max= 1.6227), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:58:22,938 - root - INFO - Step 15250: lr=1.00E-05, loss= 1.1298 (max= 1.6227), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:58:26,666 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:4886546 +2025-10-25 23:58:54,722 - root - INFO - Step 15260: lr=1.00E-05, loss= 1.0804 (max= 1.4751), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:58:54,722 - root - INFO - Step 15260: lr=1.00E-05, loss= 1.0804 (max= 1.4751), tps=20621, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:58:54,722 - root - INFO - Step 15260: lr=1.00E-05, loss= 1.0804 (max= 1.4751), tps=20621, mfu=42.96%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:58:54,722 - root - INFO - Step 15260: lr=1.00E-05, loss= 1.0804 (max= 1.4751), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:58:54,722 - root - INFO - Step 15260: lr=1.00E-05, loss= 1.0804 (max= 1.4751), tps=20621, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:58:54,722 - root - INFO - Step 15260: lr=1.00E-05, loss= 1.0804 (max= 1.4751), tps=20621, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:58:54,722 - root - INFO - Step 15260: lr=1.00E-05, loss= 1.0804 (max= 1.4751), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:58:54,722 - root - INFO - Step 15260: lr=1.00E-05, loss= 1.0804 (max= 1.4751), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:59:26,552 - root - INFO - Step 15270: lr=1.00E-05, loss= 1.1068 (max= 1.7316), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:59:26,552 - root - INFO - Step 15270: lr=1.00E-05, loss= 1.1068 (max= 1.7316), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:59:26,552 - root - INFO - Step 15270: lr=1.00E-05, loss= 1.1068 (max= 1.7316), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:59:26,552 - root - INFO - Step 15270: lr=1.00E-05, loss= 1.1068 (max= 1.7316), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:59:26,552 - root - INFO - Step 15270: lr=1.00E-05, loss= 1.1068 (max= 1.7316), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:59:26,552 - root - INFO - Step 15270: lr=1.00E-05, loss= 1.1068 
(max= 1.7316), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:59:26,553 - root - INFO - Step 15270: lr=1.00E-05, loss= 1.1068 (max= 1.7316), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:59:26,553 - root - INFO - Step 15270: lr=1.00E-05, loss= 1.1068 (max= 1.7316), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:59:32,087 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:785384 +2025-10-25 23:59:58,440 - root - INFO - Step 15280: lr=1.00E-05, loss= 1.1216 (max= 1.6623), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:59:58,440 - root - INFO - Step 15280: lr=1.00E-05, loss= 1.1216 (max= 1.6623), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:59:58,440 - root - INFO - Step 15280: lr=1.00E-05, loss= 1.1216 (max= 1.6623), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:59:58,440 - root - INFO - Step 15280: lr=1.00E-05, loss= 1.1216 (max= 1.6623), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:59:58,440 - root - INFO - Step 15280: lr=1.00E-05, loss= 1.1216 (max= 1.6623), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:59:58,440 - root - INFO - Step 15280: lr=1.00E-05, loss= 1.1216 (max= 1.6623), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:59:58,440 - root - INFO - Step 15280: lr=1.00E-05, loss= 1.1216 (max= 1.6623), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 23:59:58,440 - root - 
INFO - Step 15280: lr=1.00E-05, loss= 1.1216 (max= 1.6623), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:00:30,286 - root - INFO - Step 15290: lr=1.00E-05, loss= 1.1104 (max= 1.5056), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:00:30,286 - root - INFO - Step 15290: lr=1.00E-05, loss= 1.1104 (max= 1.5056), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:00:30,286 - root - INFO - Step 15290: lr=1.00E-05, loss= 1.1104 (max= 1.5056), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:00:30,286 - root - INFO - Step 15290: lr=1.00E-05, loss= 1.1104 (max= 1.5056), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:00:30,286 - root - INFO - Step 15290: lr=1.00E-05, loss= 1.1104 (max= 1.5056), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:00:30,286 - root - INFO - Step 15290: lr=1.00E-05, loss= 1.1104 (max= 1.5056), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:00:30,286 - root - INFO - Step 15290: lr=1.00E-05, loss= 1.1104 (max= 1.5056), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:00:30,287 - root - INFO - Step 15290: lr=1.00E-05, loss= 1.1104 (max= 1.5056), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:01:02,248 - root - INFO - Step 15300: lr=1.00E-05, loss= 1.1348 (max= 1.6102), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:01:02,248 - root - INFO - Step 15300: lr=1.00E-05, loss= 1.1348 (max= 1.6102), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 00:01:02,248 - root - INFO - Step 15300: lr=1.00E-05, loss= 1.1348 (max= 1.6102), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:01:02,248 - root - INFO - Step 15300: lr=1.00E-05, loss= 1.1348 (max= 1.6102), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:01:02,248 - root - INFO - Step 15300: lr=1.00E-05, loss= 1.1348 (max= 1.6102), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:01:02,248 - root - INFO - Step 15300: lr=1.00E-05, loss= 1.1348 (max= 1.6102), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:01:02,248 - root - INFO - Step 15300: lr=1.00E-05, loss= 1.1348 (max= 1.6102), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:01:02,248 - root - INFO - Step 15300: lr=1.00E-05, loss= 1.1348 (max= 1.6102), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:01:34,185 - root - INFO - Step 15310: lr=1.00E-05, loss= 1.1000 (max= 1.4289), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:01:34,185 - root - INFO - Step 15310: lr=1.00E-05, loss= 1.1000 (max= 1.4289), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:01:34,185 - root - INFO - Step 15310: lr=1.00E-05, loss= 1.1000 (max= 1.4289), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:01:34,185 - root - INFO - Step 15310: lr=1.00E-05, loss= 1.1000 (max= 1.4289), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:01:34,186 - root - INFO - Step 15310: lr=1.00E-05, loss= 1.1000 (max= 1.4289), tps=20523, mfu=42.76%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:01:34,186 - root - INFO - Step 15310: lr=1.00E-05, loss= 1.1000 (max= 1.4289), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:01:34,186 - root - INFO - Step 15310: lr=1.00E-05, loss= 1.1000 (max= 1.4289), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:01:34,186 - root - INFO - Step 15310: lr=1.00E-05, loss= 1.1000 (max= 1.4289), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:02:06,024 - root - INFO - Step 15320: lr=1.00E-05, loss= 1.1217 (max= 1.6711), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:02:06,024 - root - INFO - Step 15320: lr=1.00E-05, loss= 1.1217 (max= 1.6711), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:02:06,024 - root - INFO - Step 15320: lr=1.00E-05, loss= 1.1217 (max= 1.6711), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:02:06,024 - root - INFO - Step 15320: lr=1.00E-05, loss= 1.1217 (max= 1.6711), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:02:06,024 - root - INFO - Step 15320: lr=1.00E-05, loss= 1.1217 (max= 1.6711), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:02:06,024 - root - INFO - Step 15320: lr=1.00E-05, loss= 1.1217 (max= 1.6711), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:02:06,024 - root - INFO - Step 15320: lr=1.00E-05, loss= 1.1217 (max= 1.6711), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:02:06,025 - root - INFO - Step 15320: lr=1.00E-05, 
loss= 1.1217 (max= 1.6711), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:02:37,930 - root - INFO - Step 15330: lr=1.00E-05, loss= 1.0756 (max= 1.6393), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:02:37,930 - root - INFO - Step 15330: lr=1.00E-05, loss= 1.0756 (max= 1.6393), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:02:37,930 - root - INFO - Step 15330: lr=1.00E-05, loss= 1.0756 (max= 1.6393), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:02:37,930 - root - INFO - Step 15330: lr=1.00E-05, loss= 1.0756 (max= 1.6393), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:02:37,930 - root - INFO - Step 15330: lr=1.00E-05, loss= 1.0756 (max= 1.6393), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:02:37,930 - root - INFO - Step 15330: lr=1.00E-05, loss= 1.0756 (max= 1.6393), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:02:37,930 - root - INFO - Step 15330: lr=1.00E-05, loss= 1.0756 (max= 1.6393), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:02:37,931 - root - INFO - Step 15330: lr=1.00E-05, loss= 1.0756 (max= 1.6393), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:03:09,792 - root - INFO - Step 15340: lr=1.00E-05, loss= 1.1074 (max= 1.5977), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:03:09,792 - root - INFO - Step 15340: lr=1.00E-05, loss= 1.1074 (max= 1.5977), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
00:03:09,792 - root - INFO - Step 15340: lr=1.00E-05, loss= 1.1074 (max= 1.5977), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:03:09,792 - root - INFO - Step 15340: lr=1.00E-05, loss= 1.1074 (max= 1.5977), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:03:09,792 - root - INFO - Step 15340: lr=1.00E-05, loss= 1.1074 (max= 1.5977), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:03:09,792 - root - INFO - Step 15340: lr=1.00E-05, loss= 1.1074 (max= 1.5977), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:03:09,792 - root - INFO - Step 15340: lr=1.00E-05, loss= 1.1074 (max= 1.5977), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:03:09,792 - root - INFO - Step 15340: lr=1.00E-05, loss= 1.1074 (max= 1.5977), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:03:41,595 - root - INFO - Step 15350: lr=1.00E-05, loss= 1.0892 (max= 1.5515), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:03:41,595 - root - INFO - Step 15350: lr=1.00E-05, loss= 1.0892 (max= 1.5515), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:03:41,595 - root - INFO - Step 15350: lr=1.00E-05, loss= 1.0892 (max= 1.5515), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:03:41,595 - root - INFO - Step 15350: lr=1.00E-05, loss= 1.0892 (max= 1.5515), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:03:41,595 - root - INFO - Step 15350: lr=1.00E-05, loss= 1.0892 (max= 1.5515), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:03:41,595 - root - INFO - Step 15350: lr=1.00E-05, loss= 1.0892 (max= 1.5515), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:03:41,595 - root - INFO - Step 15350: lr=1.00E-05, loss= 1.0892 (max= 1.5515), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:03:41,595 - root - INFO - Step 15350: lr=1.00E-05, loss= 1.0892 (max= 1.5515), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:04:13,390 - root - INFO - Step 15360: lr=1.00E-05, loss= 1.0765 (max= 1.4327), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:04:13,390 - root - INFO - Step 15360: lr=1.00E-05, loss= 1.0765 (max= 1.4327), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:04:13,390 - root - INFO - Step 15360: lr=1.00E-05, loss= 1.0765 (max= 1.4327), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:04:13,390 - root - INFO - Step 15360: lr=1.00E-05, loss= 1.0765 (max= 1.4327), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:04:13,390 - root - INFO - Step 15360: lr=1.00E-05, loss= 1.0765 (max= 1.4327), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:04:13,390 - root - INFO - Step 15360: lr=1.00E-05, loss= 1.0765 (max= 1.4327), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:04:13,390 - root - INFO - Step 15360: lr=1.00E-05, loss= 1.0765 (max= 1.4327), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:04:13,390 - root - INFO - Step 15360: lr=1.00E-05, loss= 1.0765 (max= 1.4327), 
tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:04:45,287 - root - INFO - Step 15370: lr=1.00E-05, loss= 1.1305 (max= 1.6498), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:04:45,287 - root - INFO - Step 15370: lr=1.00E-05, loss= 1.1305 (max= 1.6498), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:04:45,287 - root - INFO - Step 15370: lr=1.00E-05, loss= 1.1305 (max= 1.6498), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:04:45,287 - root - INFO - Step 15370: lr=1.00E-05, loss= 1.1305 (max= 1.6498), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:04:45,287 - root - INFO - Step 15370: lr=1.00E-05, loss= 1.1305 (max= 1.6498), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:04:45,287 - root - INFO - Step 15370: lr=1.00E-05, loss= 1.1305 (max= 1.6498), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:04:45,287 - root - INFO - Step 15370: lr=1.00E-05, loss= 1.1305 (max= 1.6498), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:04:45,287 - root - INFO - Step 15370: lr=1.00E-05, loss= 1.1305 (max= 1.6498), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:05:17,143 - root - INFO - Step 15380: lr=1.00E-05, loss= 1.1189 (max= 1.5683), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:05:17,143 - root - INFO - Step 15380: lr=1.00E-05, loss= 1.1189 (max= 1.5683), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:05:17,143 - root - INFO - Step 
15380: lr=1.00E-05, loss= 1.1189 (max= 1.5683), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:05:17,143 - root - INFO - Step 15380: lr=1.00E-05, loss= 1.1189 (max= 1.5683), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:05:17,143 - root - INFO - Step 15380: lr=1.00E-05, loss= 1.1189 (max= 1.5683), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:05:17,143 - root - INFO - Step 15380: lr=1.00E-05, loss= 1.1189 (max= 1.5683), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:05:17,143 - root - INFO - Step 15380: lr=1.00E-05, loss= 1.1189 (max= 1.5683), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:05:17,143 - root - INFO - Step 15380: lr=1.00E-05, loss= 1.1189 (max= 1.5683), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:05:49,047 - root - INFO - Step 15390: lr=1.00E-05, loss= 1.1202 (max= 1.7905), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:05:49,047 - root - INFO - Step 15390: lr=1.00E-05, loss= 1.1202 (max= 1.7905), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:05:49,047 - root - INFO - Step 15390: lr=1.00E-05, loss= 1.1202 (max= 1.7905), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:05:49,047 - root - INFO - Step 15390: lr=1.00E-05, loss= 1.1202 (max= 1.7905), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:05:49,047 - root - INFO - Step 15390: lr=1.00E-05, loss= 1.1202 (max= 1.7905), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 00:05:49,047 - root - INFO - Step 15390: lr=1.00E-05, loss= 1.1202 (max= 1.7905), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:05:49,047 - root - INFO - Step 15390: lr=1.00E-05, loss= 1.1202 (max= 1.7905), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:05:49,047 - root - INFO - Step 15390: lr=1.00E-05, loss= 1.1202 (max= 1.7905), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:06:20,903 - root - INFO - Step 15400: lr=1.00E-05, loss= 1.1148 (max= 1.6140), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:06:20,903 - root - INFO - Step 15400: lr=1.00E-05, loss= 1.1148 (max= 1.6140), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:06:20,904 - root - INFO - Step 15400: lr=1.00E-05, loss= 1.1148 (max= 1.6140), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:06:20,904 - root - INFO - Step 15400: lr=1.00E-05, loss= 1.1148 (max= 1.6140), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:06:20,904 - root - INFO - Step 15400: lr=1.00E-05, loss= 1.1148 (max= 1.6140), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:06:20,904 - root - INFO - Step 15400: lr=1.00E-05, loss= 1.1148 (max= 1.6140), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:06:20,904 - root - INFO - Step 15400: lr=1.00E-05, loss= 1.1148 (max= 1.6140), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:06:20,904 - root - INFO - Step 15400: lr=1.00E-05, loss= 1.1148 (max= 1.6140), tps=20574, mfu=42.87%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:06:52,750 - root - INFO - Step 15410: lr=1.00E-05, loss= 1.1080 (max= 1.6667), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:06:52,750 - root - INFO - Step 15410: lr=1.00E-05, loss= 1.1080 (max= 1.6667), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:06:52,750 - root - INFO - Step 15410: lr=1.00E-05, loss= 1.1080 (max= 1.6667), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:06:52,750 - root - INFO - Step 15410: lr=1.00E-05, loss= 1.1080 (max= 1.6667), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:06:52,750 - root - INFO - Step 15410: lr=1.00E-05, loss= 1.1080 (max= 1.6667), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:06:52,750 - root - INFO - Step 15410: lr=1.00E-05, loss= 1.1080 (max= 1.6667), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:06:52,750 - root - INFO - Step 15410: lr=1.00E-05, loss= 1.1080 (max= 1.6667), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:06:52,750 - root - INFO - Step 15410: lr=1.00E-05, loss= 1.1080 (max= 1.6667), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:07:24,567 - root - INFO - Step 15420: lr=1.00E-05, loss= 1.1081 (max= 1.5586), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:07:24,567 - root - INFO - Step 15420: lr=1.00E-05, loss= 1.1081 (max= 1.5586), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:07:24,567 - root - INFO - Step 15420: lr=1.00E-05, loss= 1.1081 
(max= 1.5586), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:07:24,567 - root - INFO - Step 15420: lr=1.00E-05, loss= 1.1081 (max= 1.5586), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:07:24,567 - root - INFO - Step 15420: lr=1.00E-05, loss= 1.1081 (max= 1.5586), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:07:24,567 - root - INFO - Step 15420: lr=1.00E-05, loss= 1.1081 (max= 1.5586), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:07:24,567 - root - INFO - Step 15420: lr=1.00E-05, loss= 1.1081 (max= 1.5586), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:07:24,567 - root - INFO - Step 15420: lr=1.00E-05, loss= 1.1081 (max= 1.5586), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:07:56,398 - root - INFO - Step 15430: lr=1.00E-05, loss= 1.0998 (max= 1.4917), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:07:56,398 - root - INFO - Step 15430: lr=1.00E-05, loss= 1.0998 (max= 1.4917), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:07:56,398 - root - INFO - Step 15430: lr=1.00E-05, loss= 1.0998 (max= 1.4917), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:07:56,398 - root - INFO - Step 15430: lr=1.00E-05, loss= 1.0998 (max= 1.4917), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:07:56,398 - root - INFO - Step 15430: lr=1.00E-05, loss= 1.0998 (max= 1.4917), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:07:56,398 - root 
- INFO - Step 15430: lr=1.00E-05, loss= 1.0998 (max= 1.4917), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:07:56,398 - root - INFO - Step 15430: lr=1.00E-05, loss= 1.0998 (max= 1.4917), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:07:56,398 - root - INFO - Step 15430: lr=1.00E-05, loss= 1.0998 (max= 1.4917), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:08:28,159 - root - INFO - Step 15440: lr=1.00E-05, loss= 1.0987 (max= 1.5032), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:08:28,159 - root - INFO - Step 15440: lr=1.00E-05, loss= 1.0987 (max= 1.5032), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:08:28,159 - root - INFO - Step 15440: lr=1.00E-05, loss= 1.0987 (max= 1.5032), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:08:28,160 - root - INFO - Step 15440: lr=1.00E-05, loss= 1.0987 (max= 1.5032), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:08:28,160 - root - INFO - Step 15440: lr=1.00E-05, loss= 1.0987 (max= 1.5032), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:08:28,160 - root - INFO - Step 15440: lr=1.00E-05, loss= 1.0987 (max= 1.5032), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:08:28,160 - root - INFO - Step 15440: lr=1.00E-05, loss= 1.0987 (max= 1.5032), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:08:28,160 - root - INFO - Step 15440: lr=1.00E-05, loss= 1.0987 (max= 1.5032), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 00:09:00,026 - root - INFO - Step 15450: lr=1.00E-05, loss= 1.1089 (max= 1.5504), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:09:00,026 - root - INFO - Step 15450: lr=1.00E-05, loss= 1.1089 (max= 1.5504), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:09:00,026 - root - INFO - Step 15450: lr=1.00E-05, loss= 1.1089 (max= 1.5504), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:09:00,026 - root - INFO - Step 15450: lr=1.00E-05, loss= 1.1089 (max= 1.5504), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:09:00,026 - root - INFO - Step 15450: lr=1.00E-05, loss= 1.1089 (max= 1.5504), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:09:00,026 - root - INFO - Step 15450: lr=1.00E-05, loss= 1.1089 (max= 1.5504), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:09:00,026 - root - INFO - Step 15450: lr=1.00E-05, loss= 1.1089 (max= 1.5504), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:09:00,026 - root - INFO - Step 15450: lr=1.00E-05, loss= 1.1089 (max= 1.5504), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:09:31,834 - root - INFO - Step 15460: lr=1.00E-05, loss= 1.0804 (max= 1.3857), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:09:31,834 - root - INFO - Step 15460: lr=1.00E-05, loss= 1.0804 (max= 1.3857), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:09:31,834 - root - INFO - Step 15460: lr=1.00E-05, loss= 1.0804 (max= 1.3857), tps=20605, mfu=42.93%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:09:31,834 - root - INFO - Step 15460: lr=1.00E-05, loss= 1.0804 (max= 1.3857), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:09:31,834 - root - INFO - Step 15460: lr=1.00E-05, loss= 1.0804 (max= 1.3857), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:09:31,834 - root - INFO - Step 15460: lr=1.00E-05, loss= 1.0804 (max= 1.3857), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:09:31,834 - root - INFO - Step 15460: lr=1.00E-05, loss= 1.0804 (max= 1.3857), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:09:31,834 - root - INFO - Step 15460: lr=1.00E-05, loss= 1.0804 (max= 1.3857), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:10:03,699 - root - INFO - Step 15470: lr=1.00E-05, loss= 1.1295 (max= 1.5437), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:10:03,699 - root - INFO - Step 15470: lr=1.00E-05, loss= 1.1295 (max= 1.5437), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:10:03,699 - root - INFO - Step 15470: lr=1.00E-05, loss= 1.1295 (max= 1.5437), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:10:03,699 - root - INFO - Step 15470: lr=1.00E-05, loss= 1.1295 (max= 1.5437), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:10:03,699 - root - INFO - Step 15470: lr=1.00E-05, loss= 1.1295 (max= 1.5437), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:10:03,699 - root - INFO - Step 15470: lr=1.00E-05, 
loss= 1.1295 (max= 1.5437), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:10:03,699 - root - INFO - Step 15470: lr=1.00E-05, loss= 1.1295 (max= 1.5437), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:10:03,699 - root - INFO - Step 15470: lr=1.00E-05, loss= 1.1295 (max= 1.5437), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:10:35,552 - root - INFO - Step 15480: lr=1.00E-05, loss= 1.0929 (max= 1.4373), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:10:35,552 - root - INFO - Step 15480: lr=1.00E-05, loss= 1.0929 (max= 1.4373), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:10:35,552 - root - INFO - Step 15480: lr=1.00E-05, loss= 1.0929 (max= 1.4373), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:10:35,552 - root - INFO - Step 15480: lr=1.00E-05, loss= 1.0929 (max= 1.4373), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:10:35,552 - root - INFO - Step 15480: lr=1.00E-05, loss= 1.0929 (max= 1.4373), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:10:35,552 - root - INFO - Step 15480: lr=1.00E-05, loss= 1.0929 (max= 1.4373), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:10:35,552 - root - INFO - Step 15480: lr=1.00E-05, loss= 1.0929 (max= 1.4373), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:10:35,552 - root - INFO - Step 15480: lr=1.00E-05, loss= 1.0929 (max= 1.4373), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
00:11:07,325 - root - INFO - Step 15490: lr=1.00E-05, loss= 1.1110 (max= 1.5498), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:11:07,325 - root - INFO - Step 15490: lr=1.00E-05, loss= 1.1110 (max= 1.5498), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:11:07,325 - root - INFO - Step 15490: lr=1.00E-05, loss= 1.1110 (max= 1.5498), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:11:07,325 - root - INFO - Step 15490: lr=1.00E-05, loss= 1.1110 (max= 1.5498), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:11:07,325 - root - INFO - Step 15490: lr=1.00E-05, loss= 1.1110 (max= 1.5498), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:11:07,325 - root - INFO - Step 15490: lr=1.00E-05, loss= 1.1110 (max= 1.5498), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:11:07,325 - root - INFO - Step 15490: lr=1.00E-05, loss= 1.1110 (max= 1.5498), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:11:07,325 - root - INFO - Step 15490: lr=1.00E-05, loss= 1.1110 (max= 1.5498), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:11:39,181 - root - INFO - Step 15500: lr=1.00E-05, loss= 1.1104 (max= 1.6331), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:11:39,181 - root - INFO - Step 15500: lr=1.00E-05, loss= 1.1104 (max= 1.6331), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:11:39,181 - root - INFO - Step 15500: lr=1.00E-05, loss= 1.1104 (max= 1.6331), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:11:39,181 - root - INFO - Step 15500: lr=1.00E-05, loss= 1.1104 (max= 1.6331), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:11:39,181 - root - INFO - Step 15500: lr=1.00E-05, loss= 1.1104 (max= 1.6331), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:11:39,181 - root - INFO - Step 15500: lr=1.00E-05, loss= 1.1104 (max= 1.6331), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:11:39,181 - root - INFO - Step 15500: lr=1.00E-05, loss= 1.1104 (max= 1.6331), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:11:39,181 - root - INFO - Step 15500: lr=1.00E-05, loss= 1.1104 (max= 1.6331), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:12:11,014 - root - INFO - Step 15510: lr=1.00E-05, loss= 1.1127 (max= 1.6043), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:12:11,014 - root - INFO - Step 15510: lr=1.00E-05, loss= 1.1127 (max= 1.6043), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:12:11,014 - root - INFO - Step 15510: lr=1.00E-05, loss= 1.1127 (max= 1.6043), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:12:11,014 - root - INFO - Step 15510: lr=1.00E-05, loss= 1.1127 (max= 1.6043), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:12:11,014 - root - INFO - Step 15510: lr=1.00E-05, loss= 1.1127 (max= 1.6043), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:12:11,015 - root - INFO - Step 15510: lr=1.00E-05, loss= 1.1127 (max= 1.6043), 
tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:12:11,015 - root - INFO - Step 15510: lr=1.00E-05, loss= 1.1127 (max= 1.6043), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:12:11,015 - root - INFO - Step 15510: lr=1.00E-05, loss= 1.1127 (max= 1.6043), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:12:42,847 - root - INFO - Step 15520: lr=1.00E-05, loss= 1.1251 (max= 1.5557), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:12:42,847 - root - INFO - Step 15520: lr=1.00E-05, loss= 1.1251 (max= 1.5557), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:12:42,847 - root - INFO - Step 15520: lr=1.00E-05, loss= 1.1251 (max= 1.5557), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:12:42,847 - root - INFO - Step 15520: lr=1.00E-05, loss= 1.1251 (max= 1.5557), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:12:42,847 - root - INFO - Step 15520: lr=1.00E-05, loss= 1.1251 (max= 1.5557), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:12:42,847 - root - INFO - Step 15520: lr=1.00E-05, loss= 1.1251 (max= 1.5557), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:12:42,847 - root - INFO - Step 15520: lr=1.00E-05, loss= 1.1251 (max= 1.5557), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:12:42,847 - root - INFO - Step 15520: lr=1.00E-05, loss= 1.1251 (max= 1.5557), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:13:14,676 - root - INFO - Step 
15530: lr=1.00E-05, loss= 1.1151 (max= 1.5316), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:13:14,676 - root - INFO - Step 15530: lr=1.00E-05, loss= 1.1151 (max= 1.5316), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:13:14,676 - root - INFO - Step 15530: lr=1.00E-05, loss= 1.1151 (max= 1.5316), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:13:14,676 - root - INFO - Step 15530: lr=1.00E-05, loss= 1.1151 (max= 1.5316), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:13:14,676 - root - INFO - Step 15530: lr=1.00E-05, loss= 1.1151 (max= 1.5316), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:13:14,676 - root - INFO - Step 15530: lr=1.00E-05, loss= 1.1151 (max= 1.5316), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:13:14,676 - root - INFO - Step 15530: lr=1.00E-05, loss= 1.1151 (max= 1.5316), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:13:14,676 - root - INFO - Step 15530: lr=1.00E-05, loss= 1.1151 (max= 1.5316), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:13:46,559 - root - INFO - Step 15540: lr=1.00E-05, loss= 1.1262 (max= 1.6150), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:13:46,559 - root - INFO - Step 15540: lr=1.00E-05, loss= 1.1262 (max= 1.6150), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:13:46,559 - root - INFO - Step 15540: lr=1.00E-05, loss= 1.1262 (max= 1.6150), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 00:13:46,559 - root - INFO - Step 15540: lr=1.00E-05, loss= 1.1262 (max= 1.6150), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:13:46,559 - root - INFO - Step 15540: lr=1.00E-05, loss= 1.1262 (max= 1.6150), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:13:46,559 - root - INFO - Step 15540: lr=1.00E-05, loss= 1.1262 (max= 1.6150), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:13:46,559 - root - INFO - Step 15540: lr=1.00E-05, loss= 1.1262 (max= 1.6150), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:13:46,559 - root - INFO - Step 15540: lr=1.00E-05, loss= 1.1262 (max= 1.6150), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:14:18,396 - root - INFO - Step 15550: lr=1.00E-05, loss= 1.0907 (max= 1.5192), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:14:18,396 - root - INFO - Step 15550: lr=1.00E-05, loss= 1.0907 (max= 1.5192), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:14:18,396 - root - INFO - Step 15550: lr=1.00E-05, loss= 1.0907 (max= 1.5192), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:14:18,396 - root - INFO - Step 15550: lr=1.00E-05, loss= 1.0907 (max= 1.5192), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:14:18,396 - root - INFO - Step 15550: lr=1.00E-05, loss= 1.0907 (max= 1.5192), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:14:18,396 - root - INFO - Step 15550: lr=1.00E-05, loss= 1.0907 (max= 1.5192), tps=20587, mfu=42.89%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:14:18,396 - root - INFO - Step 15550: lr=1.00E-05, loss= 1.0907 (max= 1.5192), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:14:18,396 - root - INFO - Step 15550: lr=1.00E-05, loss= 1.0907 (max= 1.5192), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:14:50,299 - root - INFO - Step 15560: lr=1.00E-05, loss= 1.1122 (max= 1.6175), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:14:50,299 - root - INFO - Step 15560: lr=1.00E-05, loss= 1.1122 (max= 1.6175), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:14:50,299 - root - INFO - Step 15560: lr=1.00E-05, loss= 1.1122 (max= 1.6175), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:14:50,299 - root - INFO - Step 15560: lr=1.00E-05, loss= 1.1122 (max= 1.6175), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:14:50,299 - root - INFO - Step 15560: lr=1.00E-05, loss= 1.1122 (max= 1.6175), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:14:50,299 - root - INFO - Step 15560: lr=1.00E-05, loss= 1.1122 (max= 1.6175), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:14:50,299 - root - INFO - Step 15560: lr=1.00E-05, loss= 1.1122 (max= 1.6175), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:14:50,300 - root - INFO - Step 15560: lr=1.00E-05, loss= 1.1122 (max= 1.6175), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:15:22,206 - root - INFO - Step 15570: lr=1.00E-05, loss= 1.1038 
(max= 1.5477), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:15:22,206 - root - INFO - Step 15570: lr=1.00E-05, loss= 1.1038 (max= 1.5477), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:15:22,206 - root - INFO - Step 15570: lr=1.00E-05, loss= 1.1038 (max= 1.5477), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:15:22,206 - root - INFO - Step 15570: lr=1.00E-05, loss= 1.1038 (max= 1.5477), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:15:22,206 - root - INFO - Step 15570: lr=1.00E-05, loss= 1.1038 (max= 1.5477), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:15:22,207 - root - INFO - Step 15570: lr=1.00E-05, loss= 1.1038 (max= 1.5477), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:15:22,207 - root - INFO - Step 15570: lr=1.00E-05, loss= 1.1038 (max= 1.5477), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:15:22,207 - root - INFO - Step 15570: lr=1.00E-05, loss= 1.1038 (max= 1.5477), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:15:54,051 - root - INFO - Step 15580: lr=1.00E-05, loss= 1.0949 (max= 1.5619), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:15:54,051 - root - INFO - Step 15580: lr=1.00E-05, loss= 1.0949 (max= 1.5619), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:15:54,051 - root - INFO - Step 15580: lr=1.00E-05, loss= 1.0949 (max= 1.5619), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:15:54,051 - root 
- INFO - Step 15580: lr=1.00E-05, loss= 1.0949 (max= 1.5619), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:15:54,051 - root - INFO - Step 15580: lr=1.00E-05, loss= 1.0949 (max= 1.5619), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:15:54,051 - root - INFO - Step 15580: lr=1.00E-05, loss= 1.0949 (max= 1.5619), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:15:54,051 - root - INFO - Step 15580: lr=1.00E-05, loss= 1.0949 (max= 1.5619), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:15:54,052 - root - INFO - Step 15580: lr=1.00E-05, loss= 1.0949 (max= 1.5619), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:16:25,879 - root - INFO - Step 15590: lr=1.00E-05, loss= 1.1067 (max= 1.5401), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:16:25,879 - root - INFO - Step 15590: lr=1.00E-05, loss= 1.1067 (max= 1.5401), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:16:25,879 - root - INFO - Step 15590: lr=1.00E-05, loss= 1.1067 (max= 1.5401), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:16:25,879 - root - INFO - Step 15590: lr=1.00E-05, loss= 1.1067 (max= 1.5401), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:16:25,879 - root - INFO - Step 15590: lr=1.00E-05, loss= 1.1067 (max= 1.5401), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:16:25,879 - root - INFO - Step 15590: lr=1.00E-05, loss= 1.1067 (max= 1.5401), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 00:16:25,879 - root - INFO - Step 15590: lr=1.00E-05, loss= 1.1067 (max= 1.5401), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:16:25,879 - root - INFO - Step 15590: lr=1.00E-05, loss= 1.1067 (max= 1.5401), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:16:57,779 - root - INFO - Step 15600: lr=1.00E-05, loss= 1.1235 (max= 1.8008), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:16:57,779 - root - INFO - Step 15600: lr=1.00E-05, loss= 1.1235 (max= 1.8008), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:16:57,779 - root - INFO - Step 15600: lr=1.00E-05, loss= 1.1235 (max= 1.8008), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:16:57,779 - root - INFO - Step 15600: lr=1.00E-05, loss= 1.1235 (max= 1.8008), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:16:57,779 - root - INFO - Step 15600: lr=1.00E-05, loss= 1.1235 (max= 1.8008), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:16:57,779 - root - INFO - Step 15600: lr=1.00E-05, loss= 1.1235 (max= 1.8008), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:16:57,779 - root - INFO - Step 15600: lr=1.00E-05, loss= 1.1235 (max= 1.8008), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:16:57,779 - root - INFO - Step 15600: lr=1.00E-05, loss= 1.1235 (max= 1.8008), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:17:29,672 - root - INFO - Step 15610: lr=1.00E-05, loss= 1.1189 (max= 1.6908), tps=20551, mfu=42.82%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:17:29,672 - root - INFO - Step 15610: lr=1.00E-05, loss= 1.1189 (max= 1.6908), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:17:29,672 - root - INFO - Step 15610: lr=1.00E-05, loss= 1.1189 (max= 1.6908), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:17:29,672 - root - INFO - Step 15610: lr=1.00E-05, loss= 1.1189 (max= 1.6908), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:17:29,672 - root - INFO - Step 15610: lr=1.00E-05, loss= 1.1189 (max= 1.6908), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:17:29,672 - root - INFO - Step 15610: lr=1.00E-05, loss= 1.1189 (max= 1.6908), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:17:29,672 - root - INFO - Step 15610: lr=1.00E-05, loss= 1.1189 (max= 1.6908), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:17:29,672 - root - INFO - Step 15610: lr=1.00E-05, loss= 1.1189 (max= 1.6908), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:18:01,496 - root - INFO - Step 15620: lr=1.00E-05, loss= 1.0769 (max= 1.6424), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:18:01,496 - root - INFO - Step 15620: lr=1.00E-05, loss= 1.0769 (max= 1.6424), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:18:01,496 - root - INFO - Step 15620: lr=1.00E-05, loss= 1.0769 (max= 1.6424), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:18:01,496 - root - INFO - Step 15620: lr=1.00E-05, 
loss= 1.0769 (max= 1.6424), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:18:01,496 - root - INFO - Step 15620: lr=1.00E-05, loss= 1.0769 (max= 1.6424), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:18:01,496 - root - INFO - Step 15620: lr=1.00E-05, loss= 1.0769 (max= 1.6424), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:18:01,496 - root - INFO - Step 15620: lr=1.00E-05, loss= 1.0769 (max= 1.6424), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:18:01,496 - root - INFO - Step 15620: lr=1.00E-05, loss= 1.0769 (max= 1.6424), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:18:33,370 - root - INFO - Step 15630: lr=1.00E-05, loss= 1.1077 (max= 1.5622), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:18:33,370 - root - INFO - Step 15630: lr=1.00E-05, loss= 1.1077 (max= 1.5622), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:18:33,370 - root - INFO - Step 15630: lr=1.00E-05, loss= 1.1077 (max= 1.5622), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:18:33,370 - root - INFO - Step 15630: lr=1.00E-05, loss= 1.1077 (max= 1.5622), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:18:33,370 - root - INFO - Step 15630: lr=1.00E-05, loss= 1.1077 (max= 1.5622), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:18:33,370 - root - INFO - Step 15630: lr=1.00E-05, loss= 1.1077 (max= 1.5622), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
00:18:33,370 - root - INFO - Step 15630: lr=1.00E-05, loss= 1.1077 (max= 1.5622), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:18:33,370 - root - INFO - Step 15630: lr=1.00E-05, loss= 1.1077 (max= 1.5622), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:19:05,195 - root - INFO - Step 15640: lr=1.00E-05, loss= 1.0715 (max= 1.5131), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:19:05,195 - root - INFO - Step 15640: lr=1.00E-05, loss= 1.0715 (max= 1.5131), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:19:05,195 - root - INFO - Step 15640: lr=1.00E-05, loss= 1.0715 (max= 1.5131), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:19:05,196 - root - INFO - Step 15640: lr=1.00E-05, loss= 1.0715 (max= 1.5131), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:19:05,196 - root - INFO - Step 15640: lr=1.00E-05, loss= 1.0715 (max= 1.5131), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:19:05,196 - root - INFO - Step 15640: lr=1.00E-05, loss= 1.0715 (max= 1.5131), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:19:05,196 - root - INFO - Step 15640: lr=1.00E-05, loss= 1.0715 (max= 1.5131), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:19:05,196 - root - INFO - Step 15640: lr=1.00E-05, loss= 1.0715 (max= 1.5131), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:19:37,045 - root - INFO - Step 15650: lr=1.00E-05, loss= 1.1057 (max= 1.4828), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:19:37,045 - root - INFO - Step 15650: lr=1.00E-05, loss= 1.1057 (max= 1.4828), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:19:37,045 - root - INFO - Step 15650: lr=1.00E-05, loss= 1.1057 (max= 1.4828), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:19:37,045 - root - INFO - Step 15650: lr=1.00E-05, loss= 1.1057 (max= 1.4828), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:19:37,045 - root - INFO - Step 15650: lr=1.00E-05, loss= 1.1057 (max= 1.4828), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:19:37,045 - root - INFO - Step 15650: lr=1.00E-05, loss= 1.1057 (max= 1.4828), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:19:37,045 - root - INFO - Step 15650: lr=1.00E-05, loss= 1.1057 (max= 1.4828), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:19:37,045 - root - INFO - Step 15650: lr=1.00E-05, loss= 1.1057 (max= 1.4828), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:20:08,850 - root - INFO - Step 15660: lr=1.00E-05, loss= 1.0975 (max= 1.5573), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:20:08,850 - root - INFO - Step 15660: lr=1.00E-05, loss= 1.0975 (max= 1.5573), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:20:08,850 - root - INFO - Step 15660: lr=1.00E-05, loss= 1.0975 (max= 1.5573), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:20:08,850 - root - INFO - Step 15660: lr=1.00E-05, loss= 1.0975 (max= 1.5573), 
tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:20:08,850 - root - INFO - Step 15660: lr=1.00E-05, loss= 1.0975 (max= 1.5573), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:20:08,850 - root - INFO - Step 15660: lr=1.00E-05, loss= 1.0975 (max= 1.5573), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:20:08,850 - root - INFO - Step 15660: lr=1.00E-05, loss= 1.0975 (max= 1.5573), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:20:08,850 - root - INFO - Step 15660: lr=1.00E-05, loss= 1.0975 (max= 1.5573), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:20:40,636 - root - INFO - Step 15670: lr=1.00E-05, loss= 1.0959 (max= 1.4735), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:20:40,636 - root - INFO - Step 15670: lr=1.00E-05, loss= 1.0959 (max= 1.4735), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:20:40,636 - root - INFO - Step 15670: lr=1.00E-05, loss= 1.0959 (max= 1.4735), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:20:40,636 - root - INFO - Step 15670: lr=1.00E-05, loss= 1.0959 (max= 1.4735), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:20:40,636 - root - INFO - Step 15670: lr=1.00E-05, loss= 1.0959 (max= 1.4735), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:20:40,636 - root - INFO - Step 15670: lr=1.00E-05, loss= 1.0959 (max= 1.4735), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:20:40,636 - root - INFO - Step 
15670: lr=1.00E-05, loss= 1.0959 (max= 1.4735), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:20:40,636 - root - INFO - Step 15670: lr=1.00E-05, loss= 1.0959 (max= 1.4735), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:21:12,493 - root - INFO - Step 15680: lr=1.00E-05, loss= 1.0857 (max= 1.5687), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:21:12,493 - root - INFO - Step 15680: lr=1.00E-05, loss= 1.0857 (max= 1.5687), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:21:12,493 - root - INFO - Step 15680: lr=1.00E-05, loss= 1.0857 (max= 1.5687), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:21:12,493 - root - INFO - Step 15680: lr=1.00E-05, loss= 1.0857 (max= 1.5687), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:21:12,493 - root - INFO - Step 15680: lr=1.00E-05, loss= 1.0857 (max= 1.5687), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:21:12,493 - root - INFO - Step 15680: lr=1.00E-05, loss= 1.0857 (max= 1.5687), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:21:12,493 - root - INFO - Step 15680: lr=1.00E-05, loss= 1.0857 (max= 1.5687), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:21:12,493 - root - INFO - Step 15680: lr=1.00E-05, loss= 1.0857 (max= 1.5687), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:21:44,422 - root - INFO - Step 15690: lr=1.00E-05, loss= 1.0906 (max= 1.6029), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 00:21:44,422 - root - INFO - Step 15690: lr=1.00E-05, loss= 1.0906 (max= 1.6029), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:21:44,422 - root - INFO - Step 15690: lr=1.00E-05, loss= 1.0906 (max= 1.6029), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:21:44,422 - root - INFO - Step 15690: lr=1.00E-05, loss= 1.0906 (max= 1.6029), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:21:44,422 - root - INFO - Step 15690: lr=1.00E-05, loss= 1.0906 (max= 1.6029), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:21:44,422 - root - INFO - Step 15690: lr=1.00E-05, loss= 1.0906 (max= 1.6029), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:21:44,422 - root - INFO - Step 15690: lr=1.00E-05, loss= 1.0906 (max= 1.6029), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:21:44,422 - root - INFO - Step 15690: lr=1.00E-05, loss= 1.0906 (max= 1.6029), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:22:16,278 - root - INFO - Step 15700: lr=1.00E-05, loss= 1.0841 (max= 1.6251), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:22:16,278 - root - INFO - Step 15700: lr=1.00E-05, loss= 1.0841 (max= 1.6251), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:22:16,278 - root - INFO - Step 15700: lr=1.00E-05, loss= 1.0841 (max= 1.6251), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:22:16,278 - root - INFO - Step 15700: lr=1.00E-05, loss= 1.0841 (max= 1.6251), tps=20575, mfu=42.87%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:22:16,278 - root - INFO - Step 15700: lr=1.00E-05, loss= 1.0841 (max= 1.6251), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:22:16,278 - root - INFO - Step 15700: lr=1.00E-05, loss= 1.0841 (max= 1.6251), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:22:16,278 - root - INFO - Step 15700: lr=1.00E-05, loss= 1.0841 (max= 1.6251), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:22:16,278 - root - INFO - Step 15700: lr=1.00E-05, loss= 1.0841 (max= 1.6251), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:22:48,121 - root - INFO - Step 15710: lr=1.00E-05, loss= 1.0882 (max= 1.6040), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:22:48,121 - root - INFO - Step 15710: lr=1.00E-05, loss= 1.0882 (max= 1.6040), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:22:48,121 - root - INFO - Step 15710: lr=1.00E-05, loss= 1.0882 (max= 1.6040), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:22:48,121 - root - INFO - Step 15710: lr=1.00E-05, loss= 1.0882 (max= 1.6040), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:22:48,121 - root - INFO - Step 15710: lr=1.00E-05, loss= 1.0882 (max= 1.6040), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:22:48,121 - root - INFO - Step 15710: lr=1.00E-05, loss= 1.0882 (max= 1.6040), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:22:48,121 - root - INFO - Step 15710: lr=1.00E-05, loss= 1.0882 
(max= 1.6040), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:22:48,121 - root - INFO - Step 15710: lr=1.00E-05, loss= 1.0882 (max= 1.6040), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:23:19,954 - root - INFO - Step 15720: lr=1.00E-05, loss= 1.0740 (max= 1.5273), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:23:19,954 - root - INFO - Step 15720: lr=1.00E-05, loss= 1.0740 (max= 1.5273), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:23:19,954 - root - INFO - Step 15720: lr=1.00E-05, loss= 1.0740 (max= 1.5273), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:23:19,954 - root - INFO - Step 15720: lr=1.00E-05, loss= 1.0740 (max= 1.5273), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:23:19,954 - root - INFO - Step 15720: lr=1.00E-05, loss= 1.0740 (max= 1.5273), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:23:19,954 - root - INFO - Step 15720: lr=1.00E-05, loss= 1.0740 (max= 1.5273), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:23:19,954 - root - INFO - Step 15720: lr=1.00E-05, loss= 1.0740 (max= 1.5273), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:23:19,954 - root - INFO - Step 15720: lr=1.00E-05, loss= 1.0740 (max= 1.5273), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:23:33,303 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:7112796 +2025-10-26 00:23:51,825 - root - 
INFO - Step 15730: lr=1.00E-05, loss= 1.1048 (max= 1.5109), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:23:51,825 - root - INFO - Step 15730: lr=1.00E-05, loss= 1.1048 (max= 1.5109), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:23:51,825 - root - INFO - Step 15730: lr=1.00E-05, loss= 1.1048 (max= 1.5109), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:23:51,825 - root - INFO - Step 15730: lr=1.00E-05, loss= 1.1048 (max= 1.5109), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:23:51,825 - root - INFO - Step 15730: lr=1.00E-05, loss= 1.1048 (max= 1.5109), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:23:51,825 - root - INFO - Step 15730: lr=1.00E-05, loss= 1.1048 (max= 1.5109), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:23:51,825 - root - INFO - Step 15730: lr=1.00E-05, loss= 1.1048 (max= 1.5109), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:23:51,825 - root - INFO - Step 15730: lr=1.00E-05, loss= 1.1048 (max= 1.5109), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:24:23,581 - root - INFO - Step 15740: lr=1.00E-05, loss= 1.1096 (max= 1.5937), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:24:23,581 - root - INFO - Step 15740: lr=1.00E-05, loss= 1.1096 (max= 1.5937), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:24:23,581 - root - INFO - Step 15740: lr=1.00E-05, loss= 1.1096 (max= 1.5937), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 00:24:23,581 - root - INFO - Step 15740: lr=1.00E-05, loss= 1.1096 (max= 1.5937), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:24:23,581 - root - INFO - Step 15740: lr=1.00E-05, loss= 1.1096 (max= 1.5937), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:24:23,581 - root - INFO - Step 15740: lr=1.00E-05, loss= 1.1096 (max= 1.5937), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:24:23,581 - root - INFO - Step 15740: lr=1.00E-05, loss= 1.1096 (max= 1.5937), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:24:23,581 - root - INFO - Step 15740: lr=1.00E-05, loss= 1.1096 (max= 1.5937), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:24:55,433 - root - INFO - Step 15750: lr=1.00E-05, loss= 1.1161 (max= 1.4889), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:24:55,433 - root - INFO - Step 15750: lr=1.00E-05, loss= 1.1161 (max= 1.4889), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:24:55,433 - root - INFO - Step 15750: lr=1.00E-05, loss= 1.1161 (max= 1.4889), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:24:55,433 - root - INFO - Step 15750: lr=1.00E-05, loss= 1.1161 (max= 1.4889), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:24:55,433 - root - INFO - Step 15750: lr=1.00E-05, loss= 1.1161 (max= 1.4889), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:24:55,433 - root - INFO - Step 15750: lr=1.00E-05, loss= 1.1161 (max= 1.4889), tps=20577, mfu=42.87%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:24:55,433 - root - INFO - Step 15750: lr=1.00E-05, loss= 1.1161 (max= 1.4889), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:24:55,433 - root - INFO - Step 15750: lr=1.00E-05, loss= 1.1161 (max= 1.4889), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:25:27,210 - root - INFO - Step 15760: lr=1.00E-05, loss= 1.0880 (max= 1.5777), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:25:27,210 - root - INFO - Step 15760: lr=1.00E-05, loss= 1.0880 (max= 1.5777), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:25:27,210 - root - INFO - Step 15760: lr=1.00E-05, loss= 1.0880 (max= 1.5777), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:25:27,210 - root - INFO - Step 15760: lr=1.00E-05, loss= 1.0880 (max= 1.5777), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:25:27,210 - root - INFO - Step 15760: lr=1.00E-05, loss= 1.0880 (max= 1.5777), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:25:27,210 - root - INFO - Step 15760: lr=1.00E-05, loss= 1.0880 (max= 1.5777), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:25:27,210 - root - INFO - Step 15760: lr=1.00E-05, loss= 1.0880 (max= 1.5777), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:25:27,210 - root - INFO - Step 15760: lr=1.00E-05, loss= 1.0880 (max= 1.5777), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:25:59,031 - root - INFO - Step 15770: lr=1.00E-05, 
loss= 1.0750 (max= 1.5831), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:25:59,031 - root - INFO - Step 15770: lr=1.00E-05, loss= 1.0750 (max= 1.5831), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:25:59,031 - root - INFO - Step 15770: lr=1.00E-05, loss= 1.0750 (max= 1.5831), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:25:59,031 - root - INFO - Step 15770: lr=1.00E-05, loss= 1.0750 (max= 1.5831), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:25:59,031 - root - INFO - Step 15770: lr=1.00E-05, loss= 1.0750 (max= 1.5831), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:25:59,031 - root - INFO - Step 15770: lr=1.00E-05, loss= 1.0750 (max= 1.5831), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:25:59,031 - root - INFO - Step 15770: lr=1.00E-05, loss= 1.0750 (max= 1.5831), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:25:59,032 - root - INFO - Step 15770: lr=1.00E-05, loss= 1.0750 (max= 1.5831), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:26:30,887 - root - INFO - Step 15780: lr=1.00E-05, loss= 1.0949 (max= 1.5740), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:26:30,887 - root - INFO - Step 15780: lr=1.00E-05, loss= 1.0949 (max= 1.5740), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:26:30,887 - root - INFO - Step 15780: lr=1.00E-05, loss= 1.0949 (max= 1.5740), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
00:26:30,887 - root - INFO - Step 15780: lr=1.00E-05, loss= 1.0949 (max= 1.5740), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:26:30,887 - root - INFO - Step 15780: lr=1.00E-05, loss= 1.0949 (max= 1.5740), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:26:30,887 - root - INFO - Step 15780: lr=1.00E-05, loss= 1.0949 (max= 1.5740), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:26:30,887 - root - INFO - Step 15780: lr=1.00E-05, loss= 1.0949 (max= 1.5740), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:26:30,887 - root - INFO - Step 15780: lr=1.00E-05, loss= 1.0949 (max= 1.5740), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:27:02,703 - root - INFO - Step 15790: lr=1.00E-05, loss= 1.0503 (max= 1.5775), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:27:02,703 - root - INFO - Step 15790: lr=1.00E-05, loss= 1.0503 (max= 1.5775), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:27:02,703 - root - INFO - Step 15790: lr=1.00E-05, loss= 1.0503 (max= 1.5775), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:27:02,703 - root - INFO - Step 15790: lr=1.00E-05, loss= 1.0503 (max= 1.5775), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:27:02,703 - root - INFO - Step 15790: lr=1.00E-05, loss= 1.0503 (max= 1.5775), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:27:02,703 - root - INFO - Step 15790: lr=1.00E-05, loss= 1.0503 (max= 1.5775), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:27:02,703 - root - INFO - Step 15790: lr=1.00E-05, loss= 1.0503 (max= 1.5775), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:27:02,703 - root - INFO - Step 15790: lr=1.00E-05, loss= 1.0503 (max= 1.5775), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:27:34,559 - root - INFO - Step 15800: lr=1.00E-05, loss= 1.1002 (max= 1.5394), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:27:34,559 - root - INFO - Step 15800: lr=1.00E-05, loss= 1.1002 (max= 1.5394), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:27:34,559 - root - INFO - Step 15800: lr=1.00E-05, loss= 1.1002 (max= 1.5394), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:27:34,559 - root - INFO - Step 15800: lr=1.00E-05, loss= 1.1002 (max= 1.5394), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:27:34,559 - root - INFO - Step 15800: lr=1.00E-05, loss= 1.1002 (max= 1.5394), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:27:34,559 - root - INFO - Step 15800: lr=1.00E-05, loss= 1.1002 (max= 1.5394), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:27:34,559 - root - INFO - Step 15800: lr=1.00E-05, loss= 1.1002 (max= 1.5394), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:27:34,560 - root - INFO - Step 15800: lr=1.00E-05, loss= 1.1002 (max= 1.5394), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:28:06,394 - root - INFO - Step 15810: lr=1.00E-05, loss= 1.1047 (max= 1.6232), 
tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:28:06,394 - root - INFO - Step 15810: lr=1.00E-05, loss= 1.1047 (max= 1.6232), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:28:06,394 - root - INFO - Step 15810: lr=1.00E-05, loss= 1.1047 (max= 1.6232), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:28:06,394 - root - INFO - Step 15810: lr=1.00E-05, loss= 1.1047 (max= 1.6232), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:28:06,394 - root - INFO - Step 15810: lr=1.00E-05, loss= 1.1047 (max= 1.6232), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:28:06,394 - root - INFO - Step 15810: lr=1.00E-05, loss= 1.1047 (max= 1.6232), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:28:06,394 - root - INFO - Step 15810: lr=1.00E-05, loss= 1.1047 (max= 1.6232), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:28:06,394 - root - INFO - Step 15810: lr=1.00E-05, loss= 1.1047 (max= 1.6232), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:28:38,258 - root - INFO - Step 15820: lr=1.00E-05, loss= 1.1076 (max= 1.6227), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:28:38,258 - root - INFO - Step 15820: lr=1.00E-05, loss= 1.1076 (max= 1.6227), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:28:38,258 - root - INFO - Step 15820: lr=1.00E-05, loss= 1.1076 (max= 1.6227), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:28:38,258 - root - INFO - Step 
15820: lr=1.00E-05, loss= 1.1076 (max= 1.6227), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:28:38,258 - root - INFO - Step 15820: lr=1.00E-05, loss= 1.1076 (max= 1.6227), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:28:38,258 - root - INFO - Step 15820: lr=1.00E-05, loss= 1.1076 (max= 1.6227), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:28:38,259 - root - INFO - Step 15820: lr=1.00E-05, loss= 1.1076 (max= 1.6227), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:28:38,259 - root - INFO - Step 15820: lr=1.00E-05, loss= 1.1076 (max= 1.6227), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:29:10,033 - root - INFO - Step 15830: lr=1.00E-05, loss= 1.0932 (max= 1.5043), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:29:10,033 - root - INFO - Step 15830: lr=1.00E-05, loss= 1.0932 (max= 1.5043), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:29:10,033 - root - INFO - Step 15830: lr=1.00E-05, loss= 1.0932 (max= 1.5043), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:29:10,033 - root - INFO - Step 15830: lr=1.00E-05, loss= 1.0932 (max= 1.5043), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:29:10,033 - root - INFO - Step 15830: lr=1.00E-05, loss= 1.0932 (max= 1.5043), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:29:10,033 - root - INFO - Step 15830: lr=1.00E-05, loss= 1.0932 (max= 1.5043), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 00:29:10,033 - root - INFO - Step 15830: lr=1.00E-05, loss= 1.0932 (max= 1.5043), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:29:10,033 - root - INFO - Step 15830: lr=1.00E-05, loss= 1.0932 (max= 1.5043), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:29:41,903 - root - INFO - Step 15840: lr=1.00E-05, loss= 1.1129 (max= 1.6317), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:29:41,903 - root - INFO - Step 15840: lr=1.00E-05, loss= 1.1129 (max= 1.6317), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:29:41,903 - root - INFO - Step 15840: lr=1.00E-05, loss= 1.1129 (max= 1.6317), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:29:41,903 - root - INFO - Step 15840: lr=1.00E-05, loss= 1.1129 (max= 1.6317), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:29:41,903 - root - INFO - Step 15840: lr=1.00E-05, loss= 1.1129 (max= 1.6317), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:29:41,903 - root - INFO - Step 15840: lr=1.00E-05, loss= 1.1129 (max= 1.6317), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:29:41,903 - root - INFO - Step 15840: lr=1.00E-05, loss= 1.1129 (max= 1.6317), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:29:41,903 - root - INFO - Step 15840: lr=1.00E-05, loss= 1.1129 (max= 1.6317), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:30:13,746 - root - INFO - Step 15850: lr=1.00E-05, loss= 1.1126 (max= 1.5455), tps=20583, mfu=42.89%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:30:13,746 - root - INFO - Step 15850: lr=1.00E-05, loss= 1.1126 (max= 1.5455), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:30:13,746 - root - INFO - Step 15850: lr=1.00E-05, loss= 1.1126 (max= 1.5455), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:30:13,746 - root - INFO - Step 15850: lr=1.00E-05, loss= 1.1126 (max= 1.5455), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:30:13,746 - root - INFO - Step 15850: lr=1.00E-05, loss= 1.1126 (max= 1.5455), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:30:13,746 - root - INFO - Step 15850: lr=1.00E-05, loss= 1.1126 (max= 1.5455), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:30:13,746 - root - INFO - Step 15850: lr=1.00E-05, loss= 1.1126 (max= 1.5455), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:30:13,746 - root - INFO - Step 15850: lr=1.00E-05, loss= 1.1126 (max= 1.5455), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:30:45,563 - root - INFO - Step 15860: lr=1.00E-05, loss= 1.0986 (max= 1.5769), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:30:45,563 - root - INFO - Step 15860: lr=1.00E-05, loss= 1.0986 (max= 1.5769), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:30:45,563 - root - INFO - Step 15860: lr=1.00E-05, loss= 1.0986 (max= 1.5769), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:30:45,563 - root - INFO - Step 15860: lr=1.00E-05, loss= 1.0986 
(max= 1.5769), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:30:45,563 - root - INFO - Step 15860: lr=1.00E-05, loss= 1.0986 (max= 1.5769), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:30:45,564 - root - INFO - Step 15860: lr=1.00E-05, loss= 1.0986 (max= 1.5769), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:30:45,564 - root - INFO - Step 15860: lr=1.00E-05, loss= 1.0986 (max= 1.5769), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:30:45,564 - root - INFO - Step 15860: lr=1.00E-05, loss= 1.0986 (max= 1.5769), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:31:17,453 - root - INFO - Step 15870: lr=1.00E-05, loss= 1.1179 (max= 1.6446), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:31:17,453 - root - INFO - Step 15870: lr=1.00E-05, loss= 1.1179 (max= 1.6446), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:31:17,453 - root - INFO - Step 15870: lr=1.00E-05, loss= 1.1179 (max= 1.6446), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:31:17,453 - root - INFO - Step 15870: lr=1.00E-05, loss= 1.1179 (max= 1.6446), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:31:17,453 - root - INFO - Step 15870: lr=1.00E-05, loss= 1.1179 (max= 1.6446), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:31:17,454 - root - INFO - Step 15870: lr=1.00E-05, loss= 1.1179 (max= 1.6446), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:31:17,454 - root 
- INFO - Step 15870: lr=1.00E-05, loss= 1.1179 (max= 1.6446), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:31:17,454 - root - INFO - Step 15870: lr=1.00E-05, loss= 1.1179 (max= 1.6446), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:31:49,324 - root - INFO - Step 15880: lr=1.00E-05, loss= 1.1131 (max= 1.5091), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:31:49,324 - root - INFO - Step 15880: lr=1.00E-05, loss= 1.1131 (max= 1.5091), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:31:49,324 - root - INFO - Step 15880: lr=1.00E-05, loss= 1.1131 (max= 1.5091), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:31:49,324 - root - INFO - Step 15880: lr=1.00E-05, loss= 1.1131 (max= 1.5091), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:31:49,324 - root - INFO - Step 15880: lr=1.00E-05, loss= 1.1131 (max= 1.5091), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:31:49,324 - root - INFO - Step 15880: lr=1.00E-05, loss= 1.1131 (max= 1.5091), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:31:49,324 - root - INFO - Step 15880: lr=1.00E-05, loss= 1.1131 (max= 1.5091), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:31:49,324 - root - INFO - Step 15880: lr=1.00E-05, loss= 1.1131 (max= 1.5091), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:32:21,141 - root - INFO - Step 15890: lr=1.00E-05, loss= 1.1068 (max= 1.5936), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 00:32:21,141 - root - INFO - Step 15890: lr=1.00E-05, loss= 1.1068 (max= 1.5936), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:32:21,141 - root - INFO - Step 15890: lr=1.00E-05, loss= 1.1068 (max= 1.5936), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:32:21,141 - root - INFO - Step 15890: lr=1.00E-05, loss= 1.1068 (max= 1.5936), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:32:21,141 - root - INFO - Step 15890: lr=1.00E-05, loss= 1.1068 (max= 1.5936), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:32:21,141 - root - INFO - Step 15890: lr=1.00E-05, loss= 1.1068 (max= 1.5936), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:32:21,141 - root - INFO - Step 15890: lr=1.00E-05, loss= 1.1068 (max= 1.5936), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:32:21,141 - root - INFO - Step 15890: lr=1.00E-05, loss= 1.1068 (max= 1.5936), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:32:52,930 - root - INFO - Step 15900: lr=1.00E-05, loss= 1.0861 (max= 1.6380), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:32:52,930 - root - INFO - Step 15900: lr=1.00E-05, loss= 1.0861 (max= 1.6380), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:32:52,930 - root - INFO - Step 15900: lr=1.00E-05, loss= 1.0861 (max= 1.6380), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:32:52,930 - root - INFO - Step 15900: lr=1.00E-05, loss= 1.0861 (max= 1.6380), tps=20618, mfu=42.96%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:32:52,930 - root - INFO - Step 15900: lr=1.00E-05, loss= 1.0861 (max= 1.6380), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:32:52,930 - root - INFO - Step 15900: lr=1.00E-05, loss= 1.0861 (max= 1.6380), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:32:52,930 - root - INFO - Step 15900: lr=1.00E-05, loss= 1.0861 (max= 1.6380), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:32:52,930 - root - INFO - Step 15900: lr=1.00E-05, loss= 1.0861 (max= 1.6380), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:33:24,801 - root - INFO - Step 15910: lr=1.00E-05, loss= 1.0893 (max= 1.5589), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:33:24,801 - root - INFO - Step 15910: lr=1.00E-05, loss= 1.0893 (max= 1.5589), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:33:24,801 - root - INFO - Step 15910: lr=1.00E-05, loss= 1.0893 (max= 1.5589), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:33:24,801 - root - INFO - Step 15910: lr=1.00E-05, loss= 1.0893 (max= 1.5589), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:33:24,801 - root - INFO - Step 15910: lr=1.00E-05, loss= 1.0893 (max= 1.5589), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:33:24,801 - root - INFO - Step 15910: lr=1.00E-05, loss= 1.0893 (max= 1.5589), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:33:24,801 - root - INFO - Step 15910: lr=1.00E-05, 
loss= 1.0893 (max= 1.5589), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:33:24,801 - root - INFO - Step 15910: lr=1.00E-05, loss= 1.0893 (max= 1.5589), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:33:56,702 - root - INFO - Step 15920: lr=1.00E-05, loss= 1.0991 (max= 1.5214), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:33:56,702 - root - INFO - Step 15920: lr=1.00E-05, loss= 1.0991 (max= 1.5214), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:33:56,702 - root - INFO - Step 15920: lr=1.00E-05, loss= 1.0991 (max= 1.5214), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:33:56,703 - root - INFO - Step 15920: lr=1.00E-05, loss= 1.0991 (max= 1.5214), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:33:56,703 - root - INFO - Step 15920: lr=1.00E-05, loss= 1.0991 (max= 1.5214), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:33:56,703 - root - INFO - Step 15920: lr=1.00E-05, loss= 1.0991 (max= 1.5214), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:33:56,703 - root - INFO - Step 15920: lr=1.00E-05, loss= 1.0991 (max= 1.5214), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:33:56,703 - root - INFO - Step 15920: lr=1.00E-05, loss= 1.0991 (max= 1.5214), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:34:28,581 - root - INFO - Step 15930: lr=1.00E-05, loss= 1.0910 (max= 1.5074), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
00:34:28,581 - root - INFO - Step 15930: lr=1.00E-05, loss= 1.0910 (max= 1.5074), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:34:28,581 - root - INFO - Step 15930: lr=1.00E-05, loss= 1.0910 (max= 1.5074), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:34:28,581 - root - INFO - Step 15930: lr=1.00E-05, loss= 1.0910 (max= 1.5074), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:34:28,581 - root - INFO - Step 15930: lr=1.00E-05, loss= 1.0910 (max= 1.5074), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:34:28,581 - root - INFO - Step 15930: lr=1.00E-05, loss= 1.0910 (max= 1.5074), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:34:28,581 - root - INFO - Step 15930: lr=1.00E-05, loss= 1.0910 (max= 1.5074), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:34:28,581 - root - INFO - Step 15930: lr=1.00E-05, loss= 1.0910 (max= 1.5074), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:35:00,417 - root - INFO - Step 15940: lr=1.00E-05, loss= 1.0957 (max= 1.6296), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:35:00,417 - root - INFO - Step 15940: lr=1.00E-05, loss= 1.0957 (max= 1.6296), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:35:00,417 - root - INFO - Step 15940: lr=1.00E-05, loss= 1.0957 (max= 1.6296), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:35:00,417 - root - INFO - Step 15940: lr=1.00E-05, loss= 1.0957 (max= 1.6296), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:35:00,417 - root - INFO - Step 15940: lr=1.00E-05, loss= 1.0957 (max= 1.6296), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:35:00,417 - root - INFO - Step 15940: lr=1.00E-05, loss= 1.0957 (max= 1.6296), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:35:00,418 - root - INFO - Step 15940: lr=1.00E-05, loss= 1.0957 (max= 1.6296), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:35:00,418 - root - INFO - Step 15940: lr=1.00E-05, loss= 1.0957 (max= 1.6296), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:35:32,299 - root - INFO - Step 15950: lr=1.00E-05, loss= 1.0917 (max= 1.6634), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:35:32,299 - root - INFO - Step 15950: lr=1.00E-05, loss= 1.0917 (max= 1.6634), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:35:32,299 - root - INFO - Step 15950: lr=1.00E-05, loss= 1.0917 (max= 1.6634), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:35:32,299 - root - INFO - Step 15950: lr=1.00E-05, loss= 1.0917 (max= 1.6634), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:35:32,299 - root - INFO - Step 15950: lr=1.00E-05, loss= 1.0917 (max= 1.6634), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:35:32,299 - root - INFO - Step 15950: lr=1.00E-05, loss= 1.0917 (max= 1.6634), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:35:32,299 - root - INFO - Step 15950: lr=1.00E-05, loss= 1.0917 (max= 1.6634), 
tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:35:32,300 - root - INFO - Step 15950: lr=1.00E-05, loss= 1.0917 (max= 1.6634), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:36:04,172 - root - INFO - Step 15960: lr=1.00E-05, loss= 1.1145 (max= 1.5958), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:36:04,172 - root - INFO - Step 15960: lr=1.00E-05, loss= 1.1145 (max= 1.5958), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:36:04,173 - root - INFO - Step 15960: lr=1.00E-05, loss= 1.1145 (max= 1.5958), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:36:04,173 - root - INFO - Step 15960: lr=1.00E-05, loss= 1.1145 (max= 1.5958), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:36:04,173 - root - INFO - Step 15960: lr=1.00E-05, loss= 1.1145 (max= 1.5958), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:36:04,173 - root - INFO - Step 15960: lr=1.00E-05, loss= 1.1145 (max= 1.5958), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:36:04,173 - root - INFO - Step 15960: lr=1.00E-05, loss= 1.1145 (max= 1.5958), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:36:04,173 - root - INFO - Step 15960: lr=1.00E-05, loss= 1.1145 (max= 1.5958), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:36:35,975 - root - INFO - Step 15970: lr=1.00E-05, loss= 1.0806 (max= 1.6208), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:36:35,975 - root - INFO - Step 
15970: lr=1.00E-05, loss= 1.0806 (max= 1.6208), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:36:35,975 - root - INFO - Step 15970: lr=1.00E-05, loss= 1.0806 (max= 1.6208), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:36:35,975 - root - INFO - Step 15970: lr=1.00E-05, loss= 1.0806 (max= 1.6208), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:36:35,975 - root - INFO - Step 15970: lr=1.00E-05, loss= 1.0806 (max= 1.6208), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:36:35,975 - root - INFO - Step 15970: lr=1.00E-05, loss= 1.0806 (max= 1.6208), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:36:35,975 - root - INFO - Step 15970: lr=1.00E-05, loss= 1.0806 (max= 1.6208), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:36:35,975 - root - INFO - Step 15970: lr=1.00E-05, loss= 1.0806 (max= 1.6208), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:37:07,808 - root - INFO - Step 15980: lr=1.00E-05, loss= 1.0923 (max= 1.5681), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:37:07,808 - root - INFO - Step 15980: lr=1.00E-05, loss= 1.0923 (max= 1.5681), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:37:07,808 - root - INFO - Step 15980: lr=1.00E-05, loss= 1.0923 (max= 1.5681), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:37:07,808 - root - INFO - Step 15980: lr=1.00E-05, loss= 1.0923 (max= 1.5681), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 00:37:07,808 - root - INFO - Step 15980: lr=1.00E-05, loss= 1.0923 (max= 1.5681), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:37:07,808 - root - INFO - Step 15980: lr=1.00E-05, loss= 1.0923 (max= 1.5681), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:37:07,808 - root - INFO - Step 15980: lr=1.00E-05, loss= 1.0923 (max= 1.5681), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:37:07,808 - root - INFO - Step 15980: lr=1.00E-05, loss= 1.0923 (max= 1.5681), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:37:39,636 - root - INFO - Step 15990: lr=1.00E-05, loss= 1.0793 (max= 1.5391), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:37:39,636 - root - INFO - Step 15990: lr=1.00E-05, loss= 1.0793 (max= 1.5391), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:37:39,636 - root - INFO - Step 15990: lr=1.00E-05, loss= 1.0793 (max= 1.5391), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:37:39,636 - root - INFO - Step 15990: lr=1.00E-05, loss= 1.0793 (max= 1.5391), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:37:39,636 - root - INFO - Step 15990: lr=1.00E-05, loss= 1.0793 (max= 1.5391), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:37:39,636 - root - INFO - Step 15990: lr=1.00E-05, loss= 1.0793 (max= 1.5391), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:37:39,636 - root - INFO - Step 15990: lr=1.00E-05, loss= 1.0793 (max= 1.5391), tps=20593, mfu=42.91%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:37:39,637 - root - INFO - Step 15990: lr=1.00E-05, loss= 1.0793 (max= 1.5391), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-pt/checkpoints/dataloader/step-16000 +Dataset successfully saved to jobs/munin-7b-open-pt/checkpoints/dataloader/step-16000! Save time: 4.554891109466553 +2025-10-26 00:38:11,532 - root - INFO - Step 16000: lr=1.00E-05, loss= 1.0971 (max= 1.5539), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:38:11,532 - root - INFO - Step 16000: lr=1.00E-05, loss= 1.0971 (max= 1.5539), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:38:11,532 - root - INFO - Step 16000: lr=1.00E-05, loss= 1.0971 (max= 1.5539), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:38:11,532 - root - INFO - Saving a full checkpoint at step 16000 +2025-10-26 00:38:11,532 - root - INFO - Saving a full checkpoint at step 16000 +2025-10-26 00:38:11,532 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 00:38:11,532 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 00:38:11,532 - root - INFO - Saving a full checkpoint at step 16000 +2025-10-26 00:38:11,532 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 00:38:11,532 - root - INFO - Step 16000: lr=1.00E-05, loss= 1.0971 (max= 1.5539), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:38:11,532 - root - INFO - Step 16000: lr=1.00E-05, loss= 1.0971 (max= 1.5539), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:38:11,532 - root - INFO - Saving a full checkpoint at step 16000 +2025-10-26 00:38:11,532 - root - INFO - Saving a full checkpoint at step 16000 +2025-10-26 00:38:11,532 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 00:38:11,532 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 00:38:11,532 - root - INFO - Step 16000: lr=1.00E-05, loss= 1.0971 (max= 1.5539), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:38:11,532 - root - INFO - Step 16000: lr=1.00E-05, loss= 1.0971 (max= 1.5539), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:38:11,532 - root - INFO - Step 16000: lr=1.00E-05, loss= 1.0971 (max= 1.5539), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:38:11,532 - root - INFO - Saving a full checkpoint at step 16000 +2025-10-26 00:38:11,532 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 00:38:11,532 - root - INFO - Saving a full checkpoint at step 16000 +2025-10-26 00:38:11,532 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 00:38:11,532 - root - INFO - Saving a full checkpoint at step 16000 +2025-10-26 00:38:11,532 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 00:38:25,766 - root - INFO - Finished saving the checkpoint in 14.23 seconds +2025-10-26 00:38:25,774 - root - INFO - Finished saving the checkpoint in 14.24 seconds +2025-10-26 00:38:25,774 - root - INFO - Finished saving the checkpoint in 14.24 seconds +2025-10-26 
00:38:25,775 - root - INFO - Finished saving the checkpoint in 14.24 seconds +2025-10-26 00:38:25,775 - root - INFO - Finished saving the checkpoint in 14.24 seconds +2025-10-26 00:38:25,775 - root - INFO - Finished saving the checkpoint in 14.24 seconds +2025-10-26 00:38:25,775 - root - INFO - Finished saving the checkpoint in 14.24 seconds +2025-10-26 00:38:25,776 - root - INFO - Finished saving the checkpoint in 14.24 seconds +2025-10-26 00:38:57,685 - root - INFO - Step 16010: lr=1.00E-05, loss= 1.1052 (max= 1.5977), tps=14201, mfu=29.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:38:57,685 - root - INFO - Step 16010: lr=1.00E-05, loss= 1.1052 (max= 1.5977), tps=14201, mfu=29.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:38:57,685 - root - INFO - Step 16010: lr=1.00E-05, loss= 1.1052 (max= 1.5977), tps=14201, mfu=29.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:38:57,685 - root - INFO - Step 16010: lr=1.00E-05, loss= 1.1052 (max= 1.5977), tps=14201, mfu=29.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:38:57,685 - root - INFO - Step 16010: lr=1.00E-05, loss= 1.1052 (max= 1.5977), tps=14201, mfu=29.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:38:57,685 - root - INFO - Step 16010: lr=1.00E-05, loss= 1.1052 (max= 1.5977), tps=14201, mfu=29.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:38:57,685 - root - INFO - Step 16010: lr=1.00E-05, loss= 1.1052 (max= 1.5977), tps=14201, mfu=29.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:38:57,685 - root - INFO - Step 16010: lr=1.00E-05, loss= 1.1052 (max= 1.5977), tps=14201, mfu=29.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:39:29,560 - root - INFO - Step 16020: lr=1.00E-05, loss= 1.1082 
(max= 1.6169), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:39:29,560 - root - INFO - Step 16020: lr=1.00E-05, loss= 1.1082 (max= 1.6169), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:39:29,560 - root - INFO - Step 16020: lr=1.00E-05, loss= 1.1082 (max= 1.6169), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:39:29,560 - root - INFO - Step 16020: lr=1.00E-05, loss= 1.1082 (max= 1.6169), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:39:29,560 - root - INFO - Step 16020: lr=1.00E-05, loss= 1.1082 (max= 1.6169), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:39:29,561 - root - INFO - Step 16020: lr=1.00E-05, loss= 1.1082 (max= 1.6169), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:39:29,561 - root - INFO - Step 16020: lr=1.00E-05, loss= 1.1082 (max= 1.6169), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:39:29,561 - root - INFO - Step 16020: lr=1.00E-05, loss= 1.1082 (max= 1.6169), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:40:01,440 - root - INFO - Step 16030: lr=1.00E-05, loss= 1.1011 (max= 1.5554), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:40:01,440 - root - INFO - Step 16030: lr=1.00E-05, loss= 1.1011 (max= 1.5554), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:40:01,440 - root - INFO - Step 16030: lr=1.00E-05, loss= 1.1011 (max= 1.5554), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:40:01,440 - root 
- INFO - Step 16030: lr=1.00E-05, loss= 1.1011 (max= 1.5554), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:40:01,440 - root - INFO - Step 16030: lr=1.00E-05, loss= 1.1011 (max= 1.5554), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:40:01,440 - root - INFO - Step 16030: lr=1.00E-05, loss= 1.1011 (max= 1.5554), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:40:01,440 - root - INFO - Step 16030: lr=1.00E-05, loss= 1.1011 (max= 1.5554), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:40:01,440 - root - INFO - Step 16030: lr=1.00E-05, loss= 1.1011 (max= 1.5554), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:40:33,334 - root - INFO - Step 16040: lr=1.00E-05, loss= 1.0764 (max= 1.5065), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:40:33,334 - root - INFO - Step 16040: lr=1.00E-05, loss= 1.0764 (max= 1.5065), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:40:33,334 - root - INFO - Step 16040: lr=1.00E-05, loss= 1.0764 (max= 1.5065), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:40:33,334 - root - INFO - Step 16040: lr=1.00E-05, loss= 1.0764 (max= 1.5065), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:40:33,334 - root - INFO - Step 16040: lr=1.00E-05, loss= 1.0764 (max= 1.5065), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:40:33,334 - root - INFO - Step 16040: lr=1.00E-05, loss= 1.0764 (max= 1.5065), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 00:40:33,334 - root - INFO - Step 16040: lr=1.00E-05, loss= 1.0764 (max= 1.5065), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:40:33,334 - root - INFO - Step 16040: lr=1.00E-05, loss= 1.0764 (max= 1.5065), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:41:05,212 - root - INFO - Step 16050: lr=1.00E-05, loss= 1.0863 (max= 1.4994), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:41:05,212 - root - INFO - Step 16050: lr=1.00E-05, loss= 1.0863 (max= 1.4994), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:41:05,212 - root - INFO - Step 16050: lr=1.00E-05, loss= 1.0863 (max= 1.4994), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:41:05,212 - root - INFO - Step 16050: lr=1.00E-05, loss= 1.0863 (max= 1.4994), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:41:05,212 - root - INFO - Step 16050: lr=1.00E-05, loss= 1.0863 (max= 1.4994), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:41:05,212 - root - INFO - Step 16050: lr=1.00E-05, loss= 1.0863 (max= 1.4994), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:41:05,212 - root - INFO - Step 16050: lr=1.00E-05, loss= 1.0863 (max= 1.4994), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:41:05,212 - root - INFO - Step 16050: lr=1.00E-05, loss= 1.0863 (max= 1.4994), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:41:37,130 - root - INFO - Step 16060: lr=1.00E-05, loss= 1.0919 (max= 1.6085), tps=20535, mfu=42.78%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:41:37,130 - root - INFO - Step 16060: lr=1.00E-05, loss= 1.0919 (max= 1.6085), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:41:37,130 - root - INFO - Step 16060: lr=1.00E-05, loss= 1.0919 (max= 1.6085), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:41:37,130 - root - INFO - Step 16060: lr=1.00E-05, loss= 1.0919 (max= 1.6085), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:41:37,130 - root - INFO - Step 16060: lr=1.00E-05, loss= 1.0919 (max= 1.6085), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:41:37,130 - root - INFO - Step 16060: lr=1.00E-05, loss= 1.0919 (max= 1.6085), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:41:37,130 - root - INFO - Step 16060: lr=1.00E-05, loss= 1.0919 (max= 1.6085), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:41:37,130 - root - INFO - Step 16060: lr=1.00E-05, loss= 1.0919 (max= 1.6085), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:42:09,030 - root - INFO - Step 16070: lr=1.00E-05, loss= 1.0850 (max= 1.5755), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:42:09,030 - root - INFO - Step 16070: lr=1.00E-05, loss= 1.0850 (max= 1.5755), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:42:09,030 - root - INFO - Step 16070: lr=1.00E-05, loss= 1.0850 (max= 1.5755), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:42:09,030 - root - INFO - Step 16070: lr=1.00E-05, 
loss= 1.0850 (max= 1.5755), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:42:09,030 - root - INFO - Step 16070: lr=1.00E-05, loss= 1.0850 (max= 1.5755), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:42:09,030 - root - INFO - Step 16070: lr=1.00E-05, loss= 1.0850 (max= 1.5755), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:42:09,030 - root - INFO - Step 16070: lr=1.00E-05, loss= 1.0850 (max= 1.5755), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:42:09,031 - root - INFO - Step 16070: lr=1.00E-05, loss= 1.0850 (max= 1.5755), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:42:40,873 - root - INFO - Step 16080: lr=1.00E-05, loss= 1.0915 (max= 1.9030), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:42:40,873 - root - INFO - Step 16080: lr=1.00E-05, loss= 1.0915 (max= 1.9030), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:42:40,874 - root - INFO - Step 16080: lr=1.00E-05, loss= 1.0915 (max= 1.9030), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:42:40,874 - root - INFO - Step 16080: lr=1.00E-05, loss= 1.0915 (max= 1.9030), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:42:40,874 - root - INFO - Step 16080: lr=1.00E-05, loss= 1.0915 (max= 1.9030), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:42:40,874 - root - INFO - Step 16080: lr=1.00E-05, loss= 1.0915 (max= 1.9030), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
00:42:40,874 - root - INFO - Step 16080: lr=1.00E-05, loss= 1.0915 (max= 1.9030), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:42:40,874 - root - INFO - Step 16080: lr=1.00E-05, loss= 1.0915 (max= 1.9030), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:43:12,678 - root - INFO - Step 16090: lr=1.00E-05, loss= 1.0809 (max= 1.7951), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:43:12,678 - root - INFO - Step 16090: lr=1.00E-05, loss= 1.0809 (max= 1.7951), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:43:12,678 - root - INFO - Step 16090: lr=1.00E-05, loss= 1.0809 (max= 1.7951), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:43:12,678 - root - INFO - Step 16090: lr=1.00E-05, loss= 1.0809 (max= 1.7951), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:43:12,678 - root - INFO - Step 16090: lr=1.00E-05, loss= 1.0809 (max= 1.7951), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:43:12,678 - root - INFO - Step 16090: lr=1.00E-05, loss= 1.0809 (max= 1.7951), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:43:12,678 - root - INFO - Step 16090: lr=1.00E-05, loss= 1.0809 (max= 1.7951), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:43:12,678 - root - INFO - Step 16090: lr=1.00E-05, loss= 1.0809 (max= 1.7951), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:43:44,500 - root - INFO - Step 16100: lr=1.00E-05, loss= 1.0692 (max= 1.6767), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:43:44,500 - root - INFO - Step 16100: lr=1.00E-05, loss= 1.0692 (max= 1.6767), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:43:44,500 - root - INFO - Step 16100: lr=1.00E-05, loss= 1.0692 (max= 1.6767), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:43:44,500 - root - INFO - Step 16100: lr=1.00E-05, loss= 1.0692 (max= 1.6767), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:43:44,500 - root - INFO - Step 16100: lr=1.00E-05, loss= 1.0692 (max= 1.6767), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:43:44,500 - root - INFO - Step 16100: lr=1.00E-05, loss= 1.0692 (max= 1.6767), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:43:44,500 - root - INFO - Step 16100: lr=1.00E-05, loss= 1.0692 (max= 1.6767), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:43:44,500 - root - INFO - Step 16100: lr=1.00E-05, loss= 1.0692 (max= 1.6767), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:44:16,296 - root - INFO - Step 16110: lr=1.00E-05, loss= 1.0813 (max= 1.6665), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:44:16,296 - root - INFO - Step 16110: lr=1.00E-05, loss= 1.0813 (max= 1.6665), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:44:16,296 - root - INFO - Step 16110: lr=1.00E-05, loss= 1.0813 (max= 1.6665), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:44:16,296 - root - INFO - Step 16110: lr=1.00E-05, loss= 1.0813 (max= 1.6665), 
tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:44:16,296 - root - INFO - Step 16110: lr=1.00E-05, loss= 1.0813 (max= 1.6665), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:44:16,296 - root - INFO - Step 16110: lr=1.00E-05, loss= 1.0813 (max= 1.6665), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:44:16,296 - root - INFO - Step 16110: lr=1.00E-05, loss= 1.0813 (max= 1.6665), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:44:16,296 - root - INFO - Step 16110: lr=1.00E-05, loss= 1.0813 (max= 1.6665), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:44:48,122 - root - INFO - Step 16120: lr=1.00E-05, loss= 1.0704 (max= 1.6763), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:44:48,122 - root - INFO - Step 16120: lr=1.00E-05, loss= 1.0704 (max= 1.6763), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:44:48,123 - root - INFO - Step 16120: lr=1.00E-05, loss= 1.0704 (max= 1.6763), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:44:48,123 - root - INFO - Step 16120: lr=1.00E-05, loss= 1.0704 (max= 1.6763), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:44:48,123 - root - INFO - Step 16120: lr=1.00E-05, loss= 1.0704 (max= 1.6763), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:44:48,123 - root - INFO - Step 16120: lr=1.00E-05, loss= 1.0704 (max= 1.6763), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:44:48,123 - root - INFO - Step 
16120: lr=1.00E-05, loss= 1.0704 (max= 1.6763), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:44:48,123 - root - INFO - Step 16120: lr=1.00E-05, loss= 1.0704 (max= 1.6763), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:45:19,983 - root - INFO - Step 16130: lr=1.00E-05, loss= 1.0978 (max= 1.7181), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:45:19,983 - root - INFO - Step 16130: lr=1.00E-05, loss= 1.0978 (max= 1.7181), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:45:19,983 - root - INFO - Step 16130: lr=1.00E-05, loss= 1.0978 (max= 1.7181), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:45:19,983 - root - INFO - Step 16130: lr=1.00E-05, loss= 1.0978 (max= 1.7181), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:45:19,983 - root - INFO - Step 16130: lr=1.00E-05, loss= 1.0978 (max= 1.7181), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:45:19,983 - root - INFO - Step 16130: lr=1.00E-05, loss= 1.0978 (max= 1.7181), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:45:19,983 - root - INFO - Step 16130: lr=1.00E-05, loss= 1.0978 (max= 1.7181), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:45:19,983 - root - INFO - Step 16130: lr=1.00E-05, loss= 1.0978 (max= 1.7181), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:45:51,834 - root - INFO - Step 16140: lr=1.00E-05, loss= 1.0755 (max= 1.7360), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 00:45:51,834 - root - INFO - Step 16140: lr=1.00E-05, loss= 1.0755 (max= 1.7360), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:45:51,834 - root - INFO - Step 16140: lr=1.00E-05, loss= 1.0755 (max= 1.7360), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:45:51,834 - root - INFO - Step 16140: lr=1.00E-05, loss= 1.0755 (max= 1.7360), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:45:51,834 - root - INFO - Step 16140: lr=1.00E-05, loss= 1.0755 (max= 1.7360), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:45:51,834 - root - INFO - Step 16140: lr=1.00E-05, loss= 1.0755 (max= 1.7360), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:45:51,834 - root - INFO - Step 16140: lr=1.00E-05, loss= 1.0755 (max= 1.7360), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:45:51,834 - root - INFO - Step 16140: lr=1.00E-05, loss= 1.0755 (max= 1.7360), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:46:23,645 - root - INFO - Step 16150: lr=1.00E-05, loss= 1.1166 (max= 1.6342), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:46:23,645 - root - INFO - Step 16150: lr=1.00E-05, loss= 1.1166 (max= 1.6342), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:46:23,646 - root - INFO - Step 16150: lr=1.00E-05, loss= 1.1166 (max= 1.6342), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:46:23,646 - root - INFO - Step 16150: lr=1.00E-05, loss= 1.1166 (max= 1.6342), tps=20604, mfu=42.93%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:46:23,646 - root - INFO - Step 16150: lr=1.00E-05, loss= 1.1166 (max= 1.6342), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:46:23,646 - root - INFO - Step 16150: lr=1.00E-05, loss= 1.1166 (max= 1.6342), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:46:23,646 - root - INFO - Step 16150: lr=1.00E-05, loss= 1.1166 (max= 1.6342), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:46:23,646 - root - INFO - Step 16150: lr=1.00E-05, loss= 1.1166 (max= 1.6342), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:46:55,522 - root - INFO - Step 16160: lr=1.00E-05, loss= 1.0971 (max= 1.7735), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:46:55,522 - root - INFO - Step 16160: lr=1.00E-05, loss= 1.0971 (max= 1.7735), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:46:55,522 - root - INFO - Step 16160: lr=1.00E-05, loss= 1.0971 (max= 1.7735), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:46:55,522 - root - INFO - Step 16160: lr=1.00E-05, loss= 1.0971 (max= 1.7735), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:46:55,522 - root - INFO - Step 16160: lr=1.00E-05, loss= 1.0971 (max= 1.7735), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:46:55,522 - root - INFO - Step 16160: lr=1.00E-05, loss= 1.0971 (max= 1.7735), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:46:55,522 - root - INFO - Step 16160: lr=1.00E-05, loss= 1.0971 
(max= 1.7735), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:46:55,523 - root - INFO - Step 16160: lr=1.00E-05, loss= 1.0971 (max= 1.7735), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:47:27,339 - root - INFO - Step 16170: lr=1.00E-05, loss= 1.0988 (max= 1.7493), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:47:27,339 - root - INFO - Step 16170: lr=1.00E-05, loss= 1.0988 (max= 1.7493), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:47:27,339 - root - INFO - Step 16170: lr=1.00E-05, loss= 1.0988 (max= 1.7493), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:47:27,339 - root - INFO - Step 16170: lr=1.00E-05, loss= 1.0988 (max= 1.7493), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:47:27,339 - root - INFO - Step 16170: lr=1.00E-05, loss= 1.0988 (max= 1.7493), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:47:27,339 - root - INFO - Step 16170: lr=1.00E-05, loss= 1.0988 (max= 1.7493), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:47:27,339 - root - INFO - Step 16170: lr=1.00E-05, loss= 1.0988 (max= 1.7493), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:47:27,339 - root - INFO - Step 16170: lr=1.00E-05, loss= 1.0988 (max= 1.7493), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:47:59,143 - root - INFO - Step 16180: lr=1.00E-05, loss= 1.0771 (max= 1.9737), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:47:59,143 - root 
- INFO - Step 16180: lr=1.00E-05, loss= 1.0771 (max= 1.9737), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:47:59,143 - root - INFO - Step 16180: lr=1.00E-05, loss= 1.0771 (max= 1.9737), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:47:59,143 - root - INFO - Step 16180: lr=1.00E-05, loss= 1.0771 (max= 1.9737), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:47:59,143 - root - INFO - Step 16180: lr=1.00E-05, loss= 1.0771 (max= 1.9737), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:47:59,143 - root - INFO - Step 16180: lr=1.00E-05, loss= 1.0771 (max= 1.9737), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:47:59,144 - root - INFO - Step 16180: lr=1.00E-05, loss= 1.0771 (max= 1.9737), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:47:59,144 - root - INFO - Step 16180: lr=1.00E-05, loss= 1.0771 (max= 1.9737), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:48:31,044 - root - INFO - Step 16190: lr=1.00E-05, loss= 1.0870 (max= 1.6383), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:48:31,044 - root - INFO - Step 16190: lr=1.00E-05, loss= 1.0870 (max= 1.6383), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:48:31,044 - root - INFO - Step 16190: lr=1.00E-05, loss= 1.0870 (max= 1.6383), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:48:31,044 - root - INFO - Step 16190: lr=1.00E-05, loss= 1.0870 (max= 1.6383), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 00:48:31,044 - root - INFO - Step 16190: lr=1.00E-05, loss= 1.0870 (max= 1.6383), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:48:31,044 - root - INFO - Step 16190: lr=1.00E-05, loss= 1.0870 (max= 1.6383), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:48:31,044 - root - INFO - Step 16190: lr=1.00E-05, loss= 1.0870 (max= 1.6383), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:48:31,044 - root - INFO - Step 16190: lr=1.00E-05, loss= 1.0870 (max= 1.6383), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:49:02,893 - root - INFO - Step 16200: lr=1.00E-05, loss= 1.0894 (max= 1.7983), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:49:02,893 - root - INFO - Step 16200: lr=1.00E-05, loss= 1.0894 (max= 1.7983), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:49:02,893 - root - INFO - Step 16200: lr=1.00E-05, loss= 1.0894 (max= 1.7983), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:49:02,893 - root - INFO - Step 16200: lr=1.00E-05, loss= 1.0894 (max= 1.7983), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:49:02,893 - root - INFO - Step 16200: lr=1.00E-05, loss= 1.0894 (max= 1.7983), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:49:02,893 - root - INFO - Step 16200: lr=1.00E-05, loss= 1.0894 (max= 1.7983), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:49:02,893 - root - INFO - Step 16200: lr=1.00E-05, loss= 1.0894 (max= 1.7983), tps=20579, mfu=42.88%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:49:02,893 - root - INFO - Step 16200: lr=1.00E-05, loss= 1.0894 (max= 1.7983), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:49:34,702 - root - INFO - Step 16210: lr=1.00E-05, loss= 1.0761 (max= 1.5480), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:49:34,702 - root - INFO - Step 16210: lr=1.00E-05, loss= 1.0761 (max= 1.5480), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:49:34,702 - root - INFO - Step 16210: lr=1.00E-05, loss= 1.0761 (max= 1.5480), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:49:34,702 - root - INFO - Step 16210: lr=1.00E-05, loss= 1.0761 (max= 1.5480), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:49:34,702 - root - INFO - Step 16210: lr=1.00E-05, loss= 1.0761 (max= 1.5480), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:49:34,702 - root - INFO - Step 16210: lr=1.00E-05, loss= 1.0761 (max= 1.5480), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:49:34,702 - root - INFO - Step 16210: lr=1.00E-05, loss= 1.0761 (max= 1.5480), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:49:34,702 - root - INFO - Step 16210: lr=1.00E-05, loss= 1.0761 (max= 1.5480), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:50:06,610 - root - INFO - Step 16220: lr=1.00E-05, loss= 1.1240 (max= 1.5244), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:50:06,610 - root - INFO - Step 16220: lr=1.00E-05, 
loss= 1.1240 (max= 1.5244), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:50:06,610 - root - INFO - Step 16220: lr=1.00E-05, loss= 1.1240 (max= 1.5244), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:50:06,610 - root - INFO - Step 16220: lr=1.00E-05, loss= 1.1240 (max= 1.5244), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:50:06,610 - root - INFO - Step 16220: lr=1.00E-05, loss= 1.1240 (max= 1.5244), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:50:06,610 - root - INFO - Step 16220: lr=1.00E-05, loss= 1.1240 (max= 1.5244), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:50:06,610 - root - INFO - Step 16220: lr=1.00E-05, loss= 1.1240 (max= 1.5244), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:50:06,611 - root - INFO - Step 16220: lr=1.00E-05, loss= 1.1240 (max= 1.5244), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:50:38,440 - root - INFO - Step 16230: lr=1.00E-05, loss= 1.1070 (max= 1.5816), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:50:38,440 - root - INFO - Step 16230: lr=1.00E-05, loss= 1.1070 (max= 1.5816), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:50:38,440 - root - INFO - Step 16230: lr=1.00E-05, loss= 1.1070 (max= 1.5816), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:50:38,440 - root - INFO - Step 16230: lr=1.00E-05, loss= 1.1070 (max= 1.5816), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
00:50:38,440 - root - INFO - Step 16230: lr=1.00E-05, loss= 1.1070 (max= 1.5816), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:50:38,440 - root - INFO - Step 16230: lr=1.00E-05, loss= 1.1070 (max= 1.5816), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:50:38,440 - root - INFO - Step 16230: lr=1.00E-05, loss= 1.1070 (max= 1.5816), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:50:38,440 - root - INFO - Step 16230: lr=1.00E-05, loss= 1.1070 (max= 1.5816), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:51:10,272 - root - INFO - Step 16240: lr=1.00E-05, loss= 1.1106 (max= 1.6128), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:51:10,272 - root - INFO - Step 16240: lr=1.00E-05, loss= 1.1106 (max= 1.6128), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:51:10,272 - root - INFO - Step 16240: lr=1.00E-05, loss= 1.1106 (max= 1.6128), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:51:10,273 - root - INFO - Step 16240: lr=1.00E-05, loss= 1.1106 (max= 1.6128), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:51:10,273 - root - INFO - Step 16240: lr=1.00E-05, loss= 1.1106 (max= 1.6128), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:51:10,273 - root - INFO - Step 16240: lr=1.00E-05, loss= 1.1106 (max= 1.6128), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:51:10,273 - root - INFO - Step 16240: lr=1.00E-05, loss= 1.1106 (max= 1.6128), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:51:10,273 - root - INFO - Step 16240: lr=1.00E-05, loss= 1.1106 (max= 1.6128), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:51:42,115 - root - INFO - Step 16250: lr=1.00E-05, loss= 1.1146 (max= 1.5955), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:51:42,115 - root - INFO - Step 16250: lr=1.00E-05, loss= 1.1146 (max= 1.5955), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:51:42,115 - root - INFO - Step 16250: lr=1.00E-05, loss= 1.1146 (max= 1.5955), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:51:42,115 - root - INFO - Step 16250: lr=1.00E-05, loss= 1.1146 (max= 1.5955), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:51:42,115 - root - INFO - Step 16250: lr=1.00E-05, loss= 1.1146 (max= 1.5955), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:51:42,115 - root - INFO - Step 16250: lr=1.00E-05, loss= 1.1146 (max= 1.5955), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:51:42,116 - root - INFO - Step 16250: lr=1.00E-05, loss= 1.1146 (max= 1.5955), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:51:42,116 - root - INFO - Step 16250: lr=1.00E-05, loss= 1.1146 (max= 1.5955), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:52:13,961 - root - INFO - Step 16260: lr=1.00E-05, loss= 1.1244 (max= 1.5754), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:52:13,961 - root - INFO - Step 16260: lr=1.00E-05, loss= 1.1244 (max= 1.5754), 
tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:52:13,961 - root - INFO - Step 16260: lr=1.00E-05, loss= 1.1244 (max= 1.5754), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:52:13,961 - root - INFO - Step 16260: lr=1.00E-05, loss= 1.1244 (max= 1.5754), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:52:13,962 - root - INFO - Step 16260: lr=1.00E-05, loss= 1.1244 (max= 1.5754), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:52:13,962 - root - INFO - Step 16260: lr=1.00E-05, loss= 1.1244 (max= 1.5754), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:52:13,962 - root - INFO - Step 16260: lr=1.00E-05, loss= 1.1244 (max= 1.5754), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:52:13,962 - root - INFO - Step 16260: lr=1.00E-05, loss= 1.1244 (max= 1.5754), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:52:45,780 - root - INFO - Step 16270: lr=1.00E-05, loss= 1.1003 (max= 1.6421), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:52:45,780 - root - INFO - Step 16270: lr=1.00E-05, loss= 1.1003 (max= 1.6421), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:52:45,780 - root - INFO - Step 16270: lr=1.00E-05, loss= 1.1003 (max= 1.6421), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:52:45,780 - root - INFO - Step 16270: lr=1.00E-05, loss= 1.1003 (max= 1.6421), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:52:45,780 - root - INFO - Step 
16270: lr=1.00E-05, loss= 1.1003 (max= 1.6421), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:52:45,780 - root - INFO - Step 16270: lr=1.00E-05, loss= 1.1003 (max= 1.6421), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:52:45,780 - root - INFO - Step 16270: lr=1.00E-05, loss= 1.1003 (max= 1.6421), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:52:45,781 - root - INFO - Step 16270: lr=1.00E-05, loss= 1.1003 (max= 1.6421), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:53:17,605 - root - INFO - Step 16280: lr=1.00E-05, loss= 1.0949 (max= 1.7665), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:53:17,605 - root - INFO - Step 16280: lr=1.00E-05, loss= 1.0949 (max= 1.7665), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:53:17,605 - root - INFO - Step 16280: lr=1.00E-05, loss= 1.0949 (max= 1.7665), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:53:17,605 - root - INFO - Step 16280: lr=1.00E-05, loss= 1.0949 (max= 1.7665), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:53:17,605 - root - INFO - Step 16280: lr=1.00E-05, loss= 1.0949 (max= 1.7665), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:53:17,605 - root - INFO - Step 16280: lr=1.00E-05, loss= 1.0949 (max= 1.7665), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:53:17,605 - root - INFO - Step 16280: lr=1.00E-05, loss= 1.0949 (max= 1.7665), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 00:53:17,606 - root - INFO - Step 16280: lr=1.00E-05, loss= 1.0949 (max= 1.7665), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:53:49,406 - root - INFO - Step 16290: lr=1.00E-05, loss= 1.0787 (max= 1.5003), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:53:49,406 - root - INFO - Step 16290: lr=1.00E-05, loss= 1.0787 (max= 1.5003), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:53:49,406 - root - INFO - Step 16290: lr=1.00E-05, loss= 1.0787 (max= 1.5003), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:53:49,406 - root - INFO - Step 16290: lr=1.00E-05, loss= 1.0787 (max= 1.5003), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:53:49,406 - root - INFO - Step 16290: lr=1.00E-05, loss= 1.0787 (max= 1.5003), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:53:49,406 - root - INFO - Step 16290: lr=1.00E-05, loss= 1.0787 (max= 1.5003), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:53:49,406 - root - INFO - Step 16290: lr=1.00E-05, loss= 1.0787 (max= 1.5003), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:53:49,406 - root - INFO - Step 16290: lr=1.00E-05, loss= 1.0787 (max= 1.5003), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:54:21,295 - root - INFO - Step 16300: lr=1.00E-05, loss= 1.1079 (max= 1.6842), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:54:21,295 - root - INFO - Step 16300: lr=1.00E-05, loss= 1.1079 (max= 1.6842), tps=20553, mfu=42.82%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:54:21,296 - root - INFO - Step 16300: lr=1.00E-05, loss= 1.1079 (max= 1.6842), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:54:21,296 - root - INFO - Step 16300: lr=1.00E-05, loss= 1.1079 (max= 1.6842), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:54:21,296 - root - INFO - Step 16300: lr=1.00E-05, loss= 1.1079 (max= 1.6842), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:54:21,296 - root - INFO - Step 16300: lr=1.00E-05, loss= 1.1079 (max= 1.6842), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:54:21,296 - root - INFO - Step 16300: lr=1.00E-05, loss= 1.1079 (max= 1.6842), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:54:21,296 - root - INFO - Step 16300: lr=1.00E-05, loss= 1.1079 (max= 1.6842), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:54:53,144 - root - INFO - Step 16310: lr=1.00E-05, loss= 1.1278 (max= 1.7046), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:54:53,144 - root - INFO - Step 16310: lr=1.00E-05, loss= 1.1278 (max= 1.7046), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:54:53,144 - root - INFO - Step 16310: lr=1.00E-05, loss= 1.1278 (max= 1.7046), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:54:53,144 - root - INFO - Step 16310: lr=1.00E-05, loss= 1.1278 (max= 1.7046), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:54:53,144 - root - INFO - Step 16310: lr=1.00E-05, loss= 1.1278 
(max= 1.7046), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:54:53,144 - root - INFO - Step 16310: lr=1.00E-05, loss= 1.1278 (max= 1.7046), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:54:53,144 - root - INFO - Step 16310: lr=1.00E-05, loss= 1.1278 (max= 1.7046), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:54:53,144 - root - INFO - Step 16310: lr=1.00E-05, loss= 1.1278 (max= 1.7046), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:55:25,065 - root - INFO - Step 16320: lr=1.00E-05, loss= 1.1177 (max= 1.6467), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:55:25,065 - root - INFO - Step 16320: lr=1.00E-05, loss= 1.1177 (max= 1.6467), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:55:25,065 - root - INFO - Step 16320: lr=1.00E-05, loss= 1.1177 (max= 1.6467), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:55:25,065 - root - INFO - Step 16320: lr=1.00E-05, loss= 1.1177 (max= 1.6467), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:55:25,065 - root - INFO - Step 16320: lr=1.00E-05, loss= 1.1177 (max= 1.6467), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:55:25,065 - root - INFO - Step 16320: lr=1.00E-05, loss= 1.1177 (max= 1.6467), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:55:25,065 - root - INFO - Step 16320: lr=1.00E-05, loss= 1.1177 (max= 1.6467), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:55:25,066 - root 
- INFO - Step 16320: lr=1.00E-05, loss= 1.1177 (max= 1.6467), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:55:56,856 - root - INFO - Step 16330: lr=1.00E-05, loss= 1.1345 (max= 1.5279), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:55:56,856 - root - INFO - Step 16330: lr=1.00E-05, loss= 1.1345 (max= 1.5279), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:55:56,856 - root - INFO - Step 16330: lr=1.00E-05, loss= 1.1345 (max= 1.5279), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:55:56,856 - root - INFO - Step 16330: lr=1.00E-05, loss= 1.1345 (max= 1.5279), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:55:56,856 - root - INFO - Step 16330: lr=1.00E-05, loss= 1.1345 (max= 1.5279), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:55:56,856 - root - INFO - Step 16330: lr=1.00E-05, loss= 1.1345 (max= 1.5279), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:55:56,856 - root - INFO - Step 16330: lr=1.00E-05, loss= 1.1345 (max= 1.5279), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:55:56,856 - root - INFO - Step 16330: lr=1.00E-05, loss= 1.1345 (max= 1.5279), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:56:28,711 - root - INFO - Step 16340: lr=1.00E-05, loss= 1.0967 (max= 1.4829), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:56:28,711 - root - INFO - Step 16340: lr=1.00E-05, loss= 1.0967 (max= 1.4829), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 00:56:28,711 - root - INFO - Step 16340: lr=1.00E-05, loss= 1.0967 (max= 1.4829), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:56:28,711 - root - INFO - Step 16340: lr=1.00E-05, loss= 1.0967 (max= 1.4829), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:56:28,711 - root - INFO - Step 16340: lr=1.00E-05, loss= 1.0967 (max= 1.4829), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:56:28,712 - root - INFO - Step 16340: lr=1.00E-05, loss= 1.0967 (max= 1.4829), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:56:28,712 - root - INFO - Step 16340: lr=1.00E-05, loss= 1.0967 (max= 1.4829), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:56:28,712 - root - INFO - Step 16340: lr=1.00E-05, loss= 1.0967 (max= 1.4829), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:57:00,553 - root - INFO - Step 16350: lr=1.00E-05, loss= 1.0962 (max= 1.6406), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:57:00,553 - root - INFO - Step 16350: lr=1.00E-05, loss= 1.0962 (max= 1.6406), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:57:00,553 - root - INFO - Step 16350: lr=1.00E-05, loss= 1.0962 (max= 1.6406), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:57:00,553 - root - INFO - Step 16350: lr=1.00E-05, loss= 1.0962 (max= 1.6406), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:57:00,553 - root - INFO - Step 16350: lr=1.00E-05, loss= 1.0962 (max= 1.6406), tps=20584, mfu=42.89%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:57:00,553 - root - INFO - Step 16350: lr=1.00E-05, loss= 1.0962 (max= 1.6406), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:57:00,553 - root - INFO - Step 16350: lr=1.00E-05, loss= 1.0962 (max= 1.6406), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:57:00,554 - root - INFO - Step 16350: lr=1.00E-05, loss= 1.0962 (max= 1.6406), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:57:32,423 - root - INFO - Step 16360: lr=1.00E-05, loss= 1.1045 (max= 1.7821), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:57:32,423 - root - INFO - Step 16360: lr=1.00E-05, loss= 1.1045 (max= 1.7821), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:57:32,423 - root - INFO - Step 16360: lr=1.00E-05, loss= 1.1045 (max= 1.7821), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:57:32,423 - root - INFO - Step 16360: lr=1.00E-05, loss= 1.1045 (max= 1.7821), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:57:32,423 - root - INFO - Step 16360: lr=1.00E-05, loss= 1.1045 (max= 1.7821), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:57:32,423 - root - INFO - Step 16360: lr=1.00E-05, loss= 1.1045 (max= 1.7821), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:57:32,423 - root - INFO - Step 16360: lr=1.00E-05, loss= 1.1045 (max= 1.7821), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:57:32,423 - root - INFO - Step 16360: lr=1.00E-05, 
loss= 1.1045 (max= 1.7821), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:58:04,328 - root - INFO - Step 16370: lr=1.00E-05, loss= 1.1045 (max= 1.5482), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:58:04,328 - root - INFO - Step 16370: lr=1.00E-05, loss= 1.1045 (max= 1.5482), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:58:04,328 - root - INFO - Step 16370: lr=1.00E-05, loss= 1.1045 (max= 1.5482), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:58:04,328 - root - INFO - Step 16370: lr=1.00E-05, loss= 1.1045 (max= 1.5482), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:58:04,329 - root - INFO - Step 16370: lr=1.00E-05, loss= 1.1045 (max= 1.5482), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:58:04,329 - root - INFO - Step 16370: lr=1.00E-05, loss= 1.1045 (max= 1.5482), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:58:04,329 - root - INFO - Step 16370: lr=1.00E-05, loss= 1.1045 (max= 1.5482), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:58:04,329 - root - INFO - Step 16370: lr=1.00E-05, loss= 1.1045 (max= 1.5482), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:58:36,145 - root - INFO - Step 16380: lr=1.00E-05, loss= 1.0954 (max= 1.5076), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:58:36,145 - root - INFO - Step 16380: lr=1.00E-05, loss= 1.0954 (max= 1.5076), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
00:58:36,145 - root - INFO - Step 16380: lr=1.00E-05, loss= 1.0954 (max= 1.5076), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:58:36,145 - root - INFO - Step 16380: lr=1.00E-05, loss= 1.0954 (max= 1.5076), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:58:36,145 - root - INFO - Step 16380: lr=1.00E-05, loss= 1.0954 (max= 1.5076), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:58:36,145 - root - INFO - Step 16380: lr=1.00E-05, loss= 1.0954 (max= 1.5076), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:58:36,145 - root - INFO - Step 16380: lr=1.00E-05, loss= 1.0954 (max= 1.5076), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:58:36,145 - root - INFO - Step 16380: lr=1.00E-05, loss= 1.0954 (max= 1.5076), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:59:07,971 - root - INFO - Step 16390: lr=1.00E-05, loss= 1.1116 (max= 1.4494), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:59:07,971 - root - INFO - Step 16390: lr=1.00E-05, loss= 1.1116 (max= 1.4494), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:59:07,971 - root - INFO - Step 16390: lr=1.00E-05, loss= 1.1116 (max= 1.4494), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:59:07,971 - root - INFO - Step 16390: lr=1.00E-05, loss= 1.1116 (max= 1.4494), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:59:07,971 - root - INFO - Step 16390: lr=1.00E-05, loss= 1.1116 (max= 1.4494), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:59:07,971 - root - INFO - Step 16390: lr=1.00E-05, loss= 1.1116 (max= 1.4494), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:59:07,971 - root - INFO - Step 16390: lr=1.00E-05, loss= 1.1116 (max= 1.4494), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:59:07,971 - root - INFO - Step 16390: lr=1.00E-05, loss= 1.1116 (max= 1.4494), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:59:39,875 - root - INFO - Step 16400: lr=1.00E-05, loss= 1.1098 (max= 1.6176), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:59:39,875 - root - INFO - Step 16400: lr=1.00E-05, loss= 1.1098 (max= 1.6176), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:59:39,876 - root - INFO - Step 16400: lr=1.00E-05, loss= 1.1098 (max= 1.6176), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:59:39,876 - root - INFO - Step 16400: lr=1.00E-05, loss= 1.1098 (max= 1.6176), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:59:39,876 - root - INFO - Step 16400: lr=1.00E-05, loss= 1.1098 (max= 1.6176), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:59:39,876 - root - INFO - Step 16400: lr=1.00E-05, loss= 1.1098 (max= 1.6176), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:59:39,876 - root - INFO - Step 16400: lr=1.00E-05, loss= 1.1098 (max= 1.6176), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 00:59:39,876 - root - INFO - Step 16400: lr=1.00E-05, loss= 1.1098 (max= 1.6176), 
tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:00:11,698 - root - INFO - Step 16410: lr=1.00E-05, loss= 1.0831 (max= 1.5581), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:00:11,698 - root - INFO - Step 16410: lr=1.00E-05, loss= 1.0831 (max= 1.5581), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:00:11,698 - root - INFO - Step 16410: lr=1.00E-05, loss= 1.0831 (max= 1.5581), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:00:11,699 - root - INFO - Step 16410: lr=1.00E-05, loss= 1.0831 (max= 1.5581), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:00:11,699 - root - INFO - Step 16410: lr=1.00E-05, loss= 1.0831 (max= 1.5581), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:00:11,699 - root - INFO - Step 16410: lr=1.00E-05, loss= 1.0831 (max= 1.5581), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:00:11,699 - root - INFO - Step 16410: lr=1.00E-05, loss= 1.0831 (max= 1.5581), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:00:11,699 - root - INFO - Step 16410: lr=1.00E-05, loss= 1.0831 (max= 1.5581), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:00:43,572 - root - INFO - Step 16420: lr=1.00E-05, loss= 1.1366 (max= 1.5565), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:00:43,572 - root - INFO - Step 16420: lr=1.00E-05, loss= 1.1366 (max= 1.5565), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:00:43,572 - root - INFO - Step 
16420: lr=1.00E-05, loss= 1.1366 (max= 1.5565), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:00:43,572 - root - INFO - Step 16420: lr=1.00E-05, loss= 1.1366 (max= 1.5565), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:00:43,572 - root - INFO - Step 16420: lr=1.00E-05, loss= 1.1366 (max= 1.5565), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:00:43,572 - root - INFO - Step 16420: lr=1.00E-05, loss= 1.1366 (max= 1.5565), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:00:43,572 - root - INFO - Step 16420: lr=1.00E-05, loss= 1.1366 (max= 1.5565), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:00:43,572 - root - INFO - Step 16420: lr=1.00E-05, loss= 1.1366 (max= 1.5565), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:01:15,369 - root - INFO - Step 16430: lr=1.00E-05, loss= 1.1239 (max= 2.0892), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:01:15,369 - root - INFO - Step 16430: lr=1.00E-05, loss= 1.1239 (max= 2.0892), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:01:15,369 - root - INFO - Step 16430: lr=1.00E-05, loss= 1.1239 (max= 2.0892), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:01:15,370 - root - INFO - Step 16430: lr=1.00E-05, loss= 1.1239 (max= 2.0892), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:01:15,370 - root - INFO - Step 16430: lr=1.00E-05, loss= 1.1239 (max= 2.0892), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 01:01:15,370 - root - INFO - Step 16430: lr=1.00E-05, loss= 1.1239 (max= 2.0892), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:01:15,370 - root - INFO - Step 16430: lr=1.00E-05, loss= 1.1239 (max= 2.0892), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:01:15,370 - root - INFO - Step 16430: lr=1.00E-05, loss= 1.1239 (max= 2.0892), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:01:47,197 - root - INFO - Step 16440: lr=1.00E-05, loss= 1.1064 (max= 1.5436), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:01:47,197 - root - INFO - Step 16440: lr=1.00E-05, loss= 1.1064 (max= 1.5436), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:01:47,197 - root - INFO - Step 16440: lr=1.00E-05, loss= 1.1064 (max= 1.5436), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:01:47,197 - root - INFO - Step 16440: lr=1.00E-05, loss= 1.1064 (max= 1.5436), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:01:47,197 - root - INFO - Step 16440: lr=1.00E-05, loss= 1.1064 (max= 1.5436), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:01:47,197 - root - INFO - Step 16440: lr=1.00E-05, loss= 1.1064 (max= 1.5436), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:01:47,197 - root - INFO - Step 16440: lr=1.00E-05, loss= 1.1064 (max= 1.5436), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:01:47,197 - root - INFO - Step 16440: lr=1.00E-05, loss= 1.1064 (max= 1.5436), tps=20593, mfu=42.91%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:02:19,066 - root - INFO - Step 16450: lr=1.00E-05, loss= 1.1103 (max= 1.5633), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:02:19,066 - root - INFO - Step 16450: lr=1.00E-05, loss= 1.1103 (max= 1.5633), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:02:19,066 - root - INFO - Step 16450: lr=1.00E-05, loss= 1.1103 (max= 1.5633), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:02:19,066 - root - INFO - Step 16450: lr=1.00E-05, loss= 1.1103 (max= 1.5633), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:02:19,066 - root - INFO - Step 16450: lr=1.00E-05, loss= 1.1103 (max= 1.5633), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:02:19,066 - root - INFO - Step 16450: lr=1.00E-05, loss= 1.1103 (max= 1.5633), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:02:19,066 - root - INFO - Step 16450: lr=1.00E-05, loss= 1.1103 (max= 1.5633), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:02:19,066 - root - INFO - Step 16450: lr=1.00E-05, loss= 1.1103 (max= 1.5633), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:02:50,912 - root - INFO - Step 16460: lr=1.00E-05, loss= 1.1165 (max= 1.4987), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:02:50,912 - root - INFO - Step 16460: lr=1.00E-05, loss= 1.1165 (max= 1.4987), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:02:50,912 - root - INFO - Step 16460: lr=1.00E-05, loss= 1.1165 
(max= 1.4987), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:02:50,912 - root - INFO - Step 16460: lr=1.00E-05, loss= 1.1165 (max= 1.4987), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:02:50,912 - root - INFO - Step 16460: lr=1.00E-05, loss= 1.1165 (max= 1.4987), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:02:50,912 - root - INFO - Step 16460: lr=1.00E-05, loss= 1.1165 (max= 1.4987), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:02:50,912 - root - INFO - Step 16460: lr=1.00E-05, loss= 1.1165 (max= 1.4987), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:02:50,912 - root - INFO - Step 16460: lr=1.00E-05, loss= 1.1165 (max= 1.4987), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:03:22,750 - root - INFO - Step 16470: lr=1.00E-05, loss= 1.1242 (max= 1.5297), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:03:22,750 - root - INFO - Step 16470: lr=1.00E-05, loss= 1.1242 (max= 1.5297), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:03:22,750 - root - INFO - Step 16470: lr=1.00E-05, loss= 1.1242 (max= 1.5297), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:03:22,750 - root - INFO - Step 16470: lr=1.00E-05, loss= 1.1242 (max= 1.5297), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:03:22,750 - root - INFO - Step 16470: lr=1.00E-05, loss= 1.1242 (max= 1.5297), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:03:22,750 - root 
- INFO - Step 16470: lr=1.00E-05, loss= 1.1242 (max= 1.5297), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:03:22,750 - root - INFO - Step 16470: lr=1.00E-05, loss= 1.1242 (max= 1.5297), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:03:22,750 - root - INFO - Step 16470: lr=1.00E-05, loss= 1.1242 (max= 1.5297), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:03:54,656 - root - INFO - Step 16480: lr=1.00E-05, loss= 1.1034 (max= 1.7592), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:03:54,656 - root - INFO - Step 16480: lr=1.00E-05, loss= 1.1034 (max= 1.7592), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:03:54,656 - root - INFO - Step 16480: lr=1.00E-05, loss= 1.1034 (max= 1.7592), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:03:54,656 - root - INFO - Step 16480: lr=1.00E-05, loss= 1.1034 (max= 1.7592), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:03:54,656 - root - INFO - Step 16480: lr=1.00E-05, loss= 1.1034 (max= 1.7592), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:03:54,656 - root - INFO - Step 16480: lr=1.00E-05, loss= 1.1034 (max= 1.7592), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:03:54,656 - root - INFO - Step 16480: lr=1.00E-05, loss= 1.1034 (max= 1.7592), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:03:54,656 - root - INFO - Step 16480: lr=1.00E-05, loss= 1.1034 (max= 1.7592), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 01:04:26,587 - root - INFO - Step 16490: lr=1.00E-05, loss= 1.1059 (max= 1.6104), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:04:26,587 - root - INFO - Step 16490: lr=1.00E-05, loss= 1.1059 (max= 1.6104), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:04:26,587 - root - INFO - Step 16490: lr=1.00E-05, loss= 1.1059 (max= 1.6104), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:04:26,587 - root - INFO - Step 16490: lr=1.00E-05, loss= 1.1059 (max= 1.6104), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:04:26,587 - root - INFO - Step 16490: lr=1.00E-05, loss= 1.1059 (max= 1.6104), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:04:26,587 - root - INFO - Step 16490: lr=1.00E-05, loss= 1.1059 (max= 1.6104), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:04:26,587 - root - INFO - Step 16490: lr=1.00E-05, loss= 1.1059 (max= 1.6104), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:04:26,587 - root - INFO - Step 16490: lr=1.00E-05, loss= 1.1059 (max= 1.6104), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:04:58,447 - root - INFO - Step 16500: lr=1.00E-05, loss= 1.1137 (max= 1.7783), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:04:58,447 - root - INFO - Step 16500: lr=1.00E-05, loss= 1.1137 (max= 1.7783), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:04:58,447 - root - INFO - Step 16500: lr=1.00E-05, loss= 1.1137 (max= 1.7783), tps=20572, mfu=42.86%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:04:58,447 - root - INFO - Step 16500: lr=1.00E-05, loss= 1.1137 (max= 1.7783), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:04:58,447 - root - INFO - Step 16500: lr=1.00E-05, loss= 1.1137 (max= 1.7783), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:04:58,447 - root - INFO - Step 16500: lr=1.00E-05, loss= 1.1137 (max= 1.7783), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:04:58,447 - root - INFO - Step 16500: lr=1.00E-05, loss= 1.1137 (max= 1.7783), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:04:58,447 - root - INFO - Step 16500: lr=1.00E-05, loss= 1.1137 (max= 1.7783), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:05:30,338 - root - INFO - Step 16510: lr=1.00E-05, loss= 1.0896 (max= 1.4636), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:05:30,338 - root - INFO - Step 16510: lr=1.00E-05, loss= 1.0896 (max= 1.4636), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:05:30,338 - root - INFO - Step 16510: lr=1.00E-05, loss= 1.0896 (max= 1.4636), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:05:30,338 - root - INFO - Step 16510: lr=1.00E-05, loss= 1.0896 (max= 1.4636), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:05:30,338 - root - INFO - Step 16510: lr=1.00E-05, loss= 1.0896 (max= 1.4636), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:05:30,338 - root - INFO - Step 16510: lr=1.00E-05, 
loss= 1.0896 (max= 1.4636), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:05:30,339 - root - INFO - Step 16510: lr=1.00E-05, loss= 1.0896 (max= 1.4636), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:05:30,339 - root - INFO - Step 16510: lr=1.00E-05, loss= 1.0896 (max= 1.4636), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:06:02,165 - root - INFO - Step 16520: lr=1.00E-05, loss= 1.1218 (max= 1.5865), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:06:02,166 - root - INFO - Step 16520: lr=1.00E-05, loss= 1.1218 (max= 1.5865), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:06:02,166 - root - INFO - Step 16520: lr=1.00E-05, loss= 1.1218 (max= 1.5865), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:06:02,166 - root - INFO - Step 16520: lr=1.00E-05, loss= 1.1218 (max= 1.5865), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:06:02,166 - root - INFO - Step 16520: lr=1.00E-05, loss= 1.1218 (max= 1.5865), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:06:02,166 - root - INFO - Step 16520: lr=1.00E-05, loss= 1.1218 (max= 1.5865), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:06:02,166 - root - INFO - Step 16520: lr=1.00E-05, loss= 1.1218 (max= 1.5865), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:06:02,166 - root - INFO - Step 16520: lr=1.00E-05, loss= 1.1218 (max= 1.5865), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
01:06:34,013 - root - INFO - Step 16530: lr=1.00E-05, loss= 1.1234 (max= 1.4878), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:06:34,013 - root - INFO - Step 16530: lr=1.00E-05, loss= 1.1234 (max= 1.4878), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:06:34,013 - root - INFO - Step 16530: lr=1.00E-05, loss= 1.1234 (max= 1.4878), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:06:34,013 - root - INFO - Step 16530: lr=1.00E-05, loss= 1.1234 (max= 1.4878), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:06:34,013 - root - INFO - Step 16530: lr=1.00E-05, loss= 1.1234 (max= 1.4878), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:06:34,013 - root - INFO - Step 16530: lr=1.00E-05, loss= 1.1234 (max= 1.4878), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:06:34,014 - root - INFO - Step 16530: lr=1.00E-05, loss= 1.1234 (max= 1.4878), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:06:34,014 - root - INFO - Step 16530: lr=1.00E-05, loss= 1.1234 (max= 1.4878), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:07:05,885 - root - INFO - Step 16540: lr=1.00E-05, loss= 1.1348 (max= 1.5672), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:07:05,885 - root - INFO - Step 16540: lr=1.00E-05, loss= 1.1348 (max= 1.5672), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:07:05,885 - root - INFO - Step 16540: lr=1.00E-05, loss= 1.1348 (max= 1.5672), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:07:05,885 - root - INFO - Step 16540: lr=1.00E-05, loss= 1.1348 (max= 1.5672), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:07:05,885 - root - INFO - Step 16540: lr=1.00E-05, loss= 1.1348 (max= 1.5672), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:07:05,885 - root - INFO - Step 16540: lr=1.00E-05, loss= 1.1348 (max= 1.5672), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:07:05,885 - root - INFO - Step 16540: lr=1.00E-05, loss= 1.1348 (max= 1.5672), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:07:05,885 - root - INFO - Step 16540: lr=1.00E-05, loss= 1.1348 (max= 1.5672), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:07:37,795 - root - INFO - Step 16550: lr=1.00E-05, loss= 1.1230 (max= 1.5137), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:07:37,795 - root - INFO - Step 16550: lr=1.00E-05, loss= 1.1230 (max= 1.5137), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:07:37,795 - root - INFO - Step 16550: lr=1.00E-05, loss= 1.1230 (max= 1.5137), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:07:37,795 - root - INFO - Step 16550: lr=1.00E-05, loss= 1.1230 (max= 1.5137), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:07:37,795 - root - INFO - Step 16550: lr=1.00E-05, loss= 1.1230 (max= 1.5137), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:07:37,795 - root - INFO - Step 16550: lr=1.00E-05, loss= 1.1230 (max= 1.5137), 
tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:07:37,795 - root - INFO - Step 16550: lr=1.00E-05, loss= 1.1230 (max= 1.5137), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:07:37,795 - root - INFO - Step 16550: lr=1.00E-05, loss= 1.1230 (max= 1.5137), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:08:09,629 - root - INFO - Step 16560: lr=1.00E-05, loss= 1.1139 (max= 1.5202), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:08:09,629 - root - INFO - Step 16560: lr=1.00E-05, loss= 1.1139 (max= 1.5202), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:08:09,629 - root - INFO - Step 16560: lr=1.00E-05, loss= 1.1139 (max= 1.5202), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:08:09,629 - root - INFO - Step 16560: lr=1.00E-05, loss= 1.1139 (max= 1.5202), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:08:09,629 - root - INFO - Step 16560: lr=1.00E-05, loss= 1.1139 (max= 1.5202), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:08:09,629 - root - INFO - Step 16560: lr=1.00E-05, loss= 1.1139 (max= 1.5202), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:08:09,629 - root - INFO - Step 16560: lr=1.00E-05, loss= 1.1139 (max= 1.5202), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:08:09,629 - root - INFO - Step 16560: lr=1.00E-05, loss= 1.1139 (max= 1.5202), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:08:41,419 - root - INFO - Step 
16570: lr=1.00E-05, loss= 1.1169 (max= 1.5449), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:08:41,419 - root - INFO - Step 16570: lr=1.00E-05, loss= 1.1169 (max= 1.5449), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:08:41,420 - root - INFO - Step 16570: lr=1.00E-05, loss= 1.1169 (max= 1.5449), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:08:41,420 - root - INFO - Step 16570: lr=1.00E-05, loss= 1.1169 (max= 1.5449), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:08:41,420 - root - INFO - Step 16570: lr=1.00E-05, loss= 1.1169 (max= 1.5449), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:08:41,420 - root - INFO - Step 16570: lr=1.00E-05, loss= 1.1169 (max= 1.5449), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:08:41,420 - root - INFO - Step 16570: lr=1.00E-05, loss= 1.1169 (max= 1.5449), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:08:41,420 - root - INFO - Step 16570: lr=1.00E-05, loss= 1.1169 (max= 1.5449), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:09:06,028 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:6375467 +2025-10-26 01:09:13,260 - root - INFO - Step 16580: lr=1.00E-05, loss= 1.1326 (max= 1.5326), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:09:13,260 - root - INFO - Step 16580: lr=1.00E-05, loss= 1.1326 (max= 1.5326), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 01:09:13,260 - root - INFO - Step 16580: lr=1.00E-05, loss= 1.1326 (max= 1.5326), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:09:13,261 - root - INFO - Step 16580: lr=1.00E-05, loss= 1.1326 (max= 1.5326), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:09:13,261 - root - INFO - Step 16580: lr=1.00E-05, loss= 1.1326 (max= 1.5326), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:09:13,261 - root - INFO - Step 16580: lr=1.00E-05, loss= 1.1326 (max= 1.5326), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:09:13,261 - root - INFO - Step 16580: lr=1.00E-05, loss= 1.1326 (max= 1.5326), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:09:13,261 - root - INFO - Step 16580: lr=1.00E-05, loss= 1.1326 (max= 1.5326), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:09:45,126 - root - INFO - Step 16590: lr=1.00E-05, loss= 1.1195 (max= 1.5270), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:09:45,126 - root - INFO - Step 16590: lr=1.00E-05, loss= 1.1195 (max= 1.5270), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:09:45,127 - root - INFO - Step 16590: lr=1.00E-05, loss= 1.1195 (max= 1.5270), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:09:45,127 - root - INFO - Step 16590: lr=1.00E-05, loss= 1.1195 (max= 1.5270), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:09:45,127 - root - INFO - Step 16590: lr=1.00E-05, loss= 1.1195 (max= 1.5270), tps=20568, mfu=42.85%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:09:45,127 - root - INFO - Step 16590: lr=1.00E-05, loss= 1.1195 (max= 1.5270), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:09:45,127 - root - INFO - Step 16590: lr=1.00E-05, loss= 1.1195 (max= 1.5270), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:09:45,127 - root - INFO - Step 16590: lr=1.00E-05, loss= 1.1195 (max= 1.5270), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:10:16,944 - root - INFO - Step 16600: lr=1.00E-05, loss= 1.0774 (max= 1.5179), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:10:16,944 - root - INFO - Step 16600: lr=1.00E-05, loss= 1.0774 (max= 1.5179), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:10:16,944 - root - INFO - Step 16600: lr=1.00E-05, loss= 1.0774 (max= 1.5179), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:10:16,944 - root - INFO - Step 16600: lr=1.00E-05, loss= 1.0774 (max= 1.5179), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:10:16,944 - root - INFO - Step 16600: lr=1.00E-05, loss= 1.0774 (max= 1.5179), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:10:16,944 - root - INFO - Step 16600: lr=1.00E-05, loss= 1.0774 (max= 1.5179), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:10:16,944 - root - INFO - Step 16600: lr=1.00E-05, loss= 1.0774 (max= 1.5179), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:10:16,944 - root - INFO - Step 16600: lr=1.00E-05, loss= 1.0774 
(max= 1.5179), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:10:48,730 - root - INFO - Step 16610: lr=1.00E-05, loss= 1.1170 (max= 1.5195), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:10:48,730 - root - INFO - Step 16610: lr=1.00E-05, loss= 1.1170 (max= 1.5195), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:10:48,730 - root - INFO - Step 16610: lr=1.00E-05, loss= 1.1170 (max= 1.5195), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:10:48,731 - root - INFO - Step 16610: lr=1.00E-05, loss= 1.1170 (max= 1.5195), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:10:48,731 - root - INFO - Step 16610: lr=1.00E-05, loss= 1.1170 (max= 1.5195), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:10:48,731 - root - INFO - Step 16610: lr=1.00E-05, loss= 1.1170 (max= 1.5195), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:10:48,731 - root - INFO - Step 16610: lr=1.00E-05, loss= 1.1170 (max= 1.5195), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:10:48,731 - root - INFO - Step 16610: lr=1.00E-05, loss= 1.1170 (max= 1.5195), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:11:20,617 - root - INFO - Step 16620: lr=1.00E-05, loss= 1.1107 (max= 1.4712), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:11:20,617 - root - INFO - Step 16620: lr=1.00E-05, loss= 1.1107 (max= 1.4712), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:11:20,617 - root 
- INFO - Step 16620: lr=1.00E-05, loss= 1.1107 (max= 1.4712), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:11:20,617 - root - INFO - Step 16620: lr=1.00E-05, loss= 1.1107 (max= 1.4712), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:11:20,617 - root - INFO - Step 16620: lr=1.00E-05, loss= 1.1107 (max= 1.4712), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:11:20,618 - root - INFO - Step 16620: lr=1.00E-05, loss= 1.1107 (max= 1.4712), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:11:20,618 - root - INFO - Step 16620: lr=1.00E-05, loss= 1.1107 (max= 1.4712), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:11:20,618 - root - INFO - Step 16620: lr=1.00E-05, loss= 1.1107 (max= 1.4712), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:11:52,463 - root - INFO - Step 16630: lr=1.00E-05, loss= 1.1250 (max= 1.5452), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:11:52,464 - root - INFO - Step 16630: lr=1.00E-05, loss= 1.1250 (max= 1.5452), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:11:52,464 - root - INFO - Step 16630: lr=1.00E-05, loss= 1.1250 (max= 1.5452), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:11:52,464 - root - INFO - Step 16630: lr=1.00E-05, loss= 1.1250 (max= 1.5452), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:11:52,464 - root - INFO - Step 16630: lr=1.00E-05, loss= 1.1250 (max= 1.5452), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 01:11:52,464 - root - INFO - Step 16630: lr=1.00E-05, loss= 1.1250 (max= 1.5452), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:11:52,464 - root - INFO - Step 16630: lr=1.00E-05, loss= 1.1250 (max= 1.5452), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:11:52,464 - root - INFO - Step 16630: lr=1.00E-05, loss= 1.1250 (max= 1.5452), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:12:24,372 - root - INFO - Step 16640: lr=1.00E-05, loss= 1.1227 (max= 1.5444), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:12:24,372 - root - INFO - Step 16640: lr=1.00E-05, loss= 1.1227 (max= 1.5444), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:12:24,372 - root - INFO - Step 16640: lr=1.00E-05, loss= 1.1227 (max= 1.5444), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:12:24,372 - root - INFO - Step 16640: lr=1.00E-05, loss= 1.1227 (max= 1.5444), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:12:24,372 - root - INFO - Step 16640: lr=1.00E-05, loss= 1.1227 (max= 1.5444), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:12:24,372 - root - INFO - Step 16640: lr=1.00E-05, loss= 1.1227 (max= 1.5444), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:12:24,372 - root - INFO - Step 16640: lr=1.00E-05, loss= 1.1227 (max= 1.5444), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:12:24,372 - root - INFO - Step 16640: lr=1.00E-05, loss= 1.1227 (max= 1.5444), tps=20541, mfu=42.80%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:12:56,207 - root - INFO - Step 16650: lr=1.00E-05, loss= 1.1086 (max= 1.6586), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:12:56,207 - root - INFO - Step 16650: lr=1.00E-05, loss= 1.1086 (max= 1.6586), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:12:56,207 - root - INFO - Step 16650: lr=1.00E-05, loss= 1.1086 (max= 1.6586), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:12:56,208 - root - INFO - Step 16650: lr=1.00E-05, loss= 1.1086 (max= 1.6586), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:12:56,208 - root - INFO - Step 16650: lr=1.00E-05, loss= 1.1086 (max= 1.6586), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:12:56,208 - root - INFO - Step 16650: lr=1.00E-05, loss= 1.1086 (max= 1.6586), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:12:56,208 - root - INFO - Step 16650: lr=1.00E-05, loss= 1.1086 (max= 1.6586), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:12:56,208 - root - INFO - Step 16650: lr=1.00E-05, loss= 1.1086 (max= 1.6586), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:13:28,085 - root - INFO - Step 16660: lr=1.00E-05, loss= 1.1102 (max= 1.5449), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:13:28,085 - root - INFO - Step 16660: lr=1.00E-05, loss= 1.1102 (max= 1.5449), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:13:28,085 - root - INFO - Step 16660: lr=1.00E-05, 
loss= 1.1102 (max= 1.5449), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:13:28,085 - root - INFO - Step 16660: lr=1.00E-05, loss= 1.1102 (max= 1.5449), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:13:28,085 - root - INFO - Step 16660: lr=1.00E-05, loss= 1.1102 (max= 1.5449), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:13:28,085 - root - INFO - Step 16660: lr=1.00E-05, loss= 1.1102 (max= 1.5449), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:13:28,085 - root - INFO - Step 16660: lr=1.00E-05, loss= 1.1102 (max= 1.5449), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:13:28,085 - root - INFO - Step 16660: lr=1.00E-05, loss= 1.1102 (max= 1.5449), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:13:59,963 - root - INFO - Step 16670: lr=1.00E-05, loss= 1.1023 (max= 1.6188), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:13:59,963 - root - INFO - Step 16670: lr=1.00E-05, loss= 1.1023 (max= 1.6188), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:13:59,963 - root - INFO - Step 16670: lr=1.00E-05, loss= 1.1023 (max= 1.6188), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:13:59,963 - root - INFO - Step 16670: lr=1.00E-05, loss= 1.1023 (max= 1.6188), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:13:59,963 - root - INFO - Step 16670: lr=1.00E-05, loss= 1.1023 (max= 1.6188), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
01:13:59,963 - root - INFO - Step 16670: lr=1.00E-05, loss= 1.1023 (max= 1.6188), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:13:59,963 - root - INFO - Step 16670: lr=1.00E-05, loss= 1.1023 (max= 1.6188), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:13:59,963 - root - INFO - Step 16670: lr=1.00E-05, loss= 1.1023 (max= 1.6188), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:14:31,817 - root - INFO - Step 16680: lr=1.00E-05, loss= 1.1192 (max= 1.5569), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:14:31,817 - root - INFO - Step 16680: lr=1.00E-05, loss= 1.1192 (max= 1.5569), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:14:31,817 - root - INFO - Step 16680: lr=1.00E-05, loss= 1.1192 (max= 1.5569), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:14:31,817 - root - INFO - Step 16680: lr=1.00E-05, loss= 1.1192 (max= 1.5569), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:14:31,817 - root - INFO - Step 16680: lr=1.00E-05, loss= 1.1192 (max= 1.5569), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:14:31,817 - root - INFO - Step 16680: lr=1.00E-05, loss= 1.1192 (max= 1.5569), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:14:31,817 - root - INFO - Step 16680: lr=1.00E-05, loss= 1.1192 (max= 1.5569), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:14:31,817 - root - INFO - Step 16680: lr=1.00E-05, loss= 1.1192 (max= 1.5569), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:15:03,669 - root - INFO - Step 16690: lr=1.00E-05, loss= 1.1301 (max= 1.8169), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:15:03,669 - root - INFO - Step 16690: lr=1.00E-05, loss= 1.1301 (max= 1.8169), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:15:03,669 - root - INFO - Step 16690: lr=1.00E-05, loss= 1.1301 (max= 1.8169), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:15:03,669 - root - INFO - Step 16690: lr=1.00E-05, loss= 1.1301 (max= 1.8169), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:15:03,669 - root - INFO - Step 16690: lr=1.00E-05, loss= 1.1301 (max= 1.8169), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:15:03,669 - root - INFO - Step 16690: lr=1.00E-05, loss= 1.1301 (max= 1.8169), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:15:03,669 - root - INFO - Step 16690: lr=1.00E-05, loss= 1.1301 (max= 1.8169), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:15:03,669 - root - INFO - Step 16690: lr=1.00E-05, loss= 1.1301 (max= 1.8169), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:15:35,619 - root - INFO - Step 16700: lr=1.00E-05, loss= 1.1295 (max= 1.9968), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:15:35,619 - root - INFO - Step 16700: lr=1.00E-05, loss= 1.1295 (max= 1.9968), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:15:35,619 - root - INFO - Step 16700: lr=1.00E-05, loss= 1.1295 (max= 1.9968), 
tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:15:35,619 - root - INFO - Step 16700: lr=1.00E-05, loss= 1.1295 (max= 1.9968), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:15:35,619 - root - INFO - Step 16700: lr=1.00E-05, loss= 1.1295 (max= 1.9968), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:15:35,619 - root - INFO - Step 16700: lr=1.00E-05, loss= 1.1295 (max= 1.9968), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:15:35,619 - root - INFO - Step 16700: lr=1.00E-05, loss= 1.1295 (max= 1.9968), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:15:35,619 - root - INFO - Step 16700: lr=1.00E-05, loss= 1.1295 (max= 1.9968), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:16:07,442 - root - INFO - Step 16710: lr=1.00E-05, loss= 1.1375 (max= 1.5980), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:16:07,442 - root - INFO - Step 16710: lr=1.00E-05, loss= 1.1375 (max= 1.5980), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:16:07,442 - root - INFO - Step 16710: lr=1.00E-05, loss= 1.1375 (max= 1.5980), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:16:07,442 - root - INFO - Step 16710: lr=1.00E-05, loss= 1.1375 (max= 1.5980), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:16:07,442 - root - INFO - Step 16710: lr=1.00E-05, loss= 1.1375 (max= 1.5980), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:16:07,443 - root - INFO - Step 
16710: lr=1.00E-05, loss= 1.1375 (max= 1.5980), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:16:07,443 - root - INFO - Step 16710: lr=1.00E-05, loss= 1.1375 (max= 1.5980), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:16:07,443 - root - INFO - Step 16710: lr=1.00E-05, loss= 1.1375 (max= 1.5980), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:16:39,394 - root - INFO - Step 16720: lr=1.00E-05, loss= 1.1061 (max= 1.6236), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:16:39,395 - root - INFO - Step 16720: lr=1.00E-05, loss= 1.1061 (max= 1.6236), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:16:39,395 - root - INFO - Step 16720: lr=1.00E-05, loss= 1.1061 (max= 1.6236), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:16:39,395 - root - INFO - Step 16720: lr=1.00E-05, loss= 1.1061 (max= 1.6236), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:16:39,395 - root - INFO - Step 16720: lr=1.00E-05, loss= 1.1061 (max= 1.6236), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:16:39,395 - root - INFO - Step 16720: lr=1.00E-05, loss= 1.1061 (max= 1.6236), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:16:39,395 - root - INFO - Step 16720: lr=1.00E-05, loss= 1.1061 (max= 1.6236), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:16:39,395 - root - INFO - Step 16720: lr=1.00E-05, loss= 1.1061 (max= 1.6236), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 01:17:11,180 - root - INFO - Step 16730: lr=1.00E-05, loss= 1.1260 (max= 1.5560), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:17:11,180 - root - INFO - Step 16730: lr=1.00E-05, loss= 1.1260 (max= 1.5560), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:17:11,180 - root - INFO - Step 16730: lr=1.00E-05, loss= 1.1260 (max= 1.5560), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:17:11,180 - root - INFO - Step 16730: lr=1.00E-05, loss= 1.1260 (max= 1.5560), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:17:11,180 - root - INFO - Step 16730: lr=1.00E-05, loss= 1.1260 (max= 1.5560), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:17:11,180 - root - INFO - Step 16730: lr=1.00E-05, loss= 1.1260 (max= 1.5560), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:17:11,180 - root - INFO - Step 16730: lr=1.00E-05, loss= 1.1260 (max= 1.5560), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:17:11,180 - root - INFO - Step 16730: lr=1.00E-05, loss= 1.1260 (max= 1.5560), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:17:43,034 - root - INFO - Step 16740: lr=1.00E-05, loss= 1.1197 (max= 1.6343), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:17:43,034 - root - INFO - Step 16740: lr=1.00E-05, loss= 1.1197 (max= 1.6343), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:17:43,034 - root - INFO - Step 16740: lr=1.00E-05, loss= 1.1197 (max= 1.6343), tps=20576, mfu=42.87%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:17:43,034 - root - INFO - Step 16740: lr=1.00E-05, loss= 1.1197 (max= 1.6343), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:17:43,034 - root - INFO - Step 16740: lr=1.00E-05, loss= 1.1197 (max= 1.6343), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:17:43,034 - root - INFO - Step 16740: lr=1.00E-05, loss= 1.1197 (max= 1.6343), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:17:43,034 - root - INFO - Step 16740: lr=1.00E-05, loss= 1.1197 (max= 1.6343), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:17:43,035 - root - INFO - Step 16740: lr=1.00E-05, loss= 1.1197 (max= 1.6343), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:18:05,922 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:3395102 +2025-10-26 01:18:14,879 - root - INFO - Step 16750: lr=1.00E-05, loss= 1.0978 (max= 1.5216), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:18:14,879 - root - INFO - Step 16750: lr=1.00E-05, loss= 1.0978 (max= 1.5216), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:18:14,879 - root - INFO - Step 16750: lr=1.00E-05, loss= 1.0978 (max= 1.5216), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:18:14,879 - root - INFO - Step 16750: lr=1.00E-05, loss= 1.0978 (max= 1.5216), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:18:14,879 - root - INFO - Step 16750: lr=1.00E-05, loss= 1.0978 
(max= 1.5216), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:18:14,879 - root - INFO - Step 16750: lr=1.00E-05, loss= 1.0978 (max= 1.5216), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:18:14,879 - root - INFO - Step 16750: lr=1.00E-05, loss= 1.0978 (max= 1.5216), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:18:14,879 - root - INFO - Step 16750: lr=1.00E-05, loss= 1.0978 (max= 1.5216), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:18:46,710 - root - INFO - Step 16760: lr=1.00E-05, loss= 1.1080 (max= 1.7003), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:18:46,710 - root - INFO - Step 16760: lr=1.00E-05, loss= 1.1080 (max= 1.7003), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:18:46,710 - root - INFO - Step 16760: lr=1.00E-05, loss= 1.1080 (max= 1.7003), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:18:46,710 - root - INFO - Step 16760: lr=1.00E-05, loss= 1.1080 (max= 1.7003), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:18:46,710 - root - INFO - Step 16760: lr=1.00E-05, loss= 1.1080 (max= 1.7003), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:18:46,710 - root - INFO - Step 16760: lr=1.00E-05, loss= 1.1080 (max= 1.7003), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:18:46,710 - root - INFO - Step 16760: lr=1.00E-05, loss= 1.1080 (max= 1.7003), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:18:46,710 - root 
- INFO - Step 16760: lr=1.00E-05, loss= 1.1080 (max= 1.7003), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:19:18,562 - root - INFO - Step 16770: lr=1.00E-05, loss= 1.1115 (max= 1.8843), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:19:18,562 - root - INFO - Step 16770: lr=1.00E-05, loss= 1.1115 (max= 1.8843), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:19:18,562 - root - INFO - Step 16770: lr=1.00E-05, loss= 1.1115 (max= 1.8843), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:19:18,562 - root - INFO - Step 16770: lr=1.00E-05, loss= 1.1115 (max= 1.8843), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:19:18,562 - root - INFO - Step 16770: lr=1.00E-05, loss= 1.1115 (max= 1.8843), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:19:18,562 - root - INFO - Step 16770: lr=1.00E-05, loss= 1.1115 (max= 1.8843), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:19:18,562 - root - INFO - Step 16770: lr=1.00E-05, loss= 1.1115 (max= 1.8843), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:19:18,562 - root - INFO - Step 16770: lr=1.00E-05, loss= 1.1115 (max= 1.8843), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:19:50,453 - root - INFO - Step 16780: lr=1.00E-05, loss= 1.1073 (max= 1.5507), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:19:50,453 - root - INFO - Step 16780: lr=1.00E-05, loss= 1.1073 (max= 1.5507), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 01:19:50,453 - root - INFO - Step 16780: lr=1.00E-05, loss= 1.1073 (max= 1.5507), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:19:50,453 - root - INFO - Step 16780: lr=1.00E-05, loss= 1.1073 (max= 1.5507), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:19:50,453 - root - INFO - Step 16780: lr=1.00E-05, loss= 1.1073 (max= 1.5507), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:19:50,453 - root - INFO - Step 16780: lr=1.00E-05, loss= 1.1073 (max= 1.5507), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:19:50,453 - root - INFO - Step 16780: lr=1.00E-05, loss= 1.1073 (max= 1.5507), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:19:50,453 - root - INFO - Step 16780: lr=1.00E-05, loss= 1.1073 (max= 1.5507), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:20:22,245 - root - INFO - Step 16790: lr=1.00E-05, loss= 1.1327 (max= 1.5314), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:20:22,245 - root - INFO - Step 16790: lr=1.00E-05, loss= 1.1327 (max= 1.5314), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:20:22,245 - root - INFO - Step 16790: lr=1.00E-05, loss= 1.1327 (max= 1.5314), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:20:22,245 - root - INFO - Step 16790: lr=1.00E-05, loss= 1.1327 (max= 1.5314), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:20:22,245 - root - INFO - Step 16790: lr=1.00E-05, loss= 1.1327 (max= 1.5314), tps=20616, mfu=42.95%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:20:22,245 - root - INFO - Step 16790: lr=1.00E-05, loss= 1.1327 (max= 1.5314), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:20:22,245 - root - INFO - Step 16790: lr=1.00E-05, loss= 1.1327 (max= 1.5314), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:20:22,245 - root - INFO - Step 16790: lr=1.00E-05, loss= 1.1327 (max= 1.5314), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:20:54,109 - root - INFO - Step 16800: lr=1.00E-05, loss= 1.1206 (max= 1.5204), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:20:54,109 - root - INFO - Step 16800: lr=1.00E-05, loss= 1.1206 (max= 1.5204), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:20:54,109 - root - INFO - Step 16800: lr=1.00E-05, loss= 1.1206 (max= 1.5204), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:20:54,109 - root - INFO - Step 16800: lr=1.00E-05, loss= 1.1206 (max= 1.5204), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:20:54,109 - root - INFO - Step 16800: lr=1.00E-05, loss= 1.1206 (max= 1.5204), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:20:54,109 - root - INFO - Step 16800: lr=1.00E-05, loss= 1.1206 (max= 1.5204), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:20:54,109 - root - INFO - Step 16800: lr=1.00E-05, loss= 1.1206 (max= 1.5204), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:20:54,109 - root - INFO - Step 16800: lr=1.00E-05, 
loss= 1.1206 (max= 1.5204), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:21:26,036 - root - INFO - Step 16810: lr=1.00E-05, loss= 1.1114 (max= 1.6142), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:21:26,036 - root - INFO - Step 16810: lr=1.00E-05, loss= 1.1114 (max= 1.6142), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:21:26,036 - root - INFO - Step 16810: lr=1.00E-05, loss= 1.1114 (max= 1.6142), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:21:26,036 - root - INFO - Step 16810: lr=1.00E-05, loss= 1.1114 (max= 1.6142), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:21:26,036 - root - INFO - Step 16810: lr=1.00E-05, loss= 1.1114 (max= 1.6142), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:21:26,036 - root - INFO - Step 16810: lr=1.00E-05, loss= 1.1114 (max= 1.6142), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:21:26,036 - root - INFO - Step 16810: lr=1.00E-05, loss= 1.1114 (max= 1.6142), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:21:26,036 - root - INFO - Step 16810: lr=1.00E-05, loss= 1.1114 (max= 1.6142), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:21:57,939 - root - INFO - Step 16820: lr=1.00E-05, loss= 1.0976 (max= 1.5768), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:21:57,939 - root - INFO - Step 16820: lr=1.00E-05, loss= 1.0976 (max= 1.5768), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
01:21:57,939 - root - INFO - Step 16820: lr=1.00E-05, loss= 1.0976 (max= 1.5768), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:21:57,939 - root - INFO - Step 16820: lr=1.00E-05, loss= 1.0976 (max= 1.5768), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:21:57,939 - root - INFO - Step 16820: lr=1.00E-05, loss= 1.0976 (max= 1.5768), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:21:57,939 - root - INFO - Step 16820: lr=1.00E-05, loss= 1.0976 (max= 1.5768), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:21:57,939 - root - INFO - Step 16820: lr=1.00E-05, loss= 1.0976 (max= 1.5768), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:21:57,939 - root - INFO - Step 16820: lr=1.00E-05, loss= 1.0976 (max= 1.5768), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:22:29,824 - root - INFO - Step 16830: lr=1.00E-05, loss= 1.1138 (max= 1.5854), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:22:29,825 - root - INFO - Step 16830: lr=1.00E-05, loss= 1.1138 (max= 1.5854), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:22:29,825 - root - INFO - Step 16830: lr=1.00E-05, loss= 1.1138 (max= 1.5854), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:22:29,825 - root - INFO - Step 16830: lr=1.00E-05, loss= 1.1138 (max= 1.5854), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:22:29,825 - root - INFO - Step 16830: lr=1.00E-05, loss= 1.1138 (max= 1.5854), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:22:29,825 - root - INFO - Step 16830: lr=1.00E-05, loss= 1.1138 (max= 1.5854), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:22:29,825 - root - INFO - Step 16830: lr=1.00E-05, loss= 1.1138 (max= 1.5854), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:22:29,825 - root - INFO - Step 16830: lr=1.00E-05, loss= 1.1138 (max= 1.5854), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:23:01,620 - root - INFO - Step 16840: lr=1.00E-05, loss= 1.1065 (max= 1.8350), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:23:01,620 - root - INFO - Step 16840: lr=1.00E-05, loss= 1.1065 (max= 1.8350), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:23:01,620 - root - INFO - Step 16840: lr=1.00E-05, loss= 1.1065 (max= 1.8350), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:23:01,620 - root - INFO - Step 16840: lr=1.00E-05, loss= 1.1065 (max= 1.8350), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:23:01,620 - root - INFO - Step 16840: lr=1.00E-05, loss= 1.1065 (max= 1.8350), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:23:01,620 - root - INFO - Step 16840: lr=1.00E-05, loss= 1.1065 (max= 1.8350), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:23:01,620 - root - INFO - Step 16840: lr=1.00E-05, loss= 1.1065 (max= 1.8350), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:23:01,620 - root - INFO - Step 16840: lr=1.00E-05, loss= 1.1065 (max= 1.8350), 
tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:23:33,476 - root - INFO - Step 16850: lr=1.00E-05, loss= 1.1045 (max= 1.6263), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:23:33,476 - root - INFO - Step 16850: lr=1.00E-05, loss= 1.1045 (max= 1.6263), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:23:33,476 - root - INFO - Step 16850: lr=1.00E-05, loss= 1.1045 (max= 1.6263), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:23:33,476 - root - INFO - Step 16850: lr=1.00E-05, loss= 1.1045 (max= 1.6263), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:23:33,476 - root - INFO - Step 16850: lr=1.00E-05, loss= 1.1045 (max= 1.6263), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:23:33,477 - root - INFO - Step 16850: lr=1.00E-05, loss= 1.1045 (max= 1.6263), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:23:33,477 - root - INFO - Step 16850: lr=1.00E-05, loss= 1.1045 (max= 1.6263), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:23:33,477 - root - INFO - Step 16850: lr=1.00E-05, loss= 1.1045 (max= 1.6263), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:24:05,306 - root - INFO - Step 16860: lr=1.00E-05, loss= 1.1092 (max= 1.5964), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:24:05,306 - root - INFO - Step 16860: lr=1.00E-05, loss= 1.1092 (max= 1.5964), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:24:05,306 - root - INFO - Step 
16860: lr=1.00E-05, loss= 1.1092 (max= 1.5964), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:24:05,306 - root - INFO - Step 16860: lr=1.00E-05, loss= 1.1092 (max= 1.5964), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:24:05,306 - root - INFO - Step 16860: lr=1.00E-05, loss= 1.1092 (max= 1.5964), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:24:05,306 - root - INFO - Step 16860: lr=1.00E-05, loss= 1.1092 (max= 1.5964), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:24:05,306 - root - INFO - Step 16860: lr=1.00E-05, loss= 1.1092 (max= 1.5964), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:24:05,306 - root - INFO - Step 16860: lr=1.00E-05, loss= 1.1092 (max= 1.5964), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:24:37,240 - root - INFO - Step 16870: lr=1.00E-05, loss= 1.1289 (max= 1.6286), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:24:37,240 - root - INFO - Step 16870: lr=1.00E-05, loss= 1.1289 (max= 1.6286), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:24:37,241 - root - INFO - Step 16870: lr=1.00E-05, loss= 1.1289 (max= 1.6286), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:24:37,241 - root - INFO - Step 16870: lr=1.00E-05, loss= 1.1289 (max= 1.6286), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:24:37,241 - root - INFO - Step 16870: lr=1.00E-05, loss= 1.1289 (max= 1.6286), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 01:24:37,241 - root - INFO - Step 16870: lr=1.00E-05, loss= 1.1289 (max= 1.6286), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:24:37,241 - root - INFO - Step 16870: lr=1.00E-05, loss= 1.1289 (max= 1.6286), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:24:37,241 - root - INFO - Step 16870: lr=1.00E-05, loss= 1.1289 (max= 1.6286), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:25:09,091 - root - INFO - Step 16880: lr=1.00E-05, loss= 1.1355 (max= 1.5180), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:25:09,091 - root - INFO - Step 16880: lr=1.00E-05, loss= 1.1355 (max= 1.5180), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:25:09,091 - root - INFO - Step 16880: lr=1.00E-05, loss= 1.1355 (max= 1.5180), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:25:09,091 - root - INFO - Step 16880: lr=1.00E-05, loss= 1.1355 (max= 1.5180), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:25:09,091 - root - INFO - Step 16880: lr=1.00E-05, loss= 1.1355 (max= 1.5180), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:25:09,091 - root - INFO - Step 16880: lr=1.00E-05, loss= 1.1355 (max= 1.5180), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:25:09,091 - root - INFO - Step 16880: lr=1.00E-05, loss= 1.1355 (max= 1.5180), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:25:09,091 - root - INFO - Step 16880: lr=1.00E-05, loss= 1.1355 (max= 1.5180), tps=20579, mfu=42.88%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:25:40,918 - root - INFO - Step 16890: lr=1.00E-05, loss= 1.1030 (max= 1.6606), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:25:40,918 - root - INFO - Step 16890: lr=1.00E-05, loss= 1.1030 (max= 1.6606), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:25:40,918 - root - INFO - Step 16890: lr=1.00E-05, loss= 1.1030 (max= 1.6606), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:25:40,918 - root - INFO - Step 16890: lr=1.00E-05, loss= 1.1030 (max= 1.6606), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:25:40,918 - root - INFO - Step 16890: lr=1.00E-05, loss= 1.1030 (max= 1.6606), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:25:40,919 - root - INFO - Step 16890: lr=1.00E-05, loss= 1.1030 (max= 1.6606), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:25:40,919 - root - INFO - Step 16890: lr=1.00E-05, loss= 1.1030 (max= 1.6606), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:25:40,919 - root - INFO - Step 16890: lr=1.00E-05, loss= 1.1030 (max= 1.6606), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:26:12,837 - root - INFO - Step 16900: lr=1.00E-05, loss= 1.1358 (max= 1.5990), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:26:12,837 - root - INFO - Step 16900: lr=1.00E-05, loss= 1.1358 (max= 1.5990), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:26:12,837 - root - INFO - Step 16900: lr=1.00E-05, loss= 1.1358 
(max= 1.5990), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:26:12,837 - root - INFO - Step 16900: lr=1.00E-05, loss= 1.1358 (max= 1.5990), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:26:12,837 - root - INFO - Step 16900: lr=1.00E-05, loss= 1.1358 (max= 1.5990), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:26:12,837 - root - INFO - Step 16900: lr=1.00E-05, loss= 1.1358 (max= 1.5990), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:26:12,837 - root - INFO - Step 16900: lr=1.00E-05, loss= 1.1358 (max= 1.5990), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:26:12,837 - root - INFO - Step 16900: lr=1.00E-05, loss= 1.1358 (max= 1.5990), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:26:44,685 - root - INFO - Step 16910: lr=1.00E-05, loss= 1.1142 (max= 1.7083), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:26:44,685 - root - INFO - Step 16910: lr=1.00E-05, loss= 1.1142 (max= 1.7083), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:26:44,685 - root - INFO - Step 16910: lr=1.00E-05, loss= 1.1142 (max= 1.7083), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:26:44,685 - root - INFO - Step 16910: lr=1.00E-05, loss= 1.1142 (max= 1.7083), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:26:44,685 - root - INFO - Step 16910: lr=1.00E-05, loss= 1.1142 (max= 1.7083), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:26:44,685 - root 
- INFO - Step 16910: lr=1.00E-05, loss= 1.1142 (max= 1.7083), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:26:44,685 - root - INFO - Step 16910: lr=1.00E-05, loss= 1.1142 (max= 1.7083), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:26:44,685 - root - INFO - Step 16910: lr=1.00E-05, loss= 1.1142 (max= 1.7083), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:27:16,551 - root - INFO - Step 16920: lr=1.00E-05, loss= 1.1222 (max= 1.5132), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:27:16,551 - root - INFO - Step 16920: lr=1.00E-05, loss= 1.1222 (max= 1.5132), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:27:16,552 - root - INFO - Step 16920: lr=1.00E-05, loss= 1.1222 (max= 1.5132), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:27:16,552 - root - INFO - Step 16920: lr=1.00E-05, loss= 1.1222 (max= 1.5132), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:27:16,552 - root - INFO - Step 16920: lr=1.00E-05, loss= 1.1222 (max= 1.5132), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:27:16,552 - root - INFO - Step 16920: lr=1.00E-05, loss= 1.1222 (max= 1.5132), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:27:16,552 - root - INFO - Step 16920: lr=1.00E-05, loss= 1.1222 (max= 1.5132), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:27:16,552 - root - INFO - Step 16920: lr=1.00E-05, loss= 1.1222 (max= 1.5132), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 01:27:48,415 - root - INFO - Step 16930: lr=1.00E-05, loss= 1.1097 (max= 1.8814), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:27:48,415 - root - INFO - Step 16930: lr=1.00E-05, loss= 1.1097 (max= 1.8814), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:27:48,416 - root - INFO - Step 16930: lr=1.00E-05, loss= 1.1097 (max= 1.8814), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:27:48,416 - root - INFO - Step 16930: lr=1.00E-05, loss= 1.1097 (max= 1.8814), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:27:48,416 - root - INFO - Step 16930: lr=1.00E-05, loss= 1.1097 (max= 1.8814), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:27:48,416 - root - INFO - Step 16930: lr=1.00E-05, loss= 1.1097 (max= 1.8814), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:27:48,416 - root - INFO - Step 16930: lr=1.00E-05, loss= 1.1097 (max= 1.8814), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:27:48,416 - root - INFO - Step 16930: lr=1.00E-05, loss= 1.1097 (max= 1.8814), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:28:20,324 - root - INFO - Step 16940: lr=1.00E-05, loss= 1.1058 (max= 1.7100), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:28:20,324 - root - INFO - Step 16940: lr=1.00E-05, loss= 1.1058 (max= 1.7100), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:28:20,324 - root - INFO - Step 16940: lr=1.00E-05, loss= 1.1058 (max= 1.7100), tps=20541, mfu=42.80%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:28:20,324 - root - INFO - Step 16940: lr=1.00E-05, loss= 1.1058 (max= 1.7100), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:28:20,324 - root - INFO - Step 16940: lr=1.00E-05, loss= 1.1058 (max= 1.7100), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:28:20,324 - root - INFO - Step 16940: lr=1.00E-05, loss= 1.1058 (max= 1.7100), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:28:20,324 - root - INFO - Step 16940: lr=1.00E-05, loss= 1.1058 (max= 1.7100), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:28:20,324 - root - INFO - Step 16940: lr=1.00E-05, loss= 1.1058 (max= 1.7100), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:28:52,120 - root - INFO - Step 16950: lr=1.00E-05, loss= 1.1128 (max= 1.5666), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:28:52,120 - root - INFO - Step 16950: lr=1.00E-05, loss= 1.1128 (max= 1.5666), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:28:52,120 - root - INFO - Step 16950: lr=1.00E-05, loss= 1.1128 (max= 1.5666), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:28:52,120 - root - INFO - Step 16950: lr=1.00E-05, loss= 1.1128 (max= 1.5666), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:28:52,120 - root - INFO - Step 16950: lr=1.00E-05, loss= 1.1128 (max= 1.5666), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:28:52,120 - root - INFO - Step 16950: lr=1.00E-05, 
loss= 1.1128 (max= 1.5666), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:28:52,120 - root - INFO - Step 16950: lr=1.00E-05, loss= 1.1128 (max= 1.5666), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:28:52,120 - root - INFO - Step 16950: lr=1.00E-05, loss= 1.1128 (max= 1.5666), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:29:24,067 - root - INFO - Step 16960: lr=1.00E-05, loss= 1.1082 (max= 1.6430), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:29:24,067 - root - INFO - Step 16960: lr=1.00E-05, loss= 1.1082 (max= 1.6430), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:29:24,067 - root - INFO - Step 16960: lr=1.00E-05, loss= 1.1082 (max= 1.6430), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:29:24,067 - root - INFO - Step 16960: lr=1.00E-05, loss= 1.1082 (max= 1.6430), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:29:24,067 - root - INFO - Step 16960: lr=1.00E-05, loss= 1.1082 (max= 1.6430), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:29:24,067 - root - INFO - Step 16960: lr=1.00E-05, loss= 1.1082 (max= 1.6430), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:29:24,067 - root - INFO - Step 16960: lr=1.00E-05, loss= 1.1082 (max= 1.6430), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:29:24,067 - root - INFO - Step 16960: lr=1.00E-05, loss= 1.1082 (max= 1.6430), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
01:29:55,941 - root - INFO - Step 16970: lr=1.00E-05, loss= 1.1187 (max= 1.5605), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:29:55,941 - root - INFO - Step 16970: lr=1.00E-05, loss= 1.1187 (max= 1.5605), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:29:55,941 - root - INFO - Step 16970: lr=1.00E-05, loss= 1.1187 (max= 1.5605), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:29:55,941 - root - INFO - Step 16970: lr=1.00E-05, loss= 1.1187 (max= 1.5605), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:29:55,941 - root - INFO - Step 16970: lr=1.00E-05, loss= 1.1187 (max= 1.5605), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:29:55,941 - root - INFO - Step 16970: lr=1.00E-05, loss= 1.1187 (max= 1.5605), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:29:55,941 - root - INFO - Step 16970: lr=1.00E-05, loss= 1.1187 (max= 1.5605), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:29:55,941 - root - INFO - Step 16970: lr=1.00E-05, loss= 1.1187 (max= 1.5605), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:30:27,867 - root - INFO - Step 16980: lr=1.00E-05, loss= 1.1250 (max= 1.7181), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:30:27,867 - root - INFO - Step 16980: lr=1.00E-05, loss= 1.1250 (max= 1.7181), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:30:27,867 - root - INFO - Step 16980: lr=1.00E-05, loss= 1.1250 (max= 1.7181), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:30:27,867 - root - INFO - Step 16980: lr=1.00E-05, loss= 1.1250 (max= 1.7181), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:30:27,867 - root - INFO - Step 16980: lr=1.00E-05, loss= 1.1250 (max= 1.7181), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:30:27,867 - root - INFO - Step 16980: lr=1.00E-05, loss= 1.1250 (max= 1.7181), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:30:27,867 - root - INFO - Step 16980: lr=1.00E-05, loss= 1.1250 (max= 1.7181), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:30:27,867 - root - INFO - Step 16980: lr=1.00E-05, loss= 1.1250 (max= 1.7181), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:30:59,677 - root - INFO - Step 16990: lr=1.00E-05, loss= 1.1176 (max= 1.6319), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:30:59,677 - root - INFO - Step 16990: lr=1.00E-05, loss= 1.1176 (max= 1.6319), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:30:59,677 - root - INFO - Step 16990: lr=1.00E-05, loss= 1.1176 (max= 1.6319), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:30:59,677 - root - INFO - Step 16990: lr=1.00E-05, loss= 1.1176 (max= 1.6319), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:30:59,677 - root - INFO - Step 16990: lr=1.00E-05, loss= 1.1176 (max= 1.6319), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:30:59,677 - root - INFO - Step 16990: lr=1.00E-05, loss= 1.1176 (max= 1.6319), 
tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:30:59,677 - root - INFO - Step 16990: lr=1.00E-05, loss= 1.1176 (max= 1.6319), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:30:59,677 - root - INFO - Step 16990: lr=1.00E-05, loss= 1.1176 (max= 1.6319), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-pt/checkpoints/dataloader/step-17000 +Dataset successfully saved to jobs/munin-7b-open-pt/checkpoints/dataloader/step-17000! Save time: 4.568701267242432 +2025-10-26 01:31:31,610 - root - INFO - Step 17000: lr=1.00E-05, loss= 1.1119 (max= 1.5615), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:31:31,610 - root - INFO - Saving a full checkpoint at step 17000 +2025-10-26 01:31:31,610 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 01:31:31,610 - root - INFO - Step 17000: lr=1.00E-05, loss= 1.1119 (max= 1.5615), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:31:31,610 - root - INFO - Step 17000: lr=1.00E-05, loss= 1.1119 (max= 1.5615), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:31:31,610 - root - INFO - Step 17000: lr=1.00E-05, loss= 1.1119 (max= 1.5615), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:31:31,610 - root - INFO - Saving a full checkpoint at step 17000 +2025-10-26 01:31:31,610 - root - INFO - Step 17000: lr=1.00E-05, loss= 1.1119 (max= 1.5615), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:31:31,610 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 
'optimizer', 'lr_scheduler']) +2025-10-26 01:31:31,610 - root - INFO - Step 17000: lr=1.00E-05, loss= 1.1119 (max= 1.5615), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:31:31,610 - root - INFO - Saving a full checkpoint at step 17000 +2025-10-26 01:31:31,610 - root - INFO - Saving a full checkpoint at step 17000 +2025-10-26 01:31:31,610 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 01:31:31,610 - root - INFO - Saving a full checkpoint at step 17000 +2025-10-26 01:31:31,610 - root - INFO - Step 17000: lr=1.00E-05, loss= 1.1119 (max= 1.5615), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:31:31,610 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 01:31:31,610 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 01:31:31,610 - root - INFO - Saving a full checkpoint at step 17000 +2025-10-26 01:31:31,610 - root - INFO - Step 17000: lr=1.00E-05, loss= 1.1119 (max= 1.5615), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:31:31,610 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 01:31:31,611 - root - INFO - Saving a full checkpoint at step 17000 +2025-10-26 01:31:31,611 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 01:31:31,611 - root - INFO - Saving a full checkpoint at step 17000 +2025-10-26 01:31:31,611 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 01:31:45,932 - root - INFO - Finished saving the checkpoint in 
14.32 seconds +2025-10-26 01:31:45,939 - root - INFO - Finished saving the checkpoint in 14.33 seconds +2025-10-26 01:31:45,939 - root - INFO - Finished saving the checkpoint in 14.33 seconds +2025-10-26 01:31:45,939 - root - INFO - Finished saving the checkpoint in 14.33 seconds +2025-10-26 01:31:45,940 - root - INFO - Finished saving the checkpoint in 14.33 seconds +2025-10-26 01:31:45,940 - root - INFO - Finished saving the checkpoint in 14.33 seconds +2025-10-26 01:31:45,940 - root - INFO - Finished saving the checkpoint in 14.33 seconds +2025-10-26 01:31:45,941 - root - INFO - Finished saving the checkpoint in 14.33 seconds +2025-10-26 01:32:17,722 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.1058 (max= 1.6194), tps=14214, mfu=29.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:32:17,722 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.1058 (max= 1.6194), tps=14214, mfu=29.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:32:17,722 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.1058 (max= 1.6194), tps=14214, mfu=29.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:32:17,722 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.1058 (max= 1.6194), tps=14214, mfu=29.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:32:17,722 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.1058 (max= 1.6194), tps=14214, mfu=29.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:32:17,722 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.1058 (max= 1.6194), tps=14214, mfu=29.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:32:17,722 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.1058 (max= 1.6194), tps=14214, mfu=29.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:32:17,722 - root - INFO - Step 17010: 
lr=1.00E-05, loss= 1.1058 (max= 1.6194), tps=14213, mfu=29.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:32:49,541 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1166 (max= 1.5139), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:32:49,541 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1166 (max= 1.5139), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:32:49,541 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1166 (max= 1.5139), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:32:49,541 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1166 (max= 1.5139), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:32:49,541 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1166 (max= 1.5139), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:32:49,541 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1166 (max= 1.5139), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:32:49,541 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1166 (max= 1.5139), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:32:49,541 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1166 (max= 1.5139), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:33:21,397 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.1301 (max= 1.6197), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:33:21,397 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.1301 (max= 1.6197), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 01:33:21,397 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.1301 (max= 1.6197), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:33:21,397 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.1301 (max= 1.6197), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:33:21,397 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.1301 (max= 1.6197), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:33:21,397 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.1301 (max= 1.6197), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:33:21,397 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.1301 (max= 1.6197), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:33:21,398 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.1301 (max= 1.6197), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:33:53,265 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.1129 (max= 1.6058), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:33:53,265 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.1129 (max= 1.6058), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:33:53,265 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.1129 (max= 1.6058), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:33:53,266 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.1129 (max= 1.6058), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:33:53,266 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.1129 (max= 1.6058), tps=20567, mfu=42.85%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:33:53,266 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.1129 (max= 1.6058), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:33:53,266 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.1129 (max= 1.6058), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:33:53,266 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.1129 (max= 1.6058), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:34:25,092 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.1182 (max= 1.5247), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:34:25,092 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.1182 (max= 1.5247), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:34:25,092 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.1182 (max= 1.5247), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:34:25,092 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.1182 (max= 1.5247), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:34:25,093 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.1182 (max= 1.5247), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:34:25,093 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.1182 (max= 1.5247), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:34:25,093 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.1182 (max= 1.5247), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:34:25,093 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.1182 
(max= 1.5247), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:34:56,957 - root - INFO - Step 17060: lr=1.00E-05, loss= 1.1163 (max= 1.6298), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:34:56,957 - root - INFO - Step 17060: lr=1.00E-05, loss= 1.1163 (max= 1.6298), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:34:56,957 - root - INFO - Step 17060: lr=1.00E-05, loss= 1.1163 (max= 1.6298), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:34:56,957 - root - INFO - Step 17060: lr=1.00E-05, loss= 1.1163 (max= 1.6298), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:34:56,957 - root - INFO - Step 17060: lr=1.00E-05, loss= 1.1163 (max= 1.6298), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:34:56,957 - root - INFO - Step 17060: lr=1.00E-05, loss= 1.1163 (max= 1.6298), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:34:56,957 - root - INFO - Step 17060: lr=1.00E-05, loss= 1.1163 (max= 1.6298), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:34:56,957 - root - INFO - Step 17060: lr=1.00E-05, loss= 1.1163 (max= 1.6298), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:35:28,819 - root - INFO - Step 17070: lr=1.00E-05, loss= 1.1097 (max= 1.7727), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:35:28,819 - root - INFO - Step 17070: lr=1.00E-05, loss= 1.1097 (max= 1.7727), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:35:28,819 - root 
- INFO - Step 17070: lr=1.00E-05, loss= 1.1097 (max= 1.7727), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:35:28,819 - root - INFO - Step 17070: lr=1.00E-05, loss= 1.1097 (max= 1.7727), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:35:28,819 - root - INFO - Step 17070: lr=1.00E-05, loss= 1.1097 (max= 1.7727), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:35:28,819 - root - INFO - Step 17070: lr=1.00E-05, loss= 1.1097 (max= 1.7727), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:35:28,819 - root - INFO - Step 17070: lr=1.00E-05, loss= 1.1097 (max= 1.7727), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:35:28,819 - root - INFO - Step 17070: lr=1.00E-05, loss= 1.1097 (max= 1.7727), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:36:00,684 - root - INFO - Step 17080: lr=1.00E-05, loss= 1.0959 (max= 1.7825), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:36:00,684 - root - INFO - Step 17080: lr=1.00E-05, loss= 1.0959 (max= 1.7825), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:36:00,684 - root - INFO - Step 17080: lr=1.00E-05, loss= 1.0959 (max= 1.7825), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:36:00,684 - root - INFO - Step 17080: lr=1.00E-05, loss= 1.0959 (max= 1.7825), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:36:00,684 - root - INFO - Step 17080: lr=1.00E-05, loss= 1.0959 (max= 1.7825), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 01:36:00,684 - root - INFO - Step 17080: lr=1.00E-05, loss= 1.0959 (max= 1.7825), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:36:00,684 - root - INFO - Step 17080: lr=1.00E-05, loss= 1.0959 (max= 1.7825), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:36:00,685 - root - INFO - Step 17080: lr=1.00E-05, loss= 1.0959 (max= 1.7825), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:36:32,534 - root - INFO - Step 17090: lr=1.00E-05, loss= 1.1109 (max= 1.6583), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:36:32,534 - root - INFO - Step 17090: lr=1.00E-05, loss= 1.1109 (max= 1.6583), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:36:32,534 - root - INFO - Step 17090: lr=1.00E-05, loss= 1.1109 (max= 1.6583), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:36:32,534 - root - INFO - Step 17090: lr=1.00E-05, loss= 1.1109 (max= 1.6583), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:36:32,534 - root - INFO - Step 17090: lr=1.00E-05, loss= 1.1109 (max= 1.6583), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:36:32,534 - root - INFO - Step 17090: lr=1.00E-05, loss= 1.1109 (max= 1.6583), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:36:32,535 - root - INFO - Step 17090: lr=1.00E-05, loss= 1.1109 (max= 1.6583), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:36:32,535 - root - INFO - Step 17090: lr=1.00E-05, loss= 1.1109 (max= 1.6583), tps=20578, mfu=42.88%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:37:04,355 - root - INFO - Step 17100: lr=1.00E-05, loss= 1.0930 (max= 1.5193), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:37:04,355 - root - INFO - Step 17100: lr=1.00E-05, loss= 1.0930 (max= 1.5193), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:37:04,355 - root - INFO - Step 17100: lr=1.00E-05, loss= 1.0930 (max= 1.5193), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:37:04,355 - root - INFO - Step 17100: lr=1.00E-05, loss= 1.0930 (max= 1.5193), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:37:04,355 - root - INFO - Step 17100: lr=1.00E-05, loss= 1.0930 (max= 1.5193), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:37:04,355 - root - INFO - Step 17100: lr=1.00E-05, loss= 1.0930 (max= 1.5193), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:37:04,355 - root - INFO - Step 17100: lr=1.00E-05, loss= 1.0930 (max= 1.5193), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:37:04,355 - root - INFO - Step 17100: lr=1.00E-05, loss= 1.0930 (max= 1.5193), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:37:36,147 - root - INFO - Step 17110: lr=1.00E-05, loss= 1.1089 (max= 1.5916), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:37:36,147 - root - INFO - Step 17110: lr=1.00E-05, loss= 1.1089 (max= 1.5916), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:37:36,147 - root - INFO - Step 17110: lr=1.00E-05, 
loss= 1.1089 (max= 1.5916), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:37:36,147 - root - INFO - Step 17110: lr=1.00E-05, loss= 1.1089 (max= 1.5916), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:37:36,147 - root - INFO - Step 17110: lr=1.00E-05, loss= 1.1089 (max= 1.5916), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:37:36,148 - root - INFO - Step 17110: lr=1.00E-05, loss= 1.1089 (max= 1.5916), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:37:36,148 - root - INFO - Step 17110: lr=1.00E-05, loss= 1.1089 (max= 1.5916), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:37:36,148 - root - INFO - Step 17110: lr=1.00E-05, loss= 1.1089 (max= 1.5916), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:38:07,954 - root - INFO - Step 17120: lr=1.00E-05, loss= 1.1205 (max= 1.5460), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:38:07,954 - root - INFO - Step 17120: lr=1.00E-05, loss= 1.1205 (max= 1.5460), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:38:07,954 - root - INFO - Step 17120: lr=1.00E-05, loss= 1.1205 (max= 1.5460), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:38:07,954 - root - INFO - Step 17120: lr=1.00E-05, loss= 1.1205 (max= 1.5460), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:38:07,954 - root - INFO - Step 17120: lr=1.00E-05, loss= 1.1205 (max= 1.5460), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
01:38:07,954 - root - INFO - Step 17120: lr=1.00E-05, loss= 1.1205 (max= 1.5460), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:38:07,954 - root - INFO - Step 17120: lr=1.00E-05, loss= 1.1205 (max= 1.5460), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:38:07,954 - root - INFO - Step 17120: lr=1.00E-05, loss= 1.1205 (max= 1.5460), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:38:39,760 - root - INFO - Step 17130: lr=1.00E-05, loss= 1.1243 (max= 1.6313), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:38:39,761 - root - INFO - Step 17130: lr=1.00E-05, loss= 1.1243 (max= 1.6313), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:38:39,761 - root - INFO - Step 17130: lr=1.00E-05, loss= 1.1243 (max= 1.6313), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:38:39,761 - root - INFO - Step 17130: lr=1.00E-05, loss= 1.1243 (max= 1.6313), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:38:39,761 - root - INFO - Step 17130: lr=1.00E-05, loss= 1.1243 (max= 1.6313), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:38:39,761 - root - INFO - Step 17130: lr=1.00E-05, loss= 1.1243 (max= 1.6313), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:38:39,761 - root - INFO - Step 17130: lr=1.00E-05, loss= 1.1243 (max= 1.6313), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:38:39,761 - root - INFO - Step 17130: lr=1.00E-05, loss= 1.1243 (max= 1.6313), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:39:11,670 - root - INFO - Step 17140: lr=1.00E-05, loss= 1.1032 (max= 1.5116), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:39:11,670 - root - INFO - Step 17140: lr=1.00E-05, loss= 1.1032 (max= 1.5116), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:39:11,670 - root - INFO - Step 17140: lr=1.00E-05, loss= 1.1032 (max= 1.5116), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:39:11,670 - root - INFO - Step 17140: lr=1.00E-05, loss= 1.1032 (max= 1.5116), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:39:11,670 - root - INFO - Step 17140: lr=1.00E-05, loss= 1.1032 (max= 1.5116), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:39:11,670 - root - INFO - Step 17140: lr=1.00E-05, loss= 1.1032 (max= 1.5116), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:39:11,670 - root - INFO - Step 17140: lr=1.00E-05, loss= 1.1032 (max= 1.5116), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:39:11,670 - root - INFO - Step 17140: lr=1.00E-05, loss= 1.1032 (max= 1.5116), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:39:43,518 - root - INFO - Step 17150: lr=1.00E-05, loss= 1.1234 (max= 1.6296), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:39:43,518 - root - INFO - Step 17150: lr=1.00E-05, loss= 1.1234 (max= 1.6296), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:39:43,518 - root - INFO - Step 17150: lr=1.00E-05, loss= 1.1234 (max= 1.6296), 
tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:39:43,518 - root - INFO - Step 17150: lr=1.00E-05, loss= 1.1234 (max= 1.6296), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:39:43,518 - root - INFO - Step 17150: lr=1.00E-05, loss= 1.1234 (max= 1.6296), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:39:43,518 - root - INFO - Step 17150: lr=1.00E-05, loss= 1.1234 (max= 1.6296), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:39:43,518 - root - INFO - Step 17150: lr=1.00E-05, loss= 1.1234 (max= 1.6296), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:39:43,518 - root - INFO - Step 17150: lr=1.00E-05, loss= 1.1234 (max= 1.6296), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:40:15,331 - root - INFO - Step 17160: lr=1.00E-05, loss= 1.0979 (max= 1.6450), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:40:15,331 - root - INFO - Step 17160: lr=1.00E-05, loss= 1.0979 (max= 1.6450), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:40:15,331 - root - INFO - Step 17160: lr=1.00E-05, loss= 1.0979 (max= 1.6450), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:40:15,331 - root - INFO - Step 17160: lr=1.00E-05, loss= 1.0979 (max= 1.6450), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:40:15,331 - root - INFO - Step 17160: lr=1.00E-05, loss= 1.0979 (max= 1.6450), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:40:15,331 - root - INFO - Step 
17160: lr=1.00E-05, loss= 1.0979 (max= 1.6450), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:40:15,332 - root - INFO - Step 17160: lr=1.00E-05, loss= 1.0979 (max= 1.6450), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:40:15,332 - root - INFO - Step 17160: lr=1.00E-05, loss= 1.0979 (max= 1.6450), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:40:47,146 - root - INFO - Step 17170: lr=1.00E-05, loss= 1.1140 (max= 1.5476), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:40:47,146 - root - INFO - Step 17170: lr=1.00E-05, loss= 1.1140 (max= 1.5476), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:40:47,147 - root - INFO - Step 17170: lr=1.00E-05, loss= 1.1140 (max= 1.5476), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:40:47,147 - root - INFO - Step 17170: lr=1.00E-05, loss= 1.1140 (max= 1.5476), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:40:47,147 - root - INFO - Step 17170: lr=1.00E-05, loss= 1.1140 (max= 1.5476), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:40:47,147 - root - INFO - Step 17170: lr=1.00E-05, loss= 1.1140 (max= 1.5476), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:40:47,147 - root - INFO - Step 17170: lr=1.00E-05, loss= 1.1140 (max= 1.5476), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:40:47,147 - root - INFO - Step 17170: lr=1.00E-05, loss= 1.1140 (max= 1.5476), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 01:41:19,052 - root - INFO - Step 17180: lr=1.00E-05, loss= 1.0855 (max= 1.4552), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:41:19,052 - root - INFO - Step 17180: lr=1.00E-05, loss= 1.0855 (max= 1.4552), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:41:19,052 - root - INFO - Step 17180: lr=1.00E-05, loss= 1.0855 (max= 1.4552), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:41:19,052 - root - INFO - Step 17180: lr=1.00E-05, loss= 1.0855 (max= 1.4552), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:41:19,052 - root - INFO - Step 17180: lr=1.00E-05, loss= 1.0855 (max= 1.4552), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:41:19,052 - root - INFO - Step 17180: lr=1.00E-05, loss= 1.0855 (max= 1.4552), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:41:19,052 - root - INFO - Step 17180: lr=1.00E-05, loss= 1.0855 (max= 1.4552), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:41:19,052 - root - INFO - Step 17180: lr=1.00E-05, loss= 1.0855 (max= 1.4552), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:41:50,902 - root - INFO - Step 17190: lr=1.00E-05, loss= 1.1176 (max= 1.5836), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:41:50,902 - root - INFO - Step 17190: lr=1.00E-05, loss= 1.1176 (max= 1.5836), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:41:50,902 - root - INFO - Step 17190: lr=1.00E-05, loss= 1.1176 (max= 1.5836), tps=20578, mfu=42.88%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:41:50,902 - root - INFO - Step 17190: lr=1.00E-05, loss= 1.1176 (max= 1.5836), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:41:50,902 - root - INFO - Step 17190: lr=1.00E-05, loss= 1.1176 (max= 1.5836), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:41:50,902 - root - INFO - Step 17190: lr=1.00E-05, loss= 1.1176 (max= 1.5836), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:41:50,902 - root - INFO - Step 17190: lr=1.00E-05, loss= 1.1176 (max= 1.5836), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:41:50,902 - root - INFO - Step 17190: lr=1.00E-05, loss= 1.1176 (max= 1.5836), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:42:22,737 - root - INFO - Step 17200: lr=1.00E-05, loss= 1.1034 (max= 1.5648), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:42:22,737 - root - INFO - Step 17200: lr=1.00E-05, loss= 1.1034 (max= 1.5648), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:42:22,737 - root - INFO - Step 17200: lr=1.00E-05, loss= 1.1034 (max= 1.5648), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:42:22,737 - root - INFO - Step 17200: lr=1.00E-05, loss= 1.1034 (max= 1.5648), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:42:22,737 - root - INFO - Step 17200: lr=1.00E-05, loss= 1.1034 (max= 1.5648), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:42:22,737 - root - INFO - Step 17200: lr=1.00E-05, loss= 1.1034 
(max= 1.5648), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:42:22,737 - root - INFO - Step 17200: lr=1.00E-05, loss= 1.1034 (max= 1.5648), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:42:22,737 - root - INFO - Step 17200: lr=1.00E-05, loss= 1.1034 (max= 1.5648), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:42:54,596 - root - INFO - Step 17210: lr=1.00E-05, loss= 1.1211 (max= 1.4950), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:42:54,596 - root - INFO - Step 17210: lr=1.00E-05, loss= 1.1211 (max= 1.4950), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:42:54,596 - root - INFO - Step 17210: lr=1.00E-05, loss= 1.1211 (max= 1.4950), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:42:54,596 - root - INFO - Step 17210: lr=1.00E-05, loss= 1.1211 (max= 1.4950), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:42:54,596 - root - INFO - Step 17210: lr=1.00E-05, loss= 1.1211 (max= 1.4950), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:42:54,596 - root - INFO - Step 17210: lr=1.00E-05, loss= 1.1211 (max= 1.4950), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:42:54,596 - root - INFO - Step 17210: lr=1.00E-05, loss= 1.1211 (max= 1.4950), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:42:54,596 - root - INFO - Step 17210: lr=1.00E-05, loss= 1.1211 (max= 1.4950), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:43:26,480 - root 
- INFO - Step 17220: lr=1.00E-05, loss= 1.1123 (max= 1.4995), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:43:26,480 - root - INFO - Step 17220: lr=1.00E-05, loss= 1.1123 (max= 1.4995), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:43:26,480 - root - INFO - Step 17220: lr=1.00E-05, loss= 1.1123 (max= 1.4995), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:43:26,480 - root - INFO - Step 17220: lr=1.00E-05, loss= 1.1123 (max= 1.4995), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:43:26,480 - root - INFO - Step 17220: lr=1.00E-05, loss= 1.1123 (max= 1.4995), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:43:26,480 - root - INFO - Step 17220: lr=1.00E-05, loss= 1.1123 (max= 1.4995), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:43:26,480 - root - INFO - Step 17220: lr=1.00E-05, loss= 1.1123 (max= 1.4995), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:43:26,480 - root - INFO - Step 17220: lr=1.00E-05, loss= 1.1123 (max= 1.4995), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:43:58,306 - root - INFO - Step 17230: lr=1.00E-05, loss= 1.1160 (max= 1.7322), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:43:58,306 - root - INFO - Step 17230: lr=1.00E-05, loss= 1.1160 (max= 1.7322), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:43:58,306 - root - INFO - Step 17230: lr=1.00E-05, loss= 1.1160 (max= 1.7322), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 01:43:58,306 - root - INFO - Step 17230: lr=1.00E-05, loss= 1.1160 (max= 1.7322), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:43:58,306 - root - INFO - Step 17230: lr=1.00E-05, loss= 1.1160 (max= 1.7322), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:43:58,306 - root - INFO - Step 17230: lr=1.00E-05, loss= 1.1160 (max= 1.7322), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:43:58,307 - root - INFO - Step 17230: lr=1.00E-05, loss= 1.1160 (max= 1.7322), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:43:58,307 - root - INFO - Step 17230: lr=1.00E-05, loss= 1.1160 (max= 1.7322), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:44:30,125 - root - INFO - Step 17240: lr=1.00E-05, loss= 1.0956 (max= 1.5101), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:44:30,125 - root - INFO - Step 17240: lr=1.00E-05, loss= 1.0956 (max= 1.5101), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:44:30,125 - root - INFO - Step 17240: lr=1.00E-05, loss= 1.0956 (max= 1.5101), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:44:30,125 - root - INFO - Step 17240: lr=1.00E-05, loss= 1.0956 (max= 1.5101), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:44:30,125 - root - INFO - Step 17240: lr=1.00E-05, loss= 1.0956 (max= 1.5101), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:44:30,125 - root - INFO - Step 17240: lr=1.00E-05, loss= 1.0956 (max= 1.5101), tps=20599, mfu=42.92%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:44:30,125 - root - INFO - Step 17240: lr=1.00E-05, loss= 1.0956 (max= 1.5101), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:44:30,125 - root - INFO - Step 17240: lr=1.00E-05, loss= 1.0956 (max= 1.5101), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:45:02,118 - root - INFO - Step 17250: lr=1.00E-05, loss= 1.1085 (max= 1.5704), tps=20487, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:45:02,118 - root - INFO - Step 17250: lr=1.00E-05, loss= 1.1085 (max= 1.5704), tps=20487, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:45:02,118 - root - INFO - Step 17250: lr=1.00E-05, loss= 1.1085 (max= 1.5704), tps=20487, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:45:02,118 - root - INFO - Step 17250: lr=1.00E-05, loss= 1.1085 (max= 1.5704), tps=20487, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:45:02,118 - root - INFO - Step 17250: lr=1.00E-05, loss= 1.1085 (max= 1.5704), tps=20487, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:45:02,118 - root - INFO - Step 17250: lr=1.00E-05, loss= 1.1085 (max= 1.5704), tps=20487, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:45:02,118 - root - INFO - Step 17250: lr=1.00E-05, loss= 1.1085 (max= 1.5704), tps=20487, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:45:02,118 - root - INFO - Step 17250: lr=1.00E-05, loss= 1.1085 (max= 1.5704), tps=20487, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:45:33,886 - root - INFO - Step 17260: lr=1.00E-05, 
loss= 1.0886 (max= 1.6533), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:45:33,886 - root - INFO - Step 17260: lr=1.00E-05, loss= 1.0886 (max= 1.6533), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:45:33,886 - root - INFO - Step 17260: lr=1.00E-05, loss= 1.0886 (max= 1.6533), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:45:33,886 - root - INFO - Step 17260: lr=1.00E-05, loss= 1.0886 (max= 1.6533), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:45:33,886 - root - INFO - Step 17260: lr=1.00E-05, loss= 1.0886 (max= 1.6533), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:45:33,886 - root - INFO - Step 17260: lr=1.00E-05, loss= 1.0886 (max= 1.6533), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:45:33,886 - root - INFO - Step 17260: lr=1.00E-05, loss= 1.0886 (max= 1.6533), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:45:33,886 - root - INFO - Step 17260: lr=1.00E-05, loss= 1.0886 (max= 1.6533), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:46:05,765 - root - INFO - Step 17270: lr=1.00E-05, loss= 1.1016 (max= 1.5586), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:46:05,765 - root - INFO - Step 17270: lr=1.00E-05, loss= 1.1016 (max= 1.5586), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:46:05,765 - root - INFO - Step 17270: lr=1.00E-05, loss= 1.1016 (max= 1.5586), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
01:46:05,765 - root - INFO - Step 17270: lr=1.00E-05, loss= 1.1016 (max= 1.5586), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:46:05,765 - root - INFO - Step 17270: lr=1.00E-05, loss= 1.1016 (max= 1.5586), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:46:05,765 - root - INFO - Step 17270: lr=1.00E-05, loss= 1.1016 (max= 1.5586), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:46:05,765 - root - INFO - Step 17270: lr=1.00E-05, loss= 1.1016 (max= 1.5586), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:46:05,765 - root - INFO - Step 17270: lr=1.00E-05, loss= 1.1016 (max= 1.5586), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:46:37,519 - root - INFO - Step 17280: lr=1.00E-05, loss= 1.0958 (max= 1.5840), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:46:37,520 - root - INFO - Step 17280: lr=1.00E-05, loss= 1.0958 (max= 1.5840), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:46:37,520 - root - INFO - Step 17280: lr=1.00E-05, loss= 1.0958 (max= 1.5840), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:46:37,520 - root - INFO - Step 17280: lr=1.00E-05, loss= 1.0958 (max= 1.5840), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:46:37,520 - root - INFO - Step 17280: lr=1.00E-05, loss= 1.0958 (max= 1.5840), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:46:37,520 - root - INFO - Step 17280: lr=1.00E-05, loss= 1.0958 (max= 1.5840), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:46:37,520 - root - INFO - Step 17280: lr=1.00E-05, loss= 1.0958 (max= 1.5840), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:46:37,520 - root - INFO - Step 17280: lr=1.00E-05, loss= 1.0958 (max= 1.5840), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:47:09,311 - root - INFO - Step 17290: lr=1.00E-05, loss= 1.0904 (max= 1.4940), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-26 01:47:09,311 - root - INFO - Step 17290: lr=1.00E-05, loss= 1.0904 (max= 1.4940), tps=20617, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-26 01:47:09,311 - root - INFO - Step 17290: lr=1.00E-05, loss= 1.0904 (max= 1.4940), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-26 01:47:09,311 - root - INFO - Step 17290: lr=1.00E-05, loss= 1.0904 (max= 1.4940), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-26 01:47:09,312 - root - INFO - Step 17290: lr=1.00E-05, loss= 1.0904 (max= 1.4940), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-26 01:47:09,312 - root - INFO - Step 17290: lr=1.00E-05, loss= 1.0904 (max= 1.4940), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-26 01:47:09,312 - root - INFO - Step 17290: lr=1.00E-05, loss= 1.0904 (max= 1.4940), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-26 01:47:09,312 - root - INFO - Step 17290: lr=1.00E-05, loss= 1.0904 (max= 1.4940), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-26 01:47:41,162 - root - INFO - Step 17300: lr=1.00E-05, loss= 1.0707 (max= 1.5240), 
tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:47:41,162 - root - INFO - Step 17300: lr=1.00E-05, loss= 1.0707 (max= 1.5240), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:47:41,162 - root - INFO - Step 17300: lr=1.00E-05, loss= 1.0707 (max= 1.5240), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:47:41,162 - root - INFO - Step 17300: lr=1.00E-05, loss= 1.0707 (max= 1.5240), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:47:41,162 - root - INFO - Step 17300: lr=1.00E-05, loss= 1.0707 (max= 1.5240), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:47:41,162 - root - INFO - Step 17300: lr=1.00E-05, loss= 1.0707 (max= 1.5240), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:47:41,162 - root - INFO - Step 17300: lr=1.00E-05, loss= 1.0707 (max= 1.5240), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:47:41,162 - root - INFO - Step 17300: lr=1.00E-05, loss= 1.0707 (max= 1.5240), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:48:13,036 - root - INFO - Step 17310: lr=1.00E-05, loss= 1.0760 (max= 1.6961), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:48:13,036 - root - INFO - Step 17310: lr=1.00E-05, loss= 1.0760 (max= 1.6961), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:48:13,036 - root - INFO - Step 17310: lr=1.00E-05, loss= 1.0760 (max= 1.6961), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:48:13,036 - root - INFO - Step 
17310: lr=1.00E-05, loss= 1.0760 (max= 1.6961), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:48:13,036 - root - INFO - Step 17310: lr=1.00E-05, loss= 1.0760 (max= 1.6961), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:48:13,036 - root - INFO - Step 17310: lr=1.00E-05, loss= 1.0760 (max= 1.6961), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:48:13,036 - root - INFO - Step 17310: lr=1.00E-05, loss= 1.0760 (max= 1.6961), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:48:13,036 - root - INFO - Step 17310: lr=1.00E-05, loss= 1.0760 (max= 1.6961), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:48:44,860 - root - INFO - Step 17320: lr=1.00E-05, loss= 1.0826 (max= 1.5821), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:48:44,860 - root - INFO - Step 17320: lr=1.00E-05, loss= 1.0826 (max= 1.5821), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:48:44,860 - root - INFO - Step 17320: lr=1.00E-05, loss= 1.0826 (max= 1.5821), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:48:44,860 - root - INFO - Step 17320: lr=1.00E-05, loss= 1.0826 (max= 1.5821), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:48:44,860 - root - INFO - Step 17320: lr=1.00E-05, loss= 1.0826 (max= 1.5821), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:48:44,860 - root - INFO - Step 17320: lr=1.00E-05, loss= 1.0826 (max= 1.5821), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 01:48:44,860 - root - INFO - Step 17320: lr=1.00E-05, loss= 1.0826 (max= 1.5821), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:48:44,860 - root - INFO - Step 17320: lr=1.00E-05, loss= 1.0826 (max= 1.5821), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:49:16,717 - root - INFO - Step 17330: lr=1.00E-05, loss= 1.1037 (max= 1.7144), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:49:16,717 - root - INFO - Step 17330: lr=1.00E-05, loss= 1.1037 (max= 1.7144), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:49:16,717 - root - INFO - Step 17330: lr=1.00E-05, loss= 1.1037 (max= 1.7144), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:49:16,717 - root - INFO - Step 17330: lr=1.00E-05, loss= 1.1037 (max= 1.7144), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:49:16,717 - root - INFO - Step 17330: lr=1.00E-05, loss= 1.1037 (max= 1.7144), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:49:16,717 - root - INFO - Step 17330: lr=1.00E-05, loss= 1.1037 (max= 1.7144), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:49:16,717 - root - INFO - Step 17330: lr=1.00E-05, loss= 1.1037 (max= 1.7144), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:49:16,717 - root - INFO - Step 17330: lr=1.00E-05, loss= 1.1037 (max= 1.7144), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:49:48,612 - root - INFO - Step 17340: lr=1.00E-05, loss= 1.0792 (max= 1.5271), tps=20549, mfu=42.81%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:49:48,612 - root - INFO - Step 17340: lr=1.00E-05, loss= 1.0792 (max= 1.5271), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:49:48,612 - root - INFO - Step 17340: lr=1.00E-05, loss= 1.0792 (max= 1.5271), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:49:48,612 - root - INFO - Step 17340: lr=1.00E-05, loss= 1.0792 (max= 1.5271), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:49:48,612 - root - INFO - Step 17340: lr=1.00E-05, loss= 1.0792 (max= 1.5271), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:49:48,612 - root - INFO - Step 17340: lr=1.00E-05, loss= 1.0792 (max= 1.5271), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:49:48,612 - root - INFO - Step 17340: lr=1.00E-05, loss= 1.0792 (max= 1.5271), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:49:48,613 - root - INFO - Step 17340: lr=1.00E-05, loss= 1.0792 (max= 1.5271), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:50:20,415 - root - INFO - Step 17350: lr=1.00E-05, loss= 1.0721 (max= 1.5215), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:50:20,415 - root - INFO - Step 17350: lr=1.00E-05, loss= 1.0721 (max= 1.5215), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:50:20,415 - root - INFO - Step 17350: lr=1.00E-05, loss= 1.0721 (max= 1.5215), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:50:20,415 - root - INFO - Step 17350: lr=1.00E-05, loss= 1.0721 
(max= 1.5215), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:50:20,415 - root - INFO - Step 17350: lr=1.00E-05, loss= 1.0721 (max= 1.5215), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:50:20,415 - root - INFO - Step 17350: lr=1.00E-05, loss= 1.0721 (max= 1.5215), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:50:20,415 - root - INFO - Step 17350: lr=1.00E-05, loss= 1.0721 (max= 1.5215), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:50:20,415 - root - INFO - Step 17350: lr=1.00E-05, loss= 1.0721 (max= 1.5215), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:50:52,308 - root - INFO - Step 17360: lr=1.00E-05, loss= 1.0887 (max= 1.4553), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:50:52,308 - root - INFO - Step 17360: lr=1.00E-05, loss= 1.0887 (max= 1.4553), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:50:52,308 - root - INFO - Step 17360: lr=1.00E-05, loss= 1.0887 (max= 1.4553), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:50:52,308 - root - INFO - Step 17360: lr=1.00E-05, loss= 1.0887 (max= 1.4553), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:50:52,308 - root - INFO - Step 17360: lr=1.00E-05, loss= 1.0887 (max= 1.4553), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:50:52,308 - root - INFO - Step 17360: lr=1.00E-05, loss= 1.0887 (max= 1.4553), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:50:52,308 - root 
- INFO - Step 17360: lr=1.00E-05, loss= 1.0887 (max= 1.4553), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:50:52,308 - root - INFO - Step 17360: lr=1.00E-05, loss= 1.0887 (max= 1.4553), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:51:24,158 - root - INFO - Step 17370: lr=1.00E-05, loss= 1.0714 (max= 1.4933), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:51:24,158 - root - INFO - Step 17370: lr=1.00E-05, loss= 1.0714 (max= 1.4933), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:51:24,158 - root - INFO - Step 17370: lr=1.00E-05, loss= 1.0714 (max= 1.4933), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:51:24,158 - root - INFO - Step 17370: lr=1.00E-05, loss= 1.0714 (max= 1.4933), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:51:24,158 - root - INFO - Step 17370: lr=1.00E-05, loss= 1.0714 (max= 1.4933), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:51:24,158 - root - INFO - Step 17370: lr=1.00E-05, loss= 1.0714 (max= 1.4933), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:51:24,158 - root - INFO - Step 17370: lr=1.00E-05, loss= 1.0714 (max= 1.4933), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:51:24,158 - root - INFO - Step 17370: lr=1.00E-05, loss= 1.0714 (max= 1.4933), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:51:55,958 - root - INFO - Step 17380: lr=1.00E-05, loss= 1.0967 (max= 1.6442), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 01:51:55,958 - root - INFO - Step 17380: lr=1.00E-05, loss= 1.0967 (max= 1.6442), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:51:55,958 - root - INFO - Step 17380: lr=1.00E-05, loss= 1.0967 (max= 1.6442), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:51:55,958 - root - INFO - Step 17380: lr=1.00E-05, loss= 1.0967 (max= 1.6442), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:51:55,958 - root - INFO - Step 17380: lr=1.00E-05, loss= 1.0967 (max= 1.6442), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:51:55,958 - root - INFO - Step 17380: lr=1.00E-05, loss= 1.0967 (max= 1.6442), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:51:55,958 - root - INFO - Step 17380: lr=1.00E-05, loss= 1.0967 (max= 1.6442), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:51:55,958 - root - INFO - Step 17380: lr=1.00E-05, loss= 1.0967 (max= 1.6442), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:52:27,733 - root - INFO - Step 17390: lr=1.00E-05, loss= 1.0579 (max= 1.6226), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:52:27,734 - root - INFO - Step 17390: lr=1.00E-05, loss= 1.0579 (max= 1.6226), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:52:27,734 - root - INFO - Step 17390: lr=1.00E-05, loss= 1.0579 (max= 1.6226), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:52:27,734 - root - INFO - Step 17390: lr=1.00E-05, loss= 1.0579 (max= 1.6226), tps=20627, mfu=42.98%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:52:27,734 - root - INFO - Step 17390: lr=1.00E-05, loss= 1.0579 (max= 1.6226), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:52:27,734 - root - INFO - Step 17390: lr=1.00E-05, loss= 1.0579 (max= 1.6226), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:52:27,734 - root - INFO - Step 17390: lr=1.00E-05, loss= 1.0579 (max= 1.6226), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:52:27,734 - root - INFO - Step 17390: lr=1.00E-05, loss= 1.0579 (max= 1.6226), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:52:59,568 - root - INFO - Step 17400: lr=1.00E-05, loss= 1.0698 (max= 1.9088), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:52:59,568 - root - INFO - Step 17400: lr=1.00E-05, loss= 1.0698 (max= 1.9088), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:52:59,568 - root - INFO - Step 17400: lr=1.00E-05, loss= 1.0698 (max= 1.9088), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:52:59,568 - root - INFO - Step 17400: lr=1.00E-05, loss= 1.0698 (max= 1.9088), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:52:59,568 - root - INFO - Step 17400: lr=1.00E-05, loss= 1.0698 (max= 1.9088), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:52:59,569 - root - INFO - Step 17400: lr=1.00E-05, loss= 1.0698 (max= 1.9088), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:52:59,569 - root - INFO - Step 17400: lr=1.00E-05, 
loss= 1.0698 (max= 1.9088), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:52:59,569 - root - INFO - Step 17400: lr=1.00E-05, loss= 1.0698 (max= 1.9088), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:53:31,356 - root - INFO - Step 17410: lr=1.00E-05, loss= 1.0756 (max= 1.5133), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:53:31,356 - root - INFO - Step 17410: lr=1.00E-05, loss= 1.0756 (max= 1.5133), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:53:31,356 - root - INFO - Step 17410: lr=1.00E-05, loss= 1.0756 (max= 1.5133), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:53:31,356 - root - INFO - Step 17410: lr=1.00E-05, loss= 1.0756 (max= 1.5133), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:53:31,356 - root - INFO - Step 17410: lr=1.00E-05, loss= 1.0756 (max= 1.5133), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:53:31,356 - root - INFO - Step 17410: lr=1.00E-05, loss= 1.0756 (max= 1.5133), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:53:31,356 - root - INFO - Step 17410: lr=1.00E-05, loss= 1.0756 (max= 1.5133), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:53:31,356 - root - INFO - Step 17410: lr=1.00E-05, loss= 1.0756 (max= 1.5133), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:54:03,250 - root - INFO - Step 17420: lr=1.00E-05, loss= 1.0808 (max= 1.4804), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
01:54:03,250 - root - INFO - Step 17420: lr=1.00E-05, loss= 1.0808 (max= 1.4804), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:54:03,250 - root - INFO - Step 17420: lr=1.00E-05, loss= 1.0808 (max= 1.4804), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:54:03,250 - root - INFO - Step 17420: lr=1.00E-05, loss= 1.0808 (max= 1.4804), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:54:03,250 - root - INFO - Step 17420: lr=1.00E-05, loss= 1.0808 (max= 1.4804), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:54:03,250 - root - INFO - Step 17420: lr=1.00E-05, loss= 1.0808 (max= 1.4804), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:54:03,250 - root - INFO - Step 17420: lr=1.00E-05, loss= 1.0808 (max= 1.4804), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:54:03,251 - root - INFO - Step 17420: lr=1.00E-05, loss= 1.0808 (max= 1.4804), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:54:35,097 - root - INFO - Step 17430: lr=1.00E-05, loss= 1.0807 (max= 1.6065), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:54:35,098 - root - INFO - Step 17430: lr=1.00E-05, loss= 1.0807 (max= 1.6065), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:54:35,098 - root - INFO - Step 17430: lr=1.00E-05, loss= 1.0807 (max= 1.6065), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:54:35,098 - root - INFO - Step 17430: lr=1.00E-05, loss= 1.0807 (max= 1.6065), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:54:35,098 - root - INFO - Step 17430: lr=1.00E-05, loss= 1.0807 (max= 1.6065), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:54:35,098 - root - INFO - Step 17430: lr=1.00E-05, loss= 1.0807 (max= 1.6065), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:54:35,098 - root - INFO - Step 17430: lr=1.00E-05, loss= 1.0807 (max= 1.6065), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:54:35,098 - root - INFO - Step 17430: lr=1.00E-05, loss= 1.0807 (max= 1.6065), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:55:06,907 - root - INFO - Step 17440: lr=1.00E-05, loss= 1.1003 (max= 1.5436), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:55:06,907 - root - INFO - Step 17440: lr=1.00E-05, loss= 1.1003 (max= 1.5436), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:55:06,907 - root - INFO - Step 17440: lr=1.00E-05, loss= 1.1003 (max= 1.5436), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:55:06,907 - root - INFO - Step 17440: lr=1.00E-05, loss= 1.1003 (max= 1.5436), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:55:06,907 - root - INFO - Step 17440: lr=1.00E-05, loss= 1.1003 (max= 1.5436), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:55:06,907 - root - INFO - Step 17440: lr=1.00E-05, loss= 1.1003 (max= 1.5436), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:55:06,907 - root - INFO - Step 17440: lr=1.00E-05, loss= 1.1003 (max= 1.5436), 
tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:55:06,907 - root - INFO - Step 17440: lr=1.00E-05, loss= 1.1003 (max= 1.5436), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:55:38,721 - root - INFO - Step 17450: lr=1.00E-05, loss= 1.0983 (max= 1.7864), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:55:38,721 - root - INFO - Step 17450: lr=1.00E-05, loss= 1.0983 (max= 1.7864), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:55:38,721 - root - INFO - Step 17450: lr=1.00E-05, loss= 1.0983 (max= 1.7864), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:55:38,721 - root - INFO - Step 17450: lr=1.00E-05, loss= 1.0983 (max= 1.7864), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:55:38,721 - root - INFO - Step 17450: lr=1.00E-05, loss= 1.0983 (max= 1.7864), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:55:38,721 - root - INFO - Step 17450: lr=1.00E-05, loss= 1.0983 (max= 1.7864), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:55:38,721 - root - INFO - Step 17450: lr=1.00E-05, loss= 1.0983 (max= 1.7864), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:55:38,721 - root - INFO - Step 17450: lr=1.00E-05, loss= 1.0983 (max= 1.7864), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:56:10,520 - root - INFO - Step 17460: lr=1.00E-05, loss= 1.0990 (max= 1.5079), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:56:10,520 - root - INFO - Step 
17460: lr=1.00E-05, loss= 1.0990 (max= 1.5079), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:56:10,520 - root - INFO - Step 17460: lr=1.00E-05, loss= 1.0990 (max= 1.5079), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:56:10,520 - root - INFO - Step 17460: lr=1.00E-05, loss= 1.0990 (max= 1.5079), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:56:10,521 - root - INFO - Step 17460: lr=1.00E-05, loss= 1.0990 (max= 1.5079), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:56:10,520 - root - INFO - Step 17460: lr=1.00E-05, loss= 1.0990 (max= 1.5079), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:56:10,521 - root - INFO - Step 17460: lr=1.00E-05, loss= 1.0990 (max= 1.5079), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:56:10,521 - root - INFO - Step 17460: lr=1.00E-05, loss= 1.0990 (max= 1.5079), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:56:42,369 - root - INFO - Step 17470: lr=1.00E-05, loss= 1.0877 (max= 1.6044), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:56:42,369 - root - INFO - Step 17470: lr=1.00E-05, loss= 1.0877 (max= 1.6044), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:56:42,369 - root - INFO - Step 17470: lr=1.00E-05, loss= 1.0877 (max= 1.6044), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:56:42,369 - root - INFO - Step 17470: lr=1.00E-05, loss= 1.0877 (max= 1.6044), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 01:56:42,369 - root - INFO - Step 17470: lr=1.00E-05, loss= 1.0877 (max= 1.6044), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:56:42,369 - root - INFO - Step 17470: lr=1.00E-05, loss= 1.0877 (max= 1.6044), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:56:42,370 - root - INFO - Step 17470: lr=1.00E-05, loss= 1.0877 (max= 1.6044), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:56:42,370 - root - INFO - Step 17470: lr=1.00E-05, loss= 1.0877 (max= 1.6044), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:57:14,275 - root - INFO - Step 17480: lr=1.00E-05, loss= 1.0946 (max= 1.6277), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:57:14,275 - root - INFO - Step 17480: lr=1.00E-05, loss= 1.0946 (max= 1.6277), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:57:14,275 - root - INFO - Step 17480: lr=1.00E-05, loss= 1.0946 (max= 1.6277), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:57:14,275 - root - INFO - Step 17480: lr=1.00E-05, loss= 1.0946 (max= 1.6277), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:57:14,275 - root - INFO - Step 17480: lr=1.00E-05, loss= 1.0946 (max= 1.6277), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:57:14,275 - root - INFO - Step 17480: lr=1.00E-05, loss= 1.0946 (max= 1.6277), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:57:14,275 - root - INFO - Step 17480: lr=1.00E-05, loss= 1.0946 (max= 1.6277), tps=20543, mfu=42.80%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:57:14,276 - root - INFO - Step 17480: lr=1.00E-05, loss= 1.0946 (max= 1.6277), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:57:46,148 - root - INFO - Step 17490: lr=1.00E-05, loss= 1.1159 (max= 1.6452), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:57:46,149 - root - INFO - Step 17490: lr=1.00E-05, loss= 1.1159 (max= 1.6452), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:57:46,149 - root - INFO - Step 17490: lr=1.00E-05, loss= 1.1159 (max= 1.6452), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:57:46,149 - root - INFO - Step 17490: lr=1.00E-05, loss= 1.1159 (max= 1.6452), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:57:46,149 - root - INFO - Step 17490: lr=1.00E-05, loss= 1.1159 (max= 1.6452), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:57:46,149 - root - INFO - Step 17490: lr=1.00E-05, loss= 1.1159 (max= 1.6452), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:57:46,149 - root - INFO - Step 17490: lr=1.00E-05, loss= 1.1159 (max= 1.6452), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:57:46,149 - root - INFO - Step 17490: lr=1.00E-05, loss= 1.1159 (max= 1.6452), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:57:49,897 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:790296 +2025-10-26 01:58:17,992 - root - INFO - Step 17500: lr=1.00E-05, loss= 1.0935 
(max= 1.4853), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:58:17,992 - root - INFO - Step 17500: lr=1.00E-05, loss= 1.0935 (max= 1.4853), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:58:17,992 - root - INFO - Step 17500: lr=1.00E-05, loss= 1.0935 (max= 1.4853), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:58:17,992 - root - INFO - Step 17500: lr=1.00E-05, loss= 1.0935 (max= 1.4853), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:58:17,992 - root - INFO - Step 17500: lr=1.00E-05, loss= 1.0935 (max= 1.4853), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:58:17,992 - root - INFO - Step 17500: lr=1.00E-05, loss= 1.0935 (max= 1.4853), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:58:17,992 - root - INFO - Step 17500: lr=1.00E-05, loss= 1.0935 (max= 1.4853), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:58:17,992 - root - INFO - Step 17500: lr=1.00E-05, loss= 1.0935 (max= 1.4853), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:58:49,795 - root - INFO - Step 17510: lr=1.00E-05, loss= 1.0939 (max= 1.5264), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:58:49,795 - root - INFO - Step 17510: lr=1.00E-05, loss= 1.0939 (max= 1.5264), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:58:49,795 - root - INFO - Step 17510: lr=1.00E-05, loss= 1.0939 (max= 1.5264), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:58:49,795 - root 
- INFO - Step 17510: lr=1.00E-05, loss= 1.0939 (max= 1.5264), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:58:49,795 - root - INFO - Step 17510: lr=1.00E-05, loss= 1.0939 (max= 1.5264), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:58:49,795 - root - INFO - Step 17510: lr=1.00E-05, loss= 1.0939 (max= 1.5264), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:58:49,795 - root - INFO - Step 17510: lr=1.00E-05, loss= 1.0939 (max= 1.5264), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:58:49,795 - root - INFO - Step 17510: lr=1.00E-05, loss= 1.0939 (max= 1.5264), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:59:21,663 - root - INFO - Step 17520: lr=1.00E-05, loss= 1.0740 (max= 1.7682), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:59:21,663 - root - INFO - Step 17520: lr=1.00E-05, loss= 1.0740 (max= 1.7682), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:59:21,663 - root - INFO - Step 17520: lr=1.00E-05, loss= 1.0740 (max= 1.7682), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:59:21,663 - root - INFO - Step 17520: lr=1.00E-05, loss= 1.0740 (max= 1.7682), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:59:21,663 - root - INFO - Step 17520: lr=1.00E-05, loss= 1.0740 (max= 1.7682), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:59:21,663 - root - INFO - Step 17520: lr=1.00E-05, loss= 1.0740 (max= 1.7682), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 01:59:21,663 - root - INFO - Step 17520: lr=1.00E-05, loss= 1.0740 (max= 1.7682), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:59:21,663 - root - INFO - Step 17520: lr=1.00E-05, loss= 1.0740 (max= 1.7682), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:59:53,493 - root - INFO - Step 17530: lr=1.00E-05, loss= 1.0824 (max= 1.5040), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:59:53,493 - root - INFO - Step 17530: lr=1.00E-05, loss= 1.0824 (max= 1.5040), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:59:53,493 - root - INFO - Step 17530: lr=1.00E-05, loss= 1.0824 (max= 1.5040), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:59:53,493 - root - INFO - Step 17530: lr=1.00E-05, loss= 1.0824 (max= 1.5040), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:59:53,493 - root - INFO - Step 17530: lr=1.00E-05, loss= 1.0824 (max= 1.5040), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:59:53,493 - root - INFO - Step 17530: lr=1.00E-05, loss= 1.0824 (max= 1.5040), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:59:53,493 - root - INFO - Step 17530: lr=1.00E-05, loss= 1.0824 (max= 1.5040), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 01:59:53,493 - root - INFO - Step 17530: lr=1.00E-05, loss= 1.0824 (max= 1.5040), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:25,352 - root - INFO - Step 17540: lr=1.00E-05, loss= 1.0539 (max= 1.5508), tps=20572, mfu=42.86%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:25,353 - root - INFO - Step 17540: lr=1.00E-05, loss= 1.0539 (max= 1.5508), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:25,353 - root - INFO - Step 17540: lr=1.00E-05, loss= 1.0539 (max= 1.5508), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:25,353 - root - INFO - Step 17540: lr=1.00E-05, loss= 1.0539 (max= 1.5508), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:25,353 - root - INFO - Step 17540: lr=1.00E-05, loss= 1.0539 (max= 1.5508), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:25,353 - root - INFO - Step 17540: lr=1.00E-05, loss= 1.0539 (max= 1.5508), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:25,353 - root - INFO - Step 17540: lr=1.00E-05, loss= 1.0539 (max= 1.5508), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:25,353 - root - INFO - Step 17540: lr=1.00E-05, loss= 1.0539 (max= 1.5508), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:57,197 - root - INFO - Step 17550: lr=1.00E-05, loss= 1.0893 (max= 1.5097), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:57,197 - root - INFO - Step 17550: lr=1.00E-05, loss= 1.0893 (max= 1.5097), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:57,197 - root - INFO - Step 17550: lr=1.00E-05, loss= 1.0893 (max= 1.5097), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:57,197 - root - INFO - Step 17550: lr=1.00E-05, 
loss= 1.0893 (max= 1.5097), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:57,197 - root - INFO - Step 17550: lr=1.00E-05, loss= 1.0893 (max= 1.5097), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:57,197 - root - INFO - Step 17550: lr=1.00E-05, loss= 1.0893 (max= 1.5097), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:57,197 - root - INFO - Step 17550: lr=1.00E-05, loss= 1.0893 (max= 1.5097), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:57,197 - root - INFO - Step 17550: lr=1.00E-05, loss= 1.0893 (max= 1.5097), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:29,142 - root - INFO - Step 17560: lr=1.00E-05, loss= 1.1108 (max= 1.5130), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:29,142 - root - INFO - Step 17560: lr=1.00E-05, loss= 1.1108 (max= 1.5130), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:29,142 - root - INFO - Step 17560: lr=1.00E-05, loss= 1.1108 (max= 1.5130), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:29,142 - root - INFO - Step 17560: lr=1.00E-05, loss= 1.1108 (max= 1.5130), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:29,143 - root - INFO - Step 17560: lr=1.00E-05, loss= 1.1108 (max= 1.5130), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:29,143 - root - INFO - Step 17560: lr=1.00E-05, loss= 1.1108 (max= 1.5130), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
02:01:29,143 - root - INFO - Step 17560: lr=1.00E-05, loss= 1.1108 (max= 1.5130), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:29,143 - root - INFO - Step 17560: lr=1.00E-05, loss= 1.1108 (max= 1.5130), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:00,975 - root - INFO - Step 17570: lr=1.00E-05, loss= 1.1279 (max= 1.6149), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:00,975 - root - INFO - Step 17570: lr=1.00E-05, loss= 1.1279 (max= 1.6149), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:00,975 - root - INFO - Step 17570: lr=1.00E-05, loss= 1.1279 (max= 1.6149), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:00,975 - root - INFO - Step 17570: lr=1.00E-05, loss= 1.1279 (max= 1.6149), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:00,975 - root - INFO - Step 17570: lr=1.00E-05, loss= 1.1279 (max= 1.6149), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:00,975 - root - INFO - Step 17570: lr=1.00E-05, loss= 1.1279 (max= 1.6149), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:00,976 - root - INFO - Step 17570: lr=1.00E-05, loss= 1.1279 (max= 1.6149), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:00,976 - root - INFO - Step 17570: lr=1.00E-05, loss= 1.1279 (max= 1.6149), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:32,783 - root - INFO - Step 17580: lr=1.00E-05, loss= 1.0862 (max= 1.4910), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:32,783 - root - INFO - Step 17580: lr=1.00E-05, loss= 1.0862 (max= 1.4910), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:32,783 - root - INFO - Step 17580: lr=1.00E-05, loss= 1.0862 (max= 1.4910), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:32,783 - root - INFO - Step 17580: lr=1.00E-05, loss= 1.0862 (max= 1.4910), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:32,783 - root - INFO - Step 17580: lr=1.00E-05, loss= 1.0862 (max= 1.4910), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:32,783 - root - INFO - Step 17580: lr=1.00E-05, loss= 1.0862 (max= 1.4910), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:32,783 - root - INFO - Step 17580: lr=1.00E-05, loss= 1.0862 (max= 1.4910), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:32,783 - root - INFO - Step 17580: lr=1.00E-05, loss= 1.0862 (max= 1.4910), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:04,630 - root - INFO - Step 17590: lr=1.00E-05, loss= 1.1211 (max= 1.5450), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:04,630 - root - INFO - Step 17590: lr=1.00E-05, loss= 1.1211 (max= 1.5450), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:04,630 - root - INFO - Step 17590: lr=1.00E-05, loss= 1.1211 (max= 1.5450), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:04,630 - root - INFO - Step 17590: lr=1.00E-05, loss= 1.1211 (max= 1.5450), 
tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:04,630 - root - INFO - Step 17590: lr=1.00E-05, loss= 1.1211 (max= 1.5450), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:04,630 - root - INFO - Step 17590: lr=1.00E-05, loss= 1.1211 (max= 1.5450), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:04,630 - root - INFO - Step 17590: lr=1.00E-05, loss= 1.1211 (max= 1.5450), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:04,630 - root - INFO - Step 17590: lr=1.00E-05, loss= 1.1211 (max= 1.5450), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:36,536 - root - INFO - Step 17600: lr=1.00E-05, loss= 1.0679 (max= 1.4747), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:36,536 - root - INFO - Step 17600: lr=1.00E-05, loss= 1.0679 (max= 1.4747), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:36,536 - root - INFO - Step 17600: lr=1.00E-05, loss= 1.0679 (max= 1.4747), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:36,536 - root - INFO - Step 17600: lr=1.00E-05, loss= 1.0679 (max= 1.4747), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:36,536 - root - INFO - Step 17600: lr=1.00E-05, loss= 1.0679 (max= 1.4747), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:36,536 - root - INFO - Step 17600: lr=1.00E-05, loss= 1.0679 (max= 1.4747), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:36,536 - root - INFO - Step 
17600: lr=1.00E-05, loss= 1.0679 (max= 1.4747), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:36,536 - root - INFO - Step 17600: lr=1.00E-05, loss= 1.0679 (max= 1.4747), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:08,303 - root - INFO - Step 17610: lr=1.00E-05, loss= 1.0935 (max= 1.6234), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:08,303 - root - INFO - Step 17610: lr=1.00E-05, loss= 1.0935 (max= 1.6234), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:08,303 - root - INFO - Step 17610: lr=1.00E-05, loss= 1.0935 (max= 1.6234), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:08,303 - root - INFO - Step 17610: lr=1.00E-05, loss= 1.0935 (max= 1.6234), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:08,303 - root - INFO - Step 17610: lr=1.00E-05, loss= 1.0935 (max= 1.6234), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:08,303 - root - INFO - Step 17610: lr=1.00E-05, loss= 1.0935 (max= 1.6234), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:08,303 - root - INFO - Step 17610: lr=1.00E-05, loss= 1.0935 (max= 1.6234), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:08,303 - root - INFO - Step 17610: lr=1.00E-05, loss= 1.0935 (max= 1.6234), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:40,154 - root - INFO - Step 17620: lr=1.00E-05, loss= 1.1019 (max= 1.6578), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 02:04:40,154 - root - INFO - Step 17620: lr=1.00E-05, loss= 1.1019 (max= 1.6578), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:40,154 - root - INFO - Step 17620: lr=1.00E-05, loss= 1.1019 (max= 1.6578), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:40,154 - root - INFO - Step 17620: lr=1.00E-05, loss= 1.1019 (max= 1.6578), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:40,154 - root - INFO - Step 17620: lr=1.00E-05, loss= 1.1019 (max= 1.6578), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:40,154 - root - INFO - Step 17620: lr=1.00E-05, loss= 1.1019 (max= 1.6578), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:40,154 - root - INFO - Step 17620: lr=1.00E-05, loss= 1.1019 (max= 1.6578), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:40,154 - root - INFO - Step 17620: lr=1.00E-05, loss= 1.1019 (max= 1.6578), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:11,998 - root - INFO - Step 17630: lr=1.00E-05, loss= 1.0723 (max= 1.6977), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:11,998 - root - INFO - Step 17630: lr=1.00E-05, loss= 1.0723 (max= 1.6977), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:11,998 - root - INFO - Step 17630: lr=1.00E-05, loss= 1.0723 (max= 1.6977), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:11,998 - root - INFO - Step 17630: lr=1.00E-05, loss= 1.0723 (max= 1.6977), tps=20583, mfu=42.88%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:11,998 - root - INFO - Step 17630: lr=1.00E-05, loss= 1.0723 (max= 1.6977), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:11,998 - root - INFO - Step 17630: lr=1.00E-05, loss= 1.0723 (max= 1.6977), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:11,998 - root - INFO - Step 17630: lr=1.00E-05, loss= 1.0723 (max= 1.6977), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:11,998 - root - INFO - Step 17630: lr=1.00E-05, loss= 1.0723 (max= 1.6977), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:43,818 - root - INFO - Step 17640: lr=1.00E-05, loss= 1.0605 (max= 1.5650), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:43,818 - root - INFO - Step 17640: lr=1.00E-05, loss= 1.0605 (max= 1.5650), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:43,818 - root - INFO - Step 17640: lr=1.00E-05, loss= 1.0605 (max= 1.5650), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:43,818 - root - INFO - Step 17640: lr=1.00E-05, loss= 1.0605 (max= 1.5650), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:43,818 - root - INFO - Step 17640: lr=1.00E-05, loss= 1.0605 (max= 1.5650), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:43,818 - root - INFO - Step 17640: lr=1.00E-05, loss= 1.0605 (max= 1.5650), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:43,818 - root - INFO - Step 17640: lr=1.00E-05, loss= 1.0605 
(max= 1.5650), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:43,818 - root - INFO - Step 17640: lr=1.00E-05, loss= 1.0605 (max= 1.5650), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:15,722 - root - INFO - Step 17650: lr=1.00E-05, loss= 1.1044 (max= 1.6292), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:15,722 - root - INFO - Step 17650: lr=1.00E-05, loss= 1.1044 (max= 1.6292), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:15,722 - root - INFO - Step 17650: lr=1.00E-05, loss= 1.1044 (max= 1.6292), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:15,722 - root - INFO - Step 17650: lr=1.00E-05, loss= 1.1044 (max= 1.6292), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:15,722 - root - INFO - Step 17650: lr=1.00E-05, loss= 1.1044 (max= 1.6292), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:15,722 - root - INFO - Step 17650: lr=1.00E-05, loss= 1.1044 (max= 1.6292), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:15,722 - root - INFO - Step 17650: lr=1.00E-05, loss= 1.1044 (max= 1.6292), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:15,722 - root - INFO - Step 17650: lr=1.00E-05, loss= 1.1044 (max= 1.6292), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:47,612 - root - INFO - Step 17660: lr=1.00E-05, loss= 1.0750 (max= 1.5208), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:47,612 - root 
- INFO - Step 17660: lr=1.00E-05, loss= 1.0750 (max= 1.5208), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:47,612 - root - INFO - Step 17660: lr=1.00E-05, loss= 1.0750 (max= 1.5208), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:47,612 - root - INFO - Step 17660: lr=1.00E-05, loss= 1.0750 (max= 1.5208), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:47,612 - root - INFO - Step 17660: lr=1.00E-05, loss= 1.0750 (max= 1.5208), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:47,612 - root - INFO - Step 17660: lr=1.00E-05, loss= 1.0750 (max= 1.5208), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:47,612 - root - INFO - Step 17660: lr=1.00E-05, loss= 1.0750 (max= 1.5208), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:47,612 - root - INFO - Step 17660: lr=1.00E-05, loss= 1.0750 (max= 1.5208), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:19,459 - root - INFO - Step 17670: lr=1.00E-05, loss= 1.0848 (max= 1.5844), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:19,459 - root - INFO - Step 17670: lr=1.00E-05, loss= 1.0848 (max= 1.5844), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:19,459 - root - INFO - Step 17670: lr=1.00E-05, loss= 1.0848 (max= 1.5844), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:19,459 - root - INFO - Step 17670: lr=1.00E-05, loss= 1.0848 (max= 1.5844), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 02:07:19,459 - root - INFO - Step 17670: lr=1.00E-05, loss= 1.0848 (max= 1.5844), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:19,459 - root - INFO - Step 17670: lr=1.00E-05, loss= 1.0848 (max= 1.5844), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:19,459 - root - INFO - Step 17670: lr=1.00E-05, loss= 1.0848 (max= 1.5844), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:19,459 - root - INFO - Step 17670: lr=1.00E-05, loss= 1.0848 (max= 1.5844), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:51,280 - root - INFO - Step 17680: lr=1.00E-05, loss= 1.1127 (max= 1.6827), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:51,280 - root - INFO - Step 17680: lr=1.00E-05, loss= 1.1127 (max= 1.6827), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:51,280 - root - INFO - Step 17680: lr=1.00E-05, loss= 1.1127 (max= 1.6827), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:51,280 - root - INFO - Step 17680: lr=1.00E-05, loss= 1.1127 (max= 1.6827), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:51,280 - root - INFO - Step 17680: lr=1.00E-05, loss= 1.1127 (max= 1.6827), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:51,280 - root - INFO - Step 17680: lr=1.00E-05, loss= 1.1127 (max= 1.6827), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:51,280 - root - INFO - Step 17680: lr=1.00E-05, loss= 1.1127 (max= 1.6827), tps=20597, mfu=42.91%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:51,280 - root - INFO - Step 17680: lr=1.00E-05, loss= 1.1127 (max= 1.6827), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:23,130 - root - INFO - Step 17690: lr=1.00E-05, loss= 1.1149 (max= 1.6000), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:23,130 - root - INFO - Step 17690: lr=1.00E-05, loss= 1.1149 (max= 1.6000), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:23,130 - root - INFO - Step 17690: lr=1.00E-05, loss= 1.1149 (max= 1.6000), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:23,130 - root - INFO - Step 17690: lr=1.00E-05, loss= 1.1149 (max= 1.6000), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:23,130 - root - INFO - Step 17690: lr=1.00E-05, loss= 1.1149 (max= 1.6000), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:23,130 - root - INFO - Step 17690: lr=1.00E-05, loss= 1.1149 (max= 1.6000), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:23,130 - root - INFO - Step 17690: lr=1.00E-05, loss= 1.1149 (max= 1.6000), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:23,130 - root - INFO - Step 17690: lr=1.00E-05, loss= 1.1149 (max= 1.6000), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:54,988 - root - INFO - Step 17700: lr=1.00E-05, loss= 1.0821 (max= 1.7017), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:54,988 - root - INFO - Step 17700: lr=1.00E-05, 
loss= 1.0821 (max= 1.7017), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:54,988 - root - INFO - Step 17700: lr=1.00E-05, loss= 1.0821 (max= 1.7017), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:54,988 - root - INFO - Step 17700: lr=1.00E-05, loss= 1.0821 (max= 1.7017), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:54,988 - root - INFO - Step 17700: lr=1.00E-05, loss= 1.0821 (max= 1.7017), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:54,988 - root - INFO - Step 17700: lr=1.00E-05, loss= 1.0821 (max= 1.7017), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:54,988 - root - INFO - Step 17700: lr=1.00E-05, loss= 1.0821 (max= 1.7017), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:54,988 - root - INFO - Step 17700: lr=1.00E-05, loss= 1.0821 (max= 1.7017), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:26,792 - root - INFO - Step 17710: lr=1.00E-05, loss= 1.0923 (max= 1.4800), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:26,792 - root - INFO - Step 17710: lr=1.00E-05, loss= 1.0923 (max= 1.4800), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:26,792 - root - INFO - Step 17710: lr=1.00E-05, loss= 1.0923 (max= 1.4800), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:26,792 - root - INFO - Step 17710: lr=1.00E-05, loss= 1.0923 (max= 1.4800), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
02:09:26,792 - root - INFO - Step 17710: lr=1.00E-05, loss= 1.0923 (max= 1.4800), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:26,792 - root - INFO - Step 17710: lr=1.00E-05, loss= 1.0923 (max= 1.4800), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:26,792 - root - INFO - Step 17710: lr=1.00E-05, loss= 1.0923 (max= 1.4800), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:26,793 - root - INFO - Step 17710: lr=1.00E-05, loss= 1.0923 (max= 1.4800), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:58,616 - root - INFO - Step 17720: lr=1.00E-05, loss= 1.1059 (max= 1.5552), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:58,616 - root - INFO - Step 17720: lr=1.00E-05, loss= 1.1059 (max= 1.5552), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:58,616 - root - INFO - Step 17720: lr=1.00E-05, loss= 1.1059 (max= 1.5552), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:58,616 - root - INFO - Step 17720: lr=1.00E-05, loss= 1.1059 (max= 1.5552), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:58,616 - root - INFO - Step 17720: lr=1.00E-05, loss= 1.1059 (max= 1.5552), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:58,616 - root - INFO - Step 17720: lr=1.00E-05, loss= 1.1059 (max= 1.5552), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:58,616 - root - INFO - Step 17720: lr=1.00E-05, loss= 1.1059 (max= 1.5552), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:58,616 - root - INFO - Step 17720: lr=1.00E-05, loss= 1.1059 (max= 1.5552), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:30,467 - root - INFO - Step 17730: lr=1.00E-05, loss= 1.0897 (max= 1.5653), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:30,467 - root - INFO - Step 17730: lr=1.00E-05, loss= 1.0897 (max= 1.5653), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:30,467 - root - INFO - Step 17730: lr=1.00E-05, loss= 1.0897 (max= 1.5653), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:30,467 - root - INFO - Step 17730: lr=1.00E-05, loss= 1.0897 (max= 1.5653), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:30,467 - root - INFO - Step 17730: lr=1.00E-05, loss= 1.0897 (max= 1.5653), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:30,467 - root - INFO - Step 17730: lr=1.00E-05, loss= 1.0897 (max= 1.5653), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:30,467 - root - INFO - Step 17730: lr=1.00E-05, loss= 1.0897 (max= 1.5653), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:30,467 - root - INFO - Step 17730: lr=1.00E-05, loss= 1.0897 (max= 1.5653), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:02,231 - root - INFO - Step 17740: lr=1.00E-05, loss= 1.1097 (max= 1.5773), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:02,232 - root - INFO - Step 17740: lr=1.00E-05, loss= 1.1097 (max= 1.5773), 
tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:02,232 - root - INFO - Step 17740: lr=1.00E-05, loss= 1.1097 (max= 1.5773), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:02,232 - root - INFO - Step 17740: lr=1.00E-05, loss= 1.1097 (max= 1.5773), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:02,232 - root - INFO - Step 17740: lr=1.00E-05, loss= 1.1097 (max= 1.5773), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:02,232 - root - INFO - Step 17740: lr=1.00E-05, loss= 1.1097 (max= 1.5773), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:02,232 - root - INFO - Step 17740: lr=1.00E-05, loss= 1.1097 (max= 1.5773), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:02,232 - root - INFO - Step 17740: lr=1.00E-05, loss= 1.1097 (max= 1.5773), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:34,119 - root - INFO - Step 17750: lr=1.00E-05, loss= 1.0930 (max= 1.5753), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:34,119 - root - INFO - Step 17750: lr=1.00E-05, loss= 1.0930 (max= 1.5753), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:34,119 - root - INFO - Step 17750: lr=1.00E-05, loss= 1.0930 (max= 1.5753), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:34,119 - root - INFO - Step 17750: lr=1.00E-05, loss= 1.0930 (max= 1.5753), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:34,119 - root - INFO - Step 
17750: lr=1.00E-05, loss= 1.0930 (max= 1.5753), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:34,119 - root - INFO - Step 17750: lr=1.00E-05, loss= 1.0930 (max= 1.5753), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:34,119 - root - INFO - Step 17750: lr=1.00E-05, loss= 1.0930 (max= 1.5753), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:34,119 - root - INFO - Step 17750: lr=1.00E-05, loss= 1.0930 (max= 1.5753), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:05,984 - root - INFO - Step 17760: lr=1.00E-05, loss= 1.0844 (max= 1.7023), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:05,985 - root - INFO - Step 17760: lr=1.00E-05, loss= 1.0844 (max= 1.7023), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:05,985 - root - INFO - Step 17760: lr=1.00E-05, loss= 1.0844 (max= 1.7023), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:05,985 - root - INFO - Step 17760: lr=1.00E-05, loss= 1.0844 (max= 1.7023), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:05,985 - root - INFO - Step 17760: lr=1.00E-05, loss= 1.0844 (max= 1.7023), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:05,985 - root - INFO - Step 17760: lr=1.00E-05, loss= 1.0844 (max= 1.7023), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:05,985 - root - INFO - Step 17760: lr=1.00E-05, loss= 1.0844 (max= 1.7023), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 02:12:05,985 - root - INFO - Step 17760: lr=1.00E-05, loss= 1.0844 (max= 1.7023), tps=20569, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:37,903 - root - INFO - Step 17770: lr=1.00E-05, loss= 1.0891 (max= 1.5064), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:37,903 - root - INFO - Step 17770: lr=1.00E-05, loss= 1.0891 (max= 1.5064), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:37,904 - root - INFO - Step 17770: lr=1.00E-05, loss= 1.0891 (max= 1.5064), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:37,904 - root - INFO - Step 17770: lr=1.00E-05, loss= 1.0891 (max= 1.5064), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:37,904 - root - INFO - Step 17770: lr=1.00E-05, loss= 1.0891 (max= 1.5064), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:37,904 - root - INFO - Step 17770: lr=1.00E-05, loss= 1.0891 (max= 1.5064), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:37,904 - root - INFO - Step 17770: lr=1.00E-05, loss= 1.0891 (max= 1.5064), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:37,904 - root - INFO - Step 17770: lr=1.00E-05, loss= 1.0891 (max= 1.5064), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:09,693 - root - INFO - Step 17780: lr=1.00E-05, loss= 1.0832 (max= 1.6577), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:09,694 - root - INFO - Step 17780: lr=1.00E-05, loss= 1.0832 (max= 1.6577), tps=20617, mfu=42.96%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:09,694 - root - INFO - Step 17780: lr=1.00E-05, loss= 1.0832 (max= 1.6577), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:09,694 - root - INFO - Step 17780: lr=1.00E-05, loss= 1.0832 (max= 1.6577), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:09,694 - root - INFO - Step 17780: lr=1.00E-05, loss= 1.0832 (max= 1.6577), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:09,694 - root - INFO - Step 17780: lr=1.00E-05, loss= 1.0832 (max= 1.6577), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:09,694 - root - INFO - Step 17780: lr=1.00E-05, loss= 1.0832 (max= 1.6577), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:09,694 - root - INFO - Step 17780: lr=1.00E-05, loss= 1.0832 (max= 1.6577), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:41,586 - root - INFO - Step 17790: lr=1.00E-05, loss= 1.0863 (max= 1.4754), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:41,586 - root - INFO - Step 17790: lr=1.00E-05, loss= 1.0863 (max= 1.4754), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:41,587 - root - INFO - Step 17790: lr=1.00E-05, loss= 1.0863 (max= 1.4754), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:41,587 - root - INFO - Step 17790: lr=1.00E-05, loss= 1.0863 (max= 1.4754), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:41,587 - root - INFO - Step 17790: lr=1.00E-05, loss= 1.0863 
(max= 1.4754), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:41,587 - root - INFO - Step 17790: lr=1.00E-05, loss= 1.0863 (max= 1.4754), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:41,587 - root - INFO - Step 17790: lr=1.00E-05, loss= 1.0863 (max= 1.4754), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:41,587 - root - INFO - Step 17790: lr=1.00E-05, loss= 1.0863 (max= 1.4754), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:14:13,448 - root - INFO - Step 17800: lr=1.00E-05, loss= 1.0702 (max= 1.4219), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:14:13,448 - root - INFO - Step 17800: lr=1.00E-05, loss= 1.0702 (max= 1.4219), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:14:13,448 - root - INFO - Step 17800: lr=1.00E-05, loss= 1.0702 (max= 1.4219), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:14:13,448 - root - INFO - Step 17800: lr=1.00E-05, loss= 1.0702 (max= 1.4219), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:14:13,448 - root - INFO - Step 17800: lr=1.00E-05, loss= 1.0702 (max= 1.4219), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:14:13,448 - root - INFO - Step 17800: lr=1.00E-05, loss= 1.0702 (max= 1.4219), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:14:13,448 - root - INFO - Step 17800: lr=1.00E-05, loss= 1.0702 (max= 1.4219), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:14:13,448 - root 
- INFO - Step 17800: lr=1.00E-05, loss= 1.0702 (max= 1.4219), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:14:45,307 - root - INFO - Step 17810: lr=1.00E-05, loss= 1.1090 (max= 1.5266), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:14:45,307 - root - INFO - Step 17810: lr=1.00E-05, loss= 1.1090 (max= 1.5266), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:14:45,307 - root - INFO - Step 17810: lr=1.00E-05, loss= 1.1090 (max= 1.5266), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:14:45,307 - root - INFO - Step 17810: lr=1.00E-05, loss= 1.1090 (max= 1.5266), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:14:45,307 - root - INFO - Step 17810: lr=1.00E-05, loss= 1.1090 (max= 1.5266), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:14:45,307 - root - INFO - Step 17810: lr=1.00E-05, loss= 1.1090 (max= 1.5266), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:14:45,307 - root - INFO - Step 17810: lr=1.00E-05, loss= 1.1090 (max= 1.5266), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:14:45,308 - root - INFO - Step 17810: lr=1.00E-05, loss= 1.1090 (max= 1.5266), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:15:17,089 - root - INFO - Step 17820: lr=1.00E-05, loss= 1.1013 (max= 1.5129), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:15:17,089 - root - INFO - Step 17820: lr=1.00E-05, loss= 1.1013 (max= 1.5129), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 02:15:17,089 - root - INFO - Step 17820: lr=1.00E-05, loss= 1.1013 (max= 1.5129), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:15:17,089 - root - INFO - Step 17820: lr=1.00E-05, loss= 1.1013 (max= 1.5129), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:15:17,089 - root - INFO - Step 17820: lr=1.00E-05, loss= 1.1013 (max= 1.5129), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:15:17,089 - root - INFO - Step 17820: lr=1.00E-05, loss= 1.1013 (max= 1.5129), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:15:17,089 - root - INFO - Step 17820: lr=1.00E-05, loss= 1.1013 (max= 1.5129), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:15:17,089 - root - INFO - Step 17820: lr=1.00E-05, loss= 1.1013 (max= 1.5129), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:15:48,933 - root - INFO - Step 17830: lr=1.00E-05, loss= 1.0929 (max= 1.5366), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:15:48,933 - root - INFO - Step 17830: lr=1.00E-05, loss= 1.0929 (max= 1.5366), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:15:48,933 - root - INFO - Step 17830: lr=1.00E-05, loss= 1.0929 (max= 1.5366), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:15:48,933 - root - INFO - Step 17830: lr=1.00E-05, loss= 1.0929 (max= 1.5366), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:15:48,933 - root - INFO - Step 17830: lr=1.00E-05, loss= 1.0929 (max= 1.5366), tps=20583, mfu=42.88%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:15:48,933 - root - INFO - Step 17830: lr=1.00E-05, loss= 1.0929 (max= 1.5366), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:15:48,933 - root - INFO - Step 17830: lr=1.00E-05, loss= 1.0929 (max= 1.5366), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:15:48,933 - root - INFO - Step 17830: lr=1.00E-05, loss= 1.0929 (max= 1.5366), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:16:20,759 - root - INFO - Step 17840: lr=1.00E-05, loss= 1.1011 (max= 1.5025), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:16:20,760 - root - INFO - Step 17840: lr=1.00E-05, loss= 1.1011 (max= 1.5025), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:16:20,759 - root - INFO - Step 17840: lr=1.00E-05, loss= 1.1011 (max= 1.5025), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:16:20,760 - root - INFO - Step 17840: lr=1.00E-05, loss= 1.1011 (max= 1.5025), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:16:20,760 - root - INFO - Step 17840: lr=1.00E-05, loss= 1.1011 (max= 1.5025), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:16:20,760 - root - INFO - Step 17840: lr=1.00E-05, loss= 1.1011 (max= 1.5025), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:16:20,760 - root - INFO - Step 17840: lr=1.00E-05, loss= 1.1011 (max= 1.5025), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:16:20,760 - root - INFO - Step 17840: lr=1.00E-05, 
loss= 1.1011 (max= 1.5025), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:16:52,684 - root - INFO - Step 17850: lr=1.00E-05, loss= 1.0931 (max= 1.4554), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:16:52,684 - root - INFO - Step 17850: lr=1.00E-05, loss= 1.0931 (max= 1.4554), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:16:52,684 - root - INFO - Step 17850: lr=1.00E-05, loss= 1.0931 (max= 1.4554), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:16:52,684 - root - INFO - Step 17850: lr=1.00E-05, loss= 1.0931 (max= 1.4554), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:16:52,684 - root - INFO - Step 17850: lr=1.00E-05, loss= 1.0931 (max= 1.4554), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:16:52,684 - root - INFO - Step 17850: lr=1.00E-05, loss= 1.0931 (max= 1.4554), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:16:52,684 - root - INFO - Step 17850: lr=1.00E-05, loss= 1.0931 (max= 1.4554), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:16:52,684 - root - INFO - Step 17850: lr=1.00E-05, loss= 1.0931 (max= 1.4554), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:17:24,510 - root - INFO - Step 17860: lr=1.00E-05, loss= 1.0902 (max= 1.5251), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:17:24,510 - root - INFO - Step 17860: lr=1.00E-05, loss= 1.0902 (max= 1.5251), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
02:17:24,510 - root - INFO - Step 17860: lr=1.00E-05, loss= 1.0902 (max= 1.5251), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:17:24,511 - root - INFO - Step 17860: lr=1.00E-05, loss= 1.0902 (max= 1.5251), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:17:24,511 - root - INFO - Step 17860: lr=1.00E-05, loss= 1.0902 (max= 1.5251), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:17:24,511 - root - INFO - Step 17860: lr=1.00E-05, loss= 1.0902 (max= 1.5251), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:17:24,511 - root - INFO - Step 17860: lr=1.00E-05, loss= 1.0902 (max= 1.5251), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:17:24,511 - root - INFO - Step 17860: lr=1.00E-05, loss= 1.0902 (max= 1.5251), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:17:56,382 - root - INFO - Step 17870: lr=1.00E-05, loss= 1.1061 (max= 1.5063), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:17:56,382 - root - INFO - Step 17870: lr=1.00E-05, loss= 1.1061 (max= 1.5063), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:17:56,382 - root - INFO - Step 17870: lr=1.00E-05, loss= 1.1061 (max= 1.5063), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:17:56,382 - root - INFO - Step 17870: lr=1.00E-05, loss= 1.1061 (max= 1.5063), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:17:56,382 - root - INFO - Step 17870: lr=1.00E-05, loss= 1.1061 (max= 1.5063), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:17:56,382 - root - INFO - Step 17870: lr=1.00E-05, loss= 1.1061 (max= 1.5063), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:17:56,382 - root - INFO - Step 17870: lr=1.00E-05, loss= 1.1061 (max= 1.5063), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:17:56,382 - root - INFO - Step 17870: lr=1.00E-05, loss= 1.1061 (max= 1.5063), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:18:28,221 - root - INFO - Step 17880: lr=1.00E-05, loss= 1.1079 (max= 1.5149), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:18:28,221 - root - INFO - Step 17880: lr=1.00E-05, loss= 1.1079 (max= 1.5149), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:18:28,221 - root - INFO - Step 17880: lr=1.00E-05, loss= 1.1079 (max= 1.5149), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:18:28,221 - root - INFO - Step 17880: lr=1.00E-05, loss= 1.1079 (max= 1.5149), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:18:28,221 - root - INFO - Step 17880: lr=1.00E-05, loss= 1.1079 (max= 1.5149), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:18:28,221 - root - INFO - Step 17880: lr=1.00E-05, loss= 1.1079 (max= 1.5149), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:18:28,221 - root - INFO - Step 17880: lr=1.00E-05, loss= 1.1079 (max= 1.5149), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:18:28,221 - root - INFO - Step 17880: lr=1.00E-05, loss= 1.1079 (max= 1.5149), 
tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:19:00,090 - root - INFO - Step 17890: lr=1.00E-05, loss= 1.0825 (max= 1.4913), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:19:00,091 - root - INFO - Step 17890: lr=1.00E-05, loss= 1.0825 (max= 1.4913), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:19:00,091 - root - INFO - Step 17890: lr=1.00E-05, loss= 1.0825 (max= 1.4913), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:19:00,091 - root - INFO - Step 17890: lr=1.00E-05, loss= 1.0825 (max= 1.4913), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:19:00,091 - root - INFO - Step 17890: lr=1.00E-05, loss= 1.0825 (max= 1.4913), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:19:00,091 - root - INFO - Step 17890: lr=1.00E-05, loss= 1.0825 (max= 1.4913), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:19:00,091 - root - INFO - Step 17890: lr=1.00E-05, loss= 1.0825 (max= 1.4913), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:19:00,091 - root - INFO - Step 17890: lr=1.00E-05, loss= 1.0825 (max= 1.4913), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:19:31,936 - root - INFO - Step 17900: lr=1.00E-05, loss= 1.0785 (max= 1.5278), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:19:31,936 - root - INFO - Step 17900: lr=1.00E-05, loss= 1.0785 (max= 1.5278), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:19:31,936 - root - INFO - Step 
17900: lr=1.00E-05, loss= 1.0785 (max= 1.5278), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:19:31,936 - root - INFO - Step 17900: lr=1.00E-05, loss= 1.0785 (max= 1.5278), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:19:31,936 - root - INFO - Step 17900: lr=1.00E-05, loss= 1.0785 (max= 1.5278), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:19:31,936 - root - INFO - Step 17900: lr=1.00E-05, loss= 1.0785 (max= 1.5278), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:19:31,936 - root - INFO - Step 17900: lr=1.00E-05, loss= 1.0785 (max= 1.5278), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:19:31,936 - root - INFO - Step 17900: lr=1.00E-05, loss= 1.0785 (max= 1.5278), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:20:03,863 - root - INFO - Step 17910: lr=1.00E-05, loss= 1.0943 (max= 1.4828), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:20:03,863 - root - INFO - Step 17910: lr=1.00E-05, loss= 1.0943 (max= 1.4828), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:20:03,863 - root - INFO - Step 17910: lr=1.00E-05, loss= 1.0943 (max= 1.4828), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:20:03,863 - root - INFO - Step 17910: lr=1.00E-05, loss= 1.0943 (max= 1.4828), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:20:03,863 - root - INFO - Step 17910: lr=1.00E-05, loss= 1.0943 (max= 1.4828), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 02:20:03,863 - root - INFO - Step 17910: lr=1.00E-05, loss= 1.0943 (max= 1.4828), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:20:03,863 - root - INFO - Step 17910: lr=1.00E-05, loss= 1.0943 (max= 1.4828), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:20:03,863 - root - INFO - Step 17910: lr=1.00E-05, loss= 1.0943 (max= 1.4828), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:20:35,773 - root - INFO - Step 17920: lr=1.00E-05, loss= 1.0453 (max= 1.6701), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:20:35,773 - root - INFO - Step 17920: lr=1.00E-05, loss= 1.0453 (max= 1.6701), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:20:35,773 - root - INFO - Step 17920: lr=1.00E-05, loss= 1.0453 (max= 1.6701), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:20:35,773 - root - INFO - Step 17920: lr=1.00E-05, loss= 1.0453 (max= 1.6701), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:20:35,773 - root - INFO - Step 17920: lr=1.00E-05, loss= 1.0453 (max= 1.6701), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:20:35,773 - root - INFO - Step 17920: lr=1.00E-05, loss= 1.0453 (max= 1.6701), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:20:35,773 - root - INFO - Step 17920: lr=1.00E-05, loss= 1.0453 (max= 1.6701), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:20:35,773 - root - INFO - Step 17920: lr=1.00E-05, loss= 1.0453 (max= 1.6701), tps=20540, mfu=42.79%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:21:07,674 - root - INFO - Step 17930: lr=1.00E-05, loss= 1.0891 (max= 1.5505), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:21:07,674 - root - INFO - Step 17930: lr=1.00E-05, loss= 1.0891 (max= 1.5505), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:21:07,675 - root - INFO - Step 17930: lr=1.00E-05, loss= 1.0891 (max= 1.5505), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:21:07,674 - root - INFO - Step 17930: lr=1.00E-05, loss= 1.0891 (max= 1.5505), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:21:07,675 - root - INFO - Step 17930: lr=1.00E-05, loss= 1.0891 (max= 1.5505), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:21:07,675 - root - INFO - Step 17930: lr=1.00E-05, loss= 1.0891 (max= 1.5505), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:21:07,675 - root - INFO - Step 17930: lr=1.00E-05, loss= 1.0891 (max= 1.5505), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:21:07,675 - root - INFO - Step 17930: lr=1.00E-05, loss= 1.0891 (max= 1.5505), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:21:39,473 - root - INFO - Step 17940: lr=1.00E-05, loss= 1.0992 (max= 1.6984), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:21:39,473 - root - INFO - Step 17940: lr=1.00E-05, loss= 1.0992 (max= 1.6984), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:21:39,473 - root - INFO - Step 17940: lr=1.00E-05, loss= 1.0992 
(max= 1.6984), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:21:39,473 - root - INFO - Step 17940: lr=1.00E-05, loss= 1.0992 (max= 1.6984), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:21:39,473 - root - INFO - Step 17940: lr=1.00E-05, loss= 1.0992 (max= 1.6984), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:21:39,473 - root - INFO - Step 17940: lr=1.00E-05, loss= 1.0992 (max= 1.6984), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:21:39,473 - root - INFO - Step 17940: lr=1.00E-05, loss= 1.0992 (max= 1.6984), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:21:39,473 - root - INFO - Step 17940: lr=1.00E-05, loss= 1.0992 (max= 1.6984), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:22:11,310 - root - INFO - Step 17950: lr=1.00E-05, loss= 1.1097 (max= 1.5945), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:22:11,310 - root - INFO - Step 17950: lr=1.00E-05, loss= 1.1097 (max= 1.5945), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:22:11,310 - root - INFO - Step 17950: lr=1.00E-05, loss= 1.1097 (max= 1.5945), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:22:11,310 - root - INFO - Step 17950: lr=1.00E-05, loss= 1.1097 (max= 1.5945), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:22:11,310 - root - INFO - Step 17950: lr=1.00E-05, loss= 1.1097 (max= 1.5945), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:22:11,310 - root 
- INFO - Step 17950: lr=1.00E-05, loss= 1.1097 (max= 1.5945), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:22:11,310 - root - INFO - Step 17950: lr=1.00E-05, loss= 1.1097 (max= 1.5945), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:22:11,310 - root - INFO - Step 17950: lr=1.00E-05, loss= 1.1097 (max= 1.5945), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:22:43,221 - root - INFO - Step 17960: lr=1.00E-05, loss= 1.1081 (max= 1.4825), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:22:43,221 - root - INFO - Step 17960: lr=1.00E-05, loss= 1.1081 (max= 1.4825), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:22:43,221 - root - INFO - Step 17960: lr=1.00E-05, loss= 1.1081 (max= 1.4825), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:22:43,221 - root - INFO - Step 17960: lr=1.00E-05, loss= 1.1081 (max= 1.4825), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:22:43,221 - root - INFO - Step 17960: lr=1.00E-05, loss= 1.1081 (max= 1.4825), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:22:43,221 - root - INFO - Step 17960: lr=1.00E-05, loss= 1.1081 (max= 1.4825), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:22:43,221 - root - INFO - Step 17960: lr=1.00E-05, loss= 1.1081 (max= 1.4825), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:22:43,221 - root - INFO - Step 17960: lr=1.00E-05, loss= 1.1081 (max= 1.4825), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 02:23:15,071 - root - INFO - Step 17970: lr=1.00E-05, loss= 1.0798 (max= 1.5369), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:23:15,071 - root - INFO - Step 17970: lr=1.00E-05, loss= 1.0798 (max= 1.5369), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:23:15,071 - root - INFO - Step 17970: lr=1.00E-05, loss= 1.0798 (max= 1.5369), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:23:15,071 - root - INFO - Step 17970: lr=1.00E-05, loss= 1.0798 (max= 1.5369), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:23:15,071 - root - INFO - Step 17970: lr=1.00E-05, loss= 1.0798 (max= 1.5369), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:23:15,071 - root - INFO - Step 17970: lr=1.00E-05, loss= 1.0798 (max= 1.5369), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:23:15,071 - root - INFO - Step 17970: lr=1.00E-05, loss= 1.0798 (max= 1.5369), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:23:15,071 - root - INFO - Step 17970: lr=1.00E-05, loss= 1.0798 (max= 1.5369), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:23:46,950 - root - INFO - Step 17980: lr=1.00E-05, loss= 1.0748 (max= 1.5346), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:23:46,950 - root - INFO - Step 17980: lr=1.00E-05, loss= 1.0748 (max= 1.5346), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:23:46,950 - root - INFO - Step 17980: lr=1.00E-05, loss= 1.0748 (max= 1.5346), tps=20560, mfu=42.84%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:23:46,950 - root - INFO - Step 17980: lr=1.00E-05, loss= 1.0748 (max= 1.5346), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:23:46,950 - root - INFO - Step 17980: lr=1.00E-05, loss= 1.0748 (max= 1.5346), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:23:46,950 - root - INFO - Step 17980: lr=1.00E-05, loss= 1.0748 (max= 1.5346), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:23:46,950 - root - INFO - Step 17980: lr=1.00E-05, loss= 1.0748 (max= 1.5346), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:23:46,950 - root - INFO - Step 17980: lr=1.00E-05, loss= 1.0748 (max= 1.5346), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:23:50,702 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:4964331 +2025-10-26 02:24:18,798 - root - INFO - Step 17990: lr=1.00E-05, loss= 1.0773 (max= 1.5550), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:24:18,798 - root - INFO - Step 17990: lr=1.00E-05, loss= 1.0773 (max= 1.5550), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:24:18,798 - root - INFO - Step 17990: lr=1.00E-05, loss= 1.0773 (max= 1.5550), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:24:18,798 - root - INFO - Step 17990: lr=1.00E-05, loss= 1.0773 (max= 1.5550), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:24:18,798 - root - INFO - Step 17990: lr=1.00E-05, loss= 
1.0773 (max= 1.5550), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:24:18,798 - root - INFO - Step 17990: lr=1.00E-05, loss= 1.0773 (max= 1.5550), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:24:18,798 - root - INFO - Step 17990: lr=1.00E-05, loss= 1.0773 (max= 1.5550), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:24:18,798 - root - INFO - Step 17990: lr=1.00E-05, loss= 1.0773 (max= 1.5550), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-pt/checkpoints/dataloader/step-18000 +Dataset successfully saved to jobs/munin-7b-open-pt/checkpoints/dataloader/step-18000! Save time: 4.54231071472168 +2025-10-26 02:24:50,769 - root - INFO - Step 18000: lr=1.00E-05, loss= 1.0947 (max= 1.4805), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:24:50,769 - root - INFO - Step 18000: lr=1.00E-05, loss= 1.0947 (max= 1.4805), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:24:50,769 - root - INFO - Step 18000: lr=1.00E-05, loss= 1.0947 (max= 1.4805), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:24:50,769 - root - INFO - Saving a full checkpoint at step 18000 +2025-10-26 02:24:50,769 - root - INFO - Saving a full checkpoint at step 18000 +2025-10-26 02:24:50,769 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 02:24:50,769 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 02:24:50,769 - root - INFO - Step 18000: lr=1.00E-05, loss= 1.0947 (max= 1.4805), tps=20501, mfu=42.71%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:24:50,769 - root - INFO - Saving a full checkpoint at step 18000 +2025-10-26 02:24:50,769 - root - INFO - Step 18000: lr=1.00E-05, loss= 1.0947 (max= 1.4805), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:24:50,769 - root - INFO - Step 18000: lr=1.00E-05, loss= 1.0947 (max= 1.4805), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:24:50,769 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 02:24:50,769 - root - INFO - Saving a full checkpoint at step 18000 +2025-10-26 02:24:50,769 - root - INFO - Saving a full checkpoint at step 18000 +2025-10-26 02:24:50,769 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 02:24:50,769 - root - INFO - Saving a full checkpoint at step 18000 +2025-10-26 02:24:50,769 - root - INFO - Step 18000: lr=1.00E-05, loss= 1.0947 (max= 1.4805), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:24:50,769 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 02:24:50,769 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 02:24:50,769 - root - INFO - Step 18000: lr=1.00E-05, loss= 1.0947 (max= 1.4805), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:24:50,769 - root - INFO - Saving a full checkpoint at step 18000 +2025-10-26 02:24:50,769 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 02:24:50,769 - root - INFO - Saving a full checkpoint at step 18000 
+2025-10-26 02:24:50,769 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-26 02:25:06,957 - root - INFO - Finished saving the checkpoint in 16.19 seconds +2025-10-26 02:25:06,965 - root - INFO - Finished saving the checkpoint in 16.20 seconds +2025-10-26 02:25:06,965 - root - INFO - Finished saving the checkpoint in 16.20 seconds +2025-10-26 02:25:06,965 - root - INFO - Finished saving the checkpoint in 16.20 seconds +2025-10-26 02:25:06,966 - root - INFO - Finished saving the checkpoint in 16.20 seconds +2025-10-26 02:25:06,966 - root - INFO - Finished saving the checkpoint in 16.20 seconds +2025-10-26 02:25:06,966 - root - INFO - Finished saving the checkpoint in 16.20 seconds +2025-10-26 02:25:06,967 - root - INFO - Finished saving the checkpoint in 16.20 seconds +2025-10-26 02:25:38,862 - root - INFO - Step 18010: lr=1.00E-05, loss= 1.0895 (max= 1.5950), tps=13628, mfu=28.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:25:38,862 - root - INFO - Step 18010: lr=1.00E-05, loss= 1.0895 (max= 1.5950), tps=13628, mfu=28.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:25:38,862 - root - INFO - Step 18010: lr=1.00E-05, loss= 1.0895 (max= 1.5950), tps=13628, mfu=28.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:25:38,862 - root - INFO - Step 18010: lr=1.00E-05, loss= 1.0895 (max= 1.5950), tps=13628, mfu=28.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:25:38,862 - root - INFO - Step 18010: lr=1.00E-05, loss= 1.0895 (max= 1.5950), tps=13628, mfu=28.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:25:38,862 - root - INFO - Step 18010: lr=1.00E-05, loss= 1.0895 (max= 1.5950), tps=13628, mfu=28.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:25:38,862 - 
root - INFO - Step 18010: lr=1.00E-05, loss= 1.0895 (max= 1.5950), tps=13628, mfu=28.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:25:38,862 - root - INFO - Step 18010: lr=1.00E-05, loss= 1.0895 (max= 1.5950), tps=13628, mfu=28.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:26:01,762 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:3568038 +2025-10-26 02:26:06,723 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:1990523 +2025-10-26 02:26:10,754 - root - INFO - Step 18020: lr=1.00E-05, loss= 1.1052 (max= 1.5747), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:26:10,754 - root - INFO - Step 18020: lr=1.00E-05, loss= 1.1052 (max= 1.5747), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:26:10,754 - root - INFO - Step 18020: lr=1.00E-05, loss= 1.1052 (max= 1.5747), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:26:10,754 - root - INFO - Step 18020: lr=1.00E-05, loss= 1.1052 (max= 1.5747), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:26:10,754 - root - INFO - Step 18020: lr=1.00E-05, loss= 1.1052 (max= 1.5747), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:26:10,754 - root - INFO - Step 18020: lr=1.00E-05, loss= 1.1052 (max= 1.5747), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:26:10,754 - root - INFO - Step 18020: lr=1.00E-05, loss= 1.1052 (max= 1.5747), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 02:26:10,754 - root - INFO - Step 18020: lr=1.00E-05, loss= 1.1052 (max= 1.5747), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:26:42,629 - root - INFO - Step 18030: lr=1.00E-05, loss= 1.0774 (max= 1.6478), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:26:42,629 - root - INFO - Step 18030: lr=1.00E-05, loss= 1.0774 (max= 1.6478), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:26:42,629 - root - INFO - Step 18030: lr=1.00E-05, loss= 1.0774 (max= 1.6478), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:26:42,629 - root - INFO - Step 18030: lr=1.00E-05, loss= 1.0774 (max= 1.6478), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:26:42,630 - root - INFO - Step 18030: lr=1.00E-05, loss= 1.0774 (max= 1.6478), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:26:42,630 - root - INFO - Step 18030: lr=1.00E-05, loss= 1.0774 (max= 1.6478), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:26:42,630 - root - INFO - Step 18030: lr=1.00E-05, loss= 1.0774 (max= 1.6478), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:26:42,630 - root - INFO - Step 18030: lr=1.00E-05, loss= 1.0774 (max= 1.6478), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:27:14,495 - root - INFO - Step 18040: lr=1.00E-05, loss= 1.0865 (max= 1.4229), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:27:14,495 - root - INFO - Step 18040: lr=1.00E-05, loss= 1.0865 (max= 1.4229), tps=20569, mfu=42.86%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:27:14,495 - root - INFO - Step 18040: lr=1.00E-05, loss= 1.0865 (max= 1.4229), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:27:14,495 - root - INFO - Step 18040: lr=1.00E-05, loss= 1.0865 (max= 1.4229), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:27:14,495 - root - INFO - Step 18040: lr=1.00E-05, loss= 1.0865 (max= 1.4229), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:27:14,495 - root - INFO - Step 18040: lr=1.00E-05, loss= 1.0865 (max= 1.4229), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:27:14,495 - root - INFO - Step 18040: lr=1.00E-05, loss= 1.0865 (max= 1.4229), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:27:14,495 - root - INFO - Step 18040: lr=1.00E-05, loss= 1.0865 (max= 1.4229), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:27:46,323 - root - INFO - Step 18050: lr=1.00E-05, loss= 1.1300 (max= 1.7123), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:27:46,323 - root - INFO - Step 18050: lr=1.00E-05, loss= 1.1300 (max= 1.7123), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:27:46,323 - root - INFO - Step 18050: lr=1.00E-05, loss= 1.1300 (max= 1.7123), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:27:46,323 - root - INFO - Step 18050: lr=1.00E-05, loss= 1.1300 (max= 1.7123), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:27:46,323 - root - INFO - Step 18050: lr=1.00E-05, 
loss= 1.1300 (max= 1.7123), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:27:46,323 - root - INFO - Step 18050: lr=1.00E-05, loss= 1.1300 (max= 1.7123), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:27:46,323 - root - INFO - Step 18050: lr=1.00E-05, loss= 1.1300 (max= 1.7123), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:27:46,323 - root - INFO - Step 18050: lr=1.00E-05, loss= 1.1300 (max= 1.7123), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:28:18,174 - root - INFO - Step 18060: lr=1.00E-05, loss= 1.0647 (max= 1.6610), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:28:18,174 - root - INFO - Step 18060: lr=1.00E-05, loss= 1.0647 (max= 1.6610), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:28:18,174 - root - INFO - Step 18060: lr=1.00E-05, loss= 1.0647 (max= 1.6610), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:28:18,174 - root - INFO - Step 18060: lr=1.00E-05, loss= 1.0647 (max= 1.6610), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:28:18,174 - root - INFO - Step 18060: lr=1.00E-05, loss= 1.0647 (max= 1.6610), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:28:18,174 - root - INFO - Step 18060: lr=1.00E-05, loss= 1.0647 (max= 1.6610), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:28:18,174 - root - INFO - Step 18060: lr=1.00E-05, loss= 1.0647 (max= 1.6610), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
02:28:18,174 - root - INFO - Step 18060: lr=1.00E-05, loss= 1.0647 (max= 1.6610), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:28:50,112 - root - INFO - Step 18070: lr=1.00E-05, loss= 1.0781 (max= 1.4335), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:28:50,112 - root - INFO - Step 18070: lr=1.00E-05, loss= 1.0781 (max= 1.4335), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:28:50,112 - root - INFO - Step 18070: lr=1.00E-05, loss= 1.0781 (max= 1.4335), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:28:50,112 - root - INFO - Step 18070: lr=1.00E-05, loss= 1.0781 (max= 1.4335), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:28:50,112 - root - INFO - Step 18070: lr=1.00E-05, loss= 1.0781 (max= 1.4335), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:28:50,112 - root - INFO - Step 18070: lr=1.00E-05, loss= 1.0781 (max= 1.4335), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:28:50,112 - root - INFO - Step 18070: lr=1.00E-05, loss= 1.0781 (max= 1.4335), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:28:50,113 - root - INFO - Step 18070: lr=1.00E-05, loss= 1.0781 (max= 1.4335), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:29:21,975 - root - INFO - Step 18080: lr=1.00E-05, loss= 1.0933 (max= 1.5010), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:29:21,975 - root - INFO - Step 18080: lr=1.00E-05, loss= 1.0933 (max= 1.5010), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:29:21,975 - root - INFO - Step 18080: lr=1.00E-05, loss= 1.0933 (max= 1.5010), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:29:21,975 - root - INFO - Step 18080: lr=1.00E-05, loss= 1.0933 (max= 1.5010), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:29:21,975 - root - INFO - Step 18080: lr=1.00E-05, loss= 1.0933 (max= 1.5010), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:29:21,975 - root - INFO - Step 18080: lr=1.00E-05, loss= 1.0933 (max= 1.5010), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:29:21,975 - root - INFO - Step 18080: lr=1.00E-05, loss= 1.0933 (max= 1.5010), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:29:21,975 - root - INFO - Step 18080: lr=1.00E-05, loss= 1.0933 (max= 1.5010), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:29:53,796 - root - INFO - Step 18090: lr=1.00E-05, loss= 1.0920 (max= 1.6962), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:29:53,796 - root - INFO - Step 18090: lr=1.00E-05, loss= 1.0920 (max= 1.6962), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:29:53,796 - root - INFO - Step 18090: lr=1.00E-05, loss= 1.0920 (max= 1.6962), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:29:53,796 - root - INFO - Step 18090: lr=1.00E-05, loss= 1.0920 (max= 1.6962), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:29:53,796 - root - INFO - Step 18090: lr=1.00E-05, loss= 1.0920 (max= 1.6962), 
tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:29:53,796 - root - INFO - Step 18090: lr=1.00E-05, loss= 1.0920 (max= 1.6962), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:29:53,796 - root - INFO - Step 18090: lr=1.00E-05, loss= 1.0920 (max= 1.6962), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:29:53,796 - root - INFO - Step 18090: lr=1.00E-05, loss= 1.0920 (max= 1.6962), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:30:25,563 - root - INFO - Step 18100: lr=1.00E-05, loss= 1.0905 (max= 1.5127), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:30:25,563 - root - INFO - Step 18100: lr=1.00E-05, loss= 1.0905 (max= 1.5127), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:30:25,563 - root - INFO - Step 18100: lr=1.00E-05, loss= 1.0905 (max= 1.5127), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:30:25,563 - root - INFO - Step 18100: lr=1.00E-05, loss= 1.0905 (max= 1.5127), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:30:25,563 - root - INFO - Step 18100: lr=1.00E-05, loss= 1.0905 (max= 1.5127), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:30:25,563 - root - INFO - Step 18100: lr=1.00E-05, loss= 1.0905 (max= 1.5127), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:30:25,563 - root - INFO - Step 18100: lr=1.00E-05, loss= 1.0905 (max= 1.5127), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:30:25,563 - root - INFO - Step 
18100: lr=1.00E-05, loss= 1.0905 (max= 1.5127), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:30:57,348 - root - INFO - Step 18110: lr=1.00E-05, loss= 1.1016 (max= 1.4396), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:30:57,348 - root - INFO - Step 18110: lr=1.00E-05, loss= 1.1016 (max= 1.4396), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:30:57,348 - root - INFO - Step 18110: lr=1.00E-05, loss= 1.1016 (max= 1.4396), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:30:57,348 - root - INFO - Step 18110: lr=1.00E-05, loss= 1.1016 (max= 1.4396), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:30:57,348 - root - INFO - Step 18110: lr=1.00E-05, loss= 1.1016 (max= 1.4396), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:30:57,348 - root - INFO - Step 18110: lr=1.00E-05, loss= 1.1016 (max= 1.4396), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:30:57,348 - root - INFO - Step 18110: lr=1.00E-05, loss= 1.1016 (max= 1.4396), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:30:57,348 - root - INFO - Step 18110: lr=1.00E-05, loss= 1.1016 (max= 1.4396), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:31:29,303 - root - INFO - Step 18120: lr=1.00E-05, loss= 1.0887 (max= 1.5796), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:31:29,303 - root - INFO - Step 18120: lr=1.00E-05, loss= 1.0887 (max= 1.5796), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 02:31:29,303 - root - INFO - Step 18120: lr=1.00E-05, loss= 1.0887 (max= 1.5796), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:31:29,303 - root - INFO - Step 18120: lr=1.00E-05, loss= 1.0887 (max= 1.5796), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:31:29,303 - root - INFO - Step 18120: lr=1.00E-05, loss= 1.0887 (max= 1.5796), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:31:29,303 - root - INFO - Step 18120: lr=1.00E-05, loss= 1.0887 (max= 1.5796), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:31:29,303 - root - INFO - Step 18120: lr=1.00E-05, loss= 1.0887 (max= 1.5796), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:31:29,303 - root - INFO - Step 18120: lr=1.00E-05, loss= 1.0887 (max= 1.5796), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:32:01,114 - root - INFO - Step 18130: lr=1.00E-05, loss= 1.1096 (max= 1.5856), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:32:01,114 - root - INFO - Step 18130: lr=1.00E-05, loss= 1.1096 (max= 1.5856), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:32:01,114 - root - INFO - Step 18130: lr=1.00E-05, loss= 1.1096 (max= 1.5856), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:32:01,114 - root - INFO - Step 18130: lr=1.00E-05, loss= 1.1096 (max= 1.5856), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:32:01,114 - root - INFO - Step 18130: lr=1.00E-05, loss= 1.1096 (max= 1.5856), tps=20604, mfu=42.93%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:32:01,114 - root - INFO - Step 18130: lr=1.00E-05, loss= 1.1096 (max= 1.5856), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:32:01,114 - root - INFO - Step 18130: lr=1.00E-05, loss= 1.1096 (max= 1.5856), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:32:01,114 - root - INFO - Step 18130: lr=1.00E-05, loss= 1.1096 (max= 1.5856), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:32:32,905 - root - INFO - Step 18140: lr=1.00E-05, loss= 1.1122 (max= 1.6376), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:32:32,906 - root - INFO - Step 18140: lr=1.00E-05, loss= 1.1122 (max= 1.6376), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:32:32,906 - root - INFO - Step 18140: lr=1.00E-05, loss= 1.1122 (max= 1.6376), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:32:32,906 - root - INFO - Step 18140: lr=1.00E-05, loss= 1.1122 (max= 1.6376), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:32:32,906 - root - INFO - Step 18140: lr=1.00E-05, loss= 1.1122 (max= 1.6376), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:32:32,906 - root - INFO - Step 18140: lr=1.00E-05, loss= 1.1122 (max= 1.6376), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:32:32,906 - root - INFO - Step 18140: lr=1.00E-05, loss= 1.1122 (max= 1.6376), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:32:32,906 - root - INFO - Step 18140: lr=1.00E-05, loss= 1.1122 
(max= 1.6376), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:33:04,704 - root - INFO - Step 18150: lr=1.00E-05, loss= 1.1000 (max= 1.5680), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:33:04,704 - root - INFO - Step 18150: lr=1.00E-05, loss= 1.1000 (max= 1.5680), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:33:04,704 - root - INFO - Step 18150: lr=1.00E-05, loss= 1.1000 (max= 1.5680), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:33:04,704 - root - INFO - Step 18150: lr=1.00E-05, loss= 1.1000 (max= 1.5680), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:33:04,704 - root - INFO - Step 18150: lr=1.00E-05, loss= 1.1000 (max= 1.5680), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:33:04,704 - root - INFO - Step 18150: lr=1.00E-05, loss= 1.1000 (max= 1.5680), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:33:04,704 - root - INFO - Step 18150: lr=1.00E-05, loss= 1.1000 (max= 1.5680), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:33:04,704 - root - INFO - Step 18150: lr=1.00E-05, loss= 1.1000 (max= 1.5680), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:33:36,525 - root - INFO - Step 18160: lr=1.00E-05, loss= 1.1054 (max= 1.5471), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:33:36,525 - root - INFO - Step 18160: lr=1.00E-05, loss= 1.1054 (max= 1.5471), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:33:36,525 - root 
- INFO - Step 18160: lr=1.00E-05, loss= 1.1054 (max= 1.5471), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:33:36,525 - root - INFO - Step 18160: lr=1.00E-05, loss= 1.1054 (max= 1.5471), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:33:36,525 - root - INFO - Step 18160: lr=1.00E-05, loss= 1.1054 (max= 1.5471), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:33:36,526 - root - INFO - Step 18160: lr=1.00E-05, loss= 1.1054 (max= 1.5471), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:33:36,526 - root - INFO - Step 18160: lr=1.00E-05, loss= 1.1054 (max= 1.5471), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:33:36,526 - root - INFO - Step 18160: lr=1.00E-05, loss= 1.1054 (max= 1.5471), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:34:08,340 - root - INFO - Step 18170: lr=1.00E-05, loss= 1.0862 (max= 1.7071), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:34:08,340 - root - INFO - Step 18170: lr=1.00E-05, loss= 1.0862 (max= 1.7071), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:34:08,340 - root - INFO - Step 18170: lr=1.00E-05, loss= 1.0862 (max= 1.7071), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:34:08,340 - root - INFO - Step 18170: lr=1.00E-05, loss= 1.0862 (max= 1.7071), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:34:08,340 - root - INFO - Step 18170: lr=1.00E-05, loss= 1.0862 (max= 1.7071), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 02:34:08,340 - root - INFO - Step 18170: lr=1.00E-05, loss= 1.0862 (max= 1.7071), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:34:08,340 - root - INFO - Step 18170: lr=1.00E-05, loss= 1.0862 (max= 1.7071), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:34:08,340 - root - INFO - Step 18170: lr=1.00E-05, loss= 1.0862 (max= 1.7071), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:34:40,260 - root - INFO - Step 18180: lr=1.00E-05, loss= 1.0878 (max= 1.4615), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:34:40,260 - root - INFO - Step 18180: lr=1.00E-05, loss= 1.0878 (max= 1.4615), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:34:40,260 - root - INFO - Step 18180: lr=1.00E-05, loss= 1.0878 (max= 1.4615), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:34:40,260 - root - INFO - Step 18180: lr=1.00E-05, loss= 1.0878 (max= 1.4615), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:34:40,260 - root - INFO - Step 18180: lr=1.00E-05, loss= 1.0878 (max= 1.4615), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:34:40,260 - root - INFO - Step 18180: lr=1.00E-05, loss= 1.0878 (max= 1.4615), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:34:40,260 - root - INFO - Step 18180: lr=1.00E-05, loss= 1.0878 (max= 1.4615), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:34:40,260 - root - INFO - Step 18180: lr=1.00E-05, loss= 1.0878 (max= 1.4615), tps=20534, mfu=42.78%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:35:12,169 - root - INFO - Step 18190: lr=1.00E-05, loss= 1.0992 (max= 1.5623), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:35:12,169 - root - INFO - Step 18190: lr=1.00E-05, loss= 1.0992 (max= 1.5623), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:35:12,169 - root - INFO - Step 18190: lr=1.00E-05, loss= 1.0992 (max= 1.5623), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:35:12,169 - root - INFO - Step 18190: lr=1.00E-05, loss= 1.0992 (max= 1.5623), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:35:12,169 - root - INFO - Step 18190: lr=1.00E-05, loss= 1.0992 (max= 1.5623), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:35:12,169 - root - INFO - Step 18190: lr=1.00E-05, loss= 1.0992 (max= 1.5623), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:35:12,169 - root - INFO - Step 18190: lr=1.00E-05, loss= 1.0992 (max= 1.5623), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:35:12,169 - root - INFO - Step 18190: lr=1.00E-05, loss= 1.0992 (max= 1.5623), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:35:43,996 - root - INFO - Step 18200: lr=1.00E-05, loss= 1.0674 (max= 1.4452), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:35:43,996 - root - INFO - Step 18200: lr=1.00E-05, loss= 1.0674 (max= 1.4452), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:35:43,996 - root - INFO - Step 18200: lr=1.00E-05, 
loss= 1.0674 (max= 1.4452), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:35:43,996 - root - INFO - Step 18200: lr=1.00E-05, loss= 1.0674 (max= 1.4452), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:35:43,996 - root - INFO - Step 18200: lr=1.00E-05, loss= 1.0674 (max= 1.4452), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:35:43,996 - root - INFO - Step 18200: lr=1.00E-05, loss= 1.0674 (max= 1.4452), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:35:43,996 - root - INFO - Step 18200: lr=1.00E-05, loss= 1.0674 (max= 1.4452), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:35:43,996 - root - INFO - Step 18200: lr=1.00E-05, loss= 1.0674 (max= 1.4452), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:36:15,832 - root - INFO - Step 18210: lr=1.00E-05, loss= 1.1066 (max= 1.6218), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:36:15,832 - root - INFO - Step 18210: lr=1.00E-05, loss= 1.1066 (max= 1.6218), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:36:15,832 - root - INFO - Step 18210: lr=1.00E-05, loss= 1.1066 (max= 1.6218), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:36:15,832 - root - INFO - Step 18210: lr=1.00E-05, loss= 1.1066 (max= 1.6218), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:36:15,832 - root - INFO - Step 18210: lr=1.00E-05, loss= 1.1066 (max= 1.6218), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
02:36:15,832 - root - INFO - Step 18210: lr=1.00E-05, loss= 1.1066 (max= 1.6218), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:36:15,832 - root - INFO - Step 18210: lr=1.00E-05, loss= 1.1066 (max= 1.6218), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:36:15,832 - root - INFO - Step 18210: lr=1.00E-05, loss= 1.1066 (max= 1.6218), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:36:47,631 - root - INFO - Step 18220: lr=1.00E-05, loss= 1.0963 (max= 1.5756), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:36:47,631 - root - INFO - Step 18220: lr=1.00E-05, loss= 1.0963 (max= 1.5756), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:36:47,631 - root - INFO - Step 18220: lr=1.00E-05, loss= 1.0963 (max= 1.5756), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:36:47,631 - root - INFO - Step 18220: lr=1.00E-05, loss= 1.0963 (max= 1.5756), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:36:47,631 - root - INFO - Step 18220: lr=1.00E-05, loss= 1.0963 (max= 1.5756), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:36:47,631 - root - INFO - Step 18220: lr=1.00E-05, loss= 1.0963 (max= 1.5756), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:36:47,631 - root - INFO - Step 18220: lr=1.00E-05, loss= 1.0963 (max= 1.5756), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:36:47,632 - root - INFO - Step 18220: lr=1.00E-05, loss= 1.0963 (max= 1.5756), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:37:19,465 - root - INFO - Step 18230: lr=1.00E-05, loss= 1.1020 (max= 1.4865), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:37:19,465 - root - INFO - Step 18230: lr=1.00E-05, loss= 1.1020 (max= 1.4865), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:37:19,465 - root - INFO - Step 18230: lr=1.00E-05, loss= 1.1020 (max= 1.4865), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:37:19,465 - root - INFO - Step 18230: lr=1.00E-05, loss= 1.1020 (max= 1.4865), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:37:19,465 - root - INFO - Step 18230: lr=1.00E-05, loss= 1.1020 (max= 1.4865), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:37:19,465 - root - INFO - Step 18230: lr=1.00E-05, loss= 1.1020 (max= 1.4865), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:37:19,465 - root - INFO - Step 18230: lr=1.00E-05, loss= 1.1020 (max= 1.4865), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:37:19,465 - root - INFO - Step 18230: lr=1.00E-05, loss= 1.1020 (max= 1.4865), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:37:51,357 - root - INFO - Step 18240: lr=1.00E-05, loss= 1.1066 (max= 1.5741), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:37:51,358 - root - INFO - Step 18240: lr=1.00E-05, loss= 1.1066 (max= 1.5741), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:37:51,358 - root - INFO - Step 18240: lr=1.00E-05, loss= 1.1066 (max= 1.5741), 
tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:37:51,358 - root - INFO - Step 18240: lr=1.00E-05, loss= 1.1066 (max= 1.5741), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:37:51,358 - root - INFO - Step 18240: lr=1.00E-05, loss= 1.1066 (max= 1.5741), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:37:51,358 - root - INFO - Step 18240: lr=1.00E-05, loss= 1.1066 (max= 1.5741), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:37:51,358 - root - INFO - Step 18240: lr=1.00E-05, loss= 1.1066 (max= 1.5741), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:37:51,358 - root - INFO - Step 18240: lr=1.00E-05, loss= 1.1066 (max= 1.5741), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:38:23,195 - root - INFO - Step 18250: lr=1.00E-05, loss= 1.1103 (max= 1.4529), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:38:23,195 - root - INFO - Step 18250: lr=1.00E-05, loss= 1.1103 (max= 1.4529), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:38:23,195 - root - INFO - Step 18250: lr=1.00E-05, loss= 1.1103 (max= 1.4529), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:38:23,195 - root - INFO - Step 18250: lr=1.00E-05, loss= 1.1103 (max= 1.4529), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:38:23,195 - root - INFO - Step 18250: lr=1.00E-05, loss= 1.1103 (max= 1.4529), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:38:23,195 - root - INFO - Step 
18250: lr=1.00E-05, loss= 1.1103 (max= 1.4529), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:38:23,195 - root - INFO - Step 18250: lr=1.00E-05, loss= 1.1103 (max= 1.4529), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:38:23,195 - root - INFO - Step 18250: lr=1.00E-05, loss= 1.1103 (max= 1.4529), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:38:28,725 - root - INFO - ParquetDataset: entering epoch 1 +2025-10-26 02:38:55,051 - root - INFO - Step 18260: lr=1.00E-05, loss= 1.1045 (max= 1.6331), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:38:55,052 - root - INFO - Step 18260: lr=1.00E-05, loss= 1.1045 (max= 1.6331), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:38:55,052 - root - INFO - Step 18260: lr=1.00E-05, loss= 1.1045 (max= 1.6331), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:38:55,052 - root - INFO - Step 18260: lr=1.00E-05, loss= 1.1045 (max= 1.6331), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:38:55,052 - root - INFO - Step 18260: lr=1.00E-05, loss= 1.1045 (max= 1.6331), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:38:55,052 - root - INFO - Step 18260: lr=1.00E-05, loss= 1.1045 (max= 1.6331), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:38:55,052 - root - INFO - Step 18260: lr=1.00E-05, loss= 1.1045 (max= 1.6331), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:38:55,052 - root - INFO - Step 18260: lr=1.00E-05, loss= 1.1045 (max= 1.6331), tps=20574, 
mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:39:26,893 - root - INFO - Step 18270: lr=1.00E-05, loss= 1.0858 (max= 1.6103), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:39:26,893 - root - INFO - Step 18270: lr=1.00E-05, loss= 1.0858 (max= 1.6103), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:39:26,893 - root - INFO - Step 18270: lr=1.00E-05, loss= 1.0858 (max= 1.6103), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:39:26,893 - root - INFO - Step 18270: lr=1.00E-05, loss= 1.0858 (max= 1.6103), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:39:26,893 - root - INFO - Step 18270: lr=1.00E-05, loss= 1.0858 (max= 1.6103), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:39:26,893 - root - INFO - Step 18270: lr=1.00E-05, loss= 1.0858 (max= 1.6103), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:39:26,894 - root - INFO - Step 18270: lr=1.00E-05, loss= 1.0858 (max= 1.6103), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:39:26,894 - root - INFO - Step 18270: lr=1.00E-05, loss= 1.0858 (max= 1.6103), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:39:58,722 - root - INFO - Step 18280: lr=1.00E-05, loss= 1.0958 (max= 1.5843), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:39:58,722 - root - INFO - Step 18280: lr=1.00E-05, loss= 1.0958 (max= 1.5843), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:39:58,722 - root - INFO - Step 18280: 
lr=1.00E-05, loss= 1.0958 (max= 1.5843), tps=20593, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:39:58,722 - root - INFO - Step 18280: lr=1.00E-05, loss= 1.0958 (max= 1.5843), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:39:58,722 - root - INFO - Step 18280: lr=1.00E-05, loss= 1.0958 (max= 1.5843), tps=20593, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:39:58,722 - root - INFO - Step 18280: lr=1.00E-05, loss= 1.0958 (max= 1.5843), tps=20593, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:39:58,722 - root - INFO - Step 18280: lr=1.00E-05, loss= 1.0958 (max= 1.5843), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:39:58,722 - root - INFO - Step 18280: lr=1.00E-05, loss= 1.0958 (max= 1.5843), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:40:30,660 - root - INFO - Step 18290: lr=1.00E-05, loss= 1.1013 (max= 1.5886), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:40:30,660 - root - INFO - Step 18290: lr=1.00E-05, loss= 1.1013 (max= 1.5886), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:40:30,660 - root - INFO - Step 18290: lr=1.00E-05, loss= 1.1013 (max= 1.5886), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:40:30,660 - root - INFO - Step 18290: lr=1.00E-05, loss= 1.1013 (max= 1.5886), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:40:30,660 - root - INFO - Step 18290: lr=1.00E-05, loss= 1.1013 (max= 1.5886), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 02:40:30,660 - root - INFO - Step 18290: lr=1.00E-05, loss= 1.1013 (max= 1.5886), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:40:30,660 - root - INFO - Step 18290: lr=1.00E-05, loss= 1.1013 (max= 1.5886), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:40:30,660 - root - INFO - Step 18290: lr=1.00E-05, loss= 1.1013 (max= 1.5886), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:41:02,548 - root - INFO - Step 18300: lr=1.00E-05, loss= 1.0801 (max= 1.4746), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:41:02,548 - root - INFO - Step 18300: lr=1.00E-05, loss= 1.0801 (max= 1.4746), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:41:02,548 - root - INFO - Step 18300: lr=1.00E-05, loss= 1.0801 (max= 1.4746), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:41:02,548 - root - INFO - Step 18300: lr=1.00E-05, loss= 1.0801 (max= 1.4746), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:41:02,548 - root - INFO - Step 18300: lr=1.00E-05, loss= 1.0801 (max= 1.4746), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:41:02,548 - root - INFO - Step 18300: lr=1.00E-05, loss= 1.0801 (max= 1.4746), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:41:02,548 - root - INFO - Step 18300: lr=1.00E-05, loss= 1.0801 (max= 1.4746), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:41:02,548 - root - INFO - Step 18300: lr=1.00E-05, loss= 1.0801 (max= 1.4746), tps=20554, mfu=42.82%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:41:34,415 - root - INFO - Step 18310: lr=1.00E-05, loss= 1.0864 (max= 1.5930), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:41:34,415 - root - INFO - Step 18310: lr=1.00E-05, loss= 1.0864 (max= 1.5930), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:41:34,415 - root - INFO - Step 18310: lr=1.00E-05, loss= 1.0864 (max= 1.5930), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:41:34,415 - root - INFO - Step 18310: lr=1.00E-05, loss= 1.0864 (max= 1.5930), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:41:34,415 - root - INFO - Step 18310: lr=1.00E-05, loss= 1.0864 (max= 1.5930), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:41:34,415 - root - INFO - Step 18310: lr=1.00E-05, loss= 1.0864 (max= 1.5930), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:41:34,415 - root - INFO - Step 18310: lr=1.00E-05, loss= 1.0864 (max= 1.5930), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:41:34,415 - root - INFO - Step 18310: lr=1.00E-05, loss= 1.0864 (max= 1.5930), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:42:06,273 - root - INFO - Step 18320: lr=1.00E-05, loss= 1.1017 (max= 1.6608), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:42:06,273 - root - INFO - Step 18320: lr=1.00E-05, loss= 1.1017 (max= 1.6608), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:42:06,273 - root - INFO - Step 18320: lr=1.00E-05, loss= 1.1017 
(max= 1.6608), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:42:06,273 - root - INFO - Step 18320: lr=1.00E-05, loss= 1.1017 (max= 1.6608), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:42:06,273 - root - INFO - Step 18320: lr=1.00E-05, loss= 1.1017 (max= 1.6608), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:42:06,273 - root - INFO - Step 18320: lr=1.00E-05, loss= 1.1017 (max= 1.6608), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:42:06,273 - root - INFO - Step 18320: lr=1.00E-05, loss= 1.1017 (max= 1.6608), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:42:06,273 - root - INFO - Step 18320: lr=1.00E-05, loss= 1.1017 (max= 1.6608), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:42:38,107 - root - INFO - Step 18330: lr=1.00E-05, loss= 1.1014 (max= 1.8735), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:42:38,107 - root - INFO - Step 18330: lr=1.00E-05, loss= 1.1014 (max= 1.8735), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:42:38,107 - root - INFO - Step 18330: lr=1.00E-05, loss= 1.1014 (max= 1.8735), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:42:38,107 - root - INFO - Step 18330: lr=1.00E-05, loss= 1.1014 (max= 1.8735), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:42:38,107 - root - INFO - Step 18330: lr=1.00E-05, loss= 1.1014 (max= 1.8735), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:42:38,107 - root 
- INFO - Step 18330: lr=1.00E-05, loss= 1.1014 (max= 1.8735), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:42:38,107 - root - INFO - Step 18330: lr=1.00E-05, loss= 1.1014 (max= 1.8735), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:42:38,107 - root - INFO - Step 18330: lr=1.00E-05, loss= 1.1014 (max= 1.8735), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:43:09,996 - root - INFO - Step 18340: lr=1.00E-05, loss= 1.1036 (max= 1.4947), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:43:09,996 - root - INFO - Step 18340: lr=1.00E-05, loss= 1.1036 (max= 1.4947), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:43:09,996 - root - INFO - Step 18340: lr=1.00E-05, loss= 1.1036 (max= 1.4947), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:43:09,996 - root - INFO - Step 18340: lr=1.00E-05, loss= 1.1036 (max= 1.4947), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:43:09,996 - root - INFO - Step 18340: lr=1.00E-05, loss= 1.1036 (max= 1.4947), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:43:09,996 - root - INFO - Step 18340: lr=1.00E-05, loss= 1.1036 (max= 1.4947), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:43:09,996 - root - INFO - Step 18340: lr=1.00E-05, loss= 1.1036 (max= 1.4947), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:43:09,996 - root - INFO - Step 18340: lr=1.00E-05, loss= 1.1036 (max= 1.4947), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 02:43:41,842 - root - INFO - Step 18350: lr=1.00E-05, loss= 1.1062 (max= 1.5139), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:43:41,843 - root - INFO - Step 18350: lr=1.00E-05, loss= 1.1062 (max= 1.5139), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:43:41,843 - root - INFO - Step 18350: lr=1.00E-05, loss= 1.1062 (max= 1.5139), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:43:41,843 - root - INFO - Step 18350: lr=1.00E-05, loss= 1.1062 (max= 1.5139), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:43:41,843 - root - INFO - Step 18350: lr=1.00E-05, loss= 1.1062 (max= 1.5139), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:43:41,843 - root - INFO - Step 18350: lr=1.00E-05, loss= 1.1062 (max= 1.5139), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:43:41,843 - root - INFO - Step 18350: lr=1.00E-05, loss= 1.1062 (max= 1.5139), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:43:41,843 - root - INFO - Step 18350: lr=1.00E-05, loss= 1.1062 (max= 1.5139), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:44:13,734 - root - INFO - Step 18360: lr=1.00E-05, loss= 1.0913 (max= 1.5440), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:44:13,734 - root - INFO - Step 18360: lr=1.00E-05, loss= 1.0913 (max= 1.5440), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:44:13,734 - root - INFO - Step 18360: lr=1.00E-05, loss= 1.0913 (max= 1.5440), tps=20552, mfu=42.82%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:44:13,734 - root - INFO - Step 18360: lr=1.00E-05, loss= 1.0913 (max= 1.5440), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:44:13,734 - root - INFO - Step 18360: lr=1.00E-05, loss= 1.0913 (max= 1.5440), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:44:13,734 - root - INFO - Step 18360: lr=1.00E-05, loss= 1.0913 (max= 1.5440), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:44:13,734 - root - INFO - Step 18360: lr=1.00E-05, loss= 1.0913 (max= 1.5440), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:44:13,734 - root - INFO - Step 18360: lr=1.00E-05, loss= 1.0913 (max= 1.5440), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:44:45,584 - root - INFO - Step 18370: lr=1.00E-05, loss= 1.0943 (max= 1.7301), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:44:45,584 - root - INFO - Step 18370: lr=1.00E-05, loss= 1.0943 (max= 1.7301), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:44:45,584 - root - INFO - Step 18370: lr=1.00E-05, loss= 1.0943 (max= 1.7301), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:44:45,584 - root - INFO - Step 18370: lr=1.00E-05, loss= 1.0943 (max= 1.7301), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:44:45,584 - root - INFO - Step 18370: lr=1.00E-05, loss= 1.0943 (max= 1.7301), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:44:45,584 - root - INFO - Step 18370: lr=1.00E-05, 
loss= 1.0943 (max= 1.7301), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:44:45,584 - root - INFO - Step 18370: lr=1.00E-05, loss= 1.0943 (max= 1.7301), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:44:45,584 - root - INFO - Step 18370: lr=1.00E-05, loss= 1.0943 (max= 1.7301), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:45:17,471 - root - INFO - Step 18380: lr=1.00E-05, loss= 1.0963 (max= 1.4921), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:45:17,471 - root - INFO - Step 18380: lr=1.00E-05, loss= 1.0963 (max= 1.4921), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:45:17,471 - root - INFO - Step 18380: lr=1.00E-05, loss= 1.0963 (max= 1.4921), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:45:17,471 - root - INFO - Step 18380: lr=1.00E-05, loss= 1.0963 (max= 1.4921), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:45:17,471 - root - INFO - Step 18380: lr=1.00E-05, loss= 1.0963 (max= 1.4921), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:45:17,471 - root - INFO - Step 18380: lr=1.00E-05, loss= 1.0963 (max= 1.4921), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:45:17,471 - root - INFO - Step 18380: lr=1.00E-05, loss= 1.0963 (max= 1.4921), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:45:17,472 - root - INFO - Step 18380: lr=1.00E-05, loss= 1.0963 (max= 1.4921), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
02:45:49,308 - root - INFO - Step 18390: lr=1.00E-05, loss= 1.0991 (max= 1.5572), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:45:49,308 - root - INFO - Step 18390: lr=1.00E-05, loss= 1.0991 (max= 1.5572), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:45:49,308 - root - INFO - Step 18390: lr=1.00E-05, loss= 1.0991 (max= 1.5572), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:45:49,308 - root - INFO - Step 18390: lr=1.00E-05, loss= 1.0991 (max= 1.5572), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:45:49,308 - root - INFO - Step 18390: lr=1.00E-05, loss= 1.0991 (max= 1.5572), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:45:49,308 - root - INFO - Step 18390: lr=1.00E-05, loss= 1.0991 (max= 1.5572), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:45:49,308 - root - INFO - Step 18390: lr=1.00E-05, loss= 1.0991 (max= 1.5572), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:45:49,308 - root - INFO - Step 18390: lr=1.00E-05, loss= 1.0991 (max= 1.5572), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:46:21,194 - root - INFO - Step 18400: lr=1.00E-05, loss= 1.0888 (max= 1.5249), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:46:21,194 - root - INFO - Step 18400: lr=1.00E-05, loss= 1.0888 (max= 1.5249), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:46:21,194 - root - INFO - Step 18400: lr=1.00E-05, loss= 1.0888 (max= 1.5249), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:46:21,194 - root - INFO - Step 18400: lr=1.00E-05, loss= 1.0888 (max= 1.5249), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:46:21,194 - root - INFO - Step 18400: lr=1.00E-05, loss= 1.0888 (max= 1.5249), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:46:21,194 - root - INFO - Step 18400: lr=1.00E-05, loss= 1.0888 (max= 1.5249), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:46:21,195 - root - INFO - Step 18400: lr=1.00E-05, loss= 1.0888 (max= 1.5249), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:46:21,195 - root - INFO - Step 18400: lr=1.00E-05, loss= 1.0888 (max= 1.5249), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:46:52,992 - root - INFO - Step 18410: lr=1.00E-05, loss= 1.0980 (max= 1.5627), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:46:52,992 - root - INFO - Step 18410: lr=1.00E-05, loss= 1.0980 (max= 1.5627), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:46:52,992 - root - INFO - Step 18410: lr=1.00E-05, loss= 1.0980 (max= 1.5627), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:46:52,992 - root - INFO - Step 18410: lr=1.00E-05, loss= 1.0980 (max= 1.5627), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:46:52,992 - root - INFO - Step 18410: lr=1.00E-05, loss= 1.0980 (max= 1.5627), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:46:52,992 - root - INFO - Step 18410: lr=1.00E-05, loss= 1.0980 (max= 1.5627), 
tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:46:52,992 - root - INFO - Step 18410: lr=1.00E-05, loss= 1.0980 (max= 1.5627), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:46:52,992 - root - INFO - Step 18410: lr=1.00E-05, loss= 1.0980 (max= 1.5627), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:47:24,783 - root - INFO - Step 18420: lr=1.00E-05, loss= 1.1128 (max= 1.5246), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:47:24,783 - root - INFO - Step 18420: lr=1.00E-05, loss= 1.1128 (max= 1.5246), tps=20617, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:47:24,783 - root - INFO - Step 18420: lr=1.00E-05, loss= 1.1128 (max= 1.5246), tps=20617, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:47:24,783 - root - INFO - Step 18420: lr=1.00E-05, loss= 1.1128 (max= 1.5246), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:47:24,783 - root - INFO - Step 18420: lr=1.00E-05, loss= 1.1128 (max= 1.5246), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:47:24,783 - root - INFO - Step 18420: lr=1.00E-05, loss= 1.1128 (max= 1.5246), tps=20617, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:47:24,783 - root - INFO - Step 18420: lr=1.00E-05, loss= 1.1128 (max= 1.5246), tps=20617, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:47:24,783 - root - INFO - Step 18420: lr=1.00E-05, loss= 1.1128 (max= 1.5246), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:47:56,664 - root - INFO - Step 
18430: lr=9.11E-06, loss= 1.1062 (max= 1.5437), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:47:56,664 - root - INFO - Step 18430: lr=9.11E-06, loss= 1.1062 (max= 1.5437), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:47:56,664 - root - INFO - Step 18430: lr=9.11E-06, loss= 1.1062 (max= 1.5437), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:47:56,664 - root - INFO - Step 18430: lr=9.11E-06, loss= 1.1062 (max= 1.5437), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:47:56,664 - root - INFO - Step 18430: lr=9.11E-06, loss= 1.1062 (max= 1.5437), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:47:56,664 - root - INFO - Step 18430: lr=9.11E-06, loss= 1.1062 (max= 1.5437), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:47:56,664 - root - INFO - Step 18430: lr=9.11E-06, loss= 1.1062 (max= 1.5437), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:47:56,664 - root - INFO - Step 18430: lr=9.11E-06, loss= 1.1062 (max= 1.5437), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:48:28,537 - root - INFO - Step 18440: lr=8.33E-06, loss= 1.0882 (max= 1.4305), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:48:28,537 - root - INFO - Step 18440: lr=8.33E-06, loss= 1.0882 (max= 1.4305), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:48:28,537 - root - INFO - Step 18440: lr=8.33E-06, loss= 1.0882 (max= 1.4305), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 02:48:28,537 - root - INFO - Step 18440: lr=8.33E-06, loss= 1.0882 (max= 1.4305), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:48:28,537 - root - INFO - Step 18440: lr=8.33E-06, loss= 1.0882 (max= 1.4305), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:48:28,537 - root - INFO - Step 18440: lr=8.33E-06, loss= 1.0882 (max= 1.4305), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:48:28,537 - root - INFO - Step 18440: lr=8.33E-06, loss= 1.0882 (max= 1.4305), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:48:28,537 - root - INFO - Step 18440: lr=8.33E-06, loss= 1.0882 (max= 1.4305), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:49:00,444 - root - INFO - Step 18450: lr=7.81E-06, loss= 1.0982 (max= 1.5210), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:49:00,444 - root - INFO - Step 18450: lr=7.81E-06, loss= 1.0982 (max= 1.5210), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:49:00,444 - root - INFO - Step 18450: lr=7.81E-06, loss= 1.0982 (max= 1.5210), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:49:00,444 - root - INFO - Step 18450: lr=7.81E-06, loss= 1.0982 (max= 1.5210), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:49:00,444 - root - INFO - Step 18450: lr=7.81E-06, loss= 1.0982 (max= 1.5210), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:49:00,444 - root - INFO - Step 18450: lr=7.81E-06, loss= 1.0982 (max= 1.5210), tps=20542, mfu=42.80%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:49:00,444 - root - INFO - Step 18450: lr=7.81E-06, loss= 1.0982 (max= 1.5210), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:49:00,445 - root - INFO - Step 18450: lr=7.81E-06, loss= 1.0982 (max= 1.5210), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:49:32,271 - root - INFO - Step 18460: lr=7.39E-06, loss= 1.0966 (max= 1.4511), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:49:32,272 - root - INFO - Step 18460: lr=7.39E-06, loss= 1.0966 (max= 1.4511), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:49:32,272 - root - INFO - Step 18460: lr=7.39E-06, loss= 1.0966 (max= 1.4511), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:49:32,272 - root - INFO - Step 18460: lr=7.39E-06, loss= 1.0966 (max= 1.4511), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:49:32,272 - root - INFO - Step 18460: lr=7.39E-06, loss= 1.0966 (max= 1.4511), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:49:32,272 - root - INFO - Step 18460: lr=7.39E-06, loss= 1.0966 (max= 1.4511), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:49:32,272 - root - INFO - Step 18460: lr=7.39E-06, loss= 1.0966 (max= 1.4511), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:49:32,272 - root - INFO - Step 18460: lr=7.39E-06, loss= 1.0966 (max= 1.4511), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:50:04,117 - root - INFO - Step 18470: lr=7.03E-06, loss= 1.0843 
(max= 1.5042), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:50:04,117 - root - INFO - Step 18470: lr=7.03E-06, loss= 1.0843 (max= 1.5042), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:50:04,117 - root - INFO - Step 18470: lr=7.03E-06, loss= 1.0843 (max= 1.5042), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:50:04,117 - root - INFO - Step 18470: lr=7.03E-06, loss= 1.0843 (max= 1.5042), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:50:04,117 - root - INFO - Step 18470: lr=7.03E-06, loss= 1.0843 (max= 1.5042), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:50:04,117 - root - INFO - Step 18470: lr=7.03E-06, loss= 1.0843 (max= 1.5042), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:50:04,117 - root - INFO - Step 18470: lr=7.03E-06, loss= 1.0843 (max= 1.5042), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:50:04,117 - root - INFO - Step 18470: lr=7.03E-06, loss= 1.0843 (max= 1.5042), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:50:35,941 - root - INFO - Step 18480: lr=6.71E-06, loss= 1.0902 (max= 1.5931), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:50:35,941 - root - INFO - Step 18480: lr=6.71E-06, loss= 1.0902 (max= 1.5931), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:50:35,941 - root - INFO - Step 18480: lr=6.71E-06, loss= 1.0902 (max= 1.5931), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:50:35,941 - root 
- INFO - Step 18480: lr=6.71E-06, loss= 1.0902 (max= 1.5931), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:50:35,941 - root - INFO - Step 18480: lr=6.71E-06, loss= 1.0902 (max= 1.5931), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:50:35,941 - root - INFO - Step 18480: lr=6.71E-06, loss= 1.0902 (max= 1.5931), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:50:35,941 - root - INFO - Step 18480: lr=6.71E-06, loss= 1.0902 (max= 1.5931), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:50:35,941 - root - INFO - Step 18480: lr=6.71E-06, loss= 1.0902 (max= 1.5931), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:51:07,804 - root - INFO - Step 18490: lr=6.42E-06, loss= 1.0876 (max= 1.4769), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:51:07,804 - root - INFO - Step 18490: lr=6.42E-06, loss= 1.0876 (max= 1.4769), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:51:07,804 - root - INFO - Step 18490: lr=6.42E-06, loss= 1.0876 (max= 1.4769), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:51:07,804 - root - INFO - Step 18490: lr=6.42E-06, loss= 1.0876 (max= 1.4769), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:51:07,804 - root - INFO - Step 18490: lr=6.42E-06, loss= 1.0876 (max= 1.4769), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:51:07,804 - root - INFO - Step 18490: lr=6.42E-06, loss= 1.0876 (max= 1.4769), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 02:51:07,804 - root - INFO - Step 18490: lr=6.42E-06, loss= 1.0876 (max= 1.4769), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:51:07,804 - root - INFO - Step 18490: lr=6.42E-06, loss= 1.0876 (max= 1.4769), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:51:39,656 - root - INFO - Step 18500: lr=6.15E-06, loss= 1.0979 (max= 1.6839), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:51:39,656 - root - INFO - Step 18500: lr=6.15E-06, loss= 1.0979 (max= 1.6839), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:51:39,656 - root - INFO - Step 18500: lr=6.15E-06, loss= 1.0979 (max= 1.6839), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:51:39,657 - root - INFO - Step 18500: lr=6.15E-06, loss= 1.0979 (max= 1.6839), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:51:39,657 - root - INFO - Step 18500: lr=6.15E-06, loss= 1.0979 (max= 1.6839), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:51:39,657 - root - INFO - Step 18500: lr=6.15E-06, loss= 1.0979 (max= 1.6839), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:51:39,657 - root - INFO - Step 18500: lr=6.15E-06, loss= 1.0979 (max= 1.6839), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:51:39,657 - root - INFO - Step 18500: lr=6.15E-06, loss= 1.0979 (max= 1.6839), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:52:11,456 - root - INFO - Step 18510: lr=5.90E-06, loss= 1.1099 (max= 1.6073), tps=20611, mfu=42.94%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:52:11,456 - root - INFO - Step 18510: lr=5.90E-06, loss= 1.1099 (max= 1.6073), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:52:11,456 - root - INFO - Step 18510: lr=5.90E-06, loss= 1.1099 (max= 1.6073), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:52:11,456 - root - INFO - Step 18510: lr=5.90E-06, loss= 1.1099 (max= 1.6073), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:52:11,456 - root - INFO - Step 18510: lr=5.90E-06, loss= 1.1099 (max= 1.6073), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:52:11,456 - root - INFO - Step 18510: lr=5.90E-06, loss= 1.1099 (max= 1.6073), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:52:11,456 - root - INFO - Step 18510: lr=5.90E-06, loss= 1.1099 (max= 1.6073), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:52:11,456 - root - INFO - Step 18510: lr=5.90E-06, loss= 1.1099 (max= 1.6073), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:52:43,333 - root - INFO - Step 18520: lr=5.66E-06, loss= 1.0972 (max= 1.5762), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:52:43,333 - root - INFO - Step 18520: lr=5.66E-06, loss= 1.0972 (max= 1.5762), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:52:43,333 - root - INFO - Step 18520: lr=5.66E-06, loss= 1.0972 (max= 1.5762), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:52:43,334 - root - INFO - Step 18520: lr=5.66E-06, 
loss= 1.0972 (max= 1.5762), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:52:43,334 - root - INFO - Step 18520: lr=5.66E-06, loss= 1.0972 (max= 1.5762), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:52:43,334 - root - INFO - Step 18520: lr=5.66E-06, loss= 1.0972 (max= 1.5762), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:52:43,334 - root - INFO - Step 18520: lr=5.66E-06, loss= 1.0972 (max= 1.5762), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:52:43,334 - root - INFO - Step 18520: lr=5.66E-06, loss= 1.0972 (max= 1.5762), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:53:15,197 - root - INFO - Step 18530: lr=5.44E-06, loss= 1.1192 (max= 1.7278), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:53:15,197 - root - INFO - Step 18530: lr=5.44E-06, loss= 1.1192 (max= 1.7278), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:53:15,197 - root - INFO - Step 18530: lr=5.44E-06, loss= 1.1192 (max= 1.7278), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:53:15,197 - root - INFO - Step 18530: lr=5.44E-06, loss= 1.1192 (max= 1.7278), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:53:15,197 - root - INFO - Step 18530: lr=5.44E-06, loss= 1.1192 (max= 1.7278), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:53:15,197 - root - INFO - Step 18530: lr=5.44E-06, loss= 1.1192 (max= 1.7278), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
02:53:15,197 - root - INFO - Step 18530: lr=5.44E-06, loss= 1.1192 (max= 1.7278), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:53:15,197 - root - INFO - Step 18530: lr=5.44E-06, loss= 1.1192 (max= 1.7278), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:53:46,986 - root - INFO - Step 18540: lr=5.23E-06, loss= 1.0877 (max= 1.5348), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:53:46,986 - root - INFO - Step 18540: lr=5.23E-06, loss= 1.0877 (max= 1.5348), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:53:46,986 - root - INFO - Step 18540: lr=5.23E-06, loss= 1.0877 (max= 1.5348), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:53:46,986 - root - INFO - Step 18540: lr=5.23E-06, loss= 1.0877 (max= 1.5348), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:53:46,986 - root - INFO - Step 18540: lr=5.23E-06, loss= 1.0877 (max= 1.5348), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:53:46,986 - root - INFO - Step 18540: lr=5.23E-06, loss= 1.0877 (max= 1.5348), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:53:46,986 - root - INFO - Step 18540: lr=5.23E-06, loss= 1.0877 (max= 1.5348), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:53:46,986 - root - INFO - Step 18540: lr=5.23E-06, loss= 1.0877 (max= 1.5348), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:54:18,824 - root - INFO - Step 18550: lr=5.02E-06, loss= 1.0729 (max= 1.5127), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:54:18,824 - root - INFO - Step 18550: lr=5.02E-06, loss= 1.0729 (max= 1.5127), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:54:18,824 - root - INFO - Step 18550: lr=5.02E-06, loss= 1.0729 (max= 1.5127), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:54:18,824 - root - INFO - Step 18550: lr=5.02E-06, loss= 1.0729 (max= 1.5127), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:54:18,824 - root - INFO - Step 18550: lr=5.02E-06, loss= 1.0729 (max= 1.5127), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:54:18,824 - root - INFO - Step 18550: lr=5.02E-06, loss= 1.0729 (max= 1.5127), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:54:18,824 - root - INFO - Step 18550: lr=5.02E-06, loss= 1.0729 (max= 1.5127), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:54:18,824 - root - INFO - Step 18550: lr=5.02E-06, loss= 1.0729 (max= 1.5127), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:54:50,744 - root - INFO - Step 18560: lr=4.82E-06, loss= 1.0843 (max= 1.4263), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:54:50,744 - root - INFO - Step 18560: lr=4.82E-06, loss= 1.0843 (max= 1.4263), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:54:50,744 - root - INFO - Step 18560: lr=4.82E-06, loss= 1.0843 (max= 1.4263), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:54:50,744 - root - INFO - Step 18560: lr=4.82E-06, loss= 1.0843 (max= 1.4263), 
tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:54:50,744 - root - INFO - Step 18560: lr=4.82E-06, loss= 1.0843 (max= 1.4263), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:54:50,744 - root - INFO - Step 18560: lr=4.82E-06, loss= 1.0843 (max= 1.4263), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:54:50,744 - root - INFO - Step 18560: lr=4.82E-06, loss= 1.0843 (max= 1.4263), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:54:50,745 - root - INFO - Step 18560: lr=4.82E-06, loss= 1.0843 (max= 1.4263), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:55:22,654 - root - INFO - Step 18570: lr=4.63E-06, loss= 1.1138 (max= 1.5652), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:55:22,654 - root - INFO - Step 18570: lr=4.63E-06, loss= 1.1138 (max= 1.5652), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:55:22,654 - root - INFO - Step 18570: lr=4.63E-06, loss= 1.1138 (max= 1.5652), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:55:22,654 - root - INFO - Step 18570: lr=4.63E-06, loss= 1.1138 (max= 1.5652), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:55:22,654 - root - INFO - Step 18570: lr=4.63E-06, loss= 1.1138 (max= 1.5652), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:55:22,655 - root - INFO - Step 18570: lr=4.63E-06, loss= 1.1138 (max= 1.5652), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:55:22,655 - root - INFO - Step 
18570: lr=4.63E-06, loss= 1.1138 (max= 1.5652), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:55:22,655 - root - INFO - Step 18570: lr=4.63E-06, loss= 1.1138 (max= 1.5652), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:55:54,471 - root - INFO - Step 18580: lr=4.45E-06, loss= 1.1002 (max= 1.7279), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:55:54,471 - root - INFO - Step 18580: lr=4.45E-06, loss= 1.1002 (max= 1.7279), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:55:54,471 - root - INFO - Step 18580: lr=4.45E-06, loss= 1.1002 (max= 1.7279), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:55:54,471 - root - INFO - Step 18580: lr=4.45E-06, loss= 1.1002 (max= 1.7279), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:55:54,472 - root - INFO - Step 18580: lr=4.45E-06, loss= 1.1002 (max= 1.7279), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:55:54,472 - root - INFO - Step 18580: lr=4.45E-06, loss= 1.1002 (max= 1.7279), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:55:54,472 - root - INFO - Step 18580: lr=4.45E-06, loss= 1.1002 (max= 1.7279), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:55:54,472 - root - INFO - Step 18580: lr=4.45E-06, loss= 1.1002 (max= 1.7279), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:56:26,318 - root - INFO - Step 18590: lr=4.27E-06, loss= 1.0972 (max= 1.6400), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 02:56:26,318 - root - INFO - Step 18590: lr=4.27E-06, loss= 1.0972 (max= 1.6400), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:56:26,318 - root - INFO - Step 18590: lr=4.27E-06, loss= 1.0972 (max= 1.6400), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:56:26,318 - root - INFO - Step 18590: lr=4.27E-06, loss= 1.0972 (max= 1.6400), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:56:26,318 - root - INFO - Step 18590: lr=4.27E-06, loss= 1.0972 (max= 1.6400), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:56:26,318 - root - INFO - Step 18590: lr=4.27E-06, loss= 1.0972 (max= 1.6400), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:56:26,319 - root - INFO - Step 18590: lr=4.27E-06, loss= 1.0972 (max= 1.6400), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:56:26,319 - root - INFO - Step 18590: lr=4.27E-06, loss= 1.0972 (max= 1.6400), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:56:58,185 - root - INFO - Step 18600: lr=4.10E-06, loss= 1.0742 (max= 1.4809), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:56:58,185 - root - INFO - Step 18600: lr=4.10E-06, loss= 1.0742 (max= 1.4809), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:56:58,185 - root - INFO - Step 18600: lr=4.10E-06, loss= 1.0742 (max= 1.4809), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:56:58,185 - root - INFO - Step 18600: lr=4.10E-06, loss= 1.0742 (max= 1.4809), tps=20568, mfu=42.85%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:56:58,185 - root - INFO - Step 18600: lr=4.10E-06, loss= 1.0742 (max= 1.4809), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:56:58,185 - root - INFO - Step 18600: lr=4.10E-06, loss= 1.0742 (max= 1.4809), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:56:58,185 - root - INFO - Step 18600: lr=4.10E-06, loss= 1.0742 (max= 1.4809), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:56:58,185 - root - INFO - Step 18600: lr=4.10E-06, loss= 1.0742 (max= 1.4809), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:57:30,010 - root - INFO - Step 18610: lr=3.93E-06, loss= 1.1210 (max= 1.6620), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:57:30,010 - root - INFO - Step 18610: lr=3.93E-06, loss= 1.1210 (max= 1.6620), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:57:30,010 - root - INFO - Step 18610: lr=3.93E-06, loss= 1.1210 (max= 1.6620), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:57:30,010 - root - INFO - Step 18610: lr=3.93E-06, loss= 1.1210 (max= 1.6620), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:57:30,010 - root - INFO - Step 18610: lr=3.93E-06, loss= 1.1210 (max= 1.6620), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:57:30,010 - root - INFO - Step 18610: lr=3.93E-06, loss= 1.1210 (max= 1.6620), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:57:30,010 - root - INFO - Step 18610: lr=3.93E-06, loss= 1.1210 
(max= 1.6620), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:57:30,010 - root - INFO - Step 18610: lr=3.93E-06, loss= 1.1210 (max= 1.6620), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:58:01,811 - root - INFO - Step 18620: lr=3.77E-06, loss= 1.1000 (max= 1.5804), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:58:01,811 - root - INFO - Step 18620: lr=3.77E-06, loss= 1.1000 (max= 1.5804), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:58:01,811 - root - INFO - Step 18620: lr=3.77E-06, loss= 1.1000 (max= 1.5804), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:58:01,811 - root - INFO - Step 18620: lr=3.77E-06, loss= 1.1000 (max= 1.5804), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:58:01,811 - root - INFO - Step 18620: lr=3.77E-06, loss= 1.1000 (max= 1.5804), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:58:01,811 - root - INFO - Step 18620: lr=3.77E-06, loss= 1.1000 (max= 1.5804), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:58:01,811 - root - INFO - Step 18620: lr=3.77E-06, loss= 1.1000 (max= 1.5804), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:58:01,811 - root - INFO - Step 18620: lr=3.77E-06, loss= 1.1000 (max= 1.5804), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:58:33,654 - root - INFO - Step 18630: lr=3.61E-06, loss= 1.0876 (max= 1.5052), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:58:33,654 - root 
- INFO - Step 18630: lr=3.61E-06, loss= 1.0876 (max= 1.5052), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:58:33,654 - root - INFO - Step 18630: lr=3.61E-06, loss= 1.0876 (max= 1.5052), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:58:33,654 - root - INFO - Step 18630: lr=3.61E-06, loss= 1.0876 (max= 1.5052), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:58:33,654 - root - INFO - Step 18630: lr=3.61E-06, loss= 1.0876 (max= 1.5052), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:58:33,654 - root - INFO - Step 18630: lr=3.61E-06, loss= 1.0876 (max= 1.5052), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:58:33,654 - root - INFO - Step 18630: lr=3.61E-06, loss= 1.0876 (max= 1.5052), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:58:33,654 - root - INFO - Step 18630: lr=3.61E-06, loss= 1.0876 (max= 1.5052), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:59:05,541 - root - INFO - Step 18640: lr=3.46E-06, loss= 1.0843 (max= 1.6428), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:59:05,541 - root - INFO - Step 18640: lr=3.46E-06, loss= 1.0843 (max= 1.6428), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:59:05,541 - root - INFO - Step 18640: lr=3.46E-06, loss= 1.0843 (max= 1.6428), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:59:05,541 - root - INFO - Step 18640: lr=3.46E-06, loss= 1.0843 (max= 1.6428), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 02:59:05,541 - root - INFO - Step 18640: lr=3.46E-06, loss= 1.0843 (max= 1.6428), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:59:05,541 - root - INFO - Step 18640: lr=3.46E-06, loss= 1.0843 (max= 1.6428), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:59:05,541 - root - INFO - Step 18640: lr=3.46E-06, loss= 1.0843 (max= 1.6428), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:59:05,541 - root - INFO - Step 18640: lr=3.46E-06, loss= 1.0843 (max= 1.6428), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:59:37,333 - root - INFO - Step 18650: lr=3.31E-06, loss= 1.0877 (max= 1.5874), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:59:37,334 - root - INFO - Step 18650: lr=3.31E-06, loss= 1.0877 (max= 1.5874), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:59:37,334 - root - INFO - Step 18650: lr=3.31E-06, loss= 1.0877 (max= 1.5874), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:59:37,334 - root - INFO - Step 18650: lr=3.31E-06, loss= 1.0877 (max= 1.5874), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:59:37,334 - root - INFO - Step 18650: lr=3.31E-06, loss= 1.0877 (max= 1.5874), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:59:37,334 - root - INFO - Step 18650: lr=3.31E-06, loss= 1.0877 (max= 1.5874), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:59:37,334 - root - INFO - Step 18650: lr=3.31E-06, loss= 1.0877 (max= 1.5874), tps=20616, mfu=42.95%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:59:37,334 - root - INFO - Step 18650: lr=3.31E-06, loss= 1.0877 (max= 1.5874), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:09,187 - root - INFO - Step 18660: lr=3.16E-06, loss= 1.0907 (max= 1.5273), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:09,187 - root - INFO - Step 18660: lr=3.16E-06, loss= 1.0907 (max= 1.5273), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:09,187 - root - INFO - Step 18660: lr=3.16E-06, loss= 1.0907 (max= 1.5273), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:09,187 - root - INFO - Step 18660: lr=3.16E-06, loss= 1.0907 (max= 1.5273), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:09,187 - root - INFO - Step 18660: lr=3.16E-06, loss= 1.0907 (max= 1.5273), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:09,187 - root - INFO - Step 18660: lr=3.16E-06, loss= 1.0907 (max= 1.5273), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:09,187 - root - INFO - Step 18660: lr=3.16E-06, loss= 1.0907 (max= 1.5273), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:09,188 - root - INFO - Step 18660: lr=3.16E-06, loss= 1.0907 (max= 1.5273), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:40,959 - root - INFO - Step 18670: lr=3.01E-06, loss= 1.0898 (max= 1.5057), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:40,959 - root - INFO - Step 18670: lr=3.01E-06, 
loss= 1.0898 (max= 1.5057), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:40,959 - root - INFO - Step 18670: lr=3.01E-06, loss= 1.0898 (max= 1.5057), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:40,959 - root - INFO - Step 18670: lr=3.01E-06, loss= 1.0898 (max= 1.5057), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:40,959 - root - INFO - Step 18670: lr=3.01E-06, loss= 1.0898 (max= 1.5057), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:40,959 - root - INFO - Step 18670: lr=3.01E-06, loss= 1.0898 (max= 1.5057), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:40,959 - root - INFO - Step 18670: lr=3.01E-06, loss= 1.0898 (max= 1.5057), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:00:40,959 - root - INFO - Step 18670: lr=3.01E-06, loss= 1.0898 (max= 1.5057), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:12,801 - root - INFO - Step 18680: lr=2.87E-06, loss= 1.1071 (max= 1.5900), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:12,802 - root - INFO - Step 18680: lr=2.87E-06, loss= 1.1071 (max= 1.5900), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:12,802 - root - INFO - Step 18680: lr=2.87E-06, loss= 1.1071 (max= 1.5900), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:12,802 - root - INFO - Step 18680: lr=2.87E-06, loss= 1.1071 (max= 1.5900), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
02:01:12,802 - root - INFO - Step 18680: lr=2.87E-06, loss= 1.1071 (max= 1.5900), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:12,802 - root - INFO - Step 18680: lr=2.87E-06, loss= 1.1071 (max= 1.5900), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:12,802 - root - INFO - Step 18680: lr=2.87E-06, loss= 1.1071 (max= 1.5900), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:12,802 - root - INFO - Step 18680: lr=2.87E-06, loss= 1.1071 (max= 1.5900), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:44,622 - root - INFO - Step 18690: lr=2.73E-06, loss= 1.1006 (max= 1.5571), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:44,622 - root - INFO - Step 18690: lr=2.73E-06, loss= 1.1006 (max= 1.5571), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:44,622 - root - INFO - Step 18690: lr=2.73E-06, loss= 1.1006 (max= 1.5571), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:44,622 - root - INFO - Step 18690: lr=2.73E-06, loss= 1.1006 (max= 1.5571), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:44,623 - root - INFO - Step 18690: lr=2.73E-06, loss= 1.1006 (max= 1.5571), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:44,623 - root - INFO - Step 18690: lr=2.73E-06, loss= 1.1006 (max= 1.5571), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:44,623 - root - INFO - Step 18690: lr=2.73E-06, loss= 1.1006 (max= 1.5571), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:01:44,623 - root - INFO - Step 18690: lr=2.73E-06, loss= 1.1006 (max= 1.5571), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:12,484 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-1-of-16-train/dsk-open-dyna-0-of-1-cp-1-of-16-train.parquet:1280134 +2025-10-26 02:02:16,559 - root - INFO - Step 18700: lr=2.60E-06, loss= 1.0774 (max= 1.4660), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:16,559 - root - INFO - Step 18700: lr=2.60E-06, loss= 1.0774 (max= 1.4660), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:16,559 - root - INFO - Step 18700: lr=2.60E-06, loss= 1.0774 (max= 1.4660), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:16,559 - root - INFO - Step 18700: lr=2.60E-06, loss= 1.0774 (max= 1.4660), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:16,559 - root - INFO - Step 18700: lr=2.60E-06, loss= 1.0774 (max= 1.4660), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:16,559 - root - INFO - Step 18700: lr=2.60E-06, loss= 1.0774 (max= 1.4660), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:16,559 - root - INFO - Step 18700: lr=2.60E-06, loss= 1.0774 (max= 1.4660), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:16,559 - root - INFO - Step 18700: lr=2.60E-06, loss= 1.0774 (max= 1.4660), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:48,424 - root - INFO - Step 18710: lr=2.46E-06, loss= 1.1100 (max= 1.5757), 
tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:48,424 - root - INFO - Step 18710: lr=2.46E-06, loss= 1.1100 (max= 1.5757), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:48,424 - root - INFO - Step 18710: lr=2.46E-06, loss= 1.1100 (max= 1.5757), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:48,424 - root - INFO - Step 18710: lr=2.46E-06, loss= 1.1100 (max= 1.5757), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:48,424 - root - INFO - Step 18710: lr=2.46E-06, loss= 1.1100 (max= 1.5757), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:48,424 - root - INFO - Step 18710: lr=2.46E-06, loss= 1.1100 (max= 1.5757), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:48,424 - root - INFO - Step 18710: lr=2.46E-06, loss= 1.1100 (max= 1.5757), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:02:48,424 - root - INFO - Step 18710: lr=2.46E-06, loss= 1.1100 (max= 1.5757), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:20,281 - root - INFO - Step 18720: lr=2.33E-06, loss= 1.0834 (max= 1.5160), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:20,281 - root - INFO - Step 18720: lr=2.33E-06, loss= 1.0834 (max= 1.5160), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:20,281 - root - INFO - Step 18720: lr=2.33E-06, loss= 1.0834 (max= 1.5160), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:20,281 - root - INFO - Step 
18720: lr=2.33E-06, loss= 1.0834 (max= 1.5160), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:20,281 - root - INFO - Step 18720: lr=2.33E-06, loss= 1.0834 (max= 1.5160), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:20,281 - root - INFO - Step 18720: lr=2.33E-06, loss= 1.0834 (max= 1.5160), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:20,281 - root - INFO - Step 18720: lr=2.33E-06, loss= 1.0834 (max= 1.5160), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:20,282 - root - INFO - Step 18720: lr=2.33E-06, loss= 1.0834 (max= 1.5160), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:52,083 - root - INFO - Step 18730: lr=2.20E-06, loss= 1.0750 (max= 1.4733), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:52,083 - root - INFO - Step 18730: lr=2.20E-06, loss= 1.0750 (max= 1.4733), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:52,083 - root - INFO - Step 18730: lr=2.20E-06, loss= 1.0750 (max= 1.4733), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:52,083 - root - INFO - Step 18730: lr=2.20E-06, loss= 1.0750 (max= 1.4733), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:52,083 - root - INFO - Step 18730: lr=2.20E-06, loss= 1.0750 (max= 1.4733), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:52,083 - root - INFO - Step 18730: lr=2.20E-06, loss= 1.0750 (max= 1.4733), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 02:03:52,083 - root - INFO - Step 18730: lr=2.20E-06, loss= 1.0750 (max= 1.4733), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:03:52,083 - root - INFO - Step 18730: lr=2.20E-06, loss= 1.0750 (max= 1.4733), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:23,912 - root - INFO - Step 18740: lr=2.08E-06, loss= 1.1191 (max= 1.4752), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:23,912 - root - INFO - Step 18740: lr=2.08E-06, loss= 1.1191 (max= 1.4752), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:23,912 - root - INFO - Step 18740: lr=2.08E-06, loss= 1.1191 (max= 1.4752), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:23,912 - root - INFO - Step 18740: lr=2.08E-06, loss= 1.1191 (max= 1.4752), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:23,912 - root - INFO - Step 18740: lr=2.08E-06, loss= 1.1191 (max= 1.4752), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:23,912 - root - INFO - Step 18740: lr=2.08E-06, loss= 1.1191 (max= 1.4752), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:23,912 - root - INFO - Step 18740: lr=2.08E-06, loss= 1.1191 (max= 1.4752), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:23,912 - root - INFO - Step 18740: lr=2.08E-06, loss= 1.1191 (max= 1.4752), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:55,804 - root - INFO - Step 18750: lr=1.95E-06, loss= 1.0607 (max= 1.5389), tps=20551, mfu=42.82%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:55,804 - root - INFO - Step 18750: lr=1.95E-06, loss= 1.0607 (max= 1.5389), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:55,804 - root - INFO - Step 18750: lr=1.95E-06, loss= 1.0607 (max= 1.5389), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:55,804 - root - INFO - Step 18750: lr=1.95E-06, loss= 1.0607 (max= 1.5389), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:55,804 - root - INFO - Step 18750: lr=1.95E-06, loss= 1.0607 (max= 1.5389), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:55,804 - root - INFO - Step 18750: lr=1.95E-06, loss= 1.0607 (max= 1.5389), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:55,804 - root - INFO - Step 18750: lr=1.95E-06, loss= 1.0607 (max= 1.5389), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:04:55,804 - root - INFO - Step 18750: lr=1.95E-06, loss= 1.0607 (max= 1.5389), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:27,659 - root - INFO - Step 18760: lr=1.83E-06, loss= 1.0800 (max= 1.5625), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:27,659 - root - INFO - Step 18760: lr=1.83E-06, loss= 1.0800 (max= 1.5625), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:27,659 - root - INFO - Step 18760: lr=1.83E-06, loss= 1.0800 (max= 1.5625), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:27,659 - root - INFO - Step 18760: lr=1.83E-06, loss= 1.0800 
(max= 1.5625), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:27,659 - root - INFO - Step 18760: lr=1.83E-06, loss= 1.0800 (max= 1.5625), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:27,659 - root - INFO - Step 18760: lr=1.83E-06, loss= 1.0800 (max= 1.5625), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:27,659 - root - INFO - Step 18760: lr=1.83E-06, loss= 1.0800 (max= 1.5625), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:27,659 - root - INFO - Step 18760: lr=1.83E-06, loss= 1.0800 (max= 1.5625), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:59,471 - root - INFO - Step 18770: lr=1.71E-06, loss= 1.0909 (max= 1.4717), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:59,471 - root - INFO - Step 18770: lr=1.71E-06, loss= 1.0909 (max= 1.4717), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:59,472 - root - INFO - Step 18770: lr=1.71E-06, loss= 1.0909 (max= 1.4717), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:59,472 - root - INFO - Step 18770: lr=1.71E-06, loss= 1.0909 (max= 1.4717), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:59,472 - root - INFO - Step 18770: lr=1.71E-06, loss= 1.0909 (max= 1.4717), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:59,472 - root - INFO - Step 18770: lr=1.71E-06, loss= 1.0909 (max= 1.4717), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:59,472 - root 
- INFO - Step 18770: lr=1.71E-06, loss= 1.0909 (max= 1.4717), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:05:59,472 - root - INFO - Step 18770: lr=1.71E-06, loss= 1.0909 (max= 1.4717), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:31,353 - root - INFO - Step 18780: lr=1.59E-06, loss= 1.0844 (max= 1.5580), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:31,353 - root - INFO - Step 18780: lr=1.59E-06, loss= 1.0844 (max= 1.5580), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:31,353 - root - INFO - Step 18780: lr=1.59E-06, loss= 1.0844 (max= 1.5580), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:31,353 - root - INFO - Step 18780: lr=1.59E-06, loss= 1.0844 (max= 1.5580), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:31,353 - root - INFO - Step 18780: lr=1.59E-06, loss= 1.0844 (max= 1.5580), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:31,353 - root - INFO - Step 18780: lr=1.59E-06, loss= 1.0844 (max= 1.5580), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:31,353 - root - INFO - Step 18780: lr=1.59E-06, loss= 1.0844 (max= 1.5580), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:06:31,354 - root - INFO - Step 18780: lr=1.59E-06, loss= 1.0844 (max= 1.5580), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:03,297 - root - INFO - Step 18790: lr=1.47E-06, loss= 1.0952 (max= 1.5422), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-26 02:07:03,297 - root - INFO - Step 18790: lr=1.47E-06, loss= 1.0952 (max= 1.5422), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:03,297 - root - INFO - Step 18790: lr=1.47E-06, loss= 1.0952 (max= 1.5422), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:03,297 - root - INFO - Step 18790: lr=1.47E-06, loss= 1.0952 (max= 1.5422), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:03,297 - root - INFO - Step 18790: lr=1.47E-06, loss= 1.0952 (max= 1.5422), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:03,297 - root - INFO - Step 18790: lr=1.47E-06, loss= 1.0952 (max= 1.5422), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:03,298 - root - INFO - Step 18790: lr=1.47E-06, loss= 1.0952 (max= 1.5422), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:03,298 - root - INFO - Step 18790: lr=1.47E-06, loss= 1.0952 (max= 1.5422), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:35,131 - root - INFO - Step 18800: lr=1.35E-06, loss= 1.1155 (max= 1.5861), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:35,131 - root - INFO - Step 18800: lr=1.35E-06, loss= 1.1155 (max= 1.5861), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:35,131 - root - INFO - Step 18800: lr=1.35E-06, loss= 1.1155 (max= 1.5861), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:35,131 - root - INFO - Step 18800: lr=1.35E-06, loss= 1.1155 (max= 1.5861), tps=20589, mfu=42.90%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:35,131 - root - INFO - Step 18800: lr=1.35E-06, loss= 1.1155 (max= 1.5861), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:35,131 - root - INFO - Step 18800: lr=1.35E-06, loss= 1.1155 (max= 1.5861), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:35,131 - root - INFO - Step 18800: lr=1.35E-06, loss= 1.1155 (max= 1.5861), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:07:35,131 - root - INFO - Step 18800: lr=1.35E-06, loss= 1.1155 (max= 1.5861), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:06,984 - root - INFO - Step 18810: lr=1.24E-06, loss= 1.0878 (max= 1.4941), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:06,984 - root - INFO - Step 18810: lr=1.24E-06, loss= 1.0878 (max= 1.4941), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:06,984 - root - INFO - Step 18810: lr=1.24E-06, loss= 1.0878 (max= 1.4941), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:06,984 - root - INFO - Step 18810: lr=1.24E-06, loss= 1.0878 (max= 1.4941), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:06,984 - root - INFO - Step 18810: lr=1.24E-06, loss= 1.0878 (max= 1.4941), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:06,984 - root - INFO - Step 18810: lr=1.24E-06, loss= 1.0878 (max= 1.4941), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:06,984 - root - INFO - Step 18810: lr=1.24E-06, 
loss= 1.0878 (max= 1.4941), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:06,984 - root - INFO - Step 18810: lr=1.24E-06, loss= 1.0878 (max= 1.4941), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:38,847 - root - INFO - Step 18820: lr=1.12E-06, loss= 1.1087 (max= 1.5990), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:38,847 - root - INFO - Step 18820: lr=1.12E-06, loss= 1.1087 (max= 1.5990), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:38,847 - root - INFO - Step 18820: lr=1.12E-06, loss= 1.1087 (max= 1.5990), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:38,847 - root - INFO - Step 18820: lr=1.12E-06, loss= 1.1087 (max= 1.5990), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:38,847 - root - INFO - Step 18820: lr=1.12E-06, loss= 1.1087 (max= 1.5990), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:38,847 - root - INFO - Step 18820: lr=1.12E-06, loss= 1.1087 (max= 1.5990), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:38,847 - root - INFO - Step 18820: lr=1.12E-06, loss= 1.1087 (max= 1.5990), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:08:38,847 - root - INFO - Step 18820: lr=1.12E-06, loss= 1.1087 (max= 1.5990), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:10,719 - root - INFO - Step 18830: lr=1.01E-06, loss= 1.1056 (max= 1.6158), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 
02:09:10,719 - root - INFO - Step 18830: lr=1.01E-06, loss= 1.1056 (max= 1.6158), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:10,719 - root - INFO - Step 18830: lr=1.01E-06, loss= 1.1056 (max= 1.6158), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:10,719 - root - INFO - Step 18830: lr=1.01E-06, loss= 1.1056 (max= 1.6158), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:10,719 - root - INFO - Step 18830: lr=1.01E-06, loss= 1.1056 (max= 1.6158), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:10,719 - root - INFO - Step 18830: lr=1.01E-06, loss= 1.1056 (max= 1.6158), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:10,719 - root - INFO - Step 18830: lr=1.01E-06, loss= 1.1056 (max= 1.6158), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:10,719 - root - INFO - Step 18830: lr=1.01E-06, loss= 1.1056 (max= 1.6158), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:42,542 - root - INFO - Step 18840: lr=9.01E-07, loss= 1.1011 (max= 1.4811), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:42,542 - root - INFO - Step 18840: lr=9.01E-07, loss= 1.1011 (max= 1.4811), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:42,542 - root - INFO - Step 18840: lr=9.01E-07, loss= 1.1011 (max= 1.4811), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:42,542 - root - INFO - Step 18840: lr=9.01E-07, loss= 1.1011 (max= 1.4811), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:42,542 - root - INFO - Step 18840: lr=9.01E-07, loss= 1.1011 (max= 1.4811), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:42,542 - root - INFO - Step 18840: lr=9.01E-07, loss= 1.1011 (max= 1.4811), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:42,542 - root - INFO - Step 18840: lr=9.01E-07, loss= 1.1011 (max= 1.4811), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:09:42,542 - root - INFO - Step 18840: lr=9.01E-07, loss= 1.1011 (max= 1.4811), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:14,398 - root - INFO - Step 18850: lr=7.91E-07, loss= 1.0908 (max= 1.5626), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:14,398 - root - INFO - Step 18850: lr=7.91E-07, loss= 1.0908 (max= 1.5626), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:14,398 - root - INFO - Step 18850: lr=7.91E-07, loss= 1.0908 (max= 1.5626), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:14,398 - root - INFO - Step 18850: lr=7.91E-07, loss= 1.0908 (max= 1.5626), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:14,398 - root - INFO - Step 18850: lr=7.91E-07, loss= 1.0908 (max= 1.5626), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:14,398 - root - INFO - Step 18850: lr=7.91E-07, loss= 1.0908 (max= 1.5626), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:14,398 - root - INFO - Step 18850: lr=7.91E-07, loss= 1.0908 (max= 1.5626), 
tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:14,398 - root - INFO - Step 18850: lr=7.91E-07, loss= 1.0908 (max= 1.5626), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:46,280 - root - INFO - Step 18860: lr=6.83E-07, loss= 1.1197 (max= 1.7645), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:46,280 - root - INFO - Step 18860: lr=6.83E-07, loss= 1.1197 (max= 1.7645), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:46,280 - root - INFO - Step 18860: lr=6.83E-07, loss= 1.1197 (max= 1.7645), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:46,281 - root - INFO - Step 18860: lr=6.83E-07, loss= 1.1197 (max= 1.7645), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:46,281 - root - INFO - Step 18860: lr=6.83E-07, loss= 1.1197 (max= 1.7645), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:46,281 - root - INFO - Step 18860: lr=6.83E-07, loss= 1.1197 (max= 1.7645), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:46,281 - root - INFO - Step 18860: lr=6.83E-07, loss= 1.1197 (max= 1.7645), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:10:46,281 - root - INFO - Step 18860: lr=6.83E-07, loss= 1.1197 (max= 1.7645), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:18,139 - root - INFO - Step 18870: lr=5.77E-07, loss= 1.0811 (max= 1.6964), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:18,139 - root - INFO - Step 
18870: lr=5.77E-07, loss= 1.0811 (max= 1.6964), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:18,139 - root - INFO - Step 18870: lr=5.77E-07, loss= 1.0811 (max= 1.6964), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:18,139 - root - INFO - Step 18870: lr=5.77E-07, loss= 1.0811 (max= 1.6964), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:18,139 - root - INFO - Step 18870: lr=5.77E-07, loss= 1.0811 (max= 1.6964), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:18,139 - root - INFO - Step 18870: lr=5.77E-07, loss= 1.0811 (max= 1.6964), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:18,139 - root - INFO - Step 18870: lr=5.77E-07, loss= 1.0811 (max= 1.6964), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:18,139 - root - INFO - Step 18870: lr=5.77E-07, loss= 1.0811 (max= 1.6964), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:49,932 - root - INFO - Step 18880: lr=4.71E-07, loss= 1.0932 (max= 1.4858), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:49,932 - root - INFO - Step 18880: lr=4.71E-07, loss= 1.0932 (max= 1.4858), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:49,932 - root - INFO - Step 18880: lr=4.71E-07, loss= 1.0932 (max= 1.4858), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:49,932 - root - INFO - Step 18880: lr=4.71E-07, loss= 1.0932 (max= 1.4858), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-26 02:11:49,932 - root - INFO - Step 18880: lr=4.71E-07, loss= 1.0932 (max= 1.4858), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:49,933 - root - INFO - Step 18880: lr=4.71E-07, loss= 1.0932 (max= 1.4858), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:49,933 - root - INFO - Step 18880: lr=4.71E-07, loss= 1.0932 (max= 1.4858), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:11:49,933 - root - INFO - Step 18880: lr=4.71E-07, loss= 1.0932 (max= 1.4858), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:21,832 - root - INFO - Step 18890: lr=3.67E-07, loss= 1.0817 (max= 1.4802), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:21,832 - root - INFO - Step 18890: lr=3.67E-07, loss= 1.0817 (max= 1.4802), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:21,833 - root - INFO - Step 18890: lr=3.67E-07, loss= 1.0817 (max= 1.4802), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:21,833 - root - INFO - Step 18890: lr=3.67E-07, loss= 1.0817 (max= 1.4802), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:21,833 - root - INFO - Step 18890: lr=3.67E-07, loss= 1.0817 (max= 1.4802), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:21,833 - root - INFO - Step 18890: lr=3.67E-07, loss= 1.0817 (max= 1.4802), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:21,833 - root - INFO - Step 18890: lr=3.67E-07, loss= 1.0817 (max= 1.4802), tps=20546, mfu=42.81%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:21,833 - root - INFO - Step 18890: lr=3.67E-07, loss= 1.0817 (max= 1.4802), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:53,644 - root - INFO - Step 18900: lr=2.63E-07, loss= 1.1020 (max= 1.6105), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:53,644 - root - INFO - Step 18900: lr=2.63E-07, loss= 1.1020 (max= 1.6105), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:53,644 - root - INFO - Step 18900: lr=2.63E-07, loss= 1.1020 (max= 1.6105), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:53,644 - root - INFO - Step 18900: lr=2.63E-07, loss= 1.1020 (max= 1.6105), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:53,644 - root - INFO - Step 18900: lr=2.63E-07, loss= 1.1020 (max= 1.6105), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:53,644 - root - INFO - Step 18900: lr=2.63E-07, loss= 1.1020 (max= 1.6105), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:53,644 - root - INFO - Step 18900: lr=2.63E-07, loss= 1.1020 (max= 1.6105), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:12:53,644 - root - INFO - Step 18900: lr=2.63E-07, loss= 1.1020 (max= 1.6105), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:25,560 - root - INFO - Step 18910: lr=1.61E-07, loss= 1.0853 (max= 1.4529), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:25,560 - root - INFO - Step 18910: lr=1.61E-07, loss= 1.0853 
(max= 1.4529), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:25,560 - root - INFO - Step 18910: lr=1.61E-07, loss= 1.0853 (max= 1.4529), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:25,560 - root - INFO - Step 18910: lr=1.61E-07, loss= 1.0853 (max= 1.4529), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:25,560 - root - INFO - Step 18910: lr=1.61E-07, loss= 1.0853 (max= 1.4529), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:25,560 - root - INFO - Step 18910: lr=1.61E-07, loss= 1.0853 (max= 1.4529), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:25,560 - root - INFO - Step 18910: lr=1.61E-07, loss= 1.0853 (max= 1.4529), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:25,561 - root - INFO - Step 18910: lr=1.61E-07, loss= 1.0853 (max= 1.4529), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:57,465 - root - INFO - Step 18920: lr=6.02E-08, loss= 1.0980 (max= 1.6494), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:57,465 - root - INFO - Step 18920: lr=6.02E-08, loss= 1.0980 (max= 1.6494), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:57,465 - root - INFO - Step 18920: lr=6.02E-08, loss= 1.0980 (max= 1.6494), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:57,465 - root - INFO - Step 18920: lr=6.02E-08, loss= 1.0980 (max= 1.6494), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:57,465 - root 
- INFO - Step 18920: lr=6.02E-08, loss= 1.0980 (max= 1.6494), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:57,466 - root - INFO - Step 18920: lr=6.02E-08, loss= 1.0980 (max= 1.6494), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:57,466 - root - INFO - Step 18920: lr=6.02E-08, loss= 1.0980 (max= 1.6494), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:13:57,466 - root - INFO - Step 18920: lr=6.02E-08, loss= 1.0980 (max= 1.6494), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-26 02:14:16,065 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 18926 +2025-10-26 02:14:16,065 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) +2025-10-26 02:14:16,068 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 18926 +2025-10-26 02:14:16,068 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) +2025-10-26 02:14:16,068 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 18926 +2025-10-26 02:14:16,068 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) +2025-10-26 02:14:16,068 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 18926 +2025-10-26 02:14:16,068 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) +2025-10-26 02:14:16,068 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 18926 +2025-10-26 02:14:16,069 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) +2025-10-26 02:14:16,071 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 18926 +2025-10-26 02:14:16,072 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) +2025-10-26 02:14:16,073 - 
root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 18926 +2025-10-26 02:14:16,073 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) +2025-10-26 02:14:16,074 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 18926 +2025-10-26 02:14:16,075 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) +2025-10-26 02:14:22,273 - root - INFO - Finished saving the checkpoint in 6.20 seconds +2025-10-26 02:14:22,273 - root - INFO - Sleeping 2 seconds for other ranks to complete +2025-10-26 02:14:22,276 - root - INFO - Finished saving the checkpoint in 6.21 seconds +2025-10-26 02:14:22,276 - root - INFO - Finished saving the checkpoint in 6.21 seconds +2025-10-26 02:14:22,277 - root - INFO - Finished saving the checkpoint in 6.21 seconds +2025-10-26 02:14:22,277 - root - INFO - Finished saving the checkpoint in 6.21 seconds +2025-10-26 02:14:22,277 - root - INFO - Training successfully completed! +2025-10-26 02:14:22,277 - root - INFO - Training successfully completed! +2025-10-26 02:14:22,277 - root - INFO - Training successfully completed! +2025-10-26 02:14:22,277 - root - INFO - Training successfully completed! +2025-10-26 02:14:22,278 - root - INFO - Finished saving the checkpoint in 6.20 seconds +2025-10-26 02:14:22,278 - root - INFO - Finished saving the checkpoint in 6.21 seconds +2025-10-26 02:14:22,278 - root - INFO - Training successfully completed! +2025-10-26 02:14:22,278 - root - INFO - Training successfully completed! +2025-10-26 02:14:22,280 - root - INFO - Finished saving the checkpoint in 6.21 seconds +2025-10-26 02:14:22,281 - root - INFO - Training successfully completed! +2025-10-26 02:14:24,273 - root - INFO - Training successfully completed!