v7 Stage 0 Pre-registration Results (A1, A2, A3)

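This report records the results of the three D+0 P0 deliverables of the v7 Peak-Aware FL presentation campaign. They are a prerequisite for entering the Stage 0.5 smoke test, and the values recorded here are the assertion baselines for every Stage 1-4 experiment (they must not be overridden).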

- Execution script: experiments/federated/v7_0419_stage0_preregistration.py
- MLflow experiment: v7-stage0-preregistration (run 08716ec90ec94c5c901900bb6cc4dc10)
- Output JSON: outputs/v7_stage0/stage0_summary.json
- Output CSV (A2, per run): outputs/v7_stage0/v6_loss_distribution.csv
- Reproduction command: uv run python experiments/federated/v7_0419_stage0_preregistration.py
- Common settings: RANDOM_SEED=42, SEQ_LEN=96, PRED_LEN=24, split=(0.7, 0.1, 0.2), Q=90, HR K=12
- Definition hash: 1c4acef8a235 (used to detect drift in the PAPE/HR definitions)
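
The report does not say how the 12-hex-digit definition hash is derived. A minimal sketch of one way such a drift detector could work, assuming a truncated SHA-256 over a canonical JSON dump of the definition parameters; `definition_hash` and the parameter dict are hypothetical and will not reproduce `1c4acef8a235`:

```python
import hashlib
import json


def definition_hash(params: dict, length: int = 12) -> str:
    """Hypothetical drift detector: truncated SHA-256 of a canonical JSON dump."""
    # sort_keys and fixed separators make the digest independent of key order/spacing
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


# Hypothetical definition parameters, taken from the common settings above
PAPE_HR_DEFINITION = {"Q": 90, "HR_K": 12, "q_source": "train+val"}

current = definition_hash(PAPE_HR_DEFINITION)
drifted = definition_hash({**PAPE_HR_DEFINITION, "Q": 95})
print(current, drifted, current != drifted)
```

Any change to a parameter (here Q from 90 to 95) changes the hash, so a hard-coded assertion on the expected value fails loudly instead of letting the definition drift silently.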

A1. Golden Tensor G1-G5 Expected Values

The five sanity-baseline pairs from v7 design spec §2.2. All PAPE/HR computation code in Stages 1-4 must produce the same expected values on these five pairs (Gate 1).

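| ID | y_true | y_pred | expected PAPE (%) | expected HR | Notes |
| --- | --- | --- | --- | --- | --- |
| G1 | `[1,2,10,3]` | `[1,2,8,3]` | 20.000000 | 1.000000 | Confirmed spec value. Q90 = 7.9, so there is exactly one trigger point (value 10). HR is an argmax sanity check. |
| G2 | `[5,5,5,5]` | `[4,5,6,5]` | 10.000000 | NaN | `compute_pape_v7` in this implementation returns 10.0 (Q90 = 5.0, and y_true >= 5 holds for every point). The spec's "no Q90 trigger → NaN" is a policy statement for the degenerate case; Stages 1-4 treat 10.000000 as the expected value. HR is undefined because a peak is meaningless here. |
| G3 | Apt6 test, 1 week (168h, start=0) | perfect (`y_pred=y_true`) | 0.000000 | 1.000000 | Q90 = 2.91755 (Apt6 train+val). Sanity PASS. |
| G4 | Apt6 test, 1 week (168h, start=0) | constant = mean(week) = 1.088576 | 67.342534 | 0.619048 | Q90 = 2.91755. HR = mean over days of the match ratio between the top-K=12 hours (the upper half of each 24h day) of y_true and y_pred. |
| G5 | Apt15 test, 1 week (168h, start=72) | `rng.uniform(0.017557, 4.934275)` (seed=42) | 61.475591 | 0.511905 | Q90 = 1.72067. Block start offset 72 under the "first week with a Q90 hit" rule. |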
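
A minimal sketch of both metrics as the table notes describe them, assuming PAPE is the mean absolute percentage error over points with y_true >= Q90 (Q90 by linear interpolation; over y_true itself for G1/G2, whereas G3-G5 use the train+val Q90) and HR is the per-day overlap of the top-K=12 hours. `percentile_linear`, `pape`, and `hit_rate` are hypothetical names, not `compute_pape_v7`/`compute_hr_v7`. The sketch reproduces G1 (Q90 7.9, PAPE 20.0) and shows why G2 gives 10.0: every y_true equals Q90 = 5.0 and passes the >= mask. A ratio-of-sums PAPE would give the same two numbers, so G1 and G2 alone do not pin the formula down.

```python
import math


def percentile_linear(values, q):
    """Linear-interpolation percentile (NumPy's default 'linear' method)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def pape(y_true, y_pred, q90):
    """Hypothetical PAPE: mean absolute % error over points with y_true >= q90."""
    errors = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred) if t >= q90]
    if not errors:
        return math.nan  # no trigger point at all
    return 100 * sum(errors) / len(errors)


def hit_rate(y_true, y_pred, k=12, hours_per_day=24):
    """Hypothetical HR: mean over days of |top-k(y_true) & top-k(y_pred)| / k."""
    if not y_true or len(y_true) != len(y_pred) or len(y_true) % hours_per_day:
        raise ValueError("inputs must be equal-length series of whole days")
    ratios = []
    for start in range(0, len(y_true), hours_per_day):
        day_true = y_true[start:start + hours_per_day]
        day_pred = y_pred[start:start + hours_per_day]
        hours = range(hours_per_day)
        # sorted() is stable, so among tied values the earlier hour ranks first
        top_true = set(sorted(hours, key=lambda h: -day_true[h])[:k])
        top_pred = set(sorted(hours, key=lambda h: -day_pred[h])[:k])
        ratios.append(len(top_true & top_pred) / k)
    return sum(ratios) / len(ratios)


# G1 and G2: Q90 taken over y_true itself
for y_true, y_pred in [([1, 2, 10, 3], [1, 2, 8, 3]), ([5, 5, 5, 5], [4, 5, 6, 5])]:
    q90 = percentile_linear(y_true, 90)
    print(round(q90, 6), round(pape(y_true, y_pred, q90), 6))

week = list(range(24)) * 7                 # synthetic: load rises through each day
print(hit_rate(week, week))                # perfect forecast
print(hit_rate(week, [1.0] * 168))         # constant forecast: ties decide the top-k
```

With a constant forecast like G4's, all hours tie and the chosen top-K hours depend entirely on tie-breaking (earlier hours win here), so this sketch is not guaranteed to reproduce G4's 0.619048; it also does not cover G1's 4-point HR case.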

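Implementation decisions

Making "any one week of the test split" deterministic: the design spec only says "any one week", but for reproducibility the choice is fixed to the rule "slide over the test split in 24h steps and take the first 168h block that contains at least one value >= Q90". For Apt6 this gives start=0 (the very first block already has a Q90 hit); for Apt15 it gives start=72 (the blocks starting at hours 0, 24, and 48 contain no value >= Q90, so the fourth candidate block is the first hit). The rule lives in a single implementation (`_extract_first_week(test_vals, q90, require_q90_hit=True)`), and every Stage 1-4 smoke/full-run script must derive its test week from it.
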
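G2, 10.0 vs. the spec's NaN: under the y_true >= Q90 mask, this implementation counts every point as a trigger when all y_true values are equal. The spec's NaN was meant to express that "the notion of a peak has no practical meaning here", so no special code branch is added. Stage 1-4 assertions check 10.000000 as the expected value (this applies only to the degenerate G2 case).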
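
The first-Q90-hit week rule from the first decision above can be sketched on synthetic data; `first_q90_week` is a hypothetical stand-in for `_extract_first_week`, assuming hourly values and a 168h block slid in 24h steps. With a single peak at hour 220, the windows starting at hours 0, 24, and 48 end at hours 167, 191, and 215 and miss it, so the rule returns start=72, the same offset G5 reports for Apt15:

```python
def first_q90_week(values, q90, week_hours=168, step_hours=24):
    """Hypothetical first-Q90-hit rule.

    Slides a week-long window over the test split in 24h steps and returns the
    first (start, block) whose block contains at least one value >= q90.
    """
    for start in range(0, len(values) - week_hours + 1, step_hours):
        block = values[start:start + week_hours]
        if any(v >= q90 for v in block):
            return start, block
    raise ValueError("no week-long block in the split reaches q90")


# Synthetic series: 14 days of low load with a single peak at hour 220
series = [0.5] * (14 * 24)
series[220] = 3.0
start, block = first_q90_week(series, q90=1.72067)
print(start, len(block))  # 72 168
```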

Reproduction snippet

```python
# Reproduce G3-G5 (identical to compute_golden_tensors in the script above)
from experiments.federated.v7_0419_stage0_preregistration import (
    compute_golden_tensors, _definition_hash
)

rows, raw = compute_golden_tensors()
# Guard against drift in the PAPE/HR definitions before trusting the values
assert _definition_hash() == "1c4acef8a235"
for r in rows:
    print(r.gid, r.expected_pape, r.expected_hr)
```
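
The snippet above only prints the rows. A Stage 1-4 Gate 1 check could compare computed values against the A1 table with a tolerance matching its six recorded decimals, treating G2's HR as matching only NaN. A minimal sketch; `gate1_failures` and the `(pape, hr)` tuple layout are hypothetical:

```python
import math

# Expected (PAPE %, HR) per golden tensor, copied from the A1 table
EXPECTED = {
    "G1": (20.000000, 1.000000),
    "G2": (10.000000, math.nan),  # HR is undefined for the degenerate pair
    "G3": (0.000000, 1.000000),
    "G4": (67.342534, 0.619048),
    "G5": (61.475591, 0.511905),
}


def matches(expected, actual, tol=1e-6):
    """Equal up to the table's 6 recorded decimals; NaN matches only NaN."""
    if math.isnan(expected) or math.isnan(actual):
        return math.isnan(expected) and math.isnan(actual)
    return abs(expected - actual) <= tol


def gate1_failures(computed):
    """Hypothetical Gate 1 check: IDs whose computed PAPE or HR deviates."""
    failures = []
    for gid, (exp_pape, exp_hr) in EXPECTED.items():
        got_pape, got_hr = computed[gid]
        if not (matches(exp_pape, got_pape) and matches(exp_hr, got_hr)):
            failures.append(gid)
    return failures


print(gate1_failures(dict(EXPECTED)))                        # []
print(gate1_failures({**EXPECTED, "G4": (67.342534, 0.5)}))  # ['G4']
```

In Stages 1-4, `computed` would hold that stage's own PAPE/HR results keyed by golden-tensor ID; an empty list means Gate 1 passes.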

A2. v6 Historical Loss Range → Fail-Fast Threshold

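The final train/val loss distribution of v6 "normally finished" runs is extracted by querying MLflow's mlruns/ directly (mlflow.search_runs). No saved CSV is reused.

Filter criteria for "normally finished" (a run is accepted only if it meets all of them)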

- status == FINISHED
- At least one non-NaN value among the 10 candidate PAPE columns (avg_pape, b0_avg_pape, ..., v1avg_pape)
- At least one recorded value in the final_train_loss family (train_loss, b0_train_loss, b1_train_loss, ditto_train_loss, fedrep_train_loss, fedpm_train_loss)
- At least one recorded value in the final_val_loss family (the above plus b0_{apt}_val_loss and ditto_global_val_loss)
- A y_pred*.npy artifact exists at the artifact root or one level below it
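
The rejection counts reported below (13 + 31 + 59 + 3 = 106 = 110 - 4) add up as if each rejected run is attributed to the first criterion it fails, checked in the order listed. A minimal sketch of that classification, assuming each run is a plain dict; with `mlflow.search_runs` the same fields would come from the `status` and `metrics.*` columns of the returned DataFrame. The column lists are truncated, hypothetical subsets (`b0_Apt6_val_loss` instantiates the `b0_{apt}_val_loss` pattern), and `has_ypred_artifact` stands in for the artifact-listing step:

```python
import math
from collections import Counter

# Hypothetical, truncated column groups (the full lists are given above)
PAPE_COLS = ["avg_pape", "b0_avg_pape"]
TRAIN_LOSS_COLS = ["train_loss", "b0_train_loss", "ditto_train_loss"]
VAL_LOSS_COLS = ["b0_Apt6_val_loss", "ditto_global_val_loss"]


def has_value(metrics, cols):
    """True if at least one column is present and not NaN."""
    return any(c in metrics and not math.isnan(metrics[c]) for c in cols)


def classify(run):
    """Return 'accepted' or the first failed criterion, checked in the listed order."""
    if run["status"] != "FINISHED":
        return "status_not_finished"
    metrics = run["metrics"]
    if not has_value(metrics, PAPE_COLS):
        return "no_valid_pape"
    if not (has_value(metrics, TRAIN_LOSS_COLS) and has_value(metrics, VAL_LOSS_COLS)):
        return "no_train_or_val_loss"
    if not run["has_ypred_artifact"]:
        return "no_ypred_artifact"
    return "accepted"


def breakdown(runs):
    """Count runs per outcome (accepted or rejection reason)."""
    return Counter(classify(run) for run in runs)
```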

Scanned experiments (14)

track-e-tier0, track-e-fl-baseline-bench, TSFM-Baseline, NF-Baseline, FeDPM-Original-Phase{1,2,3,3b}, FeDPM-MVP-Phase1, FedLearning-Phase2-{feddf,fedprox,fedavg,local}, FedLearning-Phase3-ColdStart

- Total runs: 110
- Accepted: 4
- Rejected: 106 (status_not_finished=13, no_valid_pape=31, no_train_or_val_loss=59, no_ypred_artifact=3)

The 4 accepted runs (details)

| experiment | run_name | final_train_loss | final_val_loss |
| --- | --- | --- | --- |
| track-e-tier0 | t0_kmeans_warmup2 | 1.127367 | 0.511478 |
| track-e-tier0 | t0_codebook_seed42 | 1.753126 | 0.511388 |
| track-e-tier0 | t0_yvq0_seed42 | 0.539532 | 0.338942 |
| track-e-fl-baseline-bench | bench_R1_E5_b0_b1_fedrep_ditto_seed42 | 0.410327 | 0.530282 |

Distribution statistics and confirmed threshold

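| Metric | P50 | P95 | max | min | mean | std (ddof=0) |
| --- | --- | --- | --- | --- | --- | --- |
| final_train_loss | 0.833450 | 1.659262 | 1.753126 | 0.410327 | 0.957588 | 0.532910 |
| final_val_loss | 0.511433 | 0.527462 | 0.530282 | 0.338942 | 0.473023 | 0.077793 |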
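
The table can be re-derived from the four accepted values: P50 and P95 match linear-interpolation percentiles (NumPy's default method), and std matches the population standard deviation (ddof=0). A minimal stdlib sketch; because the inputs are the 6-decimal rounded values, a last digit can differ (final_val_loss P95 comes out as 0.527461 rather than 0.527462):

```python
import math
import statistics

# Final losses of the four accepted runs (table above)
train = [1.127367, 1.753126, 0.539532, 0.410327]
val = [0.511478, 0.511388, 0.338942, 0.530282]


def percentile_linear(values, q):
    """Linear interpolation between closest ranks (NumPy's default method)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def summarize(xs):
    return {
        "P50": percentile_linear(xs, 50),
        "P95": percentile_linear(xs, 95),
        "max": max(xs),
        "min": min(xs),
        "mean": statistics.fmean(xs),
        "std": statistics.pstdev(xs),  # population std (ddof=0)
    }


threshold = max(train) * 1.5  # fail-fast rule: v6 historical max x 1.5
for name, xs in [("final_train_loss", train), ("final_val_loss", val)]:
    print(name, {k: round(v, 6) for k, v in summarize(xs).items()})
print("threshold", round(threshold, 6))
```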

Confirmed fail-fast threshold (design §2.4 item 1)

final_train_loss > 2.629689 (= v6 historical max × 1.5 = 1.753126 × 1.5)

At the end of training, the Stage 1-4 per-run hook immediately marks the run FAIL_AUTO and skips the next run whenever final_train_loss exceeds this value.
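
A minimal sketch of the per-run check, assuming the hook sees a single float; `fail_fast_status` is hypothetical. Two choices go beyond the text: a NaN loss is treated as FAIL_AUTO (a plain `>` comparison would let NaN through), and skipping the next run is left to the runner. The script pins the unrounded value 2.6296885073184963, so a real hook should import that value rather than recompute it from the rounded 1.753126:

```python
import math

V6_HISTORICAL_MAX_TRAIN_LOSS = 1.753126  # rounded max over the 4 accepted v6 runs
FAIL_FAST_MULTIPLIER = 1.5
FAIL_FAST_THRESHOLD = V6_HISTORICAL_MAX_TRAIN_LOSS * FAIL_FAST_MULTIPLIER  # about 2.629689


def fail_fast_status(final_train_loss: float) -> str:
    """Hypothetical per-run hook: 'FAIL_AUTO' above the threshold, else 'OK'."""
    # Assumption: NaN counts as divergence, since NaN > threshold is always False
    if math.isnan(final_train_loss) or final_train_loss > FAIL_FAST_THRESHOLD:
        return "FAIL_AUTO"
    return "OK"


for loss in (0.54, 2.63, float("nan")):
    print(loss, fail_fast_status(loss))
```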

Limitations (disclosed for transparency)

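A sample of 4 runs is statistically small. The no_train_or_val_loss=59 bucket is especially large: most v6 Phase2/MVP code does not log the final loss as a top-level metric, so those runs drop out at the filter. In practice the threshold (2.63) rests on the 4 runs from track-e-tier0 and track-e-fl-baseline-bench.

The threshold still errs on the conservative side for its original purpose, detecting silent loss divergence, because the 1.5× multiplier leaves headroom above every accepted v6 run. If v7 runs behave normally their loss should sit roughly below 1.0, so 2.63 is a guardrail with ample margin.

The final_val_loss silent-degradation threshold (design §2.4 item 3) is defined as a ratio to initial_loss, so it is not derived from these values; initial_loss will be computed from each run's log as the mean over epochs 1-4.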
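
A minimal sketch of the planned silent-degradation check, assuming initial_loss is the mean of logged epochs 1-4 and that degradation means the final val loss exceeding `ratio` × initial_loss. The direction of the comparison and the function names are assumptions, and `ratio` is left as a required argument because its value is not recorded in this report:

```python
import statistics


def initial_val_loss(per_epoch_val_loss):
    """initial_loss as the mean of logged epochs 1-4 (index 0 is epoch 1)."""
    if len(per_epoch_val_loss) < 4:
        raise ValueError("need at least 4 logged epochs")
    return statistics.fmean(per_epoch_val_loss[:4])


def is_silently_degraded(per_epoch_val_loss, ratio):
    """Hypothetical check: final val loss above ratio x initial_loss.

    `ratio` must come from design §2.4 item 3; no default is assumed here.
    """
    return per_epoch_val_loss[-1] > ratio * initial_val_loss(per_epoch_val_loss)
```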

Reproduction snippet

```python
# Re-query MLflow (reusing the saved CSV is not allowed)
from experiments.federated.v7_0419_stage0_preregistration import (
    extract_v6_loss_distribution
)

summary, df = extract_v6_loss_distribution()
# The threshold is compared exactly against the pre-registered float value
assert summary["fail_fast_train_loss_threshold"] == 2.6296885073184963
```

A3. Identifying Apt_max_load (Stage 0.5 smoke default)

Mean daily maximum load per household (all of 2016, 350 days)

| Household | daily_max_mean (kWh) | daily_max_median (kWh) | daily_max_std (kWh) | overall_max (kWh) |
| --- | --- | --- | --- | --- |
| Apt6 | 3.833771 | 3.823731 | 1.436937 | 7.860304 |
| Apt15 | 1.629783 | 1.495933 | 0.839622 | 4.934275 |
| Apt30 | 1.100721 | 0.896204 | 0.589724 | 4.922388 |
| Apt51 | 2.339469 | 1.569812 | 1.689579 | 7.475160 |
| Apt88 | 3.892536 | 3.926823 | 1.589916 | 9.172392 |

Decision

Apt_max_load = Apt88 (daily_max_mean = 3.8925 kWh)

Apt6 (3.8338) is a close second, 0.0588 kWh behind, but the argmax rule selects Apt88. The pair is used as the Stage 0.5 smoke-test default --households=Apt6,Apt88.
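
A minimal sketch of the selection, assuming hourly load series with 24 values per day; `daily_max_mean` and `pick_max_load_household` are hypothetical helpers, and dropping a trailing partial day is an assumption. Applying just the argmax step to the daily_max_mean column selects Apt88:

```python
import statistics


def daily_max_mean(hourly, hours_per_day=24):
    """Mean over days of each day's maximum hourly load (a partial last day is dropped)."""
    n_days = len(hourly) // hours_per_day
    daily_max = [max(hourly[d * hours_per_day:(d + 1) * hours_per_day]) for d in range(n_days)]
    return statistics.fmean(daily_max)


def pick_max_load_household(loads):
    """argmax rule: the household with the highest daily_max_mean."""
    stats = {apt: daily_max_mean(series) for apt, series in loads.items()}
    return max(stats, key=stats.get), stats


# Only the argmax step, applied to the daily_max_mean column of the table above
table = {"Apt6": 3.833771, "Apt15": 1.629783, "Apt30": 1.100721, "Apt51": 2.339469, "Apt88": 3.892536}
print(max(table, key=table.get))  # Apt88
```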

Reproduction snippet

```python
from experiments.federated.v7_0419_stage0_preregistration import compute_apt_max_load

stats, apt_max = compute_apt_max_load()
assert apt_max == "Apt88"
```

Stage 0.5 smoke entry: precondition checklist

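| Item | Status | Value |
| --- | --- | --- |
| A1 expected values for the 5 golden-tensor pairs confirmed | ✅ | G1-G5 all recorded, definition hash `1c4acef8a235` |
| A2 fail-fast threshold confirmed | ✅ | `final_train_loss > 2.629689` |
| A3 Apt_max_load confirmed | ✅ | `Apt88` |
| MLflow logging | ✅ | run `08716ec90ec94c5c901900bb6cc4dc10`, artifacts JSON/CSV/script |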

Stage 0.5 smoke invocation (confirmed)

```bash
uv run python -m experiments.federated.v7_runner \
    --mode=smoke \
    --households=Apt6,Apt88 \
    --cells=B0,B2,A3 \
    --seeds=42,43,123 \
    --golden-tensor-check
```

(Writing v7_runner.py is the engineer's D+0 P0 task.)
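
Since v7_runner.py does not exist yet, its flags are defined nowhere. A minimal argparse sketch that accepts the invocation above; the `full` mode choice, comma-splitting of list flags, and the Apt6,Apt88 default are assumptions:

```python
import argparse


def parse_args(argv=None):
    """Hypothetical CLI for v7_runner.py mirroring the confirmed smoke invocation."""
    parser = argparse.ArgumentParser(prog="v7_runner")
    parser.add_argument("--mode", choices=["smoke", "full"], required=True)
    parser.add_argument("--households", type=lambda s: s.split(","), default=["Apt6", "Apt88"])
    parser.add_argument("--cells", type=lambda s: s.split(","), required=True)
    parser.add_argument("--seeds", type=lambda s: [int(x) for x in s.split(",")], required=True)
    parser.add_argument("--golden-tensor-check", action="store_true")
    return parser.parse_args(argv)


args = parse_args([
    "--mode=smoke",
    "--households=Apt6,Apt88",
    "--cells=B0,B2,A3",
    "--seeds=42,43,123",
    "--golden-tensor-check",
])
print(args.households, args.seeds, args.golden_tensor_check)  # ['Apt6', 'Apt88'] [42, 43, 123] True
```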

Open / follow-up items

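The PAPE/HR functions in v7_runner.py must import compute_pape_v7/compute_hr_v7 from this script as-is (a single code path that reuses the definition-hash assertion).

The A2 sample of only 4 runs is smaller than the "distribution of normally finished v6 GWN/FeDPM runs" that design spec §2.4 anticipated when it was written. This does not affect validation of the abstract, but re-evaluating the final loss of the v6 Phase2/MVP runs later would require recovering those metrics in MLflow (optional, not required for v7).

The expected PAPE/HR cells for G3-G5 in design spec §2.2 need to be updated to these values (the orchestrator will apply the numbers to docs/reference/project_state/track_v7_design.md).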