KD Architecture Literature Review (April 2026)

Key findings from the 2026-04-08 literature survey (stored in docs/reference/kd_architecture_review.md):

Q1: Chronos Feature-based KD
- DistilTS (ICASSP 2026, arXiv:2601.12785) is the direct competitor: it uses Chronos as a frozen zero-shot Teacher with a Factorized Temporal Alignment Module
- EC aggregate fine-tuning of Chronos is feasible (LoRA, low cost) but yields only aggregate-level peak patterns, not individual household patterns
- Best Chronos layers for hidden-state extraction: the final 2-3 encoder layers

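Why: Needed to plan a response to the risk of being scooped by DistilTS, and to secure the grounds for differentiating the Dual-Teacher (GWN+Chronos) design.
How to apply: In the Phase 2 KD design, include whether Chronos is fine-tuned as an experimental variable. Keep in mind that EC aggregate fine-tuning gives an incomplete Teacher for individual households.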
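
The "final 2-3 encoder layers" finding can be illustrated with a minimal NumPy sketch. It assumes the Teacher exposes one hidden-state array per encoder layer, each of shape (batch, seq_len, d_teacher), and that Student and Teacher features share the same sequence length. The names `pool_last_layers`, `feature_kd_loss`, and `proj` are hypothetical, and the plain MSE alignment is only a stand-in, not the Factorized Temporal Alignment Module used by DistilTS.

```python
import numpy as np

def pool_last_layers(hidden_states, k=3):
    """Average the last k per-layer hidden states.

    hidden_states: sequence of arrays, one per encoder layer,
    each shaped (batch, seq_len, d_teacher). Assumed layout.
    """
    stacked = np.stack(hidden_states[-k:], axis=0)  # (k, batch, seq_len, d_teacher)
    return stacked.mean(axis=0)

def feature_kd_loss(student_feat, teacher_feat, proj):
    """MSE between linearly projected Student features and Teacher features.

    student_feat: (batch, seq_len, d_student)
    proj:         (d_student, d_teacher) projection, learnable in practice
    """
    aligned = student_feat @ proj  # (batch, seq_len, d_teacher)
    return float(np.mean((aligned - teacher_feat) ** 2))

# Toy usage: six fake layers whose values equal their layer index.
layers = [np.full((2, 4, 8), float(i)) for i in range(6)]
teacher = pool_last_layers(layers, k=3)          # mean of layers 3, 4, 5
student = np.ones((2, 4, 3))
loss = feature_kd_loss(student, teacher, np.zeros((3, 8)))
```

Whether `k` is 2 or 3, and whether the layers are averaged or concatenated, would be experimental choices for Phase 2 rather than something the survey settles.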

Q2: GWN A_adp Validation
- The GWN paper itself validates A_adp via a 5-variant ablation (forward/backward/adaptive combinations)
- Trivial-matrix detection metrics: column variance, diagonal dominance ratio (>0.7 is suspicious), matrix entropy, effective rank
- Cross-validation against the Pearson correlation matrix is the recommended domain-knowledge check

Why: If A_adp is trivial, the KD contribution claim is weakened. This check is essential for defending the paper's contribution.
How to apply: Delegate to exp-expert the 50x50 A_adp heatmap plus the column variance and diagonal dominance ratio measurements.
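
The Q2 detection metrics could be computed roughly as follows. This is a sketch under stated assumptions, since the notes do not fix exact formulas: column variance is the mean per-column variance, diagonal dominance is the mean of a_ii divided by its row sum, matrix entropy is the mean row-wise Shannon entropy normalized by log(n), and effective rank is the exponential of the entropy of the normalized singular values. The function names are hypothetical.

```python
import numpy as np

def adjacency_triviality_metrics(a_adp, eps=1e-12):
    """Summary statistics for spotting a trivial adaptive adjacency matrix."""
    a = np.abs(np.asarray(a_adp, dtype=float))
    n = a.shape[0]
    row_sums = np.maximum(a.sum(axis=1, keepdims=True), eps)
    p_rows = a / row_sums  # each row as a probability distribution

    # Near-zero column variance: all rows look alike (no node-specific structure).
    col_var = float(a.var(axis=0).mean())
    # Close to 1: the matrix is almost an identity (only self-loops).
    diag_ratio = float(np.mean(np.diag(a) / row_sums[:, 0]))
    # Close to 1 after normalization: rows are almost uniform.
    safe = np.where(p_rows > eps, p_rows, 1.0)  # log(1) = 0 for zero entries
    norm_entropy = float((-(p_rows * np.log(safe)).sum(axis=1)).mean() / np.log(n))
    # Effective rank: exp of the entropy of the normalized singular values.
    s = np.linalg.svd(a, compute_uv=False)
    q = s / max(s.sum(), eps)
    q = q[q > eps]
    eff_rank = float(np.exp(-(q * np.log(q)).sum()))

    return {"column_variance": col_var, "diag_dominance": diag_ratio,
            "normalized_entropy": norm_entropy, "effective_rank": eff_rank}

def pearson_agreement(a_adp, loads):
    """Correlate off-diagonal A_adp entries with the Pearson matrix of the loads.

    loads: (time, n_nodes) array of load series (assumed layout).
    """
    a = np.asarray(a_adp, dtype=float)
    corr = np.corrcoef(np.asarray(loads, dtype=float), rowvar=False)
    off_diag = ~np.eye(a.shape[0], dtype=bool)
    return float(np.corrcoef(a[off_diag], corr[off_diag])[0, 1])

identity_stats = adjacency_triviality_metrics(np.eye(4))
uniform_stats = adjacency_triviality_metrics(np.full((4, 4), 0.25))
```

An identity matrix trips the diagonal dominance check, while a uniform matrix shows zero column variance, maximal entropy, and an effective rank near 1. Note that a softmax-normalized A_adp is generally asymmetric while the Pearson matrix is symmetric, so the agreement score is a coarse sanity check at best.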

Q3: EC Multivariate Teacher Candidates
- iTransformer (ICLR 2024 Spotlight) is the best fit: variate-level tokens allow direct per-household hidden state extraction via enc_out[:, j, :]
- A GWN (current) vs. iTransformer comparison experiment is recommended
- arXiv:2502.12175 warns that adding GNN spatial structure to EC load forecasting does not always improve performance

Why: Directly affects Phase 2 Teacher selection. iTransformer is registered as an additional experiment candidate.
How to apply: Add iTransformer as an EC Teacher candidate. Design a PAPE comparison experiment against GWN.
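
The per-household extraction via enc_out[:, j, :] amounts to simple indexing along the token axis. The sketch below uses a random array as a stand-in for the encoder output and assumes a (batch, n_tokens, d_model) layout. The helper `household_features` and the sizes are hypothetical. If a given iTransformer implementation appends extra covariate tokens after the variate tokens, only the first `n_households` tokens correspond to households, which is worth verifying before extracting features.

```python
import numpy as np

def household_features(enc_out, j, n_households):
    """Return the hidden state of household j for every sample in the batch.

    enc_out: (batch, n_tokens, d_model), where the first n_households
    tokens are assumed to be the variate (household) tokens.
    """
    if not 0 <= j < n_households:
        raise IndexError(f"household index {j} out of range")
    return enc_out[:, j, :]  # (batch, d_model)

# Stand-in encoder output: 50 household tokens plus 4 assumed covariate tokens.
rng = np.random.default_rng(0)
batch, n_households, n_extra, d_model = 4, 50, 4, 16
enc_out = rng.standard_normal((batch, n_households + n_extra, d_model))

feat_7 = household_features(enc_out, 7, n_households)
```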

Q4: Lightweight Student Models
- No standard architecture exists for a non-GNN student that takes the adjacency matrix as an auxiliary input
- Priority ranking: TiDE (adj as static covariate, P1) > TSMixer (channel-mixing + A_adp init, P1) > DLinear (current baseline, P0) > FITS (ultra-lightweight, P2) > PatchTST-small (feature KD, P2)
- Initializing the TSMixer channel-mixing weights with A_adp is a novel, unpublished contribution opportunity
- TimeDistill (KDD 2026) validated DLinear, FITS, and TSMixer as student models

Why: Diversifying the Student models strengthens the ablation experiments and secures a paper contribution.
How to apply: Add TiDE and TSMixer as baselines alongside DLinear. Design the TSMixer+A_adp initialization as an original contribution.
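
One possible reading of the TSMixer+A_adp initialization idea is sketched below. Since the notes describe it as unpublished, everything here is an assumption: the channel-mixing step is simplified to a single square linear map with a residual connection (the real TSMixer feature-mixing block is an MLP), the initial weight is a scaled, row-normalized A_adp, and the names `init_channel_mixing_from_adj`, `channel_mix`, and `scale` are hypothetical.

```python
import numpy as np

def init_channel_mixing_from_adj(a_adp, scale=1.0, eps=1e-12):
    """Build an initial (channels x channels) mixing weight from A_adp.

    Rows are normalized to sum to 1 and then scaled, so each output
    channel starts as a weighted average of its neighbours.
    """
    a = np.abs(np.asarray(a_adp, dtype=float))
    a_norm = a / np.maximum(a.sum(axis=1, keepdims=True), eps)
    return scale * a_norm

def channel_mix(x, w):
    """Residual channel mixing: out[..., i] = x[..., i] + sum_j w[i, j] * x[..., j].

    x: (batch, time, channels)
    """
    return x + x @ w.T

# Toy usage with 3 channels and an identity adjacency.
w0 = init_channel_mixing_from_adj(np.eye(3), scale=1.0)
x = np.ones((2, 5, 3))
mixed = channel_mix(x, w0)
```

In a trainable model `w0` would only seed the weight, which is then updated by gradient descent. A random-initialization run of the same layer would be the natural ablation for showing that the A_adp prior matters.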