Kaggle Competition: Helios Corn Futures Climate Challenge

Competition: https://www.kaggle.com/competitions/forecasting-the-future-the-helios-corn-climate-challenge. Note: I did not end up submitting this notebook to the competition because of kernel errors.

Overview

The notebook creates many derived features from the climate risk columns: min-max scaled variants, IQR-based flags, production-weighted features, category-level severity measures, rolling-window exposure-weighted severity (30/60/120 days) and shock indicators.
Feature selection keeps the TOP_K=5 climate-derived features with the highest absolute correlation for each futures column. The union in this run contains 16 unique climate features.
CFCS is a custom composite metric. It puts 50% weight on the average magnitude of significant correlations (|r| ≥ 0.5), 30% on the maximum absolute correlation and 20% on the proportion of correlations that are significant.

Introduction

Helios Corn Futures Climate Challenge - Submission Sample

Objective: build better signals for commodity futures behaviour (prices, returns, volatility, term structure)
Evaluation Metric: Climate-Futures Correlation Score (CFCS)

Workspace Configuration

import warnings
import numpy as np
import pandas as pd
import seaborn as sns
from pathlib import Path
from datetime import datetime
import matplotlib.pyplot as plt
from sklearn.feature_selection import f_regression

warnings.filterwarnings('ignore')

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

print("Libraries loaded successfully!")

INPUT_DIR = Path("/kaggle/input")
INPUT_PATH = INPUT_DIR / "forecasting-the-future-the-helios-corn-climate-challenge"
OUTPUT_DIR = Path("/kaggle/working")
OUTPUT_PATH = OUTPUT_DIR / "submission.csv"

df = pd.read_csv(INPUT_PATH / "corn_climate_risk_futures_daily_master.csv")
market_share_df = pd.read_csv(INPUT_PATH / "corn_regional_market_share.csv")

print(f"Main dataset shape: {df.shape}")
print(f"Date range: {df['date_on'].min()} to {df['date_on'].max()}")
print(f"Countries: {df['country_name'].nunique()}")
print(f"Regions: {df['region_name'].nunique()}")

# Parse dates, then identify climate risk and futures columns by prefix
df['date_on'] = pd.to_datetime(df['date_on'])
climate_cols = [c for c in df.columns if c.startswith('climate_risk_')]
futures_cols = [c for c in df.columns if c.startswith('futures_')]

print(f"Climate risk columns ({len(climate_cols)}):")
for col in climate_cols:
    print(f"  - {col}")

print(f"\nFutures columns ({len(futures_cols)}):")
for col in futures_cols[:10]:  # Show first 10
    print(f"  - {col}")
if len(futures_cols) > 10:
    print(f"  ... and {len(futures_cols) - 10} more")

df.head(5)

Console output & sample dataframe:

Libraries loaded successfully!
Main dataset shape: (320661, 41)
Date range: 2016-01-01 to 2025-12-15
Countries: 11
Regions: 89
Climate risk columns (12):
  - climate_risk_cnt_locations_heat_stress_risk_low
  - climate_risk_cnt_locations_heat_stress_risk_medium
  - climate_risk_cnt_locations_heat_stress_risk_high
  - climate_risk_cnt_locations_unseasonably_cold_risk_low
  - climate_risk_cnt_locations_unseasonably_cold_risk_medium
  - climate_risk_cnt_locations_unseasonably_cold_risk_high
  - climate_risk_cnt_locations_excess_precip_risk_low
  - climate_risk_cnt_locations_excess_precip_risk_medium
  - climate_risk_cnt_locations_excess_precip_risk_high
  - climate_risk_cnt_locations_drought_risk_low
  - climate_risk_cnt_locations_drought_risk_medium
  - climate_risk_cnt_locations_drought_risk_high

Futures columns (17):
  - futures_close_ZC_1
  - futures_close_ZC_2
  - futures_close_ZW_1
  - futures_close_ZS_1
  - futures_zc1_ret_pct
  - futures_zc1_ret_log
  - futures_zc_term_spread
  - futures_zc_term_ratio
  - futures_zc1_ma_20
  - futures_zc1_ma_60
... (dataframe head printed)

Utilities

from dataclasses import dataclass, field

@dataclass(slots=True)
class Encoder:
    labels: list = field(init=True, repr=False)
    tag2idx: dict = field(init=False)
    idx2tag: dict = field(init=False)

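    def __post_init__(self):
        # Strip whitespace and replace non-string labels (e.g. NaN) with "Unknown", then sort for stable indices
        self.labels = sorted([i.strip() if isinstance(i, str) else "Unknown" for i in self.labels])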
        self.tag2idx = {label: idx for idx, label in enumerate(self.labels)}
        self.idx2tag = {idx: label for idx, label in enumerate(self.labels)}

class ClimateLabels:
    """Climate risk signals produced by a model (location counts per risk level), not the observed (true) conditions during the event."""

    heat_stress = ['climate_risk_cnt_locations_heat_stress_risk_low', 'climate_risk_cnt_locations_heat_stress_risk_medium', 'climate_risk_cnt_locations_heat_stress_risk_high']
    cold_stress = ['climate_risk_cnt_locations_unseasonably_cold_risk_low', 'climate_risk_cnt_locations_unseasonably_cold_risk_medium', 'climate_risk_cnt_locations_unseasonably_cold_risk_high']
    precip_stress = ['climate_risk_cnt_locations_excess_precip_risk_low', 'climate_risk_cnt_locations_excess_precip_risk_medium', 'climate_risk_cnt_locations_excess_precip_risk_high']
    drought_stress = ['climate_risk_cnt_locations_drought_risk_low', 'climate_risk_cnt_locations_drought_risk_medium', 'climate_risk_cnt_locations_drought_risk_high']
    columns = heat_stress + cold_stress + precip_stress + drought_stress
    # helper labels
    extreme_signals = [
        'climate_risk_cnt_locations_heat_stress_risk_high',
        'climate_risk_cnt_locations_unseasonably_cold_risk_high',
        'climate_risk_cnt_locations_excess_precip_risk_high',
        'climate_risk_cnt_locations_drought_risk_high',
    ]
    medium_signals = [
        'climate_risk_cnt_locations_heat_stress_risk_medium',
        'climate_risk_cnt_locations_unseasonably_cold_risk_medium',
        'climate_risk_cnt_locations_excess_precip_risk_medium',
        'climate_risk_cnt_locations_drought_risk_medium',
    ]
    low_signals = [
        'climate_risk_cnt_locations_heat_stress_risk_low',
        'climate_risk_cnt_locations_unseasonably_cold_risk_low',
        'climate_risk_cnt_locations_excess_precip_risk_low',
        'climate_risk_cnt_locations_drought_risk_low',
    ]
    categories = ["heat_stress", "unseasonably_cold", "excess_precip", "drought"]

# Commodity Future Pricing Signals
# ZC = corn, ZW = wheat, ZS = soybeans
# Contract suffix: 1 = front-month futures, 2 = second-month futures
class FutureLabels:
    # Regional share of national production (merged from the market-share file, not a futures column)
    market_share = ["percent_country_production"]
    # Front-month closing price for each commodity
    front_month_prices = ['futures_close_ZC_1', 'futures_close_ZW_1', 'futures_close_ZS_1']
    # Second-month closing price (corn only)
    second_month_prices = ['futures_close_ZC_2']
    # Daily percentage / log returns for the corn front month
    daily_returns = ['futures_zc1_ret_pct', 'futures_zc1_ret_log']
    # Price difference / ratio of the second month to the front month
    spread_returns = ['futures_zc_term_spread', 'futures_zc_term_ratio']
    # Moving averages over the number of days in the suffix
    ma_measures = ['futures_zc1_ma_20', 'futures_zc1_ma_60', 'futures_zc1_ma_120']
    # Volatility over the number of days in the suffix
    vol_measures = ['futures_zc1_vol_20', 'futures_zc1_vol_60']
    # quick ref
    measures = ma_measures + vol_measures
    close_prices = front_month_prices + second_month_prices
    columns = close_prices + daily_returns + spread_returns + ma_measures + vol_measures + market_share
    # extra
    cross_relations = [
        'futures_zw_zc_spread',
        'futures_zc_zw_ratio',
        'futures_zs_zc_spread',
        'futures_zc_zs_ratio'
    ]

class MetaLabels:
    identifiers = [
        'ID',
        'crop_name',
        'country_name',
        'country_code',
        'region_name',
        'region_id',
    ]
    temporal = ["harvest_period", "growing_season_year", "date_on"]
    columns = identifiers + temporal
    extra = ['date_on_year', 'date_on_month', 'date_on_year_month']

class ConfigLabels:
    identifiers = ["country_name", "region_name", "harvest_period"]
    futures = FutureLabels.measures
    climate = ClimateLabels
    meta = ["percent_country_production"]
    dt = ["date_on"]
    columns = identifiers + futures + climate.columns + meta
    x = ["harvest_period"] + climate.columns
    y = futures
    submission = ["date_on", "country_name", "region_name"]
    # new columns added
    temperature_risks = ['heat_stress', 'unseasonably_cold']
    precipitation_risks = ['excess_precip', 'drought']

class Labels:
    submission = FutureLabels.columns + FutureLabels.cross_relations + MetaLabels.columns + MetaLabels.extra

def create_submission(data: pd.DataFrame, submission_cols: list, output_file_path: Path = OUTPUT_PATH):
    # Select on a copy so the caller's datetime `date_on` column is not overwritten with strings
    out = data[submission_cols].copy()
    if "date_on" in out.columns:
        out["date_on"] = pd.to_datetime(out["date_on"]).dt.strftime('%Y-%m-%d')
    out.to_csv(output_file_path, index=False)
    print(f"Submitted file to: {output_file_path}")
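A caveat about the `Encoder` utility above. It strips whitespace and replaces non-string labels (such as NaN) with "Unknown", but only when building `tag2idx`. The raw column is later mapped with `.map(encoder.tag2idx)`. Any label that changes under that cleaning, including a missing value, therefore maps to NaN instead of an index.

The sketch below is one way to keep the two consistent, assuming the same cleaning rules: clean the column first, then build the mapping from the cleaned values. `encode_labels` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd


def encode_labels(values: pd.Series, unknown: str = "Unknown") -> tuple[pd.Series, dict]:
    """Clean labels the way Encoder does, then map the *cleaned* column to sorted integer codes."""
    # Replace missing values with a placeholder, then strip surrounding whitespace
    cleaned = values.where(values.notna(), unknown).astype(str).str.strip()
    # Sorted label order mirrors Encoder.__post_init__
    tag2idx = {label: idx for idx, label in enumerate(sorted(cleaned.unique()))}
    return cleaned.map(tag2idx), tag2idx


codes, tag2idx = encode_labels(pd.Series([" Brazil", "Argentina", None, "Argentina"]))
print(tag2idx)         # {'Argentina': 0, 'Brazil': 1, 'Unknown': 2}
print(codes.tolist())  # [1, 0, 2, 0]
```

Because the same cleaned series is used for both the vocabulary and the lookup, every row receives a code.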

Data Preparation

# Create a working copy

print("Preparing Dataset: Merging Market shares with main ... ")

merged_daily_df = df.copy()
merged_daily_df["code"] = merged_daily_df["country_name"] + "_" + merged_daily_df["region_name"]
country_region_encoder = Encoder(merged_daily_df.code.unique().tolist())
harvest_encoder = Encoder(merged_daily_df.harvest_period.unique().tolist())
merged_daily_df["code"] = merged_daily_df["code"].map(country_region_encoder.tag2idx)
merged_daily_df["harvest_period"] = merged_daily_df["harvest_period"].map(harvest_encoder.tag2idx)
merged_daily_df['day_of_year'] = merged_daily_df['date_on'].dt.dayofyear
merged_daily_df['quarter'] = merged_daily_df['date_on'].dt.quarter

print(f"Added basic features\nDataset shape: {merged_daily_df.shape}")

# Attach each region's share of national production (left merge keeps every daily row)
merged_daily_df = merged_daily_df.merge(
    market_share_df[['region_id', 'percent_country_production']],
    on='region_id',
    how='left'
)

# Regions without a market-share entry get the median share; then convert percent -> fraction
median = merged_daily_df["percent_country_production"].median()
merged_daily_df["percent_country_production"] = merged_daily_df["percent_country_production"].fillna(median)
merged_daily_df["percent_country_production"] = merged_daily_df["percent_country_production"] / 100.0
merged_daily_df = merged_daily_df.sort_values(["code", "date_on"], ascending=False)

print(f"Merged with market share data\nProduction share range (fraction of national production): {merged_daily_df['percent_country_production'].min():.1f} to {merged_daily_df['percent_country_production'].max():.1f}")
print(f"\n{'>'*10} Current Total {len(merged_daily_df.columns)} columns {'>'*10}")

Console output:

Preparing Dataset: Merging Market shares with main ... 
Added basic features
Dataset shape: (320661, 44)
Merged with market share data
Production share range (fraction of national production): 0.0 to 0.7

>>>>>>>>>> Current Total 45 columns >>>>>>>>>>
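The left merge on `region_id` assumes `market_share_df` holds one row per region. A duplicated key would silently multiply daily rows.

Here is a minimal sketch on made-up data of the same merge, median fill and percent-to-fraction conversion. Column names follow the notebook. pandas' `validate="many_to_one"` option is added as a guard against duplicated keys.

```python
import pandas as pd

# Toy daily rows (made-up values); region 3 has no market-share entry
daily = pd.DataFrame({
    "region_id": [1, 1, 2, 3],
    "date_on": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-01"]),
})
share = pd.DataFrame({"region_id": [1, 2], "percent_country_production": [40.0, 10.0]})

# validate="many_to_one" raises pandas.errors.MergeError if region_id repeats in `share`
merged = daily.merge(share, on="region_id", how="left", validate="many_to_one")
assert len(merged) == len(daily)  # a left many-to-one merge never adds rows

# Median over merged rows (as in the notebook), then percent -> fraction
median = merged["percent_country_production"].median()
merged["percent_country_production"] = merged["percent_country_production"].fillna(median) / 100.0
print(merged["percent_country_production"].tolist())  # [0.4, 0.4, 0.1, 0.4]
```

Because the median is taken over merged daily rows, regions with more rows weigh more in the fill value. A per-region median would be an alternative.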

Flaggers & Tickers

from typing import Callable

LOWER_Q = 0.25
HIGHER_Q = 0.75
MUTATE_SUFFIX = "_scaled_"

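def standardize(ds, col: str):
    """Min-max scale `col` to [0, 1]; despite the name, this is not z-score standardization."""
    max_val = ds[col].max()
    min_val = ds[col].min()
    if max_val == min_val:
        # Constant column: avoid a 0/0 division and return zeros
        return pd.Series(0.0, index=ds.index)
    output = (ds[col] - min_val) / (max_val - min_val)
    assert output.max() == 1.0 and output.min() == 0.0
    return output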

def mutate_climate_features(df: pd.DataFrame, scaled_fn: Callable = standardize):
    out = df.copy()

    # One scaled copy per raw climate column, named <col> + MUTATE_SUFFIX
    for col in ClimateLabels.columns:
        out[f"{col}{MUTATE_SUFFIX}"] = scaled_fn(out, col=col)

    print(f"+ Added {len(ClimateLabels.columns)} Features of mutated / scaled / standardized datasets")

    return out

def create_conditional_climate_flaggers(ds: pd.DataFrame, scaled_suffix: str = MUTATE_SUFFIX):

    df = ds.copy()
    climate_risk_cols = [col for col in df.columns if str(col).startswith("climate_risk_cnt_locations_") and str(col).endswith(scaled_suffix)]

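    for col in climate_risk_cols:
        # Flag = 1 when the value lies inside the interquartile range [Q1, Q3], i.e. is typical rather than extreme
        df[f"{col}_flagger_"] = df[col].between(df[col].quantile(LOWER_Q), df[col].quantile(HIGHER_Q), inclusive="both").astype(int)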

    print(f"+ Added {len(climate_risk_cols)} Flaggers from standardized features")
    return df

def create_weighted_climate_features(ds: pd.DataFrame, weight_col: str = "percent_country_production"):
    df = ds.copy()
    climate_risk_cols = [col for col in df.columns if str(col).startswith("climate_risk_cnt_locations_") and not str(col).endswith("_")]

    c = 0
    for risk_col in climate_risk_cols:
        df[f"{risk_col}_{weight_col}_weighted_"] = df[risk_col] * df[weight_col]
        df[f"{risk_col}_{weight_col}_weighted_sq_"] = df[risk_col] * (df[weight_col] ** 2)
        c+=2

    print(f"+ Added {c} weighted climate features")

    return df


def add_category_severity_features(df: pd.DataFrame, eps: float = 1e-9) -> pd.DataFrame:
    df = df.copy()

    mapping = {
        "heat_stress": ClimateLabels.heat_stress,
        "unseasonably_cold": ClimateLabels.cold_stress,
        "excess_precip": ClimateLabels.precip_stress,
        "drought": ClimateLabels.drought_stress,
    }

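    for cat, cols in mapping.items():
        low, med, high = cols
        total = df[low] + df[med] + df[high]
        # Severity: medium counts 1, high counts 2, normalised by the total location count (range 0 to 2)
        df[f"climate_{cat}_severity_"] = (df[med] + 2.0 * df[high]) / (total + eps)
        # Share of locations at high risk (NaN when no locations are reported)
        df[f"climate_{cat}_high_share_"] = df[high] / (total.replace(0, np.nan))
        df[f"climate_{cat}_total_cnt_"] = total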

    return df

def add_group_dynamics(
    df: pd.DataFrame,
    cols: list[str],
    group_cols=("harvest_period", "country_name", "region_name"),
    time_col="date_on",
    windows=(30, 60, 120),
) -> pd.DataFrame:
    df = df.sort_values([*group_cols, time_col]).copy()

    for c in cols:
        g = df.groupby(list(group_cols))[c]
        # Row-to-row (daily) change within each group; shock = change above the global 95th percentile
        df[f"{c}_diff1"] = g.diff()
        thr = df[f"{c}_diff1"].quantile(0.95)
        df[f"{c}_shock_up"] = (df[f"{c}_diff1"] > thr).astype(int)

        # Rolling sums and exponentially weighted means, computed separately per group
        for w in windows:
            df[f"{c}_roll{w}_sum_"] = g.transform(lambda s: s.rolling(w, min_periods=1).sum())
            df[f"{c}_ewm{w}_"] = g.transform(lambda s: s.ewm(span=w, adjust=False).mean())

    return df

# Build scaled, flag, weighted, severity and group-dynamics features
merged_daily_df = mutate_climate_features(merged_daily_df, standardize)
merged_daily_df = create_conditional_climate_flaggers(merged_daily_df)
merged_daily_df = create_weighted_climate_features(merged_daily_df)
merged_daily_df = add_category_severity_features(merged_daily_df)
merged_daily_df = add_group_dynamics(merged_daily_df, cols=ClimateLabels.medium_signals)

print(f"\n{'>'*10} Current Total {len(merged_daily_df.columns)} columns {'>'*10}")

Console output:

+ Added 12 Features of mutated / scaled / standardized datasets
+ Added 12 Flaggers from standardized features
+ Added 24 weighted climate features

>>>>>>>>>> Current Total 137 columns >>>>>>>>>>
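The per-row transformations in the cell above can be checked on a toy frame. The sketch uses made-up counts for a single risk category and a hypothetical `region` key in place of the notebook's `harvest_period`/`country_name`/`region_name` grouping. It reproduces four steps:

- (a) the min-max scaling done by `standardize`.
- (b) the IQR flag, which marks values *inside* [Q1, Q3] with 1.
- (c) the category severity `(medium + 2·high) / total` and the high-risk share.
- (d) the per-group `diff`, shock flag, rolling sum and EWM, which restart at each group boundary.

```python
import numpy as np
import pandas as pd

# Made-up location counts for one risk category; "region" is a hypothetical group key
toy = pd.DataFrame({
    "region": ["A", "A", "A", "B", "B"],
    "low":    [4, 2, 0, 1, 3],
    "medium": [1, 2, 0, 1, 0],
    "high":   [0, 1, 0, 2, 0],
})
eps = 1e-9

# (a) min-max scaling to [0, 1], as in standardize()
lo, hi = toy["medium"].min(), toy["medium"].max()
toy["medium_scaled"] = (toy["medium"] - lo) / (hi - lo)

# (b) IQR flag: 1 when the scaled value lies inside [Q1, Q3]
q1, q3 = toy["medium_scaled"].quantile(0.25), toy["medium_scaled"].quantile(0.75)
toy["medium_flag"] = toy["medium_scaled"].between(q1, q3, inclusive="both").astype(int)

# (c) severity: medium counts 1, high counts 2, divided by the total location count
total = toy["low"] + toy["medium"] + toy["high"]
toy["severity"] = (toy["medium"] + 2.0 * toy["high"]) / (total + eps)
toy["high_share"] = toy["high"] / total.replace(0, np.nan)

# (d) per-group dynamics: diff, rolling sum and EWM restart for each region
g = toy.groupby("region")["medium"]
toy["diff1"] = g.diff()
toy["shock_up"] = (toy["diff1"] > toy["diff1"].quantile(0.95)).astype(int)
toy["roll2_sum"] = g.transform(lambda s: s.rolling(2, min_periods=1).sum())
toy["ewm2"] = g.transform(lambda s: s.ewm(span=2, adjust=False).mean())

print(toy)
```

The flag marks typical values. On sparse count data Q1 and Q3 can coincide (often at 0), and the flag then reduces to "value equals that quantile".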

Temporal Features & Risk Momentum & Deltas

Risk Momentums
Cross Country Relations
Price Indicators and volatility

import numpy as np

tds = [30, 60, 120]
grp = ["harvest_period", "code"]
time_col = "date_on"  # change if needed
weight_col = FutureLabels.market_share[0]  # "percent_country_production"
eps = 1e-9

# rolling depends on row order
merged_daily_df = merged_daily_df.sort_values(grp + [time_col]).copy()

for idx, window in enumerate(reversed(tds), start=1):
    # Windows are visited as 120, 60, 30 -> wgt = 0.5, 1.0, 1.5 (shorter windows weigh more)
    wgt = idx / 2

    # keep track of per-category exposure-weighted severity cols for a window-level composite
    window_cat_scores = []

    for risk_col in ClimateLabels.categories:
        level_cols = {
            "low":    f"climate_risk_cnt_locations_{risk_col}_risk_low",
            "medium": f"climate_risk_cnt_locations_{risk_col}_risk_medium",
            "high":   f"climate_risk_cnt_locations_{risk_col}_risk_high",
        }

        created = {}

        # 1) weighted rolling features for each level:
        #    wgt * (current count) * (sum of the last `window` rows in the group)
        for level, climate_col in level_cols.items():
            if climate_col not in merged_daily_df.columns:
                raise KeyError(f"Missing column: {climate_col}")

            new_col = f"climate_{risk_col}_roll{window}_{level}_sum_"
            merged_daily_df[new_col] = (
                merged_daily_df
                .groupby(grp)[climate_col]
                .transform(lambda s: wgt * s * s.rolling(window, min_periods=1).sum())
                .fillna(0)
            )
            created[level] = new_col

        # 2) severity on the rolled-weighted features (dimensionless; wgt cancels in the ratio up to eps)
        sev_col = f"climate_{risk_col}_roll{window}_severity_"
        low_c, med_c, high_c = created["low"], created["medium"], created["high"]
        total = merged_daily_df[low_c] + merged_daily_df[med_c] + merged_daily_df[high_c]

        merged_daily_df[sev_col] = (merged_daily_df[med_c] + 2.0 * merged_daily_df[high_c]) / (total + eps)

        # 3) exposure-weighted severity (production importance)
        sev_x_prod = f"{sev_col}x_prod_"
        merged_daily_df[sev_x_prod] = merged_daily_df[sev_col] * merged_daily_df[weight_col]
        window_cat_scores.append(sev_x_prod)

        # 4) dynamics: diff + shock
        diff_col = f"{sev_x_prod}diff1_"
        shock_col = f"{sev_x_prod}shock_up_"

        merged_daily_df[diff_col] = merged_daily_df.groupby(grp)[sev_x_prod].diff()
        thr = merged_daily_df[diff_col].quantile(0.95)
        merged_daily_df[shock_col] = (merged_daily_df[diff_col] > thr).astype(int)

        print(f"[{risk_col}] severity + exposure-weighted severity")
        print(merged_daily_df[[sev_col, sev_x_prod, diff_col, shock_col]].describe())

    # 5) window-level composite score across categories (mean exposure-weighted severity)
    comp_col = f"climate_roll{window}_risk_score_x_prod_"
    merged_daily_df[comp_col] = merged_daily_df[window_cat_scores].mean(axis=1)

    # optional: z-score composite per dataset (comment out if you don't want standardization)
    mu = merged_daily_df[comp_col].mean()
    sd = merged_daily_df[comp_col].std(ddof=0)
    merged_daily_df[f"{comp_col}z_"] = (merged_daily_df[comp_col] - mu) / (sd + 1e-12)

    print(f"\n[COMPOSITE] window={window} -> {comp_col}")
    print(merged_daily_df[[comp_col, f"{comp_col}z_"]].describe())

print(f"\n{'>'*10} Current Total {len(merged_daily_df.columns)} columns {'>'*10}")

Console output of severity / composite statistics (trimmed):

The script prints descriptive statistics for each category and window, followed by the final column count.

>>>>>>>>>> Current Total 227 columns >>>>>>>>>>
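Two properties of the rolled features in this cell are worth making explicit.

First, `wgt * s * s.rolling(window, min_periods=1).sum()` multiplies each day's count by the rolling sum over the last `window` rows of its group. These are rows, not calendar days, so gaps in a region's daily series stretch the window.

Second, the same `wgt` multiplies the low, medium and high columns, so it cancels in the severity ratio up to the `eps` term. The 0.5/1.0/1.5 window weights therefore barely change `climate_*_roll{window}_severity_`.

A small sketch with made-up counts for one region checks both points. `rolled` and `severity` are hypothetical helpers that copy the notebook's expressions.

```python
import pandas as pd

eps = 1e-9
# One region, four consecutive days (made-up counts)
counts = pd.DataFrame({
    "low":    [3, 1, 2, 0],
    "medium": [1, 2, 1, 3],
    "high":   [0, 1, 2, 1],
})


def rolled(s: pd.Series, window: int, wgt: float) -> pd.Series:
    # Same expression as the notebook: wgt * current value * rolling sum
    return wgt * s * s.rolling(window, min_periods=1).sum()


def severity(frame: pd.DataFrame, window: int, wgt: float) -> pd.Series:
    low, med, high = (rolled(frame[c], window, wgt) for c in ("low", "medium", "high"))
    return (med + 2.0 * high) / (low + med + high + eps)


sev_half = severity(counts, window=2, wgt=0.5)
sev_onehalf = severity(counts, window=2, wgt=1.5)
print(pd.DataFrame({"wgt=0.5": sev_half, "wgt=1.5": sev_onehalf}))  # columns agree to ~1e-10
```

If a plain rolling sum was the intent, dropping the leading `s *` factor would give it. The sketch does not assume either reading.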

Feature Selection

TOP_K = 5
fts = pd.DataFrame()
all_climate_features = [i for i in merged_daily_df.columns if i.startswith("climate")]

for future_col in FutureLabels.columns:
    # percent_country_production is a weight, not a futures target
    if future_col != "percent_country_production":
        # Top-K climate features by |Pearson r| with this target, pooled over all regions and dates
        corrs = merged_daily_df[all_climate_features].corrwith(merged_daily_df[future_col]).abs().sort_values(ascending=False).head(TOP_K)
        corrs_df = corrs.reset_index().rename(columns={"index": "feature", 0: "corr"})
        corrs_df["returns"] = future_col
        fts = pd.concat([fts, corrs_df], axis=0)

report_df = fts.sort_values("corr", ascending=False)
final_climate_cols = list(fts.feature.unique())

print(f"\n{'>'*10} Selected {fts.feature.nunique()} columns {'>'*10}")

Console output:

>>>>>>>>>> Selected 16 columns >>>>>>>>>>
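The selection rule ranks every `climate*` column by absolute Pearson correlation with each futures column and keeps the top `TOP_K`. The final feature list is the union over all targets.

A minimal sketch on synthetic data shows the same `corrwith` → `abs` → `sort_values` → `head` chain. The column names and the data-generating process are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
target = rng.normal(size=n)
data = pd.DataFrame({
    "climate_a": target + rng.normal(scale=0.1, size=n),   # strongly, positively correlated
    "climate_b": -target + rng.normal(scale=1.0, size=n),  # moderately, negatively correlated
    "climate_c": rng.normal(size=n),                        # pure noise
    "futures_x": target,
})

TOP_K = 2
climate_features = [c for c in data.columns if c.startswith("climate")]
ranked = (
    data[climate_features]
    .corrwith(data["futures_x"])  # Pearson r of each climate column with the target
    .abs()                        # sign is discarded: strong negative links rank too
    .sort_values(ascending=False)
    .head(TOP_K)
)
selected = list(ranked.index)
print(ranked)
```

Taking `.abs()` treats strong negative and positive relationships alike, which matches CFCS scoring |r|. The correlation here is pooled across all rows, whereas the scoring function correlates within each crop-country-month slice. The selected features are therefore not guaranteed to be the ones that score best under CFCS.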

Evaluation Test

def compute_monthly_climate_futures_correlations(df):

    # Dynamic detection
    climate_cols = [c for c in df.columns if c.startswith("climate_risk_")]
    futures_cols = [c for c in df.columns if c.startswith("futures_")]

    # Placeholder for dropping future data: filtering at the latest date in df keeps every row
    max_valid_date = df["date_on"].max()
    df = df[df["date_on"] <= max_valid_date]

    results = []
    # Loop by commodity, country and month
    for comm in df["crop_name"].unique():
        df_comm = df[df["crop_name"] == comm]

        for country in sorted(df_comm["country_name"].unique()):
            df_country = df_comm[df_comm["country_name"] == country]

            for month in sorted(df_country["date_on_month"].unique()):
                df_month = df_country[df_country["date_on_month"] == month]

                for clim in climate_cols:
                    for fut in futures_cols:

                        if df_month[clim].std() > 0 and df_month[fut].std() > 0:
                            corr = df_month[[clim, fut]].corr().iloc[0, 1]
                        else:
                            corr = None

                        results.append({
                            "crop_name": comm,
                            "country_name": country,
                            "month": month,
                            "climate_variable": clim,
                            "futures_variable": fut,
                            "correlation": corr
                        })

    results_df = pd.DataFrame(results)

    return results_df

def calculate_cfcs_score(correlations_df):
    """
    Calculate the Climate-Futures Correlation Score (CFCS) for leaderboard ranking.

    CFCS = (0.5 × Avg_Sig_Corr_Score) + (0.3 × Max_Corr_Score) + (0.2 × Sig_Count_Score)

    Only significant correlations (|r| ≥ 0.5) enter the average-correlation component.
    """
    # Remove null correlations
    valid_corrs = correlations_df["correlation"].dropna()

    if len(valid_corrs) == 0:
        return {'cfcs_score': 0.0, 'error': 'No valid correlations'}

    # Calculate base metrics
    abs_corrs = valid_corrs.abs()
    max_abs_corr = abs_corrs.max()
    significant_mask = abs_corrs >= 0.5
    significant_corrs = abs_corrs[significant_mask]
    significant_count = len(significant_corrs)
    total_count = len(valid_corrs)

    # Calculate component scores - ONLY average significant correlations
    if significant_count > 0:
        avg_sig_corr = significant_corrs.mean()
        avg_sig_score = min(100, avg_sig_corr * 100)  # Cap at 100 when avg sig reaches 1.0
    else:
        avg_sig_corr = 0.0
        avg_sig_score = 0.0

    max_corr_score = min(100, max_abs_corr * 100)  # Cap at 100 when max reaches 1.0
    sig_count_score = (significant_count / total_count) * 100  # Percentage

    # Composite score: Focus more on quality of significant correlations
    cfcs = (0.5 * avg_sig_score) + (0.3 * max_corr_score) + (0.2 * sig_count_score)

    return {
        'cfcs_score': round(cfcs, 2),
        'avg_significant_correlation': round(avg_sig_corr, 4),
        'max_abs_correlation': round(max_abs_corr, 4),
        'significant_correlations_pct': round(sig_count_score, 2),
        'avg_sig_score': round(avg_sig_score, 2),
        'max_corr_score': round(max_corr_score, 2),
        'sig_count_score': round(sig_count_score, 2),
        'total_correlations': total_count,
        'significant_correlations': significant_count
    }

def run_scoring(merged_daily_df: pd.DataFrame):
    monthly_corr_df = compute_monthly_climate_futures_correlations(merged_daily_df)
    score_results = calculate_cfcs_score(monthly_corr_df)

    print("=== CLIMATE-FUTURES CORRELATION SCORE (CFCS) ===")
    print(f"Final CFCS Score: {score_results['cfcs_score']}\n")
    print("Component Breakdown:")
    print(f"  Average Significant |Correlation|: {score_results['avg_significant_correlation']:.4f} → Score: {score_results['avg_sig_score']}")
    print(f"  Maximum |Correlation|: {score_results['max_abs_correlation']:.4f} → Score: {score_results['max_corr_score']}")
    print(f"  Significant Correlations: {score_results['significant_correlations']}/{score_results['total_correlations']} ({score_results['significant_correlations_pct']:.1f}%) → Score: {score_results['sig_count_score']}\n")
    print("Score Calculation:")
    print(f"  CFCS = (0.5 × {score_results['avg_sig_score']}) + (0.3 × {score_results['max_corr_score']}) + (0.2 × {score_results['sig_count_score']})")
    print(f"  CFCS = {0.5 * score_results['avg_sig_score']:.1f} + {0.3 * score_results['max_corr_score']:.1f} + {0.2 * score_results['sig_count_score']:.1f} = {score_results['cfcs_score']}")
    print("Key Insight: This metric focuses on the QUALITY of significant correlations rather than being diluted by weak signals.")
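To see how the three components combine, here is a compact restatement of the formula in `calculate_cfcs_score`. The function name `cfcs` is hypothetical. It is applied to four made-up correlations plus one missing value, and two of the correlations clear the |r| ≥ 0.5 significance threshold.

```python
import pandas as pd


def cfcs(correlations: pd.Series, threshold: float = 0.5) -> float:
    """Compact restatement of calculate_cfcs_score's composite formula."""
    abs_corrs = correlations.dropna().abs()
    if abs_corrs.empty:
        return 0.0
    significant = abs_corrs[abs_corrs >= threshold]
    # 50%: mean |r| of significant correlations; 30%: max |r|; 20%: share of significant ones
    avg_sig_score = min(100.0, significant.mean() * 100) if len(significant) else 0.0
    max_corr_score = min(100.0, abs_corrs.max() * 100)
    sig_count_score = len(significant) / len(abs_corrs) * 100
    return round(0.5 * avg_sig_score + 0.3 * max_corr_score + 0.2 * sig_count_score, 2)


toy = pd.Series([0.75, -0.5, 0.25, None, 0.125])
print(cfcs(toy))  # 0.5*62.5 + 0.3*75 + 0.2*50 = 63.75
```

The final scoring output below reports components of 57.21, 76.0 and 0.64, so the same formula gives 28.605 + 22.8 + 0.128 ≈ 51.53. Only 111 of 17,289 correlations are significant, so the count term contributes almost nothing. The score is driven by the average and maximum |r|.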

Submission & Scoring

final_cols = list(final_climate_cols) + Labels.submission
# run_scoring(merged_daily_df[final_cols])
# Write the selected climate features together with the identifier, date and futures columns
create_submission(data=merged_daily_df, submission_cols=final_cols)

Final scoring output:

=== CLIMATE-FUTURES CORRELATION SCORE (CFCS) ===
Final CFCS Score: 51.53

Component Breakdown:
  Average Significant |Correlation|: 0.5721 → Score: 57.21
  Maximum |Correlation|: 0.7600 → Score: 76.0
  Significant Correlations: 111/17289 (0.6%) → Score: 0.64

Score Calculation:
  CFCS = (0.5 × 57.21) + (0.3 × 76.0) + (0.2 × 0.64)
  CFCS = 28.6 + 22.8 + 0.1 = 51.53
Key Insight: This metric focuses on the QUALITY of significant correlations rather than being diluted by weak signals.
