
Kaggle Competition: Helios Corn Futures Climate Challenge

Competition: https://www.kaggle.com/competitions/forecasting-the-future-the-helios-corn-climate-challenge. Note: I did not submit this notebook to the competition because of kernel errors.

Overview

  • The notebook creates many derived features from climate risk columns: scaled variants, flaggers (IQR-based), production-weighted features, category-level severity measures, rolling-window exposure-weighted severity (30/60/120 days) and shock indicators.

  • Feature selection picks the climate-derived features most correlated with each futures column (TOP_K=5 per column), which yields 16 unique climate features in this run.

  • CFCS (Climate-Futures Correlation Score) is a custom composite metric that prioritizes the average magnitude of significant correlations, then the maximum correlation, then the proportion of significant correlations.
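
A minimal sketch of how a composite with this priority order could be computed. The significance threshold (0.5) and the weights (0.5, 0.3, 0.2) are assumptions chosen for illustration, not values taken from this notebook, and `cfcs_sketch` and the demo column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed for illustration only: threshold and weights are not given on this page.
SIG_THRESHOLD = 0.5
WEIGHTS = (0.5, 0.3, 0.2)  # (avg of significant |corr|, max |corr|, share significant)


def cfcs_sketch(df, climate_cols, futures_cols,
                threshold=SIG_THRESHOLD, weights=WEIGHTS):
    """Composite score on a 0-100 scale built from absolute Pearson correlations."""
    corr = df[climate_cols + futures_cols].corr().loc[climate_cols, futures_cols]
    abs_corr = corr.abs().to_numpy().ravel()
    abs_corr = abs_corr[~np.isnan(abs_corr)]  # drop undefined pairs (constant columns)
    if abs_corr.size == 0:
        return 0.0

    significant = abs_corr[abs_corr >= threshold]
    avg_sig = significant.mean() if significant.size else 0.0
    max_corr = abs_corr.max()
    sig_share = significant.size / abs_corr.size

    w_avg, w_max, w_share = weights
    return 100.0 * (w_avg * avg_sig + w_max * max_corr + w_share * sig_share)


# Toy data: one feature tracks the futures column exactly, the other only weakly.
demo = pd.DataFrame({
    "climate_risk_a": [1.0, 2.0, 3.0, 4.0],
    "climate_risk_b": [1.0, -1.0, 1.0, -1.0],
    "futures_close": [2.0, 4.0, 6.0, 8.0],
})
score = cfcs_sketch(demo, ["climate_risk_a", "climate_risk_b"], ["futures_close"])
print(round(score, 2))
```

Because the largest weight sits on the average of significant correlations, a few strong features score better under this sketch than many weak ones.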

Introduction

Helios Corn Futures Climate Challenge - Submission Sample

  • Objective: build better signals for commodity futures behaviour (prices, returns, volatility, term structure)

  • Evaluation Metric: Climate-Futures Correlation Score (CFCS)

Workspace Configuration

import warnings
import numpy as np
import pandas as pd
import seaborn as sns
from pathlib import Path
from datetime import datetime
import matplotlib.pyplot as plt
from sklearn.feature_selection import f_regression

warnings.filterwarnings('ignore')

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

print("Libraries loaded successfully!")

INPUT_DIR = Path("/kaggle/input")
INPUT_PATH = INPUT_DIR / "forecasting-the-future-the-helios-corn-climate-challenge"
OUTPUT_DIR = Path("/kaggle/working")
OUTPUT_PATH = OUTPUT_DIR / "submission.csv"

df = pd.read_csv(INPUT_PATH / "corn_climate_risk_futures_daily_master.csv")
market_share_df = pd.read_csv(INPUT_PATH / "corn_regional_market_share.csv")

print(f"Main dataset shape: {df.shape}")
print(f"Date range: {df['date_on'].min()} to {df['date_on'].max()}")
print(f"Countries: {df['country_name'].nunique()}")
print(f"Regions: {df['region_name'].nunique()}")

# Parse dates, then identify climate risk and futures columns
df['date_on'] = pd.to_datetime(df['date_on'])
climate_cols = [c for c in df.columns if c.startswith('climate_risk_')]
futures_cols = [c for c in df.columns if c.startswith('futures_')]

print(f"Climate risk columns ({len(climate_cols)}):")
for col in climate_cols:
    print(f"  - {col}")

print(f"\nFutures columns ({len(futures_cols)}):")
for col in futures_cols[:10]:  # Show first 10
    print(f"  - {col}")
if len(futures_cols) > 10:  # match the 10 columns listed above
    print(f"  ... and {len(futures_cols) - 10} more")

df.head(5)

Utilities

Data Preparation


Flaggers & Tickers
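
The notebook cell for this step is not reproduced on this page. As a rough sketch of what an IQR-based flagger can look like, the example below marks values above the upper Tukey fence (Q3 + 1.5 × IQR). The multiplier, the upper-tail-only choice, the helper name `iqr_flags`, and the column `climate_risk_x` are all assumptions for illustration.

```python
import pandas as pd


def iqr_flags(df, cols, k=1.5):
    """Return 0/1 columns marking values above Q3 + k * IQR for each input column."""
    flags = pd.DataFrame(index=df.index)
    for col in cols:
        q1, q3 = df[col].quantile([0.25, 0.75])
        upper_fence = q3 + k * (q3 - q1)
        flags[f"{col}_iqr_flag"] = (df[col] > upper_fence).astype(int)
    return flags


# Toy data: only the last value lies above the fence (Q3 = 4, IQR = 2, fence = 7).
demo = pd.DataFrame({"climate_risk_x": [1, 2, 3, 4, 100]})
flags = iqr_flags(demo, ["climate_risk_x"])
print(flags["climate_risk_x_iqr_flag"].tolist())
```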


Temporal Features & Risk Momentum & Deltas

  • Risk Momentums

  • Cross Country Relations

  • Price Indicators and volatility


(Output trimmed in the original; the script prints descriptive statistics for each category and window, and final counts)
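
The overview mentions rolling-window exposure-weighted severity over 30/60/120 days. One way such a feature could be built is sketched below, assuming one row per region per day (so a window of N rows approximates N days) and an exposure weight already merged onto each row. The helper name `rolling_weighted_severity` and the `risk` and `share` columns are hypothetical; `region_name` and `date_on` come from the dataset loaded above.

```python
import pandas as pd

WINDOWS = (30, 60, 120)  # window lengths listed in the overview


def rolling_weighted_severity(df, risk_col, weight_col, windows=WINDOWS):
    """Add per-region rolling means of risk * weight, one column per window."""
    out = df.sort_values(["region_name", "date_on"]).copy()
    weighted = out[risk_col] * out[weight_col]
    grouped = weighted.groupby(out["region_name"])
    for w in windows:
        # min_periods=1 keeps early rows instead of leaving them as NaN
        out[f"{risk_col}_wsev_{w}d"] = grouped.transform(
            lambda s: s.rolling(w, min_periods=1).mean()
        )
    return out


# Toy data with a short window so the result is easy to check by hand.
demo = pd.DataFrame({
    "region_name": ["A", "A", "A", "B", "B", "B"],
    "date_on": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"] * 2),
    "risk": [1.0, 2.0, 3.0, 4.0, 4.0, 4.0],
    "share": [0.5, 0.5, 0.5, 0.25, 0.25, 0.25],
})
result = rolling_weighted_severity(demo, "risk", "share", windows=(2,))
print(result[["region_name", "date_on", "risk_wsev_2d"]])
```

Grouping by region before rolling keeps one region's history from leaking into another's window.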

Feature Selection
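
The selection cell itself is not shown on this page. The sketch below illustrates the approach described in the overview: keep the TOP_K features with the highest absolute correlation for each futures column, then take the union. It assumes Pearson correlation, and `select_top_k` and the demo column names are hypothetical.

```python
import pandas as pd

TOP_K = 5  # value reported in the overview


def select_top_k(df, feature_cols, futures_cols, top_k=TOP_K):
    """Union of the top_k features by |Pearson correlation| for each futures column."""
    corr = df[feature_cols + futures_cols].corr().loc[feature_cols, futures_cols].abs()
    selected = set()
    for fut in futures_cols:
        # dropna() skips features whose correlation is undefined (e.g. constant columns)
        selected.update(corr[fut].dropna().nlargest(top_k).index)
    return sorted(selected)


# Toy data: futures_x follows feature a, futures_y mirrors feature c.
demo = pd.DataFrame({
    "climate_risk_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "climate_risk_b": [2.0, 1.0, 4.0, 3.0, 6.0],
    "climate_risk_c": [5.0, 3.0, 4.0, 1.0, 2.0],
})
demo["futures_x"] = 2.0 * demo["climate_risk_a"]
demo["futures_y"] = -demo["climate_risk_c"]

picked = select_top_k(
    demo,
    ["climate_risk_a", "climate_risk_b", "climate_risk_c"],
    ["futures_x", "futures_y"],
    top_k=1,
)
print(picked)
```

Because the same feature can rank highly for several futures columns, the union is usually smaller than TOP_K times the number of futures columns, which is consistent with the 16 unique features reported for this run.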


Evaluation Test

Submission & Scoring

