Project Case
Shanghai Housing Price Forecast
An explainable forecasting model for Shanghai monthly housing-price growth.
Role
Data collection / Feature engineering / XGBoost modeling / SHAP interpretation
Stage
Master thesis research, model iteration, and market interpretation
Key Outcome
Reduced forecast error by more than 98% versus the baseline model and identified momentum, liquidity, and sentiment signals.
Strategy Snapshot
BUSINESS QUESTION
Can housing-price movements be read before they become obvious?
The case turns a broad real-estate topic into a monthly forecasting problem, so the model can support earlier market judgment instead of only explaining past price changes.
REPORT EVIDENCE
98%+ error reduction with interpretable drivers
XGBoost reduced RMSE from 18,222.69 to 320.06, while SHAP identified price momentum, M2 liquidity, stock-market signals, and consumer confidence as readable market drivers.
STRATEGY SIGNAL
Use it as an early-warning framework
The output is strongest when translated into a dashboard for momentum, liquidity, sentiment, and policy direction, helping analysts discuss market timing and risk.
Background
This thesis reframes Shanghai housing-price analysis as a monthly growth forecasting problem. Instead of only describing long-term market trends, it builds a model that can capture short-term movements using housing prices, macro-financial indicators, policy signals, stock-market variables, and consumer sentiment.
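Here is a minimal sketch of the target redesign. It assumes the monthly growth target is a month-over-month percentage change in a price series, and that "price momentum" is the previous month's growth. This page does not give the exact growth definition, so both functions (`monthly_growth`, `with_momentum`) are hypothetical illustrations.

```python
from typing import List, Optional, Tuple

def monthly_growth(prices: List[float]) -> List[Optional[float]]:
    """Month-over-month growth in percent; the first month has no prior value."""
    if not prices:
        return []
    growth: List[Optional[float]] = [None]
    for prev, curr in zip(prices, prices[1:]):
        growth.append((curr / prev - 1.0) * 100.0)
    return growth

def with_momentum(
    growth: List[Optional[float]],
) -> List[Tuple[Optional[float], Optional[float]]]:
    """Pair each month's growth target with the previous month's growth (momentum)."""
    return [(growth[t - 1] if t > 0 else None, growth[t]) for t in range(len(growth))]

# Hypothetical price-index values, for illustration only.
growth = monthly_growth([100.0, 102.0, 102.0, 100.98])
pairs = with_momentum(growth)  # (momentum feature, growth target) per month
```

Rows where the momentum feature or the target is `None` (the first two months here) would be dropped before training.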
Problem
A black-box forecast offers little support for market judgment. The project needed to answer two questions at once: whether monthly housing-price movements could be predicted more accurately than with a traditional linear model, and which variables actually drove the model's predictions.
Approach
I collected and aligned multi-source data from 2000 to 2024 and transformed annual and cumulative indicators into monthly features. I then built lagged variables for policy, LPR, M2, stock-market, and sentiment signals, trained iterative XGBoost models, compared them with linear regression, and used SHAP to interpret each feature's contribution.
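The monthly alignment steps can be sketched in plain Python. This sketch rests on three assumptions. First, cumulative indicators are reported as year-to-date totals. Second, an annual value is simply repeated across its twelve months. Third, lags are chosen separately for each signal. The helper names, the sample values, and the lag lengths in the usage lines are hypothetical and are not the thesis's actual settings.

```python
from typing import Dict, List, Optional, Tuple

Month = Tuple[int, int]  # (year, month)
Series = List[Optional[float]]

def decumulate_ytd(months: List[Month], ytd: List[float]) -> Series:
    """Turn year-to-date cumulative totals into single-month values.

    January keeps its cumulative value; later months subtract the previous
    month's total when that month is present, otherwise the value is unknown.
    """
    out: Series = []
    for i, (year, month) in enumerate(months):
        if month == 1:
            out.append(ytd[i])
        elif i > 0 and months[i - 1] == (year, month - 1):
            out.append(ytd[i] - ytd[i - 1])
        else:
            out.append(None)  # previous month missing: cannot recover this month
    return out

def expand_annual(months: List[Month], annual: Dict[int, float]) -> Series:
    """Repeat each annual value across the months of that year (a step assumption)."""
    return [annual.get(year) for year, _ in months]

def add_lags(features: Dict[str, Series], lags: Dict[str, List[int]]) -> Dict[str, Series]:
    """Create columns such as 'm2_lag3', where row t holds the value from row t-3."""
    out: Dict[str, Series] = {}
    for name, ks in lags.items():
        series = features[name]
        n = len(series)
        for k in ks:
            out[f"{name}_lag{k}"] = [None] * min(k, n) + series[: max(n - k, 0)]
    return out

# Hypothetical usage: five months spanning a year boundary, made-up values.
months = [(2023, 11), (2023, 12), (2024, 1), (2024, 2), (2024, 3)]
monthly_values = decumulate_ytd(months, [110.0, 120.0, 10.0, 25.0, 45.0])
annual_feature = expand_annual(months, {2023: 5.0, 2024: 5.5})
lagged = add_lags({"m2": [1.0, 2.0, 3.0, 4.0, 5.0]}, {"m2": [1, 3]})
```

The first row comes out as `None` because its previous month is outside the sample, so its single-month value cannot be recovered from a year-to-date total.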
Key Evidence
FINAL MODEL
RMSE 320.06 / MAE 176.82
The final XGBoost version tracked Shanghai monthly housing prices from 2021 to 2024 with much lower error than the baseline.
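One way to reproduce this style of evaluation is a chronological holdout. The model is fitted on earlier months, scored on 2021 onward, and RMSE and MAE are reported on the held-out months. The split year and the function names below are assumptions for illustration. The page only says that the final model tracked prices from 2021 to 2024.

```python
import math
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")

def chronological_split(
    months: Sequence[Tuple[int, int]], rows: Sequence[Row], first_test_year: int = 2021
) -> Tuple[List[Row], List[Row]]:
    """Train on months before first_test_year and test on the rest; no shuffling."""
    train = [r for (year, _), r in zip(months, rows) if year < first_test_year]
    test = [r for (year, _), r in zip(months, rows) if year >= first_test_year]
    return train, test

def _check_lengths(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: squares each miss, so large misses weigh more."""
    _check_lengths(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: the average size of a miss, in the target's units."""
    _check_lengths(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

A shuffled split would let future months leak into training. That is why a time-ordered split is the usual choice for monthly series.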
ERROR REDUCTION
98%+
Compared with the initial baseline RMSE of 18,222.69, the final model achieved a substantial improvement after target redesign and feature engineering.
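The headline percentage follows from the two reported RMSE values. This assumes both values are measured on the same evaluation data and in the same units:

$$
\text{error reduction} = \frac{\mathrm{RMSE}_{\text{baseline}} - \mathrm{RMSE}_{\text{final}}}{\mathrm{RMSE}_{\text{baseline}}} = \frac{18{,}222.69 - 320.06}{18{,}222.69} = \frac{17{,}902.63}{18{,}222.69} \approx 0.982
$$

That is about 98.2%, which matches the "98%+" figure.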
TOP SIGNAL
Price momentum
SHAP results showed that previous-month growth was the most influential predictor, indicating market inertia in short-term price movement.
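To make the SHAP reading concrete, here is a minimal sketch of exact Shapley values on a toy three-feature model. Everything in it is hypothetical: the feature names, the coefficients, and the zero baseline. Features outside a coalition are replaced by their baseline values. Global importance is then ranked by mean absolute contribution, which is a common way SHAP summaries order features. Exact enumeration needs on the order of 2^n model calls, so real tree models are usually explained with a tree-specific algorithm such as the shap library's `TreeExplainer`.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List, Set, Tuple

Features = Dict[str, float]

def shapley_values(model: Callable[[Features], float], x: Features, baseline: Features) -> Dict[str, float]:
    """Exact Shapley values for one prediction by enumerating feature coalitions.

    Features outside a coalition take their baseline value; each result is a
    weighted average of the feature's marginal effect across all coalitions.
    """
    names = list(x)
    n = len(names)

    def value(coalition: Set[str]) -> float:
        return model({f: x[f] if f in coalition else baseline[f] for f in names})

    phi: Dict[str, float] = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(len(others) + 1):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

def mean_abs_shap(rows: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Rank features by mean absolute contribution across many predictions."""
    means = {f: sum(abs(r[f]) for r in rows) / len(rows) for f in rows[0]}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy model: linear terms plus one momentum x liquidity interaction.
def toy_model(z: Features) -> float:
    return (0.6 * z["prev_growth"] + 0.2 * z["m2_growth"] + 0.1 * z["confidence"]
            + 0.05 * z["prev_growth"] * z["m2_growth"])

x = {"prev_growth": 1.0, "m2_growth": 2.0, "confidence": 0.5}
baseline = {"prev_growth": 0.0, "m2_growth": 0.0, "confidence": 0.0}
phi = shapley_values(toy_model, x, baseline)
print(phi)  # approximately {'prev_growth': 0.65, 'm2_growth': 0.45, 'confidence': 0.05}
```

The interaction term's contribution is split equally between the two features that form it. The three values also sum to the prediction minus the baseline prediction, which is the Shapley efficiency property.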
MACRO DRIVER
M2 liquidity
Money supply emerged as an important macro-financial predictor, linking liquidity conditions to housing-market expectations.
Model Performance Snapshot
The final model sharply reduced error after target redesign and feature engineering.
Market Driver Map
The model was presented as a market-explanation framework, not only a prediction engine.
Momentum
Previous-month growth
The strongest short-term predictor in SHAP analysis.
Liquidity
M2 money supply
A macro-financial signal linked to market expectation.
Sentiment
Consumer confidence / stock market
Captures investor mood and capital-reallocation pressure.
Decision Logic
Use the model as an early-warning lens, not an automatic investment rule
The strongest value is not a single price forecast, but a repeatable framework for detecting when momentum, liquidity, sentiment, and policy signals begin to move in the same direction.
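A minimal sketch of the alignment check, assuming each signal has already been reduced to a single latest change, such as a month-over-month difference. Several choices here are hypothetical and are not thresholds from the thesis: the signal names, the `tol` dead band, and the default rule that all four signals must agree.

```python
from typing import Dict, Optional

def aligned_direction(changes: Dict[str, float], min_agree: int = 4, tol: float = 0.0) -> Optional[str]:
    """Return 'up' or 'down' when at least min_agree signals move the same way.

    Changes within +/- tol count as flat. Requiring a strict majority means
    'up' and 'down' can never both qualify at once.
    """
    if 2 * min_agree <= len(changes):
        raise ValueError("min_agree must be a strict majority of the signals")
    ups = sum(1 for v in changes.values() if v > tol)
    downs = sum(1 for v in changes.values() if v < -tol)
    if ups >= min_agree:
        return "up"
    if downs >= min_agree:
        return "down"
    return None

# Hypothetical latest changes for the four signal groups.
latest = {"momentum": 0.3, "liquidity": 0.1, "sentiment": 0.2, "policy": 0.05}
print(aligned_direction(latest))  # up
```

Lowering `min_agree` to 3 makes the flag fire earlier but more often. That trade-off between earlier warnings and more false alarms is the one analysts would tune.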
Translate SHAP output into market narratives
For stakeholders, the model should be presented as a structured explanation of what is driving the market, so analysts can connect technical output with housing-policy and investment discussions.
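Here is a minimal sketch of turning one prediction's SHAP values into a readable sentence for stakeholders. The feature keys, display labels, and `top_k` cutoff are hypothetical. In practice the contributions would come from the fitted model's SHAP output rather than being typed in by hand.

```python
from typing import Dict

def narrate(contributions: Dict[str, float], labels: Dict[str, str], top_k: int = 3) -> str:
    """Describe the top_k largest contributions of one prediction in words."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    if not ranked:
        return ""
    parts = []
    for name, value in ranked:
        if value > 0:
            effect = "pushed the forecast up"
        elif value < 0:
            effect = "pulled the forecast down"
        else:
            effect = "left the forecast unchanged"
        parts.append(f"{labels.get(name, name)} {effect} ({value:+.2f})")
    return "; ".join(parts) + "."

labels = {
    "prev_growth": "Previous-month growth",
    "m2_growth": "M2 liquidity",
    "confidence": "Consumer confidence",
}
sentence = narrate({"prev_growth": 0.65, "m2_growth": -0.45, "confidence": 0.05}, labels)
print(sentence)
```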
Outcome
The final workflow produced an accurate and explainable model for short-term Shanghai housing-price dynamics. It showed that price momentum, liquidity conditions, seasonality, financial-market indicators, and sentiment variables can jointly improve forecasting and interpretation.
Reflection
For job-facing presentation, this is my strongest data case: it shows that I can move from messy real-world data to model design, performance evaluation, interpretability, and business-facing market judgment.
Original Deliverable
This page is a concise case summary. The full report contains the research process and modeling details.