House Price Prediction & Investment Analysis
An advanced regression pipeline built on the Ames Housing dataset to predict property sale prices. The system uses a Stacking Regressor that combines XGBoost and LightGBM base models through a Ridge (RidgeCV) meta-learner. Beyond raw prediction, the project includes a custom business-logic layer that inverts the log transform on predicted prices and flags 'undervalued' assets: properties whose market price is significantly lower than the model-estimated value.
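The undervaluation check described above can be sketched in a few lines. This is a minimal illustration and not the project's actual code: the dictionary keys, the `rank_undervalued` helper, the 10% threshold, and the choice of `log1p`/`expm1` as the transform pair are all assumptions made for the example. It also assumes 'Potential Profit' means estimated value minus market price, and 'Profit Percentage' means that profit relative to the market price.

```python
import math


def rank_undervalued(listings, min_profit_pct=10.0):
    """Return listings whose estimated value exceeds the market price.

    `listings` is a list of dicts with hypothetical keys "id",
    "market_price" (dollars) and "pred_log_price" (model output on
    the log1p scale). The threshold is an illustrative assumption.
    """
    ranked = []
    for row in listings:
        # Invert log1p so the estimate is back in dollars.
        estimated_value = math.expm1(row["pred_log_price"])
        potential_profit = estimated_value - row["market_price"]
        profit_pct = 100.0 * potential_profit / row["market_price"]
        if profit_pct >= min_profit_pct:
            ranked.append(
                {
                    "id": row["id"],
                    "estimated_value": estimated_value,
                    "potential_profit": potential_profit,
                    "profit_pct": profit_pct,
                }
            )
    # Best opportunities first.
    ranked.sort(key=lambda r: r["profit_pct"], reverse=True)
    return ranked


# Made-up listings, not taken from the Ames data.
listings = [
    {"id": 1, "market_price": 150_000, "pred_log_price": math.log1p(180_000)},
    {"id": 2, "market_price": 200_000, "pred_log_price": math.log1p(205_000)},
    {"id": 3, "market_price": 120_000, "pred_log_price": math.log1p(110_000)},
]
signals = rank_undervalued(listings)
```

Only listing 1 clears the assumed 10% threshold here. The same arithmetic maps directly onto DataFrame columns when the predictions are held in Pandas.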
Key Features
- Stacking Ensemble Architecture: Combined XGBoost and LightGBM base learners under a RidgeCV meta-model that is trained on their cross-validated predictions, so the ensemble can generalize better than either model alone.
- Automated Preprocessing Pipeline: Integrated Scikit-Learn Pipelines with RobustScaler and One-Hot Encoding to limit the influence of outliers and to encode categorical variables.
- Investment Signal Generation: Developed a financial discovery layer that calculates 'Potential Profit' and 'Profit Percentage' to rank investment targets.
- Advanced Feature Attribution: Used SHAP's TreeExplainer to quantify the impact of features such as square footage and neighborhood on predicted property value.
- Statistical Calibration: Applied a log transformation to the target variable to reduce the skew of the price distribution and mitigate heteroscedasticity.
- End-to-End Workflow: A complete data-to-decision pipeline, including data cleaning, training, evaluation, and business reporting (CSV export).
Tech Stack
- Python
- XGBoost
- LightGBM
- Scikit-learn (StackingRegressor)
- SHAP
- Pandas & NumPy
- Matplotlib & Seaborn