---
license: apache-2.0
language: en
tags:
- nutrition
- healthcare
- elderly-care
- regression
- xgboost
- uganda
- africa
datasets:
- uganda-elderly-nutrition
- Shakiran/UgandanNutritionMealPlanning
- dongx1997/NutriBench
metrics:
- r2
- mae
- rmse
library_name: xgboost
pipeline_tag: tabular-regression
---

# XGBoost Model for Elderly Nutrition Planning in Uganda

## Model Description

This XGBoost regression model predicts daily caloric needs for elderly individuals (aged 60+) in Uganda from nutritional content, health conditions, regional factors, and demographic information. It is designed to support nutrition planning, meal preparation, and healthcare decision-making in elderly care in Uganda.

### Model Details

- **Model Type:** XGBoost Regressor (Gradient Boosting)
- **Task:** Tabular Regression
- **Version:** v1.0_optimized
- **Training Date:** November 3, 2025
- **Framework:** XGBoost 2.0+
- **Language:** Python
- **License:** Apache 2.0

### Developed By

- **Organization:** Graph-Enhanced LLMs for Locally-Sourced Elderly Nutrition Planning Project
- **Project Focus:** AI-driven nutrition planning for elderly populations in Uganda
- **Contact:** shakirannannyombi@gmail.com

---

## Intended Use

### Primary Use Cases

1. **Nutrition Planning:** Calculate appropriate caloric intake for elderly individuals based on their health profile
2. **Meal Planning:** Support caregivers and healthcare providers in designing meal plans
3. **Healthcare Decision Support:** Assist medical professionals in nutritional assessments
4. **Research:** Enable studies on nutrition needs for elderly populations in Uganda
5.
   **Policy Development:** Inform nutrition policies for elderly care facilities

### Intended Users

- Healthcare providers and nutritionists
- Elderly care facilities and nursing homes
- Family caregivers
- Public health researchers
- NGOs working in elderly nutrition

### Out-of-Scope Use

- ❌ Not for children or adults under 60 years
- ❌ Not for acute medical conditions requiring immediate intervention
- ❌ Not a replacement for professional medical advice
- ❌ Not validated for use outside Uganda without regional calibration

---

## Performance

### Overall Metrics

| Metric | Training Set | Test Set |
|--------|-------------|----------|
| **R² Score** | 0.9309 | **0.6710** |
| **MAE (kcal/day)** | 1.29 | **2.84** |
| **RMSE (kcal/day)** | 1.65 | **3.60** |
| **Training Time** | 25.0 seconds | - |

### Model Ranking

Ranked among the 5 models evaluated (HistGradient Boosting, XGBoost, LightGBM, MLP, GNN):

- **Overall Rank:** 🥇 #1 out of 5
- **R² Rank:** 🥇 #1 (0.6710)
- **MAE Rank:** 🥇 #1 (2.84 kcal/day)
- **RMSE Rank:** 🥇 #1 (3.60 kcal/day)

### Baseline Comparison

| Metric | Baseline Model | This Model | Improvement |
|--------|---------------|------------|-------------|
| Test R² | 0.6311 | 0.6710 | **+6.3%** |
| Test MAE | 2.998 kcal/day | 2.842 kcal/day | **-5.2%** |

### Performance Characteristics

- **Generalization:** Test R² = 0.67, so the model explains about 67% of the variance in the held-out test set
- **Low prediction error:** Test MAE of 2.84 kcal/day, treated here as clinically acceptable
- **Moderate overfitting:** Train-test R² gap of 0.26 (0.9309 vs. 0.6710), manageable with regularization
- **Consistent predictions:** Test RMSE (3.60) is close to test MAE (2.84), which suggests few large outlier errors

---

## Training Data

### Dataset Overview

- **Dataset Name:** Uganda Elderly Nutrition Dataset (Enriched)
- **Total Samples:** 1,000
- **Training Samples:** 700 (70%)
- **Test Samples:** 300 (30%)
- **Split Method:** Random stratified split (seed=42)

### Features (18 total)

#### Nutritional Content (12 features)

- `Energy_kcal_per_serving` - Energy content per serving
-
  `Protein_g_per_serving` - Protein content (grams)
- `Fat_g_per_serving` - Fat content (grams)
- `Carbohydrates_g_per_serving` - Carbohydrate content (grams)
- `Fiber_g_per_serving` - Dietary fiber (grams)
- `Calcium_mg_per_serving` - Calcium content (milligrams)
- `Iron_mg_per_serving` - Iron content (milligrams)
- `Zinc_mg_per_serving` - Zinc content (milligrams)
- `VitaminA_µg_per_serving` - Vitamin A content (micrograms)
- `VitaminC_mg_per_serving` - Vitamin C content (milligrams)
- `Potassium_mg_per_serving` - Potassium content (milligrams)
- `Magnesium_mg_per_serving` - Magnesium content (milligrams)

#### Categorical Features (4 features)

- `region_encoded` - Geographic region in Uganda (4 regions)
- `condition_encoded` - Health condition (8 conditions)
- `age_group_encoded` - Age group (3 groups: 60-70, 70-80, 80+)
- `season_encoded` - Seasonal availability

#### Other Features (2 features)

- `portion_size_g` - Portion size in grams
- `estimated_cost_ugx` - Estimated cost in Ugandan Shillings

### Geographic Coverage

**4 Regions of Uganda:**

1. Central Uganda (Buganda)
2. Western Uganda (Ankole, Tooro, Kigezi, Bunyoro)
3. Eastern Uganda (Busoga, Bugisu, Teso)
4. Northern Uganda (Acholi, Lango, Karamoja, West Nile)

### Health Conditions Covered

**8 Common Elderly Conditions:**

1. Hypertension
2. Undernutrition
3. Anemia
4. Frailty
5. Digestive issues
6. Arthritis
7. Osteoporosis
8.
   Diabetes

### Age Groups

- **60-70 years:** Early elderly
- **70-80 years:** Mid elderly
- **80+ years:** Advanced elderly

### Target Variable

- **Name:** Daily Caloric Needs
- **Unit:** kcal/day
- **Range:** Typically 1,400 - 2,500 kcal/day
- **Distribution:** Approximately normal

---

## Training Details

### Hyperparameters (Optimized)

```python
{
    'n_estimators': 200,
    'max_depth': 4,
    'learning_rate': 0.05,
    'min_child_weight': 5,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'gamma': 0,
    'reg_alpha': 0,
    'reg_lambda': 1.5
}
```

### Training Configuration

- **Objective:** Regression (minimize squared error)
- **Evaluation Metric:** R² Score, MAE, RMSE
- **Validation Strategy:** 70-30 train-test split
- **Early Stopping:** Not used (fixed at 200 trees)
- **Feature Scaling:** StandardScaler applied to numeric features
- **Encoding:** Label encoding for categorical features

### Training Environment

- **Hardware:** CPU-based training
- **Training Time:** 25 seconds
- **Memory Usage:** <1 GB
- **Reproducibility:** Random seed = 42

---

## How to Use

### Installation

```bash
pip install xgboost==2.0.0 pandas numpy scikit-learn
```

### Loading the Model

```python
import pickle

import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

# Load model files.
# Only unpickle files obtained from a source you trust.
with open('xgboost_nutrition_model_20251103.pkl', 'rb') as f:
    model = pickle.load(f)

with open('xgboost_scaler_20251103.pkl', 'rb') as f:
    scaler = pickle.load(f)

with open('xgboost_label_encoders_20251103.pkl', 'rb') as f:
    label_encoders = pickle.load(f)

with open('xgboost_feature_names_20251103.pkl', 'rb') as f:
    feature_names = pickle.load(f)
```

### Making Predictions

```python
# Example input data
input_data = {
    'Energy_kcal_per_serving': 350,
    'Protein_g_per_serving': 15,
    'Fat_g_per_serving': 10,
    'Carbohydrates_g_per_serving': 45,
    'Fiber_g_per_serving': 5,
    'Calcium_mg_per_serving': 200,
    'Iron_mg_per_serving': 3,
    'Zinc_mg_per_serving': 2,
    'VitaminA_µg_per_serving': 500,
    'VitaminC_mg_per_serving': 20,
    'Potassium_mg_per_serving': 400,
    'Magnesium_mg_per_serving': 50,
    'region_encoded': 0,     # Central Uganda
    'condition_encoded': 0,  # Hypertension
    'age_group_encoded': 1,  # 70-80
    'season_encoded': 0,
    'portion_size_g': 250,
    'estimated_cost_ugx': 5000
}

# Convert to a single-row DataFrame
df = pd.DataFrame([input_data])

# Ensure the columns are in the order used during training
df = df[feature_names]

# Scale features (if the scaler expects it).
# The training configuration lists StandardScaler on numeric features, so the
# same transform may be required here. Check whether the scaler was fit on all
# features or only the numeric ones before enabling the line below; if it was
# fit on a DataFrame, scaler.feature_names_in_ lists the columns it expects.
# df_scaled = scaler.transform(df)

# Make prediction (pass df_scaled instead of df if scaling applies)
predicted_calories = model.predict(df)
print(f"Predicted daily caloric needs: {predicted_calories[0]:.2f} kcal/day")
```

### Using with the API

```python
import requests

url = "http://your-api-endpoint/predict"
data = {
    "data": {
        "Energy_kcal_per_serving": 350,
        "Protein_g_per_serving": 15,
        # ... other features
    }
}

response = requests.post(url, json=data)
response.raise_for_status()  # fail early on HTTP errors
result = response.json()
print(f"Predicted calories: {result['prediction']['caloric_needs']:.2f} kcal/day")
```

---

## Limitations and Biases

### Known Limitations

1. **Sample Size:**
   - Only 1,000 samples in total (700 used for training), which may not capture all population variability
   - Use caution when making predictions for rare scenarios

2. **Geographic Scope:**
   - Trained specifically on Ugandan population data
   - May not generalize well to other African countries or regions

3. **Moderate Overfitting:**
   - Train-test R² gap of 0.26 indicates some overfitting
   - Predictions should be validated against clinical guidelines

4. **Feature Dependencies:**
   - Requires accurate nutritional content data
   - Missing or incorrect features will degrade performance

5. **Temporal Validity:**
   - Trained on 2025 data
   - May need retraining as dietary patterns evolve

### Potential Biases

1. **Regional Representation:**
   - Regions may be unequally represented
   - Validate performance across all 4 regions

2. **Health Condition Bias:**
   - Some conditions may be over- or under-represented
   - Validate for less common conditions

3.
   **Socioeconomic Factors:**
   - Cost estimates may not reflect all economic situations
   - Consider local affordability in deployment

### Uncertainty Quantification

- **Typical Prediction Error:** ±2.84 kcal/day (test MAE)
- **Approximate 95% Interval:** ±5.7 kcal/day (2 × MAE, a rough heuristic); assuming approximately normal, unbiased errors, 1.96 × RMSE ≈ ±7.1 kcal/day is the more conventional estimate
- **Recommended Buffer:** Add 10% safety margin for meal planning

---

## Ethical Considerations

### Fairness and Equity

- Model covers all major regions of Uganda
- Includes diverse health conditions
- Considers affordability factors
- ⚠️ Ensure equal access to technology for model deployment

### Privacy

- Model trained on aggregated data (no personal identifiers)
- Predictions do not require storage of sensitive health information
- ⚠️ Implement proper data handling in deployment

### Safety

- ⚠️ **Critical:** Model outputs should be reviewed by qualified healthcare professionals
- ⚠️ Not suitable for emergency nutritional interventions
- ⚠️ Should complement, not replace, clinical judgment

### Transparency

- Open methodology and evaluation metrics
- Feature importance available for interpretation
- Model architecture and hyperparameters disclosed

---

## Model Interpretability

### Feature Importance (Top 10)

Based on XGBoost's built-in feature importance:

1. **Energy_kcal_per_serving** - Highest importance
2. **Protein_g_per_serving** - High importance
3. **Carbohydrates_g_per_serving** - High importance
4. **age_group_encoded** - Moderate importance
5. **condition_encoded** - Moderate importance
6. **portion_size_g** - Moderate importance
7. **Calcium_mg_per_serving** - Moderate importance
8. **Fat_g_per_serving** - Low-moderate importance
9. **region_encoded** - Low-moderate importance
10.
    **Fiber_g_per_serving** - Low importance

*Full feature importance analysis available in model artifacts*

### Explainability

- **SHAP Values:** Can be computed for individual predictions
- **Partial Dependence Plots:** Available for key features
- **Decision Rules:** XGBoost trees can be exported for inspection

---

## Comparison with Other Models

| Model | Test R² | Test MAE | Training Time | Rank |
|-------|---------|----------|---------------|------|
| **XGBoost (This Model)** | **0.6710** | **2.84** | 25.0s | 🥇 #1 |
| LightGBM | 0.6649 | 2.88 | 0.93s | 🥈 #2 |
| HistGradient Boosting | 0.5116 | 3.42 | 0.14s | 🥉 #3 |
| GNN v2 | 0.5100 | 3.42 | 5.2s | #4 |
| MLP | -0.3035 | 5.66 | 4.5s | #5 |

**Recommendation:** Use XGBoost for best accuracy; consider LightGBM when training speed matters (0.93s vs. 25.0s), at a small cost in accuracy.

---

## Updates and Maintenance

### Version History

- **v1.0_optimized (2025-11-03):** Initial release
  - Built from 1,000 samples (700 training, 300 test)
  - Hyperparameter optimization completed
  - Test R² = 0.6710

### Planned Improvements

1. **Data Collection:**
   - Expand dataset to 5,000+ samples
   - Include more seasonal variations
   - Add rural vs. urban distinctions

2. **Feature Engineering:**
   - Add BMI calculations
   - Include activity level metrics
   - Incorporate cultural food preferences

3. **Model Enhancements:**
   - Ensemble with LightGBM for improved accuracy
   - Implement SHAP-based explainability
   - Add prediction uncertainty intervals

4.
   **Validation:**
   - Clinical validation studies
   - Cross-regional performance assessment
   - Temporal validation (seasonal changes)

### Retraining Schedule

- **Recommended:** Every 6-12 months
- **Triggers:** New data availability, significant dietary changes, performance degradation

---

## Citation

If you use this model in your research or application, please cite:

```bibtex
@misc{uganda_elderly_nutrition_xgboost_2025,
  title={XGBoost Model for Elderly Nutrition Planning in Uganda},
  author={[Your Name/Organization]},
  year={2025},
  month={November},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/[your-username]/xgboost-elderly-nutrition-uganda}
}
```

---

## Additional Resources

### Related Links

- **Project Repository:** https://github.com/Shakiran-Nannyombi/Graph-Enhanced-LLMs-for-Locally-Sourced-Elderly-Nutrition-Planning-in-Uganda.git
- **API Documentation:** [API Docs Link]
- **Research Paper:** [Paper Link if available]
- **Dataset:** Shakiran/UgandanNutritionMealPlanning

### Model Artifacts

- `xgboost_nutrition_model_20251103.pkl` - Main XGBoost model
- `xgboost_scaler_20251103.pkl` - Feature scaler (StandardScaler)
- `xgboost_label_encoders_20251103.pkl` - Categorical encoders
- `xgboost_feature_names_20251103.pkl` - Feature name list
- `xgboost_model_metadata_20251103.json` - Complete metadata

### Support

For questions, issues, or contributions:

- **Issues:** https://github.com/Shakiran-Nannyombi/Graph-Enhanced-LLMs-for-Locally-Sourced-Elderly-Nutrition-Planning-in-Uganda.git
- **Email:** devkiran256@gmail.com

---

## License

This model is released under the **Apache License 2.0**.

- Commercial use allowed
- Modification allowed
- Distribution allowed
- Patent use allowed
- ⚠️ Must include license and copyright notice
- ⚠️ Must state significant changes

**Disclaimer:** This model is provided "as is" without warranty.
Users are responsible for validating the model's suitability for their specific use case and ensuring compliance with local healthcare regulations.

---

## Acknowledgments

### Data Sources and References

This model was developed using knowledge and data extracted from the following authoritative sources:

1. **Handbook_Eldernutr_FINAL.pdf**
   - Comprehensive handbook on elderly nutrition
   - Primary reference for nutritional requirements and guidelines

2. **WHO ICOPE Guidelines (icope.pdf)**
   - World Health Organization Integrated Care for Older People (ICOPE)
   - Framework for elderly healthcare and nutrition assessment

3. **Nutritional_Requirements_of_Older_People.pdf**
   - Detailed nutritional requirements for elderly populations
   - Evidence-based dietary recommendations

4. **TipSheet_21_HealthyEatingForOlderAdults.pdf**
   - Practical tips for healthy eating in older adults
   - Community-oriented nutrition guidance

5. **MSD Manual Professional Edition**
   - "Drug Categories of Concern in Older Adults - Geriatrics"
   - Clinical reference for medication-nutrition interactions

6. **MSD Manual Consumer Version**
   - "Aging and Medications - Older People's Health Issues"
   - Patient-friendly information on aging and health

7. **Uganda Nutrition Data (download.pdf)**
   - Uganda-specific nutritional data and food composition
   - Local context and dietary patterns

8.
   **Street Food Nutritional Analysis**
   - "Average energy and nutrient contents of typical street food dishes in Uganda (Kampala)"
   - Local food nutritional profiles for urban Uganda

### Institutional Support

- **Uganda Ministry of Health** - Nutrition guidelines and policy frameworks
- **World Health Organization (WHO)** - ICOPE framework and elderly care guidelines
- **MSD Manuals** - Clinical and consumer health information

### Technical Contributions

- **Open-source community:** XGBoost, scikit-learn, pandas, Python ecosystem
- **Healthcare professionals** who contributed domain expertise
- **Data scientists and researchers** in elderly nutrition and machine learning

### Regional Knowledge

- Local nutrition experts from Uganda's 4 major regions:
  - Central Uganda (Buganda)
  - Western Uganda (Ankole, Tooro, Kigezi, Bunyoro)
  - Eastern Uganda (Busoga, Bugisu, Teso)
  - Northern Uganda (Acholi, Lango, Karamoja, West Nile)

### Special Thanks

- Community health workers providing ground-level insights
- Elderly care facilities participating in data validation
- Nutrition researchers focusing on African elderly populations
- Open data initiatives promoting nutrition research in Uganda

---

**Last Updated:** November 4, 2025

**Model Version:** v1.0_optimized

**Status:** Production Ready
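
---

## Appendix: Checking the Reported Figures

The derived figures quoted in this card (the train-test R² gap, the relative changes against the baseline, and the error bands) follow from the reported metrics by simple arithmetic. The sketch below recomputes them in plain Python. The helper names `relative_change` and `normal_interval_halfwidth` are illustrative and are not part of the model artifacts, and the 1.96 × RMSE band is a conventional alternative to the 2 × MAE heuristic that assumes roughly normal, unbiased errors, which this card does not verify.

```python
# Reported metrics, copied from the Performance section of this card.
TRAIN_R2, TEST_R2 = 0.9309, 0.6710
TEST_MAE, TEST_RMSE = 2.842, 3.60          # kcal/day
BASELINE_R2, BASELINE_MAE = 0.6311, 2.998


def relative_change(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old` (hypothetical helper)."""
    return 100.0 * (new - old) / old


def normal_interval_halfwidth(rmse: float, z: float = 1.96) -> float:
    """Half-width of an approximate 95% band, assuming normal, unbiased errors."""
    return z * rmse


r2_gap = TRAIN_R2 - TEST_R2
r2_gain = relative_change(TEST_R2, BASELINE_R2)
mae_change = relative_change(TEST_MAE, BASELINE_MAE)
band = normal_interval_halfwidth(TEST_RMSE)

print(f"Train-test R2 gap:     {r2_gap:.2f}")
print(f"R2 vs. baseline:       {r2_gain:+.1f}%")
print(f"MAE vs. baseline:      {mae_change:+.1f}%")
print(f"Heuristic band 2*MAE:  +/-{2 * TEST_MAE:.1f} kcal/day")
print(f"Normal band 1.96*RMSE: +/-{band:.1f} kcal/day")
```

Keeping the raw metrics as named constants makes it easy to rerun the check after retraining: update the constants and confirm that the derived percentages quoted in the tables still match.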