AquaAware: A Two-Phase Learning System for Water Conservation
Published:
AquaAware is a project designed to reduce water consumption in homes by providing real-time recommendations for various tasks like brushing teeth, washing hands, showering, and washing dishes. The system employs a two-phase machine learning algorithm to first train users on best practices and then further optimize their personal habits.
Phase 1: Supervised Learning for Best Practices
Initially, the system uses a dataset containing optimal water consumption levels and time durations for common household tasks.
- Algorithm: A Decision Tree classifier is trained on this “best practice” data.
- Functionality: As the user consumes water, the device records the usage and time. The Decision Tree predicts the ongoing task and provides real-time recommendations to help the user align their habits with water-saving standards.
- Goal: The primary objective of this phase is to guide the user towards adopting more sustainable habits and reducing initial water wastage.
Phase 2: Unsupervised Learning for Personalized Optimization
Once the user has consistently adopted the recommended best practices, the system transitions to a personalized optimization phase.
- Data Collection: The system begins to store the user’s new, more efficient water usage patterns in a new dataset, but without task labels.
- Unsupervised Learning: Since the device cannot automatically classify these new patterns, the K-Means clustering algorithm is used. Knowing the number of tasks beforehand, K-Means groups the unlabeled data into distinct clusters, effectively re-identifying the tasks based on the user’s unique behavior.
- Retraining: Once the new data is labeled through clustering, a new Decision Tree model is trained on this personalized dataset.
- Goal: This second model provides even more refined predictions and recommendations, pushing the user to consume only what is strictly necessary, thereby minimizing waste beyond the standard best practices.
Design Choices
- Decision Tree was selected as the supervised learning algorithm due to its high accuracy and significantly faster response times compared to alternatives like K-Nearest Neighbors (KNN).
- K-Means was chosen for the unsupervised learning task because the number of clusters (i.e., the number of distinct household tasks) was known in advance, making it a highly efficient choice.