ARED Group Inc
Atlanta, Georgia, United States
Henri Nyakarundi
CEO
3
Preferred learners
  • Anywhere
  • Academic experience
Categories
Computer science & IT, Machine learning, Artificial intelligence, Hardware
Skills
network troubleshooting, operating systems, anomaly detection, Zabbix, system testing, pattern recognition, predictive maintenance, adaptive learning, FreeRADIUS, resilience
Project scope
What is the main goal for this project?

The main objective of this project is to develop and implement a self-healing AI model for ARED's distributed edge gateway network, which is powered by GPUs and runs a Linux operating system built with the Yocto Project. The network supports applications that manage both hardware health and networking functions: Zabbix for health monitoring, CoovaChilli and FreeRADIUS for network management, Hostapd for access point management, and additional tools for log collection and analysis.

Problem Learners Will Be Solving:

Learners will tackle the challenge of ensuring the robustness, reliability, and scalability of ARED's edge infrastructure by creating an AI-driven system capable of identifying and automatically rectifying a wide array of operational issues. This encompasses detecting and addressing hardware malfunctions, software crashes, network connectivity issues, and performance bottlenecks, among other potential failures, without human intervention.

Expected Outcome by the End of the Project:

By the end of this project, learners are expected to achieve the following outcomes:

  1. Develop a Self-Healing AI Model: Create a sophisticated AI model that can analyze data from various sources within the edge infrastructure, detect anomalies or signs of impending failures, and initiate corrective actions autonomously.
  2. Integrate with Existing Systems: Seamlessly integrate this AI model with ARED's current edge monitoring and management tools, ensuring a unified approach to infrastructure health and performance management.
  3. Implement Automation for Self-Healing: Establish a comprehensive set of automated response mechanisms that the AI model can trigger to address detected issues, ranging from simple service restarts to complex configuration adjustments.
  4. Adaptive Learning and Improvement: Incorporate mechanisms for continuous learning and adaptation within the AI model, enabling it to refine its predictive accuracy and effectiveness in issue resolution over time based on outcomes and feedback.
  5. Operationalize the Self-Healing System: Successfully deploy the self-healing system across ARED's distributed edge gateway network, demonstrating its ability to minimize downtime, reduce manual troubleshooting efforts, and enhance the overall reliability and performance of the infrastructure.

This project aims to significantly advance ARED's operational capabilities, enabling the company to scale its infrastructure deployment more effectively and ensure high levels of service availability and reliability for its business customers.








What tasks will learners need to complete to achieve the project goal?

To successfully achieve the project goal of developing a self-healing AI system for ARED's distributed edge infrastructure, learners will need to complete the following tasks:


1. Objective Clarification and Scope Definition

- Understand and articulate the specific goals of the self-healing system.

- Identify the components, applications, and potential issues within the edge infrastructure that the system will address.


2. Data Collection and Preparation

- Aggregate historical data on system performance, including logs related to failures, errors, and normal operations from various applications like Zabbix, CoovaChilli, FreeRADIUS, and Hostapd.

- Clean, preprocess, and label the data to facilitate analysis and model training.

3. Model Selection and Training

- Review different machine learning models and select those best suited for anomaly detection, pattern recognition, and predictive maintenance tasks.

- Train the selected models using the prepared dataset, focusing on accurately identifying issues that could lead to system failures or performance degradation.
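As a starting point for the anomaly detection part of this task, a rolling z-score over a single metric can serve as a simple baseline before more capable models are trained. The sketch below is illustrative only: the class name `RollingZScoreDetector`, the window size, the threshold, and the sample CPU load values are assumptions, not part of ARED's stack.

```python
from collections import deque
from statistics import mean, stdev


class RollingZScoreDetector:
    """Flags a sample as anomalous when it lies far from the recent mean.

    Hypothetical baseline; window, threshold, and min_samples are
    assumed defaults that would need tuning on real gateway data.
    """

    def __init__(self, window=30, threshold=3.0, min_samples=5):
        self.history = deque(maxlen=window)  # recent normal samples
        self.threshold = threshold           # z-score cut-off
        self.min_samples = min_samples       # warm-up before flagging

    def observe(self, value):
        """Return True if value is anomalous relative to the window."""
        anomalous = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.threshold
            else:
                # A perfectly flat history: any change counts as a deviation.
                anomalous = value != mu
        if not anomalous:
            # Only normal samples extend the baseline, so one spike
            # does not widen the band for the next sample.
            self.history.append(value)
        return anomalous


if __name__ == "__main__":
    detector = RollingZScoreDetector()
    # Made-up CPU load samples, ending in a spike.
    for load in [0.50, 0.52, 0.49, 0.51, 0.50, 0.53, 0.48, 0.95]:
        print(load, detector.observe(load))
```

A baseline like this gives learners a reference point: a trained model is only worth deploying if it catches failures that the simple threshold misses.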


4. Integration with Monitoring Tools

- Integrate the trained AI models with existing infrastructure monitoring tools, ensuring real-time data analysis for anomaly detection.

- Develop a middleware layer if necessary to standardize and streamline data inputs from different sources to the AI models.


5. Development of Automation Scripts

- Create automation scripts or leverage existing automation tools to perform self-healing actions based on the AI model's outputs.

- Test the scripts in controlled environments to ensure they effectively address identified issues without unintended consequences.
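One possible shape for such an automation script is a lookup from a detected issue to a remediation command, with a dry-run mode for testing in a controlled environment. This is a minimal sketch under stated assumptions: the issue labels, the service unit names, and the use of `systemctl` are all hypothetical, since the actual init system and unit names depend on how the Yocto image is built.

```python
import subprocess

# Hypothetical mapping from an issue label to a remediation command.
# Labels, unit names, and the use of systemctl are assumptions.
REMEDIATIONS = {
    "hostapd_down": ["systemctl", "restart", "hostapd"],
    "freeradius_down": ["systemctl", "restart", "freeradius"],
    "chilli_down": ["systemctl", "restart", "chilli"],
}


def remediate(issue, dry_run=True, runner=subprocess.run):
    """Look up and optionally run the remediation for an issue label.

    Returns a dict describing what was done so the outcome can be logged.
    Unknown issues are escalated instead of guessed at.
    """
    command = REMEDIATIONS.get(issue)
    if command is None:
        return {"issue": issue, "action": None, "status": "escalate"}
    if dry_run:
        # Report the intended action without touching the system.
        return {"issue": issue, "action": command, "status": "dry_run"}
    result = runner(command, capture_output=True, text=True, timeout=30)
    status = "ok" if result.returncode == 0 else "failed"
    return {"issue": issue, "action": command, "status": status}


if __name__ == "__main__":
    print(remediate("hostapd_down"))  # dry run by default
    print(remediate("disk_full"))     # no known fix, so escalate
```

Defaulting to `dry_run=True` means a script under test can only report what it would do, which helps avoid unintended consequences on a live gateway.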


6. Implementation of Adaptive Learning

- Implement mechanisms for the AI models to learn from the outcomes of their actions, allowing for continuous improvement in their predictive accuracy and the effectiveness of self-healing actions.
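One simple way to learn from outcomes is to record whether each self-healing action resolved the issue and to prefer the action with the best track record. The sketch below assumes a hypothetical `ActionFeedback` class and made-up issue and action names; it uses a smoothed success rate rather than any particular learning algorithm prescribed by the project.

```python
from collections import defaultdict


class ActionFeedback:
    """Tracks how often each action resolved each issue (hypothetical helper)."""

    def __init__(self):
        # counts[(issue, action)] = [successes, attempts]
        self.counts = defaultdict(lambda: [0, 0])

    def record(self, issue, action, success):
        """Store the outcome of one remediation attempt."""
        entry = self.counts[(issue, action)]
        entry[0] += int(success)
        entry[1] += 1

    def success_rate(self, issue, action):
        """Smoothed success estimate; an untried action starts at 0.5."""
        successes, attempts = self.counts.get((issue, action), (0, 0))
        return (successes + 1) / (attempts + 2)

    def best_action(self, issue, candidates):
        """Pick the candidate with the highest estimated success rate."""
        return max(candidates, key=lambda a: self.success_rate(issue, a))


if __name__ == "__main__":
    feedback = ActionFeedback()
    feedback.record("wifi_down", "restart_hostapd", False)
    feedback.record("wifi_down", "restart_hostapd", False)
    feedback.record("wifi_down", "reload_config", True)
    print(feedback.best_action("wifi_down", ["restart_hostapd", "reload_config"]))
```

The smoothing keeps an action with one lucky success from being ranked above an action with a long, mostly successful history.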


7. System Testing and Validation

- Conduct comprehensive testing of the self-healing system, including scenario-based testing for various types of failures and performance issues.

- Validate the system's effectiveness in real-world conditions, ensuring it meets the project objectives.


8. Deployment and Monitoring

- Deploy the self-healing system across the edge infrastructure, monitoring its performance and impact on system reliability and availability.

- Gradually expand the deployment, adjusting the system based on feedback and observed results.


9. Documentation and Knowledge Sharing

- Document the design, implementation, and operational procedures of the self-healing system comprehensively.

- Share knowledge and insights gained from the project with the broader team, enabling them to understand, maintain, and further develop the system.


By completing these tasks, learners will not only contribute to enhancing the resilience and efficiency of ARED's edge infrastructure but also gain valuable experience in applying AI and machine learning techniques to real-world operational challenges.

How will you support learners in completing the project?

Within the resources available, we will support learners in developing the self-healing AI model for ARED's distributed edge infrastructure through the following program:


### Support and Mentorship Program:


**Project Guidance and Oversight:**

- Although we may not have an in-house AI specialist, we will provide detailed project guidelines and structured milestones to help learners navigate the project. This includes clear objectives, expected outcomes, and step-by-step tasks.


**Data Accessibility:**

- Ensure learners have access to anonymized datasets necessary for training and validating their AI models. This includes system logs, performance metrics, and historical data on system behavior.

- Create a data repository on a cloud platform where learners can easily download and upload data as required for their project tasks.


**Communication and Collaboration Tools:**

- Utilize Slack for daily communication, discussions, and troubleshooting among learners and project coordinators. Set up dedicated channels for project-related topics to keep conversations organized.

- Manage tasks with a tool such as Jira or Trello, where learners can track their progress, manage tasks, and coordinate effectively with teammates.


**Check-Ins and Progress Tracking:**

- Schedule bi-weekly check-ins via video calls where learners can present their progress, discuss challenges, and receive guidance on next steps from project coordinators.

- Use the task management tool (Jira or Trello) to track progress on tasks and milestones, ensuring learners stay on schedule and any roadblocks are addressed promptly.


**Showcase and Feedback:**

- Plan a project showcase at the end of the program where learners can present their completed self-healing AI model and the strategies implemented for the edge infrastructure. This session will be conducted via a virtual meeting platform.

- Collect feedback from all participants to understand the learning experience, challenges faced, and areas for improvement in future projects.


Supported causes
Sustainable cities and communities
About the company

ARED is a distributed infrastructure-as-a-service company that combines Wi-Fi, storage, and computing services into one solution to help bridge the digital gap in developing countries.