ARED Group Inc
Atlanta, Georgia, United States
Henri Nyakarundi
CEO
Preferred learners
  • Anywhere
  • Academic experience
Categories
Computer science & IT, Machine learning, Artificial intelligence, Hardware
Project scope
What is the main goal for this project?

The main objective of this project is to develop and implement a self-healing AI model for ARED's distributed edge gateway network, which is powered by GPUs and runs a Linux operating system built with the Yocto Project. This network supports a range of applications essential for managing both the health of the hardware and various networking functions, including Zabbix for health monitoring, CoovaChilli and FreeRADIUS for network management, Hostapd for access point management, and additional tools for log collection and analysis.

Problem Learners Will Be Solving:

Learners will tackle the challenge of ensuring the robustness, reliability, and scalability of ARED's edge infrastructure by creating an AI-driven system capable of identifying and automatically rectifying a wide array of operational issues. This encompasses detecting and addressing hardware malfunctions, software crashes, network connectivity issues, and performance bottlenecks, among other potential failures, without human intervention.

Expected Outcome by the End of the Project:

By the end of this project, learners are expected to achieve the following outcomes:

  1. Develop a Self-Healing AI Model: Create a sophisticated AI model that can analyze data from various sources within the edge infrastructure, detect anomalies or signs of impending failures, and initiate corrective actions autonomously.
  2. Integrate with Existing Systems: Seamlessly integrate this AI model with ARED's current edge monitoring and management tools, ensuring a unified approach to infrastructure health and performance management.
  3. Implement Automation for Self-Healing: Establish a comprehensive set of automated response mechanisms that the AI model can trigger to address detected issues, ranging from simple service restarts to complex configuration adjustments.
  4. Adaptive Learning and Improvement: Incorporate mechanisms for continuous learning and adaptation within the AI model, enabling it to refine its predictive accuracy and effectiveness in issue resolution over time based on outcomes and feedback.
  5. Operationalize the Self-Healing System: Successfully deploy the self-healing system across ARED's distributed edge gateway network, demonstrating its ability to minimize downtime, reduce manual troubleshooting efforts, and enhance the overall reliability and performance of the infrastructure.

This project aims to significantly advance ARED's operational capabilities, enabling the company to scale its infrastructure deployment more effectively and ensure high levels of service availability and reliability for its business customers.

What tasks will learners need to complete to achieve the project goal?

To successfully achieve the project goal of developing a self-healing AI system for ARED's distributed edge infrastructure, learners will need to complete the following tasks:


1. Objective Clarification and Scope Definition

- Understand and articulate the specific goals of the self-healing system.

- Identify the components, applications, and potential issues within the edge infrastructure that the system will address.


2. Data Collection and Preparation

- Aggregate historical data on system performance, including logs related to failures, errors, and normal operations from various applications like Zabbix, CoovaChilli, FreeRADIUS, and Hostapd.

- Clean, preprocess, and label the data to facilitate analysis and model training.
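
As a rough illustration of the cleaning and labeling step, the sketch below parses syslog-style lines into structured records and applies a crude keyword label. The line format, the `LogRecord` type, and the keyword list are assumptions made for this example; the actual log formats of Zabbix, CoovaChilli, FreeRADIUS, and Hostapd on ARED's gateways may differ, and real labels would more likely come from incident records.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed line format: "<timestamp> <service>[pid]: <message>" (hypothetical).
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<service>[\w.-]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)

# Illustrative keywords only; not derived from ARED's data.
FAILURE_KEYWORDS = ("error", "failed", "timeout", "segfault")


@dataclass
class LogRecord:
    timestamp: str
    service: str
    message: str
    is_failure: bool


def parse_line(line: str) -> Optional[LogRecord]:
    """Parse one log line; return None if it does not match the assumed format."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    message = match.group("msg")
    lowered = message.lower()
    return LogRecord(
        timestamp=match.group("ts"),
        service=match.group("service"),
        message=message,
        is_failure=any(keyword in lowered for keyword in FAILURE_KEYWORDS),
    )
```

Lines that do not match are returned as `None` rather than guessed at, so malformed input can be counted and inspected separately.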

3. Model Selection and Training

- Review different machine learning models and select those best suited for anomaly detection, pattern recognition, and predictive maintenance tasks.

- Train the selected models using the prepared dataset, focusing on accurately identifying issues that could lead to system failures or performance degradation.
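
One simple baseline that learners might compare candidate models against is a rolling z-score detector over a single metric. The sketch below is only such a baseline, not the model the project prescribes; the class name, window size, minimum history, and threshold are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class RollingZScoreDetector:
    """Flag values that deviate strongly from a window of recent normal values."""

    def __init__(self, window: int = 30, threshold: float = 3.0, min_history: int = 5):
        self.values = deque(maxlen=window)
        self.threshold = threshold
        self.min_history = min_history

    def update(self, value: float) -> bool:
        """Return True if `value` looks anomalous relative to recent history."""
        is_anomaly = False
        if len(self.values) >= self.min_history:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            # With zero spread no z-score is defined, so nothing is flagged.
            if sigma > 0:
                is_anomaly = abs(value - mu) / sigma > self.threshold
        # Only normal values enter the window, so an outage does not
        # become the new baseline.
        if not is_anomaly:
            self.values.append(value)
        return is_anomaly
```

A detector like this needs no labeled data, which makes it a convenient reference point before training heavier models on the prepared dataset.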


4. Integration with Monitoring Tools

- Integrate the trained AI models with existing infrastructure monitoring tools, ensuring real-time data analysis for anomaly detection.

- Develop a middleware layer if necessary to standardize and streamline data inputs from different sources to the AI models.
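
A middleware layer of this kind can be sketched as a set of small adapters that convert each source's payload into one common record. The `Event` schema, the adapter functions, and the input dictionary shapes below are hypothetical; they do not reflect the actual output formats of Zabbix or FreeRADIUS.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Event:
    """Common record passed to the AI models (hypothetical schema)."""
    source: str
    host: str
    metric: str
    value: float


def from_zabbix(raw: Dict[str, Any]) -> Event:
    # Assumed input shape: {"host": str, "key": str, "value": str or number}
    return Event("zabbix", raw["host"], raw["key"], float(raw["value"]))


def from_freeradius(raw: Dict[str, Any]) -> Event:
    # Assumed input shape: {"nas": str, "auth_failures": int}
    return Event("freeradius", raw["nas"], "auth_failures", float(raw["auth_failures"]))


ADAPTERS: Dict[str, Callable[[Dict[str, Any]], Event]] = {
    "zabbix": from_zabbix,
    "freeradius": from_freeradius,
}


def normalize(source: str, raw: Dict[str, Any]) -> Event:
    """Convert a source-specific payload into the common Event record."""
    adapter = ADAPTERS.get(source)
    if adapter is None:
        raise ValueError(f"no adapter registered for source {source!r}")
    return adapter(raw)
```

Adding a new data source then means registering one more adapter, while the models keep consuming a single record type.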


5. Development of Automation Scripts

- Create automation scripts or leverage existing automation tools to perform self-healing actions based on the AI model's outputs.

- Test the scripts in controlled environments to ensure they effectively address identified issues without unintended consequences.
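
To make controlled testing easier, a self-healing action can default to a dry run that reports the command it would execute. The sketch below assumes the gateway image uses systemd and that the units are named `hostapd` and `freeradius`; both are assumptions, since a Yocto-built image may use a different init system or unit names. The issue labels are likewise hypothetical.

```python
import subprocess
from typing import Dict, List

# Hypothetical mapping from detected issue to remediation command.
# Unit names are assumptions and may differ on the actual image.
REMEDIATIONS: Dict[str, List[str]] = {
    "hostapd_down": ["systemctl", "restart", "hostapd"],
    "radius_down": ["systemctl", "restart", "freeradius"],
}


def remediate(issue: str, dry_run: bool = True) -> List[str]:
    """Return the command for `issue`, executing it only when dry_run is False."""
    command = REMEDIATIONS.get(issue)
    if command is None:
        raise KeyError(f"no remediation defined for issue {issue!r}")
    if not dry_run:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failed restart is not silently treated as a fix.
        subprocess.run(command, check=True, timeout=30)
    return list(command)
```

Keeping the mapping explicit limits the model to a reviewed set of actions, which reduces the risk of unintended consequences during early testing.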


6. Implementation of Adaptive Learning

- Implement mechanisms for the AI models to learn from the outcomes of their actions, allowing for continuous improvement in their predictive accuracy and the effectiveness of self-healing actions.
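
One very simple form of learning from outcomes is to record whether each action resolved the issue and prefer the action with the best smoothed success rate. This is a minimal sketch under that assumption, not the adaptive mechanism the project requires; the class, issue labels, and action names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class ActionStats:
    """Track how often each action resolved each issue type."""

    def __init__(self) -> None:
        # (issue, action) -> [successes, attempts]
        self._counts: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])

    def record(self, issue: str, action: str, success: bool) -> None:
        stats = self._counts[(issue, action)]
        stats[1] += 1
        if success:
            stats[0] += 1

    def success_rate(self, issue: str, action: str) -> float:
        successes, attempts = self._counts.get((issue, action), (0, 0))
        # Add-one smoothing: untried actions start at 0.5 instead of 0 or 1.
        return (successes + 1) / (attempts + 2)

    def best_action(self, issue: str, candidates: Sequence[str]) -> str:
        """Pick the candidate with the highest smoothed success rate."""
        return max(candidates, key=lambda action: self.success_rate(issue, action))
```

For example, after two failed `restart_hostapd` attempts and one successful `reboot` for the same issue, `best_action` would prefer `reboot`. A production system would likely also need exploration and safeguards against disruptive actions.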


7. System Testing and Validation

- Conduct comprehensive testing of the self-healing system, including scenario-based testing for various types of failures and performance issues.

- Validate the system's effectiveness in real-world conditions, ensuring it meets the project objectives.


8. Deployment and Monitoring

- Deploy the self-healing system across the edge infrastructure, monitoring its performance and impact on system reliability and availability.

- Gradually expand the deployment, adjusting the system based on feedback and observed results.


9. Documentation and Knowledge Sharing

- Document the design, implementation, and operational procedures of the self-healing system comprehensively.

- Share knowledge and insights gained from the project with the broader team, enabling them to understand, maintain, and further develop the system.


By completing these tasks, learners will not only contribute to enhancing the resilience and efficiency of ARED's edge infrastructure but also gain valuable experience in applying AI and machine learning techniques to real-world operational challenges.

How will you support learners in completing the project?

To help learners complete the project with the resources available to us, we will provide the following support:


### Support and Mentorship Program:


**Project Guidance and Oversight:**

- Although we may not have an in-house AI specialist, we will provide detailed project guidelines and structured milestones to help learners navigate the project. This includes clear objectives, expected outcomes, and step-by-step tasks.


**Data Accessibility:**

- Ensure learners have access to anonymized datasets necessary for training and validating their AI models. This includes system logs, performance metrics, and historical data on system behavior.

- Create a data repository on a cloud platform where learners can easily download and upload data as required for their project tasks.


**Communication and Collaboration Tools:**

- Utilize Slack for daily communication, discussions, and troubleshooting among learners and project coordinators. Set up dedicated channels for project-related topics to keep conversations organized.

- Manage tasks with a tracking tool such as Jira or Trello, where learners can track their progress and coordinate effectively with teammates.


**Check-Ins and Progress Tracking:**

- Schedule bi-weekly check-ins via video calls where learners can present their progress, discuss challenges, and receive guidance on next steps from project coordinators.

- Use the task-tracking tool (Jira or Trello) to follow progress on tasks and milestones, ensuring learners stay on schedule and any roadblocks are addressed promptly.


**Showcase and Feedback:**

- Plan a project showcase at the end of the program where learners can present their completed self-healing AI model and the strategies implemented for the edge infrastructure. This session will be conducted via a virtual meeting platform.

- Collect feedback from all participants to understand the learning experience, challenges faced, and areas for improvement in future projects.


What skills or technologies will help learners to complete the project?

To be successful in this project, learners will benefit from a combination of specialized skills and knowledge in various technologies. Here’s a list of key competencies:

Skills:

  • Machine Learning and AI: Proficiency in machine learning algorithms, especially those used for anomaly detection, predictive maintenance, and time-series analysis. Understanding of deep learning frameworks like TensorFlow or PyTorch is crucial.
  • Data Analysis and Preprocessing: Ability to perform data cleaning, preprocessing, and feature engineering to prepare datasets for training AI models.
  • Programming: Strong programming skills in languages like Python, which is commonly used for data science and machine learning projects.
  • DevOps and Automation: Knowledge of automation tools and scripts to implement self-healing actions. Familiarity with DevOps practices for continuous integration and deployment (CI/CD) is beneficial.
  • System Monitoring and Logging: Understanding of system monitoring tools like Zabbix and logging mechanisms to collect and analyze data for the AI model.

Technologies:

  • Yocto Project: Familiarity with the Yocto Project to understand the operating system running on the edge devices.
  • Containerization and Virtualization: Experience with Docker, Kubernetes, or similar technologies for deploying and managing applications in isolated environments.
  • Networking Tools: Knowledge of networking management tools such as CoovaChilli, FreeRADIUS, and Hostapd.
  • AI and ML Libraries: Proficiency in using AI and machine learning libraries (e.g., scikit-learn, Keras) and deep learning frameworks.
  • Data Visualization and Dashboarding: Familiarity with data visualization tools (e.g., Grafana, Kibana) for creating dashboards to monitor AI model performance and system health.
  • Cloud Computing Platforms: Understanding of cloud services and architectures, especially in relation to deploying and managing edge computing solutions.
  • Database Management: Knowledge of database technologies for storing and managing application data securely and efficiently.

Soft Skills:

  • Problem-Solving: Ability to tackle complex challenges and devise effective solutions.
  • Critical Thinking: Skill in analyzing situations, data, and feedback to improve system performance and reliability.
  • Collaboration: Working effectively with team members, sharing insights, and collaborating on solutions.
  • Adaptability: Being open to learning new technologies and adapting to evolving project requirements.

Combining these skills and knowledge areas will empower learners to contribute significantly to the project's success, ensuring the development and deployment of a robust self-healing system for ARED's distributed edge infrastructure.

Supported causes
Sustainable cities and communities
About the company

ARED is a distributed infrastructure-as-a-service company that combines Wi-Fi, storage, and computing services into one solution to help bridge the digital gap in developing countries.