  1. Survey is completed.
  2. Testbed is assigned - Pod12-Jump
  3. Framework: Acumos (too many issues).
  4. Problem Domain - Failure Prediction
  5. Clear Definition of Failure Prediction - Ongoing.
  6. Existing Failure Prediction (FP) models - ARIMA or RNN - used to deploy and test.
  7. Enhancement to Existing works on FP - Not yet started
  8. Data Gathering: (Important*)
    1. Publicly Available: Searching...
    2. Collecting from existing testbeds: WIP


Sl. No. | Topic | Presenter | Notes

1. Framework Deployment Status

Acumos - Container/K8S based approach.

Vanilla deployment - Failed to deploy with both approaches (with and without cluster deployment).

  1. Work on Acumos on Pod18 - Existing Cluster - Girish
  2. Work on Other framework on Pod12-Jump - Rohit. Decision on 'other' framework by EoW.
2. Survey - Implementation details - Status

Completed 

https://docs.google.com/spreadsheets/d/15XRdrWvbSCPsg1zZ9PfT9yvnElq21AvB/edit#gid=971676644

3. Model Deployment Status

Waiting for the Framework to be UP - to run on the testbed.

Currently running locally - Google Colab (Jupyter Notebooks).

Data: CPU consumption.

Failure: VM.
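
As a rough illustration of the kind of experiment described above (CPU consumption as input, VM failure as the target), the sketch below fits a first-order autoregressive model to a CPU series and raises a flag when the forecast crosses a threshold. This is not the team's notebook: a plain AR(1) fit stands in for ARIMA/RNN, and the function names, the 90% threshold, the 5-step horizon, and the sample values are all assumptions made for illustration.

```python
from statistics import mean


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x_prev = series[:-1]
    x_next = series[1:]
    mean_prev, mean_next = mean(x_prev), mean(x_next)
    var = sum((p - mean_prev) ** 2 for p in x_prev)
    if var == 0:
        # Constant history: fall back to forecasting the mean.
        return mean_next, 0.0
    cov = sum((p - mean_prev) * (n - mean_next) for p, n in zip(x_prev, x_next))
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


def forecast(series, steps):
    """Iterate the fitted AR(1) recurrence `steps` samples ahead."""
    c, phi = fit_ar1(series)
    last = series[-1]
    predictions = []
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions


def predict_failure(series, steps=5, threshold=90.0):
    """Flag a likely failure if any forecast value reaches the threshold.

    Both `steps` and `threshold` are hypothetical tuning knobs.
    """
    return any(value >= threshold for value in forecast(series, steps))


# Illustrative (made-up) CPU utilisation samples, in percent.
rising_cpu = [50.0 + 5 * i for i in range(8)]
steady_cpu = [20.0] * 10
print(predict_failure(rising_cpu), predict_failure(steady_cpu))
```

A threshold on forecast CPU is only a proxy for VM failure; a labelled failure dataset would be needed to validate it.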

4. Publicly Available Data

To be added by Girish/Rohit:


5. Failure Prediction Definition - Status

Existing works:

  1. Mostly VM and Application Failures.
  2. Failure - Crash and Connectivity

Gaps:

  1. Hardware, Containers
  2. Other failure types aren't considered

How to collect Data:

Take advantage of chaos engineering projects - Litmus, Pumba, Blockade, etc.
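
One way data gathered from fault-injection runs could be turned into a training set is to label each metric sample by whether a fault was injected shortly afterwards. The sketch below shows that labelling step only; it does not call Litmus, Pumba, or Blockade, and the function name, the timestamp format (seconds), and the horizon value are assumptions for illustration.

```python
def label_samples(timestamps, fault_times, horizon):
    """Return 1 for samples taken within `horizon` seconds before a fault, else 0.

    `timestamps` are metric sample times and `fault_times` are the times at
    which a fault was injected (both hypothetical, in seconds).
    """
    labels = []
    for t in timestamps:
        precedes_fault = any(0 <= fault - t <= horizon for fault in fault_times)
        labels.append(int(precedes_fault))
    return labels


# Samples every 10 s; one fault injected at t = 35 s; 15 s prediction horizon.
print(label_samples([0, 10, 20, 30, 40], [35], 15))
```

The horizon sets how far ahead the model is asked to predict; samples after the fault are labelled 0 here, though in practice they might be dropped until the system recovers.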