...
Week | Task | Status | Comments |
20-May | Study work: state of the art on models, optimization, and evaluation | Done | Survey optimization techniques and how anonymization models are evaluated. |
27-May | Finalizing the dataset and the libraries to use (suppression, renaming, etc.) | Done | Kubernetes logs/metrics, OpenStack logs/metrics, or any other data that contains PII |
3-June | Anonymization Impact on the Model's utility | Done | |
10-June | | Done | |
17-June | Containerization and the APIs | Done | |
24-June | Automation using Python | Done | |
1-July | Testing of the containerized Architecture | Done | |
8-July | NLP Model for anonymizing Telco Data | ||
15-July | |||
22-July | |||
29-July | |||
5-Aug | Evaluation of the Model | ||
12-Aug | Integration of the developed model with the architecture | ||
19-Aug | Documentation and release of the code. | ||
26-Aug | [BUFFER] | | |
...
- Metrics like precision, recall, and F1-score can be used to assess how well the method identifies sensitive information.
- https://github.com/anonymous-NLP/anonymisation/blob/main/aggregated_annotations.pdf I also considered comparing our anonymization output against the annotations provided there, so that the model's performance can be validated against a reference.
- However, assessing the impact on downstream models requires domain-specific evaluation. The approaches I plan to follow are:
  - Compare model performance: Train and test models on original and anonymized data to see the accuracy drop.
  - Evaluate information loss: Measure how much relevant information is lost due to anonymization.
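As a rough illustration of the two kinds of evaluation listed above, here is a minimal sketch in plain Python. It assumes sensitive items are represented as `(start, end, label)` character spans and are matched exactly, which is only one possible convention. The names `Span`, `span_prf`, and `relative_utility_drop` are hypothetical helpers, and the example spans are made up for illustration, not project data.

```python
from typing import Set, Tuple

# Assumed representation of a sensitive span: (start offset, end offset, label).
Span = Tuple[int, int, str]


def span_prf(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 of predicted PII spans against gold spans."""
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def relative_utility_drop(score_original: float, score_anonymized: float) -> float:
    """Fraction of a downstream score lost when the model uses anonymized data."""
    if score_original == 0:
        raise ValueError("score_original must be non-zero")
    return (score_original - score_anonymized) / score_original


# Illustrative spans only: two of the three predictions match the reference.
gold = {(0, 5, "PERSON"), (10, 22, "IP"), (30, 41, "EMAIL")}
pred = {(0, 5, "PERSON"), (10, 22, "IP"), (50, 55, "PERSON")}

precision, recall, f1 = span_prf(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For anonymization, recall is usually the number to watch, since a missed span means sensitive data leaks through. `relative_utility_drop` would take whatever downstream score is chosen (accuracy, for example), measured once on the original data and once on the anonymized data.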
Anonymization Impact on the Model's utility
The work is being updated on the personal page to avoid exposing work in progress.