
...

Week	Task	Status	Comments
20-May	Study work: state of the art on models, optimization, and evaluation	Done	Survey optimization techniques and how anonymization models are evaluated.
27-May	Finalize the dataset and the libraries to use (suppression, renaming, etc.)	Done	Kubernetes logs/metrics, OpenStack logs/metrics, or any other data that contains PII
3-June	Anonymization Impact on the Model's utility	Done
10-June	Anonymization Impact on the Model's utility	Done
17-June	Containerization and the APIs	Done
24-June	Automation using Python	Done
1-July	Testing of the containerized Architecture	Done
8-July	NLP Model for anonymizing Telco Data
15-July
22-July
29-July
5-Aug	Evaluation of the Model
12-Aug	Integration of the developed model with the architecture
19-Aug	Documentation and release of the code.
26-Aug	[BUFFER]

...

Metrics like precision, recall, and F1-score can be used to assess how well the method identifies sensitive information.
- I also plan to compare the model's anonymization against the reference annotations given in https://github.com/anonymous-NLP/anonymisation/blob/main/aggregated_annotations.pdf, so that the model's performance is validated against an existing annotation.
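As a rough illustration of how precision, recall, and F1-score could be computed against reference annotations, here is a minimal sketch. It assumes that both the reference and the model output are available as sets of `(start, end, label)` spans and that only exact matches count. The function name `span_prf` and the sample spans are hypothetical and are not taken from the linked repository.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans for one log line: character offsets plus entity label.
gold = {(0, 10, "PERSON"), (25, 40, "IP"), (50, 62, "EMAIL")}
pred = {(0, 10, "PERSON"), (25, 40, "IP"), (70, 75, "PERSON")}

p, r, f1 = span_prf(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Exact matching is the strictest choice. A partial-overlap criterion would be more forgiving when the model masks only part of a sensitive span, so the matching rule should be fixed before results are compared.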
However, the impact of anonymization on downstream models requires domain-specific evaluation. The approaches I will follow are:
1. Compare model performance: Train and test models on both the original and the anonymized data, and measure the drop in accuracy.
2. Evaluate information loss: Measure how much relevant information is lost due to anonymization.
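The two approaches above can be sketched as follows. This is only a toy illustration under stated assumptions: `anonymize` is a hypothetical stand-in that masks IPv4 addresses, the classifier is a simple token-vote model rather than the project's actual model, the log lines are invented, and information loss is approximated as the fraction of whitespace tokens changed (which assumes masking preserves the token count).

```python
import re
from collections import Counter, defaultdict


def anonymize(line):
    # Hypothetical anonymizer: replaces IPv4 addresses with a placeholder.
    return re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", "<IP>", line)


def train(lines, labels):
    # Count how often each token occurs under each label.
    counts = defaultdict(Counter)
    for line, label in zip(lines, labels):
        for token in line.split():
            counts[token][label] += 1
    return counts


def predict(counts, line, default="ok"):
    # Each known token votes for the labels it was seen with.
    votes = Counter()
    for token in line.split():
        votes.update(counts.get(token, {}))
    return votes.most_common(1)[0][0] if votes else default


def accuracy(counts, lines, labels):
    hits = sum(predict(counts, line) == label for line, label in zip(lines, labels))
    return hits / len(lines)


def token_loss(original, anonymized):
    # Fraction of tokens altered; assumes both lines have the same token count.
    orig, anon = original.split(), anonymized.split()
    return sum(o != a for o, a in zip(orig, anon)) / len(orig)


# Invented log lines for illustration only.
train_lines = ["pod restart failed on 10.0.0.5", "pod started on 10.0.0.6",
               "volume attach failed on 10.0.0.7", "volume attached on 10.0.0.8"]
train_labels = ["error", "ok", "error", "ok"]
test_lines = ["pod restart failed on 10.0.0.9", "volume attached on 10.0.0.9"]
test_labels = ["error", "ok"]

# Approach 1: same model, trained and tested on original vs anonymized data.
acc_orig = accuracy(train(train_lines, train_labels), test_lines, test_labels)
anon_train = [anonymize(line) for line in train_lines]
anon_test = [anonymize(line) for line in test_lines]
acc_anon = accuracy(train(anon_train, train_labels), anon_test, test_labels)
drop = acc_orig - acc_anon

# Approach 2: average share of tokens changed by anonymization.
loss = sum(token_loss(o, a) for o, a in zip(train_lines, anon_train)) / len(train_lines)
print(f"accuracy drop={drop:.2f}, mean token loss={loss:.2f}")
```

In practice the toy classifier would be replaced by the model under study, and the token-level measure by whatever information-loss metric suits the data (for example, one weighted by how relevant each field is to the task).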

The work is being updated on my personal page to avoid exposing work that is still in progress.
