LLM Probabilities, Training Size, and Perturbation Thresholds in Entity Recognition
Authors:
(1) Anthi Papadopoulou, Language Technology Group, University of Oslo, Gaustadalleen 23B, 0373 Oslo, Norway and Corresponding author ([email protected]);
(2) Pierre Lison, Norwegian Computing Center, Gaustadalleen 23A, 0373 Oslo, Norway;
(3) Mark Anderson, Norwegian Computing Center, Gaustadalleen 23A, 0373 Oslo, Norway;
(4) Lilja Øvrelid, Language Technology Group, University of Oslo, Gaustadalleen 23B, 0373 Oslo, Norway;
(5) Ildiko Pilan, Language Technology Group, University of Oslo, Gaustadalleen 23B, 0373 Oslo, Norway.
Table of Links
Abstract and 1 Introduction
2 Background
2.1 Definitions
2.2 NLP Approaches
2.3 Privacy-Preserving Data Publishing
2.4 Differential Privacy
3 Datasets and 3.1 Text Anonymization Benchmark (TAB)
3.2 Wikipedia Biographies
4 Privacy-oriented Entity Recognizer
4.1 Wikidata Properties
4.2 Silver Corpus and Model Fine-tuning
4.3 Evaluation
4.4 Label Disagreement
4.5 MISC Semantic Type
5 Privacy Risk Indicators
5.1 LLM Probabilities
5.2 Span Classification
5.3 Perturbations
5.4 Sequence Labelling and 5.5 Web Search
6 Analysis of Privacy Risk Indicators and 6.1 Evaluation Metrics
6.2 Experimental Results and 6.3 Discussion
6.4 Combination of Risk Indicators
7 Conclusions and Future Work
Declarations
References
Appendices
A. Human properties from Wikidata
B. Training parameters of entity recognizer
C. Label Agreement
D. LLM probabilities: base models
E. Training size and performance
F. Perturbation thresholds
A Human properties from Wikidata
The two tables below list the Wikidata properties selected in Section 4.1 to constitute the DEM and MISC gazetteers.
DEM-related properties
MISC-related properties
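As an illustration of how such gazetteers can be populated, the sketch below retrieves the values of one Wikidata property via the public SPARQL endpoint. The property used (P106, occupation) and the result limit are assumptions for illustration only, not necessarily entries from the tables above.

```python
# Minimal sketch: fetching the values of one Wikidata property to
# populate a gazetteer. Assumes the SPARQLWrapper package; P106
# (occupation) is an example property, not necessarily one of the
# properties listed in the tables above.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://query.wikidata.org/sparql",
                         agent="gazetteer-builder/0.1")

endpoint.setQuery("""
SELECT DISTINCT ?valueLabel WHERE {
  ?person wdt:P31 wd:Q5 ;        # instance of: human
          wdt:P106 ?value .      # example property: occupation
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 1000
""")
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()

# Collect the English labels into a gazetteer set.
gazetteer = {row["valueLabel"]["value"]
             for row in results["results"]["bindings"]}
print(len(gazetteer), "gazetteer entries")
```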
B Training parameters of entity recognizer
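The fine-tuning setup of Section 4.2 can be sketched as follows; the base checkpoint, label count, and every hyperparameter value shown are placeholders for illustration, not the settings reported in this appendix.

```python
# Minimal sketch of setting up token-classification fine-tuning as in
# Section 4.2. All values below are placeholders, NOT the paper's
# actual training parameters.
from transformers import AutoModelForTokenClassification, TrainingArguments

model = AutoModelForTokenClassification.from_pretrained(
    "roberta-base", num_labels=9)   # checkpoint and label count assumed

args = TrainingArguments(
    output_dir="entity-recognizer",
    learning_rate=2e-5,              # placeholder
    per_device_train_batch_size=16,  # placeholder
    num_train_epochs=3,              # placeholder
    weight_decay=0.01,               # placeholder
)
# Trainer(model=model, args=args, train_dataset=...) would complete the
# fine-tuning loop given a token-classification dataset.
```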
C Label Agreement
Frequently confused label pairs (see Section 4.4) are shown in Figure 4.
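A minimal sketch of how such confusion counts can be derived from gold and predicted labels (the label values here are illustrative, not the paper's tag set):

```python
# Count frequently confused label pairs from aligned gold/predicted
# sequences. Labels below are illustrative placeholders.
from collections import Counter

gold = ["PERSON", "DEM", "LOC", "MISC", "DEM"]   # placeholder labels
pred = ["PERSON", "MISC", "LOC", "DEM", "MISC"]  # placeholder labels

confusions = Counter((g, p) for g, p in zip(gold, pred) if g != p)
for (g, p), n in confusions.most_common():
    print(f"{g} -> {p}: {n}")
```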
D LLM probabilities: base models
Table 11 lists the (ordered) base models that the AutoGluon tabular predictor employs for the LLM-probability-based approach of Section 5.1.
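For reference, a minimal sketch of fitting such a predictor, assuming the LLM-probability features have already been extracted to CSV files; the file names and column names are hypothetical:

```python
# Minimal sketch of fitting an AutoGluon tabular predictor on
# LLM-probability features (Section 5.1). The file names and the
# "is_high_risk" label column are hypothetical, not the paper's
# actual feature set.
import pandas as pd
from autogluon.tabular import TabularPredictor

train_df = pd.read_csv("train_features.csv")  # hypothetical file
test_df = pd.read_csv("test_features.csv")    # hypothetical file

predictor = TabularPredictor(label="is_high_risk", eval_metric="f1")
predictor.fit(train_df)

# The leaderboard shows the fitted base models, ordered by score.
print(predictor.leaderboard(test_df))
```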
E Training size and performance
Figure 5 shows the F1 score of both the Tabular and the Multimodal AutoGluon predictors (LLM probabilities, Section 5.1, and span classification, Section 5.2, respectively) at different training sizes on both datasets. For each training dataset split, we use random samples ranging from 1% to 100% of the data.
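The learning-curve experiment can be sketched as follows; the file names and label column are hypothetical, as in the sketch of Appendix D:

```python
# Minimal sketch of the learning-curve experiment in Appendix E: fit an
# AutoGluon tabular predictor on increasing random fractions of the
# training split and record F1 on held-out data. File names and the
# "is_high_risk" label column are hypothetical.
import pandas as pd
from autogluon.tabular import TabularPredictor

train_df = pd.read_csv("train_features.csv")  # hypothetical file
test_df = pd.read_csv("test_features.csv")    # hypothetical file

scores = {}
for frac in [0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 1.00]:
    sample = train_df.sample(frac=frac, random_state=42)
    predictor = TabularPredictor(label="is_high_risk", eval_metric="f1")
    predictor.fit(sample)
    scores[frac] = predictor.evaluate(test_df)["f1"]

for frac, f1 in scores.items():
    print(f"{frac:>5.0%} of training data -> F1 = {f1:.3f}")
```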
F Perturbation thresholds
Figure 6 shows the performance of different perturbation thresholds on the training split of both datasets, with the black line indicating the threshold used for evaluation in Section 5.3.
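Such a threshold sweep can be sketched as follows; the per-span perturbation scores and gold labels below are randomly generated stand-ins for the quantities computed in Section 5.3.

```python
# Minimal sketch of a perturbation-threshold sweep (Appendix F): given
# a perturbation score per span (e.g., the probability drop when the
# span is perturbed, Section 5.3), sweep candidate thresholds on the
# training split and keep the F1-maximizing one. `scores` and `gold`
# are synthetic placeholders, not the paper's data.
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
scores = rng.random(1000)                                      # toy scores
gold = (scores + rng.normal(0, 0.3, 1000) > 0.5).astype(int)   # toy labels

thresholds = np.linspace(0.0, 1.0, 101)
f1s = [f1_score(gold, (scores >= t).astype(int)) for t in thresholds]

best = thresholds[int(np.argmax(f1s))]
print(f"Best threshold on training split: {best:.2f} (F1 = {max(f1s):.3f})")
```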