
Table 2. Linkage metrics for each classifier per pair of data sets: TP (true positives), FP (false positives), FN (false negatives), TN (true negatives), and f-score

From: Application of data linkage techniques to Pacific Northwest commercial fishing injury and fatality data

Commercial Fishing Incident Database & Oregon Trauma Registry
Match Parameters: Incident Date, Incident State
Combinations (2966 * 11): 32,626
Golden Matches: 5
Classifier                            Threshold  TP  FP  FN  TN      f-score
Expectation/Conditional Maximization  0.5        4   6   1   32,615  0.53
Support vector machine                0.5        0   0   5   32,621  0
Naïve-Bayes                           0.005      5   29  0   32,592  0.26
Logistic regression                   0.005      5   29  0   32,592  0.26
Commercial Fishing Incident Database & Vessel Casualty
Match Parameters: Incident Date, Vessel Official Number, Latitude/Longitude
Combinations (1315 * 524): 689,060
Golden Matches: 9
Classifier                            Threshold  TP  FP  FN  TN       f-score
Expectation/Conditional Maximization  0.5        9   3   0   689,048  0.86
Support vector machine                0.5        8   0   1   689,051  0.94
Naïve-Bayes                           0.005      9   3   0   689,048  0.86
Logistic regression                   0.005      9   7   0   689,044  0.72
Commercial Fishing Incident Database & Nonfatal Injuries
Match Parameters: Incident Date, Vessel Official Number, Latitude/Longitude
Combinations (2966 * 232): 688,112
Golden Matches: 12
Classifier                            Threshold  TP  FP  FN  TN       f-score
Expectation/Conditional Maximization  0.5        12  52  0   688,048  0.32
Support vector machine                0.5        0   0   12  688,100  0
Naïve-Bayes                           0.005      12  52  0   688,048  0.32
Logistic regression                   0.005      12  52  0   688,048  0.32
Nonfatal Injuries & Vessel Casualty
Match Parameters: Incident Date, Vessel Official Number, Latitude/Longitude
Combinations (232 * 524): 121,568
Golden Matches: 10
Classifier                            Threshold  TP  FP  FN  TN       f-score
Expectation/Conditional Maximization  0.5        10  13  0   121,545  0.61
Support vector machine                0.5        9   1   1   121,557  0.90
Naïve-Bayes                           0.01       10  2   0   121,556  0.91
Logistic regression                   0.01       10  13  0   121,545  0.61
Nonfatal Injuries & Oregon Trauma Registry
Match Parameters: Incident Date, Incident State
Combinations (232 * 11): 2552
Golden Matches: 4
Classifier                            Threshold  TP  FP  FN  TN    f-score
Expectation/Conditional Maximization  0.2        4   3   0   2545  0.73
Support vector machine                0.5        0   0   4   2548  0
Naïve-Bayes                           0.005      4   3   0   2545  0.73
Logistic regression                   0.005      4   7   0   2541  0.53
Vessel Casualty & Oregon Trauma Registry
Match Parameters: Incident Date, Incident State
Combinations (524 * 11): 5764
Golden Matches: 0
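
The f-score column can be reproduced from the TP, FP, and FN counts in each row. The sketch below assumes the balanced F1 definition, 2*TP / (2*TP + FP + FN); the table does not state the formula, but the reported values are consistent with it. The function names f_score and total_pairs are illustrative and do not come from the source. The rows used are those of the Commercial Fishing Incident Database & Oregon Trauma Registry block.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F1 from confusion counts (assumed definition, see lead-in).

    Returns 0.0 when the denominator is zero or there are no true positives.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def total_pairs(tp: int, fp: int, fn: int, tn: int) -> int:
    """All four cells should sum to the number of candidate record pairs."""
    return tp + fp + fn + tn


# (TP, FP, FN, TN) copied from the first block of the table.
rows = {
    "Expectation/Conditional Maximization": (4, 6, 1, 32_615),
    "Support vector machine": (0, 0, 5, 32_621),
    "Naive-Bayes": (5, 29, 0, 32_592),
    "Logistic regression": (5, 29, 0, 32_592),
}

# Candidate pairs are the full cross product of the two data sets.
combinations = 2966 * 11

for name, (tp, fp, fn, tn) in rows.items():
    # Sanity check: the confusion matrix must account for every pair.
    assert total_pairs(tp, fp, fn, tn) == combinations
    print(f"{name}: f-score = {f_score(tp, fp, fn):.2f}")
```

Because true negatives do not enter the formula, the very large TN counts have no effect on the score; a classifier that links nothing, as the support vector machine does in this block, scores 0 regardless of how many non-matches it correctly rejects.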