Datathon 2025 Scoring

Scoring isn't always easy. ;-)

Evaluating performance, team cohesion, and skill retention transparently.

Overview

We evaluate teams across two challenges (C1 and C2). Scores reflect both the quality of deliverables and the ability to maintain team capacity despite dropouts.

The final score combines a Base Score, a Team Cohesion Multiplier, and a Skill Retention (Capacity) Multiplier.

Base Scoring Rubric

Challenge #1: Dataset Creation & Annotation (50 points total)

| Variable | Max | Definition |
| --- | --- | --- |
| C1_Relevance_Variety | 10 | Dataset includes relevant antisemitic and non-antisemitic posts; diverse sources (hashtags, keywords, user groups). |
| C1_Annotation_Schema | 10 | Correct application of IHRA-WDA or a justified adaptation; schema maps clearly to the labels used. |
| C1_Internal_Consistency | 10 | Labels applied consistently with minimal misclassification; shows internal review. |
| C1_Data_Report_Quality | 10 | Dataset report includes keyword/time range, label definitions, label distribution, and methodology. |
| C1_Nuance_Reflection | 10 | Report addresses challenges, limitations, ambiguity, and, if present, social/ethical implications. |
| C1_Bonus_IAA | 10* | Bonus: formal inter-annotator agreement (Cohen's Kappa, Krippendorff's Alpha) reported and interpreted. |

* Bonus points may push a team above 50 for C1 before the final multipliers.

Challenge #2: Modeling & Evaluation (50 points total)

| Variable | Max | Definition |
| --- | --- | --- |
| C2_Model_Performance | 15 | Precision, recall, and F1-score reported; confusion matrix included. |
| C2_Use_Gold_Dataset | 10 | Used the provided gold-standard dataset correctly (no data leakage, correct splits). |
| C2_Training_Pipeline | 10 | Training process documented (hyperparameters, train/val/test split, reproducibility). |
| C2_Error_Analysis | 10 | Identifies error patterns; gives 3–5 FP/FN examples with reasoning. |
| C2_Documentation | 5 | Code is clear, well-structured, and reproducible (e.g., runnable Colab/README). |
| C2_Bonus_UnseenData | 10* | Bonus: model tested on new, manually annotated unseen data; performance reported and reflected on. |

* Bonus points may push a team above 50 for C2 before the final multipliers.

Variables

| Variable | Meaning |
| --- | --- |
| Initial_Team_Size | Members at the start of the competition. |
| Remaining_Team_Members | Members who completed the competition. |
| Dropouts | Members who left. |
| Initial_Coding_Experience | Initial count of members with prior coding experience. |
| Remaining_Coding_Experience | Remaining count of members with coding experience. |
| Initial_Antisemitism_Knowledge | Initial count of members with prior antisemitism knowledge. |
| Remaining_Antisemitism_Knowledge | Remaining count of members with antisemitism knowledge. |
| Total_Score | Sum of all C1 and C2 scoring components. |
| Final_Score | Final score after the cohesion and capacity multipliers (see formulas). |

Note: Antisemitism and coding knowledge are inferred from application responses.

Formulas

1) Base Score

$$ \text{Total\_Score} = \sum \text{C1\_items} + \sum \text{C2\_items} $$

In other words: We just add up all points from Challenge 1 and Challenge 2 (including bonuses).

2) Team Cohesion Multiplier

Rewards teams that retained members through to submission.

$$ \text{Cohesion\_Multiplier} = 1 + \alpha \times \left(\frac{\text{Remaining\_Team\_Members}}{\text{Initial\_Team\_Size}} - 1\right) $$

In other words: If everyone remains, the ratio is 1 and the multiplier equals 1; if members leave, the multiplier shrinks in proportion to the share of teammates lost. We use \(\alpha = 1.0\) by default.
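As a quick sketch in Python (the function name and signature are illustrative, not part of any official scoring script):

```python
def cohesion_multiplier(initial_team_size, remaining_team_members, alpha=1.0):
    """Team Cohesion Multiplier: 1 + alpha * (remaining/initial - 1).

    Equals 1.0 when nobody drops out and shrinks in proportion to the
    share of teammates lost.
    """
    return 1 + alpha * (remaining_team_members / initial_team_size - 1)

print(cohesion_multiplier(5, 5))  # no dropouts -> 1.0
print(cohesion_multiplier(5, 3))  # two of five dropped out -> 0.6
```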

3) Skill Retention (Capacity) Multiplier

$$ \text{Capacity\_Multiplier} = 0.5 \times \frac{\text{Remaining\_Coding\_Experience}}{\text{Initial\_Coding\_Experience}} + 0.5 \times \frac{\text{Remaining\_Antisemitism\_Knowledge}}{\text{Initial\_Antisemitism\_Knowledge}} $$

In other words: Average two retention fractions, how much coding skill the team kept and how much prior knowledge of antisemitism the team kept, weighted equally (50/50).
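A matching sketch for the capacity term (again, the function name is illustrative):

```python
def capacity_multiplier(initial_coding, remaining_coding,
                        initial_knowledge, remaining_knowledge):
    """Skill Retention Multiplier: equal-weight (50/50) average of the
    coding-experience and antisemitism-knowledge retention fractions."""
    return (0.5 * remaining_coding / initial_coding
            + 0.5 * remaining_knowledge / initial_knowledge)

print(capacity_multiplier(3, 2, 3, 1))  # 0.5*(2/3) + 0.5*(1/3) = 0.5
```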

4) Final Score

$$ \text{Final\_Score} = \text{Total\_Score} \times \text{Cohesion\_Multiplier} \times \text{Capacity\_Multiplier} $$

In other words: We take the base score and scale it by the two multipliers. Teams that stayed intact and kept their skills keep more of their points.
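Combining the pieces, a minimal sketch (assuming the two multipliers have already been computed from the formulas above):

```python
def final_score(total_score, cohesion_multiplier, capacity_multiplier):
    """Final_Score = Total_Score x Cohesion_Multiplier x Capacity_Multiplier."""
    return total_score * cohesion_multiplier * capacity_multiplier

print(final_score(115, 0.6, 0.5))  # 115 * 0.6 * 0.5 = 34.5
```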


Worked Example

Here is a fictional example of how the evaluation scores are computed.

Inputs:

  • Total_Score = 115
  • Initial_Team_Size = 5
  • Remaining_Team_Members = 3 (2 dropouts)
  • Initial_Coding_Experience = 3, Remaining_Coding_Experience = 2
  • Initial_Antisemitism_Knowledge = 3, Remaining_Antisemitism_Knowledge = 1
  • α = 1.0

Step 1 — Cohesion Multiplier
\(\text{Cohesion\_Multiplier} = 1 + \left(\frac{3}{5} - 1\right) = 0.6\)

Step 2 — Capacity Multiplier
\(\text{Capacity\_Multiplier} = 0.5 \times \frac{2}{3} + 0.5 \times \frac{1}{3} = \frac{1}{3} + \frac{1}{6} = 0.5\)

Step 3 — Final Score
\(\text{Final\_Score} = 115 \times 0.6 \times 0.5 = 34.5\)

Interpretation:
While the team produced strong deliverables, a 40% reduction in team size and the loss of topical and technical expertise reduced their final ranking.
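The worked example above can be reproduced end to end. This sketch uses an illustrative helper name (`score_team`) and hard-codes the fictional inputs:

```python
def score_team(total_score, initial_size, remaining_members,
               initial_coding, remaining_coding,
               initial_knowledge, remaining_knowledge, alpha=1.0):
    """Apply the cohesion and capacity multipliers to the base score."""
    cohesion = 1 + alpha * (remaining_members / initial_size - 1)
    capacity = (0.5 * remaining_coding / initial_coding
                + 0.5 * remaining_knowledge / initial_knowledge)
    return total_score * cohesion * capacity

# Fictional inputs from the worked example.
final = score_team(115, initial_size=5, remaining_members=3,
                   initial_coding=3, remaining_coding=2,
                   initial_knowledge=3, remaining_knowledge=1)
print(round(final, 1))  # 34.5
```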