Datathon 2025 Scoring

Scoring isn't always easy. ;-)

Evaluating performance, team cohesion, and skill retention transparently.

Overview

We evaluate teams across two challenges (C1 and C2). Scores reflect both the quality of deliverables and the ability to maintain team capacity despite dropouts.

The final score combines a Base Score, a Team Cohesion Multiplier, and a Skill Retention (Capacity) Multiplier.

Base Scoring Rubric

Challenge #1: Dataset Creation & Annotation (50 points total)

| Variable | Max | Definition |
| --- | --- | --- |
| C1_Relevance_Variety | 10 | Dataset includes relevant antisemitic and non-antisemitic posts; diverse sources (hashtags, keywords, user groups). |
| C1_Annotation_Schema | 10 | Correct application of IHRA-WDA or a justified adaptation; schema maps clearly to the labels used. |
| C1_Internal_Consistency | 10 | Labels applied consistently with minimal misclassification; shows internal review. |
| C1_Data_Report_Quality | 10 | Dataset report includes keyword/time range, label definitions, label distribution, and methodology. |
| C1_Nuance_Reflection | 10 | Report addresses challenges, limitations, ambiguity, and, if present, social/ethical implications. |
| C1_Bonus_IAA | 10* | Bonus: formal inter-annotator agreement (Cohen's Kappa, Krippendorff's Alpha) reported and interpreted. |

* Bonus points may push a team above 50 for C1 before the final multipliers.

Challenge #2: Modeling & Evaluation (50 points total)

| Variable | Max | Definition |
| --- | --- | --- |
| C2_Model_Performance | 15 | Precision, recall, and F1-score reported; confusion matrix included. |
| C2_Use_Gold_Dataset | 10 | Used the provided gold-standard dataset correctly (no data leakage, correct splits). |
| C2_Training_Pipeline | 10 | Training process documented (hyperparameters, train/val/test split, reproducibility). |
| C2_Error_Analysis | 10 | Identifies error patterns; gives 3–5 FP/FN examples with reasoning. |
| C2_Documentation | 5 | Code is clear, well-structured, and reproducible (e.g., runnable Colab/README). |
| C2_Bonus_UnseenData | 10* | Bonus: model tested on new, manually annotated unseen data; performance reported and reflected on. |

* Bonus points may push a team above 50 for C2 before the final multipliers.

Variables

| Variable | Meaning |
| --- | --- |
| Initial_Team_Size | Members at the start of the competition. |
| Remaining_Team_Members | Members who completed the competition. |
| Dropouts | Members who left. |
| Initial_Coding_Experience | Initial count of members with prior coding experience. |
| Remaining_Coding_Experience | Remaining count of members with coding experience. |
| Initial_Antisemitism_Knowledge | Initial count of members with prior antisemitism knowledge. |
| Remaining_Antisemitism_Knowledge | Remaining count of members with antisemitism knowledge. |
| Total_Score | Sum of all C1 and C2 scoring components. |
| Final_Score | Final score after the cohesion and capacity multipliers (see formulas). |

Note: Antisemitism and coding knowledge are inferred from application responses.

Formulas

1) Base Score

$$ \text{Total\_Score} = \sum \text{C1\_items} + \sum \text{C2\_items} $$

In other words: We just add up all points from Challenge 1 and Challenge 2 (including bonuses).

2) Team Cohesion Multiplier

Rewards teams that retained members through to submission.

$$ \text{Cohesion\_Multiplier} = 1 + \alpha \times \left(\frac{\text{Remaining\_Team\_Members}}{\text{Initial\_Team\_Size}} - 1\right) $$

In other words: If everyone remains, the ratio is 1 and the multiplier equals 1; if members leave, the multiplier shrinks in proportion to the share of teammates lost. We use \(\alpha = 1.0\) by default.
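As a quick sketch in Python (the function name and signature are illustrative, not part of any official scoring script):

```python
def cohesion_multiplier(initial_team_size, remaining_team_members, alpha=1.0):
    """Team Cohesion Multiplier: 1 + alpha * (remaining/initial - 1).

    Equals 1.0 when nobody drops out and shrinks in proportion to the
    share of teammates lost.
    """
    return 1 + alpha * (remaining_team_members / initial_team_size - 1)

print(cohesion_multiplier(5, 5))  # no dropouts -> 1.0
print(cohesion_multiplier(5, 3))  # two of five dropped out -> 0.6
```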

3) Skill Retention (Capacity) Multiplier

$$ \text{Capacity\_Multiplier} = 0.5 \times \frac{\text{Remaining\_Coding\_Experience}}{\text{Initial\_Coding\_Experience}} + 0.5 \times \frac{\text{Remaining\_Antisemitism\_Knowledge}}{\text{Initial\_Antisemitism\_Knowledge}} $$

In other words: Average two retention fractions, how much coding skill the team kept and how much prior knowledge of antisemitism the team kept, weighted equally (50/50).
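A matching sketch for the capacity term (again, the function name is illustrative):

```python
def capacity_multiplier(initial_coding, remaining_coding,
                        initial_knowledge, remaining_knowledge):
    """Skill Retention Multiplier: equal-weight (50/50) average of the
    coding-experience and antisemitism-knowledge retention fractions."""
    return (0.5 * remaining_coding / initial_coding
            + 0.5 * remaining_knowledge / initial_knowledge)

print(capacity_multiplier(3, 2, 3, 1))  # 0.5*(2/3) + 0.5*(1/3) = 0.5
```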

4) Final Score

$$ \text{Final\_Score} = \text{Total\_Score} \times \text{Cohesion\_Multiplier} \times \text{Capacity\_Multiplier} $$

In other words: We take the base score and scale it by the two multipliers. Teams that stayed intact and kept their skills keep more of their points.
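Combining the pieces, a minimal sketch (assuming the two multipliers have already been computed from the formulas above):

```python
def final_score(total_score, cohesion_multiplier, capacity_multiplier):
    """Final_Score = Total_Score x Cohesion_Multiplier x Capacity_Multiplier."""
    return total_score * cohesion_multiplier * capacity_multiplier

print(final_score(115, 0.6, 0.5))  # 115 * 0.6 * 0.5 = 34.5
```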


Worked Example

Here is a fictional example of how the evaluation scores are computed.

Inputs:

  • Total_Score = 115
  • Initial_Team_Size = 5
  • Remaining_Team_Members = 3 (2 dropouts)
  • Initial_Coding_Experience = 3, Remaining_Coding_Experience = 2
  • Initial_Antisemitism_Knowledge = 3, Remaining_Antisemitism_Knowledge = 1
  • α = 1.0

Step 1 — Cohesion Multiplier
\(\text{Cohesion\_Multiplier} = 1 + \left(\frac{3}{5} - 1\right) = 0.6\)

Step 2 — Capacity Multiplier
\(\text{Capacity\_Multiplier} = 0.5 \times \frac{2}{3} + 0.5 \times \frac{1}{3} = \frac{1}{3} + \frac{1}{6} = 0.5\)

Step 3 — Final Score
\(\text{Final\_Score} = 115 \times 0.6 \times 0.5 = 34.5\)

Interpretation:
While the team produced strong deliverables, a 40% reduction in team size and the loss of topical and technical expertise reduced their final ranking.
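The worked example above can be reproduced end to end. This sketch uses an illustrative helper name (`score_team`) and hard-codes the fictional inputs:

```python
def score_team(total_score, initial_size, remaining_members,
               initial_coding, remaining_coding,
               initial_knowledge, remaining_knowledge, alpha=1.0):
    """Apply the cohesion and capacity multipliers to the base score."""
    cohesion = 1 + alpha * (remaining_members / initial_size - 1)
    capacity = (0.5 * remaining_coding / initial_coding
                + 0.5 * remaining_knowledge / initial_knowledge)
    return total_score * cohesion * capacity

# Fictional inputs from the worked example.
final = score_team(115, initial_size=5, remaining_members=3,
                   initial_coding=3, remaining_coding=2,
                   initial_knowledge=3, remaining_knowledge=1)
print(round(final, 1))  # 34.5
```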