IRUEX Leaderboard
Welcome to the IRUEX Leaderboard!
This platform evaluates large language models based on Iran's University Entrance Exam subjects.
Explore the IRUEX Dataset on GitHub.
Model | Average ⬆️ | Math | Chemistry | Physics | Arabic | English | Religion | Persian Literature
---|---|---|---|---|---|---|---|---
gemini_2.0_flash | 76.51 | 81.18 | 79.69 | 93.68 | 77 | 94 | 62 | 48
deepseek_chat | 70.05 | 76.32 | 69.58 | 87.47 | 60 | 96 | 56 | 45
GPT_4o | 66.15 | 52.22 | 62.06 | 79.76 | 68 | 96 | 66 | 39
LLaMA3.1_405B | 57.98 | 41.47 | 49.07 | 72.32 | 61 | 94 | 46 | 42
GPT_4o_mini | 55.57 | 53.18 | 49.3 | 71.54 | 52 | 88 | 42 | 33
Qwen2_72B | 54.73 | 45.14 | 45.85 | 64.13 | 55 | 93 | 43 | 37
GPT_4 | 52.57 | 30.91 | 43.94 | 51.17 | 61 | 95 | 48 | 38
LLaMA3.1_70B | 51.54 | 41.37 | 47.97 | 61.46 | 50 | 87 | 46 | 27
Gemma2_27B | 47.36 | 27.07 | 40.66 | 53.77 | 50 | 88 | 40 | 32
LLaMA3_70B | 45.24 | 31.06 | 37.47 | 51.12 | 49 | 87 | 39 | 22
Mixtral_8x22B | 42.64 | 29.99 | 32.33 | 43.16 | 44 | 83 | 44 | 22
Gemma2_9B | 38.89 | 21.28 | 31.61 | 47.33 | 41 | 81 | 32 | 18
GPT_3.5 | 35.69 | 25.45 | 29.02 | 38.35 | 30 | 72 | 24 | 31
LLaMA3.1_8B | 29.96 | 11.34 | 16.76 | 25.62 | 27 | 78 | 26 | 25
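The Average column appears to be the unweighted mean of the seven subject scores, rounded to two decimals. A minimal sanity check in Python (scores copied from the top two rows of the table; the rounding convention is an assumption):

```python
# Check that Average == mean of the seven per-subject scores.
# Values are taken from the leaderboard table above; two-decimal
# rounding is assumed from the displayed precision.
scores = {
    "gemini_2.0_flash": [81.18, 79.69, 93.68, 77, 94, 62, 48],  # Average 76.51
    "deepseek_chat":    [76.32, 69.58, 87.47, 60, 96, 56, 45],  # Average 70.05
}

for model, subject_scores in scores.items():
    avg = round(sum(subject_scores) / len(subject_scores), 2)
    print(f"{model}: {avg}")
# → gemini_2.0_flash: 76.51
# → deepseek_chat: 70.05
```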
Evaluation Process
We assess models across various subjects, including Math, Chemistry, Physics, Arabic, English, Religion, and Persian Literature. Each model's performance is measured using accuracy metrics specific to each subject.
Reproducibility
To reproduce our results, execute the following commands:
# Example command to run evaluations
python evaluate_model.py --model_name your_model_name --task Math --num_fewshot 0
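To cover all seven subjects, the example command can be generated once per task. A sketch in Python, assuming `evaluate_model.py` exists and accepts the flags shown in the example above:

```python
# Build (and optionally run) one evaluation command per subject.
# Assumes evaluate_model.py accepts --model_name, --task, and
# --num_fewshot, as in the example command above.
import subprocess

SUBJECTS = ["Math", "Chemistry", "Physics", "Arabic", "English",
            "Religion", "Persian Literature"]

def build_command(model_name, task, num_fewshot=0):
    """Return the argv list for a single-subject evaluation run."""
    return ["python", "evaluate_model.py",
            "--model_name", model_name,
            "--task", task,
            "--num_fewshot", str(num_fewshot)]

for task in SUBJECTS:
    cmd = build_command("your_model_name", task)
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually execute
```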
✉️✨ Submit your model here!