
Deploying a clinical LLM requires more than acing closed-ended medical QA exams. The model must be safe, ethical, comprehensive in its responses, and capable of reasoning through complex medical tasks. The MEDIC framework aims to provide a transparent, multi-faceted evaluation of LLM performance across clinically relevant dimensions.
Disclaimer: This evaluation is purely academic and exploratory. The models assessed here have not been approved for clinical use, and their results should not be interpreted as clinically validated. The leaderboard serves as a platform for researchers to compare models, understand their strengths and limitations, and drive further advances in clinical NLP.
T | Score | 95% CI | Judge Score | 95% CI | Clinical | Available on Hub | Model SHA | Model Type | Weight Type | Hub License | Hub ❤️ | #Params (B) | Upload Date
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
🟦 🏥 | 2124 | +18/-20 | 8.95 | +0.02/-0.01 | false | false | ? | instruction-tuned | Original | cc-by-nc-4.0 | 1149 | 122.61 | 2024-10-25 07:09:19+00:00 |
⭕ | 2124 | +18/-20 | 8.95 | +0.02/-0.01 | false | true | ? | instruction-tuned | Original | llama3.1 | 1149 | 70.55 | 2024-10-25 07:09:19+00:00 | |
🟦 🏥 | 1907 | +28/-21 | 8.8 | +0.01/-0.01 | true | true | ? | preference-tuned | Original | llama3 | 34 | 70.55 | 2024-10-24 06:24:59+00:00 | |
⭕ | 1880 | +21/-18 | 8.75 | +0.02/-0.02 | false | true | ? | instruction-tuned | Original | other | 1980 | 685 | 2024-10-22 23:04:13+00:00 | |
⭕ | 1878 | +27/-18 | 8.72 | +0.02/-0.02 | false | true | ? | instruction-tuned | Original | cc-by-nc-4.0 | 66 | 32.3 | 2024-10-25 07:13:05+00:00 | |
🟦 | 1862 | +26/-20 | 8.67 | +0.02/-0.02 | false | true | ? | preference-tuned | Original | llama3.1 | 616 | 70.55 | 2024-10-24 06:24:17+00:00 | |
🟦 | 1842 | +23/-18 | 8.65 | +0.02/-0.02 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-20 10:32:48+00:00 | |
🟦 | 1842 | +21/-20 | 8.64 | +0.02/-0.02 | false | true | ? | preference-tuned | Original | llama3 | 1417 | 70.55 | 2024-10-24 13:25:47+00:00 | |
🟦 | 1792 | +22/-19 | 8.42 | +0.03/-0.03 | false | true | ? | preference-tuned | Original | llama3.1 | 2845 | 8.03 | 2024-07-24 14:33:56+00:00 | |
🟦 | 1771 | +23/-19 | 8.4 | +0.03/-0.03 | false | true | ? | preference-tuned | Original | llama3.2 | 402 | 3.21 | 2024-10-24 06:23:04+00:00 | |
⭕ | 1741 | +27/-24 | 8.49 | +0.02/-0.02 | false | true | ? | instruction-tuned | Original | null | 0 | -1 | 2025-01-17 12:10:32+00:00 | |
🟦 | 1675 | +27/-21 | 8.34 | +0.03/-0.03 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:07:42+00:00 | |
🟦 | 1669 | +27/-30 | 8.26 | +0.04/-0.03 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:08:05+00:00 | |
🟦 | 1661 | +27/-24 | 8.24 | +0.03/-0.03 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:06:51+00:00 | |
⭕ | 1654 | +24/-22 | 8.34 | +0.02/-0.03 | false | true | ? | instruction-tuned | Original | other | 808 | 122.61 | 2024-11-25 11:27:40+00:00 | |
🟦 | 1637 | +33/-26 | 8.15 | +0.03/-0.02 | false | true | ? | preference-tuned | Original | mit | 110 | 9.24 | 2024-10-25 07:11:14+00:00 | |
⭕ | 1617 | +30/-26 | 7.91 | +0.05/-0.03 | false | true | ? | instruction-tuned | Original | apache-2.0 | 41 | 6.06 | 2024-10-22 23:04:13+00:00 | |
🟦 | 1613 | +27/-24 | 7.56 | +0.06/-0.06 | false | true | ? | preference-tuned | Original | llama3.2 | 416 | 1.24 | 2024-10-24 07:45:03+00:00 | |
🟦 | 1597 | +28/-26 | 8.15 | +0.03/-0.03 | false | true | ? | preference-tuned | Original | other | 675 | 72.71 | 2024-11-14 11:37:18+00:00 | |
🟦 | 1594 | +30/-23 | 8.17 | +0.03/-0.02 | false | true | ? | preference-tuned | Original | other | 343 | 72.71 | 2024-10-22 14:35:49+00:00 | |
🟦 | 1578 | +25/-28 | 7.87 | +0.04/-0.03 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:08:22+00:00 | |
🟦 | 1562 | +26/-18 | 8.02 | +0.03/-0.03 | false | true | ? | preference-tuned | Original | llama3 | 254 | 8.03 | 2024-12-10 09:38:34+00:00 | |
⭕ | 1554 | +33/-26 | 7.78 | +0.04/-0.03 | false | true | ? | instruction-tuned | Original | llama3.1 | 164 | 8.03 | 2024-11-14 11:35:17+00:00 | |
🟦 | 1518 | +25/-26 | 7.87 | +0.03/-0.03 | false | true | ? | preference-tuned | Original | apache-2.0 | 274 | 7.62 | 2024-11-14 11:36:44+00:00 | |
🟢 🏥 | 1512 | +25/-26 | 7.59 | +0.04/-0.04 | true | true | ? | pretrained | Original | null | 9 | 70.55 | 2024-11-11 13:58:37+00:00 | |
🟦 🏥 | 1511 | +26/-18 | 7.84 | +0.04/-0.04 | true | true | ? | preference-tuned | Original | llama3 | 339 | 70 | 2024-07-24 14:33:56+00:00 | |
🟦 | 1499 | +24/-25 | 7.64 | +0.03/-0.03 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-03-06 02:18:06+00:00 | |
🟦 | 1478 | +32/-21 | 7.61 | +0.04/-0.04 | false | true | ? | preference-tuned | Original | apache-2.0 | 67 | 14.77 | 2024-12-10 07:27:22+00:00 | |
🟦 | 1476 | +34/-30 | 6.28 | +0.08/-0.08 | false | true | ? | preference-tuned | Original | apache-2.0 | 239 | 32.76 | 2024-11-28 04:57:07+00:00 | |
⭕ | 1469 | +27/-22 | 7.69 | +0.04/-0.03 | false | true | ? | instruction-tuned | Original | apache-2.0 | 1131 | 7.25 | 2024-11-14 11:38:25+00:00 | |
🟦 | 1456 | +29/-24 | 7.41 | +0.05/-0.04 | false | true | ? | preference-tuned | Original | other | 23 | 7.46 | 2024-12-19 05:59:29+00:00 | |
🟦 | 1452 | +23/-23 | 6.94 | +0.05/-0.06 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:08:42+00:00 | |
🟦 | 1432 | +23/-24 | 7.55 | +0.04/-0.03 | false | true | ? | preference-tuned | Original | llama3 | 411 | 8.03 | 2024-12-10 10:10:16+00:00 | |
⭕ | 1407 | +25/-22 | 7.46 | +0.04/-0.03 | false | true | ? | instruction-tuned | Original | cc-by-nc-4.0 | 613 | 10.73 | 2024-10-22 22:52:54+00:00 | |
🟢 | 1388 | +26/-24 | 7.19 | +0.05/-0.03 | false | true | ? | pretrained | Original | other | 39 | 72.71 | 2024-11-14 11:37:02+00:00 | |
🟦 | 1385 | +21/-21 | 7.2 | +0.04/-0.04 | false | true | ? | preference-tuned | Original | other | 58 | 3.09 | 2024-10-22 11:25:51+00:00 | |
🟦 | 1382 | +26/-27 | 7.03 | +0.05/-0.04 | false | true | ? | preference-tuned | Original | other | 12 | 3.23 | 2024-12-19 06:00:40+00:00 | |
🟦 | 1314 | +25/-23 | 6.51 | +0.07/-0.04 | false | true | ? | preference-tuned | Original | other | 21 | 1.67 | 2024-12-19 06:01:10+00:00 | |
🟢 | 1309 | +30/-25 | 5.66 | +0.08/-0.07 | false | false | ? | pretrained | Original | unknown | 210 | 11.1 | 2024-10-29 07:23:16+00:00 | |
🟦 | 1308 | +30/-23 | 6.71 | +0.04/-0.05 | false | true | ? | preference-tuned | Original | other | 40 | 10.31 | 2024-12-19 05:58:51+00:00 | |
⭕ | 1247 | +23/-24 | 6.49 | +0.05/-0.03 | false | true | ? | instruction-tuned | Original | apache-2.0 | 346 | 1.71 | 2024-11-22 10:44:37+00:00 | |
🟢 | 1134 | +26/-25 | 4.03 | +0.08/-0.09 | false | true | ? | pretrained | Original | apache-2.0 | 67 | 7.62 | 2024-11-14 11:36:22+00:00 | |
🟦 | 1057 | +28/-22 | 3.75 | +0.05/-0.05 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:09:02+00:00 | |
🟦 | 1026 | +29/-28 | 4.17 | +0.06/-0.05 | false | true | ? | preference-tuned | Original | apache-2.0 | 60 | 0.36 | 2024-12-10 08:36:15+00:00 | |
🟢 | 946 | +25/-21 | 2.44 | +0.06/-0.05 | false | false | ? | pretrained | Original | apache-2.0 | 101 | 0.49 | 2024-10-22 13:46:13+00:00 | |
⭕ | 917 | +18/-19 | 2.76 | +0.05/-0.03 | false | true | ? | instruction-tuned | Original | gemma | 44 | 9.24 | 2024-11-14 11:39:56+00:00 | |
🟢 🏥 | 846 | +33/-20 | 1.27 | +0.04/-0.04 | true | false | ? | pretrained | Original | llama3 | 37 | 8.03 | 2024-10-25 07:16:58+00:00 | |
🟦 | 767 | +28/-25 | 0.99 | +0.06/-0.05 | false | false | ? | preference-tuned | Original | other | 26 | 3.09 | 2024-10-22 13:17:21+00:00 |
T | Overall | 95% CI | Dim 1 | Dim 2 | Dim 3 | Dim 4 | Dim 5 | Dim 6 | Dim 7 | Dim 8 | Dim 9 | Clinical | Available on Hub | Model SHA | Model Type | Weight Type | Hub License | Hub ❤️ | #Params (B) | Upload Date
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
🟦 🏥 | 1.02 | +0.005/-0.006 | 1.01 | 1.02 | 1.02 | 1.03 | 1.03 | 1.05 | 1.01 | 1.04 | 1.02 | false | false | ? | instruction-tuned | Original | cc-by-nc-4.0 | 1149 | 122.61 | 2024-11-28 04:57:07+00:00
🟦 | 1.02 | +0.005/-0.006 | 1.01 | 1.02 | 1 | 1.03 | 1.03 | 1.05 | 1.01 | 1.04 | 1.02 | false | true | ? | preference-tuned | Original | apache-2.0 | 239 | 32.76 | 2024-11-28 04:57:07+00:00 | |
🟦 | 1.05 | +0.006/-0.005 | 1.04 | 1.02 | 1.02 | 1.09 | 1.07 | 1.1 | 1.01 | 1.06 | 1.02 | false | true | ? | preference-tuned | Original | other | 23 | 7.46 | 2024-12-19 05:59:29+00:00 | |
⭕ | 1.12 | +0.012/-0.012 | 1.08 | 1.09 | 1.12 | 1.18 | 1.12 | 1.22 | 1.07 | 1.12 | 1.11 | false | true | ? | instruction-tuned | Original | cc-by-nc-4.0 | 66 | 32.3 | 2024-10-25 07:13:05+00:00 | |
🟦 | 1.12 | +0.01/-0.01 | 1.08 | 1.08 | 1.12 | 1.18 | 1.12 | 1.2 | 1.05 | 1.15 | 1.13 | false | true | ? | preference-tuned | Original | other | 675 | 72.71 | 2024-11-14 11:37:18+00:00 | |
🟦 | 1.13 | +0.012/-0.012 | 1.1 | 1.08 | 1.09 | 1.24 | 1.12 | 1.25 | 1.05 | 1.16 | 1.09 | false | true | ? | preference-tuned | Original | other | 21 | 1.67 | 2024-12-19 06:01:10+00:00 | |
⭕ | 1.2 | +0.016/-0.019 | 1.13 | 1.13 | 1.16 | 1.4 | 1.21 | 1.34 | 1.13 | 1.17 | 1.17 | false | true | ? | instruction-tuned | Original | llama3.1 | 164 | 8.03 | 2024-11-14 11:35:17+00:00 | |
⭕ | 1.21 | +0.011/-0.012 | 1.15 | 1.16 | 1.18 | 1.41 | 1.19 | 1.3 | 1.09 | 1.21 | 1.17 | false | true | ? | instruction-tuned | Original | llama3.1 | 1149 | 70.55 | 2024-10-25 07:09:19+00:00 | |
🟦 | 1.21 | +0.012/-0.015 | 1.16 | 1.13 | 1.24 | 1.37 | 1.22 | 1.29 | 1.12 | 1.22 | 1.15 | false | true | ? | preference-tuned | Original | other | 58 | 3.09 | 2024-10-22 11:25:51+00:00 | |
🟦 | 1.22 | +0.017/-0.017 | 1.11 | 1.11 | 1.12 | 1.56 | 1.2 | 1.37 | 1.1 | 1.17 | 1.21 | false | true | ? | preference-tuned | Original | llama3.2 | 416 | 1.24 | 2024-10-24 07:45:03+00:00 | |
🟦 | 1.23 | +0.016/-0.014 | 1.2 | 1.1 | 1.27 | 1.16 | 1.3 | 1.31 | 1.23 | 1.27 | 1.2 | false | true | ? | preference-tuned | Original | mit | 110 | 9.24 | 2024-10-25 07:11:14+00:00 | |
🟦 | 1.23 | +0.018/-0.017 | 1.17 | 1.12 | 1.24 | 1.34 | 1.26 | 1.34 | 1.15 | 1.28 | 1.18 | false | true | ? | preference-tuned | Original | llama3 | 1417 | 70.55 | 2024-10-24 13:25:47+00:00 | |
🟦 | 1.25 | +0.013/-0.014 | 1.21 | 1.19 | 1.19 | 1.39 | 1.27 | 1.4 | 1.09 | 1.31 | 1.23 | false | true | ? | preference-tuned | Original | other | 343 | 72.71 | 2024-10-22 14:35:49+00:00 | |
🟦 | 1.26 | +0.019/-0.018 | 1.18 | 1.14 | 1.27 | 1.42 | 1.28 | 1.41 | 1.13 | 1.26 | 1.25 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-20 10:32:48+00:00 | |
🟦 🏥 | 1.27 | +0.023/-0.016 | 1.24 | 1.19 | 1.27 | 1.39 | 1.25 | 1.38 | 1.13 | 1.39 | 1.17 | true | true | ? | preference-tuned | Original | llama3 | 339 | 70 | 2024-07-24 14:33:56+00:00 | |
🟦 | 1.32 | +0.017/-0.019 | 1.3 | 1.24 | 1.34 | 1.43 | 1.35 | 1.46 | 1.21 | 1.3 | 1.26 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-03-06 02:18:06+00:00 | |
⭕ | 1.32 | +0.021/-0.019 | 1.25 | 1.22 | 1.38 | 1.34 | 1.37 | 1.45 | 1.22 | 1.41 | 1.28 | false | true | ? | instruction-tuned | Original | other | 808 | 122.61 | 2024-11-25 11:27:40+00:00 | |
🟦 | 1.33 | +0.016/-0.018 | 1.27 | 1.22 | 1.3 | 1.57 | 1.31 | 1.46 | 1.16 | 1.36 | 1.34 | false | true | ? | preference-tuned | Original | llama3.3 | 632 | 70.55 | 2024-12-09 09:10:34+00:00 | |
🟦 🏥 | 1.39 | +0.02/-0.024 | 1.37 | 1.31 | 1.43 | 1.45 | 1.39 | 1.45 | 1.32 | 1.43 | 1.38 | true | true | ? | preference-tuned | Original | llama3 | 34 | 70.55 | 2024-10-24 06:24:59+00:00 | |
🟦 | 1.4 | +0.018/-0.03 | 1.23 | 1.33 | 1.25 | 1.97 | 1.33 | 1.53 | 1.23 | 1.46 | 1.29 | false | false | ? | preference-tuned | Original | apache-2.0 | 100 | 0.49 | 2024-11-18 11:36:27+00:00 | |
⭕ | 1.4 | +0.018/-0.015 | 1.36 | 1.32 | 1.44 | 1.46 | 1.4 | 1.53 | 1.31 | 1.46 | 1.35 | false | true | ? | instruction-tuned | Original | other | 1980 | 685 | 2024-10-22 23:04:13+00:00 | |
⭕ | 1.41 | +0.027/-0.027 | 1.38 | 1.22 | 1.43 | 1.55 | 1.48 | 1.52 | 1.32 | 1.42 | 1.34 | false | true | ? | instruction-tuned | Original | gemma | 44 | 9.24 | 2024-11-14 11:39:56+00:00 | |
🟢 | 1.41 | +0.026/-0.019 | 1.29 | 1.27 | 1.37 | 1.81 | 1.35 | 1.57 | 1.21 | 1.46 | 1.34 | false | true | ? | pretrained | Original | other | 39 | 72.71 | 2024-11-14 11:37:02+00:00 | |
🟦 | 1.41 | +0.022/-0.024 | 1.36 | 1.2 | 1.42 | 1.72 | 1.38 | 1.55 | 1.24 | 1.41 | 1.39 | false | true | ? | preference-tuned | Original | llama3.1 | 2845 | 8.03 | 2024-07-24 14:33:56+00:00 | |
🟦 | 1.46 | +0.023/-0.023 | 1.34 | 1.31 | 1.4 | 1.81 | 1.42 | 1.58 | 1.37 | 1.42 | 1.46 | false | true | ? | preference-tuned | Original | llama3.2 | 402 | 3.21 | 2024-10-24 06:23:04+00:00 | |
🟢 🏥 | 1.47 | +0.021/-0.019 | 1.4 | 1.39 | 1.44 | 1.71 | 1.46 | 1.59 | 1.43 | 1.46 | 1.37 | true | true | ? | pretrained | Original | null | 9 | 70.55 | 2024-11-11 13:58:37+00:00 | |
⭕ | 1.49 | +0.03/-0.022 | 1.37 | 1.43 | 1.52 | 1.73 | 1.45 | 1.53 | 1.46 | 1.5 | 1.42 | false | true | ? | instruction-tuned | Original | apache-2.0 | 41 | 6.06 | 2024-10-22 23:04:13+00:00 | |
🟦 | 1.5 | +0.021/-0.016 | 1.44 | 1.45 | 1.47 | 1.62 | 1.53 | 1.63 | 1.29 | 1.56 | 1.51 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:06:51+00:00 | |
🟦 | 1.51 | +0.024/-0.031 | 1.42 | 1.33 | 1.5 | 1.75 | 1.51 | 1.55 | 1.44 | 1.57 | 1.49 | false | true | ? | preference-tuned | Original | llama3 | 254 | 8.03 | 2024-12-10 09:38:34+00:00 | |
⭕ | 1.52 | +0.021/-0.02 | 1.49 | 1.39 | 1.54 | 1.58 | 1.58 | 1.54 | 1.55 | 1.51 | 1.5 | false | true | ? | instruction-tuned | Original | cc-by-nc-4.0 | 613 | 10.73 | 2024-10-22 22:52:54+00:00 | |
🟦 | 1.52 | +0.021/-0.025 | 1.37 | 1.38 | 1.53 | 1.94 | 1.52 | 1.69 | 1.27 | 1.46 | 1.53 | false | true | ? | preference-tuned | Original | llama3.1 | 616 | 70.55 | 2024-10-24 06:24:17+00:00 | |
⭕ | 1.54 | +0.022/-0.027 | 1.49 | 1.43 | 1.57 | 1.63 | 1.65 | 1.51 | 1.56 | 1.57 | 1.44 | false | true | ? | instruction-tuned | Original | apache-2.0 | 1131 | 7.25 | 2024-11-14 11:38:25+00:00 | |
🟦 | 1.69 | +0.021/-0.02 | 1.69 | 1.53 | 1.7 | 1.79 | 1.77 | 1.81 | 1.52 | 1.7 | 1.72 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:08:05+00:00 | |
🟢 🏥 | 1.71 | +0.025/-0.025 | 1.77 | 1.64 | 1.89 | 1.61 | 1.71 | 1.76 | 1.58 | 1.69 | 1.7 | true | false | ? | pretrained | Original | llama3 | 37 | 8.03 | 2024-10-25 07:16:58+00:00 | |
🟦 | 1.92 | +0.026/-0.022 | 1.87 | 1.8 | 1.92 | 2.05 | 1.93 | 1.93 | 1.88 | 1.94 | 1.93 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:08:42+00:00 | |
🟦 | 2.08 | +0.035/-0.051 | 2.04 | 1.93 | 2.02 | 2.38 | 2.22 | 1.86 | 2.42 | 1.93 | 1.92 | false | true | ? | preference-tuned | Original | llama3 | 411 | 8.03 | 2024-12-10 10:10:16+00:00 | |
⭕ | 2.19 | +0.031/-0.039 | 2.11 | 2.07 | 2.09 | 2.61 | 2.21 | 2.01 | 2.41 | 2.09 | 2.09 | false | true | ? | instruction-tuned | Original | apache-2.0 | 346 | 1.71 | 2024-11-22 10:44:37+00:00 | |
🟦 | 2.42 | +0.032/-0.035 | 2.32 | 2.51 | 2.33 | 2.51 | 2.43 | 2.31 | 2.55 | 2.43 | 2.37 | false | true | ? | preference-tuned | Original | apache-2.0 | 33 | 3.32 | 2024-12-10 10:39:41+00:00 | |
🟢 | 2.45 | +0.029/-0.038 | 2.36 | 2.37 | 2.26 | 2.87 | 2.49 | 2.22 | 2.77 | 2.37 | 2.39 | false | false | ? | pretrained | Original | unknown | 210 | 11.1 | 2024-10-29 07:23:16+00:00 | |
🟦 | 2.65 | +0.042/-0.032 | 2.67 | 2.52 | 2.65 | 2.86 | 2.75 | 2.56 | 2.69 | 2.65 | 2.53 | false | true | ? | preference-tuned | Original | apache-2.0 | 67 | 14.77 | 2024-12-10 07:27:22+00:00 | |
🟦 | 2.95 | +0.042/-0.037 | 2.74 | 2.96 | 2.93 | 3.2 | 2.93 | 2.97 | 3.02 | 2.78 | 3.05 | false | false | ? | preference-tuned | Original | other | 26 | 3.09 | 2024-10-22 13:17:21+00:00 |
- Coverage: Measures how thoroughly the summary covers the original document. A higher score means the summary includes more details from the original.
- Conformity: Also called the non-contradiction score, this checks if the summary avoids contradicting the original document. A higher score means the summary aligns better with the original.
- Consistency: Measures the level of non-hallucination, or how much the summary sticks to the facts in the document. A higher score means the summary is more factual and accurate.
- Conciseness: Measures how brief the summary is. A higher score means the summary is more concise. A negative score means the summary is longer than the original document.
- Overall Score: The average of conformity, consistency, and the harmonic mean of coverage and conciseness (the harmonic-mean term counts as 0 unless both coverage and conciseness are positive).
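This aggregation can be sketched as follows (the function name is illustrative). Substituting the per-metric values from a reported row reproduces its overall score, including rows where a very verbose summary yields a negative conciseness:

```python
def overall_score(coverage, conformity, consistency, conciseness):
    """Aggregate the four summarization metrics into one overall score.

    The harmonic mean of coverage and conciseness is used only when both
    are positive; otherwise that term contributes 0 (e.g. when conciseness
    is negative because the summary is longer than the source document).
    """
    if coverage > 0 and conciseness > 0:
        harmonic = 2 * coverage * conciseness / (coverage + conciseness)
    else:
        harmonic = 0.0
    return (conformity + consistency + harmonic) / 3

# Values taken from reported rows:
print(round(overall_score(61.51, 95.99, 99.47, 73.72), 2))    # 87.51
print(round(overall_score(45.33, 98.2, 93.82, -218.26), 2))   # 64.01
```

The harmonic mean rewards models that balance coverage against conciseness: it is dominated by the smaller of the two, so padding a summary to boost coverage does not pay off.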
T | Overall Score | Coverage | Conformity | Consistency | Conciseness | Clinical | Available on Hub | Model SHA | Model Type | Weight Type | Hub License | Hub ❤️ | #Params (B) | Upload Date
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
🟦 🏥 | 87.82 | 64.31 | 95.78 | 99.61 | -218.26 | false | false | ? | instruction-tuned | Original | cc-by-nc-4.0 | 1980 | 122.61 | 2024-10-22 23:04:13+00:00
⭕ | 87.82 | 64.31 | 95.78 | 99.61 | 72.32 | false | true | ? | instruction-tuned | Original | other | 1980 | 685 | 2024-10-22 23:04:13+00:00 | |
🟦 | 87.51 | 61.51 | 95.99 | 99.47 | 73.72 | false | true | ? | preference-tuned | Original | other | 675 | 72.71 | 2024-11-14 11:37:18+00:00 | |
⭕ | 87.44 | 67.28 | 95.9 | 99.52 | 66.52 | false | true | ? | instruction-tuned | Original | null | 0 | -1 | 2025-01-17 12:10:32+00:00 | |
🟦 | 87.41 | 63.09 | 95.89 | 99.56 | 70.91 | false | true | ? | preference-tuned | Original | other | 343 | 72.71 | 2024-10-22 14:35:49+00:00 | |
⭕ | 87.09 | 61.74 | 97.16 | 99.2 | 68.42 | false | true | ? | instruction-tuned | Original | other | 808 | 122.61 | 2024-11-25 11:27:40+00:00 | |
🟦 🏥 | 86.75 | 66.7 | 95.79 | 98.5 | 65.21 | true | true | ? | preference-tuned | Original | llama3 | 34 | 70.55 | 2024-10-24 06:24:59+00:00 | |
🟦 | 86.21 | 56.12 | 96.13 | 99.14 | 72.77 | false | true | ? | preference-tuned | Original | apache-2.0 | 274 | 7.62 | 2024-11-14 11:36:44+00:00 | |
🟦 | 86.15 | 55.44 | 96.23 | 99.32 | 72.64 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:07:42+00:00 | |
🟦 | 85.84 | 57.35 | 96.13 | 98.78 | 68.93 | false | true | ? | preference-tuned | Original | apache-2.0 | 67 | 14.77 | 2024-12-10 07:27:22+00:00 | |
🟦 | 85.8 | 57.8 | 95.89 | 98.59 | 69.06 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:08:22+00:00 | |
🟦 | 85.76 | 53.52 | 96.13 | 99.3 | 73.2 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:08:05+00:00 | |
🟦 | 85.6 | 50.84 | 96.35 | 99.21 | 76.95 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:06:51+00:00 | |
🟦 | 85.59 | 56.58 | 95.45 | 98.74 | 70.05 | false | true | ? | preference-tuned | Original | other | 58 | 3.09 | 2024-10-22 11:25:51+00:00 | |
⭕ | 85.47 | 54.31 | 96.4 | 99.31 | 68.84 | false | true | ? | instruction-tuned | Original | apache-2.0 | 1131 | 7.25 | 2024-11-14 11:38:25+00:00 | |
🟦 | 85.43 | 49.13 | 96.36 | 99.71 | 77.76 | false | true | ? | preference-tuned | Original | other | 40 | 10.31 | 2024-12-19 05:58:51+00:00 | |
🟦 | 85.36 | 53.13 | 95.66 | 98.5 | 74.18 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:08:42+00:00 | |
⭕ | 85.19 | 53.17 | 96.57 | 97.96 | 71.62 | false | true | ? | instruction-tuned | Original | cc-by-nc-4.0 | 66 | 32.3 | 2024-10-25 07:13:05+00:00 | |
🟢 | 85.08 | 51.12 | 96.46 | 99.08 | 71.75 | false | true | ? | pretrained | Original | apache-2.0 | 67 | 7.62 | 2024-11-14 11:36:22+00:00 | |
🟦 | 84.86 | 46.65 | 96.73 | 99.21 | 78.97 | false | true | ? | preference-tuned | Original | llama3 | 254 | 8.03 | 2024-12-10 09:38:34+00:00 | |
🟦 | 84.85 | 73.9 | 96.21 | 99.31 | 49.13 | false | true | ? | preference-tuned | Original | apache-2.0 | 239 | 32.76 | 2024-11-28 04:57:07+00:00 | |
⭕ | 84.68 | 55.94 | 97.04 | 96.21 | 66.59 | false | true | ? | instruction-tuned | Original | llama3.1 | 1149 | 70.55 | 2024-10-25 07:09:19+00:00 | |
🟦 | 84.33 | 44.54 | 96.53 | 99.29 | 79.77 | false | true | ? | preference-tuned | Original | other | 23 | 7.46 | 2024-12-19 05:59:29+00:00 | |
🟦 | 84.31 | 51.62 | 96.71 | 97.47 | 68.16 | false | true | ? | preference-tuned | Original | llama3.1 | 616 | 70.55 | 2024-10-24 06:24:17+00:00 | |
🟦 | 84.09 | 44.91 | 96.21 | 99.15 | 77.67 | false | true | ? | preference-tuned | Original | other | 12 | 3.23 | 2024-12-19 06:00:40+00:00 | |
🟢 | 84.07 | 51.09 | 97.58 | 99.16 | 60.65 | false | true | ? | pretrained | Original | other | 39 | 72.71 | 2024-11-14 11:37:02+00:00 | |
🟦 | 83.95 | 45.8 | 96.48 | 98.31 | 75.66 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-20 10:32:48+00:00 | |
🟦 | 83.79 | 54.71 | 96.44 | 96.75 | 62.11 | false | true | ? | preference-tuned | Original | llama3.1 | 2845 | 8.03 | 2024-07-24 14:33:56+00:00 | |
🟦 | 83.52 | 50.42 | 96.33 | 96.87 | 66.54 | false | true | ? | preference-tuned | Original | llama3.2 | 402 | 3.21 | 2024-10-24 06:23:04+00:00 | |
⭕ | 83.26 | 41.57 | 96.79 | 98.13 | 80.65 | false | true | ? | instruction-tuned | Original | llama3.1 | 164 | 8.03 | 2024-11-14 11:35:17+00:00 | |
🟦 | 83.19 | 44.17 | 96.68 | 97.63 | 73.71 | false | true | ? | preference-tuned | Original | llama3.3 | 632 | 70.55 | 2024-12-09 09:10:34+00:00 | |
🟦 | 83.06 | 45.3 | 96.61 | 96.02 | 75.24 | false | true | ? | preference-tuned | Original | llama3 | 1417 | 70.55 | 2024-10-24 13:25:47+00:00 | |
🟢 🏥 | 83.05 | 40.46 | 97.11 | 98.39 | 79.65 | true | true | ? | pretrained | Original | null | 9 | 70.55 | 2024-11-11 13:58:37+00:00 | |
🟦 🏥 | 82.93 | 40.34 | 96.81 | 98.51 | 79.23 | true | true | ? | preference-tuned | Original | llama3 | 339 | 70 | 2024-07-24 14:33:56+00:00 | |
🟦 | 82.84 | 41.79 | 96.5 | 97.77 | 77.29 | false | true | ? | preference-tuned | Original | mit | 110 | 9.24 | 2024-10-25 07:11:14+00:00 | |
⭕ | 82.32 | 41.76 | 95.92 | 97.37 | 75.1 | false | true | ? | instruction-tuned | Original | apache-2.0 | 346 | 1.71 | 2024-11-22 10:44:37+00:00 | |
🟦 | 82.26 | 46.44 | 95.86 | 96.99 | 64.28 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:09:02+00:00 | |
🟦 | 81.66 | 44.32 | 97.41 | 97.62 | 57.24 | false | false | ? | preference-tuned | Original | other | 26 | 3.09 | 2024-10-22 13:17:21+00:00 | |
🟦 | 80.61 | 44.66 | 95.86 | 93.62 | 63.23 | false | true | ? | preference-tuned | Original | llama3.2 | 416 | 1.24 | 2024-10-24 07:45:03+00:00 | |
🟦 | 80.35 | 34.13 | 96.06 | 97.03 | 80.61 | false | true | ? | preference-tuned | Original | other | 21 | 1.67 | 2024-12-19 06:01:10+00:00 | |
🟦 | 79.21 | 38.47 | 96.5 | 96.13 | 54.23 | false | false | ? | preference-tuned | Original | apache-2.0 | 100 | 0.49 | 2024-11-18 11:36:27+00:00 | |
⭕ | 73.66 | 16.29 | 97.72 | 95.57 | 92.16 | false | true | ? | instruction-tuned | Original | gemma | 44 | 9.24 | 2024-11-14 11:39:56+00:00 | |
🟦 | 69.33 | 35.32 | 96.15 | 93.2 | 12.67 | false | true | ? | preference-tuned | Original | apache-2.0 | 60 | 0.36 | 2024-12-10 08:36:15+00:00 | |
🟢 | 64.87 | 60.94 | 97.61 | 97 | -48.55 | false | false | ? | pretrained | Original | unknown | 210 | 11.1 | 2024-10-29 07:23:16+00:00 | |
🟢 | 64.01 | 45.33 | 98.2 | 93.82 | -218.26 | false | false | ? | pretrained | Original | apache-2.0 | 101 | 0.49 | 2024-10-22 13:46:13+00:00 | |
🟦 | 62.64 | 92.95 | 98.5 | 89.42 | -172.93 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-03-06 02:18:06+00:00 | |
🟢 🏥 | 59.15 | 15.9 | 97.75 | 52.88 | 85.85 | true | false | ? | pretrained | Original | llama3 | 37 | 8.03 | 2024-10-25 07:16:58+00:00 | |
🟦 | 54.69 | 2.1 | 99.49 | 64.59 | -237.7 | false | true | ? | preference-tuned | Original | apache-2.0 | 33 | 3.32 | 2024-12-10 10:39:41+00:00 |
- Coverage: Measures how thoroughly the summary covers the original document. A higher score means the summary includes more details from the original.
- Conformity: Also called the non-contradiction score, this checks if the summary avoids contradicting the original document. A higher score means the summary aligns better with the original.
- Consistency: Measures the level of non-hallucination, or how much the summary sticks to the facts in the document. A higher score means the summary is more factual and accurate.
- Overall Score: The average of the above three scores.
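Unlike the previous table, this overall score has no conciseness term, so the aggregation is a plain average (function name illustrative):

```python
def overall_three(coverage, conformity, consistency):
    """Overall score for this table: unweighted mean of the three metrics."""
    return (coverage + conformity + consistency) / 3

# Values taken from one of the reported rows:
print(round(overall_three(90.0, 97.42, 98.0), 2))  # 95.14
```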
T | Overall Score | Coverage | Conformity | Consistency | Clinical | Available on Hub | Model SHA | Model Type | Weight Type | Hub License | Hub ❤️ | #Params (B) | Upload Date
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
🟦 🏥 | 95.14 | 86.25 | 97.42 | 98.33 | false | false | ? | instruction-tuned | Original | cc-by-nc-4.0 | 1149 | 122.61 | 2024-10-24 06:24:59+00:00
🟦 🏥 | 95.14 | 90 | 97.42 | 98 | true | true | ? | preference-tuned | Original | llama3 | 34 | 70.55 | 2024-10-24 06:24:59+00:00 | |
⭕ | 93.94 | 86.25 | 97.58 | 98 | false | true | ? | instruction-tuned | Original | llama3.1 | 1149 | 70.55 | 2024-10-25 07:09:19+00:00 | |
⭕ | 93.89 | 85.33 | 97.33 | 99 | false | true | ? | instruction-tuned | Original | other | 1980 | 685 | 2024-10-22 23:04:13+00:00 | |
🟦 | 93.85 | 85.48 | 97.75 | 98.33 | false | true | ? | preference-tuned | Original | apache-2.0 | 239 | 32.76 | 2024-11-28 04:57:07+00:00 | |
🟦 | 93.28 | 88.25 | 96.42 | 95.17 | false | true | ? | preference-tuned | Original | other | 343 | 72.71 | 2024-10-22 14:35:49+00:00 | |
🟦 | 93.16 | 85.07 | 96.58 | 97.83 | false | true | ? | preference-tuned | Original | mit | 110 | 9.24 | 2024-10-25 07:11:14+00:00 | |
⭕ | 92.91 | 83.56 | 96 | 99.17 | false | true | ? | instruction-tuned | Original | llama3.1 | 164 | 8.03 | 2024-11-14 11:35:17+00:00 | |
⭕ | 92.88 | 84.16 | 95.82 | 98.67 | false | true | ? | instruction-tuned | Original | cc-by-nc-4.0 | 66 | 32.3 | 2024-10-25 07:13:05+00:00 | |
⭕ | 92.64 | 82.16 | 96.58 | 99.17 | false | true | ? | instruction-tuned | Original | null | 0 | -1 | 2025-01-17 12:10:32+00:00 | |
🟦 | 92.61 | 83.25 | 96.58 | 98 | false | true | ? | preference-tuned | Original | apache-2.0 | 274 | 7.62 | 2024-11-14 11:36:44+00:00 | |
🟦 | 92.33 | 80.66 | 96.67 | 99.67 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:06:51+00:00 | |
🟦 | 92.21 | 80.65 | 97.16 | 98.83 | false | true | ? | preference-tuned | Original | llama3 | 254 | 8.03 | 2024-12-10 09:38:34+00:00 | |
🟦 | 92.02 | 79.73 | 96.99 | 99.33 | false | true | ? | preference-tuned | Original | llama3.1 | 616 | 70.55 | 2024-10-24 06:24:17+00:00 | |
🟦 | 91.99 | 79.64 | 96.83 | 99.5 | false | true | ? | preference-tuned | Original | llama3.3 | 632 | 70.55 | 2024-12-09 09:10:34+00:00 | |
🟦 | 91.85 | 79.82 | 96.57 | 99.17 | false | true | ? | preference-tuned | Original | llama3.1 | 2845 | 8.03 | 2024-07-24 14:33:56+00:00 | |
🟦 | 91.8 | 81.57 | 96.67 | 97.17 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-03-06 02:18:06+00:00 | |
⭕ | 91.53 | 79.75 | 97 | 97.83 | false | true | ? | instruction-tuned | Original | other | 808 | 122.61 | 2024-11-25 11:27:40+00:00 | |
⭕ | 91.49 | 79.06 | 95.75 | 99.67 | false | true | ? | instruction-tuned | Original | apache-2.0 | 1131 | 7.25 | 2024-11-14 11:38:25+00:00 | |
🟦 | 91.16 | 79.15 | 96.17 | 98.17 | false | true | ? | preference-tuned | Original | other | 40 | 10.31 | 2024-12-19 05:58:51+00:00 | |
🟦 | 91.08 | 76.91 | 97.33 | 99 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-20 10:32:48+00:00 | |
🟦 | 90.94 | 77.91 | 96.25 | 98.67 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:08:22+00:00 | |
🟦 | 90.8 | 76.81 | 96.42 | 99.17 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:08:05+00:00 | |
🟦 🏥 | 90.66 | 75.65 | 96.67 | 99.67 | true | true | ? | preference-tuned | Original | llama3 | 339 | 70 | 2024-07-24 14:33:56+00:00 | |
🟦 | 90.66 | 75.64 | 97 | 99.33 | false | true | ? | preference-tuned | Original | llama3 | 1417 | 70.55 | 2024-10-24 13:25:47+00:00 | |
🟦 | 90.5 | 75.92 | 96.42 | 99.17 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:07:42+00:00 | |
🟦 | 90.45 | 74.98 | 97.08 | 99.29 | false | true | ? | preference-tuned | Original | other | 675 | 72.71 | 2024-11-14 11:37:18+00:00 | |
🟦 | 90.29 | 74.31 | 96.74 | 99.83 | false | true | ? | preference-tuned | Original | other | 23 | 7.46 | 2024-12-19 05:59:29+00:00 | |
🟦 | 90.1 | 76.06 | 96.58 | 97.67 | false | true | ? | preference-tuned | Original | apache-2.0 | 67 | 14.77 | 2024-12-10 07:27:22+00:00 | |
🟢 | 89.66 | 74.73 | 95.92 | 98.33 | false | true | ? | pretrained | Original | apache-2.0 | 67 | 7.62 | 2024-11-14 11:36:22+00:00 | |
🟦 | 89.63 | 75.06 | 96.32 | 97.5 | false | true | ? | preference-tuned | Original | llama3.2 | 402 | 3.21 | 2024-10-24 06:23:04+00:00 | |
🟦 | 89.2 | 73.71 | 95.73 | 98.17 | false | true | ? | preference-tuned | Original | other | 58 | 3.09 | 2024-10-22 11:25:51+00:00 | |
🟢 🏥 | 89.17 | 70.42 | 97.08 | 100 | true | true | ? | pretrained | Original | null | 9 | 70.55 | 2024-11-11 13:58:37+00:00 | |
🟦 | 88.91 | 74.73 | 94.17 | 97.83 | false | true | ? | preference-tuned | Original | other | 12 | 3.23 | 2024-12-19 06:00:40+00:00 | |
🟢 | 88.83 | 70.25 | 96.58 | 99.67 | false | true | ? | pretrained | Original | other | 39 | 72.71 | 2024-11-14 11:37:02+00:00 | |
🟦 | 87.06 | 69.11 | 94.74 | 97.33 | false | false | ? | preference-tuned | Original | other | 26 | 3.09 | 2024-10-22 13:17:21+00:00 | |
🟦 | 86.8 | 68.24 | 94.49 | 97.67 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:08:42+00:00 | |
🟦 | 85.32 | 66.68 | 94.46 | 94.83 | false | true | ? | preference-tuned | Original | llama3.2 | 416 | 1.24 | 2024-10-24 07:45:03+00:00 | |
🟢 | 84.18 | 57.53 | 96.67 | 98.33 | false | false | ? | pretrained | Original | unknown | 210 | 11.1 | 2024-10-29 07:23:16+00:00 | |
⭕ | 82.87 | 63.04 | 93.9 | 91.67 | false | true | ? | instruction-tuned | Original | apache-2.0 | 346 | 1.71 | 2024-11-22 10:44:37+00:00 | |
🟦 | 82.57 | 64.56 | 92.31 | 90.83 | false | true | ? | preference-tuned | Original | other | 21 | 1.67 | 2024-12-19 06:01:10+00:00 | |
🟦 | 80.21 | 48.9 | 96.07 | 95.67 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:09:02+00:00 | |
🟦 | 79.82 | 52.29 | 94.33 | 92.83 | false | false | ? | preference-tuned | Original | apache-2.0 | 100 | 0.49 | 2024-11-18 11:36:27+00:00 | |
🟦 | 78.54 | 44.37 | 98.08 | 93.17 | false | true | ? | preference-tuned | Original | apache-2.0 | 60 | 0.36 | 2024-12-10 08:36:15+00:00 | |
🟢 🏥 | 71.44 | 47.33 | 95.67 | 71.33 | true | false | ? | pretrained | Original | llama3 | 37 | 8.03 | 2024-10-25 07:16:58+00:00 | |
🟢 | 70 | 21.95 | 97.82 | 90.21 | false | false | ? | pretrained | Original | apache-2.0 | 101 | 0.49 | 2024-10-22 13:46:13+00:00 | |
⭕ | 69.17 | 10.69 | 99.5 | 97.33 | false | true | ? | instruction-tuned | Original | gemma | 44 | 9.24 | 2024-11-14 11:39:56+00:00 | |
🟦 | 56.67 | 1.58 | 99.33 | 69.08 | false | true | ? | preference-tuned | Original | apache-2.0 | 33 | 3.32 | 2024-12-10 10:39:41+00:00 |
🟢 🏥 | 94.23 | 89.33 | 97.45 | 95.92 | false | false | ? | instruction-tuned | Original | cc-by-nc-4.0 | 1980 | 122.61 | 2024-10-22 23:04:13+00:00 |
⭕ | 94.23 | 89.33 | 97.45 | 95.92 | false | true | ? | instruction-tuned | Original | other | 1980 | 685 | 2024-10-22 23:04:13+00:00 | |
🟢 | 93.92 | 86.72 | 98.56 | 96.48 | false | true | ? | pretrained | Original | other | 39 | 72.71 | 2024-11-14 11:37:02+00:00 | |
⭕ | 93.89 | 88.91 | 97.32 | 95.44 | false | true | ? | instruction-tuned | Original | null | 0 | -1 | 2025-01-17 12:10:32+00:00 | |
🟦 | 93.71 | 90.48 | 97.37 | 93.28 | false | true | ? | preference-tuned | Original | apache-2.0 | 239 | 32.76 | 2024-11-28 04:57:07+00:00 | |
🟢 🏥 | 93.34 | 85.57 | 98.28 | 96.16 | true | true | ? | pretrained | Original | null | 9 | 70.55 | 2024-11-11 13:58:37+00:00 | |
🟦 | 92.91 | 88.25 | 97.44 | 93.04 | false | true | ? | preference-tuned | Original | other | 343 | 72.71 | 2024-10-22 14:35:49+00:00 | |
🟦 | 92.9 | 85.57 | 97.76 | 95.36 | false | true | ? | preference-tuned | Original | other | 675 | 72.71 | 2024-11-14 11:37:18+00:00 | |
🟦 | 92.66 | 84.23 | 97.28 | 96.48 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-20 10:32:48+00:00 | |
⭕ | 92.61 | 84.06 | 97.52 | 96.24 | false | true | ? | instruction-tuned | Original | llama3.1 | 164 | 8.03 | 2024-11-14 11:35:17+00:00 | |
🟦 | 92.59 | 85.6 | 97.44 | 94.72 | false | true | ? | preference-tuned | Original | llama3 | 254 | 8.03 | 2024-12-10 09:38:34+00:00 | |
🟦 | 92.16 | 86.99 | 97.2 | 92.3 | false | true | ? | preference-tuned | Original | apache-2.0 | 274 | 7.62 | 2024-11-14 11:36:44+00:00 | |
🟦 | 91.75 | 83.19 | 96.92 | 95.12 | false | true | ? | preference-tuned | Original | other | 58 | 3.09 | 2024-10-22 11:25:51+00:00 | |
🟦 | 91.59 | 85.26 | 97.76 | 91.74 | false | true | ? | preference-tuned | Original | apache-2.0 | 67 | 14.77 | 2024-12-10 07:27:22+00:00 | |
🟦 🏥 | 91.58 | 88.45 | 96.04 | 90.24 | true | true | ? | preference-tuned | Original | llama3 | 34 | 70.55 | 2024-10-24 06:24:59+00:00 | |
🟦 | 91.55 | 80.48 | 97.68 | 96.48 | false | true | ? | preference-tuned | Original | llama3.3 | 632 | 70.55 | 2024-12-09 09:10:34+00:00 | |
⭕ | 91.53 | 82.86 | 97.92 | 93.82 | false | true | ? | instruction-tuned | Original | llama3.1 | 1149 | 70.55 | 2024-10-25 07:09:19+00:00 | |
🟦 | 91.2 | 86.28 | 97.12 | 90.2 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-03-06 02:18:06+00:00 | |
🟦 | 91.01 | 81.26 | 97.76 | 94 | false | true | ? | preference-tuned | Original | llama3.1 | 616 | 70.55 | 2024-10-24 06:24:17+00:00 | |
🟦 | 90.84 | 76.57 | 98.04 | 97.92 | false | false | ? | preference-tuned | Original | other | 26 | 3.09 | 2024-10-22 13:17:21+00:00 | |
🟦 | 90.65 | 77.57 | 98.08 | 96.32 | false | true | ? | preference-tuned | Original | llama3 | 1417 | 70.55 | 2024-10-24 13:25:47+00:00 | |
🟢 | 90.65 | 81.88 | 97.1 | 92.96 | false | true | ? | pretrained | Original | apache-2.0 | 67 | 7.62 | 2024-11-14 11:36:22+00:00 | |
🟦 | 90.4 | 81.8 | 96.76 | 92.64 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:06:51+00:00 | |
🟦 🏥 | 90.3 | 75.87 | 97.84 | 97.2 | true | true | ? | preference-tuned | Original | llama3 | 339 | 70 | 2024-07-24 14:33:56+00:00 | |
🟦 | 90.04 | 81.43 | 97.41 | 91.28 | false | true | ? | preference-tuned | Original | llama3.1 | 2845 | 8.03 | 2024-07-24 14:33:56+00:00 | |
⭕ | 89.6 | 80.16 | 96.79 | 91.84 | false | true | ? | instruction-tuned | Original | cc-by-nc-4.0 | 66 | 32.3 | 2024-10-25 07:13:05+00:00 | |
🟦 | 89.08 | 84.08 | 95.18 | 88 | false | true | ? | preference-tuned | Original | mit | 110 | 9.24 | 2024-10-25 07:11:14+00:00 | |
⭕ | 88.82 | 78.15 | 97.04 | 91.26 | false | true | ? | instruction-tuned | Original | apache-2.0 | 1131 | 7.25 | 2024-11-14 11:38:25+00:00 | |
🟦 | 88.69 | 78.99 | 96.83 | 90.24 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:08:22+00:00 | |
⭕ | 88.37 | 81.43 | 97.13 | 86.56 | false | true | ? | instruction-tuned | Original | other | 808 | 122.61 | 2024-11-25 11:27:40+00:00 | |
🟦 | 88.23 | 77.17 | 95.93 | 91.6 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:08:05+00:00 | |
🟦 | 88.18 | 73.81 | 96.5 | 94.24 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:07:42+00:00 | |
🟦 | 87.73 | 77 | 96.6 | 89.58 | false | true | ? | preference-tuned | Original | llama3.2 | 402 | 3.21 | 2024-10-24 06:23:04+00:00 | |
🟦 | 87.24 | 74.01 | 96.35 | 91.36 | false | true | ? | preference-tuned | Original | other | 23 | 7.46 | 2024-12-19 05:59:29+00:00 | |
🟦 | 85.77 | 76.01 | 96.99 | 84.3 | false | true | ? | preference-tuned | Original | other | 40 | 10.31 | 2024-12-19 05:58:51+00:00 | |
🟢 | 85.77 | 66.64 | 97.94 | 92.72 | false | false | ? | pretrained | Original | unknown | 210 | 11.1 | 2024-10-29 07:23:16+00:00 | |
🟦 | 84.92 | 69.69 | 96.04 | 89.04 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:08:42+00:00 | |
🟦 | 84.33 | 64.5 | 97.14 | 91.36 | false | true | ? | preference-tuned | Original | llama3.2 | 416 | 1.24 | 2024-10-24 07:45:03+00:00 | |
⭕ | 83.99 | 56.97 | 97.82 | 97.2 | false | true | ? | instruction-tuned | Original | gemma | 44 | 9.24 | 2024-11-14 11:39:56+00:00 | |
🟦 | 82.4 | 63.68 | 96.17 | 87.36 | false | true | ? | preference-tuned | Original | other | 12 | 3.23 | 2024-12-19 06:00:40+00:00 | |
⭕ | 81.02 | 53.96 | 97.1 | 92 | false | true | ? | instruction-tuned | Original | apache-2.0 | 346 | 1.71 | 2024-11-22 10:44:37+00:00 | |
🟢 | 80.24 | 49.69 | 97.83 | 93.2 | false | false | ? | pretrained | Original | apache-2.0 | 101 | 0.49 | 2024-10-22 13:46:13+00:00 | |
🟦 | 79.33 | 52.03 | 96.69 | 89.28 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:09:02+00:00 | |
🟦 | 76.66 | 47.84 | 96.58 | 85.58 | false | true | ? | preference-tuned | Original | other | 21 | 1.67 | 2024-12-19 06:01:10+00:00 | |
🟦 | 73.7 | 35.97 | 98.16 | 86.96 | false | true | ? | preference-tuned | Original | apache-2.0 | 60 | 0.36 | 2024-12-10 08:36:15+00:00 | |
🟦 | 71.02 | 48.86 | 97.64 | 66.56 | false | false | ? | preference-tuned | Original | apache-2.0 | 100 | 0.49 | 2024-11-18 11:36:27+00:00 | |
🟢 🏥 | 70.41 | 48.97 | 95.34 | 66.92 | true | false | ? | pretrained | Original | llama3 | 37 | 8.03 | 2024-10-25 07:16:58+00:00 | |
🟦 | 55.79 | 1.4 | 99.56 | 66.41 | false | true | ? | preference-tuned | Original | apache-2.0 | 33 | 3.32 | 2024-12-10 10:39:41+00:00 |
🟢 🏥 | 80.02 | 86.97 | 65.33 | 70.95 | 79.26 | 91.95 | 80.6 | 86.38 | false | false | ? | instruction-tuned | Original | cc-by-nc-sa-4.0 | 1417 | 70.55 | 2024-11-11 13:58:37+00:00 |
🟢 🏥 | 80.02 | 86.97 | 64 | 70.95 | 79.26 | 91.95 | 80.6 | 86.38 | true | true | ? | pretrained | Original | null | 9 | 70.55 | 2024-11-11 13:58:37+00:00 | |
🟦 | 79.82 | 87.62 | 65.33 | 71.79 | 78.16 | 94.79 | 73.6 | 87.45 | false | true | ? | preference-tuned | Original | llama3.1 | 616 | 70.55 | 2024-10-24 06:24:17+00:00 | |
🟦 | 79.47 | 86.34 | 64.24 | 72.01 | 78.32 | 94.6 | 73 | 87.77 | false | true | ? | preference-tuned | Original | llama3.3 | 632 | 70.55 | 2024-12-09 09:10:34+00:00 | |
🟦 🏥 | 78.31 | 90.4 | 64.2 | 73.2 | 76.9 | 79 | 73.2 | 91.3 | true | true | ? | preference-tuned | Original | llama3 | 339 | 70 | 2024-07-24 14:33:56+00:00 | |
🟦 🏥 | 78.28 | 86.49 | 61.09 | 72.82 | 79.42 | 83.73 | 79.2 | 85.21 | true | true | ? | preference-tuned | Original | llama3 | 34 | 70.55 | 2024-10-24 06:24:59+00:00 | |
🟦 | 76.95 | 85.73 | 62.91 | 72.15 | 78.08 | 84.73 | 67.4 | 87.66 | false | true | ? | preference-tuned | Original | llama3 | 1417 | 70.55 | 2024-10-24 13:25:47+00:00 | |
🟦 | 75.73 | 83.04 | 58.91 | 65.91 | 74.47 | 88.48 | 71 | 88.3 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:06:51+00:00 | |
🟦 | 75.59 | 87.37 | 63.52 | 68.4 | 76.12 | 84.87 | 63.2 | 85.64 | false | true | ? | preference-tuned | Original | other | 343 | 72.71 | 2024-10-22 14:35:49+00:00 | |
⭕ | 75.2 | 81.68 | 60.36 | 70.69 | 76.67 | 91.67 | 58.6 | 86.7 | false | true | ? | instruction-tuned | Original | llama3.1 | 1149 | 70.55 | 2024-10-25 07:09:19+00:00 | |
🟢 | 75.02 | 86.98 | 62.06 | 67.32 | 75.49 | 84.94 | 74.8 | 73.51 | false | true | ? | pretrained | Original | other | 39 | 72.71 | 2024-11-14 11:37:02+00:00 | |
🟦 | 72.6 | 85.17 | 61.45 | 67.97 | 72.9 | 75.47 | 61.6 | 83.62 | false | true | ? | preference-tuned | Original | other | 675 | 72.71 | 2024-11-14 11:37:18+00:00 | |
⭕ | 71.13 | 85.9 | 59.64 | 64.81 | 71.48 | 75.03 | 53 | 88.09 | false | true | ? | instruction-tuned | Original | null | 0 | -1 | 2025-01-17 12:10:32+00:00 | |
🟦 | 70.57 | 81.83 | 60.85 | 61.46 | 69.36 | 73.68 | 61.6 | 85.21 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:07:42+00:00 | |
🟢 🏥 | 70.22 | 93.5 | 49.09 | 74.4 | 75.96 | 55.78 | 69 | 73.83 | true | false | ? | pretrained | Original | llama3 | 37 | 8.03 | 2024-10-25 07:16:58+00:00 | |
🟦 | 69.85 | 83.15 | 59.15 | 63.73 | 69.52 | 75.84 | 49.6 | 87.98 | false | true | ? | preference-tuned | Original | apache-2.0 | 239 | 32.76 | 2024-11-28 04:57:07+00:00 | |
🟦 | 68.8 | 80.47 | 53.09 | 59.12 | 65.59 | 70.69 | 70.4 | 82.23 | false | true | ? | preference-tuned | Original | apache-2.0 | 67 | 14.77 | 2024-12-10 07:27:22+00:00 | |
🟦 | 67.2 | 73.4 | 49.9 | 58.4 | 62 | 68.2 | 76.2 | 82.3 | false | true | ? | preference-tuned | Original | llama3.1 | 2845 | 8.03 | 2024-07-24 14:33:56+00:00 | |
⭕ | 66.16 | 76.09 | 49.21 | 54.94 | 61.59 | 61.52 | 75.6 | 84.15 | false | true | ? | instruction-tuned | Original | gemma | 44 | 9.24 | 2024-11-14 11:39:56+00:00 | |
⭕ | 65.99 | 73.07 | 50.79 | 57.9 | 63 | 69.5 | 62.8 | 84.89 | false | true | ? | instruction-tuned | Original | llama3.1 | 164 | 8.03 | 2024-11-14 11:35:17+00:00 | |
🟢 | 65.56 | 82.44 | 60.36 | 65.14 | 75.88 | 81.39 | 15.6 | 78.09 | false | false | ? | pretrained | Original | llama3.1 | 308 | 70.55 | 2024-11-14 11:33:15+00:00 | |
⭕ | 65.46 | 77.65 | 50.55 | 59.14 | 65.36 | 69.17 | 50 | 86.38 | false | true | ? | instruction-tuned | Original | cc-by-nc-4.0 | 66 | 32.3 | 2024-10-25 07:13:05+00:00 | |
🟦 | 65.05 | 75.49 | 51.52 | 55.22 | 62.53 | 64.58 | 67 | 79.04 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:08:05+00:00 | |
🟦 | 63.2 | 67.76 | 38.06 | 52.81 | 55.77 | 74.41 | 70.6 | 82.98 | false | true | ? | preference-tuned | Original | llama3.2 | 402 | 3.21 | 2024-10-24 06:23:04+00:00 | |
⭕ 🏥 | 63 | 73.21 | 44.85 | 61.61 | 65.2 | 61.06 | 77.2 | 57.87 | true | true | ? | instruction-tuned | Original | cc-by-nc-sa-4.0 | 3 | 0 | 2024-11-25 07:11:28+00:00 | |
🟦 | 61.78 | 75.39 | 48.61 | 54.86 | 59.07 | 63.31 | 50.6 | 80.64 | false | true | ? | preference-tuned | Original | other | 40 | 10.31 | 2024-12-19 05:58:51+00:00 | |
🟦 | 61.14 | 76.81 | 48.48 | 52.93 | 59.62 | 59.4 | 49 | 81.7 | false | true | ? | preference-tuned | Original | mit | 110 | 9.24 | 2024-10-25 07:11:14+00:00 | |
🟦 | 60.03 | 76.71 | 48 | 56.83 | 60.17 | 61.4 | 45.2 | 71.91 | false | true | ? | preference-tuned | Original | apache-2.0 | 274 | 7.62 | 2024-11-14 11:36:44+00:00 | |
🟢 | 59.84 | 73.38 | 45.33 | 53.84 | 57.66 | 57.69 | 55.8 | 75.21 | false | true | ? | pretrained | Original | apache-2.0 | 67 | 7.62 | 2024-11-14 11:36:22+00:00 | |
⭕ | 59.38 | 69.46 | 36.61 | 46.71 | 52 | 58.56 | 65.6 | 86.7 | false | true | ? | instruction-tuned | Original | cc-by-nc-4.0 | 613 | 10.73 | 2024-10-22 22:52:54+00:00 | |
🟦 | 58.83 | 68.24 | 39.03 | 50.82 | 53.89 | 56.98 | 58.2 | 84.68 | false | true | ? | preference-tuned | Original | llama3 | 411 | 8.03 | 2024-12-10 10:10:16+00:00 | |
🟦 | 58.22 | 70.51 | 43.15 | 51.11 | 55.7 | 55.65 | 49 | 82.45 | false | true | ? | preference-tuned | Original | llama3 | 254 | 8.03 | 2024-12-10 09:38:34+00:00 | |
🟦 | 57.45 | 70.83 | 39.39 | 52.35 | 54.99 | 54.48 | 53.2 | 76.91 | false | true | ? | preference-tuned | Original | other | 23 | 7.46 | 2024-12-19 05:59:29+00:00 | |
⭕ | 56.69 | 68.3 | 39.27 | 50.2 | 53.02 | 54.42 | 57 | 74.57 | false | true | ? | instruction-tuned | Original | apache-2.0 | 76 | 7.94 | 2024-10-25 09:59:22+00:00 | |
⭕ | 55.01 | 65.23 | 35.03 | 45.85 | 45.95 | 41.85 | 69.2 | 81.91 | false | true | ? | instruction-tuned | Original | other | 64 | 7.27 | 2024-11-18 11:47:16+00:00 | |
🟢 | 53.43 | 64.31 | 31.88 | 46.67 | 46.58 | 39.45 | 66.4 | 78.72 | false | false | ? | pretrained | Original | other | 211 | 7.27 | 2024-10-29 07:20:18+00:00 | |
⭕ | 53.42 | 65.06 | 34.79 | 46.31 | 49.25 | 50.63 | 45.8 | 82.13 | false | true | ? | instruction-tuned | Original | apache-2.0 | 1131 | 7.25 | 2024-11-14 11:38:25+00:00 | |
🟦 | 52.89 | 66.89 | 34.3 | 49.32 | 47.92 | 48.04 | 67.8 | 55.96 | false | false | ? | preference-tuned | Original | other | 26 | 3.09 | 2024-10-22 13:17:21+00:00 | |
🟦 | 52.61 | 69.82 | 38.55 | 47 | 50.51 | 55.28 | 49 | 58.09 | false | true | ? | preference-tuned | Original | other | 13 | 7.27 | 2024-12-19 06:00:16+00:00 | |
🟢 | 51.9 | 62.45 | 27.88 | 43.49 | 43.91 | 44.54 | 58.4 | 82.66 | false | false | ? | pretrained | Original | unknown | 210 | 11.1 | 2024-10-29 07:23:16+00:00 | |
⭕ | 50.45 | 64 | 35.15 | 42.55 | 44.23 | 46.93 | 61.8 | 58.51 | false | true | ? | instruction-tuned | Original | apache-2.0 | 41 | 6.06 | 2024-10-22 23:04:13+00:00 | |
🟢 | 50.06 | 64.82 | 38.67 | 49.96 | 55.07 | 52.18 | 38 | 51.7 | false | false | ? | pretrained | Original | llama3.1 | 1068 | 8.03 | 2024-11-14 07:33:20+00:00 | |
🟦 | 49.62 | 67.99 | 34.79 | 49.15 | 48.78 | 51.55 | 29.2 | 65.85 | false | true | ? | preference-tuned | Original | other | 58 | 3.09 | 2024-10-22 11:25:51+00:00 | |
🟦 | 48.41 | 59.48 | 26.42 | 42.22 | 44.3 | 41.85 | 57.8 | 66.81 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:08:22+00:00 | |
🟦 | 47.12 | 49.82 | 22.79 | 38.97 | 41.32 | 53.32 | 57 | 66.6 | false | true | ? | preference-tuned | Original | llama3.2 | 416 | 1.24 | 2024-10-24 07:45:03+00:00 | |
🟦 | 45.74 | 57.52 | 24.97 | 42.29 | 41.24 | 42.85 | 42.4 | 68.94 | false | true | ? | preference-tuned | Original | other | 12 | 3.23 | 2024-12-19 06:00:40+00:00 | |
⭕ | 42.39 | 50.61 | 18.67 | 37.37 | 36.06 | 28.19 | 69 | 56.81 | false | true | ? | instruction-tuned | Original | apache-2.0 | 346 | 1.71 | 2024-11-22 10:44:37+00:00 | |
🟦 | 39.87 | 50.65 | 22.91 | 36.19 | 33.15 | 30.81 | 48.6 | 56.81 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:08:42+00:00 | |
🟦 | 37.97 | 45.15 | 15.27 | 35.43 | 33.23 | 28.61 | 49.6 | 58.51 | false | true | ? | preference-tuned | Original | other | 21 | 1.67 | 2024-12-19 06:01:10+00:00 | |
🟦 | 34.6 | 31.17 | 13.21 | 32.54 | 29.69 | 24.09 | 55.2 | 56.28 | false | true | ? | preference-tuned | Original | null | 0 | -1 | 2025-01-22 17:09:02+00:00 | |
🟢 | 34.56 | 38.88 | 12.24 | 29.12 | 29.93 | 18.46 | 56.4 | 56.91 | false | false | ? | pretrained | Original | apache-2.0 | 101 | 0.49 | 2024-10-22 13:46:13+00:00 | |
🟦 | 28.62 | 22.64 | 10.55 | 25.1 | 28.83 | 21.2 | 35.2 | 56.81 | false | true | ? | preference-tuned | Original | apache-2.0 | 33 | 3.32 | 2024-12-10 10:39:41+00:00 | |
🟦 | 28.33 | 22.65 | 10.79 | 24.41 | 25.92 | 19.61 | 38 | 56.91 | false | true | ? | preference-tuned | Original | apache-2.0 | 60 | 0.36 | 2024-12-10 08:36:15+00:00 |
About
The MEDIC Leaderboard evaluates large language models (LLMs) on various healthcare tasks across five key dimensions. Designed to bridge the gap between stakeholder expectations and practical clinical applications, the MEDIC framework captures the interconnected capabilities LLMs need for real-world use. Its evaluation metrics objectively measure LLM performance on benchmark tasks and map results to the MEDIC dimensions. By assessing these dimensions, MEDIC aims to determine how effective and safe LLMs are for real-world healthcare settings.

Evaluation Categories
Close-ended Questions
This category measures the accuracy of an LLM's medical knowledge by having it answer multiple-choice questions from datasets like MedQA, MedMCQA, MMLU, MMLU Pro, PubMedQA, USMLE and Toxigen.
We used EleutherAI's LM Evaluation Harness framework, which scores the likelihood of the model generating each proposed answer rather than directly evaluating the generated text itself. We modified the framework's codebase to produce more detailed and relevant results: rather than just calculating the probability of generating the answer-choice labels (e.g., a., b., c., or d.), we calculate the probability of generating the full answer text.
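This likelihood-based scoring can be sketched as follows. This is a toy illustration, not the harness's actual API: `token_logprob` is a hypothetical stand-in for a real model's per-token log-probabilities.

```python
# Sketch of likelihood-based multiple-choice scoring: score the FULL
# answer text, not just the label "a."/"b.". `token_logprob` is a toy
# stand-in for a real language model's per-token log-probability.
import math

def token_logprob(context: str, token: str) -> float:
    # Toy heuristic: tokens already present in the context are "likelier".
    # A real implementation would query the LM for next-token log-probs.
    return math.log(0.5 if token in context else 0.1)

def sequence_logprob(context: str, answer: str) -> float:
    # Sum per-token log-probs of the full answer text conditioned on
    # the question; length normalization is a common variant.
    total = 0.0
    for tok in answer.split():
        total += token_logprob(context, tok)
        context += " " + tok
    return total

def pick_answer(question: str, choices: list) -> str:
    # The choice whose full text is most likely is the model's answer.
    return max(choices, key=lambda c: sequence_logprob(question, c))
```
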
Open-ended Questions
This category assesses the quality of the LLM's reasoning and explanations. The LLM is tasked with answering open-ended medical questions from various datasets:
Each question is presented to the models without special prompting to test their baseline capabilities. To compare models, we use a tournament-style approach: a judge (Llama3.1 70b Instruct) evaluates pairs of responses to the same question from different models. To eliminate position bias, each comparison is performed twice with the response positions reversed; if the winner changes when positions are swapped, we consider the responses too close to call and declare a tie. After multiple comparisons, we calculate win rates and convert them to Elo ratings to rank the models. Note that this evaluation only assesses the quality of the response writing, not its medical accuracy; properly evaluating clinical accuracy would require a thorough study involving real healthcare professionals.
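The swap-and-tie logic and the Elo conversion can be sketched like this. The `judge` callable is a hypothetical stand-in for the LLM judge, and the k-factor is an illustrative assumption, not the leaderboard's actual parameter:

```python
# Sketch of the tournament-style comparison: each pair is judged twice
# with positions swapped; if the verdict flips with position, it's a tie.
def pairwise_result(judge, question, resp_a, resp_b):
    first = judge(question, resp_a, resp_b)   # returns "first" or "second"
    second = judge(question, resp_b, resp_a)  # positions reversed
    win_a_pass1 = first == "first"
    win_a_pass2 = second == "second"
    if win_a_pass1 != win_a_pass2:
        return "tie"            # winner changed with position -> too close
    return "a" if win_a_pass1 else "b"

def elo_update(r_a, r_b, score_a, k=16):
    # Standard Elo update; score_a is 1 (win), 0.5 (tie), 0 (loss).
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    return (r_a + k * (score_a - expected_a),
            r_b + k * ((1 - score_a) - (1 - expected_a)))
```

A purely position-biased judge (one that always picks whichever response is shown first) produces a tie under this scheme, which is exactly the intent of the double comparison.
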
Medical Safety
The Medical Safety category uses the "med-safety" benchmark dataset, which consists of 4,500 scenarios presenting harmful medical requests. These scenarios cover all nine principles of medical ethics as defined by the American Medical Association (AMA). In this evaluation, an LLM acts as a judge: each response from the evaluated model is assigned a "harmfulness" score on a predefined scale. A low score means the model refused the harmful request; a high score means it complied with the request.
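Aggregating such judge scores might look like the sketch below. The 1-5 scale and the refusal threshold are assumptions for illustration; the benchmark's actual scale may differ:

```python
# Sketch: aggregate judge "harmfulness" scores into summary metrics.
# Scale and threshold are illustrative assumptions, not the benchmark's.
def summarize_safety(scores, refusal_threshold=2):
    refusals = sum(1 for s in scores if s <= refusal_threshold)
    return {
        "mean_harmfulness": sum(scores) / len(scores),  # lower is safer
        "refusal_rate": refusals / len(scores),         # higher is safer
    }
```
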
Medical Summarization
This category evaluates the LLM's ability to summarize medical texts, with a focus on clinical trial descriptions from ClinicalTrials.gov. The dataset consists of 1629 carefully selected clinical trial protocols with detailed study descriptions (3000-8000 tokens long). The task is to generate concise and accurate summaries of these protocols.
It uses a novel "cross-examination" framework, in which questions are generated from the original document and from the LLM's summary, and the answers are used to score the summary. The four key scores are:
- Coverage: Measures how thoroughly the summary covers the original document. A higher score means the summary includes more details from the original.
- Conformity: Also called the non-contradiction score, this checks if the summary avoids contradicting the original document. A higher score means the summary aligns better with the original.
- Consistency: Measures the level of non-hallucination, or how much the summary sticks to the facts in the document. A higher score means the summary is more factual and accurate.
- Conciseness: Measures how brief the summary is. A higher score means the summary is more concise. A negative score means the summary is longer than the original document.
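The cross-examination idea can be sketched as follows. Here `can_answer` is a toy keyword check standing in for a real QA model, and the score definitions are simplified illustrations of coverage, consistency, and conciseness as described above:

```python
# Sketch of cross-examination scoring: questions derived from the
# document and from the summary are answered against both texts.
def can_answer(question_keyword: str, text: str) -> bool:
    # Toy stand-in for a QA model: can the text answer this question?
    return question_keyword in text

def cross_examine(doc_questions, summary_questions, doc, summary,
                  doc_len, summary_len):
    # Coverage: fraction of document-derived questions the summary answers.
    coverage = sum(can_answer(q, summary) for q in doc_questions) / len(doc_questions)
    # Consistency: fraction of summary-derived questions the document
    # answers (summary content not grounded in the doc is hallucination).
    consistency = sum(can_answer(q, doc) for q in summary_questions) / len(summary_questions)
    # Conciseness: relative length reduction; negative if summary is longer.
    conciseness = 1 - summary_len / doc_len
    return {"coverage": coverage, "consistency": consistency,
            "conciseness": conciseness}
```
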
Note Generation
This category assesses the LLM's ability to generate structured clinical notes from doctor-patient conversations. It uses the same cross-examination framework as Medical Summarization across two datasets:
ACI-Bench: A comprehensive collection designed specifically for benchmarking clinical note generation from doctor-patient dialogues. The dataset contains patient visit notes that have been validated by expert medical scribes and physicians.
SOAP Notes: Using the test split of the ChartNote dataset containing 250 synthetic patient-doctor conversations generated from real clinical notes. The task involves generating notes in the SOAP format with the following sections:
- Subjective: Patient's description of symptoms, medical history, and personal experiences
- Objective: Observable data like physical exam findings, vital signs, and diagnostic test results
- Assessment: Healthcare provider's diagnosis based on subjective and objective information
- Plan: Treatment plan including medications, therapies, follow-ups, and referrals
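A request for a SOAP-format note might be framed as in the sketch below. This is a minimal illustrative prompt, not the benchmark's actual prompt:

```python
# Sketch: build a SOAP-note prompt from a patient-doctor conversation.
# The wording is illustrative, not the benchmark's actual prompt.
SOAP_SECTIONS = ("Subjective", "Objective", "Assessment", "Plan")

def soap_prompt(conversation: str) -> str:
    sections = "\n".join(f"- {s}:" for s in SOAP_SECTIONS)
    return (
        "Write a clinical note in SOAP format for the conversation below.\n"
        f"Sections:\n{sections}\n\nConversation:\n{conversation}"
    )
```
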
Currently, the benchmark supports evaluation of decoder-type models hosted on the Hugging Face Hub. It doesn't support adapter models yet, but we will add support for adapters soon.
Submission Guide for the MEDIC Benchmark
First Steps Before Submitting a Model
1. Ensure Your Model Loads with AutoClasses
Verify that you can load your model and tokenizer using AutoClasses:
from transformers import AutoConfig, AutoModel, AutoTokenizer

revision = "main"  # or the specific revision you are submitting
config = AutoConfig.from_pretrained("your model name", revision=revision)
model = AutoModel.from_pretrained("your model name", revision=revision)
tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision)
Note:
- If this step fails, debug your model before submitting.
- Ensure your model is public.
2. Convert Weights to Safetensors
Safetensors is a newer format for storing weights that is safer and faster to load and use. It will also allow us to add your model's parameter count to the Extended Viewer!
3. Complete Your Model Card
When we add extra information about models to the leaderboard, it will be automatically taken from the model card.
4. Select the correct model type
Choose the correct model category from the options below:
- 🟢 pretrained model: a new base model trained on a given text corpus using masked modelling, or a base model continuously trained on further corpora (which may include IFT/chat data) using masked modelling
- ⭕ fine-tuned model: a pretrained model fine-tuned on more data or tasks
- 🟦 preference-tuned model: a chat-like fine-tune, using IFT (datasets of task instructions), RLHF, or DPO (which slightly changes the model loss with an added policy), etc.
5. Select Correct Precision
Choose the right precision to avoid evaluation errors:
- Not all models convert properly from float16 to bfloat16.
- Incorrect precision can cause issues (e.g., loading a bf16 model in fp16 may generate NaNs).
- If you have selected auto, the precision specified under torch_dtype in the model config will be used.
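How "auto" resolves can be illustrated by reading torch_dtype straight out of a model's config.json (the file name and field are standard for transformers models; the plain-JSON parsing here is just an illustration):

```python
# Sketch: resolve the evaluation precision. If "auto" is requested,
# fall back to the torch_dtype recorded in the model's config.json.
import json

def resolve_precision(config_json: str, requested: str) -> str:
    if requested != "auto":
        return requested
    cfg = json.loads(config_json)
    # transformers defaults to float32 when torch_dtype is absent.
    return cfg.get("torch_dtype", "float32")
```
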
6. Medically oriented model
If the model has been specifically built for medical domains, i.e., pretrained or finetuned on significant amounts of medical data, make sure to check the Domain specific checkbox.
7. Chat template
Select this option if your model uses a chat template. The chat template will be used during evaluation.
- Before submitting, make sure the chat template is defined in the tokenizer config.
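A quick pre-submission check is to confirm the chat_template field exists in tokenizer_config.json. A real check would load the tokenizer with AutoTokenizer; the plain-JSON inspection below is a lightweight sketch:

```python
# Sketch: verify a chat template is defined before submitting.
# Inspects tokenizer_config.json content directly (a real check would
# load the tokenizer via AutoTokenizer and look at its chat_template).
import json

def has_chat_template(tokenizer_config_json: str) -> bool:
    cfg = json.loads(tokenizer_config_json)
    return bool(cfg.get("chat_template"))
```
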
Upon successful submission of your request, your model's results will appear on the leaderboard within 5 working days!