🤗 Welcome to the MLLM-Bench auto-evaluation platform! We offer a limited number of FREE evaluations using GPT-4V (once per day per email). 🤗
Please follow the instructions below to submit your answer file.
Submission Format
import json
# fp is the path to your answer file
with open(fp) as f:
    samples = json.load(f)
assert isinstance(samples, list)
assert len(samples) == 420, 'the length of the answer file should be 420.'
assert all(isinstance(s, dict) for s in samples)
# "id", "gen_model_id" and "answer" are required keys for each sample. Redundant keys have no effect on evaluation.
assert all(['id' in s for s in samples]), "'id' must be a key of every sample"
assert all(['gen_model_id' in s for s in samples]), "'gen_model_id' (the name of your model) must be a key of every sample"
assert all(['answer' in s for s in samples]), "'answer' must be a key of every sample"
assert sorted([s['id'] for s in samples]) == list(range(420)), 'ids must start from 0 and end at 419'
print('good to go!')
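The checks above can be satisfied by building the answer file as a list of 420 dicts, one per question id. Below is a minimal sketch; the model name ("my-model-v1"), the answer strings, and the output filename ("answers.json") are placeholders for illustration.

```python
import json

# Build 420 samples with the three required keys: "id", "gen_model_id",
# and "answer". Extra keys are allowed but ignored by the evaluator.
samples = [
    {
        "id": i,                         # ids must cover 0..419 exactly
        "gen_model_id": "my-model-v1",   # placeholder: your model's name
        "answer": f"answer for question {i}",  # placeholder answer text
    }
    for i in range(420)
]

# Write the answer file to upload.
with open("answers.json", "w") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)
```

The resulting `answers.json` passes every assertion in the format check above.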
Then submit your .json file by clicking the "Choose File" button.