Leaderboard

We present the voting results with LLaVA-v1.5-13B as the anchor. Each entry denotes the win/tie/lose counts of the benchmarked model against LLaVA-v1.5-13B. See our paper for results under other evaluation protocols and anchors. Information on the benchmarked models is provided here.

| Rank | Model | Perception | Understanding | Applying | Analyzing | Evaluation | Creation | Win Rate over LLaVA-v1.5-13B |
|------|-------|------------|---------------|----------|-----------|------------|----------|------------------------------|
| 🏅️ | Claude-3 | 56/13/1 | 98/9/3 | 45/11/4 | 83/14/3 | 33/5/2 | 33/6/1 | 0.83 |
| 🥈 | GPT-4V | 56/10/4 | 101/6/3 | 29/12/19 | 73/22/5 | 33/2/5 | 2/0/38 | 0.70 |
| 🥉 | LLaVA-v1.6-34B | 46/17/7 | 78/22/10 | 36/15/9 | 61/28/11 | 33/3/4 | 24/10/6 | 0.66 |
| 4 | LLaVA-v1.6-Vicuna-13B | 40/21/9 | 65/33/12 | 35/19/6 | 51/26/23 | 33/5/2 | 27/9/4 | 0.60 |
| 5 | LLaVA-v1.6-Vicuna-7B | 31/25/14 | 56/37/17 | 26/23/11 | 40/31/29 | 22/10/8 | 19/10/11 | 0.46 |
| 6 | ALLaVA-3B-Longer | 22/21/27 | 57/30/23 | 23/17/20 | 44/30/26 | 16/10/14 | 17/12/11 | 0.43 |
| 7 | Gemini-1.0-Pro | 45/10/15 | 36/35/39 | 24/19/17 | 33/28/39 | 9/8/23 | 16/8/16 | 0.39 |
| 8 | Qwen-VL-Chat | 34/22/14 | 38/36/36 | 26/18/16 | 35/29/36 | 15/6/19 | 9/12/19 | 0.37 |
| 9 | LVIS | 22/28/20 | 32/39/39 | 11/27/22 | 33/36/31 | 14/9/17 | 9/16/15 | 0.29 |
| 10 | mPLUG-Owl2 | 16/24/30 | 30/34/46 | 17/17/26 | 23/38/39 | 15/8/17 | 11/14/15 | 0.27 |
| 11 | LLaVA-v1.5-7B | 19/22/29 | 27/47/36 | 13/29/18 | 21/43/36 | 9/14/17 | 8/13/19 | 0.23 |
| 12 | MiniGPT-v2 | 12/25/33 | 24/32/54 | 11/25/24 | 17/38/45 | 9/9/22 | 6/6/28 | 0.19 |
| 13 | InstructBLIP | 15/16/39 | 13/36/61 | 6/23/31 | 13/29/58 | 10/7/23 | 4/9/27 | 0.15 |
| 14 | Cheetor | 12/20/38 | 7/27/76 | 10/22/28 | 16/23/61 | 4/4/32 | 3/4/33 | 0.12 |
| 15 | SEED-LLaMA | 16/15/39 | 5/25/80 | 10/21/29 | 7/25/68 | 3/7/30 | 3/3/34 | 0.10 |
| 16 | kosmos2 | 6/22/42 | 6/18/86 | 6/15/39 | 10/20/70 | 1/4/35 | 2/3/35 | 0.07 |
| 17 | Yi-VL-6B | 4/17/49 | 8/22/80 | 5/27/28 | 5/29/66 | 3/9/28 | 3/9/28 | 0.07 |
| 18 | Fuyu-8B | 7/19/44 | 7/27/76 | 6/14/40 | 4/22/74 | 3/7/30 | 0/6/34 | 0.06 |
| 19 | LWM | 2/18/50 | 5/15/90 | 4/21/35 | 2/18/80 | 3/2/35 | 2/6/32 | 0.04 |
| 20 | OpenFlamingo | 8/13/49 | 2/8/100 | 3/14/43 | 2/21/77 | 1/2/37 | 1/5/34 | 0.04 |
| 21 | BLIP2 | 3/13/54 | 2/15/93 | 6/8/46 | 0/22/78 | 0/1/39 | 0/2/38 | 0.03 |
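
As a reading aid, the sketch below shows how the last column can be recomputed from the per-dimension win/tie/lose counts, assuming the win rate is the total number of wins divided by the total number of comparisons (wins + ties + losses); this assumption matches the Claude-3 and GPT-4V rows above. The helper name `win_rate` is illustrative, not part of the released codebase.

```python
# Recompute the win rate over LLaVA-v1.5-13B from per-dimension W/T/L counts.
# Assumption: win rate = total wins / total comparisons (ties count as non-wins).

def win_rate(wtl_per_dimension):
    """wtl_per_dimension: list of (win, tie, lose) tuples, one per dimension."""
    wins = sum(w for w, _, _ in wtl_per_dimension)
    total = sum(w + t + l for w, t, l in wtl_per_dimension)
    return wins / total

# Rows copied from the table above (Perception ... Creation).
claude_3 = [(56, 13, 1), (98, 9, 3), (45, 11, 4), (83, 14, 3), (33, 5, 2), (33, 6, 1)]
gpt_4v   = [(56, 10, 4), (101, 6, 3), (29, 12, 19), (73, 22, 5), (33, 2, 5), (2, 0, 38)]

print(f"Claude-3: {win_rate(claude_3):.2f}")  # 0.83
print(f"GPT-4V:   {win_rate(gpt_4v):.2f}")    # 0.70
```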