Multimodal prototyping #2243
…_image still WIP)
…I/lm-evaluation-harness into multimodal-prototyping
@baberabb and I are merging this, though we'll continue iterating on the model/task design from here!
`mmmu_val` scores on a few models we specifically used for testing during development can be found in the MMMU-specific readme. Scores tend to match or slightly exceed the `lmms-eval` implementation, although they don't always match the model authors' reported scores (which don't have code published).
* add WIP hf vlm class
* add doc_to_image
* add mmmu tasks
* fix merge conflicts
* add lintang's changes to hf_vlms.py
* fix doc_to_image
* added yaml_path for config-loading
* revert
* add line to process str type v
* update
* modeling cleanup
* add aggregation for mmmu
* rewrite MMMU processing code based on only MMMU authors' repo (doc_to_image still WIP)
* implemented doc_to_image
* update doc_to_image to accept list of features
* update functions
* readd image processed
* update args process
* bugfix for repeated images fed to model
* push WIP loglikelihood code
* commit most recent code (generative ; qwen2-vl testing)
* preliminary image_token_id handling
* small mmmu update: some qs have >4 mcqa options
* push updated modeling code
* use processor.apply_chat_template
* add mathvista draft
* nit
* nit
* ensure no footguns in text<>multimodal LM<>task incompatibility
* add notification to readme regarding launch of prototype!
* fix compatibility check
* reorganize mmmu configs
* chat_template=None
* add interleave chat_template
* add condition
* add max_images; interleave=true
* nit
* testmini_mcq
* nit
* pass image string; convert img
* add vllm
* add init
* vlm add multi attr
* fixup
* pass max images to vllm model init
* nit
* encoding to device
* fix HFMultimodalLM.chat_template ?
* add mmmu readme
* remove erroneous prints
* use HFMultimodalLM.chat_template ; restore tasks/__init__.py
* add docstring for replace_placeholders in utils
* fix `replace_placeholders`; set image_string=None
* fix typo
* cleanup + fix merge conflicts
* update MMMU readme
* del mathvista
* add some sample scores
* Update README.md
* add log msg for image_string value

---------

Co-authored-by: haileyschoelkopf <[email protected]>
Co-authored-by: Baber Abbasi <[email protected]>
Co-authored-by: Baber <[email protected]>
Co-authored-by: Hailey Schoelkopf <[email protected]>
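Several commits above mention `max_images`, `image_string`, and `replace_placeholders`. For readers unfamiliar with the idea, here is a minimal sketch of what capping image placeholders in a prompt could look like. It is not the code from this PR: the function name `cap_image_placeholders`, its signature, and the `<img_tok>` token are hypothetical, chosen only for illustration. The assumed behaviour is that the first `max_images` placeholders are swapped for the model's image token and any further placeholders are dropped.

```python
import re


def cap_image_placeholders(
    text: str, placeholder: str, image_token: str, max_images: int
) -> str:
    """Hypothetical helper (not the PR's implementation).

    Replaces the first `max_images` occurrences of `placeholder` with
    `image_token` and removes every later occurrence.
    """
    seen = 0

    def _swap(match: re.Match) -> str:
        nonlocal seen
        seen += 1
        # Keep an image slot only while we are under the cap.
        return image_token if seen <= max_images else ""

    # re.escape so placeholders like "<image>" are matched literally.
    return re.sub(re.escape(placeholder), _swap, text)


if __name__ == "__main__":
    prompt = "<image> Compare with <image> and <image>."
    # "<img_tok>" stands in for whatever token a given model expects.
    print(cap_image_placeholders(prompt, "<image>", "<img_tok>", max_images=2))
```

A cap like this keeps the number of image tokens in the prompt consistent with the number of images actually passed to the model, which is the kind of mismatch the "repeated images fed to model" fix appears to address.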
No description provided.