Tracking Source Dataset in MLFlow #1045

Closed
wants to merge 112 commits into from

Conversation

@KuuCi KuuCi commented Mar 21, 2024

This PR aims to properly log the original dataset in MLFlow for finetuning by tracking source datasets in the finetuning config and mapping each one to the appropriate MLFlow DataStore.
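
As a rough illustration of the mapping step described above, here is a minimal sketch that classifies a dataset path from a finetuning config into a coarse source type. It is not the code from this PR: the function name `classify_source`, the label strings, and the rule that a bare `org/name` string is a Hugging Face Hub id are all assumptions made for the example, and it does not call MLflow at all.

```python
from urllib.parse import urlparse

# Hypothetical labels for illustration only; these are not MLflow class
# names and are not taken from this PR.
_SCHEME_TO_SOURCE = {
    "s3": "s3",
    "gs": "gcs",
    "dbfs": "dbfs",
    "http": "http",
    "https": "http",
}


def classify_source(path: str) -> str:
    """Return a coarse source-type label for a dataset path."""
    scheme = urlparse(path).scheme.lower()
    if scheme in _SCHEME_TO_SOURCE:
        return _SCHEME_TO_SOURCE[scheme]
    if path.startswith(("/", ".")):
        # Absolute or relative filesystem path.
        return "local"
    # Assumption: anything else is a Hugging Face Hub id such as "org/name".
    return "huggingface"


if __name__ == "__main__":
    for p in ("s3://bucket/train.jsonl", "/tmp/train.jsonl", "org/dataset"):
        print(p, "->", classify_source(p))
```

A label like this could then be used to pick which MLflow dataset source to log, but that wiring depends on the MLflow version in use and is left out here.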

https://databricks.atlassian.net/browse/GRT-2724

Relevant PRs:
https://github.com/mosaicml/mcloud/pull/3723

@KuuCi KuuCi marked this pull request as draft March 21, 2024 00:09
@KuuCi KuuCi requested a review from dakinggg March 21, 2024 00:09
b-chu and others added 28 commits April 17, 2024 17:12
This adds support for the other common chat format. We just remap keys and add a new role.
#1111 needed to revert #1104 because the #1104 PR caused issues. Removing TODO and marking Jira with wont-do
* start

* still need to migrate fixtures

* wip onboarding tests

* still workin'

* still wip

* maybe done; test out on mcli now

* mcli

* remove calibration error

* migration

* migration

* full migration

* precommit

* fix

* fix pytests

* refactor QA

* update

* restore

* add

* fix

* wip

* update readme

* final pyright

* done

* pass prelimiter into ALL the ICL task datasets

* allow QA task name still for backward compatibility

* fix

* fix test

* add generation length

* remove max_new_tokens

* fix cpu tests

* try and fix lm eval test

* temp disable lm task eval test

* fix test?

* fix test

* finish

* fix

* Update scripts/eval/README.md

Co-authored-by: Daniel King <[email protected]>

* fix comments

* fix bug with seq len

* restore mcli

* merge

* fix builder

* add deprecation warning

* add deprecation warning

* merge

* merge

* add logging necessities to nlp.py

* add attention_mask test update

* fix generation_length in tests

* fix bug

* restore yamls

* fix typos

* add deprecation warning for code

* pyright wip

* fix pyright

* fix pyright error again

* fix pyright

* fix pyright

* update version

---------

Co-authored-by: Eitan Turok <[email protected]>
Co-authored-by: Max Marion <[email protected]>
Co-authored-by: Daniel King <[email protected]>
Co-authored-by: Max Marion <[email protected]>
@KuuCi KuuCi closed this Apr 22, 2024

KuuCi commented Apr 22, 2024

Moved to #1119
