
Finetuning Mistral with deepspeed #101

Open
achangtv opened this issue Jan 29, 2024 · 2 comments

Comments

@achangtv

In fine_tune_deepspeed.py, the first part of the load_training_dataset function looks like this:

def load_training_dataset(
    tokenizer,
    path_or_dataset: str = DEFAULT_TRAINING_DATASET,
    max_seq_len: int = 256,
    seed: int = DEFAULT_SEED,
) -> Dataset:
    logger.info(f"Loading dataset from {path_or_dataset}")
    dataset = load_dataset(path_or_dataset)
    logger.info(f"Training: found {dataset['train'].num_rows} rows")
    logger.info(f"Eval: found {dataset['test'].num_rows} rows")

The way this function is written, it seems like I have to pass in a path to a Hugging Face dataset. Because this is in Databricks, I would like to pass in a Spark DataFrame, but load_dataset doesn't accept PySpark DataFrames, so I edited the line to read dataset = Dataset.from_spark(path_or_dataset). That gave me the error pyspark.errors.exceptions.base.PySparkRuntimeError: [MASTER_URL_NOT_SET] A master URL must be set in your configuration. You also cannot pass an already-created Dataset object to load_dataset(). Should I just change the code to dataset = path_or_dataset? Or should I keep the code as-is and pass in a DBFS path to a dataset object?
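One way to keep both call styles working is to branch on the input type before loading. This is only a sketch: the helper name resolve_loader and the duck-typed sparkSession check are my own, not from the repo. The branch logic itself is plain Python, so it runs without Spark installed:

```python
def resolve_loader(path_or_dataset):
    """Pick a loading strategy for load_training_dataset's input.

    Hypothetical helper: Spark DataFrames are detected by duck typing
    on the `sparkSession` attribute (present on pyspark.sql.DataFrame
    in recent Spark versions), so pyspark need not be importable here.
    """
    if hasattr(path_or_dataset, "sparkSession"):
        return "from_spark"    # datasets.Dataset.from_spark(path_or_dataset)
    if isinstance(path_or_dataset, str):
        return "load_dataset"  # datasets.load_dataset(path_or_dataset)
    return "passthrough"       # assume it is already a datasets object
```

Inside load_training_dataset, this dispatch would replace the unconditional load_dataset(path_or_dataset) call. Note that Dataset.from_spark returns a single Dataset rather than a DatasetDict, so the dataset['train'] / dataset['test'] lookups that follow would also need a train_test_split step.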

@es94129
Contributor

es94129 commented Jan 31, 2024

If you would like to pass in a Spark dataframe, dataset = Dataset.from_spark(df) looks good to me.

Regarding the PySparkRuntimeError, are you running the code in Databricks? Databricks should set the Spark master for you.

@achangtv
Author

achangtv commented Jan 31, 2024

I am running the code in Databricks, although I cloned the repo, so I am running it from Repos rather than the Workspace. Should I copy the whole folder into the Workspace? Or maybe the problem is the type of compute? I was using multi-GPU compute with an ML runtime; I can try again with a single-GPU setup.
