# mlx-engine
Apple MLX LLM Engine for LM Studio

Built with:
- mlx-lm - Apple MLX inference engine (MIT)
- Outlines - Structured output for LLMs (Apache 2.0)
- mlx-vlm - Vision model inferencing for MLX (MIT)
LM Studio 0.3.4 and newer for Mac ships pre-bundled with mlx-engine. Download LM Studio from https://lmstudio.ai
To run a demo of model load and inference:
- Clone the repository
git clone https://github.com/lmstudio-ai/mlx-engine.git
cd mlx-engine
- Create a virtual environment (optional)
python -m venv .venv
source .venv/bin/activate
- Install the required dependency packages
pip install -U -r requirements.txt
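Before running the demo, you can optionally confirm that MLX is installed and sees the Apple GPU. The snippet below is not part of the repository, just a quick sanity check:

```python
# Optional sanity check (not part of mlx-engine): confirm MLX is installed
# and running on the Apple GPU before trying the demo.
import mlx.core as mx

print(mx.default_device())        # expected: Device(gpu, 0) on Apple Silicon
print(mx.metal.is_available())    # expected: True
```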
Run the demo.py script with an MLX text model:
python demo.py --model ~/.cache/lm-studio/models/mlx-community/Meta-Llama-3.1-8B-Instruct-4bit
Model download: mlx-community/Meta-Llama-3.1-8B-Instruct-4bit (4.53 GB)
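demo.py is a thin wrapper around the engine's Python API, so you can also call the engine directly. The sketch below mirrors the helpers demo.py imports from mlx_engine.generate; the keyword arguments and return types shown are assumptions, so check demo.py and mlx_engine/generate.py for the current signatures:

```python
# Rough sketch of what demo.py does; parameter names below are assumptions,
# so consult demo.py / mlx_engine/generate.py for the authoritative API.
from pathlib import Path

from mlx_engine.generate import load_model, tokenize, create_generator

model_path = Path.home() / ".cache/lm-studio/models/mlx-community/Meta-Llama-3.1-8B-Instruct-4bit"
prompt = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "How long will it take for an apple to fall from a 10m tree?"
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

model_kit = load_model(str(model_path), max_kv_size=4096, trust_remote_code=False)  # assumed kwargs
prompt_tokens = tokenize(model_kit, prompt)

# Assumes the generator yields objects exposing the newly generated text chunk.
for result in create_generator(model_kit, prompt_tokens, max_tokens=256):
    print(result.text, end="", flush=True)
```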
This command uses a default prompt formatted for Llama-3.1. For other models, pass a custom --prompt argument with the correct prompt formatting (one way to generate that string is sketched after the example below):
python demo.py --model ~/.cache/lm-studio/models/mlx-community/Mistral-Small-Instruct-2409-4bit --prompt "<s> [INST] How long will it take for an apple to fall from a 10m tree? [/INST]"
Model download: mlx-community/Mistral-Small-Instruct-2409-4bit (12.52 GB)
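Rather than hand-writing the special tokens, one option is to derive the prompt string from the chat template bundled with the model's tokenizer. The snippet below is a sketch of that approach using the Hugging Face transformers tokenizer, with the model path taken from the example above; it assumes the downloaded model directory includes a chat template (true for most mlx-community instruct conversions):

```python
# Sketch: build the --prompt string from the model's own chat template instead
# of hand-writing the special tokens.
import os
from transformers import AutoTokenizer

model_path = os.path.expanduser(
    "~/.cache/lm-studio/models/mlx-community/Mistral-Small-Instruct-2409-4bit"
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "How long will it take for an apple to fall from a 10m tree?"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)  # pass this string as the --prompt argument to demo.py
```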
Run the demo.py script with an MLX vision model:
python demo.py --model ~/.cache/lm-studio/models/mlx-community/pixtral-12b-4bit --prompt "<s>[INST]Compare these images[IMG][IMG][/INST]" --images demo-data/chameleon.webp demo-data/toucan.jpeg
Currently supported vision models and download links:
- Llama-3.2-Vision
- Pixtral
  - mlx-community/pixtral-12b-4bit - 7.15 GB
- Qwen2-VL
  - mlx-community/Qwen2-VL-2B-4bit - 1.26 GB
  - mlx-community/Qwen2-VL-7B-Instruct-4bit - 4.68 GB
- Llava-v1.6
  - mlx-community/llava-v1.6-mistral-7b-4bit - 4.26 GB