Vector Embedding Markup Language (VEML) is a markup language designed specifically for annotating and structuring data related to vector embeddings. It can be used to represent, exchange, or store vector embeddings in a structured form that is easy for both humans and machines to read.
We have been running the IngestAI project since February 2023, and in that time thousands of our users have reported issues. Almost all of these issues came down to the dataset structure and the ability to influence vector search results.
A VEML file is saved with the .veml extension and has the following structure:
- "html": an array holding the raw HTML of each chunk, used to present the chunk to users.
- "tokens": an array holding the plain text of each chunk, which is the part that gets embedded.
- "vectors": an array of embeddings for every chunk (can be empty if the chunk was disabled).
- "meta": an array of meta information for every chunk, made up of strings in key:value form, e.g. link:https://wikipedia.com
The structure of a VEML file is defined in the schema.json file in this repository, and the examples folder contains sample files.
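To make the four arrays concrete, here is a minimal Python sketch. It rests on assumptions that schema.json should be checked against: that a .veml file is serialized as JSON, that the arrays are aligned by chunk index, that a disabled chunk carries an empty list in "vectors", and that each "meta" entry is a list of key:value strings. The helpers `parse_meta` and `check_veml` and the sample values are hypothetical and are not part of VEML or Embedditor.

```python
import json


def parse_meta(entries):
    """Turn ["key:value", ...] into a dict, splitting on the first colon only
    so that values such as URLs keep their own colons."""
    result = {}
    for entry in entries:
        key, _, value = entry.partition(":")
        result[key.strip()] = value.strip()
    return result


def check_veml(doc):
    """Check that the four arrays exist and have one entry per chunk.
    Returns the number of chunks. (Assumed invariant, not taken from schema.json.)"""
    keys = ("html", "tokens", "vectors", "meta")
    missing = [k for k in keys if k not in doc]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    lengths = {k: len(doc[k]) for k in keys}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"arrays differ in length: {lengths}")
    return lengths["html"]


# Hypothetical two-chunk document; the vector values are made up.
doc = {
    "html": ["<p>Vector search basics.</p>", "<p>Draft note.</p>"],
    "tokens": ["Vector search basics.", "Draft note."],
    "vectors": [[0.12, -0.03, 0.88], []],  # second chunk assumed disabled
    "meta": [["link:https://wikipedia.com"], []],
}

# Round-trip through JSON text, as if reading the content of a .veml file.
loaded = json.loads(json.dumps(doc))
n_chunks = check_veml(loaded)
first_meta = parse_meta(loaded["meta"][0])
```

Splitting on the first colon only matters for entries like the link example above, where the value itself contains a colon.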
The implementation of VEML offers numerous advantages, such as:
- Standardization: VEML provides a standardized format for pre-processing and editing vector embeddings.
- Interoperability: It ensures better interoperability among different applications and systems that utilize vector embeddings.
- Extensibility: Just like XML, VEML has the potential to be extensible, allowing users to add new tags and attributes to represent additional properties or metadata associated with the vector embeddings.
- Machine Readability: A well-defined markup language is also easy to parse programmatically, so software applications can process and manipulate vector embeddings efficiently.
We understand that developing a markup language without an app that supports it is not a good idea, so we created an open-source tool called Embedditor. You can download it from GitHub or Docker and run it on your local server to start working with VEML files and the editor.