LLMON: An LLM-native Markup Language to Leverage Structure and Semantics at the LLM Interface
New 'Lemon' language distinguishes instructions from data to prevent prompt injection attacks.
A team of researchers from IBM, including Michael Hind and seven others, has proposed LLMON (LLM Object Notation, pronounced 'Lemon'), a new markup language designed specifically for interacting with large language models. The core problem LLMON addresses is the current flat, unstructured nature of text prompts. Most prompts mix instructions (like 'Summarize this') with the data to be processed, but LLMs receive this as a single, undifferentiated string. This lack of structure can confuse the model and creates security vulnerabilities, most notably prompt injection attacks where malicious data can override the original instructions.
LLMON introduces a formal way to embed structure and semantic metadata directly into the prompt. This allows developers to explicitly label different parts of the input, such as distinguishing system instructions from user data or marking sensitive content. The researchers argue this structured approach can be leveraged during model training, prompting, and inference, leading to tangible improvements in accuracy, safety, and security. They draw an analogy to programming language types, which enable static checking, better tooling, and runtime safety.
The 28-page paper provides preliminary empirical evidence supporting LLMON's value for both training and inference use cases. By giving the model a clearer understanding of the prompt's intent and components, it can more reliably follow instructions and resist manipulation. The work also opens up broader research opportunities for 'LLM-native' interfaces, suggesting a future where models are designed from the ground up to understand structured, semantically rich inputs rather than plain text.
- LLMON (LLM Object Notation) is a new markup language from IBM researchers that adds structure to LLM prompts.
- It explicitly separates instructions from data, aiming to prevent prompt injection attacks and reduce model confusion.
- Early evidence shows potential for improved model accuracy and security, analogous to type systems in programming languages.
Why It Matters
Provides a foundational method to make LLM interactions more secure, reliable, and structured, critical for enterprise deployment.