Guide for implementing parsers with error recovery for new languages in Biome. Use when creating parsers for JavaScript, CSS, JSON, HTML, GraphQL, or adding new language support. Examples:<example>User needs to add parsing support for a new language</example><example>User wants to implement error recovery in parser</example><example>User is writing grammar definitions in .ungram format</example>
Install
npx skillscat add biomejs/biome/parser-development Install via the SkillsCat registry.
Purpose
Use this skill when creating or modifying Biome's parsers. Covers grammar authoring with ungrammar, lexer implementation, error recovery strategies, and list parsing patterns.
Prerequisites
- Install required tools:
just install-tools - Understand the language syntax you're implementing
- Read
crates/biome_parser/CONTRIBUTING.mdfor detailed concepts
Common Workflows
Create Grammar for New Language
Create a .ungram file in xtask/codegen/ (e.g., html.ungram):
// html.ungram
// Legend:
// Name = -- non-terminal definition
// 'ident' -- token (terminal)
// A B -- sequence
// A | B -- alternation
// A* -- zero or more repetition
// (A (',' A)* ','?) -- repetition with separator and optional trailing comma
// A? -- zero or one repetition
// label:A -- suggested name for field
HtmlRoot = element*
HtmlElement =
'<'
tag_name: HtmlName
attributes: HtmlAttributeList
'>'
children: HtmlElementList
'<' '/' close_tag_name: HtmlName '>'
HtmlAttributeList = HtmlAttribute*
HtmlAttribute =
| HtmlSimpleAttribute
| HtmlBogusAttribute
HtmlSimpleAttribute =
name: HtmlName
'='
value: HtmlString
HtmlBogusAttribute = /* error recovery node */Naming conventions:
- Prefix all nodes with language name:
HtmlElement,CssRule - Unions start with
Any:AnyHtmlAttribute - Error recovery nodes use
Bogus:HtmlBogusAttribute - Lists end with
List:HtmlAttributeList - Lists are mandatory (never optional), empty by default
Generate Parser from Grammar
# Generate for specific language
just gen-grammar html
# Generate for multiple languages
just gen-grammar html css
# Generate all grammars
just gen-grammarThis creates:
biome_html_syntax/src/generated/- Node definitionsbiome_html_factory/src/generated/- Node construction helpers- Parser skeleton files (you'll implement the actual parsing logic)
Implement a Lexer
Create lexer/mod.rs in your parser crate:
use biome_html_syntax::HtmlSyntaxKind;
use biome_parser::{lexer::Lexer, ParseDiagnostic};
pub(crate) struct HtmlLexer<'source> {
source: &'source str,
position: usize,
current_kind: HtmlSyntaxKind,
diagnostics: Vec<ParseDiagnostic>,
}
impl<'source> Lexer<'source> for HtmlLexer<'source> {
const NEWLINE: Self::Kind = HtmlSyntaxKind::NEWLINE;
const WHITESPACE: Self::Kind = HtmlSyntaxKind::WHITESPACE;
type Kind = HtmlSyntaxKind;
type LexContext = ();
type ReLexContext = ();
fn source(&self) -> &'source str {
self.source
}
fn current(&self) -> Self::Kind {
self.current_kind
}
fn position(&self) -> usize {
self.position
}
fn advance(&mut self, context: Self::LexContext) -> Self::Kind {
// Implement token scanning logic
let start = self.position;
let kind = self.read_next_token();
self.current_kind = kind;
kind
}
// Implement other required methods...
}Implement Token Source
use biome_parser::lexer::BufferedLexer;
use biome_html_syntax::HtmlSyntaxKind;
use crate::lexer::HtmlLexer;
pub(crate) struct HtmlTokenSource<'src> {
lexer: BufferedLexer<HtmlSyntaxKind, HtmlLexer<'src>>,
}
impl<'source> TokenSourceWithBufferedLexer<HtmlLexer<'source>> for HtmlTokenSource<'source> {
fn lexer(&mut self) -> &mut BufferedLexer<HtmlSyntaxKind, HtmlLexer<'source>> {
&mut self.lexer
}
}Write Parse Rules
Example: Parsing an if statement:
use biome_parser::prelude::*;
use biome_js_syntax::JsSyntaxKind::*;
fn parse_if_statement(p: &mut JsParser) -> ParsedSyntax {
// Presence test - return Absent if not at 'if'
if !p.at(T![if]) {
return Absent;
}
let m = p.start();
// Parse required tokens
p.expect(T![if]);
p.expect(T!['(']);
// Parse required nodes with error recovery
parse_any_expression(p).or_add_diagnostic(p, expected_expression);
p.expect(T![')']);
parse_block_statement(p).or_add_diagnostic(p, expected_block);
// Parse optional else clause
if p.at(T![else]) {
parse_else_clause(p).ok();
}
Present(m.complete(p, JS_IF_STATEMENT))
}Parse Lists with Error Recovery
Use ParseSeparatedList for comma-separated lists:
struct ArrayElementsList;
impl ParseSeparatedList for ArrayElementsList {
type ParsedElement = CompletedMarker;
fn parse_element(&mut self, p: &mut Parser) -> ParsedSyntax<Self::ParsedElement> {
parse_array_element(p)
}
fn is_at_list_end(&self, p: &mut Parser) -> bool {
// Stop at array closing bracket or file end
p.at(T![']']) || p.at(EOF)
}
fn recover(
&mut self,
p: &mut Parser,
parsed_element: ParsedSyntax<Self::ParsedElement>,
) -> RecoveryResult {
parsed_element.or_recover(
p,
&ParseRecoveryTokenSet::new(
JS_BOGUS_EXPRESSION,
token_set![T![']'], T![,]]
),
expected_array_element,
)
}
fn separating_element_kind(&mut self) -> JsSyntaxKind {
T![,]
}
}
// Use the list parser
fn parse_array_elements(p: &mut Parser) -> CompletedMarker {
let m = p.start();
ArrayElementsList.parse_list(p);
m.complete(p, JS_ARRAY_ELEMENT_LIST)
}Implement Error Recovery
Error recovery wraps invalid tokens in BOGUS nodes:
// Recovery set includes:
// - List terminator tokens (e.g., ']', '}')
// - Statement terminators (e.g., ';')
// - List separators (e.g., ',')
let recovery_set = token_set![T![']'], T![,], T![;]];
parsed_element.or_recover(
p,
&ParseRecoveryTokenSet::new(JS_BOGUS_EXPRESSION, recovery_set),
expected_expression_error,
)Handle Conditional Syntax
For syntax only valid in certain contexts (e.g., strict mode):
fn parse_with_statement(p: &mut Parser) -> ParsedSyntax {
if !p.at(T![with]) {
return Absent;
}
let m = p.start();
p.bump(T![with]);
parenthesized_expression(p).or_add_diagnostic(p, expected_expression);
parse_statement(p).or_add_diagnostic(p, expected_statement);
let with_stmt = m.complete(p, JS_WITH_STATEMENT);
// Mark as invalid in strict mode
let conditional = StrictMode.excluding_syntax(p, with_stmt, |p, marker| {
p.err_builder(
"`with` statements are not allowed in strict mode",
marker.range(p)
)
});
Present(conditional.or_invalid_to_bogus(p))
}Test Parser
Create test files in tests/:
crates/biome_html_parser/tests/
├── html_specs/
│ ├── ok/
│ │ ├── simple_element.html
│ │ └── nested_elements.html
│ └── error/
│ ├── unclosed_tag.html
│ └── invalid_syntax.html
└── html_test.rsRun tests:
cd crates/biome_html_parser
cargo testTips
- Presence test: Always return
Absentif the first token doesn't match - never progress parsing before returningAbsent - Required vs optional: Use
p.expect()for required tokens,p.eat()for optional ones - Missing markers: Use
.or_add_diagnostic()for required nodes to add missing markers and errors - Error recovery: Include list terminators, separators, and statement boundaries in recovery sets
- Bogus nodes: Check grammar for which
BOGUS_*node types are valid in your context - Checkpoints: Use
p.checkpoint()to save state andp.rewind()if parsing fails - Lookahead: Use
p.at()to check tokens,p.nth_at()for lookahead beyond current token - Lists are mandatory: Always create list nodes even if empty - use
parse_list()notparse_list().ok()
Common Patterns
// Optional token
if p.eat(T![async]) {
// handle async
}
// Required token with error
p.expect(T!['{']);
// Optional node
parse_type_annotation(p).ok();
// Required node with error
parse_expression(p).or_add_diagnostic(p, expected_expression);
// Lookahead
if p.at(T![if]) || p.at(T![for]) {
// handle control flow
}
// Checkpoint for backtracking
let checkpoint = p.checkpoint();
if parse_something(p).is_absent() {
p.rewind(checkpoint);
parse_something_else(p);
}References
- Full guide:
crates/biome_parser/CONTRIBUTING.md - Grammar examples:
xtask/codegen/*.ungram - Parser examples:
crates/biome_js_parser/src/syntax/ - Error recovery: Search for
ParseRecoveryTokenSetin existing parsers