1 change: 1 addition & 0 deletions docs/index.rst
@@ -149,6 +149,7 @@ Contents
    internals/layout_in_calldata.rst
    internals/variable_cleanup.rst
    internals/source_mappings.rst
+   internals/ethdebug_internal_metadata.rst
    internals/optimizer.rst
    metadata.rst
    abi-spec.rst
258 changes: 258 additions & 0 deletions docs/internals/ethdebug_internal_metadata.rst
@@ -0,0 +1,258 @@
.. index:: ethdebug, debug info, metadata

**************************
ETHDebug Internal Metadata
**************************

.. warning::

    ETHDebug support is experimental. The internal representation described here is
    intended for compiler development and may change before the public ETHDebug
    output is stabilized.

The compiler can emit debug information in the
`ethdebug format <https://github.com/ethdebug/format>`_. The JSON outputs are
validated against the upstream schemas, but the compiler does not construct that
JSON directly during the Solidity-to-Yul lowering. Instead, an internal
representation acts as the compiler-side carrier for semantic information that
must survive the Yul pipeline before it can be lowered into public ETHDebug type
and pointer entities.

This page describes the internal representation used for semantic debug metadata.
It complements :doc:`source_mappings`, which describe source ranges and bytecode
instruction mapping.

Overview
========

The internal ETHDebug metadata flow is:

1. Solidity analysis assigns AST IDs and type information to declarations. AST
   IDs are stable within a single compilation and are the internal join key used
   by this metadata pipeline.
2. The IR generator prints AST ID comments into generated Yul when AST ID debug
   info is enabled. Semantic metadata reattachment relies on those comments
   being present at Yul parse/reparse boundaries. Because of that, selecting
   ``ethdebug`` debug info in ``CompilerStack`` implicitly enables ``ast-id``
   debug info as well.
3. The compiler builds a side table keyed by Solidity AST ID.
4. Generated Yul is parsed and analyzed into a ``YulStack``.
5. Semantic metadata is attached to Yul ``DebugData`` objects when the AST ID
   in the Yul node's debug data has an entry in the side table.
6. If the Yul optimizer reparses optimized IR, semantic metadata is collected
   before reparse and reattached afterward by AST ID.

Because the AST ID is the join key, semantic information from the Solidity AST
can survive the text-based Yul print/parse boundary. The side table itself is
not serialized; the only part that crosses the Yul text boundary is the AST ID
comment in the Yul source.
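
The join described above can be sketched in a few lines. This is a minimal
illustration, not the compiler's code: ``Node``, ``SemanticData``, ``SideTable``,
and ``attachByAstID`` are hypothetical stand-ins for the Yul AST node debug data,
``SemanticDebugData``, ``SemanticDebugDataTable``, and the transfer helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins; the names and shapes are assumptions for illustration.
struct SemanticData { std::string scopeName; };
using SemanticDataPtr = std::shared_ptr<SemanticData const>;
using SideTable = std::map<std::int64_t, SemanticDataPtr>;

struct Node
{
    std::optional<std::int64_t> astID; // recovered from an AST ID comment, if any
    SemanticDataPtr semantic;          // empty right after parsing Yul text
};

// Attach a table entry to every node whose AST ID has one.
// Returns the number of nodes that received metadata.
inline std::size_t attachByAstID(std::vector<Node>& nodes, SideTable const& table)
{
    std::size_t attached = 0;
    for (Node& node: nodes)
    {
        if (!node.astID)
            continue; // no AST ID comment survived for this node
        auto const it = table.find(*node.astID);
        if (it == table.end())
            continue; // AST ID is known, but the table has no entry for it
        node.semantic = it->second;
        ++attached;
    }
    return attached;
}
```

Running the same function again after a reparse restores the metadata, because
the table lives outside the Yul text and only the AST IDs need to round-trip.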

Core Structures
===============

``langutil::DebugData`` is the common debug payload carried by Yul AST nodes.
For ETHDebug, it contains:

* the native Yul source location,
* the original Solidity source location,
* the optional Solidity AST ID,
* optional semantic debug metadata.

The semantic payload is represented by ``langutil::SemanticDebugData``. It
currently contains:

* ``lexicalScopeID``: the Solidity AST ID of the lexical scope represented by
  this metadata,
* ``variableDefinitions``: the variables introduced in that scope.

Each ``SemanticDebugVariable`` contains:

* ``name``: Solidity source-level name,
* ``declarationAstID``: AST ID of the Solidity declaration,
* ``declarationLocation``: Solidity source location of the declaration,
* ``typeID``: compiler-internal Solidity type identifier,
* ``ethdebugType``: ETHDebug-oriented type descriptor derived from the Solidity
  type,
* ``location``: initial internal variable location,
* ``ethdebugPointer``: ETHDebug-oriented pointer descriptor derived from the
  initial internal location.

The variable location is represented by ``SemanticDebugVariableLocation``. The
location kind can describe stack, storage, transient storage, memory, calldata,
immutable, constant, or optimized-out values. The current implementation
populates stack locations for function parameters and named return variables,
and storage locations for named state variables.
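
As a rough picture of how these records nest, the following sketch mirrors a
subset of the fields listed above. The field names ``lexicalScopeID``,
``variableDefinitions``, ``name``, ``declarationAstID``, ``typeID``, and
``location`` come from this page; every C++ type, the struct names, the enum
spelling, and the AST IDs 40 and 42 are assumptions made for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Assumed spelling of the location kinds named in the text above.
enum class LocationKind
{
    Stack, Storage, TransientStorage, Memory, Calldata, Immutable, Constant, OptimizedOut
};

struct VariableLocationSketch
{
    LocationKind kind = LocationKind::OptimizedOut;
    std::vector<std::string> stackSlots; // meaningful only for LocationKind::Stack
};

struct VariableSketch
{
    std::string name;
    std::int64_t declarationAstID = -1;
    std::string typeID;
    VariableLocationSketch location;
};

struct ScopeSketch
{
    std::int64_t lexicalScopeID = -1;
    std::vector<VariableSketch> variableDefinitions;
};

// Record for a function with one parameter `uint256 value`.
// The AST IDs (40 for the function, 42 for the parameter) are made up.
inline ScopeSketch exampleScope()
{
    VariableSketch value;
    value.name = "value";
    value.declarationAstID = 42;
    value.typeID = "t_uint256";
    value.location.kind = LocationKind::Stack;
    value.location.stackSlots = {"var_value_42"};

    ScopeSketch scope;
    scope.lexicalScopeID = 40;
    scope.variableDefinitions.push_back(value);
    return scope;
}
```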

Side Table
==========

``langutil::SemanticDebugDataTable`` maps Solidity AST IDs to
``SemanticDebugData`` instances.

This table is intentionally separate from the Yul AST because generated IR is
still passed through textual Yul at multiple points. The table lets the compiler
reattach semantic metadata whenever a Yul AST is reconstructed from text.
It is an in-memory compiler data structure, not a public output format.

The current table entries are keyed by lexical-scope AST IDs, such as function
or modifier AST IDs. Variable declaration AST IDs are stored inside
``SemanticDebugVariable`` records as declaration identities; they are not
top-level keys in the table.

The table is used in two places:

* ``CompilerStack`` builds the table from the Solidity contract and attaches it
  after generated IR is parsed into a ``YulStack``. This happens both for the
  freshly generated IR and when the optimized IR is reloaded from text for EVM
  code generation.
* ``YulStack`` retains the attached table as a member. ``reparse()`` merges
  metadata collected from the current Yul AST into the retained table before
  printing, then reattaches it to the new Yul AST by AST ID. Retaining the table
  also preserves entries that are not attached to any Yul node, such as the
  contract-scope storage metadata, which has no corresponding ``@ast-id``
  comment in generated Yul.

The transfer between the table and Yul ASTs is implemented in
``libyul/SemanticDebugDataTransfer.h``.

Current Producer
================

The current Solidity-side producer is
``frontend::buildSemanticDebugDataTable(ContractDefinition const&)``.

It records named state variables, named function parameters, named modifier
parameters, and named function return variables. Functions and modifiers are
collected from all linearized base contracts, because inherited definitions are
compiled into the most derived contract's IR with their original AST IDs. Free
functions from the contract's source unit and all recursively referenced source
units are collected as well.

For each variable it stores:

* the declaration name,
* the declaration AST ID,
* the declaration source location,
* the compiler type identifier, for example ``t_uint256``,
* an ETHDebug-oriented type descriptor, for example ``uint`` with ``bits = 256``,
* the initial location,
* an ETHDebug-oriented pointer descriptor for that initial location.

The stack pointer is based on ``IRVariable`` and therefore matches the names used
by generated IR, such as ``var_value_42`` for a Solidity variable named
``value`` with AST ID ``42``. Multi-slot variables use the stack slot list
produced by ``IRVariable``. In the ETHDebug-oriented pointer descriptor, one-slot
variables become stack region pointers and multi-slot variables become groups of
stack region pointers.
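
The naming pattern and the region/group distinction can be illustrated as
follows. This is a sketch only: ``stackSlotName``, ``StackPointerSketch``, and
``describeStackSlots`` are hypothetical helpers and do not reproduce the
``IRVariable`` class, which is the actual source of the slot names.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Name pattern quoted above: var_<name>_<AST ID>.
inline std::string stackSlotName(std::string const& name, std::int64_t astID)
{
    return "var_" + name + "_" + std::to_string(astID);
}

// Hypothetical descriptor: one slot is a single stack region,
// several slots form a group of stack regions.
struct StackPointerSketch
{
    bool isGroup;
    std::vector<std::string> slots;
};

inline StackPointerSketch describeStackSlots(std::vector<std::string> slots)
{
    bool const isGroup = slots.size() > 1;
    return StackPointerSketch{isGroup, std::move(slots)};
}
```

A one-slot variable such as ``var_value_42`` yields ``isGroup == false``, while
a variable with two slot names yields a group containing both.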

The storage pointer is based on the compiler's existing storage layout
calculation. Named state variables use contract-scope semantic metadata keyed by
the contract AST ID. Each storage variable records a storage region pointer with
the base slot, and for packed variables, the byte offset and byte length within
the slot.
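
To make the slot, byte offset, and byte length triple concrete, the sketch
below packs consecutive value types into 32-byte slots. It is a simplified
model written for this page, not the compiler's storage layout calculation:
``StorageRegionSketch`` and ``packValueTypes`` are hypothetical names, and the
model assumes every size is between 1 and 32 bytes and ignores structs, arrays,
mappings, and inheritance.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical region record: base slot plus byte offset and length within it.
struct StorageRegionSketch
{
    std::uint64_t slot;
    unsigned offset;
    unsigned length;
};

// Place value types of the given byte sizes one after another.
// An item starts a new slot when it no longer fits into the current one.
inline std::vector<StorageRegionSketch> packValueTypes(std::vector<unsigned> const& byteSizes)
{
    std::vector<StorageRegionSketch> regions;
    std::uint64_t slot = 0;
    unsigned offset = 0;
    for (unsigned size: byteSizes)
    {
        if (offset + size > 32)
        {
            ++slot;
            offset = 0;
        }
        regions.push_back(StorageRegionSketch{slot, offset, size});
        offset += size;
    }
    return regions;
}
```

For the sizes ``{16, 8, 32, 1}`` (for example ``uint128``, ``uint64``,
``uint256``, ``bool``), the first two items share slot 0 at offsets 0 and 16,
the third occupies slot 1, and the last starts slot 2.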

Current Scope
=============

This is not yet the complete ETHDebug variable model. The current scope is a
minimal internal carrier plus a first real producer, covering:

* function parameters,
* modifier parameters,
* named function return variables,
* named state variables,
* ETHDebug-oriented type descriptors for elementary scalar types and basic
  complex type categories,
* initial stack locations and ETHDebug-oriented stack pointer descriptors,
* initial storage locations and ETHDebug-oriented storage pointer descriptors.

Local variables, memory pointers, calldata pointers, transient storage pointers,
immutable and constant values, and detailed optimizer location updates are still
future work. The current implementation exports schema-valid elementary type
entries to ``ethdebug.resources.types`` and schema-valid storage pointer
templates to ``ethdebug.resources.pointers``. It does not yet add per-instruction
variable contexts to the public ETHDebug JSON output.

Type and Pointer Mapping
========================

The current ``typeID`` is the compiler's existing internal type identifier. This
is useful as a stable compiler-side key, but it is not the final ETHDebug type
schema representation.

The current ``ethdebugType`` descriptor is the first bridge from Solidity types
to the public ETHDebug type vocabulary. It records whether the type is
elementary, complex, or unknown, plus the ETHDebug kind. The first mapping covers
``uint``/``int`` bit widths, ``fixed``/``ufixed`` bit widths and decimal places,
``bool``, fixed and dynamic ``bytes``, ``string``, ``address`` payable-ness,
contracts, enums, aliases, tuples, arrays, mappings, and structs. Complex
descriptors do not yet recursively contain member, key, value, or element type
wrappers.
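
The correspondence between a compiler type identifier and an ETHDebug-oriented
descriptor can be illustrated for integers, using the ``t_uint256`` to ``uint``
with ``bits = 256`` example from this page. The sketch parses the identifier
string purely for demonstration; the compiler derives descriptors from Solidity
types, and ``TypeDescriptorSketch`` and ``describeInteger`` are hypothetical
names.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical descriptor holding the ETHDebug kind and the bit width.
struct TypeDescriptorSketch
{
    std::string kind;
    unsigned bits;
};

// Map identifiers of the form t_uint<N> or t_int<N> to a descriptor.
// Anything else yields std::nullopt in this simplified sketch.
inline std::optional<TypeDescriptorSketch> describeInteger(std::string const& typeID)
{
    for (std::string kind: {"uint", "int"})
    {
        std::string const prefix = "t_" + kind;
        if (typeID.compare(0, prefix.size(), prefix) != 0)
            continue;
        std::string const digits = typeID.substr(prefix.size());
        bool const allDigits = std::all_of(
            digits.begin(),
            digits.end(),
            [](unsigned char c) { return std::isdigit(c) != 0; }
        );
        if (digits.empty() || digits.size() > 3 || !allDigits)
            continue;
        return TypeDescriptorSketch{kind, static_cast<unsigned>(std::stoul(digits))};
    }
    return std::nullopt;
}
```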

Similarly, the current stack ``pointerID`` is an internal pointer into generated
Yul stack slots. The current ``ethdebugPointer`` descriptor is the first bridge
from these internal locations to the public ETHDebug pointer vocabulary. It
records stack region pointers for one-slot variables and pointer groups for
multi-slot variables. The stack slot expression is still the generated Yul
variable name, not a runtime stack depth, so stack pointer descriptors remain
internal for now.

The public resource exporter currently emits schema-valid elementary type
descriptors to ``ethdebug.resources.types``, keyed by compiler type ID, for
example ``t_uint256``. It also emits storage pointer templates to
``ethdebug.resources.pointers`` for named state variables. Storage pointer keys
are compiler-generated pointer IDs, and the template body contains the storage
slot plus optional byte offset and length for packed values.

Future work should define:

* memory, calldata, transient, immutable, constant, and optimized-out variable
  location to ETHDebug pointer mapping,
* complete recursive storage pointer mapping for structured values,
* optimizer rules for updating, splitting, merging, or removing variable
  locations,
* schema-validated emission of per-instruction variable contexts.

Optimizer Update Rules
======================

The first implemented optimizer rule is conservative and runs whenever semantic
metadata is attached to a Yul AST, including across the Yul text reparse
boundary and when optimized IR is reloaded from text for EVM code generation:

* semantic metadata is collected before reparse and reattached afterward by AST
  ID,
* surviving Yul variable names are collected separately for each object in the
  Yul object tree, so a variable that only survives in the creation code is
  still marked ``OptimizedOut`` in the deployed code, and vice versa,
* stack-backed semantic variables keep their locations if all referenced stack
  slots still exist in the object the metadata is attached to,
* stack-backed semantic variables become ``OptimizedOut`` if any referenced
  stack slot no longer exists there.

This avoids reporting stale stack locations after an optimizer pass removes the
generated Yul variables that originally held a Solidity value.

The current rule does not yet infer new locations. If an optimizer pass renames,
splits, merges, inlines, or rematerializes a value, the compiler must eventually
provide an explicit debug-location update. Until those mappings exist, losing the
old stack slot is treated as ``OptimizedOut`` rather than guessed.
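
The conservative rule can be sketched as a pure function over one Yul object's
surviving variable names. ``LocationSketch`` and ``applyConservativeRule`` are
hypothetical names, and only the two location kinds relevant to the rule are
modeled here.

```cpp
#include <set>
#include <string>
#include <vector>

enum class KindSketch { Stack, OptimizedOut };

// Hypothetical location record: a kind plus the referenced Yul variable names.
struct LocationSketch
{
    KindSketch kind;
    std::vector<std::string> slots;
};

// Keep a stack location only if every referenced slot still exists in the
// object the metadata is attached to; otherwise mark it OptimizedOut.
inline LocationSketch applyConservativeRule(
    LocationSketch location,
    std::set<std::string> const& survivingNames
)
{
    if (location.kind != KindSketch::Stack)
        return location; // the rule only concerns stack-backed variables
    for (std::string const& slot: location.slots)
        if (survivingNames.count(slot) == 0)
            return LocationSketch{KindSketch::OptimizedOut, {}};
    return location;
}
```

Passing a separate ``survivingNames`` set per object reflects the per-object
collection described above: the same variable can keep its location in the
creation code and be ``OptimizedOut`` in the deployed code.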

Testing
=======

The internal metadata plumbing is covered by focused tests:

* ``DebugDataTest`` checks that ``DebugData`` can carry semantic metadata and
  that the AST-ID side table resolves it.
* ``YulDebugDataTest`` checks that semantic metadata survives Yul reparse by AST
  ID and that missing stack locations are marked ``OptimizedOut``.
* ``SemanticDebugDataTest`` checks that Solidity function variables produce
  semantic metadata with declaration IDs, type IDs, ETHDebug-oriented type
  descriptors, initial stack locations, and ETHDebug-oriented pointer
  descriptors. It also checks state-variable storage pointer descriptors and
  verifies that function variable metadata can be attached to generated Yul and
  survives the Yul reparse path.

These tests deliberately target the internal model. Schema validation tests cover
the public ETHDebug JSON output separately, including the exported storage
pointer templates.
11 changes: 8 additions & 3 deletions libevmasm/Ethdebug.cpp
@@ -204,12 +204,17 @@ Json ethdebug::program(std::string_view _name, unsigned _sourceID, Assembly cons
 	};
 }

-Json ethdebug::resources(std::vector<Source> const& _sources, std::string_view _version)
+Json ethdebug::resources(
+	std::vector<Source> const& _sources,
+	std::string_view _version,
+	Json _types,
+	Json _pointers
+)
 {
 	schema::info::Resources result;
 	result.compilation = materialCompilation(_sources, _version);
-	result.types = Json::object();
-	result.pointers = Json::object();
+	result.types = std::move(_types);
+	result.pointers = std::move(_pointers);
 	return result;
 }

7 changes: 6 additions & 1 deletion libevmasm/Ethdebug.h
@@ -38,7 +38,12 @@ struct Source
 Json program(std::string_view _name, unsigned _sourceID, Assembly const& _assembly, LinkerObject const& _linkerObject);

 // returns ethdebug/format/info/resources
-Json resources(std::vector<Source> const& _sources, std::string_view _version);
+Json resources(
+	std::vector<Source> const& _sources,
+	std::string_view _version,
+	Json _types = Json::object(),
+	Json _pointers = Json::object()
+);

 // returns the 'compilation' object from ethdebug/format/info/resources
 Json compilation(std::vector<Source> const& _sources, std::string_view _version);
2 changes: 2 additions & 0 deletions liblangutil/CMakeLists.txt
@@ -17,6 +17,8 @@ set(sources
 	Scanner.cpp
 	Scanner.h
 	CharStreamProvider.h
+	SemanticDebugData.h
+	SemanticDebugDataTable.h
 	SemVerHandler.cpp
 	SemVerHandler.h
 	SourceLocation.h
16 changes: 12 additions & 4 deletions liblangutil/DebugData.h
@@ -18,9 +18,11 @@

 #pragma once

+#include <liblangutil/SemanticDebugData.h>
 #include <liblangutil/SourceLocation.h>
 #include <optional>
 #include <memory>
+#include <utility>

 namespace solidity::langutil
 {
@@ -32,23 +34,27 @@ struct DebugData
 	explicit DebugData(
 		langutil::SourceLocation _nativeLocation = {},
 		langutil::SourceLocation _originLocation = {},
-		std::optional<int64_t> _astID = {}
+		std::optional<int64_t> _astID = {},
+		SemanticDebugData::ConstPtr _semanticDebugData = {}
 	):
 		nativeLocation(std::move(_nativeLocation)),
 		originLocation(std::move(_originLocation)),
-		astID(_astID)
+		astID(_astID),
+		semanticDebugData(std::move(_semanticDebugData))
 	{}

 	static DebugData::ConstPtr create(
 		langutil::SourceLocation _nativeLocation,
 		langutil::SourceLocation _originLocation = {},
-		std::optional<int64_t> _astID = {}
+		std::optional<int64_t> _astID = {},
+		SemanticDebugData::ConstPtr _semanticDebugData = {}
 	)
 	{
 		return std::make_shared<DebugData>(
 			std::move(_nativeLocation),
 			std::move(_originLocation),
-			_astID
+			_astID,
+			std::move(_semanticDebugData)
 		);
 	}

@@ -65,6 +71,8 @@ struct DebugData
 	langutil::SourceLocation originLocation;
 	/// ID in the (Solidity) source AST.
 	std::optional<int64_t> astID;
+	/// Extended semantic debug data that cannot be represented in Yul comments.
+	SemanticDebugData::ConstPtr semanticDebugData;
 };

 } // namespace solidity::langutil