Using sdb as a library

sdb is not just a standalone debugger – it exposes a Python API that lets you embed an sdb REPL inside your own tool or run sdb pipelines programmatically. A real-world example of this is GhostWire, a remote kernel introspection tool for NVIDIA BlueField DPUs that builds a drgn.Program backed by TCP (or RDMA) memory reads and then drops the user into sdb.

`sdb.open_dump()` – the fastest way to get started

For crash/core dump analysis, open_dump() is the simplest entry point. It creates the drgn.Program, loads debug info, and initialises the sdb runtime in a single call:

import sdb

sdb.open_dump("vmlinux", "vmcore")

pools = sdb.run("spa | member spa_name")
for p in pools:
    print(p.string_().decode())

You can also pass extra symbol search paths and custom command directories:

sdb.open_dump(
    "vmlinux", "vmcore",
    symbol_search=["/lib/modules/6.1.0/extra"],
    command_paths=["/opt/mytools/sdb_commands"],
)

`sdb.connect()` and `sdb.run()` – the programmatic API

If you need more control over how the drgn.Program is created (e.g. for live kernel debugging, remote memory targets, or custom memory readers), use connect() + run():

import drgn, sdb

prog = drgn.Program()
prog.set_core_dump("vmcore")
prog.load_debug_info(["vmlinux"])

sdb.connect(prog)

# Run a pipeline and get results as a list
pools = sdb.run("spa | member spa_name")
for p in pools:
    print(p.string_().decode())

# Count threads
count = sdb.run("threads | count")
print(f"Thread count: {count[0].value_()}")

sdb.run() evaluates the pipeline eagerly and returns a concrete list. For lazy evaluation (e.g. when processing very large result sets), use sdb.invoke(), which returns a generator:

for obj in sdb.invoke([], "spa | member spa_name"):
    name = obj.string_().decode()
    print(f"Pool: {name}")
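
Because a generator only does work as items are requested, a caller can stop early without evaluating the rest of the pipeline. The sketch below illustrates this with itertools.islice. To run without a debug target it uses fake_invoke, a hypothetical stand-in for sdb.invoke(); first_n is likewise an illustrative helper and not part of sdb.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def first_n(results: Iterable[T], n: int) -> List[T]:
    """Return at most n items, pulling no more than n from the source."""
    return list(islice(results, n))


def fake_invoke(produced: List[int]) -> Iterator[int]:
    """Hypothetical stand-in for sdb.invoke(); records each item it yields."""
    for i in range(1_000_000):
        produced.append(i)
        yield i


produced: List[int] = []
head = first_n(fake_invoke(produced), 3)
print(head)           # the items that were requested
print(len(produced))  # how many items the generator actually produced
```

In a connected session the same helper could presumably wrap a real pipeline, for example first_n(sdb.invoke([], "spa | member spa_name"), 3).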

Jupyter notebook integration

The sdb API works naturally in Jupyter notebooks:

# Cell 1: Setup
import sdb
sdb.open_dump("/path/to/vmlinux", "/path/to/vmcore")

# Cell 2: Explore
tasks = sdb.run("threads")
print(f"Found {len(tasks)} threads")

# Cell 3: Drill down
names = sdb.run("threads | member comm")
for name in names:
    print(name.string_().decode())

Multiple sdb.run() calls work independently – each one parses and evaluates a fresh pipeline. You can also pass the output of one pipeline as input to another:

interesting = sdb.run("threads | filter 'obj.state.value_() != 0'")
stacks = sdb.run("stacks", input_objs=interesting)

The `sdb.start()` entry point (REPL)

For interactive use, hand sdb a drgn.Program and let it run the REPL:

import drgn
import sdb

prog = drgn.Program()
prog.set_kernel()
prog.load_default_debug_info()

sdb.start(prog)

sdb.start() accepts several optional parameters:

sdb.start(
    prog,
    command_paths=["/path/to/my/commands"],  # extra command directories
    prompt="mydb> ",                          # custom REPL prompt
    pre_cmd_hook=my_refresh_fn,               # called before each command
    eval_cmd="stacks | count",                # run one command and exit
    history_file="~/.mydb_history",           # readline history path
)

Running pipelines with `sdb.invoke()`

sdb.invoke() is the lower-level API that returns a generator:

import sdb

# After sdb.connect() or sdb.start()
for obj in sdb.invoke([], "spa | member spa_name"):
    name = obj.string_().decode()
    print(f"Pool: {name}")

sdb.invoke() yields drgn.Object values. Its first argument is the initial input to the pipeline; pass an empty list to start from scratch.

You can also build and execute pipelines at a lower level:

from sdb.pipeline import execute_pipeline
from sdb.command import get_registered_commands

cmds = get_registered_commands()
pipeline = [cmds["spa"]([], "spa"), cmds["count"]([], "count")]
pipeline[0].isfirst = True
pipeline[-1].islast = True

for obj in execute_pipeline([], pipeline):
    print(obj)

The `pre_cmd_hook` pattern

Many tools that embed sdb maintain caches or transport state that must be refreshed between commands. The pre_cmd_hook parameter solves this.

It accepts any callable that takes no arguments. sdb calls it immediately before evaluating each REPL command (but not before meta-commands such as %session).

A typical pattern for remote debuggers:

sdb.start(
    prog,
    command_paths=[sdb_commands_dir],
    prompt="sdb[remote]> ",
    pre_cmd_hook=transport.bump_generation,
)

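Here transport.bump_generation is the embedding tool's own cache-invalidation method, not part of sdb. The effect is that each new command sees fresh host memory, while reads within a single command's execution are cached for performance.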
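
The caching behaviour described above can be sketched as a small generation-based cache. This is a minimal illustration under stated assumptions, not GhostWire's or sdb's implementation: the GenerationCache class, its read() method and the slow_read backend are hypothetical names, and only bump_generation mirrors the name used in the snippet.

```python
from typing import Callable, Dict, List, Tuple


class GenerationCache:
    """Cache reads until the generation is bumped (hypothetical sketch)."""

    def __init__(self, read_phys: Callable[[int, int], bytes]) -> None:
        self._read_phys = read_phys
        self.generation = 0
        self._cache: Dict[Tuple[int, int], bytes] = {}

    def bump_generation(self) -> None:
        """Zero-argument hook: drop everything cached for the last command."""
        self.generation += 1
        self._cache.clear()

    def read(self, address: int, size: int) -> bytes:
        """Return cached bytes if present, otherwise ask the backend."""
        key = (address, size)
        if key not in self._cache:
            self._cache[key] = self._read_phys(address, size)
        return self._cache[key]


# Stand-in backend that records how often it is actually called.
calls: List[Tuple[int, int]] = []


def slow_read(address: int, size: int) -> bytes:
    calls.append((address, size))
    return bytes(size)


transport = GenerationCache(slow_read)
transport.read(0x1000, 8)
transport.read(0x1000, 8)    # served from the cache
transport.bump_generation()  # what sdb would call before the next command
transport.read(0x1000, 8)    # fetched from the backend again
print(len(calls))
```

Clearing the whole cache on each bump is the simplest policy. A real transport might instead tag entries with the generation and evict them lazily.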

Typical use cases:

Invalidating a memory cache so each command sees fresh data.
Reconnecting a transport layer if the connection was dropped.
Refreshing authentication tokens for a remote target.
Logging or telemetry (count commands, measure latency).
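
Since pre_cmd_hook takes a single callable, a tool that needs several of these behaviours can combine them into one. The sketch below shows one way to do that. chain_hooks and CommandCounter are hypothetical helpers written for this example, and the lambda stands in for a real cache-invalidation method.

```python
import time
from typing import Callable, List


def chain_hooks(*hooks: Callable[[], None]) -> Callable[[], None]:
    """Combine several zero-argument hooks into one that runs them in order."""
    def run_all() -> None:
        for hook in hooks:
            hook()
    return run_all


class CommandCounter:
    """Telemetry hook: counts commands and records when each one started."""

    def __init__(self) -> None:
        self.count = 0
        self.started_at: List[float] = []

    def __call__(self) -> None:
        self.count += 1
        self.started_at.append(time.monotonic())


events: List[str] = []
counter = CommandCounter()
hook = chain_hooks(lambda: events.append("invalidate-cache"), counter)

hook()  # sdb would call this before the first command
hook()  # and again before the second
print(counter.count, events)
```

The combined callable can then be passed as pre_cmd_hook=hook to sdb.start().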

Loading custom commands

Pass directories or files containing your sdb.Command subclasses via command_paths:

sdb.start(prog, command_paths=[
    "/opt/mytools/sdb_commands",
    "/home/me/debug/my_walkers.py",
])

You can also load commands after initialization:

sdb.load_external_commands("/opt/mytools/sdb_commands")
sdb.register_commands()

See Creating and loading external commands for how to write custom commands.

Manual setup (without `sdb.connect()` or `sdb.start()`)

Note

For most use cases, sdb.connect(prog) is the recommended way to set up the runtime without a REPL. The manual approach below is only needed if you want to skip thread/frame initialisation or customise the setup sequence.

If you need tighter control you can set up the sdb runtime manually:

import drgn
import sdb
import sdb.target as sdb_target

prog = drgn.Program()
prog.set_kernel()
prog.load_default_debug_info()

sdb_target.set_prog(prog)
sdb_target.set_thread(next(prog.threads()).object)
sdb_target.set_frame(-1)

sdb.register_commands()

# Now you can call invoke() directly
for obj in sdb.invoke([], "slabs | head 3"):
    print(obj)

Real-world example: GhostWire

GhostWire is a remote Linux kernel introspection tool that runs on an NVIDIA BlueField DPU. It reads the host machine’s physical memory over the network (using TCP or RDMA), constructs a drgn.Program from those remote reads, and then offers the user either a plain drgn REPL or a full sdb session.

The interesting part is how the remote drgn.Program is built. GhostWire’s transport layer provides a read_phys(address, size) function that fetches physical memory from the host over the network. The program construction looks roughly like this:

import drgn

# 1. Read vmcoreinfo from the host (needed for KASLR and type info)
vmcoreinfo = transport.read_phys(host.vmcoreinfo_phys, host.vmcoreinfo_size)

# 2. Create a Program with the host's platform and vmcoreinfo
prog = drgn.Program(
    platform=drgn.Platform(drgn.Architecture.X86_64),
    vmcoreinfo=vmcoreinfo,
)

# 3. Register physical memory segments backed by the network transport
read_fn = transport.make_drgn_read_fn()
for r in host.ranges:
    prog.add_memory_segment(
        address=r.start,
        size=r.end - r.start,
        read_fn=read_fn,
        physical=True,
    )

# 4. Initialize kernel page-table walking and KASLR relocation
prog.set_linux_kernel_custom(vmcoreinfo=vmcoreinfo, is_live=True)

# 5. Load debug info (vmlinux with DWARF)
prog.main_module("vmlinux", create=True).try_file("/path/to/vmlinux")

At this point prog is a fully functional drgn.Program that resolves types, walks page tables, and reads kernel memory – all transparently backed by network reads. The user doesn’t need to know or care that the data comes from a remote machine.
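
The read_fn passed to add_memory_segment() is the piece that connects drgn to the transport. drgn calls it with four arguments: the address, the number of bytes, the offset from the start of the segment, and a flag saying whether the address is physical. The sketch below shows how a two-argument read_phys(address, size) function might be adapted to that signature. It is an assumption-laden illustration, not GhostWire's code: the free function make_drgn_read_fn and the in-memory fake_read_phys backend are hypothetical.

```python
from typing import Callable

ReadPhys = Callable[[int, int], bytes]
DrgnReadFn = Callable[[int, int, int, bool], bytes]


def make_drgn_read_fn(read_phys: ReadPhys) -> DrgnReadFn:
    """Adapt read_phys(address, size) to drgn's memory-reader callback."""
    def read_fn(address: int, count: int, offset: int, physical: bool) -> bytes:
        # offset is unused here because read_phys takes absolute addresses.
        if not physical:
            raise ValueError("this segment only serves physical addresses")
        data = read_phys(address, count)
        if len(data) != count:
            raise OSError(
                f"short read at {address:#x}: {len(data)}/{count} bytes"
            )
        return data
    return read_fn


# Stand-in for the network transport: a small in-memory "host".
BASE = 0x1000
memory = bytes(range(64))


def fake_read_phys(address: int, size: int) -> bytes:
    start = address - BASE
    return memory[start:start + size]


read_fn = make_drgn_read_fn(fake_read_phys)
print(read_fn(BASE + 4, 4, 4, True))
```

Raising on a short read, rather than returning fewer bytes than requested, keeps a dropped connection from being mistaken for valid memory contents.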

The sdb integration is then just a thin bridge:

import os
import sdb

def run_sdb(prog, transport):
    sdb_commands_dir = os.path.join(
        os.path.dirname(__file__), "sdb_commands"
    )
    sdb.start(
        prog,
        command_paths=[sdb_commands_dir],
        prompt="sdb[gw]> ",
        pre_cmd_hook=transport.bump_generation,
    )

Key points:

command_paths loads GhostWire-specific commands from a directory next to the bridge module.
pre_cmd_hook bumps the transport cache generation so each REPL command sees fresh host memory. GhostWire uses a generation-based page cache: reads within a single command are cached (for fast page-table walks), but the cache is invalidated between commands.
prompt is customized to sdb[gw]> so the user knows they are in a GhostWire sdb session.
sdb is an optional dependency – GhostWire imports it lazily and only when the user passes --sdb.
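
The optional-dependency point can be sketched as follows. This is not GhostWire's actual code: the mytool program name, parse_args, import_optional and main are hypothetical, and only the --sdb flag comes from the description above. The idea is that the import happens inside the code path that needs it, so users without sdb installed can still run everything else.

```python
import argparse
import importlib
import sys
from types import ModuleType
from typing import List, Optional


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="mytool")
    parser.add_argument(
        "--sdb",
        action="store_true",
        help="start an sdb session instead of the default REPL",
    )
    return parser.parse_args(argv)


def import_optional(name: str) -> Optional[ModuleType]:
    """Import a module on demand; return None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def main(argv: List[str]) -> int:
    args = parse_args(argv)
    if not args.sdb:
        print("sdb not requested; nothing imported")
        return 0
    sdb = import_optional("sdb")
    if sdb is None:
        print("--sdb was given but sdb is not installed", file=sys.stderr)
        return 1
    # With sdb available, the tool would call sdb.start(prog, ...) here.
    return 0


exit_code = main([])
```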

GhostWire also adds its own domain-specific commands. For example, a command to print the host’s firmware version:

from typing import ClassVar, Iterable

import drgn
import sdb


class GwFirmwareVersion(sdb.Command):
    """Print the firmware version from the kernel command line."""

    names: ClassVar[list[str]] = ["gw_firmware_version"]
    load_on: ClassVar[list[sdb.Runtime]] = [sdb.Kernel()]

    def _call(self, objs: Iterable[drgn.Object]) -> Iterable[drgn.Object]:
        cmdline = sdb.get_object("saved_command_line").string_().decode()
        for part in cmdline.split():
            if part.startswith("fw_ver="):
                print(part.split("=", 1)[1])
                break
        else:
            print("firmware version not found in command line")
        return iter(())

This demonstrates how an embedding tool can expose its own domain-specific commands through sdb’s REPL while reusing the entire sdb infrastructure – pipelines, tab completion, help, recording, and everything else come for free.
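
One design choice worth considering for commands like this is to keep the string parsing in a plain function, so that it can be unit-tested without a kernel target. A minimal sketch, assuming the same fw_ver= token format as the command above; parse_fw_ver is a hypothetical helper and the sample command lines are invented:

```python
from typing import Optional


def parse_fw_ver(cmdline: str, key: str = "fw_ver") -> Optional[str]:
    """Return the value of the first key=value token, or None if absent."""
    prefix = key + "="
    for part in cmdline.split():
        if part.startswith(prefix):
            return part[len(prefix):]
    return None


# Invented sample command lines, for illustration only.
print(parse_fw_ver("root=/dev/sda1 fw_ver=24.35.1012 quiet"))
print(parse_fw_ver("root=/dev/sda1 quiet"))
```

The command's _call() method would then only need to fetch saved_command_line, call the helper, and print the result or the fallback message.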
