CES 520 - WEEK 5 September 19, 2006 -- Debugging
Design techniques to reduce bugs
- Debugging often burns half or more of total engineering design time
- The best debugging technique is not to have bugs in the first place
- Design reviews: Most studies conclude that code inspections are about 20 times more efficient at detecting bugs than testing.
- Reliable hardware:
- Read the data sheets! Look for obscure design oddities hidden in footnotes.
- Do careful timing analysis
- Check for excessive trace length causing timing skew (beware autorouted PC boards)
- Check for proper power supply decoupling and bypassing
- Well-structured, well-documented, easy-to-understand software.
- Interrupt Service Routines can be very difficult to debug
- Rule #1: Keep ISRs short. Complex calculations should go in another routine.
- Analyze execution time of all ISRs and worst-case stackup.
- Make all ISRs reentrant.
- Access all shared (global) variables and hardware registers atomically.
- Don't forget that variables default to static in Dynamic C.
- Do not call any non-reentrant functions.
- Some C library routines are not re-entrant
- All unused interrupts should branch to a null routine so you can set emulator to break on that routine.
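One way to read the "access shared variables atomically" rule is sketched below: the main loop copies a multi-byte shared variable with interrupts masked, so an ISR cannot change it halfway through the read. The names irq_disable(), irq_restore(), timer_isr() and read_ticks() are hypothetical; the two interrupt hooks are host-side stubs standing in for whatever mechanism the target processor actually provides.

```c
#include <stdint.h>

/* Hypothetical interrupt-control hooks. On real hardware these would map to
   the processor's own mask/restore mechanism; here they only track a flag so
   the sketch compiles and runs on a host PC. */
static int irq_enabled = 1;

static int irq_disable(void)        /* returns the previous state */
{
    int prev = irq_enabled;
    irq_enabled = 0;
    return prev;
}

static void irq_restore(int prev)
{
    irq_enabled = prev;
}

/* Shared between an ISR and the main loop. A 32-bit value takes several
   bus cycles to read on an 8-bit processor, so an unprotected read can
   see half of the old value and half of the new one. */
static volatile uint32_t tick_count;

/* Body of a (hypothetical) timer ISR: short, no library calls. */
void timer_isr(void)
{
    tick_count++;
}

/* Main-loop side: copy the shared value while interrupts are masked,
   then put the interrupt state back the way it was found. */
uint32_t read_ticks(void)
{
    int prev = irq_disable();
    uint32_t copy = tick_count;
    irq_restore(prev);
    return copy;
}
```

Restoring the previous state, rather than unconditionally re-enabling, keeps read_ticks() safe to call from code that already has interrupts masked.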
- Hardware vs software tradeoff: Sometimes additional HW can simplify the SW.
- Use different control latches for different functions. Rule: Only one software function per latch.
- Serial port physical layer in hardware. Ditto PWM motor controller, quadrature decoder, etc.
- Extra HW timer for slow events instead of counting system ticks in SW
- Multiprocessor systems can be easier to debug, if the software is well-partitioned between processors.
- Each hardware/software function should be separately tested before testing the entire system
- Makes it much more likely that system will turn on with few problems
- May reveal subtle bugs that don't show up in system-level testing
- Test at the lowest level possible. The textbook's rule-of-thumb for embedded programming:
- When writing code, test it every 20-50 lines.
- Regression testing
- Standard suite of tests that are run repeatedly throughout the code development process
- May replace hardware input functions with dummy functions that generate simulated input
- Ideally should be run after every major firmware build
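The idea of replacing hardware input functions with dummy functions can be sketched as follows. This is only an illustration: the function-pointer type adc_read_fn, the simulated driver sim_adc_read(), the sample values, and the code under test count_over_limit() are all hypothetical names invented for the example.

```c
#include <stddef.h>

/* Signature shared by the real input driver and the simulated one. */
typedef int (*adc_read_fn)(void);

/* Simulated input: a fixed, repeatable sequence (hypothetical values),
   so every regression run sees exactly the same data. */
static const int sim_samples[] = { 100, 102, 98, 500, 101 };
#define SIM_COUNT (sizeof sim_samples / sizeof sim_samples[0])
static size_t sim_index;

int sim_adc_read(void)
{
    int v = sim_samples[sim_index];
    sim_index = (sim_index + 1) % SIM_COUNT;
    return v;
}

/* Code under test: counts how many of n readings exceed a limit.
   It does not know whether read_adc is real hardware or the dummy. */
int count_over_limit(adc_read_fn read_adc, int n, int limit)
{
    int i, count = 0;
    for (i = 0; i < n; i++) {
        if (read_adc() > limit)
            count++;
    }
    return count;
}
```

In the shipping build the same count_over_limit() would be handed the real driver function, so the tested code itself is unchanged between the regression suite and the product.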
- Is it a hardware or software problem?
- Debugging embedded systems requires close collaboration between HW and SW engineers.
- Sometimes a HW bug is most easily fixed in SW.
- Sometimes SW bugs can be found most easily by adding debug HW. (See below)
- In-Circuit Emulator (ICE)
- Replaces the processor in the target system
- Overlay RAM: Replace target system's ROM program memory with emulator's internal RAM
- Allows fast download of code changes
- Allows inserting breakpoints so that the processor can be stopped temporarily
- True hardware emulators have hardware breakpoints
- Works even with ROM program memory
- Runs at full execution speed
- Traditional breakpoints are at specific addresses
- More sophisticated emulators can also break on other conditions
- Allows examining and changing the contents of registers and memory
- Other debugging features can include:
- Single-stepping through the code
- Event tracing (in real time)
- Measurement of execution time
- Histograms of code usage, etc.
- Traditional emulator:
- Remove processor from IC socket, plug in emulator pod, connected with cable to additional hardware
- Emulator RAM replaces system program ROM so breakpoints can be added
- Traditional emulator method hard to implement at high clock speeds
- Many surface-mount IC packages don't allow sockets
- Some emulators use connectors that clip over the top of the processor IC package
- The processor must have a special mode to tri-state all its pins
- Many processors have internal memory with no access to internal busses
- Some manufacturers make bondout versions of their chips for attaching emulators
- Solder-in adapter in place of the processor: Makes that particular board a test board forever
- A Built-in Debugger allows debugging without removing the processor
- Debugging hardware is designed into the processor chip itself
- Since it runs on the target processor, high clock speed is not a problem
- Typically has less-sophisticated features than a full hardware emulator
- Unlike a true hardware emulator, uses some processor resources
- Debugger hardware plugs into a special connector in the target system
- A simulator is a software program that runs on a PC to simulate the hardware
- Good tool to start software development before the hardware is ready
- Can be set up to simulate different memory configurations
- Also simulates on-chip peripherals
- Not real-time
- Debugger, also known as a software monitor
- Write your own or purchase. Software stored in ROM in the target hardware.
- User interface on PC via serial port, Ethernet, etc.
- Can be a simple command-line terminal program or elaborate debugging environment.
- Typical features:
- Read and write memory and I/O ports
- Report warnings and error conditions
- Set breakpoints in the code
- Download code into RAM or Flash memory
- Start and stop execution from the keyboard
- Typically requires considerable program memory and processor overhead for its operation
- May or may not be removed before final shipment.
- Small embedded controllers are a special problem for debugging
- Often there are no embedded debugging features.
- Not enough pins to support external program RAM memory
- Some vendors offer different versions of the same processor with different program memory types
- Develop code on the version with RAM program memory, ship with ROM-memory version
- Fortunately, code size is limited on these processors anyway, simplifying debugging
- Hardware tools:
- Logic analyzers
- Agilent and Tektronix own the market
- Connects to signals on PC board, displays many signal traces on CRT
- Unlike an oscilloscope, does not display voltage, only HI or LOW
- Timing mode: Latch data continuously on internal or external clock edge
- Can be used to measure timing
- State mode: Latch data on a signal state (confusingly called a clock)
- Some logic analyzers can disassemble op codes and link to your source code
- Can trigger the trace on complicated events
- Displays bit-by-bit or entire bus as one signal
- User programs the analyzer with signal names
- Connect to test points or use special multi-pin connectors, e.g. AMP's Mictor
- Things logic analyzers can do that emulators cannot:
- Analyze signals not connected to a processor pin
- Show true real time
- Input/output external trigger signals
- Some FPGA vendors offer on-chip internal logic analyzers interfaced through JTAG port
- Oscilloscopes
- May be the only tool available in production test environments
- Shows things logic analyzers may not
- Related analog signals (ADC input, DAC output)
- Bad clock signal
- Bus conflicts (voltage in-between low and high)
- Power supply noise
- Very long time periods
- Useful for measuring execution time of software routines.
- Software sets I/O bit to 1 at start, 0 at finish
- Bandwidth must be 3x to 5x clock speed to show edges.
- Scope probes
- Should have higher bandwidth than the scope
- 10x probe has lower capacitance
- Must adjust calibration capacitor for flat bandwidth
- Digital oscilloscopes
- Infinite screen persistence
- Can store very long traces - good detail of events far apart in time
- Computations on the data: Peak or RMS voltage, frequency, time
- Some digital oscilloscopes include a built-in basic logic analyzer.
- Allows displaying many more channels
- Allows triggering the analog display on a complex digital event.
- Sample rate must be more than 2x signal bandwidth
- Aliasing can cause misleading displays
- For periodic signals, oscilloscope can dither clock phase to fill in the dots
- The computer-like user interface can be an annoyance
- Buy a unit with easy access for commonly-used features
- To save cost, debug hardware may not be loaded in the final product
- Software must function the same whether debug hardware is present or not
- Add extra program RAM to support a software debugger or other debugging software.
- A simple LED can be surprisingly useful.
- At first turn-on, test software blinks the LED to prove the processor and program memory are working
- During normal operation, software blinks the LED to confirm nothing is hung up
- "Morse code" mode - LED blinks different number of times for different error codes
- LED works even if much of the rest of the system is non-functional
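A minimal sketch of the "Morse code" mode, assuming hypothetical hooks led_set() and delay_ms(). Here they are host-side stubs; on a target they would write an I/O port bit and spin on a timer. The blink and pause durations are arbitrary illustration values.

```c
/* Hypothetical hardware hooks, stubbed so the sketch runs on a host PC. */
static int led_state;       /* current LED state: 1 = on, 0 = off */
static int led_on_count;    /* number of times the LED was switched on */

static void led_set(int on)
{
    if (on && !led_state)
        led_on_count++;
    led_state = on;
}

static void delay_ms(unsigned ms)
{
    (void)ms;               /* a real target would busy-wait or use a timer */
}

/* Blink the LED 'code' times, then pause so repeated codes can be told
   apart by eye. Uses nothing but the LED and a delay, so it still works
   when most of the system is dead. */
void blink_error_code(unsigned code)
{
    unsigned i;
    for (i = 0; i < code; i++) {
        led_set(1);
        delay_ms(200);
        led_set(0);
        delay_ms(200);
    }
    delay_ms(1000);
}
```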
- Asserts/exceptions
- Allow the programmer to trap conditions that should not exist
- Normally removed in production code
- Be sure to thoroughly test the version without the asserts
- Write your assert macro such that it has no effect on system operation
- No function calls. Shouldn't change memory, interrupt state, etc.
- Use an if ... else construct inside the macro, so that an else following the assert in the code still pairs with the intended if.
- In embedded systems with no console, replace printf() with a software interrupt
- Exception routine pops the stack, saves the address of the assert and an error code
- Signal user with LED or set an emulator breakpoint on the exception routine
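The assert guidelines above might look like the sketch below. It is an illustration, not a standard macro: DBG_ASSERT, PRODUCTION_BUILD, assert_fail_line, and assert_fail_count are hypothetical names, and recording the line number in a variable stands in for the software interrupt a real target would use to capture the failing address.

```c
/* Failure record. On a real target the else-branch would instead execute a
   software interrupt whose handler saves the return address and error code. */
static volatile unsigned assert_fail_line;
static volatile unsigned assert_fail_count;

#ifdef PRODUCTION_BUILD
/* Asserts compiled out: the condition is never evaluated. */
#define DBG_ASSERT(cond) ((void)0)
#else
/* The macro is a complete if ... else statement. Because it supplies its
   own else, a later "else" in the caller's code cannot attach to the
   macro's hidden "if". No function calls are made on either path. */
#define DBG_ASSERT(cond)                      \
    if (cond) { } else do {                   \
        assert_fail_line = __LINE__;          \
        assert_fail_count++;                  \
    } while (0)
#endif

/* Example use: trap a condition that should never exist. */
int scale_percent(int value, int percent)
{
    DBG_ASSERT(percent >= 0 && percent <= 100);
    return value * percent / 100;
}
```

Since the condition disappears entirely in the production build, it must not contain side effects that the rest of the code relies on.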
- Other special temporary test software at turn-on. For example:
- RAM tests
- Test must run only in ROM. No function calls (uses the stack).
- Read and write 0x55 then 0xAA to a single address.
- Tests data bus, chip select, bad device.
- Write address LSBs as data to all memory locations. Read back. Write MSBs. Read back.
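The two RAM tests can be sketched in C as below. Two assumptions differ from the notes and are made only so the sketch runs on a host PC: the tests are written as ordinary functions (a real power-on test would be inlined or written in assembly so it does not use the stack it is testing), and the "address" written as data is the offset from the start of the region rather than the absolute bus address. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Data-bus test: write and read back 0x55, then 0xAA, at one address.
   The two patterns together toggle every data bit. Returns 0 on pass. */
int ram_test_data_bus(volatile uint8_t *addr)
{
    *addr = 0x55;
    if (*addr != 0x55)
        return -1;
    *addr = 0xAA;
    if (*addr != 0xAA)
        return -1;
    return 0;
}

/* Address test: fill the whole region with the low byte of each location's
   offset and read it back, then repeat with the next-higher byte.
   A stuck or shorted address line makes two locations alias, so one of
   them reads back the wrong value. Returns 0 on pass. */
int ram_test_address(volatile uint8_t *base, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        base[i] = (uint8_t)(i & 0xFF);
    for (i = 0; i < len; i++)
        if (base[i] != (uint8_t)(i & 0xFF))
            return -1;

    for (i = 0; i < len; i++)
        base[i] = (uint8_t)((i >> 8) & 0xFF);
    for (i = 0; i < len; i++)
        if (base[i] != (uint8_t)((i >> 8) & 0xFF))
            return -1;

    return 0;
}
```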
- Test switches and controls - blink LED(s) in a different pattern for each switch
- Direct control of I/O ports from front panel or monitor program
- Include pullup resistors and a socketed jumper on the data bus for the "dead computer" test:
- Uses Rabbit "RST 38" command (Z-80 "RST 7"). Op code = 0xFF
- Remove jumper to disconnect data bus. Pull-up resistors generate continuous 0xFF.
- Stack pointer decrements - address lines sequence through all address space.
- RST command pushes the stack so data lines cycle through all values as well.
- You can check each address and data line with an oscilloscope.
- A great diagnostic for production technicians.
- Other processors require a different op code, which may need pull-down as well as pull-up resistors.
- Debug hardware useful for situations where an emulator/debugger cannot be connected because of:
- Physical location (e.g. problem may only show up at customer site)
- Not enough processor resources to support a full debug environment
- No possible physical connection on PC board (e.g. ball-grid-array IC package)
- Compared to using an emulator, debug hardware allows the device to run at full speed (no breakpoints).
- Allows triggering on events that are outside an emulator's trace memory.
- Makes it easier to time-correlate outside events with what's going on in software.
- Hardware debug output
- Processor writes event codes to an I/O port
- Capture on logic analyzer: Wire I/O strobe, address, and data to Mictor connector
- I/O address identifies the particular SW routine, I/O data is status or other data
- Typical events to capture:
- Each entry and/or exit from each interrupt routine
- Each execution of a major software function
- Each communication with a controller or other processor
- If no I/O ports are available, can write to an unused memory address
- If no unused memory space available, can write to external ROM
- This technique is good for measuring software timing.
- Sometimes toggling a single I/O pin is sufficient to measure timing
- Circular trace buffer
- If hardware output is not usable, write event codes to memory
- Use a circular buffer with a special code (e.g. 0xFF) to indicate last value written
- Allows much longer trace time than the emulator's trace function
- Can also store a timer count along with event code to get time information
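A minimal sketch of the circular trace buffer described above, using 0xFF as the end marker. The buffer size and the names trace_buf, trace_head, and trace_event() are hypothetical. The sketch stores event codes only; a timer count could be stored alongside each code in the same way.

```c
#include <stdint.h>

#define TRACE_SIZE 64u      /* hypothetical buffer size */
#define TRACE_END  0xFFu    /* marker value; real event codes must avoid it */

static volatile uint8_t trace_buf[TRACE_SIZE];
static volatile uint8_t trace_head;     /* index of the next slot to write */

/* Record one event code. The slot after the newest entry is always
   overwritten with TRACE_END, so when memory is examined later the entry
   just before the marker is the newest and, once the buffer has wrapped,
   the entry just after it is the oldest. */
void trace_event(uint8_t code)
{
    trace_buf[trace_head] = code;
    trace_head = (uint8_t)((trace_head + 1u) % TRACE_SIZE);
    trace_buf[trace_head] = TRACE_END;
}
```

If trace_event() is called from both interrupt routines and the main loop, the update of trace_head is itself shared data and would need the same atomic-access care as any other shared variable.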
- Memory dump
- Connect an unused interrupt input to a switch or logic analyzer trigger output
- Interrupt routine sequentially reads all memory, which is captured by logic analyzer
Miscellaneous debugging tips and tricks
- Design phase:
- Add lots of test points for connecting an oscilloscope.
- Don't forget grounds for the scope probe
- Add test point(s) to unused port outputs for software troubleshooting
- Critical signals important for troubleshooting:
- Basic system timing signals: clock, read, write, processor status
- Power supplies
- Memory and device chip selects
- Programmable logic devices should be designed with a debug pin or two connected to test point(s)
- Consider adding a logic analyzer connector for signals likely to need probing
- AMP's Mictor connectors are small and supported by HP and Tektronix logic analyzers
- Mictor adapters available to allow connecting oscilloscope to logic analyzer connector
- Product design: How will you attach a logic analyzer cable with the board installed in the product?
- Debugging Rules:
- When debugging, always try to find the root cause of the problem.
- Never change more than one thing at a time.
- If the change doesn't fix it, change it back before trying something else.
- For complicated problems, keep a troubleshooting log.
- Check simple things first
- Visual inspection: Shorts, ICs installed backwards, etc.
- Are any parts excessively hot? (finger test)
- Does the device have proper power supply voltage(s)? (voltmeter)
- Check CPU control inputs: Reset, memory wait, interrupts, clock, etc. (oscilloscope)
- "Divide and conquer" - Isolate the problem to a specific area.
- Comment out specific functions or code sections to see if the problem persists.
- "Binary search" - Comment out half the code to see if the other half fails.
- Similar techniques sometimes work for hardware. Disconnect devices one at a time.
- Debugging hardware and software can affect the real-time response and thus functionality
- Does the problem go away when the emulator is turned on (or "nodebug" turned off)?
- Setup/hold problem with some device (because it's working with the processor slowed down)
- Race condition
- Does the problem go away when the emulator is turned off?
- Not enough CPU resources to run application and emulator at the same time
- Change functions that are known to work correctly to "nodebug"
- Intermittent software problems are often caused by:
- Uninitialized pointers
- Uninitialized static variables
- Buffer or array overflow
- Stack over/underflow. Worst-case stack size is very difficult to predict.
- Some real-time operating systems include stack monitor. Turn it on during debugging phase.
- Program logic analyzer or debugger to break on address near end of stack area
- Preload stack area with 0x55. Makes it easy to see maximum stack usage.
- C's malloc is dangerous in embedded systems.
- Best to allocate data structures statically if possible.
- Excessive total execution time of interrupt service routines that only occasionally stack up
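The stack-preloading trick (fill the stack area with 0x55, then see how much was overwritten) can be sketched as follows. The sketch assumes the stack grows downward from the top of the array toward index 0, and treats the stack area as an ordinary byte array so it can run on a host PC; stack_preload() and stack_max_used() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0x55u

/* Fill the whole stack area with the marker. On a target this would run
   at startup, before the stack is in serious use. */
void stack_preload(uint8_t *stack, size_t size)
{
    size_t i;
    for (i = 0; i < size; i++)
        stack[i] = STACK_FILL;
}

/* Return the high-water mark in bytes. Assumes a downward-growing stack,
   so untouched marker bytes sit at the low end of the area. If the
   deepest stacked byte happens to equal 0x55, the result is slightly
   low, so leave some margin. */
size_t stack_max_used(const uint8_t *stack, size_t size)
{
    size_t unused = 0;
    while (unused < size && stack[unused] == STACK_FILL)
        unused++;
    return size - unused;
}
```

A result equal to the full size of the area means the marker was completely overwritten and the stack has probably overflowed.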
- Intermittent hardware problems are often caused by:
- Setup and hold times with insufficient margin
- Insufficient power supply decoupling
- Electromagnetic interference (EMI)
- Poor PC board layout (long parallel traces, etc.)
- External interference
- Try disabling all interrupts except the one that causes the problem
- If problem disappears, turn on interrupts one-by-one until problem reappears
- Look for shared-data problem or insufficient execution time problem
- Slow down timer interrupts to make it easier to see what is happening
- Check trace memory or circular trace buffer (see above) around the error
- Look for patterns around the area where the error occurred.
- If the same interrupt always occurs in the same place in the idle loop, may be a shared-data problem.
- If the same interrupt always occurs twice in a row it may be a non-reentrant interrupt service routine.
- Is the problem correlated with a specific software revision? (What changed?)
- Archive all software builds for later troubleshooting
- Program unused program ROM with single-word jump instructions to an error-handling routine
- Do the same with unused interrupt vectors
- Error handler may just be an infinite loop. Set debugger breakpoint on it.
Assignments:
- Read An Embedded Software Primer, Chapter 10
- Read Embedded Systems Design, Chapter 4