CES 520 - WEEK 14 November 28, 2006 - Test and Verification
- Testing is different from debugging
- The purpose of debugging is to fix bugs and get the system working.
- Test to pass: Once the system works properly under normal conditions, the bug is "fixed".
- The purpose of testing is to find bugs in a "working" system.
- Test to fail: Exercise the system in all possible states, trying to "break" the system.
Tracking Bugs
- There should be some formal way to track bugs discovered in testing during the life of the project.
- Most defect-tracking tools are intended for software bugs only.
- Hardware bugs and their resolutions are traditionally recorded in engineering notebooks.
- These are formal documents, also used to prove invention dates for patents, etc.
- Anyone can enter a new bug into the error-tracking system.
- Anyone can add additional data or comments.
- Only the originator or the list administrator (often the project manager) can close an entry.
- The resolution should be recorded.
- The entry can be re-opened if the bug crops up again.
- Entries should include a complete description of the problem, including:
- Software version
- Procedure to reproduce the bug
- Copy of error messages, incorrect output, oscilloscope traces, etc.
- Any workarounds
- Bugs should be rated as to severity.
- Low severity - OK to ship.
- Example: Feature enhancement request
- Medium severity - OK to ship, with a later firmware upgrade to fix the bug.
- Example: A minor or rarely-encountered user inconvenience for which a workaround is available.
- High severity - Must be fixed before shipment.
- Example: An important feature that does not work correctly.
- Other data that should be in a bug report:
- Name/email/phone number of originator
- Name of responsible engineer (the "owner")
- Date bug was recorded
- Current status and date of last update
- Resolution date
- Software version changed
- Description of how the bug was fixed.
- Some companies have a procedure to allow customers to initiate bug reports.
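The fields and rules listed above can be sketched as a small record type. This is only an illustration, not the schema of any real defect-tracking tool: the class name `BugReport`, the field names, and the `admin` parameter are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class Severity(Enum):
    LOW = "low"        # OK to ship
    MEDIUM = "medium"  # OK to ship, fixed by a later firmware upgrade
    HIGH = "high"      # must be fixed before shipment


@dataclass
class BugReport:
    originator: str              # name/email/phone of the person filing
    description: str
    software_version: str
    steps_to_reproduce: str
    severity: Severity
    date_recorded: date
    owner: Optional[str] = None  # responsible engineer
    evidence: List[str] = field(default_factory=list)     # error messages, traces
    workarounds: List[str] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)
    status: str = "open"
    resolution: Optional[str] = None         # how the bug was fixed
    resolution_date: Optional[date] = None
    fixed_in_version: Optional[str] = None   # software version changed

    def add_comment(self, author, text):
        # Anyone can add additional data or comments.
        self.comments.append(f"{author}: {text}")

    def close(self, user, resolution, when, admin="project-manager"):
        # Only the originator or the list administrator may close an entry.
        # "project-manager" is a hypothetical administrator name.
        if user not in (self.originator, admin):
            raise PermissionError(f"{user} may not close this entry")
        self.status = "closed"
        self.resolution = resolution
        self.resolution_date = when

    def reopen(self):
        # The entry can be re-opened if the bug crops up again.
        self.status = "open"


def blocks_shipment(reports):
    # Any open high-severity bug must be fixed before shipment.
    return any(r.status == "open" and r.severity is Severity.HIGH for r in reports)
```

Keeping the close rule inside the record (rather than in the user interface) means every tool that touches the list enforces it the same way.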
Test Plan
- A test plan is not just a collection of test cases. That is more accurately called the test specification.
- ANSI/IEEE 829 describes a software test plan as: "A document describing the scope, approach, resources, and schedule of intended testing activities. It identifies test items, the features to be tested, the testing tasks, who will do each task, and any risks requiring contingency planning."
- Introduction
- Summary of the items and features to be tested
- References to related documents and lower-level test plans.
- Test items
- Test items and version
- Characteristics of their transmittal media
- Related documents
- Any related bug reports
- Features
- Features to be tested, and the reasons for them.
- Features not to be tested, and the reasons for not testing them.
- Approach to testing for each major group of features
- Major activities, techniques, and tools
- Degree of comprehensiveness required
- Completion criteria
- Constraints on testing (Availability of test items and test software, deadline, etc.)
- Pass/fail criteria
- Criteria used to suspend testing, and which tests must be re-done if testing is suspended.
- Test deliverables
- Test documents and reports
- Test input and output data
- Method of storing test results (e.g. the software tool used)
- Test task list, including task interdependencies
- Required equipment and facilities
- Source for all needs which are not currently available
- Required personnel and any special skills needed
- Training options to provide necessary skills
- Responsible groups or individuals
- Schedule
- Risks and contingencies
- Approvals
- A more recent approach replaces the detailed test plan scripted in advance with exploratory testing, which involves simultaneous learning, test design, and test execution.
- Often combined with a test charter that defines the high-level test functionality, but leaves the details to the individual test designers.
Types of Test Procedures
- Can be formal or informal.
- Generally, the people doing the testing should not be the design engineers.
- Static testing
- Analysis of the code without actually executing it.
- Peer review of the source code is one method.
- Usually the term "static test" refers to some form of automated testing of source or object code.
- Highlights possible coding errors.
- Formal verification is mathematical proof that a system meets certain formal specifications.
- Create an abstract mathematical model of the system and prove mathematically that it is correct.
- Proving that the actual code is correct is called program verification.
- Can be used for hardware as well as software.
- Non-formal verification can prove incorrect operation, but not correct operation.
- Formal verification only proves correctness of the specific properties tested for.
- "Beware of bugs in the above code; I have only proved it correct, not tried it." - Donald Knuth
- Often used to test safety-critical systems.
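As a rough illustration of automated static testing, the sketch below inspects source code without executing it and highlights two possible coding errors. It uses Python's standard `ast` module; the function name `find_suspect_lines`, the two checks chosen, and the sample code are assumptions made for the example, not a description of any particular commercial tool.

```python
import ast


def find_suspect_lines(source):
    """Return sorted (line, message) pairs for two common coding slips."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # A bare "except:" hides every error, including programming mistakes.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append((node.lineno, "bare except"))
        # "x == None" usually should be "x is None".
        if isinstance(node, ast.Compare):
            for op, right in zip(node.ops, node.comparators):
                if (isinstance(op, (ast.Eq, ast.NotEq))
                        and isinstance(right, ast.Constant)
                        and right.value is None):
                    findings.append((node.lineno, "comparison to None with == or !="))
    return sorted(findings)


# Hypothetical code under analysis. It is only parsed, never run.
SAMPLE = """\
def read(sensor):
    try:
        value = sensor.read()
    except:
        value = None
    if value == None:
        return 0
    return value
"""

for line, message in find_suspect_lines(SAMPLE):
    print(f"line {line}: {message}")
```

Like any static check, this flags possible errors only; it says nothing about whether the code behaves correctly when run.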
- Dynamic testing
- Run the software and check the results.
- Special software can be used to measure the software under test as it runs the test suite. For example:
- Check for memory allocation errors.
- Check for test coverage - what percentage of the code has been tested?
- Performance analysis - Test the speed, response time, or memory usage.
- A profiler records parameters such as frequency and duration of interrupts or function calls.
- Instrumentation means to insert code into the software to collect data.
- Affects execution speed and code size.
- Automated tools exist to instrument code. For example:
- GNU's gprof relies on instrumentation that the compiler inserts into C or C++ programs at build time (e.g. gcc -pg).
- Other tools add instrumentation to the compiled binary or modify the code at runtime.
- A sampling profiler samples the program counter using operating system interrupts.
- Allows software to run nearly full-speed.
- Gives a less-accurate statistical sample.
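A minimal sketch of instrumentation, assuming a Python program: a wrapper is inserted around a function to record how often it is called and how long it runs. The names `instrumented`, `call_stats`, and `checksum` are made up for the example. The extra bookkeeping is exactly the overhead in speed and code size noted above.

```python
import functools
import time

# function name -> {"calls": int, "total_s": float}
call_stats = {}


def instrumented(func):
    """Wrap func so every call is counted and timed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are recorded too.
            elapsed = time.perf_counter() - start
            entry = call_stats.setdefault(
                func.__name__, {"calls": 0, "total_s": 0.0})
            entry["calls"] += 1
            entry["total_s"] += elapsed
    return wrapper


@instrumented
def checksum(data):
    # Hypothetical function under test: 8-bit additive checksum.
    return sum(data) & 0xFF


for _ in range(3):
    checksum(bytes(range(16)))

print(call_stats["checksum"]["calls"])  # number of recorded calls
```

A sampling profiler would avoid the wrapper entirely and instead look at where the program is at regular intervals, trading accuracy for speed.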
- Verification and Validation (V&V)
- Verification - Are we building the product right?
- Validation - Are we building the right product?
- Black box testing: The testers know nothing about the internal workings of the system.
- System designers often miss faults in their own design due to over-familiarity.
- Black box testing is most common at the highest test levels, but it can be used at all levels.
- Example: "user input validation" checks for correct response to arbitrary inputs.
- White box testing
- The design is visible to the testers.
- Peer reviews: Source code review by engineers who did not work on the module under test.
- Managers are normally prohibited from attending.
- Done early in the development cycle to catch problems early.
- A bug fixed after the peer review stage costs roughly 4x as much to fix as one caught during review.
- Walkthrough: The design engineer describes the circuit or code to fellow engineers.
- Peer review is a good way to capture software faults before they become software failures.
- Open source software is effectively peer-reviewed by thousands.
- Code coverage is the amount of code exercised and tested by the test software.
- Can be tested by inspection or by automated means.
- Automated software to test code coverage generally adversely affects software performance.
- Three definitions
- Percentage of lines of code executed
- Percentage of conditional tests (branches) exercised for every possible outcome
- Percentage of possible execution paths executed
- Path coverage implies the other two.
- Safety-critical systems may require 100% test coverage of some portions of the software.
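The three definitions give different numbers for the same test set. The sketch below models this by hand for a made-up function with two independent decisions; it is not a real coverage tool, and `clamp_and_scale`, `path_taken`, and `coverage_summary` are names invented for the example.

```python
def clamp_and_scale(x, invert):
    """Hypothetical function under test with two independent decisions."""
    if x > 100:      # decision A
        x = 100
    if invert:       # decision B
        x = -x
    return x


def path_taken(x, invert):
    # The outcome of decisions A and B identifies one of 4 execution paths.
    return (x > 100, bool(invert))


def coverage_summary(test_inputs):
    paths = {path_taken(x, inv) for x, inv in test_inputs}
    a_outcomes = {p[0] for p in paths}
    b_outcomes = {p[1] for p in paths}
    return {
        # Every line runs once A has been true and B has been true.
        "all_lines": True in a_outcomes and True in b_outcomes,
        # 2 decisions x 2 outcomes = 4 branch outcomes in total.
        "branch_pct": 100 * (len(a_outcomes) + len(b_outcomes)) // 4,
        # 2 x 2 = 4 possible paths in total.
        "path_pct": 100 * len(paths) // 4,
    }


print(coverage_summary([(150, True)]))
print(coverage_summary([(150, True), (50, False)]))
print(coverage_summary([(150, True), (150, False), (50, True), (50, False)]))
```

One test is enough to execute every line, two are enough for every branch outcome, but all four combinations are needed to cover every path. With more decisions the number of paths grows quickly, which is why full path coverage is usually demanded only for selected portions of the code.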
- Stress testing
- Testing beyond normal operational limits, in an attempt to induce failure.
- "Test to failure" - No failure means no information.
- Identifies areas of insufficient design margin.
- A key component of performance testing - See what happens when a system's input is overloaded.
- "Strife" = "Stress/Life" = Life testing at high-stress levels.
- Usability testing
- Testing should involve "naive" users using the actual software and documentation.
- Each user should be given one or more tasks to accomplish, with no help from experts.
- The best procedure is to have observers taking notes as users attempt to perform the tasks.
- Users can "think aloud" as they work or an eye-tracking apparatus can be used.
- A simpler method is just to have the users themselves take notes on difficulties encountered.
- Criteria
- Time on task - How long does it take to complete basic tasks?
- Accuracy - How many mistakes? Are mistakes easily recoverable?
- Recall - How much do users remember after periods of non-use?
- Emotional response - How often do users throw things at the computer screen?
- Frequent testing with small numbers of users (some say as few as 5) is more efficient than large surveys.
Test Levels
- Unit or module testing
- Incremental testing of individual system elements.
- Tests should not depend on any other modules. All interfaces to other modules should be simulated.
- Simplifies the integration phase.
- The unit test specification documents unit functionality.
- Regression testing guarantees that the unit test spec is always up to date.
- Unit testing is a key component of so-called extreme programming (XP).
- Write the test suite before writing the software.
- Tests must include all important characteristics.
- When the code passes all tests, then the coding is complete by definition.
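A small sketch of a unit test in which the interface to another module is simulated, using Python's standard `unittest`. The `Thermostat` module, the `FakeSensor` stand-in, and the specific test cases are hypothetical. In the test-first style described above, the three tests would be written before `Thermostat` itself.

```python
import unittest


class FakeSensor:
    """Simulates the sensor driver so the test depends on no other module."""
    def __init__(self, readings):
        self._readings = list(readings)

    def read_celsius(self):
        return self._readings.pop(0)


class Thermostat:
    """Unit under test: decides whether the heater should be on."""
    def __init__(self, sensor, setpoint_c):
        self.sensor = sensor
        self.setpoint_c = setpoint_c

    def heater_on(self):
        return self.sensor.read_celsius() < self.setpoint_c


class ThermostatTest(unittest.TestCase):
    def test_heater_on_when_cold(self):
        self.assertTrue(Thermostat(FakeSensor([18.0]), 20.0).heater_on())

    def test_heater_off_when_warm(self):
        self.assertFalse(Thermostat(FakeSensor([22.0]), 20.0).heater_on())

    def test_heater_off_at_setpoint(self):
        self.assertFalse(Thermostat(FakeSensor([20.0]), 20.0).heater_on())


def run_tests():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ThermostatTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the sensor is passed in rather than created inside `Thermostat`, the same unit later runs unchanged against the real driver during integration.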
- Integration testing
- Testing of modules together, after integration.
- Tests higher-level functionality and inter-process communication
- System testing
- Testing the entire embedded system, with all elements integrated.
- Purely black-box testing at this level.
- All hardware is included.
- The highest level of test-to-failure (i.e. trying to force errors)
- Acceptance test
- Test to customer requirements.
- A suite of black-box tests on the completed system.
- Results of each test are boolean: Pass or Fail.
- Failure(s) may or may not result in rejection of the product.
- Performed by or in conjunction with the customer.
- May be done in the manufacturer's facility or at the customer site.
Regression Testing
- A regression bug is an unintended defect introduced into software by a software modification.
- A regression test is a standard test run after each modification to be sure nothing was broken.
- Typically performed on individual software modules, but may also be done at higher levels.
- Tries to achieve highest possible code coverage.
- As each bug is fixed, the regression test suite is checked to be sure it fails if the bug reappears.
- Software tools can be used to run the test suite automatically.
- Automated testing requires up-front and continuing commitment of engineering resources.
- Ensures that tests are always performed the same way.
- Frees up test personnel for higher-value manual testing.
- Record-playback method records user or other inputs and checks for consistent output.
- Hard to ensure good code coverage
- Can make test software hard to maintain as the system is updated.
- Usually it is best to engineer the tests rather than relying on record-playback.
- Test software must have error recovery procedures to report the error and continue with testing.
- Does not eliminate the need for manual testing.
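A sketch of an engineered (not record-playback) regression suite. Each fixed bug gets its own case, the runner reports an error and continues with the remaining cases, and the suite is checked against a copy of the code with the bug put back. The function `duty_cycle`, the bug ID "BUG-17", and the bug itself are all hypothetical.

```python
def duty_cycle(percent):
    """Current code: clamp a requested PWM duty cycle to 0..100."""
    return max(0, min(100, percent))


def duty_cycle_with_bug(percent):
    # Reintroduces the hypothetical bug BUG-17: 100 wraps around to 0.
    return percent % 100


# Each case: (label, input, expected result).
REGRESSION_CASES = [
    ("baseline: mid-range value", 50, 50),
    ("BUG-17: 100% wrapped to 0", 100, 100),
]


def run_regression(func, cases):
    """Run every case; report failures and keep going after an error."""
    failures = []
    for label, arg, expected in cases:
        try:
            actual = func(arg)
        except Exception as exc:
            failures.append((label, f"raised {exc!r}"))
            continue
        if actual != expected:
            failures.append((label, f"expected {expected!r}, got {actual!r}"))
    return failures


print(run_regression(duty_cycle, REGRESSION_CASES))           # current code
print(run_regression(duty_cycle_with_bug, REGRESSION_CASES))  # bug put back
```

Running the suite against the deliberately broken version is the check described above: if that run reports no failure, the suite would not notice the bug reappearing.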
Timing analysis:
- The key testing issue with real-time systems (RTSs)
- Metrics
- Execution time of a task, not including interruptions
- Used to compute computing resources required by the task
- Worst-case, best-case and average
- Response time
- Time from invocation to completion of a task
- Includes interference from other tasks and parts of the system
- End-to-end delay
- From input of a signal to the output of the response
- May involve several tasks, including communication time
- Jitter - variability in time
- Difference between best and worst case time of a task or system response
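The metrics above reduce to simple arithmetic once a set of timing measurements is available. A minimal sketch, assuming the samples are measured times in microseconds (the function name `timing_summary` and the sample values are made up):

```python
from statistics import mean


def timing_summary(samples_us):
    """Summarize measured times for one task (microseconds)."""
    best, worst = min(samples_us), max(samples_us)
    return {
        "best": best,
        "worst": worst,
        "average": mean(samples_us),
        "jitter": worst - best,  # difference between worst and best case
    }


# Hypothetical measurements of one task's response time.
print(timing_summary([120, 95, 130, 101]))
```

Note that the worst value observed in a test run is only a lower bound on the true worst case, since the measurements may never have hit the worst-case conditions.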
- Analyzing worst-case execution or response time
- Dynamic analysis, i.e. run it and measure it
- Time-consuming and error-prone
- The test instrumentation and code themselves affect the real-time response
- Static analysis - analyze the code and count the clock cycles
- Hardware affects the result (e.g. a memory cache can cause the static estimate to be too high).
- Analyzing code can be difficult (e.g. determining the number of loop iterations in response to an input).
- Schedulability - Ability to guarantee that all task deadlines will always be met
- Trivial for static (table-driven) schedulers
- Response-time analysis - analyze worst-case response time of each task
- Utilization analysis - Based on fraction of CPU utilization for relevant subset of tasks
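The two analyses can be sketched for one common case. The notes do not say which formulas are meant, so the following are assumptions: fixed-priority preemptive scheduling on one CPU, independent periodic tasks with deadline equal to period, the Liu and Layland utilization bound for rate-monotonic priorities, and the standard response-time recurrence R = C + sum of ceil(R/Tj)*Cj over higher-priority tasks. Each task is a pair (C, T) of worst-case execution time and period in the same integer time unit.

```python
def utilization(tasks):
    """Fraction of CPU time requested by the task set."""
    return sum(c / t for c, t in tasks)


def passes_rm_bound(tasks):
    # Liu and Layland bound: sufficient, but not necessary.
    n = len(tasks)
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)


def worst_case_response_time(tasks, i):
    """Response-time analysis for tasks[i].

    tasks must be sorted highest priority first.
    Returns None if the task can miss its deadline (= its period).
    """
    c_i, t_i = tasks[i]
    r = c_i
    while True:
        # Interference from every higher-priority task released during r.
        # -(-r // t_j) is integer ceiling division.
        interference = sum(-(-r // t_j) * c_j for c_j, t_j in tasks[:i])
        r_next = c_i + interference
        if r_next > t_i:
            return None
        if r_next == r:   # fixed point reached
            return r
        r = r_next


# Hypothetical task set, highest priority (shortest period) first.
TASKS = [(1, 4), (2, 6), (3, 12)]
print(utilization(TASKS), passes_rm_bound(TASKS))
print([worst_case_response_time(TASKS, i) for i in range(len(TASKS))])
```

This task set fails the utilization bound, yet the response-time analysis shows every task finishing within its period. The utilization test is quick but pessimistic; response-time analysis costs more computation and gives an exact answer under the stated assumptions.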
- End-to-end delay estimation
- Add response times for each task or message in the chain
- For distributed RTSs, it may not be possible to calculate all response times in one pass
- Delays on one node can lead to jitter on another node, which can propagate in several steps in both directions.
- Solution is to repeatedly calculate worst-case response times until answer does not change.
- Jitter estimation
- Requires best-case analysis as well as worst-case
- Jitter increases the number of possible execution paths - difficult to get good test coverage.
Assignment:
- Write a test plan for the class project. Be sure to include pass/fail criteria. Due in 1 week.